Big Data. Big Decisions
InformationWeek
Special Coverage Series


A Model For The Big Data Era

Data-centric architecture is becoming fashionable again By Rajive Joshi

Wired and wireless communication networks are making data collection and transmission cheap and widespread. In the future, networks will weave many devices and subsystems into complex integrated distributed systems that will become the fabric of business and daily life.

Building such distributed systems is far from simple; they must be assembled from independently developed software components. Integration, especially combined with real-time performance demands, becomes the key challenge.

This article outlines fundamental design principles that enable integration of distributed systems from components. I use a data-centric approach to this design, as the data is the key element that must flow through the various systems.

The key to data-centric design is to separate data from behavior. The data and data-transfer contracts then become the primary organizing constructs. With carefully controlled data relationships and timing, the system can then be built from independent components with loosely coupled behaviors. Data changes drive the interactions between components, not vice versa as in traditional or object-oriented design.

The resulting loosely coupled software components with data-centric interfaces are then integrated into a working system through a data bus. The data bus connects data producers to consumers and enforces the associated quality-of-service (QoS) contracts on the data transfers. This design technique is naturally supported by the Data Distribution Service (DDS) specification (information week.com/1295/ddj/spec) for real-time systems, which is a standard from the Object Management Group (omgwiki.org/dds). Implementations of this standard are available from many vendors.

The techniques described here are proven in hundreds of mission-critical applications including robotics, unmanned vehicles, medical devices, transportation, combat systems, finance, and simulation.

A Future Distributed System

To understand the dynamic nature of next-generation distributed systems, it helps to examine a representative scenario: an air traffic control system. Air traffic control in the future will integrate a variety of disparate systems into a seamless whole--a system of systems. On the edge is a real-time avionics system inside the aircraft. The control tower in the center communicates with the avionics system, and then out to data servers at the airport. The system thus comprises connectivity from the "edge" (devices) to the "enterprise" (infrastructure services).

The data in the avionics system flows at high rates and is time-critical. Violating timing constraints could result in the failure of the aircraft or jeopardize safety. Although aircrafts traditionally operate as independent units, future aircraft must integrate closely with automated traffic control and ground systems.

The control tower is another independent real-time system. It monitors various aircraft in the region, coordinates their traffic flow and generates alarms to highlight unusual conditions. The data is time-sensitive for proper local and wide area system operation. However, the system may have greater tolerance for delays than the avionics systems.

The control tower communicates with the airport's enterprise information systems, which track flight status and other data and may communicate with multiple control towers and other enterprise information systems. It also must synthesize passenger, flight arrival, and departure status information. Because it isn't in the time-critical path, the enterprise information system can be more tolerant of delays.

Key Design Challenges

This so-called "system of systems" must deal with a many issues, such as correctly handling myriad differences in data exchange, performance, and real-time requirements. The architecture also involves different technology stacks, design models, and component life cycles.

To support system growth and evolution, the integration must be robust enough to handle changes on either side of an interface. To do this, only minimal assumptions should be made about the interfaces between systems--the interface specifications should describe only the invariants in the interaction. Behavior can then be implemented independently by each system; the interface between them shouldn't include any component-specific state or behavior. This avoids tight coupling.

The systems on either side of an interface may differ in quantitative aspects of their behavior, including different data volumes, rates, and real- time constraints. The term "impedance mismatch" is shorthand for all the nonfunctional differences in the information exchange between two systems. Critically, a developer can capture these nonfunctional aspects of the information exchange by attaching QoS attributes to the data transfer. With explicit QoS terms, responses to impedance mismatches can be automated, monitored, and governed.

Principles Of Data-Centric Design

Data-centric design recognizes that the essential invariant is the information exchange between systems or components. It describes the exchange in terms of a "data model" and data producers and consumers of the data, and it relies on four basic principles:

1. Expose the data and metadata. Data-centric design exposes the data and metadata as first-class citizens, and uses them as the primary means of interconnecting heterogeneous systems. Data is the primary means of describing the world as it is, independent of any component-specific behavior. Metadata refers to information about the data's layout and structure. A data-centric interface is defined by the metadata, which must contain all of the information required to encode and decode the data in a given format.

2. Hide the behavior. Data-centric design hides any direct references to operations or code of the component interfaces. An interface can't embed any component-specific state or behavior. Components implement behaviors that can change the data or respond to changes in data (the "world model").

3. Delegate data handling to a data bus. Separation of data handling and application logic is necessary for loosely coupled systems. The component application logic should focus on manipulating interface data, not managing and distributing it. The data bus is responsible for data handling and is the authoritative source of the world model shared amongst the components.

4. Explicitly define data-handling contracts. These contracts should be specified by the application at design time, and enforced by the data bus at runtime. Delivery contracts specify the QoS attributes on data produced and consumed by a component, including timing, reliability, durability, etc. The data bus examines these "contracts," and if compatible, establishes data flows. The data bus then enforces QoS contracts, thereby providing the application code clear, known expectations.

In contrast, traditional messaging designs focus on functional or operational interfaces and overlook impedance problems. The interface QoS and timing aren't modeled, so all the interface state and communications issues are implicitly assumed. The result: a brittle, tightly coupled design. Adding components or interactions violates the assumptions, forcing system designers to rework the interfaces. The architecture becomes very hard to maintain and evolve.

Data-Centric Interfaces

A data-centric interface specifies the common, logically shared data model produced and consumed by a component, along with the QoS requirements.

A component can be seen as plugging into a software data bus via the data-centric interface that defines data inputs and outputs. When multiple components are present, the result is an information-driven, data-centric architecture in which data updates drive interactions between loosely coupled components.

A data-centric architecture reduces the integration problem since a component only has to integrate with the common data model intrinsic to the problem domain. Components implement data-centric interfaces that declare what they produce or consume. The QoS contracts ensure timing, reliability, and other requirements are met for any component, new or old. Thus, the system can grow and evolve.

The Data Bus

From a component programmer's perspective, the application code simply consumes and produces logically shared input and output variables on the data bus. Responsibility for data routing, delivery, and managing QoS is decoupled from the application logic and delegated to the implementation of the data bus.

The data bus requirements are fulfilled naturally by software that conforms with the DDS specification. That document defines the data-centric, publish-subscribe communication model for building distributed systems.

Several implementations of the DDS standard are available today, including an open source implementation and several commercial versions from RTI, Gallium, and Milsoft, among others. Leading DDS implementations provide deterministic low-latency, high-throughput messaging and data caching. While the most natural fit for these products has been in industrial, avionic, and military applications, they also have long been used in financial services, where the rapid distribution and processing of data is critical. And increasingly, as companies must handle large volumes of data, these products are entering business IT organizations.

One of the principal benefits for businesses is that a data-centric architecture paves the way for the use of generic infrastructure components. These include databases, complex event processing modules, and Web services. These components plug into the bus without the need for extensive custom coding to integrate them into the computing infrastructure. Done right, this model makes it possible for a spreadsheet to automatically populate cells from data items it subscribes to from a larger data fabric.

Because data-centric architectures have no direct coupling among the application component interfaces, components in the DDS model can be added and removed in a modular and scalable manner, letting companies add producers and consumers of data without a jump in complexity. As data volume expands, the simplicity of this architecture is likely to become a crucial part of a business's ability to keep up.

Rajive Joshi has worked in high-performance real-time distributed systems for more than 18 years, including implementing distributed messaging and data distribution caching infrastructure. Write to us at iwletters@techweb.com.



Related Reading


More Insights




Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

BYTE encourages readers to engage in spirited, healthy debate, including taking us to task. However, BYTE moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing/SPAM. BYTE further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.

Follow InformationWeek

By The Numbers

What Are Your Primary Concerns About Using Big Data Software?

Base: 417 respondents at organizations using or planning to deploy data analytics, BI or statistical analysis software
Data: InformationWeek 2013 Analytics, Business Intelligence and Information Management Survey of 541 business technology professionals, October 2012

What Do You Think?

What's your attitude about SQL analysis on top of Hadoop?
We want fast, standard SQL analysis capabilities on Hadoop ASAP
Hadoop is for unstructured data; SQL is for relational databases
We'll give SQL on Hadoop a try, but relational DBs will remain the mainstay
Given strong SQL support on Hadoop, we'd nix the data warehouse
We're not interested in Hadoop
No opinion



Related Content

From Our Sponsor

Five Big Data Challenges and How to Overcome Them with Visual Analytics

Five Big Data Challenges and How to Overcome Them with Visual Analytics

Business leaders often need a visual snapshot of data to quickly grasp and use it. This paper identifies five challenges in presenting data and how visual analytics can resolve them. Solutions are suggested to overcome the challenges of: speed, data clarity, data quality, displaying meaningful results, and dealing with outliers.

Game-Changing Analytics: How IT Executives Can Use Analytics to Create Innovation and Business Success

Game-Changing Analytics: How IT Executives Can Use Analytics to Create Innovation and Business Success

Today's competitive advantage requires a deeper understanding of your business, your market and your customers. As an IT executive, you can drive that knowledge transformation. In this white paper, learn how to make decisions as a strategic business leader and three steps to begin an analytics initiative within your enterprise.

Data Visualization Techniques: From Basics to Big Data with SAS Visual Analytics

Data Visualization Techniques: From Basics to Big Data with SAS Visual Analytics

High-performance data visualization turns sophisticated analyses into meaningful graphics, leading to faster and smarter decision making. In this white paper, learn how visual analytics can transform big data, with additional features such as real-time functionality, mobile compatibility, robust applications for technical groups and accessibility for nontechnical users.

Big Data: Lessons from the Leaders

Big Data: Lessons from the Leaders

Financial performance, competitive advantage, operational efficiency, strategic decision making - every business goal can extract value from big data, and the time for doubt or inaction has long passed. In this Economist Intelligence Unit report, in-depth interviews with data pioneers reveal the link between the effective use of big data and the bottom line among other results.

Decision-Driven Data Management: A Strategy for Better Decisions with Better Data

Decision-Driven Data Management: A Strategy for Better Decisions with Better Data

Which came first, the data or the decision? This white paper makes the case for having a decision in mind, then tailoring big data's volume, variety and velocity to achieve business results such as overcoming customer dissatisfaction or creating well-informed strategies in real time.

Informationweek Reports

Research: The Big Data Management Challenge

Research: The Big Data Management Challenge

The challenge of big data is real, but most organizations don't differentiate 'big data' from traditional data, and nearly 90% of respondents to our survey use conventional databases as the primary means of handling data. We'll help you understand what constitutes big data (it's not just size) and the numerous management challenges it poses.