A Model For The Big Data Era

Data-centric architecture is becoming fashionable again

By Rajive Joshi

Wired and wireless communication networks are making data collection and transmission cheap and widespread. In the future, networks will weave many devices and subsystems into complex integrated distributed systems that will become the fabric of business and daily life.

Building such distributed systems is far from simple; they must be assembled from independently developed software components. Integration, especially combined with real-time performance demands, becomes the key challenge.

This article outlines fundamental design principles that enable integration of distributed systems from components. I use a data-centric approach to this design, as the data is the key element that must flow through the various systems.

The key to data-centric design is to separate data from behavior. The data and data-transfer contracts then become the primary organizing constructs. With carefully controlled data relationships and timing, the system can then be built from independent components with loosely coupled behaviors. Data changes drive the interactions between components, not vice versa as in traditional or object-oriented design.

The resulting loosely coupled software components with data-centric interfaces are then integrated into a working system through a data bus. The data bus connects data producers to consumers and enforces the associated quality-of-service (QoS) contracts on the data transfers. This design technique is naturally supported by the Data Distribution Service (DDS) specification for real-time systems, which is a standard from the Object Management Group. Implementations of this standard are available from many vendors.
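As a rough illustration of the idea, and not of the DDS API itself, the following Python sketch shows components that share only a data type and a topic name. The names DataBus, AircraftPosition, the topic string, the flight identifiers, and the 1,000 ft alarm threshold are all hypothetical and chosen only for this example; a real data bus would also cross process and network boundaries and enforce QoS, which this toy version does not.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class AircraftPosition:
    """Pure data: only the fields that producers and consumers agree on."""
    flight_id: str
    latitude: float
    longitude: float
    altitude_ft: float


class DataBus:
    """Hypothetical in-process bus that routes samples by topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: object) -> None:
        # The producer never learns who, if anyone, consumes the sample.
        for callback in self._subscribers[topic]:
            callback(sample)


bus = DataBus()
tower_view: Dict[str, AircraftPosition] = {}
alarms: List[str] = []


def update_tower_view(sample: AircraftPosition) -> None:
    # Consumer 1: keeps the latest position per flight.
    tower_view[sample.flight_id] = sample


def check_altitude(sample: AircraftPosition) -> None:
    # Consumer 2: independent behavior driven by the same data.
    if sample.altitude_ft < 1000:
        alarms.append(sample.flight_id)


bus.subscribe("AircraftPosition", update_tower_view)
bus.subscribe("AircraftPosition", check_altitude)

# The producer side: publishes data changes, calls no consumer directly.
bus.publish("AircraftPosition", AircraftPosition("FL100", 37.62, -122.38, 12000.0))
bus.publish("AircraftPosition", AircraftPosition("FL200", 37.60, -122.40, 800.0))
```

Either consumer can be removed or replaced without touching the producer or the other consumer, because the only thing they share is the AircraftPosition data contract.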

The techniques described here are proven in hundreds of mission-critical applications including robotics, unmanned vehicles, medical devices, transportation, combat systems, finance, and simulation.

A Future Distributed System

To understand the dynamic nature of next-generation distributed systems, it helps to examine a representative scenario: an air traffic control system. Air traffic control in the future will integrate a variety of disparate systems into a seamless whole--a system of systems. On the edge is a real-time avionics system inside the aircraft. The control tower in the center communicates with the avionics system, and then out to data servers at the airport. The system thus comprises connectivity from the "edge" (devices) to the "enterprise" (infrastructure services).

The data in the avionics system flows at high rates and is time-critical. Violating timing constraints could result in the failure of the aircraft or jeopardize safety. Although aircraft traditionally operate as independent units, future aircraft must integrate closely with automated traffic control and ground systems.

The control tower is another independent real-time system. It monitors various aircraft in the region, coordinates their traffic flow and generates alarms to highlight unusual conditions. The data is time-sensitive for proper local and wide area system operation. However, the system may have greater tolerance for delays than the avionics systems.

The control tower communicates with the airport's enterprise information systems, which track flight status and other data and may communicate with multiple control towers and other enterprise information systems. The enterprise information system must also synthesize passenger, flight arrival, and departure status information. Because it isn't in the time-critical path, it can be more tolerant of delays.

Key Design Challenges

This so-called "system of systems" must deal with many issues, such as correctly handling myriad differences in data exchange, performance, and real-time requirements. The architecture also involves different technology stacks, design models, and component life cycles.

To support system growth and evolution, the integration must be robust enough to handle changes on either side of an interface. To do this, only minimal assumptions should be made about the interfaces between systems--the interface specifications should describe only the invariants in the interaction. Behavior can then be implemented independently by each system; the interface between them shouldn't include any component-specific state or behavior. This avoids tight coupling.

The systems on either side of an interface may differ in quantitative aspects of their behavior, including different data volumes, rates, and real-time constraints. The term "impedance mismatch" is shorthand for all the nonfunctional differences in the information exchange between two systems. Critically, a developer can capture these nonfunctional aspects of the information exchange by attaching QoS attributes to the data transfer. With explicit QoS terms, responses to impedance mismatches can be automated, monitored, and governed.
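To make the idea of attaching QoS to a data transfer more concrete, here is a minimal Python sketch of a slow consumer reading from a fast producer. The ReaderQos and QosReader names, the two QoS fields, and all the numbers are assumptions made for this example. They are loosely inspired by the kind of rate and deadline terms a data bus can enforce and are not the API of any DDS implementation. Timestamps are passed in as integer milliseconds so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReaderQos:
    """Hypothetical QoS terms declared by the consumer, kept apart from its logic."""
    min_separation_ms: int  # accept at most one sample per this interval
    deadline_ms: int        # longest acceptable gap between accepted samples


class QosReader:
    """Applies the declared QoS to incoming samples on the consumer's behalf."""

    def __init__(self, qos: ReaderQos) -> None:
        self.qos = qos
        self.accepted: List[Tuple[int, float]] = []
        self.missed_deadlines: List[int] = []
        self._last_accepted_ms: Optional[int] = None

    def offer(self, timestamp_ms: int, value: float) -> bool:
        last = self._last_accepted_ms
        if last is not None:
            gap = timestamp_ms - last
            if gap < self.qos.min_separation_ms:
                return False  # too soon: drop to match the slower consumer
            if gap > self.qos.deadline_ms:
                # Simplification: the miss is only noticed when the next
                # sample arrives, not by a timer.
                self.missed_deadlines.append(timestamp_ms)
        self.accepted.append((timestamp_ms, value))
        self._last_accepted_ms = timestamp_ms
        return True


# A tower-side reader that wants at most 2 samples per second and
# expects fresh data at least once per second.
reader = QosReader(ReaderQos(min_separation_ms=500, deadline_ms=1000))

# An avionics-side producer emitting at 10 Hz for one second, then going quiet.
for t in range(0, 1000, 100):
    reader.offer(t, 12000.0 - t)

# Data resumes after a long silence.
reader.offer(3000, 9000.0)

accepted_times = [t for t, _ in reader.accepted]
```

Because the rate limit and the deadline are plain data, the mismatch between the 10 Hz producer and the slower consumer is resolved by the infrastructure, and the missed deadline is recorded where it can be monitored, without either component containing code about the other.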

Principles Of Data-Centric Design

Data-centric design recognizes that the essential invariant is the information exchange between systems or components. It describes the exchange in terms of a "data model" and the producers and consumers of that data, and it relies on four basic principles: