Time Triggered Architecture
by Hermann Kopetz
The Time-Triggered Architecture (TTA) meets future requirements for building cost-effective, dependable embedded systems from components.
Dependable embedded systems are an enabling technology for a wide range of critical technical applications, eg in automotive, aerospace, railway and other transportation systems, industrial automation and process control, and medical systems. In these domains hard real-time requirements have to be met in a dependable, predictable manner, because people's lives may depend on the services provided by these critical systems, subsystems and components. At the moment, highly dependable systems are based on sector-specific or proprietary solutions and are designed case by case, almost from scratch.
The challenge is to facilitate the systematic design of large dependable control systems out of components. The mission is to develop a basic methodology and technology that significantly reduce the design, deployment and life-cycle cost of critical embedded applications. A crucial part of the problem is the interaction of the components, which is realized by the exchange of messages across linking interfaces (LIFs) to a real-time communication system.
The driving forces for the composition of a large System of Systems (SOS) out of a set of components (component systems) are:
- cognitive complexity reduction in order to reduce the design and development effort
- reuse of components: the components may be newly designed according to a given architectural style or may be already existing systems (legacy systems)
- simplified diagnostics and repair.
Silicon Trend and New Failure Modes
The trend towards systems suited to cost-efficient mass deployment leads to 'Systems on a Chip' (SOC), where complete nodes of a distributed system ultimately reside on a single die. According to semiconductor industry studies, the shrinkage of silicon building blocks is still progressing, and it leads to new SOC failure modes, such as:
- transient multi-bit failures caused by a single fault event
- intermittent failures of the interconnect that can affect different functions on the die simultaneously.
It is expected that in future the rate of permanent failures will remain unchanged, but that the rate of intermittent and transient failures will increase. Therefore, the assumption that a fail-silent node can be implemented on a single die hosting two independent fault-containment units (FCUs) is not sustainable in future high-dependability applications. This strongly influences system architectures and the composability of embedded SOS.
What is a Component?
In the abstract, a component is an encapsulated building block that is of use when building a large system. The focus here is on system components. Components are characterized by their interfaces with respect to composability and are described by:
- their data properties, ie, the structure and semantics of the data items crossing the interface; the semantics are expressed by an interface model
- their temporal properties, ie, the temporal conditions that have to be satisfied by the interface: control and temporal data validity.
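The two groups of interface properties can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class `LifSpec`, its field names and the `wheel_speed` example are hypothetical and do not come from any TTA specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifSpec:
    """Hypothetical description of one message crossing a linking interface."""
    name: str
    fields: dict        # data properties: field name -> expected Python type
    period_us: int      # temporal property: send period in microseconds
    validity_us: int    # temporal property: how long an observation stays valid


def conforms(spec, message):
    """Data properties: the message has exactly the specified fields and types."""
    if set(message) != set(spec.fields):
        return False
    return all(isinstance(message[k], t) for k, t in spec.fields.items())


def temporally_valid(spec, observed_at_us, now_us):
    """Temporal properties: the observation is not older than its validity span."""
    age_us = now_us - observed_at_us
    return 0 <= age_us <= spec.validity_us


# Illustrative interface: a wheel-speed value sent every 10 ms, valid for 20 ms.
wheel_speed = LifSpec("wheel_speed", {"rpm": int, "sensor_ok": bool}, 10_000, 20_000)
```

A consumer would call `conforms` to check structure and `temporally_valid` to decide whether a received value may still be used at the current global time.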
Event triggered vs. Time Triggered
Guarantees on timeliness require a reliable global notion of time throughout the system. Put simply, an event-triggered system follows the principle of reaction on demand: temporal control is imposed on the system by the environment in an unpredictable manner (interrupts). This brings the undesirable problems of jitter and of missing precise temporal specification of interfaces, membership, scheduling etc, but it suits sporadic actions and data, low-power sleep modes, and best-effort soft real-time systems with high utilization of resources. Time-triggered systems derive control from the global progression of time, thus allowing precise temporal specification of interfaces and 'temporal firewalls' that protect against unpredictable outside interference, as well as membership identification, interoperability and replica determinism.
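The time-triggered principle can be sketched as a static dispatch table that is indexed by the global time. This is a minimal illustration, not the scheduler of any particular TTA implementation: the round length, the tick unit and the task names are all assumed for the example.

```python
# Hypothetical static schedule: offset within the round (in ticks) -> task name.
ROUND_LENGTH = 10
SCHEDULE = {0: "read_sensors", 3: "compute_control", 7: "write_actuators"}


def time_triggered_trace(start_tick, end_tick):
    """Return the (tick, task) activations in the interval [start_tick, end_tick).

    Activation instants depend only on the global tick counter, never on
    when external events happen to arrive.
    """
    trace = []
    for tick in range(start_tick, end_tick):
        task = SCHEDULE.get(tick % ROUND_LENGTH)
        if task is not None:
            trace.append((tick, task))
    return trace
```

Because the table is fixed before run time, every activation instant is known in advance. An event-triggered node, by contrast, would start a task whenever an interrupt arrives, so its activation instants are dictated by the environment.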
A properly designed time-triggered architecture (TTA) can provide generically at the level of the architecture:
- strong composability: independently developed functions can be integrated with minimal integration effort
- effective fault propagation barriers
- fault-tolerance by active replication of components
- strong diagnosability: the loss of consistency of the distributed computing base can be promptly detected and diagnosed
- formal analysis of critical architecture functions.
We do not know how to provide these necessary characteristics if the base architecture is event-triggered.
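Fault tolerance by active replication is commonly illustrated with a majority voter over the outputs of replicated components. The sketch below assumes exact-match voting, which only works if correct replicas produce identical values (replica determinism); the function name and the three-replica example are illustrative assumptions.

```python
from collections import Counter


def majority_vote(replica_values):
    """Return the value delivered by a strict majority of replicas, else None."""
    if not replica_values:
        return None
    value, count = Counter(replica_values).most_common(1)[0]
    # A strict majority is required; a tie or a three-way split yields None.
    return value if count > len(replica_values) // 2 else None
```

With three replicas, `majority_vote([42, 42, 17])` masks the single deviating value, while three mutually different values give no majority and the voter returns `None`.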
The TTA makes the following assumptions about a large distributed control system:
- the system is structured into clusters of components
- every component has access to a fault-tolerant sparse global time-base of known precision; important time-critical actions are triggered by the progression of this global time
- components communicate by the exchange of messages with a priori known latency and minimal jitter across interfaces that are well specified in the domains of time and value
- a component is a fault-containment unit (FCU)
- in a properly configured cluster, any single component can fail in an arbitrary (Byzantine) failure mode without affecting the proper operation of the components not affected by the fault.
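One way to obtain an a priori known latency is a time-division (TDMA) round in which every node owns a fixed slot. The sketch below assumes such a scheme purely for illustration; the slot length, the node names and the function names are hypothetical and not taken from any protocol specification.

```python
SLOT_US = 250                     # hypothetical slot length in microseconds
NODES = ["A", "B", "C", "D"]      # hypothetical cluster, one slot per node
ROUND_US = SLOT_US * len(NODES)   # length of one communication round


def next_send_time(node, ready_at_us):
    """Start of the first slot owned by `node` at or after `ready_at_us`."""
    offset = NODES.index(node) * SLOT_US
    delta = ready_at_us - offset
    # Ceiling division gives the first round whose slot has not yet started.
    rounds = max(0, -(-delta // ROUND_US))
    return rounds * ROUND_US + offset


def waiting_time_us(node, ready_at_us):
    """Time a message waits for its sender's slot; always below ROUND_US."""
    return next_send_time(node, ready_at_us) - ready_at_us
```

Under these assumptions the waiting time never reaches one full round, so the bound can be stated at design time, independently of the load produced by other nodes.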
The TTA distinguishes cleanly between fault containment and elimination of error propagation:
- a node is an FCU (fault containment unit) that can fail in an arbitrary failure mode
- control errors are detected by an independent FCU, the guardian
- data error masking is the responsibility of the application, not of the TTA.
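The role of the guardian can be sketched as a filter that forwards a frame only if it is sent in the time slot assigned to its sender. This is a simplified illustration that assumes a fixed slot table; the table, the slot length and the function names are hypothetical.

```python
SLOT_US = 250                     # hypothetical slot length in microseconds
OWNER = ["A", "B", "C", "D"]      # hypothetical slot table: slot index -> node
ROUND_US = SLOT_US * len(OWNER)


def slot_owner(global_time_us):
    """Node that is allowed to send at the given global time."""
    return OWNER[(global_time_us % ROUND_US) // SLOT_US]


def guardian_permits(sender, global_time_us):
    """Forward a frame only if the sender owns the current slot."""
    return slot_owner(global_time_us) == sender


def filter_frames(attempts):
    """Keep the (time, sender) attempts that the guardian lets through."""
    return [(t, s) for t, s in attempts if guardian_permits(s, t)]
```

A node that sends outside its slot (a control error in the time domain) is blocked, whatever the content of the frame. Whether the data in a permitted frame is correct is not checked here, in line with the division of responsibilities listed above.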
In the past few years, a number of time-triggered communication protocols have appeared or have been announced that provide the clock synchronization service needed for the TTA at the protocol level:
- TTP/C (silicon available since 1998)
- TT-CAN (available since 2002)
- TTP/A (standardized by the OMG in 2002)
- FlexRay (silicon planned end of 2004)
- TT-ETHERNET (in planning phase).
The main difference among these protocols is the attitude towards the inherent design conflict between safety, flexibility and cost. A component-based approach has to take into account the deployment of different time-triggered (TT) protocols.
Such an architecture designed for composability must support:
- independent development of components - relates to the architecture
- stability of prior services - relates to the components
- performability of the communication system - relates to the communication system
- replica determinism - to support transparent implementation of fault tolerance
- diagnostics - it must be possible to identify the sending FCU (Fault Containment Unit) of every message.
Figure: Composability - a hard real-time Air Traffic Control (ATC) system of five clusters, possibly using different TT protocols.
The figure shows such an example for an ATC system. In future there will always be several buses and 'clusters' in an airplane, car etc, and there must be support for mixed traffic and for integration of, or co-existence with, legacy systems in less critical parts. This can be achieved by placing the less critical application parts and protocols on top of the TTA, or by providing gateways or 'firewalls' with appropriate interfaces between the critical and less critical parts.
Hermann Kopetz, Technical University Vienna