faultRobust

Rationale
State of the art

State-of-the-art solutions for high-reliability systems use one or more of the following approaches.

At software level
This approach relies on software redundancy, for instance N-version programming and recovery blocks.
Drawbacks: performance degradation, software overhead, higher detection latency, strong application dependency, and a higher effort to achieve IEC 61508 compliance for each application.
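
As an illustration of software redundancy, the following is a minimal sketch of a recovery block: variants run in order until one passes an acceptance test. The function names (`recovery_block`, `faulty_sqrt`, `newton_sqrt`, `sqrt_acceptance`) are hypothetical and chosen only for this example. A full recovery block also restores a checkpoint before each alternate; that step is omitted here because the variants are pure functions.

```python
from typing import Callable, Sequence


def recovery_block(
    variants: Sequence[Callable[[float], float]],
    acceptance_test: Callable[[float, float], bool],
    x: float,
) -> float:
    """Return the first variant result that passes the acceptance test."""
    for variant in variants:
        try:
            result = variant(x)
        except Exception:
            continue  # a crashing variant counts as a detected failure
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def faulty_sqrt(x: float) -> float:
    # Hypothetical primary variant with a deliberate bug.
    return x / 2.0


def newton_sqrt(x: float) -> float:
    # Hypothetical alternate variant: Newton iteration.
    guess = x if x > 1.0 else 1.0
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess


def sqrt_acceptance(x: float, y: float) -> bool:
    # Acceptance test: squaring the result must reproduce the input.
    return abs(y * y - x) <= 1e-9 * max(1.0, x)


value = recovery_block([faulty_sqrt, newton_sqrt], sqrt_acceptance, 9.0)
```

The sketch also shows where the drawbacks come from: the alternate runs only after the primary has finished and failed, which adds detection latency, and the acceptance test has to be written for each application.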

At system level
This approach relies on MCU redundancy: several MCUs, typically two or three depending on whether fault tolerance is required, are used in the same system, with comparators or mutual checks.
Drawbacks: high system-level cost due to hardware overhead, packaging and PCB; system dependency.
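
The difference between two and three channels can be sketched as follows. This is a simplified model that assumes each MCU delivers one integer output per cycle; `compare_duplex` and `vote_tmr` are hypothetical names, not part of any product API.

```python
from collections import Counter
from typing import Optional, Tuple


def compare_duplex(a: int, b: int) -> Tuple[Optional[int], bool]:
    """Two channels: a mismatch is detected, but the faulty channel is unknown."""
    if a == b:
        return a, True
    return None, False  # fail-safe: no trusted output


def vote_tmr(a: int, b: int, c: int) -> Tuple[Optional[int], bool]:
    """Three channels: a single faulty channel is outvoted (fault tolerance)."""
    value, count = Counter((a, b, c)).most_common(1)[0]
    if count >= 2:
        return value, True
    return None, False  # all three disagree: no majority
```

With two channels a fault can only be detected, so the system has to fall back to a safe state. With three channels the majority value is still available, at the cost of a third MCU.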

At MCU level
This approach relies on CPU redundancy. It can be either symmetric, with comparators or mutual checks, or asymmetric, where a smaller CPU or watchdogs supervise the main CPU.
Drawbacks: symmetric solutions (such as lock-step or dual-core architectures) lack the diversity required by IEC 61508, and their overheads (gate count, performance and power) rapidly grow beyond practicality when these concepts are applied to high-performance cores. Asymmetric solutions are mainly based on watchdogs, which suffer from low diagnostic coverage and therefore require a complex software infrastructure to compensate. As a result, they are mainly used for low-SIL systems.
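
The limited coverage of a watchdog can be seen in a small simulation. This is a sketch under simplifying assumptions: time is an explicit tick counter, and `Watchdog` and `run_task` are hypothetical names used only here.

```python
from typing import Optional


class Watchdog:
    """Simulated timeout watchdog driven by an explicit tick counter."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.counter = 0
        self.tripped = False

    def kick(self) -> None:
        # Called by the supervised software to signal that it is alive.
        # A tripped watchdog stays tripped until the system is reset.
        if not self.tripped:
            self.counter = 0

    def tick(self) -> None:
        # Called once per time unit by an independent clock.
        self.counter += 1
        if self.counter >= self.timeout_ticks:
            self.tripped = True


def run_task(steps: int, hang_at: Optional[int] = None, timeout_ticks: int = 3) -> bool:
    """Run a task for `steps` ticks; it stops kicking at step `hang_at`."""
    wd = Watchdog(timeout_ticks)
    for step in range(steps):
        if hang_at is None or step < hang_at:
            wd.kick()  # the task is alive (its results may still be wrong)
        wd.tick()
    return wd.tripped
```

The watchdog trips only when the task stops kicking. A task that keeps kicking on time while computing wrong data goes unnoticed, which is why additional software checks are needed.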

At gate level
Logic redundancy can be used, for instance by adding concurrent checkers to the ALU or by modifying the pipeline with error-correcting codes (ECC).
Drawbacks: CPU-specific redesign, performance overhead (timing), and diagnostics mixed with the safety function (not recommended by IEC 61508).
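
One possible form of concurrent checker is a residue check on the adder, sketched below. The choice of a mod-3 residue code is an assumption made for illustration; the text does not name a specific code. The model uses unbounded non-negative integers, so the wrap-around handling a fixed-width ALU would need is left out. `checked_add` and `fault_mask` are hypothetical names.

```python
from typing import Tuple

MODULUS = 3  # assumed check base; no power of two is a multiple of 3


def checked_add(a: int, b: int, fault_mask: int = 0) -> Tuple[int, bool]:
    """Add a and b and check the result against an independent residue path.

    fault_mask flips bits of the adder output to simulate a hardware fault.
    Returns (result, check_passed).
    """
    result = (a + b) ^ fault_mask                      # main path, possibly corrupted
    predicted = (a % MODULUS + b % MODULUS) % MODULUS  # residue path
    return result, result % MODULUS == predicted


ok_result = checked_add(20, 22)                      # fault-free addition
bad_result = checked_add(20, 22, fault_mask=0b100)   # one flipped result bit
```

A single flipped bit changes the result by a power of two, which is never divisible by 3, so the residues disagree and the fault is flagged. The checker sits inside the datapath, which is the mixing of diagnostics and safety function mentioned among the drawbacks.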

At transistor level
A particular process or layout technique can be used to harden the technology against errors, for instance designing SRAMs with a DRAM-like architecture to make them less prone to soft errors.
Drawbacks: specific to certain types of faults, very high cost and overheads.