Co-design and testing of safety-critical embedded systems

Contents

Slide 2

General course information

1. Object of Study:
Concepts of Safety-Critical Embedded Systems (S-CES): co-design and testing.

2. Prerequisites:
Computer Systems and System Analysis; Foundations of Logic Engineering; Probability Theory; Theory of Self-Checking Circuits; Modeling Foundations.

3. Subject of Study:
Principles, methods and techniques for the co-design and testing of S-CES.

4. Aims:
To acquire knowledge of the methods and techniques used in the co-design and testing of S-CES and their components.

Master Course. Co-Design and Testing of Safety-Critical Embedded Systems

Slide 3

Teaching and Learning Time Allocation

Slide 4

MODULE 1. Co-Design Foundation of S-CES

Slide 5

MODULE 1. Co-Design Foundation of S-CES

Lecture 1. Traditional ideas of S-CES co-design

1.1. Component approach

1.2. Standards regulating S-CES

1.3. Life-cycle of S-CES

Slide 6

1.1. Component Approach

Component-based technology is information technology based on a component representation of systems and on the use of well-tested software and hardware products.

COTS approach (Commercial-Off-The-Shelf) – reuse of commercial components.

CrOTS approach (Critical-Off-The-Shelf) – reuse of components in critical applications.

The component approach is the use of previously developed library components that are commonly employed in commercial and critical applications, including components of one's own design.

Slide 7

1.2. Standards regulating S-CES

IEC 61508 (general, for electronics & digital systems)
EN 50126 (Railway)
DO-178B (Avionics)
ISO 26262 (Automotive)
IEC 61513 (Nuclear power plants)
IEC 62061 (Machines)

IEC – International Electrotechnical Commission

This slide is from a presentation by M. Fusani, ISTI - CNR, Pisa, Italy.

Slide 8

1.2. Standards regulating S-CES

IEC 61508 – Functional safety of electrical/electronic/programmable electronic safety-related systems

IEC 61508-1:1998 ‘General requirements’

IEC 61508-2:2000 ‘Requirements for electrical/electronic/programmable electronic safety-related systems’

IEC 61508-3:1998 ‘Software requirements’

IEC 61508-4:1998 ‘Definitions and abbreviations’

IEC 61508-5:1998 ‘Examples of methods for the determination of safety integrity levels’

IEC 61508-6:2000 ‘Guidelines on the application of IEC 61508-2 and IEC 61508-3’

IEC 61508-7:2000 ‘Overview of techniques and measures’

Slide 9

1.2. Standards regulating S-CES

Features of the IEC 61508 standard

1. Use of the safety integrity level (SIL) concept: every unit of equipment is developed and analysed with regard to its contribution to the safety of the critical object.

2. Consideration of the full life-cycle of S-CES.

3. Treatment of software as an essential S-CES component and a source of possible failures that affect the safety of the critical object.

4. Flexibility of requirements for critical objects, which allows the standard to serve as a foundation for developing standards for specific areas of industry.

Slide 10

1.2. Standards regulating S-CES

The IEC 61508 standard as a foundation for developing standards for specific areas of industry

ECSS – European Cooperation for Space Standardization

ECSS-E-10 ‘Space Engineering – System Development’

ECSS-E-40A ‘Space Engineering – Software Development’

ECSS-Q-20 ‘Space Product Assurance – Quality Assurance’

ECSS-Q-80B ‘Space Product Assurance – Software Product Assurance’

Slide 11

1.2. Standards regulating S-CES

The IEC 61508 standard as a foundation for developing standards for specific areas of industry

RTCA – Radio Technical Commission for Aeronautics

DO-178B:1992 ‘Software considerations in airborne systems and equipment certification’

MIRA – Motor Industry Research Association

MISRA-C:2004 ‘Guidelines for the use of the C language in critical systems’

CENELEC – European Committee for Electrotechnical Standardization

EN 50126 ‘Railway applications – The specification and demonstration of reliability, availability, maintainability and safety (RAMS)’

Slide 12

1.2. Standards regulating S-CES

The IEC 61508 standard as a foundation for developing standards for specific areas of industry

IAEA – International Atomic Energy Agency

IAEA NS-G-1.1 ‘Software for computer based systems important to safety in nuclear power plants’

IAEA NS-G-1.2 ‘Safety assessment and verification for nuclear power plants’

IAEA NS-G-1.3 ‘Instrumentation and control systems important to safety in nuclear power plants’

Slide 13

1.2. Standards regulating S-CES

The IEC 61508 standard as a foundation for developing standards for specific areas of industry

IEC – International Electrotechnical Commission

IEC 60780:1998 ‘Nuclear power plants – Electrical equipment of the safety system – Qualification’

IEC 60880:2006 ‘Nuclear power plants – Instrumentation and control systems important to safety – Software aspects for computer-based systems performing category A functions’

IEC 60980:1989 ‘Recommended practices for seismic qualification of electrical equipment of the safety system for nuclear generating stations’

IEC 60987:2007 ‘Nuclear power plants – Instrumentation and control systems important to safety – Hardware design requirements for computer-based systems’

Slide 14

1.2. Standards regulating S-CES

The IEC 61508 standard as a foundation for developing standards for specific areas of industry

IEC – International Electrotechnical Commission

IEC 61226:2005 ‘Nuclear power plants – Instrumentation and control systems important to safety – Classification of instrumentation and control functions’

IEC 61513:2001 ‘Nuclear power plants – Instrumentation and control systems important to safety – General requirements for systems’

IEC 62138:2004 ‘Nuclear power plants – Instrumentation and control systems important to safety – Software aspects for computer-based systems performing category B or C functions’

IEC 62340:2007 ‘Nuclear power plants – Instrumentation and control systems important to safety – Requirements for coping with common cause failure’

Slide 15

1.3. Life-cycle of S-CES

1. Stages of FPGA-based digital component development

1. Development of the block diagram of the signal formation algorithm.

2. Development of program models of the control algorithms in a CASE-tools environment.

3. Integration of the program models of the signal formation algorithm block diagram in a CASE-tools environment.

4. Implementation of the integrated program models of the digital component in the FPGA.

CASE – Computer-Aided Software / System Engineering

Slide 16

1.3. Life-cycle of S-CES

2. Results of FPGA-based digital component development

1. Block diagrams according to the control algorithms.

2. Program models of the control algorithms in a CASE-tools environment.

3. Integrated program model of the control algorithms in a CASE-tools environment.

4. FPGA with the implemented integrated program model.

Slide 17

1.3. Life-cycle of S-CES

3. Verification stages of FPGA-based digital component development

1. Verification of the block diagrams according to the control algorithms.

2. Verification of the program models of the control algorithms in a CASE-tools environment.

3. Verification of the integrated program model in a CASE-tools environment.

4. Verification of the FPGA with the implemented integrated program model.

Slide 18

1.3. Life-cycle of S-CES

2. A life-cycle of FPGA-based S-CES

Slide 19

Reading List

Bakhmach E.S., Herasimenko A.D., Golovyr V.A. et al. Fail-Safe Information and Control Systems Based on Programmable Logic (in Russian) / Ed. by Kharchenko V.S. and Sklyar V.V. – National Aerospace University “KhAI”, Research and Production Corporation “Radiy”, 2008. – 380 p.
В3 Software and its influence on the reliability and safety of information and control systems, p. 17, 18; 2.1 Review of regulatory documents in the field of information and control systems of critical objects, p. 55 – 59; 3.3. Life cycle of information and control systems based on programmable logic, p. 81 – 86.

Kharchenko V.S., Sklyar V.V. FPGA-based NPP Instrumentation and Control Systems: Development and Safety Assessment / Bakhmach E.S., Herasimenko A.D., Golovyr V.A. et al. – Research and Production Corporation “Radiy”, National Aerospace University “KhAI”, State Scientific Technical Center on Nuclear and Radiation Safety, 2008. – 188 p.
1.4.1 Problems of ensuring dependability, p. 22, 23; 5.2 Analysis of I&C systems conformity to regulatory safety requirements, p. 127 – 133; 2.3.1. Life cycle of FPGA-based Instrumentation and Control Systems, p. 44 – 49.

Slide 20

Conclusion

1. Co-design of S-CES is based on traditional ideas such as the component approach, regulating standards and the life-cycle of S-CES.

2. The component approach is the use of previously developed library components that are commonly employed in commercial and critical applications, including components of one's own design.

3. The main standard is IEC 61508 – Functional safety of electrical/electronic/programmable electronic safety-related systems.

4. The life-cycle of an FPGA-based S-CES digital component contains 4 stages of development, with verification of the results obtained at every stage.

Slide 21

Questions and tasks

What is an S-CES?
What traditional ideas of S-CES co-design do you know?
What is the component approach?
Which standards regulate S-CES?
What stages does the life-cycle of an FPGA-based S-CES contain?

Slide 22

MODULE 2.
Dependability of S-CES
and their digital components

Slide 23

MODULE 2. Dependability of S-CES
and their digital components

Lecture 2. Foundation of Dependability

2.1. Introduction to Dependability

2.2. Dependability Threats

2.3. Dependability Attributes

2.4. Dependability Measures

2.5. Safety and Reliability

2.6. Forms of Dependability Requirements

2.7. The Means to Attain Dependability

Slide 24

2.1. Introduction to Dependability

2.1.1. Motivation of Dependability Consideration

Requirements for modern computer systems have grown from Reliability to Dependability.

Reasons:

Growth of computer system complexity.

Expansion of the set of tasks solved with computer systems, including critical application areas.

Increasing interdependence and interaction between the hardware and software of computer systems, including the co-design of S-CES on programmable elements.

Slide 25

2.1.2. Related Works

Different aspects of Dependability, and the principles of constructing and implementing dependable computer systems, have been studied for the last two decades.

1. Avizienis A., Laprie J.-C. Dependable Computing: From Concepts to Design Diversity // Proceedings of the IEEE, 1986. Vol. 74, No. 5. P. 629-638.
The authors formulated the principle of “Dependable Computing” as computation that is resistant to hardware and software failures (caused by defects introduced during design and not subsequently detected).

2. Dobson I., Randell B. Building Reliable Secure Computing Systems out of Unreliable Insecure Components // Proc. of IEEE Conference on Security and Privacy, Oakland, USA. 1986. P. 186-193.
The authors defined “Secure-Fault Tolerance” and proposed a principle for implementing it in various types of computer systems.

3. Avizienis A., Laprie J.-C., Randell B., Landwehr C. Basic Concepts and Taxonomy of Dependable and Secure Computing // IEEE Transactions on Dependable and Secure Computing, 2004. Vol. 1. No. 1. P. 11-33.

Slide 26

2.1.3. Definition of Dependability

Dependability is the ability to avoid service failures that are more frequent or more severe than is acceptable. When service failures are more frequent or more severe than acceptable, a dependability failure occurs.

Threats - undesired (not unexpected) circumstances causing or resulting from undependability (reliance cannot or will not any longer be placed on the service).

Attributes - properties expected from the system, according to which the quality of service resulting from threats and the means opposing them is assessed.

Means - methods and techniques that make it possible
to provide a service on which reliance can be placed, and
to have confidence in that ability.

Slide 27

2.2. Dependability Threats

Dependability Threats - Faults, Errors, Failures.

Faults:
development (design) or operational (phase of creation or occurrence),
internal or external (system boundaries),
hardware or software (domain),
natural or human-made (phenomenological cause),
accidental, non-malicious deliberate, or deliberately malicious (intent),
permanent or transient (persistence).

Slide 28

2.2. Dependability Threats

Faults:
Development or Design Faults
Physical Faults
Interaction Faults

Development or Design Faults:
erroneous acts or decisions during system development introduce a fault into the design; under certain conditions the fault manifests itself during computer system operation and causes an error in the computation process, leading to a malfunction or failure (non-delivery of service):
software flaws,
malicious logic.

Slide 29

2.2. Dependability Threats

Physical Faults:
natural (internal) causes produce a fault that brings about an error in the computation process, leading to a malfunction or failure.

Interaction Faults:
external informational, physical or other effects produce a fault that brings about an error in the computation process and then a computer system malfunction or failure.

Slide 30

2.2. Dependability Threats

Failures:
content, early or late timing, halt or erratic (domain),
signaled or unsignaled (detectability),
consistent or inconsistent (consistency),
minor or catastrophic (consequences).

Slide 31

2.2. Dependability Threats

The fault-error-failure chain is the path from correct service to incorrect service.

This slide is from a presentation by Felicita Di Giandomenico, ISTI - CNR, Pisa, Italy.
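The fault-error-failure chain can be illustrated with a small hypothetical component. This is a minimal sketch and is not taken from the slides: the gain values, the threshold of 40, the output limit of 100 and all function names are assumptions chosen only to show a fault that stays dormant, an error that is masked, and an error that propagates into a service failure.

```python
def scale_correct(reading):
    """Specified behaviour: the gain is always 2."""
    return 2 * reading


def scale_faulty(reading):
    """Hypothetical design fault: wrong gain on the high input range."""
    return 3 * reading if reading > 40 else 2 * reading


def service(scale, reading):
    """Delivered service: the scaled value limited to at most 100."""
    return min(100, scale(reading))


def classify(reading):
    """Compare the faulty component with the specification for one input."""
    internal_ok = scale_faulty(reading) == scale_correct(reading)
    external_ok = service(scale_faulty, reading) == service(scale_correct, reading)
    if internal_ok:
        return "fault dormant, correct service"
    if external_ok:
        return "error present but masked, correct service"
    return "error propagated, service failure"


for r in (10, 60, 45):
    print(r, "->", classify(r))
```

With these assumed numbers, a reading of 10 leaves the fault dormant, a reading of 60 activates it but the output limit masks the error, and a reading of 45 leads to a service failure.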

Slide 32

2.3. Dependability Attributes

Readiness for usage – Availability.

Continuity of service – Reliability.

Absence of catastrophic consequences on the users and the environment – Safety.

Absence of unauthorized disclosure of information – Confidentiality.

Absence of improper system alterations – Integrity.

Ability to undergo repairs and evolutions – Maintainability.

Availability, Confidentiality, Integrity – Security:
absence of unauthorized access to, or handling of, system state.

Slide 33

2.4. Dependability Measures

The alternation of correct and incorrect service delivery is quantified to define the Measures of Dependability:

Reliability: a measure of the continuous delivery of correct service, or equivalently, of the time to failure;

Availability: a measure of the delivery of correct service with respect to the alternation of correct and incorrect service;

Maintainability: a measure of the time to service restoration since the last failure occurrence.
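One common way to put numbers on these three measures is through the mean time to failure (MTTF), the mean time to restoration (MTTR) and the steady-state availability MTTF / (MTTF + MTTR). The sketch below assumes that convention; the slides do not prescribe a formula, and the observation log and function names are purely illustrative.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of durations."""
    return sum(values) / len(values)


def measures(up_times, repair_times):
    """Estimate MTTF, MTTR and steady-state availability from an
    observed alternation of correct (up) and incorrect (down) service."""
    mttf = mean(up_times)       # reliability: mean time to failure
    mttr = mean(repair_times)   # maintainability: mean time to restoration
    availability = mttf / (mttf + mttr)
    return mttf, mttr, availability


# Hypothetical observation log in hours (illustrative numbers only).
up = [900.0, 1100.0, 1000.0]
down = [8.0, 12.0, 10.0]
mttf, mttr, availability = measures(up, down)
print(f"MTTF={mttf:.0f} h, MTTR={mttr:.0f} h, availability={availability:.4f}")
```

The same alternation of up and down periods therefore yields all three measures: longer up periods improve reliability, shorter repair periods improve maintainability, and availability depends on the ratio between them.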

Slide 34

2.5. Safety and Reliability

Safety is an extension of Reliability: the state of correct service and the states of incorrect service due to non-catastrophic failure are grouped into a safe state:

• Safety is a measure of continuous safeness, or equivalently, of the time to catastrophic failure;

• Safety is thus Reliability with respect to catastrophic failures.
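The relation between the two measures can be made concrete under a simplifying assumption that is not stated in the slides: failures arrive at constant rates and are split into catastrophic and non-catastrophic (benign) ones. The rates below are hypothetical. In this sketch, reliability counts every failure, while safety counts only the catastrophic ones.

```python
import math


def reliability(t, rate_catastrophic, rate_benign):
    """Probability of no failure of any kind up to time t
    (assumes constant, independent failure rates)."""
    return math.exp(-(rate_catastrophic + rate_benign) * t)


def safety(t, rate_catastrophic):
    """Probability of no catastrophic failure up to time t:
    benign failures leave the system in the safe state."""
    return math.exp(-rate_catastrophic * t)


# Hypothetical failure rates per hour (illustrative only).
lam_catastrophic = 1e-7
lam_benign = 1e-4
t = 1000.0

r = reliability(t, lam_catastrophic, lam_benign)
s = safety(t, lam_catastrophic)
print(f"R({t:.0f} h) = {r:.6f}, S({t:.0f} h) = {s:.6f}")
```

Because the safe state includes the states reached after non-catastrophic failures, the computed safety can never be lower than the reliability over the same interval.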

Slide 35

2.6. Forms of Dependability Requirements

Availability: "The database must be accessible 99% of the time."

Rate of occurrence of failures: "The probability that a failure of a flight control system will cause an accident with fatalities or loss of aircraft must be less than 10^-9 per hour of flight."

Probability of surviving mission: "The probability that the flight and ordnance control system in a fighter plane are still operational at the end of a two-hour mission must be more than..."

Other forms of requirements:
Fault tolerance: "This system must provide uninterrupted service with up to one component failure, and fail safely if two fail."
Specific defensive mechanisms: "These data shall be held in duplicate on two disks."
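A short calculation shows what such requirements mean in practice. This is a sketch under stated assumptions: a year is taken as 8760 hours, failures are assumed to occur at a constant rate, and the 10-hour flight and the 1e-4 per hour failure rate are hypothetical values. The mission threshold that the slide leaves open is not filled in here.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def allowed_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability requirement."""
    return (1.0 - availability) * period_hours


def failure_probability(rate_per_hour, hours):
    """Probability of at least one failure in the interval,
    assuming a constant failure rate."""
    return -math.expm1(-rate_per_hour * hours)


def mission_survival(rate_per_hour, mission_hours):
    """Probability of still being operational at the end of the mission."""
    return math.exp(-rate_per_hour * mission_hours)


downtime = allowed_downtime_hours(0.99)        # "accessible 99% of the time"
p_accident = failure_probability(1e-9, 10.0)   # hypothetical 10-hour flight
p_mission = mission_survival(1e-4, 2.0)        # hypothetical rate, two-hour mission

print(f"99% availability allows about {downtime:.1f} h of downtime per year")
print(f"1e-9 per hour over a 10 h flight: {p_accident:.2e}")
print(f"two-hour mission survival at 1e-4 per hour: {p_mission:.6f}")
```

The first figure shows that a 99% availability requirement still permits more than three and a half days of downtime per year, which is why critical systems usually state much stricter targets.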

Slide 36

2.7. The Means to Attain Dependability

The development of a Dependable Computing System calls for the combined utilization of a set of four techniques:

• Fault prevention: how to prevent the occurrence or introduction of faults;

• Fault removal: how to reduce the number or severity of faults;

• Fault forecasting: how to estimate the present number, the future incidence and the likely consequences of faults;

• Fault tolerance: how to deliver correct service in the presence of faults.

Slide 37

2.7.1. Fault Prevention

Fault Prevention is attained by quality control techniques employed during the design and manufacturing of hardware and software:

• They include structured programming, information hiding, modularization, etc., for software, and rigorous design rules and the selection of high-quality, mass-manufactured components for hardware.

• Simple design, possibly at the cost of constraining functionality or increasing cost.

• Formal proof of important properties of the design.

• Provision of an appropriate operating environment (air conditioning, protection against mechanical damage) is intended to prevent operational physical faults, while training, rigorous maintenance procedures and ‘foolproof’ packages are intended to prevent interaction faults.

Slide 38

2.7.2. Fault Removal

Fault Removal is performed both during the development and during the operational life of a system.

• During development it consists of three steps: verification, diagnosis, correction.

• Verification is the process of checking whether the system adheres to given properties. If it does not, the other two steps follow: diagnosis and correction.

• After correction, verification should be repeated to check that fault removal had no undesired consequences; the verification performed at this stage is usually termed non-regression verification.

• Checking the specification is usually referred to as validation.

Slide 39

2.7.2.1. Fault Removal during Development

Verification techniques can be classified according to whether or not they exercise the system.

• Verification without actual execution is static verification: static analysis (e.g., inspections or walk-throughs), model checking, theorem proving.

• Verification that exercises the system is dynamic verification: with symbolic inputs in the case of symbolic execution, or with actual inputs in the case of testing.

• Verifying that the system cannot do more than what is specified is also important, especially for safety and security.

• Verification of fault tolerance mechanisms is important, especially a) formal static verification, and b) testing that includes faults or errors in the test patterns: fault injection.
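Fault injection can be sketched in a few lines. The example below is an assumption-laden illustration, not a method from the slides: the mechanism under test is a single even-parity bit on an 8-bit word, and the injected faults are single-bit flips. All names are hypothetical.

```python
def parity(word):
    """Even-parity bit of a non-negative integer."""
    return bin(word).count("1") % 2


def store(word):
    """Return (data, parity bit) as they would be written to memory."""
    return word, parity(word)


def check(data, parity_bit):
    """Error-detection mechanism under test: True if the parity matches."""
    return parity(data) == parity_bit


def inject_bit_flip(data, bit):
    """Fault injection: flip one bit of the stored data."""
    return data ^ (1 << bit)


def fault_injection_campaign(word, width=8):
    """Inject every single-bit flip and count how many are detected."""
    data, parity_bit = store(word)
    detected = sum(
        1 for bit in range(width)
        if not check(inject_bit_flip(data, bit), parity_bit)
    )
    return detected, width


detected, injected = fault_injection_campaign(0b10110010)
coverage = detected / injected
print(f"detected {detected} of {injected} injected faults, coverage {coverage:.0%}")

# A double-bit flip preserves parity and therefore escapes detection.
data, parity_bit = store(0b10110010)
double_flip = inject_bit_flip(inject_bit_flip(data, 0), 1)
escaped = check(double_flip, parity_bit)
```

The campaign confirms that the parity check covers every single-bit flip, while the final lines show a fault class it cannot cover. Exposing such gaps is the purpose of including faults in the test patterns.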

Slide 40

2.7.2.2. Fault Removal during the Operational Life

Fault Removal during the operational life of a system is corrective or preventive maintenance.

• Corrective maintenance is aimed at removing faults that have produced one or more errors and have been reported.

• Preventive maintenance is aimed at uncovering and removing faults before they might cause errors during normal operation. These faults include:
a) physical faults that have occurred since the last preventive maintenance actions;
b) design faults that have led to errors in other similar systems.

• These forms of maintenance apply to non-fault-tolerant systems as well as to fault-tolerant systems, which can be maintained on-line (without interrupting service delivery) or off-line (during service outage).

Slide 41

2.7.3. Fault Forecasting

Fault Forecasting is conducted by evaluating the system behavior with respect to fault occurrence or activation.

• Qualitative evaluation aims to identify, classify and rank the failure modes, or the event combinations (component failures or environmental conditions), that would lead to system failures.

• Quantitative (or probabilistic) evaluation aims to evaluate, in terms of probabilities, the extent to which the relevant attributes of dependability are satisfied.

• Some methods are specific to one form of evaluation (e.g., FMEA for qualitative evaluation, or Markov chains and stochastic Petri nets for quantitative evaluation).

• Other methods are applicable to both forms of evaluation (e.g., reliability block diagrams, fault trees).
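A reliability block diagram can be evaluated quantitatively with two rules, assuming that the blocks fail independently: blocks in series must all work, and blocks in parallel need at least one to work. The structure and component reliabilities below are hypothetical and serve only as a minimal sketch.

```python
import math


def series(*reliabilities):
    """All blocks must work: product of the block reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """At least one block must work: one minus the probability
    that every block fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical structure: two redundant channels (0.9 each) in parallel,
# followed in series by a single output stage (0.99).
channels = parallel(0.9, 0.9)
system = series(channels, 0.99)
print(f"redundant channels: {channels:.4f}, whole system: {system:.4f}")
```

Read qualitatively, the same diagram shows that the output stage is a single point of failure. Read quantitatively, it shows that this stage limits the system reliability no matter how many channels are added in parallel.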

Slide 42

Reading List

1. Bakhmach E.S., Herasimenko A.D., Golovyr V.A. et al. Fail-Safe Information and Control Systems Based on Programmable Logic (in Russian) / Ed. by Kharchenko V.S. and Sklyar V.V. – National Aerospace University “KhAI”, Research and Production Corporation “Radiy”, 2008. – 380 p.
1.2 Dependability and its attributes, p. 29 – 36; 1.4.2 Fault tolerance and fail-safety, p. 42 – 45.

2. Kharchenko V.S., Sklyar V.V. FPGA-based NPP Instrumentation and Control Systems: Development and Safety Assessment / Bakhmach E.S., Herasimenko A.D., Golovyr V.A. et al. – Research and Production Corporation “Radiy”, National Aerospace University “KhAI”, State Scientific Technical Center on Nuclear and Radiation Safety, 2008. – 188 p.
1.2 Dependability and its attributes, p. 16 – 34.

3. Avizienis A., Laprie J.-C., Randell B., Landwehr C. Basic Concepts and Taxonomy of Dependable and Secure Computing // IEEE Transactions on Dependable and Secure Computing, 2004. Vol. 1. No. 1. P. 11-33.

Slide 43

Conclusion

1. Dependability integrates a set of attributes, such as Availability, Reliability, Safety, Confidentiality, Integrity and Maintainability.

2. Dependability threats consist of Faults, Errors and Failures.

3. Measures of Dependability are defined using Reliability, Availability and Maintainability.

4. Safety can be considered as an extension of Reliability.

5. The Means to attain Dependability comprise 4 techniques: Prevention, Removal, Forecasting and Tolerance of Faults.

6. Evolution of the Dependability concept: Resilience, Survivability and Trustworthiness (Reliability of Results).

Slide 44

Questions and tasks

What is Dependability?
What Dependability threats of S-CES do you know?
What kinds of faults do you know?
Define the essence of Availability, Reliability, Safety, Confidentiality, Integrity and Maintainability.
What components of Security do you know?
What Measures of Dependability do you know?
What techniques do the Means to attain Dependability comprise?
Define the essence of Prevention, Removal, Forecasting and Tolerance of Faults.

Slide 45

MODULE 2. Dependability of S-CES
and their digital components

Lecture 3. Fault Tolerance of S-CES and their digital components

3.1. Introduction to Fault Tolerance

3.2. Error Detection

3.3. Recovery

3.4. Dependability Measures

3.5. Fault Tolerant Technologies

Slide 46

3.1. Introduction to Fault Tolerance

3.1.1. Motivation of Fault Tolerance Consideration

Reasons:

Fault Tolerance is the basis of any S-CES and its components.

Fault Tolerance is the main mechanism for ensuring Dependability.

Fault Tolerance ensures operational resistance to hardware and software failures.

Slide 47

3.1.2. Related Works

1. Dobson I., Randell B. Building Reliable Secure Computing Systems out of Unreliable Insecure Components // Proc. of IEEE Conference on Security and Privacy, Oakland, USA. 1986. P. 186-193.
The authors defined “Secure-Fault Tolerance” and proposed a principle for implementing it in various types of computer systems.

2. Jean-Claude Laprie, Jean Arlat, Christian Beounes, Karama Kanoun and Catherine Hourtolle, Hardware and Software Fault Tolerance: Definition and Analysis of Architectural Solutions, in Proceedings FTCS 17, 1987

3. Lee P.A. and Anderson T., Fault Tolerance - Principles and Practice, second edition, Springer Verlag/Wien, 1990

Slide 48

3.1.3. Definition of Fault Tolerance

Fault Tolerance is intended to preserve the delivery of correct service in the presence of active faults.

Fault Tolerance:
Error Detection
Recovery

Effectiveness of Fault Tolerance: the effectiveness of error and fault handling mechanisms (their coverage) has a strong influence on Dependability Measures.

Slide 49

3.2. Error Detection

Fault Tolerance is generally implemented by error detection and subsequent system recovery.

Error Detection identifies the presence of an error.

Error detection originates an error signal or message within the system. An error that is present but not detected is a latent error.

There exist two classes of error detection techniques:
• concurrent error detection, which takes place during service delivery,
• preemptive error detection, which takes place while service delivery is suspended; it checks the system for latent errors and dormant faults.
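Concurrent error detection can be sketched as duplication with comparison: the result is computed twice by diverse means during service delivery, and a disagreement raises an error signal. The slides do not name this technique, so the example is only one possible illustration. The functions and the fault activated for the input 3 are hypothetical.

```python
class ErrorSignal(Exception):
    """Error signal raised when the two computations disagree."""


def duplicate_and_compare(primary, checker, x):
    """Run both versions during service delivery and compare the results."""
    a = primary(x)
    b = checker(x)
    if a != b:
        raise ErrorSignal(f"mismatch for input {x}: {a} != {b}")
    return a


def square_by_addition(x):
    """Diverse checker: squares a non-negative integer by repeated addition."""
    return sum(x for _ in range(x))


def faulty_square(x):
    """Primary version with a hypothetical fault activated only for x == 3."""
    return x * x + (1 if x == 3 else 0)


print(duplicate_and_compare(faulty_square, square_by_addition, 2))  # fault dormant
try:
    duplicate_and_compare(faulty_square, square_by_addition, 3)
except ErrorSignal as signal:
    print("error detected:", signal)
```

Without the checker, the wrong result for the input 3 would be delivered as a latent error. The comparison turns it into a detected error on which recovery can act.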

Slide 50

3.3. Recovery

System Recovery transforms a system state that contains one or more errors and (possibly) faults into a state without detected errors and without faults that can be activated again.

Recovery consists of:
Error Handling
Fault Handling (Fault Treatment).

Слайд 51

3.3.1. Error Handling

Error Handling eliminates errors from the system state.

Error Handling may take three forms:
• Rollback: the state transformation consists of returning the system back to a saved state that existed prior to error detection; that saved state is a checkpoint;
• Compensation: the erroneous state contains enough redundancy to enable error elimination;
• Rollforward: the state without detected errors is a new state.
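The rollback form of error handling can be illustrated with a minimal sketch. It assumes a toy system whose state is a dictionary; the names `CheckpointedState`, `run_step` and the validity predicate are hypothetical and serve only to show the checkpoint idea, not any particular S-CES implementation.

```python
import copy


class CheckpointedState:
    """Toy system state that can be rolled back to a saved checkpoint."""

    def __init__(self, data):
        self.data = data
        self._checkpoint = copy.deepcopy(data)  # state saved before any error

    def save_checkpoint(self):
        self._checkpoint = copy.deepcopy(self.data)

    def rollback(self):
        # Return the system to the saved state that existed before the error.
        self.data = copy.deepcopy(self._checkpoint)


def run_step(state, step, is_valid):
    """Apply `step`; keep the result only if the error detector accepts it."""
    step(state.data)
    if is_valid(state.data):
        state.save_checkpoint()   # the new state becomes the checkpoint
        return True
    state.rollback()              # error detected: restore the checkpoint
    return False


def in_range(d):
    # Hypothetical error detector: the level must stay within 0..100.
    return 0 <= d["level"] <= 100


tank = CheckpointedState({"level": 10})
run_step(tank, lambda d: d.update(level=15), in_range)   # accepted
run_step(tank, lambda d: d.update(level=-1), in_range)   # rejected, rolled back
```

Compensation and rollforward would need a different structure (redundant state, or construction of a new error-free state), so they are not covered by this sketch.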

Слайд 52

3.3.2. Fault Handling

Fault Handling prevents located faults from being activated again.

Fault Handling involves four steps:
• Fault Diagnosis: identifies and records the cause(s) of error(s), in terms of both location and type;
• Fault Isolation: performs physical or logical exclusion of the faulty components from further participation in service delivery, i.e., it makes the fault dormant;
• System Reconfiguration: either switches in spare components or reassigns tasks among non-failed components;
• System Reinitialization: checks, updates and records the new configuration and updates system tables and records.

Слайд 53

3.4. Fault-Tolerant Technologies

Fault-Tolerant Technologies are based on various kinds of Redundancy and Reconfiguration.

Fault-Tolerant Technologies traditionally used in co-design of S-CES:
• Use of Detecting and Correcting codes.
• Majority Structures.
• Multi-Version Systems.

Because faults in safety-critical I&C systems must be countered promptly, the methods and means of On-Line Testing play an important role in maintaining Fault Tolerance.

Слайд 54

3.4.1 Use of Detecting and Correcting codes

3.4.1.1. Residue Checking for Error Detection in arithmetic components

Residue check equations (each comparison is made modulo m):
$K_A + K_B \equiv K_S \pmod{m}$ for an operation of addition $A + B = S$,
$K_A \cdot K_B \equiv K_V \pmod{m}$ for an operation of multiplication $A \cdot B = V$,
$K_B \cdot K_C + K_D \equiv K_A \pmod{m}$ for an operation of division $A / B$, where $C = A \,\mathrm{div}\, B$ and $D = A \bmod B$,
where $K_A, K_B, K_S, K_V, K_C, K_D$ are residue check codes modulo m:
$K_A = A \bmod m$, $K_B = B \bmod m$, $K_S = S \bmod m$,
$K_V = V \bmod m$, $K_C = C \bmod m$, $K_D = D \bmod m$.
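The residue check equations can be tried out with a short sketch. The function names and the default modulus m = 3 are assumptions chosen for illustration; the slides do not fix a modulus.

```python
def residue(x, m):
    """Residue check code K_X = X mod m."""
    return x % m


def check_add(a, b, s, m=3):
    """True if S passes the residue check for A + B = S."""
    return (residue(a, m) + residue(b, m)) % m == residue(s, m)


def check_mul(a, b, v, m=3):
    """True if V passes the residue check for A * B = V."""
    return (residue(a, m) * residue(b, m)) % m == residue(v, m)


def check_div(a, b, c, d, m=3):
    """True if quotient C and remainder D pass the check for A / B."""
    return (residue(b, m) * residue(c, m) + residue(d, m)) % m == residue(a, m)


print(check_add(25, 17, 42))   # True: correct sum
print(check_add(25, 17, 43))   # False: the error changes the residue
print(check_add(25, 17, 45))   # True: an error that is a multiple of m escapes
```

The last call shows the known limitation of residue checking: an error whose value is a multiple of m leaves the residue unchanged and is not detected.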

Слайд 55

3.4.1.1. Residue Checking for Error Detection in arithmetic components

Blocks BCA and BCB check the operands A and B by computing their check codes and comparing them with the input check codes KA and KB. The results of the comparisons are the error indication codes for A and B.
Block CB calculates the check code KR of the result R (R = S for addition and R = V for multiplication).
Block BCR checks the result R by comparing its residue modulo m with the check code KR.

Слайд 56

3.4.1.2. Hamming Correcting Code for Memory Recovery

The code K3 K2 K1 gives the number of the erroneous bit: 1, 2, 3, 4, 5, 6 or 7.
K1 = 1 ⊕ 3 ⊕ 5 ⊕ 7 (check bit k1 occupies position 1)
K2 = 2 ⊕ 3 ⊕ 6 ⊕ 7 (check bit k2 occupies position 2)
K3 = 4 ⊕ 5 ⊕ 6 ⊕ 7 (check bit k3 occupies position 4)
To define the number of the erroneous bit uniquely, bits 1, 2 and 4 are eliminated from the sums: K1* = 3 ⊕ 5 ⊕ 7, K2* = 3 ⊕ 6 ⊕ 7, K3* = 5 ⊕ 6 ⊕ 7.

[Figure: Generating matrix of the linear code]
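A minimal sketch of single-error location and correction with the 7-bit Hamming code, assuming the usual layout in which check bits occupy positions 1, 2 and 4 and data bits occupy positions 3, 5, 6 and 7. The function names are illustrative.

```python
def encode(d3, d5, d6, d7):
    """Build a 7-bit word; w[i] is bit number i (index 0 is unused)."""
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d3, d5, d6, d7
    w[1] = w[3] ^ w[5] ^ w[7]   # K1* = 3 xor 5 xor 7
    w[2] = w[3] ^ w[6] ^ w[7]   # K2* = 3 xor 6 xor 7
    w[4] = w[5] ^ w[6] ^ w[7]   # K3* = 5 xor 6 xor 7
    return w


def syndrome(w):
    """Return the number K3 K2 K1 of the erroneous bit (0 means no error)."""
    k1 = w[1] ^ w[3] ^ w[5] ^ w[7]
    k2 = w[2] ^ w[3] ^ w[6] ^ w[7]
    k3 = w[4] ^ w[5] ^ w[6] ^ w[7]
    return 4 * k3 + 2 * k2 + k1


def correct(w):
    """Invert the bit pointed to by the syndrome; return (word, position)."""
    pos = syndrome(w)
    fixed = list(w)
    if pos:
        fixed[pos] ^= 1
    return fixed, pos


stored = encode(1, 0, 1, 1)
damaged = list(stored)
damaged[6] ^= 1                  # single-bit memory error in bit 6
recovered, position = correct(damaged)
print(position, recovered == stored)   # 6 True
```

The sketch corrects single errors only; a double error produces a non-zero syndrome that points at a wrong bit.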

Слайд 57

3.4.1.2. Hamming Correcting Code for Memory Recovery

[Figure: Circuit for memory recovery using the Hamming correcting code]

Слайд 58

3.4.2. Majority Structures

A majority structure can be obtained using a correcting code.

[Figures: Generating matrix of the correcting code for majority structures; Majority circuit]

The majority element calculates the carry function of a full adder: C = 1·2 ∨ 1·3 ∨ 2·3, where 1, 2 and 3 denote the inputs.

Errors caused by input faults are not detected.
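A minimal sketch of the majority element, assuming three single-bit channel outputs; the function name is illustrative. It also shows why a fault at the common input goes unnoticed: all three channels then agree on the wrong value.

```python
def majority(a, b, c):
    """Majority element: carry function of a full adder, ab or ac or bc."""
    return (a & b) | (a & c) | (b & c)


# One faulty channel out of three is masked by the other two.
print(majority(1, 1, 0))   # 1

# A fault at the shared input reaches all three channels identically,
# so the voter sees agreement and the error passes through.
print(majority(0, 0, 0))   # 0
```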

Слайд 59

3.4.3. Multi-Version Systems

A Multi-Version System (MVS) contains more than one version for solving a computing task.

A version is defined as a method of realizing a system function. For embedded systems it can be a hardware means of solving a computing task.

Multi-Version Systems are aimed at protecting against common-cause failure arising from:
• errors of design;
• physical manufacturing defects;
• faults during operation.

Слайд 60

3.4.3. Multi-Version Systems

A Multi-Version System is based on Diversity (Multi-Versity or Version Redundancy).

Diversity is a type of redundancy based on the introduction of two or more versions.

In regulatory documents the application of Version Redundancy goes under the name “Principle of diversity”.

Nuclear engineering uses a class of MVS with two versions, in accordance with international standards such as:
• IEC 61513:2001 ‘Nuclear power plants – Instrumentation and control systems important to safety – General requirements for systems’
• IEC 62340:2007 ‘Nuclear power plants – Instrumentation and control systems important to safety – Requirements for coping with common cause failure’

Слайд 61

3.4.3. Multi-Version Systems

A two-version system W is described by the quintuple
W = {X, F, Z, V, U},
where X and Z – input and output signals;
F – set of functions performed;
V – two-element set of versions v1, v2 with outputs Z1, Z2;
U – function that processes the results of version execution (mapping of Z1, Z2 into Z).

The control signal Z (system output) is generated by the solver from the version outputs Z1 and Z2.
The solver may be implemented as an OR circuit if a faulty version sets its output to zero.

[Figure: A structure of two-version S-CES]
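The OR-type solver can be sketched as follows. The sketch assumes single-bit version outputs and a hypothetical per-version self-check flag that forces the output of a faulty version to zero; both names are illustrative.

```python
def version_output(value, self_check_ok):
    """A version reports its value, or zero when its own check fails."""
    return value if self_check_ok else 0


def solver_or(z1, z2):
    """Solver realized as an OR circuit over the two version outputs."""
    return z1 | z2


# Version 2 has failed and outputs zero; version 1 still drives Z.
z = solver_or(version_output(1, True), version_output(1, False))
print(z)   # 1
```

The OR solver works only under the stated condition: a faulty version that outputs one instead of zero would dominate the result.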

Слайд 62

3.4.3. Multi-Version Systems

A Classification of Diversity Types

Software diversity is the use of different programs designed and implemented by different development groups with different programming languages and tools to accomplish the same safety goals.

Equipment (hardware) diversity is the use of different equipment to perform similar safety functions, where “different” means sufficiently unlike to significantly decrease vulnerability to common-cause failure.

Human (life cycle) diversity is the use of different project groups with different key personnel to accomplish the same project goals.

Слайд 63

3.4.3. Multi-Version Systems

A Classification of Diversity Types

Design diversity is the use of different approaches, including both software and hardware, to solve the same or a similar problem.

Signal diversity is the use of different sensed parameters to initiate protective action, in which any of the parameters may independently indicate an abnormal condition, even if the other parameters fail to be sensed correctly.

Functional diversity is the use of different physical functions, even though they may have overlapping safety effects.

Слайд 64

3.4.3. Multi-Version Systems

[Figure: Diversity types in FPGA-based S-CES]
Слайд 65

3.4.4. Multi-Version Systems

A two-version system is considered the simplest MVS. It has only two independent versions. In an MVS the independence requirement applies to every pair of versions.
That is why the complexity of an MVS grows with the number of versions, and this complexity is the main limitation on the development of multi-version technology.

We offer a new class of MVS with strongly connected versions (SVS), which protects against common-cause failure while the versions share a maximal common part.

We revise the requirement for independence of versions and show that, to protect against common-cause failure, it is sufficient that no part be common to all versions:

A1 ∩ … ∩ Ai ∩ … ∩ AN = ∅. (1)

Слайд 66

3.4.5. Computer Systems with Strongly Connected Versions

A Computer System with Strongly Connected Versions is an MVS in which removing the means needed to execute any one version makes it impossible to execute any other version.

Let us denote the complement of version Ai as

Āi = A \ Ai.

Then the defining attribute of an SVS is that the complements of versions do not include versions, i.e. for i = 1 ÷ N and j = 1 ÷ N

Ai ⊄ Āj. (2)
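Conditions (1) and (2) can be checked on small examples with sets. In this sketch each version is modelled as the set of sections it uses and A is the set of all sections; this modelling and the function names are assumptions made for illustration.

```python
from functools import reduce


def no_common_part(versions):
    """Condition (1): the intersection of all versions is empty."""
    return not reduce(lambda x, y: x & y, versions)


def strongly_connected(all_means, versions):
    """Condition (2): no version fits inside the complement of a version."""
    return all(
        not (a_i <= (all_means - a_j))
        for a_i in versions
        for a_j in versions
    )


# Three sections; each version uses two of them.
A = {1, 2, 3}
svs = [{1, 2}, {2, 3}, {1, 3}]
print(no_common_part(svs), strongly_connected(A, svs))   # True True

# Two fully independent versions satisfy (1) but not (2).
independent = [{1}, {2}]
print(no_common_part(independent), strongly_connected({1, 2}, independent))   # True False
```

In the first example any single section can fail and one version that avoids it remains, although every pair of versions shares a section.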

Слайд 67

3.4.5. Computer Systems with Strongly Connected Versions

Structure of SVS

The basis for creating an SVS is a computer system (CS) that has a modular structure built from sets of identical elements.

Identical elements of the initial CS are united in identical sections.

The number of additional sections in an SVS is less than the number of sections in a version.

[Figure: Structure of the initial CS and of the SVS]

Слайд 68

3.4.5. Computer Systems with Strongly Connected Versions

Structure of SVS

The minimum number of versions in an SVS is three.

The maximum number of versions in an SVS is reached when each section has one element.

The SVS becomes simpler as the number of versions increases.

[Figure: Structure of the initial CS and of the SVS]

Слайд 69

3.4.5. Computer Systems with Strongly Connected Versions

Protection from Common-Cause Failure

The SVS is protected from common-cause failure by two components:
• the set of versions, which contains at least one true version;
• means of choosing the true version.

Слайд 70

3.4.5. Computer Systems with Strongly Connected Versions

Complexity of SVS

$Q_{SVS} = Q_{IE} + Q_{CM}$,

where $Q_{IE}$ – complexity of identical elements; $Q_{CM}$ – complexity of choice means.

$Q_{IE} = R + R/K$,

$Q_{CM} = (K + 1)\lambda$,

where R – quantity of identical elements in CS; K – quantity of sections into which these elements are united; λ – coefficient of proportionality.

The complexity is minimal at $K = \sqrt{R/\lambda}$:

$Q_{SVS\,MIN} = R\,(1 + 1/K)^2$,

$Q_{DC\,MIN} = 2R\,(1 + 1/K^2)$,

$Q_{DC\,MIN} / Q_{SVS\,MIN} = 2\,(1 - 2K/(K+1)^2)$.
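The expressions for the optimal K and for the minimal complexity follow from the definitions of the two complexity components. The derivation below is a sketch that treats K as a continuous variable, which is an assumption; in a real system K is an integer and the optimum is approximate.

```latex
\begin{align}
Q_{SVS}(K) &= R + \frac{R}{K} + (K+1)\lambda, \\
\frac{dQ_{SVS}}{dK} &= -\frac{R}{K^{2}} + \lambda = 0
  \;\Rightarrow\; K = \sqrt{R/\lambda}, \quad \lambda = \frac{R}{K^{2}}, \\
Q_{SVS\,MIN} &= R + \frac{R}{K} + (K+1)\frac{R}{K^{2}}
  = R\left(1 + \frac{2}{K} + \frac{1}{K^{2}}\right)
  = R\left(1 + \frac{1}{K}\right)^{2}, \\
\frac{Q_{DC\,MIN}}{Q_{SVS\,MIN}} &= \frac{2R\,(1 + 1/K^{2})}{R\,(1 + 1/K)^{2}}
  = \frac{2\,(K^{2}+1)}{(K+1)^{2}}
  = 2\left(1 - \frac{2K}{(K+1)^{2}}\right).
\end{align}
```

For example, K = 3 gives a ratio of 2(1 − 6/16) = 1.25, and the ratio approaches 2 as K grows.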

Слайд 71

3.4.5. Computer Systems with Strongly Connected Versions

Choice of the True Version

The SVS can be realized with:
• a parallel choice of the true version;
• a consecutive choice of the true version.

Слайд 72

3.4.5. Computer Systems with Strongly Connected Versions

Choice of the True Version

The choice of the true version is performed by on-line testing methods using hardware checking means.

A version can be checked using two approaches:
• external, i.e. a check of the total system;
• internal, i.e. a check of each version by its own means.

The check of a version can be:
• direct, which estimates the version itself;
• indirect, which estimates its complement.

Слайд 73

3.4.5. Computer Systems with Strongly Connected Versions

Choice of the True Version

A parallel choice of the true version is realized by the internal check of versions.

A consecutive choice of the true version is based on the external check of versions. Versions are changed until the true version is detected.

Direct check puts the true version into operation.

Indirect check disconnects the incorrect complement of the true version.

Слайд 74

Reading List

1. Bakhmach E.S., Herasimenko A.D., Golovyr V.A. et al. Fail-safe information and control systems based on programmable logic / Ed. by Kharchenko V.S. and Sklyar V.V. – National Aerospace University “KhAI”, Research and Production Corporation “Radiy”, 2008. – 380 p. (in Russian)
1.4.3 Principle of diversity (multi-versity), p. 45 – 47; 8.5 Life cycle of multi-version information and control systems, p. 119 – 224.
2. Kharchenko V.S., Sklyar V.V. FPGA-based NPP Instrumentation and Control Systems: Development and Safety Assessment / Bakhmach E.S., Herasimenko A.D., Golovyr V.A. a.o. – Research and Production Corporation “Radiy”, National Aerospace University “KhAI”, State Scientific Technical Center on Nuclear and Radiation Safety, 2008. – 188 p.
4.1 General concepts of multi-version system theory, p. 70 – 71.
4.1 Diversity types in FPGA-based I&C systems, p. 71 – 74.
3. Monographs of System Dependability. Dependability of Networks. – Wroclaw, Poland. – 2010. – 210 p.
3. Multi-version computer systems with use of strongly connected versions, p. 39 – 50.
Слайд 75

Conclusion

1. Fault Tolerance is the basis of any S-CES and its components and ensures their Dependability.

2. Fault Tolerance of S-CES is achieved by Error Detection and Recovery.

3. Recovery consists of Error Handling (rollback, compensation, rollforward) and Fault Treatment (fault diagnosis and isolation, system reconfiguration and reinitialization).

4. Fault-Tolerant Technologies are based on various kinds of Redundancy and Reconfiguration and use the methods and means of On-Line Testing.

5. A Multi-Version System ensures resistance to common-cause failure.

6. A Computer System with Strongly Connected Versions becomes simpler as the number of versions increases.

Слайд 76

Questions and tasks

What is Fault Tolerance?
What kinds of Fault Tolerance do you know?
List the Error Detection techniques.
What forms of Error Handling and Fault Treatment do you know?
What property of On-Line Testing is essential for Fault-Tolerant Technologies?
What is the “Principle of diversity”?
What types of Diversity do you know?
Define the essence of Computer Systems with Strongly Connected Versions.

Слайд 77

MODULE 3.
On-line testing for digital components of S-CES

Слайд 78

MODULE 3. On-line testing for digital components of S-CES

Lecture 4. Processing and checking of exact data

4.1. Introduction to on-line testing

4.2. Stages of on-line testing development

4.3. Self-checking circuits

4.4. Purpose of on-line testing

4.5. Model of exact data

4.6. Processing of exact and approximate data

4.7. Component on-line testing

Слайд 79

4.1. Introduction to On-Line Testing

4.1.1. Motivation of On-Line Testing Consideration

Reasons:
• On-Line Testing is the basis of any S-CES and its components.
• On-Line Testing is aimed at ensuring the reliability of the calculated results.
• On-Line Testing provides the first response to hardware and software failures.

Слайд 80

4.1.2. Related Works

1. Metra C., Favalli M. and Ricco B. Concurrent Checking of clock signal correctness // IEEE Design & Test. – October 1998. – P. 42 – 48.

2. Touba N. A. and McCluskey E. J. Logic synthesis techniques for reduced area implementation of multilevel circuits with concurrent error detection // Proc. IEEE Int. Conf. on Computer Aided Design. – 1994. – P. 651 – 654.

3. Metra C., Schiano L., Favalli M. and Ricco B. Self-checking scheme for the on-line testing of power supply noise // Proc. Design, Automation and Test in Europe Conf. Paris (France). – 2002. – P. 832 – 836.

4. Nicolaidis M. and Zorian Y. On-line testing for VLSI – a compendium of approaches // Journal of Electronic Testing: Theory and Applications (JETTA). – 1998. – V. 12. – P. 7 – 20.

5. Goryashko A. P. Synthesis of diagnosable circuits of computing devices. – Moscow: Nauka, 1987. – 288 p. (in Russian)

Слайд 81

4.1.3. Definition of On-Line Testing

On-line testing is the check of the correctness of digital circuit operation under working influences.

It has many names:
• concurrent checking [1] or concurrent error detection [2], since error detection is executed simultaneously with the operation of the digital circuit (DC);
• on-line testing, since it promptly estimates the technical condition of the DC [3];
• hardware check, since it is implemented in hardware rather than in software [4];
• built-in check, as opposed to remote check, since it is inseparably connected with the circuit [5].

Слайд 82

4.2. Stages of On-Line Testing Development

Three stages can be distinguished in the development of on-line testing:
• the initial stage;
• the formative stage: the stage of developing self-checking circuits, which extended on-line testing to its own means within the framework of exact data processing;
• the present stage, which extends on-line testing to the processing of approximate data.

Слайд 83

4.2.1. Initial Stage of On-Line Testing Development

Data transmission over distance

The theory and practice of on-line testing of computer systems were founded on achievements in the field of noise-immune data transmission over distance.

[Figure: Transmitter and Receiver]

Слайд 84

4.2.1. Initial Stage of On-Line Testing Development

Data transmission over distance

Noise on the air distorted the transmitted messages.

Слайд 85

4.2.1. Initial Stage of On-Line Testing Development

Data transmission over distance

To transfer a message correctly, the data were redundantly coded with correcting or detecting codes.

[Figure: Noise-combating code]


Слайд 87

4.2.1. Initial Stage of On-Line Testing Development

Correcting code

Correcting codes make it possible to correct errors and so restore the message.

Слайд 88

4.2.1. Initial Stage of On-Line Testing Development

Detecting code

Detecting codes make it possible to check the correctness of the transmitted data. If an error is detected, the message is transmitted again.

Слайд 89

4.2.1. Initial Stage of On-Line Testing Development

For example, the elements of the transmitted message are coded by numbers from $000_2$ to $111_2$.

Слайд 90

4.2.1. Initial Stage of On-Line Testing Development

The coder transforms them into words of the group code, which can be defined by the generating array 2 with linearly independent words 1, 2 and 4.

Слайд 91

4.2.1. Initial Stage of On-Line Testing Development

The decoder detects an error if the received word is a non-codeword. Codewords are checked using linear equations that define the check bits 4, 5 and 6 as modulo 2 sums of the information bits 1, 2 and 3.

For example, bit 4 is equal to the modulo 2 sum of bits 2 and 3.

Слайд 92

4.2.1. Initial Stage of On-Line Testing Development

If all the equations hold, the word is a codeword, i.e. correct; otherwise it is a non-codeword and contains an error.

Слайд 93

4.2.1. Initial Stage of On-Line Testing Development

The equations define the error detection circuit. If the circuit detects an error, its output E = 1; otherwise E = 0.
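The coder and the error detection circuit of such a group code can be sketched in a few lines. The text states only the equation for bit 4 (the modulo 2 sum of bits 2 and 3); the equations used here for bits 5 and 6 are assumptions chosen to complete a (6, 3) code, and the function names are illustrative.

```python
def encode(b1, b2, b3):
    """Coder: append check bits 4, 5, 6 to information bits 1, 2, 3."""
    b4 = b2 ^ b3   # stated in the text
    b5 = b1 ^ b3   # assumption for this sketch
    b6 = b1 ^ b2   # assumption for this sketch
    return (b1, b2, b3, b4, b5, b6)


def error_signal(word):
    """Error detection circuit: E = 1 if any check equation fails."""
    b1, b2, b3, b4, b5, b6 = word
    e4 = b4 ^ b2 ^ b3
    e5 = b5 ^ b1 ^ b3
    e6 = b6 ^ b1 ^ b2
    return e4 | e5 | e6


sent = encode(1, 0, 1)
received = list(sent)
received[1] ^= 1                    # noise inverts bit 2 on the way
print(error_signal(sent))           # 0: codeword
print(error_signal(received))       # 1: non-codeword, error detected
```

With these assumed equations every single-bit and double-bit error violates at least one equation, so the circuit sets E = 1 for all of them.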

Слайд 94

4.2.1. Initial Stage of On-Line Testing Development

Coders and decoders were considered absolutely reliable during message transfer and were therefore checked only by tests during pauses in operation.

On-line testing inherited this approach: error detection circuits were used without being checked during operation.

Слайд 95

4.3. Self-Checking Circuits

In 1968, at the congress in Edinburgh, Carter and Schneider were the first to draw attention to the need to check the error detection circuit during its operation.

To achieve this, they proposed building self-checking circuits.

It was an important step in the development of on-line testing, which for the first time was extended to its own error detection circuits.

Слайд 96

4.3. Self-Checking Circuits

Definitions

A circuit is fault-secure for a set of faults F if, for every fault in F, the circuit never produces an incorrect codeword at the output for an input codeword.

A circuit is self-testing for a set of faults F if, for every fault in F, the circuit produces a non-codeword at the output for at least one input codeword.

If the circuit is both fault-secure and self-testing, it is said to be totally self-checking.

Слайд 97

4.3. Self-Checking Circuits

Fault-secure circuit

A circuit is fault-secure for a set of faults F if, for every fault in F, the circuit never produces an incorrect codeword at the output for an input codeword.

The code distance d between a pair of codewords is the number of bit positions in which they differ.

If a fault generates an error in t bits and t < d, then the circuit is fault-secure, because it produces a non-codeword, which cannot be an incorrect codeword.

[Figure: labels 0 to 7, d = 3 and d = 4]
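The role of the code distance can be checked on a small example. The sketch below assumes the 3-bit repetition code {000, 111}, which is not taken from the slides, and illustrative function names; it shows that any error in t < d bits turns a codeword into a non-codeword.

```python
from itertools import combinations


def distance(u, v):
    """Number of bit positions in which two words differ."""
    return sum(a != b for a, b in zip(u, v))


def code_distance(code):
    """Minimum distance over all pairs of codewords."""
    return min(distance(u, v) for u, v in combinations(code, 2))


def flip(word, positions):
    """Invert the bits of `word` at the given positions."""
    return tuple(bit ^ (i in positions) for i, bit in enumerate(word))


code = {(0, 0, 0), (1, 1, 1)}
d = code_distance(code)
print(d)                                   # 3
print(flip((0, 0, 0), {0, 2}) in code)     # False: t = 2 < d, non-codeword
print(flip((0, 0, 0), {0, 1, 2}) in code)  # True: t = d, incorrect codeword
```

The last line is the case the fault-secure property must exclude: an error in d bits can map one codeword onto another and pass unnoticed.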

Слайд 98

4.3. Self-Checking Circuits

Fault-secure circuit

The definition of a fault-secure circuit determines how much information redundancy is needed to detect one fault.

Слайд 99

4.3. Self-Checking Circuits

Self-Testing circuit

A circuit is self-testing for a set of faults F if, for every fault in F, the circuit produces a non-codeword at the output for at least one input codeword.

The self-testing property is intended to ensure that the first fault f1 is detected before the second fault f2 of F occurs.

This condition means that all input codewords should be applied during the time interval between faults f1 and f2.

It is satisfied because faults occur rarely.

[Figure: operation cycle]


Слайд 101

4.3. Self-Checking Circuits

Self-Testing circuit

The self-testing condition (all input codewords are applied between faults f1 and f2) is satisfied because faults occur rarely and computing circuits operate at high frequency.

Слайд 102

4.3. Self-Checking Circuits

Self-Testing circuit

The self-testing property relies on the high reliability and productivity of modern computing circuits.

Слайд 103

4.3. Self-Checking Circuits

Non-Self-Testing circuit

According to these definitions, the designed circuit is neither self-testing nor self-checking for the set of stuck-at faults.

Слайд 104

4.3. Self-Checking Circuits

Non-Self-Testing circuit

A stuck-at «0» fault at points 2, 3 or 4 also makes the error detection circuit not self-checking.

Слайд 105

4.3. Self-Checking Circuits

Design of Self-Checking circuit

To design a self-checking circuit, bits 4, 5 and 6 are supplemented with their inverse bits ¬4, ¬5 and ¬6.

Слайд 106

4.3. Self-Checking Circuits

Design of Self-Checking circuit

This circuit contains Carter's unit (UC), which transforms two pairs of inverse bits, X1=¬X2 and Y1=¬Y2, into one pair of inverse bits F1=¬F2.

If even one input pair contains equal bits, the output pair will contain equal bits too.

Slide 107

4.3. Self-Checking Circuits

Design of Self-Checking circuit

The self-checking circuit has a two-bit output E{1,2}.
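The behaviour of Carter's unit described above can be sketched in code. The slide's gate-level drawing is not available in the text, so the sketch below assumes the commonly used two-rail checker equations; the function name `two_rail_checker` is a hypothetical label chosen for illustration.

```python
from itertools import product

def two_rail_checker(x1, x2, y1, y2):
    """One two-rail checker cell (assumed standard form, not taken from the slide).

    Maps two bit pairs (x1, x2) and (y1, y2) to one bit pair (f1, f2).
    """
    f1 = (x1 & y1) | (x2 & y2)
    f2 = (x1 & y2) | (x2 & y1)
    return f1, f2

# Enumerate all 16 input combinations and compare the output pair
# with the property stated in the lecture.
for x1, x2, y1, y2 in product((0, 1), repeat=4):
    f1, f2 = two_rail_checker(x1, x2, y1, y2)
    inputs_inverse = (x1 != x2) and (y1 != y2)
    print(x1, x2, y1, y2, "->", f1, f2, "inverse" if f1 != f2 else "equal")
    assert (f1 != f2) == inputs_inverse
```

Under this assumption the output pair is inverse exactly when both input pairs are inverse, which matches the statement that one input pair with equal bits forces equal output bits.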

Slide 108

4.3. Self-Checking Circuits

Design of Self-Checking circuit

In the following decades, on-line testing developed extensively in the area of self-checking circuits.

Using parity, residue and other checking methods, the following self-checking circuits were designed:
self-checking combinational circuits;
self-checking asynchronous and synchronous sequential machines;
self-checking adders and ALUs, multiply and divide arrays.

Slide 109

4.3. Self-Checking Circuits

Value of Self-Checking circuit

The definitions of a self-checking circuit have played an important role in the development of on-line testing.

They established:
conditions for detecting faults with the resources required for a single error;
the requirement that on-line testing methods detect a fault from the first error produced in the computed result;
a high level of reliability and performance of modern computing circuits.

Slide 110

4.4. Purpose of On-Line Testing

Dogmas of Self-Checking Circuit Theory

However, the definitions of a self-checking circuit have also had a negative influence on the development of on-line testing.

They have fixed the following dogmas:

The purpose of on-line testing is to detect a fault of the circuit.

On-line testing methods have to detect a fault from the first error produced in the computed result.

A correct circuit calculates a reliable result, and a non-reliable result is computed only by a faulty circuit.

Slide 111

4.4. Purpose of On-Line Testing

Dogmas of Self-Checking Circuit Theory

A correct circuit calculates a reliable result, and a non-reliable result is computed only by a faulty circuit.

Is this true?

The truth is that the correct circuit is necessary only to calculate a reliable result, and in itself is not meaningful.

Slide 112

4.4. Purpose of On-Line Testing

Dogmas of Self-Checking Circuit Theory

What is the purpose of on-line testing?

Today the purpose of on-line testing is derived from the definitions of self-checking circuits.

The purpose of on-line testing is
to detect a fault of the circuit, or
to estimate the reliability of the circuit, or
to answer the question “Is the circuit correct or not?”
during the main operations using actual data.

Slide 113

4.4. Purpose of On-Line Testing

Dogmas of Self-Checking Circuit Theory

What is the purpose of on-line testing?

Today the purpose of on-line testing is derived from the definitions of self-checking circuits.

This presentation will show that the declared purpose, checking the circuit during the main operations using actual data,
defies common sense,
contradicts the actual application of on-line testing, and
is not achievable for self-checking circuits.

Slide 114

4.4. Purpose of On-Line Testing

The declared purpose of on-line testing is to detect a circuit fault during the main operations using actual data.

The declared purpose defies common sense.

Let’s consider the computational process as a plane flight.

Detection of the plane's faults should be carried out before the flight starts.

Searching for faults during the flight would greatly surprise the passengers.

Creating critical conditions is the best way to detect a fault!

A fault can be detected much more efficiently by off-line testing methods during pauses in operation.

Slide 115

4.4. Purpose of On-Line Testing

The declared purpose of on-line testing is to detect a circuit fault during the main operations using actual data.

The declared purpose defies common sense.

Searching for faults during computations defies common sense, just like detecting mines by means of farmers (actual data).

A faulty circuit can be considered as a minefield.

A circuit fault is a mine.

Test input words are minesweepers that detect mines before the main operations.

Actual data is a farmer working in the field.

Slide 116

4.4. Purpose of On-Line Testing

The declared purpose of on-line testing is to detect a circuit fault during the main operations using actual data.

The declared purpose contradicts actual application.

Errors are produced by transient and permanent faults.

Transient faults occur much more often than permanent faults.

Therefore, as a rule, the first detected error is produced by a transient fault.

Transient faults act for a short period of time.

Therefore, after this period the circuit will be correct again.

That’s why on-line testing is not used for circuit fault detection.

Slide 117

4.4. Purpose of On-Line Testing

The declared purpose of on-line testing is to answer the question “Is the circuit correct or not?”

The declared purpose is not achievable for self-checking circuits.

The first detected error can be produced by either a transient or a permanent fault.

In the case of a transient fault, the conclusion that the circuit is faulty will no longer be true after a short period of time.

The first detected error is not enough to identify a permanent fault; that requires detecting many errors.

Therefore, the first detected error cannot answer the question “Is the circuit faulty or not?”

Slide 118

4.4. Purpose of On-Line Testing

The actual purpose of on-line testing can be derived from the practice of its application.

A correct circuit is only necessary to get a reliable result from actual data.

That is why the reliability of the circuit by itself should not be the subject of estimation during the main operations.

The actual purpose of on-line testing is
to detect an error that reduces the reliability of the calculated result, or
to estimate the reliability of the calculated result, or
to answer the question “Is the result reliable or not?”
during the main operations using actual data.

Slide 119

4.4. Purpose of On-Line Testing

Declared vs. Actual purpose

PURPOSE
Declared purpose: to estimate the reliability of a circuit.
Actual purpose: to estimate the reliability of a result.

Means to achieve purpose
Declared purpose: the result is checked to answer the question “Is the circuit correct or faulty?”
Actual purpose: a correct circuit is only required to get a reliable result from actual data.

Slide 120

4.5. Model of Exact Data

What is the reason for declaring an incorrect purpose?

The reason is the Model of Exact Data.

This model means that all numbers, irrespective of their true nature, are considered as exact data.

Slide 121

4.5. Model of Exact Data

The universe of approximate data

The universe outside of an error does not exist, does not develop, and cannot be studied.

The error is the difference between absolute and relative truths, i.e. the universe is learnt by means of an error.

The development of the universe is carried out by the trial and error method.

Everything exists within the limits of tolerances.
The right to make an error is the right to exist.

Quantitative estimations of all things in the universe are numbers with tolerances, which are their living space.
These numbers are approximate data.

Slide 122

4.5. Model of Exact Data

What is Exact Data?

Exact Data enumerates the elements of a set, i.e., it includes only “integers by nature”.

All values of a codeword can be mapped to their respective ordinal numbers. They are integers by nature and belong to Exact Data. Everything that can be written down in a field of a computer format is exact data, because it can be numbered.

For example, a 4-bit codeword has the following values and ordinal numbers:

0 0 0 0    0
0 0 0 1    1
0 0 1 0    2
0 0 1 1    3

Slide 123

4.5. Model of Exact Data

The exact data model means that all numbers, irrespective of their true nature, are considered as exact data.

Many concepts, first of all those connected with the computer, are under the influence of the exact data model.

Slide 124

4.5. Model of Exact Data

On-line testing is based on the Model of Exact Data

The model underlies:
self-checking circuit techniques to obtain reliable results on a correct circuit only.

This logic is based on the assumption that a correct circuit always calculates a reliable result, and a non-reliable result is obtained only on a faulty circuit.

It is true only in the case of exact data.

Slide 125

4.5. Model of Exact Data

On-line testing is based on the Model of Exact Data

The model underlies:
the declared on-line testing purpose to estimate the reliability of a circuit through detection of its fault.

All errors are essential for the reliability of an exact result.

A detected error simultaneously shows that the calculated result is non-reliable and that the circuit has a fault.

This makes the declared and actual purposes coincide in the case of exact data.

Slide 126

4.5. Model of Exact Data

On-line testing is based on the Model of Exact Data

The model underlies:
the main requirement for on-line testing methods: detect the first error produced by the circuit fault.

Every error in an exact result makes it non-reliable, and the computing task terminates abnormally.

In the case of exact data, detecting the first error allows the result to be recalculated as soon as possible.

Detecting the first error is the fastest way to obtain reliable results in the case of exact data.

Slide 127

4.5. Model of Exact Data

On-line testing is based on the Model of Exact Data

The model underlies:
self-checking circuit techniques to obtain reliable results on a correct circuit only;
the declared on-line testing purpose to estimate the reliability of a circuit through detection of its fault;
the main requirement for on-line testing methods: detect the first error produced by the circuit fault;
the development of on-line testing within the framework of exact data processing only.

Slide 128

Reading List

Дрозд А. Этапы развития рабочего диагностирования вычислительных устройств / А. Дрозд // Компьютерные науки и технологии. – Варна (Болгария), 2009. – № 1. – С. 44 – 50.
Пархоменко П. П., Согомонян Е. С. и др. Основы технической диагностики. – М.: Энергия, 1981. – 320 c.
Согомонян Е. С., Слабаков Е. В. Самопроверяемые вычислительные устройства и системы (обзор) // Автоматика и телемеханика. – 1981. – № 11. – С. 147 – 167.
Согомонян Е. С., Слабаков Е. В. Самопроверяемые устройства и отказоустойчивые системы. – М.: Радио и связь, 1989. – 208 с.
Дрозд А.В. Нетрадиционный взгляд на рабочее диагностирование вычислительных устройств // Проблемы управления. – 2008. – № 2. – С. 48 – 56.
Дрозд А.В. Нетрадиционный взгляд на рабочее диагностирование вычислительных устройств / А.В. Дрозд // Автоматизированные системы управления и приборы автоматики. – 2009. – Вып. 147. – С. 15 – 24.

Slide 129

Conclusion

1. On-line testing is the basis of any S-CES and its components, ensuring the reliability of calculated results.

2. Three stages can be distinguished in the development of on-line testing: the initial stage; the formative stage, in which self-checking circuits were developed and on-line testing was extended to its own checking means within the framework of exact data processing; and the present stage, in which on-line testing is developed for the processing of approximate data.

3. Totally self-checking circuits detect faults from the first error in the calculated results.

4. Self-checking circuit theory defines the purpose of on-line testing as estimating the reliability of the circuit; however, the actual purpose is checking the reliability of the result.

5. The Model of Exact Data determines the development of on-line testing within the framework of exact data processing.

Slide 130

Questions and tasks

What names of on-line testing do you know?
Recite the stages of on-line testing.
Describe the initial stage of on-line testing development.
What conditions of self-checking circuits do you know?
What do fault security and self-testing mean?
What purpose of on-line testing follows from the definitions of a self-checking circuit?
What is the actual purpose of on-line testing?
What is Exact Data?
What is the Model of Exact Data?
Describe the role that the Model of Exact Data plays in on-line testing development.

Slide 131

MODULE 3. On-line testing for digital components of S-CES

Lecture 5. Approximate Data Processing

5.1. Introduction to Approximate Data Processing

5.2. Floating-point Formats and Arithmetic

5.3. Complete and Truncated Operations

5.4. Features of Approximate Data Processing

5.5. Probability of an essential error

Slide 132

5.1. Introduction to Approximate Data Processing

5.1.1. Motivation of Approximate Data Processing Consideration

The majority of processed numbers are approximate data, and their volume only increases.

Reasons:

Our Universe is approximate, and everything in it, including computer processing, is structured according to its realities.

That’s why the Universe generates approximate data.

Slide 133

5.1.2. Related Works

1. Гук М. Процессоры Intel: от 8086 до Pentium II / Гук М. – СПб: Питер, 1997. – 224 c.

2. ANSI/IEEE Std 754-1985. IEEE Standard for Binary Floating-Point Arithmetic. IEEE, New York, USA, 1985. – 18 c.

3. Рабинович З. Л., Раманаускас В. А. Типовые операции в вычислительных машинах. – Киев: Техника, 1980. – 264 c.

4. Савельев А. Я. Прикладная теория цифровых автоматов. – М.: Высш. шк., 1987. – 272 c.

5. Drozd A. On-line testing of computing circuits at approximate data processing / A. Drozd // Радіоелектроніка та інформатика. 2003. № 3. – С. 113 – 116.

6. Демидович Б.П., Марон И.А. Основы вычислительной математики. – М.: Физматгиз, 1966. – 664 с.

Slide 134

5.1.3. Data processed in the S-CES

Two kinds of S-CES:

1. Systems like reactor-trip systems for nuclear power plants (diagram blocks: Sensors, Comparators, Processor).
The processor of the first kind of S-CES operates with exact data.

2. Systems like special dedicated computing systems (diagram blocks: Sensors, Processor, Comparators).
The processor of the second kind of S-CES operates with approximate data.

RM, RE and RA are the results of measurements, exact data processing and approximate data processing, respectively.

Slide 135

5.1.3. Approximate Data Processing

Approximate data

Approximate data contain results of measurements and are processed in floating-point format.

The significance of approximate data processing increases rapidly as computers develop.

For example, in PCs the Intel 286 and 386 processors are supplemented by the external coprocessors 287 and 387, which operate with floating-point formats.

Starting with the Intel 486DX processor, built-in coprocessors are used for operating with floating-point formats.

Pentium processors have pipelined built-in coprocessors.

Slide 136

5.2. Floating-point Formats and Arithmetic

Normal form of data representation

Let a computer work with an 8-bit codeword in the range 0000 0000₂ ÷ 1111 1111₂, or 0 ÷ 255.

However, it is necessary to solve a computing task in the range 0 ÷ 1000.

For example, 800 + 100 has to be calculated.

This problem was solved using a scale index kM ≥ 1000 / 255.

The initial data are transformed from the range of the computing task into the range of the codeword:
kM = 4: 800 / 4 = 200; 100 / 4 = 25; 200 + 25 = 225.
Restoring the range of the computing task: 225 × 4 = 900.
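The scaling example above can be reproduced in a few lines. This is a minimal sketch: the names `WORD_MAX`, `TASK_MAX`, `scale_index` and `scaled_add` are hypothetical, and choosing the smallest integer scale index is an assumption that happens to give the slide's value kM = 4.

```python
import math

WORD_MAX = 255    # largest value of the 8-bit codeword
TASK_MAX = 1000   # upper bound of the computing task range

def scale_index(task_max=TASK_MAX, word_max=WORD_MAX):
    """Smallest integer scale index kM with kM >= task_max / word_max."""
    return math.ceil(task_max / word_max)

def scaled_add(a, b):
    """Add two task-range numbers inside the codeword range, then scale back."""
    k = scale_index()
    total = a // k + b // k            # operands scaled down, low bits are lost
    if total > WORD_MAX:
        raise OverflowError("sum does not fit into the codeword")
    return total * k                   # restore the range of the computing task

print(scale_index())         # 4
print(scaled_add(800, 100))  # 900
print(scaled_add(801, 102))  # 900, although the exact sum is 903
```

The last call shows the price of scaling: the operands lose their low bits, so the result is only approximate.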

Slide 137

5.2. Floating-point Formats and Arithmetic

Normal form of data representation

Thus the normal form of data representation using two components was introduced:
m × kM,
where m is the mantissa (significand);
kM = B^E is the scale index;
B is the base of the numeral system; E is the exponent.

Exact data are represented in true form using one component, because range and accuracy are rigidly tied to each other by the size of the codeword.

Approximate data are represented in normal form using two components, because the requirements for range and for accuracy differ significantly.

The size of the mantissa determines the accuracy, and the size of the exponent determines the range.
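As an illustration of the two-component form, Python's standard `math.frexp` splits a number into a mantissa and a binary exponent. The sketch assumes base B = 2 and the mantissa range 0.5 ≤ |m| < 1 used by `frexp`; the wrapper name `normal_form` is hypothetical.

```python
import math

def normal_form(x):
    """Return (m, E) such that x == m * 2**E, with 0.5 <= |m| < 1 for x != 0."""
    return math.frexp(x)

m, e = normal_form(900.0)
print(m, e)               # 0.87890625 10
print(math.ldexp(m, e))   # 900.0, the value restored from both components
```

The mantissa carries the significant digits (accuracy), while the exponent alone sets the scale (range).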

Slide 138

5.2. Floating-point Formats and Arithmetic

Normal form of data representation

The normal form m × B^E represents data using the operation of multiplication in the notation of floating-point numbers.

That’s why
multiplication is present in all operations executed with mantissas;
operations with mantissas and their results inherit the properties and features of multiplication and of a product, respectively.

For example,
addition of mantissas is executed by matching the exponents through shifting one of the mantissas, where a shift is a special case of multiplication;
the result of a two-operand operation has double size.

Slide 139

5.2. Floating-point Formats and Arithmetic

Standard IEEE-754 (1985)

Basic Formats: Single Format and Double Format

Extended Formats: Single and Double
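For orientation, the single basic format of IEEE 754 packs a number into 32 bits: 1 sign bit, 8 exponent bits (bias 127) and 23 fraction bits. The sketch below extracts these fields with the standard `struct` module; the function name `decode_single` is a hypothetical helper, not something from the lecture.

```python
import struct

def decode_single(x):
    """Split a number stored in IEEE 754 single format into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits, the leading 1 is implicit
    return sign, exponent, fraction

print(decode_single(1.0))        # (0, 127, 0)
print(decode_single(-0.15625))   # (1, 124, 2097152), i.e. -1.25 * 2**(124 - 127)
```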

Slide 140

5.2. Floating-point Formats and Arithmetic

Standard IEEE-754 (1985)

Slide 141

5.2. Floating-point Formats and Arithmetic

Standard IEEE-754 (1985)

Slide 142

5.2. Floating-point Formats and Arithmetic

Standard IEEE-754 (1985)

Real number in true form

Slide 143

5.3. Complete and Truncated Operations

Slide 144

5.3. Complete and Truncated Operations

Truncated multiplication

Figure labels: operands A{1 ÷ n} and B{1 ÷ n}; product V{1 ÷ 2n}; parts V{1 ÷ 2n – k} and V{1 ÷ k}; n = 8; k = n – log₂n; k = 5.

Truncated multiplication of mantissas almost halves the hardware overhead and the operation time without lowering accuracy.
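The accuracy claim for truncated multiplication can be checked numerically. The slide's figure is not available in the text, so the sketch below assumes one common reading: the k lowest columns of the partial-product array are simply not computed, and the n high bits of the result are kept. The function names are hypothetical.

```python
from math import log2

def truncated_product(a, b, n, k):
    """Sum only the partial-product bits whose column weight is at least 2**k."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= k:  # columns 0 .. k-1 are never computed
                total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total

def max_error_in_kept_bits(n, k, step=1):
    """Largest difference between the n high bits of the full and truncated product."""
    values = list(range(0, 2 ** n, step)) + [2 ** n - 1]
    worst = 0
    for a in values:
        for b in values:
            diff = ((a * b) >> n) - (truncated_product(a, b, n, k) >> n)
            worst = max(worst, diff)
    return worst

n = 8
k = n - int(log2(n))                          # 5, as on the slide
print(k)
print(max_error_in_kept_bits(n, k, step=7))   # 1 unit of the last kept bit
```

Under this assumption the discarded columns contribute at most 129 for n = 8, which is less than the weight 256 of the lowest kept bit, so the kept part changes by at most one unit in its last position.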

Slide 145

5.3. Complete and Truncated Operations

Truncated restoring division

Truncated restoring division of mantissas almost halves the hardware overhead and the operation time without lowering accuracy.

Slide 146

5.3. Complete and Truncated Operations

Truncated non-restoring division

Truncated non-restoring division of mantissas almost halves the hardware overhead and the operation time without lowering accuracy.

Slide 147

5.3. Complete and Truncated Operations

Truncated operation of shift in mantissa addition

The truncated operation of mantissa shift halves the hardware overhead without lowering accuracy.

Slide 148

5.4. Features of approximate data processing

Deleting of low bits of the calculated result

An approximate number A is represented as a product, for example in floating-point format:
A = m × B^E,
where m is the mantissa, B is the base of notation, and E is the exponent.

A product of two operands doubles the size of the result.

According to error theory, the number of exact bits in a result does not exceed the number of exact bits in an operand.

Therefore, the main floating-point formats have single precision.

Slide 149

5.4. Features of approximate data processing

2. Data processing in extended formats

Addition of one million and one million units using binary operations with codeword size n < 20

Adding one unit to one million yields one million, because the unit is lost during exponent matching.
One million such operations also yield a result equal to the first number, which is one million.
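The loss of units can be simulated with integers. The sketch below is a simplified model, not the lecture's circuit: every sum is truncated to its n most significant bits, and only 1000 units are added instead of a million to keep the run short. All function names are hypothetical.

```python
def truncate_to_n_bits(value, n):
    """Keep only the n most significant bits of a non-negative integer."""
    excess = max(value.bit_length() - n, 0)
    return (value >> excess) << excess

def add_n_bits(a, b, n):
    """Addition whose result is stored in an n-bit mantissa."""
    return truncate_to_n_bits(a + b, n)

def add_units_one_by_one(start, units, n):
    total = start
    for _ in range(units):
        total = add_n_bits(total, 1, n)
    return total

MILLION = 1_000_000   # needs 20 bits, so n = 19 cannot hold the lowest bit

# Units added to the large number one at a time: each unit is lost.
print(add_units_one_by_one(MILLION, 1000, 19))                       # 1000000
# Units summed first, then added to the large number: nothing is lost.
print(add_n_bits(MILLION, add_units_one_by_one(0, 1000, 19), 19))    # 1001000
# A wider codeword gives the same answer in either order.
print(add_units_one_by_one(MILLION, 1000, 21))                       # 1001000
```

The two orders of summation give different results for n = 19, which is the violation of the associative law, and the wider codeword removes the discrepancy.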

Slide 150

5.4. Features of approximate data processing

2. Data processing in extended formats

Addition of one million and one million units using binary operations with codeword size n < 20

To restore the associative law, the size of the codeword should be increased.
A correct circuit can calculate a non-reliable result.

Slide 151

5.4. Features of approximate data processing

3.1. Denormalization of an operand mantissa when matching the exponents

This action is frequently executed in such operations as addition, subtraction and matching of operands.

The mantissa of the number with the smaller exponent is shifted down, losing its least significant bits (LSB).
Thus the LSB in the results of all previous operations are eliminated from further calculations.
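Exponent matching can be sketched with integer mantissas, where a number is m × 2^E. This is an illustrative model only; the function name `align_exponents` and the choice to return the lost bits are assumptions made for the example.

```python
def align_exponents(m_a, e_a, m_b, e_b):
    """Bring both mantissas to the larger exponent.

    Returns (m_a, m_b, exponent, lost), where lost holds the low bits
    shifted out of the mantissa that had the smaller exponent.
    """
    if e_a >= e_b:
        shift = e_a - e_b
        lost = m_b & ((1 << shift) - 1)
        return m_a, m_b >> shift, e_a, lost
    shift = e_b - e_a
    lost = m_a & ((1 << shift) - 1)
    return m_a >> shift, m_b, e_b, lost

# 0b1011 * 2**3 and 0b1101 * 2**1: the second mantissa is shifted down by 2 bits.
print(align_exponents(0b1011, 3, 0b1101, 1))   # (11, 3, 3, 1)
```

The value `lost` is exactly the part of an earlier result that no longer takes part in further calculations.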

Slide 152

5.4. Features of approximate data processing

3.2. Normalization of the result mantissa

This action is executed on results in such operations as addition, subtraction and multiplication.

The mantissa of the result is cyclically shifted to the left, with the low positions filled by LSB.
Then the results of all following operations contain the additional LSB.

Slide 153

5.5. Probability of an Essential Error

Essential and Inessential Errors

An approximate result has exact most significant bits (MSB) and non-exact LSB.

Definition:

An error produced by a fault of the computing circuit is considered essential if it reduces the number of exact bits in the final result. Otherwise it is considered inessential.
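The definition above can be illustrated with a deliberately simplified model: an n-bit result whose n_exact most significant bits are exact, where an error counts as essential if it changes any of those exact bits. This reading and the function name `is_essential` are assumptions made for the example.

```python
def is_essential(correct, erroneous, n, n_exact):
    """True if the error changes any of the n_exact most significant bits."""
    inexact_bits = n - n_exact
    return (correct >> inexact_bits) != (erroneous >> inexact_bits)

# 8-bit result with 5 exact MSB and 3 non-exact LSB.
print(is_essential(0b10110110, 0b10110101, 8, 5))   # False: only the LSB differ
print(is_essential(0b10110110, 0b10100110, 8, 5))   # True: an exact bit differs
```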

Slide 154

5.5. Probability of an Essential Error

The factors lowering a probability of essential error

1. Error elimination with discarded bits of the result

K1 = n / nc

Factor K1 defines the share of errors remaining after elimination of the LSB.

n and nc are the numbers of kept and total calculated bits.

K1 = 0.5

Half of all errors are inessential.

Eliminated errors are inessential.

A faulty circuit can calculate a reliable result in the case of inessential errors.

Slide 155

5.5. Probability of an Essential Error

The factors lowering a probability of essential error

2. Increase of the share of inessential errors with the use of extended formats

K2 = nE / n

Factor K2 defines the share of essential errors in the extended format.

nE and n are the number of exact bits and the total number of bits in the enlarged mantissa of the extended format.

In the floating-point formats on the PC, the mantissa size increases 2.7 times, from 24 bits in the single format to 64 bits in the double extended format.

Slide 156

5.5. Probability of an Essential Error

The factors lowering a probability of essential error

3.1. Elimination of errors in results of all previous operations

OS is the hardware overhead of the computing circuits preceding the shifter, and OC is the total for all computing circuits.

For a series of denormalizations, K3 is defined as the product of the factors K3.1 calculated for each of these operations.

Slide 157

5.5. Probability of an Essential Error

The factors lowering a probability of essential error

3.2. Reducing the number of essential errors in results of operations following normalization

Figure labels: bits 1 ... n-d; “with inessential errors in results of all next operations”.

OS is the hardware overhead of the computing circuits following the shifter, and OC is the total for all computing circuits.

For a series of normalizations, K3 is defined as the product of the factors K3.2 calculated for each of these operations.

Slide 158

5.5. Probability of an Essential Error

The factors lowering a probability of essential error

Probability that an occurred error is essential:
PE = K1 K2 K3

PE << 1

In approximate data processing, the majority of errors produced by circuit faults are inessential errors.
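A small numeric sketch shows how the factors combine. Only K1 = 0.5 comes from the slides; K2 = 24/64 assumes that the exact bits equal the 24-bit single mantissa inside a 64-bit extended mantissa, and the single K3 factor of 0.5 is purely hypothetical, because the formulas for K3.1 and K3.2 are not present in the text.

```python
import math

def essential_error_probability(k1, k2, k3_factors):
    """PE = K1 * K2 * K3, where K3 is the product of per-operation factors."""
    return k1 * k2 * math.prod(k3_factors)

k1 = 0.5            # from the slide: half of the calculated bits are kept
k2 = 24 / 64        # assumption: 24 exact bits in a 64-bit extended mantissa
k3_factors = [0.5]  # hypothetical value for one denormalization step

print(essential_error_probability(k1, k2, k3_factors))   # 0.09375
```

With these illustrative values fewer than one error in ten is essential, in line with PE << 1.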

Slide 159

Reading List

1. Полин Е. Л. Арифметика ЭВМ. Часть 2 / Одеськ. нац. політехніч. ун.-т. – Одеса: АО Бахва, 2002. – 150 с.
7.1.3. Свойства формата с плавающей точкой, с. 115 – 122.
7.2. Стандарт IEEE 754, с. 123 – 131.
2. Дрозд О.В. Контроль за модулем обчислювальних пристроїв. Навч. посібн. для студ. спеціальності 7.091501 – «Комп’ютерні та інтелектуальні системи та мережі» / Одеськ. нац. політехніч. ун.-т. – Одеса: АО Бахва, 2002. – 144 с.
3.1. Скорочення обчислень у ОП, с. 51 – 74.
3. Дрозд А. Этапы развития рабочего диагностирования вычислительных устройств / А. Дрозд // Компьютерные науки и технологии. – Варна (Болгария), 2009. – № 1. – С. 44 – 50.

Slide 160

Conclusion

1. The majority of processed numbers are approximate data, and their volume only increases.

2. Approximate data contain results of measurements and are processed in normal form using floating-point formats, such as the IEEE 754 standard formats.

3. Approximate data are represented using two components because the requirements for range and for accuracy differ significantly: the size of the mantissa determines the accuracy, and the size of the exponent determines the range.

4. Truncated operations are the main methods for processing mantissas in floating-point formats.

5. Errors produced by circuit faults in the MSB and LSB of approximate results are essential and inessential, respectively.

6. The features of approximate data processing determine factors that significantly lower the probability of an essential error, which is a general parameter of on-line testing objects.

Slide 161

Questions and tasks

What role do approximate data play in computer processing?
What kinds of approximate data do you know?
Describe the main points of the IEEE 754 standard.
Why are approximate data represented using two components?
What role do truncated operations play in mantissa processing?
What are essential and inessential errors?
Which features of approximate data processing determine the factors lowering the probability of an essential error?
What role does the probability of an essential error play in on-line testing?

Slide 162

MODULE 3. On-line testing for digital components of S-CES

Lecture 6. Reliability of on-line testing methods

6.1. Reliability of traditional on-line testing methods

6.2. The ways for increasing on-line testing reliability

6.3. The first way for increasing on-line testing reliability

6.4. Residue checking a truncated multiplication

6.5. Residue checking a truncated division of mantissas

6.6. Residue checking a truncated operation of shift

Slide 163

6.1. Reliability of traditional on-line testing methods

6.1.1. Motivation of traditional on-line testing methods reliability consideration

The estimation of the reliability of traditional on-line testing methods should be revised.

Reasons:

Traditional on-line testing methods were developed for exact data processing and were evaluated within the framework of the Exact Data Model.

Our universe is approximate, and everything in it, including on-line testing methods, is shaped by this reality.

Slide 164

6.1.2. Related Works

1. Журавлев Ю. П., Котелюк Л. А., Циклинский Н. И. Надежность и контроль ЭВМ. – М.: Советское радио, 1978. – 416 с.

2. Щербаков Н. С. Достоверность работы цифровых устройств. – М.: Машиностроение, 1989. – 224 с.

3. Согомонян Е. С., Слабаков Е. В. Самопроверяемые устройства и отказоустойчивые системы. – М.: Радио и связь, 1989. – 208 с.

4. Рабинович З. Л., Раманаускас В. А. Типовые операции в вычислительных машинах. – Киев: Техника, 1980. – 264 с.

5. Савельев А. Я. Прикладная теория цифровых автоматов. – М.: Высш. шк., 1987. – 272 с.

6. Граф Ш., Гессель М. Схемы поиска неисправностей. – М.: Энергоатомиздат, 1989. – 144 с.

Slide 165

6.1.3. What is reliability of on-line testing methods?

Traditionally, the reliability of an on-line testing method is estimated and considered as the probability of error detection.

This view of reliability does not take into account the features of on-line testing objects.

The reliability of an on-line testing method should be considered using two parameters:
the probability of error detection, which characterizes the on-line testing method;
the probability of an essential error, which characterizes the on-line testing object.

Slide 166

6.1.3. What is reliability of on-line testing methods?

The reliability of an on-line testing method can be considered using a unit-side square.

[Figure: unit-side square with sides PE, PN and PD, PS]

PE is the probability of an essential error.

PN is the probability of an inessential error: PN = 1 – PE.

PD is the probability of error detection.

PS is the probability of error skipping: PS = 1 – PD.

PDE is the probability of essential error detection.

PDN is the probability of inessential error detection.

PSE is the probability of essential error skipping.

PSN is the probability of inessential error skipping.

PDE + PDN + PSE + PSN = 1
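
The partition of the unit-side square can be sketched numerically. The snippet below is only an illustration: the function name `unit_square_parts` and the sample values are hypothetical, and it assumes that each area is the product of the two side probabilities (for example PDE = PD PE), which is how the later formula for RAR expands these terms.

```python
def unit_square_parts(p_d, p_e):
    """Return the four areas of the unit-side square.

    Assumption: detection is independent of the error type,
    so each area is a product of two side lengths.
    """
    p_s = 1.0 - p_d  # probability of error skipping
    p_n = 1.0 - p_e  # probability of an inessential error
    return {
        "PDE": p_d * p_e,  # essential error detected
        "PDN": p_d * p_n,  # inessential error detected
        "PSE": p_s * p_e,  # essential error skipped
        "PSN": p_s * p_n,  # inessential error skipped
    }


# Hypothetical sample values: high detection, few essential errors.
parts = unit_square_parts(p_d=0.9, p_e=0.1)
total = sum(parts.values())
print(parts, total)
```

Whatever values of PD and PE are chosen, the four areas cover the whole square, so their sum stays equal to one.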

Slide 167

6.1.3. What is reliability of on-line testing methods?

The reliability of on-line testing methods is defined depending on the purpose of on-line testing.

[Figure: unit-side square with sides PE, PN and PD, PS]

According to the declared purpose of on-line testing, a method is reliable if the circuit fault is detected irrespective of the error type (essential or inessential).

RDR = PDE + PDN = PD

Estimating the reliability of an on-line testing method as the probability of error detection, ignoring the probability of an essential error, follows from the Model of Exact Data.

Slide 168

6.1.3. What is reliability of on-line testing methods?

The reliability of on-line testing methods is defined depending on the purpose of on-line testing.

[Figure: unit-side square with sides PE, PN and PD, PS]

According to the actual purpose of on-line testing, a method is reliable if it correctly estimates a calculated result as reliable or non-reliable.

RAR = PDE + PSN = PD PE + (1 – PD)(1 – PE)

An on-line testing method marks a result as non-reliable when it detects an error. However, the actual sign of a non-reliable result is the occurrence of an essential error.

The reliability of an on-line testing method consists in the reliability of result checking: the method states the truth about the result when it detects essential errors (the result is non-reliable) and skips inessential ones (the result is reliable).
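
The two definitions of reliability can be compared with a short sketch. The function names and the sample probabilities are hypothetical; the formulas are RDR = PD and RAR = PD PE + (1 – PD)(1 – PE) as written on the slides.

```python
def declared_reliability(p_d):
    """RDR = PDE + PDN = PD: any detected error counts as success."""
    return p_d


def actual_reliability(p_d, p_e):
    """RAR = PDE + PSN: essential errors detected, inessential ones skipped."""
    return p_d * p_e + (1.0 - p_d) * (1.0 - p_e)


# Hypothetical values for illustration only.
r_exact = actual_reliability(p_d=0.9, p_e=1.0)    # every error is essential
r_approx = actual_reliability(p_d=0.9, p_e=0.1)   # few errors are essential
print(declared_reliability(0.9), r_exact, r_approx)
```

With PE = 1 the two measures coincide, while for a low PE the same detector scores much lower on RAR than on RDR.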

Slide 169

6.1.4. Reliability of on-line testing methods for exact data

Traditional on-line testing methods based on the theory of totally self-checking circuits have a high detection probability: PD >> PS.

Exact results have the probability PE = 1.

RAR = PDE + PSN = PD PE + (1 – PD)(1 – PE)

For PE = 1: RAR = PD, so RAR → 1.

Traditional on-line testing methods demonstrate high reliability in checking exact results.

[Figure: unit-side square for PE = 1, consisting of part 1 (PDE) and part 3 (PSE) along the sides PD and PS]

Slide 170

6.1.5. Low reliability of traditional on-line testing methods

1. Traditional on-line testing methods based on self-checking circuit theory within the framework of the Model of Exact Data have a high probability of error detection PD.

2. Approximate results have a low probability of essential error PE.

RAR = PDE + PSN = PD PE + (1 – PD)(1 – PE)

The reliability of traditional on-line testing methods is made up of the small parts 1 (PDE) and 4 (PSN) of the unit-side square: RAR → 0.

[Figure: unit-side square with sides PE, PN and PD, PS]

Slide 171

6.1.5. Low reliability of traditional on-line testing methods

New property of on-line testing methods

1. The difference between the declared and the actual purpose of on-line testing is defined by part 2 of the unit-side square, which describes the probability of inessential error detection.

2. Part 2 is the largest in the unit-side square and its area is close to one: PDN → 1.

3. Part 2 demonstrates a new property of an on-line testing method: rejection of reliable results. For exact data, reliable results can be rejected only in case of a fault in the error detection circuit.

An on-line testing method becomes approximate, like our Universe.

[Figure: unit-side square with sides PE, PN and PD, PS]

Slide 172

6.1.5. Low reliability of traditional on-line testing methods

COMPARISON

CURRENT VIEW
Existing on-line testing is applicable to any type of data.
The purpose of on-line testing is to estimate the reliability of the computing circuit.
All processed numbers are considered exact data.
All errors are essential for the reliability of the computed result.
Traditional on-line testing methods have high reliability: they detect almost all errors and faults.

NEW VIEW
Existing on-line testing is applicable to exact data only.
The purpose of on-line testing is to estimate the reliability of the computation result.
Processed numbers are in most cases approximate data.
Most errors are inessential.
Traditional on-line testing methods have low reliability of result checking: they mainly detect inessential errors.

Slide 173

6.2. The ways for increasing on-line testing reliability

D = PD PE + (1 – PD)(1 – PE)

D↑ = PD↑ PE↑ or PS↓ PN↓

1. PE > 0.5

2. PE < 0.5

3. PD-E > PD-N
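
The dependence of D on its two parameters can be made explicit with a short derivation. This is a worked sketch based only on the formula for D above, with the symbols written as P_D and P_E; it is not taken from the slides.

```latex
\begin{aligned}
D &= P_D P_E + (1 - P_D)(1 - P_E) = 1 - P_D - P_E + 2 P_D P_E,\\
\frac{\partial D}{\partial P_D} &= 2 P_E - 1,
\qquad
\frac{\partial D}{\partial P_E} = 2 P_D - 1 .
\end{aligned}
```

Under this reading, D grows with P_D only when P_E > 0.5, and for P_E < 0.5 it grows when P_D is reduced, which matches cases 1 and 2 listed above.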

Slide 174

6.2. The ways for increasing on-line testing reliability

D = PD PE + (1 – PD)(1 – PE)

D↑ = PD↑ PE↑ or PS↓ PN↓

Slide 175

6.3. The first way for increasing on-line testing reliability

D↑ = PD↑ PE↑

(PE > 0.5) & (PD > 0.5)

1. The first way increases part 1 of the unit-side square by raising the probability of an essential error.

2. The first way makes it possible to develop on-line testing methods with a traditionally high probability of error detection.

3. This way provides a high probability of essential error detection.

Slide 176

6.3. The first way for increasing on-line testing reliability

A high probability of an essential error, PE > 0.5, can be achieved only for truncated operations.

Residue checking is the main on-line testing method for the arithmetic of complete operations.

That is why it is rational to extend residue checking to truncated operations.

1. Residue checking of truncated operations
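
For reference, residue checking of a complete multiplication can be sketched as follows. This is a minimal illustration with modulus 3 and hypothetical names and operand values; it covers only the complete operation, not the truncated case treated in the following slides.

```python
def residue(x, m=3):
    """Check code of x: its residue modulo m."""
    return x % m


def product_passes_check(a, b, v, m=3):
    """Compare the residue of the result v with the residue
    predicted from the check codes of the operands a and b."""
    return residue(v, m) == (residue(a, m) * residue(b, m)) % m


a, b = 181, 77          # hypothetical operands
v = a * b               # fault-free complete product
ok = product_passes_check(a, b, v)

# Inject errors of the form +/- 2**r into the result.
detected = [
    not product_passes_check(a, b, v + sign * (1 << r))
    for r in range(14)
    for sign in (+1, -1)
]
print(ok, all(detected))
```

Because no power of two is divisible by 3, every single-bit-weight error of the form ±2^r changes the residue and is caught by this check.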

Slide 177

6.4. Residue checking a truncated multiplication

The method compares the check codes of the truncated product calculated in two ways:
using the truncated product;
using the operands.

The method is based on a decomposition of the high part of the product conjunction array (PCA) into fragments.

A fragment is defined as a part of the PCA described by a product
Vi = ±Ai Bi,
where Ai and Bi are the operands A and B or their parts.

For example, fragment V1:
V1 = –A{5 ÷ 8} B{11 ÷ 14} 2^–22,
A1 = A{5 ÷ 8} 2^–8; B1 = B{11 ÷ 14} 2^–14.

[Figure: PCA of the product V{1 ÷ 2n} for n = 14, k = 10, decomposed into fragments V1 ÷ V11]

High part of the PCA can be represented as a sum of fragments:

The method uses definition of a fragment and representation of a truncated product in check codes:

Slide 178

6.4. Residue checking a truncated multiplication

Blocks BA and BB check the operands A and B by computing the check codes KA and KB and comparing them with the input check codes KA and KB. The results of the comparison are the error indication codes KA and KB.
The check codes KAi and KBi are composed of operand bits or computed during the generation of the check codes KA and KB.

Block M computes the check codes KVi, i = 1 ÷ k–1, of the fragments using formula (1). Block A calculates the check code KVT of the truncated product using formula (2).
Block G generates the check code KVS of the excluded bits VS. Block S computes the check code of the result KVV.
Block BV checks the result VR by comparing it with the check code KVV. The result of the comparison is the error indication code KV.

Slide 179

6.4. Residue checking a truncated multiplication

The method of residue checking a truncated multiplication defines the following steps:
Choice of the PCA decomposition into fragments;
Description of the fragments;
Description of the check codes KAi and KBi composed of operand bits;
Definition of the formulas for the computed check codes KAi and KBi;
Design of the blocks BA and BB in accordance with the obtained formulas;
Design of the blocks M and A taking into account the descriptions of the fragments and the check codes KAi, KBi;
Design of the blocks G and S using the values of n and k;
Design of the block BV as a block BA for the following error detection circuit, where the result is used as an operand.

Slide 180

6.4. Residue checking a truncated multiplication

The choice of the PCA decomposition into fragments should aim at a high-quality error detection circuit.

The hardware overhead of the error detection circuit is mainly defined by the complexity of the blocks BA and BB, which, being compaction schemes, do not depend in complexity on the PCA decomposition.

The checking time can be reduced using the following procedure for defining the PCA decomposition.

The decomposition is defined by specifying a sequence of central-symmetric fragments.

The first central-symmetric fragment
Vi = –A{n–Li+1 ÷ n} B{n–Li+1 ÷ n} 2^–2n
has size Li = 2 E(k/4+1).

It defines high and low parts, which are treated like the PCA high part with k = k – Li. The process continues while k > 1.

Slide 181

6.4. Residue checking a truncated multiplication

The blocks of the error detection circuit are developed taking into account the decomposition of the PCA into fragments.

Slide 182

6.4. Residue checking a truncated multiplication

Development of the block BB

Sequence of computations:
KB1 = B{11 ÷ 14} mod 3;
KB7 = B{5 ÷ 8} mod 3;
KB5 = KB1 + B{9, 10};
KB11 = KB5 + KB7 + B{1 ÷ 4} mod 3.

Modulo-3 adders 1 ÷ 7

Slide 183

6.4. Residue checking a truncated multiplication

Hardware overhead (in FA, full adders)

Error detection circuit:
HEDC = 4n + k

Multiplier:
HMUL = n^2 – k^2 / 2

Relative overhead:
HE/M = (8n + 2k) / (2n^2 – k^2)
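
The three formulas can be evaluated directly. The sketch below uses hypothetical function names and takes n = 14, k = 10 from the PCA example shown earlier in this section; the resulting numbers are only an illustration of the formulas.

```python
def h_edc(n, k):
    """Overhead of the error detection circuit, in full adders."""
    return 4 * n + k


def h_mul(n, k):
    """Overhead of the truncated multiplier, in full adders."""
    return n ** 2 - k ** 2 / 2


def relative_overhead(n, k):
    """HE/M = (8n + 2k) / (2n^2 - k^2), i.e. h_edc / h_mul."""
    return (8 * n + 2 * k) / (2 * n ** 2 - k ** 2)


n, k = 14, 10
print(h_edc(n, k), h_mul(n, k), round(relative_overhead(n, k), 3))
```

The relative figure is simply the ratio of the first two quantities with numerator and denominator doubled.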

Slide 184

6.5. Residue checking a truncated division of mantissas

Correlation of truncated multiplication and division

A truncated non-restoring division is the inverse operation of a truncated multiplication of the binary divisor by the quotient represented in notation 1,1.

Truncated multiplication of the divisor D = d{1 ÷ n}⋅2^–n by the quotient Q = q{0 ÷ n}⋅2^–n determines the left part 1 of the Conjunctions Array (CA).
The truncated (2n – k)-bit product
VTR = V{1 ÷ 2n – k}⋅2^–(2n–k)
is calculated on this part as VTR = A – RTR, where A = a{1 ÷ n}⋅2^–n is the dividend and RTR = r{1 ÷ n–k}⋅2^–(n–k) is the truncated remainder.

[Figure: CA for the product of the divisor by the quotient]

Slide 185

6.5. Residue checking a truncated division of mantissas

Decomposition of the CA left part into k+1 fragments Vi = Di ⋅ Qi, i = 1 ÷ k+1 (k = 3, i = 1 ÷ 4)

V1 = D{1 ÷ 3} ⋅ Q{6} ⋅ 2^–9;
V2 = D{1 ÷ 4} ⋅ Q{5} ⋅ 2^–9;
V3 = D{1 ÷ 5} ⋅ Q{4} ⋅ 2^–9;
V4 = D{1 ÷ 6} ⋅ Q{0 ÷ 3} ⋅ 2^–9.

KD1 = –D{1 ÷ 3} mod 3;
KD2 = (KD1 + D{4}) mod 3;
KD3 = (KD2 – D{5}) mod 3;
KD4 = (KD3 + D{6}) mod 3;

KQ1 = Q{6};
KQ2 = –Q{5};
KQ3 = Q{4};
KQ4 = –Q{0 ÷ 3} mod 3;

Slide 186

6.5. Residue checking a truncated division of mantissas

Error detection circuit

Blocks 1 and 2 check the input numbers: dividend A and divisor D.
Blocks 3 and 4 generate the check codes KQ and KR of the quotient Q and the residue R.
Blocks 5 and 6 calculate the check codes KVTR and KVTR*.
Block 7 compares the check codes KVTR and KVTR* and calculates the error indication code KQ.

[Figure: error detection circuit with blocks 1 ÷ 7; inputs A, D, Q, RTR; check codes KA, KD, KQi, KDl, KRTR, KVTR, KVTR*; output KQ]

Slide 187

6.6. Residue checking a truncated operation of shift

Truncated shift is executed in floating-point addition

1. Definition of the operation C = A + B,
where A = a1⋅2^a2; B = b1⋅2^b2; C = c1⋅2^c2.

2. Execution of the operation

2.1. Processing the exponents
c2 = max(a2, b2);
da = c2 – a2; db = c2 – b2.

2.2. Processing the mantissas
a1 SHIFT = a1⋅2^–da;
b1 SHIFT = b1⋅2^–db;
c1 = a1 SHIFT + b1 SHIFT.

[Figure: floating-point adder; inputs a1, b1, a2, b2; internal signals da, db, a1 SHIFT, b1 SHIFT; outputs c1, c2]

3. The floating-point adder consists of
the block 1 for the exponent processing,
barrel-shifters 2 and 3,
adder 4.
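
Steps 2.1 and 2.2 can be sketched with integer mantissas. This is a simplified illustration under stated assumptions: mantissas are plain Python integers rather than n-bit fractions, the right shift `>>` stands in for the truncating barrel shifter, and normalization of the result is omitted. The function name `fp_add` is hypothetical.

```python
def fp_add(a1, a2, b1, b2):
    """Add A = a1 * 2**a2 and B = b1 * 2**b2 with truncated alignment.

    Returns the mantissa c1 and exponent c2 of the (unnormalized) sum.
    """
    # 2.1. Processing the exponents
    c2 = max(a2, b2)
    da, db = c2 - a2, c2 - b2
    # 2.2. Processing the mantissas (bits shifted out are discarded)
    a1_shift = a1 >> da
    b1_shift = b1 >> db
    c1 = a1_shift + b1_shift
    return c1, c2


exact_case = fp_add(12, 3, 20, 1)      # 96 + 40: no bits are lost
truncated_case = fp_add(12, 3, 21, 1)  # 96 + 42: the low bit of b1 is lost
print(exact_case, truncated_case)
```

In the second call the exact sum is 138, but the aligned mantissa of B loses its lowest bit, so the result represents 136. This is the truncation that the checking method has to account for.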

Slide 188

6.6. Residue checking a truncated operation of shift

Arithmetic shift of a mantissa

An operation of arithmetic shift contains three actions: aSHIFT = a⋅2^–d – a0 + as.
1. Reduction of the bit weights of the mantissa a by a factor of 2^d.
2. Truncation of the d low bits of the mantissa a (the code a0 = a{n–d+1 ÷ n}).
3. Sign bit padding in the positions with bit weights 2^–1 ÷ 2^–d for the complement code of the mantissa a. The sign bits sa … sa compose the code as.
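
The three actions can be traced on an integer bit pattern. The sketch below is an illustration under assumptions: the mantissa is modelled as an n-bit two's complement code held in a Python integer instead of a fraction with weights 2^–1 ÷ 2^–n, and the helper names are hypothetical.

```python
def to_code(value, n):
    """n-bit two's complement code of a signed integer."""
    return value & ((1 << n) - 1)


def arithmetic_shift(code, d, n):
    """Shift an n-bit two's complement code right by d positions."""
    sign = (code >> (n - 1)) & 1
    a0 = code & ((1 << d) - 1)        # action 2: the d low bits to truncate
    scaled = (code - a0) >> d         # actions 1 and 2: a * 2**-d - a0
    # action 3: d copies of the sign bit in the vacated high positions
    a_s = (((1 << d) - 1) << (n - d)) if sign else 0
    return scaled + a_s


n = 8
negative = arithmetic_shift(to_code(-19, n), 2, n)
positive = arithmetic_shift(to_code(77, n), 3, n)
print(negative, positive)
```

For comparison, Python's own `>>` on signed integers is an arithmetic shift, so the result for -19 can be checked against `to_code(-19 >> 2, n)`.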

Slide 189

6.6. Residue checking a truncated operation of shift

Arithmetic shift is executed using the barrel shifter

The barrel shifter contains n multiplexers of the n-to-1 type.
The multiplexer hardware overhead q is proportional to the operand size n.
The barrel-shifter hardware overhead QSHIFT = nq is proportional to the square of the operand size n and makes up the main hardware overhead of the floating-point adder.
The barrel shifter executes a truncated operation, which halves the hardware overhead in comparison with a long shifter computing the complete 2n-bit result aC = aSHIFT{1 ÷ 2n} 2^–2n.

Slide 190

6.6. Residue checking a truncated operation of shift

[Figure: shift matrix]

Slide 191

6.6. Residue checking a truncated operation of shift

[Figure: conversion of a0 into a01 = a0⋅2^d]

Slide 192

6.6. Residue checking a truncated operation of shift

[Figure: conversion of a01 into a02 with keeping the bit weights by mod 3]

Slide 193

6.6. Residue checking a truncated operation of shift

[Figure: conversion of a01 into a02 with calculating the check codes]

ka4÷7{2,1} = a{4 ÷ 7} mod 3

ka12÷15{2,1} = a{12 ÷ 15} mod 3

ka8÷15{2,1} = (a{8 ÷ 11} + ka12÷15{2,1}) mod 3

Slide 194

6.6. Residue checking a truncated operation of shift

Simplification of the checking computation

1. Conversion of the truncated bits a0 into the code a01 simplifies unit 3 by a factor of σ01 = 1.5.

2. Conversion of the code a01 into a02 simplifies unit 3 by a factor of σ02 = 2n/r. For n = 15, σ02 = 7.5.

3. Conversion of the code a02 into a03 simplifies unit 3 by a factor of σ03 = 2n/3 and unit 6 by a factor of σ = n/(2r–1). For n = 15, σ03 = 10 and σ = 2.1.

The checking hardware overhead is reduced from a quadratic dependence on the operand size to a linear one.

[Figure: checking circuit with units 1 ÷ 7; signals a, d, d{1}, sa, a03, Ka, ka, kad, kas1, kaV, ka03, kaSHIFT]

Slide 195

6.6. Residue checking a truncated operation of shift

Unit 1: modulo-3 generator
Unit 2: modulo-3 comparator
Unit 3: generator of the check code ka03
Unit 4: generator of the check code kas1

Slide 196

Reading List

1. Дрозд А.В. Нетрадиционный взгляд на рабочее диагностирование вычислительных устройств / А.В. Дрозд // Автоматизированные системы управления и приборы автоматики. – 2009. – Вып. 147. – С. 15 – 24.
2. Дрозд О.В. Контроль за модулем обчислювальних пристроїв. Навч. посібн. для студ. спеціальності 7.091501 – «Комп’ютерні та інтелектуальні системи та мережі» / Одеськ. нац. політехніч. ун-т. – Одеса: АО Бахва, 2002. – 144 с.
3. Контроль ОП зі скороченим виконанням операцій, с. 74 – 135.
Drozd A. V., Lobachev M. V. Efficient On-line Testing Method for Floating-Point Adder. – Proc. Design, Automation and Test in Europe. Conference and Exhibition 2001 (DATE 2001). Munich, Germany, 13 – 16 March 2001. – P. 307 – 311.
Drozd A. V., Lobachev M. V., Drozd J. V. Efficient On-line Testing Method for a Floating-Point Iterative Array Divider. – Proc. Design, Automation and Test in Europe. Conference and Exhibition 2002 (DATE 2002). Paris, France, 4 – 8 March 2002. – P. 1127.

Slide 197

Conclusion

1. Traditional on-line testing methods have low reliability in checking approximate results: they mainly detect inessential errors.

2. On-line testing reliability can be increased in three ways: by increasing the probability of an essential error, by reducing the probability of error detection, and by detecting essential and inessential errors with different probabilities.

3. The first way can be realized using truncated operations only, because only these operations can have a high probability of an essential error.

4. The first way makes it possible to develop on-line testing methods with a traditionally high probability of error detection.

5. Truncated multiplication can be checked by modulo using a decomposition of the product conjunction array into fragments.

6. Other truncated operations can be checked using the fragment approach as well, since they inherit the properties of multiplication.

Slide 198

Questions and tasks

What is the reliability of on-line testing methods?
What reliability do traditional on-line testing methods demonstrate in approximate data processing?
Describe the ways to increase the reliability of traditional on-line testing methods for approximate data processing.
What conditions does the first way use for increasing the reliability of on-line testing methods?
What role do truncated arithmetic operations play in mantissa checking?
What approach does the residue checking method use for truncated operations?

Slide 199

MODULE 3. On-line testing for digital components of S-CES

Lecture 7. Increase of on-line testing methods reliability

7.1. The second way for increasing on-line testing reliability

7.2. Checking with use of natural information redundancy

7.3. The use of product information redundancy

7.4. Checking of a squarer

7.5. Checking by simplified operation

7.6. The models of operation simplification

7.7. Execution of check calculations

Slide 200

7.1. The second way for increasing on-line testing reliability

7.1.1. Motivation of increasing an on-line testing reliability by the second way

The second way increases on-line testing reliability by using a low probability of an essential error.

Reasons:

On-line testing objects, as a rule, have a low probability of an essential error.

The second way addresses the common case of on-line testing objects.

Slide 201

7.1.2. Related Works

1. Савченко Ю. Г. Цифровые устройства, нечувствительные к неисправностям элементов. – М.: Советское радио, 1977. – 176 с.

2. Сушкевич А. К. Теория чисел. – Харьков: Изд. ХГУ, 1956.

3. Селлерс Ф. Методы обнаружения ошибок в работе ЭЦВМ. – М.: Мир, 1972. – 310 с.

4. Граф Ш., Гессель М. Схемы поиска неисправностей. – М.: Энергоатомиздат, 1989. – 144 с.

Slide 202

7.1. The second way for increasing on-line testing reliability

7.1.3. Features of the second way

When the probability of an essential error is low, on-line testing reliability can be increased only by reducing the probability of error detection.

Reduced requirements for error detection make it possible to simplify the check circuits.

Earlier, reduction of the error detection probability was aimed at simplifying the on-line testing means.
Now, however, the goal is to increase the reliability of on-line testing methods. This goal can be achieved together with simplification of the check circuits.

Slide 203

7.1. The second way for increasing on-line testing reliability

7.1.3. Features of the second way

The main requirement for reducing the error detection probability is to preserve the set of detected faults.

Every probable fault should be detected by at least one input codeword.

A probable fault distorts the result at the output of single-step arithmetic circuits by the weight of any one bit.
The error has the form ±2^r, where r is the number of the result bit.

The set of faults detected by residue checking (modulo three) can be used as a reference for the set of probable faults.

Slide 204

7.2. Checking with use of natural information redundancy

7.2.1. Natural information redundancy

A code containing forbidden words is characterized by its information redundancy.

Natural information redundancy is an alternative to the information redundancy created by expanding a code with additional bits.

The checking methods considered here use the natural information redundancy of arithmetic operation results.

Slide 205

7.3. The use of product information redundancy

The product of a complete operation has natural information redundancy.

Both the set of input words and the set of output words of multiplication have the same capacity 2^2n, where n is the size of the operands.

However, the same output word can correspond to several input words.
This follows from the commutative law and from multiplication by zero.

Therefore the product does contain forbidden words.

Slide 206

7.3. The use of product information redundancy

Checking the products using prime numbers

Fermat (1601-1665) supposition: the numbers C = 2^n + 1, n = 2^x (x is a natural number), are prime.

Euler (1707-1783) refuted Fermat's statement for x = 5, but the statement is true for x < 5, including the cases of the widespread word sizes n = 8 and n = 16.

A prime number C = 2^n + 1 cannot be a product of two n-bit binary factors.

Slide 207

7.3. The use of product information redundancy

Checking the products using prime numbers

A prime number C = 2^n + 1 and the multiples of C are forbidden words for a product of two n-bit binary factors.

These words compose double code G(n, n) without zero-word.
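
Both statements can be checked exhaustively for a small word size. The sketch below takes n = 8, so C = 257; the variable names are hypothetical and the brute-force loop is only a demonstration, not part of the checking method itself.

```python
n = 8
c = (1 << n) + 1          # C = 2**n + 1 = 257, a prime
mask = (1 << n) - 1

# A multiple k*C (0 < k < 2**n) equals k*2**n + k, so the high and
# low n-bit halves of the 2n-bit word are equal: the double code G(n, n).
multiples_have_equal_halves = all(
    ((k * c) >> n) == ((k * c) & mask) for k in range(1, 1 << n)
)

# No product of two non-zero n-bit factors is a multiple of C.
no_product_is_multiple = all(
    (a * b) % c != 0
    for a in range(1, 1 << n)
    for b in range(1, 1 << n)
)

print(multiples_have_equal_halves, no_product_is_multiple)
```

The second property follows from C being prime and larger than any n-bit factor, so the loop merely confirms it for this word size.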

Slide 208

7.3. The use of product information redundancy

Checking the products using prime numbers

The checking method verifies whether:
the factors A{1 ÷ n} and B{1 ÷ n} are not zero;
the product V{1 ÷ 2n} is a forbidden word k (2^n + 1).

Two conditions are evaluated:
(A{1 ÷ n} ≠ 0) & (B{1 ÷ n} ≠ 0);
V{1 ÷ n} = V{n + 1 ÷ 2n}.
In correct operation exactly one of the two conditions holds; an error is detected when both hold or neither holds.

Every probable fault of an iterative array multiplier is detected on at least one input word: A{1 ÷ n} B{1 ÷ n} ± 2^r = k (2^n + 1).
This is proved by factorizing the expression k (2^n + 1) ± 2^r into the factors A{1 ÷ n} and B{1 ÷ n} for at least one value of k.

Slide 209

7.3. The use of product information redundancy

Checking the products using prime numbers

The checker consists of two blocks and forms a two-bit check code E{1, 2}:
E{1} = ((A{1 ÷ n} ≠ 0) & (B{1 ÷ n} ≠ 0));
E{2} = (V{1 ÷ n} = V{n + 1 ÷ 2n}).

The first block B1 consists of two n-input OR gates 1.1 and 1.2, which check the conditions A{1 ÷ n} ≠ 0 and B{1 ÷ n} ≠ 0, and an AND gate 1.3, which computes the bit E{1} from the condition that both factors are not zero.
The second block B2 is a comparator of the low and high product bits. It computes the bit E{2}.

The code E{1, 2} = 00₂ if at least one of the factors is zero and the product is not zero: the low and high parts of the product are different.
The code E{1, 2} = 11₂ if both factors are not zero and the product is a forbidden word: the low and high parts of the product are equal.
The code E{1, 2} = 01₂ if at least one of the factors is zero and the low and high parts of the product are equal: V{1 ÷ 2n} = 0.
The code E{1, 2} = 10₂ if both factors are not zero and the low and high parts of the non-zero product are different.

If E{1, 2} = 00₂ or 11₂, a fault is detected.
If operation is correct, E{1, 2} = 01₂ or 10₂.
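
The behaviour of the two-bit check code can be sketched in software. This is an illustrative model, not the gate-level design: the function names are hypothetical, the product is passed in as an integer, and the four test cases are hand-picked for n = 8.

```python
def check_code(a, b, v, n):
    """Return (E1, E2) for factors a, b and the 2n-bit product word v."""
    mask = (1 << n) - 1
    e1 = int(a != 0 and b != 0)            # block B1: both factors non-zero
    e2 = int((v >> n) == (v & mask))       # block B2: high half equals low half
    return e1, e2


def fault_detected(a, b, v, n):
    """A fault is indicated by the codes 00 and 11."""
    e1, e2 = check_code(a, b, v, n)
    return e1 == e2


n = 8
correct_nonzero = check_code(16, 16, 16 * 16, n)          # product 256
faulty_nonzero = fault_detected(16, 16, 16 * 16 + 1, n)   # 257 = 2**8 + 1
correct_zero = check_code(0, 5, 0, n)
faulty_zero = fault_detected(0, 5, 4, n)
print(correct_nonzero, faulty_nonzero, correct_zero, faulty_zero)
```

In the second case an error of +2^0 turns the product 256 into the forbidden word 257, whose high and low halves are both 1, so the code becomes 11₂.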

Slide 210

7.3. The use of product information redundancy

Checking the products using prime numbers

This checking method can be extended to mantissa processing, taking into account the range of the normalized mantissa codeword: 2^(n–1) ÷ 2^n – 1.

Such a range excludes zero as a value of the product.

This peculiarity eliminates the check of the factors for equality to zero and eliminates the block B1 of the checker.

The checker contains only the comparator (block B2), which can be designed on Carter's units.

Слайд 211

7.3. The use of product information redundancy

Checking the products using prime numbers

The probability of error detection is PD = 3 ⋅ 2^(–n):
PD(n = 8) = 0.012; PD(n = 16) = 4.6 ⋅ 10^(–5).

The time of permanent fault detection is T = ln 2 / PD:
T(n = 8) = 59; T(n = 16) = 15142 (clock units).

The checker based on prime numbers is the simplest one for multipliers. It is more than 5.3 times simpler than the residue checker.

The reliability of the checking method is R = 1 – PE:
R = 0.9 for PE = 0.1.
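
The quoted values can be reproduced with a short Python sketch of the two formulas PD = 3 ⋅ 2^(–n) and T = ln 2 / PD. The function names are hypothetical.

```python
import math


def detection_probability(n: int) -> float:
    """PD = 3 * 2^(-n) for an n-bit word."""
    return 3 * 2.0 ** (-n)


def detection_time(n: int) -> float:
    """T = ln 2 / PD, in clock units."""
    return math.log(2) / detection_probability(n)


for n in (8, 16):
    print(n, detection_probability(n), round(detection_time(n)))
```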

Слайд 212

7.3. The use of product information redundancy

Checking the products using prime numbers

The described checking method has a drawback: limited application, only for two word sizes, n = 8 and n = 16.

This checking method can be extended to other word sizes using a prime number C* = 2^n – 1.

A prime number C* = 2^n – 1 can be a product of two n-bit binary factors only if one of the factors is equal to C*.

Слайд 213

7.3. The use of product information redundancy

Checking the products using prime numbers

A prime number C* = 2^n – 1 and the numbers that are multiples of C* can be products of two n-bit binary factors.

These words form the double code G(n, ¬n) with an inverted part, excluding the words whose high part is equal to C*.

Слайд 214

7.3. The use of product information redundancy

Checking the products using prime numbers

The checking method verifies that:
the factors A{1 ÷ n} and B{1 ÷ n} are neither C* nor zero;
the product V{1 ÷ 2n} is a word k (2^n – 1).

An error is detected if both of the following conditions hold or if neither holds:
(A{1 ÷ n} ≠ C*) & (B{1 ÷ n} ≠ C*) for A{1 ÷ n} ≠ 0, B{1 ÷ n} ≠ 0;
V{1 ÷ n} = ¬V{n + 1 ÷ 2n}.

Every probable fault of the iterative array multiplier is detected on at least one input word: A{1 ÷ n} ⋅ B{1 ÷ n} ± 2^r = k (2^n – 1).
This is proved by factoring k (2^n – 1) ± 2^r into the factors A{1 ÷ n} and B{1 ÷ n} for at least one value of k.

Слайд 215

7.3. The use of product information redundancy

Checking the products using prime numbers

The checker consists of two blocks and forms a two-bit check code E{1, 2}:
E{1} = ((A{1 ÷ n} = C*) or (B{1 ÷ n} = C*));
E{2} = ¬(V{1 ÷ n} = ¬V{n + 1 ÷ 2n}).

The first block B1 consists of two n-bit AND gates 1.1 and 1.2, which check the conditions A{1 ÷ n} = C* and B{1 ÷ n} = C*, and an OR gate 1.3, which computes the bit E{1} from the condition that at least one of the factors is equal to C*.
The second block B2 is a comparator of the low and inverted high halves of the product with an inverted output. It computes the bit E{2}.

E{1, 2} = 11₂ if at least one of the factors is C* and the low and high halves of the product are not mutually inverse.
E{1, 2} = 00₂ if neither factor is equal to C* and the low and high halves of the product are mutually inverse.
E{1, 2} = 10₂ if at least one of the factors is C* and the low and high halves of the product are mutually inverse.
E{1, 2} = 01₂ if neither factor is equal to C* and the low and high halves of the non-zero product are not mutually inverse.

If E{1, 2} = 00₂ or 11₂, a fault is detected.
If the operation is correct, E{1, 2} = 01₂ or 10₂.
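
A minimal Python sketch of this second checker follows, assuming n = 7, so that C* = 2^7 – 1 = 127 is prime. The function name `check_code_inverse` is hypothetical. Zero factors are deliberately left out of the demonstration loop, because the method needs additional handling for them.

```python
def check_code_inverse(a: int, b: int, v: int, n: int = 7) -> tuple:
    """Return (E1, E2) for the checker based on the prime C* = 2^n - 1."""
    c_star = (1 << n) - 1
    low, high = v & c_star, (v >> n) & c_star
    e1 = int(a == c_star or b == c_star)  # block B1: a factor equals C*
    e2 = int(low != (~high & c_star))     # block B2: halves are NOT inverse
    return e1, e2


# For non-zero factors the correct product always gives an allowed code.
codes = {check_code_inverse(a, b, a * b)
         for a in range(1, 128) for b in range(1, 128)}
print(sorted(codes))
```

The product 127 ⋅ k with 1 ≤ k ≤ 128 has the high half k – 1 and the low half 128 – k, which is the bitwise inverse of the high half; this is why the comparison of inverted halves recognizes the multiples of C*.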

Слайд 216

7.3. The use of product information redundancy

Checking the products using prime numbers

The checking method is not correct if at least one of the factors is equal to zero. This case must be identified in the checker additionally for codewords in the range 0 ÷ 2^n – 1.

Both the checking method and the checker are fully correct for mantissa processing, given the range of the normalized mantissa codeword: 2^(n – 1) ÷ 2^n – 1.

Слайд 217

7.3. The use of product information redundancy

Checking the products using prime numbers

The probability of error detection is PD = 3 ⋅ 2^(–n):
PD(n = 7) = 0.023; PD(n = 17) = 2.3 ⋅ 10^(–5).

The time of permanent fault detection is T = ln 2 / PD:
T(n = 7) = 30; T(n = 17) = 30284 (clock units).

The checker based on prime numbers is the simplest one for multipliers. It is more than 5.3 times simpler than the residue checker.

The reliability of the checking method is R = 1 – PE:
R = 0.9 for PE = 0.1.

Слайд 218

7.4. Checking of a squarer

Way 2. Decrease of PD

Error detection circuit of squarer

[Figure: error detection circuit with the operand A, the squarer, the result S, blocks B1 and B2, and the check code E.]

Block B1 calculates the residue R modulo m of the result S = A^2.

Block B2 calculates the check code E, which identifies the forbidden values of the residue R.

Слайд 219

7.4. Checking of a squarer

Estimation of error detection probability (m = 15)

1. Calculation of the square S = A^2 and the residue R = S mod m for operand values on half of the period, A = 0 ÷ (m – 1) / 2.

2. Creation of the set X of allowed values x of the residue R and the index F of their occurrences for operand values on the period A = 0 ÷ m – 1.

3. Creation of the set Z of forbidden values z.

Слайд 220

7.4. Checking of a squarer

Estimation of error detection probability (m = 15)

4. Creation of the set Y of typical errors y = ± 2^r modulo m, where r is the number of a bit in the result, r = 0 ÷ 2n – 1.

4.1. The set Y of typical errors y = ± 2^r modulo m is finite: at most m positive errors and at most m negative errors.

4.2. The typical errors y = ± 2^r modulo m can be obtained by repeatedly doubling the error value modulo m, starting from 1, until the value 1 or – 1 is reached.

4.3. This process can be examined in detail on the example m = 13.

m = 15: 2^0 = 1, 1 × 2 = 2, 2 × 2 = 4, 4 × 2 = 8, 8 × 2 = 16, 16 mod 15 = 1.

m = 13: 2^0 = 1, 1 × 2 = 2, 2 × 2 = 4, 4 × 2 = 8, 8 × 2 = 16, 16 – 13 = 3, 3 × 2 = 6, 6 × 2 = 12, 12 – 13 = – 1.

For m = 13: Y = {1, 2, 4, 8, 3, 6}.
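
The doubling procedure of step 4.2 can be sketched in Python as follows. The function name `typical_error_magnitudes` is hypothetical, and the sketch assumes an odd modulus m ≥ 3, for which the doubling sequence is guaranteed to return to 1.

```python
def typical_error_magnitudes(m: int) -> list:
    """Double the error value modulo m, starting from 1, until 1 or -1 recurs."""
    if m < 3 or m % 2 == 0:
        raise ValueError("the sketch assumes an odd modulus m >= 3")
    magnitudes = [1]
    y = 2 % m
    while y not in (1, m - 1):  # m - 1 is congruent to -1 modulo m
        magnitudes.append(y)
        y = (2 * y) % m
    return magnitudes


print(typical_error_magnitudes(13))
print(typical_error_magnitudes(15))
```

Taking every magnitude with both signs gives the full set of typical errors; for m = 15 this yields eight elements.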

Слайд 221

7.4. Checking of a squarer

Estimation of error detection probability (m = 15)

5. Creation of the error detection table using the occurrences of the allowed values x, from the condition z = (x + y) mod m.

6. Calculation of the maximal (PH) and minimal (PL) error detection probabilities:
PH = SumMAX / (m Y*);
PL = SumMIN / (m Y*),
where SumMAX is the sum of all elements of the table;
SumMIN is the least sum of the rows whose elements cover all columns;
Y* is the number of elements in the set Y.

For m = 15: Y* = 8; SumMAX = 90; SumMIN = 18 for z = 11 and z = 14; PH = 0.75; PL = 0.15.
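
Steps 1 to 6 can be traced with the following Python sketch for m = 15. It rests on two assumptions that the slides do not spell out: each table cell holds the number of occurrences F of the allowed value x that the error y moves to the forbidden value z, and SumMIN is found by brute force over all groups of rows. The function name is hypothetical.

```python
from itertools import combinations


def squarer_detection_bounds(m: int, errors: list) -> tuple:
    # Steps 1-2: index F of occurrences of each allowed residue x = A^2 mod m.
    freq = {}
    for a in range(m):
        x = (a * a) % m
        freq[x] = freq.get(x, 0) + 1
    # Step 3: the forbidden values are the residues that never occur.
    forbidden = [z for z in range(m) if z not in freq]
    # Step 5: table[z][y] counts the allowed x with z = (x + y) mod m.
    table = {z: {y: freq.get((z - y) % m, 0) for y in errors} for z in forbidden}
    sum_max = sum(sum(row.values()) for row in table.values())
    # Step 6: least total of a group of rows with a non-zero cell in every column.
    sum_min = None
    for size in range(1, len(forbidden) + 1):
        for rows in combinations(forbidden, size):
            if all(any(table[z][y] for z in rows) for y in errors):
                total = sum(sum(table[z].values()) for z in rows)
                if sum_min is None or total < sum_min:
                    sum_min = total
    denominator = m * len(errors)
    return sum_max, sum_min, sum_max / denominator, sum_min / denominator


print(squarer_detection_bounds(15, [1, -1, 2, -2, 4, -4, 8, -8]))
```

With the eight typical errors ± 1, ± 2, ± 4, ± 8 the sketch reproduces the values quoted for m = 15.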

Слайд 222

7.4. Checking of a squarer

Estimation of the checking method reliability

R = PD PE + (1 – PD) (1 – PE)

Слайд 223

7.5. Checking by simplified operation

Simplification of operation

The checking method is based on simplifying the operation by limiting the set of input words to the set of check words.

For example, a multiplier can be checked as a squarer on input words composed of equal factors.

Such a solution is not correct: probable faults, namely shorts between the same bits of the factors, are not detected.

This solution can be improved by using factors that are equal modulo 3.

Слайд 224

7.5. Checking by simplified operation

Limiting conditions

The method defines limiting conditions for the operands and the results.

Bottom-up simplification: the limiting conditions imposed on the operands determine the limiting condition for the result.

Top-down simplification: the limiting condition imposed on the result determines the limiting conditions for the operands.

Слайд 225

7.5. Checking by simplified operation

The models of the Operation Simplification

A model of simplification of the computing operation contains limiting conditions (LC) and the logic operations executed on them.

The LC for operands can be dependent or independent, determining equal or different LC for the result, respectively.

A composite LC is an LC for operands composed of several LC.

In order to preserve the set of detected faults:
dependent LC should be processed only with the logic operations OR or XOR;
independent LC should be processed only with the logic operation AND.

Слайд 226

7.5. Checking by simplified operation

Structure of the Error detection circuit

[Figure: error detection circuit with the object of on-line testing (operands A and B, result V) and blocks B1, B2 and B3 forming the code E.]

Block B1 uses the LC for operands to identify the input words on which the operation can be transformed to the simplified form.
Block B2 checks the LC for the results of the operation considered in the simplified form.
Block B3 forms the error indication code, which signals an error only if the input word is identified in block B1 and the error is detected in block B2.

Слайд 227

7.5. Checking by simplified operation

The models of execution of the check calculations

Two kinds of check calculations are used:
forming the codes of LC for the operands and the result;
executing logic operations on the codes of LC.

The codes of LC are formed modulo 3, which preserves the set of faults detected by residue checking.

The codes of LC can take the allowed values 01₂ or 10₂ and the forbidden values 00₂ or 11₂.

Both the logic operation OR on allowed values and the logic operation AND on forbidden values of the LC codes are executed on a Carter's unit.

The logic operation NOT transforms allowed values into forbidden ones, or vice versa, by inverting one of the code bits in a NOT-unit.

The Carter's and NOT units make it possible to execute any logic operation, since OR, AND and NOT form a functionally complete basis.

Слайд 228

7.5. Checking by simplified operation

Design of the Checker

The initial data for checker design is the required probability PD of error detection. It is used to determine the LC for operands.

For example, the LC for a multiplier checker (complete operation) with a low PD = 0.07 can be determined as follows.
G is the set of all input words.

Слайд 229

7.5. Checking by simplified operation

Design of the Checker

M – the generator of residue code;
UC – the Carter's unit;
UN – the NOT-unit;
– the inverse output;
BD – the block forming the dependent LC;
BI – the block forming the independent LC;
BL – the block executing the logic operation with the codes of LC;
KL – the composite code of dependent LC;
KC – the composite code of independent LC;
KM – the code of error indication.

KA = A mod 3; KV1 = V1 mod 3;
KB = B mod 3; KV2 = V2 mod 3;
KR* = R mod* 3.

Слайд 230

7.5. Checking by simplified operation

Estimation of the method

Reliability of the checking by simplified operation in comparison with the residue checking method

Слайд 231

Reading List

1. Drozd A. V. Efficient Method of Failure Detection in Iterative Array Multiplier. – Proc. Design, Automation and Test in Europe. Conference and Exhibition 2000 (DATE 2000). Paris, France, 27 – 30 March 2000. – P. 764.
2. Said Mouafak Montaha M. New On-Line Testing Method to Increase the Reliability of Checking Approximated Results / M. Said Mouafak Montaha, M.V. Lobachev, O.V. Drozd // 4-th international Conference “Advanced Computer Systems and Networks: Design and Application”. Lviv, Ukraine, 17-19 December, P. 166-168, 2009.
Слайд 232

Conclusion

1. The second way can be realized by using the natural information redundancy of the results of arithmetic operations or by simplifying the computing operation in the check.

2. The natural information redundancy of a complete product can be exploited using prime numbers.

3. The use of prime numbers makes it possible to design the simplest checkers for on-line testing of the iterative array multiplier.

4. The squarer can be effectively checked using the forbidden values of a residue modulo m.

5. The checking by simplified operation determines and forms, modulo 3, the limiting conditions for the operands and the result, and executes logic operations on these conditions.

6. The second way of increasing the reliability of on-line testing methods reduces the probability of error detection without truncating the set of detected faults.

Слайд 233

Questions and tasks

What is the second way of increasing the reliability of on-line testing methods?
Which methods are realized by the second way?
Describe the use of prime numbers for on-line testing of the complete product of mantissas.
Describe the procedure for assessing the error detection probability in the method of on-line testing of the squarer.
Which models are used in the method of checking by simplified operation?
What main requirement is imposed on the methods of the second way?

Слайд 234

MODULE 3. On-line testing for digital components of S-CES

Lecture 8. Checking by logarithm, inequalities, segments

8.1. The third way for increasing on-line testing reliability

8.2. The logarithm checking

8.3. The checking by inequalities

8.4. The checking by segments

Слайд 235

8.1. The third way for increasing on-line testing reliability

8.1.1. Motivation of increasing an on-line testing reliability by the third way

Reasons:

The third way makes it possible to obtain the most effective solutions.

The third way is directly aimed at distinguishing essential and inessential errors, taking into account the size of the error.

Слайд 236

8.1.2. Related Works

1. Селлерс Ф. Методы обнаружения ошибок в работе ЭЦВМ. – М.: Мир, 1972. – 310 c.

2. Журавлев Ю. П., Котелюк Л. А., Циклинский Н. И. Надежность и контроль ЭВМ. – М.: Советское радио, 1978. – 416 c.

3. Моллов В. К. Структурно-функциональные методы оперативного контроля и диагностики цифровых устройств управляющих систем: Автореф. дис. . . канд. техн. наук: 05.13.13 / Киевск. политехн. ин-т – Киев, 1989. – 16 с.

4. Тоценко В. Г., Киселев И.М. Метод повышения эффективности диагностирования дискретных устройств с регулярной структурой // Управляющие системы и машины. – 1977. – № 5. – С. 97 – 102.

5. Байда Н. П., Кузьмин И., Шпилевой В. Микропроцессорные системы поэлементного диагностирования РЭА. – М.: Радио и связь, 1987. – 256 c.

Слайд 237

8.1. The third way for increasing on-line testing reliability

8.1.3. Features of the third way

The main feature of the third way is the use of different probabilities of detection for essential and inessential errors.

The third way increases on-line testing reliability by estimating the size of the result and of its error.

The methods of the third way distinguish essential and inessential errors by detecting errors in the high and low bits of the result with different probabilities.

Слайд 238

8.2. The logarithm checking

8.2.1. The use of the Natural Information Redundancy

The logarithm checking is based on the Natural Information Redundancy (NIR) of data formats, which takes the form of incomplete use of the high positions of the codeword.

[Figure: NIR in the high positions of the codeword for 1. the fixed-point format and 2. the floating-point format.]

Слайд 239

8.2. The logarithm checking

8.2.2. Definition of the check code of a number or mantissa

1. Fixed-point format

The check code KA of a fixed-point number A is equal to the number of bits in the significant part of this number.

KA = Int (log2 A) for A > 0;
KA = 0 for A = 0.

2. Floating-point format

The check code KA of a mantissa A is equal to the number of bits in the check part of this mantissa.

KA = Int (log2 A^(–1)) for A > 0.

Слайд 240

8.2. The logarithm checking

8.2.3. Calculation of the check code of a number or a mantissa

The check code is calculated from the true form of a number or a mantissa in two steps:

1. Filling the most significant (check) part with ones;

2. Counting the number of ones.
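
The two steps can be sketched in Python for a fixed-point number. The sketch makes several assumptions: a 16-bit word, filling of every position below the leading one, and a parallel shift-and-OR form chosen here for brevity in place of the serial circuits of the slides. The function names are hypothetical.

```python
def fill_with_ones(a: int, width: int = 16) -> int:
    """Step 1: set every bit below the leading one of a."""
    b = a
    shift = 1
    while shift < width:
        b |= b >> shift  # propagate the leading one towards the low bits
        shift <<= 1
    return b


def check_code(a: int, width: int = 16) -> int:
    """Step 2: count the ones of the filled code."""
    return bin(fill_with_ones(a, width)).count("1")


print(bin(fill_with_ones(0b0000010010100000)))
print(check_code(0b0000010010100000))
```

Under these assumptions the check code equals the number of bits in the significant part of the number, and it is 0 for A = 0.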

Слайд 241

8.2. The logarithm checking

8.2.3.1. Filling the most significant (check) part by the units

[Figure: codes A and B and the filling circuit with inputs A{1} ÷ A{15} and outputs B{1} ÷ B{15}.]

Слайд 242

8.2. The logarithm checking

8.2.3.1. Filling the most significant (check) part by the units

A circuit with a serial-group calculation of the code B

A circuit with a serial calculation of the bits in groups of the code B

Слайд 243

8.2. The logarithm checking

8.2.3.2. Calculation of units amount

[Figure: circuit calculating the number of ones in the code B{1 ÷ 15}.]

Слайд 244

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

The check codes of the operands make it possible to predict the check code of the result of an arithmetic operation to within a difference α ≤ 1:

∙ For addition S = A + B, A ≥ 0 and B ≥ 0: KS = KS* + α, where KS* = max(KA, KB); α = 0 or α = 1.

∙ For multiplication P = A⋅B, A > 0 and B > 0: KP = KP* – α, where KP* = KA + KB; α = 0 or α = 1.

∙ For division Q = A / B, A > 0 and B > 0: KQ = KQ* + α, where KQ* = KA – KB; α = 0 or α = 1.
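
The check equations for addition and multiplication can be tried out with a short Python sketch. It assumes that the check code of an integer is its bit length, which agrees with the inequality 2^(KA – 1) ≤ A < 2^KA used in the derivation of these equations; the function names are hypothetical.

```python
def check_code(a: int) -> int:
    """Assumed check code: 2^(K-1) <= a < 2^K for a > 0, and K = 0 for a = 0."""
    return a.bit_length()


def addition_ok(a: int, b: int, s: int) -> bool:
    """KS = max(KA, KB) + alpha with alpha = 0 or 1 (a >= 0, b >= 0)."""
    alpha = check_code(s) - max(check_code(a), check_code(b))
    return alpha in (0, 1)


def multiplication_ok(a: int, b: int, p: int) -> bool:
    """KP = KA + KB - alpha with alpha = 0 or 1 (a > 0, b > 0)."""
    alpha = check_code(a) + check_code(b) - check_code(p)
    return alpha in (0, 1)


print(multiplication_ok(5, 5, 25))             # correct product
print(multiplication_ok(5, 5, 25 ^ (1 << 7)))  # error in a high bit
print(multiplication_ok(5, 5, 25 ^ 1))         # error in the lowest bit
```

In this sketch an error in a high bit changes the check code of the product and is detected, while an error in the lowest bit leaves the check code unchanged and passes.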

Слайд 245

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

∙ For addition S = A + B, A ≥ 0 and B ≥ 0: KS = KS* + α, where KS* = max(KA, KB); α = 0 or α = 1.

[Figure: check codes KA, KB and KS of the operands and the sum for the cases α = 0 and α = 1.]

Слайд 246

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

∙ For addition S = A + B: KS = KS* + α,
where KS* = max (KAR, KBR); α = 0 or α = 1.

KAR = KA ∧ ¬U1 ∨ KB ∧ U1; KBR = KB ∧ ¬U2 ∨ KS ∧ U2; KSR = KA ∧ U1 ∨ KS ∧ ¬U2 ∨ KB ∧ U3,
where U1 = Sign A ⊕ Sign S, U2 = Sign A ⊕ Sign B, U3 = Sign A ⊕ Sign S.

Слайд 247

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

∙ For multiplication: P = A⋅B, A > 0 and B > 0, KP = KP* – α, where KP* = KA + KB; α = 0 or α = 1.

2^(KA – 1) ≤ A < 2^KA
2^(KB – 1) ≤ B < 2^KB
2^(KP – 1) ≤ P < 2^KP

For KA = 3: 100₂ ≤ A < 1000₂

Lower bound: KP – 1 = (KA – 1) + (KB – 1), so KP = KA + KB – 1 (α = 1).
Upper bound: KP = KA + KB (α = 0).

Слайд 248

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

∙ For multiplication: P = A ⋅ B, A ≥ 0 and B ≥ 0,
KP = KP* – α;
KP* = KA ⋅ ZB + KB ⋅ ZA;
where α = 0 or α = 1;
ZA – tag of zero for A;
ZA = 0 if A = 0 and ZA = 1 if A ≠ 0;
ZB – tag of zero for B;
ZB = 0 if B = 0 and ZB = 1 if B ≠ 0.

Слайд 249

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

∙ For division: Q = A / B, A > 0 and B > 0, KQ = KQ* + α, where KQ* = KA – KB; α = 0 or α = 1.

2^(KA – 1) ≤ A < 2^KA
2^(KB – 1) ≤ B < 2^KB
2^(KQ – 1) ≤ Q < 2^KQ

Lower bound: KQ – 1 = (KA – 1) – KB, so KQ = KA – KB (α = 0).
Upper bound: KQ = KA – (KB – 1), so KQ = KA – KB + 1 (α = 1).

Слайд 250

8.2. The logarithm checking

8.2.4. The check equations for the arithmetic operations

∙ For division: Q = A / B, A ≥ 0 and B > 0,
KQ = KQ* + α;
KQ* = KA – KB;
where α = 0 or α = 1;
ZA – tag of zero for A;
ZA = 0 if A = 0 and ZA = 1 if A ≠ 0.

Слайд 251

8.2. The logarithm checking

8.2.5. Circuits of the check

1, 2, 3 – formers of check codes
V – check code renaming unit
4 – checking block
4.1, 4.2 – AND gates
4.3 – adder
5 – comparator

Слайд 252

8.2. The logarithm checking

8.2.6. Error detection

1. The error 0 → 1 in the bit γ

2. The error 1 → 0 in the bit γ

[Figure: result codewords with the errors 0 → 1 and 1 → 0 in the bit γ.]

Слайд 253

8.2. The logarithm checking

8.2.6. Error detection

1. The error 0 → 1 in the bit γ is detected with PD = 2^(– n + j – 1).

2. The error 1 → 0 in the bit γ is detected with PD = 2^(– n + j – 2).

The error detection probability is proportional to value 2 – j of an error in the bit γ.

[Figure: result codeword with the bit γ; marked field lengths n – j + 1 and n – j + 2.]

Слайд 254

8.3. Checking by inequalities

The method of checking by inequalities includes:

1. Definition and calculation of the upper and lower bounds of the result;

2. Comparison of the result with its upper and lower bounds.

Слайд 255

8.3. Checking by inequalities

8.3.1. Definition of the result bounds for a mantissa squarer

Y = x^2, 0.5 ≤ x < 1

YH = 3/2 x – 1/2

YL = 3/2 x – 9/16

[Figure: graph of the result Y = x^2 between the upper bound YH and the lower bound YL; axes X and Y, with the value 9/16 marked.]

1. The upper bound YH connects the boundary points (0.5, 0.25) and (1, 1) of the result graph.

2. The lower bound YL is parallel to the upper bound and is tangent to the result graph at the point (0.75, 9/16).
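
A minimal Python sketch of the check for the mantissa squarer follows. It uses the two linear bounds given above, 3/2 x – 1/2 and 3/2 x – 9/16; the function names are hypothetical.

```python
def result_bounds(x: float) -> tuple:
    """Return (lower, upper) bounds of x^2 for a mantissa 0.5 <= x < 1."""
    upper = 1.5 * x - 0.5     # line through (0.5, 0.25) and (1, 1)
    lower = 1.5 * x - 9 / 16  # parallel line touching x^2 at x = 0.75
    return lower, upper


def result_plausible(x: float, y: float) -> bool:
    """The result y is accepted if it lies within its bounds."""
    lower, upper = result_bounds(x)
    return lower <= y <= upper


x = 0.75
print(result_plausible(x, x * x))          # correct square
print(result_plausible(x, x * x + 1 / 8))  # large positive error
print(result_plausible(x, x * x + 1 / 32)) # small positive error
```

In this sketch a large error pushes the result outside the bounds, while a small one can stay inside them and pass the check.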

Слайд 256

8.3. Checking by inequalities

8.3.2. Error detection estimation

Positive error a = ΔYH, where ΔYH = YH – Y:
a = 3/2 x – 1/2 – x^2, with the maximum value ΔYH = 1/16;
PN-D H = 2 (x1 – x2) = √(1 – 16a);
PD H = 1 – √(1 – 16a), a < 1/16.

Negative error b = ΔYL, where ΔYL = Y – YL:
b = x^2 – 3/2 x + 9/16, with the maximum value ΔYL = 1/16;
PN-D L = 1 + 2 (x1 – x2) = 1 – 4√b;
PD L = 4√b, b < 1/16.

Here x1 and x2 are the roots of the corresponding equation in x for the given error size, and PN-D is the probability that the error is not detected.

Слайд 257

8.3. Checking by inequalities

8.3.2. Error detection estimation

The error detection probability increases as the error grows.

Слайд 258

8.4. The checking by segments

The method is based on the use of the natural time redundancy in the form of the Passive Stock of Checking Time (PSCT).

Слайд 259

8.4. The checking by segments

8.4.1. Natural Time Redundancy

Examples of the PSCT components:

1. The time during which the result remains reliable despite the action of a fault in the circuit.

2. The time during which an unreliable result is not dangerous.

The probability of error detection in a segment of the result is PD* = ln 2 / TPSCT.

Слайд 260

8.4. The checking by segments

8.4.2. Reliability of the checking by segments

Estimation of reliability in checking the result:

without consideration of PSCT: D = PDE + PSN;

with consideration of PSCT (probability PD*): DPSCT = PDE* + PSN.

Слайд 261

8.4. The checking by segments

8.4.3. Segment-serial checking method

1. Division of the result into segments of bits.

2. Serial checking of the segments.

3. Setting the frequency distribution of checking the result segments.

[Figure: error detection scheme with the computing circuit (CC), its operands and result, the segment selection block by inputs of the CC, the segment selection block by outputs of the CC, the segment check block, the control block for selection of the segments, and the code E.]

Segment-serial checking makes it possible to raise the check frequency of the high true bits of the result and the probability of essential error detection.

Слайд 262

8.4. The checking by segments

8.4.4. Segment-serial checking of the Barrel Shifter

PD = 1 / n

hN = nE / nN

hD = PDE / PDN, hD > 1

Слайд 263

8.4. The checking by segments

8.4.5. Error Detection Circuit with some check blocks

BO – operand block
BR – result block
BS – control block
BC – check blocks
BP – pack block

The block BO connects the inputs of the circuit elements that calculate the selected segments to the blocks BC.
The block BR connects the outputs of these circuit elements.
The block BS sets the sequence in which groups of segments are chosen.
The blocks BC check the selected segments and calculate check codes, which indicate the correctness of the result in these segments.
The block BP compresses the check codes into the code E of result correctness.

The number of blocks BC:
NT = ] PSUM / PD [, where
PSUM = ∑ Pi, i = 1 ÷ Z.

Слайд 264

8.4. The checking by segments

8.4.6. Choice of check points

Array P of bits Pi j in the binary codes of the probabilities Pi

Sequences of segment checks

Слайд 265

8.4. The checking by segments

8.4.7. Increase in reliability

Reliability of checking the result in segment i:
Di = Pi PE + (1 – Pi) (1 – PE).

The increase in reliability for segment i:
δDi = (PD – Pi) (1 – PE), PD >> Pi.

The total increase in reliability:
δD = ∑ (δEi δDi),
where δEi = Ei / ECC;
Ei is the complexity of the segment calculation;
ECC is the complexity of the computing circuit.

For example, for PD = 0.5, Pi = 0.1, PE = 0.1, the increase in reliability is δDi = 0.36.

Слайд 266

Reading List

1. Дрозд А. В. Использование логарифмического контроля для обнаружения отказов арифметических устройств // Вісн. НТУУ «КПІ». Інф., упр. та обчисл. техніка. – К., 1998. – Вип. 31. – С. 224 – 231.
2. Дрозд А. В., Зуда М., Лобачев М. В. Использование логарифмических оценок в функциональном диагностировании вычислительных устройств с плавающей точкой // Тр. Одес. политехн. ун-та. – Одесса, 2001. – Вып. 1 (13). – С. 93 – 96.
3. Drozd A., Al-Azzeh R., Drozd J., Lobachev M. The logarithmic checking method for on-line testing of computing circuits for processing of the approximated data. – Proc. of Euromicro Symposium on Digital System Design, Rennes, France, pp. 416 – 423, 2004.
4. Дрозд А. В. Контроль вычислительных устройств по неравенствам // Ученые записки Симферопольского гос. ун-та. – Винница-Симферополь, 1998. – Спецвып. – С. 237 – 240.
5. Drozd A., Lobachev M., Reza Kolahi. “Effectiveness of on-line testing methods in approximate data processing,” in Proc. IEEE East-West Design & Test Conference, Odessa, Ukraine, 15 – 19 Sept., pp. 62 – 65, 2005.
Слайд 267

Conclusion

1. The third way is directly aimed at distinguishing essential and inessential errors, taking into account the size of the error.

2. The logarithm checking, the checking by inequalities and the checking by segments increase the reliability of on-line testing methods using the third way.

3. The logarithm checking is based on the Natural Information Redundancy of data formats, which takes the form of incomplete use of the high positions of the codeword.

4. The checking by inequalities estimates a result as reliable if the result lies within its upper and lower bounds.

5. The checking by segments is based on the use of the natural time redundancy in the form of the Passive Stock of Checking Time.

6. The methods developed by the third way show high effectiveness by using the natural time and information redundancy.

Slide 268

Questions and tasks

What features of the third way of increasing the reliability of on-line testing methods do you know?
Which methods are realized by the third way?
Describe the use of the natural information redundancy of the data format in logarithmic checking.
What feature determines a reliable result in checking by inequalities?
Describe the use of the natural time redundancy in checking by segments.
What ensures the high effectiveness of the third-way methods?

Slide 269

MODULE 4.
Checkability of S-CES digital components

Slide 270

MODULE 4. Checkability of S-CES digital components

Lecture 9. Checkability of S-CES digital components: a problem, assessment, solutions

9.1. Introduction to checkability

9.2. The model of a digital component in view of the on-line testing for S-CES

9.3. The method for estimating a checkability of S-CES digital components

9.4. The ways to increase a checkability of S-CES digital components

Slide 271

9.1. Introduction to checkability

9.1.1. Motivation for considering the checkability of S-CES digital components

Reasons:

1. High safety requirements are imposed on the digital components of S-CES.

2. Fault-tolerant technology is the traditional solution to the safety problem for digital components.

3. Fault-tolerant technology cannot solve the problem of digital component safety in the case of S-CES.

Slide 272

9.1.2. Related Works

1. Yastrebenetsky M.A. (ed.). NPP I&Cs: Problems of Safety / M.A. Yastrebenetsky. – Ukraine, Kyiv: Technika, 2004.

2. Локазюк В.Н., Остроумов С.Б., Поморова О.В. и др. Отказоустойчивые встроенные системы на программируемой логике. Лекционный материал / Под ред. Харченко В.С. – Министерство образования и науки Украины. Национальный аэрокосмический университет «ХАИ», 2008. – 264 с.

3. Kharchenko V.S., Sklyar V.V. FPGA-based NPP Instrumentation and Control Systems: Development and Safety Assessment / Bakhmach E.S., Herasimenko A.D., Golovyr V.A. a.o. – Research and Production Corporation “Radiy”, National Aerospace University “KhAI”, State Scientific Technical Center on Nuclear and Radiation Safety, 2008. – 188 p.

4. Щербаков Н. С. Достоверность работы цифровых устройств. – М.: Машиностроение, 1989. – 224 c.

5. Беннетс Р.Дж. Проектирование тестопригодных логических схем. М.: Радио и связь, 1995. – 180 с.

Slide 273

9.1.3. Peculiarities of the S-CES

1. Two main operational modes of the S-CES and their components: normal and emergency.

For most of the operating time, the S-CES run in the normal mode. The emergency mode, for which the S-CES are designed, is as a rule a rare event and at best may never occur.

The first peculiarity creates the problem of maintaining the functionality of the components for the emergency mode by relying on the provisions of the normal mode.

2. A certain degree of inertia of the controlled objects in comparison with the high-rate digital components.

The second peculiarity provides a resource of time which may be used to resolve the problem.

Slide 274

9.1.4. A problem of maintaining the functionality of the S-CES components in the emergency mode

In the normal and emergency modes, the S-CES components operate with different sets of input data.

In the normal mode, the input data vary within small ranges.

On such a limited set of input words, the digital circuit of the component takes constant values at many of its points.

This creates the conditions for latent accumulation of stuck-at (constant) faults, which may manifest on the input words of the emergency mode and prevent the component from performing its functions.
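The effect of a narrow input range can be illustrated on the bits of an input word alone. The sketch below is an assumption-laden illustration, not part of the course material: the 8-bit width, the value range 128..137 and the function name are all hypothetical.

```python
def constant_bits(words, width):
    """Return {bit position: constant value} for bits that never change."""
    result = {}
    for pos in range(width):
        seen = {(w >> pos) & 1 for w in words}
        if len(seen) == 1:
            # Only one value was ever observed at this bit position.
            result[pos] = seen.pop()
    return result


# Hypothetical normal-mode data: values 128..137 on an 8-bit input.
normal_mode_words = range(128, 138)
stuck_candidates = constant_bits(normal_mode_words, 8)
```

For this range, bits 4 to 6 stay at 0 and bit 7 stays at 1, so a stuck-at-0 fault on bits 4 to 6 or a stuck-at-1 fault on bit 7 would not show itself in the normal mode.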

Slide 275

9.1.5. Purpose of on-line testing for the S-CES components in the emergency mode

On-line testing is aimed at checking the reliability of the results calculated by a digital component while it performs its basic operations on operating sequences of input words.

This purpose is correct for digital components operating in a single mode, i.e. only the normal mode.

For S-CES this purpose should be expanded by adding a check of the ability of the digital component to calculate reliable results in the emergency mode.

Slide 276

9.2. The model of a digital component in view of the on-line testing for S-CES

9.2.1. The initial model

M(SN, SC, S),
where:
SN is a component description characterizing its functioning in the normal mode: a limited set IN of input words in the normal mode of operation;
SC is a component description characterizing its functioning in the emergency mode: a limited set IC of input words used for identifying the emergency mode;
S is a component description common to both the normal and emergency modes (the description D of the digital circuit of the tested component and the set F of its typical faults).

Slide 277

The description D of the digital circuit should be given in terms of its specific elements.
For instance, the description of a digital circuit on an FPGA should contain a list of points of two types:
internal points, i.e. the memory bits of a LUT (look-up table);
external points, which include all other points, such as the address bits of a LUT or its output.

External points can be input points and output points (check points).
In addition, the description should contain the functions that define how some external points depend on others (from the input points up to the output points).

Slide 278

9.2.2. Controllable points of the digital component

1. An internal point of the digital circuit is controllable if the limited set of input words contains at least one word on which this point is selected in its LUT. Otherwise, the internal point is non-controllable.

2. An external point of the digital circuit is partially controllable (a 0-controllable or 1-controllable point) if it takes only the value ‘0’ or only the value ‘1’ on the limited set of input words. Otherwise, the external point is controllable.
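The two definitions can be tried out on a single LUT. The sketch below is only an illustration under stated assumptions: one 2-input LUT programmed as AND, the point names `mem0`..`mem3`, `addr0`, `addr1`, `out`, and both functions are hypothetical and do not come from the course.

```python
def lut_read(memory, address_bits):
    """Select one LUT memory bit; address bits are most significant first."""
    index = 0
    for bit in address_bits:
        index = (index << 1) | bit
    return index, memory[index]


def classify_points(memory, input_words):
    """Classify internal (memory) and external (address, output) points."""
    chosen = set()   # indices of memory bits selected at least once
    seen = {}        # values observed at each external point
    for word in input_words:
        index, out = lut_read(memory, word)
        chosen.add(index)
        for i, bit in enumerate(word):
            seen.setdefault(f"addr{i}", set()).add(bit)
        seen.setdefault("out", set()).add(out)

    internal = {f"mem{i}": "controllable" if i in chosen else "non-controllable"
                for i in range(len(memory))}
    external = {}
    for name, values in seen.items():
        if values == {0}:
            external[name] = "0-controllable"
        elif values == {1}:
            external[name] = "1-controllable"
        else:
            external[name] = "controllable"
    return internal, external


# Hypothetical LUT implementing AND, exercised on a limited set of words
# in which the first address bit is always 1.
internal, external = classify_points([0, 0, 0, 1], [(1, 0), (1, 1)])
```

In this run only `mem2` and `mem3` are ever selected, `addr0` is 1-controllable, and `addr1` and `out` are controllable.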

Slide 279

9.2.3. Observable points of the digital component

A path is activated if a change of the value at the given point is transferred to a check point.

1. A point of the digital circuit is partially observable (a 0-observable or 1-observable point) if, on the limited set of input words, a path from this point to a check point is activated for only one of the values ‘0’ or ‘1’.

2. If the path is activated for both values ‘0’ and ‘1’, the point is observable.

3. Otherwise, the point is non-observable.
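Path activation can be tested by simulation: flip the value at a point and see whether the check point changes. The following is a minimal sketch under assumptions; the three-input circuit (n1 = a AND b, out = n1 OR c), the point names and the helper functions are hypothetical.

```python
def simulate(word, forced=None):
    """Evaluate the hypothetical circuit; `forced` overrides point values."""
    forced = forced or {}
    values = {}

    def put(name, value):
        values[name] = forced.get(name, value)

    a, b, c = word
    put("a", a)
    put("b", b)
    put("c", c)
    put("n1", values["a"] & values["b"])
    put("out", values["n1"] | values["c"])
    return values


def observability(point, input_words, check_point="out"):
    """Classify a point by the values for which a flip reaches the check point."""
    activated = set()
    for word in input_words:
        good = simulate(word)
        flipped = simulate(word, {point: 1 - good[point]})
        if flipped[check_point] != good[check_point]:
            activated.add(good[point])
    if activated == {0, 1}:
        return "observable"
    if activated == {0}:
        return "0-observable"
    if activated == {1}:
        return "1-observable"
    return "non-observable"


limited_set = [(1, 0, 0), (1, 1, 0)]
result = {p: observability(p, limited_set) for p in ("a", "b", "c", "n1")}
```

On this limited set, `a` is 1-observable, `c` is 0-observable, and `b` and `n1` are observable. With the single word (0, 0, 1) the point `a` becomes non-observable, because the OR gate masks every change.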

Slide 280

9.2.4. Properties of the controllable and observable points

Statement 1. An observable internal point is also controllable.

Statement 2. For a given input word, the result is determined only by the values of those points of the circuit that are observable.

Slide 281

9.2.5. Controllability and observability of the points

Controllability C can take three values: 0, 1, 2 for an internal point, or 1, 2, 3 for an external point.
The values 0, 1, 2 and 3 denote a non-controlled, 1-controlled, 0-controlled and controlled point, respectively.

Observability O of an external point can take four values: 0, 1, 2 and 3, denoting a non-observable, 1-observable, 0-observable and observable point, respectively.
Observability of an internal point can take only the values 0, 1 and 2.

Slide 282

9.2.6. The resulting model

M(CN, ON, CC, OC),
where:
CN and ON are the controllability C and observability O of every point of the S-CES digital component in the normal mode;
CC and OC are the controllability C and observability O of every point of the S-CES digital component in the emergency mode.

Slide 283

9.3. The method for estimating a checkability of S-CES digital components

9.3.1. The dangerous points of the S-CES digital components

The checkability of the digital component is violated at a given point when two events coincide:

the possibility of a latent fault occurring in the normal mode;

the possibility of this fault manifesting in the emergency mode.

Such a point is dangerous for the S-CES digital component.

Slide 284

9.3.2. Possibilities of the latent fault accumulation in the normal mode

The point is non-controllable and its value coincides with the value defined by the stuck-at fault.

The point is non-observable.

Slide 285

9.3.3. Possibilities of activity of the accumulated fault in the emergency mode

The point is observable and non-controllable, and its value, as the value of a non-controllable point, is distinct from the value defined by the stuck-at fault.

The point is controllable and observable.

Slide 286

9.3.4. Conditions of dangerous points detection

An external point is dangerous for the emergency mode under the following condition:
((CN + CE = 3) or (ON + CE = 3) or (ON = 0)) and (OE > 0).

An internal point is dangerous for the emergency mode under the following condition:
(ON = 0) and (OE > 0).

Here CE and OE denote the controllability and observability of the point in the emergency mode (CC and OC in the resulting model).
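The two conditions translate directly into predicates. This is a minimal sketch assuming the numeric codes of section 9.2.5 (0 = non-, 1 = 1-, 2 = 0-, 3 = fully controllable or observable); the function names are hypothetical.

```python
def is_dangerous_external(cn, on, ce, oe):
    """Condition for an external point, using the codes of section 9.2.5."""
    return ((cn + ce == 3) or (on + ce == 3) or (on == 0)) and (oe > 0)


def is_dangerous_internal(on, oe):
    """Condition for an internal point: hidden in normal, visible in emergency."""
    return (on == 0) and (oe > 0)


# A point held at '1' in the normal mode (CN = 1) that takes only '0' in the
# emergency mode (CE = 2) and is observable there (OE = 3).
stuck_example = is_dangerous_external(cn=1, on=3, ce=2, oe=3)
```

The sum CN + CE equals 3 only for the pairs (1, 2) and (2, 1), that is, when the point is held at one value in the normal mode and at the opposite value in the emergency mode.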

Slide 287

9.3.5. Checkability of a digital component

The checkability of a digital component can be estimated by the following formula:
K = 1 – NE / NT,
where NE is the number of dangerous points;
NT is the total number of circuit points.
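As a small worked illustration of the formula, the sketch below computes K for made-up point counts; the function name and the numbers in the usage line are hypothetical and are not results from the course.

```python
def checkability(n_dangerous, n_total):
    """K = 1 - NE / NT, with basic validation of the point counts."""
    if n_total <= 0:
        raise ValueError("the circuit must have at least one point")
    if not 0 <= n_dangerous <= n_total:
        raise ValueError("NE must lie between 0 and NT")
    return 1 - n_dangerous / n_total


# Hypothetical circuit with 100 points, 25 of which are dangerous.
k_example = checkability(25, 100)
```

A circuit without dangerous points has K = 1, i.e. a checkability of 100%.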

Slide 288

9.4. The ways to increase a checkability of S-CES digital components

9.4.1. Research of the digital component checkability

Iterative array multiplier of 8-bit mantissas

The base value of the factors in the normal mode is 128.
The threshold is 245.

The range of the factors in the normal mode is varied from 10 up to 80 in steps of 10.

The number of dangerous points falls from 97 to 0.

The multiplier checkability increases from 65% to 100%.

Slide 289

Iterative array multiplier of 8-bit mantissas

In the normal mode the base value is 128.
The range of the factors is 10.

The threshold is reduced from 245 down to 175 in steps of 10.

The number of dangerous points falls from 97 to 48.

The multiplier checkability increases from 65.3% to 82.8%.

Slide 290

Serial-parallel comparator of 16-bit codewords

Comparator variants:
1-bit comparator unit, 16 clocks;
2-bit comparator unit, 8 clocks;
4-bit comparator unit, 4 clocks;
8-bit comparator unit, 2 clocks;
16-bit comparator unit, 1 clock.

The threshold is 245.
The range of input word A in the normal mode is 5.

The comparator checkability increases from 50% to 100%.

Slide 291

9.4.2. Reasons of low checkability of the S-CES digital components

Particularities of the S-CES digital components:

1. High level of input data consistency in the normal mode.

2. High threshold-to-noise ratio.

3. High level of circuit parallelism.

These are results of the use of high technologies.

Slide 292

Particularities of the S-CES digital components:

1. Limited variation of the input data in the normal mode.

2. A limited percentage of the input data in the normal mode.

3. Processing of input data in a parallel code using simultaneous circuits.

Consequences:

1. High level of input data consistency in the normal mode.

2. High threshold-to-noise ratio.

3. High level of circuit parallelism.

Slide 293

9.4.3. Conditions to overcome a low checkability

1. Changing the input data by alternating the normal mode with a simulated one.

2. Reducing the threshold accuracy.

3. Reusing the circuit points by processing data in a serial code.

Slide 294

9.4.3.1. Change of input data alternating a normal mode with a simulated one

1. The simulated mode is aimed at testing the digital components on the input words of the emergency mode.

2. Transferring the digital component into the simulated mode carries risks: the component may be completely excluded from operation in the normal or simulated mode, and an emergency mode may be created.

3. Reducing these risks requires checking the application of the simulated mode using on-line testing methods and means.

Slide 295

9.4.3.2. Reducing the threshold accuracy

1. The threshold accuracy need be only as high as required to distinguish the normal and emergency modes in both directions:
from the normal mode to the emergency one;
from the emergency mode to the normal one.

Slide 296

9.4.3.3. Reuse of the circuit points during data processing in a serial code

1. The frequency of data processing can be reduced, taking into account a certain degree of inertia of the controlled objects, sensors and analog-to-digital converters in comparison with the high-rate digital components.

2. The frequency of serial data processing can be increased using:

a high frequency of bit processing in a serial code;

the possibility of parallelizing the serial code processing without substantially lowering the checkability of the S-CES component.

Slide 297

9.4.4. Processing input data in a serial code using the clocked circuits

9.4.4.1. Influence of the serial code processing on controllability and observability of the circuit points

1. Reuse of circuit points can change their values. This increases the controllability of the circuit points.

2. Serial code processing shortens the paths from circuit points to check points. This can increase the observability of the circuit points.
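The first point can be illustrated by comparing what a bit cell sees in a parallel and in a serial arrangement. The sketch below is a simplified illustration under assumptions: the 8-bit width, the value range 128..137 and the function names are hypothetical, and a "cell" here is reduced to the input bit it receives.

```python
def values_seen_parallel(words, width):
    """Each of `width` parallel cells only ever sees its own bit position."""
    return [{(w >> pos) & 1 for w in words} for pos in range(width)]


def values_seen_serial(words, width):
    """A single reused cell sees every bit position, one per clock."""
    return {(w >> pos) & 1 for w in words for pos in range(width)}


# Hypothetical normal-mode data: values 128..137 on an 8-bit input.
words = range(128, 138)
parallel = values_seen_parallel(words, 8)
constant_cells = sum(1 for seen in parallel if len(seen) == 1)
serial = values_seen_serial(words, 8)
```

In the parallel arrangement four of the eight cells receive a constant value, while the single reused cell of the serial arrangement receives both '0' and '1'.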

Slide 298

9.4.4.2. Influence of the serial code processing on a checkability of the S-CES components

1. An increase of controllability and observability in the normal mode reduces the number of dangerous points.

2. An increase of controllability and observability in the emergency mode increases the number of dangerous points.

3. The checkability of the S-CES components can therefore be either increased or reduced by serial code processing.

Slide 299

9.4.4.3. Dominant role of a checkability of the points in a normal mode

1. If a circuit point is checkable (controllable and observable) in the normal mode, it is not dangerous irrespective of the emergency mode.

2. That is why an increase of the checkability of the circuit points in both the normal and emergency modes should increase the checkability of the S-CES components.
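The first statement can be confirmed by exhaustive enumeration against the external-point condition of section 9.3.4. This is a sketch under assumptions: the numeric codes are those of section 9.2.5 (an external point has C in 1..3 and O in 0..3), and the function name is hypothetical.

```python
from itertools import product


def is_dangerous_external(cn, on, ce, oe):
    """External-point condition from section 9.3.4."""
    return ((cn + ce == 3) or (on + ce == 3) or (on == 0)) and (oe > 0)


# A point fully checkable in the normal mode has CN = 3 and ON = 3.
# Enumerate every emergency-mode combination: CE in 1..3, OE in 0..3.
counterexamples = [(ce, oe)
                   for ce, oe in product(range(1, 4), range(0, 4))
                   if is_dangerous_external(3, 3, ce, oe)]
```

The list stays empty: with CN = ON = 3, both sums would need CE = 0, which an external point cannot take, and ON = 0 does not hold.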

Slide 300

Reading List

1. Drozd A. On-line testing of safety-critical I&C systems in normal and emergency modes: Problems and solutions / A. Drozd, V. Kharchenko, S. Antoshchuk, M. Drozd // First International Workshop “Critical Infrastructure Safety and Security” (CrISS-DESSERT’11). – Kirovograd, Ukraine, 11 – 13 May, 2011. – P. 139 – 147.
2. Drozd A. Checkability of safety-critical I&C system components in normal and emergency modes / A. Drozd, V. Kharchenko, S. Antoshchuk, M. Drozd // Journal of Information, Control and Management Systems. – 2011. – Vol. 1, No. 1.
3. Drozd A. Checkability of the digital components in safety-critical systems: problems and solutions / A. Drozd, V. Kharchenko, S. Antoshchuk, J. Sulima, M. Drozd // Proc. IEEE East-West Design & Test Symposium. – Sevastopol, Ukraine. – 9 – 12 Sept., 2011. – P. 411 – 416.
Slide 301

Conclusion

1. Fault-tolerant technology does not solve the safety problem for the S-CES.

2. The reason follows from the peculiarities of the S-CES as two-mode systems and consists in the low checkability of the digital components.

3. This conclusion is confirmed by the method for checkability estimation. The method is based on analysis of the controllability and observability of the digital component points in both the normal and emergency modes.

4. The reasons for the low checkability of digital components follow from the use of high technologies: a high level of input data consistency in the normal mode, a high threshold-to-noise ratio, and a high level of circuit parallelism.

5. The ways to increase checkability are based on rational use of the high technologies.