An all-digital run-time hardware monitoring system to improve the reliability of electronic devices
1. Introduction
From a general point of view, monitoring a system provides information about its state. This information can be processed (e.g., filtered, interpolated) to derive parameters that are also useful for addressing reliability issues [1]. In this context, this news item describes a use case developed within the ECSEL iRel40 European research project [2]. The main goal of the use case is to exploit an all-digital run-time hardware monitoring system [3][4][5] to collect data about system behavior and, ultimately, to improve reliability, with a focus on verification activities. For this purpose, a pacemaker has been developed in two versions: one based on a Commercial Off-The-Shelf (COTS) microcontroller on a Printed Circuit Board (PCB), where some properties are verified with a classical offline approach (i.e., script-based analysis of traces collected with a logic analyzer), and one based on a soft-core General-Purpose Processor (GPP) on a Field Programmable Gate Array (FPGA), where the same properties are verified at run-time by integrating the adopted all-digital hardware monitoring system. The final goal is to compare the effectiveness and efficiency of the two approaches, showing how it is possible (i) to reduce the time needed to perform verification and (ii) to verify more complex properties than classical Built-In Self-Test (BIST) approaches allow.
2. The Proposed Use Case
As mentioned above, the proposed use case targets the development of a pacemaker [6][7] in two versions: one based on a COTS microcontroller on a PCB, and one based on a soft-core GPP on an FPGA. The rest of this news item details the development approaches adopted for the two versions, the reliability properties considered, and the results obtained in verifying those properties.
2.1. Pacemaker Behavior
The considered system is the core of a DDDR-type pacemaker [8], i.e., the sub-system dedicated to executing the algorithm that implements the basic pacemaker functionality, without considering the sensing/actuation sub-systems and extra functionality. The behavior that the pacemaker core must exhibit to support the heart in case of failure is specified by the Finite State Machine (FSM) reported in Figure 1.
Figure 1. Pacemaker core behavior.
2.2. Development Approaches
To show the benefits of integrating an all-digital hardware monitoring system for the verification of reliability-related properties, a baseline pacemaker core was developed first. It is based on a COTS microcontroller (i.e., Microchip Atmel ATmega328P) on a PCB, and its verification relies on a classical approach.
Beyond basic verification, it is also possible to verify more complex properties related to reliability. For example, the following events can be considered:
- Unexpected Timer Fired (UTF)
- Unexpected States Sequence (USS)
UTF is an event signaling that the timer activated on exit from the AEIr state (Figure 1), which should be stopped in the CSW state (i.e., when moving towards AVIr), is found to have already fired. This situation cannot occur in a correct design, but it can occur in the final implementation due to timing issues. USS is an event signaling that the pacemaker is following a state sequence that is not correct (i.e., not admissible) with respect to those defined by the related FSM.
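The two events can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the hardware monitor used in the project: the class `PropertyMonitor`, its method names, and above all the transition table `ALLOWED` are hypothetical. Only the state names AEIr, CSW, and AVIr come from the text; the real admissible transitions are those of the FSM in Figure 1.

```python
from dataclasses import dataclass, field

# Hypothetical admissible transitions (placeholder for the FSM of Figure 1).
ALLOWED = {
    "AEIr": {"CSW"},
    "CSW": {"AVIr", "AEIr"},
    "AVIr": {"AEIr"},
}


@dataclass
class PropertyMonitor:
    """Behavioral model that records UTF and USS events."""
    state: str = "AEIr"
    timer_armed: bool = False
    timer_fired: bool = False
    events: list = field(default_factory=list)

    def on_timer_fired(self):
        # The timer can only fire while it is armed.
        if self.timer_armed:
            self.timer_fired = True

    def on_state_change(self, new_state):
        # USS: the observed transition is not admissible for the FSM.
        if new_state not in ALLOWED.get(self.state, set()):
            self.events.append(("USS", self.state, new_state))
        # The timer is activated on exit from AEIr.
        if self.state == "AEIr" and new_state != "AEIr":
            self.timer_armed = True
            self.timer_fired = False
        # UTF: the timer should be stopped in CSW when moving towards AVIr,
        # but it is found to have already fired.
        if self.state == "CSW" and new_state == "AVIr":
            if self.timer_fired:
                self.events.append(("UTF", self.state, new_state))
            self.timer_armed = False
            self.timer_fired = False
        self.state = new_state
```

Feeding the model the sequence AEIr, CSW, AVIr with no timer expiry records no event; calling `on_timer_fired()` while in CSW before moving to AVIr records a UTF, and any transition missing from `ALLOWED` records a USS.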
Both events can occur because of faults or system performance degradation caused by undetected physical manufacturing defects and/or aging [9]. Both are of critical importance, and notifying them to an external sub-system can be highly relevant. For example, the voting sub-system that manages the Triple Modular Redundancy (TMR) typically used in a real product could consider such events, in addition to comparing the outputs, to determine which instance of the pacemaker core is the most reliable at a given time.
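One possible way for a voter to take such events into account is sketched here. The policy is an assumption made for illustration only (majority vote on the outputs, with the per-instance count of UTF/USS events used to pick the most trusted instance); the function `vote` and its arguments are hypothetical and do not describe an actual product.

```python
from collections import Counter


def vote(outputs, event_counts):
    """Return (voted_output, index_of_most_trusted_instance).

    outputs      -- the three outputs produced by the pacemaker core instances
    event_counts -- UTF/USS events notified so far by each instance
    """
    if len(outputs) != 3 or len(event_counts) != 3:
        raise ValueError("TMR expects exactly three instances")
    value, votes = Counter(outputs).most_common(1)[0]
    if votes >= 2:
        # Usual TMR case: keep only the instances that agree with the majority.
        candidates = [i for i, out in enumerate(outputs) if out == value]
    else:
        # No majority: fall back on the notified events alone.
        candidates = [0, 1, 2]
    # Among the candidates, trust the instance with the fewest notified events.
    trusted = min(candidates, key=lambda i: event_counts[i])
    return outputs[trusted], trusted
```

With a majority, the events only select which agreeing instance is considered most reliable; without one, they become the deciding criterion.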
To verify the properties described above at design time, test vectors have been generated and fed to the pacemaker by an advanced verification system. As Figure 2 shows, an additional microcontroller extracts test vectors from real ECGs (Figure 3) and provides them to the microcontroller implementing the pacemaker core. The outputs are then analyzed by inspecting more expressive traces that scripts generate from the raw debug traces (Figure 4).
Figure 2. Advanced verification system.
Figure 3. Test traces extraction from real ECGs.
Figure 4. Script-based verification activity.
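The kind of offline, script-based analysis described above can be sketched as follows. This is not the project's actual script: the debug trace format (`timestamp_ms,state`, one sample per line), the function names, and the transition table passed to `find_uss` are all assumptions made for illustration.

```python
def parse_trace(text):
    """Parse raw debug lines of the assumed form 'timestamp_ms,state'."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        stamp, state = line.split(",")
        samples.append((int(stamp), state.strip()))
    return samples


def expand(samples):
    """Build a more expressive trace: one record per visited state,
    with its start time, dwell time, and successor state."""
    return [
        {"state": s, "start_ms": t, "dwell_ms": t_next - t, "next": s_next}
        for (t, s), (t_next, s_next) in zip(samples, samples[1:])
    ]


def find_uss(records, allowed):
    """Return the records whose transition is not admissible."""
    return [r for r in records if r["next"] not in allowed.get(r["state"], set())]
```

A typical use would be `find_uss(expand(parse_trace(raw_text)), allowed)`, where `allowed` maps each state to the set of its admissible successors according to the FSM of Figure 1.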
The same pacemaker core has then been developed on a soft-core GPP (i.e., Xilinx MicroBlaze, shown in Figure 5) on an FPGA (i.e., Xilinx Artix-7), and integrated with the all-digital hardware monitoring system available in [10] to collect data about system behavior and to verify, automatically and at run-time, the same properties considered above.
Figure 5. Soft-core-based pacemaker core.
2.3. Results Analysis
The comparison of the two verification approaches has shown how it is possible (i) to reduce the time needed to perform verification and (ii) to verify more complex properties than classical BIST approaches allow. In fact, measurements taken during development show that the time needed to perform verification is reduced by 30% with respect to the classical offline approach (i.e., script-based analysis of output traces captured with a logic analyzer), when considering the time needed to write and run the scripts and to analyze the results. It is worth noting that the time saved would increase when considering more properties. Moreover, the proposed verification approach makes it possible to verify complex properties in operative mode, which is not possible with classical BIST approaches. Finally, to compare the effectiveness and efficiency of the two verification approaches from a more general point of view, it is also possible to consider that:
- The proposed approach allows internal system components to be monitored, while the classical one is limited to the observability offered by the debug serial line;
- The proposed approach reduces overhead with respect to SW instrumentation-based approaches, without affecting the nominal behavior of the system.
3. Conclusions
For many COTS digital devices, current reliability assessment is based on time-consuming offline analysis of traces collected while feeding proper test vectors as input. In addition, available BIST approaches must be performed in a non-operative mode and are limited in the complexity of the properties they can verify. By exploiting an all-digital hardware monitor integrated into the system, it is possible to collect, at run-time, data about system behavior and to verify reliability-related properties more effectively and efficiently. The adopted all-digital HW embedded monitor [3][4][5] outperforms the current state of the art with respect to flexibility, size (i.e., cost), and energy consumption. For these reasons, it can be exploited to build next-generation embedded systems that need non-intrusive run-time monitoring (e.g., safety-critical systems like pacemakers).
Author
Luigi Pomante
Università degli Studi dell’Aquila – DISIM/DEWS, ITALY. E-mail: luigi.pomante@univaq.it
Keywords
Embedded Systems, Monitoring systems, Reliability, Verification, Microcontroller, FPGA
References
[1] Kornaros and D. Pnevmatikatos, "A survey and taxonomy of on-chip monitoring of multicore systems-on-chip," ACM Trans. Des. Autom. Electron. Syst. 18, 2 (Apr. 2013).
[2] Pressel et al., "The H2020-ECSEL Project "iRel40" (Intelligent Reliability 4.0)," 2021 24th Euromicro Conference on Digital System Design (DSD), Palermo, Italy, 2021.
[3] Valente et al., "Hardware Performance Sniffers for Embedded Systems Profiling," in 12th Workshop Intelligent Solutions Embedded Syst., 2015, pp. 29-34.
[4] Valente et al., "A flexible profiling sub-system for reconfigurable logic architectures," in Proceedings of PDP 2016, 2016.
[5] Giacomo Valente, Tiziana Fanni, Carlo Sau, Tania Di Mascio, Luigi Pomante, and Francesca Palumbo, "A Composable Monitoring System for Heterogeneous Embedded Platforms," ACM Trans. Embed. Comput. Syst. 20, 5, Article 42 (September 2021), 34 pages.
[6] Siva K. Mulpuru, Malini Madhavan, Christopher J. McLeod, Yong-Mei Cha, Paul A. Friedman, "Cardiac Pacemakers: Function, Troubleshooting, and Management: Part 1 of a 2-Part Series," Journal of the American College of Cardiology, Volume 69, Issue 2, 2017.
[7] Malini Madhavan, Siva K. Mulpuru, Christopher J. McLeod, Yong-Mei Cha, Paul A. Friedman, "Advances and Future Directions in Cardiac Pacemakers: Part 2 of a 2-Part Series," Journal of the American College of Cardiology, Volume 69, Issue 2, 2017.
[8] Bolchini, L. Pomante, F. Salice, and D. Sciuto, "Reliability Properties Assessment at System Level: A Co-Design Framework," Tech. Rep. 2001-85, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy, 2001.
[9] A. Rambo et al., "The Self-Aware Information Processing Factory Paradigm for Mixed-Critical Multiprocessing," in IEEE Transactions on Emerging Topics in Computing.
[10] JOINTER, https://github.com/alkalir/jointer.git