Actions and Recommendations from the nonrad SB Dependability Analysis
A = proposed Action; R = Recommendation TBD; see also full report
To be checked off when completed, or struck through if obsolete
- [A] Cyclic Redundancy Check (CRC) for the peripheral board communication.
- [A, ongoing] Removal of the DIP switch for the operational version of the DI/OT (boot only from the QSPI flash).
- [A] Temperature tests of the DI/OT crate at top performance to assess the accuracy of previous FEA simulations. Subsequent evaluation of the criticality of DI/OT operation without fans and, if necessary, implementation of failure monitoring and an immediate maintenance intervention in case of a partial or complete fan failure. Depending on the application-specific heat generation and the assessed criticality, this maintenance strategy can be adapted.
- [R] Re-assessment of the MPSoC maximum junction temperature for application-specific firmware implementations using the Xilinx Power Estimator tool [16]. Validation of the estimate by temperature measurements during tests.
- [R] Use of the industrial grade S40FC004C1B2C00000 eMMC memory for its higher junction temperature rating (85°C) and therefore higher derating. Note: Part number and component procurement to be checked for the “embedded wireless” grade, which has existed since 03/10/2020, is qualified for 85°C, and seems to replace the currently used commercial grade. Procured batches should be from after this date.
- [R, optional] Monitoring of the IC27 (MAX16025TE+) failure rate, both in the manufacturer's HTOL test data and in DI/OT operation. Note: The component qualification shows no failures during testing; only the tested time is lower than for most other such components. No significantly higher risk is expected for this component.
- [R] For analog sampling applications, careful analysis of the drifting-oscillator failure mode, which can lead, for instance, to erroneous ADC samples.
- [R] SD card slot not to be used in operation -> User manual.
- [R] Implementation of Error Correction Code (ECC) and/or CRC if critical data is stored in DDR memories.
- [R] Implementation of critical MPSoC functions in the FPGA part or, alternatively, in bare-metal code running on a dedicated processor, depending on constraints. Note: Various safety features provided by Xilinx can be implemented, making the MPSoC suitable for SIL 3 applications; see the Technical Reference Manual.
- [R] Implementation of a check to verify nonrad SB <-> FMC communication, e.g. to write to a mezzanine register, read back, and compare. Potential failure modes: ‘FMC line/pin open’.
- [R] Extension of the FMECA (and potentially an FTA) to application-specific DI/OT designs. This includes the analysis of the custom-designed peripheral boards, but also the consideration of application-specific failure modes or, rather, failure effects.
- [A] Tests at top performance parameters for functional validation of the nonrad SB.
- [A] Validation tests at determined environmental stress limits (see here) and top performance.
- [A] Continuation of high stress tests to determine the robustness and sufficient margin for both functional errors and hardware failures against different stresses (see here).
- [R] Additional temperature measurements of high-dissipation components during high-temperature stress tests, e.g. of the MPSoC for individual firmware implementations.
- [A] High quality requirements during the PCB and PCBA production process (IPC class 3), supported by a high level of inspections. A final end-of-line functional test bench (“PTS”) and, potentially, intermediate component test benches are to be designed and used.
- [A] Screening (temperature cycling) and reliability testing (run in) as outlined here.
- [R] Comprehensive use of on-board parameters for diagnostics, monitoring and (automated) analysis of data.
- [R] Comprehensive failure monitoring, root-cause analysis and failure data analysis (Weibull analysis) for all units installed.
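
The read-back check recommended above for the nonrad SB <-> FMC communication (write to a mezzanine register, read back, and compare) could be sketched as follows. This is only an illustration in Python under stated assumptions, not the DI/OT implementation: the names `SimulatedRegister`, `walking_ones` and `check_register_readback` are hypothetical, the 16-bit register width is assumed, and an open line is modelled as a bit that always reads back as 0. On real hardware the `write_reg` and `read_reg` callables would wrap the actual bus access.

```python
from typing import Callable, Iterable, List


def walking_ones(width: int) -> List[int]:
    """Test patterns: all zeros, all ones, then one pattern per single set bit."""
    mask = (1 << width) - 1
    return [0, mask] + [1 << bit for bit in range(width)]


def check_register_readback(
    write_reg: Callable[[int], None],
    read_reg: Callable[[], int],
    width: int = 16,  # assumed register width, adjust to the real register
) -> List[int]:
    """Write each pattern, read it back, and return the bit positions that differed."""
    failed_bits = 0
    for pattern in walking_ones(width):
        write_reg(pattern)
        # XOR marks every bit where the value read back differs from the value written.
        failed_bits |= read_reg() ^ pattern
    return [bit for bit in range(width) if (failed_bits >> bit) & 1]


class SimulatedRegister:
    """Hypothetical stand-in for a mezzanine register, used only to exercise the check."""

    def __init__(self, width: int = 16, open_bits: Iterable[int] = ()):
        self._mask = (1 << width) - 1
        self._value = 0
        # Assumption: an open line/pin makes the corresponding bit read back as 0.
        self._open_mask = sum(1 << bit for bit in open_bits)

    def write(self, value: int) -> None:
        self._value = value & self._mask

    def read(self) -> int:
        return self._value & ~self._open_mask


if __name__ == "__main__":
    healthy = SimulatedRegister()
    faulty = SimulatedRegister(open_bits=(5,))
    print(check_register_readback(healthy.write, healthy.read))  # []
    print(check_register_readback(faulty.write, faulty.read))    # [5]
```

Reporting the failing bit positions rather than a plain pass/fail makes it easier to map a detected fault back to a specific FMC line or pin. Such a check only covers lines that are actually exercised by the chosen register.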