Actions and Recommendations from the nonrad SB Dependability Analysis
A = proposed Action; R = Recommendation TBD; see also the full report
Check off when completed, or strike through if obsolete
-
[A] Cyclic Redundancy Check (CRC) for the peripheral board communication. -
[A, ongoing] Removal of the DIP switch for the operational version of the DI/OT (Boot only from the QSPI flash). -
[A] Temperature tests of the DI/OT crate at top performance to assess the accuracy of the previous FEA simulations. Subsequently, evaluation of the criticality of DI/OT operation without fans and, if necessary, implementation of failure monitoring with immediate maintenance intervention in case of a partial or complete fan failure. This maintenance strategy can be adapted depending on the application-specific heat generation and the assessed criticality. -
[R] Re-assessment of the MPSoC maximum junction temperature for application-specific firmware implementations using the Xilinx Power Estimator tool [16]. Validation of the estimate by temperature measurements during tests. -
[R] Use of the industrial grade S40FC004C1B2C00000 eMMC memory for its higher junction temperature rating (85°C) and therefore higher derating. Note: Part number and component procurement to be checked for the “embedded wireless” grade, which has existed since 03/10/2020, is qualified for 85°C, and seems to replace the currently used commercial grade. The procured batch should be from after this date. -
[R, optional] Monitoring of the failure rate of IC27 (MAX16025TE+), both the manufacturer's HTOL-tested rate and the rate observed in DI/OT operation. Note: The component qualification shows no failures during testing; the tested time is merely shorter than for most comparable components. No significantly higher risk is expected for this component. -
[R] For analog sampling applications, careful analysis of the drifting-oscillator failure mode, which can lead, for instance, to erroneous ADC samples. -
[R] SD card slot not to be used in operation -> to be stated in the user manual. -
[R] Implementation of Error Correction Code (ECC) and/or CRC if critical data is stored in DDR memories. -
[R] Implementation of critical MPSoC functions in the FPGA part or, alternatively, in bare-metal code running on a dedicated processor, depending on constraints. Note: Various safety features provided by Xilinx can be implemented, making the MPSoC suitable for SIL 3 applications; see the Technical Reference Manual. -
[R] Implementation of a check to verify nonrad SB <-> FMC communication, e.g. write to a mezzanine register, read it back, and compare. Potential failure mode: ‘FMC line/pin open’. -
[R] Extension of the FMECA (and potentially an FTA) to application-specific DI/OT designs. This includes the analysis of the custom-designed peripheral boards as well as consideration of application-specific failure modes, or rather failure effects. -
[A] Tests at top performance parameters for functional validation of the nonrad SB. -
[A] Validation tests at determined environmental stress limits (see here) and top performance. -
[A] Continuation of high-stress tests to determine the robustness and to confirm sufficient margin against both functional errors and hardware failures under different stresses (see here). -
[R] Additional temperature measurements of high-dissipation components, e.g. the MPSoC, during high-temperature stress tests for individual firmware implementations. -
[A] High quality requirements during the PCB and PCBA production process (IPC class 3), supported by a high level of inspection. A final end-of-line functional test bench (“PTS”), and potentially intermediate component test benches, to be designed and used. -
[A] Screening (temperature cycling) and reliability testing (run-in) as outlined here. -
[R] Comprehensive use of on-board parameters for diagnostics monitoring and (automated) analysis of data. -
[R] Comprehensive failure monitoring, root-cause analysis and failure data analysis (Weibull analysis) for all units installed.
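
The write, read back, and compare check recommended above for the nonrad SB <-> FMC communication could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `readback_check` and `FakeMezzanine`, the register address, and the idea of a dedicated scratch register on the mezzanine are hypothetical and not taken from the DI/OT design. It uses walking-one and walking-zero patterns so that a single open data line shows up as a specific faulty bit position.

```python
from typing import Callable, List


def readback_check(write_reg: Callable[[int, int], None],
                   read_reg: Callable[[int], int],
                   addr: int,
                   width: int = 32) -> List[int]:
    """Write test patterns to a scratch register, read them back, compare.

    Returns the sorted list of bit positions that read back wrongly for at
    least one pattern (an empty list means the check passed).
    write_reg/read_reg are hypothetical accessors supplied by the caller.
    """
    mask = (1 << width) - 1
    # Walking ones catch lines stuck low; walking zeros catch lines stuck high.
    patterns = [1 << b for b in range(width)]
    patterns += [mask ^ (1 << b) for b in range(width)]

    faulty = set()
    for pattern in patterns:
        write_reg(addr, pattern)
        diff = (read_reg(addr) ^ pattern) & mask
        for b in range(width):
            if (diff >> b) & 1:
                faulty.add(b)
    return sorted(faulty)


class FakeMezzanine:
    """Simulated register file (assumption, for demonstration only).

    Bits listed in open_bits model an open line/pin: they read back as the
    constant level given by pull (0 or 1) regardless of what was written.
    """

    def __init__(self, open_bits=(), pull: int = 0):
        self.regs = {}
        self.open_mask = sum(1 << b for b in open_bits)
        self.pull = pull

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value

    def read(self, addr: int) -> int:
        value = self.regs.get(addr, 0)
        if self.pull:
            return value | self.open_mask
        return value & ~self.open_mask


if __name__ == "__main__":
    healthy = FakeMezzanine()
    broken = FakeMezzanine(open_bits=(3,))
    print(readback_check(healthy.write, healthy.read, 0x10, width=8))
    print(readback_check(broken.write, broken.read, 0x10, width=8))
```

In a real system the two accessors would wrap whatever bus access the firmware or driver provides, and the check would presumably run at start-up and periodically during operation on a register whose content is not otherwise needed.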