Actions and Recommendations from the radtol SB Dependability Analysis
A = proposed Action; R = Recommendation TBD; see also full report
To be checked off if completed, or struck through if obsolete
- [A] Implementation of a MoniMod sanity check on the DI/OT system board and issuing of a warning when a MoniMod gets lost.
- [R] Foresee a Triple Modular Redundancy (TMR) gateware configuration for all critical functions implemented on the IGLOO2 FPGA.
- [R] Temperature optimisation of the IGLOO2 FPGA:
a. Potential use of the ‘T1’ automotive grade (or ‘T2’, or ‘M’ military grade) to increase the derating.
Caveat: Potential conflict with radiation performance to be taken into account.
Note: The maximum recommended operating junction temperature of the IGLOO2 is 100°C. This is not the temperature stress limit in terms of maximum robustness, but exceeding it may affect the FPGA functionality in the short term when operating at top performance. In the long term, degradation may also reduce the reliability with which the required functionality is provided, hence the recommendation to lower this temperature limit.
b. Other potential actions are:
i. Gateware design to keep the component temperature reasonably low.
ii. Implementation of diagnostics to monitor for potential degradation, e.g. temperature measurements, current consumption, TMR error rate.
iii. Transition to a “safe-mode” for operation above critical temperature threshold.
- [R] IGLOO2 FPGA application-specific gateware recommendations:
a. Assess power dissipation and maximum junction temperature of the IGLOO2 FPGA for application specific gateware configurations.
b. Recommendations for critical applications:
i. FMECA extension to assess detection and mitigation probability of critical failure modes.
ii. Fault Tree Analysis (FTA) using the bottom up system model as input. Extension of the FTA to the DI/OT and entire system level, see here.
- [R] Implementation of a watchdog on peripheral boards to detect potential IGLOO2 failure, see FMECA p.4 in Annex 6.2. In addition:
- [R] Implementation of a watchdog on the FMC nanoFIP to detect potential failures.
- [R] Implementation of a check to verify radtol SB <-> FMC communication, e.g. to write to a nanoFIP register, read back, and compare (see FMECA p.10/14 in Annex 6.2). Potential failure modes: ‘FMC line/pin open’
- [R] If the SPI flash memory is used to store highly critical data, additional mitigation actions such as error checking may be required. Note that the memory has been tested up to a dose of 500 Gy without any Single Event Upset being observed, see report.
- [R] Implementation of flip‑flop logic operating close to its timing limits to detect variations and drift of the oscillators.
- [R] If board space allows, placement of additional heat sinks on the linear regulators for improved heat dissipation.
- [R] Potential use of automotive grade (AEC‑Q200) qualified components for quartz crystals Q1‑Q3.
- [R] Potential increase of the C8 voltage rating, or a design change. C8 is currently rated at 16 V with 12 V applied.
- [R] Add an additional resistor to the MOSFET gates of the level translator to protect against Single Event Gate Rupture (see schematics p.7 and FMECA in Annex 6.2). Note that these rare events have not been observed during irradiation testing.
- [A] Tests at top performance parameters for functional validation of the radtol SB.
- [A] Validation tests at determined environmental stress limits (see here) and top performance.
- [A] High stress tests to determine the robustness and sufficient margin for both functional errors and hardware failures against different stresses (see here).
- [A] High quality requirements during the PCB and PCBA production process (IPC class 3), supported by a high level of inspections. Final End‑of‑Line functional test bench (“PTS”), as well as potentially intermediate component test benches to be designed and used.
- [A] Screening (temperature cycling) and reliability testing (run in) as outlined here.
- [R] Comprehensive failure monitoring, root-cause analysis and failure data analysis (Weibull analysis) for all units installed.
- [R] Monitoring and analysis of on-board MoniMod diagnostic parameters for potential degradation.
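
As an illustration of the Weibull analysis recommended above for failure data, the following is a minimal Python sketch of a two-parameter Weibull fit by median rank regression. It is not taken from the full report: the function names, the use of Bernard's median rank approximation, and the example parameters (shape 1.5, scale 40000 h, 20 units) are assumptions made for illustration only. The sketch also assumes a complete sample in which every unit has failed; real field data from installed units would normally include units still in operation (censored data), which this sketch does not handle.

```python
import math


def median_ranks(n):
    """Bernard's approximation of the median rank for n ordered failures."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull(failure_times):
    """Estimate (shape beta, scale eta) by median rank regression.

    Assumes a complete sample: every unit has failed, no censoring.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0:
        raise ValueError("need at least two positive failure times")
    # Linearised Weibull CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("failure times must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                         # slope of the regression line
    eta = math.exp(x_mean - y_mean / beta)   # from intercept = -beta * ln(eta)
    return beta, eta


def synthetic_failure_times(beta, eta, n):
    """Illustrative data only: exact Weibull quantiles at the median ranks."""
    return [eta * (-math.log(1.0 - f)) ** (1.0 / beta) for f in median_ranks(n)]


if __name__ == "__main__":
    # Hypothetical parameters, not measured values from the radtol SB.
    sample = synthetic_failure_times(beta=1.5, eta=40000.0, n=20)
    beta, eta = fit_weibull(sample)
    print(f"shape beta = {beta:.3f}, scale eta = {eta:.0f} h")
```

In such an analysis the fitted shape parameter is usually the quantity of interest: a value below 1 points to early-life failures, a value near 1 to random failures, and a value above 1 to wear-out.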