CHARM irradiation tests (Aug 2023)
Test plan
During this irradiation campaign the following DI/OT components will be tested:
The goal of this campaign is to verify the improvements in the System Board IGL v2 (v1 was tested in CHARM during several earlier campaigns), to gain more confidence in the Fan Tray operation (it previously survived 370Gy), and to investigate the improvements to the RaToPUSv2 DC/DC stage design intended to let it survive power cycles beyond 250Gy.
Test setup
Results and observations
The irradiation took place on 2-9 August 2023 in the CHARM mixed-field facility at CERN. Over the one week of irradiation a total dose of 520Gy was reached.
Observations
Hydra SoC:
- 7 freezes requiring an FPGA reset; the first happened already at ~10Gy
- We have seen this kind of freeze before, but Tristan then fixed Hydra and in the last campaign of 2022 the Management CPU did not freeze. Since then, however, both Tristan and Roberto have made modifications, and these probably introduced a bug.
Next steps:
- Investigate Hydra freezes
- Analyse logs to make sure there were no other issues
System Board v2 + FMC nanoFIP:
- Hardware survived the whole irradiation run, i.e. 520Gy
- We did 3 manual power cycles to test the behaviour of the hardware
- One behaviour we observed is still unexplained:
  - Sunday morning (330Gy): we power cycled the System Board + FMC nanoFIP. After that power cycle there was no communication over WorldFIP, although the current consumption indicated that the System Board had started correctly. Neither an FPGA reset nor another power cycle restored the communication
  - Monday morning (430Gy): we did another FPGA reset (not a full power cycle) and the communication over WorldFIP to Hydra was restored
Next steps:
- Figure out why we were unable to communicate with the setup on Sunday morning: was it a Hydra issue, or a power cycle issue on the FMC nanoFIP? The latter seems unlikely, because we tested the FMC nanoFIP successfully several times last year.
2x RaToPUS:
- 3 power cycles up to 160Gy without issues
- The power cycle at 160Gy made the 5V output of RaToPUS B unstable (sometimes 5V, sometimes 2.8V). At this stage the ADC also started giving wrong measurements on some channels (the same ADC used on the System Board reported correct values throughout the irradiation).
- At 211Gy the 5V channel of RaToPUS A became unstable
- At 265Gy both RaToPUSes produced an unstable 12V, i.e. complete failure of RaToPUS at this stage.
Next steps:
- Investigate the very early ADC failure.
- Check the data logger file from the previous campaign, in which we reached 250Gy. If none of the voltages were really unstable before that dose, we should maybe re-test in September and go with that design.
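The data logger check could be sketched roughly as below. The data logger file format is not described in these notes, so the sketch works on an already parsed list of (dose, voltage) pairs; the function name `first_unstable_dose`, the 5% tolerance and the sample values are all hypothetical and only mimic the 5V / 2.8V behaviour described above.

```python
def first_unstable_dose(samples, nominal, tolerance=0.05):
    """Return the dose (Gy) of the first sample outside nominal +/- tolerance.

    `samples` is an iterable of (dose_gy, volts) pairs in chronological order.
    `tolerance` is a fraction of `nominal` (5% is an assumed value).
    Returns None if every sample is within tolerance.
    """
    limit = nominal * tolerance
    for dose_gy, volts in samples:
        if abs(volts - nominal) > limit:
            return dose_gy
    return None


# Hypothetical 5 V rail readings, not real campaign data.
rail_5v = [(120, 5.01), (150, 4.99), (160, 2.8), (170, 5.0), (180, 2.8)]
print(first_unstable_dose(rail_5v, nominal=5.0))
```

Running the same check on each rail of the previous campaign's log would show whether any voltage left its tolerance band before 250Gy.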
FanTray:
- 34 automatic, successful power cycles by the System Board due to Monimod freezes (this is the expected and correct behavior, fixed since the last campaign)
- Plus 2 manual power cycles (the last at 290Gy).
- From ~100Gy we started observing wrong RPM readings (0 or unstable values) on fan2 and then on fan3 (while fan2 returned to normal). The current consumption shows that the fans were rotating well, but the regulation algorithm reacted to the wrong RPM measurements and caused some current spikes.
- Until 430Gy we kept getting unstable RPM values on different fans
- At ~430Gy we did a manual power cycle of the FanTray, after which it was completely dead.
Next steps:
- None, except maybe modifying the reset conditions and filtering out RPM "outliers" in the fan regulation loop.
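One possible way to filter RPM outliers is a sliding-window median, sketched below in Python for illustration only. The actual regulation loop and its language are not shown in these notes; the class name `RpmFilter`, the window size of 5 and the sample readings are assumptions.

```python
from collections import deque
from statistics import median


class RpmFilter:
    """Sliding-window median filter for one fan's RPM readings."""

    def __init__(self, window=5):
        # Only the most recent `window` readings are kept.
        self.samples = deque(maxlen=window)

    def update(self, rpm):
        """Store a raw reading and return the median of the recent window."""
        self.samples.append(rpm)
        return median(self.samples)


# Hypothetical readings: a steady fan with two spurious zero readings.
raw = [3000, 3010, 0, 2990, 3005, 0, 3000]
flt = RpmFilter(window=5)
filtered = [flt.update(r) for r in raw]
print(filtered)
```

A median over 5 samples only rejects isolated bad readings (up to 2 per window); a fan that reports 0 for a long stretch, as fan2 did for a while, would still pass through and would need a separate plausibility check, for example against the current consumption.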
Test preparation
Main script
The main script is charm-test.py (in mfip_urv/). It needs to be updated for the new outputs.
To use it, you need to run mfip_urv in passive mode:
./mfip_urv --port 2000
You can then connect to mfip_urv using any TCP tool, such as:
telnet localhost 2000
or
ncat localhost 2000
There is no prompt, but you can enter commands.
During operations, you can kill the python script and then send commands over FIP using telnet or ncat without interrupting mfip_urv.
To be retested.
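The same connection can be scripted instead of typed into telnet. The sketch below is a minimal client under two assumptions that these notes do not confirm: commands are newline-terminated ASCII, and a reply is complete once the socket stays quiet for the timeout (there is no prompt to wait for). The function name `send_command` is hypothetical and is not part of charm-test.py.

```python
import socket


def send_command(command, host="localhost", port=2000, timeout=2.0):
    """Send one command line to mfip_urv and return whatever text comes back.

    Assumed protocol: newline-terminated ASCII commands, free-form text reply.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # a quiet socket is treated as the end of the reply
    return b"".join(chunks).decode("ascii", errors="replace")


# Example (requires mfip_urv running with --port 2000):
# print(send_command("fip-err"))
```

Opening a new connection per command keeps the sketch simple; whether mfip_urv accepts repeated connections in passive mode is part of what still has to be retested.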
WorldFIP commands
- Generate reset to System Board FPGA: fip-reset
- Restart Application CPU: plc N reset (where N is 0 or 1)
- Reload Application CPU binary: plc N stop and load N prg.elf
- Get statistics from Hydra operation (done by the python script): supervisor, stat-regs, plc-supervisor N
- Get statistics from i2c sensors, FanTray, RaToPUS ADC: fip-err, system-read
Don't do unless really required:
- Reload Management CPU binary (don't do during irradiation): spi-flash-elf FILENAME VERSION [PAGE] followed by restart-cfg VAL (which can also specify the page) and restart now. You should erase the program at pages 0-16 in case of flash failure (as this is the default program).
Application CPUs Flow
CPU 0
sudo ./mfip_urv plc 0 reset
sudo ./mfip_urv plc 0 halt
sudo ./mfip_urv load 0 ../plc_urv/plc0_mbox_inc.elf
CPU 1
sudo ./mfip_urv plc 1 reset
sudo ./mfip_urv plc 1 halt
sudo ./mfip_urv load 1 ../plc_urv/plc1_mbox_not.elf
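The two flows above differ only in the CPU number and the ELF file, so they can be generated from one helper. This is a sketch only; the function name `app_cpu_flow` is hypothetical, and the commands it builds are exactly the ones listed above.

```python
def app_cpu_flow(cpu, elf_path, tool="./mfip_urv"):
    """Return the reset / halt / load command lines for one Application CPU."""
    if cpu not in (0, 1):
        raise ValueError("cpu must be 0 or 1")
    return [
        ["sudo", tool, "plc", str(cpu), "reset"],
        ["sudo", tool, "plc", str(cpu), "halt"],
        ["sudo", tool, "load", str(cpu), elf_path],
    ]


# Print the command lines for both CPUs without executing anything.
for cpu, elf in ((0, "../plc_urv/plc0_mbox_inc.elf"),
                 (1, "../plc_urv/plc1_mbox_not.elf")):
    for argv in app_cpu_flow(cpu, elf):
        print(" ".join(argv))
```

To actually run the flow, each argument list can be passed to `subprocess.run(argv, check=True)`, which stops at the first failing step.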
TODO:
- Verify on MasterFIP side if data received from Application CPU is correct
- Add a possibility in Management CPU to enable/disable readout from individual i2c devices
- Power cycle FanTray and on-board ADC (including driving i2c signals low)
- Enable TMR and check the netlist in Libero