The goal of this campaign is to verify the improvements of the System Board IGL v2 (v1 was previously tested in CHARM during several campaigns), to gain more confidence in the Fan Tray operation (it previously survived 370Gy), and to investigate improvements to the RaToPUSv2 DC/DC stage design intended to let it survive power cycles beyond 250Gy.
Results and observations
The irradiation took place from 2 to 9 August 2023 in the CHARM mixed-field facility at CERN. Over one week of irradiation, a total dose of 520Gy was reached.
7 freezes of the Management CPU requiring an FPGA reset; the first of these happened as early as ~10Gy.
We have seen this kind of freeze before, but Tristan then fixed Hydra, and in the last campaign of 2022 there were no freezes of the Management CPU. Since then, however, both Tristan and Roberto have made modifications, which probably introduced a bug.
Investigate Hydra freezes
Analyse logs to make sure there were no other issues
System Board v2 + FMC nanoFIP:
The hardware survived the whole irradiation run, i.e. 520Gy.
We performed 3 manual power cycles to test the behaviour of the hardware.
One behaviour was very strange and we do not understand it at all:
On Sunday morning (330Gy) we power cycled the System Board + FMC nanoFIP. After that power cycle there was no communication over WorldFIP, although the current consumption indicated that the System Board had started correctly. Neither an FPGA reset nor another power cycle restored the communication.
On Monday morning (430Gy) we again performed an FPGA reset (not a full power cycle), and the communication over WorldFIP to Hydra was restored.
Figure out why we were unable to communicate with the setup on Sunday morning: was it a Hydra issue, or perhaps a power-cycle issue on the FMC nanoFIP? (I doubt the latter, because we tested the FMC nanoFIP successfully several times last year.)
RaToPUS:
3 power cycles up to 160Gy without issues.
The power cycle at 160Gy made the 5V output of RaToPUS B unstable (sometimes 5V, sometimes 2.8V). At this stage the ADC also started giving wrong measurements on some channels (the same ADC model used on the System Board reported correct values throughout the whole irradiation).
At 211Gy the 5V channel of RaToPUS A became unstable.
At 265Gy both RaToPUSes produced an unstable 12V, i.e. both RaToPUSes had failed completely at this stage.
Check the data logger file from the previous campaign, when we reached 250Gy. If it is confirmed that none of the voltages were unstable before that dose, we should perhaps re-test in September and go with that design.
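One way to run this check on the previous campaign's data logger file is a short script that reports, per channel, the first dose at which the voltage leaves a tolerance band around its nominal value. This is only a sketch: the CSV layout (columns `dose_gy`, `channel`, `voltage`), the channel names and the 5% tolerance are assumptions, not the actual logger format.

```python
import csv
import io

# Assumed channel names and nominal voltages; adapt to the real logger columns.
NOMINAL_V = {"5V_A": 5.0, "5V_B": 5.0, "12V_A": 12.0, "12V_B": 12.0}
TOLERANCE = 0.05  # assumed: flag readings more than 5% away from nominal


def first_unstable_dose(rows, nominal=NOMINAL_V, tol=TOLERANCE):
    """Return {channel: dose_gy} for the first out-of-tolerance reading.

    `rows` is an iterable of dicts with the assumed keys 'dose_gy',
    'channel' and 'voltage' (e.g. from csv.DictReader), in increasing
    dose order. Channels that never leave the band are absent.
    """
    first = {}
    for row in rows:
        channel = row["channel"]
        if channel in first or channel not in nominal:
            continue  # already flagged, or not a monitored rail
        voltage = float(row["voltage"])
        if abs(voltage - nominal[channel]) > tol * nominal[channel]:
            first[channel] = float(row["dose_gy"])
    return first


if __name__ == "__main__":
    # Synthetic rows, only to show the call; these are not campaign data.
    sample = io.StringIO(
        "dose_gy,channel,voltage\n"
        "10,5V_A,5.00\n"
        "20,5V_A,4.20\n"
        "30,5V_A,5.00\n"
    )
    print(first_unstable_dose(csv.DictReader(sample)))  # prints {'5V_A': 20.0}
```

If every dose in the returned dictionary is above 250 (or the dictionary is empty), none of the monitored voltages left the band before 250Gy. A rail that flickers back into tolerance, like the 5V/2.8V behaviour of RaToPUS B, is still caught, because the first excursion is what gets reported.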
Fan Tray:
34 automatic, successful power cycles by the System Board due to Monimod freezes (this is the expected and correct behaviour, fixed since the last campaign),
plus 2 manual power cycles (the last at 290Gy).
At ~100Gy we started observing wrong RPM readings (0 or unstable values) on fan2 and then on fan3 (while fan2 returned to normal). The current consumption, however, showed that the fans were rotating correctly. The wrong RPM readings did cause some current spikes, because the regulation algorithm reacted to them.
Up to 430Gy we kept getting unstable RPM values on different fans.
At ~430Gy we manually power cycled the Fan Tray, after which it was completely dead.
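The link between wrong RPM readings and current spikes can be illustrated with a toy model. The sketch assumes a plain proportional regulator around a fixed base duty cycle; the real Fan Tray regulation algorithm, its gains and duty range are not described here, so `duty_for`, `base_duty`, `kp` and the 3000 RPM set point are all hypothetical. It shows why a single spurious 0 RPM reading is enough to saturate the PWM duty (and hence the fan current) for one regulation step, even though the fan is actually spinning normally.

```python
def duty_for(target_rpm, measured_rpm, base_duty=0.5, kp=0.0005):
    """Hypothetical proportional fan regulator returning a PWM duty in [0, 1]."""
    duty = base_duty + kp * (target_rpm - measured_rpm)
    return min(1.0, max(0.0, duty))  # clamp to the valid duty range


if __name__ == "__main__":
    target = 3000  # assumed RPM set point
    # Correct readings keep the duty near base_duty; the single bogus 0 RPM
    # reading (fan actually spinning) saturates the duty for one step, which
    # would show up as a spike in fan current.
    for measured in [3000, 2990, 0, 3010, 3000]:
        print(f"measured={measured:5d} rpm -> duty={duty_for(target, measured):.3f}")
```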
The main script is charm-test.py (in mfip_urv/). It needs to be updated for the new outputs.
To use it, you need to run mfip_urv in passive mode:
./mfip_urv --port 2000
You can then connect to mfip_urv using any TCP tool, for example:
telnet localhost 2000
ncat localhost 2000
There is no prompt, but you can enter commands.
During operation, you can kill the Python script and then send commands over FIP using telnet or ncat without interrupting mfip_urv.
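The same TCP interface that telnet or ncat use can also be scripted. The sketch below is not taken from charm-test.py: it assumes that commands are newline-terminated ASCII lines and, because mfip_urv prints no prompt, that a reply is complete once nothing has arrived for `idle_timeout` seconds. The function name `send_command` and the timeout value are hypothetical.

```python
import socket


def send_command(cmd, host="localhost", port=2000, idle_timeout=1.0):
    """Send one text command to mfip_urv and return whatever it prints back.

    Assumes newline-terminated commands. Since mfip_urv prints no prompt,
    the reply is treated as complete once nothing arrives for
    `idle_timeout` seconds (an assumption, not a documented protocol).
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(cmd.encode("ascii") + b"\n")
        sock.settimeout(idle_timeout)
        chunks = []
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # idle for idle_timeout seconds: assume reply is done
            if not data:
                break  # server closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

With mfip_urv started as `./mfip_urv --port 2000`, a call such as `print(send_command("supervisor"))` should print whatever mfip_urv returns for that command. Opening one connection per command keeps the sketch simple; it relies on mfip_urv accepting clients that connect and disconnect, as telnet and ncat do.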
To be retested.
Reset the System Board FPGA: fip-reset
Restart an Application CPU: plc N reset (where N is 0 or 1)
Reload an Application CPU binary: plc N stop, then load N prg.elf
Get statistics on Hydra operation (done by the Python script):
supervisor, stat-regs, plc-supervisor N
Get statistics from i2c sensors, FanTray, RaToPUS ADC:
Do not do the following unless really required:
Reload the Management CPU binary (never do this during irradiation):
spi-flash-elf FILENAME VERSION [PAGE]
followed by restart-cfg VAL (which can also specify the page)
and restart now.
You should erase the program at pages 0-16 in case of flash failure (as this is the default program).
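The Application CPU and Hydra statistics commands listed above can be wrapped in small helpers so that a script always sends them in the right order and only with a valid CPU index. This is a hypothetical wrapper, not part of mfip_urv: the function names are invented, `plc-supervisor N` is assumed to take the same 0/1 index as the `plc` commands, and `send` stands for any function that writes one command line to the mfip_urv TCP port and returns the reply.

```python
from typing import Callable, List

# Hypothetical helpers: each returns the mfip_urv command line(s) to send,
# using the command syntax listed above.


def _check_cpu(n: int) -> None:
    """Application CPU index N must be 0 or 1."""
    if n not in (0, 1):
        raise ValueError(f"Application CPU index must be 0 or 1, got {n}")


def fip_reset() -> List[str]:
    """Reset the System Board FPGA."""
    return ["fip-reset"]


def plc_reset(n: int) -> List[str]:
    """Restart Application CPU n."""
    _check_cpu(n)
    return [f"plc {n} reset"]


def plc_reload(n: int, elf: str = "prg.elf") -> List[str]:
    """Stop Application CPU n, then load a new binary into it."""
    _check_cpu(n)
    return [f"plc {n} stop", f"load {n} {elf}"]


def hydra_stats(n: int) -> List[str]:
    """Hydra statistics commands, including those for Application CPU n."""
    _check_cpu(n)
    return ["supervisor", "stat-regs", f"plc-supervisor {n}"]


def run(commands: List[str], send: Callable[[str], object]) -> List[object]:
    """Send the commands in order through `send` and collect its results."""
    return [send(cmd) for cmd in commands]
```

Passing `print` as `send`, e.g. `run(plc_reload(1), send=print)`, prints `plc 1 stop` and `load 1 prg.elf` instead of sending them, which gives a convenient dry run before talking to the setup.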