Kernel crashes after multiple (five to ten) module reloads.
When I repeatedly reload the kernel modules for our usual restart tests, the system breaks after roughly five to ten reloads.
I'm using wrpc-v3.0 from the provided binaries.
Nov 24 17:58:01 white-rabbit kernel: [ 975.714609] spec 0000:03:00.0: remove
Nov 24 17:58:01 white-rabbit kernel: [ 975.717550] spec 0000:03:00.0: probe for device 0003:0000
Nov 24 17:58:01 white-rabbit kernel: [ 975.718124] spec 0000:03:00.0: firmware: direct-loading firmware fmc/spec-init.bin
Nov 24 17:58:01 white-rabbit kernel: [ 975.718131] spec 0000:03:00.0: got file "fmc/spec-init.bin", 1485236 (0x16a9b4) bytes
Nov 24 17:58:02 white-rabbit kernel: [ 975.909666] spec 0000:03:00.0: FPGA programming successful
Nov 24 17:58:02 white-rabbit kernel: [ 976.339285] spec 0000:03:00.0: mezzanine 0
Nov 24 17:58:02 white-rabbit kernel: [ 976.339287] Manufacturer: CERN
Nov 24 17:58:02 white-rabbit kernel: [ 976.339288] Product name: FmcDIO5chTTLa
Nov 24 17:58:02 white-rabbit kernel: [ 976.340368] fmc FmcDIO5chTTLa-0300: Driver has no ID: matches all
Nov 24 17:58:02 white-rabbit kernel: [ 976.340407] spec 0000:03:00.0: reprogramming with fmc/wrpc_v3.0.bin
Nov 24 17:58:02 white-rabbit kernel: [ 976.340749] spec 0000:03:00.0: firmware: direct-loading firmware fmc/wrpc_v3.0.bin
Nov 24 17:58:02 white-rabbit kernel: [ 976.532347] spec 0000:03:00.0: FPGA programming successful
Nov 24 17:58:02 white-rabbit kernel: [ 976.575620] fmc_trivial FmcDIO5chTTLa-0300: Can't find SDB at address 0x0
Nov 24 17:59:01 white-rabbit CRON[1266]: (root) CMD ( modprobe -r fmc-trivial spec fmc; modprobe spec; modprobe fmc-trivial gateware=fmc/wrpc_v3.0.bin)
Nov 24 17:59:01 white-rabbit kernel: [ 1035.568446] spec 0000:03:00.0: remove
Nov 24 17:59:01 white-rabbit kernel: [ 1035.571587] spec 0000:03:00.0: probe for device 0003:0000
Nov 24 17:59:01 white-rabbit kernel: [ 1035.572087] spec 0000:03:00.0: firmware: direct-loading firmware fmc/spec-init.bin
Nov 24 17:59:01 white-rabbit kernel: [ 1035.572093] spec 0000:03:00.0: got file "fmc/spec-init.bin", 1485236 (0x16a9b4) bytes
Nov 24 17:59:02 white-rabbit kernel: [ 1035.763627] spec 0000:03:00.0: FPGA programming successful
Nov 24 17:59:02 white-rabbit kernel: [ 1035.803920] spec 0000:03:00.0: Can't find SDB magic
At this point the machine crashes. The log suggests that the SDB is broken, but I suspect something else is going on: after a reboot I can again reload five to ten times before the same error appears. My guess is that the problem lies not so much in the v3.0 spec-init.bin binary as in the kernel modules.
At the LM32 terminal the output simply stops, so there is no visible trace of booting or crashing.
This might be a duplicate of a colleague's report on the tracker...
We learn:
0-59/1 * * * * root modprobe -r fmc-trivial spec fmc; modprobe spec; modprobe fmc-trivial gateware=fmc/wrpc_v3.0.bin
in /etc/crontab is great for tracking down all kinds of issues...
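
To put a number on "five to ten", a small script could scan the syslog and report which reload was the one that failed. The following is only a minimal sketch in Python, not part of the driver or its tools: the function name failing_reload_index is made up for illustration, and it assumes that each reload shows up as one "spec <device>: remove" line and that the failure shows up as "Can't find SDB magic", as in the excerpt above.

```python
import re

# Assumption: one "spec <pci-address>: remove" line is logged per reload cycle,
# and the failure is logged as "Can't find SDB magic" (both taken from the
# log excerpt in this report; other driver versions may print something else).
REMOVE_RE = re.compile(r"\bspec \S+ remove\s*$")
FAILURE_RE = re.compile(r"Can't find SDB magic")


def failing_reload_index(lines):
    """Return the 1-based number of the reload cycle during which the first
    "Can't find SDB magic" message appears, or None if it never appears.

    `lines` is any iterable of log lines, for example an open file object.
    """
    reloads_seen = 0
    for line in lines:
        if REMOVE_RE.search(line):
            # A "remove" message marks the start of a new reload cycle.
            reloads_seen += 1
        elif FAILURE_RE.search(line):
            return reloads_seen
    return None
```

It can be fed an open log file, for instance failing_reload_index(open("/var/log/syslog")), where the path is an assumption and depends on the distribution. Counting from the last boot rather than from the start of the file would need an extra reset on the first kernel message after a reboot, which this sketch does not attempt.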
Yours, Tjeerd