switching core stuck with errors: wrn_start_xmit: discarding tx frame that got no timestamp
Link between two switches, with VLANs. Link is: Master (8) <-> Slave (1)
Both switches report EXT_OFF (could be leftover from previous restart)
On master there are log entries in syslog:
2021-09-07T22:03:13.571764+00:00 ctdwa-774-cbt.cern.ch watchdog: Warning (/wr/bin/wrs_watchdog):Switching core stuck... resetting
2021-09-07T22:03:13.726284+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:14.101259+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:14.226278+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:15.159258+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:15.239231+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:15.239350+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:15.578197+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:16.077261+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:16.380182+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:16.484171+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:16.574603+00:00 ctdwa-774-cbt.cern.ch watchdog: Warning (/wr/bin/wrs_watchdog):Switching core stuck... resetting
2021-09-07T22:03:17.356149+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:17.575613+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:17.751123+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:17.808113+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:19.107715+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
2021-09-07T22:03:19.107834+00:00 ctdwa-774-cbt.cern.ch kernel: wrn_start_xmit: discarding tx frame that got no timestamp
Entries with wrn_start_xmit are also in dmesg.
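To compare how often the two kinds of entries show up before and after a restart, the log can be tallied with a small script. This is only a sketch: the function name `count_events` and the counter keys are made up for illustration, and the match strings are copied from the syslog excerpt above.

```python
from collections import Counter
from typing import Iterable

# Substrings copied from the syslog entries quoted in this issue.
DISCARD_MSG = "wrn_start_xmit: discarding tx frame that got no timestamp"
WATCHDOG_MSG = "Switching core stuck"


def count_events(lines: Iterable[str]) -> Counter:
    """Count discarded-TX-frame messages and watchdog resets in log lines."""
    counts = Counter()
    for line in lines:
        if DISCARD_MSG in line:
            counts["discarded_tx"] += 1
        elif WATCHDOG_MSG in line:
            counts["watchdog_reset"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 count_events.py
    for name, value in sorted(count_events(sys.stdin).items()):
        print(f"{name}: {value}")
```

Reading from standard input keeps the script usable with both syslog files and dmesg output.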
This error has been seen twice already: once on a WRS in the lab at GSI, and once at CERN during testing.
Restart of PPSI on master and slave does not change anything.
On the master there are the following messages in the syslog:
2021-09-08T00:09:58.378848+00:00 ctdwa-774-cbt.cern.ch snmpd[2248]: SNMP: Error wrsSwcoreStatus: Endpoint TX frames number (0) on port 8 (wri 8) does not match the number of frames forwarded from other ports (12) and NIC (107), some frames got lost... Difference is more than 5, since last check (53s)
Restarting HAL on the slave switch brought the sync back to the normal state. It was probably due to the link-down event.
According to tcpdump, frames sent from the slave are received on the master, but frames sent from the master are not received on the slave.
ifconfig down/up on the slave enables the slave to sync with the master. The same messages still appear in syslog on the master.
Restarting HAL on the slave did not help this time. The messages still appear on the master.
Restarting HAL on the master did help. The messages no longer appear on the master.
When wrn_start_xmit messages appear, there is no communication with the peer on one or more ports. The affected ports vary from restart to restart. How many and which ports are problematic can be read from the SNMP error messages in syslog.
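A possible way to pull the affected ports out of syslog automatically is sketched below. The regular expression is written against the single wrsSwcoreStatus message quoted above, so it assumes other occurrences use the same wording. The names `TxMismatch`, `find_tx_mismatches` and `problematic_ports` are hypothetical helpers, not part of the WRS software.

```python
import re
from typing import Iterable, List, NamedTuple


class TxMismatch(NamedTuple):
    port: int        # port number reported by snmpd
    tx_frames: int   # frames transmitted by the endpoint
    forwarded: int   # frames forwarded from other ports
    nic: int         # frames coming from the NIC


# Pattern modelled on the one snmpd message quoted in this issue (assumption:
# the wording is the same for every port).
PATTERN = re.compile(
    r"wrsSwcoreStatus: Endpoint TX frames number \((\d+)\) on port (\d+) "
    r"\(wri ?\d+\) does not match the number of frames forwarded from "
    r"other ports \((\d+)\) and NIC \((\d+)\)"
)


def find_tx_mismatches(lines: Iterable[str]) -> List[TxMismatch]:
    """Return one record per wrsSwcoreStatus TX mismatch message."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            tx, port, fwd, nic = (int(g) for g in match.groups())
            found.append(TxMismatch(port, tx, fwd, nic))
    return found


def problematic_ports(lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated list of ports with a TX mismatch."""
    return sorted({m.port for m in find_tx_mismatches(lines)})


if __name__ == "__main__":
    import sys

    # Example: python3 ports.py < syslog_excerpt.txt
    print(problematic_ports(sys.stdin))
```

Running this over the master's syslog after each restart would show whether the set of affected ports changes between restarts.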
Ideas to try:
- Turn off SFPs with the disable TX pin during FPGA programming
- Turn off SFPs with the disable TX pin during RX/TX calibration
- Connect other devices to non LPDC ports (13 and later)