White Rabbit Switch - Gateware issues
https://ohwr.org/project/wr-switch-hdl/issues
Updated: 2023-11-28T17:27:37Z

https://ohwr.org/project/wr-switch-hdl/issues/44
WRS unable to establish link occasionally
Updated: 2023-11-28T17:27:37Z
Author: Evangelia Gousiou

Reported by Adam: 5 times out of 4203, the WRS was not able to establish the link at all. In that case, unplugging and replugging the fiber/SFP or restarting PPSI does not help; HAL (or the whole WRS) has to be restarted to bring the WRS back to normal operation. I suspect the problem is that, for some reason, LPDC does not finish.
>> What's the state of the LPDC TX/RX FSMs on a port that doesn't work?
Logs from when it was caught, it seems the problem might be on master:
2023-11-24T03:54:13.204235+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):rxcal: early link flag lost on port wri8
master:
2023-11-24T03:41:08.424606+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:41:08.425768+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:41:08.426914+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:41:08.431031+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
2023-11-24T03:41:16.998222+00:00 ctdwa-774-cbt.cern.ch kernel: wri8: Link down.
2023-11-24T03:41:17.604912+00:00 ctdwa-774-cbt.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri8 went down, removing corresponding entries...
2023-11-24T03:41:27.308084+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri8: RX calibration complete at phase 12007 ps (after 50 attempts).
2023-11-24T03:41:27.397831+00:00 ctdwa-774-cbt.cern.ch kernel: wri8: Link up, lpa 0x4020.
2023-11-24T03:41:37.001693+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down:wri8: bitslide= 0 [ps]
2023-11-24T03:42:00.618326+00:00 ctdwa-774-cbt.cern.ch nslcd[1957]: [b2564f] <group/member="root"> ldap_result() failed: Can't contact LDAP server
2023-11-24T03:42:00.619862+00:00 ctdwa-774-cbt.cern.ch nslcd[1957]: [b2564f] <group/member="root"> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected
2023-11-24T03:42:00.687852+00:00 ctdwa-774-cbt.cern.ch crond[2228]: USER root pid 29811 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:42:06.596354+00:00 ctdwa-774-cbt.cern.ch kernel: wri8: Link down.
2023-11-24T03:42:06.702056+00:00 ctdwa-774-cbt.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri8 went down, removing corresponding entries...
2023-11-24T03:42:11.596550+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri8: RX calibration complete at phase 3974 ps (after 1 attempts).
2023-11-24T03:42:13.596090+00:00 ctdwa-774-cbt.cern.ch kernel: wri8: Link up, lpa 0x4020.
2023-11-24T03:42:26.498863+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down:wri8: bitslide= 0 [ps]
2023-11-24T03:42:52.056273+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:42:52.056458+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:42:52.056715+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:42:52.061027+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
2023-11-24T03:43:00.695657+00:00 ctdwa-774-cbt.cern.ch crond[2228]: user root: process already running: /etc/init.d/system_clock_monitor
2023-11-24T03:43:28.593270+00:00 ctdwa-774-cbt.cern.ch kernel: wri8: Link down.
2023-11-24T03:43:28.860962+00:00 ctdwa-774-cbt.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri8 went down, removing corresponding entries...
2023-11-24T03:43:34.806532+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri8: RX calibration complete at phase 8026 ps (after 10 attempts).
2023-11-24T03:43:45.318262+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:43:45.319554+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:43:45.320711+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:43:45.324582+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
2023-11-24T03:44:00.710447+00:00 ctdwa-774-cbt.cern.ch crond[2228]: USER root pid 29869 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:44:01.749348+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:44:01.753212+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:44:01.754610+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:44:01.756598+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
it seems the problem starts here
2023-11-24T03:44:49.014203+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):rxcal: early link flag lost on port wri8
2023-11-24T03:44:56.644020+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri8: RX calibration complete at phase 4021 ps (after 21 attempts).
2023-11-24T03:45:00.719163+00:00 ctdwa-774-cbt.cern.ch crond[2228]: user root: process already running: /etc/init.d/system_clock_monitor
2023-11-24T03:45:08.367335+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:45:08.367655+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:45:08.367832+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:45:08.372718+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
2023-11-24T03:46:00.730065+00:00 ctdwa-774-cbt.cern.ch crond[2228]: USER root pid 29961 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:46:09.908481+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):rxcal: early link flag lost on port wri8
2023-11-24T03:46:17.307324+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri8: RX calibration complete at phase 7958 ps (after 18 attempts).
2023-11-24T03:46:51.137965+00:00 ctdwa-774-cbt.cern.ch sshd[29983]: Connection closed by 172.18.203.151
2023-11-24T03:46:51.958644+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:46:51.959773+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:46:51.960917+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:46:51.964930+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
2023-11-24T03:47:00.738632+00:00 ctdwa-774-cbt.cern.ch crond[2228]: user root: process already running: /etc/init.d/system_clock_monitor
2023-11-24T03:47:30.402493+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):rxcal: early link flag lost on port wri8
2023-11-24T03:47:38.635117+00:00 ctdwa-774-cbt.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri8: RX calibration complete at phase 11962 ps (after 28 attempts).
2023-11-24T03:47:45.317432+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:47:45.317695+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:47:45.317875+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:47:45.322322+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPTPFramesFlowing: No RX PTP frames flowing for port 16 (wri16) which is up and in WR mode
2023-11-24T03:48:00.749683+00:00 ctdwa-774-cbt.cern.ch crond[2228]: USER root pid 30014 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:48:01.967617+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not for Gigabit Ethernet
2023-11-24T03:48:01.968808+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 10 (wri10) is not in the database. Change the SFP or declare port as non-wr or none
2023-11-24T03:48:01.972184+00:00 ctdwa-774-cbt.cern.ch snmpd[2251]: SNMP: Error wrsPortStatusSfpError: SFP in port 16 (wri16) is not in the database. Change the SFP or declare port as non-wr or none
slave
2023-11-24T03:39:33.528744+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_up: Port wri1 PDOWN detected
2023-11-24T03:39:33.985193+00:00 ctdwa-774-cbts1.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri1 went down, removing corresponding entries...
2023-11-24T03:39:39.882035+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri1: RX calibration complete at phase 3951 ps (after 7 attempts).
2023-11-24T03:39:40.013980+00:00 ctdwa-774-cbts1.cern.ch kernel: wri1: Link up, lpa 0x4020.
2023-11-24T03:39:59.001330+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down:wri1: bitslide= 0 [ps]
2023-11-24T03:40:00.632233+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 14238 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:40:05.448996+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):Adjust: counter = seconds [+1]
2023-11-24T03:40:08.390323+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):Adjust: counter = nanoseconds [+546127472]
2023-11-24T03:40:25.723060+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_up: Port wri1 PDOWN detected
2023-11-24T03:40:26.003348+00:00 ctdwa-774-cbts1.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri1 went down, removing corresponding entries...
2023-11-24T03:40:31.514560+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri1: RX calibration complete at phase 8017 ps (after 4 attempts).
2023-11-24T03:40:33.017478+00:00 ctdwa-774-cbts1.cern.ch kernel: wri1: Link up, lpa 0x4020.
2023-11-24T03:40:50.836963+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down:wri1: bitslide= 0 [ps]
2023-11-24T03:40:56.737084+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):Adjust: counter = seconds [+1]
2023-11-24T03:40:59.850986+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):Adjust: counter = nanoseconds [+92109824]
2023-11-24T03:41:00.097262+00:00 ctdwa-774-cbts1.cern.ch nslcd[2024]: [2a9e12] <group/member="root"> ldap_result() failed: Can't contact LDAP server
2023-11-24T03:41:00.098706+00:00 ctdwa-774-cbts1.cern.ch nslcd[2024]: [2a9e12] <group/member="root"> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected
2023-11-24T03:41:00.295934+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 14486 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:41:17.032124+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_up: Port wri1 PDOWN detected
2023-11-24T03:41:17.556825+00:00 ctdwa-774-cbts1.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri1 went down, removing corresponding entries...
2023-11-24T03:41:23.432166+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri1: RX calibration complete at phase 12007 ps (after 10 attempts).
2023-11-24T03:41:27.474755+00:00 ctdwa-774-cbts1.cern.ch kernel: wri1: Link up, lpa 0x4020.
2023-11-24T03:41:42.140448+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down:wri1: bitslide= 0 [ps]
2023-11-24T03:41:48.301821+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):Adjust: counter = nanoseconds [+184361904]
2023-11-24T03:42:00.036285+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 14743 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:42:06.453512+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_up: Port wri1 PDOWN detected
2023-11-24T03:42:06.468836+00:00 ctdwa-774-cbts1.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri1 went down, removing corresponding entries...
2023-11-24T03:42:13.685600+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):wri1: RX calibration complete at phase 3974 ps (after 18 attempts).
2023-11-24T03:42:13.895199+00:00 ctdwa-774-cbts1.cern.ch kernel: wri1: Link up, lpa 0x4020.
2023-11-24T03:42:31.561659+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down:wri1: bitslide= 0 [ps]
2023-11-24T03:43:00.049134+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 14986 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:43:28.844277+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_up: Port wri1 PDOWN detected
2023-11-24T03:43:29.629656+00:00 ctdwa-774-cbts1.cern.ch rtud: <30>Info (/wr/bin/wrsw_rtud):Port wri1 went down, removing corresponding entries...
2023-11-24T03:44:00.065847+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 15234 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:44:25.529494+00:00 ctdwa-774-cbts1.cern.ch nslcd[2024]: [5ac048] <passwd="sshd"> ldap_result() failed: Can't contact LDAP server
2023-11-24T03:44:25.530933+00:00 ctdwa-774-cbts1.cern.ch nslcd[2024]: [5ac048] <passwd="sshd"> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected
2023-11-24T03:44:25.809536+00:00 ctdwa-774-cbts1.cern.ch sshd[15360]: Connection closed by 188.185.10.249
no more link up...
2023-11-24T03:44:49.459321+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down: Port wri1 PDOWN detected
2023-11-24T03:45:00.080297+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 15498 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:46:00.094762+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 15760 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:46:10.274329+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down: Port wri1 PDOWN detected
2023-11-24T03:47:00.108966+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 16008 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:47:30.740415+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down: Port wri1 PDOWN detected
2023-11-24T03:48:00.121837+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 16261 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:48:51.348556+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down: Port wri1 PDOWN detected
2023-11-24T03:49:00.133833+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 16509 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:50:00.146875+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 16771 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:50:12.047549+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down: Port wri1 PDOWN detected
2023-11-24T03:51:00.160115+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 17024 cmd /etc/init.d/system_clock_monitor
2023-11-24T03:51:32.552597+00:00 ctdwa-774-cbts1.cern.ch hald: <30>Info (/wr/bin/wrsw_hal):port_fsm_state_link_down: Port wri1 PDOWN detected
2023-11-24T03:52:00.176450+00:00 ctdwa-774-cbts1.cern.ch crond[2284]: USER root pid 17276 cmd /etc/init.d/system_clock_monitor

https://ohwr.org/project/wr-switch-hdl/issues/43
Merge dmtd_with_deglitcher fix
Updated: 2023-10-30T17:21:08Z
Author: Evangelia Gousiou

Fix: https://ohwr.org/project/wr-cores/commit/ccd0d2d62642b2ba4644231f95898e3bf20605af
Original issue: https://ohwr.org/project/wrpc-sw/issues/61

https://ohwr.org/project/wr-switch-hdl/issues/42
RTU: reads of PCR register for CPU (port 18) returns always 0
Updated: 2023-07-07T16:05:24Z
Author: Adam Wujek

In RTU, reads of the PCR register for the CPU (port 18) always return 0.

https://ohwr.org/project/wr-switch-hdl/issues/41
Jumbo frames dropped after a particular rate
Updated: 2024-01-31T22:31:11Z
Author: Maciej Lipinski

=============== REPORT from Antonio.
We are evaluating version 6.0.1 for the WR Switches and we have an issue related to packet loss with jumbo frames.
With MTU >= 2000 we can see that the switches start to lose packets. You can find attached a document ([WR_Tests_6.0.1_v1_.pdf](/uploads/75a360cff9559b7863389fd1a885670e/WR_Tests_6.0.1_v1_.pdf)) with some experiments where version 5.0.1 works fine under different conditions, but version 6.0.1 does not behave the same way. You can see the wrsw_pstat outputs.

https://ohwr.org/project/wr-switch-hdl/issues/40
Low Phase Drift Calibration (LPDC)
Updated: 2020-06-08T14:46:43Z
Author: Grzegorz Daniluk

Low Phase Drift Calibration is a new feature added in release v6.0 that improves phase stability between WR switch restarts to <10ps. Due to FPGA limitations this functionality is present only on ports 1-12. The LPDC requires an additional, automated calibration procedure to run for the Tx and Rx paths of each affected FPGA transceiver. The Tx calibration is performed once for all ports at startup of the WR switch. The Rx calibration is performed each time a link goes up on a port. Both Tx and Rx calibration are indicated by blinking of the Link/WR Mode LED (left). The downside is that the calibration makes startup of the WR switch longer. Similarly, the time between connecting a fiber and the link going up (e.g. as observed in *wr_mon*) is noticeably longer.

https://ohwr.org/project/wr-switch-hdl/issues/39
Protocol-based VLAN classification to distinguish PTP/SNMP/ARP traffic
Updated: 2020-02-20T19:14:16Z
Author: Maciej Lipinski

see: https://ohwr.org/project/wr-switch-sw/issues/207

https://ohwr.org/project/wr-switch-hdl/issues/1
Port mirroring does not work
Updated: 2020-06-08T14:48:54Z
Author: Adam Wujek

The performed test had the following setup:
* traffic running on port 1 (PTP, LLDP and application)
* sniffer connected to port 2

The expected result was that all the traffic would be redirected from
port 1 to port 2.
Configuration of a switch:
# disable port mirroring in case it was enabled
# read RX_CTR register
devmem 0x10060014 32
# remove flag 0x20 (MR_ENA) from the given result (0x39 & ~0x20 = 0x19)
devmem 0x10060014 32 0x19
# Configure destination port to port 2
devmem 0x10060024 32 0
devmem 0x10060028 32 0x2
# Configure reception traffic mirror source to port 1
devmem 0x10060024 32 2
devmem 0x10060028 32 0x1
# Configure transmission traffic mirror source
devmem 0x10060024 32 3
devmem 0x10060028 32 0x1
# Enable port mirroring
# read RX_CTR register
devmem 0x10060014 32
# add flag 0x20 (MR_ENA) to the given result (0x19 | 0x20 = 0x39)
devmem 0x10060014 32 0x39
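
The two RX_CTR writes above amount to a read-modify-write of a single flag. A minimal Python sketch of that bit arithmetic follows. It assumes, as the comments in the report state, that MR_ENA is bit 0x20 of the register; the function names are hypothetical and nothing here touches hardware.

```python
MR_ENA = 0x20          # mirroring-enable flag in RX_CTR, value taken from the report
REG_MASK = 0xFFFFFFFF  # the register is accessed as a 32-bit word


def disable_mirroring(rx_ctr):
    """Return the register value with MR_ENA cleared, other bits unchanged."""
    return rx_ctr & ~MR_ENA & REG_MASK


def enable_mirroring(rx_ctr):
    """Return the register value with MR_ENA set, other bits unchanged."""
    return (rx_ctr | MR_ENA) & REG_MASK


if __name__ == "__main__":
    # Values to write back with devmem after reading RX_CTR.
    print(hex(disable_mirroring(0x39)))
    print(hex(enable_mirroring(0x19)))
```

Computing the value from whatever was actually read, instead of hard-coding it, avoids overwriting unrelated RX_CTR bits if the register content differs from the one seen in this test.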
Additional observations:
* on port 2, LLDP packets were transmitted from the CPU (should not happen)

https://ohwr.org/project/wr-switch-hdl/issues/2
PTP frames not sent after reboot despite link up
Updated: 2020-06-08T14:49:39Z
Author: Grzegorz Daniluk

In the setup with 2 WR PTP Cores (v4.0) connected to the WR Switch v5.0,
both nodes were synchronized. I restarted the switch (by executing
*reboot* in the Linux shell) and after it booted up again, only one WRPC
was synchronized; the other was stuck in listening, not receiving any
traffic.
Having a quick look at the WRS, it appears that despite the link being
up, no frames are sent on one of the ports. In SNMP, the PTP Tx frames
counter increments on that port, and in Pstats the NIC\_Tx counter also
increments. However, the Tx frames counter in the Endpoint remains 0.
Running `ifconfig wriX down; ifconfig wriX up` fixes the problem.

https://ohwr.org/project/wr-switch-hdl/issues/3
update Manifest.py files to support hdlmake v2.1
Updated: 2019-02-12T09:51:29Z
Author: Grzegorz Daniluk
Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/4
Improve multiport Linked List inside the Switching Core
Updated: 2019-02-12T09:51:29Z
Author: Grzegorz Daniluk

In some situations the multiport linked list (MLL) module is a
performance bottleneck of the Switching Core. This happens especially
when Switching Core has to handle high priority (HP) frames. In this
case, since our switch is cut-through, we have a race condition between
the Input Block (IB) requesting memory pages to receive a frame and
Output Block (OB) reading validated pages and sending data to the
Endpoint.
If the MLL cannot quickly provide a new memory page to the IB, the
sending FSM in the OB will stall, waiting for the next page to be
completely written (or to have the end-of-frame bit set). If the OB
stalls, the Tx PCS in the Endpoint can underrun and the frame is cut (as
a consequence, only a fragment with incorrect CRC is received on the
other side of the link).
The Multiport Linked List should be rewritten to serve requests
from all 18 ports of the switch more efficiently.

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/5
Limit the ports from which frames with given VID are accepted
Updated: 2019-07-15T09:27:12Z
Author: Adam Wujek

Right now packets with VID not configured on a given port are forwarded
to all ports configured with this VID. Limit the ports from which frames
with given VID are accepted.

https://ohwr.org/project/wr-switch-hdl/issues/6
rx pcs should accept preamble shrinkage
Updated: 2020-06-08T14:26:18Z
Author: Grzegorz Daniluk

In some cases (depends on the transceiver implementation on the other
side of the link) the preamble, which is normally 6 bytes, can be shrunk
to 5 bytes. More information about the reasons for preamble shrinkage
can be found e.g. in the [Xilinx Ethernet PCS/PMA product
guide](http://www.xilinx.com/support/documentation/ip_documentation/gig_ethernet_pcs_pma/v15_1/pg047-gig-eth-pcs-pma.pdf).
Although this situation does not occur with our TX PCS, we should be able
to handle it on the RX side to allow connecting various non-WR devices to
the WR switch.

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/7
Data errors when RXCLK offset > 30 ppm
Updated: 2019-02-12T09:51:31Z
Author: Ilia Slepnev

Problem: "Invalid code" RX errors under low and moderate traffic load.
Test setup:
WR Node \<-fiber-\> WRS3-18 \<-UTP-\> LAN switch
Test software is communicating with WR Node through LAN and WR Switch.
Traffic rate is about 2000 packets per second bidirectionally.
Different LAN switches have been tested. Some worked without errors;
others exhibited 'invalid code' RX errors and packets were dropped.
There's a strong correlation between the error rate and the TX\_CLK
frequency offset on the LAN switch port.
No errors occur with switches that have a TX\_CLK frequency offset below
30 ppm, i.e. in the range 124,996,250 - 125,003,750 Hz.
Many errors occur with switches that have a TX\_CLK offset greater than
75 ppm but still below the 100 ppm defined by the standard.
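
To make the quoted frequency window easy to check, here is a small sketch of the ppm arithmetic. It assumes the 125 MHz nominal clock implied by the range given above; the helper names are illustrative only.

```python
NOMINAL_HZ = 125_000_000  # assumed nominal clock, consistent with the quoted range


def ppm_window(ppm, nominal_hz=NOMINAL_HZ):
    """Return (low, high) frequency bounds in Hz for a +-ppm tolerance."""
    delta = nominal_hz * ppm / 1_000_000
    return nominal_hz - delta, nominal_hz + delta


def offset_ppm(measured_hz, nominal_hz=NOMINAL_HZ):
    """Return the offset of a measured clock from nominal, in ppm."""
    return (measured_hz - nominal_hz) * 1_000_000 / nominal_hz


if __name__ == "__main__":
    for tolerance in (30, 75, 100):
        low, high = ppm_window(tolerance)
        print(f"+-{tolerance} ppm: {low:,.0f} Hz to {high:,.0f} Hz")
```

With these helpers, a recovered clock measured at 125,010,000 Hz corresponds to an offset of 80 ppm, which is inside the 100 ppm limit but well outside the 30 ppm window in which no errors were seen.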
TX\_CLK frequency offset of LAN switch port is measured with custom
board with LSI ET1011C Gigabit Ethernet PHY chipset. It has RX\_CLK pin
(recovered clock) that is monitored by spectrum analyser with 1 ppm
accuracy.
The results of the clock offset measurements in a series of commercial
HP switches are very unexpected. Modern HP 3800 and 2910al have an
offset of +80 ppm +- 6 ppm.
This is near the limit allowed by the standard. I can't believe that it
was a series of bad crystals with this large offset. It's systematic.
The measurement report is attached.
The rx code error rate does not depend on transceiver type used. Same
results with WDM (BX), GLC-LH-SM and copper GLC-T transceivers. There's
no dependence on specific WR Switch or port. Multiple WR switches tested
with same result. WRS ports 2-18 behave similarly. Port 1 (slave) was
not used for tests.
For WR Switch to comply with IEEE 802.3 standard it should handle +- 100
ppm clock offsets.
### Files
* [Ethernet_Switches_-_TX_Clock_Offset_-_Copy_of_Sheet1.pdf](/uploads/99a1b5b7dc06b6eb711af1d130ad1865/Ethernet_Switches_-_TX_Clock_Offset_-_Copy_of_Sheet1.pdf)

https://ohwr.org/project/wr-switch-hdl/issues/8
improve PSTATs for timing closure
Updated: 2019-02-12T09:51:31Z
Author: Grzegorz Daniluk

PSTATs module has a very long chain of combinational logic. This causes
problems with synthesis timing closure when building gateware for full
18-port switch.

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/9
add HDL watchdog module
Updated: 2019-02-12T09:51:32Z
Author: Grzegorz Daniluk

We did some serious corner-case testing of the HDL and we have
identified (and fixed) quite a lot of bugs. The stability of the HDL was
very much improved and now it can withstand high loads of Ethernet
traffic going through the switch. However, we should be able to monitor
and reset Ethernet-switching HDL modules in case some unexplored
corner case still causes them to hang.

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/10
add SDB support
Updated: 2019-02-12T09:51:33Z
Author: Grzegorz Daniluk

WRS HDL should use SDB record to describe all the Wishbone modules with
base addresses. In addition to that we should use the SDB metadata to
export the synthesis information (build date, version, commit hash from
wr-switch-hdl, wr-cores and general-cores repositories).

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/11
SoftPLL does not increment DelCnt when 10MHz unplugged
Updated: 2019-02-12T09:51:33Z
Author: Grzegorz Daniluk

When SoftPLL is locked to 10MHz input in GrandMaster mode, then it does
not report delock (by incrementing DelCnt) when 10MHz becomes unplugged.

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/12
provide pstats counter to count all the frames that should be sent on a wrX interface
Updated: 2019-02-12T09:51:34Z
Author: Grzegorz Daniluk

Currently we have RTUfwd counter, but it counts only RTU decisions for
traffic forwarded from another wrX port. We should count also frames
coming from NIC and being forwarded to given WR interface.

Assignee: Grzegorz Daniluk

https://ohwr.org/project/wr-switch-hdl/issues/13
sometimes after WRS reboot one of the ports does not receive any traffic
Updated: 2019-02-12T09:51:34Z
Author: Grzegorz Daniluk

While testing WRS with SmartBits network tester I discovered that once
in a while after the reboot one of the endpoints does not receive any
traffic. PSTATS counters show that this port has generated one PCS
overflow event and after that there are no other events.

https://ohwr.org/project/wr-switch-hdl/issues/14
Synchronization between endpoint status/config and RTU forwarding decision
Updated: 2019-02-12T09:51:35Z
Author: Maciej Lipinski

The status/config of WR port (up/down) managed/interfaced by ifconfig or
wrsw\_hal concerns only the Endpoints. This information is not
synchronized with the RTU. This means that the RTU forwards frames to
ports which are down, if such a port is enabled in the RTU config
register. This is essentially not harmful if all works well. There are
currently two cases where it causes problems:

1. Autonegotiation fails - it happens sometimes with 1GbE devices. In
   such a case, in the end, the frames are successfully
   received/sent/forwarded but the port is seen by ifconfig as down.
2. Frames are forwarded to all ports, even if not all ports are up -
   this means reading from swcore by all ports - it triggers "high-load"
   bugs, even if not all ports are connected.

It seems that the status/config of each Endpoint should be taken into
account in the RTU forwarding decision (AND the status/config mask with
the final decision of the RTU).

Assignee: Grzegorz Daniluk
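
A minimal sketch of the proposed masking, written in Python for illustration rather than HDL. It assumes that the RTU decision and the Endpoint link status are both available as bitmasks with one bit per port (bit 0 being the first port) and that 18 ports are covered; the names and the bit layout are assumptions, not taken from the gateware.

```python
NUM_PORTS = 18  # assumed: one bit per WR port
ALL_PORTS = (1 << NUM_PORTS) - 1


def mask_forwarding_decision(rtu_decision, port_up_mask):
    """AND the RTU's destination-port mask with the mask of ports that are up."""
    return rtu_decision & port_up_mask & ALL_PORTS


if __name__ == "__main__":
    broadcast = ALL_PORTS              # RTU decides to forward to every port
    ports_up = (1 << 0) | (1 << 7)     # only the first and eighth ports have link
    print(bin(mask_forwarding_decision(broadcast, ports_up)))
```

With such a mask in place, a broadcast decision would only cause the ports that are actually up to read the frame from the swcore, which is the behaviour the second case above asks for.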