Commit 9bb5dc70 authored by Adam Wujek, committed by Grzegorz Daniluk

docs/specifications/management: update wrs_failures

Add info about dependencies of TODO items
Signed-off-by: Adam Wujek <adam.wujek@cern.ch>
parent 8993c35c
...@@ -8,7 +8,7 @@ WR network.\\ ...@@ -8,7 +8,7 @@ WR network.\\
\item {\bf \emph{PPSi} went out of \texttt{TRACK\_PHASE}} \item {\bf \emph{PPSi} went out of \texttt{TRACK\_PHASE}}
\label{fail:timing:ppsi_track_phase} \label{fail:timing:ppsi_track_phase}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on ppsi shm)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave} \item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -16,8 +16,9 @@ WR network.\\ ...@@ -16,8 +16,9 @@ WR network.\\
that means something bad has happened and switch has lost the that means something bad has happened and switch has lost the
synchronization to its Master. synchronization to its Master.
\item [] \underline{SNMP objects}:\\ \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::ppsiServoState}\\ \texttt{WR-SWITCH-MIB::ppsiServoState} \emph{(implemented as string)}\\
\texttt{WR-SWITCH-MIB::ppsiServoStateN} \texttt{WR-SWITCH-MIB::ppsiServoStateN} \emph{(not implemented, as integer)}
%ppsiServoStateN shall contain the state as an integer taken from ppsi shm
\item [] \underline{Note}: we should also monitor PPSi state inside the \item [] \underline{Note}: we should also monitor PPSi state inside the
switch to build up the general WRS status word. switch to build up the general WRS status word.
\end{packed_enum} \end{packed_enum}
...@@ -25,7 +26,7 @@ WR network.\\ ...@@ -25,7 +26,7 @@ WR network.\\
\item {\bf Offset jump not compensated by Slave} \item {\bf Offset jump not compensated by Slave}
\label{fail:timing:offset_jump} \label{fail:timing:offset_jump}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on ppsi shm)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave} \item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -33,14 +34,15 @@ WR network.\\ ...@@ -33,14 +34,15 @@ WR network.\\
lost the link to its Master higher in the hierarchy or to external lost the link to its Master higher in the hierarchy or to external
clock), but Slave switch does not follow the jump. clock), but Slave switch does not follow the jump.
\item [] \underline{SNMP objects}:\\ \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::ppsiClockOffsetPs} \texttt{WR-SWITCH-MIB::ppsiClockOffsetPs} \emph{(implemented)}\\
\item [] \underline{Note}: add also 32-bit signed value of the offset. \texttt{WR-SWITCH-MIB::ppsiClockOffsetPsHR} \emph{(not implemented, 32 bit signed human readable value)}
\item [] \underline{Note}: add also 32-bit signed value of the offset. With saturation on overflow.
\end{packed_enum} \end{packed_enum}
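Review note on the "saturation on overflow" remark: a minimal sketch of what such a clamp could look like. It assumes the full-resolution offset is held as a signed 64-bit picosecond value; the function name `offset_ps_saturate32` is hypothetical and not taken from the WRS sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reduce a 64-bit picosecond offset to a 32-bit
 * signed value. Out-of-range inputs are clamped to INT32_MAX/INT32_MIN
 * instead of wrapping around, so a large offset can never be read as a
 * small one.
 */
static int32_t offset_ps_saturate32(int64_t offset_ps)
{
    if (offset_ps > INT32_MAX)
        return INT32_MAX;
    if (offset_ps < INT32_MIN)
        return INT32_MIN;
    return (int32_t)offset_ps;
}
```

With this approach a monitoring system reading the 32-bit object sees the extreme value for any offset larger than roughly 2.1 ms, which still indicates an error.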
\item {\bf Detected jump in the RTT value calculated by \emph{PPSI}} \item {\bf Detected jump in the RTT value calculated by \emph{PPSI}}
\label{fail:timing:rtt_jump} \label{fail:timing:rtt_jump}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave} \item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -58,7 +60,7 @@ WR network.\\ ...@@ -58,7 +60,7 @@ WR network.\\
$\Delta_{RXS}$ values are reported to the \emph{PPSi} daemon} $\Delta_{RXS}$ values are reported to the \emph{PPSi} daemon}
\label{fail:timing:deltas_report} \label{fail:timing:deltas_report}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on ppsi shm)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all} \item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -66,7 +68,7 @@ WR network.\\ ...@@ -66,7 +68,7 @@ WR network.\\
it won't be able to calculate a proper Master-to-Slave delay. Although it won't be able to calculate a proper Master-to-Slave delay. Although
the estimated offset in \emph{PPSi} is close to 0, WRS won't be the estimated offset in \emph{PPSi} is close to 0, WRS won't be
synchronized to Master with the sub-nanosecond accuracy. synchronized to Master with the sub-nanosecond accuracy.
\item [] \underline{SNMP objects}:\\ \item [] \underline{SNMP objects}: \emph{(not implemented)}\\
\texttt{WR-SWITCH-MIB::ppsiDeltaTxM.<n>}\\ \texttt{WR-SWITCH-MIB::ppsiDeltaTxM.<n>}\\
\texttt{WR-SWITCH-MIB::ppsiDeltaRxM.<n>}\\ \texttt{WR-SWITCH-MIB::ppsiDeltaRxM.<n>}\\
\texttt{WR-SWITCH-MIB::ppsiDeltaTxS.<n>}\\ \texttt{WR-SWITCH-MIB::ppsiDeltaTxS.<n>}\\
...@@ -76,7 +78,7 @@ WR network.\\ ...@@ -76,7 +78,7 @@ WR network.\\
\item {\bf \emph{SoftPLL} became unlocked} \item {\bf \emph{SoftPLL} became unlocked}
\label{fail:timing:spll_unlock} \label{fail:timing:spll_unlock}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on SoftPLL mem read)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all} \item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -89,7 +91,7 @@ WR network.\\ ...@@ -89,7 +91,7 @@ WR network.\\
clock down. In that case, the switch goes into Free-running mode and clock down. In that case, the switch goes into Free-running mode and
resets WR time. Later we will have a holdover to keep the Grand Master resets WR time. Later we will have a holdover to keep the Grand Master
switch disciplined in case it loses external reference. switch disciplined in case it loses external reference.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)} \item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\
\texttt{WR-SWITCH-MIB::spllMode}\\ \texttt{WR-SWITCH-MIB::spllMode}\\
\texttt{WR-SWITCH-MIB::spllSeqState}\\ \texttt{WR-SWITCH-MIB::spllSeqState}\\
\texttt{WR-SWITCH-MIB::spllAlignState}\\ \texttt{WR-SWITCH-MIB::spllAlignState}\\
...@@ -104,7 +106,7 @@ WR network.\\ ...@@ -104,7 +106,7 @@ WR network.\\
\item {\bf \emph{SoftPLL} has crashed/restarted} \item {\bf \emph{SoftPLL} has crashed/restarted}
\label{fail:timing:spll_crash} \label{fail:timing:spll_crash}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on SoftPLL mem read), (require changes in lm32 software)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all} \item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -112,7 +114,7 @@ WR network.\\ ...@@ -112,7 +114,7 @@ WR network.\\
either reset or random (if for some reason variables were overwritten either reset or random (if for some reason variables were overwritten
with junk values). In such case PLL becomes unlocked and switch is not with junk values). In such case PLL becomes unlocked and switch is not
able to provide synchronization to other devices. able to provide synchronization to other devices.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)} \item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\
\texttt{WR-SWITCH-MIB::spllIrqCnt} \texttt{WR-SWITCH-MIB::spllIrqCnt}
\item [] \underline{Note}: we need to have a similar mechanism as in the \item [] \underline{Note}: we need to have a similar mechanism as in the
\emph{wrpc-sw} to detect if the LM32 program has restarted because of \emph{wrpc-sw} to detect if the LM32 program has restarted because of
...@@ -140,7 +142,7 @@ WR network.\\ ...@@ -140,7 +142,7 @@ WR network.\\
\item {\bf PTP frames don't reach ARM} \item {\bf PTP frames don't reach ARM}
\label{fail:timing:no_frames} \label{fail:timing:no_frames}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on ppsi shm?)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all} \item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -154,11 +156,11 @@ WR network.\\ ...@@ -154,11 +156,11 @@ WR network.\\
\item \emph{wr\_nic.ko} driver crash \item \emph{wr\_nic.ko} driver crash
\item wrong VLANs configuration \item wrong VLANs configuration
\end{itemize} \end{itemize}
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\ \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::portPtpTxFrames.<n>}\\ \texttt{WR-SWITCH-MIB::portPtpTxFrames.<n>} \emph{(not implemented)}\\
\texttt{WR-SWITCH-MIB::portPtpRxFrames.<n>}\\ \texttt{WR-SWITCH-MIB::portPtpRxFrames.<n>} \emph{(not implemented)}\\
\texttt{WR-SWITCH-MIB::portLink.<n>}\\ \texttt{WR-SWITCH-MIB::portLink.<n>} \emph{(implemented)}\\
\texttt{WR-SWITCH-MIB::portMode.<n>} \texttt{WR-SWITCH-MIB::portMode.<n>} \emph{(implemented)}
\item [] \underline{Note}: If the kernel driver crashes, there is not much \item [] \underline{Note}: If the kernel driver crashes, there is not much
we can do. We end up with either our system frozen or a reboot. For we can do. We end up with either our system frozen or a reboot. For
wrong VLAN configuration and HDL problems we can monitor if PTP frames wrong VLAN configuration and HDL problems we can monitor if PTP frames
...@@ -183,13 +185,14 @@ WR network.\\ ...@@ -183,13 +185,14 @@ WR network.\\
Despite \emph{PPSi} offset being close to 0 \emph{ps}, the device won't Despite \emph{PPSi} offset being close to 0 \emph{ps}, the device won't
be properly synchronized. be properly synchronized.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\ \item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\
\texttt{WR-SWITCH-MIB::portSfpID.<n>}\\ \texttt{WR-SWITCH-MIB::portSfpID.<n>} \emph{(info available via hal shm)}\\
\texttt{WR-SWITCH-MIB::portSfpInDB.<n>} \texttt{WR-SWITCH-MIB::portSfpInDB.<n>}\\
\texttt{WR-SWITCH-MIB::portSfpGbE.<n>}
\item [] \underline{Note}: HAL should provide this info to the shared \item [] \underline{Note}: HAL should provide this info to the shared
memory, so in general we should build a per-port structure describing memory, so in general we should build a per-port structure describing
SFP info: SFP info:
\begin{packed_items} \begin{packed_items}
\item SFP ID (e.g. AXGE-1254-0531) \item SFP ID (e.g. AXGE-1254-0531) \emph{(available via halshm)}
\item Matched ? (to SFP database entries) \item Matched ? (to SFP database entries)
\item Gigabit ? (based on supported speeds read from i2c eeprom) \item Gigabit ? (based on supported speeds read from i2c eeprom)
\end{packed_items} \end{packed_items}
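Review note on the per-port SFP structure: a minimal sketch of how the three listed fields could be grouped and evaluated. All names here (`struct sfp_info`, `sfp_check`, the verdict values) are hypothetical, and the 16-character part number length is an assumption based on the size of the vendor part number field in the SFP EEPROM.

```c
#include <stdbool.h>

#define SFP_PN_LEN 16 /* assumed length of the vendor part number */

/* Hypothetical per-port SFP description, one entry per port. */
struct sfp_info {
    char id[SFP_PN_LEN + 1]; /* SFP ID, e.g. "AXGE-1254-0531" */
    bool in_db;              /* matched against the SFP database? */
    bool gbe;                /* Gigabit, per speeds read from the EEPROM? */
};

enum sfp_verdict {
    SFP_OK,             /* Gigabit SFP with a database entry */
    SFP_NOT_CALIBRATED, /* Gigabit SFP missing from the database */
    SFP_NO_TRAFFIC      /* not a 1-Gb SFP */
};

/* Classify one port's SFP; the non-Gigabit case is checked first. */
static enum sfp_verdict sfp_check(const struct sfp_info *sfp)
{
    if (!sfp->gbe)
        return SFP_NO_TRAFFIC;
    if (!sfp->in_db)
        return SFP_NOT_CALIBRATED;
    return SFP_OK;
}
```

The three fields correspond to the `portSfpID`, `portSfpInDB` and `portSfpGbE` objects listed in this item.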
...@@ -205,7 +208,7 @@ WR network.\\ ...@@ -205,7 +208,7 @@ WR network.\\
\item {\bf \emph{PPSi} process has crashed/restarted} \item {\bf \emph{PPSi} process has crashed/restarted}
\label{fail:timing:ppsi_crash} \label{fail:timing:ppsi_crash}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on monit)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all} \item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -213,9 +216,9 @@ WR network.\\ ...@@ -213,9 +216,9 @@ WR network.\\
capabilities. If, in the future, we will have another process that could capabilities. If, in the future, we will have another process that could
bring \emph{PPSi} back to life, such a restart would still create a time bring \emph{PPSi} back to life, such a restart would still create a time
jump and has to be reported. jump and has to be reported.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\ \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::ppsiBootCnt}\\ \texttt{WR-SWITCH-MIB::ppsiRunCnt} \emph{(not implemented)}\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \emph{(implemented)}
\item [] \underline{Note}: list of the processes has to be monitored, if \item [] \underline{Note}: list of the processes has to be monitored, if
\emph{PPSi} is there and if its PID has changed (it was restarted). \emph{PPSi} is there and if its PID has changed (it was restarted).
\end{packed_enum} \end{packed_enum}
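Review note on monitoring the process list: a minimal sketch of the "is it there, and has its PID changed" check. It assumes the caller obtains the daemon's current PID on each poll and passes 0 or a negative value when the process is not found. The names `proc_watch` and `proc_watch_update` are hypothetical, and the counter only illustrates one possible meaning of a run counter.

```c
/* Hypothetical result of one poll of the process list. */
enum proc_status { PROC_OK, PROC_MISSING, PROC_RESTARTED };

/* Hypothetical per-daemon watch state kept between polls. */
struct proc_watch {
    int last_pid;     /* PID seen on an earlier poll, 0 if never seen */
    unsigned run_cnt; /* number of distinct PIDs observed so far */
};

/*
 * Update the watch state with the PID found on this poll.
 * pid <= 0 means the process is absent from the process list.
 */
static enum proc_status proc_watch_update(struct proc_watch *w, int pid)
{
    if (pid <= 0)
        return PROC_MISSING;
    if (w->last_pid == 0) {
        /* First sighting: remember the PID, nothing to report. */
        w->last_pid = pid;
        w->run_cnt = 1;
        return PROC_OK;
    }
    if (pid != w->last_pid) {
        /* Same daemon under a new PID: it was restarted. */
        w->last_pid = pid;
        w->run_cnt++;
        return PROC_RESTARTED;
    }
    return PROC_OK;
}
```

A restart that happens to reuse the same PID would go unnoticed by this check, which is one argument for a counter maintained by whatever supervises the daemon.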
...@@ -223,7 +226,7 @@ WR network.\\ ...@@ -223,7 +226,7 @@ WR network.\\
\item {\bf \emph{HAL} process has crashed/restarted} \item {\bf \emph{HAL} process has crashed/restarted}
\label{fail:timing:hal_crash} \label{fail:timing:hal_crash}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on monit)}
\item [] \underline{Severity}: WARNING (but only after we modify PPSi so \item [] \underline{Severity}: WARNING (but only after we modify PPSi so
it reconnects to HAL, and HAL does not re-initialize SoftPLL after it reconnects to HAL, and HAL does not re-initialize SoftPLL after
crash) crash)
...@@ -232,7 +235,9 @@ WR network.\\ ...@@ -232,7 +235,9 @@ WR network.\\
If \emph{HAL} crashes, \emph{PPSi} is not able to communicate with If \emph{HAL} crashes, \emph{PPSi} is not able to communicate with
hardware i.e. read phase shift, get timestamps, phase shift the clock hardware i.e. read phase shift, get timestamps, phase shift the clock
etc. etc.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)} \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::halRunCnt} \emph{(not implemented)}\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \emph{(implemented)}
\item [] \underline{Note}: list of processes has to be monitored, if \item [] \underline{Note}: list of processes has to be monitored, if
\emph{wrsw\_hal} is there and if its PID has changed (it was restarted). \emph{wrsw\_hal} is there and if its PID has changed (it was restarted).
\end{packed_enum} \end{packed_enum}
...@@ -240,7 +245,7 @@ WR network.\\ ...@@ -240,7 +245,7 @@ WR network.\\
\item {\bf Wrong configuration applied} \item {\bf Wrong configuration applied}
\label{fail:timing:wrong_config} \label{fail:timing:wrong_config}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(to be done later)}
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Mode}: \emph{all} \item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -305,7 +310,7 @@ between devices connected to the ports.\\ ...@@ -305,7 +310,7 @@ between devices connected to the ports.\\
\item {\bf Link down} \item {\bf Link down}
\label{fail:data:link_down} \label{fail:data:link_down}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: DONE \emph{(to be changed later for switchover)}
\item [] \underline{Severity}: ERROR (will be WARNING with the \item [] \underline{Severity}: ERROR (will be WARNING with the
switch-over) switch-over)
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
...@@ -347,7 +352,7 @@ between devices connected to the ports.\\ ...@@ -347,7 +352,7 @@ between devices connected to the ports.\\
\item {\bf Problem with the \emph{SwCore} or Endpoint HDL module} \item {\bf Problem with the \emph{SwCore} or Endpoint HDL module}
\label{fail:data:swcore_hang} \label{fail:data:swcore_hang}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on HDL, then hal?)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
If any of these HDL modules hangs, there is usually not much the user If any of these HDL modules hangs, there is usually not much the user
...@@ -377,7 +382,7 @@ between devices connected to the ports.\\ ...@@ -377,7 +382,7 @@ between devices connected to the ports.\\
\item {\bf RTU is full and cannot accept more requests} \item {\bf RTU is full and cannot accept more requests}
\label{fail:data:rtu_full} \label{fail:data:rtu_full}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on HDL)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
If RTU is full for a given port, it's not able to accept more requests If RTU is full for a given port, it's not able to accept more requests
...@@ -393,7 +398,7 @@ between devices connected to the ports.\\ ...@@ -393,7 +398,7 @@ between devices connected to the ports.\\
\item {\bf Too much HP traffic / Per-priority queue full} \item {\bf Too much HP traffic / Per-priority queue full}
\label{fail:data:too_much_HP} \label{fail:data:too_much_HP}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on HDL)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
If we get too much High Priority traffic, then SwCore will be busy all If we get too much High Priority traffic, then SwCore will be busy all
...@@ -401,7 +406,7 @@ between devices connected to the ports.\\ ...@@ -401,7 +406,7 @@ between devices connected to the ports.\\
won't be flowing through the switch. In the extreme case, HP traffic won't be flowing through the switch. In the extreme case, HP traffic
queue may become full and we start losing HP frames, which is queue may become full and we start losing HP frames, which is
unacceptable. unacceptable.
\item [] \underline{SNMP objects}: \emph{(not fully implemented)}\\ \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::pstatsWR<n>.33} - HP frames on a port\\ \texttt{WR-SWITCH-MIB::pstatsWR<n>.33} - HP frames on a port\\
\texttt{WR-SWITCH-MIB::pstatsWR<n>.20} - Total number of Rx frames on \texttt{WR-SWITCH-MIB::pstatsWR<n>.20} - Total number of Rx frames on
the port\\ the port\\
...@@ -414,7 +419,7 @@ between devices connected to the ports.\\ ...@@ -414,7 +419,7 @@ between devices connected to the ports.\\
\item {\bf \emph{RTUd} has crashed} \item {\bf \emph{RTUd} has crashed}
\label{fail:data:rtu_crash} \label{fail:data:rtu_crash}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on monit)}
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
If RTUd crashed, traffic would be still routed between the WRS ports, but If RTUd crashed, traffic would be still routed between the WRS ports, but
...@@ -423,7 +428,9 @@ between devices connected to the ports.\\ ...@@ -423,7 +428,9 @@ between devices connected to the ports.\\
removed from the RTU table if a device is disconnected from port. Since removed from the RTU table if a device is disconnected from port. Since
there would be no learning, each frame with yet unknown destination MAC there would be no learning, each frame with yet unknown destination MAC
will be broadcast to all ports (within a VLAN). will be broadcast to all ports (within a VLAN).
\item [] \underline{SNMP objects}: \emph{(not yet implemented)} \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::rtuRunCnt} \emph{(not implemented)}\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \emph{(implemented)}
\item [] \underline{Note}: the list of processes has to be monitored, if \item [] \underline{Note}: the list of processes has to be monitored, if
\emph{RTUd} is there and if its PID has changed (it was restarted). \emph{RTUd} is there and if its PID has changed (it was restarted).
\end{packed_enum} \end{packed_enum}
...@@ -431,7 +438,7 @@ between devices connected to the ports.\\ ...@@ -431,7 +438,7 @@ between devices connected to the ports.\\
\item {\bf Network loop - two or more identical MACs on two or more ports} \item {\bf Network loop - two or more identical MACs on two or more ports}
\label{fail:data:net_loop} \label{fail:data:net_loop}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(to be done later)}
\item [] \underline{Severity}: ERROR \item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
In such case we have a ping-pong situation. If two ports receive frames In such case we have a ping-pong situation. If two ports receive frames
...@@ -447,7 +454,7 @@ between devices connected to the ports.\\ ...@@ -447,7 +454,7 @@ between devices connected to the ports.\\
\item {\bf Wrong configuration applied (e.g. wrong VLAN config)} \item {\bf Wrong configuration applied (e.g. wrong VLAN config)}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(to be done later)}
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
The same problem as described in the timing fault The same problem as described in the timing fault
...@@ -508,7 +515,7 @@ between devices connected to the ports.\\ ...@@ -508,7 +515,7 @@ between devices connected to the ports.\\
\item {\bf Any userspace daemon has crashed/restarted} \item {\bf Any userspace daemon has crashed/restarted}
\label{fail:other:daemon_crash} \label{fail:other:daemon_crash}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on monit)}
\item [] \underline{Severity}: ERROR / WARNING (depending on the process) \item [] \underline{Severity}: ERROR / WARNING (depending on the process)
\item [] \underline{Description}: \item [] \underline{Description}:
\item [] \underline{SNMP objects}:\\ \item [] \underline{SNMP objects}:\\
...@@ -595,7 +602,7 @@ wrs-192.168.16.242# devmem 0xfffffd04 ...@@ -595,7 +602,7 @@ wrs-192.168.16.242# devmem 0xfffffd04
\item {\bf System nearly out of memory} \item {\bf System nearly out of memory}
\label{fail:other:no_mem} \label{fail:other:no_mem}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(DONE?, create new object to report if error?)}
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Description}: \item [] \underline{Description}:
\item [] \underline{SNMP objects}:\\ \item [] \underline{SNMP objects}:\\
...@@ -612,11 +619,16 @@ wrs-192.168.16.242# devmem 0xfffffd04 ...@@ -612,11 +619,16 @@ wrs-192.168.16.242# devmem 0xfffffd04
\item {\bf CPU load too high} \item {\bf CPU load too high}
\label{fail:other:cpu} \label{fail:other:cpu}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(DONE?)}
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Description}: \item [] \underline{Description}:
\item [] \underline{SNMP objects}:\\ \item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::cpuLoad} \texttt{WR-SWITCH-MIB::cpuLoad} \emph{(not implemented)}\\
Can \texttt{HOST-RESOURCES-MIB::hrProcessorLoad} be used?
("The average, over the last minute, of the percentage
of time that this processor was not idle.
Implementations may approximate this one minute
smoothing period if necessary.")
\item [] \underline{Note}: similar situation as with the memory. We need \item [] \underline{Note}: similar situation as with the memory. We need
to monitor, report and alarm if CPU load is close to 100\% (but still to monitor, report and alarm if CPU load is close to 100\% (but still
enough to keep the system running). enough to keep the system running).
...@@ -625,7 +637,7 @@ wrs-192.168.16.242# devmem 0xfffffd04 ...@@ -625,7 +637,7 @@ wrs-192.168.16.242# devmem 0xfffffd04
\item {\bf Temperature inside the box too high} \item {\bf Temperature inside the box too high}
\label{fail:other:temp} \label{fail:other:temp}
\begin{packed_enum} \begin{packed_enum}
\item [] \underline{Status}: TODO \item [] \underline{Status}: TODO \emph{(depends on HDL)}
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
If the temperature rises too high we might break our electronics inside If the temperature rises too high we might break our electronics inside
...@@ -639,7 +651,7 @@ wrs-192.168.16.242# devmem 0xfffffd04 ...@@ -639,7 +651,7 @@ wrs-192.168.16.242# devmem 0xfffffd04
\item \emph{IC18} - temperature near the VCXO and PLLs (AD9516, \item \emph{IC18} - temperature near the VCXO and PLLs (AD9516,
CDCM6100) CDCM6100)
\end{itemize} \end{itemize}
\item [] \underline{SNMP objects}: \emph{(not yet implemented)} \item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\
\texttt{WR-SWITCH-MIB::tempFPGA}\\ \texttt{WR-SWITCH-MIB::tempFPGA}\\
\texttt{WR-SWITCH-MIB::tempScbPsu.1}\\ \texttt{WR-SWITCH-MIB::tempScbPsu.1}\\
\texttt{WR-SWITCH-MIB::tempScbPsu.2}\\ \texttt{WR-SWITCH-MIB::tempScbPsu.2}\\
...@@ -657,7 +669,7 @@ wrs-192.168.16.242# devmem 0xfffffd04 ...@@ -657,7 +669,7 @@ wrs-192.168.16.242# devmem 0xfffffd04
\item [] \underline{Severity}: WARNING \item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\ \item [] \underline{Description}:\\
If an unsupported Gigabit Fiber SFP is plugged into the cage, then it's a If an unsupported Gigabit Fiber SFP is plugged into the cage, then it's a
timing issue \ref{tail:timing:wrong_sfp}. However, if a non 1-Gb SFP is timing issue \ref{fail:timing:wrong_sfp}. However, if a non 1-Gb SFP is
used, then no Ethernet traffic would be flowing on that port. It's due used, then no Ethernet traffic would be flowing on that port. It's due
to the fact, that we don't have 10/100Mbit Ethernet implemented inside to the fact, that we don't have 10/100Mbit Ethernet implemented inside
the WRS. the WRS.
......
...@@ -300,21 +300,21 @@ not necessary that the running PTP engine is PPSi. ...@@ -300,21 +300,21 @@ not necessary that the running PTP engine is PPSi.
\vspace{12pt} \vspace{12pt}
(timing: \ref{fail:timing:ppsi_crash}, \ref{fail:timing:hal_crash}; data: (timing: \ref{fail:timing:ppsi_crash}, \ref{fail:timing:hal_crash}; data:
\ref{fail:data:rtu_crash}; other: \ref{fail:other:daemon_crash}) \ref{fail:data:rtu_crash}; other: \ref{fail:other:daemon_crash})
\item [] \texttt{WR-SWITCH-MIB::ppsiCrashCnt}\\ - how many times PPSi daemon \item [] \texttt{WR-SWITCH-MIB::ppsiRunCnt}\\ - how many times PPSi daemon
has crashed (timing: \ref{fail:timing:ppsi_crash}) has crashed (timing: \ref{fail:timing:ppsi_crash})
\item [] \texttt{WR-SWITCH-MIB::halCrashCnt}\\ - how many times HAL daemon \item [] \texttt{WR-SWITCH-MIB::halRunCnt}\\ - how many times HAL daemon
has crashed (timing: \ref{fail:timing:hal_crash}) has crashed (timing: \ref{fail:timing:hal_crash})
\item [] \texttt{WR-SWITCH-MIB::rtuCrashCnt}\\ - how many times RTU daemon \item [] \texttt{WR-SWITCH-MIB::rtuRunCnt}\\ - how many times RTU daemon
has crashed (data: \ref{fail:data:rtu_crash}) has crashed (data: \ref{fail:data:rtu_crash})
\item [] \texttt{WR-SWITCH-MIB::sshCrashCnt}\\ - how many times Dropbear \item [] \texttt{WR-SWITCH-MIB::sshRunCnt}\\ - how many times Dropbear
daemon has crashed (other: \ref{fail:other:daemon_crash}) daemon has crashed (other: \ref{fail:other:daemon_crash})
\item [] \texttt{WR-SWITCH-MIB::udhcpdCrashCnt}\\ - how many times DHCP daemon \item [] \texttt{WR-SWITCH-MIB::udhcpdRunCnt}\\ - how many times DHCP daemon
has crashed (other: \ref{fail:other:daemon_crash}) has crashed (other: \ref{fail:other:daemon_crash})
\item [] \texttt{WR-SWITCH-MIB::rsyslogCrashCnt}\\ - how many times rsyslog \item [] \texttt{WR-SWITCH-MIB::rsyslogRunCnt}\\ - how many times rsyslog
daemon has crashed (other: \ref{fail:other:daemon_crash}) daemon has crashed (other: \ref{fail:other:daemon_crash})
\item [] \texttt{WR-SWITCH-MIB::snmpdCrashCnt}\\ - how many times SNMP daemon \item [] \texttt{WR-SWITCH-MIB::snmpdRunCnt}\\ - how many times SNMP daemon
has crashed (other: \ref{fail:other:daemon_crash}) has crashed (other: \ref{fail:other:daemon_crash})
\item [] \texttt{WR-SWITCH-MIB::httpdCrashCnt}\\ - how many times HTTPd daemon \item [] \texttt{WR-SWITCH-MIB::httpdRunCnt}\\ - how many times HTTPd daemon
has crashed (other: \ref{fail:other:daemon_crash}) has crashed (other: \ref{fail:other:daemon_crash})
\item [] \texttt{WR-SWITCH-MIB::sysCnfDate}\\ - TAI seconds when last \item [] \texttt{WR-SWITCH-MIB::sysCnfDate}\\ - TAI seconds when last
time the configuration was changed time the configuration was changed
......