Commit f7aea4ee authored by Adam Wujek

Merge greg-failures branch into master

Improve doc/wrs_failures document.
Signed-off-by: Adam Wujek <adam.wujek@cern.ch>
parents dea09a23 065bef73
......@@ -4,10 +4,14 @@ all : wrs_failures.pdf
RELEASE = $(shell git describe --always --dirty)
wrs_failures.pdf : wrs_failures.tex fail.tex intro.tex snmp_exports.tex
wrs_failures.pdf : wrs_failures.tex fail.tex intro.tex snmp_exports.tex snmp_objects.tex
@echo '\\newcommand{\\gitrevinfo}{'$(RELEASE)'}' > revinfo.tex
pdflatex wrs_failures.tex
pdflatex wrs_failures.tex
# To speed up generation of document for development, please comment out:
# % print alphabetical list
# \printnoidxglossary[type=snmp_all,style=tree,sort=letter]
# from doc/wrs_failures/snmp_exports.tex file.
clean :
rm -f *.eps *.dat *.log *.out *.aux *.dvi *.ps *.toc *.pdf revinfo.tex
......
This section presents an example of how the operator of a WR Switch could
diagnose a problem and apply the appropriate procedure, based on the
general status objects described in section \ref{sec:snmp_exports:basic}. The
screenshots included in this example were taken from the \emph{Diamon} tool
used at CERN for diagnostics. Any other SNMP manager (like \emph{Nagios} or
\emph{Icinga}) can be used to fetch the value of these status objects.
\begin{enumerate}
\item The operator gets an e-mail/SMS alarm or notices in the SNMP manager
that the status of a WR Switch has changed to \texttt{Error}
(fig.\ref{fig:diamon:wrs_error}).
\item By checking the general status objects
(\texttt{\glshyperlink{WR-SWITCH-MIB::wrsOSStatus}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsTimingStatus}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsNetworkingStatus}}) one can see
that the problem is reported by the synchronization subsystem. The value of
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsTimingStatus}} is \emph{2} (error)
(fig.\ref{fig:diamon:wrs_sync_error}).
\item Following the tree structure of status objects from figure
\ref{fig:snmp_oper}, if
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsTimingStatus}} reports an error,
then status objects: \texttt{\glshyperlink{WR-SWITCH-MIB::wrsPTPStatus}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSoftPLLStatus}},\\
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSlaveLinksStatus}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsPTPFramesFlowing}} should be
checked. In this example
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSlaveLinksStatus}} reports an error
(fig.\ref{fig:diamon:slave_link_error}).
\item The operator should search section \ref{sec:snmp_exports:basic} for
the procedure to follow when
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSlaveLinksStatus}} reports an error.
\item In this example, the WR Switch that reports a problem works in the
Boundary Clock mode, which means that the first step according to the
procedure is to check the fiber connection on the slave port.
\item Plugging the fiber into the slave port fixes the problem and the WR
Switch no longer reports errors (fig.\ref{fig:diamon:wrs_ok}).
\end{enumerate}
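
The walk through the status objects in the example above can be sketched in a few lines of Python. This is only an illustration, not part of the WRS tooling: the function name, the sample values and the use of \emph{1} as a placeholder for "no error" are assumptions, and only the value \emph{2} (error) and the object names come from the text.

```python
# Value 2 means "error", as in the example above. Any other value is treated
# as "no error" here, which is an assumption of this sketch.
STATUS_ERROR = 2

# General status objects checked first by the operator.
GENERAL_STATUS = ["wrsOSStatus", "wrsTimingStatus", "wrsNetworkingStatus"]

# Parent -> children, limited to the objects named in the example.
STATUS_TREE = {
    "wrsTimingStatus": [
        "wrsPTPStatus",
        "wrsSoftPLLStatus",
        "wrsSlaveLinksStatus",
        "wrsPTPFramesFlowing",
    ],
}


def failing_objects(values):
    """Return the most detailed status objects that report an error.

    `values` maps object names to the numbers fetched by an SNMP manager.
    """
    failing = []
    for name in GENERAL_STATUS:
        if values.get(name) != STATUS_ERROR:
            continue
        children = [child for child in STATUS_TREE.get(name, [])
                    if values.get(child) == STATUS_ERROR]
        # Fall back to the general object if no child narrows the problem down.
        failing.extend(children if children else [name])
    return failing


# Hypothetical values matching the scenario described above.
sample = {
    "wrsOSStatus": 1,
    "wrsTimingStatus": 2,
    "wrsNetworkingStatus": 1,
    "wrsPTPStatus": 1,
    "wrsSoftPLLStatus": 1,
    "wrsSlaveLinksStatus": 2,
    "wrsPTPFramesFlowing": 1,
}
print(failing_objects(sample))  # ['wrsSlaveLinksStatus']
```

The returned name is the object whose procedure the operator would then look up in section \ref{sec:snmp_exports:basic}.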
\begin{figure}
\begin{center}
\includegraphics[width=.9\textwidth]{img/wrs_error.png}
\caption{SNMP manager reports an error on a WR Switch}
\label{fig:diamon:wrs_error}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=.9\textwidth]{img/wrs_sync_error.png}
\caption{WR Switch has a problem with the synchronization subsystem}
\label{fig:diamon:wrs_sync_error}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=.9\textwidth]{img/wrs_link_error.png}
\caption{\texttt{wrsSlaveLinksStatus} object reports an error}
\label{fig:diamon:slave_link_error}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=.9\textwidth]{img/wrs_ok.png}
\caption{WR Switch does not report any errors}
\label{fig:diamon:wrs_ok}
\end{center}
\end{figure}
This section tries to identify all the possible ways the White Rabbit Switch can
fail. The structure of each error description is the following:
\begin{itemize}[leftmargin=0pt]
\item [] \underline{Status}: describes the implementation status of the WRS
diagnostics detecting the fault. Can be one of the following:
\begin{packed_items}
\item DONE: all the SNMP objects are implemented and the problem is
reported by a switch
\item TODO: not all of the SNMP objects are implemented yet, the
problem is either reported only in some situations or not reported at
all
\item \emph{for later}: the problem concerns functionality that is not yet
present in the stable release of the WR switch firmware i.e. it will
never happen with the current stable firmware release.
\end{packed_items}
\item [] \underline{Severity}: describes how critical the fault is. Currently
we distinguish two severity levels:
\begin{packed_items}
\item WARNING - means that despite the fault the synchronization and
Ethernet switching functionality were not affected so the switch behaves
correctly in the WR network.
\item ERROR - means that the fault is critical and most probably a WR
switch misbehaves in a WR network, possibly also causing problems to
other WR devices connected to this switch.
\end{packed_items}
\item [] \underline{Mode}: for timing failures, it describes which modes are
affected. Possible values are:
\begin{packed_items}
\item \emph{Boundary Clock} - the WR Switch has at least one Slave port
synchronized to another WR device higher in the timing hierarchy (though
it may be also Master to other WR/PTP devices lower in the timing
hierarchy).
\item \emph{Grand Master} - the WR Switch at the top of the
synchronization hierarchy. It is synchronized to an external clock (e.g.
GPS, Cesium) and provides timing to other WR/PTP devices.
\item \emph{Free-Running Master} - the WR Switch at the top of the
synchronization hierarchy. It provides timing to other WR/PTP devices
but runs from a local oscillator (not synchronized to an external
clock).
\item \emph{all} - any WR switch can be affected regardless of the timing
mode.
\end{packed_items}
\item [] \underline{Description}: What the problem is about, how important it
is and what the effects are if it occurs.
\item [] \underline{SNMP objects}: Which SNMP objects should be monitored to
detect the failure. These may be objects from \texttt{WR-SWITCH-MIB} or one
of the standard MIBs used by \emph{net-snmp}.
\item [] \underline{Notes}: Optional comment for the SNMP implementation. It
may describe the current implementation of ideas or how to implement it in
the future.
\end{itemize}
\subsection{Timing error}
As a timing error we define WR Switch not being able to provide its slave
\label{sec:timing_fail}
As a timing error we define the WR Switch not being able to provide its slave
nodes/switches with correct timing information consistent with the rest of the
WR network.
......@@ -7,168 +61,191 @@ WR network.
\subsubsection{\bf \emph{PTP/PPSi} went out of \texttt{TRACK\_PHASE}}
\label{fail:timing:ppsi_track_phase}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Mode}: \emph{Boundary Clock}
\item [] \underline{Description}:\\
If the \emph{PTP/PPSi} WR servo goes out of the \texttt{TRACK\_PHASE}
state, this means something bad has happened and switch lost the
state, this means something bad has happened and the switch lost the
synchronization to its Master.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPtpServoState.<n>} -- PTP servo state as string\\
\texttt{WR-SWITCH-MIB::wrsPtpServoStateN.<n>} -- PTP servo state as number\\
\texttt{WR-SWITCH-MIB::wrsPtpServoStateErrCnt}\\
\texttt{WR-SWITCH-MIB::wrsPTPStatus}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPtpServoState.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpServoStateN.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpServoStateErrCnt.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPTPStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}: PTP servo state is exported as a string and a number.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Offset jump not compensated by Slave}
\label{fail:timing:offset_jump}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Mode}: \emph{Boundary Clock}
\item [] \underline{Description}:\\
This may happen if Master resets its WR time counters (e.g. because it
lost the link to its Master higher in the hierarchy or to external
clock), but Slave switch does not follow the jump.
This may happen if the Master resets its WR time counters (e.g. because
it lost the link to its Master higher in the hierarchy or to external
clock), but the WR Slave does not follow the jump.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPtpClockOffsetPs.<n>} -- value of the offset in ps\\
\texttt{WR-SWITCH-MIB::wrsPtpClockOffsetPsHR.<n>} -- 32-bit signed value of the offset in ps; with
saturation on overflow and underflow\\
\texttt{WR-SWITCH-MIB::wrsPtpClockOffsetErrCnt}\\
\texttt{WR-SWITCH-MIB::wrsPTPStatus}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPtpClockOffsetPs.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpClockOffsetPsHR.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpClockOffsetErrCnt.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPTPStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
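
The \texttt{wrsPtpClockOffsetPsHR} object is described above as a 32-bit signed offset in ps with saturation on overflow and underflow. A minimal Python sketch of what such saturation means for a monitoring script follows. The function names are hypothetical and this is not the switch's implementation.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def saturate_int32(offset_ps):
    """Clamp an offset in ps to the range of a 32-bit signed integer."""
    return max(INT32_MIN, min(INT32_MAX, offset_ps))


def is_saturated(value):
    """True if a reported value sits at one of the 32-bit limits.

    A value at a limit only says that the real offset is at least that
    large in magnitude, so the full-range object should be consulted.
    """
    return value in (INT32_MIN, INT32_MAX)


print(saturate_int32(5_000_000_000))   # 2147483647
print(saturate_int32(-120))            # -120
```

With picosecond units, the 32-bit range covers only about $\pm 2.1$ ms, so larger offset jumps show up as a saturated value.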
\subsubsection{\bf Detected jump in the RTT value calculated by \emph{PTP/PPSi}}
\label{fail:timing:rtt_jump}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Mode}: \emph{Boundary Clock}
\item [] \underline{Description}:\\
Once WR link is established round-trip delay (RTT) can change smoothly
due to the temperature variations. If a sudden jump is detected, that
means erroneous timestamp was generated either on Master or Slave side.
Once a WR link is established the round-trip delay (RTT) can change
smoothly due to the temperature variations. However, if a sudden jump is
detected, that means that an erroneous timestamp was generated either on
the Master or the Slave side.
One cause of that could be the wrong value of the t24p transition point.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPtpRTT.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPtpRTTErrCnt}\\
\texttt{WR-SWITCH-MIB::wrsPTPStatus}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPtpRTT.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpRTTErrCnt.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPTPStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Wrong $\Delta_{TXM}$, $\Delta_{RXM}$, $\Delta_{TXS}$,
$\Delta_{RXS}$ values are reported to the \emph{PTP/PPSi} daemon}
\label{fail:timing:deltas_report}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\
If \emph{PTP/PPSi} doesn't get the correct values of fixed hardware delays,
it won't be able to calculate a proper Master-to-Slave delay. Although
the estimated offset in \emph{PTP/PPSi} is close to 0, WRS won't be
synchronized to Master with the sub-nanosecond accuracy.
the estimated offset in \emph{PTP/PPSi} is close to 0, the WRS won't be
synchronized to the Master with the sub-nanosecond accuracy.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPtpDeltaTxM.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPtpDeltaRxM.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPtpDeltaTxS.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPtpDeltaRxS.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPTPStatus}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPtpDeltaTxM.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpDeltaRxM.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpDeltaTxS.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPtpDeltaRxS.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPTPStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf \emph{SoftPLL} became unlocked}
\label{fail:timing:spll_unlock}
\begin{packed_enum}
\item [] \underline{Status}: DONE
\begin{pck_descr}
\item [] \underline{Status}: DONE (to be improved with holdover)
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\
If \emph{SoftPLL} loses lock, for any reason, Slave or Grand Master
switch can no longer be syntonized and phase aligned with its time
source. WRS in Free-running mode without properly locked Helper PLL is
not able to perform reliable phase measurements for enhancing Rx
timestamps resolution. For Grand Master the reason of \emph{SoftPLL}
going out of lock might be disconnected 1-PPS/10MHz signals or external
clock down. In that case, the switch goes into Free-running mode and
resets WR time. Later we will have a holdover to keep the Grand Master
switch disciplined in case it loses external reference.
If the \emph{SoftPLL} loses lock, for any reason, a Boundary Clock or
Grand Master switch can no longer be syntonized and phase aligned with
its time source. A WRS in Free-running mode without a properly locked
Helper PLL is not able to perform reliable phase measurements for
enhancing the Rx timestamp resolution. For a Grand Master, the
\emph{SoftPLL} may go out of lock because the 1-PPS/10MHz signals are
disconnected or the external clock is down. In that case, the switch
goes into Free-running mode and resets the WR time. Later we will have a
holdover to keep the Grand Master switch disciplined in case it loses
the external reference.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsSpllMode}\\
\texttt{WR-SWITCH-MIB::wrsSpllSeqState}\\
\texttt{WR-SWITCH-MIB::wrsSpllAlignState}\\
\texttt{WR-SWITCH-MIB::wrsSpllHlock}\\
\texttt{WR-SWITCH-MIB::wrsSpllMlock}\\
\texttt{WR-SWITCH-MIB::wrsSpllDelCnt}\\
\texttt{WR-SWITCH-MIB::wrsSoftPLLStatus}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsSpllMode}\\
\snmpadd{WR-SWITCH-MIB::wrsSpllSeqState}\\
\snmpadd{WR-SWITCH-MIB::wrsSpllAlignState}\\
\snmpadd{WR-SWITCH-MIB::wrsSpllHlock}\\
\snmpadd{WR-SWITCH-MIB::wrsSpllMlock}\\
\snmpadd{WR-SWITCH-MIB::wrsSpllDelCnt}\\
\snmpadd{WR-SWITCH-MIB::wrsSoftPLLStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf \emph{SoftPLL} has crashed/restarted}
\label{fail:timing:spll_crash}
\begin{packed_enum}
\item [] \underline{Status}: TODO \emph{(depends on SoftPLL mem read), (require changes in lm32 software)}
\begin{pck_descr}
\item [] \underline{Status}: TODO \emph{(depends on SoftPLL mem read), (requires changes in lm32 software)}
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\
If LM32 software crashes or restarts for some reason, its state may be
either reseted or random (if for some reason variables were overwritten
with junk values). In such case PLL becomes unlocked and switch is not
If the LM32 software crashes or restarts for some reason, its state may
be either reset or random (if for some reason variables were overwritten
with junk values). In such a case, the PLL becomes unlocked and the switch
is not able to provide synchronization to other devices.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsSpllIrqCnt}\\
\texttt{WR-SWITCH-MIB::wrsStartCntSPLL} \emph{(not yet implemented)}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsSpllIrqCnt}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntSPLL} }
\item [] \underline{Note}: We have a mechanism similar to the one in
\emph{wrpc-sw} to detect if the LM32 program has restarted because of
the CPU following a NULL pointer. However, the LM32 program hangs in the
re-initialization phase.
In addition to that, we can detect if
\emph{SoftPLL} is hanging (but not restarted) based on the irq counter.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Link to WR Master is down for slave}
\label{fail:timing:master_down}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR (will become WARNING with the
switch-over)
\item [] \underline{Mode}: \emph{Slave}
\item [] \underline{Mode}: \emph{Boundary Clock}
\item [] \underline{Description}:\\
In that case, WR Switch loses timing reference, resets counters
responsible for keeping the WR time, and starts operating in a
Free-Running Master mode.
If a Boundary Clock switch loses the link on its Slave port, the timing
reference is lost. The switch resets counters responsible for keeping
the WR time, and starts operating in a Free-Running Master mode.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\texttt{WR-SWITCH-MIB::wrsSlaveLinksStatus}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsSlaveLinksStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Link to WR Master is up for master}
\label{fail:timing:master_up}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Grand Master}, \emph{Free-Running Master}
\item [] \underline{Description}:\\
In that case there is probably wrong configuration. Neither the
In that case there is probably a wrong configuration. Neither the
\emph{Grand Master} nor the \emph{Free-Running Master} should be
connected to another WR Master.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\texttt{WR-SWITCH-MIB::wrsSlaveLinksStatus}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsSlaveLinksStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf PTP frames don't reach ARM}
\label{fail:timing:no_frames}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\
In this case, \emph{PTP/PPSi} will fail to stay synchronized and provide
synchronization. Even if WR servo is in the \texttt{TRACK\_PHASE} state,
it calculates new phase shift based on the Master-to-Slave delay
synchronization. Even if the WR servo is in the \texttt{TRACK\_PHASE}
state, it calculates a new phase shift based on the Master-to-Slave delay
variations. To calculate these variations, it still needs timestamped
PTP frames flowing. There could be several causes of such fault:
\begin{itemize}
......@@ -177,11 +254,14 @@ WR network.
\item wrong VLANs configuration
\end{itemize}
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPortStatusPtpTxFrames.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusPtpRxFrames.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPTPFramesFlowing}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPortStatusPtpTxFrames.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusPtpRxFrames.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPTPFramesFlowing}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}: If the kernel driver crashes, there is not much
we can do. We end up with either our system frozen or a reboot. For
wrong VLAN configuration and HDL problems we can monitor if PTP frames
......@@ -190,11 +270,11 @@ WR network.
status (up/down). If VLANs are misconfigured, we don't receive PTP
frames, but the link is still up. This could let us distinguish from a
lack of frames due to the link down (which is a separate issue).
\end{packed_enum}
\end{pck_descr}
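
The note above separates a lack of PTP frames on an active link from a plain link-down. A minimal Python sketch of that distinction follows, assuming the monitoring side samples a port's PTP Rx frame counter twice and knows the link state. The function name and the returned labels are hypothetical.

```python
def classify_ptp_flow(link_up, rx_before, rx_after):
    """Classify one port from its link state and two PTP Rx counter samples.

    link_up   -- link state of the port (e.g. from wrsPortStatusLink)
    rx_before -- PTP Rx frame counter at the first sample
    rx_after  -- PTP Rx frame counter at the second sample
    """
    if not link_up:
        # Covered by a separate issue (link down).
        return "link down"
    if rx_after == rx_before:
        # Link is up but no PTP frame arrived between the two samples,
        # e.g. because of misconfigured VLANs or an HDL problem.
        return "no PTP frames on active link"
    return "ok"


print(classify_ptp_flow(True, 1000, 1000))  # no PTP frames on active link
print(classify_ptp_flow(False, 1000, 1000))  # link down
```

The interval between the two samples has to be longer than the expected PTP frame period, otherwise a healthy port could be misreported.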
\subsubsection{\bf Detected SFP not supported for WR timing}
\label{fail:timing:wrong_sfp}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all}
......@@ -206,25 +286,28 @@ WR network.
Despite \emph{PTP/PPSi} offset being close to 0 \emph{ps}, the device won't
be properly synchronized.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpVN.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpPN.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpVS.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpInDB.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpGbE.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpError.<n>}\\
\texttt{WR-SWITCH-MIB::wrsSFPsStatus}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPortStatusConfiguredMode.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpVN.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpPN.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpVS.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpInDB.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpGbE.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpError.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsSFPsStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsNetworkingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}: The WRS configuration allows this check to be disabled on some ports.
That is because ports may be used for regular (non-WR) PTP
synchronization or for data transfer only (no timing). In that case any
Gigabit SFP can be used (also copper). Detecting if a non-Gigabit
Ethernet SFP is plugged into the cage is covered in a separate issue
\ref{fail:other:sfp}.
\end{packed_enum}
Ethernet SFP is plugged into the cage is covered in issue
\ref{fail:other:sfp}.
\end{pck_descr}
\subsubsection{\bf \emph{PTP/PPSi} process has crashed/restarted}
\label{fail:timing:ppsi_crash}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{all}
......@@ -233,14 +316,18 @@ WR network.
capabilities. Then \texttt{Monit} restarts the missing process.
The number of process starts is stored in a corresponding object.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsStartCntPTP}\\
\texttt{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<n>}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsStartCntPTP}\\
\snmpadd{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\snmpadd{HOST-RESOURCES-MIB::hrSWRunName.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf \emph{HAL} process has crashed/restarted}
\label{fail:timing:hal_crash}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Mode}: \emph{all}
......@@ -249,15 +336,19 @@ WR network.
the hardware i.e. read phase shift, get timestamps, phase shift the
clock etc. When \emph{HAL} crashes, \texttt{Monit} will restart it.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsStartCntHAL}\\
\texttt{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<n>}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsStartCntHAL}\\
\snmpadd{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\snmpadd{HOST-RESOURCES-MIB::hrSWRunName.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Wrong configuration applied}
\label{fail:timing:wrong_config}
\begin{packed_enum}
\item [] \underline{Status}: TODO \emph{(to be done later)}
\begin{pck_descr}
\item [] \underline{Status}: TODO
\item [] \underline{Severity}: WARNING
\item [] \underline{Mode}: \emph{all}
\item [] \underline{Description}:\\
......@@ -271,19 +362,22 @@ WR network.
For misconfigured VLANs, we can monitor if PTP frames are flowing on
Slave port(s) of the switch.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\item [] \underline{Note}: monitor remote updates of key configuration
options (PTP/WR mode, fixed hardware delays)
\end{packed_enum}
\item [] \underline{Note}: When a new configuration file is fetched at
boot time, compare it with the previously used config (the whole file,
but especially timing-critical fields like PTP/WR mode, fixed hardware
delays). Report using the Syslog (\emph{info}/\emph{warning}) if the
configuration has changed.
\end{pck_descr}
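
The comparison proposed in the note above could look roughly like the following Python sketch. It assumes a simple \texttt{KEY=VALUE} file format, uses made-up option names, and maps changes of timing-critical options to \emph{warning} and all other changes to \emph{info}. None of these details are confirmed by this document, and the actual Syslog call is left out.

```python
def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def report_changes(old_text, new_text, critical_keys):
    """Return (severity, message) pairs for every option that changed."""
    old, new = parse_config(old_text), parse_config(new_text)
    reports = []
    for key in sorted(set(old) | set(new)):
        if old.get(key) == new.get(key):
            continue
        severity = "warning" if key in critical_keys else "info"
        reports.append((severity, f"{key}: {old.get(key)!r} -> {new.get(key)!r}"))
    return reports


# Hypothetical option names, for illustration only.
previous = "TIMING_MODE=BC\nPORT01_DELAY_TX=100\nHOSTNAME=wrs-a\n"
fetched = "TIMING_MODE=GM\nPORT01_DELAY_TX=100\nHOSTNAME=wrs-b\n"
for severity, message in report_changes(previous, fetched, {"TIMING_MODE"}):
    print(severity, message)
```

Each returned pair would then be passed to the Syslog with the matching priority.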
\subsubsection{\bf Switchover failed}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: ERROR
\item [] \underline{Mode}: \emph{Slave}, \emph{Grand Master}
\item [] \underline{Mode}: \emph{Boundary Clock}, \emph{Grand Master}
\item [] \underline{Description}: \emph{(not yet implemented)}\\
In case the primary timing link breaks, switchover is responsible for
seamless switching to the backup one to keep the device in sync. If WRS
operates in a \emph{Slave} mode, switchover is about switching
operates in a \emph{Boundary Clock} mode, switchover is about switching
between two (or more) WR links to one or multiple WR Masters. If it
operates in a \emph{Grand Master} mode, it is about broken/lost
connection to an external reference and switching to a backup WR link
......@@ -294,10 +388,10 @@ WR network.
\item [] \underline{Note}: we should probably use parameters reported by
the backup channel(s) of the SoftPLL and the backup PTP servo to be able
to detect and report that something went wrong.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Holdover for too long}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: WARNING
\item [] \underline{Mode}: \emph{Grand Master}
......@@ -307,41 +401,44 @@ WR network.
reference too much. All devices in a WR network will be still
synchronized, but no longer in sync with the external reference.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\end{packed_enum}
\end{pck_descr}
\newpage
\subsection{Data error}
As a data error we define WR Switch not being able to forward Ethernet traffic
between devices connected to the ports.\\
\noindent This section contains the list of faults leading to a data error.
When the WR switch is not able to forward Ethernet traffic between devices
connected to the ports, we consider this a data error. This section contains the
list of faults leading to a data error.
\subsubsection{\bf Link down}
\label{fail:data:link_down}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE \emph{(to be changed later for switchover)}
\item [] \underline{Severity}: ERROR (will be WARNING with the
switch-over)
\item [] \underline{Description}:\\
This obviously stops the flow of frames on an Ethernet port and there is
not much we can do besides reporting an error. Topology redundancy is a
cure for that (if backup link is fine, and reconfiguration does not
cure for that (if a backup link is fine, and reconfiguration does not
fail). There might be several causes of a link down:
\begin{itemize}
\item unplugged fiber
\item broken fiber
\item broken SFP
\item wrong(non-complementary) pair of WDM SPFs used
\item wrong (non-complementary) pair of WDM SFPs used
\end{itemize}
However, we are not able to distinguish between them inside the switch.
\item [] \underline{SNMP objects}:\\
\texttt{IF-MIB::ifOperStatus.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusLink.<n>}
\end{packed_enum}
{\footnotesize
\snmpadd{IF-MIB::ifOperStatus.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusLink.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsSlaveLinksStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsTimingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Fault in the Endpoint's transmission/reception path}
\label{fail:data:ep_txrx}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\
......@@ -349,18 +446,22 @@ between devices connected to the ports.\\
underrun in the Tx PCS or FIFO overrun in the Rx PCS, receiving invalid
\emph{8b10b} code, CRC error etc.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPstatsTXUnderrun.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXOverrun.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXInvalidCode.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXSyncLost.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXPfilterDropped.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXPCSErrors.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXCRCErrors.<n>}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPstatsTXUnderrun.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXOverrun.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXInvalidCode.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXSyncLost.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPfilterDropped.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPCSErrors.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXCRCErrors.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsEndpointStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsNetworkingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Problem with the SwCore or Endpoint HDL module}
\label{fail:data:swcore_hang}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: TODO (add monitoring of the Endpoint hangs, depends on
HDL)
\item [] \underline{Severity}: ERROR
......@@ -368,24 +469,28 @@ between devices connected to the ports.\\
If the SwCore is hanging, then the Ethernet forwarding is not
performed on one or multiple ports. We have a HDL watchdog module which
constantly monitors if the SwCore is not stuck. If such a situation is
detected the whole SwCore is reset, all the frames enqueued in the
Endpoints are acknowledged and lost. After this the switch can continue
detected the whole SwCore is reset, all the frames queued in the
Endpoints are acknowledged and lost. After this the switch can continue
its operation and the watchdog triggers counter is incremented.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsGwWatchdogTimeouts}\\
\texttt{WR-SWITCH-MIB::wrsPstatsTXFrames.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPstatsForwarded.<n>}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsGwWatchdogTimeouts}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsTXFrames.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPstatsForwarded.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsSwcoreStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsNetworkingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}: For Endpoint monitoring we could compare
per-port \emph{RTUfwd} counter with the \emph{Tx} Endpoint counter for
each port. \emph{RTUfwd} counts all forwarding decisions from RTU to the
port $<$n$>$ (excluding PTP frames from NIC). If the sum of this number
and RTU decisions generated from NIC is equal to the number of frames
actually transmitted by the Endpoint, then everything works fine.
\end{packed_enum}
\end{pck_descr}
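
The Endpoint check proposed in the note above comes down to one comparison per port. A minimal Python sketch follows, assuming the three counters for a port have already been read. The function and parameter names are hypothetical.

```python
def endpoint_consistent(rtu_fwd, nic_decisions, tx_frames):
    """Check the per-port invariant described in the note above.

    rtu_fwd       -- RTU forwarding decisions to this port (RTUfwd),
                     excluding PTP frames from the NIC
    nic_decisions -- RTU decisions generated from the NIC for this port
    tx_frames     -- frames actually transmitted by the Endpoint (Tx)
    """
    return rtu_fwd + nic_decisions == tx_frames


print(endpoint_consistent(1200, 35, 1235))  # True
print(endpoint_consistent(1200, 35, 1100))  # False
```

Since the counters are read one after another while traffic is flowing, a real check would probably have to tolerate a small difference or compare samples taken on an idle port.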
\subsubsection{\bf RTU is full and cannot accept more requests}
\label{fail:data:rtu_full}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\
......@@ -393,12 +498,16 @@ between devices connected to the ports.\\
and generate new responses. In such case frames are dropped in the
Rx path of the Endpoint.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCh-MIB::wrsPstatsRXDropRTUFull.<n>}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXDropRTUFull.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsRTUStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsNetworkingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Too much HP traffic / Per-priority queue full}
\label{fail:data:too_much_HP}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: TODO \emph{(depends on HDL)}
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\
......@@ -408,20 +517,28 @@ between devices connected to the ports.\\
queue may become full and we start losing HP frames, which is
unacceptable.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPstatsFastMatchPriority.<n>} - HP frames on a port\\
\texttt{WR-SWITCH-MIB::wrsPstatsRXFrames<n>} - Total number of Rx frames on
the port\\
\texttt{WR-SWITCh-MIB::wrsPstatsRXPrio0.<n>} - Rx priorities 0-7\\
\texttt{[..]}\\
\texttt{WR-SWITCh-MIB::wrsPstatsRXPrio7.<n>}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPstatsFastMatchPriority.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXFrames.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio0.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio1.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio2.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio3.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio4.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio5.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio6.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsPstatsRXPrio7.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsSwcoreStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsNetworkingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}: we need to get from SwCore the information
about per-priority queue utilization, or at least an event when it's
full.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf \emph{RTUd} has crashed}
\label{fail:data:rtu_crash}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
......@@ -433,15 +550,19 @@ between devices connected to the ports.\\
broadcast to all ports (within a VLAN). When \emph{RTUd} crashes,
\texttt{Monit} will restart it.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsStartCntRTUd}\\
\texttt{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<n>}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsStartCntRTUd}\\
\snmpadd{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\snmpadd{HOST-RESOURCES-MIB::hrSWRunName.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Network loop - two or more identical MACs on two or more ports}
\label{fail:data:net_loop}
\begin{packed_enum}
\item [] \underline{Status}: TODO \emph{(to be done later)}
\begin{pck_descr}
\item [] \underline{Status}: TODO
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\
In such case we have a ping-pong situation. If two ports receive frames
......@@ -453,32 +574,32 @@ between devices connected to the ports.\\
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\item [] \underline{Note}: we need to monitor the \emph{rtu\_stat} to
detect ping-pong in the RTU table.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Wrong configuration applied (e.g. wrong VLAN config)}
\begin{packed_enum}
\item [] \underline{Status}: TODO \emph{(to be done later)}
\begin{pck_descr}
\item [] \underline{Status}: TODO
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
The same problem as described in the timing fault
\ref{fail:timing:no_frames}
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Topology Redundancy failure}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}: \emph{(not yet implemented)}\\
Topology redundancy let's us prevent from losing data when the primary
Topology redundancy lets us avoid losing data when the primary
uplink is down for some reason. However, if a backup link is also down
or reconfiguration to backup link fails, we start losing data and an
alarm should be raised.
or if the reconfiguration to backup link fails, we start losing data and
an alarm should be raised.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\item [] \underline{Note}: One thing we need to report is a backup link(s)
going down, but we should also think about how to determine if there is
some problem with eRSTP and if it may fail/has failed if the primary
link is down.
\end{packed_enum}
\end{pck_descr}
\newpage
\subsection{Other errors}
......@@ -486,7 +607,7 @@ between devices connected to the ports.\\
\subsubsection{\bf WR Switch did not boot correctly}
\label{fail:other:boot}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: TODO (add rebooting system when boot is
not successful, add stop restarting system after defined number of restarts)
\item [] \underline{Severity}: ERROR
......@@ -506,16 +627,19 @@ between devices connected to the ports.\\
\item status of starting userspace daemons
\end{itemize}
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsBootSuccessful} -- status word informing whether switch booted correctly\\
\texttt{WR-SWITCH-MIB::wrsRestartReason}\\
\texttt{WR-SWITCH-MIB::wrsRestartReasonMonit}\\
\texttt{WR-SWITCH-MIB::wrsConfigSource}\\
\texttt{WR-SWITCH-MIB::wrsConfigSourceUrl}\\
\texttt{WR-SWITCH-MIB::wrsBootHwinfoReadout}\\
\texttt{WR-SWITCH-MIB::wrsBootLoadFPGA}\\
\texttt{WR-SWITCH-MIB::wrsBootLoadLM32}\\
\texttt{WR-SWITCH-MIB::wrsBootKernelModulesMissing}\\
\texttt{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsRestartReason}\\
\snmpadd{WR-SWITCH-MIB::wrsRestartReasonMonit}\\
\snmpadd{WR-SWITCH-MIB::wrsConfigSource}\\
\snmpadd{WR-SWITCH-MIB::wrsConfigSourceUrl}\\
\snmpadd{WR-SWITCH-MIB::wrsBootHwinfoReadout}\\
\snmpadd{WR-SWITCH-MIB::wrsBootLoadFPGA}\\
\snmpadd{WR-SWITCH-MIB::wrsBootLoadLM32}\\
\snmpadd{WR-SWITCH-MIB::wrsBootKernelModulesMissing}\\
\snmpadd{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful} \\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}:
The idea is to reboot the system if it was not able to boot correctly.
Then we use the scratchpad registers of the processor to keep
......@@ -523,35 +647,32 @@ between devices connected to the ports.\\
rebooting and try to have a system running with at least \emph{dropbear}
for SSH and \emph{net-snmp} to allow remote diagnostics. If on the other
hand the switch has booted correctly, we set the boot count to 0.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Dot-config error}
\label{fail:other:dot-config}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\
Dot-config file used to configure the switch can be stored locally or
retrieved from a central server. Additionally URL to the remote dot-config
can be retrieved via DHCP request. When dot-config is fetch from the server
it has to be verified before being applied. If downloading or verification has
failed an alarm is raised.
A dot-config file used to configure the switch can be stored locally or
retrieved from a central server. Additionally a URL to the remote
dot-config can be retrieved via DHCP request. When the dot-config is
fetched from the server it has to be verified before being applied. If
downloading or verification has failed, an alarm is raised.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsBootSuccessful} -- status word informing
whether switch booted correctly\\
\texttt{WR-SWITCH-MIB::wrsConfigSource} -- source of a dot-config,
local, remote or get URL to the dot-config via DHCP. When
\texttt{wrsConfigSource} is set to the \texttt{tryDhcp}, then failure of
getting dot-config's URL via DHCP does not rise an error in
\texttt{wrsBootSuccessful}\\
\texttt{WR-SWITCH-MIB::wrsConfigSourceUrl} -- path to the dot-config
on a server (if not local)\\
\texttt{WR-SWITCH-MIB::wrsBootConfigStatus} -- result of the dot-config verification
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsConfigSource} \\
\snmpadd{WR-SWITCH-MIB::wrsConfigSourceUrl} \\
\snmpadd{WR-SWITCH-MIB::wrsBootConfigStatus} \\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful} \\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Any userspace daemon has crashed/restarted}
\label{fail:other:daemon_crash}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: TODO \emph{(depends on monit)}
\item [] \underline{Severity}: ERROR / WARNING (depending on the process)
\item [] \underline{Description}:\\
......@@ -560,28 +681,34 @@ between devices connected to the ports.\\
corresponding start counter. If a process is restarted 5 times within
100 seconds, then the entire switch is restarted.
\item [] \underline{SNMP objects}:\\
\texttt{HOST-RESOURCES-MIB::hrSWRunName.<n>} - list of processes in standard MIB\\
\texttt{WR-SWITCH-MIB::wrsStartCntHAL}\\
\texttt{WR-SWITCH-MIB::wrsStartCntPTP}\\
\texttt{WR-SWITCH-MIB::wrsStartCntRTUd}\\
\texttt{WR-SWITCH-MIB::wrsStartCntSshd}\\
\texttt{WR-SWITCH-MIB::wrsStartCntHttpd}\\
\texttt{WR-SWITCH-MIB::wrsStartCntSnmpd}\\
\texttt{WR-SWITCH-MIB::wrsStartCntSyslogd}\\
\texttt{WR-SWITCH-MIB::wrsStartCntWrsWatchdog}\\
\texttt{WR-SWITCH-MIB::wrsStartCntSPLL} \emph{(not implemented)}\\
\texttt{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing} - number of missing processes\\
\texttt{WR-SWITCH-MIB::wrsBootSuccessful} - status word informing whether switch booted correctly
{\footnotesize
\snmpadd{HOST-RESOURCES-MIB::hrSWRunName.<n>} \\
\snmpadd{WR-SWITCH-MIB::wrsStartCntHAL}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntPTP}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntRTUd}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntSshd}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntHttpd}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntSnmpd}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntSyslogd}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntWrsWatchdog}\\
\snmpadd{WR-SWITCH-MIB::wrsStartCntSPLL}\\
\snmpadd{WR-SWITCH-MIB::wrsBootUserspaceDaemonsMissing}\\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful} \\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}: We shall distinguish between crucial
processes - error should be reported if one of them crashes; and less
important processes (warning should be reported if they crash). If any
of the processes has crashed, we need to restart it and increment a
per-process counter reported through the SNMP.
per-process counter reported through the SNMP. Dot-config should also
let us define which processes are not that important and the switch
should not restart even if such a process fails to start (e.g.
\emph{lighttpd}).
Crucial processes (Error report if any of them crashes):
\begin{itemize}
\item \emph{PTP/PPSi}
\item \emph{wrsw\_rtud} - after adding configuration preserving code
\item \emph{wrsw\_rtud} -- after adding configuration preserving code
on restart, RTUd could be crossed out from this list
\item \emph{wrsw\_hal}
\end{itemize}
......@@ -593,10 +720,10 @@ between devices connected to the ports.\\
\item \emph{rsyslogd}
\item \emph{snmpd}
\item \emph{lighttpd}
\item \emph{TRUd/eRSTPd} - not yet implemented
\item \emph{TRUd/eRSTPd} -- not yet implemented
\end{itemize}
\emph{wrsw\_rtud} - we need to set the flag informing the process has
\emph{wrsw\_rtud} -- we need to set the flag informing the process has
crashed so that when it runs again it knows that HDL is already
configured. It should not erase static entries in the RTU table (e.g.
multicasts for PTP), the static entries set by-hand as well as VLANs.
......@@ -606,41 +733,45 @@ between devices connected to the ports.\\
the source code has to be checked to make sure what is cleared on the
startup and modified to preserve the configuration.\\
\emph{TRUd/eRSTPd} - topology reconfiguration is done in hardware if
\emph{TRUd/eRSTPd} -- topology reconfiguration is done in hardware if
needed, the daemon is used only to configure the TRU/RTU HDL module.
However, the story is similar to that of the RTUd. If eRSTPd crashes, we
need to store this information so that when it runs again, it does not
erase the whole configuration. Also if a topology reconfiguration
happens while eRSTPd is down, HDL should keep the flag for the eRSTPd so
that it's aware the backup link is active.
\end{packed_enum}
\end{pck_descr}
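The restart rule from the description above (a process restarted 5 times within 100 seconds causes the entire switch to restart) can be illustrated with a sliding time window. This is a minimal sketch of the rule only, not the way \texttt{Monit} implements it. The class and method names are hypothetical.

```python
from collections import deque


class RestartTracker:
    """Track restarts of one process inside a sliding time window."""

    def __init__(self, max_restarts=5, window_s=100.0):
        # Defaults follow the rule quoted in the text: 5 restarts in 100 s.
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._times = deque()

    def record_restart(self, now):
        """Record a restart at time `now` (in seconds).

        Returns True when the number of restarts inside the window
        reaches the limit, i.e. when the whole switch should restart.
        """
        self._times.append(now)
        # Forget restarts that are older than the window.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.max_restarts


if __name__ == "__main__":
    tracker = RestartTracker()
    for t in (0, 10, 20, 30, 40):
        print(t, tracker.record_restart(t))
```

A per-process start counter such as \texttt{wrsStartCntRTUd} only tells how many times a daemon was started. The timestamps used here would have to come from the monitoring side.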
\subsubsection{\bf Kernel crash}
\begin{packed_enum}
\item [] \underline{Status}: DONE
\begin{pck_descr}
\item [] \underline{Status}: TODO (preserving stats of IP/LR registers)
\item [] \underline{Severity}: ERROR
\item [] \underline{Description}:\\
If the Linux kernel has crashed, system reboots. Until the next boot we
have no synchronization, no SNMP to report the status, FPGA may be still
forwarding Ethernet traffic, but based on dynamic and static routing
rules from before the crash. Based on the SNMP objects below it is
possible to figure out that reboot took place and what was the reason of
the last reboot.
If the Linux kernel has crashed, the system reboots. Until the next boot
we have no synchronization, no SNMP to report the status, and the FPGA
may still be forwarding Ethernet traffic, but based on dynamic and
static routing rules from before the crash. Based on the SNMP objects
below it is possible to figure out that a reboot took place and what the
reason for the last reboot was.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsBootCnt}\\
\texttt{WR-SWITCH-MIB::wrsRebootCnt}\\
\texttt{WR-SWITCH-MIB::wrsRestartReason}\\
\texttt{WR-SWITCH-MIB::wrsFaultIP} \emph{(not implemented)}\\
\texttt{WR-SWITCH-MIB::wrsFaultLR} \emph{(not implemented)}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsBootCnt}\\
\snmpadd{WR-SWITCH-MIB::wrsRebootCnt}\\
\snmpadd{WR-SWITCH-MIB::wrsRestartReason}\\
\snmpadd{WR-SWITCH-MIB::wrsFaultIP}\\
\snmpadd{WR-SWITCH-MIB::wrsFaultLR}\\
\snmpadd{WR-SWITCH-MIB::wrsBootSuccessful}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\item [] \underline{Note}:
Unfortunately, right now it is not possible to distinguish whether the
reboot was caused by the kernel panic function or the \texttt{reboot}
command. Preserving the state of IP and LR registers has to be
implemented.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf System nearly out of memory}
\label{fail:other:no_mem}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
......@@ -648,15 +779,18 @@ between devices connected to the ports.\\
raise an alarm if it's extremely low (but still enough to keep the
system running).
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsMemoryTotal}\\
\texttt{WR-SWITCH-MIB::wrsMemoryUsed}\\
\texttt{WR-SWITCH-MIB::wrsMemoryUsedPerc} - percentage of used memory\\
\texttt{WR-SWITCH-MIB::wrsMemoryFree}\\
\texttt{WR-SWITCH-MIB::wrsMemoryFreeLow} - warning or error on low memory
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsMemoryTotal}\\
\snmpadd{WR-SWITCH-MIB::wrsMemoryUsed}\\
\snmpadd{WR-SWITCH-MIB::wrsMemoryUsedPerc}\\
\snmpadd{WR-SWITCH-MIB::wrsMemoryFree}\\
\snmpadd{WR-SWITCH-MIB::wrsMemoryFreeLow}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Disk space low}
\label{fail:other:no_disk}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
......@@ -664,44 +798,50 @@ between devices connected to the ports.\\
and raise an alarm if it's extremely low (but still enough to keep the
system running).
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsDiskMountPath.<n>}\\
\texttt{WR-SWITCH-MIB::wrsDiskSize.<n>}\\
\texttt{WR-SWITCH-MIB::wrsDiskUsed.<n>}\\
\texttt{WR-SWITCH-MIB::wrsDiskFree.<n>}\\
\texttt{WR-SWITCH-MIB::wrsDiskUseRate.<n>}\\
\texttt{WR-SWITCH-MIB::wrsDiskFilesystem.<n>}\\
\texttt{WR-SWITCH-MIB::wrsDiskSpaceLow} - warning or error on low disk space\\
\texttt{HOST-RESOURCES-MIB::hrStorageDescr.<n>}\\
\texttt{HOST-RESOURCES-MIB::hrStorageSize.<n>}\\
\texttt{HOST-RESOURCES-MIB::hrStorageUsed.<n>}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsDiskMountPath.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsDiskSize.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsDiskUsed.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsDiskFree.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsDiskUseRate.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsDiskFilesystem.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsDiskSpaceLow}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus}\\
\snmpadd{HOST-RESOURCES-MIB::hrStorageDescr.<n>}\\
\snmpadd{HOST-RESOURCES-MIB::hrStorageSize.<n>}\\
\snmpadd{HOST-RESOURCES-MIB::hrStorageUsed.<n>} }
\item [] \underline{Note}:
Objects like \texttt{HOST-RESOURCES-MIB::hrStorage*.<n>} are available
via standard MIB. The same functionality is implemented in
\texttt{WR-SWITCH-MIB} objects \texttt{wrsDisk*.<n>} (to ease the
implementation of \texttt{wrsDiskSpaceLow}).
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf CPU load too high}
\label{fail:other:cpu}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
On a healthy switch the average CPU load should be below \emph{0.1}.
On a healthy switch the average CPU load should be below \emph{0.1} (10\%).
Some actions like SNMP queries or web interface activity may increase
the average system load. The system load averages for the past 1, 5 and
15 minutes are exported via SNMP objects. Additionally
\texttt{wrsCpuLoadHigh} alerts when the load is too high.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsCPULoadAvg1min}\\
\texttt{WR-SWITCH-MIB::wrsCPULoadAvg5min}\\
\texttt{WR-SWITCH-MIB::wrsCPULoadAvg15min}\\
\texttt{WR-SWITCH-MIB::wrsCpuLoadHigh} - warning or error when CPU load too high
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsCPULoadAvg1min}\\
\snmpadd{WR-SWITCH-MIB::wrsCPULoadAvg5min}\\
\snmpadd{WR-SWITCH-MIB::wrsCPULoadAvg15min}\\
\snmpadd{WR-SWITCH-MIB::wrsCpuLoadHigh}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf Temperature inside the box too high}
\label{fail:other:temp}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
......@@ -710,171 +850,190 @@ between devices connected to the ports.\\
the box are broken and should be replaced. There are 4 temperature
sensors monitored:
\begin{itemize}
\item \emph{IC19} - temperature below the FPGA
\item \emph{IC20}, \emph{IC17} - temperature near the SCB power supply
\item \emph{IC19} -- temperature below the FPGA
\item \emph{IC20}, \emph{IC17} -- temperature near the SCB power supply
circuit
\item \emph{IC18} - temperature near the VCXO and PLLs (AD9516,
\item \emph{IC18} -- temperature near the VCXO and PLLs (AD9516,
CDCM6100)
\end{itemize}
\texttt{wrsTemperatureWarning} is raised when the temperature read from
any of these sensors exceeds a threshold configured in the
\emph{dot-config}. When at least one threshold temperature is not set
\texttt{wrsTemperatureWarning} is set to \emph{Threshold-not-set}.
\emph{dot-config} (80 degrees by default). When at least one threshold
temperature is not set \texttt{wrsTemperatureWarning} is set to
\emph{Threshold-not-set}.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsTempFPGA}\\
\texttt{WR-SWITCH-MIB::wrsTempPLL}\\
\texttt{WR-SWITCH-MIB::wrsTempPSL}\\
\texttt{WR-SWITCH-MIB::wrsTempPSR}\\
\texttt{WR-SWITCH-MIB::wrsTempThresholdFPGA}\\
\texttt{WR-SWITCH-MIB::wrsTempThresholdPLL}\\
\texttt{WR-SWITCH-MIB::wrsTempThresholdPSL}\\
\texttt{WR-SWITCH-MIB::wrsTempThresholdPSR}\\
\texttt{WR-SWITCH-MIB::wrsTemperatureWarning}
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsTempFPGA}\\
\snmpadd{WR-SWITCH-MIB::wrsTempPLL}\\
\snmpadd{WR-SWITCH-MIB::wrsTempPSL}\\
\snmpadd{WR-SWITCH-MIB::wrsTempPSR}\\
\snmpadd{WR-SWITCH-MIB::wrsTempThresholdFPGA}\\
\snmpadd{WR-SWITCH-MIB::wrsTempThresholdPLL}\\
\snmpadd{WR-SWITCH-MIB::wrsTempThresholdPSL}\\
\snmpadd{WR-SWITCH-MIB::wrsTempThresholdPSR}\\
\snmpadd{WR-SWITCH-MIB::wrsTemperatureWarning}\\
\snmpadd{WR-SWITCH-MIB::wrsOSStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
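The decision described above for \texttt{wrsTemperatureWarning} can be sketched as a small function. This is an assumption-laden illustration: only \emph{Threshold-not-set} is a value named in the text, the other two return labels are hypothetical, and the sensor keys are simply borrowed from the \texttt{wrsTemp*} object names.

```python
def temperature_warning(temps, thresholds):
    """Evaluate the temperature rule for the four monitored sensors.

    temps      -- dict: sensor name -> measured temperature
    thresholds -- dict: sensor name -> configured threshold, or None if unset
    """
    # If at least one threshold is not set, report that first.
    if any(thresholds.get(name) is None for name in temps):
        return "Threshold-not-set"
    # Raise the warning when any sensor exceeds its threshold.
    if any(temps[name] > thresholds[name] for name in temps):
        return "Too-hot"  # hypothetical label
    return "OK"  # hypothetical label


if __name__ == "__main__":
    limits = {"FPGA": 80, "PLL": 80, "PSL": 80, "PSR": 80}  # 80 is the stated default
    readings = {"FPGA": 85, "PLL": 50, "PSL": 48, "PSR": 47}
    print(temperature_warning(readings, limits))
```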
\subsubsection{\bf Not supported SFP plugged into the cage (especially non 1-Gb SFP)}
\label{fail:other:sfp}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Status}: DONE
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
If a not supported Gigabit optical SFP is plugged into the cage, then
it's a timing issue \ref{fail:timing:wrong_sfp}. However, if a non 1-Gb
If an unsupported Gigabit optical SFP (or an SFP that could not be
matched with the \texttt{CONFIG\_SFP<XX>\_PARAMS} entries in the
configuration file) is plugged into the cage, then it's a timing issue
\ref{fail:timing:wrong_sfp}. However, if a non 1-Gb
SFP is used, then no Ethernet traffic would be flowing on that port.
This is because 10/100Mbit Ethernet is not implemented
inside the WRS.
\item [] \underline{SNMP objects}:\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpVN.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpPN.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpVS.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpGbE.<n>}\\
\texttt{WR-SWITCH-MIB::wrsPortStatusSfpError.<n>}\\
\texttt{WR-SWITCH-MIB::wrsSFPsStatus} - status word for SFPs' status
\end{packed_enum}
{\footnotesize
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpVN.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpPN.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpVS.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpGbE.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsPortStatusSfpError.<n>}\\
\snmpadd{WR-SWITCH-MIB::wrsSFPsStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsNetworkingStatus}\\
\snmpadd{WR-SWITCH-MIB::wrsMainSystemStatus} }
\end{pck_descr}
\subsubsection{\bf IP address on the management port has changed}
\begin{pck_descr}
\item [] \underline{Status}: TODO
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
The change of an IP address on the management port might be a normal
situation or a result of an accidental modification of a DHCP server or
the WR Switch configuration. Notifying about such a situation is not
done through SNMP, since the IP address of a switch has to be known to
the SNMP manager prior to querying the switch. Therefore, the switch only
generates a Syslog warning message if setting a new IP address is
detected.
\item [] \underline{SNMP objects}: \emph{(none)}, Syslog message is
generated
\end{pck_descr}
\subsubsection{\bf Multiple unauthorized access attempts}
\begin{pck_descr}
\item [] \underline{Status}: TODO
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
Many attempts to gain root access through SSH (or the web interface)
might mean that somebody is trying to do something nasty. Every
unsuccessful login attempt is reported as a Syslog warning message.
\item [] \underline{SNMP objects}: \emph{(none)}, Syslog message is
generated
\end{pck_descr}
\subsubsection{\bf Network reconfiguration (RSTP)}
\label{fail:other:rstp}
\begin{pck_descr}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}: \emph{(not yet implemented)}\\
If topology reconfiguration occurs because of the primary link failure,
this fact should be reported through SNMP as a warning. It's not a
critical situation; the WR network still works. However, further
investigation should be performed to repair the broken link.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\end{pck_descr}
\subsubsection{\bf Backup link down}
\begin{pck_descr}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}: \emph{(not yet implemented)}\\
This is related to the issue \ref{fail:other:rstp}. If the WRS uses
primary uplink, but the backup one fails, it's not a critical fault. WR
Network still works, but the link should be diagnosed and repaired to
have the backup link operational in case the primary one fails.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\end{pck_descr}
\newpage
\subsection{Undetectable errors}
Besides the various errors already listed in the previous sections, there are some
situations when reporting a problem to the SNMP manager or Syslog server is not
possible. This section lists some of them and proposes alternative ways of
diagnostics.
\subsubsection{\bf File system / Memory corruption}
\label{fail:other:memory}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Description}:\\
Memory or file system corruption can produce unpredictable results. It
may cause a failure of any of the processes running on the switch.
\item [] \underline{SNMP objects}: \emph{(none)}
\item [] \underline{Note}: how shall we detect this? Based on the
\emph{dmesg} errors reported by UBI and system in general? This is bad,
crazy things may happen, we can't do much about it.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Kernel freeze}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Description}:\\
If kernel freezes we can do nothing. It can freeze e.g. due to some
infinite loop in the irq handler. It's like with the power failure,
somebody has to go to the place where WRS is installed and
investigate/restart the device.
If the Linux kernel freezes there is nothing that can be done. It can
freeze e.g. due to some infinite loop in the irq handler. It is similar
to a power failure: somebody has to go to the place where the WRS is
installed and investigate/restart the device.
\item [] \underline{SNMP objects}: \emph{(none)}
\item [] \underline{Note}:
If we have watchdog in our CPU it should be used.
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Power failure}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Description}:\\
Power failure may be either a WRS problem (i.e. broken power supply
inside the switch) or an external problem (i.e. providing voltage to the
device). There is not much reporting we can do in such case. It's up to
the Network Management Station to raise an alarm if the SNMP Agent does
inside the switch) or an external voltage problem. It's up to the
Network Management Station to raise an alarm if the SNMP Agent does
not respond to the SNMP requests.
\item [] \underline{SNMP objects}: \emph{(none)}
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Hardware problem}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Description}:\\
If any crucial hardware part breaks we'll most probably notice it as one
(or multiple) timing / data errors described previously. Besides that,
we don't have any self-diagnostics on-board. Few examples:
If any crucial hardware part breaks, it will most probably be noticed
as one (or multiple) timing / data errors described in the previous
sections. Besides that, there are no self-diagnostics built into the
switch hardware boards. A few examples of hardware failures and the
problems they may cause:
\begin{itemize}
\item DAC / VCO - problems with synchronization
\item cooling fans - rise of the temperature inside the WRS box
\item DAC / VCO -- problems with synchronization (failures in
\ref{sec:timing_fail})
\item cooling fans -- rise of the temperature inside the WRS box
(failure \ref{fail:other:temp})
\item power supply, ARM, FPGA - booting problem (failure
\item power supply, ARM, FPGA -- booting problem (failure
\ref{fail:other:boot})
\item memory chip - data corruption (failure \ref{fail:other:memory})
\item memory chip -- data corruption (failure \ref{fail:other:memory})
\end{itemize}
\item [] \underline{SNMP objects}: \emph{(none)}
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf Management link down}
\label{fail:other:management_link}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Description}:\\
For obvious reasons we are not able to report through SNMP that the
management link is down. This should be detected and reported by the NMS
if it does not receive SNMP and ICMP responses from the WRS.
\item [] \underline{SNMP objects}: \emph{(none)}
\end{packed_enum}
\end{pck_descr}
\subsubsection{\bf No static IP on the management port \& failed to DHCP}
\begin{packed_enum}
\begin{pck_descr}
\item [] \underline{Description}:\\
From operator's point of view it is similar to the issue
From the operator's point of view it is similar to the issue
\ref{fail:other:management_link}. WRS is not accessible through the
management port, so its status cannot be reported. This should be
detected and reported by the NMS if it does not receive SNMP and ICMP
responses from the WRS. In such case WR expert should make a physical
connection to the management USB port of the WRS to diagnose the
problem.
responses from the WRS. In such case the configuration of the switch and
management network should be verified.
\item [] \underline{SNMP objects}: \emph{(none)}
\end{packed_enum}
\subsubsection{\bf IP address on the management port has changed}
\begin{packed_enum}
\item [] \underline{Status}: TODO
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
I'm not yet sure how we should report this. Probably SNMP is not the
best choice because if the IP changes we're no longer able to poll SNMP
objects (until IP is updated also in the Network Management Station). We
should either generate SNMP trap to NMS or send Syslog message to a
central server.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\end{packed_enum}
\subsubsection{\bf Multiple unauthorized access attempts}
\begin{packed_enum}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}:\\
If we observe many attempts to gain a root access through the ssh (or
the web interface) this might be somebody trying to do something nasty.
We should report such situation as a Warning.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\item [] \underline{Note}: Bad password event is reported by Syslog as a
warning. We should probably use this information to add an SNMP object.
\end{packed_enum}
\subsubsection{\bf Network reconfiguration (RSTP)}
\label{fail:other:rstp}
\begin{packed_enum}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}: \emph{(not yet implemented)}\\
If topology reconfiguration occurs because of the primary link failure,
this fact should be reported through SNMP as a warning. It's not
critical situation, WR network still works. However, further
investigation should be performed to repair the broken link.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\end{packed_enum}
\subsubsection{\bf Backup link down}
\begin{packed_enum}
\item [] \underline{Status}: for later
\item [] \underline{Severity}: WARNING
\item [] \underline{Description}: \emph{(not yet implemented)}\\
This is related to the issue \ref{fail:other:rstp}. If the WRS uses
primary uplink, but the backup one fails, it's not a critical fault. WR
Network still works, but the link should be diagnosed and repaired to
have the backup link operational in case the primary one fails.
\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
\end{packed_enum}
\end{pck_descr}
%\subsection{Switch out of sync to Master}
%
......
\section{Introduction}
This document tries to list all possible ways the White Rabbit Switch can
break and describes the information exported from our device to help diagnose
the problems.
This document provides information about the diagnostics of White Rabbit
switches. It complements the official \emph{White Rabbit
Switch: User's Manual} published with every stable firmware release. Please
refer to that manual for basic information about the switch and
guidelines on its configuration.\\
The document is organized in two parts. The first one (section \ref{sec:failures})
tries to list all the possible failures that may disturb synchronization and
Ethernet switching. The structure of each failure description is the following:
\begin{itemize}[leftmargin=0pt]
\item [] \underline{Mode}: for timing failures, it says which modes are
affected. Possible values are:
\begin{itemize}
\item \emph{Slave} - WR Switch has at least one Slave port synchronized to
another WR device higher in the timing hierarchy (though it may be also
Master to other WR/PTP devices lower in the timing hierarchy).
\item \emph{Grand Master} - WR Switch at the top of the synchronization
hierarchy. It is synchronized to an external clock (e.g. GPSDO, Cesium)
and provides timing to other WR/PTP devices.
\item \emph{Free-Running Master} - WR Switch at the top of the
synchronization hierarchy. It provides timing to other WR/PTP devices
but runs from a local oscillator (not synchronized to external atomic
clock).
\end{itemize}
White Rabbit Switch firmware starting from \emph{v4.2} provides diagnostic
mechanisms in the form of SNMP objects and Syslog messages. This document is
organized in two parts. It starts with a description of the SNMP objects and
procedures to be followed if various errors are reported (section
\ref{sec:snmp_exports}). This first part is meant for operators and people
integrating a WR switch into a control system, who do not need deep knowledge of
the White Rabbit internals. These people usually have to perform quick
diagnostics and decide on actions to restore a WR network.
The second part of the document tries to list all the possible failures
that may disturb synchronization and Ethernet switching (section
\ref{sec:failures}). It is meant for WR experts, to help them with in-depth
diagnosis of the problems reported by SNMP.
\item [] \underline{Description}: What the problem is about, how important it
is and what may go wrong if it occurs.
\item [] \underline{SNMP objects}: Which SNMP objects should be monitored to
detect the failure. These may be objects from \texttt{WR-SWITCH-MIB} or one
of the standard MIBs used by the \emph{net-snmp}.
\item [] \underline{Notes}: Optional comment on the SNMP implementation. It may describe the current
implementation or ideas on how to implement it in the future.
\end{itemize}
Section \ref{sec:snmp_exports} documents the SNMP exports for people integrating
a WR switch into a control system, operators and WR experts. It describes all
essential SNMP objects exported by the device, divided into two groups:
\emph{Operator/basic objects} and \emph{Expert objects}.
This document has many internal hyperlinks that associate general SNMP status
objects and expert SNMP objects with the descriptions of related problems, and
vice versa. These links are easy to follow when reading the document on a
computer.
\section{SNMP exports}
\section{SNMP diagnostics and solving problems}
\label{sec:snmp_exports}
This section describes SNMP objects exported by the WR Switch. Objects within
the \texttt{WR\--SWITCH\--MIB} are divided into two categories:
the \texttt{WR\--SWITCH\--MIB} are divided into two groups:
\begin{itemize}
\item operator/basic objects (section \ref{sec:snmp_exports:basic}) -
providing the basic status of the switch. They should be used by control system
operators and people without deep knowledge of the White Rabbit internals.
These values report a general status of the device and high-level errors.
\item expert/extended status objects (section \ref{sec:snmp_exports:expert}) -
\item General status objects for operators (section
\ref{sec:snmp_exports:basic}) - provide a summary of the status of a
switch and its main subsystems (like timing, networking, OS). These
should be used by control system operators and users without
comprehensive knowledge of the White Rabbit internals. These exports provide
a general status of the device and high-level errors, which is enough in most
cases to perform a quick repair.
\item Expert objects (section \ref{sec:snmp_exports:expert}) -
can be used by White Rabbit experts for the in-depth diagnosis of the switch
failures. These values are verbose and should not be used by the operators.
failures. These values are verbose and normally should not be used by the
operators.
\end{itemize}
\subsection{Operator/basic objects}
The description of the general status objects in section
\ref{sec:snmp_exports:basic} also includes a list of actions to follow if a
particular object reports an error. These repair procedures do not require any
in-depth knowledge of White Rabbit. Regardless of the error reported, some
common remarks apply to all situations:
\begin{itemize}
\item Linux inside the WR Switch enumerates WR interfaces starting from 0,
so port indexes 0..17 are used internally. However, the
port numbers printed on the front panel are 1..18. Syslog messages
generated by the switch use the Linux port numbering. As a consequence,
every time Syslog says there is a problem on port X, this refers to
port X+1 on the front panel of the switch.
\item If a procedure given for a specific SNMP object does not solve the
problem, please contact WR experts to perform a more in-depth analysis of
the network. For this, you should provide a complete dump of the WRS status
generated in the first step of each procedure.
\item The first action in most of the procedures below, named \emph{Dump state},
simply requires calling a tool provided by WR developers that reads all the
detailed information from the switch and writes it to a single file that can
later be analyzed by the experts.\\
{\bf TODO: point to the tool once it's done}
\item If a problem solving procedure requires restarting or replacing a broken
WR Switch, please make sure that after the repair, all other WR devices
connected to the affected switch are synchronized and do not report any
problems.
\item If a procedure requires replacing a switch with a new unit, the broken
one should be handed over to WR experts or the switch manufacturer to
investigate the problem.
\end{itemize}
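The port numbering remark above can be sketched as a small helper. This is only an illustration: the function name \texttt{front\_panel\_port} and the constant \texttt{WRS\_PORT\_COUNT} are hypothetical and not part of any WR tool.

```python
WRS_PORT_COUNT = 18  # indexes 0..17 internally, numbers 1..18 on the front panel


def front_panel_port(linux_index):
    """Translate a Linux/Syslog port index (0..17) to a front-panel number (1..18)."""
    if not 0 <= linux_index < WRS_PORT_COUNT:
        raise ValueError("port index out of range: %r" % (linux_index,))
    return linux_index + 1
```

For example, a Syslog message that mentions port 0 would refer to the connector labelled 1 on the front panel.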
\subsection{General status objects for operators}
\label{sec:snmp_exports:basic}
This section describes the general status MIB objects that are calculated based
on the other SNMP (detailed) exports. Most of the status objects described in
this section can have one of the following values:
This section describes the general status MIB objects that represent the overall
status of a device and its subsystems. They are organized in a tree structure
(fig.\ref{fig:snmp_oper}) where each object reports a problem based on the
status of its child objects. SNMP objects in the third layer of this tree are
calculated based on the SNMP expert objects. Most of the status objects
described in this section can have one of the following values:
\begin{figure}[ht]
\begin{center}
\includegraphics[width=.8\textwidth]{img/snmp_obj.pdf}
\caption{The structure of general status objects for operators}
\label{fig:snmp_oper}
\end{center}
\end{figure}
\begin{itemize}%[leftmargin=0pt]
\item \texttt{NA} -- status value was not calculated at all (returned value
is 0). Something bad has happened.
......@@ -25,359 +69,34 @@ this section can have one of the following values:
\item \texttt{Warning} -- objects used to calculate this value are outside their
proper range, but the problem is not critical enough to report \texttt{Error}.
\item \texttt{WarningNA} -- at least one of the objects used to calculate the
status has a value \texttt{NA} or \texttt{WarningNA}.
status has a value \texttt{NA} (or \texttt{WarningNA}).
\item \texttt{Error} -- error in values used to calculate the particular
object.
\item \texttt{FirstRead} -- the value of the object cannot be calculated
because at least one condition uses deltas between the current and previous
value. This value should appear only at first SNMP read. Threated as a
value. This value should appear only at first SNMP read. To be treated as a
correct value.
\item \texttt{Bug} -- Something wrong has happened while calculating the
object. If you see this please report to WR developers.
\end{itemize}
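One possible way to picture how a parent status object could be derived from its children is sketched below. The list of values comes from this section; the precedence order (\texttt{Bug}, then \texttt{Error}, then \texttt{Warning}, then \texttt{WarningNA}) and the names \texttt{Status} and \texttt{aggregate} are assumptions made for illustration, not the logic of the WRS firmware.

```python
from enum import Enum


class Status(Enum):
    NA = "NA"
    OK = "OK"
    WARNING = "Warning"
    WARNING_NA = "WarningNA"
    ERROR = "Error"
    FIRST_READ = "FirstRead"
    BUG = "Bug"


def aggregate(children):
    """Derive a parent status from child statuses (assumed precedence)."""
    children = list(children)
    if not children:
        return Status.NA  # nothing to calculate the status from
    if Status.BUG in children:
        return Status.BUG
    if Status.ERROR in children:
        return Status.ERROR
    if Status.WARNING in children:
        return Status.WARNING
    if Status.NA in children or Status.WARNING_NA in children:
        return Status.WARNING_NA
    # Only OK and FirstRead remain; FirstRead is treated as a correct value.
    return Status.OK
```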
\noindent {\bf General Status objects}:
\begin{itemize}%[leftmargin=0pt]
\item \texttt{wrsGeneralStatusGroup} -- Group containing collective statuses
of various subsystems and the main system status, describing the status of
entire switch.
\begin{itemize}
\item \texttt{wrsMainSystemStatus} -- WRS general status of a switch can
be \texttt{OK}, \texttt{Warning} or \texttt{Error}. When there is an
error or warning please check the values of \texttt{wrsOSStatus},
\texttt{wrsTimingStatus} and \texttt{wrsNetworkingStatus} to find out
which subsystem causes the problem.
\item \texttt{wrsOSStatus} -- Collective status of the
\texttt{wrsOSStatusGroup}. For details please check the group's content.
\item \texttt{wrsTimingStatus} -- Collective status of the
\texttt{wrsTimingStatusGroup}. For details please check the group's
content.
\item \texttt{wrsNetworkingStatus} -- Collective status of the
\texttt{wrsNetworkingStatusGroup}. For details please check the group's
content.
\end{itemize}
\paragraph*{SNMP objects:}
\item \texttt{wrsDetailedStatusesGroup} -- Branch with collective statuses of
various switch subsystems.
\begin{itemize}
\item \texttt{wrsOSStatusGroup} -- Group with collective statuses of the
embedded operating system running on the switch.
\begin{itemize}
\item \texttt{wrsBootSuccessful} -- Grouped status of
\texttt{wrsBootStatusGroup}, indicating whether boot was successful.
\texttt{Error} when dot-config source is wrong, unable to get the dot-config,
unable to get URL to the dot-config,
dot-config contains errors, unable to read the hwinfo, unable to load
the FPGA bitstream, unable to load the LM32 software, any kernel
modules or userspace daemons are missing (issue \ref{fail:other:boot},
\ref{fail:other:dot-config}).
\item \texttt{wrsTemperatureWarning} -- Report whether the temperature
thresholds are not set or are exceeded (issue \ref{fail:other:temp}).
\item \texttt{wrsMemoryFreeLow} -- \texttt{Warning} when 50\% of the memory is
used, error when more than 80\% of the memory is used (issue
\ref{fail:other:no_mem}).
\item \texttt{wrsCpuLoadHigh} -- \texttt{Warning} when the average CPU load is
more than 2 for the past 1min, 1.5 for 5min or 1 for 15min.
\texttt{Error} when the average CPU load is more than 3 for the past
1min, 2 for 5min or 1.5 for 15min (issue \ref{fail:other:cpu}).
\item \texttt{wrsDiskSpaceLow} -- \texttt{Warning} when more than 80\% of any
disk partition is used. \texttt{Error} when more than 90\% of any disk
partition is used (issue \ref{fail:other:no_disk}).
\end{itemize}
\item \texttt{wrsTimingStatusGroup} -- Group with collective statuses of
the timing subsystem.
\begin{itemize}
\item \texttt{wrsPTPStatus} -- \texttt{Error} when any of PTP error counters in
\texttt{wrsPtpDataTable} (\texttt{wrsPtpServoStateErrCnt},
\texttt{wrsPtpClockOffsetErrCnt} or\\ \texttt{wrsPtpRTTErrCnt}) has
increased since the last scan (issue
\ref{fail:timing:ppsi_track_phase}, \ref{fail:timing:offset_jump},
\ref{fail:timing:rtt_jump}), at least one of the $\Delta_{TXM}$,
$\Delta_{RXM}$, $\Delta_{TXS}$, $\Delta_{RXS}$ is 0 (issue
\ref{fail:timing:deltas_report}) or PTP servo update counter is not
increasing.
\item \texttt{wrsSoftPLLStatus} -- \texttt{Error} when \texttt{wrsSpllSeqState}
is not \emph{Ready}, or \texttt{wrsSpllAlignState} is not
\emph{Locked} (for Grand Master mode), or any of
\texttt{wrsSpllHlock}, \texttt{wrsSpllMlock} is equal to 0 (for Slave
mode) (issue \ref{fail:timing:spll_unlock}).\\
\texttt{Warning} when \texttt{wrsSpllDelCnt} $>$ 0 (for Grand Master
mode) or \texttt{wrsSpllDelCnt} has changed (for all other modes).
\item \texttt{wrsSlaveLinksStatus} -- \texttt{Error} when link to Master
is down for a switch in the Slave mode (issue
\ref{fail:timing:master_down}). Additionally, \texttt{Error} when the
link to Master is up for a switch in the Free-running Master or Grand
Master mode (issue \ref{fail:timing:master_up}).
\item \texttt{wrsPTPFramesFlowing} -- \texttt{Error} when PTP Tx/Rx
frame counters on active links (Master / Slave ports) are not being
incremented. (issue \ref{fail:timing:no_frames}). Report the first
run.
\end{itemize}
\item \texttt{wrsNetworkingStatusGroup} -- Group with collective statuses
of the networking subsystem.
\begin{itemize}
\item \texttt{wrsSFPsStatus} -- \texttt{Error} when any of the SFPs
reports an error. To find out which SFP caused the problem check
\texttt{wrsPortStatusSfpError.<n>} (issue \ref{fail:timing:wrong_sfp},
\ref{fail:other:sfp})
\item \texttt{wrsEndpointStatus} -- \texttt{Error} when there is a fault
in the Endpoint's transmission/reception path (issue
\ref{fail:data:ep_txrx}).
\item \texttt{wrsSwcoreStatus} -- Not used in the current release.
Always reports \texttt{OK}.
\item \texttt{wrsRTUStatus} -- \texttt{Error} when RTU is full and cannot
accept more requests (issue \ref{fail:data:rtu_full}).
\end{itemize}
\end{itemize}
\item \texttt{wrsVersionGroup} -- Hardware, gateware and software versions.
Additionally the serial number and other hardware information for the WRS.
\begin{itemize}
\item \texttt{wrsVersionSwVersion} -- software version (as returned
from the \texttt{git describe} at build time).
\item \texttt{wrsVersionSwBuildBy} -- software build-by (as returned
from the \texttt{git config --get-all user.name} at build time)
\item \texttt{wrsVersionSwBuildDate} -- software build date
(\texttt{\_\_DATE\_\_} at build time)
\item \texttt{wrsVersionBackplaneVersion} -- hardware version of the
minibackplane PCB
\item \texttt{wrsVersionFpgaType} -- FPGA model inside the switch
\item \texttt{wrsVersionManufacturer} -- name of the manufacturing
company
\item \texttt{wrsVersionSwitchSerialNumber} -- serial number (or string)
of the switch
\item \texttt{wrsVersionScbVersion} -- version of the SCB (the
motherboard)
\item \texttt{wrsVersionGwVersion} -- version of the gateware (FPGA
bitstream)
\item \texttt{wrsVersionGwBuild} -- build ID of the gateware (FPGA
bitstream)
\item \texttt{wrsVersionSwitchHdlCommitId} -- gateware version: commit ID
from the \texttt{wr\_switch\_hdl} repository
\item \texttt{wrsVersionGeneralCoresCommitId} -- gateware version: commit
ID from the \texttt{general-cores} repository
\item \texttt{wrsVersionWrCoresCommitId} -- gateware version: commit ID
from the \texttt{wr-cores} repository
\item \texttt{wrsVersionLastUpdateDate} -- date and time of last firmware
update, this information may not be accurate, due to hard restarts or
lack of the proper time at update.
\end{itemize}
\end{itemize}
% SNMP status objects
\printnoidxglossary[type=snmp_status,title=,style=objtree,sort=def]
\newpage
\subsection{Expert/extended status}
\subsection{Expert objects}
\label{sec:snmp_exports:expert}
\noindent {\bf Expert Status}:
\begin{itemize}
\item \texttt{wrsOperationStatus}
\begin{itemize}
\item \texttt{wrsCurrentTimeGroup}
\begin{itemize}
\item \texttt{wrsDateTAI}
\item \texttt{wrsDateTAIString}
\end{itemize}
\item \texttt{wrsBootStatusGroup}
\begin{itemize}
\item \texttt{wrsBootCnt}
\item \texttt{wrsRebootCnt}
\item \texttt{wrsRestartReason}
\item \texttt{wrsFaultIP} -- Not implemented
\item \texttt{wrsFaultLR} -- Not implemented
\item \texttt{wrsConfigSource}
\item \texttt{wrsConfigSourceUrl}
\item \texttt{wrsRestartReasonMonit} -- Process that caused \texttt{monit}
to trigger a restart.
\item \texttt{wrsBootConfigStatus}
%below boot status values
\item \texttt{wrsBootHwinfoReadout}
\item \texttt{wrsBootLoadFPGA}
\item \texttt{wrsBootLoadLM32}
\item \texttt{wrsBootKernelModulesMissing} -- List of kernel modules is
defined in the source code.
\item \texttt{wrsBootUserspaceDaemonsMissing} -- List of daemons is defined
in the source code.
\item \texttt{wrsGwWatchdogTimeouts} -- Number of times the watchdog has
restarted the HDL module responsible for the Ethernet switching process
(issue \ref{fail:data:swcore_hang}).
\end{itemize}
\item \texttt{wrsTemperatureGroup}
\begin{itemize}
\item \texttt{wrsTempFPGA}
\item \texttt{wrsTempPLL}
\item \texttt{wrsTempPSL}
\item \texttt{wrsTempPSR}
\item \texttt{wrsTempThresholdFPGA}
\item \texttt{wrsTempThresholdPLL}
\item \texttt{wrsTempThresholdPSL}
\item \texttt{wrsTempThresholdPSR}
\end{itemize}
\item \texttt{wrsMemoryGroup}
\begin{itemize}
\item \texttt{wrsMemoryTotal}
\item \texttt{wrsMemoryUsed}
\item \texttt{wrsMemoryUsedPerc}
\item \texttt{wrsMemoryFree}
\end{itemize}
\item \texttt{wrsCpuLoadGroup}
\begin{itemize}
\item \texttt{wrsCPULoadAvg1min}
\item \texttt{wrsCPULoadAvg5min}
\item \texttt{wrsCPULoadAvg15min}
\end{itemize}
\item \texttt{wrsDiskTable} -- Table with a row for every partition.
\begin{itemize}
\item \texttt{wrsDiskIndex}
\item \texttt{wrsDiskMountPath}
\item \texttt{wrsDiskSize}
\item \texttt{wrsDiskUsed}
\item \texttt{wrsDiskFree}
\item \texttt{wrsDiskUseRate}
\item \texttt{wrsDiskFilesystem}
\end{itemize}
\end{itemize}
\item \texttt{wrsStartCntGroup}
\begin{itemize}
\item \texttt{wrsStartCntHAL} -- issue \ref{fail:timing:hal_crash}, \ref{fail:other:daemon_crash}
\item \texttt{wrsStartCntPTP} -- issue \ref{fail:timing:ppsi_crash}, \ref{fail:other:daemon_crash}
\item \texttt{wrsStartCntRTUd} -- issue \ref{fail:data:rtu_crash}, \ref{fail:other:daemon_crash}
\item \texttt{wrsStartCntSshd}
\item \texttt{wrsStartCntHttpd}
\item \texttt{wrsStartCntSnmpd}
\item \texttt{wrsStartCntSyslogd}
\item \texttt{wrsStartCntWrsWatchdog}
\end{itemize}
\item \texttt{wrsSpllState}
\begin{itemize}
\item \texttt{wrsSpllVersionGroup}
\begin{itemize}
\item \texttt{wrsSpllVersion}
\item \texttt{wrsSpllBuildDate}
\end{itemize}
\item \texttt{wrsSpllStatusGroup}
\begin{itemize}
\item \texttt{wrsSpllMode}
\item \texttt{wrsSpllIrqCnt}
\item \texttt{wrsSpllSeqState}
\item \texttt{wrsSpllAlignState}
\item \texttt{wrsSpllHlock}
\item \texttt{wrsSpllMlock}
\item \texttt{wrsSpllHY}
\item \texttt{wrsSpllMY}
\item \texttt{wrsSpllDelCnt}
% xxx wrsSpllHoldover
% xxx wrsSpllHoldoverTime
% xxx (...)
\end{itemize}
% \item \texttt{wrsSpllPerPortTable[18]
% \begin{itemize}
% xxx wrsSpllBlock
% xxx wrsSpllErr
% \end{itemize}
\end{itemize}
\item \texttt{wrsPstatsTable} -- Table with pstats values, one row per port.
\begin{itemize}
\item \texttt{wrsPstatsIndex}
\item \texttt{wrsPstatsPortName}
\item \texttt{wrsPstatsTXUnderrun}
\item \texttt{wrsPstatsRXOverrun}
\item \texttt{wrsPstatsRXInvalidCode}
\item \texttt{wrsPstatsRXSyncLost}
\item \texttt{wrsPstatsRXPauseFrames}
\item \texttt{wrsPstatsRXPfilterDropped}
\item \texttt{wrsPstatsRXPCSErrors}
\item \texttt{wrsPstatsRXGiantFrames}
\item \texttt{wrsPstatsRXRuntFrames}
\item \texttt{wrsPstatsRXCRCErrors}
\item \texttt{wrsPstatsRXPclass0}
\item \texttt{wrsPstatsRXPclass1}
\item \texttt{wrsPstatsRXPclass2}
\item \texttt{wrsPstatsRXPclass3}
\item \texttt{wrsPstatsRXPclass4}
\item \texttt{wrsPstatsRXPclass5}
\item \texttt{wrsPstatsRXPclass6}
\item \texttt{wrsPstatsRXPclass7}
\item \texttt{wrsPstatsTXFrames}
\item \texttt{wrsPstatsRXFrames}
\item \texttt{wrsPstatsRXDropRTUFull}
\item \texttt{wrsPstatsRXPrio0}
\item \texttt{wrsPstatsRXPrio1}
\item \texttt{wrsPstatsRXPrio2}
\item \texttt{wrsPstatsRXPrio3}
\item \texttt{wrsPstatsRXPrio4}
\item \texttt{wrsPstatsRXPrio5}
\item \texttt{wrsPstatsRXPrio6}
\item \texttt{wrsPstatsRXPrio7}
\item \texttt{wrsPstatsRTUValid}
\item \texttt{wrsPstatsRTUResponses}
\item \texttt{wrsPstatsRTUDropped}
\item \texttt{wrsPstatsFastMatchPriority}
\item \texttt{wrsPstatsFastMatchFastForward}
\item \texttt{wrsPstatsFastMatchNonForward}
\item \texttt{wrsPstatsFastMatchRespValid}
\item \texttt{wrsPstatsFullMatchRespValid}
\item \texttt{wrsPstatsForwarded}
\item \texttt{wrsPstatsTRURespValid}
\end{itemize}
\paragraph*{SNMP objects:}
% SNMP expert objects
\printnoidxglossary[type=snmp_expert,style=objtree,sort=def]
\item \texttt{wrsPtpDataTable} -- Table with a row per PTP servo instance.
\begin{itemize}
\item \texttt{wrsPtpIndex}
\item \texttt{wrsPtpPortName} -- The port on which the instance is running.
\item \texttt{wrsPtpGrandmasterID} -- Not implemented.
\item \texttt{wrsPtpOwnID} -- Not implemented.
\item \texttt{wrsPtpMode}
\item \texttt{wrsPtpServoState}
\item \texttt{wrsPtpServoStateN}
\item \texttt{wrsPtpPhaseTracking}
\item \texttt{wrsPtpSyncSource}
\item \texttt{wrsPtpClockOffsetPs}
\item \texttt{wrsPtpClockOffsetPsHR}
\item \texttt{wrsPtpSkew}
\item \texttt{wrsPtpRTT}
\item \texttt{wrsPtpLinkLength}
\item \texttt{wrsPtpServoUpdates}
\item \texttt{wrsPtpDeltaTxM}
\item \texttt{wrsPtpDeltaRxM}
\item \texttt{wrsPtpDeltaTxS}
\item \texttt{wrsPtpDeltaRxS}
\item \texttt{wrsPtpServoStateErrCnt} -- Number of the servo updates when
servo is out of the TRACK\_PHASE (issue
\ref{fail:timing:ppsi_track_phase}).
\item \texttt{wrsPtpClockOffsetErrCnt} -- Number of servo updates when
offset is larger than 500ps or smaller than -500ps (issue
\ref{fail:timing:offset_jump}).
\item \texttt{wrsPtpRTTErrCnt} -- Number of servo updates when RTT delta
between subsequent updates is larger than 1000ps or smaller than -1000ps
(issue \ref{fail:timing:rtt_jump}).
\end{itemize}
\item \texttt{wrsPortStatusTable} -- Table with a row per port.
\begin{itemize}
\item \texttt{wrsPortStatusIndex}
\item \texttt{wrsPortStatusPortName}
\item \texttt{wrsPortStatusLink}
\item \texttt{wrsPortStatusConfiguredMode}
\item \texttt{wrsPortStatusLocked}
\item \texttt{wrsPortStatusPeer}
\item \texttt{wrsPortStatusSfpVN}
\item \texttt{wrsPortStatusSfpPN}
\item \texttt{wrsPortStatusSfpVS}
\item \texttt{wrsPortStatusSfpInDB}
\item \texttt{wrsPortStatusSfpGbE}
\item \texttt{wrsPortStatusSfpError}
\item \texttt{wrsPortStatusPtpTxFrames}
\item \texttt{wrsPortStatusPtpRxFrames}
\end{itemize}
\end{itemize}
\vspace{12pt}
\paragraph*{Objects from other MIBs:}
% other objects
\printnoidxglossary[type=snmp_other,style=objtree,sort=def]
%%%%%%%%%%%%%%%%%%5
%% Other notes
......
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Add status entries in the order they appear in the MIB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\snmpentrys{WR-SWITCH-MIB}{}{wrsGeneralStatusGroup}{
Group containing collective status of the switch and its various
subsystems.}
\snmpentrys{WR-SWITCH-MIB}{wrsGeneralStatusGroup}{wrsMainSystemStatus}{
\underline{Description:}
The general status of the switch can be \texttt{OK}, \texttt{Warning} or
\texttt{Error}. In case of an error or warning, please check the values of
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsOSStatus}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsTimingStatus}} and
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsNetworkingStatus}} to find out which
subsystem causes the problem.
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsGeneralStatusGroup}{wrsOSStatus}{
\underline{Description:}
Collective status of the operating system running on WR switch. In case of
an error or warning, please check status objects in the
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsOSStatusGroup}}.
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsGeneralStatusGroup}{wrsTimingStatus}{
\underline{Description:}
Collective status of the synchronization subsystem. In case of an
error or warning, please check status objects in the
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsTimingStatusGroup}}.
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsGeneralStatusGroup}{wrsNetworkingStatus}{
\underline{Description:}
Collective status of the Ethernet switching subsystem. In case of an error
or warning, please check status objects in the
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsNetworkingStatusGroup}}.
\glspar \underline{Related problems:}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\snmpentrys{WR-SWITCH-MIB}{}{wrsDetailedStatusesGroup}{
Branch with collective statuses of various switch subsystems.}
%------------------------------------------------------------------------
\snmpentrys{WR-SWITCH-MIB}{wrsDetailedStatusesGroup}{wrsOSStatusGroup}{
\underline{Description:}
Group with collective statuses of the embedded operating system running on
the switch.}
\snmpentrys{WR-SWITCH-MIB}{wrsOSStatusGroup}{wrsBootSuccessful}{
\underline{Description:}
Grouped status of \texttt{\glshyperlink{WR-SWITCH-MIB::wrsBootStatusGroup}},
indicating whether the boot was
successful. \texttt{Error} when the dot-config source is wrong, the
dot-config or its URL cannot be obtained, the dot-config contains
errors, the hwinfo cannot be read, the FPGA bitstream or the LM32 software
cannot be loaded, or any kernel modules or userspace daemons are
missing.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state
\item Check \texttt{\glshyperlink{WR-SWITCH-MIB::wrsBootConfigStatus}},
if it reports an error, please verify your WRS configuration.
\item Restart the switch
\item Please consult WR experts if the problem persists.
\end{pck_proc}
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsOSStatusGroup}{wrsTemperatureWarning}{
\underline{Description:}
Reports whether the temperature thresholds are not set or are exceeded.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state
\item Verify if your switch configuration contains valid temperature
thresholds. By default, they are all set to 80 \textdegree C.
\item Verify if cooling of the rack where WR Switch is installed works
properly.
\item Verify if both cooling fans in the back of the WR Switch case are
working.
\item Replace the switch with a new unit and consult the WR Switch
manufacturer for a repair.
\end{pck_proc}
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsOSStatusGroup}{wrsMemoryFreeLow}{
\underline{Description:}
Reports \texttt{Warning} when more than 50\%, or \texttt{Error} when more
than 80\% of the memory is used.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state
\item Restart the switch
\item Send the dumped state of the switch to WR experts for analysis as
this might mean there is some internal problem in the WRS firmware.
\end{pck_proc}
\glspar \underline{Related problems:}}
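The thresholds described for \texttt{wrsMemoryFreeLow} can be written down as a short sketch. The function name \texttt{memory\_status} is hypothetical, and the input is assumed to be the percentage of used memory (as in \texttt{wrsMemoryUsedPerc}); "more than" is read as a strict comparison.

```python
def memory_status(used_percent):
    """Map used memory in percent to the status described for wrsMemoryFreeLow."""
    if used_percent > 80:
        return "Error"
    if used_percent > 50:
        return "Warning"
    return "OK"
```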
\snmpentrys{WR-SWITCH-MIB}{wrsOSStatusGroup}{wrsCpuLoadHigh}{
\underline{Description:}
Reports \texttt{Warning} when the average CPU load is more than 2 for the past 1min,
1.5 for 5min or 1 for 15min. \texttt{Error} when the average CPU load is
more than 3 for the past 1min, 2 for 5min or 1.5 for 15min.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state
\item Restart the switch
\item Send the dumped state of the switch to WR experts for analysis as
this might mean there is some internal problem in the WRS firmware.
\end{pck_proc}
\glspar \underline{Related problems:}}
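The CPU load rule combines three averaging windows, which is easier to follow in code. The sketch below assumes the load averages are available as plain floating-point numbers; the names \texttt{cpu\_load\_status}, \texttt{WARNING\_LIMITS} and \texttt{ERROR\_LIMITS} are illustrative only.

```python
# Limits for the 1 min, 5 min and 15 min load averages, in that order.
WARNING_LIMITS = (2.0, 1.5, 1.0)
ERROR_LIMITS = (3.0, 2.0, 1.5)


def cpu_load_status(avg1, avg5, avg15):
    """Return the status described for wrsCpuLoadHigh."""
    loads = (avg1, avg5, avg15)
    # Exceeding the limit in any single window is enough to raise the status.
    if any(load > limit for load, limit in zip(loads, ERROR_LIMITS)):
        return "Error"
    if any(load > limit for load, limit in zip(loads, WARNING_LIMITS)):
        return "Warning"
    return "OK"
```

Note that a short spike may only affect the 1 min window, while a sustained moderate load can trigger the status through the 15 min window alone.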
\snmpentrys{WR-SWITCH-MIB}{wrsOSStatusGroup}{wrsDiskSpaceLow}{
\underline{Description:}
\texttt{Warning} when more than 80\% of any disk partition is used.
\texttt{Error} when more than 90\% of any disk partition is used.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state
\item Check the values of \emph{CONFIG\_WRS\_LOG\_*} configuration options
on the switch. These are the parameters describing where log messages
should be sent from various processes in the switch. Normally users
don't need to modify them, but if any of them is set to a file in the
WRS filesystem (e.g. /tmp/snmp.log) this may reduce the free space after
some time of operation.
\item Restart the switch
\item Send the dumped state of the switch to WR experts for analysis as
this might mean there is some internal problem in the WRS firmware.
\end{pck_proc}
\glspar \underline{Related problems:}}
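Because \texttt{wrsDiskSpaceLow} looks at every partition, it may help to see which partition is responsible. The following sketch assumes a mapping from mount path to usage in percent (for example built from \texttt{wrsDiskTable}); the function name \texttt{disk\_space\_status} is hypothetical.

```python
def disk_space_status(use_rate_by_mount):
    """Return (status, partitions above the warning threshold).

    use_rate_by_mount maps a mount path to its usage in percent.
    """
    worst = max(use_rate_by_mount.values(), default=0.0)
    offenders = sorted(m for m, rate in use_rate_by_mount.items() if rate > 80)
    if worst > 90:
        status = "Error"
    elif worst > 80:
        status = "Warning"
    else:
        status = "OK"
    return status, offenders
```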
%------------------------------------------------------------------------
\snmpentrys{WR-SWITCH-MIB}{wrsDetailedStatusesGroup}{wrsTimingStatusGroup} {
\underline{Description:}
Group with collective statuses of the timing subsystem.} %}
\snmpentrys{WR-SWITCH-MIB}{wrsTimingStatusGroup}{wrsPTPStatus}{
\underline{Description:}
Reports the status of PTP daemon running on the switch.\\
\texttt{Error} when any of PTP error counters in
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsPtpDataTable}}\\
(\texttt{\glshyperlink{WR-SWITCH-MIB::wrsPtpServoStateErrCnt.<n>}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsPtpClockOffsetErrCnt.<n>}} or\\
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsPtpRTTErrCnt.<n>}})
has increased since the last scan (issue
\ref{fail:timing:ppsi_track_phase},
\ref{fail:timing:offset_jump}, \ref{fail:timing:rtt_jump}), at least one of
the $\Delta_{TXM}$, $\Delta_{RXM}$, $\Delta_{TXS}$, $\Delta_{RXS}$ is 0
(issue \ref{fail:timing:deltas_report}) or PTP servo update counter is not
increasing.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state
\item Check \texttt{\glshyperlink{WR-SWITCH-MIB::wrsSoftPLLStatus}} on the
Master (WR device one step higher in the timing hierarchy). If it reports
an error, proceed to investigate the problem on the Master switch.
Otherwise, continue with the WRS that reported the problem.
\item Verify if the link to WR Master was not lost by checking the
object\\ \texttt{\glshyperlink{WR-SWITCH-MIB::wrsSlaveLinksStatus}}.
\item If this is not the case, restart the switch.
\item If the problem persists replace the switch with a new unit.
%(see \ref{cern:wrs_replacement}).
\end{pck_proc}
\glspar \underline{Related problems:}}
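The three error conditions of \texttt{wrsPTPStatus} compare two consecutive scans, which can be sketched as follows. The class \texttt{PtpSnapshot}, its field names and the function \texttt{ptp\_status} are hypothetical; returning \texttt{FirstRead} when there is no previous scan is an assumption based on the general description of that value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PtpSnapshot:
    """One scan of a wrsPtpDataTable row (field names are illustrative)."""
    servo_state_err_cnt: int
    clock_offset_err_cnt: int
    rtt_err_cnt: int
    servo_updates: int
    deltas: Tuple[int, int, int, int]  # DeltaTxM, DeltaRxM, DeltaTxS, DeltaRxS


def ptp_status(prev: Optional[PtpSnapshot], cur: PtpSnapshot) -> str:
    if prev is None:
        return "FirstRead"  # assumption: nothing to compare against yet
    counters_grew = (
        cur.servo_state_err_cnt > prev.servo_state_err_cnt
        or cur.clock_offset_err_cnt > prev.clock_offset_err_cnt
        or cur.rtt_err_cnt > prev.rtt_err_cnt
    )
    a_delta_is_zero = any(d == 0 for d in cur.deltas)
    servo_stalled = cur.servo_updates <= prev.servo_updates
    if counters_grew or a_delta_is_zero or servo_stalled:
        return "Error"
    return "OK"
```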
\snmpentrys{WR-SWITCH-MIB}{wrsTimingStatusGroup}{wrsSoftPLLStatus}{
\underline{Description:}
Reports the status of the PLLs inside the switch.\\
\texttt{Error} when \texttt{\glshyperlink{WR-SWITCH-MIB::wrsSpllSeqState}}
is not \emph{Ready}, or
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSpllAlignState}} is not
\emph{Locked} (for Grand Master mode), or any of
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSpllHlock}},
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSpllMlock}} is equal to 0 (for
Boundary Clock mode).\\
\texttt{Warning} when \texttt{\glshyperlink{WR-SWITCH-MIB::wrsSpllDelCnt}}
$>$ 0 (for Grand Master mode) or
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsSpllDelCnt}} has changed (for all
other modes).\\
\underline{On error:}\\
For GrandMaster WRS:
\begin{pck_proc}
\item Dump state
\item Check 1-PPS and 10 MHz signals coming from an external source.
Verify if they are properly connected and, in case of a GPS receiver,
check if it is synchronized and locked.
\item Restart the GrandMaster switch.
\item If the problem persists, replace the switch with a new unit.
%(see \ref{cern:wrs_replacement}).
\end{pck_proc}
\glspar For Boundary Clock WRS:
\begin{pck_proc}
\item Dump state
\item Check \texttt{\glshyperlink{WR-SWITCH-MIB::wrsSoftPLLStatus}} on the
Master. If it reports an error, proceed to investigate the problem on the
Master switch.
\item Verify if the link to WR Master was not lost by checking the
object\\ \texttt{\glshyperlink{WR-SWITCH-MIB::wrsSlaveLinksStatus}}.
\item Restart the switch.
\item If the problem persists, replace the switch with a new unit.
%(see \ref{cern:wrs_replacement}).
\end{pck_proc}
\glspar \underline{Related problems:}}
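The mode-dependent conditions of \texttt{wrsSoftPLLStatus} can be read as the decision sequence sketched below. The mode and state strings, the function name \texttt{softpll\_status} and its parameters are assumptions for illustration; in particular, the sketch assumes that the \texttt{wrsSpllSeqState} check applies in every mode.

```python
def softpll_status(mode, seq_state, align_state, hlock, mlock,
                   del_cnt, prev_del_cnt):
    """Return the status described for wrsSoftPLLStatus (illustrative only)."""
    is_grand_master = mode == "GrandMaster"
    is_boundary_clock = mode == "BoundaryClock"

    if seq_state != "Ready":
        return "Error"
    if is_grand_master and align_state != "Locked":
        return "Error"
    if is_boundary_clock and (hlock == 0 or mlock == 0):
        return "Error"

    # Warnings: any delock count in Grand Master mode, a change in other modes.
    if is_grand_master and del_cnt > 0:
        return "Warning"
    if not is_grand_master and del_cnt != prev_del_cnt:
        return "Warning"
    return "OK"
```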
\snmpentrys{WR-SWITCH-MIB}{wrsTimingStatusGroup}{wrsSlaveLinksStatus}{
\underline{Description:}
Reports the status of the link on WR ports configured in the slave role.\\
\texttt{Error} when link to master is down for a switch in the Boundary
Clock mode. Additionally, \texttt{Error} is generated when the
link to master is up for a switch in the Free-running Master or Grand
Master mode.\\
\underline{On error:}\\
For Master/GrandMaster WRS:
\begin{pck_proc}
\item Check the configuration of the switch, especially whether the
\emph{Timing Mode} is set correctly (i.e. that it was not accidentally set
to \emph{Boundary Clock}).
\item Check the role of each port in the timing configuration. They should
all be set to \emph{master}. If any of them is set to \emph{slave}, you
should verify that there is no WR Master connected to it.
\end{pck_proc}
\glspar For Boundary Clock WRS:
\begin{pck_proc}
\item Check the fiber connection on the slave port of the WRS.
\item Check the configuration of the switch, especially whether the
\emph{Timing Mode} is set correctly (i.e. that it was not accidentally set
to \emph{Grand-Master} or \emph{Free-Running Master}).
\item Check the status of the WR Master connected to the slave port of the
WRS.
\item Replace the faulty switch with a new unit, if this does not solve
the problem, make sure your fiber link is not broken.
\end{pck_proc}
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsTimingStatusGroup}{wrsPTPFramesFlowing}{
\underline{Description:}
Reports \texttt{Error} when PTP frames are not being sent and received on
active WR ports, i.e. when the Tx/Rx frame counters on active links
(master/slave ports) are not incrementing. Can also report the
\texttt{FirstRead} value.\\
\underline{On error:}
\begin{pck_proc}
\item Check the Syslog messages to determine the WR port on which the
problem is reported. You should see a message similar to this one:\\
\texttt{SNMP: wrsPTPFramesFlowing failed for port 1}
\item Check your network layout and the WR Switch configuration. If you
have some non-WR devices connected to ports of the WR Switch (e.g.
computer sending/receiving only data, without the need of
synchronization), these ports should have their role in the timing
configuration set to \emph{non-wr}.
\item Check the status of a WR device connected to the reported port.
\item Restart the switch.
\item If the problem persists, please contact WR experts for in-depth
investigation.
\end{pck_proc}
\glspar \underline{Related problems:}}
%------------------------------------------------------------------------
\snmpentrys{WR-SWITCH-MIB}{wrsDetailedStatusesGroup}{wrsNetworkingStatusGroup}{
\underline{Description:}
Group with collective statuses of the networking subsystem.}
\snmpentrys{WR-SWITCH-MIB}{wrsNetworkingStatusGroup}{wrsSFPsStatus}{
\underline{Description:}
Reports the status of SFP transceivers inserted into the switch.\\
\texttt{Error} when any of the SFPs reports an error. To find out which SFP
caused the problem, check
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsPortStatusSfpError.<n>}}.\\
\underline{On error:}
\begin{pck_proc}
\item Check \texttt{\glshyperlink{WR-SWITCH-MIB::wrsPortStatusSfpError.<n>}}
SNMP objects or Syslog
messages to determine the WR port on which the problem is reported. In
case of Syslog, you should see a message similar to this one:\\
\texttt{Unknown SFP vn="AVAGO" pn="ABCU-5710RZ" vs="AN1151PD8A" on port
wr1}
\item If the reported port is intended to be used to connect a device that
does not require WR synchronization (e.g. using a copper SFP module),
then you should verify whether the role in the timing configuration for
this port is set to \emph{non-wr}.
\item Otherwise, you should use a WR-supported SFP module and make sure it
is declared together with calibration values in the WRS configuration.
\end{pck_proc}
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsNetworkingStatusGroup}{wrsEndpointStatus}{
\underline{Description:}
Reports the status of Ethernet MAC endpoints on WR ports.\\
\texttt{Error} when there is a fault in the Endpoint's
transmission/reception path.\\
\underline{On error:}
\begin{pck_proc}
\item Make several state dumps.
\item Restart the switch.
\item Check Syslog messages to determine the WR port on which the problem
is reported. You should see a message similar to this one:\\
\texttt{SNMP: wrsEndpointStatus failed for port 1}
\item Check the fiber link on the reported port, e.g. try replacing the SFP
transceivers on both sides of the link or using another fiber.
\item If the problem persists, please contact WR experts for in-depth
investigation.
\end{pck_proc}
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsNetworkingStatusGroup}{wrsSwcoreStatus}{
\underline{Description:}
Reports the status of the Ethernet switching module.\\
Status object not implemented in the current firmware release. Always
reports \texttt{OK}.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state.
\item Restart the switch.
\item Please contact WR experts since this might mean that either there is
too much high priority traffic in your network, or there is some
internal problem in the WRS firmware.
\end{pck_proc}
\glspar \underline{Related problems:}}
\snmpentrys{WR-SWITCH-MIB}{wrsNetworkingStatusGroup}{wrsRTUStatus}{
\underline{Description:}
Reports the status of the routing module responsible for deciding where (to
which port) incoming Ethernet frames should be forwarded.\\
\texttt{Error} when RTU is overloaded and cannot accept more requests.\\
\underline{On error:}
\begin{pck_proc}
\item Dump state.
\item Restart the switch.
\item If possible, reduce the rate of small Ethernet frames flowing
through your switch. If your application allows it, transfer the same
information using fewer, larger Ethernet frames.
\end{pck_proc}
\glspar \underline{Related problems:}}
%------------------------------------------------------------------------
\snmpentrys{WR-SWITCH-MIB}{wrsDetailedStatusesGroup}{wrsVersionGroup}{
\underline{Description:}
Hardware, gateware and software versions, together with the serial number
and other hardware information of the WRS.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionSwVersion}{
\underline{Description:}
Software version in the form of the release version and, where applicable,
the git commit from the repository (information provided by the
\emph{git describe} command at build time).}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionSwBuildBy}{
\underline{Description:}
Name of the person who built the firmware running on the switch (provided by
the \texttt{git config --get-all user.name} command at build time).}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionSwBuildDate}{
\underline{Description:}
Firmware build date (\texttt{\_\_DATE\_\_} at build time).}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionBackplaneVersion}{
\underline{Description:}
Hardware version of the Minibackplane board.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionFpgaType}{
\underline{Description:}
FPGA model inside the switch.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionManufacturer}{
\underline{Description:}
Name of the manufacturing company.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionSwitchSerialNumber}{
\underline{Description:}
Serial number of the switch.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionScbVersion}{
\underline{Description:}
Hardware version of the main SCB board.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionGwVersion}{
\underline{Description:}
Version of the FPGA bitstream (Gateware).}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionGwBuild}{
\underline{Description:}
Build ID of the FPGA bitstream, i.e. the synthesis date.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionSwitchHdlCommitId}{
\underline{Description:}
FPGA bitstream commit ID from the main \texttt{wr\_switch\_hdl} repository.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionGeneralCoresCommitId}{
\underline{Description:}
FPGA bitstream commit ID from the \texttt{general-cores} repository.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionWrCoresCommitId}{
\underline{Description:}
FPGA bitstream commit ID from the \texttt{wr-cores} repository.}
\snmpentrys{WR-SWITCH-MIB}{wrsVersionGroup}{wrsVersionLastUpdateDate}{
\underline{Description:}
Date and time of the last firmware update. This information may not be
accurate, due to hard restarts or lack of the proper time during the
upgrade.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Add expert entries in the order in which they appear in the MIB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\snmpentrye{WR-SWITCH-MIB}{}{wrsOperationStatus}{}
\snmpentrye{WR-SWITCH-MIB}{wrsOperationStatus}{wrsCurrentTimeGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsCurrentTimeGroup}{wrsDateTAI}{}
\snmpentrye{WR-SWITCH-MIB}{wrsCurrentTimeGroup}{wrsDateTAIString}{}
\snmpentrye{WR-SWITCH-MIB}{wrsOperationStatus}{wrsBootStatusGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootCnt}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsRebootCnt}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsRestartReason}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsFaultIP}{Not implemented}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsFaultLR}{Not implemented}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsConfigSource}{Source of a
configuration file. When it is set to \texttt{tryDhcp}, a failure to get
the URL via DHCP does not raise an error in
\texttt{\glshyperlink{WR-SWITCH-MIB::wrsBootSuccessful}}.}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsConfigSourceUrl}{Path to the
dot-config on a remote server (if local file is not used).}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsRestartReasonMonit}{
Process that caused \texttt{monit} to trigger a restart.}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootConfigStatus}{Result of
the dot-config verification.}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootHwinfoReadout}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootLoadFPGA}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootLoadLM32}{}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootKernelModulesMissing}{
List of kernel modules is defined in the source code.}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsBootUserspaceDaemonsMissing}{
Number of missing processes (not running although they should be) in the
embedded Linux.}
\snmpentrye{WR-SWITCH-MIB}{wrsBootStatusGroup}{wrsGwWatchdogTimeouts}{
Number of times the watchdog has restarted the HDL module responsible for
the Ethernet switching process.}
\snmpentrye{WR-SWITCH-MIB}{wrsOperationStatus}{wrsTemperatureGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempFPGA}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempPLL}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempPSL}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempPSR}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempThresholdFPGA}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempThresholdPLL}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempThresholdPSL}{}
\snmpentrye{WR-SWITCH-MIB}{wrsTemperatureGroup}{wrsTempThresholdPSR}{}
\snmpentrye{WR-SWITCH-MIB}{wrsOperationStatus}{wrsMemoryGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsMemoryGroup}{wrsMemoryTotal}{}
\snmpentrye{WR-SWITCH-MIB}{wrsMemoryGroup}{wrsMemoryUsed}{}
\snmpentrye{WR-SWITCH-MIB}{wrsMemoryGroup}{wrsMemoryUsedPerc}{Percentage of
used memory.}
\snmpentrye{WR-SWITCH-MIB}{wrsMemoryGroup}{wrsMemoryFree}{}
\snmpentrye{WR-SWITCH-MIB}{wrsOperationStatus}{wrsCpuLoadGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsCpuLoadGroup}{wrsCPULoadAvg1min}{}
\snmpentrye{WR-SWITCH-MIB}{wrsCpuLoadGroup}{wrsCPULoadAvg5min}{}
\snmpentrye{WR-SWITCH-MIB}{wrsCpuLoadGroup}{wrsCPULoadAvg15min}{}
\snmpentrye{WR-SWITCH-MIB}{wrsOperationStatus}{wrsDiskTable}{
Table with a row for every partition.}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskIndex.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskMountPath.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskSize.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskUsed.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskFree.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskUseRate.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsDiskTable}{wrsDiskFilesystem.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{}{wrsStartCntGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntHAL}{issue \ref{fail:timing:hal_crash}, \ref{fail:other:daemon_crash}}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntPTP}{issue \ref{fail:timing:ppsi_crash}, \ref{fail:other:daemon_crash}}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntRTUd}{issue \ref{fail:data:rtu_crash}, \ref{fail:other:daemon_crash}}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntSshd}{}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntHttpd}{}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntSnmpd}{}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntSyslogd}{}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntWrsWatchdog}{}
\snmpentrye{WR-SWITCH-MIB}{wrsStartCntGroup}{wrsStartCntSPLL}{Not implemented}
\snmpentrye{WR-SWITCH-MIB}{}{wrsSpllState}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllState}{wrsSpllVersionGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllVersionGroup}{wrsSpllVersion}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllVersionGroup}{wrsSpllBuildDate}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllState}{wrsSpllStatusGroup}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllMode}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllIrqCnt}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllSeqState}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllAlignState}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllHlock}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllMlock}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllHY}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllMY}{}
\snmpentrye{WR-SWITCH-MIB}{wrsSpllStatusGroup}{wrsSpllDelCnt}{}
% xxx wrsSpllHoldover
% xxx wrsSpllHoldoverTime
% xxx (...)
% \snmpentrye{WR-SWITCH-MIB}{wrsSpllPerPortTable[18]
%
% xxx wrsSpllBlock
% xxx wrsSpllErr
%
\snmpentrye{WR-SWITCH-MIB}{}{wrsPstatsTable}{Table with pstats values, one row per port.}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsIndex.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsPortName.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsTXUnderrun.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXOverrun.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXInvalidCode.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXSyncLost.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPauseFrames.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPfilterDropped.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPCSErrors.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXGiantFrames.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXRuntFrames.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXCRCErrors.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass0.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass1.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass2.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass3.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass4.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass5.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass6.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPclass7.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsTXFrames.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXFrames.<n>}{Total number
of Rx frames on port \emph{n} }
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXDropRTUFull.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio0.<n>}{Rx priority 0}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio1.<n>}{Rx priority 1}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio2.<n>}{Rx priority 2}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio3.<n>}{Rx priority 3}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio4.<n>}{Rx priority 4}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio5.<n>}{Rx priority 5}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio6.<n>}{Rx priority 6}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRXPrio7.<n>}{Rx priority 7}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRTUValid.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRTUResponses.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsRTUDropped.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsFastMatchPriority.<n>}{HP
frames on port \emph{n}}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsFastMatchFastForward.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsFastMatchNonForward.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsFastMatchRespValid.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsFullMatchRespValid.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsForwarded.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPstatsTable}{wrsPstatsTRURespValid.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{}{wrsPtpDataTable}{Table with a row per PTP servo instance.}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpIndex.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpPortName.<n>}{
The port on which the instance is running.}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpGrandmasterID.<n>}{
Not implemented.}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpOwnID.<n>}{
Not implemented.}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpMode.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpServoState.<n>}{PTP servo
state as string}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpServoStateN.<n>}{PTP servo
state as number}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpPhaseTracking.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpSyncSource.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpClockOffsetPs.<n>}{value of
the offset in ps}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpClockOffsetPsHR.<n>}{32-bit
signed value of the offset in picoseconds, with saturation}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpSkew.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpRTT.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpLinkLength.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpServoUpdates.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpDeltaTxM.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpDeltaRxM.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpDeltaTxS.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpDeltaRxS.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpServoStateErrCnt.<n>}{
Number of servo updates during which the servo was not in the
\texttt{TRACK\_PHASE} state.}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpClockOffsetErrCnt.<n>}{
Number of servo updates when offset is larger than 500ps or smaller than
-500ps.}
\snmpentrye{WR-SWITCH-MIB}{wrsPtpDataTable}{wrsPtpRTTErrCnt.<n>}{
Number of servo updates when RTT delta between subsequent updates is larger
than 1000ps or smaller than -1000ps.}
\snmpentrye{WR-SWITCH-MIB}{}{wrsPortStatusTable}{Table with a row per port.}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusIndex.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusPortName.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusLink.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusConfiguredMode.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusLocked.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusPeer.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusSfpVN.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusSfpPN.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusSfpVS.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusSfpInDB.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusSfpGbE.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusSfpError.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusPtpTxFrames.<n>}{}
\snmpentrye{WR-SWITCH-MIB}{wrsPortStatusTable}{wrsPortStatusPtpRxFrames.<n>}{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Add entries from other MIBs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\snmpentryo{HOST-RESOURCES-MIB}{}{hrSWRunName.<n>}{List of running processes.}
\snmpentryo{HOST-RESOURCES-MIB}{}{hrStorageDescr.<n>}{}
\snmpentryo{HOST-RESOURCES-MIB}{}{hrStorageSize.<n>}{}
\snmpentryo{HOST-RESOURCES-MIB}{}{hrStorageUsed.<n>}{}
\snmpentryo{IF-MIB}{}{ifOperStatus.<n>}{}
......@@ -12,6 +12,7 @@
\usepackage[latin1]{inputenc}
\usepackage{verbatim}
\usepackage{amsmath}
\usepackage{textcomp}
\usepackage{times,mathptmx}
\usepackage{chngcntr}
\usepackage{hyperref}
......@@ -22,17 +23,20 @@
\definecolor{light-gray}{gray}{0.95}
%\usepackage[firstpage]{draftwatermark}
% for glossary
% nopostdot - no dot at the end of index entries
\usepackage[nogroupskip,nopostdot,counter=subsubsection]{glossaries}
\renewcommand{\glossarysection}[2][]{}
\usepackage{listings}
\usepackage{cancel}
\graphicspath{ {../../../../figures/} }
\newenvironment{packed_enum}{
\begin{itemize}[leftmargin=0pt,topsep=-12pt]
\begin{enumerate}[leftmargin=0pt,topsep=-12pt]
\setlength{\itemsep}{1pt}
\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
}{\end{itemize}}
}{\end{enumerate}}
\newenvironment{packed_items}{
\begin{itemize}[topsep=-12pt]
......@@ -41,6 +45,20 @@
\setlength{\parsep}{0pt}
}{\end{itemize}}
\newenvironment{pck_descr}{
\begin{itemize}[leftmargin=0pt,topsep=-12pt]
\setlength{\itemsep}{1pt}
\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
}{\end{itemize}}
\newenvironment{pck_proc}{
\begin{enumerate}[topsep=2pt]
\setlength{\itemsep}{1pt}
\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
}{\end{enumerate}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% creating subsubsubsection notation
......@@ -71,6 +89,146 @@
\newcommand{\eqdelay}[1]{{\text{delay}}_{#1}}
\newcommand{\eqasymm}{{\text{asymmetry}}}
% for glossary, set way of sorting entries
\makenoidxglossaries
% don't bold entries, texttt them
\renewcommand*{\glsnamefont}[1]{\texttt{\textmd{#1}}}
\newglossary*{snmp_status}{SNMP's status objects}
\newglossary*{snmp_expert}{SNMP's expert objects}
\newglossary*{snmp_other}{Objects from other MIBs}
% alphabetical list of all entries
\newglossary*{snmp_all}{All SNMP objects}
\defglsentryfmt[snmp_status]{\texttt{\glsentryfmt}}
\defglsentryfmt[snmp_expert]{\texttt{\glsentryfmt}}
\defglsentryfmt[snmp_other]{\texttt{\glsentryfmt}}
% macro to add entries
\newcommand{\snmpadd}[1]{
\glspl{#1}\glsadd{x#1}%
}
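% Usage sketch for \snmpadd (illustrative only; it assumes the object below,
% which is referenced elsewhere in this document, has been defined with one
% of the \snmpentry* helpers). In running text the macro prints the full
% MIB::object name stored in the entry's "plural" field and marks the
% matching "x"-prefixed entry as used, so that it also appears in the
% alphabetical snmp_all list:
%   \snmpadd{WR-SWITCH-MIB::wrsTimingStatus}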
% helpers to add glossary entries
% add newline to non empty strings. For descriptions.
\newcommand{\descr}[1]{
\ifx&#1&%
%
\else% put fixed space
\\#1
\fi
}
% {MIB}{parent}{object}{comment}{glossary_name}
\newcommand{\snmpentry}[5]{%
\ifx&#2&% if parameter 2 is empty don't add parent
\newglossaryentry{#1::#3}{%
type=#5,%
name={#3},%
plural={#1::#3},% used to display name not plural
user1={#1},% MIB
description={\descr{#4}}%
}%
\else
\newglossaryentry{#1::#3}{%
type=#5,%
name={#3},%
description={\descr{#4}},%
plural={#1::#3},% used to display name not plural
user1={#1},% MIB
parent={#1::#2}%
}%
\fi
% add entry to alphabetical list
\newglossaryentry{x#1::#3}{%
type=snmp_all,%
name={#1::#3},%
description={}
}%
}
% {MIB}{parent}{object}{comment}
\newcommand{\snmpentrye}[4]{%
\snmpentry{#1}{#2}{#3}{#4}{snmp_expert}
}
% {MIB}{parent}{object}{comment}
\newcommand{\snmpentrys}[4]{%
\snmpentry{#1}{#2}{#3}{#4}{snmp_status}
}
% command to add snmp objects from other MIBs
% {MIB}{parent}{object}{comment}
\newcommand{\snmpentryo}[4]{%
\snmpentry{#1}{#2}{#3}{#4}{snmp_other}
}
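% Usage sketch for the wrappers above. The group and object names are
% hypothetical and are not part of WR-SWITCH-MIB. A parent entry is expected
% to be defined before its children, because the glossaries package checks
% that the entry named in parent=... already exists:
%   \snmpentrys{WR-SWITCH-MIB}{}{wrsExampleGroup}{Top-level example group.}
%   \snmpentrys{WR-SWITCH-MIB}{wrsExampleGroup}{wrsExampleStatus}{
%     \underline{Description:}
%     Example status object.}
% Each call creates the entry WR-SWITCH-MIB::<object> in the selected
% glossary and a second entry xWR-SWITCH-MIB::<object> in snmp_all.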
% extra indent for lists
\newlength{\paraaindent}
% indent for new paragraphs
\newlength{\snmpentryindent}
% load glossary definitions from snmp_objects.tex
\loadglsentries{snmp_objects}
% use \kern 0.33em instead of \space to have fixed width space
\newglossarystyle{objtree}{%
\renewenvironment{theglossary}%
{\setlength{\parindent}{0pt}%
\setlength{\parskip}{0pt plus 0.3pt}}%
{}%
\renewcommand*{\glossaryheader}{}%
\renewcommand*{\glsgroupheading}[1]{}%
\renewcommand{\glossentry}[2]{%
\hangindent30pt\relax
% save indent for other paragraphs
\setlength{\snmpentryindent}{\hangindent}
\parindent0pt\relax
% set indent for lists entries
\setlength{\paraaindent}{\hangindent}
\addtolength{\paraaindent}{14pt}
\setlist[enumerate]{leftmargin=\paraaindent}
$\bullet$\kern 0.33em\glsentryitem{##1}\glstreenamefmt{\glstarget{##1}{\texttt{\textmd{\glsentryuseri{##1}}::}\glossentryname{##1}}}%
\ifglshassymbol{##1}{\kern 0.33em(\glossentrysymbol{##1})}{}%
\glossentrydesc{##1}\glspostdescription\kern 0.33em##2\par\vspace{12pt}
}%
\renewcommand{\subglossentry}[3]{%
\hangindent##1\glstreeindent\relax
\addtolength{\hangindent}{30pt}
% save indent for other paragraphs
\setlength{\snmpentryindent}{\hangindent}
\parindent##1\glstreeindent\relax
% set indent for lists entries
\setlength{\paraaindent}{\hangindent}
\addtolength{\paraaindent}{14pt}
\setlist[enumerate]{leftmargin=\paraaindent}
\ifnum##1=1\relax
$\circ$%
\fi
\ifnum##1=2\relax
$\ast$%
\fi
\kern 0.33em%
\ifnum##1=1\relax
\glssubentryitem{##2}%
\fi
\glstreenamefmt{\glstarget{##2}{\glossentryname{##2}}}%
\ifglshassymbol{##2}{\kern 0.33em(\glossentrysymbol{##2})}{}%
\glossentrydesc{##2}\glspostdescription\kern 0.33em##3\par\vspace{12pt}
}%
%redefine \glspar to support indentation in many paragraphs
\renewcommand{\glspar}{%
\par
\parindent\snmpentryindent % restore first line in paragraph indent
\hangindent\snmpentryindent % restore other lines in paragraph indent
}%
}
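% Usage sketch for the objtree style. This is an assumption: the actual call
% lives outside this file and may use different options. sort=def keeps the
% entries in the order in which they were defined:
%   \printnoidxglossary[type=snmp_status,style=objtree,sort=def]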
\newcommand{\ignore}[1]{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\setcounter{tocdepth}{2}
\input{revinfo.tex}
......@@ -96,18 +254,31 @@
\newpage
\input{intro.tex}
\newpage
\input{snmp_exports.tex}
\newpage
\section{Possible Errors}
\label{sec:failures}
\input{fail.tex}
\appendix
\newpage
\input{snmp_exports.tex}
%\section{SNMP exports}
%\subsection{Operator/basic objects}
%\subsection{Expert objects}
\section{Operator's diagnostic example}
\input{diamon_example.tex}
\newpage
\section{Sorted list of all MIB objects}
\label{sec:snmp_exports:sorted}
% print alphabetical list
\printnoidxglossary[type=snmp_all,style=tree,sort=letter]
%\newpage
%\bibliographystyle{unsrt}
%\bibliography{references}
% add unused entries, but don't display their section
% based on:
% http://tex.stackexchange.com/questions/115635/glossaries-suppress-pages-when-using-glsaddall
\forallglsentries{\thislabel}%
{%
\ifglsused{\thislabel}{}{\glsadd[format=ignore]{\thislabel}}%
}
\end{document}