docs/specifications/management: move wrs_failures to wr-switch-sw repo

Move done to keep document synced with source. Signed-off-by: Adam Wujek <adam.wujek@cern.ch>

Commit 7f4c52ad authored Aug 13, 2015 by Adam Wujek
5 changed files
--- a/documents/specifications/management/wrs_failures/Makefile
+++ b/documents/specifications/management/wrs_failures/Makefile
-all : wrs_failures.pdf
-
-.PHONY : all clean
-
-wrs_failures.pdf : wrs_failures.tex fail.tex intro.tex snmp_exports.tex
-	pdflatex -dPDFSETTINGS=/prepress -dSubsetFonts=true -dEmbedAllFonts=true -dMaxSubsetPct=100 -dCompatibilityLevel=1.4 $^
-	pdflatex -dPDFSETTINGS=/prepress -dSubsetFonts=true -dEmbedAllFonts=true -dMaxSubsetPct=100 -dCompatibilityLevel=1.4 $^
-
-clean :
-	rm -f *.eps *.pdf *.dat *.log *.out *.aux *.dvi *.ps *.toc
-
--- a/documents/specifications/management/wrs_failures/fail.tex
+++ b/documents/specifications/management/wrs_failures/fail.tex
-\subsection{Timing error}
-As a timing error we define WR Switch not being able to provide its slave
-nodes/switches with correct timing information consistent with the rest of the
-WR network.\\
-
-\noindent Faults leading to a timing error:
-\begin{enumerate}
-	\item {\bf \emph{PTP/PPSi} went out of \texttt{TRACK\_PHASE}}
-		\label{fail:timing:ppsi_track_phase}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{Slave}
-			\item [] \underline{Description}:\\
-				If the \emph{PTP/PPSi} WR servo goes out of the \texttt{TRACK\_PHASE} state,
-				something has gone wrong and the switch has lost
-				synchronization to its Master.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::ptpServoState}\\
-				\texttt{WR-SWITCH-MIB::ptpServoStateN}
-				%ppsiServoStateN shall contain state as a integer taken from ppsi shm
-			\item [] \underline{Note}: we should also monitor PTP/PPSi state inside the
-				switch to build up the general WRS status word.
-		\end{packed_enum}
-
-	\item {\bf Offset jump not compensated by Slave}
-		\label{fail:timing:offset_jump}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{Slave}
-			\item [] \underline{Description}:\\
-				This may happen if Master resets its WR time counters (e.g. because it
-				lost the link to its Master higher in the hierarchy or to external
-				clock), but Slave switch does not follow the jump.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::ptpClockOffsetPs}\\
-				\texttt{WR-SWITCH-MIB::ptpClockOffsetPsHR}
-			\item [] \underline{Note}: The HR version is a 32-bit signed value of the offset, with saturation on overflow and underflow.
-		\end{packed_enum}
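-
-		\noindent As an illustrative sketch (the symbols $o$ and $o_{HR}$ are
-		introduced here and are not MIB objects), and assuming the offset is
-		expressed in picoseconds as the object name suggests, saturation to a
-		32-bit signed value would read:
-		\[
-			o_{HR} = \min\left(\max\left(o,\, -2^{31}\right),\, 2^{31}-1\right)
-		\]
-		Under this assumption, any offset beyond roughly $\pm 2.147$\,ms is
-		reported as the corresponding limit.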
-
-	\item {\bf Detected jump in the RTT value calculated by \emph{PTP/PPSi}}
-		\label{fail:timing:rtt_jump}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{Slave}
-			\item [] \underline{Description}:\\
-				Once a WR link is established, the round-trip delay (RTT) can change smoothly
-				due to temperature variations. If a sudden jump is detected, an
-				erroneous timestamp was generated on either the Master or the Slave side.
-				One possible cause is a wrong value of the t24p transition point.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::ptpRTT}
-			\item [] \underline{Note}: we should also monitor RTT variations inside
-				the switch to build up the general WRS status word.
-		\end{packed_enum}
-
-	\item {\bf Wrong $\Delta_{TXM}$, $\Delta_{RXM}$, $\Delta_{TXS}$,
-		$\Delta_{RXS}$ values are reported to the \emph{PTP/PPSi} daemon}
-		\label{fail:timing:deltas_report}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				If \emph{PTP/PPSi} doesn't get the correct values of fixed hardware delays,
-				it won't be able to calculate a proper Master-to-Slave delay. Although
-				the estimated offset in \emph{PTP/PPSi} is close to 0, WRS won't be
-				synchronized to Master with the sub-nanosecond accuracy.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::ptpDeltaTxM.<n>}\\
-				\texttt{WR-SWITCH-MIB::ptpDeltaRxM.<n>}\\
-				\texttt{WR-SWITCH-MIB::ptpDeltaTxS.<n>}\\
-				\texttt{WR-SWITCH-MIB::ptpDeltaRxS.<n>}
-		\end{packed_enum}
-
-	\item {\bf \emph{SoftPLL} became unlocked}
-		\label{fail:timing:spll_unlock}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on SoftPLL mem read)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				If \emph{SoftPLL} loses lock, for any reason, Slave or Grand Master
-				switch can no longer be syntonized and phase aligned with its time
-				source. WRS in Free-running mode without properly locked Helper PLL is
-				not able to perform reliable phase measurements for enhancing Rx
-				timestamps resolution. For a Grand Master, the reason for \emph{SoftPLL}
-				going out of lock might be disconnected 1-PPS/10MHz signals or the external
-				clock being down. In that case, the switch goes into Free-running mode and
-				resets WR time. Later we will have a holdover to keep the Grand Master
-				switch disciplined in case it loses external reference.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\
-				\texttt{WR-SWITCH-MIB::spllMode}\\
-				\texttt{WR-SWITCH-MIB::spllSeqState}\\
-				\texttt{WR-SWITCH-MIB::spllAlignState}\\
-				\texttt{WR-SWITCH-MIB::spllHlock}\\
-				\texttt{WR-SWITCH-MIB::spllMlock}\\
-				\texttt{WR-SWITCH-MIB::spllDelCnt}
-			\item [] \underline{Note}: The idea to export the status from LM32 is to
-				place a structure with all these values under a fixed address in the
-				memory and read it from Linux.
-		\end{packed_enum}
-
-	\item {\bf \emph{SoftPLL} has crashed/restarted}
-		\label{fail:timing:spll_crash}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on SoftPLL mem read), (require changes in lm32 software)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				If LM32 software crashes or restarts for some reason, its state may be
-				either reset or random (if for some reason variables were overwritten
-				with junk values). In such a case the PLL becomes unlocked and the switch is not
-				able to provide synchronization to other devices.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}\\
-				\texttt{WR-SWITCH-MIB::spllIrqCnt}
-			\item [] \underline{Note}: we need to have a similar mechanism as in the
-				\emph{wrpc-sw} to detect if the LM32 program has restarted because of
-				the CPU following a NULL pointer. If it occurs, we need to export this
-				information through Mini-IPC/HAL. In addition to that, we can detect if
-				\emph{SoftPLL} is hanging (but not restarted) based on irq counter.
-		\end{packed_enum}
-
-	\item {\bf Link to WR Master is down}
-		\label{fail:timing:master_down}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR (will become WARNING with the
-				switch-over)
-			\item [] \underline{Mode}: \emph{Slave}
-			\item [] \underline{Description}:\\
-				In that case, WR Switch loses timing reference, resets counters
-				responsible for keeping the WR time, and starts operating in a
-				Free-Running Master mode.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::portLink.<n>}\\
-				\texttt{WR-SWITCH-MIB::portMode.<n>}
-		\end{packed_enum}
-
-	\item {\bf PTP frames don't reach ARM}
-		\label{fail:timing:no_frames}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on ppsi shm?)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				In this case, \emph{PTP/PPSi} will fail to stay synchronized and provide
-				synchronization. Even if WR servo is in the \texttt{TRACK\_PHASE} state,
-				it calculates new phase shift based on the Master-to-Slave delay
-				variations. To calculate these variations, it still needs timestamped
-				PTP frames flowing. There could be several causes of such fault:
-				\begin{itemize}
-					\item HDL problem (e.g. SwCore or Endpoint hanging)
-					\item \emph{wr\_nic.ko} driver crash
-					\item wrong VLANs configuration
-				\end{itemize}
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::portPtpTxFrames.<n>} \emph{(not implemented)}\\
-				\texttt{WR-SWITCH-MIB::portPtpRxFrames.<n>} \emph{(not implemented)}\\
-				\texttt{WR-SWITCH-MIB::portLink.<n>} \emph{(implemented)}\\
-				\texttt{WR-SWITCH-MIB::portMode.<n>} \emph{(implemented)}
-			\item [] \underline{Note}: If the kernel driver crashes, there is not much
-				we can do. We end up with either our system frozen or a reboot. For
-				wrong VLAN configuration and HDL problems we can monitor if PTP frames
-				are flowing on Slave port(s) of WRS and raise an alarm (change status
-				word) if they don't flow anymore. We should combine this with the link
-				status (up/down). If VLANs are misconfigured, we don't receive PTP
-				frames, but the link is still up. This could let us distinguish from a
-				lack of frames due to the link down (which is a separate issue).
-		\end{packed_enum}
-
-	\item {\bf Detected SFP not supported for WR timing}
-		\label{fail:timing:wrong_sfp}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				By an SFP not supported for WR timing we mean a transceiver that doesn't
-				have the \emph{alpha} parameter and fixed hardware delays defined in the
-				SFP database (\emph{/wr/etc/sfp\_database.conf}). The consequence is
-				\emph{PTP/PPSi} not having the right values to estimate link asymmetry.
-				Despite \emph{PTP/PPSi} offset being close to 0 \emph{ps}, the device won't
-				be properly synchronized.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::portSfpVN.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpPN.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpVS.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpInDB.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpGbE.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpError.<n>}
-			\item [] \underline{Note}: The WRS configuration allows this check to be disabled on some ports.
-				That is because ports may be used for regular (non-WR) PTP
-				synchronization or for data transfer only (no timing). In that case any
-				Gigabit SFP can be used (also copper). Detecting if a non-Gigabit
-				Ethernet SFP is plugged into the cage is covered in a separate issue
-				\ref{fail:other:sfp} in section \ref{sec:other_fail}.
-		\end{packed_enum}
-
-	\item {\bf \emph{PTP/PPSi} process has crashed/restarted}
-		\label{fail:timing:ppsi_crash}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on monit)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				If the \emph{PTP/PPSi} daemon crashes we lose any synchronization
-				capabilities. If, in the future, we have another process that can
-				bring \emph{PTP/PPSi} back to life, such a restart would still create a time
-				jump and has to be reported.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::ptpRunCnt} \emph{(not implemented)}\\
-				\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \emph{(implemented)}
-			\item [] \underline{Note}: list of the processes has to be monitored, if
-				\emph{PTP/PPSi} is there and if its PID has changed (it was restarted).
-		\end{packed_enum}
-
-	\item {\bf \emph{HAL} process has crashed/restarted}
-		\label{fail:timing:hal_crash}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on monit)}
-			\item [] \underline{Severity}: WARNING (but only after we modify PTP/PPSi so
-				it reconnects to HAL, and HAL does not re-initialize SoftPLL after
-				crash)
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				If \emph{HAL} crashes, \emph{PTP/PPSi} is not able to communicate with
-				hardware, e.g. read phase shift, get timestamps, phase shift the clock
-				etc.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::halRunCnt} \emph{(not implemented)}\\
-				\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \emph{(implemented)}
-			\item [] \underline{Note}: list of processes has to be monitored, if
-				\emph{wrsw\_hal} is there and if its PID has changed (it was restarted).
-		\end{packed_enum}
-
-	\item {\bf Wrong configuration applied}
-		\label{fail:timing:wrong_config}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(to be done later)}
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Mode}: \emph{all}
-			\item [] \underline{Description}:\\
-				If there is a wrong configuration applied to the \emph{PTP/PPSi} or HAL
-				(e.g. wrong fixed delays, mode of operation etc.) there is not much we
-				can do. The responsibility of WR experts (or person deploying the
-				system) is to make sure that all the devices have correct initial
-				configuration. Later we can only generate warnings, if the key
-				configuration options are changed remotely (e.g. Grand Master mode to
-				Free-running Master or updated fixed hardware delays values).\\
-				For misconfigured VLANs, we can monitor if PTP frames are flowing on
-				Slave port(s) of the switch.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: monitor remote updates of key configuration
-				options (PTP/WR mode, fixed hardware delays)
-		\end{packed_enum}
-
-	\item {\bf Switchover failed}
-		\begin{packed_enum}
-			\item [] \underline{Status}: for later
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Mode}: \emph{Slave}, \emph{Grand Master}
-			\item [] \underline{Description}: \emph{(not yet implemented)}\\
-				In case the primary timing link breaks, switchover is responsible for
-				seamless switching to the backup one to keep the device in sync. If WRS
-				operates in a \emph{Slave} mode, switchover is about switching
-				between two (or more) WR links to one or multiple WR Masters. If it
-				operates in a \emph{Grand Master} mode, it is about broken/lost
-				connection to an external reference and switching to a backup WR link
-				(another WR Master). Regardless of the configuration, if we fail to
-				switch over to a backup link (e.g. because it is down), WRS resets
-				the time counters and continues operating as a Free-Running Master.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: we should probably use parameters reported by
-				the backup channel(s) of the SoftPLL and the backup PTP servo to be able
-				to detect and report that something went wrong.
-		\end{packed_enum}
-
-	\item {\bf Holdover for too long}
-		\begin{packed_enum}
-			\item [] \underline{Status}: for later
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Mode}: \emph{Grand Master}
-			\item [] \underline{Description}: \emph{(not yet implemented)}\\
-				Signaling active holdover is one thing, but if a Grand Master switch is
-				kept in sync with holdover for too long, it might drift away from the
-				ideal external reference too much. All devices in WR network will be
-				still synchronized, but no longer in sync with external reference.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-		\end{packed_enum}
-
-\end{enumerate}
-
-\newpage
-\subsection{Data error}
-As a data error we define WR Switch not being able to forward Ethernet traffic
-between devices connected to the ports.\\
-
-\noindent Faults leading to a data error:
-\begin{enumerate}
-
-	\item {\bf Link down}
-		\label{fail:data:link_down}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE  \emph{(to be changed later for switchover)}
-			\item [] \underline{Severity}: ERROR (will be WARNING with the
-				switch-over)
-			\item [] \underline{Description}:\\
-				This obviously stops the flow of frames on an Ethernet port and there is
-				not much we can do besides reporting an error. Topology redundancy is a
-				cure for that (if backup link is fine, and reconfiguration does not
-				fail). There might be several causes of a link down:
-				\begin{itemize}
-					\item unplugged fiber
-					\item broken fiber
-					\item broken SFP
-					\item wrong (non-complementary) pair of WDM SFPs is used
-				\end{itemize}
-				However, we are not able to distinguish between them inside the switch.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{IF-MIB::ifOperStatus.<n>}\\
-				\texttt{WR-SWITCH-MIB::portLink.<n>}
-		\end{packed_enum}
-
-	\item {\bf Fault in the Endpoint's transmission/reception path}
-		\label{fail:data:ep_txrx}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:\\
-				This fault covers various errors reported by the Endpoint, e.g. FIFO
-				underrun in the Tx PCS or FIFO overrun in the Rx PCS, receiving invalid
-				\emph{8b10b} code, CRC error etc.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.1} - Tx PCS FIFO underrun\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.2} - Rx PCS FIFO overrun\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.3} - Rx invalid \emph{8b10b} code\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.4} - Rx sync lost\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.6} - Rx frame dropped by PFilter\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.7} - Rx PCS Error\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.10} - Rx CRC Error
-		\end{packed_enum}
-
-	\item {\bf Problem with the \emph{SwCore} or Endpoint HDL module}
-		\label{fail:data:swcore_hang}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on HDL, then hal?)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:\\
-				If any of these HDL modules hangs, there is usually not much the user
-				can do besides resetting the WR Switch so that the FPGA is reprogrammed.
-				It may happen that frames are lost only on one or two ports, but it may
-				be also that the whole SwCore refuses to forward traffic. In the current
-				firmware release we have a bug causing SwCore/Endpoint to hang after
-				forwarding a specific frame size and load. It will be improved in the
-				future releases.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.19} - Endpoint Tx frames\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.38} - RTU forward decisions to the
-				port
-			\item [] \underline{Note}: We should probably provide also some events for
-				counting from the SwCore.\\
-				Two early ideas for checking if SwCore is hanging or not:
-				\begin{itemize}
-					\item Monitor the number of used and free pages in the MPM memory
-					\item Compare per-port \emph{RTUfwd} counter with the \emph{Tx}
-						Endpoint counter for this port. \emph{RTUfwd} counts all forwarding
-						decisions from RTU to the port $<$n$>$ (including PTP frames from
-						NIC). If this number is equal to the number of frames actually
-						transmitted by the Endpoint, then everything works fine.
-				\end{itemize}
-		\end{packed_enum}
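-
-		\noindent The counter comparison can be sketched as a per-port check.
-		The symbols are illustrative only: $R_n$ stands for the \emph{RTUfwd}
-		counter and $T_n$ for the Endpoint \emph{Tx} counter of port $n$, both
-		sampled at two instants $t_1 < t_2$:
-		\[
-			\Delta_n = \bigl(R_n(t_2) - R_n(t_1)\bigr) - \bigl(T_n(t_2) - T_n(t_1)\bigr)
-		\]
-		A $\Delta_n$ that keeps growing from sample to sample would suggest that
-		forwarding decisions are made for port $n$ but frames are not leaving
-		the Endpoint. Comparing increments rather than absolute values is an
-		assumption made here to tolerate frames that are still in flight.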
-
-	\item {\bf RTU is full and cannot accept more requests}
-		\label{fail:data:rtu_full}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on HDL)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:\\
-				If RTU is full for a given port, it's not able to accept more requests
-				and generate new responses. In such case frames are dropped in the
-				Rx path of the Endpoint.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.21} - Rx drop, RTU full
-			\item [] \underline{Note}: It turns out that the \emph{rtu\_port} HDL
-				component was changed and currently RTU full events are not generated
-				and therefore not counted by PSTATS.
-		\end{packed_enum}
-
-	\item {\bf Too much HP traffic / Per-priority queue full}
-		\label{fail:data:too_much_HP}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on HDL)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:\\
-				If we get too much High Priority traffic, then SwCore will be busy all
-				the time forwarding HP frames. This way regular/best effort traffic
-				won't be flowing through the switch. In the extreme case, HP traffic
-				queue may become full and we start losing HP frames, which is
-				unacceptable.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.33} - HP frames on a port\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.20} - Total number of Rx frames on
-				the port\\
-				\texttt{WR-SWITCH-MIB::pstatsWR<n>.22-29} - Rx priorities 0-7
-			\item [] \underline{Note}: we need to get from SwCore the information
-				about per-priority queue utilization, or at least an event when it's
-				full.
-		\end{packed_enum}
-
-	\item {\bf \emph{RTUd} has crashed}
-		\label{fail:data:rtu_crash}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on monit)}
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:\\
-				If RTUd crashed, traffic would be still routed between the WRS ports, but
-				only based on already existing static and dynamic rules. There would be
-				no learning or aging functionality. That means MAC addresses wouldn't be
-				removed from the RTU table if a device is disconnected from port. Since
-				there would be no learning, each frame with yet unknown destination MAC
-				will be broadcast to all ports (within a VLAN).
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::rtuRunCnt} \emph{(not implemented)}\\
-				\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>} \emph{(implemented)}
-			\item [] \underline{Note}: the list of processes has to be monitored, if
-				\emph{RTUd} is there and if its PID has changed (it was restarted).
-		\end{packed_enum}
-
-	\item {\bf Network loop - two or more identical MACs on two or more ports}
-		\label{fail:data:net_loop}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(to be done later)}
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:\\
-				In such case we have a ping-pong situation. If two ports receive frames
-				with the same source MAC, it is learned on one of these ports. Then if
-				it comes on a second port, it is learned on a second port, and removed
-				from the first one. Later, MAC is learned again on the first port, and
-				removed from the MAC table for the second port, and so on. This
-				situation is a network configuration problem or eRSTP failure.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: we need to monitor the \emph{rtu\_stat} to
-				diagnose ping-pong in the RTU table.
-		\end{packed_enum}
-
-	\item {\bf Wrong configuration applied (e.g. wrong VLAN config)}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(to be done later)}
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:\\
-				The same problem as described in the timing fault
-				\ref{fail:timing:no_frames}
-		\end{packed_enum}
-
-	\item {\bf Topology Redundancy failure}
-		\begin{packed_enum}
-			\item [] \underline{Status}: for later
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}: \emph{(not yet implemented)}\\
-				Topology redundancy lets us avoid losing data when the primary
-				uplink is down for some reason. However, if a backup link is also down
-				or reconfiguration to backup link fails, we start losing data and an
-				alarm should be raised.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: One thing we need to report is a backup link(s)
-				going down, but we should also think about how to determine if there is
-				some problem with eRSTP and if it may fail/has failed if the primary
-				link is down.
-		\end{packed_enum}
-
-\end{enumerate}
-
-\newpage
-\subsection{Other errors}
-\label{sec:other_fail}
-
-\begin{enumerate}
-	\item {\bf WR Switch did not boot correctly}
-		\label{fail:other:boot}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:\\
-				That one is about making sure that everything is up and running after WR
-				switch boots. If any of the services fails, an alarm should be raised.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: we should have a flag somewhere reported
-				through the SNMP (e.g. in the main status word) saying that WRS has
-				booted correctly, FPGA is programmed, all kernel drivers are loaded and
-				all daemons are up and running. If it's not the case, we should report
-				what has happened:
-				\begin{itemize}
-					\item reading HW information from dataflash failed ?
-					\item programming FPGA or LM32 failed ?
-					\item loading any of the kernel modules failed ?
-					\item starting any of the userspace daemons failed ?
-				\end{itemize}
-				The idea for that is to reboot the system if it was not able to boot
-				correctly. Then we use the scratchpad registers of the processor to keep
-				the boot count. If the value of this counter is more than X we stop
-				rebooting and try to have a system running with at least \emph{dropbear}
-				for SSH and \emph{net-snmp} to allow remote diagnostics. If on the other
-				hand we have booted correctly we set the boot count to 0.
-		\end{packed_enum}
-
-	\item {\bf Any userspace daemon has crashed/restarted}
-		\label{fail:other:daemon_crash}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(depends on monit)}
-			\item [] \underline{Severity}: ERROR / WARNING (depending on the process)
-			\item [] \underline{Description}:
-			\item [] \underline{SNMP objects}:\\
-				\texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>}\\
-				\texttt{WR-SWITCH-MIB::ptpRunCnt}\\
-				\texttt{WR-SWITCH-MIB::halRunCnt}\\
-				\texttt{WR-SWITCH-MIB::rtuRunCnt}\\
-				\texttt{WR-SWITCH-MIB::sshRunCnt}\\
-				\texttt{WR-SWITCH-MIB::udhcpdRunCnt}\\
-				\texttt{WR-SWITCH-MIB::rsyslogRunCnt}\\
-				\texttt{WR-SWITCH-MIB::snmpdRunCnt}\\
-				\texttt{WR-SWITCH-MIB::httpdRunCnt}
-			\item [] \underline{Note}: We have to monitor the list of running
-				processes and their PIDs. We shall distinguish between crucial
-				processes - error should be reported if one of them crashes; and less
-				important processes which should just be restarted if they crash (and
-				warning should be reported). If any of the processes has crashed, we
-				need to restart it and increment a per-process counter reported through
-				the SNMP to indicate how many times each process has crashed.\\
-
-				Crucial processes (Error report if any of them crashes):
-				\begin{itemize}
-					\item \emph{PTP/PPSi}
-					\item \emph{WRSW\_RTUd} - after adding configuration preserving code
-						on restart, RTUd could be crossed out from this list
-					\item \emph{WRSW\_HAL}
-				\end{itemize}
-				Less critical processes (Restarting them and Warning generation is
-				enough):
-				\begin{itemize}
-					\item \emph{dropbear}
-					\item \emph{udhcpc}
-					\item \emph{rsyslogd}
-					\item \emph{snmpd}
-					\item \emph{lighttpd}
-					\item \emph{TRUd/eRSTPd} - not yet implemented
-				\end{itemize}
-
-				\emph{RTUd} - we need to set the flag that it has crashed so that when
-				it runs again it knows that HDL is already configured. It should not
-				erase static entries in the RTU table (e.g. multicasts for PTP), and it
-				should either keep or configure again the static entries set by hand
-				as well as VLANs. Dynamic entries are not a problem. RTUd will learn all
-				MACs after restarting. The only consequence will be increased network
-				traffic due to frames broadcast until all MACs are learned. In general
-				the source code has to be checked to make sure what is cleared on
-				startup and modified to preserve the configuration.\\
-
-				\emph{TRUd/eRSTPd} - topology reconfiguration is done in hardware if
-				needed, this daemon is used only to configure TRU/RTU HDL module.
-				However, the story is similar as with the RTUd. If eRSTPd crashes, we
-				need to store this information so that when it runs again, it does not
-				erase the whole configuration. Also if topology reconfiguration is done
-				in HDL while eRSTPd is down, HDL should keep the flag that it happened,
-				and eRSTPd should read this flag when starting, so that it's aware that
-				now, backup link is active.\\
-		\end{packed_enum}
-
-	\item {\bf Kernel crash}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO
-			\item [] \underline{Severity}: ERROR
-			\item [] \underline{Description}:
-				If the Linux kernel crashes, the system reboots. We have
-				no synchronization and no SNMP to report the status. The FPGA may still be
-				forwarding Ethernet traffic, but based on dynamic and static routing
-				rules from before the crash.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: On kernel crash, we should restart (it's done
-				already) but also be able to determine after the next boot what the
-				reason for the reboot was. There is a register in the processor that tells us
-				whether we rebooted after a crash or it is a ``clean'' boot:\\
-				\lstset{frame=single, captionpos=b, caption=, basicstyle=\scriptsize, backgroundcolor=\color{light-gray}, label= }
-				\begin{lstlisting}
-After a power-on:
-wrs-192.168.16.242# devmem 0xfffffd04
-0x00010001
-After reboot:
-wrs-192.168.16.242# devmem 0xfffffd04
-0x00010300
-				\end{lstlisting}
-		\end{packed_enum}
-
-	\item {\bf System nearly out of memory}
-		\label{fail:other:no_mem}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(DONE?, create new object to report if error?)}
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:
-			\item [] \underline{SNMP objects}:\\
-				\texttt{HOST-RESOURCES-MIB::hrStorageDescr.<x>}\\
-				\texttt{HOST-RESOURCES-MIB::hrStorageSize.<x>}\\
-				\texttt{HOST-RESOURCES-MIB::hrStorageUsed.<x>}
-			\item [] \underline{Note}: we need to monitor and report the amount of the
-				free memory, report it through SNMP and raise an alarm if it's extremely
-				low (but still enough to keep the system running). In general we should
-				compare \texttt{hrStorageSize} with \texttt{hrStorageUsed} for each
-				chunk of memory and each partition.
-		\end{packed_enum}
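-
-		\noindent One possible form of this comparison, with a hypothetical
-		alarm threshold $\theta$ that is not defined in this document:
-		\[
-			u_x = \frac{\texttt{hrStorageUsed.}x}{\texttt{hrStorageSize.}x},
-			\qquad \mbox{raise a warning if } u_x > \theta
-		\]
-		For a given index $x$ both objects are expressed in the same allocation
-		units, so the ratio $u_x$ is dimensionless.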
-
-	\item {\bf CPU load too high}
-		\label{fail:other:cpu}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO \emph{(DONE?)}
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::cpuLoad} \emph{(not implemented)}\\
-				Can \texttt{HOST-RESOURCES-MIB::hrProcessorLoad} be used?
-        ("The average, over the last minute, of the percentage
-        of time that this processor was not idle.
-        Implementations may approximate this one minute
-        smoothing period if necessary.")
-			\item [] \underline{Note}: similar situation as with the memory. We need
-				to monitor, report and alarm if CPU load is close to 100\% (but still
-				enough to keep the system running).
-		\end{packed_enum}
-
-	\item {\bf Temperature inside the box too high}
-		\label{fail:other:temp}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:\\
-				If the temperature rises too high, the electronics inside the box
-				may be damaged. It also most probably means that one or both of the
-				fans inside the box are broken and should be replaced. There are 4
-				temperature sensors monitored:
-				\begin{itemize}
-					\item \emph{IC19} - temperature below the FPGA
-					\item \emph{IC20}, \emph{IC17} - temperature near the SCB power supply
-						circuit
-					\item \emph{IC18} - temperature near the VCXO and PLLs (AD9516,
-						CDCM6100)
-				\end{itemize}
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::tempFPGA}\\
-				\texttt{WR-SWITCH-MIB::tempPSL}\\
-				\texttt{WR-SWITCH-MIB::tempPSR}\\
-				\texttt{WR-SWITCH-MIB::tempPLL}\\
-				\texttt{WR-SWITCH-MIB::tempTholdFPGA}\\
-				\texttt{WR-SWITCH-MIB::tempTholdPLL}\\
-				\texttt{WR-SWITCH-MIB::tempTholdPSL}\\
-				\texttt{WR-SWITCH-MIB::tempTholdPSR}\\
-				\texttt{WR-SWITCH-MIB::tempWarning}
-			\item [] \underline{Note}:
-			\texttt{tempWarning} is raised when the temperature read from any of these
-			sensors exceeds its individually set threshold in \emph{.config}. When at
-			least one threshold temperature is not set, \texttt{tempWarning} returns
-			\emph{Threshold-not-set}. The temperature is read by the HAL to drive the
-			PWM inside the FPGA. The HAL reports the temperature to its area in the
-			shared memory.
-		\end{packed_enum}
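-
-		\noindent The rule in the note above can be sketched as follows, with
-		$T_s$ the reading and $T^{\mathrm{thold}}_s$ the configured threshold of
-		sensor $s$ (symbols introduced here for illustration only):
-		\[
-			\texttt{tempWarning} \text{ is raised} \iff
-			\exists\, s \in \{\mathrm{FPGA}, \mathrm{PLL}, \mathrm{PSL}, \mathrm{PSR}\} :
-			T_s > T^{\mathrm{thold}}_s
-		\]
-		provided all four thresholds are set; otherwise the object reports
-		\emph{Threshold-not-set}.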
-
-	\item {\bf Unsupported SFP plugged into the cage (especially non 1-Gb SFP)}
-		\label{fail:other:sfp}
-		\begin{packed_enum}
-			\item [] \underline{Status}: DONE
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:\\
-				If an unsupported Gigabit fiber SFP is plugged into the cage, then it's a
-				timing issue \ref{fail:timing:wrong_sfp}. However, if a non 1-Gb SFP is
-				used, then no Ethernet traffic will flow on that port, because
-				10/100Mbit Ethernet is not implemented inside the WRS.
-			\item [] \underline{SNMP objects}:\\
-				\texttt{WR-SWITCH-MIB::portSfpVN.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpPN.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpVS.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpGbE.<n>}\\
-				\texttt{WR-SWITCH-MIB::portSfpError.<n>}
-		\end{packed_enum}
-
-	\item {\bf File system / Memory corruption}
-		\label{fail:other:memory}
-		\begin{packed_enum}
-			\item [] \underline{Description}:
-			\item [] \underline{SNMP objects}: \emph{(none)}
-			\item [] \underline{Note}: how shall we detect this? Based on
-				\emph{dmesg} errors reported by UBI and the system in general?\\
-				This is bad, crazy things may happen, we can't do much about it.
-		\end{packed_enum}
-
-	\item {\bf Kernel freeze}
-		\begin{packed_enum}
-			\item [] \underline{Description}:
-				If the kernel freezes we can do nothing. It can freeze e.g. due to an
-				infinite loop in the IRQ handler. As with a power failure, somebody
-				has to go to the place where the WRS is installed and investigate/restart
-				the device.
-			\item [] \underline{SNMP objects}: \emph{(none)}
-		\end{packed_enum}
-
-	\item {\bf Power failure}
-		\begin{packed_enum}
-			\item [] \underline{Description}:\\
-				Power failure may be either a WRS problem (e.g. a broken power supply
-				inside the switch) or an external problem (i.e. providing voltage to the
-				device). There is not much reporting we can do in such a case; it's up to
-				the Network Management Station to raise an alarm if the SNMP Agent does
-				not respond to the SNMP requests.
-			\item [] \underline{SNMP objects}: \emph{(none)}
-		\end{packed_enum}
-
-	\item {\bf Hardware problem}
-		\begin{packed_enum}
-			\item [] \underline{Description}:\\
-				If any crucial hardware part breaks we'll most probably notice it as one
-				(or multiple) timing/data errors described in the previous sections.
-				Besides that, we don't have any self-diagnostics on-board. A few examples:
-				\begin{itemize}
-					\item DAC / VCO - problems with synchronization
-					\item cooling fans - rise of the temperature inside the WRS box
-						(failure \ref{fail:other:temp})
-					\item power supply, ARM, FPGA - booting problem (failure
-						\ref{fail:other:boot})
-					\item memory chip - data corruption (failure \ref{fail:other:memory})
-				\end{itemize}
-			\item [] \underline{SNMP objects}: \emph{(none)}
-		\end{packed_enum}
-
-	\item {\bf Management link down}
-		\label{fail:other:management_link}
-		\begin{packed_enum}
-			\item [] \underline{Description}:\\
-				For obvious reasons we are not able to report through SNMP that the
-				management link is down. This should be detected and reported by the NMS
-				if it does not receive SNMP and ICMP responses from the WRS.
-			\item [] \underline{SNMP objects}: \emph{(none)}
-		\end{packed_enum}
-
-	\item {\bf No static IP on the management port \& failed to DHCP}
-		\begin{packed_enum}
-			\item [] \underline{Description}:\\
-				From the operator's point of view it is similar to issue
-				\ref{fail:other:management_link}. The WRS is not accessible through the
-				management port, so its status cannot be reported. This should be
-				detected and reported by the NMS if it does not receive SNMP and ICMP
-				responses from the WRS. In such a case a WR expert should make a physical
-				connection to the management USB port of the WRS to diagnose the
-				problem.
-			\item [] \underline{SNMP objects}: \emph{(none)}
-		\end{packed_enum}
-
-	\item {\bf IP address on the management port has changed}
-		\begin{packed_enum}
-			\item [] \underline{Status}: TODO
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:\\
-				I'm not yet sure how we should report this. SNMP is probably not the
-				best choice because if the IP changes we're no longer able to poll SNMP
-				objects (until the IP is also updated in the Network Management Station).
-				We should either generate an SNMP trap to the NMS or send a Syslog message
-				to a central server.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-		\end{packed_enum}
-
-	\item {\bf Multiple unauthorized access attempts}
-		\begin{packed_enum}
-			\item [] \underline{Status}: for later
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}:\\
-				If we observe many attempts to gain root access through ssh (or the
-				web interface), that might mean somebody is trying to do something nasty.
-				We should report such a situation as a Warning.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-			\item [] \underline{Note}: A bad password event is reported by syslog as a
-				warning. We should probably use this information to add an SNMP object.
-		\end{packed_enum}
-
-	\item {\bf Network reconfiguration (RSTP)}
-		\label{fail:other:rstp}
-		\begin{packed_enum}
-			\item [] \underline{Status}: for later
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}: \emph{(not yet implemented)}\\
-				If topology reconfiguration occurs because of a primary link failure,
-				this fact should be reported through SNMP as a warning. It's not a
-				critical situation; the WR network still works. However, further
-				investigation should be performed to repair the broken link.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-		\end{packed_enum}
-
-	\item {\bf Backup link down}
-		\begin{packed_enum}
-			\item [] \underline{Status}: for later
-			\item [] \underline{Severity}: WARNING
-			\item [] \underline{Description}: \emph{(not yet implemented)}\\
-				It's related to issue \ref{fail:other:rstp}. If the WRS uses the primary
-				uplink, but the backup one fails, it's not a critical fault. The WR network
-				still works, but the problem should be diagnosed and repaired to have
-				the backup link operational in case the primary one fails.
-			\item [] \underline{SNMP objects}: \emph{(not yet implemented)}
-		\end{packed_enum}
-
-\end{enumerate}
-
-%\subsection{Switch out of sync to Master}
-%
-%\subsection{Switch made a big offset jump to follow Master}
-%
-%\subsection{Unsupported SFP plugged to one of the cages}
-%
-%\subsection{Lost lock to external 1-PPS \& 10 MHz}
-%
-%\subsection{Switch wasn't able to fetch initial time from NTP}
-%
-%\subsection{Suspicious value of any PTP parameter}
-%e.g. bitslide > 16000;  dTx/dRx = 0, etc.
-%
-%\subsection{PPSi/HAL/SNMP/any other userspace daemon has crashed}
-%
-%\subsection{LM32 software has crashed/restarted}
-%
-%\subsection{Cooling fan broken}
-%
-%\subsection{Power supply broken}
-%
-%\subsection{Switch not reachable after power cut}
-%
-%\subsection{Switch not reachable through SNMP}
-%
-%\subsection{One of the links went down}
-%
-%\subsection{Ethernet frames being dumped}
-%
-%\subsection{Linux is out of memory}
-%
-%\subsection{Filesystem error/corruption}
-%
-%\subsection{HW version not recognized, FPGA bitstream not loaded}
-%
-%\subsection{Frames storm coming from one or multiple ports to CPU}
--- a/documents/specifications/management/wrs_failures/intro.tex
+++ b/documents/specifications/management/wrs_failures/intro.tex
-\section{Introduction}
-
-This document tries to list all possible ways the White Rabbit Switch can
-break. It is my brain dump and should be a starting point to improve SNMP
-implementation and alarms (traps) generation. The document also tries to
-describe what the operator's action should be for each failure: whether it's
-enough to reboot the switch or whether it should be replaced with a new unit.
-
-The document is organized in two parts. The first one (section
-\ref{sec:failures}) tries to list all the possible failures that may disturb
-synchronization and Ethernet switching. The structure of each failure
-description is the following:
-\begin{itemize}[leftmargin=0pt]
-	\item [] \underline{Mode}: for timing failures, it says which modes are
-		affected. Possible values are:
-		\begin{itemize}
-			\item \emph{Slave} - WR Switch has at least one Slave port synchronized to
-				another WR device higher in the timing hierarchy (though it may be also
-				Master to other WR/PTP devices lower in the timing hierarchy).
-			\item \emph{Grand Master} - WR Switch at the top of the synchronization
-				hierarchy. It is synchronized to an external clock (e.g. GPSDO, Cesium)
-				and provides timing to other WR/PTP devices.
-			\item \emph{Free-Running Master} - WR Switch at the top of the
-				synchronization hierarchy. It provides timing to other WR/PTP devices
-				but runs from a local oscillator (not synchronized to external atomic
-				clock).
-		\end{itemize}
-
-	\item [] \underline{Description}: what the problem is about, how important it
-		is and what harm may result if it occurs.
-	\item [] \underline{SNMP objects}: which SNMP objects should be monitored to
-		detect the failure. These may be objects from \texttt{WR-SWITCH-MIB} or one
-		of the standard MIBs used by the \emph{net-snmp}.
-	\item [] \underline{Notes}: optional comment in case required SNMP objects are
-		not yet exported by our current implementation of the SNMP agent. It
-		describes some preliminary ideas what should be exported in the near future.
-\end{itemize}
-
-Section \ref{sec:snmp_exports} is documentation for people integrating the WR
-switch into a control system, operators and WR experts. It describes all
-essential SNMP objects exported by the device, divided into two groups:
-\emph{Operator/basic objects} and \emph{Expert objects}.
--- a/documents/specifications/management/wrs_failures/snmp_exports.tex
+++ b/documents/specifications/management/wrs_failures/snmp_exports.tex
-\section{SNMP exports (WIP)}
-\label{sec:snmp_exports}
-
-\subsection{Operator/basic objects}
-Objects providing basic status of the WR Switch. They should be used by control
-system operators and people without deep knowledge of the White Rabbit
-internals. These values report the general status of the device and high level
-errors.\\
-
-{\bf Note}: We will need to change the SNMP code. There should be something like
-a loop reading all information periodically (e.g. every 5s) from various SHM
-areas (HAL, PPSi, SPLL), caching and calculating general status information.
-This way, when we receive SNMP request we can feed the information from our
-local SNMP cache. The same code could be later used to generate SNMP Traps.\\
-
-\noindent {\bf General Status}:
-\begin{itemize}%[leftmargin=0pt]
-  \item WRS general status - OK / Warning / Error
-  \item Timing Status
-  \item Networking Status
-  \item System Status
-  \item Detailed status
-  \begin{itemize}
-    \item Timing
-    \begin{itemize}
-      \item PTP (TRACK\_PHASE, offset, RTT, fixed deltas, daemon crash,
-        servo\_update\_cnt)
-      \item SoftPLL (DelCnt = 0; mode, SeqState, AlignState)
-      \item Slave link down
-      \item PTP frames flowing ?
-      \item (placeholder for Switchover)
-      \item (placeholder for Holdover)
-    \end{itemize}
-    \item Networking
-    \begin{itemize}
-      \item (placeholder for Link down)
-      \item SFPs (portSfpError.<x> ?)
-      \item Endpoint status (2.2.2)
-      \item Swcore status (2.2.3, 2.2.5)
-      \item RTU status (2.2.4, 2.2.7)
-      \item (placeholder for TRU)
-      \item (placeholder for switchover or backup link state)
-    \end{itemize}
-    \item System
-    \begin{itemize}
-      \item Boot ok
-      \item Free memory too low
-      \item Temperature
-      \item CPU load too high
-      \item Disk space too low (?)
-    \end{itemize}
-  \end{itemize}
-  \item Version (rewrite existing)
-  \begin{itemize}
-    \item last date/time when firmware was updated\\
-      (save current time on restart, when new firmware is in /update so that it can be exported with SNMP)
-    \item contact info
-    \item build by
-    \item build date
-    \item hash, HW, SW, 
-    \item (check what exists and add missing)
-  \end{itemize}
-\end{itemize}
-
-\newpage
-\subsection{Expert/extended status}
-Expert objects can be used by White Rabbit experts for the in-depth diagnosis of
-the switch failures. These values are verbose and should not be used by
-operators.
-
-\begin{itemize}
-  \item Operation Status
-  \begin{itemize}
-    \item CPU Load (\%)
-    \item current time
-    \begin{itemize}
-      \item TAI
-      \item date string
-    \end{itemize}
-    \item Boot status
-    \begin{itemize}
-      \item boot cnt
-      \item restart reason
-      \item boot status values\\
-        (1 object for each: hwinfo readout, FPGA, LM32, kernel modules, userspace daemons, config retrieved ok)
-      \item config source (tftp, flash, as string?)
-    \end{itemize}
-    \item Temperature
-    \begin{itemize}
-      \item temp 1..4
-      \item threshold 1..4
-    \end{itemize}
-  \end{itemize}
-
-  \item Restart Counters
-    \begin{itemize}
-      \item HAL
-      \item PPSi
-      \item RTUd
-      \item (..)
-      \item SPLL
-    \end{itemize}
-		
-  \item SoftPLL state
-  \begin{itemize}
-	  \item mode, irqcnt, seqstate, alignstate, Hlock, Mlock, Block[18], Err[18], HY, MY, delCnt, holdover, holdoverTime
-	  \item spll version
-	  \item spll build date 
-	  \item (...)
-  \end{itemize}
-
-  \item Networking
-  \begin{itemize}
-    \item VLAN table dump
-    \item RTU table dump (check if management sw uses snmpwalk)
-    \item SW core status
-    \begin{itemize}
-      \item Free pages
-    \end{itemize}
-  \end{itemize}
-
-  \item Pstats (pivot table, some of the counters should be used to fill
-    standard MIBs)
-  \item PtpData (make it an array for later switch-over needs)
-  \begin{itemize}
-     \item per instance/ which port
-   \end{itemize}
-
-  \item Ports status (per-port information)
-  \begin{itemize}
-    \item portEnable (enable/disable port via ifconfig)
-    \item ptpTxFrames (per port or per instance, depending on implementation)
-    \item ptpRxFrames (per port or per instance, depending on implementation)
-  \end{itemize}
-
-  \item Configuration
-  \begin{itemize}
-    \item PPS width
-  \end{itemize}
-
-\end{itemize}
-
-\newpage
-\subsection{Expert objects (to be updated, was first draft)}
-{\bf Note:} we will put here MIB file dump later.
-
-\subsubsection{PTP/WR parameters}
-\begin{itemize}[leftmargin=0pt]
-	\item [] \texttt{WR-SWITCH-MIB::ptpGrandmasterID}\\ - is it really Grand
-		Master, so the same ID for the whole network ? or is it a Master higher in
-		the sync hierarchy for a given device ?
-	\item [] \texttt{WR-SWITCH-MIB::ptpOwnID}
-	\item [] \texttt{WR-SWITCH-MIB::ptpMode}
-	\item [] \texttt{WR-SWITCH-MIB::ptpSyncSource}\\ - port number
-		\emph{wr0}/\emph{wr1}/... for Slave mode or \emph{ext} for Grand Master mode
-	\item [] \texttt{WR-SWITCH-MIB::ptpServoState}\\ - string, WR servo state
-		(\emph{SYNC\_IDLE}, \emph{SYNC\_SEC}, \emph{SYNC\_NSEC}, \emph{SYNC\_PHASE},
-		\emph{OFFSET\_STABLE}, \emph{TRACK\_PHASE}) (timing:
-		\ref{fail:timing:ppsi_track_phase})
-	\item [] \texttt{WR-SWITCH-MIB::ptpServoStateN}\\ - would it be useful to
-		also report ptpServoState in a numeric form? (timing:
-		\ref{fail:timing:ppsi_track_phase})
-	\item [] \texttt{WR-SWITCH-MIB::ptpRTT}\\ - Round-trip delay ($delay_{MM}$)
-		(timing: \ref{fail:timing:rtt_jump})
-	\item [] \texttt{WR-SWITCH-MIB::ptpDelayMS}\\ - one-way M-S delay
-		($delay_{MS}$)
-	\item [] \texttt{WR-SWITCH-MIB::ptpLinkLength}
-	\item [] \texttt{WR-SWITCH-MIB::ptpPhaseTracking}\\ - if phase tracking is
-		enabled (only for WR-demo purposes I think)
-	\item [] \texttt{WR-SWITCH-MIB::ptpClockOffsetPs}\\ (timing:
-		\ref{fail:timing:offset_jump})
-	\item [] \texttt{WR-SWITCH-MIB::ptpSkew}
-	\item [] \texttt{WR-SWITCH-MIB::ptpPhSetpoint}
-	\item [] \texttt{WR-SWITCH-MIB::ServoUpdates}
-	\item [] \texttt{WR-SWITCH-MIB::portLink.<n>}\\ (timing:
-		\ref{fail:timing:master_down}, \ref{fail:timing:no_frames}; data:
-		\ref{fail:data:link_down})
-	\item [] \texttt{WR-SWITCH-MIB::portMode.<n>}\\ (timing:
-		\ref{fail:timing:master_down}, \ref{fail:timing:no_frames})
-	\item [] \texttt{WR-SWITCH-MIB::portLocked.<n>}
-	\item [] \texttt{WR-SWITCH-MIB::portPeer.<n>}
-	\item [] \texttt{WR-SWITCH-MIB::portPtpState.<n>}\\ - does it make sense to
-		report PTP state for each port ? (regular PTP, not WR state)
-	\item [] \texttt{WR-SWITCH-MIB::portPtpTxFrames.<n>}\\ - how many PTP frames
-		were sent from the port (counted by PTP/PPSi) (timing:
-		\ref{fail:timing:no_frames})
-	\item [] \texttt{WR-SWITCH-MIB::portPtpRxFrames.<n>}\\ - how many PTP frames
-		were received on the port (counted by PTP/PPSi) (timing:
-		\ref{fail:timing:no_frames})
-	\item [] \texttt{WR-SWITCH-MIB::portActiveSlave.<n>}\\ - 0/1 to mark which one
-		is the active Slave (if there are also Backups and timing switchover)
-	\item [] \texttt{WR-SWITCH-MIB::portDeltaTxM.<n>}\\ - for each Slave and
-		Backup port (timing: \ref{fail:timing:deltas_report})
-	\item [] \texttt{WR-SWITCH-MIB::portDeltaRxM.<n>}\\ - for each Slave and
-		Backup port (timing: \ref{fail:timing:deltas_report})
-	\item [] \texttt{WR-SWITCH-MIB::portDeltaTxS.<n>}\\ - for each Slave and
-		Backup port (timing: \ref{fail:timing:deltas_report})
-	\item [] \texttt{WR-SWITCH-MIB::portDeltaRxS.<n>}\\ - for each Slave and
-		Backup port (timing: \ref{fail:timing:deltas_report})
-	\item [] \texttt{WR-SWITCH-MIB::}
-		\begin{itemize}[topsep=-12pt]
-			\item any other information from backup channels that is useful to report
-			\item holdover information (e.g. timestamp when it was activated)
-		\end{itemize}
-\end{itemize}
-
-\subsubsection{SoftPLL parameters}
-\begin{itemize}[leftmargin=0pt]
-	\item [] \texttt{WR-SWITCH-MIB::spllMode}\\ - Grand Master / Free-running
-		Master / Slave / Disabled (timing: \ref{fail:timing:spll_unlock})
-	\item [] \texttt{WR-SWITCH-MIB::spllIrqCnt}\\ - IRQ counter
-	\item [] \texttt{WR-SWITCH-MIB::spllSeqState}\\ (timing:
-		\ref{fail:timing:spll_unlock})
-	\item [] \texttt{WR-SWITCH-MIB::spllAlignState}\\ (timing:
-		\ref{fail:timing:spll_unlock})
-	\item [] \texttt{WR-SWITCH-MIB::spllHlock}\\ (timing:
-		\ref{fail:timing:spll_unlock})
-	\item [] \texttt{WR-SWITCH-MIB::spllMlock}\\ (timing:
-		\ref{fail:timing:spll_unlock})
-	\item [] \texttt{WR-SWITCH-MIB::spllBlock}\\ - All backup channels locked
-	\item [] \texttt{WR-SWITCH-MIB::spllHY}\\ - Helper DAC setting (Helper PI.Y)
-	\item [] \texttt{WR-SWITCH-MIB::spllMY}\\ - Main DAC setting (Main PI.Y)
-	\item [] \texttt{WR-SWITCH-MIB::spllDelCnt}\\ - De-lock counter (timing:
-		\ref{fail:timing:spll_unlock})
-	\item [] \texttt{WR-SWITCH-MIB::spllCrashCnt}\\ - counter incremented when
-		LM32 SoftPLL software crash was detected (e.g. CPU has followed a NULL
-		pointer). Should this be a counter ? (timing: \ref{fail:timing:spll_crash})
-	\item [] \texttt{WR-SWITCH-MIB::}
-		\begin{itemize}[topsep=-12pt]
-			\item per-port stuff for active and backup channels related to timing
-				switchover
-		\end{itemize}
-\end{itemize}
-
-\subsubsection{Per-port statistics}
-\begin{itemize}[leftmargin=0pt]
-	\item [] \texttt{WR-SWITCH-MIB::pstatsDescr.<x>} - string describing counter
-		$<$x$>$
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.1} - Tx PCS FIFO underruns (data:
-		\ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.2} - Rx PCS FIFO overruns (data:
-		\ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.3} - Rx invalid 8b10b codes (data:
-		\ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.4} - Rx sync losts (data:
-		\ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.5} - received pause frames
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.6} - Packet Filter frame drops
-		(data: \ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.7} - Rx PCS Errors (data:
-		\ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.8} - received giant frames
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.9} - received runt frames
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.10} - Rx CRC errors (data:
-		\ref{fail:data:ep_txrx})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.11 - 18} - Rx frames assigned by
-		Packet Filter to classes 0 to 7
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.19} - transmitted frames (data:
-		\ref{fail:data:swcore_hang})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.20} - received frames (data:
-		\ref{fail:data:too_much_HP})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.21} - Rx frames dropped due to RTU
-		being full and not accepting requests (data: \ref{fail:data:rtu_full})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.22 - 29} - received frames with
-		priority 0 - 7 (based on 802.1q tag priorities to traffic classes mapping)
-		(data: \ref{fail:data:too_much_HP})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.30} - valid RTU requests
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.31} - valid RTU responses
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.32} - dropped frames based on RTU
-		decision
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.33} - Fast Match high priority
-		frames (data: \ref{fail:data:too_much_HP})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.34} - Fast Match fast-forward
-		frames
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.35} - Fast Match non-forward
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.36} - Fast Match valid responses
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.37} - Full Match valid responses
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.38} - RTU forward decisions to the
-		port (data: \ref{fail:data:swcore_hang})
-	\item [] \texttt{WR-SWITCH-MIB::pstatsWR<n>.39} - TRU valid responses (not
-		supported)
-\end{itemize}
-
-\subsubsection{Other per-port status}
-\begin{itemize}[leftmargin=0pt]
-	\item [] \texttt{IF-MIB::ifOperStatus.<n>}\\ - is link up or down (data:
-		\ref{fail:data:link_down})
-	\item [] \texttt{WR-SWITCH-MIB::portSfpID.<n>}\\ (timing:
-		\ref{fail:timing:wrong_sfp})
-	\item [] \texttt{WR-SWITCH-MIB::portSfpInDB.<n>}\\ - was SFP ID found in SFP
-		database with fixed delays and alpha ? (timing: \ref{fail:timing:wrong_sfp})
-	\item [] \texttt{WR-SWITCH-MIB::portSfpGbE.<n>}\\ - is there Gigabit Ethernet
-		SFP plugged ? (other: \ref{fail:other:sfp})
-	\item [] \texttt{WR-SWITCH-MIB::portNoTiming.<n>}\\ - port is used only for data
-		transfer, no timing, or non-WR timing; timing-unsupported SFP may be used
-		there (it's a signal for NMS not to raise an alarm if
-		\texttt{WR-SWITCH-MIB::portSfpInDB} is \emph{false})
-	\item [] \texttt{WR-SWITCH-MIB::confVLAN.<n>}\\ - per-port VLAN configuration
-	\item [] \texttt{WR-SWITCH-MIB::portEnabled.<n>}\\
-		- read/write value\\
-		- if the port is enabled / enable or disable the port; this may be useful if
-		a part of the network causing problems has to be remotely disconnected
-\end{itemize}
-
-\subsubsection{Other HDL info}
-\begin{itemize}[leftmargin=0pt]
-	\item [] \texttt{WR-SWITCH-MIB::swcoreUsedPages}\\ - number of used pages in
-		the MPM memory (data: \ref{fail:data:swcore_hang})
-	\item [] \texttt{WR-SWITCH-MIB::swcoreFreePages}\\ - number of free pages in
-		the MPM memory (data: \ref{fail:data:swcore_hang})
-\end{itemize}
-
-\subsubsection{System status and configuration}
-\begin{itemize}[leftmargin=0pt]
-	\item [] \texttt{WR-SWITCH-MIB::ppsWidth}\\ - configured width of 1-PPS signal
-	% info from wrs_version -t
-	\item [] \texttt{WR-SWITCH-MIB::swVer}\\ - version of the WRS software
-	\item [] \texttt{WR-SWITCH-MIB::swBuildBy}\\ - who compiled the firmware
-	\item [] \texttt{WR-SWITCH-MIB::swBuildDate}\\ - when the firmware was
-		compiled
-	\item [] \texttt{WR-SWITCH-MIB::swSpllVer}\\ - version of the LM32 software
-		(revision reported in rt\_cpu.elf)
-	\item [] \texttt{WR-SWITCH-MIB::swSpllBuildDate}\\ - when LM32 firmware was
-		compiled
-	\item [] \texttt{WR-SWITCH-MIB::gwVer}\\ - version of the WRS gateware
-	\item [] \texttt{WR-SWITCH-MIB::gwBuild}\\ - gateware build
-	\item [] \texttt{WR-SWITCH-MIB::gwHash.0}\\ - commit hash of the
-		\emph{wr-switch-hdl} repo
-	\item [] \texttt{WR-SWITCH-MIB::gwHash.1}\\ - commit hash of the
-		\emph{general-cores} repo
-	\item [] \texttt{WR-SWITCH-MIB::gwHash.2}\\ - commit hash of the
-		\emph{wr-cores} repo
-	\item [] \texttt{WR-SWITCH-MIB::hwVer.0}\\ - version of the scb
-	\item [] \texttt{WR-SWITCH-MIB::hwVer.1}\\ - version of the backplane
-	\item [] \texttt{WR-SWITCH-MIB::hwFpga}\\ - FPGA type
-	\item [] \texttt{WR-SWITCH-MIB::hwSN}\\ - serial number of the device
-	\item [] \texttt{WR-SWITCH-MIB::hwProd}\\ - manufacturer of the hardware
-
-	\item [] \texttt{HOST-RESOURCES-MIB::hrSWRunName.<x>}\\ - is a list of running
-		processes in the system. Each object \emph{x} is a string with process name,
-		and \emph{x} is PID of this process. We need to filter processes like:
-		\begin{packed_items}
-			\item \emph{ppsi}
-			\item \emph{wrsw\_hal}
-			\item \emph{wrsw\_rtud}
-			\item \emph{dropbear}
-			\item \emph{udhcpc}
-			\item \emph{rsyslogd}
-			\item \emph{snmpd}
-			\item \emph{lighttpd}
-		\end{packed_items}
-		\vspace{12pt}
-		(timing: \ref{fail:timing:ppsi_crash}, \ref{fail:timing:hal_crash}; data:
-		\ref{fail:data:rtu_crash}; other: \ref{fail:other:daemon_crash})
-	\item [] \texttt{WR-SWITCH-MIB::ptpRunCnt}\\ - how many times PTP/PPSi daemon
-		has crashed (timing: \ref{fail:timing:ppsi_crash})
-	\item [] \texttt{WR-SWITCH-MIB::halRunCnt}\\ - how many times HAL daemon
-		has crashed (timing: \ref{fail:timing:hal_crash})
-	\item [] \texttt{WR-SWITCH-MIB::rtuRunCnt}\\ - how many times RTU daemon
-		has crashed (data: \ref{fail:data:rtu_crash})
-	\item [] \texttt{WR-SWITCH-MIB::sshRunCnt}\\ - how many times Dropbear
-		daemon has crashed (other: \ref{fail:other:daemon_crash})
-	\item [] \texttt{WR-SWITCH-MIB::udhcpdRunCnt}\\ - how many times DHCP daemon
-		has crashed (other: \ref{fail:other:daemon_crash})
-	\item [] \texttt{WR-SWITCH-MIB::rsyslogRunCnt}\\ - how many times rsyslog
-		daemon has crashed (other: \ref{fail:other:daemon_crash})
-	\item [] \texttt{WR-SWITCH-MIB::snmpdRunCnt}\\ - how many times SNMP daemon
-		has crashed (other: \ref{fail:other:daemon_crash})
-	\item [] \texttt{WR-SWITCH-MIB::httpdRunCnt}\\ - how many times HTTPd daemon
-		has crashed (other: \ref{fail:other:daemon_crash})
-	\item [] \texttt{WR-SWITCH-MIB::sysCnfDate}\\ - TAI time in seconds when
-		the configuration was last changed
-	\item [] \texttt{WR-SWITCH-MIB::sysCnfCrit}\\ - is \emph{true} when any of
-		the critical configuration options was modified during the last
-		reconfiguration. Critical configuration options:
-		\begin{packed_items}
-			\item PTP/PPSi timing mode
-			\item fixed hardware delays
-		\end{packed_items}
-		(timing: \ref{fail:timing:wrong_config})
-	\item [] \texttt{WR-SWITCH-MIB::sysRst}\\ - if true, system had to auto-reboot
-		due to a serious fault (e.g. kernel crash)
-	\item [] \texttt{WR-SWITCH-MIB::rtuRules}\\ - RTU table with dynamic and
-		static entries (\emph{rtu\_stat}) (data: \ref{fail:data:net_loop})
-	\item [] \texttt{HOST-RESOURCES-MIB::hrStorageDescr.<x>}\\ - description of
-		the memory/partition. \emph{x} can be:
-		\begin{packed_items}
-			\item [] {\bf 1} - Physical memory
-			\item [] {\bf 3} - Virtual memory
-			\item [] {\bf 6} - Memory buffers
-			\item [] {\bf 7} - Cached memory
-			\item [] {\bf 10} - Swap space
-			\item [] {\bf 31} - /update partition
-			\item [] {\bf 32} - /boot partition
-			\item [] {\bf 33} - /usr partition
-		\end{packed_items}
-		(other: \ref{fail:other:no_mem})
-	\item [] \texttt{HOST-RESOURCES-MIB::hrStorageSize.<x>}\\ - size of the
-		memory/partition (other: \ref{fail:other:no_mem})
-	\item [] \texttt{HOST-RESOURCES-MIB::hrStorageUsed.<x>}\\ - utilization of the
-		memory/partition (other: \ref{fail:other:no_mem})
-	\item [] \texttt{WR-SWITCH-MIB::sysNoMem}\\ - if true, system is nearly out of
-		memory (other: \ref{fail:other:no_mem})
-	\item [] \texttt{WR-SWITCH-MIB::cpuLoad}\\ - current CPU utilization (\%)
-		(other: \ref{fail:other:cpu})
-	\item [] \texttt{WR-SWITCH-MIB::tempFPGA}\\ - SCB temperature below the FPGA
-		(other: \ref{fail:other:temp})
-	\item [] \texttt{WR-SWITCH-MIB::tempScbPsu.1}\\ - SCB temperature near the
-		power supply circuit (other: \ref{fail:other:temp})
-	\item [] \texttt{WR-SWITCH-MIB::tempScbPsu.2}\\ - SCB temperature near the
-		power supply circuit (other: \ref{fail:other:temp})
-	\item [] \texttt{WR-SWITCH-MIB::tempPLL}\\ - SCB temperature near the VCXO and
-		PLLs (other: \ref{fail:other:temp})
-\end{itemize}
-
-\noindent \rule{\textwidth}{2pt}
-
-%%%%%%%%%%%%%%%%%%5
-%% Other notes
-%
-% What else should be reported in the future
-% Status of Primary Slave port and backup links
-% For backup timing links, report parameters from Backup SPLL channels and PTP servo
-% What can be reported regarding eRSTP ?
-% %	role of the bridge - root/designated
-% % port role - root/designated/backup/alternate/disabled
-% % number of exchanged BPDUs
-%
-% * we could use information from RSTP to visualize the topology of network made of switches
-% * switches exchange BPDU messages to learn about other switches
-% * RFC 2674 - Bridges with priority, multicast pruning and VLAN
--- a/documents/specifications/management/wrs_failures/wrs_failures.tex
+++ b/documents/specifications/management/wrs_failures/wrs_failures.tex
-\def\us{\char`\_}
-
-\documentclass[a4paper, 12pt]{article}
-%\documentclass{article}
-
-\usepackage{fullpage}
-\usepackage{pgf}
-\usepackage{tikz}
-\usetikzlibrary{arrows,automata,shapes}
-\usepackage{multirow}
-\usepackage{color}
-\usepackage[latin1]{inputenc}
-\usepackage{verbatim}
-\usepackage{amsmath}
-\usepackage{times,mathptmx}
-\usepackage{chngcntr}
-\usepackage{hyperref}
-\usepackage{enumitem}
-\usepackage{scrextend}
-%\usepackage[table]{xcolor}
-\usepackage{listings}
-\definecolor{light-gray}{gray}{0.95}
-%\usepackage[firstpage]{draftwatermark}
-
-
-\usepackage{listings}
-\usepackage{cancel}
-\graphicspath{ {../../../../figures/} }
-
-\newenvironment{packed_enum}{
-\begin{itemize}[leftmargin=0pt,topsep=-12pt]
-	\setlength{\itemsep}{1pt}
-	\setlength{\parskip}{0pt}
-	\setlength{\parsep}{0pt}
-}{\end{itemize}}
-
-\newenvironment{packed_items}{
-\begin{itemize}[topsep=-12pt]
-	\setlength{\itemsep}{1pt}
-	\setlength{\parskip}{0pt}
-	\setlength{\parsep}{0pt}
-}{\end{itemize}}
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% creating subsubsubsection notation
-% src: http://www.latex-community.org/forum/viewtopic.php?f=5&t=791
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\setcounter{secnumdepth}{6}
-\renewcommand\theparagraph{\Alph{paragraph}}
-
-\makeatletter
-\renewcommand\paragraph{\@startsection{paragraph}{4}{\z@}%
-                                     {-3.25ex\@plus -1ex \@minus -.2ex}%
-                                     {0.0001pt \@plus .2ex}%
-                                     {\normalfont\normalsize\bfseries}}
-\renewcommand\subparagraph{\@startsection{subparagraph}{5}{\z@}%
-                                     {-3.25ex\@plus -1ex \@minus -.2ex}%
-                                     {0.0001pt \@plus .2ex}%
-                                     {\normalfont\normalsize\bfseries}}
-\counterwithin{paragraph}{subsubsection}
-\counterwithin{subparagraph}{paragraph}
-\makeatother
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\newcommand{\eqoffset}[1]{%
-  {\ensuremath{%
-      {\text{offset}}_{#1}}%
-  }%
-}
-\newcommand{\eqdelay}[1]{{\text{delay}}_{#1}}
-\newcommand{\eqasymm}{{\text{asymmetry}}}
-
-\begin{document}
-
-\title{White Rabbit Switch: Failures and Diagnostics}
-\author{Grzegorz Daniluk\\ Adam Wujek\\[.5cm] CERN BE-CO-HT}
-\maketitle
-\thispagestyle{empty}
-
-\begin{figure}[ht!]
-  \centering
-  \vspace{1.3cm}
-  \includegraphics[width=0.50\textwidth]{logo/WRlogo.pdf}
-\end{figure}
-
-\newpage
-
-\newpage
-
-\newpage
-
-\tableofcontents
-
-\newpage
-\input{intro.tex}
-
-\newpage
-\section{Possible Errors}
-\label{sec:failures}
-\input{fail.tex}
-\newpage
-\input{snmp_exports.tex}
-%\section{SNMP exports}
-%\subsection{Operator/basic objects}
-%\subsection{Expert objects}
-
-%\newpage
-%\bibliographystyle{unsrt}
-%\bibliography{references}
-
-\end{document}