\section{Clock Resilience}

While the previous chapter described WRPTP for a single
Master-to-Slave link, clock resilience needs to be considered for
the entire WR Network.

\subsection{Clock Path redundancy}

In a WRN, the clock is distributed along so-called \textit{clock
  paths}. A path is understood as the cables and switches by which
information is sent from the transmitter (node/switch) to the receiver
(switch/node). The continuity of clock distribution -- existence of
clock paths to all WRN components -- is essential in ensuring clock
resilience. Therefore, clock path redundancy is introduced. It
prevents a \textit{single point of failure}\footnote{Failure
  of a single network component.} from affecting clock distribution
and inevitably translates into network topology redundancy, which is
supported by the \textcolor{red}{WR Switch}. The \textcolor{red}{WR
  Switch V2 (WRSv2)} has two uplinks which can be connected to
separate sources of timing (downlinks of other \textcolor{red}{WR
  Switches} or a node acting as \textit{grandmaster}). Redundancy of the
WRN is limited by a number of factors:
\begin{itemize}
\item Latency of data delivery which limits the number of network
  layers.
\item SyncE which enforces a tree-like network structure and ensures
  high-quality frequency distribution only through a limited number of
  switches.
\item Data delivery reliability which enforces a tree-like topology
  with the roles of ports defined \textit{a priori} by the Rapid
  Spanning Tree Protocol (RSTP) algorithm (\cite{biblio:IEEE8021D},
  \cite{biblio:Robustness}).
\end{itemize}
Studies (\cite{biblio:Robustness}) suggest that, given the limitations
in topology, the two uplinks of WRSv2 might not be sufficient to
achieve high network reliability. The next version of the
\textcolor{red}{WR Switch (WRSv3)} will eliminate this limitation.

The choice of an active clock path -- the uplink which is used for
syntonization and synchronization -- is made based on the RSTP
algorithm.

\subsection{Switch-over}

Clock path redundancy ensures continuity of clock distribution
but introduces a possible instability of the recovered clock while
switching between sources (active uplinks), a process hereafter
called \textit{switch-over}. Syntonization and synchronization are
governed by SyncE and WRPTP respectively.  Therefore, both need to
preserve clock stability during \textit{switch-over}.

\subsubsection{SyncE-wise}

The White Rabbit clock recovery unit (described in
Sec.~\ref{sec:hwSupport}) supports, by design, multiple inputs (RX
clocks). The phase and frequency errors of all the input clocks are
continuously tracked and fed into the VCTCXO control algorithm. If
tests show such a need, a
\textcolor{red}{delay can be introduced to wait for frequency/phase
  error validation}. Therefore, SyncE-wise
switch-over is considered seamless for syntonization.
\textcolor{blue}{I would need some numbers and tests here}

\subsubsection{WRPTP-wise}

Since a WRN is a set of independent M-to-S link connections, WRPTP is
unaware of whether a given link is active or not. Delay and offset
measurements are performed on all the links all the time and the
information is provided to the clock recovery unit (see
\figurename~\ref{fig:PLL}). Therefore, the switch-over is unnoticeable
for WRPTP and shall be seamless for synchronization.
\textcolor{blue}{Measurement of ``backup'' offset and delay with
  reference to the primary one -- idea by Tomek to further decrease
  switch-over instability}

\subsection{External conditions variation}

Apart from the switch-over process, another potential source of clock
instability is the variation of external conditions,
e.g. temperature. It affects the characteristics of the physical
connections and consequently changes the delay introduced by the medium
-- the variable delay ($\delta_{ms,sm}$).  It is important to note that
the frequency distributed over SyncE is not affected by this
phenomenon. Therefore, only synchronization over WRPTP, i.e. the delay
change, needs to be compensated. This is done by periodically
measuring the delay through a standard exchange of PTP messages. The
rate of measurements needs to be greater than the rate of
temperature-induced delay changes, which is reasonably slow.
\textcolor{red}{Therefore, a much lower rate of message exchange than
  in standard PTP is sufficient.}
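As a rough illustration of this argument, the required measurement
rate can be bounded under a few assumed symbols, none of which are
defined elsewhere in this document: $T_m$ is the interval between two
delay measurements, $\rho$ is an upper bound on the rate of change of
the variable delay, and $\epsilon$ is the largest uncompensated delay
error that is considered acceptable. Between two measurements the
delay can drift by at most $\rho\,T_m$, so that
\[
  \rho \, T_m \leq \epsilon
  \quad \Longrightarrow \quad
  T_m \leq \frac{\epsilon}{\rho} .
\]
The slower the temperature-induced drift (small $\rho$), the longer
the measurement interval may be for the same error budget.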

\subsection{Loss of WRPTP-messages}

PTP employs timeouts to address PTP-specific message loss, triggering
repetition of operations and re-sending of messages. WRPTP uses the
same idea during the \textit{WR Link Setup} (see
Sec.~\ref{sec:wrLinkSetup}), repeating operations and re-sending WR
management messages in case of message loss. The measurement of the
offset and delay in WRPTP is much more tolerant of the loss of
multiple messages. Unlike standard PTP, WRPTP is responsible only for
synchronization (syntonization is done through SyncE). After achieving
synchronization with the master at the beginning of the connection,
the offset changes only due to temperature-related delay
variation. The rate of delay measurements through the PTP-message
exchange is expected to be much greater than the rate of change of
physical medium parameters. Therefore, the loss of multiple PTP messages is
tolerated with no effects on clock stability.  \textcolor{blue}{we
  should probably have a sanity check in the PTP daemon for ruling out
  impossible corrections, i.e. very fast changes probably due to a
  measurement or transmission error}
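To make the tolerance argument more concrete, a rough bound can be
written with symbols introduced here only for illustration: $T_m$ is
the nominal interval between delay measurements, $\rho$ is an upper
bound on the rate of change of the variable delay, and $\epsilon$ is
the largest tolerable uncompensated delay error. If $n$ consecutive
measurements are lost, the time between two successful measurements
grows to $(n+1)\,T_m$, and the delay can drift in the meantime by at
most
\[
  (n+1)\,\rho\,T_m .
\]
Under these assumptions the loss has no visible effect as long as
this drift stays below $\epsilon$, i.e.
\[
  n \leq \frac{\epsilon}{\rho\,T_m} - 1 .
\]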

\subsection{Cascading Boundary Clocks}

A switch can be seen as a boundary clock. In standard PTP, a cascade
of boundary clocks suffers from a nonlinear degradation of
synchronization accuracy due to error accumulation.  \textcolor{blue}{
  \begin{itemize}
  \item if it is possible to make measurements (we need $\geq$ 3
    switches), measurements here
  \item the deterioration should be due to SyncE...
  \item measurements would answer the question: how many layers of
    switches we can have.
\end{itemize}
}