Commit 46137f46 authored by Grzegorz Daniluk

docs/specs/robustness: use common figures

parent 73669eb0
all : robustness.pdf
.PHONY : all clean
robustness.pdf : robustness.tex
	# latex is run twice so that cross-references and the table of contents resolve
	latex $^
	latex $^
	dvips robustness
clean :
	rm -f *.eps *.pdf *.dat *.log *.out *.aux *.dvi *.ps *.toc
\paragraph*{List of Acronyms}
\vspace{3 cm}
WR & : & White Rabbit \\
PTP & : & Precision Time Protocol \\
WRPTP & : & White Rabbit extension PTP \\
\HP & : & High Priority. Used to indicate special WR Ethernet Frames\\
\SP & : & Standard Priority. Used to indicate special WR
Ethernet Frames\\
&& \\
WRN & : & White Rabbit Node \\
WRS & : & White Rabbit Switch \\
WRCM & : & White Rabbit Clock Master Node \\
WRMM & : & White Rabbit Management Master Node \\
WRDM & : & White Rabbit Data Master Node \\
SNMP & : & Simple Network Management Protocol \\
FEC & : & Forward Error Correction\\
PFC & : & Priority Flow Control \\
% \cc{ } & : & Comment by Cesar \\
% \cm{ } & : & Comment by Maciej \\
A few of the terms (names) used in this document are still a matter of discussion.
The currently used terms do not seem appropriate and are sometimes
confusing. The names are listed below and are likely to change. The reader
is kindly asked to suggest a better name if he/she has one in mind.
\begin{itemize}
\item \HighPriority (\HP),
\item \StandardPriority (\SP),
\item \GranularityWindow (\GW),
\item \ControlMessage (\CM).
\end{itemize}
\chapter{Appendix: Reliability measure (Mean Time Between Failures)}
Measuring reliability by the number of Control Messages lost per year,
as mentioned in the requirements, is neither standard nor practical.
Therefore, in this document, we estimate the reliability of a White Rabbit Network
using a method that is readily and commonly applied to large-scale LANs. We use
the Mean Time Between Failures of a single network component (be it a WR Switch,
a fibre/copper link or a WR Node) to estimate the reliability (i.e. MTBF and failure
probability) of the entire White Rabbit Network as described in
Mean Time Between Failures (MTBF) represents the statistical likelihood that half
of the devices characterised by a given MTBF will no longer function
properly after the period given by that MTBF. MTBF does not give the functional
relationship between time and the number of failures. However, approximating
this relationship as linear is assumed to be sufficient (see \cite{DesigningLSLANs},
page 36).
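For the network MTBF derivation, the quantity of interest is the probability of
failure. Under the linear approximation, half of the $N$ devices fail within one
MTBF period, so the expected number of failures of $N$ devices per unit of time is:
\begin{equation}
P_{N} = \frac{N}{2*MTBF}
\end{equation}
However, probability calculations require a net (per-device) value.
Therefore, the equation below is used. It gives the probability of a single device
failing in a single day, where $M$ denotes the MTBF expressed in days:
\begin{equation}
\label{eq:MTBFprobNetto}
P_{1} = \frac{1}{2*M}
\end{equation}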
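As a quick check of the formula above, the following Python sketch converts an MTBF given in hours into a per-day failure probability. The function name \texttt{daily\_failure\_probability} and the explicit conversion from hours to days are illustrative assumptions; the two MTBF values are the example figures quoted in this appendix.

```python
def daily_failure_probability(mtbf_hours: float) -> float:
    """Per-day failure probability of one device: P = 1 / (2 * M), with M in days."""
    mtbf_days = mtbf_hours / 24.0  # assumption: MTBF is quoted in hours
    return 1.0 / (2.0 * mtbf_days)


# Example MTBF values (hours) quoted in this appendix.
for name, mtbf_hours in [("Fiber connection", 1_000_000), ("Router", 200_000)]:
    percent = 100.0 * daily_failure_probability(mtbf_hours)
    print(f"{name}: {percent:.4f} percent per day")
# prints "Fiber connection: 0.0012 percent per day"
# prints "Router: 0.0060 percent per day"
```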
Examples of common values of MTBF of network components and corresponding
probability of their failure per-day are presented in
\caption{Example MTBFs and probabilities of network units (src:
\begin{tabular}{| l | c | c |} \hline
\textbf{Component} & \textbf{MTBF [hours]} & \textbf{Probability [$\%$]}\\
& & \\ \hline
Fiber connection & 1 000 000 & 0.0012 \\
Router & 200 000 & 0.0060 \\ \hline
In order to calculate the reliability of the entire network, the meaning of a network
failure needs to be defined. In the case of a White Rabbit Network, it is critical
that all the White Rabbit Nodes connected to the network receive Control
Messages. In other words, a failure of a White Rabbit Network is the failure of any
number of its components that prevents any WR Node from receiving Control
Messages. If the failure of a single component causes a network failure, that component is called
a single point of failure (SPoF).
The probability of a failure of the entire network is calculated by adding the probabilities
of all combinations of simultaneous component failures that cause a failure of
the entire network.
Using equation~\ref{eq:MTBFprobNetto}, the MTBF of the entire network can be calculated
from the network failure probability.
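The summation over failure combinations can be sketched as follows. This is only an illustration under stated assumptions: component failures are taken to be independent, the tiny topology (one switch that is a SPoF plus two redundant links) is hypothetical, and the names \texttt{network\_failure\_probability} and \texttt{mtbf\_days} are not taken from the original document.

```python
from itertools import product


def network_failure_probability(p_fail, causes_failure):
    """Add up the probabilities of all failure combinations that break the network.

    p_fail: dict mapping component name -> per-day failure probability
            (components are assumed to fail independently).
    causes_failure: function taking the set of failed component names and
            returning True if that combination is a network failure.
    """
    names = list(p_fail)
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        failed = {n for n, is_failed in zip(names, states) if is_failed}
        if not causes_failure(failed):
            continue
        prob = 1.0
        for n, is_failed in zip(names, states):
            prob *= p_fail[n] if is_failed else 1.0 - p_fail[n]
        total += prob
    return total


def mtbf_days(p_per_day):
    """Invert P = 1 / (2 * M) to obtain the MTBF in days."""
    return 1.0 / (2.0 * p_per_day)


# Hypothetical topology: the switch is a SPoF, the two links are redundant.
components = {"switch": 6e-5, "link_a": 1.2e-5, "link_b": 1.2e-5}
p_net = network_failure_probability(
    components,
    lambda failed: "switch" in failed or {"link_a", "link_b"} <= failed,
)
print(p_net, mtbf_days(p_net))
```

In this example the network failure probability is dominated by the single point of failure; the redundant links only contribute through the (much less likely) case in which both fail on the same day.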
\chapter{Appendix: Ethernet Frame Delivery Delay Estimation}
\captionof{figure}{WR Switch routing using Swcore and RTU (not to
It is estimated that WR Switch routing ($delay_{sw}$) takes between
13 $\mu$s and 80 $\mu$s for highest-priority traffic (frame size: 1500 bytes), provided
no traffic congestion occurs and depending on the size of the output buffer. In
order to estimate the minimum Granularity Window, a few more values need to be
introduced and estimated. We define the time it takes a WR Node to send an
Ethernet frame as the Transmission Delay ($delay_{n\_tx}$) and the time it takes a
WR Node to receive an Ethernet frame as the Reception Delay ($delay_{n\_rx}$). The
delay introduced by the physical connection (i.e. the time it takes a frame to
travel through the physical medium) is defined as the Link Delay ($delay_{f\_link}$
[$\frac{\mu s}{km}$] and $delay_{c\_link}$ [$\frac{\mu s}{km}$] for fibre and
copper respectively). Thus, the delivery delay of a single
Ethernet frame can be estimated with the following equation:
\begin{equation}
Delay_{frame} = D_{f} * delay_{f\_link} + D_{c} * delay_{c\_link} +
N * delay_{sw} + delay_{n\_tx} + delay_{n\_rx}
\end{equation}
where $D_f$ [$km$] is the total length of the fibre connection, $D_c$ [$km$] is
the total length of the copper connection and $N$ is the number of WR Switches on
the way.
In the following estimations, the worst-case Ethernet frame size is
taken into account. This means that we always assume the frame has the
maximum size, i.e. 1500 bytes. Transmission of 1500 bytes over Gigabit Ethernet takes
$\sim 13\mu s$.
\paragraph{Ethernet Frame Transmission Delay Estimation.}
For simplicity, no encoding is assumed. The delay of frame transmission depends
on the remaining size of the frame currently being sent and the number of frames
already enqueued in the output buffer. Assuming that sending 1500 bytes takes
$\sim 13\mu s$, $delay_{n\_tx}= [ 0 \mu s \div (13 + B * 13)\mu s]$,
where $B$ is the number of frames in the output buffer (the maximum $B$ is the size
of the output buffer).
\paragraph{Ethernet Frame Reception Delay estimation.}
The reception of a maximum-size Ethernet frame is estimated to take $\sim 13\mu s$\footnote{The
reception time of a 1500-byte Ethernet frame is $12.176\mu s$;
in the calculations, it is overestimated to $13\mu s$.}.
It is assumed that no decoding is performed. Therefore $delay_{n\_rx} = 13\mu s$.
\paragraph{Link Delay estimation.}
The delay introduced by a link is estimated to be 5 [$\frac{\mu s}{km}$] for a fibre
\cite{PropagationDelay} (\cm{i'm not sure about the source}) and 5
[$\frac{\mu s}{km}$] for a copper \cite{FAIRtiming} link.
\paragraph{Switch Routing Delay estimation.}
The delay introduced by the switch is more complicated to estimate. The
reception delay and transmission delay overlap with the delay introduced by
storing and routing:
\begin{equation}
delay_{sw} = \delta + delay_{RTU} + delay_{n\_tx}
\; \; \; \; \; \mbox{if} \; \; \; \; \;
(delay_{n\_rx} - \delta) < delay_{RTU}
\end{equation}
\begin{equation}
delay_{sw} = delay_{n\_rx} + delay_{n\_tx}
\; \; \; \; \; \mbox{if} \; \; \; \; \;
(delay_{n\_rx} - \delta) > delay_{RTU}
\end{equation}
where $\delta$ is the time needed to receive the frame's header and retrieve
the information necessary for the Routing Table Unit, e.g. VLAN, source and
destination MAC. In general, it is always true that
\begin{equation}
delay_{sw} \geq delay_{n\_rx} + delay_{n\_tx}
\end{equation}
The Routing Table Unit delay is estimated as $delay_{RTU} = [0.5\mu s \div 3\mu
s]$ (\cm{i'm not sure about this 3 us, need to make some tests/simulations }).
So $delay_{sw} = (13 + B * 13)\mu s$, where $B$ is the number of
frames in the output buffer (the maximum $B$ is the size of the output buffer). When
considering the Frame Transmission Delay in the Node, the number of frames in the
output buffer can be assumed to be 0 ($B = 0$). However, when considering the switch's Frame
Transmission Delay, it is very likely that $B > 0$, since many
ports can forward frames to the same port simultaneously. Therefore, in the worst case, $B$
equals the size of the output buffer. In this document, we
assume $B=5$. However, the number can be much greater. The final range of the
delay for consideration is $delay_{sw} = [13 \mu s \div (13 + B * 13)\mu s]$.
\paragraph{Ethernet Frame Delivery Delay estimation.}
A summary of the above estimations is given in
Table~\ref{tab:EtherFrameDelayGeneral}. Details of the final frame delivery
delay estimation for GSI and CERN, taking into account the requirements
concerning the length of physical links, are presented in
\begin{table}
\centering
\caption{Elements of Ethernet frame delivery delay estimation.}
\label{tab:EtherFrameDelayGeneral}
\begin{tabular}{| l | c | c | c |} \hline
\textbf{Name}&\textbf{Symbol}&\textbf{Min. value}&\textbf{Max. value} \\ \hline
% Sending node
Ethernet Frame Transmission Delay&$delay_{n\_tx}$&$0\mu s$&$(13 + B * 13)\mu s$
\\ \hline
% Switch
Switch Routing Delay &$delay_{sw}$&$13\mu s$&$(13 + B * 13)\mu s$
\\ \hline
% Links
Link Delay & $delay_{link}$ &5 [$\frac{\mu s}{km}$]&5 [$\frac{\mu s}{km}$]
\\ \hline
% Receiving node
Ethernet Frame Reception Delay & $delay_{n\_rx}$&$13\mu s$&$13\mu s$
\\ \hline
\end{tabular}
\end{table}
\begin{table}
\centering
\caption{Parameters and numbers used to estimate Ethernet frame delivery delay
in WR Network}
\begin{tabular}{| l | c | c | c | c |} \hline
\textbf{Name}&\textbf{Symbol}&\textbf{Value}&\textbf{Value}&\textbf{Concerns} \\
 & & (GSI)&(CERN) &Network el.
\\ \hline
% Frame param
Frame size & $f\_size$ &1500 bytes&1500 bytes&Frame
\\ \hline
% Sending node
Number of frames in output buffer& $B_{tx}$ &0 &0 &Tx Node
\\ \cline{1-4}
Ethernet Frame Transmission Delay& $delay_{n\_tx}$&13 $\mu s$&13 $\mu s$&
\\ \hline
% Switch
Number of frames in output buffer& $B_{sw}$ &5 &5 &Switch
\\ \cline{1-4}
Switch Routing Delay & $delay_{sw}$&78 $\mu s$&78 $\mu s$&
\\ \hline
Number of hops (switches) & $N$ &3 &3 &
\\ \hline
% Links
Link Length & $D$ &2 km &10 km & Links
\\ \cline{1-4}
Link Delay & $delay_{link}$ &10 $\mu s$&50 $\mu s$&
\\ \hline
% Receiving node
Ethernet Frame Reception Delay & $delay_{n\_rx}$&13 $\mu s$&13 $\mu s$&Rx Node
\\ \hline
Ethernet frame delivery delay & $Delay_{frame}$&270 $\mu s$&310 $\mu s$& ALL
\\ \hline
\end{tabular}
\end{table}
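The totals in the table above can be reproduced with a few lines of Python. This is a sketch only: the function and constant names are illustrative assumptions, a single per-kilometre link delay is used for both fibre and copper (as in the estimates above), and the transmission delay of the sending node is taken at its upper bound for the given buffer occupancy.

```python
FRAME_TIME_US = 13        # overestimated time to send or receive one 1500-byte frame
LINK_DELAY_US_PER_KM = 5  # same value assumed for fibre and copper links


def switch_delay_us(frames_in_buffer):
    """Upper bound of the switch routing delay: (13 + B * 13) microseconds."""
    return FRAME_TIME_US + frames_in_buffer * FRAME_TIME_US


def frame_delivery_delay_us(link_km, hops, b_sw=5, b_tx=0):
    """Delay_frame = D * delay_link + N * delay_sw + delay_n_tx + delay_n_rx."""
    delay_n_tx = FRAME_TIME_US + b_tx * FRAME_TIME_US
    delay_n_rx = FRAME_TIME_US
    return (link_km * LINK_DELAY_US_PER_KM
            + hops * switch_delay_us(b_sw)
            + delay_n_tx
            + delay_n_rx)


print(frame_delivery_delay_us(link_km=2, hops=3))   # GSI:  prints 270
print(frame_delivery_delay_us(link_km=10, hops=3))  # CERN: prints 310
```

With $B_{sw} = 5$ each switch contributes 78 $\mu s$, so the three hops (234 $\mu s$) dominate the total in both cases; the link length only accounts for the 40 $\mu s$ difference between GSI and CERN.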
\chapter{Appendix: Control Messages Size at CERN}
\section{Control Messages Size at CERN}
This appendix presents how the Control Message size for CERN was estimated.
Since there is no official document on this yet, the information below is not
official but can be useful.
Each event consists of:
\begin{itemize}
\item Address, estimated size: 32 bits
\item Timestamps, estimated size: 64 bits
\item Event Header (IDs), size: 32 bits
\item Event Payload, size: 64 bits
\end{itemize}
The total size of a single event is estimated to be 192 bits, or 24 bytes.
In the current control system at CERN, there are 7 events generated for each of
7 distribution networks (per machine). This gives 49 events every Granularity
Window.
As a consequence, for the current system, the Control Message size would equal
1176 bytes. However, the current number of events is not sufficient, and
it is suggested to increase it. The desired number of events is not defined: the
more the better. A four-fold increase, to about 200 events, would be much appreciated.
This gives a Control Message size of 4800 bytes.
Therefore, the minimum Control Message size given in the document equals 1200
bytes (1176 bytes rounded up) and the maximum size is rounded up to 5000 bytes.
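The arithmetic behind these figures is summarised in the short Python sketch below. The field sizes and event counts are the estimates given above; the dictionary and function names are illustrative assumptions.

```python
# Estimated size of each event field, in bits.
EVENT_FIELDS_BITS = {
    "address": 32,
    "timestamp": 64,
    "event_header": 32,
    "event_payload": 64,
}

EVENT_BYTES = sum(EVENT_FIELDS_BITS.values()) // 8  # 192 bits -> 24 bytes


def control_message_bytes(num_events):
    """Size of a Control Message carrying num_events events."""
    return num_events * EVENT_BYTES


print(EVENT_BYTES)                   # prints 24
print(control_message_bytes(7 * 7))  # current system, 49 events: prints 1176
print(control_message_bytes(200))    # suggested 200 events: prints 4800
```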
\section{Control Messages Size at GSI}
\chapter{Appendix: HP Bypass Hardware Implementation}
A method has been proposed to achieve a switch-over of
ports' role and state (as understood in the RSTP specification \cite{IEEE8021D}),
and consequently of HP Traffic routing, in less than 3 $\mu s$. The solution takes advantage of the fact
that HP Traffic is routed using HP Bypass (see
Chapter~\ref{jitterDeterminismNetworkDimention}). The hardware implementation of HP
Bypass and the simplicity of routing enable an extremely fast port switch-over.
The changes proposed hereafter shall be integrated into HP Bypass.
Two registers, each arranged as a table, shall be available per port (RSTP Port Role
Tables). A table entry is addressed by VLAN ID. Each entry in a table
represents an association between a VLAN number and the port's role in this particular
VLAN. The entry size shall be 4 bits to enable encoding of the following roles:
Root, Designated, Alternate, Backup, Disabled. There shall be $2^{10}$ entries
in each register to represent 1024 VLANs. The first register stores the current
roles of a given port. It is called the RSTP Port Current Role Table and is
used in the routing process of HP Traffic. It is read-only for software. The
second register stores the next roles of a given port. It is called the RSTP Port Next
Role Table and is write-only for software. If the contents of the two registers
differ, the RSTP Port Current Role Table is updated with the Next Role
Table when no HP Traffic is being forwarded. The timeslot when HP Traffic is
not being forwarded (is not received) is called an HP Gap. Since HP Packages are
always sent in bursts, an HP Gap can be easily detected. It is important to change
port roles during an HP Gap to prevent HP Package loss. Such a loss
can happen when the change is made between a port with a longer path to the Data
Master and a port with a shorter path to the Data Master (see Appendix~\ref{appD} for
use case analysis).
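The interplay of the two tables can be modelled in software as follows. This is a behavioural sketch only, not the proposed hardware: the class name \texttt{PortRoleTables}, its methods and the initial role (Disabled) are assumptions made for illustration.

```python
NUM_VLANS = 1 << 10  # 2^10 = 1024 entries, one per VLAN ID


class PortRoleTables:
    """Software model of the per-port Current and Next Role Tables."""

    def __init__(self):
        # Assumption: every VLAN starts in the Disabled role.
        self.current = ["Disabled"] * NUM_VLANS  # used for HP routing
        self.next = ["Disabled"] * NUM_VLANS     # written by the RSTP daemon

    def write_next(self, vlan_id, role):
        """Software side: record the role established by RSTP."""
        self.next[vlan_id] = role

    def tick(self, hp_traffic_active):
        """Hardware side: copy Next into Current only during an HP Gap.

        Returns True if the Current Role Table was updated.
        """
        if hp_traffic_active or self.current == self.next:
            return False
        self.current = list(self.next)
        return True


tables = PortRoleTables()
tables.write_next(5, "Root")
print(tables.tick(hp_traffic_active=True))   # burst in progress: prints False
print(tables.tick(hp_traffic_active=False))  # HP Gap: prints True
print(tables.current[5])                     # prints Root
```

Deferring the copy to an HP Gap is what keeps a role change from taking effect in the middle of a burst of HP Packages.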
\textbf{In normal operation}, the role of a given port for a given VLAN is set
by the software (RSTP daemon) by writing to the appropriate RSTP Port Next Role
Table. The table shall be written as soon as the ports' roles have been
established by means of the RSTP algorithm. The HP Bypass algorithm shall verify the
role of the port on which an HP Package is received for a given VLAN ID (provided
in the header). If the port's role translates into the forwarding state, the
algorithm checks the roles of all other ports for the given VLAN ID. The HP Package
is forwarded to all the ports whose role translates into the forwarding state.
The translation between a port's role and a port's state for HP Traffic is
given in Table~\ref{tab:portRoleStatetrans}.
\begin{table}
\centering
\caption{Translation between port's role and state for HP Traffic.}
\label{tab:portRoleStatetrans}
\begin{tabular}{| c | c | c |} \hline
\textbf{Port's Role}& \multicolumn{2}{c|}{\textbf{Port's State}} \\
 & Incoming & Outgoing \\ \hline
Root & Forward & Forward \\ \hline
Designated & Forward & Forward \\ \hline
Alternate & Block & Block \\ \hline
Backup & Forward & Forward \\ \hline
Disabled & Block & Block \\ \hline
\end{tabular}
\end{table}
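The forwarding decision described for normal operation can be sketched in Python as below. The role-to-state mapping is copied from the translation table above; the data layout (a dictionary of per-port role tables) and the function name \texttt{hp\_output\_ports} are assumptions made for illustration and do not describe the actual hardware.

```python
# Does a port in this role forward HP Traffic? (from the translation table)
FORWARDS = {
    "Root": True,
    "Designated": True,
    "Alternate": False,
    "Backup": True,
    "Disabled": False,
}


def hp_output_ports(current_roles, vlan_id, in_port):
    """Return the ports an HP Package received on in_port is forwarded to.

    current_roles: dict port -> dict vlan_id -> role
                   (models the RSTP Port Current Role Tables).
    """
    if not FORWARDS[current_roles[in_port][vlan_id]]:
        return []  # the receiving port is blocked for this VLAN
    return [
        port
        for port, table in sorted(current_roles.items())
        if port != in_port and FORWARDS[table[vlan_id]]
    ]


roles = {
    0: {1: "Root"},
    1: {1: "Designated"},
    2: {1: "Alternate"},
    3: {1: "Backup"},
}
print(hp_output_ports(roles, vlan_id=1, in_port=0))  # prints [1, 3]
print(hp_output_ports(roles, vlan_id=1, in_port=2))  # prints []
```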
\textbf{In case of link failure}, as soon as the link failure is detected by the
Endpoint, it shall notify HP Bypass, and the change of the ports' roles stored in the
RSTP Port Current Role Tables shall be triggered. The change concerns only VLANs
for which the broken port was Root or Designated. For such VLANs, the ports'
roles shall change according to Table~\ref{tab:portRoleTransition}.
The process of HP routing and ports' role change in case of link failure is
presented in Figure~\ref{fig:wrRSTP}.
\begin{table}
\centering
\caption{Port's role transitions in case of link failure.}
\label{tab:portRoleTransition}
\begin{tabular}{| c | c |} \hline
\textbf{Current Role}& \textbf{New Role} \\ \hline
Root & Disabled \\ \hline
Designated & Disabled \\ \hline
Alternate & Root \\ \hline
Backup & Designated \\ \hline
\end{tabular}
\end{table}
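One possible reading of the transition table above is sketched below in Python. It rests on several assumptions that the text does not settle: the broken port itself becomes Disabled, only one standby port per VLAN is promoted (the lowest-numbered one), and ports in other roles keep their role. The names \texttt{TAKES\_OVER} and \texttt{on\_link\_failure} are hypothetical.

```python
# Role of the broken port -> role of the standby port that takes over.
TAKES_OVER = {"Root": "Alternate", "Designated": "Backup"}


def on_link_failure(current_roles, broken_port):
    """Update the Current Role Tables in place after a link failure.

    current_roles: dict port -> dict vlan_id -> role.
    """
    for vlan_id, role in list(current_roles[broken_port].items()):
        if role not in TAKES_OVER:
            continue  # only VLANs where the broken port was Root or Designated
        current_roles[broken_port][vlan_id] = "Disabled"
        for port in sorted(current_roles):
            if current_roles[port].get(vlan_id) == TAKES_OVER[role]:
                # Alternate -> Root, Backup -> Designated
                current_roles[port][vlan_id] = role
                break  # assumption: promote a single port per VLAN


roles = {
    0: {1: "Root", 2: "Alternate"},
    1: {1: "Alternate", 2: "Root"},
    2: {1: "Designated", 2: "Designated"},
}
on_link_failure(roles, broken_port=0)
print(roles[0])  # prints {1: 'Disabled', 2: 'Alternate'}
print(roles[1])  # prints {1: 'Root', 2: 'Root'}
```

VLAN 2 is left untouched in this example because the broken port was only an Alternate port there.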
\captionof{figure}{WR RSTP for HP Traffic}
\chapter{Appendix: Potential Modifications to RTU required by RSTP}
Potential changes to the RTU needed by RSTP: \\
1. RTU@HW:
\begin{itemize}
\item Blocking of incoming packages per VLAN (currently only per port).
\item Blocking of outgoing packages per port (currently only per VLAN).
\end{itemize}
2. RTU@SW:
\begin{itemize}
\item aging of information on a given port: this means querying the Filtering
Database for the entries belonging to the given port and removing (aging out) these
entries,
\item changes enabling control of the new HW parameters.
\end{itemize}
\chapter{Appendix: Timing and Prioritizing of the Ideas Presented}
The number of ideas presented in this document is overwhelming. Not all the
ideas are necessary for White Rabbit to work as a Control Network
that is carefully controlled, managed and configured. Some of the ideas are
intended for more general usage. The table below attempts to prioritize the
ideas, group them into areas and present the planning.
\caption{Timing and Prioritizing of the Ideas Presented.}
\begin{tabular}{| p{3.5cm} | p{1cm} | p{3.5cm} | p{1.5cm} | p{3.5cm} |} \hline
\textbf{Name}&\textbf{Prio}& \textbf{Approx. finished}&\textbf{Ref
Chapter} & \textbf{Area} \\ \hline
FEC & 1 & Workshop April 2011 & \ref{chapter:FEC} & Control Data\\ \hline
HP Bypass & 2 & Workshop after April Workshop & \ref{chapter:FEC} & Control
Data\\ \hline
WR RSTP (HP only) & 2 & Workshop after April Workshop& \ref{chapter:WRRSTP} &
Standard Data and a bit timing
\\ \hline
Monitoring (limited) & 2 & Workshop after April Workshop &
\ref{chapter:monitoring} &
Diagnostics, Monitoring
\\ \hline \hline
Congestion/Flow Control & 3 & 2012 & \ref{chapter:monitoring} &
Standard and Control Data
\\ \hline
Management & 3 & 2012 & \ref{chapter:monitoring} &
Standard and Control Data
\\ \hline
Full Monitoring & 3 & 2012 & \ref{chapter:monitoring} &
\\ \hline \hline
Transparent Clocks PTP & 4 & ? & - & Timing Data
\\ \hline
Ring topologies & 5 & ? & - & Timing and Control Data Earthquake
\\ \hline
%Congestion/flow control, standard and for all the priorities & 5 & ? & -
%Control Data
%\\ \hline
Link Aggregation & 5 & ? & - & Timing, Control and Standard Data
\\ \hline
WR RSTP (SP traffic and other crazy ideas regarding SP and HP) & 6 & ? &
\ref{chapter:WRRSTP} &
Control and Standard Data
\\ \hline
\chapter{Appendix: Flow Monitor}
sFlow is a multi-vendor sampling technology embedded within switches and
routers. It provides the ability to continuously monitor application level
traffic flows at wire speed on all interfaces
sFlow consists of:
\begin{itemize}
\item sFlow Agent
\item sFlow Collector
\end{itemize}
The sFlow Agent is a software process that runs in the White Rabbit Switches.
It combines interface Counters and Flow Samples into sFlow datagrams that are
sent across the network to an sFlow Collector. The Counters and Flow Samples
will be implemented in hardware in order to speed up the
processing of the data sampling. Flow samples are defined by a sampling
rate: on average, 1 out of N packets is randomly sampled. This type of
sampling provides quantifiable accuracy. A polling interval defines how often
the network device sends interface counters.
The sFlow Agent packages the data into sFlow Datagrams that are sent over the
network. The sFlow Collector receives the data from the Flow generators, stores
the information and provides reports and analysis.
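The idea of random 1-out-of-N sampling can be illustrated with the Python sketch below. It is a simplified model, not the sFlow Agent algorithm: each packet is sampled independently with probability 1/N, and the function name, the seed and the packet counts are arbitrary assumptions. The Collector can scale the number of samples by N to estimate the total traffic.

```python
import random


def sample_packets(num_packets, n, seed=0):
    """Return the indices of sampled packets; each is picked with probability 1/n."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    return [i for i in range(num_packets) if rng.random() < 1.0 / n]


N = 100
sampled = sample_packets(num_packets=100_000, n=N)
estimated_total = len(sampled) * N  # scale the sample count back up
print(len(sampled), estimated_total)
```

The estimate gets closer to the true packet count (in relative terms) as more samples are collected, which is the sense in which this kind of sampling has quantifiable accuracy.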
Every switch capable of sFlow must configure and enable:
\begin{itemize}
\item local agent
\item sFlow Collector address
\item ports to monitor
\end{itemize}
In order to acquire reliable network information in a WR network:
\begin{itemize}
\item the statistics shall be collected every ?? (sec,msec..)
\item a sample is taken per port every ?? (sec,msec...)
\item ?? samples per port shall be sent to the CPU
\end{itemize}
\section{Requirements of a Flow Monitor}
General requirements:
\begin{itemize}
\item Network-wide view of usage and active switches.
\item Measuring network traffic; collecting, storing and analysing
traffic data.
\item Monitoring links without impacting the performance of the switches
and without adding significant network load.
\item Industry standard.
\end{itemize}
\noindent The Flow Monitor shall:
\begin{itemize}
\item Measure the volume and rate of the traffic by QoS level.
\item Measure the availability of the network and devices.
\item Measure the response time that a device takes to react to a given
\item Measure the throughput over the links.
\item Measure the latency and jitter of the network.
\item Identify grouping of traffic by logical groups (Master, Node,
\item Identify grouping of traffic by protocols.
\item Define filters and exceptions associated with alarms and
\end{itemize}
\noindent The measurements shall be carried out either between network devices,
\noindent Per-Link Measurements, monitoring:
\begin{itemize}
\item number of packets
\item bytes
\item packets discarded on an interface
\item flow or burst of packets
\item packets per flow
\end{itemize}
\noindent or End-to-End Measurements:
\begin{itemize}
\item path delay
\item ....
\item ....
\end{itemize}
\noindent The combination of both measurements provides a global picture of the
\vspace{10 mm}
\noindent The monitoring shall perform:
\begin{itemize}
\item Active Measurement: injection of network traffic and study of the
reaction to the traffic.
\item Passive Measurement: monitoring of the traffic for measurement.
\end{itemize}
\vspace{10 mm}
\noindent Performance:
\begin{itemize}
\item Reaction Time ...
\item Sampling...
\end{itemize}
\section{State of the Art of Flow Controller}
Currently there are three main choices for traffic monitoring:
\begin{itemize}
\item RMON, IETF standard.
\item NetFlow, Cisco Systems.
\item sFlow, industry standard.
\end{itemize}
In a nutshell, all of them offer similar features and provide the same
information; thus the selection criterion is the usage of resources by
the Agent in the switches and by the Collector of the information.
\begin{table}
\centering
\begin{tabular}{ | c | c | c | c | c | c |} \hline
Flow Controllers & CPU & Memory & Bandwidth & RT Statistics & Implementation \\ \hline
RMON & high & very high 8-32 MB & bursty & supported & sw \\ \hline
NetFlow & high & high 4-8 MB & high bursty & not supported & sw \\ \hline
sFlow & very low & very low akB & low smooth & supported & sw/hw \\ \hline
\end{tabular}
\caption{Comparison of Flow Controllers}
\label{tab:flow_controlers}
\end{table}
As Table~\ref{tab:flow_controlers} shows, sFlow requires fewer resources
both in the Agent, which is placed in the switch, and in the Collector. Its
usage of bandwidth is also more conservative, since the gathered information is sent
every short period of time rather than in bursts, in contrast to the other controllers.
sFlow therefore seems to be a good choice for White Rabbit. Besides, sFlow allows
part of the Agent to be implemented in hardware, providing wire-speed
sampling of frames. In addition, the sFlow license scheme allows the White
Rabbit project to modify and publish its own version.
\chapter{Appendix: WR-specific MIB definitions}
\section{WR PTP}
\vspace{5 mm}
\textbf{All applicable data sets} \\
\vspace{5 mm}
\begin{tabular}{ l l }
\textbf{ps1(23),} & \textbf{-- The time is accurate to 1ps} \\
\textbf{ps2p5(24),} & \textbf{-- The time is accurate to 2.5ps} \\
\textbf{ps10(25),} & \textbf{-- The time is accurate to 10ps} \\