Switch overall description
The basic idea is to have a modular system whereby the MCH slot in a uTCA crate is populated with a card comprising the following:
- Clock inputs (external reference clock and PPS), most likely coming from a GPS Disciplined Oscillator (GPSDO). The uppermost switch in the timing hierarchy will be configured to use these signals (rather than its uplink ports) to generate its internal UTC time base. The algorithm is basically: wait for a PPS, read the UTC seconds from RS232 (GPSDO) or Fast Ethernet (NTP), load a register with that value + 1, then start counting external reference clock ticks at the next PPS.
- RS232: one port for debugging (on a pin header directly on the PCB, along with all the JTAG/programming interfaces) and another on the front panel, for reading from the GPSDO (see above) or for use as a management console.
- Fast Ethernet for servicing, debugging, SSHing, remote firmware update, etc.
- Two uplink ports for implementing redundancy for the timing part, i.e. one port is chosen by default and we turn to the other in case of a problem with the default port. For data purposes, these two ports are the same as all downlink ports, i.e. there is no hierarchy.
- GbE links to the backplane. Each of these goes to a slot in the uTCA crate (see the drawing above), which in our case will host an AMC card with a certain number of downlink ports (4 or 8).
- Compensated clocks to the backplane. Again, this is a star configuration, with a 125 MHz clock going to each slot. This is meant for people designing e.g. ADC cards in AMC format and taking a phase-compensated clock from the backplane. In the new version of the MCH (December 2009) there are also LVDS buffers to take the DMTD clock to each AMC slot. The availability of the DMTD clock allows for easy implementation of additional clock phase shifting on the AMC itself.
- Switch Management Interface (SMI) - Configuration/timecode links, also in a star, to the backplane. SMI is a simple 8b10b or 4b5b-encoded serial link (to be defined - we need to agree on some AC-coupling-proof encoding) used for the following purposes:
- Informing AMCs of the current UTC and PPS, by sending a message containing the encoded UTC at every PPS pulse. Since SMI is synchronous to the reference clock, we can extract the PPS signal by dividing the backplane reference clock and resetting the divider when the last bit of the UTC message has been received.
- Synchronization of AMCs without need for implementing Gigabit Ethernet or PTP in each AMC.
- Booting up the AMC cards remotely
- Keeping routing tables in sync in switch AMC cards and providing general AMC register access. The idea is to keep the AMCs simple and stupid and put the intelligence in the MCH, which would govern the AMCs via SMI.
- Transmitting out-of-band packet data which couldn't be embedded into main Ethernet link without disturbing it. For example, the timestamps of PTP packets sent/received by the downlink ports are transmitted via SMI.
- Transmitting user-defined data for AMC cards which don't need gigabit speed. This would enable us to build very cheap AMCs for non performance-critical tasks.
- Electrical standards:
- Both clocks (REF & DMTD) and SMI are LVDS.
- GbE uses the AC-coupled, LVPECL-like signaling of the TLK1221 PHY (the datasheet describes the differential I/O as "LVPECL compatible", although it does not look like real LVPECL). Nevertheless, it works well with AC-coupled CML and LVDS transmitters/receivers.
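The start-up algorithm described in the clock inputs item above can be sketched as a small software model. This is only an illustration under stated assumptions: the class and method names are hypothetical, `ticks_per_second` stands in for the external reference clock frequency (which is not specified here), and the behaviour on later PPS edges is deliberately left out.

```python
class UtcTimebase:
    """Toy model of the UTC time base in the uppermost switch (hypothetical names)."""

    def __init__(self, ticks_per_second):
        # Assumption: number of external reference clock ticks per second.
        self.ticks_per_second = ticks_per_second
        self.pending_seconds = None  # register pre-loaded with UTC + 1
        self.seconds = None          # stays None until the time base is running
        self.ticks = 0

    def load_next_second(self, utc_seconds_now):
        """Call after a PPS, with the UTC second just read from the GPSDO or NTP."""
        # The value becomes valid at the *next* PPS, hence the + 1.
        self.pending_seconds = utc_seconds_now + 1

    def on_pps(self):
        """PPS edge: start the time base if a value has been pre-loaded."""
        if self.pending_seconds is not None:
            self.seconds = self.pending_seconds
            self.pending_seconds = None
            self.ticks = 0

    def on_ref_tick(self):
        """One external reference clock tick; ignored until the time base runs."""
        if self.seconds is None:
            return
        self.ticks += 1
        if self.ticks == self.ticks_per_second:
            self.ticks = 0
            self.seconds += 1


# Tiny demo with an unrealistically slow clock (10 ticks per second).
tb = UtcTimebase(ticks_per_second=10)
tb.on_pps()                  # first PPS: nothing loaded yet, nothing happens
tb.load_next_second(1000)    # UTC second read over RS232/NTP after that PPS
tb.on_pps()                  # next PPS: time base starts at 1001
for _ in range(25):
    tb.on_ref_tick()
print(tb.seconds, tb.ticks)  # prints: 1003 5
```

In real hardware this would of course be a counter and a register in the FPGA rather than Python objects; the model only shows the ordering of the steps.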
The AMC cards will have four downlink ports (SFP sockets) and an FPGA. These four ports will all converge into a single GbE stream in the backplane, but this performance bottleneck is not considered a problem in our application. Note: Tom has found some nice 8xSFP connectors for uTCA cards, so we could have 8 ports per AMC. So, with one fully-loaded crate we can build a 32- or 64-port switch.
There is also an individual IPMI link from the MCH to each one of the AMCs. This is specified by the uTCA standard and it serves the purpose of general crate management (card identification, hot swap control, etc.).
The general WR idea is that a switch extracts a notion of time either from its uplink port(s) or - if present - from the external clock signals. It then uses this notion of time as its internal time base and propagates it through all downlink ports, i.e. a WR switch contains PTP slaves in the uplink ports and a PTP master in each downlink port. This is what's known as a boundary clock in PTP parlance. The typical degradation in performance when cascading boundary clocks in traditional systems does not affect WR networks thanks to the use of Synchronous Ethernet (i.e. extracting a clock signal from the data stream).
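The choice of time source described above, combined with the two redundant uplink ports, could be sketched as a simple priority rule. The names below are hypothetical, and two points are assumptions rather than anything decided here: how "present" and "problem" are detected, and what the switch does when no source is usable.

```python
from enum import Enum


class TimeSource(Enum):
    EXTERNAL = "external reference clock + PPS"
    UPLINK_DEFAULT = "default uplink port"
    UPLINK_BACKUP = "backup uplink port"
    NONE = "no valid time source"  # assumption: this case is not covered in the text


def select_time_source(external_present, default_uplink_ok, backup_uplink_ok):
    """Pick where the switch takes its notion of time from.

    External clock signals win if present (top of the timing hierarchy);
    otherwise the default uplink is used, falling back to the other one
    in case of a problem.
    """
    if external_present:
        return TimeSource.EXTERNAL
    if default_uplink_ok:
        return TimeSource.UPLINK_DEFAULT
    if backup_uplink_ok:
        return TimeSource.UPLINK_BACKUP
    return TimeSource.NONE


# A mid-hierarchy switch whose default uplink has failed:
print(select_time_source(False, False, True).value)  # prints: backup uplink port
```

Whatever the source, the downlink ports always act as masters, so the rule only affects where the internal time base comes from.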
A typical WR link always runs between a downlink port of a switch and an uplink port sitting either in another switch or in an end node. An uplink port only has to extract rx_clk from the incoming data flow, clean it and re-inject it as the tx_clk of its GbE transceiver, as shown on the drawing:
The downlink port will then extract this clock from the upstream traffic and continuously compare its phase against its own tx_clk. This continuous monitoring requires no packet traffic. The downlink port will then use this phase measurement to shift its tx_clk accordingly. Therefore, a WR slave does not need to do anything to get a phase-compensated clock, at least once an initial PTP-type exchange has established the coarse (125 MHz cycle) count. This idea is called "zero-traffic PTP" and it is still not completely mature.
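The arithmetic behind this scheme can be illustrated with a small numeric model. It is a sketch only: the function names are hypothetical, delays are in picoseconds, and it assumes a symmetric link (equal downstream and upstream delay), which is not stated above. The phase comparison only sees the round trip modulo one 8 ns clock period, so the whole number of cycles is taken from the initial coarse exchange.

```python
CLK_PERIOD_PS = 8_000  # one period of the 125 MHz clock, in picoseconds


def round_trip_phase_ps(down_delay_ps, up_delay_ps):
    """Fine phase of the returned clock relative to tx_clk, as seen by the downlink port."""
    return (down_delay_ps + up_delay_ps) % CLK_PERIOD_PS


def tx_shift_ps(coarse_cycles, fine_phase_ps):
    """How far to advance tx_clk so that the slave clock lines up with the switch.

    Assumption: the link is symmetric, so the one-way delay is half the round trip.
    """
    round_trip = coarse_cycles * CLK_PERIOD_PS + fine_phase_ps
    return round_trip // 2


def slave_offset_ps(shift_ps, down_delay_ps):
    """Residual offset of the slave clock after tx_clk has been advanced."""
    return down_delay_ps - shift_ps


# Symmetric link with a one-way delay of 123 456 ps.
down = up = 123_456
coarse = (down + up) // CLK_PERIOD_PS          # would come from the coarse PTP exchange
fine = round_trip_phase_ps(down, up)           # comes from the phase comparison
shift = tx_shift_ps(coarse, fine)
print(coarse, fine, shift, slave_offset_ps(shift, down))  # prints: 30 6912 123456 0
```

With an asymmetric link (say 100 000 ps down and 100 200 ps up) the same procedure leaves a residual of -100 ps, i.e. half the asymmetry, which is why the symmetry assumption matters.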
-- Main.JavierSerrano - 17 Nov 2009