Design considerations
This page is a working diary about the current ideas and issues for the implementation of WRAP.
Main idea
The main idea is to wrap all the WR PTP Core functionalities in a single
FPGA, offering time and data services.
Data services are provided by a flexible data interface, which can be
GMII or a simplified interface.
The image shows only one of the two SFP interfaces.
Simplified interface
The simplified interface reuses the GMII lines to provide a simple interface to and from Ethernet. Its behavior can be set through the WRAP management SPI. It can be a FIFO over Ethernet or a more complex TLV (type-length-value) interface.
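To make the TLV option concrete, here is a minimal sketch of type-length-value framing. The field widths (1-byte type, 2-byte big-endian length), the type codes and the function names are assumptions for illustration only; the actual WRAP frame format is not defined on this page.

```python
import struct

# Assumed header layout: 1-byte type, 2-byte big-endian length (hypothetical).
HEADER = struct.Struct("!BH")


def tlv_encode(tlv_type: int, value: bytes) -> bytes:
    """Pack one record as type, length, value."""
    return HEADER.pack(tlv_type, len(value)) + value


def tlv_decode(stream: bytes):
    """Split a byte stream into a list of (type, value) records."""
    records = []
    pos = 0
    while pos < len(stream):
        if pos + HEADER.size > len(stream):
            raise ValueError("truncated TLV header")
        tlv_type, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        if pos + length > len(stream):
            raise ValueError("truncated TLV value")
        records.append((tlv_type, stream[pos:pos + length]))
        pos += length
    return records
```

A FIFO-over-Ethernet mode would not need this framing at all, which is the trade-off between the two modes: raw bytes versus self-describing records.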
FPGA Package
The FPGA choice is of primary importance, since its package affects the
compatibility of WRAP with newer versions of the WR PTP Core.
Currently, the WR PTP Core has the following resource utilization, using
the default RAM setting of the WR PTP Core on an Artix
35T:
As you can see, the scarcest resource is block RAM, heavily used
by the LM32 inside the PTP Core.
I/O usage is 80+ lines:
- PTP core & timing: 41 I/O
- GMII + Management: 28
- Time services: 16+
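The budget above can be checked against the package I/O counts quoted further down this page. This is a small sketch using only figures from this page; the time-services entry is a lower bound ("16+"), so the real margin may be smaller.

```python
# I/O line counts quoted on this page ("Time services" is a lower bound).
IO_USAGE = {"PTP core & timing": 41, "GMII + management": 28, "Time services": 16}

# General I/O lines per package, as quoted on this page.
PACKAGE_IO = {"CSG325": 106, "FGG484": 250}


def io_margin(package: str) -> int:
    """Spare general I/O lines left in a package after the listed usage."""
    return PACKAGE_IO[package] - sum(IO_USAGE.values())


for name in PACKAGE_IO:
    print(name, "spare I/O:", io_margin(name))
```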
Using two SFPs requires two endpoints but not two LM32 cores, so the
block RAM (BRAM) usage is not doubled.
The Artix 35T is available in several packages. Currently, two of them are under
evaluation:
- CSG325: 15mmx15mm BGA footprint (ball pitch 0.8mm)
- FGG484: 23mmx23mm BGA footprint (ball pitch 1mm)
CPG236 is also available, but it has a ball pitch of 0.5mm (PCB realization and soldering must be checked!)
CSG325
This package supports up to 4 Gigabit Serdes. The footprint allows the
WRAP board to be upgraded to an Artix 50T, which has 33% more BRAM.
General I/O lines: 106
FGG484
This package supports up to 4 Gigabit Serdes. The footprint allows the
WRAP board to be upgraded to an Artix 100T, which has 250% more BRAM.
General I/O lines: 250
My suggestion: use CSG325; the FGG484 package is very large for an FMC form-factor board.
FMC Connector
- 2 differential clocks available (10 MHz and WR clock?)
- Additional 2 differential clocks for HPC connector (custom clock signal from PLL)
- 68 generic I/O
- 28 for Gigabit MII (single-ended, clock signals single-ended)
- PPS output, timecode output
- Time-services (timestamping & outputs) 20 (4 LVDS inputs, 2 single-ended inputs, 4 LVDS outputs, 2 single-ended outputs)
- About 15 I/O left for management and GPIO
Timing services
Timing services such as timestamping of input events can be implemented
using the Serdes of the Artix. The required clock can be generated by an
internal PLL or an external one (suggested solution). BUFG has a maximum
clock frequency near 500 MHz; BUFIO can be used to supply the Serdes
with a 500MHz clock to achieve 2ns resolution.
Time-service data is retrieved over a dedicated SPI.
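The 2ns figure is simply one sample per 500MHz clock period. The sketch below shows how a coarse cycle counter and a deserialized sample word could be combined into a timestamp. The 1:4 deserialization ratio (125MHz word clock), the oldest-first bit ordering and all names are assumptions for illustration, not part of the WRAP design.

```python
from typing import Optional, Sequence

BIT_PERIOD_PS = 2_000  # one sample per 500 MHz clock period (2 ns)
BITS_PER_WORD = 4      # assumption: 1:4 deserialization, 125 MHz word clock


def rising_edge_timestamp_ps(
    cycle: int, word: Sequence[int], prev_bit: int
) -> Optional[int]:
    """Return the time of the first 0->1 transition in `word`, or None.

    `cycle` is the coarse word-clock counter value, `word` holds the samples
    oldest first (assumed ordering), and `prev_bit` is the last sample of the
    previous word, so that an edge on a word boundary is not missed.
    """
    last = prev_bit
    for index, bit in enumerate(word):
        if last == 0 and bit == 1:
            return (cycle * BITS_PER_WORD + index) * BIT_PERIOD_PS
        last = bit
    return None
```

For example, an edge seen in the third sample of word 10 is stamped at (10 * 4 + 2) * 2ns = 84ns after the counter origin.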
Clocking
The clocking scheme is very important. The goal is to achieve phase
noise and jitter below those of the WR Switch.
Another key aspect is size: on an FMC form-factor board, space is critical.
Oscillators
The DDMTD offset clock phase noise is not critical on the WR Switch
(check DDMTD report), so an external PLL to get the 125/62.5MHz
offset clock frequency is likely not needed. The suggestion is to check the
cleanliness of the internal PLLs of the Artix-7 and use the approach
described in the DDMTD report to evaluate the effect on the phase
noise.
The Main VCTCXO suggested is DOT050F combined with an external PLL to
get the best phase noise.
The considered Main PLLs are:
- LTC6950
- LMK03806
LTC6950
The best solution for phase noise, but it is rather expensive and requires an external VCSO to get the best phase noise. Check space, but it likely cannot fit on an FMC board.
LMK03806
It has an internal VCO, so it has higher phase noise at larger offset frequencies (but still much better than the WR Switch), and it is not as expensive as the LTC6950 solution (PLL+VCSO).
My suggestion: use LMK03806
Clock distribution
Custom frequency clocks, meaning any clock with a frequency different
from 125/62.5MHz (the WR clock), are a major issue. The difficulty of
providing a deterministic clock arises from the divider uncertainty, since
the PTP daemon correctly aligns only the WR clock. The WR Switch solves the
problem using a digital flip flop, which cleans up a clock signal created
using the WR clock.
The clock cleaning solution provides extreme flexibility, but also adds
phase noise at larger offset frequencies. The basic issue is that flip
flop output stages are not designed to carry high-quality clocks (but
data!).
The LMK03806 VCO can tune to 2.5GHz and provide the 125/62.5MHz clock
and 10MHz clocks. The outputs can be distributed to a coaxial cable
using different setups.
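As a quick check of the frequencies mentioned above, the following sketch computes the integer division ratio from a 2.5GHz VCO to each output. It only does the arithmetic; whether a given ratio is reachable with the LMK03806 divider chain is not checked here and must be verified against the datasheet. The function name is hypothetical.

```python
from fractions import Fraction

VCO_HZ = Fraction(2_500_000_000)  # VCO frequency quoted above


def output_divider(target_hz) -> int:
    """Integer divider from the VCO to `target_hz`; raises if not exact."""
    ratio = VCO_HZ / Fraction(target_hz)
    if ratio.denominator != 1:
        raise ValueError(f"{target_hz} Hz is not an integer division of the VCO")
    return int(ratio)


for hz in (125_000_000, 62_500_000, 10_000_000):
    print(hz, "Hz -> divide by", output_divider(hz))
```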
If the supported coaxial cable length is below 2 meters, one of the two
methods below can be used:
Differential output of the PLL
This solution is the best for phase noise. LVPECL levels can provide up
to 8 dBm into a 50 ohm load. A balun is used to properly convert the
differential signal into a single-ended one.
The downside of the solution is the unknown delay introduced by the
balun, which can vary from lot to lot (production/process variations).
Since the clock must be distributed to the LEMO connector and to the FMC
connector, using a balun to distribute the signal to the LEMO output
could introduce an unknown skew between the two outputs.
Moreover, the clock output is AC coupled (check if this is ok).
https://www.ohwr.org/project/wrap/uploads/1981920ce8c0226d356ab94148919c59/lvds_balun.png
Single-ended output of the PLL
The simplest yet effective solution. A 5PB1108 clock output buffer has been tested with 6 outputs tied together, resulting in a slew rate above 3V/ns on a 1 meter coax cable. One of the outputs can be used for the FMC connector, since the skew between outputs is below 50ps. Downside: with a small series resistance that only roughly matches a 50 ohm source impedance (about 30 ohm of total source impedance), the signal at the far-end 50 ohm load is about 2V (check if this is ok!). The voltage levels and source impedance are the same as those of the WR Switch 10MHz clock output. So this is ok?
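The "about 2V" level is consistent with a plain resistive divider between the source impedance and the termination. The sketch below assumes a 3.3V unloaded output swing, which is not stated on this page; only the 30 ohm and 50 ohm values come from the text.

```python
def far_end_voltage(v_source: float, r_source: float, r_load: float = 50.0) -> float:
    """Steady-state voltage across the termination (resistive divider)."""
    return v_source * r_load / (r_source + r_load)


# Assumed 3.3 V swing, 30 ohm total source impedance, 50 ohm far-end load.
print(round(far_end_voltage(3.3, 30.0), 2))
```

With a source matched to exactly 50 ohm the same swing would give half the source voltage at the load, so the slight mismatch trades some reflection for amplitude.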
Long cables
If a cable length greater than 2 meters is mandatory, the proposed solutions must be tested. If the rise time is not acceptable, an opamp or another high-current buffer can be used, but skew, especially part-to-part skew, must be tested to keep the skew between the LEMO output and the FMC output bounded.
Clock diagram
The image shows the simplest clock scheme. The custom clock (or 10MHz clock) is distributed using the 5PB1108. The low skew between the outputs guarantees a close match between the LEMO output and the FMC output without calibration. Thanks to the 200ps maximum part-to-part skew of the 5PB1102 and 5PB1108, the WR clock distributed to the FMC and the custom clock (10MHz) are closely matched without calibration.
The image doesn't show the 1PPS output, but it can be distributed with a 5PB1108 as well. Moreover, additional frequency outputs can be added.
The approach described above is good, but it doesn't take into account the inevitable delay due to the trace length on the user board, from the FMC socket to the user's design. A better design is the one depicted below:
The FMC output has a delay line to phase shift the output so that it closely
matches the delay introduced by the user. In this case, the user must tell
WRAP the additional delay to compensate. Another way to do it is with
a clock loop, using the same technique used by zero-delay PLLs (but this
would require a 1:2 clock buffer in the user design!).
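To give an idea of the delay the user would have to report, the sketch below estimates the propagation delay of a PCB trace from its length. The effective dielectric constant of 4.0 is an assumed, typical value; the real figure depends on the stack-up and trace geometry of the user board.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, mm per ns


def trace_delay_ps(length_mm: float, eps_eff: float = 4.0) -> float:
    """One-way propagation delay of a PCB trace, in picoseconds.

    `eps_eff` is the effective dielectric constant (assumed value).
    """
    velocity_mm_per_ns = C_MM_PER_NS / math.sqrt(eps_eff)
    return 1000.0 * length_mm / velocity_mm_per_ns


print(round(trace_delay_ps(100.0)), "ps for a 100 mm trace")
```

Under this assumption a 100mm trace adds roughly 0.67ns, which is several times the 200ps part-to-part skew quoted for the clock buffers.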
Delay lines introduce phase noise, so suitable delay lines must be checked. Perhaps
it is better to build a custom delay line using a varactor and an LTC6957.
The downside of this solution is that the factory must calibrate the
delay line in order to match the delay of the LEMO output.
Clock alignment
As discussed previously, using a flip flop to get the 10MHz/custom clock is bad for phase noise and jitter. The procedure described below takes a clock coming from the PLL with a frequency different from the WR clock (e.g. 100MHz) and aligns it correctly. The procedure relies on one assumption: the FPGA has a copy of the clock to be aligned.
The custom frequency must have a common sub-multiple frequency with the WR clock (e.g. a 100 MHz
custom clock and the 125 MHz WR clock have a common 25 MHz).
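The common sub-multiple is the greatest common divisor of the two frequencies. The sketch below computes it, together with the number of cycles of each clock per common period; the function name is hypothetical.

```python
from math import gcd


def common_frame(wr_hz: int, custom_hz: int):
    """Return (common frequency, WR cycles per frame, custom cycles per frame)."""
    common_hz = gcd(wr_hz, custom_hz)
    return common_hz, wr_hz // common_hz, custom_hz // common_hz


print(common_frame(125_000_000, 100_000_000))
```

For the 125MHz WR clock and a 100MHz custom clock this gives 25MHz, with 5 WR cycles and 4 custom clock cycles per 40ns frame, so the rising edges of the two clocks can coincide only once per frame.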
Steps to be done:
- Wait for the Gigabit SERDES to synchronize to the incoming clock (wr_clk_network clock domain)
- Wait for clock recovery and 1-PPS alignment (this information is in the wr_clk_recovered clock domain)
- Transfer the 1-PPS information from wr_clk_recovered to wr_clk_network using a counter in the wr_clk_network domain (this can be done safely since we control the phase shift between the two clock domains)
- The counter in wr_clk_network counts from 0 to N-1, where N is the multiplier that gives the WR clock frequency from the common sub-multiple frequency (e.g. N is 5 for a 100MHz custom clock)
- Every time the counter overflows, sample both wr_clk_recovered and the custom clock from the wr_clk_network domain
- Collect data while phase shifting wr_clk_recovered with the SoftPLL, until a transition is found where the wr_clk_recovered rising edge and the custom clock rising edge are very close
- When the transition is found, align it with the 1-PPS edge from wr_clk_network
- Restart PPS alignment; the custom clock will then be aligned to
1-PPS in the same way as wr_clk_recovered.
check if this is OK!
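The search step of the procedure above can be illustrated with a toy model: an ideal square clock is sampled at a fixed instant while its delay is stepped, and the sampled level flips when an edge crosses the sampling instant. This is only a sketch under strong assumptions that are not from this page (ideal 50% duty-cycle clocks, no jitter or metastability, a fixed shift step, hypothetical function names); it does not model the SoftPLL.

```python
def sampled_level(period_ps: int, edge_ps: int, shift_ps: int) -> int:
    """Level of an ideal 50% duty-cycle clock seen at the sampling instant t = 0.

    The clock has a rising edge at t = edge_ps + shift_ps (modulo the period).
    """
    since_rising_ps = (-(edge_ps + shift_ps)) % period_ps
    return 1 if since_rising_ps < period_ps // 2 else 0


def find_rising_edge_shift(period_ps: int, edge_ps: int, step_ps: int) -> int:
    """Step the delay and return the last shift at which the rising edge was
    still at or before the sampling instant (the next sample reads 0)."""
    previous = sampled_level(period_ps, edge_ps, 0)
    shift = step_ps
    while shift <= period_ps + step_ps:
        current = sampled_level(period_ps, edge_ps, shift)
        if previous == 1 and current == 0:
            # A delayed rising edge has just moved past the sampling instant.
            return shift - step_ps
        previous = current
        shift += step_ps
    raise RuntimeError("no rising edge found within one period")


# Toy values: 8 ns clock with an unknown edge position, 100 ps shift steps.
print(find_rising_edge_shift(8_000, 6_500, 100))
```

In this model the edge position is found to within one shift step. Running the same search on the 8ns recovered clock and on the 10ns custom clock and comparing the two results gives their relative edge offset, which is the quantity the procedure tries to drive to zero.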
Another way to do it is with glitching. The LM32 reprograms the divider/delay block of the PLL's clock output until the correct alignment is found. Since the procedure must be done without resyncing the outputs, glitches may occur in the custom frequency clock (hence the "glitching" approach). This approach is not deterministic.
User calibration
I suggest performing the calibration of the board using the 10MHz output/custom frequency LEMO output rather than the usual 1PPS calibration. The PPS should be used to mark the edge of the 1 second, not provide the phase information itself. The phase information should be carried by the clock signal.
Things to do
- Finalize FPGA package
- Finalize LEMO clock and PPS voltage level signals and output impedance
- Check Artix-7 internal PLL cleanliness
- Evaluate delay line for advanced clock compensation, see [1] and [2].
References
[1] https://www.mail-archive.com/time-nuts@febo.com/msg84735.html
[2] https://www.mail-archive.com/time-nuts@febo.com/msg84748.html
7 February 2017, M. Rizzi