WRS - Stress Tests

06 Feb. 2013 - b455777+

Benoit RAT (Seven Solutions)
Contents

1 Introduction
  1.1 What does Xilinx say?
  1.2 Setup

2 Experiments
  2.1 Software
  2.2 Phases
  2.3 Temperature logging

3 Results
  3.1 Conclusions

4 References

1 Introduction

Once upon a time, the now well-known “White Rabbit” was tortured to reveal its true nature under adverse conditions. However, the torture was focused on its behaviour with the. This new tale is the story of the White Rabbit switch, and it relates the behaviour of its insides while facing torture...

In other words, this report monitors the internal temperature of the switch with different fan configurations and in extreme environments, from -10°C to +50°C.

1.1 What does Xilinx say?

“What is the absolute maximum junction temperature (Tj max) for plastic and ceramic parts?”

The specification for Tj - Junction Temperature is given for most of Xilinx families under the “Absolute Maximum Ratings” section of the Xilinx Data Book. This absolute maximum Tj value is consistent for all Xilinx devices, and is as follows:

Absolute Max Junction Temperature:
- +150 deg C for mature Xilinx devices (XC3000, XC4000 series) in ceramic packages.
- +125 deg C for all Xilinx devices in plastic packages, and newer device families in ceramic packages.

For the maximum Tj (Tc for ceramic packages) particular to your device, please see that device family data sheet.

“What safe Tj max value should I use to run my device?”

Xilinx can not guarantee timing specs at the above-mentioned temperature, as these are absolute maximum ratings. Device operation for extended times at these temperatures could result in design failure or damage to the part. For reliable operation, the part should be run at its specified operating temperature range. The operating temperature range is limited by the temperature grade of the part viz., Commercial (C) or Industrial (I) grades. This temperature range is specified in the “Recommended Operating Conditions” section of the device data sheets.
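The distinction between the absolute maximum rating and the operating range can be sketched as a small check on a measured junction temperature. This is only an illustration: the operating ranges below are assumed placeholder values, not taken from this report, and must be replaced with those from the “Recommended Operating Conditions” section of the device data sheet. The function name `classify_tj` is hypothetical.

```python
# Assumed operating junction-temperature ranges in deg C (placeholders);
# replace them with the values from your device data sheet.
OPERATING_RANGE = {"C": (0.0, 85.0), "I": (-40.0, 100.0)}

# Absolute maximum quoted above for devices in plastic packages.
ABS_MAX_PLASTIC = 125.0


def classify_tj(tj, grade="C"):
    """Classify a junction temperature (deg C) for a given temperature grade."""
    low, high = OPERATING_RANGE[grade]
    if tj > ABS_MAX_PLASTIC:
        return "above absolute maximum"
    if low <= tj <= high:
        return "within operating range"
    return "outside operating range"


if __name__ == "__main__":
    for tj in (75.0, 110.0, 130.0):
        print(tj, classify_tj(tj, grade="C"))
```

A temperature between the operating limit and the absolute maximum is not immediately destructive, but timing is no longer guaranteed there, which is why it gets its own category.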

1.2 Setup

The setup is shown in the figure below:

- One switch under test (Passive), placed in the temperature chamber, that logs its internal temperature.
- A testing switch (Active) that checks packet transmissions.
- Ports 2, 4, 6, 8, 10, 12, 14 and 16 of the testing switch are connected to their corresponding ports on the tested switch.
- A PC with 3 Gigabit interfaces to send high-speed packets to ports 1 and 2 of the testing switch and receive them through port 1 of the tested switch.
- Both switches mount the Virtex-6 LX130T FPGA.

2 Experiments

2.1 Software

To check that the switch behaves correctly, we send and receive packets on all the ports that connect the passive and active switches. To do so, we use a tool that controls the interfaces on the active switch: `wrs-iftool`

We also use `networkTool`, software written by Maciej Lipinski, to send and receive burst frames from the PC and its 3 GigE interfaces.

2.2 Phases

The experiment is divided into two phases, run at 3 different temperatures (20ºC, 50ºC, -10ºC):

1. An idle phase where only PTP packets are transmitted.
2. A stress phase where packets are sent from one switch to the other, and where the Gigabit Ethernet interfaces are used to send packets.

The stress phase consists of sending packets at maximum bandwidth until we start losing some frames (< 3% errors):

- On the active switch: `wrs-iftool -t 3 -z 1200 -p 100 -n 10000 bcast` (2.7 Mbps)
- On the testing PC, looping the following:
  ```
  # ./networkTool -f eth1 -g 90:e2:ba:17:a6:ee -1200 -n 100000 -q 10000
  # ./networkTool -f eth2 -g 90:e2:ba:17:a6:ee -1200 -n 100000 -q 10000
  # ./networkTool -r eth3 -n 100000
  ```
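The "< 3% errors" criterion can be made explicit with a short sketch. The helper names and the way the sent and received frame counts are obtained are assumptions for illustration only; they are not part of `networkTool` or `wrs-iftool`.

```python
def frame_loss_percent(sent, received):
    """Percentage of frames lost between sender and receiver."""
    if sent <= 0:
        raise ValueError("sent must be a positive frame count")
    return 100.0 * (sent - received) / sent


def within_error_budget(sent, received, max_loss_percent=3.0):
    """True while the loss stays below the 3% threshold used in the stress phase."""
    return frame_loss_percent(sent, received) < max_loss_percent


if __name__ == "__main__":
    # Hypothetical counts, only to show the calculation.
    print(frame_loss_percent(100000, 98000))
    print(within_error_budget(100000, 96000))
```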

2.3 Temperature logging

The temperature is monitored on 3 sensors every 10 seconds:

- $T_{act}$: Actual temperature in the system monitor (inside the FPGA)
- $T_{bfpga}$: Temperature below the FPGA
- $T_{ps}$: Temperature of the power supply

The target temperature is set on the chamber and is reached progressively. The chamber, however, does not guarantee a perfectly stable temperature.
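A minimal sketch of such a periodic logger is shown here. It is not the code that runs on the switch: `read_sensors` is a hypothetical callable that returns the three temperatures, and the `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.

```python
import time


def log_temperatures(read_sensors, n_samples, period_s=10.0,
                     sleep=time.sleep, clock=time.time):
    """Sample (T_act, T_bfpga, T_ps) every `period_s` seconds.

    Returns a list of (timestamp, t_act, t_bfpga, t_ps) tuples.
    """
    rows = []
    for i in range(n_samples):
        t_act, t_bfpga, t_ps = read_sensors()
        rows.append((clock(), t_act, t_bfpga, t_ps))
        if i < n_samples - 1:  # no need to wait after the last sample
            sleep(period_s)
    return rows


if __name__ == "__main__":
    # Fake sensor returning constant values, sampled quickly for the demo.
    fake = lambda: (34.2, 33.3, 24.4)
    for row in log_temperatures(fake, n_samples=3, period_s=0.1):
        print(row)
```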

In the figure below we can observe the values of the 3 sensors of the passive switch, with the fan in pumping mode, while it is stressed by the active switch.

![Temperature Graph](image)

**Figure 2: 20 ⇒ 50 ⇒ −10°C, Fan Pumping, Stressed**

3 Results

The table below summarizes the different experiments. To obtain it, we computed the mean value of each sensor over the periods when the temperature was considered stable ($\sigma^2_{T_{ps}} \leq 1$).
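One way to compute such a mean is sketched here. The report does not say how the variance was evaluated, so the sliding window over the most recent $T_{ps}$ samples and its length are assumptions, as is the function name `stable_means`.

```python
from statistics import mean, pvariance


def stable_means(samples, window=30, max_var=1.0):
    """Mean of each sensor over the samples taken while T_ps was stable.

    `samples` is a list of (t_act, t_bfpga, t_ps) tuples in time order.
    A sample counts as stable when the variance of T_ps over the last
    `window` samples (an assumed criterion) is <= `max_var`.
    Returns a (t_act, t_bfpga, t_ps) tuple of means, or None.
    """
    stable = []
    for i in range(window - 1, len(samples)):
        recent_ps = [s[2] for s in samples[i - window + 1:i + 1]]
        if pvariance(recent_ps) <= max_var:
            stable.append(samples[i])
    if not stable:
        return None
    return tuple(round(mean(column), 1) for column in zip(*stable))


if __name__ == "__main__":
    # Synthetic data: a ramp followed by a plateau.
    ramp = [(30.0 + i, 29.0 + i, 20.0 + i) for i in range(10)]
    plateau = [(40.0, 39.0, 30.0)] * 10
    print(stable_means(ramp + plateau, window=5))
```

With a 10-second sampling period, the default window of 30 samples corresponds to 5 minutes; a shorter window reacts faster but accepts noisier periods.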

---

[A temperature logger was supposed to register these changes, but it was not set up properly.]
<table>
<thead>
<tr>
<th>$T_{obj}$</th>
<th>FAN</th>
<th>State</th>
<th>$\mu(T_{act}) / \mu(T_{bfpga}) / \mu(T_{ps})$ (ºC)</th>
</tr>
</thead>
<tbody>
<tr>
<td>20ºC</td>
<td>Blowing</td>
<td>Idle</td>
<td>34.2/33.3/24.4</td>
</tr>
<tr>
<td>20ºC</td>
<td>Pumping</td>
<td>Idle</td>
<td>37.0/35.3/28.1</td>
</tr>
<tr>
<td>20ºC</td>
<td>Off</td>
<td>Idle</td>
<td>75.0/72.7/52.0</td>
</tr>
<tr>
<td>20ºC</td>
<td>Blowing</td>
<td>Stressed</td>
<td>35.4/34.7/24.8</td>
</tr>
<tr>
<td>20ºC</td>
<td>Pumping</td>
<td>Stressed</td>
<td>38.6/36.2/28.6</td>
</tr>
<tr>
<td>20ºC</td>
<td>Off</td>
<td>Stressed</td>
<td>75.4/73.2/52.1</td>
</tr>
<tr>
<td>50ºC</td>
<td>Blowing</td>
<td>Idle</td>
<td>66.4/65.2/55.3</td>
</tr>
<tr>
<td>50ºC</td>
<td>Pumping</td>
<td>Idle</td>
<td>67.2/65.5/54.8</td>
</tr>
<tr>
<td>50ºC</td>
<td>Off</td>
<td>Idle</td>
<td>&gt;110ºC<sup>a</sup></td>
</tr>
<tr>
<td>50ºC</td>
<td>Pumping</td>
<td>Stressed</td>
<td>68.2/66.0/55.5</td>
</tr>
<tr>
<td>-10ºC</td>
<td>Blowing</td>
<td>Stressed</td>
<td>7.3/6.0/−2.3</td>
</tr>
<tr>
<td>-10ºC</td>
<td>Pumping</td>
<td>Stressed</td>
<td>10.0/8.0/−0.7</td>
</tr>
</tbody>
</table>
<sup>a</sup> Without any fan, the temperature inside the FPGA reached 110ºC after 10 minutes, so we decided to stop the test at that point.

3.1 Conclusions

- The switch needs a running fan if the surrounding temperature cannot be kept around ambient/room temperature (20-25ºC).
- Blowing vs Pumping:
  - At 20ºC blowing seems to perform better ($\Delta \approx 3^\circ$C).
  - At 50ºC pumping is almost the same as blowing. The temperature chamber provided hot air from the back, so blowing hot air might be worse.
- Idle/Stress: Stressing the FPGA does not seem to really affect the temperature. However, we observed that the SFPs get hotter during the stress test.
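The differences behind these conclusions can be recomputed from the $T_{act}$ column of the results table. The sketch below only repeats the arithmetic; the dictionary layout and the function names are illustrative choices.

```python
# (T_obj in deg C, fan mode, state) -> mean T_act in deg C, from the results table.
T_ACT = {
    (20, "Blowing", "Idle"): 34.2,
    (20, "Pumping", "Idle"): 37.0,
    (20, "Off", "Idle"): 75.0,
    (20, "Blowing", "Stressed"): 35.4,
    (20, "Pumping", "Stressed"): 38.6,
    (20, "Off", "Stressed"): 75.4,
    (50, "Blowing", "Idle"): 66.4,
    (50, "Pumping", "Idle"): 67.2,
    (50, "Pumping", "Stressed"): 68.2,
    (-10, "Blowing", "Stressed"): 7.3,
    (-10, "Pumping", "Stressed"): 10.0,
}


def pumping_minus_blowing(t_obj, state):
    """How much hotter the FPGA runs when pumping instead of blowing."""
    diff = T_ACT[(t_obj, "Pumping", state)] - T_ACT[(t_obj, "Blowing", state)]
    return round(diff, 1)


def stressed_minus_idle(t_obj, fan):
    """How much hotter the FPGA runs when stressed instead of idle."""
    diff = T_ACT[(t_obj, fan, "Stressed")] - T_ACT[(t_obj, fan, "Idle")]
    return round(diff, 1)


if __name__ == "__main__":
    print(pumping_minus_blowing(20, "Idle"), pumping_minus_blowing(20, "Stressed"))
    print(pumping_minus_blowing(50, "Idle"))
    print(stressed_minus_idle(20, "Blowing"), stressed_minus_idle(50, "Pumping"))
```

At 20ºC the pumping penalty is about 3ºC in both states, while at 50ºC it drops below 1ºC, and stressing the switch adds at most about 1.6ºC in any configuration where both states were measured.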

4 References

- totureReport
- wrs-iftool
- networkTool

Thanks again to Maciej Lipinski for his help during my stay at CERN to check the temperature of the switch.