
How to Benchmark TCP/IP Ethernet Performance

There are two ways to look at designing HPC clusters. On the positive side, there is a plethora of hardware and software options. On the negative side, there is a plethora of hardware and software options! The cluster designer has a special burden because unlike putting together one or two servers for the office, a cluster multiplies your decisions by N, where N is the number of nodes. A wrong decision can have two negative consequences. First, a fix for the problem will probably require more money and time. And, second, before the problem is fixed you may only be getting a fraction of the performance possible from your cluster. Clusters have a way of amplifying bad decisions.

All is not lost, however. Choosing the right stuff is not that difficult. It takes some investigation, some testing, and asking the right questions. As an example, let's look at a scenario where you would like to retrofit an older cluster with Gigabit Ethernet (GigE) or even build a new cluster using GigE. We are not going to do an exhaustive review of all GigE adapters, but rather demonstrate how one might benchmark a specific adapter. GigE adapters may also be integrated on motherboards, in which case it is a good idea to test these as well. The common assumption that an on-board GigE adapter will run better than a separate adapter may not necessarily be true. Ultimately your application(s) should determine the hardware, but before getting too specific, we can develop a set of general protocols to help narrow down the hardware and software choices.

How Low Can We Go?

After some investigation, we see that GigE adapters seem to come in two flavors: a 32 bit version for workstations and a 64 bit version for servers. There also seems to be a large price difference. A quick check on the web shows us that workstation adapters can be purchased for about $35 (US) and server adapters for about $115 (US). The price difference can be substantial for a large number of nodes. Of course, we know that the more expensive 64 bit adapter is much better than the silly little 32 bit workstation part. Are you sure?

Before you bet your career on an assumption, why not do some testing? Indeed, let's probe a 32 bit adapter to see just how it measures up. Turning to the Internet again, we find that the Netgear GA302T seems to be a good choice. It is based on a Broadcom chipset, works at both 33 and 66 MHz, and can be purchased for about $32 (including shipping!). Note: Since writing this article, the GA302T has gone out of production, which is all the more reason to use this article as a guide for testing Ethernet adapters!

The Test Set-up

To perform the tests, we will connect the two Netgear GigE adapters with a CAT 5e crossover cable. We will use two servers that contain Supermicro P4DC6 motherboards with dual 1.6 GHz Xeons and 1 GB RAM. The motherboards have both 32 bit/33 MHz and 64 bit/66 MHz PCI slots. It should be noted that the Netgear 32 bit adapter is designed to work in both the 33 and 66 MHz slots. The test servers are connected via the on-board Fast Ethernet connection so the GigE interface can be changed without losing communication between the nodes. The test software includes Linux kernel 2.4.21, a program called Netpipe, and Gnuplot for plotting data (see Sidebar Four: Resources).

Our testing checklist includes the following:

  1. Download and compile the most recent kernel (which usually includes the most recent adapter drivers).
  2. Bring up the GigE adapters (see Sidebar One).
  3. Download and compile Netpipe on both machines (a build sketch follows this list).
  4. Download and compile Gnuplot if needed. Gnuplot is included in most Linux distributions.
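
For step 3, building Netpipe from source is usually just a matter of unpacking the tarball and running make. The following is only a minimal sketch; the archive name, version, and make target are assumptions and vary between Netpipe releases, so check the README included with your copy.

# Hypothetical build steps -- the archive name and the "tcp" make target
# are assumptions; consult the Netpipe README for your version.
tar xzf NetPIPE-3.x.tar.gz
cd NetPIPE-3.x
make tcp            # builds the NPtcp binary used in Sidebar Two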

When all the above is complete, the testing can begin. Once the two adapters are up and running, using Netpipe is quite easy (see Sidebar Two: Running Netpipe). It should be mentioned that Netpipe can test TCP, MPI, and PVM performance. Newer versions can also test many popular high performance interfaces as well. For the purposes of this column, we will measure the TCP performance. It is best to perform the tests as root, as you will be taking the interface up and down quite a bit.

As part of our testing we would like to answer the following questions.

  1. What is the single byte latency?
  2. How do the 33 and 66 MHz results compare?
  3. Can increasing the MTU help performance?

To answer these questions we will place the adapters in both the 32 bit/33 MHz and 64 bit/66 MHz slots. We will also vary the MTU size.

The Results

A summary of results is shown in Figure One (see Sidebar Three for instructions on how to plot the data using Gnuplot). Both the 33 MHz and 66 MHz PCI slots were used with a standard 1500 byte MTU and then a 3000 byte MTU. The latency is a respectable 35 microseconds for all tests; however, the throughput for large block payloads can vary by a factor of two or more depending on the PCI slot and MTU size. It is clear that the best throughput (800 Mbits/sec) was recorded for the 66 MHz slot and the 3000 byte MTU. However, a closer look at the graph shows varying results for the 1000 to 10,000 byte payloads. Indeed, the 1500 byte MTU seems to provide better performance in this region. If your application transfers a large amount of data of this size, you may see better results with a standard 1500 byte MTU. MTU sizes greater than 3000 are not reported because with large MTUs the interface would stop working after a certain block size.
Figure One: Test Results for Netgear GA302T

The Netpipe authors also suggest plotting a "Network Signature Graph" (throughput vs. time). This result is shown in Figure Two.

Figure Two: Netpipe Signature Graph

Conclusions

With regard to the low cost Netgear GA302T, we can say that it was a very capable adapter while it lasted. Currently the Intel Pro/1000 MT is a good choice for a low cost adapter. Looking at the results, we found that tuning the MTU size to the application may improve performance. We also found a limit to increasing the MTU size, even though the driver and chipset are supposed to support MTU sizes up to 9000 bytes.

Netpipe is a great tool for testing the network performance of your cluster designs. Of course, there are many other factors we did not consider, such as motherboards, chipset implementation, adding an MPI or PVM layer, and introducing a switch. Fortunately, the effect of all these variables can be easily measured with Netpipe. Finally, you will notice that this type of information is not normally part of the product literature. Without proper testing, design decisions based on product data sheets and glossies are at best a guess and at worst a costly mistake. In future articles, we will examine other issues that influence cluster price and performance.


Sidebar One: Configuring An Interface
Adding a second Ethernet interface for testing purposes is not difficult. The most important thing is to use up-to-date kernels and drivers. The driver should be compiled as a module so that it can be easily added to or removed from the kernel. Assuming you have two nodes that can communicate through a network, the following steps, performed on each node, should allow you to easily bring up the test interface.

Enter the following command to load the Tigon 3 module (the module name may vary for the adapter under test).

# insmod tg3

The module should load successfully. Check the end of the output from

dmesg | tail
If you are using the tg3 module, you should see two lines similar to the following. Other adapter modules will produce a different message, but still list the Ethernet port. You may also want to check to see if the driver allows for any tunable parameters, such as interrupt mitigation settings, which may affect latency.

eth1: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) \
10/100/1000BaseT Ethernet 00:09:5b:22:cd:bc

The dmesg output tells us, among other things, that the card is assigned to eth1 and that it sits in a 33 MHz/32 bit PCI slot.
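
As a quick sanity check, you can also ask the module itself what load-time parameters it accepts. The commands below are a generic sketch; whether the tg3 driver in your kernel exports any tunable parameters, and what they are called, depends on the driver version.

# list any load-time parameters the module accepts (names vary with driver version)
modinfo -p tg3

# a parameter could then be set when loading the module, for example:
# insmod tg3 <parameter>=<value>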

Now that the driver is loaded and recognizes the card, we need to bring up the interface. Because we will be playing with a parameter (MTU size -- Ethernet packet size), we will use

ifconfig
to assign the IP address (in this case 192.168.1.2) and start the interface.

# ifconfig eth1 inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255 mtu 1500

If this command was successful, you should be able to issue a

ifconfig eth1
command and get something similar to the following.

eth1      Link encap:Ethernet  HWaddr 00:09:5B:60:18:E5
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:869
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18

Once this process is performed on both nodes, using different IP addresses of course, you should be able to successfully ping between the nodes. In our case, we used 192.168.1.1 and 192.168.1.2 as the IP numbers so we know the interface is communicating if we issue the following:

# ping 192.168.1.1

from the node whose interface we assigned as 192.168.1.2. Once the interfaces are up we can begin testing.

To vary any of the parameters (such as the IP address or MTU size), simply take the interface down with ifconfig and bring it back up with the new settings:

# ifconfig eth1 down
# ifconfig eth1 inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255 mtu 3000

The MTU for both adapters must be the same.
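
A quick way to confirm this is to check the MTU reported by ifconfig on each node after the interface is brought back up; the values must agree.

# run on each node; the reported MTU values must match
ifconfig eth1 | grep MTU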

Sidebar Two: Running Netpipe
From one of the test nodes, open a window and log in to the other node. Start Netpipe in receive mode (-r) by entering:

NPtcp -r

Open a second window on the first node and enter:

NPtcp -t -h 192.168.1.2 -P -o NP_output_file

using the IP address for the receiving machine (-h option). The -P option tells Netpipe to print to the screen, the -o option produces an output file for plotting, and the -t option tells Netpipe to run in transmit mode. There are other options for Netpipe, but these will provide a basic test of the interface. Once it is running you should see something similar to the following:

Latency: 0.000035
Now starting main loop
  0:         1 bytes 7241 times -->    0.23 Mbps in 0.000033 sec
  1:         2 bytes 7511 times -->    0.46 Mbps in 0.000033 sec
  2:         3 bytes 7473 times -->    0.68 Mbps in 0.000034 sec
  3:         4 bytes 4956 times -->    0.91 Mbps in 0.000033 sec
  4:         6 bytes 5601 times -->    1.37 Mbps in 0.000033 sec
  5:         8 bytes 3734 times -->    1.81 Mbps in 0.000034 sec
  6:        12 bytes 4642 times -->    2.69 Mbps in 0.000034 sec
  (continues)

The default Netpipe test is self-limiting: the block size is increased from a single byte (by various non-standard increments) until the transmission time exceeds 1 second.
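
If you plan to run the full test matrix (each MTU size in each PCI slot), it can be convenient to script the runs so the output files end up with the names used in Sidebar Three. The following is only a sketch: the hostname node1, the use of ssh over the Fast Ethernet management link, the IP addresses, and the file-naming scheme are all assumptions, so adapt it to your own setup.

#!/bin/sh
# Hypothetical test driver -- node1, the addresses, and the output file
# names are assumptions; adjust to match your configuration.
SLOT=66                  # label only: 33 or 66 MHz PCI slot under test

for MTU in 1500 3000; do
    # set the MTU on both ends (they must match); node1 is the receiver
    ssh node1 "ifconfig eth1 down; ifconfig eth1 inet 192.168.1.2 netmask 255.255.255.0 mtu $MTU up"
    ifconfig eth1 down
    ifconfig eth1 inet 192.168.1.1 netmask 255.255.255.0 mtu $MTU up

    # start the receiver remotely, then run the transmitter locally
    ssh node1 "NPtcp -r" &
    sleep 2
    NPtcp -t -h 192.168.1.2 -P -o NP.$MTU.$SLOT-1
    wait                 # the receiver exits when the test completes
done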

Sidebar Three: Plotting Results
The Netpipe output file can be easily plotted with Gnuplot. The plotting file for Figure One, a standard plot of Throughput vs. Blocksize, is as follows.

# gnuplot file for plotting Netpipe data
#
set title "Netpipe TCP - Throughput vs. Blocksize \n Netgear GA302T Adapter"
set xlabel "Blocksize (Bytes)"
set ylabel "Throughput (Mbits/s)"
set logscale x
set key bottom right

#Uncomment to produce a png file
#set terminal png picsize 1200 896
#set output "netpipe.throughput_vs_blocksize.png"

# Uncomment these to produce an eps file
#set terminal postscript monochrome  "Helvetica" 10
#set pointsize .6
#set output "netpipe.throughput_vs_blocksize.eps"
#set size 0.6,0.6

plot [] [] \
 "NP.1500.33-1" using 4:2 title "1500 MTU 33 MHz PCI" w linespoints, \
 "NP.3000.33-1" using 4:2 title "3000 MTU 33 MHz PCI" w linespoints, \
 "NP.1500.66-1" using 4:2 title "1500 MTU 66 MHz PCI" w linespoints, \
 "NP.3000.66-1" using 4:2 title "3000 MTU 66 MHz PCI" w linespoints
                                                                                
# wait so we can view it! (comment out when making files)
pause -1

The output files produced by Netpipe are listed as part of the plot line (i.e., NP.1500.33-1, etc.). You can easily edit the Gnuplot file to view your own test results. To plot data, simply enter:

gnuplot filename.gp

where

filename.gp
is the name of the Gnuplot file similar to the one shown above. You can also generate other views of the data. See the Netpipe/Gnuplot documentation for more information.

To plot the "Netpipe Signature" graph shown in Figure Two, you can use this gnuplot file:

set title "Netpipe Data - Signature Graph (Throughput vs. Time)\n Netgear GA302T Adapter"
set xlabel "Time"
set ylabel "Throughput (Mb/s)"
set logscale x
#Uncomment to produce a png file
#set terminal png picsize 1200 896
#set output "netpipe.network_signature_graph.png"

# Uncomment these to produce an eps file
#set terminal postscript monochrome  "Helvetica" 10
#set pointsize .6
#set output "netpipe.network_signature_graph.eps"
#set size 0.6,0.6

set key bottom right

plot [] [] \
 "NP.1500.33-1" using 1:2 title "1500 MTU 33 MHz PCI" w linespoints, \
 "NP.3000.33-1" using 1:2 title "3000 MTU 33 MHz PCI" w linespoints, \
 "NP.1500.66-1" using 1:2 title "1500 MTU 66 MHz PCI" w linespoints, \
 "NP.3000.66-1" using 1:2 title "3000 MTU 66 MHz PCI" w linespoints


# wait so we can view it! (comment out when making files)
pause -1

Sidebar Four: Resources

Netpipe

Gnuplot

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.