SC05 Wrapup - No Sleeping in Seattle


Introduction

The SC conference is always a lot of fun because there are so many cool new things at the show, you get to see people you've previously only emailed, you get to see old friends, and you get to "geek out" without too much grief from your family. This year's show was no exception. It was the largest SC conference ever, had lots of new announcements, and even included a large presence from Microsoft. Of course, Cluster Monkey has already commented about this turn of events.

This year's SC was a good show. I didn't get to any of the presentations, but I did try to see as much of the show floor as I could, so I want to share with you what I saw and what I learned talking with the various vendors. However, the show floor was huge, so I may have to rely on press releases to get me over the hump.

In addition to this summary, which is by no means a complete synopsis of the show, check out Joe Lanman's Blog as well.

SC05 - Location, Location, Location

This year's SC Conference was at the Washington State Convention and Trade Center in Seattle, Washington. It was a great location for the conference, although the exhibition hall had to be split into two parts to accommodate all of the vendors. However, I may be a little prejudiced since I used to live in Seattle.

The Convention Center is located just a few blocks up from Pike Place Market and it is literally surrounded by coffee places, particularly Starbucks. I swear that you can't walk 20 feet without running into a coffee place. On the other hand, I don't like Starbucks because it tastes like they've burned their beans. Anyway, there are a number of hotels in the area, plenty of places to eat, and, more importantly, lots of watering holes that serve the best of the local micro-breweries (never mind Dungeness Crab).

Monkey Get Together

The First Monkey Get Together was a huge success. A number of people showed up and all of the hats were given out (although we had to save one for rgb. So if you see someone with a yellow hat with a monkey on it on the Duke campus, introduce yourself! rgb is a great guy.). I got to see some old friends like Roger Smith, Joey, and Trey from the ERC at Mississippi State, Dan Stanzione of Cluster Monkey-famedom, Glen Otero - International Man of Mystery and Super Cluster Monkey (ladies, he's still single and still a body builder), and others. I also got to meet some new friends like Josip Loncaric. Josip was an early major contributor to clusters. He made a small change to the TCP stack in the 2.2 kernel that greatly improved TCP performance. He now works at Los Alamos on aspects of clusters and high performance computing. It was a real honor to meet him and to talk to him (a little hero worship going on there).

I also spent some time talking to Dimitri Mavriplis, who is a professor at the University of Wyoming. He is one of the best CFD (Computational Fluid Dynamics) researchers in the world. It was great fun to talk about CFD with him since that's one of my interests, as well as clusters (he uses clusters in his research). If you are looking for CFD codes for your clusters, Dr. Mavriplis is the man to talk to.

All in all, the Monkey Get Together was a big success. It was very nice to see such a groundswell of support for clusters, particularly Beowulfs, from such a cross-section of the community. There were people there from competing companies, but they were able to discuss the state of clusters and the future of clusters in a constructive and passionate way.

Linux Networx Announcements

I'd like to talk about the Linux Networx announcements at the conference, but I need to disclose that I work for Linux Networx so you can view this as a shameless plug if you like. However, even if I didn't work for Linux Networx, I would still write about their announcements and you'll see why in the following paragraphs.

Linux Networx introduced two new clusters: the LS-1 and the LS/X. These two systems represent a new approach to clusters - bringing them to the systems level. Doug has mentioned in one of his recent writings that clusters were a disruptive influence on HPC at many levels, one of them being "disruptive support." Doug went on to say, "...There are integrated clusters from larger vendors that reduce the number of user options in order to increase the level of performance, integration, and support..." This is precisely what Linux Networx has done. The key concept is to take a systems approach to clusters and make them easier to use, easier to manage, easier to support, and easier to upgrade. Both the LS-1 and LS/X embody this philosophy.

Full-Height LS-1 and Half-Height LS-1. Courtesy of Linux Networx

LS-1

The LS-1 has been designed based on Linux Networx's years of experience with clusters, using "best of breed" components and processes. The LS-1 is designed for the small to medium range of the market with up to 128 nodes. The current LS-1 system is Opteron only, with dual-socket nodes that are dual-core capable. You can also choose to have a GigE network, a Myrinet 2G network, or an Infiniband network (Infinipath is coming around Q1 of 2006). There are also a number of storage options that range from simple NFS boxes to parallel file systems with great I/O performance. At SC05 there was also a technology demo of a parallel visualization capability for the LS-1. Linux Networx is working very hard on visualization. To give you a little insider information, I think the resulting visualization product will be really neat and will cost much less than the equivalent SGI visualization equipment (not that I'm biased or anything).

LS/X

The LS/X is designed for the upper range of supercomputer performance. It uses a mid-plane architecture where the boards slide into an 8U sub-rack (I guess you can call them blades). Linux Networx is currently shipping a 4-socket Opteron node (dual-core capable) with two built-in Infinipath NICs, two GigE NICs, and up to 64 GB of memory. For each 4-socket node there are also two bays at the rear of the rack that allow either two SATA drives or two PCI-Express cards to be connected to the node. Linux Networx is also doing some 8-socket boards for special situations, but they may or may not be generally available. At SC05, however, Linux Networx was showing an 8-socket Opteron node (dual-core capable) with 4 Infinipath NICs, 4 GigE NICs, up to 128 GB of memory, and up to four SATA drives or four PCI-Express cards per node. Up to 6 of the 4-socket nodes can be put into an 8U sub-rack and up to 4 sub-racks in a normal rack, for a total of up to 96 sockets in a single rack.
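If you want to check the density figures, here is the rack arithmetic from the paragraph above as a quick Python sketch (all the configuration numbers come straight from the article; the core count just assumes dual-core Opterons in every socket):

```python
# LS/X rack capacity, using the configuration quoted in the article.
sockets_per_node = 4
nodes_per_subrack = 6
subracks_per_rack = 4

sockets_per_rack = sockets_per_node * nodes_per_subrack * subracks_per_rack
cores_per_rack = sockets_per_rack * 2  # assuming dual-core Opterons throughout

print(sockets_per_rack)  # 96 sockets, matching the figure above
print(cores_per_rack)    # 192 cores with dual-core parts
```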

Three racks of LS/X. Courtesy of Linux Networx

The LS/X nodes slide into a mid-plane to get their power (from a DC PDU in the bottom of the rack), communication, and expandability. The sub-racks have built-in Tier-1 switching for the Infinipath and GigE networks. The racks can also have Tier-2 switching in the bottom of the rack. These built-in switches greatly reduce the number of required cables. For a full rack you only need 17 cables! A very high percentage of the parts of the nodes are field replaceable (you just pull them out and put in a new one). The racks are also designed to sit over vented tiles in a raised-floor area to pull air up into the rack, which eliminates hot air recirculation. The performance of the LS/X is setting records on benchmarks, which should be posted on the website soon. It is very competitive with the IBM Blue Gene, Power 5, Cray X1, and Cray XD1 on the HPC Challenge benchmark, and in some cases it has the best performance of any of these systems.

The Intel booth was right next to the Linux Networx booth so I did want to mention that an Intel person, who watched the unveiling of the LS-1 and the LS/X on Monday night, commented that they thought the systems were the "...sexiest machines on the floor..." despite not having Intel chips in them.

Pathscale and Infinipath

I spent some time talking to the Pathscale folks. They are great people to talk to since they know so much and they are so enthusiastic about clusters. Greg Lindahl took some time to demonstrate how to use their compilers to search for the best set of compile flags for performance for a given code. Very cool feature. However, what was even more interesting was that they like to hear what compiler flags people end up using for which codes. Greg said this helps them understand how to improve their compiler. Part of the improvements come from knowing how to better optimize code, and part comes from knowing what options are routinely used and how to improve them. He also had some very interesting comments about which compile options work well for certain codes.

Even more exciting than their compilers is their Infinipath interconnect. They announced this new interconnect a while ago, but it is now shipping in quantity. Let me tell you, this interconnect is really hot stuff. Pathscale has taken a great deal of care to understand how various parameters affect code performance. While things such as zero-byte packet latency and peak bandwidth are important in some respects, Pathscale has realized that things such as N/2 and message rate are perhaps more important. N/2 is the message size at which the interconnect reaches half of its peak bandwidth. You want the smallest N/2 possible for the best code performance, and Infinipath has it. In addition, you want the fastest message rate possible out of the NIC for the best performance (seems obvious, but I never thought about it before). Pathscale took this into account when designing their NIC. They have the best message rate of any interconnect that I know of. In addition, the performance of the NIC gets better as you add cores. Imagine that.
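To see why N/2 matters so much, it helps to look at the classic simple model of point-to-point communication (transfer time = latency + message size / peak bandwidth, often called the Hockney model). Under that model, N/2 works out to latency times peak bandwidth, so a small N/2 means small messages already get a big fraction of peak. The numbers below are purely illustrative, not measured Infinipath figures:

```python
def effective_bandwidth(msg_bytes, latency_s, peak_bw):
    """Hockney model: time = latency + size/peak, so bandwidth = size/time."""
    return msg_bytes / (latency_s + msg_bytes / peak_bw)

# Illustrative numbers only -- not vendor benchmarks.
latency = 1.5e-6   # 1.5 microsecond one-way latency
peak = 900e6       # 900 MB/s peak bandwidth

# Under this model, N/2 (the size reaching half of peak) is latency * peak.
n_half = latency * peak
print(n_half)  # 1350.0 bytes for these made-up numbers

# Sanity check: at msg_bytes == n_half we get exactly half of peak bandwidth.
print(effective_bandwidth(n_half, latency, peak) / peak)  # 0.5
```

The punch line: cutting latency cuts N/2 proportionally, which is exactly why a low-latency NIC helps codes that send lots of small and medium-sized messages.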

Pathscale has a number of papers on their website that discuss Infinipath and the influence of network performance on code performance and scalability. You can download the papers from their website. They are very useful and informative.

Since I work for Linux Networx and we are using the Infinipath ASIC in our new LS/X system, I can safely say that the benchmarks I've seen using the Infinipath NIC are amazing. We should be posting benchmarks in the near future, but I can safely say that the results will stun people. Very, very fast.

10 GigE

I've been watching 10 Gigabit Ethernet (GigE) for over a year, since companies started to talk about 10 GigE NICs (Network Interface Cards). At last year's SC conference, Neterion (formerly S2IO) and Chelsio were showing 10 GigE NICs, primarily using fiber optic connections. They were expensive, but so was GigE a few years ago. However, the really large problem was the cost of 10 GigE switches. So I walked around the floor at SC05 talking to various Ethernet switch companies as well as the 10 GigE NIC vendors.

The general consensus was that the prices for 10 GigE NICs are coming down quickly and will continue to do so. Plus, copper 10 GigE NICs are common now. But, perhaps more importantly, 10 GigE switch prices are coming down as well. Some of the drops are coming from the traditional HPC Ethernet companies such as Foundry, Force10, and Extreme Networks. However, the biggest price drops are coming, perhaps unexpectedly, from companies that either haven't traditionally played in the HPC space, are new companies, or are new to Ethernet.

Chelsio

Chelsio was showing their 10 GigE NICs at SC05. They have the lowest list-priced 10 GigE NICs I've seen. Their T210-CX 10 GigE NIC has a copper connection, while the T210 NIC has a fiber connection. Both are PCI-X NICs (maybe if we ask hard enough they will do a PCI-Express version), and both have RDMA support as well as TOE (TCP Off-load Engine). Chelsio also has a "dumb" NIC, the N210, that does not have RDMA or TOE support and uses fiber connectors. Chelsio is also using their 10 GigE technology for the rapidly expanding iSCSI market. At SC05 they announced a PCI-Express based 10 GigE NIC with 4 ports, TOE, and iSCSI hardware acceleration.

10 GigE Switches

I didn't get to talk to the primary 10 GigE switch companies - Foundry, Force10 or Extreme, so I'm going to have to rely on their websites and press releases. Foundry currently has a range of switches that can accommodate 10 GigE line cards. Their high end switch, the BigIron RX-16, can accommodate up to 64 10 GigE ports in a single chassis. At the lower end, their SuperX series of switches can accommodate up to 16 ports of 10 GigE.


Force10 has the largest 10 GigE port count in a single chassis that I know of. On Oct. 31 they announced new line cards for their Terascale E-Series switches that allow them to go to 224 ports of 10 GigE in a single switch (14 line cards with 16 10 GigE ports per line card). At that size they also said the price per port would be about $3,600. By the way, in the same switch chassis you can also put 1,260 GigE ports.

Extreme Networks was also at SC05. They have a large switch, the BlackDiamond 10808 that allows up to 48 ports of 10 GigE. They are also working with Myricom to use their 10 GigE switches with Myricom's new 10G interconnect NICs.

While not necessarily new, there were some companies showing small port count 10 GigE switches with the lowest per port cost available. Fujitsu was proudly displaying their 12 port, 10 GigE switch. It is one of the fastest 10 GigE switches available with a very low per port cost of approximately $1,200.

Already companies are taking advantage of the Fujitsu 12-port 10 GigE ASIC. One of the traditional HPC interconnect companies, Quadrics, is branching out into the 10 GigE market. At SC05, they were showing a new 10 GigE switch that uses the Fujitsu ASIC. The switch is an 8U chassis that has 12 slots for 10 GigE line cards. Each line card has eight 10 GigE ports that connect using CX4 connectors (they look like the new "thin" Infiniband cables). This means that the switch can have up to a total of 96 ports of 10 GigE. The remaining four ports on each line card are used internally to connect the line cards in a fat-tree configuration. This means that the network is 2:1 oversubscribed but looks to have very good performance. This will be one of the largest single-chassis 10 GigE switches on the market when it comes out in Q1 2006 (that I know of). No prices have been announced, but I've heard rumors that the price should be below $2,000 a port. Quadrics also stated in their press release that they will have follow-on products that increase the port count to 160 and then 1,600.
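For those keeping score at home, the port counts and the 2:1 figure fall straight out of the line-card arithmetic (the numbers below are the ones quoted above; the sketch is mine):

```python
# Quadrics 10 GigE chassis arithmetic, per the figures in the article.
line_cards = 12
external_ports_per_card = 8   # CX4 ports facing the outside world
internal_ports_per_card = 4   # ports used to link cards into the fat tree

total_external = line_cards * external_ports_per_card
oversubscription = external_ports_per_card / internal_ports_per_card

print(total_external)    # 96 external 10 GigE ports
print(oversubscription)  # 2.0, i.e. the 2:1 oversubscription mentioned above
```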

I also spoke with a new company, Fulcrum Microsystems, about a new 10 GigE switch ASIC they are developing. It has great performance (about 200 nanosecond latency) with up to 24 ports, and it uses cut-through rather than store-and-forward switching to help performance. The ASIC will be available in Jan. 2006 for about $20/port. A number of vendors are looking at it for making HPC-centric 10 GigE switches. Fulcrum has a nice paper that talks about how to take 24-port 10 GigE switches, built using their ASICs of course, and construct a 288-port fat-tree topology with full bandwidth to each port. The fat tree would only have a latency of about 400 nanoseconds (two tiers of switches). Maybe the ASICs from Fulcrum Microsystems will get 10 GigE over the price hump and get it on par with other high-speed interconnects.
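The 288-port figure follows from the standard two-tier fat-tree (folded Clos) construction: each leaf switch devotes half its ports to hosts and half to uplinks, and radix/2 spine switches tie the leaves together. Assuming that standard construction is what the Fulcrum paper describes, the counts work out like this:

```python
def two_tier_fat_tree(radix):
    """Switch and host-port counts for a full-bandwidth two-tier fat tree
    built from fixed-radix switches.

    Each leaf splits its ports half toward hosts, half as uplinks; each of
    the radix/2 spines connects once to every leaf, so up to `radix` leaves
    fit before the spines run out of ports.
    """
    leaf_switches = radix
    spine_switches = radix // 2
    host_ports = leaf_switches * (radix // 2)  # host-facing half of each leaf
    return host_ports, leaf_switches, spine_switches

ports, leaves, spines = two_tier_fat_tree(24)
print(ports)            # 288 full-bandwidth host ports, as quoted above
print(leaves + spines)  # 36 switch chips in total
```

With roughly 200 ns per tier of cut-through switching, two tiers gives the ~400 ns end-to-end figure the article mentions.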

    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.