
SC06 - Tampa: Where did all of the nightlife go?


As I mentioned in my discussion about Penguin Computing, people think that the HPC market is large enough to justify the introduction of new hardware. One company has taken an even bigger view of the market and is introducing a whole new type of HPC system. SiCortex has developed a new line of high processor count, low-power HPC systems. Linux was a design criterion from the start, further supporting the idea that Linux is the HPC operating system of choice.

Over the last few years, the overall power consumption of systems has dramatically increased. The CPU vendors have been slowly edging up the power requirements of their chips. First they did it with the power for a single core. Now they have multiple cores and other functions on a single chip. Couple this with the huge appetite for more compute power, which drives up density, and the end result is that data center operators are having a difficult time keeping their machines cool. In fact, certain companies, such as APC, are warning that under-floor cooling will not be adequate beyond certain power consumption densities. The simple reason is that it becomes almost impossible to push enough volume of air through vented floor tiles (Marilyn Monroe's skirt, along with the rest of her, would have risen a few feet with some of the air flow requirements that data centers are seeing). What can be done about it? There are several options: (1) try alternative cooling methods, (2) put all data centers above the Arctic Circle, or (3) try low-power chips but use lots of them. This third approach is what SiCortex is trying.

SiCortex has based its hardware on low-power 64-bit MIPS cores that run at 500 MHz. They put six of these cores into one chip. But wait, that's not all. They also put a high-performance interconnect and a PCI-Express connection for storage and other networking on the chip as well. Each chip has two DIMM slots associated with it to form a node. The nodes are packaged in groups of 27 per module, so each module has 27 nodes and 54 DIMM slots, as well as the needed interconnect, PCI-Express access, and support networking. A single node with DDR2 memory consumes only about 15W of power, so a module draws only a bit over 400W. The Mother of all Boards used by SiCortex is shown in Figure 11. Each blue and black heat sink covers six cores. You can get an idea of the board size by using the DIMMs as a reference.

SiCortex Motherboard
Figure 11: SiCortex Motherboard
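Working through the numbers above (15W per node, 27 nodes per module, all values straight from the text) confirms the quoted module power:

```python
# Back-of-the-envelope check on the power figures quoted above.
watts_per_node = 15        # one six-core chip plus two DDR2 DIMMs
nodes_per_module = 27      # nodes packaged on each module

module_watts = watts_per_node * nodes_per_module
print(module_watts)        # 405 -- "a bit over 400W" per module
```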

Since SiCortex is using standard MIPS cores, the standard Linux kernel and the rest of the software stack work with the system. However, SiCortex did have to write its own network drivers and monitoring tools -- all of which are open source. They are using Pathscale's (now Qlogic's) EKO compilers for their systems. The developers of the EKO compilers were originally MIPS developers, so porting the compilers to the SiCortex architecture wasn't too difficult.

At SC06, SiCortex introduced two models. The first, the SC5832, is a larger machine with 5,832 processors, up to 8 TB of memory, and 2.1 TB/s of IO capacity (see Figure 12).

SiCortex SC5832
Figure 12: SiCortex SC5832

It comes in a single cabinet that is kind of interesting looking. It provides a theoretical peak performance of 5.8 TFLOPS, uses only 18 kW of total power, and is about 56" wide x 56" deep x 72" high.

The second model, the SC648, is a smaller cabinet, with only 648 processors (see Figure 13).

SiCortex SC648
Figure 13: SiCortex SC648

It offers a theoretical peak of 648 GFLOPS, up to 864 GB of memory, and 240 GB/s of IO capacity. Because of the low-power chips, the SC648 can plug into a single 110V outlet and draws 2 kW of power. The unit is 23" wide x 36" deep x 72" high.
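The quoted peaks are consistent with each 500 MHz core retiring two floating-point operations per cycle (my assumption; the article doesn't say), and both models land at roughly the same performance per watt:

```python
# Sanity-check the quoted peak and compare performance per watt.
# The 2 FLOPs/cycle/core figure is my assumption; it matches the
# quoted ~5.8 TFLOPS for 5,832 cores at 500 MHz.
cores, ghz, flops_per_cycle = 5832, 0.5, 2
peak_gflops = cores * ghz * flops_per_cycle
print(peak_gflops)                  # 5832.0 GFLOPS, i.e. the quoted ~5.8 TFLOPS

models = {"SC5832": (5800, 18.0),   # (peak GFLOPS, total kW) from the text
          "SC648":  (648,   2.0)}
for name, (gflops, kw) in models.items():
    print(name, round(gflops / kw), "GFLOPS per kW")
# Both come out near 320 GFLOPS per kW -- the per-node efficiency is the same,
# which is what you'd expect from stacking identical low-power nodes.
```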

One of the challenges SiCortex faces is that they are using a large number of chips to achieve high levels of performance. This means that to get a large percentage of peak performance they need a very good, low-latency, high-bandwidth interconnect. So SiCortex has done away with the traditional switched network and gone to a direct-connect network. But instead of using something like a torus or a mesh, they are using a Kautz network. The network connects all of the nodes, and the network traffic is handled by dedicated hardware on the chip, so there is no load on the CPU cores.
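To see why a Kautz network is attractive, here is a small sketch. I'm assuming a degree-3 Kautz graph, which matches the machine's node count: 5,832 cores at six cores per node is 972 nodes, exactly the size of the degree-3 Kautz graph with six-symbol labels. Nodes are labeled by strings over four symbols with no symbol repeated consecutively, and each hop shifts the label left:

```python
from itertools import product
from collections import deque

def kautz_graph(b, d):
    """Kautz graph of out-degree b: nodes are length-d strings over b+1
    symbols with no two consecutive symbols equal; each edge drops the
    first symbol and appends a new one that differs from the last."""
    syms = range(b + 1)
    nodes = [s for s in product(syms, repeat=d)
             if all(s[i] != s[i + 1] for i in range(d - 1))]
    adj = {s: [s[1:] + (x,) for x in syms if x != s[-1]] for s in nodes}
    return nodes, adj

def diameter(nodes, adj):
    """Longest shortest path over all sources (plain breadth-first search)."""
    worst = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

nodes, adj = kautz_graph(b=3, d=6)  # degree 3, six-symbol labels
print(len(nodes))                   # 972 -- the SC5832's node count
print(diameter(nodes, adj))         # 6 -- any node reaches any other in <= 6 hops
```

With only three links per node, every one of the 972 nodes can reach every other in at most six hops, which is what makes the Kautz topology attractive over a mesh or torus at this scale.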

I think the SiCortex is an interesting system. It is a fresh approach to HPC and also provides better performance per watt than other approaches. I think SiCortex will be facing a number of challenges, however. First, unlike a commodity cluster vendor, they have to develop and support much of their own hardware - chips, boards, interconnects, etc. These tasks cost money, and one assumes they have to recover the cost through a higher product price (although no pricing has been announced to date). Second, they are using lots of chips to achieve high performance. This means that codes will need to scale well to take advantage of this cool hardware. But I take my hat off to them. I think it's always a good idea for a new company to enter the market and shake things up. Plus, I think it's high time we start paying attention to the power consumption of our clusters.

What you didn't see

This should have been the year of 10 GigE for HPC. There were some really cool 10 GigE companies on the floor - Neterion, Chelsio, NetEffect, Foundry, NetXen, Extreme, Quadrics, etc. But the per-port price of 10 GigE is still way too high for clusters. I was hoping that we would see 10 GigE NICs below $400 and switch costs below $500 per port for large port counts (128 ports and above). One interesting development was Myricom's support of 10 GigE, but overall the costs of 10 GigE are still too high. Let's look at why.

Ten Gigabit Ethernet pushes lots of data across the wire (or fiber). This means the signaling frequency is higher, so you need cables built to handle the data transmission requirements. Initially 10 GigE used fiber, whose connectors are nice and thin and easy to work with (small bend radius), but fiber is expensive compared to copper cables. Plus you need the laser converters at each end of the fiber, which also drives up the cost. Then 10 GigE moved to CX4 cables like the ones used for Infiniband. Infiniband has been using these cables even for DDR (double data rate) IB, but they are pushing the limits of what the cable can carry and have some limitations on cable length. Still, these cables are reasonably inexpensive and you don't need the laser converters. Also, the specification that allows 10 GigE to travel over cat6-class twisted-pair wires similar to what we use for GigE was released this year. So why haven't the costs for 10 GigE come down?

In talking to people in the industry I think the answer is:

  • The cost of the PHYs is still too high, whether for copper (CX4 or cat) or fiber.
  • In the case of fiber, the cost of the cables is high compared to copper.
  • PHYs for the new cat specification for 10 GigE aren't really out yet.
  • Switch cards for the cat specification aren't out yet.
  • The price of 10 GigE NICs hasn't really dropped yet (could be a function of demand).
  • The demand for 10 GigE hasn't yet gotten large enough to drive down the costs.

I was hoping that HPC could ride the coattails of an enterprise market that would start yelling for cheaper 10 GigE. However, this demand hasn't really developed to the point where costs are starting to go down.

In addition, the price of the 10 GigE NICs hasn't been dropping by much either. Perhaps this is a function of the price of the PHYs, but perhaps not. Regardless, you are still looking at a minimum of $700 for a 10 GigE NIC.

So is 2007 going to be the year for 10 GigE in HPC? Maybe. There are some companies making new NIC ASICs that could drive down the price of the NICs. On the other side of the equation, companies such as Fulcrum Micro are developing 10 GigE switch ASICs that can help drive down the price of switches and drive up performance. Perhaps this development, coupled with increased demand, will drive down the cost of 10 GigE. But there is a potential problem.

In 2007, Infiniband is going to release QDR (Quad Data Rate) IB. This will almost certainly require a move to fiber cables to handle the data rates, so IB is going to start moving away from copper wires. This is both a good thing and a bad thing. It's bad in that the combined 10 GigE and IB demand has helped drive down the prices of CX4 cables, and now IB will not be able to use those cables for its new product. However, it's good in that it might help drive down the cost of the fiber-optic PHYs, which could in turn help fiber connectors for 10 GigE.

With the appearance of fiber cables for IB, and with 100 Gigabit Ethernet being developed to require fiber cables from the start, I think it's safe to say that fiber is the future for HPC cabling. Let's just hope there is enough demand that the cost of the associated hardware (PHYs, etc.) can come down enough in price that we can afford such things in clusters (never mind putting them in our houses).

Potpourri for a Million, Alex

This is the category where I put lots of other neat stuff or neat companies that I didn't get a chance to see or even grab literature from. But I think they are cool enough to warrant a comment and perhaps even some rumors.

  • There was a company, Evergrid, that was showing what they claimed was a true system-level checkpoint and restart capability. I heard their demo was compelling. Something to check out.
  • Liquid Computing was there but I didn't get to talk with them. They are trying new ideas about constructing clusters to make them more scalable.
  • Qlogic mentioned that they will have a double data rate version of Infinipath in 2007.
  • There was a UPC (Unified Parallel C) booth hosted by George Washington University. UPC is a parallel extension of C that can help with getting new applications onto clusters.
  • Scali was showing the new version of their cluster management tool as well as some cool performance improvements in their MPI (BTW - they won the HPCWire 2006 Editor's Choice Award for the cluster management tool - Scali Manage).
  • Dr. Hank Dietz and the gang were there, but I didn't get to spend as much time as I wanted with them. They are always up to something really cool. Check their website.
  • I didn't get over to a number of other companies that always have something interesting to talk about: Clearspeed, Quadrics, Myricom, Voltaire, Chelsio, Neterion, NetXen, and Neteffect.

There were many other companies with cool things to talk about at the show, but I just couldn't get to everyone nor grab literature in the short time I was there. Be sure to look at the floor plan on the SC06 website to see who was there. Also look at the website to see announcements made during the show. You can also hear Doug and me discuss SC06 and interview the likes of Don Becker and Greg Lindahl on ClusterCast.

Until Next Year

I think I will stop here before I attempt to mention just about everyone who had a booth at the show. While the show seemed slow, the floor was cramped, and sometimes I think "SC" stands for "Sleep Deprivation Seminar," there were a number of cool things at the show and I got to see lots of friends. So it's not all bad by any stretch. Next year's show is in Reno, Nevada, at the Reno Convention Center. I've been to Reno the last two years for the AIAA show in January. I'm not a big fan of the Reno area, but we'll see what the show will hold (God, I hope I'm not becoming a grumpy old man...).

Jeffrey Layton has been a cluster enthusiast for many years. He firmly believes that you have a God-given right to build a cluster on your own (but only after getting approval from your management, household or otherwise). He can also be found swinging from the trees at the ClusterMonkey Refuge for Wayward Cluster Engineers (CRWCE - hey I think this spells something in Welsh).


