Conference Reports

SC05 Wrapup - No Sleeping in Seattle

Personal Clusters

Over the last couple of years, various companies have been discussing what some people call "personal clusters." There are a number of reasons for this new genre of clusters.

Supercomputers of the past were large central systems shared by many users. These systems grew in size and capability, but the needs and the number of users far outstripped that growth. Consequently, the amount of processing time available to each user steadily decreased, and so did the effective performance of the system as each user saw it. Beowulfs were developed partly as a way to give more power to the individual. The original Beowulf that Tom Sterling and Don Becker developed was designed to be used by one person or a small group of people (basically a workstation). The combination of high-performance commodity components, an open-source operating system (Linux), and the availability of parallel libraries and codes allowed low-cost, high-performance systems to be built. This was the genesis of Beowulfs.

As clusters, particularly Beowulfs, became popular, they started to replace the traditional central supercomputers. However, the concept of a large, central, shared resource still rules the day, so now we have large centralized clusters for users. In my humble opinion, we are falling into the exact same trap that doomed the old supercomputers - reducing the amount of compute power available to a user or a small team of users.

So, how do we prevent clusters from going down the same hole that doomed traditional supercomputers? I'm sure there are several solutions, but one that I see is the development of personal clusters.

At SC05, Bill Gates gave the keynote address, in which he talked about Microsoft entering the HPC arena and about the personal cluster for individuals or small teams. While I assume that Bill didn't steal the idea, he should have talked to me before the speech anyway. However, it is kind of disconcerting to have Microsoft saying the same things as yourself. Anyway, it looks like Microsoft, as well as IDC, thinks that small to medium size clusters will be on the rise in the next few years.

Front view of Tyan Personal Cluster

There have been a number of personal clusters developed. Orion Multisystems was one of the first to announce a true personal cluster, that is, a cluster that runs in a normal cubicle environment with a single power cord and a single power switch. Other companies have shown scaled-down versions of clusters using shorter racks. Rocketcalc has been shipping personal clusters for a few years and has multiple models available. One of them, Delta, uses an 8-socket Opteron motherboard. Other companies have also taken this approach, which gives the user a large SMP machine. However, at SC05, there were a couple of new systems that are notable.


Tyan, the motherboard manufacturer, is working on a new personal cluster: a small chassis with four dual-socket Opteron boards in it. The chassis is about 12" x 12" as you look at the front and about 24" deep. There are a number of large fans in the back to cool all four motherboards; using large fans helps reduce the noise of the cluster. They use the HE version of the Opteron to reduce power and cooling requirements. Each board can accommodate up to 16 GB of memory. At the current time they include a GigE network to connect the four nodes and to connect the personal cluster to the outside world. It can also have up to 1 TB of storage in the chassis.

Penguin Computing

Penguin Computing was showing a personal cluster that, in my humble opinion, is second to none in terms of power, form factor, and ease of use.

Michael Will of Penguin showing a dual socket node

Penguin Computing has a 4U rack mount chassis called Blade Runner that can accommodate up to 12 blades, each with either dual Xeon EM64T processors or dual Opteron HE processors. Each blade can have up to 8 GB of memory and has two built-in GigE NICs. In the picture, Michael Will of Penguin Computing is holding a dual Xeon blade from one of the 4U Blade Runner racks.

By the way, Michael contributes regularly to the Beowulf mailing list. He is very experienced with clusters, and helps people regardless of whether they have Penguin hardware or not.

This 4U chassis has a built-in GigE switch with an optional second GigE switch. It also has redundant 3+1 power supplies and a built-in KVM capability (presumably via IPMI). Using these 4U boxes, they can get up to 10 chassis, or 120 blades, or 240 processors, or 480 cores (if using dual-core Opterons) per 42U rack. That's density.
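The density figures above are easy to sanity-check. Here is a minimal sketch of the arithmetic, using only the numbers quoted in this report; the constant and function names are made up for illustration and are not part of any Penguin tool.

```python
# Back-of-the-envelope rack density check (illustrative names only).
RACK_UNITS = 42          # full-height rack
CHASSIS_UNITS = 4        # one Blade Runner chassis is 4U
BLADES_PER_CHASSIS = 12
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 2     # assumes dual-core Opterons


def rack_density(chassis_count):
    """Return (rack units used, blades, sockets, cores) for a number of chassis."""
    units = chassis_count * CHASSIS_UNITS
    if units > RACK_UNITS:
        raise ValueError("chassis do not fit in the rack")
    blades = chassis_count * BLADES_PER_CHASSIS
    sockets = blades * SOCKETS_PER_BLADE
    cores = sockets * CORES_PER_SOCKET
    return units, blades, sockets, cores


print(rack_density(10))  # (40, 120, 240, 480)
```

Ten chassis use 40U of the 42U, which leaves 2U for whatever else has to live in the rack.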

While the Blade Runner chassis are for normal rack mount clusters, Penguin has done something very, very clever. They have taken the concept behind the Blade Runner, created a vertical chassis that uses the exact same blades, put wheels on it, and made this single Penguin Personal Cluster unit into THE BEST, bar none, desktop cluster I've seen.

Side View of the Penguin Computing Personal Cluster

In the Penguin Personal Cluster they can get up to 12 blades, or 24 processors, or 48 cores (if using dual-core Opterons) in a single chassis. With up to 8 GB of memory per blade, they can get up to 96 GB of total memory in the unit. Each blade can also hold at least one 2.5" hard drive (the Xeon blades can hold up to two 2.5" hard drives). To minimize power and cooling, they use low-voltage Xeon processors or Opteron HE processors. The chassis has a built-in GigE switch with 20 ports (12 internal and 8 external). There is an optional second GigE switch so that you could use channel bonding for each blade. They have also incorporated the peripherals a workstation requires (e.g. DVD, USB).
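For the memory and network side of that configuration, a small sketch like the one below reproduces the totals. The class and field names are hypothetical, and the bonded figure is only the nominal link rate, assuming the optional second switch gives each blade a second 1 Gb/s link; it is not a measured throughput.

```python
from dataclasses import dataclass


@dataclass
class PersonalClusterConfig:
    """Hypothetical container for the figures quoted in this report."""
    blades: int = 12
    mem_gb_per_blade: int = 8
    internal_ports: int = 12   # switch ports wired to the blades
    external_ports: int = 8    # switch ports exposed outside the chassis
    gige_switches: int = 1     # 2 if the optional second switch is fitted

    def total_memory_gb(self) -> int:
        return self.blades * self.mem_gb_per_blade

    def ports_per_switch(self) -> int:
        return self.internal_ports + self.external_ports

    def nominal_gbps_per_blade(self) -> int:
        # Assumption: one 1 Gb/s link per blade per switch, so bonding
        # two links gives a nominal (not measured) 2 Gb/s.
        return self.gige_switches * 1


base = PersonalClusterConfig()
bonded = PersonalClusterConfig(gige_switches=2)
print(base.total_memory_gb(), base.ports_per_switch())                  # 96 20
print(base.nominal_gbps_per_blade(), bonded.nominal_gbps_per_blade())   # 1 2
```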

The Application-Ready Personal Cluster from Penguin Computing has the best balance of number of processors and individual CPU performance I've seen. At SC05, Penguin had a 12 blade, dual processor, dual-core Opteron system running Fluent in the AMD booth (total of 48 cores). It was quiet enough that I could stand next to it and talk to someone without having to raise my voice (despite the fact that I was losing my voice). Uber-cool and a very, very useful personal cluster. Nice work, Penguin!

Myricom 10G

Myricom will have their new 10G product shipping any day now. It is an interesting product because the NICs can be plugged into a 10 GigE switch, where they behave like normal 10 GigE NICs and speak TCP. They can also be plugged into Myricom's switches, where they behave like Myrinet NICs (running MX). Pretty interesting idea. However, until the price of 10 GigE switches comes down, the TCP capability of 10G is really only good for an uplink. But, as I said earlier, the price is coming down. In the meantime, the Myricom switches will give you good performance with 10G NICs.

Front View of the Penguin Computing Personal Cluster


Clearspeed is a company that has been developing an array processor ASIC for a year or so. The goal is to accelerate floating-point computations using very little power. The array processor chip has 96 processing units, each with 6 KB of SRAM. There is also 128 KB of scratchpad memory for the chip and 576 KB of on-chip memory. Clearspeed will be shipping a PCI-X card that has two of these chips. Each card can also have up to 8 GB of DDR2 memory. The card communicates with the main processors and memory over the PCI-X bus with a resulting bandwidth of 3.2 GB/s. They are working on BLAS and FFTW libraries for the card, so all you have to do is link to their library and the resulting code can run on the card. They also have an API for writing your own code.

At SC05, Clearspeed was demoing the card running a simple DGEMM (double-precision matrix multiply) computation. However, the performance was anything but simple. The card was getting about 30 GFLOPS sustained performance (a fast dual Opteron gets about 8 GFLOPS)! Also, the card was only using about 25 watts of power (a standard Opteron has a thermal envelope of 90 watts).
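To put those numbers side by side, here is a rough performance-per-watt comparison. It is only a sketch: it assumes the dual Opteron draws its full 2 x 90 watt thermal envelope and ignores memory, motherboard, and host power on both sides, so treat the ratio as a ballpark rather than a measurement.

```python
def gflops_per_watt(gflops, watts):
    """Sustained GFLOPS divided by power draw in watts."""
    return gflops / watts


# Figures quoted at the show; the 2 x 90 W Opteron draw is an assumption.
card = gflops_per_watt(30.0, 25.0)
dual_opteron = gflops_per_watt(8.0, 2 * 90.0)
ratio = card / dual_opteron

print(f"card:         {card:.2f} GFLOPS/W")          # card:         1.20 GFLOPS/W
print(f"dual Opteron: {dual_opteron:.3f} GFLOPS/W")  # dual Opteron: 0.044 GFLOPS/W
print(f"ratio:        {ratio:.0f}x")                 # ratio:        27x
```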

AMD Booth

AMD usually has a unique booth at shows. Rather than just talk about AMD processors, they invite partnering companies to show in the booth with them. This helps smaller companies that can't afford a booth of their own by giving them more exposure, and it helps AMD by showing how many partners they have and what unique and interesting things they are doing. SC05 was no exception. AMD had a number of companies in their booth. They had a neat pillar with a bunch of motherboards from various companies. Some of the boards had HTX (HyperTransport eXtension) slots, and some had dual sockets or quad sockets. One also had lots of DIMM slots per socket (up to 8 in one case), which screams lots of memory. The thing that impressed me the most about the motherboards was the variety and the innovation that companies are showing. I also think it's safe to say that you will see a number of HTX motherboards coming out in Q1 2006. Now if I could find a board company that has a Micro-ATX board with built-in video, GigE, HTX and PCI-Express, then I would be very happy. I guess you know what's on my Christmas list :)


PSSC Labs was showing a liquid-cooled 1U server node that was very, very quiet. The key to the reduced noise is that they are using small fans that run at about 6,000 rpm (the usual 1U fans run at about 12,000 to 13,000 rpm). They didn't know much about the liquid since the cooling system is made by another company, but they did know that it is not conductive. The cooling system was very interesting because, in a small vertical space like a 1U, they had a radiator with three or four of the small, low-speed fans, a pump, and a reservoir. They had the cooling attachments for a single or dual socket system. This approach might even give a quad-socket machine a chance to be put into a 1U without violating OSHA noise standards. The gentlemen at PSSC Labs said that the 1U liquid-cooled servers should be out later in 2006.

Verari had one of their blades on display in the AMD booth as well. It's a very nice blade that takes a COTS motherboard and turns it vertically along with a hard drive and power supply. Their rack then takes the blades and connects them to power and communication. Verari also introduced their new BladeRack 2 product to handle the increased power requirements from new processors.

Onward and Upward

I hope everyone enjoyed the show in Seattle. If you couldn't make it, then perhaps you can use these comments to convince your boss you should go next year. Next year's show is in Tampa Bay (ahh, the South). The next two shows after that are in Austin (great BBQ - I recommend the "Green Mesquite") and Reno (think smoky, stinky hotel rooms - yuck).

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He occasionally finds time to perform experiments on clusters in his basement. He also has a Ph.D. in Aeronautical and Astronautical Engineering and he's not afraid to use it.




    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.