Getting Serious: Cluster Infrastructure

Building Serious Clusters or Your Cluster Needs a Happy Home

Remember these words: location, location, location. Clusters need space, power, cooling, and network access. So far, this column has focused on getting you started on cluster computing, where by "started" I mean started at the very beginning with the basic ideas of parallel computing. If you've been reading along, you'll recall that we began by describing the very simplest of cluster designs, the "I already have a Network Of Workstations, presto-change-o and it's a NOW cluster" design. We then put this simple cluster to work with a threaded Perl task distribution script, learned about Amdahl's Law, and most recently learned how to write a "real" parallel application using PVM.

Hopefully you are convinced at this point that clusters can be put to work for you to speed up your parallelizable high performance computational task. You're ready to write your grant proposal, to go to the board for funds, to pull money out of your own darned pocket if necessary; one way or another, you're going to build yourself a spanking new cluster and put it to work. The only problem is that you don't quite know what you need. What does one need to think about and plan for to build a cluster of any given scale, anyway?

The next few columns are for you. In this one, we will examine cluster infrastructure: once one departs from the NOW model, clusters need a place to live, power (possibly quite a bit of power), cooling (possibly quite a bit of cooling), controlled access and security, and hot and cold running networks. A happy cluster room will also have certain amenities: a jacket (it gets COLD in there, or should), a work bench, a refrigerator well-stocked with beer and Jolt Cola. You may not get all of these, but it can't hurt to try.

Housing Your Nodes

Let us suppose that you are preparing to build a real cluster. You have to make several decisions right away. The first one is its size. How many nodes will your new cluster have? If the answer is four, your cluster can go on your desk and you can skip the rest of this article. However, answers like 16, 64, or 1024 require careful consideration of how and where you build your cluster. (Don't ask me why, but cluster builders tend to think in powers of two. Imagine that.)

If your cluster is going to be small and will remain small (where small is less than or equal to, say, 32 nodes), it can be built out of mid-tower systems stacked up on ordinary steel shelving from your friendly local hardware store. One good-sized heavy-duty shelf will typically hold roughly 16 nodes (up to 32 processors in a dual CPU configuration), so two or three units will hold your cluster. Its "footprint" (occupied floor space per shelf) will be something like two meters square, allowing for room to get around the shelf and access to the front and back. Shelf-mounted clusters are the cheapest you can build from the hardware point of view and are very common for starter/trainer clusters, hobbyist clusters, and small special purpose clusters. They do waste space, though, and at some scale space becomes more expensive than hardware.

If your cluster is going to be larger than eight nodes but with no clear upper bound on size in sight -- 16 nodes this year, but maybe 32 more next year and who knows beyond that -- then you should almost certainly consider rack mounted nodes. Rack mounted nodes cost a bit more, and racks cost more than cheap shelving, but consider their footprint: A rack is a bit over 20" wide including the base and supports, and as deep as the node cases you buy, typically around 30". You can put TWO racks side by side in two square meters, with access to the front and back and room to walk around.

A rack is measured in "U", where a U is 1.75". Racks come in various heights, with 40 to 45U being typical for a cluster environment. Nowadays it is easy to buy dual CPU nodes in 1U cases. One rack can therefore hold 32 to 40 nodes, or 64 to 80 CPUs (depending on how much room you reserve for patch panels, network switches, and power regulation devices). Two racks together can therefore hold as many as 160 CPUs in two meters square, whereas shelving holds only 32.
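The rack arithmetic above is easy to sketch. The following is a rough capacity calculator using the figures from this article: a U is 1.75", racks run 40 to 45U, nodes are 1U dual-CPU boxes, and some number of U (here an assumed 5U, which you should adjust) is reserved for switches, patch panels, and power gear:

```python
def rack_capacity(rack_u, node_u=1, cpus_per_node=2, reserved_u=5):
    """Return (nodes, cpus) that fit in a rack of height rack_u,
    after reserving reserved_u units for network and power gear."""
    nodes = (rack_u - reserved_u) // node_u
    return nodes, nodes * cpus_per_node

# A 45U rack with 5U reserved holds 40 nodes (80 CPUs); two such
# racks side by side in ~2 square meters hold 160 CPUs.
print(rack_capacity(45))  # (40, 80)
print(rack_capacity(40))  # (35, 70)
```

Vary `reserved_u` and `node_u` to match your own gear; 2U nodes, for example, immediately halve the node count.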

There is a third alternative for people with severe space and power/cooling requirements: Bladed clusters. Blades are basically single board computers that snap into a common rack mounted backplane that might be 6-8U in height. Blades permit you to double or quadruple CPU density relative to ordinary rack mounts, but often do so at the expense of CPU speed and are always considerably more expensive per CPU.

Physical Space

Once you've decided how you're going to house and shelve your nodes, you need a place to put them. This decision is not as simple as it sounds; the space needs to have certain properties and to provide certain resources. Here is a short summary:

  • Space. You need to be able to physically locate all your shelves or racks in such a way that you can easily access them to add new or remove broken nodes for service, get to the back to cable them, get to the front to turn them on. Plan on at least a meter between shelves or racks front to back. You also need to worry about delivery of cool air to the fronts and removal of warm air from the rears (see section on cooling).
  • Floor support. A loaded rack can weigh as much as a half a ton, all concentrated on less than a square meter of floor space. Nothing can ruin your day like having a half-ton of very expensive equipment break through a floor and fall onto the head of a colleague in the office below. Ruin your life, even. Get a professional architect or structural engineer to inspect your proposed space if there is ANY DOUBT about the ability of the floor to sustain the weight of the proposed cluster.
  • Raised floor vs solid. A related question is whether to use a raised or solid floor. Some server/cluster rooms are built with raised floors so that all cabling and cooling can be delivered from underneath. This is good in that it results in a very neat and tidy room; it is bad in that it is a pain to GET to all the cabling and cooling for modification or maintenance, and raised floors need to be carefully built to be structurally sound. Two post racks generally require solid floors, as the racks have to be affixed to the floor with bolts to withstand a certain amount of torque. I use solid floors (and cheaper two post racks to some extent) but am neutral on the issue.
  • Power. I could write a book on power alone. Or give it its own section, below. The short answer (for now) is that one needs very roughly 100 watts of power per CPU with most modern node designs, and should wire for twice this at least to accommodate future power needs and a degree of spare capacity.
  • Cooling. Again, worthy of its own section. Every watt that goes into your space has to go out again, or the temperature goes up. Computers HATE being hot -- they break when they get hot. In a cluster they can break a hundred thousand dollars' worth at a time.
  • Network. A cluster REQUIRES networking to function. Every node will therefore require at least an Ethernet connection (switched 100BT or 1000BT), and high end clusters might require fiber or specialty cabling for high end networks such as Myrinet or the Scalable Coherent Interface (SCI). Wiring trays, patch panels, and room for switching devices all need to be accommodated in your plan.
  • Security and Access. A cluster can cost quite a lot of money and is composed of components that are attractive to thieves. The cluster room therefore needs to be physically secured with locks and possibly alarms and monitors. However, a cluster also breaks and needs maintenance, so it has to be accessible to administrators and possibly users.
  • Physical Environment. A cluster room often needs monitoring facilities that can keep an eye on things like sound levels, temperature and humidity in an automated fashion. If cooling fails, a densely packed cluster room can get hot enough to cause massive system failures rather quickly.
  • Amenities. Things that you very likely want to have handy in your cluster room include: Storage for spare parts, systems, tools. A well-equipped work bench where nodes can be assembled and disassembled for repair. Extra cabling of all sorts. A jacket (your cluster room SHOULD be too cold for comfort). Headphones and a source of music both for diversion and to keep the roar of the air conditioner out of your ears. I was just kidding about the refrigerator and beer. Wistfully kidding, but kidding. [Raised floor panel AC vents make great beverage chillers. -- Ed.]
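The power and cooling bullets above can be roughed out with a little arithmetic. The sketch below assumes the ~100 watts per CPU figure used above, plus two standard HVAC conversions (1 watt is 3.412 BTU/hr, and one "ton" of air conditioning removes 12,000 BTU/hr); treat it as a first estimate, not an engineering spec:

```python
def cooling_load(n_cpus, watts_per_cpu=100):
    """Estimate the heat load of a cluster: every watt in must come
    back out. Returns (watts, BTU/hr, tons of air conditioning)."""
    watts = n_cpus * watts_per_cpu
    btu_hr = watts * 3.412          # 1 W = 3.412 BTU/hr
    tons = btu_hr / 12000           # 1 ton of AC = 12,000 BTU/hr
    return watts, btu_hr, tons

# Two full racks (160 CPUs) from the example above:
watts, btu, tons = cooling_load(160)
print(f"{watts} W -> {btu:.0f} BTU/hr -> {tons:.1f} tons of AC")
# 16000 W -> 54592 BTU/hr -> 4.5 tons of AC
```

Note that lights, people, and the cooling equipment itself add to the load, so (as with power) plan for comfortable surplus capacity.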

Power

Electrical power is a cluster's natural food, and they need a lot of it. In fact, they need more of it than you might expect or estimate, so it is wise to have a solid surplus -- more circuits than you will likely need. Providing power to nodes is also not as simple or obvious as it might appear -- there is more to good electrical power in a cluster room than just having a bunch of circuits.

It is beyond the scope of this column to teach you everything you need to know about electrical wiring to ensure that your cluster is well-fed and happy. However, I can direct you to various useful resources where you can learn more. Let me start with two or three bits of very basic advice:

  • The room should be wired by professional, certified electricians, ideally ones with direct experience in wiring computer rooms.
  • The room wiring layout should be designed by a professional, e.g., an architect, with experience in computer rooms as well.
  • The wiring should be done according to all applicable codes.

These rules exist simply because faulty wiring can kill people and start fires. You can also HOPE that your professionals know enough not to make horrible mistakes in wiring a computer room.
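To see why "more circuits than you will likely need" adds up quickly, here is a sketch of the circuit count for a cluster, assuming the ~100 W per CPU figure from above, the suggested 2x headroom, and 120 V / 20 A branch circuits derated to 80% for continuous load (a common US practice; your local codes and your electrician have the final word):

```python
import math

def circuits_needed(n_cpus, watts_per_cpu=100, headroom=2.0,
                    volts=120, amps=20, derate=0.8):
    """Estimate how many branch circuits a cluster room needs."""
    planned_watts = n_cpus * watts_per_cpu * headroom
    watts_per_circuit = volts * amps * derate  # 1920 W on a 20 A circuit
    return math.ceil(planned_watts / watts_per_circuit)

# Two full racks (160 CPUs) planned at 32 kW:
print(circuits_needed(160))  # 17
```

This is exactly why the professionals matter: seventeen 20 A circuits is a serious subpanel, not a power strip.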

    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.