A Web-Based Tool for Optimized Cluster Design

Read this article and become a cluster design expert! Use a new tool from aggregate.org to determine price and performance before you buy! Get a handle on everything from Ethernet cables to GFLOPS to power and cooling. ClusterMonkey likes to call it the Clustanator; you will probably call it extremely useful.

Whenever someone asks what hardware to buy for their new cluster, the answer is always, "It depends." It depends on what application the cluster will be used for, it depends on how much space, power, and cooling are available, it depends on the costs of operating the cluster, and it depends on how much the parts that might be used cost. The standard process is to analyze the application and use rules of thumb and experience to guess what hardware will work best. The sophistication of the analysis depends on how much money is involved and how much computer engineering expertise the designer has.

Large clusters, with price tags in the millions of dollars, justify spending a lot of effort to characterize the applications that will run on them and to design the best system for those applications. However, the vast majority of clusters that get built are smaller, typically costing anywhere from $10,000 to $200,000 to build. It is not economically viable to pay an expert for time to design such a cluster. The number of components available and the ways they can be combined can be overwhelming, even for experienced designers. Moreover, the scientists and engineers wanting a cluster are typically not computer engineers. Rather, they are experts in their own field who are just using the cluster as a tool to help them solve their problems faster than is otherwise possible. Automating the design process is the key to helping both experienced and inexperienced designers get the most for their money for these low cost systems.

The Cluster Design Rules (CDR) is a web-based software tool for designing this sort of cluster supercomputer. Users specify the requirements of their applications and the resources available to them, such as power, cooling, and floor space. The CDR uses these constraints along with a performance model and a database of available components to find a design that meets all the constraints and optimizes performance.

The Cluster Design Rules

The CDR models a cluster based on commodity components available to the end user. The CDR combines network interfaces, cables, switches, motherboards, processors, memory parts, disk drives, cases and racks from a database to design a cluster. The CDR searches for viable designs by selecting a number of nodes and a motherboard type. It then tries to build various network topologies (no network, ring, 2D mesh, 3D mesh, switched network, tree, fat tree, flat neighborhood network, flat neighborhood network of trees) using available network interfaces, switches, and cables. The remaining system components are added to the design one at a time until a complete design is available. Complete designs that do not meet application or resource constraints are discarded. If at any stage a partial design cannot be completed without violating the application or resource constraints, the design is removed from the search. For example, if a partial design costs more than the acquisition budget, then it cannot possibly be part of a valid full design because adding more components will only increase the cost.
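The pruning idea in the budget example can be sketched in a few lines of Python. This is only an illustration of the technique, not the CDR's actual code: the `Part` record, the `search` function, the flat per-node pricing, and the tiny catalog are all hypothetical, and only the acquisition budget is checked.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    kind: str    # e.g. "motherboard", "memory", "case" (hypothetical categories)
    name: str
    cost: float  # dollars per node (assumed flat pricing)

def search(stages, budget, nodes, chosen=(), cost=0.0):
    """Yield (parts, cost) for every complete design that fits the budget."""
    if cost > budget:
        return  # prune: adding more parts can only increase the cost
    if len(chosen) == len(stages):
        yield chosen, cost  # every stage has a part, so the design is complete
        return
    for part in stages[len(chosen)]:
        yield from search(stages, budget, nodes,
                          chosen + (part,), cost + part.cost * nodes)

# Hypothetical catalog: one list of alternatives per component type.
stages = [
    [Part("motherboard", "mb-a", 300.0), Part("motherboard", "mb-b", 500.0)],
    [Part("memory", "4GB", 100.0), Part("memory", "8GB", 250.0)],
    [Part("case", "tower", 50.0), Part("case", "rack-1U", 150.0)],
]

designs = list(search(stages, budget=2000.0, nodes=4))
for parts, cost in designs:
    print([p.name for p in parts], cost)
```

With four nodes and a $2,000 budget, every partial design that starts with the more expensive motherboard is abandoned as soon as memory is added, so its case options are never examined.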

The application and resource constraints describe the needs of the application and the resources available to the user. Table One lists the application parameters users can set as constraints in the CDR. Some constraints, like data size, may be known a priori from the application and the type of problem being solved. Other constraints, like memory bandwidth and the networking parameters, can either be estimated from source code analysis and knowledge of the problem or be measured by profiling running versions of the application. Note that the memory bandwidth requirement is expressed in bytes/FLOP so that it scales with processor speed. Bisection bandwidth is measured per processor core because all of the cores in a node share the network links attached to that node.

Table One: Application constraints modeled by the CDR
Memory Size for Data (Bytes/cluster)
Memory Size for Code (Bytes/node)
Memory Size for Operating System (Bytes/node)
Memory Bandwidth (Bytes/FLOP)
Virtual Memory Size (Multiple of node memory)
Local Disk Storage (Bytes/cluster)
Message Latency (μs)
Collective Latency (μs)
Bisection Bandwidth/Processor Core (Mbps/core)
Coordinality (Nodes/node)
Number of Nodes/Processors/Cores (n^2, n^3, 2^n)
GFLOPS
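To see why memory bandwidth is given per FLOP and bisection bandwidth per core, a short arithmetic sketch may help. The two helper functions and all of the numbers below are hypothetical; they only show how the per-unit requirements turn into per-node and per-core figures.

```python
def required_memory_bandwidth_gbs(bytes_per_flop, gflops_per_node):
    """Memory bandwidth (GB/s) a node needs to sustain its FLOP rate."""
    return bytes_per_flop * gflops_per_node

def bisection_per_core_mbps(bisection_mbps, nodes, cores_per_node):
    """Share of the network bisection bandwidth available to each core."""
    return bisection_mbps / (nodes * cores_per_node)

# An application needing 0.5 bytes/FLOP on an 8 GFLOPS node needs 4 GB/s;
# doubling the node speed doubles the memory bandwidth it must supply.
slow_node = required_memory_bandwidth_gbs(0.5, 8.0)
fast_node = required_memory_bandwidth_gbs(0.5, 16.0)

# The same 8000 Mbps bisection is split across more cores as nodes gain cores.
single_core = bisection_per_core_mbps(8000.0, nodes=16, cores_per_node=1)
dual_core = bisection_per_core_mbps(8000.0, nodes=16, cores_per_node=2)

print(slow_node, fast_node, single_core, dual_core)
```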

Resource constraints describe the budget and infrastructure available to the user. Table Two lists the resource constraints considered by the CDR. The first resource constraint is often acquisition budget for the cluster. However, available power, cooling capacity, floor space, and operating costs are often more important limiting factors. It is not uncommon for a user to buy as many nodes as they can afford only to find out they do not have enough power or that their current air conditioner cannot keep the room cooled. The CDR considers these constraints and avoids designs that will not fit within existing infrastructure.

Table Two: Resource constraints modeled by the CDR
Floor Space
Power
Air Conditioning
Operating Budget
Acquisition Budget
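A minimal sketch of such an infrastructure check follows. The function name, the dictionary keys, the units, and the example limits are assumptions made for illustration; the only outside fact used is that one watt of dissipated power corresponds to roughly 3.412 BTU per hour of cooling load.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of dissipated power is about 3.412 BTU/h

def violated_limits(nodes, watts_per_node, limits):
    """Return the names of the (hypothetical) resource limits a design exceeds."""
    power_w = nodes * watts_per_node
    usage = {
        "power_w": power_w,
        # Assumption: all electrical power becomes heat the air conditioner must remove.
        "cooling_btu_per_hour": power_w * BTU_PER_HOUR_PER_WATT,
    }
    return [name for name, used in usage.items() if used > limits[name]]

# Hypothetical machine room: a 5 kW circuit and a 12,000 BTU/h air conditioner.
limits = {"power_w": 5000.0, "cooling_btu_per_hour": 12000.0}

too_hot = violated_limits(16, 250.0, limits)  # 4000 W fits the circuit, not the cooling
fits = violated_limits(12, 250.0, limits)     # 3000 W fits both limits

print(too_hot, fits)
```

In this example the sixteen-node design is affordable in power terms yet overloads the air conditioner, which is the kind of surprise the resource constraints are meant to catch.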

All designs that meet the application and resource constraints are ranked by a performance metric. The metric can be either a weighted sum of system-wide parameters, such as usable GFLOPS, network bisection bandwidth per processor, network latency, memory bandwidth per processor, and acquisition cost, or an estimate produced by an application model. When no application model is available, the weighted sum is useful for approximating performance. The weighted sum method measures each system parameter relative to the minimum amount specified as a design constraint and multiplies it by a weighting factor. The most important system parameter is weighted most heavily, followed by the second most important parameter, and so on. Determining precisely what the weightings should be can be difficult, but it is usually easy to guess a reasonable range for them. The CDR runs quickly enough that designs covering a range of weightings can be computed in a short amount of time. Designs that rank near the top for many combinations of weightings are likely to work well.
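The weighted sum and the sweep over weightings might look like the following sketch. The parameter names, the two candidate designs, and the weight ranges are invented for illustration, and only parameters where larger values are better are included; how the CDR treats lower-is-better parameters such as latency and cost is not described here.

```python
from collections import Counter
from itertools import product

def weighted_score(design, minimums, weights):
    """Sum of each parameter, relative to its required minimum, times its weight."""
    return sum(weights[k] * design[k] / minimums[k] for k in weights)

# Hypothetical minimums taken from the design constraints.
minimums = {"gflops": 100.0, "bisection_mbps_per_core": 100.0}

# Two hypothetical designs that already satisfy every constraint.
designs = {
    "A": {"gflops": 200.0, "bisection_mbps_per_core": 100.0},
    "B": {"gflops": 120.0, "bisection_mbps_per_core": 300.0},
}

# Sweep a guessed range of weightings and count how often each design ranks first.
wins = Counter()
for w_gflops, w_bisection in product([1.0, 2.0, 4.0], [1.0, 2.0]):
    weights = {"gflops": w_gflops, "bisection_mbps_per_core": w_bisection}
    best = max(designs, key=lambda name: weighted_score(designs[name], minimums, weights))
    wins[best] += 1

print(wins.most_common())
```

Here design B ranks first for five of the six weight combinations, so it is the safer choice unless GFLOPS is known to dominate.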

As an alternative to the weighting factors, the CDR provides several application performance models. An application performance model estimates application performance based on design parameters. Application models are usually more accurate than a simple weighting formula because they can use the system parameters, including network topology, in any arbitrary calculation that can be expressed as C code. Currently, application performance models are available for the SWEEP3D benchmark [HoLW00] and the HPL benchmark [PWDC04], but a programming interface is available to add new models.
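To give a feel for what a topology-aware model can do that a weighted sum cannot, here is a toy model in Python. It is not the SWEEP3D or HPL model and does not use the CDR's programming interface (which, per the text, takes models written as C code); the function, the dictionary keys, and the assumption that latency grows linearly with hop count are all hypothetical.

```python
def estimated_seconds_per_step(design, app):
    """Toy estimate: compute time plus latency-bound communication time."""
    compute = app["flops_per_step"] / (design["gflops"] * 1e9)
    if design["topology"] == "switched":
        hops = 1  # assumption: one switch traversal between any two nodes
    else:
        hops = design["nodes"] // 2  # worst-case distance on a bidirectional ring
    # Assumption: each hop adds one full message latency.
    comm = app["messages_per_step"] * hops * design["latency_us"] * 1e-6
    return compute + comm

app = {"flops_per_step": 2e9, "messages_per_step": 100}
switched = {"topology": "switched", "nodes": 16, "gflops": 100.0, "latency_us": 50.0}
ring = {"topology": "ring", "nodes": 16, "gflops": 100.0, "latency_us": 50.0}

t_switched = estimated_seconds_per_step(switched, app)
t_ring = estimated_seconds_per_step(ring, app)
print(t_switched, t_ring)
```

Both designs have identical GFLOPS and per-hop latency, so a weighted sum over those parameters would score them the same, while the model separates them by topology.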



    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.