The Network Is The Glue

Know the network, know the cluster.

In previous columns we looked briefly at compute node hardware for use in clusters and learned some basic cluster design principles. The most important of these was to focus on the problem at hand – getting the most work done for the least amount of money, effort, and time invested. This analysis means being able to resist the natural inclination to buy nodes with the highest possible clock speed to the exclusion of all else (unless you know for a fact that clock speed is the only important rate determining factor for your cluster and that the highest clock nodes buy you the most clock cycles per second, in aggregate, for your money, which is almost never true). We learned that there were actually quite a few very broad systems descriptors – CPU architecture, clock, memory architecture, disk, the network – any one of which might turn out to be "the" rate limiting resource that bottlenecks your application and determines the amount of work you get done for a given dollar investment.

We learned to study our own applications to understand where they are likely to be bottlenecked (what kind of work is being done where it is doing the most work and what resource is holding it back). We learned to use that application as the critical benchmark wherever possible – measuring the performance of a node prototype on your application beats the heck out of trying to predict that performance on the basis of the vendor's published benchmark numbers or a bare knowledge of its CPU clock plus the fact that Joe down the hall told you that the boxes were "really fast" and Joe is well known to be an expert in all things computational.

However, we have barely scratched the surface in terms of learning about computer architecture. Things we haven't discussed at all include the details of computer architecture – how motherboards are organized, just what a bus is and why bus speed and width might be important to your application, how a microprocessor talks to memory (and how MANY different kinds of memory there are likely to be on a system), and how the processors talk to peripheral devices. We haven't discussed operating systems and compilers and libraries. And we haven't discussed the network.

All of this is important in cluster design, but if you are armed with the right benchmarks (your application) and are somewhat constrained in your choice of operating systems and compilers and libraries (or are just going to stick with Linux, gcc, and the standard libraries that accompany your favorite Linux distribution) then knowing all of these details yet isn't that important. Of course, knowing some of these details is important, and that's where we are going next. {mosgoogle right}

If you've been reading this column from the beginning, you should already have a pretty good idea that the network is almost as important as the computer in cluster design. Yes, the computer nodes do the actual computational work, but we derived in moderate detail equations for the speedup you can expect dividing a task among many nodes, and it turned out that in many cases it was the speed of the network that determined whether or not the application would actually complete in less time when spread out on many nodes. This result has profound implications in cluster design, which we will now explore.

A Networking Primer

Networking is one of those subjects that is really too deep and complex to be reducible to a "simple" column that makes you an instant expert. However, the world's greatest expert (it isn't me – I don't even know the world's greatest expert) wasn't born knowing all about networking, they had to start somewhere. So today we're going to start here. If you are already an expert, my apologies – skip ahead to the next article, go get a beer, stare out the window for a while.

What is a network? For our purposes it is everything associated with communications between an application on computer A and another application on computer B. We will abstract the communication process only one degree and assume that the information being communicated is transmitted in finite-size chunks called packets. A packet can be thought of as a single unitary burst of information going from computer A to computer B or vice versa.

Whoa, you say. Hold on there, dawg. That means my application's communications code, the communications libraries it calls, the operating system that runs that communications code on the CPU, the CPU itself, the computer's memory, its bus(es), its physical device(s) such as network interface(s), the physical wire itself, and all of it repeated all over again on the other end, that's the network?

Yes, that's it exactly. The network is the totality of how information gets between applications, and technically includes the networked applications themselves! It can even describe at least some of the ways that information gets between two applications running on the same computer. The network is more than just the physical layer over which data passes between two computers!

Here is the venerable ISO/OSI (International Standards Organization, Open System Interconnect) model for "a network". It consists of seven layers, the last of which are themselves applications! In this model, you the user are the final layer of any network:

  1. The Physical Layer. This is layer defines the physical media over which the network runs, e.g. wire or optical fiber or radio frequency.
  2. The Data Link Layer. This defines the way raw data is encapsulated for transmission on the physical layer – the size and structure of basic packets of information. This layer is generally associated with the actual interface in your computer to which the physical layer is coupled. The one you are most likely to be familiar with is Ethernet, although there are others.
  3. The Network Layer. This layer is responsible for routing data between specific systems and contains (possibly elaborate) mechanisms for assigning and retrieving a systems identity and address information. In general the network layer is represented by the Internet Protocol (IP), which specifies its own addressing information, data encapsulation, and has its own packet header (itself encapsulated within an e.g. Ethernet packet).
  4. The Transport Layer. This layer manages transmission control and the division of long messages into suitable packets. It is also typically the layer where reliability is built into the network protocol (or not). There are two protocols in common use on top of IP. TCP (Transmission Control Protocol), is a reliable but relatively slow protocol that manages sequencing of a stream of packets and retransmission on failures (both important to run a service over a wide area network where packets that are part of the same message can take different routes and arrive out of order or missing). TCP is state-aware – a connection is maintained for an extended period of time for most TCP transactions. UDP (User Datagram Protocol) is stateless/connectionless, unreliable (in the sense that the protocol itself doesn't provide sequencing or retransmission), and much faster. Important services are built on top of both of these protocols.
  5. The Session Layer. The layers above are all recognizable as "the network" to most people, but the last three layers move increasingly into application and user territory. The session layer defines how data moves over the network. Session layer tools in Linux are typically Remote Procedure Calls (RPCs) that manage data transformation transparently for the user.
  6. The Presentation Layer. This layer continues this work by providing a "standard" data representation that is independent of the host. The "eXternal Data Representation" (XDR) sits at this layer; tools convert local data into a canonical form and vice versa.
  7. The Application Layer. This layer contains network services (user applications). For example: mail, ftp, http, DNS (domain name service) and more are all network applications.

Although the ISO/OSI model is well-reasoned and respected, it is also a bit more complex than it needs to be. Much of the software in common use in the Unix/Linux world (and by inheritance in the Windows world as well) can be equally understood in terms of the more economical TCP/IP model:

  1. The Link Layer. This layer includes layers 1 and 2 above, and includes the physical network, the network devices, and their (raw) device drivers. "100BT Ethernet on UTP" is one possible description of such a layer. Others include ATM (Asynchronous Transmission Mode) on various media and a variety of cluster specific, proprietary link layers such as Myrinet, Scalable Coherent Interconnect (SCI), QsNet, and more.
  2. The Network Layer. This layer is more or less the same as the ISO/OSI layer – IP and things such as ICMP live here to help the network route physical packets.
  3. The Transport Layer. Again the same, this layer is e.g. TCP and UDP.
  4. The Application Layer. Also the same.
This model skips the session and presentation layer. In other words, in the TCP/IP model, the physical network isn't split into media and device, and the user is left responsible for any transformations required for the data. This model is popular because the near-universal acceptance of TCP/IP networking over Ethernet as the basis for e.g. the web has led to far more commonality in data representation than there once was, and because the tools required to manage data conversion add significantly to both overhead and security risk.



    Login Form

    Share The Bananas

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.