Why Linux on Clusters?

The other main advantage is that the entire "cluster plumbing" is open. It allows optimizations and fixes that may not be needed for the mainstream and are thus deemed unimportant by the kernel maintainers. A good example of this is the TCP acknowledgment fix implemented by Josip Loncaric. (See Sidebar Josip's Fix)

Another shining example of cluster customization has been the process migration facilities introduced by bproc and openMosix. Their developers were able to do things with an open kernel that would be almost impossible in a closed source environment.

If we move above the kernel to the distribution level, we see a large amount of "customization" being done for specific HPC distributions of Linux. The BioBrew distribution is an example of a full Linux version tailored to bioinformatics users. Open software seems to have no bounds when it comes to HPC. If there is a need, the infrastructure is available for customization.

Finally, another factor that is often taken for granted is the Internet. Open collaboration and sharing would be quite difficult without it. News, packages, distributions, fixes, updates, patches, How To's, mailing lists, and even grids, all circulate freely throughout an international community.

The Marketing Department is Closed

In a closed source model, the features that the end users see are, of course, determined by the owner of the source code. Deciding what features a new product should have often falls into the hands of the marketing department. A good marketer checks to see what the competition has and what the current users want, and makes a decision based on the cost to implement and release new features. If you and a small cadre of users require some special feature, you are at the mercy of the "marketing optimization" equations. For closed source, there is no other way. If you don't make the features list you are, as they say, SOL (bad-word-your-mom-told-you-not-to-say Out of Luck).

In the case of HPC, many features are at the bottom of the list because the HPC market is not that big compared to other market segments. A vendor will see a better return on its money by appeasing the bigger markets.

Let's look at process migration as an example. Both bproc and mosix required access to the intimate details of the kernel. These packages are extraordinarily useful to the HPC market. The funny thing is they only show up in open software. There is no marketing department attempting to optimize ROI (Return on Investment). There are users who need something, there are implementors who will build things and get paid to keep them working, and no one else (no costs) in the middle. Marketing, in a sense, has been optimized out of the equation.

Sidebar One: What is a Beowulf?
Perhaps more than any other question, this one always comes up in conversations about HPC clusters. To answer the question, you need to understand that Beowulf was the project name, not a cluster. Tom Sterling says it best:

Then one afternoon, Lisa, Jim Fischer's accounts manager, called me and said, "I've got to file paperwork in 15 minutes and I need the name of your project fast!" or some words to that effect. I was desperate. I looked around my office for inspiration, which had eluded me the entire previous month, and my eyes happened on my old, hardbound copy of Beowulf, which was lying on top of a pile of boxes in the corner. Honestly, I haven't a clue why it was there. As I said, I was desperate. With the phone still in my hand and Lisa waiting not all that patiently on the other end, I said, "What the hell, call it 'Beowulf.' No one will ever hear of it anyway." End of story.

Beowulf is more a concept or methodology than a thing. A very good operational definition can be found in one of the first books on the subject, How to Build a Beowulf: "A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking technology running any one of several open-source Unix-like operating systems."

So if you want to call your cluster a Beowulf, you need to play by the rules. The key words are personal computers, widely available networking, and open-source. Because there is no such thing as a shrink-wrapped "Beowulf", these key words are very important. The use of personal computers could probably be replaced with commodity servers/computers. The widely available networking part provides a bit of wiggle room. One of the major design goals of early Beowulf systems was to keep them as commodity as possible. The benefits of using only commodity components are low cost, no unique vendors, a guaranteed upgrade path, and rapid introduction. The problem with using only commodity networking components is that it may rule out certain networking technologies that are crucial for performance. With this in mind, Beowulfs can be built from less mainstream technologies (such as Myrinet, QsNet, SCI, or Infiniband). The jury is still out on whether Infiniband is a commodity technology.

The last requirement, open-source, is probably what differentiates a Beowulf from all other "clustered" systems. The use of an open-source operating system allows customization and rapid changes that are virtually impossible with commercial operating systems. Another side effect of open-source is the absence of licensing fees for each computer in the cluster. Open-source does not preclude operating systems like BSD from being used, but the GNU license under which Linux is distributed has pushed Linux to the forefront of the market. Finally, it should be noted that HPC clusters running Windows NT or Solaris are not strictly Beowulf systems. They certainly can be clusters, but probably should not be called Beowulfs.

Lawyer Free Zone

In 1997, I found an article in EE Times describing the creation of a "Lawyer Free Zone" in Scotland to help foster collaboration in the semiconductor market. Interesting idea.

When I think about Linux and clusters, I think about how the GPL has created a "Lawyer Free Zone" for software development. (SCO, of course, believes otherwise.) Think about the fact that there are people from many large companies (like IBM, SGI, SUN, and HP) who would, outside of the GPL, never put their development people in the same room -- let alone co-develop software. The large array of Linux file systems is only one example of how clusters have benefited from this safe haven. In a sense, the GPL has lowered the "lawyer latency" (measured in months/years) for collaborative projects to near zero.

In addition, discussion on mailing lists and at technical meetings is also unencumbered. Everyone benefits by co-operating, which, by the way, is the goal of any successful legal agreement.

Because We Play Computer Hardball

From my experience with the HPC community, I can say with complete confidence that if Linux were unstable or did not work as expected, it would have been given the boot long ago. Losing a week's worth of results because of a node crash can be a serious setback. Although a lower level of stability is often an accepted part of the mainstream, it will find no quarter in the HPC world. The HPC market, by definition, pushes the limits of everything it touches. In this respect, Linux is a major league player.

Vendor Lock-in

The classic business strategy of selling a customer something that requires them to continue buying products and services has fueled the growth of many companies. It has also been the best source of boat anchors in the HPC market.

Let's consider a common scenario. Your organization buys a nice new supercomputer called the Whopper Z1 from FBN (Fly By Night) Systems. The Whopper Z1 runs WOS 1.0, a version of UNIX ported for their system. The computer works well for the first year. Everyone is happy. Then, in the second year, you want to add more memory. Well, in order to keep the service contract intact, you need to buy the memory from FBN Systems. Funny, it looks like the memory you bought for your home computer, but it costs ten times more than you paid. So you upgrade the memory, and while you are at it you upgrade to the next version of WOS (version 2.0). Everything is fine, until year three. It turns out that the Whopper Z1 is now going "off contract" because a new replacement system your organization is buying, called the Whopper Z2, has been installed. The Whopper Z2 also has a new version of WOS (version 3.0), which does not run on the old Z1 system. Now the old Whopper Z1 is pretty much useless and will be kept on-line for another year to allow everyone to move their codes over to the new machine. After this time, you cannot really sell it or use it, because hardware and software support is expensive and the system is considered obsolete. Ah, but if you tied a rope to it, it could indeed be used as a boat anchor.

Now consider the scenario where Linux was used for the operating system. Since the source code is available, you can, if you choose, keep the old Whopper Z1 running without a support contract. You can find people who can help you fix things. You may even have some "Linux Hackers" on staff because they have been running Linux at home for five years. And, as you find out, this is a good thing, because FBN Systems goes out of business and you are now stuck with two large pieces of hardware and a tape with a binary version of WOS on it.

In the end, "Vendor Lock-in" is always bad for the customer. No one likes to hear "you can't do that." The words "can't" and "Linux" are not often used in the same sentence.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.