CANIS Beowulf Cluster

Comments on Beowulf and Interspace from Robert McGrath

After a quick review of some of the Beowulf articles, here are some discussion points.

1. Basic architecture

A Beowulf cluster is an enhanced NOW (network of workstations): compute nodes with 1 or 2 (or more?) Pentium processors each, distributed memory, and 1 or 2 (or more?) network links between nodes.

Such a cluster has very fast nodes, each with a very large cache and memory. In fact, each individual node is more powerful than the supercomputers of 5 years ago. (This should be remembered when comparing Beowulf clusters to, say, a CM-5.)

The Beowulf software seems to provide a fairly complete and robust set of clustering services: a cluster-wide process id space, several standard forms of message passing and scheduling, and perhaps a Distributed Shared Memory package. There seems to be considerable attention to network issues, especially tuning. (Of course, nothing can change the huge application-to-application network latency.)
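To make the latency point concrete, here is a rough cost model, offered only as a sketch. It assumes a simple linear model (fixed per-message latency plus size divided by bandwidth), and the latency and bandwidth figures are illustrative assumptions, not measurements of any Beowulf network.

```python
def total_time(total_bytes, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move total_bytes as a sequence of messages of message_bytes each.

    Linear cost model: every message pays a fixed latency, and the payload
    as a whole pays size / bandwidth.
    """
    n_messages = -(-total_bytes // message_bytes)  # ceiling division
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s


# Assumed, illustrative numbers only.
LATENCY_S = 100e-6   # 100 microseconds per message, application to application
BANDWIDTH = 10e6     # 10 MB/s

for message_bytes in (100, 10_000, 1_000_000):
    t = total_time(1_000_000, message_bytes, LATENCY_S, BANDWIDTH)
    print(f"1 MB sent as {message_bytes}-byte messages: {t:.4f} s")
```

Under these assumptions, sending the same megabyte as many small messages is dominated by latency, while a few large messages are dominated by bandwidth. Tuning can improve the constants, but not the shape of the tradeoff.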

Beowulf seems to work with several network topologies, which is nice. We'll need to study this a bit, as there will no doubt be cost/benefit tradeoffs depending on the particular configuration we are thinking of.

I'm not sure if Beowulf provides much in the way of a shared file system. It seems to rely on NFS or packages like MPI. This will be a pretty crude environment, which will support parallel I/O, but not transparently.

It is also not clear how Beowulf really works with > 16 nodes (if we are considering that many). I.e., can you really manage a 32 or 64 node cluster?

2. Problems that Beowulf Should Be Good For

This kind of architecture is known to be very good for certain classes of applications, and absolutely horrible for others. Basically, the bad cases involve global synchronization (e.g., updating a shared data structure) or constant exchange of small messages/data items (some kinds of transaction processing). The good cases involve problems that can be decomposed into chunks, computed, and then combined, all without excessive synchronization between nodes.
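The "good case" pattern (decompose, compute, combine) can be sketched as follows. This is only an illustration: the thread pool stands in for cluster nodes, and the term-counting task and the function names are hypothetical, not part of Beowulf or Interspace.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_terms(document):
    """Serial work on one document; needs nothing from any other worker."""
    return Counter(document.lower().split())


def count_terms_parallel(documents, workers=4):
    """Farm documents out to workers, then combine the partial results once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_terms, documents)
        total = Counter()
        for partial in partial_counts:  # the only step that touches shared state
            total.update(partial)
    return total


print(count_terms_parallel(["parallel across documents", "serial for a document"]))
```

The workers never talk to each other; the only synchronization is the final combine step, which is what makes this shape of problem a good fit for a cluster.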

Interspace applications -- probably good news

The good news is that the Interspace applications are mostly good cases:

1. Concept spaces -- probably will work well
2. Category Maps -- may work OK, although the classic Kohonen algorithm has a global update that is a problem
3. Automatic keyword generation -- probably OK
4. Noun parsing, etc. -- fine. This is serial for a document, but parallel across documents
5. Other algorithms (image crunching, vocabulary switching) -- I have no information
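On the Kohonen global update: one possible way around it, sketched here under assumptions, is a batch-style variant of the self-organizing map rather than the classic online algorithm. Each node accumulates partial sums over its own chunk of the data against a fixed copy of the map, and the map is updated once per pass by adding the partial sums. The 1-D map, the Gaussian neighbourhood, and all function names are illustrative choices, not anything taken from the Interspace code.

```python
import math


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to sample x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def partial_sums(weights, chunk, sigma):
    """Work one node can do alone: accumulate weighted sums over its chunk."""
    dim = len(weights[0])
    num = [[0.0] * dim for _ in weights]
    den = [0.0] * len(weights)
    for x in chunk:
        c = best_matching_unit(weights, x)
        for j in range(len(weights)):
            # Gaussian neighbourhood on a 1-D map (an assumption of this sketch).
            h = math.exp(-((j - c) ** 2) / (2 * sigma ** 2))
            den[j] += h
            for k in range(dim):
                num[j][k] += h * x[k]
    return num, den


def combine(weights, partials):
    """The single global step per pass: add partial sums, recompute the map."""
    dim = len(weights[0])
    new_weights = []
    for j in range(len(weights)):
        den = sum(p[1][j] for p in partials)
        if den == 0.0:
            new_weights.append(list(weights[j]))  # unit saw no data; keep it
        else:
            new_weights.append(
                [sum(p[0][j][k] for p in partials) / den for k in range(dim)])
    return new_weights


def batch_epoch(weights, chunks, sigma=1.0):
    """One pass over the data; each chunk would live on a different node."""
    partials = [partial_sums(weights, chunk, sigma) for chunk in chunks]
    return combine(weights, partials)
```

This trades one global update per sample for one per pass over the data. Whether the batch variant gives maps as good as the classic algorithm for Category Maps would need to be checked.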

Naturally, this analysis should be done more carefully.

My Conclusion: there is every reason to seriously consider Beowulf for Interspace computations.

3. Questions to Consider

Here are some suggestions for issues that should be carefully considered.

a) Problem areas that are problems for all architectures

1. How does an OODBMS (e.g., Versant) work on this system?
   - Generally speaking, the DB is a global synchronization point, so it will be a problem area in any MP system.
2. How can we do data ingestion? This is already the slowest part of the system, and it is not clear whether Beowulf will be better, worse, or no different. If we can't get data in fast enough, the cluster will starve.
3. How about output, e.g., browsing/display? What are the requirements here, and how does Beowulf measure up? Can we get data out fast enough to sustain the "consumer" programs?
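The starvation concern can be put as simple arithmetic. The sketch below assumes every node consumes input at the same steady rate and that ingestion is the only bottleneck; the function name and the rates are hypothetical.

```python
def cluster_utilization(ingest_mb_per_s, n_nodes, node_consume_mb_per_s):
    """Fraction of the cluster that ingestion can keep busy (at most 1.0)."""
    demand = n_nodes * node_consume_mb_per_s
    if demand <= 0:
        raise ValueError("cluster demand must be positive")
    return min(1.0, ingest_mb_per_s / demand)


# Illustrative numbers only: 16 nodes, each able to process 1 MB/s.
for ingest in (4, 10, 16, 32):
    print(ingest, "MB/s ingest ->", cluster_utilization(ingest, 16, 1.0))
```

Under this model, adding nodes past the point where demand equals the ingest rate buys nothing; the extra nodes simply wait for data.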

b) Environmental considerations

1. How compatible is this with other systems (e.g., with NCSA Origins)?
   - If we have to support several radically different versions of the same software, that will be costly.
   - How easy/hard will it be to share data with other systems?
2. How does Beowulf work with Wide Area Networks?
   - I'm not sure what the requirements are, but if there has to be fast access from remote locations, this can be a problem area in clustered systems.

c) Practical Maintenance Issues

We're talking about a fairly large number of independent systems that are tied together into a single cluster. This can be a real nightmare to keep afloat.

1. How do backups, upgrades, installs, etc., work? Whatever works for a single node will almost surely not scale up to a cluster of 16 or more. How will we ("atomically") upgrade the entire cluster in a reasonably short time? How will we install software? How will we know what software is installed?
2. How fault tolerant is the system? How easily can nodes be switched out and in?
   - McGrath's 2nd law of parallel systems: If you have enough processors, some of them are broken.
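The 2nd law can be given rough numbers. The sketch below assumes node failures are independent and that each node is down with the same probability at any given moment; both are simplifying assumptions, and the 1% figure is illustrative, not a measured failure rate.

```python
def prob_some_node_down(n_nodes, p_node_down):
    """Probability that at least one of n_nodes is down, assuming independence."""
    return 1.0 - (1.0 - p_node_down) ** n_nodes


def expected_nodes_down(n_nodes, p_node_down):
    """Average number of nodes down at any moment."""
    return n_nodes * p_node_down


P_DOWN = 0.01  # assumed: each node is unavailable 1% of the time

for n in (1, 16, 32, 64):
    print(f"{n:3d} nodes: P(some node down) = {prob_some_node_down(n, P_DOWN):.3f}, "
          f"expected down = {expected_nodes_down(n, P_DOWN):.2f}")
```

The probability that the whole cluster is healthy falls off geometrically with the number of nodes, which is why switching nodes out and in needs to be routine rather than an emergency procedure.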

d) Heterogeneity

Beowulf is designed to be completely 'symmetric', with all nodes identical and having identical capabilities. This is unlikely to be true in a real system. Because of the factors above, and because it is impossible to obtain and maintain completely identical PCs, there are always going to be differences between the nodes.

Areas of heterogeneity:

1. Processor/memory/cache (if you add or replace nodes, they will be different -- new ones will probably be faster and have more storage)
2. Network channels (some nodes may have more or fewer channels at some times, channel speed may not be identical)
3. Disk layout (some will have more free disk than others, disk speed may vary, etc.)
4. The effective "distance" between nodes may not be uniform
5. The software configurations may vary (some software may only be available on certain nodes, nodes may have different versions)
6. External devices (there will likely be a single attachment to the outside networks, a single fast tape drive, possibly other asymmetries)

How is this handled? How well is this handled? The issue here is configuration control. (These issues and common solutions are familiar from "rich people's supercomputers".)

1. Does the system know and present all the "important" facts about the resources? This has to be dynamic, since the cluster can change from day to day if equipment fails or software is upgraded.
2. How can the user specify and reserve the right resources? This really should be automatic rather than a static script.
3. How are jobs mapped to the resources of the cluster? What sort of scheduling and contention resolution is provided?
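To illustrate what "automatic rather than a static script" might mean, here is a minimal sketch of a node inventory and a request matched against it. Everything here (the `Node` fields, `select_nodes`, the selection policy) is hypothetical; I have not checked what Beowulf itself provides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a (hypothetical) dynamic inventory of cluster resources."""
    name: str
    cpus: int
    memory_mb: int
    free_disk_mb: int
    software: set = field(default_factory=set)
    up: bool = True


def select_nodes(inventory, count, min_memory_mb=0, min_free_disk_mb=0,
                 needs_software=()):
    """Return `count` usable nodes meeting the request, or None if too few."""
    required = set(needs_software)
    eligible = [n for n in inventory
                if n.up
                and n.memory_mb >= min_memory_mb
                and n.free_disk_mb >= min_free_disk_mb
                and required <= n.software]
    if len(eligible) < count:
        return None
    # Policy assumption: hand out the smallest nodes that qualify,
    # keeping the larger ones free for more demanding jobs.
    eligible.sort(key=lambda n: (n.memory_mb, n.free_disk_mb))
    return eligible[:count]


inventory = [
    Node("n1", cpus=2, memory_mb=256, free_disk_mb=1000, software={"mpi"}),
    Node("n2", cpus=1, memory_mb=128, free_disk_mb=500, software={"mpi"}),
    Node("n3", cpus=2, memory_mb=512, free_disk_mb=2000,
         software={"mpi", "pvm"}, up=False),
]
picked = select_nodes(inventory, 2, min_memory_mb=128, needs_software=["mpi"])
print([n.name for n in picked])
```

The point is that the request is stated in terms of requirements, and the mapping to actual nodes is recomputed against the current inventory, so a failed or upgraded node changes the answer without anyone editing a script.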

mail to: Robert McGrath
mail to: Baba Buehler: Networking Architectures