Overview
In a large data analysis project such as Elixir, there are usually
certain sub-systems which resemble an assembly line: A continuous
stream of objects, each one very similar to the next, enters a
factory. They pass through a series of stages at which some process
occurs, each step along the way being performed in essentially the
same way for each object it encounters. At the end of the assembly
line, a new object (or objects) appears, fashioned from the original
that entered at the other end.
Consider, for example, a scientific analysis of images from the
telescope. Each image is essentially the same as any other, differing
only in detail. A sequence of steps is performed on each image: for
example, detrending, object detection, and astrometry. As the
analysis proceeds, each step performs a transformation on its input
object, creating a new, slightly modified object. The result of an
analysis step may be a new data file, or the step may simply alter
some portion of the file provided.
Because each step in the analysis is identical for every image, there
is a regularity to the commands which are executed. For example,
consider an input science image called image01.fits. The series
of commands needed to arrive at the final results might consist of:
detrend image01.fits image01.flt
detect image01.flt image01.dat
astrom image01.dat image01.astro
Each step in this analysis sequence produces a new file, and each file
is only different in a portion of the filename. If the input image
had been called image02.fits, we could easily reconstruct the
above sequence of commands, replacing image01 with
image02.
We developed the program `elixir' to make it very easy to define a
sequence of steps which are performed identically for a sequence of
similar objects in a list. Elixir is very flexible: the sequence of
operations for a given implementation is defined in a simple text
configuration file, and can consist of essentially any operations
which fit into the assembly-line model. Each implementation can be
drastically different, and need not operate just on images; any
analysis sequence will do. We call a specific implementation an
`elixir', and usually give the elixir a name relevant to its task.
Because of the assembly-line nature of the elixir model, it is easy to
extend the concept from one computer to a network of computers. Not
only does `elixir' construct the commands as needed, it also executes
those commands on any of a collection of computers, keeping track of
which computers are currently being used and which are currently
free.
Figure 1: Elixir node
List-based command generation
To visualize the elixir process, consider each of the analysis steps
as a node, as shown in Figure 1. Each node has three queues, an input
queue of pending objects, a 'success' queue, and a 'failure' queue.
Each node represents a single command, such as 'detrend' in the
example above. The node receives as input a word or a set of words
which define the object being analysed. Associated with each node is
a set of rules to generate the complete command, including possible
variations on the input words, as necessary. In the example above,
the input list might simply consist of the input image names,
image01.fits, image02.fits, etc. The rule for the `detrend'
node would construct the command detrend image01.fits
image01.flt.
Each node in Elixir uses its rule to generate a command, such as
detrend image01.fits image01.flt, and starts its execution. When the
command has finished executing, the node passes the associated
words to either the `success' or the `failure' queue, based on the
command's exit status. Each success and failure queue is connected
in turn to the pending queue of another node. In this way, a
successful `detrend' command for image01.fits would pass the
word image01.fits to its success queue, which is in turn
connected to the pending queue of the `detect' node. There is a
special node called 'global' which represents the entire elixir
process. The pending, success, and failure queues of the global node
represent the global input and output of the elixir analysis. Thus,
objects are introduced into the process by placing them in the global
input queue. The objects are finished when they reach either the
global success or failure queues (having a status for the entire
process of 'success' or 'failure').
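For illustration, the flow just described can be pictured with a short
Python sketch. This is not the elixir implementation; the class, the
queue handling, and the example rules are invented, and the input word
is taken to be the bare image name (eg 'image01') for brevity.

from collections import deque

class Node:
    # One analysis step (eg 'detrend'), as drawn in Figure 1.
    def __init__(self, name, rule):
        self.name = name
        self.rule = rule            # builds the command from the input words
        self.pending = deque()      # objects waiting to be processed
        self.success = None         # node fed by this node's success queue
        self.failure = None         # node fed by this node's failure queue

    def run_next(self, execute):
        words = self.pending.popleft()
        status = execute(self.rule(words))       # 0 means success, as in a shell
        target = self.success if status == 0 else self.failure
        target.pending.append(words)             # hand the object onward

# Wiring for the example above; 'global' only collects finished objects.
glob    = Node('global',  rule=None)
detect  = Node('detect',  lambda w: 'detect %s.flt %s.dat' % (w, w))
detrend = Node('detrend', lambda w: 'detrend %s.fits %s.flt' % (w, w))
detrend.success, detrend.failure = detect, glob
detect.success,  detect.failure  = glob, glob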
In order for a node to execute a command, it needs to be given a
machine on which to run the command. Elixir maintains a pool of
available computers. When a node is ready to execute a command, it
requests a machine from the pool of free computers. If none are free,
it waits for a while then tries again. If a computer is free, the
node receives control of the machine, and starts the execution of its
process. The node then monitors the output from the machine, saving
all messages in a log file specific to the current object.
Eventually, the process finishes, and the node passes the object to
the appropriate output queue. At this point, it returns the machine
to the pool of free machines, and grabs the next object from its
pending queue, if any. This process limits the number of executing
processes to the number of available machines. This is a very simple,
but very reliable way of balancing the load on each machine. As long
as none of the analysis steps individually overloads the machine
resources, each machine will be used at an appropriate level.
The machines which are available to the Elixir process are simply
listed in the elixir configuration file. This makes it easy to change
the collection of machines used for a given elixir implementation. In
practice, if the analysis steps for a particular elixir underload each
of the machines available to it, multiple entries for each machine can
be included in the configuration list, increasing the demands on the
individual machines.
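The bookkeeping for the machine pool amounts to something like the
following sketch (illustrative only; the class, the host names, and the
polling interval are invented). Note how a duplicate host entry, as
described above, simply lets that host run two jobs at once.

import time

class MachinePool:
    def __init__(self, machines):
        self.free = list(machines)       # eg ['host1', 'host2', 'host2']

    def acquire(self, poll=5.0):
        # Block until a machine is free, checking again every few seconds.
        while not self.free:
            time.sleep(poll)
        return self.free.pop()

    def release(self, machine):
        self.free.append(machine)

pool = MachinePool(['host1', 'host2', 'host2'])
machine = pool.acquire()      # run the node's command on this machine ...
pool.release(machine)         # ... and hand it back when the command ends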
Figure 2: Example elixir command system
Elixir configuration
The Elixir configuration file makes it very easy to define a sequence
of commands and the rules for generating the command line arguments.
In the configuration file, most lines consist of keyword / value
pairs. The keyword can consist of any non-whitespace characters. The
value may contain any ASCII characters at all, including spaces. If there are
multiple entries for the same keyword, the last one provides the value.
Keywords defined in the configuration file may be referred to
elsewhere by prepending a dollar sign: $KEYWORD. The entire
configuration file is loaded before variables such as this are
expanded, so order is not important.
In addition to the keyword / value entries, there are a limited number
of other possible entries. Lines with a leading hash mark (#) are
commented out. A line beginning with the word 'input' is a command to
load additional configuration information from another file. In this
special case, variables used to define the filename must already exist
in the configuration file loaded so far. These details about the
configuration file are relevant to other programs in addition to
`elixir'.
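A rough Python sketch of these parsing rules follows. The exact
behaviour of the real parser is not documented here, and the function
names are invented; the sketch only mirrors the rules stated above.

import re

def expand(value, config):
    # Replace each $KEYWORD reference with the value defined for KEYWORD.
    return re.sub(r'\$(\w+)', lambda m: config.get(m.group(1), m.group(0)), value)

def load_config(path, config=None):
    config = {} if config is None else config
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue                          # blank line or comment
            parts = line.split(None, 1)           # keyword, then the rest
            keyword = parts[0]
            value = parts[1] if len(parts) > 1 else ''
            if keyword == 'input':
                # Load another configuration file; any $KEYWORD in its name
                # must already be defined by the lines read so far.
                load_config(expand(value, config), config)
            else:
                config[keyword] = value           # last entry wins
    return config

# Ordinary $KEYWORD references are expanded only after the whole file has
# been read, eg: expand(config['global.success'], config)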
Finally, we come to the portion of the configuration file which
defines the components of the specific elixir implementation. Each
node in the elixir process is defined by a block which looks like
this:
process detrend
detrend.arg 0 detrend
detrend.arg 5 %s &0
detrend.arg 0 %s.flt BASE(&0)
detrend.success detect
detrend.failure global
The first line here, process detrend, defines the node. The
next three lines, beginning with detrend.arg, define the syntax
of the command associated with that node. The first entry is the
program itself; the next two lines generate the first and second
arguments to the program. The executed command (detrend) and the name
of the node are not required to be the same. It is particularly
important to note that the order of these lines in the configuration
file matters: the order of the command-line arguments is the order of
the lines in the configuration file.
These lines define how to generate the arguments by providing a
C-style format statement and a set of arguments to the format
statement. Let us first consider the input list. The input list to
Elixir describes a set of objects to be analysed. In our example
above, these consisted of the names of the raw science images:
image01.fits
image02.fits
image03.fits
and so on. In this example, there is only one word on each line to
describe the objects. It is possible to have multiple words to
describe a single object. For example, to define a specific CCD image
in both MEF & SPLIT style images, one could use lines like:
image01.fits X SPLIT
image02.fits 00 MEF
image02.fits 01 MEF
image02.fits 02 MEF
image03.fits X SPLIT
In this example, the first word is a file name; the second defines the
CCD for MEF images (and is ignored for SPLIT); the third identifies
the image type, SPLIT vs MEF. In the elixir configuration file, the
first word on each line is identified by &0, the second by &1, etc.
Thus, in the process detrend example above, the line which reads
detrend.arg 5 %s &0 constructs an argument which just consists
of the first word for each line.
The following line shows the use of the filename manipulation
functions available within the Elixir configuration system. These
functions are only available when defining the Elixir node command
line arguments. In this example, detrend.arg 0 %s.flt BASE(&0),
the argument is constructed by taking the first word (&0), stripping
off the path and extension from the file name (ie, taking
/path/file.ext and returning 'file'), and appending the string '.flt'
to that word. This deconstruction of the filename is provided
by the BASE function. Other string manipulation functions are
available, mostly functions relevant to manipulating file names. Some
of the available functions are:
- BASE - return file basename
- EXT - return file extension
- PATH - return file pathname
Other functions will be added as they become necessary.
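The effect of these rules can be approximated with a few lines of
Python. Only BASE, EXT, and PATH come from the configuration syntax
itself; the helper build_argument and its parsing are invented for the
sketch and need not match the real implementation.

import os, re

def BASE(word):
    # /path/file.ext -> 'file'
    return os.path.splitext(os.path.basename(word))[0]

def EXT(word):
    # /path/file.ext -> 'ext'
    return os.path.splitext(word)[1].lstrip('.')

def PATH(word):
    # /path/file.ext -> '/path'
    return os.path.dirname(word)

def build_argument(fmt, refs, words):
    # fmt is the %s-style format, refs the &N or FUNC(&N) fields, and
    # words the whitespace-separated words of one input-list line.
    values = []
    for ref in refs:
        m = re.match(r'(?:(BASE|EXT|PATH)\()?&(\d+)\)?$', ref)
        func, index = m.group(1), int(m.group(2))
        apply = {'BASE': BASE, 'EXT': EXT, 'PATH': PATH}.get(func, lambda w: w)
        values.append(apply(words[index]))
    return fmt % tuple(values)

# For the detrend node above:
words = ['image01.fits']
print(build_argument('%s', ['&0'], words))            # -> image01.fits
print(build_argument('%s.flt', ['BASE(&0)'], words))  # -> image01.flt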
These configuration formatting lines are somewhat limited compared
with standard C formatting commands: There can only be one word
defined by the format (ie., no white space is allowed). The only
formatting command currently allowed is %s.
The only remaining portion of these format lines which has not yet
been described is the number in the second position. The number in
the second space on each line represents part of the flow control
process. A positive number here says that the word formed by
this line is a filename which must exist before the process is
run. The number tells how many seconds elixir should wait for the
object to appear before giving up on this object. This timeout is
provided for various possible uses. One basic use is to avoid NFS
latency problems. It is not unusual under NFS for a file created on
one machine not to appear on a cross-mounted disk on a second machine
for a few tenths of a second or so.
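In other words, for every argument with a positive wait time, elixir
does something equivalent to the sketch below before starting the
command (the function name and polling interval are invented):

import os, time

def wait_for_file(path, timeout):
    # Poll until the file exists or the timeout (in seconds) expires;
    # a short timeout is usually enough to ride out NFS latency.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(0.1)
    return os.path.exists(path)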
The final two lines which define this elixir node are used to make the
connection between the node and other nodes. The first,
detrend.success detect, tells elixir to connect the success queue of the
detrend process to the pending queue of the `detect' process. The
second line, detrend.failure global, tells elixir to connect the
failure queue of this process to the global failure queue. Similarly,
the success queue of the last node should be connected to the global
success queue.
There is also a set of configuration entries which define global
concepts for the elixir, including the actions of the global input /
output queue. The required entries of this type are listed here:
global.success /path/analysis.success
global.failure /path/analysis.failure
global.source /path/analysis.source
global.msg /path/analysis.msg
global.end /path/analysis.end
global.Nargs 3
global.logfile 0 %s.log BASE(&0)
global.pending detrend
global.timeout 1200.0
The first two entries, global.success and global.failure,
define the output files for the global success and failure queues.
When an object lands on either of these queues, the object and the
exit status are written to the named file, and the object is marked as
being completed. The entry global.pending defines the first
node in the process. global.timeout defines a timeout period in
seconds: if a process produces no output for this much time, elixir
decides that it has hung and kills it. The entry global.logfile
defines the rules for generating a log file name for each object,
using a syntax identical to the command-line argument generation
lines. global.end is a file to which elixir writes a collection
of processing statistics after a complete analysis is done.
global.msg provides a way for elixir to communicate with other
processes. Finally, we leave the description of global.source
until after we describe how objects are passed to Elixir.
Elixir Communication Issues
The input to an elixir process is a list consisting of lines with
words which define the specific objects, such as a list of image
names. There are two ways in which elixir may be presented such a
list. In the simple case, when the program is started, a filename is
given as a command line argument. This file contains a complete list
of names for the elixir run. The lines from this list are loaded and
passed to the global.pending queue, which in turn passes them to the
first node. Once the elixir process has finished with all of the
entries in the file, it exits, and the process is complete.
The other possibility is to use a mechanism we call an input FIFO. In
our implementation, a FIFO is not a standard UNIX fifo-type special
file. We have avoided the use of true fifos for two reasons. First,
it would be necessary for a special file to be created for each
implementation of elixir (by root), which makes it a non-trivial task
to create a new elixir implementation. Second, the data in a standard
UNIX fifo is ephemeral. If the programs on either end of the fifo are
not running, there is no guarantee that the data will remain (the
situation with socket connections is even worse). Instead, we have
chosen to implement a type of FIFO by using a normal UNIX file which
one program writes to and another reads the data from. To avoid
conflicts between programs, we simply lock the file each time we are
accessing it. Messages are passed to Elixir using this mechanism, and
so are the lines which define new objects.
If elixir is not invoked with a list file on the command line, it
instead monitors the file defined by global.source. Any time
elixir looks for object entries in this file, the file is locked
first. Then, when the lines have been loaded, the file is cleared.
In this way, a second program can add lines to this file at arbitrary
times without danger of losing any entries, whether or not the
particular elixir is already running. When the external program
writes to the file, it also locks it, so elixir will not load the
entries before it is ready. Then, when it has written the entries, the
file is unlocked, ready for elixir to grab the new list of names.
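The mechanism can be sketched in a few lines of Python. Here flock
stands in for whatever locking call elixir actually uses, and the
function names are invented; only the lock-append / lock-drain-clear
pattern comes from the description above.

import fcntl

def fifo_append(path, lines):
    # Writer side: lock the file, append the new entries, unlock.
    with open(path, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        for line in lines:
            f.write(line.rstrip('\n') + '\n')
        fcntl.flock(f, fcntl.LOCK_UN)

def fifo_drain(path):
    # Reader side: lock the file, take every entry, clear the file, unlock.
    with open(path, 'a+') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        lines = f.read().splitlines()
        f.truncate(0)
        fcntl.flock(f, fcntl.LOCK_UN)
    return lines

An external program adds objects with, for example,
fifo_append('/path/analysis.source', ['image01.fits']), while the
running elixir periodically performs the equivalent of fifo_drain on
the same file.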
In this mode, elixir runs continuously, waiting for more entries in
the FIFO file. A mechanism to end the elixir run is made possible by
having elixir recognize a special word in the FIFO. If elixir
encounters the word EOF, it will no longer accept input from the FIFO,
and will exit when all of the entries it has already loaded have been
processed. The same goal can be accomplished by passing the elixir a
message saying 'STOP' via the message FIFO.
Elixir runs like a daemon, in the background with no output directly
to the screen (except for serious errors). It is possible to monitor
the progress of the elixir run by communicating through the message
FIFO. Elixir monitors the message fifo for several specific requests,
and responds to them as they arrive. Typically a request will include
a command, such as STATUS, and a filename where the requested
information should be placed. Other messages include 'STOP' (halt
processing and exit when all objects have been processed), 'KILL'
(halt all processing immediately), and 'TIME' (provide a set of
statistics on process times).
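For example, a monitoring script might append a request to the message
FIFO along these lines. The helper name, the exact message syntax, and
the output file name are assumptions for illustration, not taken from
the elixir documentation itself.

import fcntl

def send_message(msg_file, text):
    # Append one request line to the message FIFO under an exclusive lock.
    with open(msg_file, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(text + '\n')
        fcntl.flock(f, fcntl.LOCK_UN)

send_message('/path/analysis.msg', 'STATUS /tmp/elixir.status')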
Elixir uses either the rsh or ssh commands to make the connections to
the remote machines. All machines are treated as remote, even the
machine on which elixir is run. Elixir forks off a process which logs
into the remote machine and starts a new shell (csh). The STDIN,
STDOUT, and STDERR connections are used for communication between the
remote shell and elixir. Programs are started by executing the
command in the shell.
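In Python terms, the connection amounts to something like the one-shot
sketch below. The real elixir keeps the remote shell open and feeds it
commands one after another, and 'host1' is of course a placeholder.

import subprocess

# Log into the (possibly remote) machine and start a shell; the stdin,
# stdout, and stderr pipes are the communication channels described above.
shell = subprocess.Popen(['ssh', 'host1', 'csh'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT,
                         text=True)

# Send one command to the remote shell and collect everything it prints.
output, _ = shell.communicate('detrend image01.fits image01.flt\n')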
The configuration file used by Elixir may also contain configuration
information for a variety of other programs. Elixir saves a copy of
the complete configuration and passes this filename to those programs
which can interpret it, via the PTOLEMY environment variable. This
name tells the Elixir programs what configuration file to load. By
specifying this value as an environment variable, it will override
other choices so that elixir can guarantee that the programs it
launches have a consistent, known set of configuration choices.
The other unique feature of the command line aids process control.
Every command is followed by an echo of the words "PROCESS DONE". As
the programs run, elixir monitors the output stream looking for three
special phrases. One is this 'PROCESS DONE' phrase. Another is the
word "SUCCESS" and the third is the word "ERROR". By looking for
combinations of these words, elixir can determine if the program ended
successfully, failed, or if the computer crashed. To interact well
with Elixir, programs should send the correct word "SUCCESS" or
"ERROR" on exit. If necessary, one can wrap the program in a shell
which monitors the exit status and sends the correct word. If this is
not possible, one can simply assume the process always ends
successfully and include the word SUCCESS as an echo on the command line.
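The convention can be illustrated with a small sketch. The sentinel
phrases are the ones named above, but the helper names and the exact
rules for combining the phrases are assumptions made for illustration.

def wrap_command(command, assume_success=False):
    # Append the sentinel echoes that elixir watches for in the output stream.
    if assume_success:
        command += ' ; echo SUCCESS'
    return command + ' ; echo PROCESS DONE'

def classify(output_lines):
    # One plausible reading of the combinations elixir looks for.
    if not any('PROCESS DONE' in line for line in output_lines):
        return 'crashed'                  # the sentinel never arrived
    if any('ERROR' in line for line in output_lines):
        return 'failure'
    if any('SUCCESS' in line for line in output_lines):
        return 'success'
    return 'failure'                      # finished without claiming success

print(wrap_command('detrend image01.fits image01.flt', assume_success=True))
# -> detrend image01.fits image01.flt ; echo SUCCESS ; echo PROCESS DONE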
A short note about our use of filelocks. We have implemented a
somewhat complex type of file lock mechanism. We do not want to use a
standard NFS-implemented lock, particularly for the locks on our
database files. The problem with the standard filelocking mechanism
under UNIX/NFS is that the lock only exists while the program holding
the lock is running. This is insufficient to maintain data integrity.
Consider a database which consists of several distinct files. A
particular write operation to the database may need to manipulate more
than one file, and it needs to be sure those files are not also
changed by another program in the meantime. To avoid this, it should
set a lock (on the files or on the whole database). Then it should
write to the files, then it should clear the lock. Consider, though,
what would happen if the program were to crash or be killed after
writing the first file, but before writing the second. The data in
the two files would be inconsistent. At that point, if a second program
tried to write to the database, it could easily corrupt the data,
making it rather difficult to correct the problem. With the standard
UNIX/NFS locks, the lock is ephemeral - it only exists as long as the
program is running (and certainly only while the machine is up). In
this example, it is easy to imagine that, by the time the second
program comes along, the files are out of sync but the lock is no
longer set. We wanted a lock that would be guaranteed to exist until
it was actively cleared, either by the program in its usual way, or by
a person (or program) which has found the problem and fixed it.
To implement such a lock, we create a lock in two stages. The file
that is being locked (ie, filename) is associated with a lockfile of
the form .filename.lck. The lockfile is locked with NFS-style locks.
Then the word BUSY is written to the lockfile. At this point the NFS
lock doesn't matter, and can be cleared or left. Even if the program
dies, the word BUSY remains to prevent another program from taking the
lock. When it is time to clear the lock, the file is either deleted
or the word IDLE is written to it.
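The two stages look roughly like this in Python. Here flock again
stands in for the NFS-style lock, and the helper names are invented;
only the BUSY / IDLE convention comes from the description above.

import fcntl, os

def lockfile_for(filename):
    # filename -> .filename.lck in the same directory
    return os.path.join(os.path.dirname(filename),
                        '.' + os.path.basename(filename) + '.lck')

def take_lock(filename):
    with open(lockfile_for(filename), 'a+') as f:
        fcntl.flock(f, fcntl.LOCK_EX)     # stage 1: ephemeral NFS-style lock
        f.seek(0)
        if f.read().strip() == 'BUSY':
            raise RuntimeError(filename + ' is already locked')
        f.truncate(0)
        f.write('BUSY')                   # stage 2: persists even after a crash
        fcntl.flock(f, fcntl.LOCK_UN)

def clear_lock(filename):
    with open(lockfile_for(filename), 'w') as f:
        f.write('IDLE')                   # or simply delete the lockfile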