The lamboot tool starts the LAM software on each of the machines
specified in the boot schema, <bhost>. The user may wish to run the
recon(1) tool first to verify that LAM can be started.

Starting LAM is a three-step procedure. In the first step, hboot(1)
is invoked on each of the specified machines. In the second step, each
machine allocates a dynamic port and communicates it back to lamboot,
which collects them. In the third step, lamboot gives each machine the
list of machines/ports in order to form a fully connected topology. If
any machine was not able to start, or if a timeout period expires
before the first step completes, lamboot invokes wipe(1) to terminate
LAM and reports the error.
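
The check-then-boot sequence described above can be sketched as a small
shell function. This is only an illustration: boot_lam is a hypothetical
helper name, the boot schema path is passed as its first argument, and
the sketch assumes that recon(1) exits with a non-zero status when its
check fails.

```shell
# boot_lam is a hypothetical wrapper, not part of LAM.
# $1 is the boot schema file to use.
boot_lam() {
    schema="$1"
    # Assumption: recon(1) returns non-zero if LAM cannot be started.
    if ! recon "$schema"; then
        echo "recon failed for $schema; not booting" >&2
        return 1
    fi
    # lamboot invokes wipe(1) itself if booting fails, so no manual
    # cleanup is attempted here; its exit status is passed through.
    lamboot "$schema"
}
```

Calling boot_lam with the name of a boot schema file runs both tools in
order and stops early if the recon check does not pass.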

The remote shell program that is used to invoke commands on remote
hosts is set when LAM is configured. It is typically rsh, but can be
set to any value by the person who set up and compiled LAM. This
program can be overridden at lamboot invocation time by setting the
LAMRSH environment variable to a suitable remote shell program. For
example, setting LAMRSH to "ssh -x" will force LAM to use the "ssh"
client to invoke programs on remote nodes, and ensure that "ssh" uses
the -x command line flag (to suppress the ssh 1.x client series
standard information banner that is normally output to the standard
error, which would cause lamboot to fail).
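
A minimal sketch of the LAMRSH override for a Bourne-style shell (sh,
bash, ksh) follows. The choice of shell syntax is an assumption; csh and
tcsh users would use setenv instead.

```shell
# Select ssh with the -x flag as the remote shell for LAM.
# csh/tcsh equivalent: setenv LAMRSH "ssh -x"
LAMRSH="ssh -x"
export LAMRSH
# A lamboot run started from this shell afterwards inherits the setting.
```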

The <bhost> file is a LAM boot schema written in the host file syntax
(see bhost(5)). If no boot schema is given on the command line, one can
be specified in the LAMBHOST environment variable. Otherwise a default
file, bhost.def, is used. LAM searches for <bhost> first in the local
directory and then in the installation directory under boot/.
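
The order in which the boot schema name is chosen can be illustrated
with a short shell function. This is a sketch of the precedence
described above, not LAM's own code: resolve_bhost is a hypothetical
name, and the directory search (local directory, then boot/ under the
installation directory) is not modeled.

```shell
# resolve_bhost is a hypothetical helper that mirrors the documented
# precedence for picking a boot schema name.
resolve_bhost() {
    if [ -n "$1" ]; then
        echo "$1"            # 1. schema named on the command line
    elif [ -n "$LAMBHOST" ]; then
        echo "$LAMBHOST"     # 2. LAMBHOST environment variable
    else
        echo "bhost.def"     # 3. default file
    fi
}
```

With LAMBHOST unset and no argument, the function prints bhost.def; an
explicit argument always takes priority over the environment variable.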

In addition, lamboot uses a process schema for the individual LAM
nodes. A process schema (see conf(5)) is a description of the processes
which constitute the operating system on a node. In general, the system
administrator maintains this file. It is also possible for the user to
customize the LAM software with a private process schema.

If the -x option is given, LAM runs in fault-tolerant mode. In this
mode, nodes periodically exchange "heartbeat" messages to make sure all
nodes are running and the links connecting them are operational. When a
node's heartbeats stop, it is declared "dead" and all LAM nodes (and
processes) are notified. This allows users to write fault-tolerant
applications that can degrade gracefully, or fully recover by replacing
the defunct node with another (see lamgrow(1)). Since this mode
introduces a performance penalty, it is not activated by default.