In order for LAM to be started on a remote UNIX machine, several
requirements have to be fulfilled:
The machine must be reachable via the network.
The user must be able to execute commands remotely on the machine with
the default remote shell program that was chosen when LAM was configured.
This is usually rsh(1), but any remote shell program, such as ssh(1),
is acceptable. Note that remote host permissions must be
configured so that the remote shell program does not ask for a
password when a command is invoked on the remote host.
The remote user's shell must have a search path that will locate
LAM executables.
The remote shell's startup file must not print anything to standard
error when invoked non-interactively.
If any of these requirements is not met for any machine declared in
<bhost>,
LAM will not be able to start.
By running
recon
first, the user will be able to quickly identify and correct problems
in the setup that would inhibit LAM from starting.
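Two of the requirements above (no password prompt, nothing printed to standard error) can also be spot-checked by hand for a single machine. The following is a minimal sketch, not part of LAM: the function name stderr_is_quiet is hypothetical, and it assumes the remote shell program relays the remote standard error to the local standard error, as rsh(1) and ssh(1) normally do.

```shell
# Succeed only if the given command exits with status 0 and writes
# nothing to standard error.
# Usage with a real remote shell (host name "node1" is hypothetical):
#   stderr_is_quiet rsh node1 true
stderr_is_quiet() {
    # Send stdout to /dev/null and capture only stderr in $err.
    err=$("$@" 2>&1 >/dev/null) || return 1
    [ -z "$err" ]
}
```

A failure here only says that something was printed or the command failed; recon itself reports a more descriptive message.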
The local machine where
recon
is invoked must be one of the machines specified in
<bhost>.
The
<bhost>
file is a LAM boot schema written in the host file syntax.
See bhost(5).
Instead of being given on the command line, a boot schema can be named in
the LAMBHOST environment variable.
If neither is given, the default file, bhost.def, is used.
LAM searches for
<bhost>
first in the local directory and then in the installation directory
under boot/.
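The lookup order described above can be sketched as a small shell function. This is an illustration only, not LAM's own code: the function name resolve_bhost and its second parameter (the installation directory) are hypothetical, and treating an absolute path as given is an assumption of this sketch.

```shell
# $1: boot schema named on the command line (may be empty)
# $2: LAM installation directory (hypothetical parameter for this sketch)
resolve_bhost() {
    # Command line first, then LAMBHOST, then the default file name.
    name=${1:-${LAMBHOST:-bhost.def}}
    case $name in
        # Assumption: an absolute path is used exactly as given.
        /*) [ -f "$name" ] || return 1; echo "$name"; return 0 ;;
    esac
    if [ -f "./$name" ]; then
        # Local directory is searched first.
        echo "./$name"
    elif [ -f "$2/boot/$name" ]; then
        # Then boot/ under the installation directory.
        echo "$2/boot/$name"
    else
        return 1
    fi
}
```

The function prints the path of the first match and returns non-zero if the file is found in neither place.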
Note that the default remote shell command can be overridden at
invocation time with the LAMRSH environment variable, which can be
set to a new command with optional command line arguments. For
example, the 1.x series of
ssh
clients requires the
-x
flag to suppress standard ssh information from being
sent to standard error (which would cause
recon
to fail). For the C shell and its derivatives:
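A minimal sketch of such a setting, assuming ssh is installed on every machine and is the desired remote shell program. The csh form is given as a comment, and the Bourne shell equivalent is the executable part:

```shell
# C shell and its derivatives:
#   setenv LAMRSH "ssh -x"
# Bourne shell and its derivatives:
LAMRSH="ssh -x"
export LAMRSH
```

With the variable set, recon is invoked as usual and uses the new command in place of the configured default.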
recon
tests each machine defined in
<bhost>
by attempting to execute on it the tkill(1) command using its
"pretend" option (no action is taken). This test, if successful,
indicates that all the requirements listed above are met, and thus LAM
can be started on the machine. If the attempt is successful, the next
machine is checked. If the attempt fails, a descriptive error
message is displayed and
recon
stops unless the
-a
option is used, in which case
recon
continues checking the remaining machines.
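The stop-or-continue behaviour described above can be illustrated with a short shell sketch. It is not recon's implementation: check_hosts and the probe command are hypothetical names, and the probe stands in for the remote tkill(1) test.

```shell
# $1: "all" to keep checking after a failure (like -a); any other
#     value stops at the first failure
# $2: probe command that takes a host name and returns 0 on success
#     (hypothetical stand-in for the remote tkill test)
# remaining arguments: host names, in boot schema order
check_hosts() {
    mode=$1
    probe=$2
    shift 2
    status=0
    for host in "$@"; do
        if "$probe" "$host"; then
            echo "$host: ok"
        else
            echo "$host: failed" >&2
            status=1
            # Without "all", give up at the first failing machine.
            [ "$mode" = all ] || break
        fi
    done
    return "$status"
}
```

In both modes the return status is non-zero if any machine failed; the only difference is whether the machines after the failing one are still checked.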
If
recon
takes a long time to finish successfully, this is a good
indication that the LAM system to be started has slow
communication links or heavily loaded machines, and it may be
preferable to exclude or replace some of the machines in the system.