Appendix C. Troubleshooting
Many SCSI problems are caused by cabling and by missing or inappropriate
termination. These faults often result in repeated SCSI bus resets, parity
or CRC errors and sometimes reduced transfer speeds. There is a good
SCSI termination tutorial at
www.scsita.org/aboutscsi/SCSI_Termination_Tutorial.html. Other useful
SCSI information is available at that site (see W9).
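Bus resets and parity or CRC errors are normally reported in the kernel log,
so a quick scan of it can help confirm a suspected cabling or termination
fault. The following is a minimal sketch only: scsi_log_errors is a
hypothetical helper name, and the keywords it matches (reset, parity, crc)
are assumptions taken from the symptoms named above, since the exact wording
of the messages differs between drivers.

```shell
# Hypothetical helper: filter log text on stdin for lines that mention
# one of the symptom keywords. The keyword list is an assumption.
scsi_log_errors() {
    # Lower-case each line before matching so the test is case-insensitive.
    awk 'tolower($0) ~ /reset|parity|crc/'
}

# Scan the kernel ring buffer; reading it may require root on some systems.
dmesg | scsi_log_errors
```

A match is only a hint, since other subsystems may also log these words.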
There is also a SCSI "faq" site (see W10) that addresses many
configuration and troubleshooting issues. Although the main focus of
this site is Windows (and its ASPI interface), much is relevant to SCSI
in Linux and other Unix implementations.
When something appears to have partially locked up the system, the
ps command can help identify which part of the kernel is responsible.
The following invocations show where each process is waiting; their
output could be forwarded to the maintainers.
ps -eo cmd,wchan
ps -eo fname,tty,pid,stat,pcpu,wchan
ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args
The most interesting option for finding the location of the "hang" is
"wchan". If this is a kernel address then ps will use /proc/ksyms to
find the nearest symbolic location. The "nwchan" option outputs the
numerical address of the "hang".
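On a busy system the full listing can be long. A process stuck inside the
kernel usually shows a state beginning with "D" (uninterruptible sleep) in
the STAT column, so one way to narrow the output is to keep only those rows.
This is a minimal sketch: blocked_procs is a hypothetical helper name, and it
assumes the state is the second column, as in the ps invocation shown with it.

```shell
# Hypothetical helper: keep the header line and every row whose second
# field (STAT) starts with "D", i.e. uninterruptible sleep.
blocked_procs() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Same column layout as the third ps example above, minus pcpu and nwchan.
ps -eo pid,stat,wchan=WIDE-WCHAN-COLUMN -o args | blocked_procs
```

If only the header is printed, no process was in uninterruptible sleep at
that moment; running the command a few times can help catch a process that
is stuck rather than briefly waiting on I/O.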
If the system is not responding to keystrokes, then <Alt+ScrollLock>
in text mode should output a stack trace while <Ctrl+ScrollLock>
should output a list of all processes. If the log is still working, the
output will be sent there as well as appearing on the console.
If the kernel has been built with the CONFIG_MAGIC_SYSRQ option, then in
text mode <Alt+SysRq+H> will list the available commands. Of these,
<Alt+SysRq+S> is useful for doing an emergency sync while <Alt+SysRq+U>
will remount file systems in read-only mode. After that, <Alt+SysRq+B> to
reboot the machine might be your next move.
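Even when CONFIG_MAGIC_SYSRQ is built in, the key combinations can be
switched off at run time through /proc/sys/kernel/sysrq, so it is worth
checking that setting before a hang occurs. The sketch below assumes that a
value of 0 means disabled and any other value means at least some functions
are enabled; sysrq_status is a hypothetical helper name, and its optional
argument exists only so that it can be pointed at another file for testing.

```shell
# Hypothetical helper: report whether the magic SysRq keys are usable.
# $1 (optional) overrides the control file path, for testing only.
sysrq_status() {
    ctl="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$ctl" ]; then
        # No control file: the kernel was probably built without the option.
        echo "unavailable"
        return 0
    fi
    val=$(cat "$ctl")
    if [ "$val" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

sysrq_status
```

If the result is "disabled", writing 1 to /proc/sys/kernel/sysrq as root
(echo 1 > /proc/sys/kernel/sysrq) turns the key combinations on.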