Often when you can ping a machine but cannot telnet in or finger a user
(because the connection is dropped immediately, or is accepted but never
produces any output, and usually not because of "Connection refused"),
one of the following has gone wrong:
1. All of virtual memory (VM) being used up, which makes fork+exec fail.
2. A hard disk or filesystem having problems that prevent fork+exec from getting a coherent view of the files it needs.
This can be a matter of what's on the disk being bad, or of a memory-resident data structure in the kernel having
become corrupted.
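If you still have a shell on the troubled machine, or just want to see what these failures look like from the inside, here is a rough Python sketch of a local fork+exec check. The function name `fork_exec_problem` is my own, and running the current Python interpreter as the child is an assumption standing in for whatever a real inetd service would exec. When VM is exhausted, the fork or exec typically fails with ENOMEM or EAGAIN; a child that starts but never finishes can instead point toward disk or filesystem trouble.

```python
import errno
import subprocess
import sys


def fork_exec_problem(timeout=10.0):
    """Try to fork+exec a trivial child process.

    Returns None if that worked, otherwise a short description of the failure.
    Assumption: "python -c pass" is a fair stand-in for a per-request service.
    """
    try:
        subprocess.run([sys.executable, "-c", "pass"], check=True, timeout=timeout)
    except OSError as exc:
        # ENOMEM / EAGAIN usually mean no memory (or process slots) for the child.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"fork/exec failed: {name} ({exc.strerror})"
    except subprocess.TimeoutExpired:
        # The child started but never finished, e.g. stuck reading from disk.
        return "child started but did not finish in time"
    except subprocess.CalledProcessError as exc:
        return f"child exited with status {exc.returncode}"
    return None


if __name__ == "__main__":
    problem = fork_exec_problem()
    print("fork+exec OK" if problem is None else problem)
```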
A good way to test for these conditions (not flawless, but decent) is to
try talking to some services that are not always memory resident and some
services that are always memory resident, and compare the
results. Note that there are two kinds of "memory" of interest here:
physical memory and virtual memory. The general flow of
this procedure is:
1. Connect to some services that are normally not memory resident all the time (that is, services that fork+exec for each request).
2. Connect to some services that are normally always memory resident.
3. Compare the results: if the services that are not normally always memory resident have problems, but the
normally always memory resident services do not, then you probably have one of the two problems
identified above: all of VM is full, or a disk or filesystem is having problems.
For example, a typical "nowait" inetd-launched service should fork+exec a fresh process for each
service request, so such services are good for testing things that are not always memory resident. Examples of these
kinds of tests are:
telnet hostname
finger @hostname
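Here is a rough Python sketch of the same kind of check done from a script, classifying each attempt according to the symptoms described at the top. The `probe` function and its outcome labels are hypothetical names of my own. It assumes telnetd starts telnet option negotiation as soon as it runs, and that "finger @hostname" corresponds to sending an empty query line (just CRLF), as in RFC 1288.

```python
import socket
import sys


def probe(host, port, payload=b"", timeout=5.0):
    """Connect, optionally send payload, and classify what happens.

    Returns one of these (hypothetical) labels:
      "ok"      - the service sent something back
      "dropped" - the connection was accepted, then closed with no output
      "timeout" - the connect or the first read timed out
      "refused" - "Connection refused"
    Other errors (e.g. host unreachable) are left to propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if payload:
                sock.sendall(payload)
            data = sock.recv(1024)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "dropped"
    return "ok" if data else "dropped"


if __name__ == "__main__":
    host = sys.argv[1]
    # telnetd normally begins option negotiation right away.
    print("telnet:", probe(host, 23))
    # "finger @host" sends an empty query (just CRLF) to port 79.
    print("finger:", probe(host, 79, payload=b"\r\n"))
```

A "timeout" after a successful connect corresponds to "accepted but no result is ever output"; "dropped" corresponds to the immediately dropped connection.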
As examples of services that aren't fork+exec'd for each service request: sendmail can usually give a banner without
forking and exec'ing (unless you've hung it off of inetd, which is uncommon), and rpcbind and/or portmap are usually
(always?) resident in virtual memory without any fork+exec'ing, so they make good test cases for
something that is always memory resident. Please note that sendmail will commonly (always?) fork+exec to actually
process a mail message, so it's best not to go through the whole "HELO/MAIL FROM/RCPT TO/DATA/." sequence; just stick to
"quit". Examples of such tests are:
telnet hostname 25 (followed by "quit" to get out)
rpcinfo -p hostname
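A Python sketch of these two memory-resident checks follows, under a couple of assumptions. For sendmail it uses the standard library's `smtplib`, reading only the greeting and then sending QUIT, so no message is ever processed. For rpcbind/portmap it does not replicate `rpcinfo -p` (which asks for the full list of registrations); instead it sends the simpler ONC RPC NULL procedure to program 100000, version 2, over TCP and only checks that an accepted, successful reply comes back. The function names are my own, and the sketch assumes the reply arrives in a single record fragment.

```python
import os
import smtplib
import socket
import struct
import sys


def smtp_banner(host, port=25, timeout=10.0):
    """Read the SMTP greeting, then send QUIT and nothing else, so no
    message is ever handed to sendmail for processing."""
    # local_hostname is only used by HELO/EHLO, which we never send; giving it
    # explicitly avoids smtplib's own getfqdn() lookup.
    smtp = smtplib.SMTP(local_hostname="localhost", timeout=timeout)
    code, msg = smtp.connect(host, port)  # reads only the greeting, e.g. 220
    try:
        smtp.quit()
    except smtplib.SMTPException:
        smtp.close()
    return code, msg.decode("ascii", "replace")


PMAP_PROG, PMAP_VERS, PMAPPROC_NULL = 100000, 2, 0


def _recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full reply")
        buf += chunk
    return buf


def rpcbind_ping(host, port=111, timeout=10.0):
    """Call the portmapper's NULL procedure over TCP; True on an accepted,
    successful reply. (rpcinfo -p uses DUMP instead; NULL is simpler.)"""
    xid = struct.unpack(">I", os.urandom(4))[0]
    # RPC call header: xid, CALL=0, rpcvers=2, prog, vers, proc, then
    # AUTH_NONE credential and verifier (flavor 0, length 0 each).
    call = struct.pack(">10I", xid, 0, 2, PMAP_PROG, PMAP_VERS,
                       PMAPPROC_NULL, 0, 0, 0, 0)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # TCP record marking: top bit = last fragment, low 31 bits = length.
        sock.sendall(struct.pack(">I", 0x80000000 | len(call)) + call)
        (marker,) = struct.unpack(">I", _recv_exact(sock, 4))
        # Sketch assumption: the whole reply arrives in this one fragment.
        reply = _recv_exact(sock, marker & 0x7FFFFFFF)
    rxid, mtype, reply_stat = struct.unpack(">3I", reply[:12])
    if rxid != xid or mtype != 1 or reply_stat != 0:  # REPLY, MSG_ACCEPTED
        return False
    # Skip the verifier (flavor, length, body padded to 4 bytes).
    (verf_len,) = struct.unpack(">I", reply[16:20])
    offset = 20 + (verf_len + 3) // 4 * 4
    (accept_stat,) = struct.unpack(">I", reply[offset:offset + 4])
    return accept_stat == 0  # SUCCESS


if __name__ == "__main__":
    host = sys.argv[1]
    print("smtp:", smtp_banner(host))
    print("rpcbind:", rpcbind_ping(host))
```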
If the fork+exec'ing services (telnet, finger) are getting stuck or dropping connections
right away without giving "Connection refused" (which usually indicates a daemon that isn't configured to start via inetd,
an inetd that has died, or, in rare cases, a SYN flood attack), while the always-memory-resident services (sendmail to an extent,
rpcbind/portmap) respond normally, you probably have a failure to fork+exec on your system, which often comes back to
reasons #1 and #2 described above.
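The decision in the paragraph above can be written down as a small Python function. This is a sketch only: the outcome labels ("ok", "dropped", "timeout", "refused") and the name `diagnose` are hypothetical, and how you gather the outcomes (by hand with telnet, finger and rpcinfo, or with a script) is up to you.

```python
# Outcome labels are this sketch's own (hypothetical) vocabulary:
#   "ok"      - the service produced output
#   "dropped" - connection accepted, then closed with no output
#   "timeout" - connection accepted, but no output ever arrived
#   "refused" - "Connection refused"
STUCK = {"dropped", "timeout"}


def diagnose(forking, resident):
    """forking: {service: outcome} for fork+exec'ing services (telnet, finger).
    resident: {service: outcome} for always-resident ones (sendmail, rpcbind)."""
    f, r = set(forking.values()), set(resident.values())
    if not f or not r:
        raise ValueError("need at least one result of each kind")
    if "refused" in f:
        # Usually a daemon not set up in inetd, a dead inetd, or (rarely) a SYN flood.
        return "refused: not the fork+exec pattern; check inetd"
    if f == {"ok"}:
        return "ok: fork+exec'ing services respond normally"
    if f <= STUCK and r == {"ok"}:
        return "probable fork+exec failure: VM exhausted, or disk/filesystem trouble"
    if f <= STUCK:
        return "inconclusive: memory-resident services are failing too"
    return "mixed: some fork+exec'ing services respond, some do not"


if __name__ == "__main__":
    print(diagnose({"telnet": "timeout", "finger": "dropped"},
                   {"sendmail": "ok", "rpcbind": "ok"}))
```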
Another informative test is, of course, to install fallback-reboot and see whether it brings up a banner OK with
something like:
telnet hostname 3002
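If you'd rather script that check, a minimal Python sketch follows. It assumes nothing about the banner's actual text; it just returns whatever fallback-reboot sends first, or `None` if the connection is accepted but nothing arrives before the timeout. The function name `read_banner` is my own.

```python
import socket
import sys


def read_banner(host, port=3002, timeout=10.0):
    """Return the first bytes the service sends, or None if the connection is
    accepted but nothing (or only an immediate close) comes back in time.
    Connection errors such as "Connection refused" are left to propagate."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(4096) or None
        except socket.timeout:
            return None


if __name__ == "__main__":
    banner = read_banner(sys.argv[1])
    print(banner.decode("ascii", "replace") if banner else "no banner")
```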
fallback-reboot uses mlockall() (or the equivalent) to lock itself into physical memory, so its pages stay
resident instead of being paged out to swap. That makes checking it for a banner a somewhat better test than
checking sendmail or rpcbind/portmap, which are more likely to have trouble because they may be unable to demand
page or swap in parts of themselves from the disk-based part of virtual
memory (strictly speaking, the kernel does this on their behalf, but you get the idea). Of course, if your kernel has
pageable regions, and mlockall() (or similar) doesn't lock those down too (I doubt it does, but I cannot rule out the
possibility), then fallback-reboot could still fail to bring up a banner because the kernel fails to
demand page from swap space.
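To illustrate what "locking itself into physical memory" means, here is a rough, Linux-only Python sketch that calls mlockall() through `ctypes`. It is not fallback-reboot's code, and I don't know which flags fallback-reboot actually passes. The MCL_CURRENT/MCL_FUTURE values below are the ones from <sys/mman.h> on common Linux architectures such as x86 and ARM; other architectures may differ. Without root (or CAP_IPC_LOCK) the call will usually fail with ENOMEM or EPERM because of RLIMIT_MEMLOCK, which the sketch reports rather than hides.

```python
import ctypes
import os

# Values from <sys/mman.h> on x86, x86-64 and ARM Linux (assumption: other
# architectures, e.g. SPARC or PowerPC, use different numbers).
MCL_CURRENT = 1  # lock every page currently mapped
MCL_FUTURE = 2   # also lock pages mapped from now on

# dlopen(NULL): symbols of the already-loaded C library are visible through it.
_libc = ctypes.CDLL(None, use_errno=True)


def lock_into_ram():
    """Ask the kernel to keep this process resident in physical memory.

    Returns 0 on success, otherwise the errno value from mlockall()."""
    if _libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        return ctypes.get_errno()
    return 0


def unlock_all():
    """Undo lock_into_ram(); pages become eligible for paging out again."""
    _libc.munlockall()


if __name__ == "__main__":
    err = lock_into_ram()
    print("locked" if err == 0 else f"mlockall failed: {os.strerror(err)}")
```

Note that even a successful mlockall() only covers this process's own pages; as discussed above, it says nothing about pageable regions inside the kernel.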