Some tools that should allow you to do this:
  1. strace
  2. truss
  3. par
  4. trace
  5. dtruss
  • Common options
  • Making effective use of the tools, from a sysadmin perspective
    1. The most common use is simply to run the syscall tracer against a program that is having problems, and look for a pathname that was referenced shortly before an error message was written, exit() was called prematurely, or the program segfaulted.
    2. Another common use for a syscall tracer is when a program is getting stuck and it's not clear why. The tracer shows which syscall the program is stuck in. Often this will be an open, stat or similar call that references a pathname on an NFS server that is timing out. Once you have the pathname, you can usually use "mount | grep" to quickly figure out which filesystem is the troublesome one. Then again, it might be a connect, select or poll, in which case see below...
    3. Getting a little more sophisticated: sometimes a program will get stuck on a numbered socket or file descriptor. In this case, note the file descriptor's number (usually a smallish integer), and then, in less or vi (vi will be faster if you're willing to wait for it to load; less starts up fast but searches slowly), search backward for that file descriptor's creation with something like "?= 3$", assuming the file descriptor in question was 3. This should take you back to the point where the file descriptor was most recently created, and you may be able to tell what it is being used for by examining the surrounding system calls. If it was created by an open(), you're set. If it's socket-related, see below:
      1. Here's an example of connecting to a nameserver, which is usually done via UDP, as it is here.
        • Some comments on the trace:
          • The "12950" is the process number ("pid") we're looking at.
          • You can tell we're connecting to a nameserver, in two ways:
            1. the htons(53) in the connect() call. The /etc/services file will typically have the following in it:
              • domain 53/tcp # name-domain server
              • domain 53/udp
            2. the address given in the sin_addr parameter to connect(), which has the IP address of one of UCI's main nameservers.
          • You can tell this is UDP by the SOCK_DGRAM in the socket() call.
          • The poll() waits until the program gets something useful back from the server.
          • If you look in the send() call, you can see the name "dcs.nac.uci.edu" being shipped off to the nameserver for a lookup. Each label of the name is preceded by its length (\3), which is how names are encoded in a DNS query.
          • In the recvfrom(), we can again see the hostname "dcs.nac.uci.edu" showing up. And if we were to extend the length of the strings strace prints, with strace -s 1024, we'd most likely be able to see dcs.nac.uci.edu's IP address in the string, with each byte shown as an octal escape.
        • Here's the relevant part of the trace
          • 12950 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
          • 12950 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("128.200.1.201")}, 28) = 0
          • 12950 fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
          • 12950 fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
          • 12950 gettimeofday({1110855013, 217602}, NULL) = 0
          • 12950 poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
          • 12950 send(4, "=\205\1\0\0\1\0\0\0\0\0\0\3dcs\3nac\3uci\3edu\0\0\1\0"..., 33, 0) = 33
          • 12950 poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
          • 12950 ioctl(4, FIONREAD, [181]) = 0
          • 12950 recvfrom(4, "=\205\205\200\0\1\0\1\0\3\0\4\3dcs\3nac\3uci\3edu\0\0\1"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("128.200.1.201")}, [16]) = 181
          • 12950 close(4) = 0
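To make the send() string above easier to read, here is a minimal Python sketch that pulls the hostname back out of it. It assumes the standard DNS layout (a 12-byte header followed by length-prefixed labels) and does not handle compressed names; the function name dns_question_name is made up for this illustration. strace's octal escapes happen to be valid in a Python bytes literal, so the string can be pasted in as is.

```python
def dns_question_name(payload: bytes, offset: int = 12) -> str:
    """Decode the first question name in a raw DNS message.

    Assumes the name starts right after the 12-byte DNS header and is
    stored as length-prefixed labels ending in a zero byte.
    """
    labels = []
    while True:
        length = payload[offset]
        if length == 0:
            break  # a zero-length label ends the name
        if length >= 0xC0:
            # Compression pointers show up in replies; not handled here.
            raise ValueError("compressed name; not handled in this sketch")
        offset += 1
        labels.append(payload[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels)


# The string from the send() line of the trace, pasted as a bytes literal.
query = b"=\205\1\0\0\1\0\0\0\0\0\0\3dcs\3nac\3uci\3edu\0\0\1\0"
print(dns_question_name(query))  # prints dcs.nac.uci.edu
```

The same function should work on the start of the recvfrom() string too, since the reply repeats the question at the same offset.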
      2. Here's an example of a TCP socket.
        • Comments on the trace
          • In this trace, there were a large number of more or less irrelevant system calls between the socket() and connect() calls. Hence you can see why it might be helpful to search backward for the place where file descriptor 3 was created.
          • We can tell that the socket is TCP by the SOCK_STREAM in the socket() call.
          • You can see from the connect() call that we're trying to connect to TCP port 3002 on IP address 128.200.34.32, AKA dcs.nac.uci.edu, and that the connection succeeded, based on the return value of 0.
        • The trace itself
          • 12950 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
          • ...large gap...
          • 12950 connect(3, {sa_family=AF_INET, sin_port=htons(3002), sin_addr=inet_addr("128.200.34.32")}, 16) = 0
          • 12950 recv(3, "This is fallback-reboot 0.99; cr"..., 4096, 0) = 111
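The backward search for a file descriptor's creation can also be scripted when the trace is too large to page through comfortably. The following is a rough Python sketch, not a feature of any of the tracers: the function name find_fd_creation and the list of fd-creating syscalls are assumptions made for this example, and the line format is assumed to match the strace output shown above.

```python
import re

# Syscalls assumed (for this sketch) to return a new file descriptor.
FD_CREATORS = {"open", "openat", "creat", "socket",
               "accept", "accept4", "dup", "dup2", "dup3"}

# Optional pid, syscall name, arguments, then a plain integer return value.
CALL_RE = re.compile(r"^(?:\d+\s+)?(\w+)\(.*\)\s*=\s*(\d+)\s*$")


def find_fd_creation(lines, fd, before=None):
    """Return the index of the last line before `before` that created `fd`.

    Returns None if no such line is found. Unlike a bare "?= 3$" search,
    this skips calls such as read() that merely happen to return 3.
    """
    end = len(lines) if before is None else before
    for i in range(end - 1, -1, -1):
        m = CALL_RE.match(lines[i])
        if m and m.group(1) in FD_CREATORS and int(m.group(2)) == fd:
            return i
    return None


trace = [
    '12950 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3',
    '12950 connect(3, {sa_family=AF_INET, sin_port=htons(3002), '
    'sin_addr=inet_addr("128.200.34.32")}, 16) = 0',
    '12950 recv(3, "This is fallback-reboot 0.99; cr"..., 4096, 0) = 111',
]

# Starting from the recv() on line index 2, find where fd 3 came from.
idx = find_fd_creation(trace, 3, before=2)
print(trace[idx])
```

Searching backward from the stuck call matters because descriptor numbers get reused after a close(), so the most recent creation is the one that counts.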
  • Applying this to Apache httpd
  • Some quick notes on terminology:
  • Sort of similar things to possibly document in the future:
    1. ltrace/sotruss
    2. Dynamic Probes/Linux Trace Toolkit/DTrace



    Timestamp: 2024-03-28 07:18:22 PDT

    Back to Dan's tech tidbits
