To troubleshoot an NFS mounting problem (the steps are NOT in any particular order!):

  1. If you're automounting, try static mounting of the same filesystem, to a different mount point, like /mnt/nfs or /mnt. It'll probably give more useful error messages.
  2. Try a sniffer, like ethereal (since renamed Wireshark), tethereal (since renamed tshark), snoop or tcpdump -v. Look for NFS or RPC errors in the sniffer output.
  3. Try truss/strace/par/trace against rpc.mountd. You probably don't want to do this with nfsd - it tends to just sit in kernel space all the time.
  4. Check your logs.
  5. Please see this URL if you're experiencing an NFS timeout.
  6. Make sure you're exporting to and mounting from an FQDN. Sometimes weird things happen when you use short hostnames.
  7. Try exporting with "insecure", in case you have a host checking for a specific port range. Alternatively, see if you can persuade the host that's not using reserved ports to use reserved ports; e.g., on AIX, this can be done with:
  8. Make sure the user doing the NFS mount isn't in too many groups. If you're in a large number of groups, NFS mounts can fail, seemingly inexplicably. You can usually check your group count with the "id" command. If the count is above some OS-specific threshold (most likely 8, 16 or 32), then NFS may refuse the mount.
  9. Try unexporting everything, and reexporting.
  10. Try completely shutting down NFS and restarting it.
  11. Make sure there isn't a firewall blocking some important traffic. Sometimes even NFS clients will require accepting some incoming traffic, initiated by the server. This command can be very useful for this: It may or may not help to add -p1-65535 to the options.

    I suggest running this on the server against the server, on the server against the client, on the client against the client, and on the client against the server, and then comparing the results. The runs against the client should be the same, and the runs against the server should be the same. If something is getting blocked over the network that isn't blocked via localhost, then you can be fairly confident that there's a firewall or some other network problem blocking some traffic.

    You can expect the server to have greater RPC service requirements than the client. The client, if it is also an NFS server, may have the same RPC services registered, but usually an NFS client will only use a proper subset of the RPC services found on an NFS server (it may even be a set of size 0 :).

  12. If you're automounting, and you have static mounting working, there are two scenarios to consider:
    1. On systems that have both automount and automountd programs, automountd is the daemon, and automount is a program that is supposed to make automountd notice changes in its maps.
    2. On systems that only have an automount program, automount is the daemon, and you need to kill and restart it (without using the -9 signal!) to make it see changes.
  13. Are all of the relevant daemons running? You probably want something like the following in rpcinfo -p:
       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100021    1   udp  32775  nlockmgr
        100021    3   udp  32775  nlockmgr
        100021    4   udp  32775  nlockmgr
        100021    1   tcp  32768  nlockmgr
        100021    3   tcp  32768  nlockmgr
        100021    4   tcp  32768  nlockmgr
        100024    1   udp  32776  status
        100024    1   tcp  32769  status
        100011    1   udp    671  rquotad
        100011    2   udp    671  rquotad
        100011    1   tcp    690  rquotad
        100011    2   tcp    690  rquotad
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100005    1   udp    693  mountd
        100005    1   tcp    708  mountd
        100005    2   udp    693  mountd
        100005    2   tcp    708  mountd
        100005    3   udp    693  mountd
        100005    3   tcp    708  mountd
    
    (the numbers in the left column are more significant than the names in the right column)

    From there, you can probably get to the daemon names using netstat -ap and/or lsof.

    Make sure that the actual daemon names sound NFS-related; sometimes a non-RPC program will steal a port that rpcbind/portmap thought it could allocate - but couldn't.

    Alternatively, you can just run my rpc-health script - but note that it won't detect missing services, only services that are registered but not responding to a minimal test.

  14. Try the mount with TCP or UDP, whichever you haven't tried already. TCP should be better on long hauls or flakey networks, and UDP should be better on close, reliable networks. But if one isn't working, go ahead and try the other anyway.
  15. Are you using a flakey version of NFS? EG, are both of the systems that cannot communicate via NFS using the still-rough NFSv4 (Wed Feb 23 14:16:34 PST 2005)? IIRC, idmap is indicative of NFSv4 on a Fedora Core 3 system. NFSv4 reportedly worked better in FC2 than it does in FC3, though yum -y update may have changed that by now. It's probably worth it to try at least NFS v2 and v3, and maybe v4 as well.
  16. Try a different blocksize for read and/or write. 8192 is a good number to try, if you haven't yet (most systems default to this). 8192 is -not- always optimal though. Some Sun systems used to crash if you used a blocksize of 32768. Also, some Linux systems default to 1024, which is a good choice on particularly flakey networks, or when you're stuck with a poor network card.
  17. Can you mount a different filesystem from the NFS server, but not the one you want?
  18. Are there permissions on the -mount-point-, underneath a mounted filesystem, that are confusing matters? I once saw an NFS problem that turned out to be due to this on a SunOS 4.1.x system.
  19. Do you have a firewall that is blocking ICMP packets inappropriately? Some ICMP message types are hazardous, but others can be essential to non-flakey network communication.
  20. Check showmount -e nfs.server.com to see what the server says it is exporting, and to whom.
  21. Check your netgroups, and NIS in general, if you're exporting to netgroups. Also try removing the netgroup export temporarily, and exporting directly to the host that needs access but isn't getting it.
  22. If you have a large number of mounts, and suddenly subsequent mounts start failing, and the same thing happens after a reboot, you may be running out of privileged ports.
  23. If you run man for each NFS daemon in turn, do they have an option for cranking up verbosity? If so, and you've gotten this far, you may as well try it. :)
  24. Linux: Try enabling debugging facilities and checking for errors:
  25. Linux: Run this once while the NFS server is working, and then again when the NFS server is having problems:
  26. Post to any and all relevant mailing lists and newsgroups :) Do this sequentially, not in parallel, so that the people you want help from don't get annoyed by reading the same message over and over again. Do not cross-post.
  27. Call the relevant vendors :)
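
For item 13, checking the rpcinfo -p listing by eye gets tedious if you have several hosts. Here is a minimal sketch in Python that reports which RPC program numbers are missing from a listing. The EXPECTED table is my assumption, taken from the sample listing in item 13 (rquotad is left out on the assumption that not every server needs it), and the function names are made up for illustration. It matches on the numbers in the left column, not the names, for the reason given in item 13.

```python
# Sketch: find expected RPC programs that are absent from `rpcinfo -p` output.
# EXPECTED is an assumption based on the sample listing in item 13;
# adjust it for your own servers.
EXPECTED = {
    100000: "portmapper",
    100003: "nfs",
    100005: "mountd",
    100021: "nlockmgr",
    100024: "status",
}


def registered_programs(rpcinfo_text):
    """Return the set of (program_number, proto) pairs found in the listing."""
    found = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port [name].
        # The header row is skipped because its first field is not a number.
        if len(fields) >= 4 and fields[0].isdigit():
            found.add((int(fields[0]), fields[2]))
    return found


def missing_programs(rpcinfo_text, expected=EXPECTED):
    """Return {program_number: name} for expected programs that never appear."""
    present = {prog for prog, _proto in registered_programs(rpcinfo_text)}
    return {num: name for num, name in expected.items() if num not in present}


# A shortened listing with mountd and nlockmgr deliberately left out.
SAMPLE = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp  32776  status
"""

if __name__ == "__main__":
    print(missing_programs(SAMPLE))  # {100005: 'mountd', 100021: 'nlockmgr'}
```

To use it for real, feed it the text from rpcinfo -p against the host in question instead of SAMPLE. Like the rpc-health script, this only looks at registrations; unlike it, this catches services that are missing but not ones that are registered and unresponsive.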

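For item 8, the group count can also be checked from a script rather than by counting the output of "id" by hand. This is only a sketch: the thresholds are the three candidates named in item 8, which one applies to your OS is something you'd have to find out, and the function names are hypothetical. It counts the groups of the process running the script, so run it as the user doing the mount.

```python
import os

# Candidate limits named in item 8; which one applies is OS-specific.
THRESHOLDS = (8, 16, 32)


def exceeded_thresholds(group_count, thresholds=THRESHOLDS):
    """Return the thresholds that group_count is strictly above."""
    return [t for t in thresholds if group_count > t]


def current_group_count():
    """Count the distinct group IDs of this process (effective plus supplementary)."""
    return len(set(os.getgroups()) | {os.getegid()})


if __name__ == "__main__":
    count = current_group_count()
    print(count, "groups; above thresholds:", exceeded_thresholds(count))
```

An empty list means the user is at or under all three candidate limits, so the group count is probably not the problem.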


