Definitions of some terms that describe network performance.
Analogies to human transportation vehicles, like cars or buses, are sprinkled throughout.
Bandwidth is a measure of how many units of data can pass
between two machines on a network per unit time. Examples include
"megabits per second" and "gigabytes per hour". It is analogous to how rapidly cars can pass through an
intersection in a given
unit of time.
Latency is a measure of how long it takes for a single
packet to get from machine A to machine B. It is
somewhat analogous to how long it takes the first vehicle at a red
light to
accelerate back up to full speed from a full stop. The more your
protocol relies on "send a packet, wait for an acknowledgement
packet, repeat as needed", the more latency will slow things
down. NFS is a protocol that suffers badly from high latency (at least until it
becomes intent-based or similar). High network latency is
somewhat analogous to a street
light that changes color frequently, which means more time spent
waiting for vehicles to accelerate.
MTU is the largest size of a packet as it passes across a
single network link. It is somewhat analogous to
the maximum number of passengers a vehicle can carry: motorcycles, cars, buses,
planes...
Path MTU is the largest size of a packet as it passes
across all network links between two machines on a network. It is
somewhat analogous to the smallest vehicle capacity encountered
on someone's commute. For example, if you start out by taking a
van pool to a train station, and from the final train station take
a bus to your office, then the "Path MTU" might be the maximum
number of passengers possible in the van pool - because it would
typically have the smallest maximum number of passengers.
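To put that in code, here is a minimal sketch (the per-link MTU values are made up) showing that the Path MTU is simply the smallest MTU of any link along the route:

    # Hypothetical per-link MTUs along a route from machine A to machine B:
    # 9000 = jumbo-frame ethernet, 1500 = standard ethernet, 1492 = a PPPoE link.
    link_mtus = [9000, 1500, 1492, 9000]

    # The Path MTU is limited by the most restrictive link.
    path_mtu = min(link_mtus)
    print(path_mtu)  # 1492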
Jumbo frames are larger MTUs (and perhaps Path
MTUs as well), beyond what the traditional ethernet standard
indicates should be used. You might think of jumbo frames
as being kind of like a train or an ocean liner.
Block size is the size of the chunks of data as they are written by
an application and passed into an operating system's network stack.
The operating system is then free to chop up your blocks into smaller
pieces, to shrink them down to the Path MTU, or to aggregate your
packets into fewer, larger packets to bring them up to the Path MTU.
(Older systems may use the plain MTU instead of the Path MTU, and let
network equipment further along the path sort out MTU changes as
needed, but this is often going to be slower). Block size is a little bit like how
quickly people can board a vehicle. For example, you might have a
car that allows 2 or 4 people to get in at a time, or a bus that
only allows one person to get in at a time. So even though the
bus can transport more people at a time, people board it more
slowly. So if you have a super mag-lev train that goes ultra
fast, the trip might still be slow if only one person can board or
exit the train at a time. Some operating
systems will tend to emit only packets whose data portion is at
most the size of your block size - hence an
application that writes lots of tiny blocks may not be able to
make good use of larger (Path)
MTUs. Ironically, protocols that perform well with standard
ethernet frames may suddenly appear to be poor performers with
jumbo frames, if the application does not know how to write blocks
that are larger than the data portion of a standard
ethernet frame. For example, with standard ethernet frames,
rsh will sometimes outperform NFS. However, NFS is sometimes
better able to make use of jumbo frames, so it may be the faster of
the two on networks with jumbo frames.
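To make the block-size idea concrete, here is a hedged Python sketch (the host, port, and sizes are placeholders, not anything from the text above) contrasting many tiny writes with a few large ones; the kernel sees the same total data either way, but on some systems it can only build large packets out of large writes:

    import socket

    # Placeholder destination for the sketch; not a real service.
    HOST, PORT = "192.0.2.10", 5001
    payload = b"x" * (1 << 20)          # 1 MiB of test data

    def send_in_blocks(block_size):
        """Hand the same payload to the kernel in writes of the given size."""
        with socket.create_connection((HOST, PORT)) as s:
            for offset in range(0, len(payload), block_size):
                s.sendall(payload[offset:offset + block_size])

    send_in_blocks(64)     # many tiny writes: hard to fill large (Path) MTUs
    send_in_blocks(65536)  # large writes: the kernel can carve out MTU-sized packets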
The Nagle algorithm is used on TCP sockets to improve performance,
but it won't always be faster. It only pertains to small packets.
When Nagle is enabled, your small TCP packets won't be sent ASAP
after your application passes them into the kernel's TCP handling.
Instead, if the packet is small, the kernel will hold your data in a
buffer and wait a little while to see if more is
coming. After either enough data arrives in that buffer (filling
out a full MSS worth of data) or the previously sent data is
acknowledged, your data will be sent to its destination. You can think of this as being a bit like a stop light that
has a sensor to tell it how many cars are waiting, and is optimized to
wait a little bit longer before going green if there aren't that many cars
waiting for it from that direction.
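As a small, hedged example (the address is a placeholder), this is how an application that cares more about per-packet latency than about coalescing small writes can turn the Nagle algorithm off:

    import socket

    # Nagle is on by default for TCP sockets; TCP_NODELAY disables it, so
    # small writes go out immediately instead of being buffered briefly.
    s = socket.create_connection(("192.0.2.10", 5001))   # placeholder address
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.sendall(b"small, latency-sensitive message")
    s.close()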
Assessing network performance
Reliability
Everyone's first network analysis tool: ping. It comes with just about every network
operating system conceived.
traceroute is also fairly common, and can often show the first
hop that is having problems between two machines.
pchar (for more on pchar, see the section on pchar under
"Bandwidth", below)
mtr and xmtr are very nice for determining which link in a
multi-router connection is dropping packets, or is unpingable, but
I believe that pchar does this, plus much more. However, mtr and
xmtr are more likely to come with a linux distribution.
Bandwidth measurement
A tool that reports bandwidth, hop by hop, can be very useful, but
these tools are a bit heuristic, so don't trust them overmuch.
However, when such tools are working as intended, they can give you the
bandwidth of each hop between two machines, across (potentially)
multiple routers (IE, if there are n machines in the path - 2
endpoints and n-2 routers - then such a tool should give you
information about all n-1 links, as well as summary information).
pathchar, the original, by Van Jacobson. Please mention it if you
determine otherwise, but it appears that pathchar used to be available
in source form; now you can only get precompiled binaries for a few
platforms.
pchar is an easy build on linux and solaris, and
perhaps other *ixes as well. So far, it's worked well on most of
the networks I've tried it on, but not the optiputer.
You can find patches for pchar to get it building on a modern (or archaic) linux
here.
From the author of pchar:
If you Google for papers and/or software by Constantinos Dovrolis or
Allen Downey, I believe they both have some alternatives, as well as
a more coherent explanation of what problems you're probably seeing.
What I dug up in response so far: This
is an excellent link about a handful of pchar-like programs.
pipechar - this one seems to work well on our "optiputer" network,
but it also seems to be binary-only.
clink - Source is available, but it seems very linux-specific, at least in its current
incarnation
Another excellent
URL. "bing" looked especially interesting to my eye.
iperf is very nice, but tends to lean toward the "theoretical
result" side - IE, it'll often report numbers quite a lot faster than
real-world performance. To use it, build an iperf binary on each of
two machines, then run "iperf -s" on machine A, and "iperf -c A" on the
other. You can run it multiple times, or increase the run
length, to get more stable numbers.
ntop can give a breakdown of what sorts of traffic are on the
LAN adjacent to the machine that ntop is running on.
My "reblock" program can measure
real-world performance pretty well, especially if you give
it a large block size:
My "pnetcat" program can be used for testing
network performance with varied blocksizes and window sizes, via TCP or
UDP. Because it's in python, it's extremely easy to tweak for exactly the
sort of testing your situation requires.
Latency
ping, mtr and pchar can all measure latency in some
sense, but how much latency actually matters is
very protocol-specific (a rough model of why is sketched after the examples below).
Protocols that are not very subject to latency slowdown:
Imagine a protocol that is able to stream large packets of
data and expects confirmation packets only when there is an
error.
Although not an ethernet or IP protocol, zmodem (which is
for serial transmission) is a good example of a
protocol that is not slowed down much by high latencies. It
has very little "back and forth" unless there are errors.
VNC is another example of a protocol
that is not very subject to slowdown due to high latency - at
least, not relative to raw X11 on a high latency network. This
is because VNC pre-renders the highly-back-and-forth X11
traffic down to a simple, in-memory bitmap, and then only
passes the
portions of that bitmap that have changed in a series of
bitmap-rectangles via the network. How much VNC helps by
reducing the impact of latency can be somewhat toolkit-specific.
For example, VNC seems to
pay off for Motif applications (like old versions of
Netscape), while simple GTK+ applications appear to be less
subject to latency issues.
Protocols that are slowed down by latency a lot:
Imagine a protocol that sends a huge number of tiny packets and
waits until receiving an acknowledgement packet before sending
the next tiny packet.
NFS with a small rsize and wsize is a good example of a
protocol that will be slowed down greatly by high latencies.
As mentioned above, raw X11 can be very slow when used over
a high latency network link, compared to VNC.
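Here is the rough model referred to above - a hedged, back-of-the-envelope sketch (all of the numbers are invented) of why acknowledging every small chunk multiplies the cost of latency:

    # Total transfer time is roughly serialization time plus one round trip
    # per request/acknowledgement exchange.  A toy model, not a benchmark.
    def transfer_time(total_bytes, bandwidth_bytes_per_s, rtt_s, bytes_per_round_trip):
        round_trips = total_bytes / bytes_per_round_trip
        return total_bytes / bandwidth_bytes_per_s + round_trips * rtt_s

    # 100 MB over a 100 Mbit/s link (12.5 MB/s) with a 50 ms round-trip time:
    chatty   = transfer_time(100e6, 12.5e6, 0.05, 8192)  # wait for an ack every 8 KB
    streamed = transfer_time(100e6, 12.5e6, 0.05, 1e6)   # wait for an ack every 1 MB
    print(round(chatty), "vs", round(streamed))          # roughly 618 vs 13 seconds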
Finding a network link's path MTU
tracepath is a very convenient way of determining the "Path MTU"
between two machines.
A more effective way of finding your Path MTU than tracepath is
to select some protocol that you want to see using your desired
MTU size, send a lot of data across that protocol, and analyze the
traffic with ethereal, tethereal, tcpdump, snoop, or whatever you
prefer. You can expect some of the packets to be well below your
desired Path MTU, but if some of the packets are of the desired
size, then you're at least theoretically able to get the Path MTU
you want. Beyond that, there may still be application issues.
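As a hedged, Linux-specific sketch (the numeric constants come from <linux/in.h>, since the Python socket module does not always expose them by name, and the address is a placeholder), you can also ask the kernel directly for its current Path MTU estimate on a connected socket:

    import socket

    IP_MTU_DISCOVER = 10   # setsockopt: control per-socket path MTU discovery
    IP_PMTUDISC_DO  = 2    # always set the Don't Fragment bit
    IP_MTU          = 14   # getsockopt: read the kernel's current path MTU

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.connect(("192.0.2.10", 9))      # "connecting" a UDP socket just picks a route
    try:
        s.send(b"x" * 9000)           # an oversized datagram fails locally with DF set
    except OSError:
        pass
    print(s.getsockopt(socket.IPPROTO_IP, IP_MTU))   # kernel's current Path MTU estimate
    s.close()

Keep in mind that the value reported is only the kernel's current estimate; it may start out as the first-hop MTU and shrink later as ICMP "fragmentation needed" messages come back from routers along the path.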
Improving network performance
First assess where the network bottlenecks are (see above, perhaps
especially under "pchar")
Then measure how well your required application is performing,
possibly by having it write data to localhost and measuring
throughput (see
above, perhaps especially under "ntop", but "reblock" may be
helpful in some cases as well, particularly if you're working with
pipes)
Is it the application that's slow, or the network?
If pchar (or another measure) shows that much better throughput is
possible than your application is getting, then you likely need to
tune that application
You may be able to improve application performance by
just convincing it to use larger block sizes
Another possibility, particularly if your network
performance is suffering from high latency, might be to
modify the application to use "intent-based" data transfers.
"Intent-based" transfers are basically a series of "do
something; if it works, do the next something", where you
bundle up a bunch of these conditional somethings into a
single, large block. You may not end up actually
performing all of the somethings, which means some data is
transferred needlessly in a sense, but the reduction in round
trips can still be a net performance gain (see the sketch after this list).
Perhaps especially if you have old equipment involved that
doesn't understand Path MTU discovery, you may find that making
your application write data in integral multiples of the
data portion of your Path MTU will speed up performance quite a
bit - but some would call this "outdated thinking". (For example,
with a 1500-byte Path MTU and typical 20-byte IP and 20-byte TCP
headers, the data portion of a TCP segment is 1460 bytes.) Anyway,
if you want to do this, a sniffer like ethereal is probably the
best way to figure out what block size to use.
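Here is the sketch referred to above - a purely illustrative, hypothetical example (the "server" object and its methods are not a real API) of collapsing a chatty sequence of round trips into one intent-based bundle:

    # Chatty version: one round trip (and one dose of latency) per step.
    def copy_file_chatty(server, path, data):
        if server.exists(path):          # round trip 1
            server.remove(path)          # round trip 2
        handle = server.create(path)     # round trip 3
        server.write(handle, data)       # round trip 4
        server.close(handle)             # round trip 5

    # Intent-based version: bundle the conditional steps into one large request;
    # the server performs each step only if the previous one succeeded.
    def copy_file_intent(server, path, data):
        server.submit([
            ("remove_if_exists", path),
            ("create", path),
            ("write", path, data),
            ("close", path),
        ])                               # one round trip for the whole bundle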
If pchar (or another measure) is showing much worse throughput than
the application you require gets on the first hop (or
better, than it gets using "localhost" as the
destination), then you probably have a network performance issue
Examine each network hop, and see if you can improve the
slowest hop. You can reapply this procedure iteratively, until
the network isn't slower than the application, or the budget
dictates that further improvements aren't practical.
If you are on a gigabit network (or better), you may be
able to squeeze out better performance by turning on Jumbo
frames. This is sometimes a no-extra-budget performance
boost, especially if your application is already using large
blocks. However, it may turn out that one or more machines
in your network path do not support jumbo frames, in which
case you would need to upgrade those pieces of equipment.
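As a hedged, Linux-specific sketch, you can at least check what MTU each local interface is currently configured for; actually raising it (for example with "ip link set dev eth0 mtu 9000") requires root and support from every NIC and switch in the path:

    import glob, os

    # Read the configured MTU of each local interface from sysfs (Linux only).
    # 1500 is standard ethernet; roughly 9000 is a common jumbo-frame setting.
    for path in sorted(glob.glob("/sys/class/net/*/mtu")):
        iface = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            print(iface, f.read().strip())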
Some nice links from prg on comp.os.linux.networking: