reblock is a bit like dd, but it appears to work better when transferring data over a network to a tape drive, because it makes more of an effort to keep block sizes consistent. Why not just use "tar -b" or "dd obs"? Because tar -b is a producer, and if you write what it produces over a network via a pipe, your blocking may get messed up. As for "dd obs", it doesn't appear to know how to wait for the remainder of a block to arrive. reblock goes out of its way to ensure that the blocks it outputs are consistently sized, and it doesn't rush to write out data before a complete block has been received.
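To make the "wait for the remainder of a block" idea concrete, here is a minimal sketch of that kind of loop. It is not reblock's actual source; the names read_full_block, reblock and TrickleReader are made up for illustration, and TrickleReader is just a stand-in for a pipe that hands back short reads. Unlike what the "-p" change note describes as reblock's default, this sketch leaves a short final block unpadded.

```python
import io


def read_full_block(stream, block_size):
    """Keep reading until block_size bytes are collected or the input ends."""
    chunks = []
    remaining = block_size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of input: return whatever we have
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def reblock(instream, outstream, block_size):
    """Copy instream to outstream one whole block per write; return the write sizes."""
    sizes = []
    while True:
        block = read_full_block(instream, block_size)
        if not block:
            break
        outstream.write(block)
        sizes.append(len(block))
    return sizes


class TrickleReader:
    """Hypothetical stand-in for a pipe: returns at most `limit` bytes per read."""

    def __init__(self, data, limit):
        self._data = data
        self._limit = limit

    def read(self, n):
        chunk = self._data[:min(n, self._limit)]
        self._data = self._data[len(chunk):]
        return chunk


# 25 bytes arriving 3 at a time still go out as 10-byte blocks plus a short tail.
out = io.BytesIO()
sizes = reblock(TrickleReader(b"x" * 25, 3), out, 10)
```

The point of the inner loop is that a short read never turns into a short write; only the end of the input can produce a block smaller than the requested size.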

However, perhaps reblock's more common purpose now is getting a running tally of transfer throughput, as in these two examples. The first shows reblock where the file size is available via fstat, and the second shows reblock where file size information is not available via fstat.

Please note that you only need to provide an estimate of the transfer size if both of the following are true:

  1. You want an estimate of how much longer the transfer will take
  2. reblock cannot automatically use os.fstat() to determine how large the file you're transferring is. Usually, os.fstat will be able to get the size of a file, but not a pipe.
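The file-versus-pipe distinction in the list above can be sketched with os.fstat directly. This is only an illustration of the check, not reblock's own code, and transfer_size is a hypothetical helper name.

```python
import os
import stat


def transfer_size(fd):
    """Return the size in bytes if fd refers to a regular file, otherwise None."""
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        return st.st_size
    # Pipes, sockets and terminals have no meaningful st_size.
    return None
```

When this returns None (for example, when standard input is a pipe), a completion-time estimate is only possible if you supply the expected transfer size yourself.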

Usage is like:

reblock just writes a line of statistics followed by a carriage return, to give a running tally of how things are going. The line will often be longer than 80 characters though, so if you see reblock scrolling really fast, you probably need to resize your terminal emulator (xterm, rxvt, gnome-terminal, konsole, etc.) to be wide enough to hold the entire line.
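The carriage-return trick can be sketched as follows. The fields and their wording are invented for illustration and are not reblock's real output format. Writing to stderr is also an assumption, as are the names format_status and show_status.

```python
import sys


def format_status(bytes_done, elapsed, total=None):
    """Build a one-line tally: bytes so far, throughput, and time left if total is known."""
    rate = bytes_done / elapsed if elapsed > 0 else 0.0
    line = "%d bytes, %.1f bytes/s" % (bytes_done, rate)
    if total is not None and rate > 0:
        remaining = max(total - bytes_done, 0) / rate
        line += ", about %.0f s left" % remaining
    return line


def show_status(line, out=sys.stderr):
    """Write the tally followed by a carriage return (no newline), so the
    next tally overwrites it instead of scrolling."""
    out.write(line + "\r")
    out.flush()
```

Because the cursor only returns to column 0 of the current row, a line wider than the terminal wraps onto the next row and the overwrite no longer lines up, which is why a narrow terminal appears to scroll.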

Known bugs:

  1. Mon Jan 31 17:14:20 PST 2005: reblock doesn't seem to work quite right on a Fedora Core 3 for x86-64 system. I suspect it's a bug in the version of Python that comes with this distribution. More specifically, the bug appears to be that on large data transfers, reblock gets stuck. The producer process continues writing to reblock, but reblock stops giving statistics and stops writing data.
  2. If you control-Z reblock, and leave it that way for a while, the throughput measure will get all messed up, as will the completion time estimate. But the timeout doesn't occur.

Future possibilities:

  1. It'd be pretty cool to have a pygtk mode, so that the filter's output isn't scribbled on by reblock... Also, with this kind of presentation, you could get a lot more sophisticated about what sorts of information you present.
  2. It'd be nice to be able to specify a file to feed to gzip or bzip2, in order to get an estimate of an archive's compression ratio... Or even an option that accepts a list of files on stdin, and a percentage of the files to compress for the estimate. We could then use that list of files as input to tar/gtar/whatever.

  • This software is owned by The University of California, Irvine, and is not distributed under any version of the GPL. GPL is a fine series of licenses, but the owners of the software need it to be distributed under these terms.

    You can download it here.

    Recent changes:

  • Thu Apr 21 12:47:35 PDT 2005: modified to allow progress status via estimate with byte counts larger than 2^32.
  • Mon May 9 16:14:29 PDT 2005: Added a "-p" flag that tells reblock not to pad the final block with null bytes, even if it wasn't exactly the block size specified by the user in length.
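The padding behaviour that "-p" turns off can be sketched like this. It is a guess at the logic based on the change note above, not the actual implementation, and finish_block is a hypothetical name.

```python
def finish_block(block, block_size, pad=True):
    """Return the final block, null-padded up to block_size unless pad is False."""
    if pad and 0 < len(block) < block_size:
        return block + b"\0" * (block_size - len(block))
    return block
```

With pad=True every write has the same length, while pad=False (the "-p" case) keeps the output byte-for-byte identical to the input.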


