reblock

reblock (download here) is a bit like dd, but it appears to work better when transferring data over a network to a tape drive, because it makes more of an effort to keep block sizes consistent. Why not just use "tar -b" or "dd obs"? Because tar -b is a producer, and if you send what it produces over a network via a pipe, your blocking may get messed up. As for "dd obs", it doesn't appear to know how to wait for the remainder of a block to arrive. reblock goes well out of its way to ensure that the blocks it outputs are consistently sized, and it doesn't rush to write out data before a complete block has been received.
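To make the "wait for the remainder of a block" idea concrete, here is a minimal sketch in Python. It is not reblock's actual source: the names DribblingReader, read_full_block and iter_blocks are made up for illustration, and the sketch leaves out the timeout handling that the real tool has.

```python
import io


class DribblingReader:
    """Hypothetical stand-in for a pipe: returns at most max_chunk bytes per read."""

    def __init__(self, data, max_chunk):
        self._stream = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, size):
        return self._stream.read(min(size, self._max_chunk))


def read_full_block(stream, blocksize):
    """Keep reading until blocksize bytes have arrived, or the input ends."""
    chunks = []
    remaining = blocksize
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # End of input: return whatever partial block we have.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def iter_blocks(stream, blocksize=262144):
    """Yield full-sized blocks; only the final block may be short."""
    while True:
        block = read_full_block(stream, blocksize)
        if not block:
            return
        yield block
```

With a reader that hands back only 3 bytes at a time and a block size of 8, every block except possibly the last still comes out 8 bytes long, which is the property the paragraph above describes.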

However, perhaps reblock's more common purpose now is getting a running tally of transfer throughput, as in the two examples below. The first shows reblock where the file size is available via fstat, and the second shows reblock where the file size is not available via fstat.

Note: On 2014-08-04 I rewrote reblock for compatibility with both CPython 3.x and 2.x, without 2to3 or a similar tool. Previously, reblock only worked with CPython 2.x. The rewrite also changed the interface a bit. The interface changes mostly reflect the fact that I haven't used a tape drive in years, and that reblock is mostly used for getting progress out of a pipeline today. It also provides reasonable defaults for the blocksize and timeout, rather than making you specify them.

$ reblock < /tmp/gevent\ -\ asynchronous\ I_O\ made\ easy-0wpYQr-_kqg.mp4 > /dev/null
blockno 0: 0.3% of 69737460, ETA: Fri Aug  8 16:28:42 2014, Rate: 7.5g/s
blockno 1: 0.7% of 69737460, ETA: Fri Aug  8 16:28:42 2014, Rate: 3.8g/s
blockno 2: 1.1% of 69737460, ETA: Fri Aug  8 16:28:42 2014, Rate: 3.1g/s
blockno 3: 1.5% of 69737460, ETA: Fri Aug  8 16:28:42 2014, Rate: 3.4g/s
blockno 4: 1.8% of 69737460, ETA: Fri Aug  8 16:28:42 2014, Rate: 3.8g/s
blockno 5: 2.2% of 69737460, ETA: Fri Aug  8 16:28:42 2014, Rate: 4.1g/s
...

$ du -sk .
5909400 .

$ tar cflS - . | reblock -e $((5909400*1024)) > /dev/null
blockno 0: 0.0% of 6051225600, ETA: Fri Aug  8 16:30:51 2014, Rate: 1.8g/s
blockno 1: 0.0% of 6051225600, ETA: Fri Aug  8 17:03:21 2014, Rate: 24.4m/s
blockno 2: 0.0% of 6051225600, ETA: Fri Aug  8 16:58:33 2014, Rate: 28.6m/s
blockno 3: 0.0% of 6051225600, ETA: Fri Aug  8 16:54:32 2014, Rate: 33.4m/s
blockno 4: 0.0% of 6051225600, ETA: Fri Aug  8 16:54:08 2014, Rate: 34.0m/s
blockno 5: 0.0% of 6051225600, ETA: Fri Aug  8 16:54:30 2014, Rate: 33.4m/s
blockno 6: 0.0% of 6051225600, ETA: Fri Aug  8 16:52:35 2014, Rate: 36.3m/s
blockno 7: 0.0% of 6051225600, ETA: Fri Aug  8 16:47:11 2014, Rate: 48.0m/s
blockno 8: 0.0% of 6051225600, ETA: Fri Aug  8 16:46:25 2014, Rate: 50.3m/s
blockno 9: 0.0% of 6051225600, ETA: Fri Aug  8 16:43:20 2014, Rate: 62.3m/s
blockno 10: 0.0% of 6051225600, ETA: Fri Aug  8 16:40:31 2014, Rate: 79.7m/s
...
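The percentage, rate and ETA columns in the transcripts above can be derived from three numbers: bytes transferred so far, the expected total, and elapsed time. The following is only a sketch of that arithmetic under the assumption of a simple average rate since the start; the function name progress is hypothetical, and reblock itself may compute these values differently.

```python
import time


def progress(bytes_done, total_bytes, start_time, now):
    """Return (percent complete, bytes per second, ETA as an epoch time or None)."""
    elapsed = float(now - start_time)
    rate = bytes_done / elapsed if elapsed > 0 else 0.0
    percent = 100.0 * bytes_done / total_bytes
    if rate > 0:
        # Remaining bytes divided by the average rate gives seconds still to go.
        eta = now + (total_bytes - bytes_done) / rate
    else:
        eta = None
    return percent, rate, eta


# The size estimate passed to -e in the second example: du -sk reports kilobytes.
size_estimate = 5909400 * 1024
```

An ETA returned this way can be turned into a readable timestamp with time.ctime(eta). Because the rate is an average, an early burst from buffers (like the first line of the second transcript) makes the first few ETA values unreliable.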

Please note that you only need to provide an estimate of the transfer size if both of the following are true:

You want an estimate of how much longer the transfer will take

reblock cannot use os.fstat() to determine the size of what you're transferring. os.fstat can usually get the size of a regular file, but not of a pipe.
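As a rough illustration of the file-versus-pipe distinction, the sketch below asks os.fstat for the size and falls back to a user-supplied estimate. The function name expected_size and the order of precedence are assumptions made for this example, not taken from reblock's source; the -1 sentinel mirrors the documented default for size_in_bytes.

```python
import os
import stat


def expected_size(fileno, estimate=-1):
    """Return the expected transfer size in bytes, or None if it is unknown."""
    st = os.fstat(fileno)
    if stat.S_ISREG(st.st_mode):
        # Regular files report a meaningful st_size.
        return st.st_size
    if estimate >= 0:
        # Pipes and devices do not, so rely on the caller's estimate.
        return estimate
    return None
```

For standard input, this would be called as expected_size(sys.stdin.fileno(), estimate) after importing sys.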

Usage is like:

usage: ./reblock [-v] [-e size_in_bytes] [-b blocksize_in_bytes] [-t timeout_in_seconds] [-p]
    -v mean verbose
    -e provides a size estimate, for when we cannot stat the file length (EG in a pipe)
    -b provides a block size in bytes
    -t provides a timeout duration in seconds
    -p says to pad the final block with nulls

    size_in_bytes defaults to -1
    blocksize defaults to 262144
    timeout defaults to 180
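The -p option pads the final block with nulls. One way that could be done is sketched below; pad_block is a hypothetical helper name, and this is not necessarily how reblock implements it.

```python
def pad_block(block, blocksize):
    """Extend a short block to blocksize bytes by appending null bytes."""
    shortfall = blocksize - len(block)
    if shortfall > 0:
        return block + b"\0" * shortfall
    # Already full-sized: leave it alone.
    return block
```

Only the last block of a transfer should ever need this, since every earlier block is already written at full size.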

reblock just writes a line of status data followed by a carriage return, to give a running tally of how things are going. It will often write more than 80 characters on a line though, so if you see reblock scrolling really fast, you probably need to resize your terminal emulator (xterm, rxvt, gnome-terminal, konsole, etc.) to be wide enough to hold the entire line.
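For readers curious about the carriage return trick, here is a small sketch. Note that it does something reblock does not do: it truncates the line to the terminal width so the line never wraps. The names format_status and show_status are hypothetical, writing to stderr is an assumption, and shutil.get_terminal_size requires Python 3.3 or later.

```python
import shutil
import sys


def format_status(line, columns):
    """Cut line to at most columns - 1 characters and end it with a carriage return."""
    return line[: max(columns - 1, 0)] + "\r"


def show_status(line, stream=sys.stderr):
    """Overwrite the current terminal line with line, without scrolling."""
    columns = shutil.get_terminal_size().columns
    stream.write(format_status(line, columns))
    stream.flush()
```

Because the line ends in a carriage return rather than a newline, the next status line is drawn over the previous one, as long as neither is wide enough to wrap.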

Known bugs:

If you suspend reblock with control-Z and leave it that way for a while, the throughput measure will get all messed up, as will the completion time estimate. The timeout doesn't fire, though.

Future possibilities:

It'd be nice to be able to specify a file to feed to gzip, bzip2 or xz, in order to get an estimate of an archive's compression ratio... Or even an option that accepts a list of files on stdin, and a percentage of the files to compress for the estimate. We could then feed that same list of files to tar/gtar/whatever.

This software is owned by The University of California, Irvine, and is not distributed under any version of the GPL. GPL is a fine series of licenses, but the owners of the software need it to be distributed under these terms.

You can download it here.

Related projects:

cpipe
pv - pipe viewer, based on curses
bar
pipemeter
speedometer
gprog - by the same author as reblock; a GTK+ interface

Timestamp: 2025-07-28 00:01:51 PDT


You can e-mail the author with questions or comments: