If two files holding the same data in different formats differ in length by:
A constant difference: then one is likely the same data with some sort of header prepended.
A constant quotient: then one is likely the same data, with every n bytes converted to m bytes, over and over again.
A linear relation, length2 == length1 * a + b, where a and b are constants: then we probably have a header plus a constant ratio of data bytes.
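The three cases can be checked mechanically from a set of (length1, length2) pairs. The sketch below is a hypothetical helper (`classify` is not part of any tool). It uses exact fractions so that a ratio such as 1/4 is not lost to floating-point rounding, and it fits the linear case from the first two pairs before checking the rest.

```python
from fractions import Fraction


def classify(pairs):
    """Hypothetical helper: describe how length2 relates to length1.

    `pairs` is a list of (length1, length2) tuples with positive lengths.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two file pairs to compare")

    # Case 1: constant difference -> probably a prepended header.
    diffs = {l2 - l1 for l1, l2 in pairs}
    if len(diffs) == 1:
        return f"constant difference: header of {diffs.pop()} bytes"

    # Case 2: constant quotient -> every n bytes become m bytes.
    quotients = {Fraction(l2, l1) for l1, l2 in pairs}
    if len(quotients) == 1:
        q = quotients.pop()
        return f"constant quotient: every {q.denominator} bytes -> {q.numerator} bytes"

    # Case 3: fit length2 = a * length1 + b on two pairs, then verify all pairs.
    (x0, y0), (x1, y1) = pairs[0], pairs[1]
    if x0 != x1:
        a = Fraction(y1 - y0, x1 - x0)
        b = y0 - a * x0
        if all(a * x + b == y for x, y in pairs):
            return f"linear: length2 = {a} * length1 + {b}"
    return "no simple relation"


print(classify([(100, 144), (200, 244)]))              # constant difference: header of 44 bytes
print(classify([(400, 100), (1000, 250)]))             # constant quotient: every 4 bytes -> 1 bytes
print(classify([(100, 244), (200, 444), (300, 644)]))  # linear: length2 = 2 * length1 + 44
```

The lengths in the usage lines are made-up illustrative values, not measurements.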
In this case, as you can see from the table, the differences are not consistent, but the quotients nearly are. This might be because they convert every n bytes to m bytes, but file lengths are then always required to be a multiple of 512, and that padding keeps the quotient from being exactly constant. Consequently, it might be worthwhile to do the same comparison for a really long file and see whether the quotient approaches a constant as the file length goes to infinity.
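The padding hypothesis can be tested with a small model. The sketch below assumes, hypothetically, that every n input bytes become m output bytes and that the output is then rounded up to a whole number of 512-byte blocks; n = 4 and m = 1 are arbitrary illustrative values. Because padding adds fewer than 512 bytes, the quotient exceeds m/n by less than 512/length1, so it should settle toward m/n as files get longer.

```python
def padded_length(length1: int, n: int, m: int, block: int = 512) -> int:
    """Hypothetical model: convert every n input bytes to m output bytes,
    then pad the result up to a multiple of `block` bytes."""
    raw = (length1 * m + n - 1) // n            # ceil(length1 * m / n) in integers
    return (raw + block - 1) // block * block   # round up to a whole block


for length1 in (1_000, 10_000, 100_000, 1_000_000):
    length2 = padded_length(length1, n=4, m=1)
    print(length1, length2, length2 / length1)

# 1000 512 0.512
# 10000 2560 0.256
# 100000 25088 0.25088
# 1000000 250368 0.250368
```

If the real files behave like this, the measured quotient for a very long file should be a good estimate of m/n.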
The sample data encoding is signed linear (2's complement), unsigned
linear, u-law (logarithmic), A-law (logarithmic), ADPCM, IMA_ADPCM, GSM,
or floating-point.

U-law (actually shorthand for mu-law) and A-law are the U.S. and
international standards, respectively, for logarithmic telephone sound
compression. When uncompressed, u-law has roughly the precision of
14-bit PCM audio and A-law has roughly the precision of 13-bit PCM audio.
A-law and u-law data are sometimes encoded using a reversed bit ordering
(i.e., the MSB becomes the LSB). Internally, SoX understands how to work
with this encoding, but there is currently no command-line option to
specify it. If you need this support, you can use the pseudo file types
".la" and ".lu" to inform SoX of the encoding. See supported file types
for more information.

ADPCM is a form of sound compression that offers a good compromise
between sound quality and fast encoding/decoding. It is used for
telephone sound compression and in places where full fidelity is not as
important. When uncompressed, it has roughly the precision of 16-bit PCM
audio. Popular versions of ADPCM include G.726, MS ADPCM, and IMA ADPCM.
The -a flag has different meanings in different file handlers: in .wav
files it means MS ADPCM, and in all others it means G.726 ADPCM. IMA
ADPCM is a specific form of ADPCM compression, slightly simpler and
slightly lower fidelity than Microsoft's flavor of ADPCM. IMA ADPCM is
also called DVI ADPCM.

GSM is a standard used for telephone sound compression in European
countries, and it is gaining popularity because of its quality. Working
with GSM audio data is usually CPU-intensive.
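The difference between signed linear (2's complement) and unsigned linear data is only in how the same bytes are interpreted. The sketch below is illustrative: `decode_pcm` is a hypothetical name, and little-endian byte order is an assumption (the text above does not specify one). Note that in 8-bit unsigned data, silence sits at 128 rather than 0.

```python
def decode_pcm(data: bytes, width: int, signed: bool) -> list[int]:
    """Decode little-endian linear PCM, `width` bytes per sample.

    signed=True reads 2's complement; signed=False reads unsigned values.
    """
    usable = len(data) - len(data) % width  # ignore a trailing partial sample
    return [
        int.from_bytes(data[i:i + width], "little", signed=signed)
        for i in range(0, usable, width)
    ]


raw = bytes([0x00, 0x80, 0xFF, 0xFF])
print(decode_pcm(raw, 1, signed=False))  # [0, 128, 255, 255]
print(decode_pcm(raw, 1, signed=True))   # [0, -128, -1, -1]
print(decode_pcm(raw, 2, signed=True))   # [-32768, -1]
```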
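To make the u-law description concrete, here is a minimal sketch of a u-law byte codec for 16-bit samples, following the widely used G.711-style approach (bias 0x84, clip at 32635, a 3-bit segment and a 4-bit mantissa, all bits inverted on output). It covers u-law only, not A-law. The function names are hypothetical, and `reverse_bits` illustrates the reversed bit ordering mentioned above; it is not a SoX function.

```python
BIAS = 0x84   # 132, added to the magnitude before the segment is chosen
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def ulaw_encode(sample: int) -> int:
    """Compress one signed 16-bit sample to an 8-bit u-law code."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) = how far the highest set bit lies above bit 7.
    exponent = magnitude.bit_length() - 8
    # Mantissa = the 4 bits just below the highest set bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law codes are stored with every bit inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_decode(code: int) -> int:
    """Expand an 8-bit u-law code to the midpoint of its quantization step."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


def reverse_bits(byte: int) -> int:
    """Reverse the bit order of one byte (the MSB becomes the LSB)."""
    return int(f"{byte:08b}"[::-1], 2)


print(hex(ulaw_encode(1000)), ulaw_decode(ulaw_encode(1000)))  # 0xce 988
print(hex(reverse_bits(0xCE)))                                 # 0x73
```

The quantization step doubles with each segment, so small samples keep fine resolution while large ones are coded coarsely; that is the logarithmic behavior described above. Each 16-bit sample becomes one byte, so for the same audio a u-law file's data portion is half the size of a 16-bit PCM file's.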
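IMA ADPCM, mentioned above, stores each 16-bit sample as a 4-bit code: the difference from a running prediction, quantized against a step size that grows or shrinks with the signal. The following is a minimal sketch of that core loop using the standard 89-entry IMA step table and index-adjustment table. To keep it short it assumes a starting predictor and index of 0, and it returns codes as a list instead of packing two per byte; real IMA ADPCM files (for example in WAV) carry the initial state in block headers, which is not modeled here. Function names are hypothetical.

```python
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8] * 2  # indexed by the 4-bit code


def _apply(predictor, index, code):
    """Update (predictor, index) for one code; shared so encoder and decoder agree."""
    step = STEP_TABLE[index]
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    predictor += -diff if code & 8 else diff          # bit 3 is the sign
    predictor = max(-32768, min(32767, predictor))
    index = max(0, min(88, index + INDEX_TABLE[code]))
    return predictor, index


def ima_encode(samples):
    """Encode signed 16-bit samples to 4-bit codes, one per sample."""
    predictor, index, codes = 0, 0, []
    for sample in samples:
        diff = sample - predictor
        code = 8 if diff < 0 else 0
        diff = abs(diff)
        step = STEP_TABLE[index]
        # Compare |diff| against step, step/2 and step/4 for the magnitude bits.
        for bit in (4, 2, 1):
            if diff >= step:
                code |= bit
                diff -= step
            step >>= 1
        codes.append(code)
        predictor, index = _apply(predictor, index, code)
    return codes


def ima_decode(codes):
    """Rebuild samples by replaying the same predictor/index updates."""
    predictor, index, out = 0, 0, []
    for code in codes:
        predictor, index = _apply(predictor, index, code)
        out.append(predictor)
    return out


decoded = ima_decode(ima_encode([1000] * 100))
print(decoded[:6])  # ramps up toward 1000 as the step size adapts
```

Because the decoder repeats exactly the predictor and step updates the encoder made, it needs no side information beyond the codes (and, in real files, the initial state). Ignoring headers, 16-bit samples shrink 4:1, which is the kind of constant n-to-m byte ratio discussed at the top of these notes.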