- We can get a traditional histogram of some numbers:
    $ (seq 10; seq 20; seq 30) | histogram
    below cmd output started 2020 Wed Oct 14 10:12:25 AM PDT
    18  1 <= x <   7 ************************************************************
    16  7 <= x <  13 *****************************************************
    12 13 <= x <  19 ****************************************
     8 19 <= x <  25 ***************************
     6 25 <= x <= 30 ********************
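
The numeric case can be sketched in a few lines of Python. This is only an illustration, not the script's own code: the name `numeric_histogram`, the choice of five equal-width bins (picked to match the five rows shown above), and rounding the star count to the nearest integer are all assumptions. The bin edges it computes are fractional (6.8, 12.6, ...), so they differ from the integer edges printed by the real program, although the counts come out the same for this input. Each value is placed with a binary search over the edges, so the lookup is logarithmic in the number of bins.

```python
from bisect import bisect_right


def numeric_histogram(values, bins=5, display_width=60):
    """Return one (count, stars) pair per equal-width bin (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # Interior edges only; the first bin starts at lo and the last ends at hi.
    edges = [lo + width * i for i in range(1, bins)]
    counts = [0] * bins
    for value in values:
        # Binary search: the index of the bin whose range contains value.
        counts[bisect_right(edges, value)] += 1
    biggest = max(counts)
    # Scale so that the largest count gets display_width stars.
    return [(c, "*" * round(c * display_width / biggest)) for c in counts]


# Same input as: (seq 10; seq 20; seq 30)
values = list(range(1, 11)) + list(range(1, 21)) + list(range(1, 31))
for count, stars in numeric_histogram(values):
    print(f"{count:3d} {stars}")
```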

- We can get a "string histogram" of permissions under /usr/local/cpython-3.7, recursively; this is not traditionally
called a "histogram":
    $ find /usr/local/cpython-3.7 -print0 | xargs -0 ls -ld | awk ' { print $1 }' | histogram
        2 -r-xr-xr-x
    17243 -rw-r--r-- ************************************************************
      650 -rwxr-xr-x **
     1341 drwxr-xr-x *****
       84 lrwxrwxrwx

    $ find /usr/local/cpython-3.7 -print0 | xargs -0 ls -ld | awk ' { print $1 }' | sort | uniq -c
    below cmd output started 2019 Sun Feb 17 06:41:35 PM PST
        2 -r-xr-xr-x
    15482 -rw-r--r--
      355 -rwxr-xr-x
     1018 drwxr-xr-x
       84 lrwxrwxrwx
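
A "string histogram" is essentially `sort | uniq -c` with a bar of stars added. A minimal sketch of that idea follows, assuming a hypothetical helper named `string_histogram`; the sorting by label and the rounding of the star count are guesses that happen to agree with the output shown above, not a description of the real script.

```python
from collections import Counter


def string_histogram(values, display_width=60):
    """Return (count, label, stars) rows, one per distinct string (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    biggest = max(counts.values())
    rows = []
    for label in sorted(counts):
        count = counts[label]
        # The most common label gets display_width stars; others scale down.
        stars = "*" * round(count * display_width / biggest)
        rows.append((count, label, stars))
    return rows


# Tiny made-up input, standing in for the first column of ls -ld.
sample = ["-rw-r--r--"] * 6 + ["drwxr-xr-x"] * 3 + ["lrwxrwxrwx"]
for count, label, stars in string_histogram(sample, display_width=12):
    print(f"{count:5d} {label} {stars}")
```

With counts as lopsided as the ones in the find example, rare labels round down to zero stars, which is why the 2 and 84 rows show no bar.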

Usage: ./histogram
    --bin-function    specifies the function used to compute the width of the bins
                      EG: --bin-function "square root" specifies the numeric bin widths
                      will be computed using the square root rule
    --display-width   specifies the width of the largest count (for the stars; default: 60.0)
    --elide           elides the stars (overrides --display-width)
    --suppress-zeros  says to eliminate zero counts from the report. Only relevant for
                      numeric histograms
    --force-string    tells this program to treat numbers as strings
    --help            outputs this usage message
    --percent         gives percentages
    --tabbed          says to separate fields with tabs instead of spaces, suitable for
                      piping to expand(1)
    --input-file fn   says to read values from file fn instead of stdin (mostly for
                      debugging this program)

If all values read from stdin can be converted to decimal, bin widths will be the decile of
the range - or other bin width function, as specified.

Otherwise, each distinct str read from stdin will be treated as a bin.

IOW, if you feed ./histogram lots of numbers, and get one bin for each distinct number, then
you've probably fed it one or more values that cannot be converted to decimal.Decimal().

Decile choice is used for numeric binning by default.

Valid choices for bin width functions are:
    square root
    sturges formula
    rice rule
    quartile
    decile
    twentyfifth-ile
    percentile

Please note that the numeric binning algorithm is no longer O(n) in the number of bins.
It is now a more reasonable O(logn).
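
The numeric-versus-string decision described in the usage text can be sketched as follows. The function names `parse_values` and `bin_count` are hypothetical, and the formulas are the textbook definitions of the square-root choice, Sturges' formula, and the Rice rule; whether the script computes them in exactly this way is an assumption. The "-ile" choices are left out because the usage text does not spell out how they map to bin widths.

```python
import math
from decimal import Decimal, InvalidOperation


def parse_values(lines):
    """Return (True, Decimals) if every line converts, else (False, strings)."""
    stripped = [line.strip() for line in lines]
    try:
        return True, [Decimal(s) for s in stripped]
    except InvalidOperation:
        # One unconvertible value is enough to fall back to string bins.
        return False, stripped


def bin_count(rule, n):
    """Textbook number of bins for n observations (n >= 1)."""
    if rule == "square root":
        return math.ceil(math.sqrt(n))
    if rule == "sturges formula":
        return math.ceil(math.log2(n)) + 1
    if rule == "rice rule":
        return math.ceil(2 * n ** (1 / 3))
    raise ValueError(f"unsupported rule in this sketch: {rule!r}")


print(parse_values(["1", "2.5", "10"]))   # all numeric
print(parse_values(["1", "2.5", "n/a"]))  # falls back to strings
for rule in ("square root", "sturges formula", "rice rule"):
    print(rule, bin_count(rule, 60))
```

Note that decimal.Decimal() also accepts strings such as "NaN" and "Infinity", so a real implementation may want to treat those separately.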


Timestamp: 2024-02-29 10:26:24 PST

