The results yielded by the DCS-written portion of oacstats are relatively well documented at this time. I wrote up a previous explanation of how oacstats works, but apparently no one thought to save a copy. So here is a behind-the-scenes look at how the DCS-written portion of oacstats works.

First off, oacstats is launched, three times a week, by the following root cronjob on bounce2.nac.uci.edu:

    # Sunday, Tuesday, Friday
    0 0 * * 0,2,5 /usr/local/etc/daily-srsh

This script first sleeps for a random number of seconds, anywhere from 0 seconds up to 24 hours, to "jitter" the sampling. This avoids systematically missing hosts that are always turned off at the same time every day.

Next, this cron job launches 3 "phases" of the DCS-oacstats data collection:

1) srsh-tied
2) dns-tied
3) subnet-tied

There is actually a fourth phase, which was only collected once:

4) p0f-tied

Now I'll go into some detail about each of these phases, and then wrap up with a summary of how these phases are combined into a variety of views of the data.

1) srsh-tied

This phase is run on bounce2.nac.uci.edu itself. The host simply srsh's to each host in the srsh database, and runs /dcslib/allsys/etc/HostInfo, which just outputs a bunch of information describing the machine it was run on. In practice, these hosts tend to be DCS-supported, or sometimes (unfortunately) -formerly- DCS-supported. This is the highest level of detail we collect on hosts, and it is only collected for a relatively small number of hosts.

2) dns-tied

In this phase we iterate over /dcslib/allsys/etc/hosts.uci, probing each host in the list (except for a relatively small list of exception hosts, built up from people complaining about oacstats probing their machines).
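
The host selection just described can be sketched roughly as follows. This is an illustration, not the actual oacstats code: it assumes hosts.uci uses the usual hosts-file layout (address, then name, then optional aliases, with # comments) and that the exception list is a plain text file with one hostname per line. Both formats, the function names, and the sample hostnames are assumptions made for the example.

```python
def parse_hosts(text):
    """Yield the primary hostname from each 'ADDRESS NAME [ALIAS...]' line.

    Assumed format: hosts-file style, '#' starts a comment.
    """
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1].lower()


def hosts_to_probe(hosts_text, exceptions_text):
    """Return hostnames in file order, minus those on the exception list."""
    exceptions = {
        line.split("#", 1)[0].strip().lower()
        for line in exceptions_text.splitlines()
    }
    exceptions.discard("")  # ignore blank and comment-only lines
    return [h for h in parse_hosts(hosts_text) if h not in exceptions]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real hosts.uci.
    sample_hosts = (
        "192.0.2.10  alpha.example.edu alpha\n"
        "192.0.2.11  beta.example.edu\n"
        "192.0.2.12  gamma.example.edu\n"
    )
    sample_exceptions = "beta.example.edu  # owner asked not to be probed\n"
    for host in hosts_to_probe(sample_hosts, sample_exceptions):
        print(host)
```

Keeping the exception list as a set makes each membership check cheap, which matters when the host list is large and the exception list is small.
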
We collect things like well-known ports, registered RPC services, some of the banners on well-known ports, and Microsoft networking information where enabled. Some OS guessing is also performed based on "active IP fingerprinting", meaning the guessing is based on attributes of an IP conversation initiated by the host doing the collection, network-stats-collector-1.nacs.uci.edu. The script that is run once per ucinet host to collect all this "dns-tied" data is ~oacstats/bin/do-dns-tied .

3) subnet-tied

In this phase, we query all the routers we can and examine their ethernet address caches. We then attempt to identify the make of the ethernet card (and sometimes the computer vendor) based on the first three octets of each ethernet address, in combination with a textual database of vendors. Not all UCI routers allow DCS to collect this data. This data is collected via ~oacstats/subnet-tied-collection/ethers, which is a semi-sophisticated wrapper around /dcs/packages/cmu-snmpd/bin/snmpwalk.

4) p0f-tied

This phase was only run once, and we've been making use of that same data ever since. As such, it is not especially trustworthy anymore, but it sometimes gives us data that the other phases do not. This phase used "passive IP fingerprinting" to guess operating systems, meaning the guessing is done based on IP conversations initiated by the host whose OS we're trying to guess. Passive IP fingerprinting can often be more accurate than active IP fingerprinting, hence the interest in this data. This phase was run on a linux/x86 box at the UCI network border.

Finally, all the data resulting from these phases are combined by running ~oacstats/bin/turn-over-data on network-stats-collector-1.nacs.uci.edu via a single-host srsh job from bounce2.nac.uci.edu, launched from the same cron script mentioned above, /usr/local/etc/daily-srsh.
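
As a rough illustration only, and not a transcription of turn-over-data: combining per-phase data into a single per-host view could look something like the sketch below. The record layout (a dict of fields per hostname), the field names, and the function name merge_phases are all made up for the example, and keying everything by hostname is itself an assumption.

```python
from collections import defaultdict


def merge_phases(phases):
    """Combine per-phase records into one record per host.

    phases maps a phase name to {hostname: {field: value}}.
    Each field is stored under "phase.field", so one phase never
    overwrites another phase's value for the same host.
    """
    merged = defaultdict(dict)
    for phase_name, records in phases.items():
        for host, fields in records.items():
            for field, value in fields.items():
                merged[host.lower()][f"{phase_name}.{field}"] = value
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical records; real oacstats output will differ.
    phases = {
        "dns-tied": {"alpha.example.edu": {"os_guess": "guess-from-active"}},
        "p0f-tied": {"alpha.example.edu": {"os_guess": "guess-from-passive"}},
    }
    for host, record in merge_phases(phases).items():
        print(host, record)
```

Prefixing each field with its phase keeps conflicting answers (for example, two different OS guesses) side by side instead of silently picking one.
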
turn-over-data is the script that merges the above phases into the directories described in nsc-1.nacs.uci.edu:~oacstats/00README and ~oacstats/01README .

Please feel free to ask questions about how this works, or even about the value of the data, in person, over the phone, or via e-mail.

Thanks.