Note: This web page was automatically created from a PalmOS "pedit32" memo.

Searching for lots of hostnames taken from a file in a large amount of syslog data


The commands:

    network-stats-collector-1-strombrg syslog) mkdir /tmp/host-hits
    network-stats-collector-1-strombrg syslog) pwd
    /target1/syslog
    Tue Feb 14 11:44:54
    network-stats-collector-1-strombrg syslog) for host in $(cat /tmp/soe-hosts-to-possibly-eliminate-from-sendmail-hiding ); do
        echo "egrep '$(echo $host | sed -e 's/^/\\</' -e 's/$/\\>/' -e 's/\./\\./g')' > /tmp/host-hits/$host"
    done > /tmp/commands
    network-stats-collector-1-strombrg syslog) cat messages* | reblock -e $[$(du messages* | awk ' { print $1 }' | total)] 65536 300 | mtee -f /tmp/commands
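To see what the loop writes into /tmp/commands, here is a small sketch that runs the same sed expressions over a two-host list. The inline list in $hosts is an assumption made for illustration (the real run read the hosts from a file), and nothing is written to disk; the generated lines are only printed.

```shell
#!/bin/bash
# Hypothetical two-host list, standing in for the real hosts file.
hosts='ang.eng.uci.edu
yang.eng.uci.edu'

# Same transformation as the loop above: prepend \<, append \>,
# then escape every literal dot in the hostname.
commands=$(for host in $hosts; do
    echo "egrep '$(echo $host | sed -e 's/^/\\</' -e 's/$/\\>/' -e 's/\./\\./g')' > /tmp/host-hits/$host"
done)

# Show the egrep command lines that would be handed to mtee.
printf '%s\n' "$commands"
```

Each host should produce one line of the form egrep '\<ang\.eng\.uci\.edu\>' > /tmp/host-hits/ang.eng.uci.edu, so every egrep writes its hits to a file named after its host.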
Notes on the above:

Yes, reblock will put some nulls at the end of the data to search, but they aren't going to match our egrep patterns anyway :) The reblock is there to give some idea of when the searching will be done.

Also, you'd think that the machine would get really bogged down by all the context switching between the large number of egreps, but in practice the load average on nsc-1 plateaued at only about 3.1 when searching for 160 different hostnames concurrently, i.e. 160 concurrent egreps. I guess Linux just context switches pretty well, despite the x86 hardware it's running on, which doesn't :)

Be careful, as you cut and paste these commands, that any ^'s don't get lost.

Escaping the .'s means that ang.eng won't match ang@eng. Using \< and \> means that ang.eng.uci.edu won't also match yang.eng.uci.edu, for example.

This method made it through all 160 egreps with the following performance:

    (estimate: 99.9% 3s) Kbytes: 3593024.0 Mbits/s: 9.3 Gbytes/hr: 4.1 min: 50.0
    Tue Feb 14 13:49:15
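The effect of escaping the dots and adding \< and \> can be checked on a few sample lines. This is only a sketch: the three log lines are invented for illustration, and it uses grep -E in place of egrep, assuming a grep that supports the \< and \> word-boundary operators (GNU grep does).

```shell
#!/bin/bash
# Pattern in the form generated for the host ang.eng.uci.edu.
pattern='\<ang\.eng\.uci\.edu\>'

# Invented sample lines, not taken from the real syslog data.
sample=$(printf '%s\n' \
    'Feb 14 mail sendmail: relay=ang.eng.uci.edu' \
    'Feb 14 mail sendmail: relay=yang.eng.uci.edu' \
    'Feb 14 mail sendmail: from=ang@eng.uci.edu')

# Word boundaries plus escaped dots: count the matching lines.
strict=$(printf '%s\n' "$sample" | grep -E -c "$pattern")

# The bare hostname used as a regex: the unescaped dots match any
# character, and nothing anchors the start of the name.
loose=$(printf '%s\n' "$sample" | grep -E -c 'ang.eng.uci.edu')

echo "strict: $strict  loose: $loose"
```

The strict pattern should match only the first line, while the bare hostname matches all three, including the yang.eng.uci.edu and ang@eng.uci.edu lines.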

