The data recovery is proving very slow going, mostly because some of the Lustre systems are crashing very frequently. My script that retries copying each file only a user-specified number of times seems to help, but I'm still making so many trips to the machine room that it's taking forever.

The initial positive indication with Francois' data, unfortunately, turned out to be a case of Lustre giving my program a far-too-short list of files to transfer, I believe. I've since modified the program to re-enumerate all files in a directory hierarchy each time it is run, instead of only the first time, in an effort to pick up files that are visible sometimes and not others. I've also modified my "lustre" bash script to greatly reduce the back-and-forth between myself and Lustre when rebooting Lustre, but it's still slow going, entirely because the reboots have to be done physically.

Given how slowly things are progressing, I'm ready to suggest that either we give up with what little data we have so far, or that we investigate some form of automatic reboot facility - perhaps something from http://www.cpscom.com/reboot.htm . I am currently estimating that this would require as little as:

1) A base unit that cycles AC power, costing $115
2) A power strip with enough plugs (7 or more)
3) A serial cable
4) A spare PC running linux, possibly esmft2, to act as a "controller" in CPS' parlance.

Is there interest in investigating this sort of approach, or should we just drop it? The automation software I'd need is almost entirely done already.

I must point out that even with this sort of solution, we may still find that we get only a small fraction of the ESMF data back. For example, with Jin Yi's data, this is how far my program has progressed so far:

    Succeeded: 1420
    Failed: 349
    1: 94
    2: 108
    3: 1
    4: 3930
    5: 39046

The "succeeded" and "failed" numbers are probably pretty apparent.
Each of the "1" through "5" lines gives the number of files that still need to be retried that many times (on separate, serial runs of the program) before concluding whether they are recoverable or not.
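
For anyone who wants to see the bookkeeping behind those numbers, here is a minimal sketch in Python of the retry-and-re-enumerate scheme described above. It is not my actual program: the function names (enumerate_files, run_once, summarize), the in-memory state dictionary, and the choice of Python are all assumptions made for illustration, and a real version would have to save the state between runs.

```python
import os
import shutil
from collections import Counter


def enumerate_files(src_root):
    """Walk the whole tree on every run; return paths relative to src_root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, src_root))
    return sorted(found)


def run_once(src_root, dst_root, state, max_retries, copy=shutil.copy2):
    """One serial run of the (hypothetical) recovery program.

    state maps a relative path to "ok", "failed", or an int giving the
    number of retries the file still has left.
    """
    # Re-enumerate each run so files that were invisible earlier get picked up.
    for rel in enumerate_files(src_root):
        state.setdefault(rel, max_retries)

    for rel, status in list(state.items()):
        if not isinstance(status, int):
            continue  # already copied, or already given up on
        dst = os.path.join(dst_root, rel)
        try:
            os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
            copy(os.path.join(src_root, rel), dst)
            state[rel] = "ok"
        except OSError:
            # Use up one retry; give up once none are left.
            state[rel] = status - 1 if status > 1 else "failed"
    return state


def summarize(state):
    """Format the counts in the same layout as the report above."""
    counts = Counter(state.values())
    lines = ["Succeeded: %d" % counts["ok"], "Failed: %d" % counts["failed"]]
    for n in sorted(k for k in counts if isinstance(k, int)):
        lines.append("%d: %d" % (n, counts[n]))
    return "\n".join(lines)
```

With a limit of 5, a file that has never been attempted (or has never failed) sits in the "5" bucket, and each failed run moves it down one bucket until it is either copied or counted as failed.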