Memo: 50, exported from J-Pilot on 09/23/04 05:22PM
Category: Unfiled
This memo was marked as: Not Private
----- Start of Memo -----
Lustre class, second day, 2004-09-10 08:46:25 am

sometimes lustre uuid's are hex gibberish (normal), but they decided descriptive text uuid's were easier to work with
mds knows size of file when closed, but not when open.
mds has zeroconf logs
oss does not know filename or file length: only the length of its stripe. oss does not know how many stripes
mds precreates objects on oss's, 0 size. 1.2: ~1000 precreated
on mds: state files: last_rcvd, lov_objid
can mount ext3's when lustre is shut down to see how lustre stores files

mds fs layout
  last_rcvd, contains last reply data+. all connected client state here. if mds sees this, it knows it needs to replay some transactions
  pending: rm'd but open files
oss fs layout
  client state: last_rcvd only for uuid on nonfailover setup. no 'last transaction' data

lustre inode split across mds and oss
striping info in 'extended attributes area' of ext3
acl's coming
not using page cache on servers, using 'direct i/o' instead
oss performs block allocation
less tied to ext3 on ost's than mds, but still tied to it

file creation:
  open(O_CREAT) makes inode on mds
  mds uses preallocated objects for lustre file inodes
  5000 creations/sec by 1000 clients in same dir
ost's are stateless, no list of in-use files
mds has state: does have a list of open files
dcache caches inode.
used to have a list of open files (on oss's?), but they eliminated that

file deletion
  client dels on mds, clients also del on oss's
  if failure, mds checks its llog (transaction log), sends dels to oss's
llogs: small api used for replay of dels, more in future
  on mds and ost, mostly mds.
  coda intermezzo
lustre 'orphan' in 3 places
  del failure
  precreated obj's on ost
  open and del'd
lconf verbose or -n -v saved as llog record
rread on 170 min ls: it's a bug in older releases of lustre. glimpse should help.
likely need the nfs mods made to a later release of lustre
lustre doc says stripes must be a multiple of 16k, the largest page size in common use today (ia64). this preserves capacity for heterogeneity. However, this does not appear to be enforced by lconf, as the doc says

striping:
  which ost computed on clients by lov
  lfs can set striping on dir or file, size, number. subdirs probably do not inherit that
  no headers, just concatenated
when to stripe
  better aggregate thruput with mult clients
  good for big shared files
  don't stripe to min latency.
1.2: 512k 1.4: 1m 5-6 transfers in flight

metadata: intent based, fewer distrib locks, fewer rpc's
  gather mult ops into one rpc

recovery and replay
  if client locks a file and crashes, lock is released
  mds goes into recovery mode, only performs recovery, no new transactions. then later does new stuff
1.4 upcalls not needed for failover, but you can still have them
failout mode only for ost's
  -EIO
  not really recommended
when create ost, spec --failover
  a little overhead added
  clients hang instead of getting errors

troubleshooting
  messages file
    can grep for Lustre: and LustreError:
  5m circular buffer of lustre logs
  faster debug perf now, but turning it off is still good for benchmarking
  sysctl -w portals.debug=0
    0 is nothing
    -1 is everything
    high 8 bits are subsystem
    low 24 bits are debug mask
  lctl debug_kernel
    writes lustre log, stdout, file
  lctl clear
    clear lustre kernel log
  debug daemon can constantly flush kernel logs to a file. can fill up a file very quickly
    lctl debug_daemon ...
  e2fsck with extended attribute patches can be used
  lfsck. uses e2fsck. lfsck still in development?
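
Sketch for the portals.debug note above (high 8 bits are subsystem, low 24 bits are debug mask). This is only bit arithmetic that takes the class note at face value for a 32-bit value; it does not read or set anything in lustre, and the function names split_debug_word and join_debug_word are made up for illustration. Which subsystem numbers and mask bits lustre actually defines is not in these notes, so none are listed here.

```python
# Assumption from the class notes: a 32-bit debug value whose
# high 8 bits select a subsystem and whose low 24 bits are the debug mask.
MASK_BITS = 24
SUBSYSTEM_BITS = 8
DEBUG_MASK = (1 << MASK_BITS) - 1                        # 0x00ffffff
WORD_MASK = (1 << (MASK_BITS + SUBSYSTEM_BITS)) - 1      # 0xffffffff


def split_debug_word(value):
    """Return (subsystem, debug_mask) for a 32-bit debug value.

    Negative inputs such as -1 ("everything") are treated as their
    32-bit two's-complement pattern.
    """
    value &= WORD_MASK
    return value >> MASK_BITS, value & DEBUG_MASK


def join_debug_word(subsystem, debug_mask):
    """Pack a subsystem number and a debug mask back into one 32-bit value."""
    if not 0 <= subsystem < (1 << SUBSYSTEM_BITS):
        raise ValueError("subsystem must fit in 8 bits")
    if not 0 <= debug_mask <= DEBUG_MASK:
        raise ValueError("debug mask must fit in 24 bits")
    return (subsystem << MASK_BITS) | debug_mask


if __name__ == "__main__":
    # 0 is "nothing", -1 is "everything"; the third value is arbitrary.
    for raw in (0, -1, 0x12345678):
        subsystem, mask = split_debug_word(raw)
        print(f"{raw & WORD_MASK:#010x}: subsystem={subsystem:#04x} mask={mask:#08x}")
```

With this reading, -1 has every subsystem bit and every mask bit set, which matches the "-1 is everything" note.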
buffalo.clusterfs.org/com

testing lustre
  iozone
  echo_client
    test lower layers of lustre
    test bandwidth
    no client fs
  echo_server
  leak finder, a perl script
lbug: /tmp lustre log, binary, unsorted, lctl to read, similar to debug_kernel
they have a tcpdump that knows about portals
https://bugzilla.lustre.org/
  search for bugs
https://wiki.clusterfs.com/lustre/BugFiling

tools
  lctl
    calls ioctl's
    initiate recovery
    lctl device_list
      not an ioctl
      comes from devices under /proc
    lctl --device 6 deactivate
    lctl --device 6 activate
      number, name uuid
      on ost, ignore a failed device for a while
  lfs find ...
    dir or file
  lfs getstripe file
  lfs setstripe filename size 0 1
  lfs find file

lfsck, used after dataloss on mds or oss
  use lfsck on client
  still need e2fsck
  scan mds and create mdsdb with mod'd e2fsck
  scan oss and create ossdb
  mount lustre fs
  run lfsck on mtpt using db files
  e2fsck -f -y -mdsdb /tmp/mdsdb /dev/sdb1
  oss
  run on -mtpt-, feeding mdsdb and ossdb (ostdb?)
  orphans in lost+found
  no unaccounted storage
  some files may have empty objects
  not sure how to list files with empty objects

1.4 for customers only
2004-09-10 11:18:56 am, 1.2 opensource
1.3, 1.4 beta UCI using. What we have is a branch. It is not going to become 1.4.
Nic indicates that what we're seeing - 170 min ls - is not a glimpse issue, and does still occur in modern lustre. He said it is probably a VFS problem. He also indicated that he thought all versions of lustre had glimpse in them.
opteron 3ware 100MB/s
our perf is poor

llanalyze is the perl script that Robert didn't like that much
llanalyze --rpctrace does not appear to work
set portals debug to -1 and just grep for RPC.
opc ... appears to have either program name or fs function name
to join two rpc logs, cat and sort with -t: -k4 (or +3)
ldlm prefix on lustre locking functions
lconf has kernel debug flags. llanalyze does too, but may be incorrect
lconf -nv -ptldebug rpctrace+page adds bits for...
debug_kernel
echo client/echo server not that important, mostly for testing new NAL's.
/usr/lib/lustre/examples/*
----- End of Memo -----