[root@esmfsn05 root]# egrep '^DEVICE|^MAILADDR' /etc/mdadm.conf
DEVICE /dev/sd[ab]1
MAILADDR root@esmft1
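With a MAILADDR line in place, a monitoring daemon can be started so that failure events are mailed to that address. A minimal sketch (exact init-script integration varies by distribution):

mdadm --monitor --scan --daemonise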
mdadm --create /dev/md0 --level=stripe --chunk=4096 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --examine --scan >> /etc/mdadm.conf
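The appended entry should look something like the following (the UUID here is only a placeholder; use the value mdadm reports for your own array):

ARRAY /dev/md0 level=raid0 num-devices=2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx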
[root@esmfsn05 root]# grep md /etc/fstab
/dev/md0 /metadata ext3 defaults 1 3
mkfs -t ext3 /dev/md0
mount /metadata
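To confirm that the array is running and the filesystem is mounted, something like the following can be used:

cat /proc/mdstat
df -h /metadata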
#!/bin/sh
mdadm --assemble --uuid=9cb23d0b:ca6af799:778dca49:a0a9019c /dev/md0
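If the ARRAY lines are already recorded in /etc/mdadm.conf (as above), an equivalent script could rely on the configuration file instead of hard-coding the UUID:

#!/bin/sh
mdadm --assemble --scan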
--
Dan Stromberg DCS/NACS/UCI <strombrg@dcs.nac.uci.edu>
Hi All,

FYI, the machine platform is a 2xOpteron running Ubuntu Hoary preview (64-bit) with 4GB RAM, the system running off a single IDE drive. The RAID drives are attached to the on-board 4-way Silicon Image SATA controller. The drives are identical 250GB WD SATAs (Model: WDC WD2500JD-00G), each partitioned with 232GB on /dev/sdX1 and 1.8GB on /dev/sdX2 (for parallel swap partitions). I'm using the mdadm suite to set them up and control the RAID:

1 - create the raid:
$ mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 -c128 /dev/sd{a,b,c,d}1
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda1 appears to contain a reiserfs file system size = 242220004K
mdadm: /dev/sdb1 appears to contain a reiserfs file system size = 242220004K
mdadm: /dev/sdc1 appears to contain a reiserfs file system size = 242220004K
mdadm: /dev/sdd1 appears to contain a reiserfs file system size = 242220004K
mdadm: size set to 242219904K
Continue creating array? y
mdadm: array /dev/md0 started.

2 - make sure we monitor it:
$ nohup mdadm --monitor --mail='hjm@tacgi.com' --delay=300 /dev/md0 &

3 - make the reiserfs on md0 (it was made on the individual partitions before, but apparently it needs to be made on the virtual device):
$ mkreiserfs /dev/md0

4 - then mount it:
$ mount -t reiserfs /dev/md0 /r

5 - then admire it:
$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
...
/dev/md0             726637532     32840 726604692   1% /r

So for a RAID5 array, we end up with about 78% of the input space (more than I expected) - the rest is lost to the parity info, which is striped across all the disks, giving the redundancy.

When the RAID initialized, mdadm immediately sent me an email warning of a degraded array. This was not welcome news, but it turns out to be normal: in building the parity checksums, it essentially fakes a dead disk and rebuilds all the parity info. This took about 8 hrs for 1TB; however, the array was available and pretty peppy without waiting for it to finish. And the message did confirm that mdadm was actually monitoring the array.

I immediately tried a few cp's to and from it, and on the 'degraded' array got ~40MB/s to and from the IDE drive on some 100-600MB files. There was not much difference after it finished the parity rebuild - possibly it was deferring the parity calculations until afterwards? If anything it's slightly slower now that the parity info is complete - maybe 38-40MB/s. (This measure includes the sync time - with 4GB of RAM, GB-sized files can be buffered to RAM and so appear to be copied in a few seconds.) On my home 2xPIII system with IDE drives, I only get ~7-8MB/s between drives, so 40MB/s sounds pretty good.

Bonnie++ reports a bunch of confusing numbers, but seems to indicate that, depending on CPU utilization, type of I/O, and size of file, disk I/O will range from ~80MB/s to 24MB/s on the SATA RAID. On my old IDE laptop (but with a newer disk), bonnie returns numbers that are surprisingly good - about 1/3 to 1/4 of the RAID speed. On the 2xPIII home IDE system, bonnie returns numbers that are not much better than the laptop.

So there you have it - Linux software SATA RAID is pretty easy to set up, can be configured to be reasonably informative via email, is pretty cheap (relative to the true HW RAID cards that go for $300-$400 each), and seems to be pretty fast. Long term, I can't say yet.
Also note that this is using an md device without any further wrapping with LVM - we just need a huge data space, and not much is needed in the way of administering different group allocations, etc. I would like to hear others' experiences.
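[For reference: an N-disk RAID5 gives (N-1)/N of the raw space, so 4 disks should yield 75%. From the df output above, 726637532K / (4 x 242219904K) works out to about 0.75, exactly as expected; the ~78% figure quoted appears to come from comparing against the nominal 232GB partition sizes instead.]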
To re-incorporate sda1 into the array, use
NeilBrown
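(The command itself was omitted above; with mdadm, re-adding a disk to a running array is normally done along these lines, using the device names from this example:)

mdadm /dev/md0 --add /dev/sda1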
> When trying to read sectors from a disk and the disk fails the read:
> 1.) Read the data from the other disks in the RAID and
> 2.) Overwrite the sectors where the read error occurs.

Note: this is NOT how the current Linux softraid code works; it's how it's *supposed* to work. Right now, the Linux RAID code kicks a drive out of the array after *any* error (read or write), without trying to "understand" what happened.

/mjt
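When a drive has been kicked out of an array like this, /proc/mdstat shows the member flagged as failed; the output looks roughly like the following (illustrative only, with made-up device names and sizes):

md0 : active raid1 sdb1[1] sda1[0](F)
      242219904 blocks [2/1] [_U]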
I use RAID in 3 layers!
Layer 1 : (RAID5 over 11+1 raw disks in one PC) x4 (4 disk nodes)
Layer 2 : RAID1 of 4x half-nodes (only for the ability to back up complete nodes)
Layer 3 : RAID0 across the 4 nodes -> one big 8TB disk. :-)
I use gnbd for this.
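As a rough sketch of the top layer only (device names here are hypothetical; gnbd imports typically appear under /dev/gnbd/ on the client), the RAID0 across the four nodes would be created with something like:

mdadm --create /dev/md3 --level=0 --raid-devices=4 /dev/gnbd/node1 /dev/gnbd/node2 /dev/gnbd/node3 /dev/gnbd/node4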
gnbd sends only small packets, and I thought too much readahead was the problem, because the whole 8TB array is physically striped across 44 raw disks - but no! When I disabled the readahead, the performance was even worse.
Now I have grown the readahead on the raw disks to 1MB, on the RAID5 to 10MB, on the RAID1 to 1MB, and on the RAID0 to 8MB, and the performance is great now! :-)
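The message does not say which tool was used to change the readahead; one common way to do it per block device is blockdev --setra, which takes the value in 512-byte sectors (device names below are hypothetical):

blockdev --setra 2048 /dev/sda    # 1MB readahead on a raw disk
blockdev --setra 20480 /dev/md1   # 10MB on the RAID5 layer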
You can e-mail the author with questions or comments: