From Briki
Latest revision as of 09:00, 26 April 2022
SnapRAID / MergerFS
Setup
https://zackreed.me/setting-up-snapraid-on-ubuntu/
Note that if (like me) you use a dedicated snapraid content directory then you'll need to create that by hand for each disk with:
mkdir /mnt/data/disk1/.snapraid
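With several data disks, the same directory can be created in a loop. This is only a sketch: the function name make_content_dirs is made up for illustration, and it assumes the disks are mounted as disk1, disk2, ... under one base directory (/mnt/data by default, matching the path above).

```shell
# Create a .snapraid content directory on every data disk under a base directory.
# Assumption: disks are mounted as <base>/disk1, <base>/disk2, ...
make_content_dirs() {
  base="${1:-/mnt/data}"
  for disk in "$base"/disk*; do
    # Skip anything that is not a directory (including an unmatched glob).
    [ -d "$disk" ] || continue
    mkdir -p "$disk/.snapraid"
  done
}
```

Calling make_content_dirs with no argument works on /mnt/data; it may need to be run as root, depending on who owns the mount points.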
Partitioning a new data disk
Note: "-m 2" here reserves 2% of the filesystem for root-owned files (eg. .../.snapraid/content)
sudo parted -a optimal -s /dev/sdX -- mklabel gpt mkpart primary 0% 100%
sudo mkfs.ext4 -m 2 -T largefile4 /dev/sdX1
Partitioning a new parity disk
Note: "-m 0" here reserves 0% of the filesystem, ensuring that the parity disks have slightly more usable space than the data disks
sudo parted -a optimal -s /dev/sdX -- mklabel gpt mkpart primary 0% 100%
sudo mkfs.ext4 -m 0 -T largefile4 /dev/sdX1
Adding a new data disk to mergerfs
From: https://zackreed.me/mergerfs-neat-tricks/
From within the root of the mergerfs filesystem (eg. /srv)
xattr -w user.mergerfs.srcmounts '+>/mnt/data/disk4/srv' .mergerfs
Removing a data disk from mergerfs
From within the root of the mergerfs filesystem (eg. /srv)
xattr -w user.mergerfs.srcmounts '-/mnt/data/disk4/srv' .mergerfs
Forcing a resync
sudo snapraid sync
Identifying/Fixing a bad block
- Run ddrescue to identify the failing bytes on the disk:
ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdX /dev/null sdX.map
- ddrescue will write out a map file containing start positions and sizes (all in bytes) of good (+) and bad (-) byte ranges
- Get the block size for the volume with
tune2fs -l /dev/sdXY | grep "Block size"
(we'll call this B). In my case this is 4096.
- Get the sector size for the disk with
fdisk -l /dev/sdX | grep Units
(we'll call this S). In my case this is 512.
- Identify the starting sector for the /dev/sdXY volume (eg. /dev/sda1) with
fdisk -l /dev/sdX
(we'll call this T). In my case this is 2048.
- Run ddrescuelog to list out the bad block locations (using the block size B):
ddrescuelog -b B --list-blocks=- sdX.map
- For each of these, convert to a block location in the volume (rather than the disk) by subtracting (T * S / B). In my case that's 2048 * 512 / 4096 = 256. Let's call each of these bad volume blocks BB.
- Start debugfs for the volume with
debugfs /dev/sdXY
- For each BB run:
testb BB
- If debugfs returns "Block BB not in use", then that block isn't part of a file and can safely be overwritten. If it returns an inode number (we'll call it I) then you can convert that inode number to a file path with:
ncheck I
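The disk-to-volume conversion above can be scripted. This is a rough sketch under stated assumptions: the helper name to_volume_blocks is made up for illustration, and it expects T, S and B as arguments and the disk block numbers on stdin, one per line, as ddrescuelog prints them.

```shell
# Convert disk-relative block numbers (stdin, one per line) into
# volume-relative block numbers by subtracting the offset T * S / B.
# Arguments: T (partition start sector), S (sector size in bytes),
#            B (filesystem block size in bytes).
to_volume_blocks() (
  t="$1"; s="$2"; b="$3"
  offset=$(( t * s / b ))
  while read -r block; do
    # Ignore blank lines and blocks that lie before the start of the volume.
    if [ -n "$block" ] && [ "$block" -ge "$offset" ]; then
      echo $(( block - offset ))
    fi
  done
)
```

With the values from my case this would be used as: ddrescuelog -b 4096 --list-blocks=- sdX.map | to_volume_blocks 2048 512 4096. Each number printed is a BB that can be passed to testb, either interactively or non-interactively with debugfs -R "testb BB" /dev/sdXY.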
For any unused blocks, we need to do the following:
- Check with dd that we’ve got the right block IDs. For each one of these reads we expect to see an error (and “0+0 records in”):
for block in `ddrescuelog -b B --list-blocks=- sdX.map`
do
  dd if=/dev/sdX of=/dev/null count=1 bs=B skip=$block
done
- For each of the bad blocks, write zeros over the block to force it to be reallocated from spare space on the drive. Be careful here – getting it wrong will destroy data! Also note that when reading, “skip” is used to position the input stream, but here “seek” is used to position the output stream:
for block in `ddrescuelog -b B --list-blocks=- sdX.map`
do
  dd if=/dev/zero of=/dev/sdX count=1 bs=B seek=$block
done
- It’s possible that dd will fail to write to the block, in which case try again with hdparm:
- First check that we’ve got the right sectors (we expect to see “SG_IO: bad/missing sense data” for each sector on stderr, so we pipe stdout to /dev/null to avoid noise). Note that we use S as the block size for ddrescuelog, since hdparm deals in sectors:
for block in `ddrescuelog -b S --list-blocks=- sdX.map`
do
  hdparm --read-sector $block /dev/sdX > /dev/null
done
- Assuming we’ve seen the expected errors, write zeros over each of the bad sectors. Note that we use S as the block size for ddrescuelog, since hdparm deals in sectors. Be careful here: getting it wrong will destroy data! hdparm will refuse to write unless you also pass the “--yes-i-know-what-i-am-doing” flag.
for block in `ddrescuelog -b S --list-blocks=- sdX.map`
do
  hdparm --write-sector $block /dev/sdX
done
- Check that the number of failing sectors has decreased:
smartctl -a /dev/sdX | grep Pending
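To compare the pending count before and after a repair, the raw value can be pulled out of the smartctl attribute table. This is a sketch that assumes the usual ATA attribute layout, where the attribute name is the second column and the raw value is the tenth. The helper name pending_sectors is made up for illustration, and drives that report the attribute differently will need a different filter.

```shell
# Print the raw Current_Pending_Sector value from smartctl attribute output on stdin.
# Assumption: standard ATA attribute table (name in column 2, raw value in column 10).
pending_sectors() {
  awk '$2 == "Current_Pending_Sector" { print $10 }'
}
```

For example, run before=$(smartctl -A /dev/sdX | pending_sectors) ahead of the writes and after=$(smartctl -A /dev/sdX | pending_sectors) once they are done, then compare the two numbers.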
For any files reported by `debugfs`:
- Write random data over the bad file, which should force the pending sector to be marked bad and reallocated from spare space on the disk:
shred -v /path/to/bad/file
- Check that the number of failing sectors has decreased:
smartctl -a /dev/sdX | grep Pending
- Assuming the number of pending sectors has decreased, it’s then ok to delete the bad file:
rm /path/to/bad/file
More details: