From Briki

Revision as of 09:56, 26 April 2022

SnapRAID / MergerFS

Setup

https://zackreed.me/setting-up-snapraid-on-ubuntu/

Note that if (like me) you use a dedicated SnapRAID content directory, you'll need to create it by hand on each disk with:

 mkdir /mnt/data/disk1/.snapraid
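With several data disks, the directory can be created on all of them in one go. This is a minimal sketch that assumes every data disk is mounted at /mnt/data/diskN; make_content_dirs is a hypothetical helper name, not part of SnapRAID:

```sh
#!/bin/sh
# Hypothetical helper: create a .snapraid content directory on every data disk.
# Assumes the data disks are mounted as <root>/disk1, <root>/disk2, ...
make_content_dirs() {
  root=${1:-/mnt/data}
  for disk in "$root"/disk*; do
    [ -d "$disk" ] || continue      # skip non-directories, or an unmatched glob
    mkdir -p "$disk/.snapraid"      # -p: no error if it already exists
  done
}

# Usage (as root): make_content_dirs /mnt/data
```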

Partitioning a new data disk

Note: "-m 2" here reserves 2% of the filesystem's blocks for root, so root-owned files (e.g. .../.snapraid/content) can still be written when the disk is otherwise full.

 sudo parted -a optimal -s /dev/sdX -- mklabel gpt mkpart primary 0% 100%
 sudo mkfs.ext4 -m 2 -T largefile4 /dev/sdX1

Partitioning a new parity disk

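Note: "-m 0" here reserves no blocks for root, so on disks of the same size the parity filesystems end up with slightly more usable space than the data filesystems (SnapRAID requires each parity disk to be at least as large as the largest data disk).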

 sudo parted -a optimal -s /dev/sdX -- mklabel gpt mkpart primary 0% 100%
 sudo mkfs.ext4 -m 0 -T largefile4 /dev/sdX1

Adding a new data disk to mergerfs

From: https://zackreed.me/mergerfs-neat-tricks/

Run this from within the root of the mergerfs filesystem (e.g. /srv):

 xattr -w user.mergerfs.srcmounts '+>/mnt/data/disk4/srv' .mergerfs

Removing a data disk from mergerfs

Run this from within the root of the mergerfs filesystem (e.g. /srv):

 xattr -w user.mergerfs.srcmounts '-/mnt/data/disk4/srv' .mergerfs

Forcing a resync

 sudo snapraid sync

Identifying/Fixing a bad block

  • Run ddrescue to identify the failing bytes on the disk:
ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdX /dev/null sdX.map 
  • ddrescue will write out a map file containing start positions and sizes (all in bytes) of good (+) and bad (-) byte ranges
  • Get the block size for the volume with tune2fs -l /dev/sdXY | grep "Block size" (we'll call this B). In my case this is 4096.
  • Get the sector size for the disk with fdisk -l /dev/sdX | grep Units (we'll call this S). In my case this is 512.
  • Identify the starting sector for the /dev/sdXY volume (e.g. /dev/sda1) with fdisk -l /dev/sdX (we'll call this T). In my case this is 2048.
  • Run ddrescuelog to list out the bad block locations (using the block size B):
ddrescuelog -b B --list-blocks=- sdX.map
  • For each of these, convert to a block number within the volume (rather than the disk) by subtracting T * S / B. In my case that's 2048 * 512 / 4096 = 256. We'll call each of these bad volume blocks BB.
  • Start debugfs for the volume with debugfs /dev/sdXY
  • For each BB run:
testb BB
  • If debugfs reports "Block BB not in use", then that block isn't part of a file and can safely be overwritten. If it reports "Block BB marked in use", find the inode that owns it (we'll call it I) with:
icheck BB
  • Then convert that inode number to a file path with:
ncheck I
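The block arithmetic above can be sketched in plain sh. This is an illustration only, not a replacement for ddrescuelog: bad_blocks_from_map and to_volume_block are hypothetical helper names, and the parser assumes the usual ddrescue map layout (comment lines starting with #, one current-status line, then "pos size status" lines with 0x-prefixed hexadecimal byte values).

```sh
#!/bin/sh
# bad_blocks_from_map MAP B
# Print the disk-relative numbers of the B-byte blocks touched by bad-sector
# ("-") ranges in a ddrescue map file, similar in spirit to
# `ddrescuelog -b B --list-blocks=- MAP`.
bad_blocks_from_map() {
  map=$1; bs=$2; seen_status=0; prev=-1
  while read -r pos size status rest; do
    case $pos in
      ''|'#'*) continue ;;            # blank line or comment
    esac
    if [ "$seen_status" -eq 0 ]; then
      seen_status=1                   # first data line is the current-position line
      continue
    fi
    [ "$status" = "-" ] || continue   # only bad-sector ranges
    first=$(( $pos / bs ))
    last=$(( ($pos + $size - 1) / bs ))
    b=$first
    while [ "$b" -le "$last" ]; do
      if [ "$b" -gt "$prev" ]; then   # ranges are sorted; don't repeat a shared block
        echo "$b"
        prev=$b
      fi
      b=$(( b + 1 ))
    done
  done < "$map"
}

# to_volume_block BLOCK T S B
# Convert a disk-relative block number to a volume-relative one (BB)
# by subtracting the partition offset T * S / B.
to_volume_block() {
  block=$1; t=$2; s=$3; bs=$4
  if [ $(( t * s % bs )) -ne 0 ]; then
    echo "partition start is not a whole number of blocks" >&2
    return 1
  fi
  vb=$(( block - t * s / bs ))
  if [ "$vb" -lt 0 ]; then
    echo "block $block lies before the start of the partition" >&2
    return 1
  fi
  echo "$vb"
}

# Usage sketch (T=2048, S=512, B=4096):
#   bad_blocks_from_map sdX.map 4096 | while read -r blk; do
#     vb=$(to_volume_block "$blk" 2048 512 4096) && debugfs -R "testb $vb" /dev/sdXY
#   done
```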

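For any unused blocks, we need to do the following (note that the loops below act on every bad block listed in sdX.map, so they assume none of those blocks belongs to a file):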

  • Check with dd that we’ve got the right block IDs. For each one of these reads we expect to see an error (and “0+0 records in”):
for block in `ddrescuelog -b B --list-blocks=- sdX.map`
do
  dd if=/dev/sdX of=/dev/null count=1 bs=B skip=$block
done
  • For each of the bad blocks, write zeros over the block to force it to be reallocated from spare space on the drive. Be careful here – getting it wrong will destroy data! Also note that when reading, “skip” is used to position the input stream, but here “seek” is used to position the output stream:
for block in `ddrescuelog -b B --list-blocks=- sdX.map`
do
  dd if=/dev/zero of=/dev/sdX count=1 bs=B seek=$block
done
  • It’s possible that dd will fail to write to the block, in which case try again with hdparm:
    • First check that we’ve got the right sectors (we expect to see “SG_IO: bad/missing sense data” on stderr for each sector, so we redirect stdout to /dev/null to avoid noise). Note that we use S as the block size for ddrescuelog, since hdparm deals in sectors:
for block in `ddrescuelog -b S --list-blocks=- sdX.map`
do
  hdparm --read-sector $block /dev/sdX > /dev/null
done
    • Assuming we’ve seen the expected errors, write zeros over each of the bad sectors, again using S as the block size for ddrescuelog. Be careful here – getting it wrong will destroy data! You may be asked to add the --yes-i-know-what-i-am-doing flag.
for block in `ddrescuelog -b S --list-blocks=- sdX.map`
do
  hdparm --write-sector $block /dev/sdX
done
  • Check that the number of pending sectors has decreased:
smartctl -a /dev/sdX | grep Pending
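Because the dd steps work in B-byte blocks while the hdparm steps work in S-byte sectors, it can help to see how the two numberings line up: block n covers sectors n * B/S through n * B/S + B/S - 1, both counted from the start of the disk. A minimal sketch; blocks_to_sectors is a hypothetical helper and assumes B is a whole multiple of S:

```sh
#!/bin/sh
# blocks_to_sectors BLOCK B S
# Print the disk-relative S-byte sector numbers covered by the B-byte block BLOCK.
blocks_to_sectors() {
  block=$1; bs=$2; s=$3
  if [ $(( bs % s )) -ne 0 ]; then
    echo "block size must be a multiple of the sector size" >&2
    return 1
  fi
  per=$(( bs / s ))                 # sectors per block, e.g. 4096 / 512 = 8
  i=0
  while [ "$i" -lt "$per" ]; do
    echo $(( block * per + i ))
    i=$(( i + 1 ))
  done
}

# Example: sectors covered by disk block 257 with B=4096, S=512
#   blocks_to_sectors 257 4096 512    # prints 2056 through 2063, one per line
```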

For any files reported by `debugfs`:

  • Write random data over the bad file, which should force the pending sector to be marked bad and reallocated from spare space on the disk:
shred -v /path/to/bad/file
  • Check that the number of pending sectors has decreased:
smartctl -a /dev/sdX | grep Pending
  • Assuming the number of pending sectors has decreased, it’s then ok to delete the bad file:
rm /path/to/bad/file
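Rather than eyeballing the grep output before and after each step, the pending count can be pulled out as a number and compared. A minimal sketch, assuming an ATA drive whose smartctl attribute table (printed by both smartctl -a and smartctl -A) includes a Current_Pending_Sector row with the raw value in the 10th column; pending_sectors is a hypothetical helper name:

```sh
#!/bin/sh
# pending_sectors: read smartctl attribute output on stdin and print the raw
# value of Current_Pending_Sector (10th column of the attribute table).
pending_sectors() {
  awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Usage sketch:
#   before=$(smartctl -A /dev/sdX | pending_sectors)
#   ... overwrite the bad blocks/sectors or shred the bad file ...
#   after=$(smartctl -A /dev/sdX | pending_sectors)
#   [ "$after" -lt "$before" ] && echo "pending sectors: $before -> $after"
```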

More details: