Mdadm

Overview

mdadm combines several physical disks (/dev/sdX) or partitions (/dev/sdX1) of equal size into a single software RAID array (e.g. /dev/md0).
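For example, to list the available block devices and confirm that the candidate disks are the same size (lsblk is part of util-linux; the column list is just a convenient selection):
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT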

Creating a RAID array

  • (Recommended) Create a partition on each disk with the following attributes:
    • Use the "GPT" partition table format (to handle disks > 2TB)
    • Use optimal alignment for the partition start (this normally means the partition will start at the 1MiB boundary)
    • End the partition 100MB before the end of the disk (to allow for slight variation in the exact size of nominally identical disks)
    • Set the partition type to raid (0xFD00); this is optional, but it may discourage some tools from writing directly to the disk (and so corrupting the array)
parted -a optimal -- /dev/sdX mklabel gpt mkpart primary 1MiB -100MB set 1 raid on
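  • (Optional) Sanity-check the result; assuming the new partition is number 1 on /dev/sdX, the following asks parted whether it is optimally aligned and prints the layout in MiB:
parted /dev/sdX align-check optimal 1
parted /dev/sdX unit MiB print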
  • Create a RAID 5 array over 3 partitions:
    • Note, the default metadata version is now 1.2 for create commands
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdX1 /dev/sdY1 /dev/sdZ1
  • Wait (potentially several days) for the array to be built
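  • The initial build can be monitored while it runs; a simple read-only check (the 60-second refresh interval is arbitrary):
watch -n 60 cat /proc/mdstat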
  • Once built, save the current RAID setup to /etc/mdadm/mdadm.conf, so that the array is assembled automatically on startup:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
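  • The appended line should look roughly like this (the name and UUID below are placeholders, not real values):
ARRAY /dev/md0 metadata=1.2 name=myhost:0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx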
  • Update the initial boot images for all installed kernel versions to include the new mdadm.conf:
update-initramfs -u -k all
  • If the array is not already running (--create normally starts it automatically), start it:
mdadm --assemble /dev/md0 /dev/sdX1 /dev/sdY1 /dev/sdZ1
  • From this point, just treat the array (/dev/md0) as a normal physical disk.
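  • For example, to create a filesystem on the array and mount it (ext4 and the /mnt/raid mount point are arbitrary choices here):
mkfs.ext4 /dev/md0
mkdir -p /mnt/raid
mount /dev/md0 /mnt/raid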

Recovering from disk failure

  • Check the disk status in mdadm:
mdadm --detail /dev/md0
  • If the disk is already marked as failed, skip this step. Otherwise, mark it as failed manually:
mdadm /dev/md0 --fail /dev/sdX1
  • From this point, the array will continue to operate in "degraded" mode
  • Remove the failed disk:
mdadm /dev/md0 --remove /dev/sdX1
  • Add a replacement disk (partitioned in the same way as the original disks):
mdadm /dev/md0 --add /dev/sdY1
  • Wait (potentially several days) for the array to be resynced
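  • The rebuild is throttled by default; if the machine is otherwise idle, raising the minimum rebuild speed may shorten the wait (the value is in KB/s per device, and 50000 is only an illustrative figure):
sysctl -w dev.raid.speed_limit_min=50000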

Recover from a dirty reboot of a degraded array

If the server shuts down uncleanly (e.g. due to a power cut) while the array is degraded, it will refuse to assemble the array automatically on startup (with a dmesg error of the form "cannot start dirty degraded array"). This is because the data may be in an inconsistent state. In this situation:

  • Check that the good disks have the same event count. If the numbers differ slightly, some of the data being written when the server shut down was probably not fully written and is corrupt (hopefully this will just mean a logfile with a few bad characters, or similar).
mdadm --examine /dev/sdX1 /dev/sdY1 /dev/sdZ1 | grep Events
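  • Illustrative output (the event counts below are made up; what matters is whether they match or are very close):
         Events : 211534
         Events : 211534
         Events : 211528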
  • Assuming the number of events is the same (or very similar), forcibly assemble the array.
mdadm --assemble --force /dev/md0 /dev/sdX1 /dev/sdY1 /dev/sdZ1
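  • Afterwards, confirm that the array is running and check whether a resync has started (both commands are read-only):
cat /proc/mdstat
mdadm --detail /dev/md0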

Useful Commands

cat /proc/mdstat            Display a summary of current raid status
mdadm --detail /dev/md0     Display raid information on array md0
mdadm --examine /dev/sdf    Display raid information on device/partition sdf