From f5e857d7b31798ed0ea12e72229924826d4a4e7d Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 25 Feb 2023 14:08:13 +0100 Subject: Moving docs to subdirectory in debian tree. Signed-off-by: Daniel Baumann --- debian/FAQ | 669 ------------------------------------- debian/README.checkarray | 33 -- debian/README.recipes | 168 ---------- debian/local/doc/FAQ | 669 +++++++++++++++++++++++++++++++++++++ debian/local/doc/README.checkarray | 33 ++ debian/local/doc/README.recipes | 168 ++++++++++ debian/mdadm.docs | 4 +- 7 files changed, 871 insertions(+), 873 deletions(-) delete mode 100644 debian/FAQ delete mode 100644 debian/README.checkarray delete mode 100644 debian/README.recipes create mode 100644 debian/local/doc/FAQ create mode 100644 debian/local/doc/README.checkarray create mode 100644 debian/local/doc/README.recipes diff --git a/debian/FAQ b/debian/FAQ deleted file mode 100644 index 40e0aba..0000000 --- a/debian/FAQ +++ /dev/null @@ -1,669 +0,0 @@ -Frequently asked questions -- Debian mdadm -========================================== - -Also see /usr/share/doc/mdadm/README.recipes.gz . - -The latest version of this FAQ is available here: - http://anonscm.debian.org/gitweb/?p=pkg-mdadm/mdadm.git;a=blob_plain;f=debian/FAQ;hb=HEAD - -0. What does MD stand for? -~~~~~~~~~~~~~~~~~~~~~~~~~~ - MD is an abbreviation for "multiple device" (also often called "multi- - disk"). The Linux MD implementation implements various strategies for - combining multiple (typically but not necessarily physical) block devices - into single logical ones. The most common use case is commonly known as - "Software RAID". Linux supports RAID levels 1, 4, 5, 6 and 10 as well - as the "pseudo" RAID level 0. - In addition, the MD implementation covers linear and multipath - configurations. - - Most people refer to MD as RAID. Since the original name of the RAID - configuration software is "md"adm, I chose to use MD consistently instead. - -1. How do I overwrite ("zero") the superblock? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdadm --zero-superblock /dev/sdXY - - Note that this is a destructive operation. It does not actually delete any - data, but the device will have lost its "authority". You cannot assemble the - array with it anymore and if you add the device to another array, the - synchronisation process *will* *overwrite* all data on the device. - - Nevertheless, sometimes it is necessary to zero the superblock: - - - If you want ot re-use a device (e.g. a HDD or SSD) that has been part of an - array (with an different superblock version and/or location) in another one. - In this case you zero the superblock before you assemble the array or add - the device to a new array. - - - If you are trying to prevent a device from being recognised as part of an - array. Say for instance you are trying to change an array spanning sd[ab]1 - to sd[bc]1 (maybe because sda is failing or too slow), then automatic - (scan) assembly will still recognise sda1 as a valid device. You can limit - the devices to scan with the DEVICE keyword in the configuration file, but - this may not be what you want. Instead, zeroing the superblock will - (permanently) prevent a device from being considered as part of an array. - - WARNING: Depending on which superblock version you use, it won't work to just - overwrite the first few MiBs of the block device with 0x0 (e.g. via - dd), since the superblock may be at other locations (especially the - end of the device). - Therefore always use mdadm --zero-superblock . - -2. 
How do I change the preferred minor of an MD array (RAID)? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - See item 12 in /usr/share/doc/mdadm/README.recipes.gz and read the mdadm(8) - manpage (search for 'preferred'). - -3. How does mdadm determine which /dev/mdX or /dev/md/X to use? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The logic used by mdadm to determine the device node name in the mdadm - --examine output (which is used to generate mdadm.conf) depends on several - factors. Here's how mdadm determines it: - - It first checks the superblock version of a given array (or each array in - turn when iterating all of them). Run - - mdadm --detail /dev/mdX | sed -ne 's,.*Version : ,,p' - - to determine the superblock version of a running array, or - - mdadm --examine /dev/sdXY | sed -ne 's,.*Version : ,,p' - - to determine the superblock version from a component device of an array. - - Version 0 superblocks (00.90.XX) - '''''''''''''''''''''''''''''''' - You need to know the preferred minor number stored in the superblock, - so run either of - - mdadm --detail /dev/mdX | sed -ne 's,.*Preferred Minor : ,,p' - mdadm --examine /dev/sdXY | sed -ne 's,.*Preferred Minor : ,,p' - - Let's call the resulting number MINOR. Also see FAQ 2 further up. - - Given MINOR, mdadm will output /dev/md if the device node - /dev/md exists. - Otherwise, it outputs /dev/md/ - - Version 1 superblocks (01.XX.XX) - '''''''''''''''''''''''''''''''' - Version 1 superblocks actually seem to ignore preferred minors and instead - use the value of the name field in the superblock. Unless specified - explicitly during creation (-N|--name) the name is determined from the - device name used, using the following regexp: 's,/dev/md/?(.*),$1,', thus: - - /dev/md0 -> 0 - /dev/md/0 -> 0 - /dev/md_d0 -> _d0 (d0 in later versions) - /dev/md/d0 -> d0 - /dev/md/name -> name - (/dev/name does not seem to work) - - mdadm will append the name to '/dev/md/', so it will always output device - names under the /dev/md/ directory. Newer versions can create a symlink - from /dev/mdX. See the symlinks option in mdadm.con(5) and mdadm(8). - - If you want to change the name, you can do so during assembly: - - mdadm -A -U name -N newname /dev/mdX /dev/sd[abc]X - - I know this all sounds inconsistent and upstream has some work to do. - We're on it. - -4. Which RAID level should I use? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Many people seem to prefer RAID4/5/6 because it makes more efficient use of - space. For example, if you have devices of size X, then in order to get 2X - storage, you need 3 devices for RAID5, but 4 if you use RAID10 or RAID1+ (or - RAID6). - - This gain in usable space comes at a price: performance; RAID1/10 can be up - to four times faster than RAID4/5/6. - - At the same time, however, RAID4/5/6 provide somewhat better redundancy in - the event of two failing devices. In a RAID10 configuration, if one device is - already dead, the RAID can only survive if any of the two devices in the other - RAID1 array fails, but not if the second device in the degraded RAID1 array - fails (see next item, 4b). A RAID6 across four devices can cope with any two - devices failing. However, RAID6 is noticeably slower than RAID5. RAID5 and - RAID4 do not differ much, but can only handle single-device failures. - - If you can afford the extra devices (storage *is* cheap these days), I suggest - RAID1/10 over RAID4/5/6. 
If you don't care about performance but need as - much space as possible, go with RAID4/5/6, but make sure to have backups. - Heck, make sure to have backups whatever you do. - - Let it be said, however, that I thoroughly regret putting my primary - workstation on RAID5. Anything device-intensive brings the system to its - knees; I will have to migrate to RAID10 at one point. - - Please also consult /usr/share/doc/mdadm/RAID5_versus_RAID10.txt.gz, - https://en.wikipedia.org/wiki/Standard_RAID_levels and perhaps even - https://en.wikipedia.org/wiki/Non-standard_RAID_levels . - -4b. Can a 4-device RAID10 survive two device failures? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - I am assuming that you are talking about a setup with two copies of each - block, so --layout=near2/far2/offset2: - - In two thirds of the cases, yes[0], and it does not matter which layout you - use. When you assemble 4 devices into a RAID10, you essentially stripe a RAID0 - across two RAID1, so the four devices A,B,C,D become two pairs: A,B and C,D. - If A fails, the RAID10 can only survive if the second failing device is either - C or D; if B fails, your array is dead. - - Thus, if you see a device failing, replace it as soon as possible! - - If you need to handle two failing devices out of a set of four, you have to - use RAID6, or store more than two copies of each block (see the --layout - option in the mdadm(8) manpage). - - See also question 18 further down. - - [0] It's actually (n-2)/(n-1), where n is the number of devices. I am not - a mathematician, see http://aput.net/~jheiss/raid10/, which gives the - chance of *failure* as 1/(n-1), so the chance of success is 1-1/(n-1), or - (n-2)/(n-1), or 2/3 in the four device example. - (Thanks to Per Olofsson for clarifying this in #493577). - -5. How to convert RAID5 to RAID10? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - To convert 3 device RAID5 to RAID10, you need a spare device (either a hot - spare, fourth device in the array, or a new one). Then you remove the spare - and one of the three devices from the RAID5, create a degraded RAID10 across - them, create the filesystem and copy the data (or do a raw copy), then add the - other two devices to the new RAID10. However, mdadm cannot assemble a RAID10 - with 50% missing devices the way you might like it: - - mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sd[cd] missing missing - - For reasons that may be answered by question 20 further down, mdadm actually - cares about the order of devices you give it. If you intersperse the "missing" - keywords with the physical devices, it should work: - - mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing - - or even - - mdadm --create -l 10 -n4 -pn2 /dev/md1 missing /dev/sd[cd] missing - - Also see item (4b) further up, and this thread: - http://thread.gmane.org/gmane.linux.raid/13469/focus=13472 - -6. What is the difference between RAID1+0 and RAID10? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - RAID1+0 is a form of RAID in which a RAID0 is striped across two RAID1 - arrays. To assemble it, you create two RAID1 arrays and then create a RAID0 - array with the two md arrays. - - The Linux kernel provides the RAID10 level to do pretty much exactly the - same for you, but with greater flexibility (and somewhat improved - performance). While RAID1+0 makes sense with 4 devices, RAID10 can be - configured to work with only 3 devices. Also, RAID10 has a little less - overhead than RAID1+0, which has data pass the md layer twice. 
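  As an illustration, here is a minimal sketch of both approaches (device and
  md names are placeholders, pick your own):

    # RAID1+0: two RAID1 pairs, then a RAID0 striped across the two md devices
    mdadm --create /dev/md1 -l1 -n2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md2 -l1 -n2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md3 -l0 -n2 /dev/md1 /dev/md2

    # RAID10: the same four devices handled by a single md device
    mdadm --create /dev/md0 -l10 -n4 -pn2 /dev/sd[abcd]1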
- - I prefer RAID10 over RAID1+0. - -6b. What's the difference between RAID1+0 and RAID0+1? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - In short: RAID1+0 concatenates two mirrored arrays while RAID0+1 mirrors two - concatenated arrays. However, the two are also often switched. - - The linux MD driver supports RAID10, which is equivalent to the above - RAID1+0 definition. - - RAID1+0/10 has a greater chance to survive two device failures, its - performance suffers less when in degraded state, and it resyncs faster after - replacing a failed device. - - See http://aput.net/~jheiss/raid10/ for more details. - -7. Which RAID10 layout scheme should I use -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - RAID10 gives you the choice between three ways of laying out chunks on the - devices: near, far and offset. - - The examples below explain the chunk distribution for each of these layouts - with 2 copies per chunk, using either an even number of devices (fewer than 4) - or an odd number (fewer than 5). - - For simplicity we assume that the chunk size matches the block size of the - underlying devices and also the RAID10 device exported by the kernel - (e.g. /dev/md/name). The chunk numbers map therefore directly to the block - addresses in the exported RAID10 device. - - The decimal numbers below (0, 1, 2, …) are the RAID10 chunks. Due to the - foregoing assumption they are also the block addresses in the exported RAID10 - device. Identical numbers refer to copies of a chunk or block, but on different - underlying devices. The hexadecimal numbers (0x00, 0x01, 0x02, …) refer to the - block addresses in the underlying devices. - - "near" layout with 2 copies per chunk (--layout=n2): - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The chunk copies are placed "as close to each other as possible". - - With an even number of devices, they lie at the same offset on the each device. - It is a classic RAID1+0 setup, i.e. two groups of mirrored devices, with both - forming a striped RAID0. - - device1 device2 device3 device4 device1 device2 device3 device4 device5 - ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── - 0 0 1 1 0x00 0 0 1 1 2 - 2 2 3 3 0x01 2 3 3 4 4 - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ - ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ - 254 254 255 255 0x80 317 318 318 319 319 - ╰──────┬──────╯ ╰──────┬──────╯ - RAID1 RAID1 - ╰──────────────┬──────────────╯ - RAID0 - - "far" layout with 2 copies per chunk (--layout=f2): - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The chunk copies are placed "as far from each other as possible". - - Here, a complete sequence of chunks is striped over all devices. Then a second - sequence of chunks is placed next to them. More copies are added as the number - 2 goes up. - - It is undesirable, however, to place copies of the same chunks on the same - devices. That is prevented by a cyclic permutation of each such stripe. - - device1 device2 device3 device4 device1 device2 device3 device4 device5 - ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── - 0 1 2 3 0x00 0 1 2 3 4 ╮ - 4 5 6 7 0x01 5 6 7 8 9 ├ ▒ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ - ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ┆ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ - 252 253 254 255 0x40 315 316 317 318 319 ╯ - 3 0 1 2 0x41 4 0 1 2 3 ╮ - 7 4 5 6 0x42 9 5 6 7 8 ├ ▒ₚ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ - ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ┆ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ - 255 252 253 254 0x80 319 315 316 317 318 ╯ - - Each ▒ in the diagram represents a complete sequence of chunks. ▒ₚ is a cyclic - permutation. 
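  In case you want to experiment with the layouts yourself: the layout is
  chosen at creation time with --layout (or -p). A sketch, device names being
  placeholders:

    mdadm --create /dev/md0 -l10 -n4 --layout=n2 /dev/sd[abcd]1   # "near", 2 copies
    mdadm --create /dev/md0 -l10 -n4 --layout=f2 /dev/sd[abcd]1   # "far", 2 copies
    mdadm --create /dev/md0 -l10 -n4 --layout=o2 /dev/sd[abcd]1   # "offset", 2 copies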
- - A major advantage of the "far" layout is that sequential reads can be spread - out over different devices, which makes the setup similar to RAID0 in terms of - speed. For writes, there is a cost of seeking. They are substantially slower. - - "offset" layout with 2 copies per chunk (--layout=o2): - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Here, a number of consecutive chunks are bundled on each device during the - striping operation. The number of consecutive chunks equals the number of - devices. Next, a copy of the same chunks is striped in a different pattern. - More copies are added as the number 2 goes up. - - A cyclic permutation in the pattern prevents copies of the same chunks - landing on the same devices. - - device1 device2 device3 device4 device1 device2 device3 device4 device5 - ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── - 0 1 2 3 0x00 0 1 2 3 4 ) AA - 3 0 1 2 0x01 4 0 1 2 3 ) AAₚ - 4 5 6 7 0x02 5 6 7 8 9 ) AB - 7 4 5 6 0x03 9 5 6 7 8 ) ABₚ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ) ⋯ - ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ - ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ) ⋯ - 251 252 253 254 0x79 314 315 316 317 318 ) EX - 254 251 252 253 0x80 318 314 315 316 317 ) EXₚ - - With AA, AB, …, AZ, BA, … being the sets of consecutive chunks and - AAₚ, ABₚ, …, AZₚ, BAₚ, … their cyclic permutations. - - The read characteristics are probably similar to the "far" layout when a - suitably large chunk size is chosen, but with less seeking for writes. - - Upstream and the Debian maintainer do not understand all the nuances and - implications. The "offset" layout was only added because the Common - RAID Data Disk Format (DDF) supports it, and standard compliance is our - goal. - - See the md(4) manpage for more details. - -8. (One of) my RAID arrays is busy and cannot be stopped. What gives? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - It is perfectly normal for mdadm to report the array with the root - filesystem to be busy on shutdown. The reason for this is that the root - filesystem must be mounted to be able to stop the array (or otherwise - /sbin/mdadm does not exist), but to stop the array, the root filesystem - cannot be mounted. Catch 22. The kernel actually stops the array just before - halting, so it's all well. - - If mdadm cannot stop other arrays on your system, check that these arrays - aren't used anymore. Common causes for busy/locked arrays are: - - * The array contains a mounted filesystem (check the `mount' output) - * The array is used as a swap backend (check /proc/swaps) - * The array is used by the device-mapper (check with `dmsetup') - * LVM - * dm-crypt - * EVMS - * The array contains a swap partition used for suspend-to-ram - (check /etc/initramfs-tools/conf.d/resume) - * The array is used by a process (check with `lsof') - -9. Should I use RAID0 (or linear)? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - No. Unless you know what you're doing and keep backups, or use it for data - that can be lost. - -9b. Why not? -~~~~~~~~~~~~ - RAID0 has zero redundancy. If you stripe a RAID0 across X devices, you - increase the likelyhood of complete loss of the filesystem by a factor of X. - - The same applies to LVM by the way (when LVs are placed over X PVs). - - If you want/must used LVM or RAID0, stripe it across RAID1 arrays - (RAID10/RAID1+0, or LVM on RAID1), and keep backups! - -10. Can I cancel a running array check (checkarray)? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - See the -x option in the `/usr/share/mdadm/checkarray --help` output. - -11. 
mdadm warns about duplicate/similar superblocks; what gives? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - In certain configurations, especially if your last partition extends all the - way to the end of the device, mdadm may display a warning like: - - mdadm: WARNING /dev/sdXY and /dev/sdX appear to have very similar - superblocks. If they are really different, please --zero the superblock on - one. If they are the same or overlap, please remove one from the DEVICE - list in mdadm.conf. - - There are two ways to solve this: - - (a) recreate the arrays with version-1 superblocks, which is not always an - option -- you cannot yet upgrade version-0 to version-1 superblocks for - existing arrays. - - (b) instead of 'DEVICE partitions', list exactly those devices that are - components of MD arrays on your system. So istead of: - - DEVICE partitions - - for example: - - DEVICE /dev/sd[ab]* /dev/sdc[123] - -12. mdadm -E / mkconf report different arrays with the same device - name / minor number. What gives? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - In almost all cases, mdadm updates the super-minor field in an array's - superblock when assembling the array. It does *not* do this for RAID0 - arrays. Thus, you may end up seeing something like this when you run - mdadm -E or mkconf: - - ARRAY /dev/md0 level=raid0 num-devices=2 UUID=abcd... - ARRAY /dev/md0 level=raid1 num-devices=2 UUID=dcba... - - Note how the two arrays have different UUIDs but both appear as /dev/md0. - - The solution in this case is to explicitly tell mdadm to update the - superblock of the RAID0 array. Assuming that the RAID0 array in the above - example should really be /dev/md1: - - mdadm --stop /dev/md1 - mdadm --assemble --update=super-minor --uuid=abcd... /dev/md1 - - See question 2 of this FAQ, and also http://bugs.debian.org/386315 and - recipe #12 in README.recipes . - -13. Can a MD array be partitioned? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Since kernel 2.6.28, MD arrays can be partitioned like any other block - device. - - Prior to 2.6.28, for a MD array to be able to hold partitions, it must be - created as a "partitionable array", using the configuration auto=part on the - command line or in the configuration file, or by using the standard naming - scheme (md_d* or md/d*) for partitionable arrays: - - mdadm --create --auto=yes ... /dev/md_d0 ... - # see mdadm(8) manpage about the values of the --auto keyword - -14. When would I partition an array? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - This answer by Doug Ledford is shamelessly adapted from [0] (with - permission): - - First, not all MD types make sense to be split up, e.g. multipath. For - those types, when a device fails, the *entire* device is considered to have - failed, but with different arrays you won't switch over to the next path - until each MD array has attempted to access the bad path. This can have - obvious bad consequences for certain array types that do automatic - failover from one port to another (you can end up getting the array in - a loop of switching ports repeatedly to satisfy the fact that one array - failed over during a path down, then the path came back up, and another - array stayed on the old path because it didn't send any commands during - the path down time period). - - Second, convenience. Assume you have a 6 device RAID5 array. If a device - fails and you are using a partitioned MD array, then all the partitions on - the device will already be handled without using that device. 
No need to - manually fail any still active array members from other arrays. - - Third, safety. Again with the RAID5 array. If you use multiple arrays on - a single device, and that device fails, but it only failed on one array, then - you now need to manually fail that device from the other arrays before - shutting down or hot swapping the device. Generally speaking, that's not - a big deal, but people do occasionally have fat finger syndrome and this - is a good opportunity for someone to accidentally fail the wrong device, and - when you then go to remove the device you create a two device failure instead - of one and now you are in real trouble. - - Forth, to respond to what you wrote about independent of each other -- - part of the reason why you partition. I would argue that's not true. If - your goal is to salvage as much use from a failing device as possible, then - OK. But, generally speaking, people that have something of value on their - devices don't want to salvage any part of a failing device, they want that - device gone and replaced immediately. There simply is little to no value in - an already malfunctioning device. They're too cheap and the data stored on - them too valuable to risk loosing something in an effort to further - utilize broken hardware. This of course is written with the understanding - that the latest MD RAID code will do read error rewrites to compensate for - minor device issues, so anything that will throw a device out of an array is - more than just a minor sector glitch. - - [0] http://thread.gmane.org/gmane.linux.raid/13594/focus=13597 - -15. How can I start a dirty degraded array? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - A degraded array (e.g. a RAID5 with only two devices) that has not been - properly stopped cannot be assembled just like that; mdadm will refuse and - complain about a "dirty degraded array", for good reasons. - - The solution might be to force-assemble it, and then to start it. Please see - recipes 4 and 4b of /usr/share/doc/mdadm/README.recipes.gz and make sure you - know what you're doing. - -16. How can I influence the speed with which an array is resynchronised? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - For each array, the MD subsystem exports parameters governing the - synchronisation speed via sysfs. The values are in kB/sec. - - /sys/block/mdX/md/sync_speed -- the current speed - /sys/block/mdX/md/sync_speed_max -- the maximum speed - /sys/block/mdX/md/sync_speed_min -- the guaranteed minimum speed - -17. When I create a new array, why does it resynchronise at first? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - See the mdadm(8) manpage: - When creating a RAID5 array, mdadm will automatically create a degraded - array with an extra spare drive. This is because building the spare into - a degraded array is in general faster than resyncing the parity on - a non-degraded, but not clean, array. This feature can be over-ridden with - the --force option. - - This also applies to RAID levels 4 and 6. - - It does not make much sense for RAID levels 1 and 10 and can thus be - overridden with the --force and --assume-clean options, but it is not - recommended. Read the manpage. - -18. How many failed devics can a RAID10 handle? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (see also question 4b) - - The following table shows how many devices you can lose and still have an - operational array. 
In some cases, you *can* lose more than the given number - of devices, but there is no guarantee that the array survives. Thus, the - following is the guaranteed number of failed devices a RAID10 array survives - and the maximum number of failed devices the array can (but is not guaranteed - to) handle, given the number of devices used and the number of data block - copies. Note that 2 copies means original + 1 copy. Thus, if you only have - one copy (the original), you cannot handle any failures. - - 1 2 3 4 (# of copies) - 1 0/0 0/0 0/0 0/0 - 2 0/0 1/1 1/1 1/1 - 3 0/0 1/1 2/2 2/2 - 4 0/0 1/2 2/2 3/3 - 5 0/0 1/2 2/2 3/3 - 6 0/0 1/3 2/3 3/3 - 7 0/0 1/3 2/3 3/3 - 8 0/0 1/4 2/3 3/4 - (# of devices) - - Note: I have not really verified the above information. Please don't count - on it. If a device fails, replace it as soon as possible. Corrections welcome. - -19. What should I do if a device fails? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Replace it as soon as possible. - - In case of physical devices with no hot-swap capabilities, for example via: - - mdadm --remove /dev/md0 /dev/sda1 - poweroff - - mdadm --add /dev/md0 /dev/sda1 - -20. So how do I find out which other device(s) can fail without killing the - array? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Did you read the previous question and its answer? - - For cases when you have two copies of each block, the question is easily - answered by looking at the output of /proc/mdstat. For instance on a 4 device - array: - - md3 : active raid10 sdg7[3] sde7[0] sdh7[2] sdf7[1] - - you know that sde7/sdf7 form one pair and sdg7/sgh7 the other. - - If sdh now fails, this will become - - md3 : active raid10 sdg7[3] sde7[0] sdh7[4](F) sdf7[1] - - So now the second pair is degraded; the array could take another failure in - the first pair, but if sdg now also fails, you're history. - - Now go and read question 19. - - For cases with more copies per block, it becomes more complicated. Let's - think of a 7 device array with three copies: - - md5 : active raid10 sdg7[6] sde7[4] sdb7[5] sdf7[2] sda7[3] sdc7[1] sdd7[0] - - Each mirror now has 7/3 = 2.33 devices to it, so in order to determine groups, - you need to round up. Note how the devices are arranged in decreasing order of - their indices (the number in brackes in /proc/mdstat): - - device: -sdd7- -sdc7- -sdf7- -sda7- -sde7- -sdb7- -sdg7- - group: [ one ][ two ][ three ] - - Basically this means that after two devices failed, you need to make sure that - the third failed device doesn't destroy all copies of any given block. And - that's not always easy as it depends on the layout chosen: whether the - blocks are near (same offset within each group), far (spread apart in a way - to maximise the mean distance), or offset (offset by size/n within each - block). - - I'll leave it up to you to figure things out. Now go read question 19. - -21. Why does the kernel speak of 'resync' when using checkarray? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Please see README.checkarray and http://thread.gmane.org/gmane.linux.raid/11864 . - - In short: it's a bug. checkarray is actually not a resync, but the kernel - does not distinguish between them. - -22. Can I prioritise the sync process and sync certain arrays before others? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Upon start, md will resynchronise any unclean arrays, starting in somewhat - random order. Sometimes it's desirable to sync e.g. 
/dev/md3 first (because - it's the most important), but while /dev/md1 is synchronising, /dev/md3 will - be DELAYED (see /proc/mdstat; only if they share the same physical - components. - - It is possible to delay the synchronisation via /sys: - - echo idle >/sys/block/md1/md/sync_action - - This will cause md1 to go idle and MD to synchronise md3 (or whatever is - queued next; repeat the above for other devices if necessary). MD will also - realise that md1 is still not in sync and queue it for resynchronisation, - so it will sync automatically when its turn has come. - -23. mdadm's init script fails because it cannot find any arrays. What gives? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - This does not happen anymore, if no arrays present in config file, no arrays - will be started. - -24. What happened to mdrun? How do I replace it? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdrun used to be the sledgehammer approach to assembling arrays. It has - accumulated several problems over the years (e.g. Debian bug #354705) and - thus has been deprecated and removed with the 2.6.7-2 version of this package. - - If you are still using mdrun, please ensure that you have a valid - /etc/mdadm/mdadm.conf file (run /usr/share/mdadm/mkconf --generate to get - one), and run - - mdadm --assemble --scan --auto=yes - - instead of mdrun. - -25. Why are my arrays marked auto-read-only in /proc/mdstat? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Arrays are kept read-only until the first write occurs. This allows md to - skip lengthy resynchronisation for arrays that have not been properly shut - down, but which also not have changed. - -26. Why doesn't mdadm find arrays specified in the config file and causes the - boot to fail? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - My boot process dies at an early stage and drops me into the busybox shell. - The last relevant output seems to be from mdadm and is something like - - "/dev/md2 does not exist" - - or - - "No devices listed in conf file found" - - Why does mdadm break my system? - - Short answer: It doesn't, the underlying devices aren't yet available yet - when mdadm runs during the early boot process. - - Long answer: It doesn't, but the drivers of those devices incorrectly - communicate to the kernel that the devices are ready, when in fact they are - not. I consider this a bug in those drivers. Please consider reporting it. - - Workaround: there is nothing mdadm can or will do against this. Fortunately - though, initramfs provides a method, documented at - http://wiki.debian.org/InitramfsDebug. Please append rootdelay=10 (which sets - a delay of 10 seconds before trying to mount the root filesystem) to the - kernel command line and try if the boot now works. - - -- martin f. krafft Wed, 13 May 2009 09:59:53 +0200 diff --git a/debian/README.checkarray b/debian/README.checkarray deleted file mode 100644 index 8071a4d..0000000 --- a/debian/README.checkarray +++ /dev/null @@ -1,33 +0,0 @@ -checkarray notes -================ - -checkarray will run parity checks across all your redundant arrays. By -default, it is configured to run on the first Sunday of each month, at 01:06 -in the morning. This is realised by asking cron to wake up every Sunday with -/etc/cron.d/mdadm, but then only running the script when the day of the month -is less than or equal to 7. See #380425. 
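The net effect is roughly the following (a sketch only, not the literal
contents of /etc/cron.d/mdadm; the real entry may use additional options):

  # cron fires every Sunday; the check only proceeds while the day of the
  # month is still <= 7, i.e. on the first Sunday
  [ "$(date +%d)" -le 7 ] && /usr/share/mdadm/checkarray --idle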
- -Cron will try to run the check at "idle I/O priority" (see ionice(1)), so that -the check does not overload the system too much. Note that this will only -work if all the component devices of the array employ the (default) "cfq" I/O -scheduler. See the kernel documentation[0] for information on how to verify -and modify the scheduler. checkarray does not verify this for you. - - 0. http://www.kernel.org/doc/Documentation/block/switching-sched.txt - -If you manually invoke checkarray, it runs with default I/O priority. Should -you need to run a check at a higher (or lower) I/O priority, then have a look -at the --idle, --slow, --fast, and --realtime options. - -'check' is a read-only operation, even though the kernel logs may suggest -otherwise (e.g. /proc/mdstat and several kernel messages will mention -"resync"). Please also see question 21 of the FAQ. - -If, however, while reading, a read error occurs, the check will trigger the -normal response to read errors which is to generate the 'correct' data and try -to write that out - so it is possible that a 'check' will trigger a write. -However in the absence of read errors it is read-only. - -You can cancel a running array check with the -x option to checkarray. - - -- martin f. krafft Thu, 02 Sep 2010 10:27:29 +0200 diff --git a/debian/README.recipes b/debian/README.recipes deleted file mode 100644 index 3906629..0000000 --- a/debian/README.recipes +++ /dev/null @@ -1,168 +0,0 @@ -mdadm recipes -============= - -The following examples/recipes may help you with your mdadm experience. I'll -leave it as an exercise to use the correct device names and parameters in each -case. You can find pointers to additional documentation in the README.Debian -file. - -Enjoy. Submissions welcome. - -The latest version of this document is available here: - http://git.debian.org/?p=pkg-mdadm/mdadm.git;a=blob;f=debian/README.recipes;hb=HEAD - -The short options used here are: - - -l Set RAID level. - -n Number of active devices in the array. - -x Specify the number of spare (eXtra) devices in the initial array. - -0. create a new array -~~~~~~~~~~~~~~~~~~~~~ - mdadm --create -l1 -n2 -x1 /dev/md0 /dev/sd[abc]1 # RAID 1, 1 spare - mdadm --create -l5 -n3 -x1 /dev/md0 /dev/sd[abcd]1 # RAID 5, 1 spare - mdadm --create -l6 -n4 -x1 /dev/md0 /dev/sd[abcde]1 # RAID 6, 1 spare - -1. create a degraded array -~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdadm --create -l5 -n3 /dev/md0 /dev/sda1 missing /dev/sdb1 - mdadm --create -l6 -n4 /dev/md0 /dev/sda1 missing /dev/sdb1 missing - -2. assemble an existing array -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdadm --assemble --auto=yes /dev/md0 /dev/sd[abc]1 - - # if the array is degraded, it won't be started. use --run: - mdadm --assemble --auto=yes --run /dev/md0 /dev/sd[ab]1 - - # or start it by hand: - mdadm --run /dev/md0 - -3. assemble all arrays in /etc/mdadm/mdadm.conf -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdadm --assemble --auto=yes --scan - -4. assemble a dirty degraded array -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdadm --assemble --auto=yes --force /dev/md0 /dev/sd[ab]1 - mdadm --run /dev/md0 - -4b. assemble a dirty degraded array at boot-time -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - If the array is started at boot time by the kernel (partition type 0xfd), - you can force-assemble it by passing the kernel boot parameter - - md-mod.start_dirty_degraded=1 - -5. stop arrays -~~~~~~~~~~~~~~ - mdadm --stop /dev/md0 - - # to stop all arrays in /etc/mdadm/mdadm.conf - mdadm --stop --scan - -6. 
hot-add components -~~~~~~~~~~~~~~~~~~~~~ - # on the running array: - mdadm --add /dev/md0 /dev/sdc1 - - # if you add more components than the array was setup with, additional - # components will be spares - -7. hot-remove components -~~~~~~~~~~~~~~~~~~~~~~~~ - # on the running array: - mdadm --fail /dev/md0 /dev/sdb1 - - # if you have configured spares, watch /proc/mdstat how it fills in - mdadm --remove /dev/md0 /dev/sdb1 - -8. hot-grow a RAID1 by adding new components -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - # on the running array, in either order: - mdadm --grow -n3 /dev/md0 - mdadm --add /dev/md0 /dev/sdc1 - - # note: without growing first, additional devices become spares and are - # *not* synchronised after the add. - -9. hot-shrink a RAID1 by removing components -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - mdadm --fail /dev/md0 /dev/sdc1 - mdadm --remove /dev/md0 /dev/sdc1 - mdadm --grow -n2 /dev/md0 - -10. convert existing filesystem to RAID 1 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - # The idea is to create a degraded RAID 1 on the second partition, move - # data, then hot add the first. This seems safer to me than simply to - # force-add a superblock to the existing filesystem. - # - # Assume /dev/sda1 holds the data (and let's assume it's mounted on - # /home) and /dev/sdb1 is empty and of the same size... - # - mdadm --create /dev/md0 -l1 -n2 /dev/sdb1 missing - - mkfs -t /dev/md0 - mount /dev/md0 /mnt - - tar -cf- -C /home . | tar -xf- -C /mnt -p - - # consider verifying the data - umount /home - umount /mnt - mount /dev/md0 /home # also change /etc/fstab - - mdadm --add /dev/md0 /dev/sda1 - - Warren Togami has a document explaining how to convert a filesystem on - a remote system via SSH: http://togami.com/~warren/guides/remoteraidcrazies/ - -10b. convert existing filesystem to RAID 1 in-place -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - In-place conversion of /dev/sda1 to /dev/md0 is effectively - - mdadm --create /dev/md0 -l1 -n2 /dev/sda1 missing - - however, do NOT do this, as you risk filesystem corruption. - - If you need to do this, first unmount and shrink the filesystem by - a megabyte (if supported). Then run the above command, then (optionally) - again grow the filesystem as much as possible. - - Do make sure you have backups. If you do not yet, consider method (10) - instead (and make backups anyway!). - -11. convert existing filesystem to RAID 5/6 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - # See (10) for the basics. - mdadm --create /dev/md0 -l5 -n3 /dev/sdb1 /dev/sdc1 missing - - #mdadm --create /dev/md0 -l6 -n4 /dev/sdb1 /dev/sdc1 /dev/sdd1 missing - mkfs -t /dev/md0 - mount /dev/md0 /mnt - - tar -cf- -C /home . | tar -xf- -C /mnt -p - - # consider verifying the data - umount /home - umount /mnt - mount /dev/md0 /home # also change /etc/fstab - - mdadm --add /dev/md0 /dev/sda1 - -12. change the preferred minor of an MD array (RAID) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - # you need to manually assemble the array to change the preferred minor - # if you manually assemble, the superblock will be updated to reflect - # the preferred minor as you indicate with the assembly. - # for example, to set the preferred minor to 4: - mdadm --assemble /dev/md4 /dev/sd[abc]1 - - # this only works on 2.6 kernels, and only for RAID levels of 1 and above. 
- # for other MD arrays, you need to specify --update explicitly: - mdadm --assemble --update=super-minor /dev/md4 /dev/sd[abc]1 - - # see also item 12 in the FAQ contained with the Debian package. - - -- martin f. krafft Fri, 06 Oct 2006 15:39:58 +0200 diff --git a/debian/local/doc/FAQ b/debian/local/doc/FAQ new file mode 100644 index 0000000..40e0aba --- /dev/null +++ b/debian/local/doc/FAQ @@ -0,0 +1,669 @@ +Frequently asked questions -- Debian mdadm +========================================== + +Also see /usr/share/doc/mdadm/README.recipes.gz . + +The latest version of this FAQ is available here: + http://anonscm.debian.org/gitweb/?p=pkg-mdadm/mdadm.git;a=blob_plain;f=debian/FAQ;hb=HEAD + +0. What does MD stand for? +~~~~~~~~~~~~~~~~~~~~~~~~~~ + MD is an abbreviation for "multiple device" (also often called "multi- + disk"). The Linux MD implementation implements various strategies for + combining multiple (typically but not necessarily physical) block devices + into single logical ones. The most common use case is commonly known as + "Software RAID". Linux supports RAID levels 1, 4, 5, 6 and 10 as well + as the "pseudo" RAID level 0. + In addition, the MD implementation covers linear and multipath + configurations. + + Most people refer to MD as RAID. Since the original name of the RAID + configuration software is "md"adm, I chose to use MD consistently instead. + +1. How do I overwrite ("zero") the superblock? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + mdadm --zero-superblock /dev/sdXY + + Note that this is a destructive operation. It does not actually delete any + data, but the device will have lost its "authority". You cannot assemble the + array with it anymore and if you add the device to another array, the + synchronisation process *will* *overwrite* all data on the device. + + Nevertheless, sometimes it is necessary to zero the superblock: + + - If you want ot re-use a device (e.g. a HDD or SSD) that has been part of an + array (with an different superblock version and/or location) in another one. + In this case you zero the superblock before you assemble the array or add + the device to a new array. + + - If you are trying to prevent a device from being recognised as part of an + array. Say for instance you are trying to change an array spanning sd[ab]1 + to sd[bc]1 (maybe because sda is failing or too slow), then automatic + (scan) assembly will still recognise sda1 as a valid device. You can limit + the devices to scan with the DEVICE keyword in the configuration file, but + this may not be what you want. Instead, zeroing the superblock will + (permanently) prevent a device from being considered as part of an array. + + WARNING: Depending on which superblock version you use, it won't work to just + overwrite the first few MiBs of the block device with 0x0 (e.g. via + dd), since the superblock may be at other locations (especially the + end of the device). + Therefore always use mdadm --zero-superblock . + +2. How do I change the preferred minor of an MD array (RAID)? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + See item 12 in /usr/share/doc/mdadm/README.recipes.gz and read the mdadm(8) + manpage (search for 'preferred'). + +3. How does mdadm determine which /dev/mdX or /dev/md/X to use? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + The logic used by mdadm to determine the device node name in the mdadm + --examine output (which is used to generate mdadm.conf) depends on several + factors. 
Here's how mdadm determines it: + + It first checks the superblock version of a given array (or each array in + turn when iterating all of them). Run + + mdadm --detail /dev/mdX | sed -ne 's,.*Version : ,,p' + + to determine the superblock version of a running array, or + + mdadm --examine /dev/sdXY | sed -ne 's,.*Version : ,,p' + + to determine the superblock version from a component device of an array. + + Version 0 superblocks (00.90.XX) + '''''''''''''''''''''''''''''''' + You need to know the preferred minor number stored in the superblock, + so run either of + + mdadm --detail /dev/mdX | sed -ne 's,.*Preferred Minor : ,,p' + mdadm --examine /dev/sdXY | sed -ne 's,.*Preferred Minor : ,,p' + + Let's call the resulting number MINOR. Also see FAQ 2 further up. + + Given MINOR, mdadm will output /dev/md if the device node + /dev/md exists. + Otherwise, it outputs /dev/md/ + + Version 1 superblocks (01.XX.XX) + '''''''''''''''''''''''''''''''' + Version 1 superblocks actually seem to ignore preferred minors and instead + use the value of the name field in the superblock. Unless specified + explicitly during creation (-N|--name) the name is determined from the + device name used, using the following regexp: 's,/dev/md/?(.*),$1,', thus: + + /dev/md0 -> 0 + /dev/md/0 -> 0 + /dev/md_d0 -> _d0 (d0 in later versions) + /dev/md/d0 -> d0 + /dev/md/name -> name + (/dev/name does not seem to work) + + mdadm will append the name to '/dev/md/', so it will always output device + names under the /dev/md/ directory. Newer versions can create a symlink + from /dev/mdX. See the symlinks option in mdadm.con(5) and mdadm(8). + + If you want to change the name, you can do so during assembly: + + mdadm -A -U name -N newname /dev/mdX /dev/sd[abc]X + + I know this all sounds inconsistent and upstream has some work to do. + We're on it. + +4. Which RAID level should I use? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Many people seem to prefer RAID4/5/6 because it makes more efficient use of + space. For example, if you have devices of size X, then in order to get 2X + storage, you need 3 devices for RAID5, but 4 if you use RAID10 or RAID1+ (or + RAID6). + + This gain in usable space comes at a price: performance; RAID1/10 can be up + to four times faster than RAID4/5/6. + + At the same time, however, RAID4/5/6 provide somewhat better redundancy in + the event of two failing devices. In a RAID10 configuration, if one device is + already dead, the RAID can only survive if any of the two devices in the other + RAID1 array fails, but not if the second device in the degraded RAID1 array + fails (see next item, 4b). A RAID6 across four devices can cope with any two + devices failing. However, RAID6 is noticeably slower than RAID5. RAID5 and + RAID4 do not differ much, but can only handle single-device failures. + + If you can afford the extra devices (storage *is* cheap these days), I suggest + RAID1/10 over RAID4/5/6. If you don't care about performance but need as + much space as possible, go with RAID4/5/6, but make sure to have backups. + Heck, make sure to have backups whatever you do. + + Let it be said, however, that I thoroughly regret putting my primary + workstation on RAID5. Anything device-intensive brings the system to its + knees; I will have to migrate to RAID10 at one point. + + Please also consult /usr/share/doc/mdadm/RAID5_versus_RAID10.txt.gz, + https://en.wikipedia.org/wiki/Standard_RAID_levels and perhaps even + https://en.wikipedia.org/wiki/Non-standard_RAID_levels . + +4b. 
Can a 4-device RAID10 survive two device failures? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + I am assuming that you are talking about a setup with two copies of each + block, so --layout=near2/far2/offset2: + + In two thirds of the cases, yes[0], and it does not matter which layout you + use. When you assemble 4 devices into a RAID10, you essentially stripe a RAID0 + across two RAID1, so the four devices A,B,C,D become two pairs: A,B and C,D. + If A fails, the RAID10 can only survive if the second failing device is either + C or D; if B fails, your array is dead. + + Thus, if you see a device failing, replace it as soon as possible! + + If you need to handle two failing devices out of a set of four, you have to + use RAID6, or store more than two copies of each block (see the --layout + option in the mdadm(8) manpage). + + See also question 18 further down. + + [0] It's actually (n-2)/(n-1), where n is the number of devices. I am not + a mathematician, see http://aput.net/~jheiss/raid10/, which gives the + chance of *failure* as 1/(n-1), so the chance of success is 1-1/(n-1), or + (n-2)/(n-1), or 2/3 in the four device example. + (Thanks to Per Olofsson for clarifying this in #493577). + +5. How to convert RAID5 to RAID10? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + To convert 3 device RAID5 to RAID10, you need a spare device (either a hot + spare, fourth device in the array, or a new one). Then you remove the spare + and one of the three devices from the RAID5, create a degraded RAID10 across + them, create the filesystem and copy the data (or do a raw copy), then add the + other two devices to the new RAID10. However, mdadm cannot assemble a RAID10 + with 50% missing devices the way you might like it: + + mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sd[cd] missing missing + + For reasons that may be answered by question 20 further down, mdadm actually + cares about the order of devices you give it. If you intersperse the "missing" + keywords with the physical devices, it should work: + + mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing + + or even + + mdadm --create -l 10 -n4 -pn2 /dev/md1 missing /dev/sd[cd] missing + + Also see item (4b) further up, and this thread: + http://thread.gmane.org/gmane.linux.raid/13469/focus=13472 + +6. What is the difference between RAID1+0 and RAID10? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RAID1+0 is a form of RAID in which a RAID0 is striped across two RAID1 + arrays. To assemble it, you create two RAID1 arrays and then create a RAID0 + array with the two md arrays. + + The Linux kernel provides the RAID10 level to do pretty much exactly the + same for you, but with greater flexibility (and somewhat improved + performance). While RAID1+0 makes sense with 4 devices, RAID10 can be + configured to work with only 3 devices. Also, RAID10 has a little less + overhead than RAID1+0, which has data pass the md layer twice. + + I prefer RAID10 over RAID1+0. + +6b. What's the difference between RAID1+0 and RAID0+1? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + In short: RAID1+0 concatenates two mirrored arrays while RAID0+1 mirrors two + concatenated arrays. However, the two are also often switched. + + The linux MD driver supports RAID10, which is equivalent to the above + RAID1+0 definition. + + RAID1+0/10 has a greater chance to survive two device failures, its + performance suffers less when in degraded state, and it resyncs faster after + replacing a failed device. 
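  If in doubt what a given array actually is, the level and layout can be read
  off a running array; a sketch, with the md name being a placeholder:

    mdadm --detail /dev/md0 | grep -E 'Raid Level|Layout'
    cat /proc/mdstat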
+ + See http://aput.net/~jheiss/raid10/ for more details. + +7. Which RAID10 layout scheme should I use +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RAID10 gives you the choice between three ways of laying out chunks on the + devices: near, far and offset. + + The examples below explain the chunk distribution for each of these layouts + with 2 copies per chunk, using either an even number of devices (fewer than 4) + or an odd number (fewer than 5). + + For simplicity we assume that the chunk size matches the block size of the + underlying devices and also the RAID10 device exported by the kernel + (e.g. /dev/md/name). The chunk numbers map therefore directly to the block + addresses in the exported RAID10 device. + + The decimal numbers below (0, 1, 2, …) are the RAID10 chunks. Due to the + foregoing assumption they are also the block addresses in the exported RAID10 + device. Identical numbers refer to copies of a chunk or block, but on different + underlying devices. The hexadecimal numbers (0x00, 0x01, 0x02, …) refer to the + block addresses in the underlying devices. + + "near" layout with 2 copies per chunk (--layout=n2): + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + The chunk copies are placed "as close to each other as possible". + + With an even number of devices, they lie at the same offset on the each device. + It is a classic RAID1+0 setup, i.e. two groups of mirrored devices, with both + forming a striped RAID0. + + device1 device2 device3 device4 device1 device2 device3 device4 device5 + ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── + 0 0 1 1 0x00 0 0 1 1 2 + 2 2 3 3 0x01 2 3 3 4 4 + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ + ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ + 254 254 255 255 0x80 317 318 318 319 319 + ╰──────┬──────╯ ╰──────┬──────╯ + RAID1 RAID1 + ╰──────────────┬──────────────╯ + RAID0 + + "far" layout with 2 copies per chunk (--layout=f2): + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + The chunk copies are placed "as far from each other as possible". + + Here, a complete sequence of chunks is striped over all devices. Then a second + sequence of chunks is placed next to them. More copies are added as the number + 2 goes up. + + It is undesirable, however, to place copies of the same chunks on the same + devices. That is prevented by a cyclic permutation of each such stripe. + + device1 device2 device3 device4 device1 device2 device3 device4 device5 + ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── + 0 1 2 3 0x00 0 1 2 3 4 ╮ + 4 5 6 7 0x01 5 6 7 8 9 ├ ▒ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ + ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ┆ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ + 252 253 254 255 0x40 315 316 317 318 319 ╯ + 3 0 1 2 0x41 4 0 1 2 3 ╮ + 7 4 5 6 0x42 9 5 6 7 8 ├ ▒ₚ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ + ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ┆ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆ + 255 252 253 254 0x80 319 315 316 317 318 ╯ + + Each ▒ in the diagram represents a complete sequence of chunks. ▒ₚ is a cyclic + permutation. + + A major advantage of the "far" layout is that sequential reads can be spread + out over different devices, which makes the setup similar to RAID0 in terms of + speed. For writes, there is a cost of seeking. They are substantially slower. + + "offset" layout with 2 copies per chunk (--layout=o2): + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Here, a number of consecutive chunks are bundled on each device during the + striping operation. The number of consecutive chunks equals the number of + devices. Next, a copy of the same chunks is striped in a different pattern. 
+ More copies are added as the number 2 goes up. + + A cyclic permutation in the pattern prevents copies of the same chunks + landing on the same devices. + + device1 device2 device3 device4 device1 device2 device3 device4 device5 + ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── + 0 1 2 3 0x00 0 1 2 3 4 ) AA + 3 0 1 2 0x01 4 0 1 2 3 ) AAₚ + 4 5 6 7 0x02 5 6 7 8 9 ) AB + 7 4 5 6 0x03 9 5 6 7 8 ) ABₚ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ) ⋯ + ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ) ⋯ + 251 252 253 254 0x79 314 315 316 317 318 ) EX + 254 251 252 253 0x80 318 314 315 316 317 ) EXₚ + + With AA, AB, …, AZ, BA, … being the sets of consecutive chunks and + AAₚ, ABₚ, …, AZₚ, BAₚ, … their cyclic permutations. + + The read characteristics are probably similar to the "far" layout when a + suitably large chunk size is chosen, but with less seeking for writes. + + Upstream and the Debian maintainer do not understand all the nuances and + implications. The "offset" layout was only added because the Common + RAID Data Disk Format (DDF) supports it, and standard compliance is our + goal. + + See the md(4) manpage for more details. + +8. (One of) my RAID arrays is busy and cannot be stopped. What gives? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + It is perfectly normal for mdadm to report the array with the root + filesystem to be busy on shutdown. The reason for this is that the root + filesystem must be mounted to be able to stop the array (or otherwise + /sbin/mdadm does not exist), but to stop the array, the root filesystem + cannot be mounted. Catch 22. The kernel actually stops the array just before + halting, so it's all well. + + If mdadm cannot stop other arrays on your system, check that these arrays + aren't used anymore. Common causes for busy/locked arrays are: + + * The array contains a mounted filesystem (check the `mount' output) + * The array is used as a swap backend (check /proc/swaps) + * The array is used by the device-mapper (check with `dmsetup') + * LVM + * dm-crypt + * EVMS + * The array contains a swap partition used for suspend-to-ram + (check /etc/initramfs-tools/conf.d/resume) + * The array is used by a process (check with `lsof') + +9. Should I use RAID0 (or linear)? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + No. Unless you know what you're doing and keep backups, or use it for data + that can be lost. + +9b. Why not? +~~~~~~~~~~~~ + RAID0 has zero redundancy. If you stripe a RAID0 across X devices, you + increase the likelyhood of complete loss of the filesystem by a factor of X. + + The same applies to LVM by the way (when LVs are placed over X PVs). + + If you want/must used LVM or RAID0, stripe it across RAID1 arrays + (RAID10/RAID1+0, or LVM on RAID1), and keep backups! + +10. Can I cancel a running array check (checkarray)? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + See the -x option in the `/usr/share/mdadm/checkarray --help` output. + +11. mdadm warns about duplicate/similar superblocks; what gives? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + In certain configurations, especially if your last partition extends all the + way to the end of the device, mdadm may display a warning like: + + mdadm: WARNING /dev/sdXY and /dev/sdX appear to have very similar + superblocks. If they are really different, please --zero the superblock on + one. If they are the same or overlap, please remove one from the DEVICE + list in mdadm.conf. 
+ + There are two ways to solve this: + + (a) recreate the arrays with version-1 superblocks, which is not always an + option -- you cannot yet upgrade version-0 to version-1 superblocks for + existing arrays. + + (b) instead of 'DEVICE partitions', list exactly those devices that are + components of MD arrays on your system. So istead of: + + DEVICE partitions + + for example: + + DEVICE /dev/sd[ab]* /dev/sdc[123] + +12. mdadm -E / mkconf report different arrays with the same device + name / minor number. What gives? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + In almost all cases, mdadm updates the super-minor field in an array's + superblock when assembling the array. It does *not* do this for RAID0 + arrays. Thus, you may end up seeing something like this when you run + mdadm -E or mkconf: + + ARRAY /dev/md0 level=raid0 num-devices=2 UUID=abcd... + ARRAY /dev/md0 level=raid1 num-devices=2 UUID=dcba... + + Note how the two arrays have different UUIDs but both appear as /dev/md0. + + The solution in this case is to explicitly tell mdadm to update the + superblock of the RAID0 array. Assuming that the RAID0 array in the above + example should really be /dev/md1: + + mdadm --stop /dev/md1 + mdadm --assemble --update=super-minor --uuid=abcd... /dev/md1 + + See question 2 of this FAQ, and also http://bugs.debian.org/386315 and + recipe #12 in README.recipes . + +13. Can a MD array be partitioned? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Since kernel 2.6.28, MD arrays can be partitioned like any other block + device. + + Prior to 2.6.28, for a MD array to be able to hold partitions, it must be + created as a "partitionable array", using the configuration auto=part on the + command line or in the configuration file, or by using the standard naming + scheme (md_d* or md/d*) for partitionable arrays: + + mdadm --create --auto=yes ... /dev/md_d0 ... + # see mdadm(8) manpage about the values of the --auto keyword + +14. When would I partition an array? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + This answer by Doug Ledford is shamelessly adapted from [0] (with + permission): + + First, not all MD types make sense to be split up, e.g. multipath. For + those types, when a device fails, the *entire* device is considered to have + failed, but with different arrays you won't switch over to the next path + until each MD array has attempted to access the bad path. This can have + obvious bad consequences for certain array types that do automatic + failover from one port to another (you can end up getting the array in + a loop of switching ports repeatedly to satisfy the fact that one array + failed over during a path down, then the path came back up, and another + array stayed on the old path because it didn't send any commands during + the path down time period). + + Second, convenience. Assume you have a 6 device RAID5 array. If a device + fails and you are using a partitioned MD array, then all the partitions on + the device will already be handled without using that device. No need to + manually fail any still active array members from other arrays. + + Third, safety. Again with the RAID5 array. If you use multiple arrays on + a single device, and that device fails, but it only failed on one array, then + you now need to manually fail that device from the other arrays before + shutting down or hot swapping the device. 
Generally speaking, that's not
+ a big deal, but people do occasionally have fat finger syndrome; this
+ is a good opportunity for someone to accidentally fail the wrong device,
+ and when you then go to remove the device, you create a two-device failure
+ instead of one, and now you are in real trouble.
+
+ Fourth, to respond to what you wrote about the arrays being independent of
+ each other -- part of the reason why you partition: I would argue that's
+ not true. If your goal is to salvage as much use from a failing device as
+ possible, then OK. But, generally speaking, people who have something of
+ value on their devices don't want to salvage any part of a failing device,
+ they want that device gone and replaced immediately. There simply is little
+ to no value in an already malfunctioning device. Devices are too cheap and
+ the data stored on them too valuable to risk losing something in an effort
+ to further utilize broken hardware. This of course is written with the
+ understanding that the latest MD RAID code will do read error rewrites to
+ compensate for minor device issues, so anything that will throw a device
+ out of an array is more than just a minor sector glitch.
+
+ [0] http://thread.gmane.org/gmane.linux.raid/13594/focus=13597
+
+15. How can I start a dirty degraded array?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ A degraded array (e.g. a RAID5 with only two devices) that has not been
+ properly stopped cannot be assembled just like that; mdadm will refuse and
+ complain about a "dirty degraded array", for good reasons.
+
+ The solution might be to force-assemble it, and then to start it. Please see
+ recipes 4 and 4b of /usr/share/doc/mdadm/README.recipes.gz and make sure you
+ know what you're doing.
+
+16. How can I influence the speed with which an array is resynchronised?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ For each array, the MD subsystem exports parameters governing the
+ synchronisation speed via sysfs. The values are in kB/sec.
+
+   /sys/block/mdX/md/sync_speed      -- the current speed
+   /sys/block/mdX/md/sync_speed_max  -- the maximum speed
+   /sys/block/mdX/md/sync_speed_min  -- the guaranteed minimum speed
+
+17. When I create a new array, why does it resynchronise at first?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ See the mdadm(8) manpage:
+   When creating a RAID5 array, mdadm will automatically create a degraded
+   array with an extra spare drive. This is because building the spare into
+   a degraded array is in general faster than resyncing the parity on
+   a non-degraded, but not clean, array. This feature can be overridden with
+   the --force option.
+
+ This also applies to RAID levels 4 and 6.
+
+ It does not make much sense for RAID levels 1 and 10 and can thus be
+ overridden with the --force and --assume-clean options, but it is not
+ recommended. Read the manpage.
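+
+ As an illustration only (the device names are examples, and as said above
+ this is not recommended), the initial synchronisation of a RAID1 could be
+ skipped at creation time like this:
+
+   mdadm --create /dev/md0 -l1 -n2 --assume-clean /dev/sd[ab]1
+
+ Do not use --assume-clean on RAID4/5/6 unless the devices really hold
+ identical data (e.g. all zeroes), or the parity will be inconsistent.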
+18. How many failed devices can a RAID10 handle?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ (see also question 4b)
+
+ The following table shows how many devices you can lose and still have an
+ operational array. In some cases, you *can* lose more than the given number
+ of devices, but there is no guarantee that the array survives. Thus, the
+ following is the guaranteed number of failed devices a RAID10 array survives
+ and the maximum number of failed devices the array can (but is not guaranteed
+ to) handle, given the number of devices used and the number of data block
+ copies. Note that 2 copies means original + 1 copy. Thus, if you only have
+ one copy (the original), you cannot handle any failures.
+
+        1     2     3     4    (# of copies)
+    1  0/0   0/0   0/0   0/0
+    2  0/0   1/1   1/1   1/1
+    3  0/0   1/1   2/2   2/2
+    4  0/0   1/2   2/2   3/3
+    5  0/0   1/2   2/2   3/3
+    6  0/0   1/3   2/3   3/3
+    7  0/0   1/3   2/3   3/3
+    8  0/0   1/4   2/3   3/4
+   (# of devices)
+
+ Note: I have not really verified the above information. Please don't count
+ on it. If a device fails, replace it as soon as possible. Corrections welcome.
+
+19. What should I do if a device fails?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Replace it as soon as possible.
+
+ In the case of physical devices without hot-swap capability, for example via:
+
+   mdadm --remove /dev/md0 /dev/sda1
+   poweroff
+
+   # physically replace the device, power the system back on, then:
+   mdadm --add /dev/md0 /dev/sda1
+
+20. So how do I find out which other device(s) can fail without killing the
+    array?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Did you read the previous question and its answer?
+
+ For cases when you have two copies of each block, the question is easily
+ answered by looking at the output of /proc/mdstat. For instance on a 4-device
+ array:
+
+   md3 : active raid10 sdg7[3] sde7[0] sdh7[2] sdf7[1]
+
+ you know that sde7/sdf7 form one pair and sdg7/sdh7 the other.
+
+ If sdh now fails, this will become
+
+   md3 : active raid10 sdg7[3] sde7[0] sdh7[4](F) sdf7[1]
+
+ So now the second pair is degraded; the array could take another failure in
+ the first pair, but if sdg now also fails, you're history.
+
+ Now go and read question 19.
+
+ For cases with more copies per block, it becomes more complicated. Let's
+ think of a 7-device array with three copies:
+
+   md5 : active raid10 sdg7[6] sde7[4] sdb7[5] sdf7[2] sda7[3] sdc7[1] sdd7[0]
+
+ Each mirror now has 7/3 = 2.33 devices to it, so in order to determine groups,
+ you need to round up. Note how the devices are arranged in increasing order of
+ their indices (the number in brackets in /proc/mdstat):
+
+   device:  -sdd7- -sdc7- -sdf7- -sda7- -sde7- -sdb7- -sdg7-
+   group:   [    one     ][      two      ][     three     ]
+
+ Basically this means that after two devices failed, you need to make sure that
+ the third failed device doesn't destroy all copies of any given block. And
+ that's not always easy, as it depends on the layout chosen: whether the
+ blocks are near (same offset within each group), far (spread apart in a way
+ to maximise the mean distance), or offset (offset by size/n within each
+ block).
+
+ I'll leave it up to you to figure things out. Now go read question 19.
+
+21. Why does the kernel speak of 'resync' when using checkarray?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Please see README.checkarray and http://thread.gmane.org/gmane.linux.raid/11864 .
+
+ In short: it's a bug. A check is actually not a resync, but the kernel
+ does not distinguish between the two in its status output.
+
+22. Can I prioritise the sync process and sync certain arrays before others?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Upon start, md will resynchronise any unclean arrays, starting in somewhat
+ random order. Sometimes it's desirable to sync e.g. /dev/md3 first (because
+ it's the most important), but while /dev/md1 is synchronising, /dev/md3 will
+ be DELAYED (see /proc/mdstat); this only happens if the arrays share the same
+ physical components.
+
+ It is possible to delay the synchronisation via /sys:
+
+   echo idle >/sys/block/md1/md/sync_action
+
+ This will cause md1 to go idle and MD to synchronise md3 (or whatever is
+ queued next; repeat the above for other devices if necessary). MD will also
+ realise that md1 is still not in sync and queue it for resynchronisation,
+ so it will sync automatically when its turn has come.
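+
+ A complete example might look like this (md1/md3 are placeholders; check
+ /proc/mdstat for your own device names):
+
+   cat /proc/mdstat                            # md1 resyncing, md3 DELAYED
+   echo idle > /sys/block/md1/md/sync_action   # take md1 out of the queue
+   cat /proc/mdstat                            # md3 should now be resyncing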
+
+23. mdadm's init script fails because it cannot find any arrays. What gives?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ This does not happen anymore: if no arrays are present in the configuration
+ file, no arrays will be started.
+
+24. What happened to mdrun? How do I replace it?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ mdrun used to be the sledgehammer approach to assembling arrays. It has
+ accumulated several problems over the years (e.g. Debian bug #354705) and
+ thus has been deprecated and removed with the 2.6.7-2 version of this package.
+
+ If you are still using mdrun, please ensure that you have a valid
+ /etc/mdadm/mdadm.conf file (run /usr/share/mdadm/mkconf --generate to get
+ one), and run
+
+   mdadm --assemble --scan --auto=yes
+
+ instead of mdrun.
+
+25. Why are my arrays marked auto-read-only in /proc/mdstat?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Arrays are kept read-only until the first write occurs. This allows md to
+ skip lengthy resynchronisation for arrays that have not been properly shut
+ down, but which have also not been changed.
+
+26. Why doesn't mdadm find arrays specified in the config file, causing the
+    boot to fail?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ My boot process dies at an early stage and drops me into the busybox shell.
+ The last relevant output seems to be from mdadm and is something like
+
+   "/dev/md2 does not exist"
+
+ or
+
+   "No devices listed in conf file found"
+
+ Why does mdadm break my system?
+
+ Short answer: it doesn't; the underlying devices are not yet available when
+ mdadm runs during the early boot process.
+
+ Long answer: it doesn't, but the drivers of those devices incorrectly
+ communicate to the kernel that the devices are ready, when in fact they are
+ not. I consider this a bug in those drivers. Please consider reporting it.
+
+ Workaround: there is nothing mdadm can or will do about this. Fortunately
+ though, initramfs provides a method, documented at
+ http://wiki.debian.org/InitramfsDebug. Please append rootdelay=10 (which sets
+ a delay of 10 seconds before trying to mount the root filesystem) to the
+ kernel command line and check whether the system now boots.
+
+ -- martin f. krafft  Wed, 13 May 2009 09:59:53 +0200
diff --git a/debian/local/doc/README.checkarray b/debian/local/doc/README.checkarray
new file mode 100644
index 0000000..8071a4d
--- /dev/null
+++ b/debian/local/doc/README.checkarray
@@ -0,0 +1,33 @@
+checkarray notes
+================
+
+checkarray will run parity checks across all your redundant arrays. By
+default, it is configured to run on the first Sunday of each month, at 01:06
+in the morning. This is realised by asking cron to wake up every Sunday with
+/etc/cron.d/mdadm, but then only running the script when the day of the month
+is less than or equal to 7. See #380425.
+
+Cron will try to run the check at "idle I/O priority" (see ionice(1)), so that
+the check does not overload the system too much. Note that this will only
+work if all the component devices of the array employ the (default) "cfq" I/O
+scheduler. See the kernel documentation[0] for information on how to verify
+and modify the scheduler. checkarray does not verify this for you.
+
+ 0. http://www.kernel.org/doc/Documentation/block/switching-sched.txt
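+
+To check which scheduler a component device is using, and to switch it if
+necessary, you can use the block-layer sysfs interface (sdX is a placeholder
+for the component device):
+
+  cat /sys/block/sdX/queue/scheduler
+  echo cfq > /sys/block/sdX/queue/scheduler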
+
+If you manually invoke checkarray, it runs with the default I/O priority.
+Should you need to run a check at a higher (or lower) I/O priority, have a
+look at the --idle, --slow, --fast, and --realtime options.
+
+'check' is a read-only operation, even though the kernel logs may suggest
+otherwise (e.g. /proc/mdstat and several kernel messages will mention
+"resync"). Please also see question 21 of the FAQ.
+
+If, however, while reading, a read error occurs, the check will trigger the
+normal response to read errors, which is to generate the 'correct' data and
+try to write it out -- so it is possible that a 'check' will trigger a write.
+However, in the absence of read errors, it is read-only.
+
+You can cancel a running array check with the -x option to checkarray.
+
+ -- martin f. krafft  Thu, 02 Sep 2010 10:27:29 +0200
diff --git a/debian/local/doc/README.recipes b/debian/local/doc/README.recipes
new file mode 100644
index 0000000..3906629
--- /dev/null
+++ b/debian/local/doc/README.recipes
@@ -0,0 +1,168 @@
+mdadm recipes
+=============
+
+The following examples/recipes may help you with your mdadm experience. I'll
+leave it as an exercise to use the correct device names and parameters in each
+case. You can find pointers to additional documentation in the README.Debian
+file.
+
+Enjoy. Submissions welcome.
+
+The latest version of this document is available here:
+ http://git.debian.org/?p=pkg-mdadm/mdadm.git;a=blob;f=debian/README.recipes;hb=HEAD
+
+The short options used here are:
+
+  -l  Set the RAID level.
+  -n  Number of active devices in the array.
+  -x  Specify the number of spare (eXtra) devices in the initial array.
+
+0. create a new array
+~~~~~~~~~~~~~~~~~~~~~
+  mdadm --create -l1 -n2 -x1 /dev/md0 /dev/sd[abc]1   # RAID 1, 1 spare
+  mdadm --create -l5 -n3 -x1 /dev/md0 /dev/sd[abcd]1  # RAID 5, 1 spare
+  mdadm --create -l6 -n4 -x1 /dev/md0 /dev/sd[abcde]1 # RAID 6, 1 spare
+
+1. create a degraded array
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+  mdadm --create -l5 -n3 /dev/md0 /dev/sda1 missing /dev/sdb1
+  mdadm --create -l6 -n4 /dev/md0 /dev/sda1 missing /dev/sdb1 missing
+
+2. assemble an existing array
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  mdadm --assemble --auto=yes /dev/md0 /dev/sd[abc]1
+
+  # if the array is degraded, it won't be started. Use --run:
+  mdadm --assemble --auto=yes --run /dev/md0 /dev/sd[ab]1
+
+  # or start it by hand:
+  mdadm --run /dev/md0
+
+3. assemble all arrays in /etc/mdadm/mdadm.conf
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  mdadm --assemble --auto=yes --scan
+
+4. assemble a dirty degraded array
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  mdadm --assemble --auto=yes --force /dev/md0 /dev/sd[ab]1
+  mdadm --run /dev/md0
+
+4b. assemble a dirty degraded array at boot-time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  If the array is started at boot time by the kernel (partition type 0xfd),
+  you can force-assemble it by passing the kernel boot parameter
+
+    md-mod.start_dirty_degraded=1
+
+5. stop arrays
+~~~~~~~~~~~~~~
+  mdadm --stop /dev/md0
+
+  # to stop all arrays in /etc/mdadm/mdadm.conf
+  mdadm --stop --scan
+
+6. hot-add components
+~~~~~~~~~~~~~~~~~~~~~
+  # on the running array:
+  mdadm --add /dev/md0 /dev/sdc1
+
+  # if you add more components than the array was set up with, additional
+  # components will be spares
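+
+  # to check whether the newly added component became active or a spare
+  # (device name is illustrative):
+  mdadm --detail /dev/md0
+  cat /proc/mdstat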
+
+7. hot-remove components
+~~~~~~~~~~~~~~~~~~~~~~~~
+  # on the running array:
+  mdadm --fail /dev/md0 /dev/sdb1
+
+  # if you have configured spares, watch /proc/mdstat to see how a spare
+  # takes over
+  mdadm --remove /dev/md0 /dev/sdb1
+
+8. hot-grow a RAID1 by adding new components
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # on the running array, in either order:
+  mdadm --grow -n3 /dev/md0
+  mdadm --add /dev/md0 /dev/sdc1
+
+  # note: without growing first, additional devices become spares and are
+  # *not* synchronised after the add.
+
+9. hot-shrink a RAID1 by removing components
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  mdadm --fail /dev/md0 /dev/sdc1
+  mdadm --remove /dev/md0 /dev/sdc1
+  mdadm --grow -n2 /dev/md0
+
+10. convert existing filesystem to RAID 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # The idea is to create a degraded RAID 1 on the second partition, move
+  # the data, then hot-add the first. This seems safer to me than simply
+  # force-adding a superblock to the existing filesystem.
+  #
+  # Assume /dev/sda1 holds the data (and let's assume it's mounted on
+  # /home) and /dev/sdb1 is empty and of the same size...
+  #
+  mdadm --create /dev/md0 -l1 -n2 /dev/sdb1 missing
+
+  mkfs -t <fstype> /dev/md0
+  mount /dev/md0 /mnt
+
+  tar -cf- -C /home . | tar -xf- -C /mnt -p
+
+  # consider verifying the data
+  umount /home
+  umount /mnt
+  mount /dev/md0 /home   # also change /etc/fstab
+
+  mdadm --add /dev/md0 /dev/sda1
+
+  Warren Togami has a document explaining how to convert a filesystem on
+  a remote system via SSH: http://togami.com/~warren/guides/remoteraidcrazies/
+
+10b. convert existing filesystem to RAID 1 in-place
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  In-place conversion of /dev/sda1 to /dev/md0 is effectively
+
+    mdadm --create /dev/md0 -l1 -n2 /dev/sda1 missing
+
+  however, do NOT do this as-is, as you risk filesystem corruption.
+
+  If you need to do this, first unmount and shrink the filesystem by
+  a megabyte (if supported). Then run the above command, then (optionally)
+  grow the filesystem again as much as possible.
+
+  Do make sure you have backups. If you do not yet, consider method (10)
+  instead (and make backups anyway!).
+
+11. convert existing filesystem to RAID 5/6
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # See (10) for the basics.
+  mdadm --create /dev/md0 -l5 -n3 /dev/sdb1 /dev/sdc1 missing
+  #mdadm --create /dev/md0 -l6 -n4 /dev/sdb1 /dev/sdc1 /dev/sdd1 missing
+
+  mkfs -t <fstype> /dev/md0
+  mount /dev/md0 /mnt
+
+  tar -cf- -C /home . | tar -xf- -C /mnt -p
+
+  # consider verifying the data
+  umount /home
+  umount /mnt
+  mount /dev/md0 /home   # also change /etc/fstab
+
+  mdadm --add /dev/md0 /dev/sda1
+
+12. change the preferred minor of an MD array (RAID)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # you need to assemble the array manually to change the preferred minor;
+  # when you do, the superblock will be updated to reflect the preferred
+  # minor you indicate with the assembly.
+  # for example, to set the preferred minor to 4:
+  mdadm --assemble /dev/md4 /dev/sd[abc]1
+
+  # this only works on 2.6 kernels, and only for RAID levels 1 and above.
+  # for other MD arrays, you need to specify --update explicitly:
+  mdadm --assemble --update=super-minor /dev/md4 /dev/sd[abc]1
+
+  # see also item 12 in the FAQ shipped with the Debian package.
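+
+  # to verify the result afterwards (version-0.90 superblocks only;
+  # illustrative):
+  mdadm --detail /dev/md4 | grep 'Preferred Minor'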
+
+ -- martin f. krafft  Fri, 06 Oct 2006 15:39:58 +0200
diff --git a/debian/mdadm.docs b/debian/mdadm.docs
index 830665f..76cb1f0 100644
--- a/debian/mdadm.docs
+++ b/debian/mdadm.docs
@@ -1,7 +1,5 @@
 TODO
-debian/README.recipes
-debian/README.checkarray
-debian/FAQ
 ANNOUNCE-*
 external-reshape-design.txt
 mdmon-design.txt
+debian/local/doc/*
-- 
cgit v1.2.3