RAID

From Wikipedia, the free encyclopedia

In computing, specifically computer storage, a Redundant Array of Independent Drives (or Disks), also known as a Redundant Array of Inexpensive Drives (or Disks) (RAID), is an umbrella term for data storage schemes that divide and/or replicate data among multiple hard drives. Depending on the scheme, they offer increased data reliability, improved I/O performance, or both.

Most schemes are numbered and referred to as levels. RAID began as a collection of five such levels but has since grown to include many more, notably several nested levels and many non-standard levels (most proprietary).

Basic functions

Fundamentally, RAID combines multiple hard disks into a single logical unit. This can be done in hardware or in software. Hardware RAID combines the drives in a dedicated controller, which then presents them to the operating system as a single drive. Software RAID does the combining within the operating system and likewise presents the drives as a single drive to the users of the system.

RAID is typically used on servers but can be used on workstations. This is especially true in storage-intensive computers such as those used for video and audio editing.

History

Norman Ken Ouchi at IBM was awarded U.S. patent 4,092,732[1], titled "System for recovering data stored in failed memory unit", in 1978. The claims for this patent describe what would later be termed RAID 5 with full stripe writes. This 1978 patent also mentions that disk mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated parity (what would later be termed RAID 4) were prior art at that time.

The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley in 1987. They studied the possibility of using two or more drives to appear as a single device to the host system and published a paper: "A Case for Redundant Arrays of Inexpensive Disks (RAID)" in June 1988 at the SIGMOD conference.[2] This specification suggested a number of prototype "RAID levels", or combinations of drives. Each had theoretical advantages and disadvantages. Over the years, different implementations of the RAID concept have appeared. Most differ substantially from the original idealized RAID levels, but the numbered names have remained. This can be confusing, since one implementation of RAID 5, for example, can differ substantially from another. RAID 3 and RAID 4 are often confused and even used interchangeably.

Their paper formally defined RAID levels 1 through 5 in sections 7 to 11:

  • "First Level RAID: Mirrored drives"
  • "Second Level RAID: Hamming Code for Error Correction"
  • "Third Level RAID: Single Check Disk Per Group"
  • "Fourth Level RAID: Independent Reads and Writes"
  • "Fifth Level RAID: Spread data/parity over all drives (no single check disk)"

Standard RAID levels

A quick summary of the most commonly used RAID levels:

  • RAID 0: Striped Set (2-disk minimum) without parity: provides improved performance and additional storage but no fault tolerance against disk errors or disk failure. Any single disk failure destroys the array, and this becomes more likely as more disks are added. The reason a single disk failure destroys the entire array is that when data is written to a RAID 0 array, it is broken into fragments, one per disk in the array, and each fragment is written to its respective disk simultaneously on the same sector. This allows the entire chunk of data to be read back in parallel, giving this arrangement very high bandwidth. When one sector on one of the disks fails, however, the corresponding sector on every other disk is rendered useless, because part of the data is now corrupted. RAID 0 implements no error checking, so any error is unrecoverable. More disks in the array mean higher bandwidth, but also greater risk of data loss.
  • RAID 1: Mirrored Set (2-disk minimum) without parity: provides fault tolerance against disk errors and single disk failure. Read performance increases when using a multi-threaded operating system that supports split seeks, with a very small reduction in write performance. The array continues to operate as long as at least one drive is functioning.
  • RAID 3 and RAID 4: Striped Set (3-disk minimum) with Dedicated Parity: provide improved performance and fault tolerance similar to RAID 5, but with the parity stored on a single dedicated disk rather than rotated across all of the drives. The parity for each stripe is computed across the corresponding data on the other disks (typically as their bitwise XOR), so the contents of any one failed disk can be reconstructed. The dedicated parity disk is a bottleneck for writing, since every write requires updating the parity data. One minor benefit of the dedicated parity disk is that if it fails, the array continues to operate without parity and without a performance penalty.
  • RAID 5: Striped Set (3-disk minimum) with Distributed Parity: distributed parity requires all but one drive to be present to operate; a failed drive must be replaced, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity, so that the drive failure is masked from the end user (a short parity sketch follows this list). The array will lose data in the event of a second drive failure and is vulnerable until the data that was on the failed drive has been rebuilt onto a replacement drive.
  • RAID 6: Striped Set (4-disk minimum) with Dual Distributed Parity: provides fault tolerance against two drive failures; the array continues to operate with up to two failed drives. This makes larger RAID groups more practical, and RAID 6 is becoming a popular choice for SATA drives as they approach 1 terabyte in size, because single-parity RAID levels are vulnerable to data loss until the failed drive is rebuilt, and the larger the drive, the longer the rebuild takes. Dual parity gives the array time to rebuild onto a large replacement drive while still being able to sustain another drive failure.
  • RAID 10: A striped set of mirrored drives (RAID 1+0), 4-disk minimum. Can sustain at least one drive failure without data loss (more, provided no mirrored pair loses both of its drives), with speed benefits over RAID 1 from the striping.
  • RAID 50: Effectively a stripe across two or more RAID 5 arrays (RAID 5+0), 6-disk minimum. Can sustain one drive failure in each RAID 5 sub-array without data loss.
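
The striped-with-parity levels above (RAID 3, 4 and 5, and, with a second parity term, RAID 6) all rest on the same idea: a parity block computed across the data blocks of a stripe lets any single missing block be recomputed. The following Python sketch is purely illustrative (fixed, equal-sized blocks; no block layout, parity rotation, or second parity term; the function names are invented for this example) and is not any controller's actual implementation.

    # Minimal illustration of XOR parity as used by striped-with-parity RAID
    # levels: the parity block is the bitwise XOR of the data blocks in a
    # stripe, so any single missing block can be recomputed from the rest.
    from functools import reduce

    def parity_block(blocks: list[bytes]) -> bytes:
        """XOR equal-sized blocks of a stripe together, byte by byte."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    def rebuild_block(surviving_blocks: list[bytes]) -> bytes:
        """Recover the one missing block (data or parity) of a stripe by
        XOR-ing every surviving block of that stripe together."""
        return parity_block(surviving_blocks)

    # Three-disk example stripe: two data blocks plus one parity block.
    d0, d1 = b"hello world!", b"raid 5 demo."
    p = parity_block([d0, d1])

    # Simulate losing the disk holding d1 and rebuilding it from d0 and p.
    assert rebuild_block([d0, p]) == d1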

Nested RAID levels

Many storage controllers allow RAID levels to be nested. That is, one RAID can use another as its basic element, instead of using physical drives. It is instructive to think of these arrays as layered on top of each other, with physical drives at the bottom.

Nested RAIDs are usually signified by joining the numbers indicating the constituent RAID levels into a single number, sometimes with a '+' in between. For example, RAID 10 (or RAID 1+0) conceptually consists of multiple level 1 arrays stored on physical drives, with a level 0 array on top, striped over the level 1 arrays. RAID 0+1 is usually written with the '+' rather than as RAID 01, to avoid confusion with RAID 1. When the top array is a RAID 0 (as in RAID 10 and RAID 50), however, most vendors omit the '+', even though RAID 5+0 is the more informative name.
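
As an illustration of this layering, the following Python sketch (the function name and parameters are invented for this example; real controllers use larger stripe units and more elaborate layouts) maps a logical block number of a RAID 1+0 array to physical drives: the outer RAID 0 layer picks a mirror set by striping, and the inner RAID 1 layer addresses every drive in that set.

    # Illustrative only: map a logical block of a RAID 1+0 array to the
    # physical drives that hold it. The outer RAID 0 stripes blocks across
    # the mirror sets; the inner RAID 1 duplicates each block within a set.
    def raid10_targets(logical_block: int, mirror_sets: int, drives_per_set: int):
        """Return (mirror_set, offset_within_set, physical_drive_indices)."""
        mirror_set = logical_block % mirror_sets      # RAID 0: round-robin striping
        offset = logical_block // mirror_sets         # block position inside that set
        drives = [mirror_set * drives_per_set + d     # RAID 1: every drive in the set
                  for d in range(drives_per_set)]
        return mirror_set, offset, drives

    # Four physical drives arranged as two 2-way mirror sets (classic RAID 10).
    for block in range(6):
        print(block, raid10_targets(block, mirror_sets=2, drives_per_set=2))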

Common nested RAID levels

  • RAID 0+1: Striped Set + Mirrored Set (4-disk minimum; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set (a comparison sketch follows this list). The array continues to operate with one or more drives failed on the same side of the mirror, but if drives fail on both sides of the mirror, the data on the RAID system is lost.
  • RAID 1+0: Mirrored Set + Striped Set (4 disk minimum; Even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 0+1 is that RAID 1+0 creates a striped set from a series of mirrored drives. The array can sustain multiple drive losses as long as no two drives lost comprise a single pair of one mirror.
  • RAID 5+0: A stripe across distributed parity RAID systems
  • RAID 5+1: A mirror striped set with distributed parity (some manufacturers label this as RAID 53)
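
The practical difference between RAID 0+1 and RAID 1+0 noted above is which combinations of drive failures the array survives. The following Python sketch uses a hypothetical four-drive arrangement (drives numbered 0-3, with the groupings assumed purely for illustration) to enumerate the two-drive failures each layout can tolerate.

    # Hypothetical four-drive comparison of RAID 0+1 and RAID 1+0 fault
    # tolerance. Drive groupings are assumed for illustration only.
    from itertools import combinations

    # RAID 0+1: two striped sets (0,1) and (2,3), mirrored against each other.
    # The array survives as long as at least one complete striped set survives.
    def raid01_survives(failed: set[int]) -> bool:
        return {0, 1}.isdisjoint(failed) or {2, 3}.isdisjoint(failed)

    # RAID 1+0: two mirrored pairs (0,1) and (2,3), striped together.
    # The array survives as long as every mirrored pair keeps at least one drive.
    def raid10_survives(failed: set[int]) -> bool:
        return not ({0, 1} <= failed or {2, 3} <= failed)

    for survives in (raid01_survives, raid10_survives):
        tolerated = [f for f in combinations(range(4), 2) if survives(set(f))]
        print(survives.__name__, tolerated)
    # RAID 1+0 survives 4 of the 6 possible two-drive failures; RAID 0+1
    # survives only 2 (any failure touching both striped sets is fatal).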

Non-standard RAID levels

Given the large number of custom configurations possible with a RAID array, many companies, organizations, and groups have created their own non-standard configurations, typically designed to meet the needs of a small niche of users. Most of these non-standard RAID levels are proprietary.

RAID implementations

Hardware and/or software RAID based solutions

The distribution of data across multiple drives can be managed either by dedicated hardware or by software. Additionally, there are hybrid RAID solutions that are partly software- and partly hardware-based.

Software RAID

Software implementations are provided by most operating systems. A software layer sits above the (generally block based) disk device drivers and provides an abstraction layer between the logical drives (RAID arrays) and physical drives. Software RAID is typically limited to RAID 0 (striping across multiple drives for increased space and performance), RAID 1 (mirroring two drives) and RAID 5 (data striping with parity).
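
A purely illustrative Python sketch of that abstraction follows (the class and file names are invented; a real software RAID driver works on raw block devices inside the kernel, balances reads, and detects and repairs inconsistent copies). It presents a single logical block store that mirrors every write to two ordinary files standing in for drives, i.e. a toy RAID 1.

    # Toy software RAID 1: a logical block store that mirrors writes to two
    # file-backed "drives". Illustrative only, not any OS's actual driver.
    import os

    class Raid1Mirror:
        def __init__(self, path_a: str, path_b: str, block_size: int = 4096):
            self.block_size = block_size
            self.devices = []
            for path in (path_a, path_b):
                if not os.path.exists(path):
                    open(path, "wb").close()      # create an empty backing file
                self.devices.append(open(path, "r+b"))

        def write_block(self, block_no: int, data: bytes) -> None:
            """Write the same block to every backing device (mirroring)."""
            assert len(data) == self.block_size
            for dev in self.devices:
                dev.seek(block_no * self.block_size)
                dev.write(data)
                dev.flush()

        def read_block(self, block_no: int) -> bytes:
            """Return the block from the first device that yields a full copy."""
            for dev in self.devices:
                dev.seek(block_no * self.block_size)
                data = dev.read(self.block_size)
                if len(data) == self.block_size:
                    return data
            raise IOError(f"block {block_no} unreadable on all mirrors")

    # Example: the caller sees one logical "drive" backed by two files.
    md = Raid1Mirror("driveA.img", "driveB.img")
    md.write_block(0, b"x" * 4096)
    assert md.read_block(0) == b"x" * 4096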

In a multi-threaded operating system (such as Linux, FreeBSD, Mac OS X, Windows NT/2000/XP/Vista and Novell NetWare) the operating system can perform overlapped I/O, allowing multiple read or write requests to be initiated without waiting for the completion of each request. This is the capability that makes software RAID 0 and RAID 1 practical. However, striping or mirroring with parity (such as RAID 5) is less commonly implemented in software, due to the substantial processing demands of calculating parity[citation needed].

Software implementations require a small amount of processing time, which is provided by the main CPU in the host system. Since SCSI, PATA, and SATA drives all support asynchronous reads and writes, any multi-threaded operating system can support non-parity RAID on multiple hard drives with only about a one percent increase in CPU overhead[citation needed].

Software implementations can exceed the performance levels of hardware-based RAID due to the high-performance of modern CPUs[citation needed]. Since the software must run on a host server attached to storage, the processor (as mentioned above) on that host must dedicate processing time to run the RAID software. Like hardware-based RAID, if the server experiences a hardware failure, the attached storage could be inaccessible for a period of time.

Software implementations can allow RAID arrays to be created from partitions rather than entire physical drives.

Hardware RAID

A hardware implementation of RAID requires at a minimum a special-purpose RAID controller. On a desktop system, this may be a PCI expansion card or a capability built into the motherboard. In industrial applications the controller and drives are provided as a stand-alone enclosure. The drives may be IDE/ATA, SATA, SCSI, SSA, Fibre Channel, or any combination thereof. The host system can be directly attached to the controller or, more commonly, connected via a SAN. The controller hardware handles the management of the drives and performs any parity calculations required by the chosen RAID level.

Most hardware implementations provide a non-volatile read/write cache which, depending on the I/O workload, will improve performance. Cached RAID controllers are most commonly used in industrial applications.

Hardware implementations provide guaranteed performance, add no overhead to the local CPU complex and can support many operating systems, as the controller simply presents a logical disk to the operating system.

Hardware implementations also typically support hot swapping, allowing failed drives to be replaced while the system is running.

Hybrid RAID

Hybrid RAID implementations have become very popular with the introduction of inexpensive RAID controllers, implemented using a standard disk controller plus RAID logic in the controller's BIOS extension (for early boot-up/real-mode operation) and in the operating system driver (for after the system switches to protected mode). Since these controllers actually do all calculations in software, not hardware, they have almost all the disadvantages of both hardware and software RAID. Like hardware RAID, they are typically proprietary to a given RAID controller manufacturer and typically cannot span multiple controllers. The only advantages over software RAID are that the BIOS can boot from them, and the tighter integration with the device driver may offer better error handling.

Both hardware and software implementations may support the use of hot spare drives: a pre-installed drive which is used to immediately (and almost always automatically) replace a drive that has failed. This reduces the mean time to recovery (MTTR), the period during which a second drive failure in the same RAID redundancy group can result in loss of data. It also reduces the chance of data loss when multiple drives fail in a short period of time, as is common when all drives in an array have undergone very similar use patterns and experience wear-out failures.

Reliability of RAID configurations

Whether RAID provides more reliability than mirrored drives has been disputed[3].

Failure rate
The mean time to failure (MTTF) or the mean time between failure (MTBF) of a given RAID may be lower or higher than those of its constituent hard drives, depending on what type of RAID is employed.
Mean time to data loss (MTTDL)
In this context, the average time before a loss of data in a given array.[4]
Mean time to recovery (MTTR)
In arrays that include redundancy for reliability, this is the time following a failure to restore an array to its normal failure-tolerant mode of operation. This includes time to replace a failed disk mechanism as well as time to re-build the array (i.e. to replicate data for redundancy).
Unrecoverable bit error rate (UBE)
This is the rate at which a disk drive will be unable to recover data after application of cyclic redundancy check (CRC) codes and multiple retries. This failure will present as a sector read failure. Some RAID implementations protect against this failure mode by remapping the bad sector, using the redundant data to retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement sector. The UBE rate is typically specified at 1 bit in 10^15 for enterprise-class disk drives (SCSI, FC, SAS) and 1 bit in 10^14 for desktop-class disk drives (IDE, ATA, SATA). Increasing disk capacities and large RAID 5 redundancy groups have led to an increasing inability to successfully rebuild a RAID group after a disk failure, because an unrecoverable sector is found on the remaining drives (a back-of-envelope example appears at the end of this section). Double-protection schemes such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
Atomic Write Failure
Also known by various terms such as torn writes, torn pages, incomplete writes, interrupted writes, non-transactional, etc. This is a little understood and rarely mentioned failure mode for redundant storage systems that do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple" during the early days of relational database commercialization. However, this warning largely went unheeded and fell by the wayside upon the advent of RAID, which many software engineers mistook as solving all data storage integrity and reliability problems. Many software programs update a storage object "in-place"; that is, they write a new version of the object on to the same disk addresses as the old version of the object. While the software may also log some delta information elsewhere, it expects the storage to present "atomic write semantics," meaning that the write of the data either occurred in its entirety or did not occur at all.
However, very few storage systems provide support for atomic writes, and even fewer specify their rate of failure in providing this semantic. Note that during the act of writing an object, a RAID storage device will usually be writing all redundant copies of the object in parallel, although overlapped or staggered writes are more common when a single RAID processor is responsible for multiple drives. Hence an error that occurs during the process of writing may leave the redundant copies in different states, and furthermore may leave the copies in neither the old nor the new state. The little known failure mode is that delta logging relies on the original data being either in the old or the new state so as to enable backing out the logical change, yet few storage systems provide an atomic write semantic on a RAID disk.
Since transactional support is not universally present in hardware RAID, many operating systems include their own transactional support to protect against data loss during an interrupted write. Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft introduced transaction tracking via the journaling feature in NTFS.
To mitigate this problem, some high-end RAID cards use a battery-backed write cache. If an "atomic" write only partially completes because of a power failure, the controller flushes the unwritten data to disk when the power is restored. Some controllers provide the capability of testing the battery periodically (however, this leaves the system without a fully charged battery for several hours). This solution still has potential failure cases: the battery may have worn out, the power may be off for too long, the disks could be moved to another controller, the controller itself could fail in the middle of a write, and so on. However, on a well-maintained machine it probably prevents corruption in most incidents.
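
To make the unrecoverable-bit-error point above concrete, here is a back-of-envelope Python calculation (assuming independent bit errors at the quoted rates, which real drives only approximate, and decimal 1 TB drive capacities): the probability that all surviving drives of an 8-drive RAID 5 group can be reread end to end during a rebuild without hitting a single unrecoverable bit error.

    # Back-of-envelope arithmetic for the UBE discussion above. Assumes
    # independent bit errors at the quoted rates; real drives only
    # approximate this, so treat the numbers as rough orders of magnitude.
    def rebuild_success_probability(drives: int, drive_bytes: float, ube: float) -> float:
        """Probability that the (drives - 1) surviving drives are read end to
        end with no unrecoverable bit error, for a per-bit error rate `ube`."""
        bits_to_read = (drives - 1) * drive_bytes * 8
        return (1.0 - ube) ** bits_to_read

    TB = 1e12  # decimal terabyte, as drive capacities are usually specified

    # Desktop-class (UBE about 1 in 10^14) vs enterprise-class (1 in 10^15)
    # drives, for an 8-drive RAID 5 group built from 1 TB drives.
    for label, ube in (("desktop, 1e-14", 1e-14), ("enterprise, 1e-15", 1e-15)):
        p = rebuild_success_probability(drives=8, drive_bytes=1 * TB, ube=ube)
        print(f"{label}: P(clean rebuild read) = {p:.3f}")
    # Roughly 0.57 for the desktop-class case and 0.95 for the enterprise
    # case, which is why dual-parity schemes such as RAID 6 gained interest.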

See also

References

  1. ^ U.S. patent 4,092,732
  2. ^ Patterson, David; Gibson, Garth; Katz, Randy (1988). "A Case for Redundant Arrays of Inexpensive Disks (RAID)" (PDF). SIGMOD Conference. pp. 109–116. Retrieved 2006-12-31.
  3. ^ Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? Bianca Schroeder and Garth A. Gibson
  4. ^ Jim Gray and Catharine van Ingen, "Empirical Measurements of Disk Failure Rates and Error Rates", MSTR-2005-166, December 2005

White papers

There has been a significant amount of research done into the technical aspects of this storage method. Technical institutions and involved companies have released white papers and technical documentation relevant to RAID arrays and made them available to the public. They are accessible below.

Operating system-specific details

If you would like more information detailing the deployment, maintenance, and repair of RAID arrays on a specific operating system, the external links below, sorted by operating system, could prove useful.