Building Fault Tolerance in Your Computer Establishment

 

Chinmaya S. Rathore

 

This article stresses the need for computer establishments to pay special attention to protecting their organisational data against potential loss. RAID technology and two data backup mechanisms are discussed in this context.

 

 

 

Is our Data Safe?

 

Over the past few years, a number of forest departments, environmental organizations and non-governmental organizations in India have taken initiatives to develop IT infrastructure and IT-related applications. In many cases, maps have been painstakingly digitized and databases created with considerable effort and investment. It is, however, commonly observed that while millions are spent on equipment, training, software and 'data automation', there is hardly any investment or planning towards 'data protection'. It is the responsibility of the people managing computer establishments to ensure that data, which forms the basic building block of the organisation's information resource, remains protected and safe at all times. Most people relegate this important function to an infrequent routine of copying a few files every now and then to floppy or CD, and derive some sense of security from this operation. A poor perception of threat and a lack of knowledge make many believe that they are immune from such problems and that there is no real need to invest in implementing a data security plan. When disaster strikes, the consequences are usually enormous, involving service disruptions, loss of work, loss of data, network downtime and severe embarrassment.

 

It is important to realise that we are living in an era of 'information violence'. The more we use computers, the more susceptible we are to experiencing this violence in one form or another. Computers, like any other electronic machine, can suffer damage. The key question to ask therefore is: "Is my computer establishment resilient enough to recover completely from a data loss situation?" If the answer to this question is no, you need to invest time and money to ensure that your data is secure.

 

 

Threats to your data

 

Threats to data can come from many known and unknown quarters. Some of the more important threat perceptions include:

 

•  Damage to data or software by a computer virus.

•  Deliberate deletion by a disgruntled employee.

•  Fire in the building where data resides.

•  Damage to the building by an earthquake.

•  Failure of the hard disk or disks on which data is stored.

•  Theft of computer equipment.

•  Hacking.

•  Accidental deletion or disk formats.

 

Two other issues, briefly mentioned earlier, also need to be elaborated. The first pertains to computer installations that have networked computers in a client-server architecture. In such cases, if there is a hard disk failure on the server, the entire network experiences a disruption of the services (like email and Internet access) that the server provides to the user community.

 

The second issue pertains to the satisfaction and sense of security that many administrators derive from the fact that they are taking regular backups of their data. The problem is that although backups might be taken regularly and stashed away, the media on which they are taken might itself suffer damage due to storage conditions or repeated use (fungal attack, or layers of tape sticking together, for example), making it unusable at the time of crisis. As backups come into play only when a system crashes, it is important that the backup media remains valid and readable for the unknown day of crisis, whenever it arrives.

 

The bottom line therefore is that those responsible for running computer systems need to ensure that their data is well protected and that their systems are fault tolerant. This can be done by purchasing special hardware that can provide such capabilities and/or by adopting systematic procedures that can guard against data loss.

 

In the remaining part of this article, I briefly discuss RAID technology for developing a fault-tolerant system. The discussion has been simplified to convey the important points of the RAID concept, omitting the surrounding technical flab. Two systematic procedures for taking routine backups are also discussed. It is worth pointing out, however, that a complete data protection plan may include other components that have not been discussed in this article.

 

 

 

Some Solutions

 

What is RAID?

 

RAID stands for Redundant Array of Inexpensive (or Independent) Disks. RAID technology has its origins in 1987 at the University of California at Berkeley, when a paper by Patterson, Gibson and Katz reported that if, in place of the one large expensive hard disk commonly used in computers, an array (or stack, typically of two or more) of hard disks were used, the performance of the system and data reliability could be substantially improved. To the computer system, this array of disks would continue to appear as a single large drive.

 

After this paper, RAID technology caught on to become a multi-billion industry. Currently, many computer manufacturers and third party vendors offer RAID.

 

 

What is the Advantage of RAID?

 

RAID offers the following two principal advantages over a single disk:

•  Improved performance of the computer system, resulting from faster reads from and writes to the array of disks (explained later).

•  Protection against data loss, with higher data reliability and availability.

 

In what situations can RAID be Useful?

 

In a typical Indian computer establishment (and equally in a wider context), RAID technology would be useful on a server in a computer network. As clients access the server to store and retrieve information, a properly configured RAID-enabled server should be able to access information faster and thus improve overall performance. The RAID-enabled server would also provide higher data security and fault tolerance to the network. In application environments like GIS, digital image processing and multi-user RDBMS, where a large amount of data access is involved, an appropriate RAID configuration can result in dramatic improvements in system performance while at the same time providing protection from potential data loss.

 

Does RAID require additional Hardware and Software?

 

Although some levels of RAID (discussed later) can be implemented through software, i.e. without any additional hardware, extra hardware is usually required and recommended if the full benefits of RAID are to be realized.

 

Major additional RAID elements include:

 

•  A RAID controller card (or cards) with a large cache memory (for example, 16 MB).

•  Additional hard disks, as dictated by the RAID level that will be used.

•  RAID management software.

 

The RAID hardware can be purchased at any time and added to the system provided the server motherboard supports a RAID controller card, a fact that should be verified at the time of purchasing the server.

 

However, if installation of RAID has been assessed as an essential requirement, it is best that the RAID hardware be procured at the time of buying the server. For efficient use, RAID systems need to be configured properly, which is done by looking closely at the application (an Oracle-based application or a digital image processing application, for example) for which the server will be used most of the time. Although vendors may make you believe that setting up RAID is child's play, involving plugging in a few cards and cables, it is usually much more difficult than that. An understanding of RAID technology is essential for interacting with your vendor and ensuring that you have an efficient RAID system for your network.

 

There was a time when people used to be obsessed with the speed of their computer processor (people still are) as the absolute gateway to better overall system performance. In large computer establishments today, the stress is on building efficient and reliable disk storage systems which can have dramatic effects on the total network performance. 

 

What are different RAID levels?

 

RAID technology defines different levels of RAID, from 0 to 6. Some manufacturers also define a RAID 7 level. However, before I go on to explain what the different levels mean, I would like to make one point very clear. In the computer industry in general, a higher number associated with a software version or processor usually conveys to the user or buyer that the product is 'better'. This is not the case with RAID levels. In other words, RAID 5 is not necessarily superior to RAID 3 just because 5 is greater than 3. Each RAID level instead indicates a scheme of reading from and writing to an array of disks, offering some advantages and disadvantages over the others. The choice of a RAID level therefore depends largely on whether you want high disk performance, high data protection or both.

 

I will now venture out to explain in very simple terms what some of the popular RAID levels mean. This discussion should also elucidate how the RAID technology achieves faster performance and higher data protection. In the following explanations, I presume that there is an array of four (4) disks in the computer as against one.

 

RAID Level 0

 

In RAID 0, performance is maximized at the cost of data protection. This is done by a technique called striping, in which data is spread across disks (figure 1). Once data is striped across disks, whenever it is required it is accessed from all disks at the same time, in parallel, as opposed to being read from one disk. To take a simple conceptual example, if a file is 2048 bytes long and the size of each data block is 512 bytes, then in a RAID 0 arrangement the data constituting this file will be spread across all four disks as data block 1, data block 2, data block 3 and data block 4. When this file has to be retrieved, block 1 from disk 1, block 2 from disk 2, block 3 from disk 3 and block 4 from disk 4 are all retrieved at the same time. If this data were residing on one disk, it would take considerably longer to get all four blocks.

 

Access times in such a case would not reduce by exactly 4 times, but by something close to that. RAID 0 can benefit access to very large files considerably, as the data would be striped across many disks. The total disk storage in this case is the sum of the capacities of the individual drives (e.g. if each disk is 4 GB, total storage is 16 GB).

 

RAID 0 provides no data protection: in the event of failure of any one of the disks in the array, there is no way to get back the lost data. In our example above, if disk 3 fails in the RAID 0 arrangement shown in figure 1, data block 3 of our file would be lost, and there is no way that it can be recovered using information stored on the other disks.
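The striping idea can be sketched in a few lines of Python. This is only a conceptual illustration and not how a RAID controller is actually implemented: the function names `stripe` and `read_back` are made up for this example, and the four disks are modelled as simple in-memory lists.

```python
def stripe(data: bytes, num_disks: int = 4, block_size: int = 512) -> list[list[bytes]]:
    """Spread data across the disks in round-robin blocks (RAID 0 style)."""
    disks = [[] for _ in range(num_disks)]  # each 'disk' is modelled as a list of blocks
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        disks[block_number % num_disks].append(data[offset:offset + block_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the file by taking one block from each disk in turn."""
    blocks = []
    for row in range(max(len(disk) for disk in disks)):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


file_data = bytes(range(256)) * 8      # a 2048-byte file, as in the example above
disks = stripe(file_data)              # one 512-byte block lands on each of the 4 disks
assert read_back(disks) == file_data   # all four disks present: the file is intact

disks[2] = []                          # disk 3 fails
assert read_back(disks) != file_data   # block 3 is gone and cannot be rebuilt
```

In a real array the four block reads would be issued in parallel, which is where the speed-up comes from; the sketch only shows where each block is placed and why losing one disk loses part of every striped file.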


 

 

 


Figure 1: A conceptual diagram of RAID 0

 

 

RAID Level 1

 

In RAID 1, data on one disk is duplicated or "mirrored" across other disks in the array, providing considerable data redundancy and protection. In fact, this arrangement can be made with just 2 disks. In implementations of RAID 1 where copies of the first disk are kept on all the others (not in pairs), all disks except one can fail and the entire data will still remain available. Performance is, however, a casualty, and the disk space of the array is limited to the capacity of 1 or 2 disks and, in the case of disks of unequal capacity, to that of the smallest disk. Usually all disks in a RAID 1 arrangement are of the same capacity.


 

 


Figure 2: Schematic figure for RAID 1

 

A hybrid level called RAID 0+1 (also wrongly called RAID 10), which combines the striping and mirroring attributes of RAID 0 and RAID 1, is very popular and is commercially supported.

 

RAID Level 3

 

In RAID level 3, data is striped across disks, with one of the disks holding parity information for the others. In a four-disk array like the one we have been using in earlier examples, the first three disks in a RAID 3 arrangement hold blocks of data while the fourth disk holds parity information on these blocks. In other words, in a RAID 3 arrangement, one of the drives is dedicated to holding parity information for the other drives. Whenever data is written to the first three drives in our example, parity information about this data is first calculated and then written to the fourth drive. In the event that one of the disks fails, the data on that disk can be reconstructed using the data on the other disks and on the parity disk. RAID 3 differs from RAID 1 on two counts: (a) unlike RAID 1 but like RAID 0, RAID 3 stripes or distributes data across multiple disks, giving higher performance; and (b) unlike RAID 1, where a carbon copy of the data is maintained on two or more drives, only parity data (a very clever counting scheme which can help reconstruct data) is stored, on a dedicated drive (figure 3).


 

 

 

 


Figure 3: A conceptual diagram of RAID 3

 

 

If you are curious to know what parity is, that is justified, because you cannot really understand RAID 3, or for that matter RAID 5, without knowing a thing or two about parity. At the risk of being categorised as a clinical nerd by the forestry community, I will venture to explain what parity is, and then the data reconstruction mechanism that RAID employs using parity. Although we will be entering somewhat deep computing waters in the following discussion, I have tried to keep the explanations as simple as possible.

 

The computer measures memory in units called bytes. When you press an "A" on your keyboard (or, for that matter, any other key), you consume 1 byte of memory. How does the computer store this "A"? It uses the binary numbering system. In the binary numbering system, as you probably know, numbers are written using 0s and 1s. As 0 can represent 'OFF' and 1 can represent 'ON', the binary number system provides a format that lends itself to electronic circuit equivalents of these numbers. Every key on the keyboard has a number code. The code for A (capital A) is 65. When you press A, its code 65 is picked up and converted to its binary equivalent, which is 0 1 0 0 0 0 0 1. A byte, as you can see, is made up of 8 binary digits or bits. This pattern of ONs and OFFs is then held by the circuitry of the computer until required.

 

Parity is a clever scheme for checking that data (i.e. each byte of data), when transmitted via a modem to another computer or, in the case of RAID, when written to a number of hard disks, has been transmitted or written correctly. Let us see how this works. Parity is calculated by counting the number of 1s in the byte. A ninth bit, called the parity bit, is added to every byte. If the count of 1s is an even number, the parity bit is set to 1, making the total number of 1s in the 8 data bits and the parity bit an odd number. In our example of the byte representing "A", the binary equivalent 0 1 0 0 0 0 0 1 has two 1s in it. As 2 is an even number, the parity bit is set to 1, making the total number of 1s in the 8 data bits and the parity bit equal to 3, which is an odd number. However, if a "C" were typed (code 67), its binary equivalent 0 1 0 0 0 0 1 1 has three 1s. As 3 is already an odd number, a 0 is stored in the parity bit, leaving the total number of 1s in the 8 data bits and the parity bit an odd number. This is shown below:

 

 

Data    Bits               Parity Bit

'A'     0 1 0 0 0 0 0 1    1

'C'     0 1 0 0 0 0 1 1    0

 

 

 

 

 

In other words, the parity bit ensures that the total number of 1s across all 9 bits is always an odd number. This is called odd parity. You can also employ an even parity scheme, in which the total number of 1s across all 9 bits is always even, by reversing the above logic: the parity bit is set to 0 if the 1s in the data bits total an even number, and to 1 if they total an odd number.

 

What is the advantage of storing a parity bit for every byte in this fashion? When the system reads that byte back from the disk, it checks the parity information for it. If the 1s in the byte (8 bits) total an odd number while the parity bit is set to 1, then the byte must contain an error (assuming that odd parity is being used), because the total across all nine bits is an even number. In this way, the system can check data integrity. In the case of a modem transmission, the parity bit is added at the sending end and checked at the receiving end. If the parity is in error, a request is sent to the sending end to retransmit the byte.
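The odd parity rule described above can be written down as a short Python sketch. The helper names `count_ones`, `odd_parity_bit` and `passes_odd_parity` are invented for this illustration; real hardware does this in circuitry rather than in software.

```python
def count_ones(byte: int) -> int:
    """Number of 1 bits in an 8-bit value."""
    return bin(byte & 0xFF).count("1")


def odd_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1s (data + parity) odd."""
    return 1 if count_ones(byte) % 2 == 0 else 0


def passes_odd_parity(byte: int, parity_bit: int) -> bool:
    """True if the 8 data bits plus the parity bit contain an odd number of 1s."""
    return (count_ones(byte) + parity_bit) % 2 == 1


assert odd_parity_bit(ord("A")) == 1   # 'A' = 01000001 has two 1s, so parity is 1
assert odd_parity_bit(ord("C")) == 0   # 'C' = 01000011 has three 1s, so parity is 0

corrupted = ord("A") ^ 0b00000100      # flip one bit of 'A' "in transit"
assert passes_odd_parity(ord("A"), 1)
assert not passes_odd_parity(corrupted, 1)   # the single-bit error is detected
```

Note that a single parity bit can only reveal that an odd number of bits has changed; if two bits of the same byte flip, the check still passes.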

 

In RAID systems, just the detection of error using parity is not good enough. The system should be able to reconstruct data in the event of a loss using parity information.

 

So in RAID systems, parity information is stored on a much grander scale. As described earlier, RAID 3 uses 1 drive to store parity information. Here I use a very simple example to give a glimpse of how parity information can be used to reconstruct data. Let us say that we have a 4-drive RAID array. Let us assume that the bytes representing "A", "B" and "C" are striped across the first three disks. Using the odd parity scheme, a corresponding byte of parity information is written to the parity disk, where each bit in the parity byte is the parity of the same bit position in the three bytes on the other disks. This is shown below:

 

 

Disk 1             Disk 2             Disk 3             Disk 4 (parity)

0 1 0 0 0 0 0 1    0 1 0 0 0 0 1 0    0 1 0 0 0 0 1 1    1 0 1 1 1 1 1 1

  

 

 

The parity byte has been written using the odd parity scheme described above, but each of its bits is obtained by counting the number of 1s at the corresponding bit position on the three data drives. For example, the first bit (from the left) of the parity byte is 1. It is 1 because the total number of 1s at the first bit position in the other bytes (0, 0, 0) is 0, and to make the total of the three data bits and the parity bit an odd number, the parity bit at that position is set to 1. At the next bit position the data bits are 1, 1, 1, giving three 1s, which is an odd number, and so a 0 is set at that position in the parity byte. Parity information is calculated in the same way for the other bit positions and written to the parity drive (disk 4).

 

Suppose drive 2 fails, and we remove the defective drive and put in a new one. The RAID array, which is in a degraded state, now starts reconstructing the data on disk 2 using the information on the other data drives and the parity drive. It counts the number of 1s at the first bit position in the bytes on disk 1 and disk 3 (0, 0) and then looks at the parity bit for that position, which at the time of writing was set to 1. As we use an odd parity scheme, it knows that the parity bit could have been written as 1 only if the first bit on disk 2 was a 0: the three data bits then contained no 1s, and the parity bit had to be set to 1 to make the total odd. So it reconstructs the first bit on drive 2 as 0. Likewise, at bit position 2, the number of 1s on drives 1 and 3 is 2 and the parity bit is 0. As the parity bit is always set so as to make the total odd, and it contains a 0, the lost bit must have been a 1, so that the total number of 1s in the data bits and the parity bit was an odd number (3 + 0).

 

In this manner in RAID 3, parity information is used to reconstruct lost data while striping ensures a very good performance.  The above example is a very simplified example just to illustrate a concept. In reality, the parity checking and writing process gets a lot of inputs from the hardware, which helps detect errors and reconstruct data. 
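The bit-by-bit reasoning in the A, B, C example can be reproduced with a short Python sketch. It relies on the fact that XOR-ing bits together gives 1 exactly when the number of 1s is odd, so the odd parity bit at each position is the inverse of the XOR of the data bits. The function names `odd_parity_byte` and `rebuild_missing` are assumptions made for this illustration, not part of any RAID product.

```python
def odd_parity_byte(data_bytes: list[int]) -> int:
    """Parity byte whose bits make the count of 1s at each bit position odd."""
    xor = 0
    for value in data_bytes:
        xor ^= value              # bit is 1 where the data bytes hold an odd number of 1s
    return ~xor & 0xFF            # invert, so data bits plus parity bit total an odd number


def rebuild_missing(surviving_bytes: list[int], parity: int) -> int:
    """Recover the byte of the failed disk from the surviving bytes and the parity byte."""
    xor = parity
    for value in surviving_bytes:
        xor ^= value
    return ~xor & 0xFF            # the missing bit is whatever restores an odd total


disk1, disk2, disk3 = ord("A"), ord("B"), ord("C")
disk4 = odd_parity_byte([disk1, disk2, disk3])
assert format(disk4, "08b") == "10111111"      # matches the parity byte in the table above

# Disk 2 fails: rebuild its byte from disks 1 and 3 plus the parity disk.
recovered = rebuild_missing([disk1, disk3], disk4)
assert recovered == disk2 and chr(recovered) == "B"
```

The same function rebuilds any one of the three data bytes, which is why a parity-protected array of this kind can survive the loss of a single drive but not of two.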

 

One of the drawbacks of RAID 3 is that it has a high write overhead. This means that whenever a data block is rewritten on one of the drives, the parity across all drives for that block has to be recalculated and rewritten to the parity drive. This is a time overhead on every write operation. As only one drive is dedicated to storing parity, things can choke up considerably.

 

RAID 3 is therefore more suitable for high-read, low-write applications such as data warehouses. Once data is written, if it is mostly read thereafter, reads are fast on RAID 3. The RAID 3 configuration can bear the loss of one drive from the array.

 

RAID 5

 

RAID 5 is the same as RAID 3 except that instead of writing parity data to only one disk, RAID 5 stripes it evenly across all disks (figure 4). This considerably reduces the write overhead seen in RAID 3 and significantly improves performance.
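To visualise what "striping parity across all disks" means, the following Python sketch prints one possible layout for a four-disk array. The particular rotation order used here (parity moving one disk to the left on each successive stripe) is an assumption made for illustration; actual controllers may rotate parity differently, and `raid5_layout` is a hypothetical helper name.

```python
def raid5_layout(num_disks: int = 4, num_stripes: int = 4) -> list[list[str]]:
    """Return, for each stripe, which disk holds parity ('P') and which hold data blocks."""
    layout = []
    block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and moves left each stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


for row in raid5_layout():
    print("  ".join(f"{cell:>3}" for cell in row))
```

Over four stripes each disk holds parity exactly once, so parity writes are shared among all the drives instead of queuing up on a single dedicated parity drive as in RAID 3.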

 


 

 

 


Figure 4: Conceptual diagram of RAID 5 with distributed parity.

 

 

RAID 5 is the most versatile and popular of all RAID systems. Other commercially available RAID levels include 0, 1 and 0+1. A number of vendors like Compaq, Dell, Sun, etc. provide a wide array of RAID solutions. The concepts discussed above, however, are common to all RAID systems.

 

 

A number of hardware issues also go along with the implementation of RAID. These pertain specifically to the number of controllers, the cache on the controller, battery backup on controller cards, and fibre or SCSI channel connectivity, all of which have a major impact on performance.

 

 

Backing Up your Data

 

In addition to developing fault tolerance in your system through techniques like RAID, it is important that computer installations develop a comprehensive backup system to secure their valuable data. These backup systems are usually established using one of the many available tape rotation schemes. A good system of backups must ensure the following:

      a) earlier versions of data are available;

      b) used tapes leave the rotation and are archived;

      c) a fairly current backup copy is moved to an off-site location.

 

You can purchase DAT drives at a very reasonable cost, with capacities such as 4 GB or 8 GB, onto which you can dump your entire hard disk in one shot. These drives use small DAT tapes (with capacities like 4 GB) which cost around 500 - 800 Rupees and provide very economical storage. The word "tape" in the following discussion refers to a storage medium such as DAT. It can also apply to CDs, ZIP diskettes or, on a smaller scale, floppies.

 

I describe here two tape rotation schemes that can help IT establishments systematically back-up and secure their data.

 

The Grandfather Method of Tape Rotation

 

In the grandfather method of rotation, a yearly backup plan requires the use of 20 tapes. This system of rotation provides full daily, weekly and monthly backups. What this effectively means is that if your computer hard disk fails in an unrecoverable manner on Tuesday morning, you can get a new hard disk and load the Monday backup tape without losing any work at all.

As this scheme also provides for off-site movement of tapes, it ensures that you can get back your data even if your computer centre catches fire and everything in it is destroyed.

 

The Grandfather Method is described in the following sequence of steps:

 

1. Get 20 new, good-quality tapes (or CDs).

2. Label the tapes in the following manner:

4 Tapes (Daily Tapes) - one tape each for the first four days of the week, labelled Monday, Tuesday, Wednesday, Thursday.

4 Tapes (Weekly Tapes) - one each for every Friday of the month, labelled Friday-1, Friday-2, Friday-3, Friday-4.

12 Tapes (Monthly Tapes) - one each for every month, labelled January, February, March, ..., December.

3. At the end of each day, a backup of the entire disk or the data directory is taken on the tape designated for that day.

4. On the first Friday, the Friday-1 tape is used to back up data.

5. Assuming a five-day week, when you come in next Monday, i.e. in week 2 of the month, you take Monday's backup on the Monday tape (erasing last Monday's data), use the Tuesday tape for Tuesday, and so on until Friday, when you use the Friday-2 tape.

    Every week on Friday, the tape of the previous Friday is taken off site (for example, on Friday of week 2, when the backup is being taken on the Friday-2 tape, the Friday-1 tape is taken away from the site to some other location). On Friday 3, the Friday-2 tape is sent off and the Friday-1 tape is brought back.

    In short, during a month the daily tapes (M, T, W, Th) are recycled every week while the Friday tapes are not recycled.

6. On the last Friday of the month, i.e. Friday 4, a backup is taken on the Friday-4 tape and a copy is also made on the monthly tape (the January tape in January), which is archived.

7. The process described in steps 3 to 6 continues for the other months of the year.
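The labelling rules in the steps above can be expressed as a small Python sketch that, given a calendar date, names the tape or tapes to write that evening. The function name `grandfather_tapes` is hypothetical, a five-day working week is assumed, and a "Friday-5" label is used for months with five Fridays, which is one possible way of handling the extra tape that such months need.

```python
from datetime import date, timedelta

DAILY_TAPES = ["Monday", "Tuesday", "Wednesday", "Thursday"]
MONTHLY_TAPES = ["January", "February", "March", "April", "May", "June", "July",
                 "August", "September", "October", "November", "December"]


def grandfather_tapes(day: date) -> list[str]:
    """Labels of the tapes to write on the given day (empty list at weekends)."""
    weekday = day.weekday()                    # Monday is 0, Sunday is 6
    if weekday > 4:
        return []                              # no backup on Saturday or Sunday
    if weekday < 4:
        return [DAILY_TAPES[weekday]]          # Monday to Thursday: reuse the daily tape
    friday_number = (day.day - 1) // 7 + 1     # 1st, 2nd, ... Friday of the month
    tapes = [f"Friday-{friday_number}"]
    if (day + timedelta(days=7)).month != day.month:
        # Last Friday of the month: also make the monthly archive copy.
        tapes.append(MONTHLY_TAPES[day.month - 1])
    return tapes


print(grandfather_tapes(date(2024, 1, 2)))     # a Tuesday
print(grandfather_tapes(date(2024, 1, 26)))    # the last Friday of January 2024
```

The sketch covers only which tape is written; moving the previous Friday's tape off site remains a manual step, as described in step 5.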

 

As can be seen, in the event of a data disaster you lose only the work done on the day of the disaster. The system maintains multiple copies of data, so if some tapes do not work for some reason you can still fall back on earlier dates and get back data.

 

One problem with the Grandfather Method is that it uses the daily tapes very frequently. Media has a limited life, and this frequent use can wear the tapes out. So it is recommended that the daily tapes be replaced every 4 to 5 months (less than 60 writes) and the Friday tapes once a year. The Grandfather method is very easy to understand and therefore easy to use.

 

If you feel that taking a backup every day is too much of a bother, you can alter the timetable to once in two days or maybe once a week, depending on the amount of loss you can bear in the event of a catastrophe. If you work on Saturdays, or if a month has 5 Fridays, you will need extra tapes in the same scheme to accommodate these extra days.

 

The Saxon Method of Tape Rotation

 

The Saxon method is an alternative to the Grandfather method and is just a bit slippery to grasp. However, the Saxon method, unlike the Grandfather method, makes more even use of tapes and has a neat ordering scheme. This method has given very good results for keeping long-term backups and archives.

 

In the Saxon method, we divide the year into 52 weeks, where week 1 is the first week of January and week 52 is the last week of December. This method uses 8 tapes initially and then adds one tape roughly every month (every four weeks), totalling about 20 tapes in 1 year. This method, unlike the Grandfather method, stores an archive every 4 weeks and not at the end of the calendar month. In other words, a backup is archived exactly at the end of every four weeks, whether or not that date is the end of the month.

 

The implementation of the scheme is described in the following steps:

 

1.  Take 8 tapes and label them from 1 to 8 i.e. first tape labelled as 1 and the 8th tape labelled as 8.

 

2.  As we start our backup process, at the end of the first day (Monday) of week one, take your data backup on tape number 8. On Tuesday, you use tape number 7 to take backup of your data, on Wednesday tape 6, on Thursday tape 5 and tape 4 on Friday.

 

3. When work starts again on Monday of week two, the highest-numbered tape from last week's backups, i.e. tape number 8, is taken off site. When work finishes on the first day of week two, the backup is taken on tape number 7 (erasing last Tuesday's backup). On Tuesday we use tape 6, on Wednesday tape 5, on Thursday tape 4 and on Friday tape 3.

4. When work starts on Monday of week three, the highest-numbered tape in last week's backups, i.e. tape 7, is taken off site and tape 8 is brought back and kept on site.

5. At the close of Monday of week three, we take the backup on tape number 6, on Tuesday on tape 5, on Wednesday on tape 4, on Thursday on tape 3 and on Friday on tape 2.

6. When work starts on Monday of week 4, the highest-numbered tape in last week's rotation, tape 6, is sent off site and tape 7 is brought back and kept. As work comes to a close on Monday of week 4, the backup is taken on tape 5, on Tuesday on tape 4, on Wednesday on tape 3, on Thursday on tape 2 and on Friday on tape 1. After this backup is taken on tape 1, tape 1 leaves the rotation and is archived. Tape 5, the highest-numbered tape from week 4, then goes off site and tape 6 comes back.

7. When work starts on Monday of week 5, a new tape, number 9, is introduced to take the backup at the end of the day. Tape 8, which has come back, is used on Tuesday (erasing old data); on Wednesday tape 7, which has also come back, is used; on Thursday tape 6 is used; and on Friday tape 5 is called back and used, and tape 9 is sent off site.

 

In this manner the rotation continues. In the event of a loss of data on a day when tape 3 was to be used for backup, the recent backups are 4,5,6,7,8 and 9 in that order.
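The tape numbers in the steps above follow a regular pattern that can be captured in a short Python sketch. The function names `saxon_tape` and `is_archived_after_use` are invented for this illustration, and the tape numbers for weeks beyond week 5 are an extrapolation of the pattern in the steps rather than something spelled out in the description; the off-site movement of tapes is not modelled.

```python
def saxon_tape(week: int, weekday: int) -> int:
    """Tape number to use in a given week (1, 2, ...) on a weekday (0 = Monday ... 4 = Friday)."""
    cycle = (week - 1) // 4            # 0 for weeks 1-4, 1 for weeks 5-8, and so on
    week_in_cycle = (week - 1) % 4     # 0 for the first week of each four-week cycle
    highest_tape = 8 + cycle           # one new tape is introduced at the start of each cycle
    return highest_tape - week_in_cycle - weekday


def is_archived_after_use(week: int, weekday: int) -> bool:
    """The Friday tape of every fourth week leaves the rotation and is archived."""
    return week % 4 == 0 and weekday == 4


for week in range(1, 8):
    print(f"Week {week}:", [saxon_tape(week, day) for day in range(5)])
```

Under these assumptions, week 7 runs through tapes 7, 6, 5, 4 and 3, which agrees with the statement above that when tape 3 is due, the most recent backups are on tapes 4, 5, 6, 7, 8 and 9 in that order.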

 

I have explained this process in detail so that the reader has a clear understanding of how the system works. Shorter explanations can be very confusing, as the ordering scheme is difficult to visualise. The Saxon method is summarised through a schematic diagram in figure 5.

 

 

Figure 5: Saxon Method of Tape Backup (Currid and Saxon, 1995)

 

Week Day

Week 1

Week 2

Week 3

Week 4