RAID
4. What is Striping?
Striping improves the performance of the array by distributing
the data across all the drives. The main principle behind
striping is parallelism. Imagine you have a large file on
a single hard drive. If you want to read the file, you have
to wait for the hard drive to read the file from beginning
to end. Now, if you break the file up into multiple pieces
and distribute it across multiple hard drives, you have
all these drives reading a part of the file at the same
time. You only have to wait as long as it takes to read
a single piece, since the drives are working in parallel. The
same is true when writing a large file to the array.
Transfer performance is greatly increased. The more hard
drives you have, the greater the increase in performance.
The number of drives is also the stripe width, that is,
the number of stripes that can be read or written
simultaneously.
Every piece of data that comes into the RAID controller
is divided into smaller pieces. There are two kinds of
striping, which divide the data in different ways:
byte-level and block-level striping. Byte-level striping
involves breaking up the data into bytes and storing them
sequentially across the hard drives. For example, if the
data consists of 16 bytes and there are 4 hard drives,
the first byte is stored on the first hard drive, the second
byte on the second drive, and so on. The fifth byte is stored
on the first hard drive and the cycle continues. Block-level
striping involves breaking up the data into blocks of a
given size. These blocks are then distributed across the
array the same way as in byte-level striping. The size of
these blocks is called the stripe size. A variety of stripe
sizes is usually available, depending on the RAID
implementation used.
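The round-robin distribution described above can be sketched
in a few lines of Python. This is only an illustration, not
how any particular RAID controller is implemented; the
function name stripe and its parameters are made up for
this example. A unit size of 1 gives byte-level striping,
and a larger unit size gives block-level striping.

```python
def stripe(data: bytes, num_drives: int, unit_size: int = 1) -> list:
    """Distribute data round-robin across drives, unit_size bytes at a time.

    Hypothetical helper for illustration: each "drive" is just a list of
    the pieces that would be stored on it, in order.
    """
    if num_drives < 1 or unit_size < 1:
        raise ValueError("num_drives and unit_size must be positive")
    drives = [[] for _ in range(num_drives)]
    for index, offset in enumerate(range(0, len(data), unit_size)):
        # Piece 0 goes to drive 0, piece 1 to drive 1, ... then wrap around.
        drives[index % num_drives].append(data[offset:offset + unit_size])
    return drives


data = bytes(range(16))  # 16 bytes with values 0..15

# Byte-level striping: drive 0 receives the 1st, 5th, 9th and 13th bytes.
byte_level = stripe(data, num_drives=4, unit_size=1)

# Block-level striping with a 4-byte stripe size: one block per drive.
block_level = stripe(data, num_drives=4, unit_size=4)

for number, pieces in enumerate(byte_level):
    print(f"drive {number}: {[piece.hex() for piece in pieces]}")
```

With 16 bytes and 4 drives, the fifth byte (value 4 here)
lands on the first drive again, matching the example in the
text.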
Stripe size is a much-debated topic. There is no
ideal stripe size, but certain sizes work best with certain
applications. Increasing or decreasing the stripe size
affects performance in two ways. Using a small stripe size
breaks files into more pieces and distributes them across
more of the drives. Transfer performance increases due to
the increased parallelism. However, this also increases
the randomness of the position of each piece of the file.
Using a large stripe size does the opposite: the data is
less distributed and transfer performance decreases, but
the randomness decreases as well. The best way to find the
right stripe size for your particular application is to
experiment. Start out with a medium stripe size, then try
decreasing or increasing the size and recording the
difference in overall performance.
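To see why a smaller stripe size spreads a file over more
drives, here is a rough Python sketch. It assumes the file
starts on a block boundary and counts only how many drives
hold part of the file. It says nothing about real-world
speed, which still has to be measured by experiment. The
function name drives_touched is invented for this example.

```python
import math


def drives_touched(file_size: int, stripe_size: int, num_drives: int) -> int:
    """Count the drives that hold at least one block of the file.

    Assumes (for illustration) that the file starts on a block boundary.
    """
    blocks = math.ceil(file_size / stripe_size)
    # A file cannot occupy more drives than it has blocks, or than exist.
    return min(blocks, num_drives)


KIB = 1024
for stripe_kib in (4, 16, 64, 256):
    count = drives_touched(64 * KIB, stripe_kib * KIB, num_drives=4)
    print(f"{stripe_kib:>3} KiB stripe size: 64 KiB file spans {count} drive(s)")
```

In this sketch a 64 KiB file is spread over all four drives
at a 4 KiB or 16 KiB stripe size, but sits on a single drive
once the stripe size reaches 64 KiB or more.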
How striping works: The incoming data file is broken
up into blocks and distributed across the drives, two in
this example. If you had more hard drives, the blocks would
be distributed to those as well. Now if you want to move or
transfer the file somewhere, the controller accesses both
drives simultaneously, which is where the performance gain
kicks in. It only takes half the time to transfer the file.
If you increase the number of hard drives, the file will be
transferred in a correspondingly smaller fraction of the
time it takes to transfer from 1 hard drive.
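The arithmetic behind "half the time" can be written out as
a small Python sketch. It assumes an idealized array in
which transfer time scales perfectly with the number of
drives; real arrays fall short of this because of
controller, bus and positioning overhead. The function name
and the 10-second figure are made up for illustration.

```python
def ideal_transfer_time(single_drive_seconds: float, num_drives: int) -> float:
    """Idealized transfer time when the file is striped over num_drives.

    Assumes perfect parallelism with no overhead, which is a simplification.
    """
    if num_drives < 1:
        raise ValueError("num_drives must be at least 1")
    return single_drive_seconds / num_drives


# Hypothetical file that takes 10 seconds to transfer from one drive.
for count in (1, 2, 4, 8):
    seconds = ideal_transfer_time(10.0, count)
    print(f"{count} drive(s): {seconds} s")
```

Under these assumptions two drives give 5 seconds and four
drives give 2.5 seconds for the same file.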