As part of my role as a Chief-Architect, in these days I’ve been involved in the architecture definition of a complex solution. I had to analyze the scenarios related to the solution from the functional and non-functional perspective and the more relevant aspects to take care in this case are performance, scalability and availability (traditional architectural quality attributes).
From the point of view of the application, I’ve architected the solution using multi-layer architecture and analyzed which components were feasible to be deployed in their own execution process as well as the provisioning of clustering environment for the application deployment and other design strategies to fulfill these architectural quality attributes. These strategies are out of the scope of this article because I would like to talk about the architectural strategies (related to the storage system) that I’ve envisioned for this particular solution.
From the point of view of the storage system, I have architected an enterprise storage solution based on the best practices in the industry which I will be describing next in this article.
To support the physical layer, I have made the decision of the development of a storage area network (SAN) comprising of an array of storage devices (disks, tapes, servers, etc) connected by using Fiber Channel technology. Summing all of these devices, we end up with a very big physical storage (see Figure 1).
In order to start supporting performance and availability attributes at the logical level of the storage architecture, on top of the previous big physical storage, I have made the decisions to create logical drive using RAID technology to replicate data (redundant data between disks) by combining multiple physical disks. Another step is to setup logical drives with different RAID levels depending the level of redundancy and performance that we´ve been willing to achieve. We´ve setup RAID1+0 logical drives (logical_drive_01) for transaction data (write-oriented operations over the data) because it has better write performance. As well, we´ve setup RAID5 logical drive (logical_drive_02) for other type of data (balanced read- and write-based operations over the data). It´s remarkable to say that each logical drive has a spare physical disk as fail-over mechanism (see Figure 1).
Now that we have setup the logical drives is the time to create the partitions or volumes. I have made the decision to create a LUN for each data nature on top of the RAID logical drives, that is, transactional data (LUN1 on logical_drive_01), master data (LUN2 on logical_drive_02), indexes (LUN3 on logical_drive_02), files (BLOBs) (LUN4 on logical_drive_02), log files (LUN5 on logical_drive_02) and temporary data (LUN6 on logical_drive_02) (see Figure 1). Of course, each LUN must be formatted using a comprehensive file system. In this case, the solution will be deployed in Windows environment, so the decision is to format the LUN using NTFS.
Now it’s time to map the storage artifacts on the RDBMS (in this case MS SQL Server 2008 R2) with the structures of the storage and file systems (SAN+RAID+LUN+NTFS). From the RDBMS perspective, I have made the decision to separate the data structures (tables, indexes, logs) in different Filegroups according to their responsibility in the database. Filegroup is the concept related to mapping the storage mechanism of the RDBMS to the underlying storage system following the principle that the data structures (tables, indexes, logs) live inside a Filegroup while a Filegroup is related to a group of files (as its name implies) in the storage system.
I’ve defined six filegroups and the underlying association to LUNs as described below (see Figure 2 and Figure 3):
- PRIMARY: to store all the catalog of the database on LUN1
- TransFG: to store transactional data and related partitions on LUN1
- MasterDataFG: to store all the reference data for my application (master data) or cached (imported) from external data sources on LUN2
- IndexFG: to store the indexes related to the database on LUN3
- FileStreamFG: to store files and BLOBs referenced from the database on LUN4
- LogsFG: to store the log files associated to the database on LUN5
Finally, I’ve consulted to move the TempDB (used in MS SQL Server for supporting operations) database to the LUN6 where the temporary data resides (see Figure 2 and Figure 3).