Scalability is the ability of a system (in computer science, the software system) to grow without degradation. That is, if the amount of requests (work) increases significantly, the quality attributes of the system, particularly its performance, are not impacted negatively, because we can easily add more computing resources.
A common scalability scenario involves requests growing over 30% a month, 500 million page requests a day, a peak rate of ~40k requests per second and ~3 TB of new data to store a day.
So it is important to architect/design the system with scalability in mind from the beginning, because this is a very important quality attribute that can be very costly to retrofit late in the product lifecycle. Today, scalability is a less difficult constraint because, in theory, we can grow without limit and very cheaply using cloud computing resources; the only requirement is to start from a sound architecture.
In this article, I will cover several cloud architecture patterns that support scalability. I will follow a logical evolution of the software architecture according to the level of service provided by the product as the workload increases. In order to make the architecture concrete, I will provide examples based on Amazon AWS.
The simplest architecture pattern is to have the web server/application server software stack and the database server on the same node (AWS EC2), with database backups sent to a highly available storage medium (AWS S3). This infrastructure resides in a single availability zone or data center.
The next architecture pattern improves the availability, scalability and performance of the system. It is based on the idea of separating dynamically generated data from static data, with a corresponding separation in the processing flow. We create a virtual volume (AWS EBS) to store the dynamic data (mainly the relational databases) independently of the node (AWS EC2) hosting the web/application server stack and database server. If the main node fails, we can launch another node (AWS EC2) with the same configuration (web/application/database servers) and mount the dynamic data volume on this new node. Static data is stored in a highly available storage medium (AWS S3). We also make database and rotated log backups, as well as volume snapshots, to the highly available storage medium (AWS S3). This infrastructure resides in a single availability zone or data center.
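The failover procedure above can be sketched with the AWS CLI. This is an operational sketch, not a runnable script: all resource ids (AMI, volume, instance), the device name, the mount point and the service names are placeholders for your own environment.

```shell
# 1. Launch a replacement node from a pre-baked image that already
#    contains the web/application/database server configuration.
aws ec2 run-instances --image-id ami-0123456789abcdef0 \
    --instance-type m5.large --count 1

# 2. Attach the surviving EBS data volume to the new instance.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf

# 3. On the new instance, mount the volume and start the services.
sudo mount /dev/sdf /var/lib/data
sudo systemctl start postgresql apache2
```

In practice this is usually automated (for example with an Auto Scaling group or a watchdog script), so the recovery does not depend on a human operator.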
The next architecture pattern mainly improves the performance and availability of the system. It is based on the idea of putting a content delivery network (AWS CloudFront) in front of the static data (text, graphics, scripts, media files, documents, etc.) in order to bring the data closer to the final user and cache the most frequently used content, reducing latency and increasing throughput when serving it. AWS CloudFront is distributed across servers around the world. We need to register two domains: one for accessing dynamic data and the other for accessing static data. The web/application/database infrastructure resides in a single availability zone or data center, while the AWS CloudFront edge servers sit in different data centers across the world.
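The two-domain split can be sketched as a small URL-routing helper: assets with static file extensions go to the CDN-backed host, everything else to the origin. The domain names and the suffix list below are hypothetical.

```python
# Sketch: split traffic between a dynamic-content domain and a
# CDN-backed static-content domain. Domain names are illustrative.
DYNAMIC_HOST = "www.example.com"     # origin: web/application servers
STATIC_HOST = "static.example.com"   # CNAME pointing at the CDN distribution

STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".pdf", ".mp4")

def asset_url(path: str) -> str:
    """Route static assets to the CDN host, everything else to the origin."""
    host = STATIC_HOST if path.lower().endswith(STATIC_SUFFIXES) else DYNAMIC_HOST
    return f"https://{host}{path}"

print(asset_url("/img/logo.png"))    # https://static.example.com/img/logo.png
print(asset_url("/cart/checkout"))   # https://www.example.com/cart/checkout
```

Keeping this decision in one helper (or in the templating layer) makes it trivial to switch CDN providers later, since only the static host name changes.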
The next architecture pattern improves the availability, scalability, reliability, security and performance of the system. It is based on the idea of separating the architecture artifacts and their underlying processing nodes according to their concerns in the whole picture. To achieve this goal, we use a multi-layer architecture, described below:
- Front-end layer. This is the only layer facing the final user. In this layer we have our public IPs and registered domain names (AWS Route 53 or another DNS provider) as well as the load balancers (AWS ELB or another technology such as HAProxy hosted on AWS EC2). Requests arrive at this point over a secure channel (SSL, to protect the data in transit over the Internet) and are balanced/forwarded according to the workload of the back-end application servers (achieving high availability, scalability and high performance). The load balancers sit in different sub-networks than the back-end servers (possibly separated by network elements such as routers and firewalls), so an intruder who breaks into this layer cannot proceed further inside (achieving security).
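For the HAProxy variant of this layer, the configuration might look like the fragment below: SSL is terminated at the balancer and requests are spread across application servers in a private subnet. The certificate path, addresses, ports and health-check endpoint are all illustrative.

```
# Sketch of an HAProxy front end terminating SSL and balancing
# to back-end application servers in a private subnet.
frontend www
    bind *:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend app_servers

backend app_servers
    balance roundrobin          # spread requests across the farm
    option httpchk GET /health  # drop unhealthy nodes from rotation
    server app1 10.0.1.11:8080 check
    server app2 10.0.1.12:8080 check
```

The `check` keyword combined with `option httpchk` gives the security and availability properties described above: only reachable, healthy back-end nodes receive traffic, and the back ends never need public addresses.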
- Application server layer. This layer is the first in the back-end and mainly hosts the farm of web/application servers that process the requests forwarded by the load balancers (achieving high availability, scalability and high performance). For request processing, we can choose from a huge variety of web framework stacks. For example, a common platform stack might be the Apache HTTPD web server in front serving HTTP requests, plus Tomcat application server(s) acting as the servlet container that processes the business logic. We can further improve performance in this layer by adding caching technology (AWS ElastiCache or another technology such as memcached) to keep master data and (pre-)processed results in memory, alleviating recurring processing and database server workload. We can also use database connection pooling technologies (improving performance), such as PgBouncer for maintaining open connections to PostgreSQL servers (note: opening connections to database servers is very expensive). For these nodes in the web farm, I recommend AWS High-CPU Extra Large EC2 machines.
- Storage system layer. This is where our data is persisted; it comprises the relational database servers (RDBMS) and the storage platform. In order to process a huge number of transactions from the application layer, we establish a master-slave scheme for the RDBMS: one active (master) server serves the transaction requests and replicates the changes to the slave servers (one or more) at a reasonable frequency to avoid stale data (achieving high availability, scalability and high performance). The master-slave scheme is well supported by several RDBMS, such as Oracle, SQL Server and PostgreSQL. Another configuration that improves performance is to have the applications send their database write requests to the master database, while read requests are directed to a load balancer that distributes them across a pool of slave databases (note: for applications that rapidly write and then read the same data object, this may not be the most effective method of database scaling). It is worth noting that a master-master scheme is not a good scalable solution: with multiple masters, each able to modify any data object, we need either distributed transactions, which are very costly and lock many objects, or transaction replication with latency between the masters; in both cases it is extremely difficult to ensure consistency between masters, and thus the integrity of the data. The database server instances should be AWS High-Memory Extra Large machines in order to support the workload. Finally, for the storage platform, the idea is to have different volumes (AWS EBS) for each kind of data object (transactional data, partitioned and sharded data, master data, indexes, transaction logs, external files, etc.). For transactional data, we need to provision a high level of IOPS to speed up data requests. Static data is stored in a highly available storage medium (AWS S3).
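The read/write split described above can be sketched as a small router: writes always go to the master, while reads are distributed round-robin across the slave pool. The "connections" here are just labels; a real implementation would hold actual database connections (e.g. psycopg2 or JDBC) behind the same routing logic, and the SQL classification would be more robust than a prefix check.

```python
import itertools

# Sketch of read/write splitting for a master-slave RDBMS setup.
class ReadWriteRouter:
    def __init__(self, master, slaves):
        self._master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over the pool

    def route(self, sql: str) -> str:
        """Send writes to the master, distribute reads across the slaves."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._slaves) if is_read else self._master

router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
print(router.route("INSERT INTO customer VALUES (1, 'Ana')"))  # master-db
print(router.route("SELECT * FROM customer"))                  # slave-1
print(router.route("SELECT * FROM customer"))                  # slave-2
```

This also illustrates the caveat in the text: a row inserted through the master may not yet be visible on the slave that serves the immediately following read, so read-your-own-writes sessions should be pinned to the master.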
We also make database and rotated log backups, as well as volume snapshots, to the highly available storage medium (AWS S3). It is advisable to format the volumes using XFS, which makes snapshot creation easy (the file system can be frozen for a consistent snapshot) and improves file system performance.
In general, we need to separate dynamically generated data from static data. The static data is served through CDN mechanisms to reduce latency and increase throughput. In order to implement disaster recovery mechanisms and improve the availability of the system, the idea is to distribute the nodes (load balancers, web/application servers, database servers and memcached servers) across different availability zones or data centers around the world.
The architecture vision is illustrated in the following figure.
Another scalability technique at the database level is data sharding. A database shard is a horizontal partition of a database; that is, we take a large database and break it into a number of smaller databases across servers. Under this design principle (horizontal partitioning), rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server.
Let's illustrate this concept as shown below. The primary shard table is the customer entity. The customer table is the parent of the shard hierarchy, with the customer_order and order_item entities as child tables. The global tables are the common lookup tables, which have relatively low activity; these tables are replicated to all shards to avoid cross-shard joins. There are several design concerns when you architect your data shards:
- Generate and assign a unique id to each piece of data
- Assign new data to the least-used shard, and embed the shard id in the primary key itself
- Whenever an object needs to be looked up, we parse the object id to get the logical shard id; from this we look up the physical shard id, get a connection pool and run the query there. For example: object id 32.XXXXXXX maps to logical shard 32; logical shard 32 lives physically on shard 4; and shard 4 lives physically on host2, schema5
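The lookup in the last step can be sketched as a two-level mapping, following the host2/schema5 example above. The mapping tables here are illustrative stand-ins for what would normally live in a configuration store, and in a real system the resolved location would select a connection pool rather than return a tuple.

```python
# Sketch of the two-level shard lookup: the logical shard id is embedded
# in the object id ("32.XXXXXXX"), and small mappings resolve
# logical shard -> physical shard -> (host, schema).
LOGICAL_TO_PHYSICAL = {32: 4}                      # illustrative mapping
PHYSICAL_TO_LOCATION = {4: ("host2", "schema5")}   # illustrative mapping

def locate(object_id: str):
    """Parse the object id and resolve the host/schema holding the row."""
    logical_shard = int(object_id.split(".", 1)[0])
    physical_shard = LOGICAL_TO_PHYSICAL[logical_shard]
    host, schema = PHYSICAL_TO_LOCATION[physical_shard]
    return host, schema

print(locate("32.1234567"))  # ('host2', 'schema5')
```

The indirection through the logical shard id is what makes rebalancing cheap: moving logical shard 32 to another host only requires updating the mapping, not rewriting the ids already handed out.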
In this article, I have covered the key principles, techniques and architectures related to the scalability of software systems, specifically from the cloud computing perspective, in order to take advantage of this well-established technology, which fits very well when growing our business and the underlying applications.