In these days, I’ve been thinking about basics courses when I was in the university studying computer science. There are some basics that IT professionals tend to forget because they think that the lessons learned are far away from everyday work. Paradoxically, there are a lot of lessons which are very closed to the reality with new IT challenges today. One of these forgotten but important lessons is algorithm complexity.
Today, when we (as software/solution/information architect and application designer) are normally faced with the need to crunch a big volume of data almost in real time using the most cost-effective way to make business decision right now.
The knowledge of algorithm complexity is mandatory in order to solve this kind of problems and make right decisions to impact positively in the application and the business. For example, if we have the ability to understand and choose the right data structure and algorithm; we can scale (growth without incrementing significantly the cost and effort) our software application without degradation when the volume of data or transactions increases; otherwise, we end up adding unnecessary computing resources.
Even today when computing resources tends to be cheap thanks to the cloud computing; from the business point of view it’s always necessary to reduce cost in order to save money and be more efficient and competent. In other words, it’s not the same to process a chunk of data with a cluster of 15 nodes than a cluster of 1000 nodes when we/our customers have to pay the bills.
Let me explain the principles around algorithm complexity and how it can help to save money to the business. The time complexity is used to represent how long an algorithm will take for a given amount of data. The mathematical expression to represent this notion is known as big O notation. What’s that? Well, big O notation is expressed by a mathematical function describing/providing as output how many operations an algorithm needs for a given amount of input data. Regarding to the mathematical function, the number of operations may/may not increase as the amount of input data increases.
The common used mathematical functions are:
- O(1). This is a constant complexity. It costs one operation no matter the amount of input data. For example, to lookup a data item in a hash table or to lookup a data item by its position in an array
- O(log(n)). The number of operations stays low even with a big volume of data. For example, to search a data item in a well-balanced binary tree when using a binary search
- O(n). The number of operations increases linearly as the amount of data increases. For example, to search a data item in an array
- O(log(n)*n). The number operations increase significantly as the amount of data increases. For example, sorting data items using efficient sorting algorithms such as QuickSort, HeapSort and MergeSort
- O(n2). The number of operations explodes quickly as the amount of data increases
- O(n3), O(2n), O(n!), O(nn). There are a lot of worst complexities when the number of operations explodes significantly. These are bad time complexities. You should never design an algorithm with a complexity above O(n!) otherwise you should ask yourself if designing and implementing applications is really your field
We can visualize the complexity of different functions (amount of input data à number of operations) as shown below.
Now let’s see the concepts with concrete examples.
For example, when the amount of data is low, there is no significant different between O(1) and O(n2) algorithms on data processing because the current processors can handle hundreds of billions of operations per second.
Let’s suppose that we want to process 10,000 data items, we have the following scenario:
- O(1). It costs 1 operation
- O(log(n)). It costs 13 operations
- O(n). It costs 10,000 operations
- O(log(n)*n). It costs around 130,000 operations
- O(n2). It costs 10,000,000,000
As we can see the O(n2) algorithm costs 10,000,000,000 operations which may occur in a fraction of a second with the current processors.
Now let’s supposed that we want to process 10,000,000 data items. It’s really not a big database today. We have the following scenario:
- O(1). It costs 1 operation
- O(log(n)). It costs 23 operations
- O(n). It costs 10,000,000 operations
- O(log(n)*n). It costs around 230,000,000 operations
- O(n2). It costs 100,000,000,000,000 operations
As we can see the O(n2) algorithm costs 100,000,000,000,000 operations. We can go to take a long sleep while the processing is done.
Now we can see why we, as architect and application designer, must understand deeply the problems and select the right data structure and algorithm for the underlying solutions. If we select the right algorithm/data structure with a low complexity, we’re choosing the way to appropriately scale the application, in other words, to add the right computing resources when the volume of data and/or transactions increases impacting positively on saving a lot of money to our customers.