Friday, February 24, 2012

How to Scale? Distributed Architecture

Distributed *

After the successful revolutions of Distributed Data and Distributed Computing, it is time to think about Distributed Architecture.
I believe that you can find many similarities between the way data processing has evolved over the years and the way system architecture is evolving and can continue to evolve.
60 years of Database Evolution
I extended the database evolution diagram to include the NoSQL paradigm, which arrived after it had seemed almost certain, in the early 2000s, that the RDBMS was the answer to every data problem. The big RDBMS vendors tried to scale their databases down to be embedded in mobile devices and, at the same time, to scale them up to handle huge amounts of data. It is now clear that there is no single best data solution for every problem. The days of "buying an Oracle DB and building every solution on top of it" are gone.
I would argue that the same concept is true for the architecture of most large systems; the days of "We are a Java shop, therefore every solution is based on OSGi and J2EE" are gone as well.
You may argue that OSGi is built to provide exactly the kind of modularity that we need. But I think it is better not to look at problems (once we have established that it is time to re-architect them for scale) as nails just because we have a hammer, or OSGi bundles in this case. Each scalability issue should be solved with the best tool for the problem.
The main incentive for the evolution of Data Processing, Large Scale Computing and new Architecture designs is the changing environment, mainly the explosion of Data, Users, Services and Markets. The existing technologies were unable to scale fast enough. It is impossible to build a single computer big enough to solve planet-scale problems fast enough. It is impossible to build a single relational database big enough to store and retrieve Big Data fast enough. Along the same lines, it is impossible to design a single-technology architecture that scales flexibly enough for your evolving business.

Distributed Scale

Just as NoSQL databases require different thinking from the DBA and MapReduce requires different thinking from developers, software architects should think differently about scalability issues.
Distributed Architecture for Scale
The goal of a distributed architecture is to allow linear scalability of the system along any of the axes of growth.

  • Users - If you have double the users, you can simply double the boxes.
  • Services - If you want to add services, you can simply add boxes for them, independently of the other services.
  • Markets - When you enter a new market, geographic or along another dimension, you would like to do it independently of the existing ones.
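To make the "double the users, double the boxes" idea concrete, here is a minimal sketch of one way to spread users across boxes. Everything in it is an assumption made for illustration: the names `box_for_user` and `load_per_box`, the box names, and the choice of plain hash-modulo routing.

```python
import hashlib


def box_for_user(user_id, boxes):
    """Pick a box for a user by hashing the user id (hypothetical helper).

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(boxes)
    return boxes[index]


def load_per_box(user_ids, boxes):
    """Count how many users land on each box."""
    counts = {box: 0 for box in boxes}
    for user_id in user_ids:
        counts[box_for_user(user_id, boxes)] += 1
    return counts


if __name__ == "__main__":
    users = ["user-%d" % i for i in range(10000)]
    two_boxes = ["box-1", "box-2"]
    four_boxes = two_boxes + ["box-3", "box-4"]
    # Twice the users on twice the boxes keeps the per-box load roughly flat.
    print(load_per_box(users[:5000], two_boxes))
    print(load_per_box(users, four_boxes))
```

Note that plain modulo routing moves most users to a different box whenever the number of boxes changes. That only matters if a box holds per-user state, in which case a scheme such as consistent hashing is the usual alternative.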
If you have one big WAR file deployed on your JBoss server, with a single MS-SQL DB behind it, there is little chance that you will be able to scale linearly (if at all), no matter how much architectural and tuning effort you invest. Many optimization efforts try to move less critical services (like reporting) away from the main system and database to decrease the load. I recommend instead moving the most critical services out of the main system or database and architecting them in a more focused and scalable way. This way you will get a better ROI on your scalability efforts.
The diagram above was adapted from Christian Timmerer's post to illustrate the required modularity. 
Once you have decided that it is time to scale, start breaking your system down into smaller independent services, each with its own API and its own data store (the "Share Nothing" concept). The data store can be based on an RDBMS, but you are more likely to find a more suitable data solution, now that you only need to solve the data issues of a more focused service. Maybe a simple in-memory hash table can do everything that is needed, or one of the many ready-made data solutions that were developed for similar services.
When the service is narrower and more focused, it is easier to identify the right scalability pattern and to choose the right set of tools to implement it.
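As a rough illustration of the "Share Nothing" idea, the sketch below shows two small services, each with its own narrow API and its own private in-memory hash table. The service names (`SessionService`, `CartService`) and their methods are hypothetical, chosen only for this example. In a real deployment the call between them would go over the network rather than being an in-process method call.

```python
class SessionService:
    """Focused service: knows only about logged-in users."""

    def __init__(self):
        # Private data store: a plain hash table, not shared with anyone.
        self._tokens = {}

    def login(self, user_id, token):
        self._tokens[user_id] = token

    def is_logged_in(self, user_id, token):
        return self._tokens.get(user_id) == token


class CartService:
    """Focused service: knows only about shopping carts."""

    def __init__(self, sessions):
        # Talks to the other service through its API only,
        # never by reading its data store directly.
        self._sessions = sessions
        self._carts = {}

    def add_item(self, user_id, token, item):
        if not self._sessions.is_logged_in(user_id, token):
            raise PermissionError("user is not logged in")
        self._carts.setdefault(user_id, []).append(item)

    def items(self, user_id):
        # Return a copy so callers cannot mutate the private store.
        return list(self._carts.get(user_id, []))


if __name__ == "__main__":
    sessions = SessionService()
    carts = CartService(sessions)
    sessions.login("alice", "token-1")
    carts.add_item("alice", "token-1", "book")
    print(carts.items("alice"))
```

Because each service owns its data, either one could later swap its hash table for a different data store, or be scaled onto more boxes, without the other noticing, as long as the API stays the same.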
This is certainly not a new idea; it is discussed at length as SOA (Service Oriented Architecture), but SOA is usually applied to external services and less to the internal breakdown of a single system. It seems that the overhead of SOA, compared to a simple function call or an update to the same RDBMS, is too high in many cases. But it is the easiest option if you wish to scale different aspects of your system (functions or data) differently.
