Art IT for Scale: February 2012

Friday, February 24, 2012

How to Scale? Distributed Architecture

Distributed *

After the successful revolutions in Distributed Data and Distributed Computing, it is time to think about Distributed Architecture.
I believe there are many similarities between the way data processing has evolved over the years and the way system architecture is evolving, and can continue to evolve.

60 years of Database Evolution

I extended the diagram of database evolution to include the NoSQL paradigm, which emerged after it had seemed almost certain, in the early 2000s, that the RDBMS was the answer to every data problem. The big RDBMS vendors tried to scale their databases down to be embedded in mobile devices and, at the same time, to scale them up to handle huge amounts of data. It is now clear that there is no single best data solution for every problem. The days of "buying an Oracle DB and building every solution on top of it" are gone.
I would argue that the same is true for the architecture of most large systems; the days of "We are a Java shop, therefore every solution is based on OSGi and J2EE" are gone as well.
You may argue that OSGi was built precisely to provide the kind of modularity that we need. But once we have established that it is time to re-architect a system for scale, it is better not to look at every problem as a nail just because we have a hammer (or, in this case, as an OSGi bundle). Each scalability issue should be solved with the best tool for that problem.
The main incentive for the evolution of Data Processing, Large Scale Computing and new Architecture designs is the changing environment, mainly the explosion of Data, Users, Services and Markets. The existing technologies were unable to scale fast enough. It is impossible to build a computer big enough to solve planet-scale problems fast enough. It is impossible to build a single relational database big enough to store and retrieve Big Data fast enough. Along the same lines, it is impossible to design an architecture based on a single technology that scales flexibly enough for your evolving business.

Distributed Scale

Just as NoSQL databases require different thinking from DBAs, and MapReduce requires different thinking from developers, software architects should think differently about scalability issues.

Distributed Architecture for Scale

The goal of a distributed architecture is to allow the system to scale linearly along any of the axes of growth:

Users - If you double the number of users, you can simply double the number of boxes.
Services - If you want to add services, you can simply add boxes for them, independently of the other services.
Markets - When you enter a new market, whether geographic or along another dimension, you want to do so independently of the existing ones.
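The three axes above can be illustrated with a minimal routing sketch. It assumes that each (market, service) pair gets its own independent pool of boxes and that users are spread across a pool by hashing their ID; the `Router` class and the pool and box names are hypothetical, not part of any specific product.

```python
import hashlib


class Router:
    """Routes each request to a box in an independent (market, service) pool.

    Hypothetical sketch: pool keys and box names are illustrative only.
    """

    def __init__(self) -> None:
        # (market, service) -> boxes; pools never share boxes or state
        self.pools: dict[tuple[str, str], list[str]] = {}

    def add_boxes(self, market: str, service: str, boxes: list[str]) -> None:
        # A new market or service just creates a new pool; existing pools are untouched
        self.pools.setdefault((market, service), []).extend(boxes)

    def route(self, market: str, service: str, user_id: str) -> str:
        boxes = self.pools[(market, service)]
        # Stable hash (the built-in hash() of str is randomized per process)
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return boxes[int.from_bytes(digest[:8], "big") % len(boxes)]


router = Router()
router.add_boxes("US", "search", ["us-search-1", "us-search-2"])
router.add_boxes("EU", "search", ["eu-search-1"])
print(router.route("EU", "search", "user-42"))  # eu-search-1
```

With this layout, adding a market or a service only adds a new pool, and doubling the users of one pool is handled by adding boxes to that pool alone. Note that plain modulo hashing reassigns most users whenever a pool grows; if boxes hold per-user state, a consistent-hashing scheme would be a better fit.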

If you have one big WAR file deployed on your JBoss server, with a single MS-SQL DB behind it, there is little chance that you will be able to scale linearly (if at all), no matter how much architectural and tuning effort you invest. Many optimization efforts try to move less critical services (like reporting) away from the main system and database to decrease the load. In contrast, I recommend moving the most critical services away from the main system and database, and architecting them in a more focused and scalable way. This way you will get a better ROI on your scalability efforts.
The diagram above was adapted from Christian Timmerer's post to illustrate the required modularity.

Once you have decided that it is time to scale, start breaking your system down into smaller independent services, each with its own API and its own data store (the "Share Nothing" concept). The data store can be based on an RDBMS, but you are more likely to find a more suitable data solution, now that you only need to solve the data issues of a more focused service. Maybe a simple in-memory hash table can do everything that is needed, or one of the many ready-made data solutions that were developed for similar services.
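As a rough illustration of the "Share Nothing" idea, here is a minimal sketch of two hypothetical services. Each owns its private data store (a plain in-memory dict, standing in for the simple hash table mentioned above), and the second service reaches the first only through its API, never through a shared database. The class and method names are illustrative assumptions.

```python
class PreferenceService:
    """A focused service with its own API and its own private data store.

    Hypothetical example: a plain in-memory dict plays the role of the store.
    """

    def __init__(self) -> None:
        # Private store: no other service reads or writes it directly
        self._prefs: dict[str, dict[str, str]] = {}

    def set_preference(self, user_id: str, key: str, value: str) -> None:
        self._prefs.setdefault(user_id, {})[key] = value

    def get_preferences(self, user_id: str) -> dict[str, str]:
        # Return a copy so callers cannot mutate the service's internal state
        return dict(self._prefs.get(user_id, {}))


class GreetingService:
    """Another hypothetical service; it depends on PreferenceService only via its API."""

    def __init__(self, prefs: PreferenceService) -> None:
        self._prefs = prefs

    def greet(self, user_id: str) -> str:
        language = self._prefs.get_preferences(user_id).get("language", "en")
        return "Bonjour" if language == "fr" else "Hello"


prefs = PreferenceService()
greeter = GreetingService(prefs)
prefs.set_preference("user-1", "language", "fr")
print(greeter.greet("user-1"))  # Bonjour
```

In a real deployment the call from `GreetingService` to `PreferenceService` would cross the network (for example over HTTP or RPC); the in-process call here only stands in for that API boundary, which is what lets each service pick, and scale, its own store.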

When the service is narrower and more focused, it is easier to identify the scalability pattern it needs and to choose the right set of tools to implement it.
This is certainly not a new idea; it is widely discussed as SOA (Service Oriented Architecture). However, SOA is usually applied to external services, and less to the internal breakdown of a single system. It seems that, in many cases, the overhead of SOA is considered too high compared to a simple function call or an update to the same RDBMS. But it is the easiest option if you wish to scale different aspects of your system (function-wise or data-wise) in different ways.

Wednesday, February 22, 2012

When to Scale?

You know that you need to make significant changes to be able to scale at some point in the future.
Your service is serving thousands of users, and the number is growing. You also have plans for wonderful features that will bring more and more users to your service. Your business development team is also telling you about a great partner (Microsoft/Google/Twitter...) who is going to sign a partnership with you and multiply your user base at least tenfold. The future is bright, if only you could scale your service in time.

Theoretical measurement of module cost/performance

Don't Scale too early

Why couldn't you build the system to scale in the first place?
This is usually a matter of the delicate balance between "Building it right" and "Building the right 'it'" (if I may borrow the terms from the Pretotyping Manifesto). You could either spend your time laying the groundwork for possible future scale, risking over-architecture, or you could focus on testing your ideas in a "quick-and-dirty" way. If you are successful, I bet that you chose the latter, and therefore you need to scale now.
If you scaled too early, the risk is that you spent too much time ("Initial Cost" in the diagram above), and now you have to wait longer, and be much more successful, to reach the "Break Even" point. I see many start-up companies that come to raise VC funding "to finish the development", while VCs prefer to put their money in "to boost the marketing and sales".
Even if you could build the system to scale from the beginning, you shouldn't!
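The "Initial Cost" vs. "Break Even" trade-off can be made concrete with a small calculation. This sketch assumes a deliberately simple model (hypothetical, not taken from the diagram): an up-front cost, followed by a stream of per-period benefits, with break-even at the first period where the cumulative benefit covers the initial cost. The function name `break_even_period` is illustrative.

```python
from typing import Optional, Sequence


def break_even_period(initial_cost: float,
                      benefits: Sequence[float]) -> Optional[int]:
    """Return the first period (1-based) in which the cumulative benefit
    covers the initial cost, or None if it never does within the horizon."""
    cumulative = 0.0
    for period, benefit in enumerate(benefits, start=1):
        cumulative += benefit
        if cumulative >= initial_cost:
            return period
    return None


# Same growing benefit stream for both systems: 10, 20, 30, ... per period
benefits = [10.0 * k for k in range(1, 13)]
print(break_even_period(30.0, benefits))   # quick-and-dirty system: 2
print(break_even_period(100.0, benefits))  # over-architected system: 4
```

With the same benefit stream, the larger initial cost pushes break-even from period 2 to period 4; to break even just as early, the over-architected system would need a much stronger benefit stream. The numbers are illustrative, not measurements.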

Don't Scale too late

Why couldn't you scale the service at the right time?
If you look at the diagram above, you can see that choosing the right time to work on scale is not simple. At the beginning of the "Change Decision Range", your system is performing well and scaling almost linearly. The more effort you put into the system (Software Development, Performance Tuning, Hardware Addition...), the better the results you receive. This is certainly not the time to spend on rewriting your system for scale, or is it?
You do start to see signs of problems: some features take too much time to implement, some errors cause the system to fail, and some upgrades do not go as smoothly as they used to. But everything is more or less manageable, and these problems are resolved rather quickly. These are "system smells", if I may borrow the concept of code smells popularized by Martin Fowler.

The following factors cause you to make the scale decision too late:

The slow, gradual degradation of your architecture builds your confidence that you are still on the right track, as you are getting better at solving these problems
The boost in the business puts more pressure on "keeping the lights on"
The focus on "keeping the lights on" prevents you from working on deep rewrites
The slow, gradual shift from linear growth to a plateau, and then to linear decline, is difficult to detect
The more you invest in the current technology, the harder it is to decide to switch to a new one. This is true for managers ("How can I come and say that I was wrong before?") and for developers ("I'm a professional Java developer, why should I learn Scala?").

You could scale the system at the right time, but it is very difficult!

Scale Right

How can you know when it is the best time to scale?
First, measure!
Measure the effort you spend (development, optimization, hardware...) vs. the benefit you receive (number of requests, response time, accuracy...).
Second, visualize!
Put your measurements in front of your eyes and the eyes of your peers. A diagram similar to the one above should be reviewed every week.
Third, realize!
You can't fight reality, although it is very easy to ignore it. Realize that the linearly growing curve will flatten and then turn down. Know that you are going to invest in a new technology that will return its investment, at the "Break Even" point, only after a considerable period of time.
Fourth, decide!
When you see signs that you have hit that point in the curve, start making the needed technology changes and rewriting your great-performing system.
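The measure, visualize, realize and decide steps can be sketched in code. This minimal example assumes you record, every week, your cumulative effort (development, optimization, hardware, expressed in one common unit) and your current benefit (e.g. requests served at the target response time). It computes the marginal return (benefit gained per unit of extra effort) and flags the point where the recent return has dropped well below the early, linear-phase return. The `Measurement` type, the window size and the 50% threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    cumulative_effort: float  # total effort invested so far (one common unit)
    benefit: float            # current benefit level (e.g. requests/sec at target latency)


def marginal_returns(history: list[Measurement]) -> list[float]:
    """Benefit gained per unit of additional effort between consecutive weeks."""
    returns = []
    for prev, curr in zip(history, history[1:]):
        spent = curr.cumulative_effort - prev.cumulative_effort
        if spent > 0:  # skip weeks with no new investment
            returns.append((curr.benefit - prev.benefit) / spent)
    return returns


def should_start_rewrite(history: list[Measurement], window: int = 3,
                         threshold: float = 0.5) -> bool:
    """True when the average recent return falls below `threshold` times the
    average early (baseline) return, i.e. the curve is flattening or declining."""
    returns = marginal_returns(history)
    if len(returns) < 2 * window:
        return False  # not enough history to compare early and recent periods
    baseline = sum(returns[:window]) / window
    recent = sum(returns[-window:]) / window
    return baseline > 0 and recent < threshold * baseline


# Hypothetical weekly data: linear growth, then a plateau and a slow decline
benefits = [0, 10, 20, 30, 40, 45, 48, 49, 49, 48, 46]
history = [Measurement(float(week), float(b)) for week, b in enumerate(benefits)]
print(should_start_rewrite(history))  # True
```

Plotting `marginal_returns(history)` in the weekly review tends to expose the shift from linear growth to a plateau earlier than the raw benefit curve, because the return per unit of effort starts falling while the benefit itself is still rising.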