24 June 2017

Gleaming the scale cube

Image: Pixabay "cubism"
The scale cube was introduced in a book called "The Art of Scalability" and it's a very useful way of looking at your application. It encourages you to picture your application's capacity as a cube. In this analogy, if we increase the width, height, or depth of the cube then we increase our application capacity.

It's particularly useful to consider while you're modelling your domain with a view to decomposing a monolithic application. Eric Evans's book "Domain-Driven Design" describes the concept of bounded contexts, and this fits very nicely with the scale cube.

X-scaling, or horizontal scaling

The first dimension we're going to look at is horizontal scaling. This is where we make our cube longer along the X-axis.

When we scale horizontally we create multiple copies of the application and then use a load-balancer to split up the traffic between them.
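As a rough illustration of that traffic split, here is a minimal sketch of a round-robin load balancer. The class name, the server names, and the choice of round-robin are assumptions made for the example; real load balancers offer other strategies too, such as least-connections.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request):
        # Every copy of the application is identical, so any server will do.
        server = next(self._rotation)
        return server, request


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.route(f"request-{n}")[0] for n in range(6)]
print(assignments)
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Because the copies are interchangeable, the balancer needs no knowledge of the request itself; adding capacity is just a matter of adding another name to the list.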

This is a very easy way to scale your application, but it does have some limitations. Firstly, every extra server adds cost, and at some point it won't make financial sense for you to add another one to your cluster. Secondly, this approach does not address the architecture of your application: as the application grows bigger and bigger, it becomes harder and harder to manage its performance.

Horizontal scaling is cheap and easy in the early stages but offers diminishing returns, so you will eventually hit a limit. You can extend that limit, and get more out of each server you add, by using the other two scaling methods in conjunction with horizontal scaling.


Y-scaling, or functional decomposition

Y-scaling is the act of splitting your application into multiple components, each of which is responsible for a set of related functions. This can be thought of as modular programming because it breaks your monolithic application into smaller modules which work together to offer the application functionality.

You might not see how this can help your application scale, but in general a small piece of code that is focused on one thing is easier to optimise, and will often do that thing quicker, than a monolithic application that tries to do everything.

By having your application split into smaller components you'll be able to perform horizontal scaling at a more granular level. You'll find that not all parts of your application are equally busy. For example, a feature that lets users automatically tweet what music they're listening to might not be used nearly as much as a recommendation engine that suggests music to them.

Instead of having to create a new copy of your entire application you'll be able to create new instances of the individual components that are experiencing load. This improves the cost efficiency of horizontal scaling and extends the limit to which you can afford to scale up.
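To make the cost argument concrete, here is a small worked sketch. All of the figures (peak load, capacity per instance, cost per instance) are invented for illustration, and the model is deliberately simple: it assumes a copy of the monolith costs as much as all of its components combined, and that you need enough copies to cover the busiest component.

```python
import math

# Hypothetical figures, for illustration only.
peak_load = {"recommendations": 900, "tweet_sharing": 50}               # requests/sec
capacity_per_instance = {"recommendations": 100, "tweet_sharing": 100}  # requests/sec
cost_per_instance = {"recommendations": 3, "tweet_sharing": 1}          # relative cost


def instances_needed(component):
    """Smallest number of instances that covers the component's peak load."""
    return math.ceil(peak_load[component] / capacity_per_instance[component])


# Monolith: every copy carries every component, so the busiest
# component dictates how many full copies must run.
monolith_copies = max(instances_needed(c) for c in peak_load)
monolith_cost = monolith_copies * sum(cost_per_instance.values())

# Decomposed: each component is scaled on its own.
split_cost = sum(instances_needed(c) * cost_per_instance[c] for c in peak_load)

print(monolith_copies, monolith_cost, split_cost)
# 9 36 28
```

Under these assumed numbers the monolith pays for nine copies of the rarely used tweet feature, while the decomposed version runs it once.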


Z-scaling, or data sharding

Z-scaling involves splitting up responsibility for different parts of the data. You deploy multiple copies of your application, but each one becomes responsible for just a part of the data set. In other words, you partition your data and then assign each server to a particular partition.

For example, you could set up three different servers each of which is devoted to serving only certain customers based on their surname. When a request arrives at your application your router will determine which of the servers is intended for that user and send the request accordingly.
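The surname example could be sketched roughly as follows. The letter ranges, the server names, and the `shard_for` function are assumptions made for the example; a real router would also need a rule for surnames that do not start with a letter from A to Z.

```python
# Hypothetical sharding rules: (first letter, last letter, server name).
SHARD_RANGES = [
    ("A", "H", "shard-1"),
    ("I", "P", "shard-2"),
    ("Q", "Z", "shard-3"),
]


def shard_for(surname):
    """Return the server responsible for customers with this surname."""
    initial = surname.strip()[:1].upper()
    for first, last, server in SHARD_RANGES:
        if first <= initial <= last:
            return server
    # This sketch has no rule for empty names or initials outside A-Z.
    raise ValueError(f"no shard configured for surname {surname!r}")


for name in ["Evans", "martin", "Zhang"]:
    print(name, "->", shard_for(name))
# Evans -> shard-1
# martin -> shard-2
# Zhang -> shard-3
```

The same surname always maps to the same server, which is what lets each server hold only its own slice of the data.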

If you compare this to X-scaling you'll notice a similarity in that you have deployed multiple identical copies of your application. The difference is that now, instead of having a load balancer send traffic to any server, you consistently route traffic to a particular server based on your sharding rules.

There are several downsides to Z-scaling. It doesn't help with reducing your application complexity because a full version is still deployed on each server. In fact, it's probably going to make your application more complicated to understand because you have the additional problem of having your data in pieces.

The advantage of sharding is that it gives you another way to tune the application stack to match the load. It also has the effect of isolating failure to one part of your application: if a shard goes down then traffic that is destined for other shards will continue to flow. This improves the fault tolerance of your architecture.
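The failure isolation can be shown with a small simulation. This sketch assumes three shards and assigns customers by numeric ID modulo the shard count, which is a different (hypothetical) sharding rule from the surname example, chosen here only because it keeps the code short.

```python
SHARD_COUNT = 3  # assumed number of shards


def shard_for(customer_id):
    """Assign a customer to a shard by ID (an assumed rule for this sketch)."""
    return customer_id % SHARD_COUNT


def serve(customer_id, down_shards):
    """Simulate a request: it fails only if the customer's own shard is down."""
    shard = shard_for(customer_id)
    if shard in down_shards:
        return f"customer {customer_id}: shard {shard} unavailable"
    return f"customer {customer_id}: served by shard {shard}"


# Shard 1 is down; requests for customers on shards 0 and 2 still succeed.
results = [serve(cid, down_shards={1}) for cid in range(6)]
served = [r for r in results if "served" in r]
failed = [r for r in results if "unavailable" in r]
print(len(served), len(failed))
# 4 2
```

Only the customers who live on the failed shard are affected; everyone else carries on as normal.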

Bringing it together

The scale cube is useful for looking at different dimensions and opportunities for scaling. Horizontal scaling is the cheapest option initially but offers diminishing returns and eventually will no longer be an option. The other two dimensions are more expensive to perform but will be necessary for you to offer a truly scalable solution.