|Image: Pixabay "cubism"|
It's particularly useful to consider while you're modelling your domain with a view to decomposing a monolithic application. Eric Evans' book "Domain-Driven Design" describes the concept of bounded contexts, which maps very naturally onto the scale cube.
X-scaling, or horizontal scaling

The first dimension we're going to look at is horizontal scaling. This is where we make our cube longer along the X-axis.
When we scale horizontally we create multiple copies of the application and then use a load-balancer to split up the traffic between them.
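As a minimal sketch of that idea, here is a round-robin dispatcher in Python. The server names are made up for illustration; in practice a dedicated load balancer such as nginx or HAProxy would sit in front of the instances.

```python
from itertools import cycle

# Hypothetical pool of identical copies of the application (X-axis scaling).
servers = ["app-1:8080", "app-2:8080", "app-3:8080"]
next_server = cycle(servers)

def route() -> str:
    """Send each incoming request to the next server in round-robin order."""
    return next(next_server)

# Six requests are spread evenly across the three instances.
assignments = [route() for _ in range(6)]
```

Because every instance is identical, any of them can serve any request, which is what makes this form of scaling so easy to bolt on.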
This is a very easy way to scale your application, but it has some limitations. Firstly, adding more compute is costly, and at some point it won't make financial sense to add another server to your cluster. Secondly, this approach does nothing for the architecture of your application: as it grows bigger and bigger, its performance becomes harder and harder to manage.
Y-scaling

Y-scaling is the act of splitting your application into multiple components, each of which is responsible for a set of related functions. This can be called modular programming because it breaks your monolithic application into smaller modules which work together to offer the application functionality.
By having your application split into smaller components you'll be able to perform horizontal scaling at a more granular level. You'll find that not all parts of your application are equally busy. For example, a feature that lets users automatically tweet what music they're listening to might not be used nearly as much as a recommendation engine that suggests music to them.
Instead of having to create a new copy of your entire application you'll be able to create new instances of the individual components that are experiencing load. This improves the cost efficiency of horizontal scaling and extends the limit to which you can afford to scale up.
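One way to picture this is routing by component rather than by server. The sketch below assumes a hypothetical music application with two components; the path prefixes and replica counts are illustrative only, chosen to show that a busy component can run more instances than a quiet one.

```python
# Hypothetical component pools (Y-axis scaling): each component is deployed
# and scaled independently, with replica counts matched to its load.
component_pools = {
    "/recommendations": ["rec-1", "rec-2", "rec-3", "rec-4"],  # busy: 4 replicas
    "/tweet-now-playing": ["tweet-1"],                         # quiet: 1 replica
}

def pool_for(path: str) -> list[str]:
    """Find the pool of instances serving a given request path."""
    for prefix, pool in component_pools.items():
        if path.startswith(prefix):
            return pool
    raise KeyError(f"no component handles {path}")
```

Scaling the recommendation engine then means adding an entry to one pool, not redeploying the whole application.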
Z-scaling

Z-scaling involves splitting up responsibility for different parts of the data. You deploy multiple copies of your application, but each one becomes responsible for just a part of the data set. In other words you partition your data and then assign a server to a particular group of your data.
For example, you could set up three different servers each of which is devoted to serving only certain customers based on their surname. When a request arrives at your application your router will determine which of the servers is intended for that user and send the request accordingly.
If you compare this to X-scaling you'll notice a similarity in that you have deployed multiple identical copies of your application. The difference is that now instead of having a load balancer send traffic to any server you will be consistently routing traffic to a particular server based on your sharding rules.
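A minimal sketch of such a sharding rule, using the surname example above: the shard names and alphabetical boundaries are made up for illustration, but the routing logic is the point — the same surname always lands on the same server.

```python
import bisect

# Hypothetical alphabetical shards (Z-axis scaling): each server owns a
# contiguous surname range. A-H -> shard-1, I-P -> shard-2, Q-Z -> shard-3.
boundaries = ["I", "Q"]  # first letter belonging to the NEXT shard
shards = ["shard-1", "shard-2", "shard-3"]

def shard_for(surname: str) -> str:
    """Consistently route a customer's request to the shard owning their data."""
    initial = surname[0].upper()
    return shards[bisect.bisect_right(boundaries, initial)]
```

Unlike the round-robin load balancer of X-scaling, this router is deterministic: it must always pick the one server that holds the customer's slice of the data.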
There are several downsides to Z-scaling. It doesn't help with reducing your application's complexity, because a full copy is still deployed on each server. In fact, it's likely to make your application harder to understand, because you now have the additional problem of your data being split into pieces.
The advantage of sharding is that it gives you another way to tune the application stack to match the load. It also isolates failure to one part of your application: if a shard goes down, traffic destined for the other shards continues to flow. This improves the fault tolerance of your architecture.