Storage Scalability

We try to explain some of the terminologies in simple words. Lookup wiki for a more formal definition.

Replication : Replication refers to frequently copying the data across multiple machines. Post replication, multiple copies of the data exists across machines. This might help in case one or more of the machines die due to some failure.
Consistency: Assuming you have a storage system which has more than one machine, consistency implies that the data is same across the cluster, so you can read or write to/from any node and get the same data.
- Eventual consistency : Exactly what the name suggests. In a cluster, if multiple machines store the same data, an eventual consistent model implies that all machines will have the same data eventually. Its possible that at a given instance, those machines have different versions of the same data ( temporarily inconsistent ) but they will eventually reach a state where they have the same data.
Availability: In the context of a database cluster, Availability refers to the ability to always respond to queries ( read or write ) irrespective of nodes going down.

Vertical scaling and Horizontal scaling :In simple terms, to scale horizontally is adding more servers. To scale vertically is to increase the resources of the server ( RAM, CPU, storage, etc. ).
Example: Lets say you own a restaurant which is now exceeding its seating capacity. One way of accomodating more people ( scaling ) would be to add more and more chairs (scaling vertically). However since the space is limited, you won’t be able to add more chairs once the space is full.
Another way of scaling would be to open new branches of the restaurant ( horizontal scaling ).

Vertical Scaling

Vertical scaling, or improving the capabilities of a node/server, gives greater capacity to the node but does not decrease the overall load on existing members of the cluster. That is, the ability for the improved node to handle existing load is increased, but the load itself is unchanged. Reasons to scale vertically include increasing IOPS, increasing CPU/RAM capacity, and increasing disk capacity.

Horizontal Scaling

Horizontal scaling, or increasing the number of nodes in the cluster, reduces the responsibilities of each member node by spreading the keyspace wider and providing additional endpoints for client connections. That is, the capacity of each individual node does not change, but its load is decreased. Reasons to scale horizontally include increasing I/O concurrency, reducing the load on existing nodes, and increasing disk capacity.

Sharding : With most huge systems, data does not fit on a single machine. In such cases, sharding refers to splitting the very large database into smaller, faster and more manageable parts called data shards.

Search This Blog

design patterns