Tuesday, October 8, 2019

Cassandra : What to do when you run out of disk space?

Hi folks

First things first: running out of disk space on a Cassandra cluster is not something you really want to experience, trust me...
Caused by: No space left on device
I guess that if you're reading this post, this advice comes too late anyway.

Why does Cassandra need (so much) free space?

Under the hood, Cassandra relies on internal processes that need temporary disk space (up to the size of the data it's already storing...), such as:
  • Running compactions: as SSTables are immutable, compactions reorganize data by writing new SSTables, which temporarily requires extra space on disk.
  • Keeping snapshots: a snapshot is a copy of the SSTables at a given point in time. Cassandra uses hard links to create snapshots, so taking one does not immediately increase disk usage. Over time, however, as the original SSTables get compacted away, the snapshot files remain and disk usage grows (snapshot files are never deleted automatically).
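To see why a fresh snapshot is almost free while an old one is not, here is a small, self-contained sketch of how hard links behave (plain shell, no Cassandra required; the file names are made up for illustration):

```shell
# Sketch: why `nodetool snapshot` costs ~nothing at first, but old
# snapshots pin disk space. Plain shell, no Cassandra required.
set -e
d=$(mktemp -d)
dd if=/dev/zero of="$d/data.db" bs=1024 count=1024 2>/dev/null  # fake 1 MiB SSTable
mkdir "$d/snapshot"
ln "$d/data.db" "$d/snapshot/data.db"      # hard link, like a Cassandra snapshot
before=$(du -sk "$d" | cut -f1)            # ~1 MiB total: both names share the same blocks
rm "$d/data.db"                            # "compaction" removes the original SSTable...
after=$(du -sk "$d" | cut -f1)             # ...but the snapshot link still pins ~1 MiB
echo "with snapshot: ${before} KiB, after original deleted: ${after} KiB"
rm -r "$d"
```

The snapshot only starts to "cost" disk space once compaction has replaced the original SSTables, which is exactly why long-lived snapshots silently fill the disk.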

What can you do?

The first question is: is it only one node, or your entire cluster, that is running out of disk space?

If it's only one node, you can follow the proposals below, but sometimes it's easier to trash the node and replace it with a fresh new one... Thanks to the way Cassandra distributes data across a cluster, you should not end up with one node holding (a lot) more data than the others.

If it's your entire cluster that is running out of disk space... Well... That's where the fun begins... I can't promise that you will not lose data...
Most of the actions you can run on a node to reclaim space will first increase disk usage a bit...

Quick wins

  • Stop writing data into the cluster
It sounds a bit weird, but yes: first of all, let's stop the bleeding...
  • Clear Snapshots 
Run on each node:
nodetool cfstats
nodetool listsnapshots
These commands will show you whether you have any snapshots. If so, they are good candidates for reclaiming space; delete them with:
nodetool clearsnapshot
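If you have accumulated many snapshot tags, it can help to turn the listsnapshots output into the matching clearsnapshot commands. The hypothetical helper below works on a hard-coded sample whose column layout is an assumption based on Cassandra 3.x output; check it against your version before piping live output through it:

```shell
# Hypothetical helper: derive `nodetool clearsnapshot -t <tag>` commands
# from listsnapshots-style output. The sample is hard-coded; the column
# layout is an assumption (Cassandra 3.x) and may differ on your version.
sample='Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk
1570500000000 ks1           events             1.2 GiB   1.2 GiB
1570500000000 ks1           users              300 MiB   300 MiB
pre_upgrade   ks2           logs               2.1 GiB   2.1 GiB'

# Skip the two header lines, keep unique snapshot tags, build the commands.
cmds=$(printf '%s\n' "$sample" | awk 'NR>2 {print $1}' | sort -u \
       | sed 's/^/nodetool clearsnapshot -t /')
printf '%s\n' "$cmds"   # review the list, then run each line on the node
```

Printing the commands instead of executing them gives you a dry run: you can drop the tags you still need before clearing the rest.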
  • Increase the disk size of the nodes 
If you can temporarily increase the disk size of your nodes, it's worth doing so to get back to a less critical state first: a state where you can think about adding nodes and so on. If you're running your cluster in the cloud, most cloud providers offer ways to extend disks. In most cases you'll have to stop and restart the VMs hosting your nodes.
  • Remove data (if you can...)
It can be a bit extreme as well, but depending on your context and your use case it may be possible...

You can drop or truncate tables. This solution is quite efficient because no tombstones are written: Cassandra just creates a snapshot of the table when you run the command, and the disk space is released when you clear that snapshot.

Not so quick win

  • Add nodes
This is the usual procedure... If your cluster needs more space, add more nodes...
Adding nodes changes data ownership between nodes, and Cassandra does not automatically release data that has been moved to other nodes. Do not forget to run a cleanup on each node:
nodetool cleanup
If you're still very limited in terms of available disk space, be careful: running a cleanup temporarily increases disk usage. You can limit this increase by running the cleanup table by table:
nodetool cleanup yourkeyspace yourtable
  • Remove data (if you can...)
The other way to delete data is by inserting tombstones in your cluster. To avoid waiting for the gc_grace_seconds period before the tombstones are evicted (by default it's 10 days), you can lower the value with a CQL ALTER TABLE statement. Before doing that, check that all nodes are up:
ALTER TABLE yourkeyspace.yourtable WITH gc_grace_seconds = 3600;
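As a quick sanity check on the numbers above: the default of 10 days is 864,000 seconds, so lowering it to 3600 means tombstones become evictable after one hour. Remember that gc_grace_seconds also bounds how long a down node can miss deletes without resurrecting data when it comes back, hence the check that all nodes are up first.

```shell
# Default gc_grace_seconds expressed in seconds: 10 days.
default=$((10 * 24 * 3600))
echo "default gc_grace_seconds = $default"            # 864000
echo "lowered value: 3600 seconds = $((3600 / 3600)) hour"
```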

Best advice

If you survived the issue, I'm pretty sure you don't want to face it a second time. I would warmly recommend monitoring the disk usage of the nodes in your cluster.

If you're using SizeTieredCompactionStrategy (the worst strategy in terms of required free space), a good practice is to keep disk usage below 50 to 60% and to add nodes before reaching the 70% threshold.
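A minimal cron-able check along these lines can give you an early warning. DATA_DIR and the 70% threshold below are assumptions to adapt to your installation; a real setup would rather feed these numbers into a proper metrics system.

```shell
# Minimal disk-usage check. DATA_DIR and THRESHOLD are assumptions:
# adjust them to your installation (the data dir is often /var/lib/cassandra).
DATA_DIR=${DATA_DIR:-/var/lib/cassandra}
THRESHOLD=${THRESHOLD:-70}

# Fall back to / so the script also runs on machines without Cassandra.
[ -d "$DATA_DIR" ] || DATA_DIR=/

# df -P prints one portable line per filesystem; column 5 is "Use%".
usage=$(df -P "$DATA_DIR" | awk 'NR==2 {gsub("%", ""); print $5}')
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: $DATA_DIR is ${usage}% full - plan to add nodes now"
else
  echo "OK: $DATA_DIR is ${usage}% full"
fi
```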

There's currently an open ticket to tackle this type of problem: