Monday, September 30, 2019

Cassandra : Compaction, compaction and compaction

Hi Folks,

The word "compaction" is heavily used in the Cassandra world. You can see it everywhere while reading documentation, blog posts, mailing lists and so on.

Sometimes the use of compaction in combination with another word can be a bit misleading. Let's take a step back and sum up the different concepts where the word compaction is used:

Machine makes cars compact, Fig. 1


Let's call this one the default meaning when you talk about Cassandra compaction. Cassandra has a very efficient write path. The concept is like "let's put the data as fast as possible into a file and in memory" (the commit log and the memtable). In a second step, the data is flushed to disk as is. OK, but then, to enable Cassandra to also have an efficient read path, especially when you have to read a file from the disk, you need to tidy up and rearrange all the files. This process is called compaction. A bit more technically: SSTables are immutable, and compaction is the action of generating new SSTables by merging the old ones and purging what is obsolete (duplicates, data with an expired TTL and tombstones).
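
To make the merge-and-purge idea concrete, here is a minimal Python sketch. It is only a toy model, not Cassandra's implementation: the `Cell` type, the `compact` function and the dictionary-based "SSTable" are hypothetical names made up for illustration, and real Cassandra only purges tombstones once gc_grace_seconds has elapsed.

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    value: Optional[str]  # None marks a tombstone (deleted data)
    timestamp: int        # write time; the highest timestamp wins


# Toy model (assumption): one "SSTable" is a mapping key -> Cell.
SSTable = Dict[str, Cell]


def compact(sstables: List[SSTable], purge_tombstones: bool = True) -> SSTable:
    """Merge several SSTables into a new one, keeping the newest cell per key."""
    merged: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            current = merged.get(key)
            if current is None or cell.timestamp > current.timestamp:
                merged[key] = cell
    if purge_tombstones:
        # Simplification: drop every tombstone, ignoring any grace period.
        merged = {k: c for k, c in merged.items() if c.value is not None}
    # Return a new table sorted by key; the inputs are left untouched.
    return dict(sorted(merged.items()))


old = {"a": Cell("1", 10), "b": Cell("2", 10)}
new = {"a": Cell("3", 20), "b": Cell(None, 30), "c": Cell("4", 15)}
result = compact([old, new])
```

In this example, key "a" keeps its newest value, key "b" disappears because its newest cell is a tombstone, and key "c" is carried over unchanged.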

Minor [Compaction]

We are still talking about the default compaction. The minor compaction is not really something minor... It is the compaction handled automatically by Cassandra as a background process, according to the chosen compaction strategy (see below).

Major [Compaction]

We are still talking about the default compaction mechanism in Cassandra. Contrary to the minor compaction, the major compaction is triggered by a manual action on a node (using nodetool compact). The major compaction can behave differently depending on the compaction strategy (see below). The other main difference between major and minor compaction, and the one that explains the naming, is that a major compaction has a bigger impact in terms of I/O.

[Compaction] Strategy

As already said, compaction merges SSTables into new ones. Depending on your use case, Cassandra proposes different strategies to compact the data into new SSTables. A default strategy is applied if you do not specify one, but you can choose the one you want per table. There are 3 main compaction strategies.
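
As an illustration of the per-table choice, the strategy is set through the table's compaction option in CQL. The small Python helper below only builds such a statement as a string; the helper name `alter_compaction_cql` and the `shop.orders` keyspace and table are hypothetical, chosen for the example.

```python
def alter_compaction_cql(keyspace: str, table: str, strategy: str) -> str:
    """Build a CQL statement that sets the compaction strategy of one table."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


# Hypothetical keyspace and table names, for illustration only.
stmt = alter_compaction_cql("shop", "orders", "LeveledCompactionStrategy")
print(stmt)
```

The same kind of statement works with SizeTieredCompactionStrategy or TimeWindowCompactionStrategy as the class name.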

Size Tiered [Compaction] Strategy

This is the default compaction strategy. It fits write-heavy and general workloads. More documentation here:
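
To give a feeling of what "size tiered" means, here is a rough Python sketch of grouping SSTables of similar size into buckets. It is a simplification under stated assumptions, not Cassandra's code: the function name `size_tiered_buckets` is made up, and the parameters only mirror the spirit of the bucket_low, bucket_high and min_threshold options.

```python
from typing import List


def size_tiered_buckets(
    sizes: List[int],
    bucket_low: float = 0.5,
    bucket_high: float = 1.5,
    min_threshold: int = 4,
) -> List[List[int]]:
    """Group SSTable sizes into buckets of similar size.

    A size joins the first bucket whose average is close enough to it;
    only buckets with at least min_threshold members are compaction candidates.
    """
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is similar enough: start a new one.
            buckets.append([size])
    return [b for b in buckets if len(b) >= min_threshold]


# Sizes in MB (made-up numbers).
candidates = size_tiered_buckets([10, 11, 9, 12, 100, 110, 500])
```

With these made-up sizes, only the four small SSTables form a bucket big enough to be compacted together; the larger ones wait until enough SSTables of similar size exist.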

Leveled [Compaction] Strategy

This compaction strategy fits read-heavy workloads well. It involves a bit more I/O than the Size Tiered Compaction Strategy, so it can be a good idea to combine it with SSDs. More documentation here:

Time Window [Compaction] Strategy

This compaction strategy fits perfectly for time series. Basically, the data is compacted according to time: SSTables are grouped per time window. More documentation here:

A good blog post which deep dives into it:
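
As a rough illustration of the time window idea, the Python sketch below groups SSTables by the window their newest data falls into. It is a toy model under assumptions: the function name `time_window_buckets` and the (name, max timestamp) tuples are hypothetical, and the real strategy has more rules than this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def time_window_buckets(
    sstables: List[Tuple[str, int]], window_seconds: int
) -> Dict[int, List[str]]:
    """Group SSTables by the time window of their maximum timestamp.

    Each SSTable is a (name, max_timestamp_in_seconds) pair; the dictionary
    key is the start of the window the SSTable belongs to.
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    for name, max_ts in sstables:
        window_start = (max_ts // window_seconds) * window_seconds
        buckets[window_start].append(name)
    return dict(buckets)


# Made-up SSTables with a one-hour window.
windows = time_window_buckets(
    [("a", 10), ("b", 3599), ("c", 3600), ("d", 7300)], window_seconds=3600
)
```

Here "a" and "b" land in the same window and can be compacted together, while "c" and "d" each stay in their own window.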

Validation [Compaction]

This one is a false friend from my point of view. It's considered a compaction but is not really about compaction. The validation compaction is the process of building a Merkle tree on nodes during a repair. It's called validation compaction anyway because this action is controlled by the  


The anticompaction occurs during incremental repair. The goal is to split the repaired data from the unrepaired data into two separate SSTables. The 2 sets of data can no longer be compacted together, and that's why it's called anticompaction.
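
The split can be pictured with a small Python sketch. It is a toy model only: the function name `anticompact`, the token-to-row dictionary and the token values are hypothetical, and ranges are treated as (start, end], which is how Cassandra token ranges are usually described.

```python
from typing import Dict, List, Tuple


def anticompact(
    sstable: Dict[int, str], repaired_ranges: List[Tuple[int, int]]
) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Split one SSTable into a repaired part and an unrepaired part.

    A row is repaired when its token falls in one of the repaired ranges,
    each range being start-exclusive and end-inclusive.
    """
    repaired: Dict[int, str] = {}
    unrepaired: Dict[int, str] = {}
    for token, row in sstable.items():
        in_range = any(start < token <= end for start, end in repaired_ranges)
        (repaired if in_range else unrepaired)[token] = row
    return repaired, unrepaired


# Made-up tokens and rows.
repaired, unrepaired = anticompact(
    {5: "row-a", 15: "row-b", 25: "row-c"}, [(0, 10), (20, 30)]
)
```

One input SSTable becomes two outputs, which is the opposite direction of a normal compaction.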


Hopefully you now have a better view of what compaction means depending on the context in which it's used in the Cassandra world.

Monday, September 23, 2019

Remote working in IT - 7 years later

Hi Everyone,

This post is a follow-up to 2 previous articles related to home working: 

Why again a post about that?

Because it's still a hot topic, folks, and I still get a lot of questions about it. The questions are now a bit different because most of the people who know me also know that I've been working from home for a long time.

In addition to that, you may have noticed that there's a small change in the post titles between the previous posts and this one. My way of working remotely has evolved.

So let's try to give some answers and explain the changes since the previous post.

Home working vs Remote working?

I totally agree with what Hubert said: 
For me, "remote working" is the concept of working remotely from the company you're working for, where a part of your coworkers are not working in the same office. I would describe "home working" as one of the possible types of remote working, where you work remotely from your home.

Over these 7 years of remote working, I had the chance to experiment with a bunch of different possibilities.

1. Pure home working

I already described this in one of the previous posts about the topic. Initially, I started to work remotely from my home and recreated an office in my house. (here)

2. Colleague home working

A good friend of mine, also a freelancer (Thomas, to name him), was also working for the same customer in the same context. Our houses are close to each other: it's less than a 5-minute car commute to his house. What happened is that, over time, we were working together more and more, at his house or at mine. Most of the time it was not decided upfront but went more like this: you start to work from home, you have your first coffee, and then a message pops up on your screen:

"Hello mate, your house or mine today?"

The difference with isolated home working is that, to be able to work in the same room, we were working in the living room. The good thing is that we were still very flexible and independent, which means that to work together, we needed to: 
  • be willing to work together. Just a "no, not possible today" was enough, without the need for any justification.
  • have one of the houses free for work, which means without the family around the house.

We ended up with a few cool setups: 

The drawback of this way of working is the lack of isolation between work and family. Work hours are unfortunately longer than school hours, so each afternoon we ended up with the kids from one or the other family in the living room. That's totally OK when it happens from time to time.

In contrast, when it becomes the default situation and you do it for more than one year, it's different: I started to feel embarrassed when my colleague's family was back home (not because of them!). It usually ended with me relocating back home to finish the work day.

3. Coworking space working

To solve some of the drawbacks of the "intensive" colleague home working, I started to search for alternatives. I looked for a coworking space where I could go one or two days per week in addition to my usual home working. And you know what? I found out that a new coworking space called La maison du coworking (literally "the house of coworking") had just opened a few months earlier, 10 minutes away from home. They were offering everything from shared desks in an open space to isolated offices, on a monthly basis. I contacted them, but they were not yet offering a one-day shared desk. They told me: no problem, let's create it!

That's how my colleague and I ended up moving away from the colleague home working to the coworking space working!


The really cool thing is that the open space was really quiet and everyone was working on different things and for different companies. So you do not fall back into the potential issues of open spaces, like being disturbed every minute by people coming to see you. The other benefit is that you can access the common facilities of the coworking space: fiber internet connection, shared printers, kitchen, afterwork events, meeting rooms...

For example, here is a meetup we organized in one of the meeting rooms of "La maison du coworking" for a Cassandra LAN party:

It would have been hard to do it in my living room ;-)

4. Office remote working

While working in the open space of the coworking place, I had the opportunity to get a private 4-desk office in the building next to the open space building, for a very reasonable price that my customer was willing to pay... There was no reason to say no...

This office was proposed "as a service" by the coworking space, which includes all the facilities of the open space plus cleaning once a week.

The really cool thing was to get access to : 
  • a private place but with all the coworking facilities
  • close to my home with a less than 10 minutes commute by car, motorcycle, bike, ...

Compared to the open space, we had access to the same level of facilities but with a private whiteboard. We could also use the office as a meeting room when we needed to host the local sprint planning of the team. 

You can also prepare the talk you'll give at Devoxx France and put all the mess you want in the office because it's yours!

5. Which one to choose?

As a good consultant answer, I could say: "The one that fits you best."

For me the key thing is that I still have a choice: every day I can adapt to the setup that best fits my mood of the day.

Conclusion : The commute switch

As a conclusion, I would say that having an office really close to my house is really a bonus because it enforces a short commute, which enables the "brain switch": when I go to the office in the morning, the 10-minute commute allows me to prepare my brain for the work switch. The same goes for when I'm going back home in the evening: my brain processes the day and unplugs from the work data. This is really important, and you don't get it when you work from home, where the switch is instantaneous.

I can still remember my wife saying: "hey, you're still at work?" even though I was in the living room with the kids, because my body did the work/family switch but not my brain. If you work from home, I would advise you to take a bit of time to simulate this commute before you switch into family mode. You can do this by reading the news, surfing the internet to find your next bike (because everyone is always looking for the next bike ;-))...

Enjoy! Your feedback is warmly welcomed in the comment section!