[Part-1] Achieving 70% lossless compression for Redis Writes

Omkar Kulkarni
4 min read · Aug 12, 2022

Using a combination of Kryo serialization, Lettuce's DEFLATE compression, and a restructured data model can help developers achieve amazing memory gains!

This blog sat in my draft list for the last 2 years. However, one of my recent use cases compelled me to get back to thinking about the most optimal way of compressing data before storing it in Redis. Before writing further, I need to give due credit to a blog post from DoorDash that triggered the idea of improving compression: https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/

Dataset of Embeddings

Before getting into the actual exercise, let us define the single row that we will work on. For clarity, this row comes from an m×n matrix of embeddings, where rows are of type A and columns are of type B. In classic examples like matrix factorization used for recommendations, you can think of A as users and B as items, with the value in each cell being a double. Our goal is to store this m×n dataset in Redis.

For the sake of brevity, however, we will consider just one row and work on it.

The following gist contains a sample row that we will use for this exercise:
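The original gist is not reproduced here, but a hypothetical row with a `data` field of `_1`/`_2` tuple entries (as described in the steps below) might look like this; every field name other than `data` and `_1` is an assumption:

```json
{
  "id": "user-42",
  "data": [
    { "_1": "item-1", "_2": 0.1234 },
    { "_1": "item-2", "_2": 0.5678 },
    { "_1": "item-3", "_2": 0.9012 }
  ]
}
```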

Here’s a list of tasks we will perform on this JSON object before storing it in Redis:

  1. Try changing the data structure of the data field.
  2. Try serializing the data field in a couple of different ways.
  3. Use a custom codec if #2 is possible.

We will use Java/Scala for this exercise. It is not directly reproducible in Python at the moment, because the stress tests were conducted on the JVM, with memory profiling done using a JVM-compatible serializer. If you are using Python, you may want to refer to this excellent article: https://itecnote.com/tecnote/python-which-is-the-best-way-to-compress-json-to-store-in-a-memory-based-store-like-redis-or-memcache/

Step 1: Changing the data structure of data field

In this case, instead of storing the entire dictionary as is, I decided to use an auxiliary data structure called embeddingMap, where I extracted the _1 field into an index map.

This step helped split the big JSON blob into two auxiliary data structures:

  1. An embeddingMap to store the _1 field with its index in a Map[String, Integer]
  2. A List[Double] whose order matches the indices stored in the embeddingMap

Creating these two datasets by itself reduced the memory footprint by more than 50%. The embeddingMap can be stored in DynamoDB with DAX enabled, to be fetched at runtime to map the embeddings back as required (we won’t get into this part for now). We will focus on how efficiently we can store the List[Double] in Redis in the subsequent sections.
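As an illustration, the split described above might look like this sketch in Java; the class name and row representation are assumptions, only embeddingMap and _1 come from the post:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmbeddingSplit {
    // Index map extracted from the _1 field: id -> position in the value list.
    public final Map<String, Integer> embeddingMap = new LinkedHashMap<>();
    // Values kept in an order that matches the indices in embeddingMap.
    public final List<Double> values = new ArrayList<>();

    // Splits a row of (_1 -> value) pairs into the two auxiliary structures.
    public static EmbeddingSplit from(List<Map.Entry<String, Double>> row) {
        EmbeddingSplit split = new EmbeddingSplit();
        for (Map.Entry<String, Double> cell : row) {
            split.embeddingMap.put(cell.getKey(), split.values.size());
            split.values.add(cell.getValue());
        }
        return split;
    }
}
```

Only the compact value list then needs to go to Redis; the index map can live elsewhere.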

Step 2: Try to find the best serializer

Redis stores everything as byte[]. Therefore, we need to find the most optimal serializer to help us store data in Redis as bytes. We will focus on writing this List[Float] to Redis (in reality, the length of this embedding list can be on the scale of 1000s):

Without compression:

// Using Kryo
> memory usage a-test-key-a
(integer) 16952
// Using Jackson
> memory usage a-test-key-b
-> Redirected to slot [8404] located at
(integer) 65592
// Using plain GZIP
> memory usage a-test-key-c
-> Redirected to slot [12533] located at
(integer) 15928

With compression enabled (using CompressionCodec.valueCompressor(codec, CompressionCodec.CompressionType.DEFLATE)):

// Using Kryo
> memory usage a-test-key-a
-> Redirected to slot [4279] located at
(integer) 11320
// Using Jackson
> memory usage a-test-key-b
-> Redirected to slot [8404] located at
(integer) 15928
// Using plain GZIP
> memory usage a-test-key-c
-> Redirected to slot [12533] located at
(integer) 15928

As seen in the above snippets, Kryo stood out as the winner, with roughly 30% less compressed memory. I was quite fascinated by this result and decided to write my own implementation of a RedisCodec<T, U>.
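For reference, the Kryo serialization behind these measurements could be sketched roughly as follows; the post's actual benchmark code isn't shown, so the class name and settings here are assumptions:

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class KryoRoundTrip {
    public static byte[] serialize(List<Float> values) {
        Kryo kryo = new Kryo();
        kryo.setRegistrationRequired(false); // accept unregistered classes for this sketch
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Output output = new Output(bos)) {
            // Wrap in ArrayList, which Kryo serializes out of the box.
            kryo.writeClassAndObject(output, new ArrayList<>(values));
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public static List<Float> deserialize(byte[] bytes) {
        Kryo kryo = new Kryo();
        kryo.setRegistrationRequired(false);
        try (Input input = new Input(bytes)) {
            return (List<Float>) kryo.readClassAndObject(input);
        }
    }
}
```

For the "with compression" numbers, Lettuce can wrap any value codec via CompressionCodec.valueCompressor(codec, CompressionCodec.CompressionType.DEFLATE), so no hand-rolled DEFLATE code is needed.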

Step 3: Building a custom codec for creating a Redis connection to AWS ElastiCache

After investigating Kryo in more depth, I noticed that its developers didn’t make it inherently thread-safe. What that means is that if a Kryo instance being used for serialization is shared with another thread, byte overflows/leaks can produce garbage bytes in the results. A deeper discussion of Kryo’s thread-safety can be found here: https://github.com/EsotericSoftware/kryo/issues/188

In my case, I decided to use an ObjectPool instead. The rest of this article will focus on building a custom RedisCodec<T, U> that uses the Kryo serializer for ingesting bytes into Redis.

Before starting to emulate the code, add this dependency to your build.gradle or pom.xml:
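The original dependency snippet isn't embedded here; a Gradle declaration might look like this (coordinates are real, but the versions are illustrative — pin whatever matches your stack):

```groovy
dependencies {
    // Kryo 5 ships a built-in, thread-safe-capable Pool we use below.
    implementation 'com.esotericsoftware:kryo:5.3.0'
    // Lettuce 6.0.x provides RedisCodec and CompressionCodec.
    implementation 'io.lettuce:lettuce-core:6.0.9.RELEASE'
}
```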


Step 3.1: A utility for creating a Kryo pool:
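The gist isn't reproduced here; with Kryo 5's built-in com.esotericsoftware.kryo.util.Pool, such a utility might look like this sketch (the factory name and pool settings are assumptions):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.util.Pool;

public final class KryoPoolFactory {
    private KryoPoolFactory() {}

    // threadSafe=true synchronizes obtain/free; softReferences=true lets
    // idle Kryo instances be garbage-collected under memory pressure.
    public static Pool<Kryo> create(int maximumCapacity) {
        return new Pool<Kryo>(true, true, maximumCapacity) {
            @Override
            protected Kryo create() {
                Kryo kryo = new Kryo();
                // Simplifies the sketch; register classes explicitly in production.
                kryo.setRegistrationRequired(false);
                return kryo;
            }
        };
    }
}
```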

Step 3.2: A KryoProvider

Step 3.3: Create an implementation of KryoProvider:
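The gists for steps 3.2 and 3.3 aren't embedded here; a hypothetical KryoProvider contract plus a pooled implementation might look like this (the interface and class names are assumptions):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.esotericsoftware.kryo.util.Pool;
import java.io.ByteArrayOutputStream;

// Hypothetical provider contract (step 3.2): objects to bytes and back.
interface KryoProvider {
    byte[] toBytes(Object value);
    <T> T fromBytes(byte[] bytes, Class<T> type);
}

// Hypothetical implementation (step 3.3): borrows a Kryo instance from the
// pool per call, so no single instance is ever shared between threads.
public class PooledKryoProvider implements KryoProvider {
    private final Pool<Kryo> pool;

    public PooledKryoProvider(Pool<Kryo> pool) {
        this.pool = pool;
    }

    @Override
    public byte[] toBytes(Object value) {
        Kryo kryo = pool.obtain(); // borrow; never cache the instance in a field
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (Output output = new Output(bos)) {
                kryo.writeClassAndObject(output, value);
            }
            return bos.toByteArray();
        } finally {
            pool.free(kryo); // always return it, even on failure
        }
    }

    @Override
    public <T> T fromBytes(byte[] bytes, Class<T> type) {
        Kryo kryo = pool.obtain();
        try (Input input = new Input(bytes)) {
            return type.cast(kryo.readClassAndObject(input));
        } finally {
            pool.free(kryo);
        }
    }
}
```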

Step 3.4: Creating a custom Serde<T, U>:

Now, using the Kryo provider created in steps 3.2 and 3.3, we have an AbstractSerde<T, U> that can be used directly for implementing concrete Serdes:

Now, using this AbstractSerde<T, U>, we can create a concrete implementation for our List<Float>:
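The gists aren't reproduced here; a self-contained sketch of such an AbstractSerde and its List<Float> specialization might look like this (the class names mirror the post, but the internals are assumptions):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.esotericsoftware.kryo.util.Pool;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Base Serde that borrows a pooled Kryo instance per call for thread safety.
abstract class AbstractSerde<T> {
    private static final Pool<Kryo> POOL = new Pool<Kryo>(true, true, 8) {
        @Override
        protected Kryo create() {
            Kryo kryo = new Kryo();
            kryo.setRegistrationRequired(false); // register classes in production
            return kryo;
        }
    };

    public byte[] serialize(T value) {
        Kryo kryo = POOL.obtain();
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (Output out = new Output(bos)) {
                kryo.writeClassAndObject(out, value);
            }
            return bos.toByteArray();
        } finally {
            POOL.free(kryo);
        }
    }

    @SuppressWarnings("unchecked")
    public T deserialize(byte[] bytes) {
        Kryo kryo = POOL.obtain();
        try (Input in = new Input(bytes)) {
            return (T) kryo.readClassAndObject(in);
        } finally {
            POOL.free(kryo);
        }
    }
}

// Concrete Serde for the embedding values we want to store in Redis.
public class FloatSerde extends AbstractSerde<List<Float>> {
    // Normalize to ArrayList, which Kryo serializes out of the box.
    public byte[] serializeList(List<Float> values) {
        return serialize(new ArrayList<>(values));
    }
}
```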

Step 3.5: Use your FloatSerde in lettuce’s RedisCodec<String, List<Float>>:
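The gist isn't embedded here; a hypothetical FloatWithKryoRedisCodec implementing Lettuce's RedisCodec<String, List<Float>> might look like this sketch (pool settings and internals are assumptions):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.esotericsoftware.kryo.util.Pool;
import io.lettuce.core.codec.RedisCodec;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FloatWithKryoRedisCodec implements RedisCodec<String, List<Float>> {
    private final Pool<Kryo> pool = new Pool<Kryo>(true, true, 8) {
        @Override
        protected Kryo create() {
            Kryo kryo = new Kryo();
            kryo.setRegistrationRequired(false); // register classes in production
            return kryo;
        }
    };

    @Override
    public String decodeKey(ByteBuffer bytes) {
        return StandardCharsets.UTF_8.decode(bytes).toString();
    }

    @Override
    @SuppressWarnings("unchecked")
    public List<Float> decodeValue(ByteBuffer bytes) {
        byte[] raw = new byte[bytes.remaining()];
        bytes.get(raw);
        Kryo kryo = pool.obtain();
        try (Input input = new Input(raw)) {
            return (List<Float>) kryo.readClassAndObject(input);
        } finally {
            pool.free(kryo);
        }
    }

    @Override
    public ByteBuffer encodeKey(String key) {
        return StandardCharsets.UTF_8.encode(key);
    }

    @Override
    public ByteBuffer encodeValue(List<Float> value) {
        Kryo kryo = pool.obtain();
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (Output output = new Output(bos)) {
                // ArrayList serializes cleanly with Kryo's default serializers.
                kryo.writeClassAndObject(output, new ArrayList<>(value));
            }
            return ByteBuffer.wrap(bos.toByteArray());
        } finally {
            pool.free(kryo);
        }
    }
}
```

To get the DEFLATE-compressed behavior measured earlier, you could then connect with something like RedisClient.create("redis://localhost").connect(CompressionCodec.valueCompressor(new FloatWithKryoRedisCodec(), CompressionCodec.CompressionType.DEFLATE)).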

Whew! Well, that was a lot of code. Now you can use the FloatWithKryoRedisCodec to initialize a connection using the lettuce:6.0* library.

How to use this codec is clearly documented on lettuce’s official GitHub wiki: https://github.com/lettuce-io/lettuce-core/wiki/Codecs

That’s all for today. In Part 2 of this blog post, I will focus on reviewing and adding the results of stress and load tests to confirm that the extra serialization effort by Kryo has no bearing on read latency. (I have tested this, but haven’t had time to really write it down yet. Stay tuned for just a week!)

What do you think? Can you try implementing this? Would love to hear from you about the usage!