Tiered Caching System

I remember reading years ago about a feature in Membase that dynamically moved data between caches of different latencies, keeping the most frequently accessed items in the lowest-latency storage and the less frequently accessed items in higher-latency storage that is cheaper to operate.

Motivation

While working on a feature for the storyteller accelerator, I wanted similar functionality: a tiered caching system whose first tier would be memcached, whose second tier would be Amazon's S3, and whose final tier would be the content origin.

Implementation Details

Most of the accelerator is already written in Google Go, and Go's channels seemed like a natural fit for building this functionality.

Internally, we defined an interface for a storage tier and then built a system that manages the individual tiers and makes them appear as one large tier. Currently we have an S3 tier built on the goamz package and a memcached tier built on the gomemcache package.

Example Code

Here is the Get function from our storage system, which concurrently searches all tiers for a key:

This little bit of code lets us do something pretty amazing: we can search every tier in the cache and return the first entry that does not result in an error. If a lower tier contains the data but the “higher” tiers do not, we also update those higher tiers, which lets hot items bubble up from one tier to the next.

Two things to note about this snippet.

  1. This code looks synchronous to the caller, but internally it queries the tiers concurrently.
  2. We take advantage of passing parameters to the closure:

     go func(index int, tier Interface) { ... }(index, tier)


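This lets us hand each goroutine its own tier. If you instead use the index and tier loop variables directly inside the closure, the goroutine does not capture their values at the moment go func() is invoked; it reads the variables when it actually runs, by which point the loop may have moved on (at least before Go 1.22, which gives each loop iteration its own variables). Not the functionality you want :-)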
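The parameter-passing pattern can be shown in isolation with a small example. The `fanOut` helper and the use of plain strings instead of real tiers are hypothetical, chosen only to make the effect easy to check: each goroutine receives copies of its own iteration's `index` and `name`.

```go
package main

import "sync"

// fanOut starts one goroutine per tier name. index and name are passed as
// arguments, so each goroutine gets copies of that iteration's values
// rather than reading shared loop variables when it eventually runs.
func fanOut(names []string) []string {
	out := make([]string, len(names))
	var wg sync.WaitGroup
	for index, name := range names {
		wg.Add(1)
		go func(index int, name string) {
			defer wg.Done()
			// Each goroutine writes a distinct element, so no lock is needed.
			out[index] = "looked up " + name
		}(index, name)
	}
	wg.Wait()
	return out
}
```

Passing the values explicitly behaves the same on every Go release, so it stays correct regardless of which loop-variable semantics the compiler uses.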

Summary

This solution proved to be tidy and easy to write and test, and it contains no callbacks, semaphores, or other cruft. To top it all off, the caller has no idea anything special is happening; it just works. Not bad for a day's work.
