Skip to content

Of Python, memcached and decorators: easy-peasy function caching

Following on from channam’s Memcached in 2 minutes, I’ve been working on a decorator to make life even simpler than this, memorised.
The most popular use-case for using memcached in Python apps is to cache the return value of a function or method. Over and over again you’ll find yourself doing something like:

mc = memcached.Client(['localhost:11211'])
def get_something(mc=mc):
    value = mc.get('something')
    if value is not None:
        return value
        # Do something else ...
        return 'hello world'

So in the interests of DRY, why not reduce that down to a reusable pattern?
We can do this using a decorator, to replace our function with a function that instead checks for the existance of some sort of key, returns the value, otherwise delegates to the actual function for output (and then caches this output back into memcache).

The simplest way to add reusable implementations of patterns such as these onto existing functions in Python is to use a decorator, effectively a function wrapper that replaces (or returns a replacement) function in place of the original call.
How does this help? Well using this we can intercept a call to the decorated function, generate a signature for use as a key in memcache, check if the item is available in the cache, if so return that, otherwise grab the output from the function, pop it into memcache using our generated key, and finally return this value.

The biggest challenge we face is generating the unique (at least to that particularly function call) key to reference in memcache. The way I’ve found best to do this is to use a combination of module, class name and key attributes (if it’s a method of an instance or decorated with @classmethod), function name, and call arguments, in this form: <module>.<class>[<key attributes>]::(<arguments>)

In order to do this we need information about both the arguments, and if it’s a method we also need information about the class the method is bound to. Ordinarily these are both easy tasks using the inspect module, and the im_self attribute (funnily enough, referencing self) that bound methods contain.
However, a quick explanation of how the @<decorator> shortcut tag works, and indeed how decorators work, reveals a slight kink in this assumption. For example take the following:

class Test:
    def test(self, arg1):

This simple bit of syntactic sugar is actually equal in functionality to the following (which is how you had to do it before Python 2.4, see PEP-318):

class Test:
    def test(self, arg1):
    test = testfunc(test)

To explain further, as I said before, decorators are just function which take other function instances and wrap around or replace them, and they are applied at the time of definition, and not as you might think at first calltime.
This is the reason the decorator function returns a function, as it is first called and instanced at this point, but any arguments to the function it is wrapping are passed at call time as a call to the function returned by the decorator.

If this confuses you, don’t worry, it’s not actually that important right now, except for the fact that because the function instance is not passed in at calltime, it means it is not bound, and loses it’s frame (e.g. how it is called, from which instance etc.).
This means we can no longer use im_self and several of the class functions in the inspect module. What we can do however is cheat and use the fact that bound methods always pass in their bound object instance or class instance as the first argument of a call, the ‘self’ argument.

memorised.decorators.memorise() uses the following trick, to first check if ‘self’ or ‘cls’ (the standard first parameter of an @classmethod) is there, and then using the *args list of passed in arguments to access the first parameter and grab either .__class__._name for an object instance’s class name or .__name__ for class instances:

# Get the list of arg names from func_code
argnames = fn.func_code.co_varnames[:fn.func_code.co_argcount]

. . .

if classmethod:
    # Get the class name from the cls argument
    class_name = args[0].__name__
    # Get the class name from the self argument
    class_name = args[0].__class__.__name__

By then merging *args and **kwargs, we can build a hash key of this particular function call. Next just create a handy MD5 hash of this string using hashlib.md5, and then do our memcache, checks pretty much as above in the first example.

Using memorise() to replace the first example, we get:

mc = memcached.Client(['localhost:11211'])
def get_something():
    return 'hello world'

Notice I’m still defining the mc variable to be a memcached.Client instance, memorise() does handle do this itself, either by using the default localhost:11211 server setting or by accepting a list of servers (via an argument named ‘mc_servers’). However, this isn’t ideal as the memcached.Client instance would be created every time a function definition is decorated with memorise() (which could be lots), so best to pass an instance in each time.
Not to mention the fact using dependency injection like this over any other way of keeping the instance (e.g. singleton) is much cleaner.

Another point of interest is that we need to always include the call parenthesis even when not passing in any arguments to memorise(), e.g. @memorise() and not @memorise, as you would expect from decorators such as @classmethod. This is best of the way arguments are passed to both a decorator, and then to the function being decorated. There are workarounds for this problem, but up until now I haven’t seen one that can be used with class-based decorators (which memorise() is). I hope to solve this in a future release, so expect a follow up post on using optional arguments with class-based decorators sometime in the near future.

Finally I’ll finish with a more realistic example of using this to decorate methods on a Django Model:

class BlogUser(User):
    objects = UserManager()
    def __unicode__(self):
        return u'%s' self.get_full_name();

    @memorise(parent_keys=['id'], mc=mc)
    def posts(self):

And there you have it, any posts by that user will be cached for as long as memcached’s cachetime is set, or until memorised.utils.uncache() is used to clear down the cache for that method.