One of the most common use cases for memcached is building a distributed cache that spans multiple machines in a cluster. This setup lets you scale memcached horizontally: by adding more machines to the cluster, you expand the total memory available to your application as a cache. The benefit of a horizontally scalable cache is that you are no longer limited by the amount of RAM you can install in a single server. It also means that you can utilize some of the free memory on your web servers, and collectively you will have a distributed memcached environment with a single large virtual memory pool for your caching needs.
Building a distributed memcached environment is far simpler than you might think. The memcached daemon knows nothing about the cluster setup and needs no special server-side configuration to run in a cluster; it is the client, not the server, that distributes the data.
So, it all starts when a single server cannot hold your entire cache and you need to split the cache pool across several servers.
If you are running multiple instances of the memcached daemon on the same server, make sure you are running them on different ports.
memcached -p 3030
memcached -p 3031
The server installation goes as previously described, and the cluster configuration lives on the client side: you add the list of servers to every client. It's important to note that, to keep the cluster sane, all of your clients must list the servers in the same order.
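To see why the ordering matters, here is a hypothetical sketch in plain Python (not pylibmc's internals) of the simplest distribution scheme: hashing the key modulo the number of servers. Two clients that list the same servers in a different order will send the same key to different machines, so a value cached by one client is invisible to the other. The server addresses are made up for illustration:

```python
import hashlib

def pick_server_naive(key, servers):
    # Hash the key and map it onto the server list by index.
    index = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(servers)
    return servers[index]

client_a = ["10.0.0.1:3030", "10.0.0.2:3030"]  # hypothetical addresses
client_b = ["10.0.0.2:3030", "10.0.0.1:3030"]  # same servers, reversed order

# The same key lands on a different machine for each client.
print(pick_server_naive("ahmed", client_a))
print(pick_server_naive("ahmed", client_b))
```

With two servers in reversed order, every key maps to the opposite machine, which is exactly the kind of split-brain cache you want to avoid.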
As an example, I'll be using Python's pylibmc library to communicate with the memcached cluster:
import pylibmc

mc = pylibmc.Client(["127.0.0.1:3030", "127.0.0.1:3031"],
                    binary=True,
                    behaviors={"tcp_nodelay": True, "ketama": True})
mc["ahmed"] = "Hello World"
mc["tek"] = "Hello World"
What happens is that you specify a list of your servers to your client configuration and the client library uses consistent hashing to decide which server a certain key-value should go to.
The constructor of the client object here was fed a couple of interesting parameters:
binary=True
: This configures pylibmc to use the memcached binary protocol instead of the ASCII protocol.
behaviors={"tcp_nodelay": True, "ketama": True}
: This configures the memcached connection socket to use the tcp_nodelay socket option, which disables Nagle's algorithm (http://en.wikipedia.org/wiki/Nagle%27s_algorithm) at the socket level. Setting "ketama": True means that pylibmc uses md5 hashing and consistent hashing for key distribution.
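To illustrate the idea behind ketama, here is a simplified sketch of consistent hashing in plain Python. This is not libmemcached's actual implementation: each server is hashed to many virtual points on a ring, and a key belongs to the first server point found clockwise from the key's own hash. Adding a server then remaps only a fraction of the keys, rather than reshuffling everything as modulo hashing would:

```python
import bisect
import hashlib

def _hash(value):
    # Map a string to a point on the ring using md5.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(servers, replicas=100):
    # Place each server at `replicas` virtual points, sorted around the ring.
    return sorted((_hash(f"{server}-{i}"), server)
                  for server in servers
                  for i in range(replicas))

def pick_server(ring, key):
    # Walk clockwise from the key's hash to the first server point.
    points = [point for point, _ in ring]
    index = bisect.bisect(points, _hash(key)) % len(ring)
    return ring[index][1]

servers = ["127.0.0.1:3030", "127.0.0.1:3031"]
ring = build_ring(servers)
print(pick_server(ring, "ahmed"))
print(pick_server(ring, "tek"))
```

Because each key's position on the ring is fixed, growing the cluster only hands over the arcs claimed by the new server's points; keys on the remaining arcs keep their old home, which is why consistent hashing avoids mass cache invalidation when the server list changes.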
After creating the client object, we set two keys, ahmed and tek, both with the value Hello World. What actually happens behind the scenes is that each key-value pair is stored on a different daemon, according to the consistent hash of its key.
Sometimes you want your caching server to be persistent; there are several very good alternatives to memcached that can help you achieve that. You can check out Redis at http://redis.io and Kyoto Tycoon at http://fallabs.com/kyototycoon/.