Sunday, December 16, 2007
Distribute your cache like the big boys do
Now here comes the problem: each of your webservers has its own private cache memory. So if five customers come in within ten minutes and query your database for exactly the same product, chances are that each customer is served by a different webserver and the database is still hit five times. And the cached resultsets now take up five times the memory. Also, the amount of cache memory available on each server is limited (normally to about 800 MB, around 1200 MB if you use the /3GB switch, and much more if you run Win64). Sooner or later, you'll hit that limit. So, while caching hugely increases the scalability of your solution, the caching solution itself does not scale out very well with many (>2) webservers. And depending on your site's characteristics, you might cache more aggressively if you had (much) more memory available.
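To make the "five customers, five database hits" arithmetic concrete, here is a toy simulation in Python. The language is my choice, since the post itself contains no code. It assumes a round-robin load balancer, which is the worst case for private caches. The function name and the `product-42` key are hypothetical, and each cache is just a dict standing in for a per-server cache such as the ASP.NET Cache.

```python
def count_db_hits(requests, num_servers, shared):
    """Return (database hits, cached copies) for a stream of lookups.

    shared=False: every webserver keeps its own private cache.
    shared=True:  all webservers consult one shared cache.
    Requests are spread round-robin over the webservers (an assumption).
    """
    private_caches = [dict() for _ in range(num_servers)]
    shared_cache = {}
    db_hits = 0
    for i, key in enumerate(requests):
        server = i % num_servers  # round-robin load balancing
        cache = shared_cache if shared else private_caches[server]
        if key not in cache:
            db_hits += 1  # cache miss: this request goes to the database
            cache[key] = "resultset for " + key
    if shared:
        copies = len(shared_cache)
    else:
        copies = sum(len(c) for c in private_caches)
    return db_hits, copies


requests = ["product-42"] * 5  # five customers, same product
print(count_db_hits(requests, num_servers=5, shared=False))  # (5, 5)
print(count_db_hits(requests, num_servers=5, shared=True))   # (1, 1)
```

With private caches, the database is queried five times and the same resultset is held in memory five times. With one shared cache, it is queried once and stored once.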
A solution for this situation may be a distributed cache, like memcached (or). Such a solution keeps each cached item in memory on only one machine and uses the item's key to determine which machine holds it. This means that your cache must be accessed over the network, which is significantly slower than in-process memory, but nowadays much faster than file access (the bottleneck of your database server). Many of the largest sites (Facebook, Wikipedia, YouTube, LiveJournal) use memcached (see here and here and here), which proves that it scales like hell, but I think it can be useful in scenarios much smaller than that. You could set up a number of cheap 64-bit Linux boxes with loads of memory. Note that these boxes don't even need a hard disk, and their processor requirements are very modest. This allows you to create a huge amount of in-memory cache, opening up caching scenarios that you would normally dismiss without serious thought. A Win32 port is available, so you could also run an instance on each of your webservers, using just the memory you are now using for the ASP.NET Cache. If you happen to use NHibernate, it has a memcached caching provider for caching both object instances and queries.
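The idea of using the key to pick the machine can be sketched as follows. This is a minimal in-process model, assuming simple modulo hashing with CRC32. It is not memcached's client API, and the class name and host names are hypothetical. Each "server" is a plain dict, so the code runs without any cache servers.

```python
import zlib


class DistributedCache:
    """Toy model of a memcached-style cluster (hypothetical, for illustration).

    Each key is stored on exactly one node, chosen by hashing the key.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.stores = {s: {} for s in self.servers}  # one dict per "node"

    def server_for(self, key):
        # A stable hash (CRC32) so every webserver maps a key to the same node.
        # Python's built-in hash() is randomized per process for strings,
        # so it would not give the same answer on different webservers.
        h = zlib.crc32(key.encode("utf-8"))
        return self.servers[h % len(self.servers)]

    def get(self, key):
        return self.stores[self.server_for(key)].get(key)  # None on a miss

    def set(self, key, value):
        self.stores[self.server_for(key)][key] = value


cache = DistributedCache(["cache1:11211", "cache2:11211", "cache3:11211"])
cache.set("product-42", "resultset")
print(cache.server_for("product-42"))  # the one node that owns this key
print(cache.get("product-42"))         # resultset
print(sum("product-42" in s for s in cache.stores.values()))  # 1
```

Because every webserver computes the same hash, they all agree on where `product-42` lives, so the item is fetched from the database once and stored once. A design caveat of plain modulo hashing: adding or removing a server changes `h % len(servers)` for most keys, which effectively flushes much of the cache. Many memcached clients use consistent hashing to limit that remapping.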
What a pity that the Cache object has no plug-in model.
Nice, but not related to what I describe above.
http://www.codeplex.com/SharedCache
I have no idea how mature the product is, but it fits the requirements of this post nicely, and a managed solution may have advantages. On the other hand, memcached has been tested in extreme circumstances and is highly optimized (better than a managed code solution could ever be, I think).