Update: 2008-02-21 We’re looking into using ActiveMessaging and Amazon SQS to help with the workflow for background processing. Stay tuned for an updated post.
With before_save and after_save filters being so easy to use, it’s tempting to add more and more pre and post-processing to saving an ActiveRecord model. For Obsidian Portal, we update permissions, set timestamps of associated objects, and do all sorts of stuff. Unfortunately, all this extra work takes time, and can significantly slow down your application. The more work you do on the main execution thread, the more time Mongrel is tied up doing stuff unrelated to servicing requests. If something goes wrong in any of the filters, Rails will rollback the database transaction, and *poof* it’s all gone!
A while back, we started seeing ‘rbuf_fill’ timeout errors in the server logs. From what we could see, calls to acts_as_solr indexing were timing out, interrupting the save. For us, this was really bad. People would spend lots of time painstakingly crafting their perfect blog posting or wiki page, only to have it evaporate into nothing. All they saw was our default “Internal Server Error” page. Sure, it looks nice, but no one wants to see that
Tracing the timeout back to Solr was not hard, and the solution was clear: take the indexing out of the main execution thread and move it to a background process. Luckily, acts_as_solr made this a fairly easy refactoring process. Here’s what we did:
Add an :if clause to your acts_as_solr macro call
acts_as_solr supports an :if clause that will be used to determine whether or not the record will be indexed when save is called. We want this to always evaluate to false, except when we explicitly set it to true during off-line processing. Below is an example from one of our models:
Use rake/cron to do the indexing in the background.
Now that indexing does not happen on save, we need to make sure it happens at some point. Our solution was to move it to a rake task that gets executed by a periodic cron job. Rake + cron has worked well for us in the past, so we’ll stick with it.
The task itself is very simple. Find all the objects that have been updated since the last indexing, and push them to Solr.
Below is the rake task that I wrote. If I were more clever, I would probably come up with a neat trick for automatically finding all the models that support Solr indexing. Now that I’m an official committer on acts_as_solr, maybe I’ll try to figure something out and get it into the trunk. Still…I’m lazy
Set up a cron job to run this every thirty minutes or so. For most sites, a half hour will be a good balance between keeping the load down and making sure the searching is fairly up to date.
By moving the indexing off the main thread, we’ve noticed a significant reduction in the number of Solr related exceptions. That means our users have seen a significant reduction in the number of “Sorry, we lost all your data” errors, and that is exactly what we were hoping for.












Recent Comments