<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Optimizing Solr and Rails &#8211; Index in the background</title>
	<atom:link href="http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/</link>
	<description>Late nights eventually pay off</description>
	<lastBuildDate>Wed, 08 Feb 2012 19:15:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
	<item>
		<title>By: shokal_s</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2285</link>
		<dc:creator>shokal_s</dc:creator>
		<pubDate>Wed, 06 Aug 2008 08:13:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2285</guid>
		<description>Modifying the mergeFactor in solrconfig.xml from 10 (the default on installation) to 20 made Solr indexing 10 times faster (3 sec instead of 30 sec), and query time wasn&#039;t significantly higher.</description>
		<content:encoded><![CDATA[<p>Modifying the mergeFactor in solrconfig.xml from 10 (the default on installation) to 20 made Solr indexing 10 times faster (3 sec instead of 30 sec), and query time wasn&#8217;t significantly higher.</p>
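<p>A minimal sketch of that change, assuming a Solr 1.x-style solrconfig.xml in which mergeFactor is set in the &lt;mainIndex&gt; section. The example configs of that era also set a default in &lt;indexDefaults&gt;, which &lt;mainIndex&gt; overrides; check which section your version actually reads.</p>
<pre><code>&lt;mainIndex&gt;
  &lt;!-- default was 10; a higher value merges segments less often,
       speeding up indexing at the cost of more segments per search --&gt;
  &lt;mergeFactor&gt;20&lt;/mergeFactor&gt;
&lt;/mainIndex&gt;</code></pre>
<p>This is the usual Lucene trade-off: fewer merges while indexing, but more segments to search until the index is optimized.</p>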
]]></content:encoded>
	</item>
	<item>
		<title>By: shokal_s</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2284</link>
		<dc:creator>shokal_s</dc:creator>
		<pubDate>Wed, 06 Aug 2008 07:56:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2284</guid>
		<description>Modifying the mergeFactor in the solrconfig.xml file from 10 (the default value on installation) to 20 has made indexing 10 times faster for me (reduced from 30 sec to 3 sec), while queries are not significantly slower.</description>
		<content:encoded><![CDATA[<p>Modifying the mergeFactor in the solrconfig.xml file from 10 (the default value on installation) to 20 has made indexing 10 times faster for me (reduced from 30 sec to 3 sec), while queries are not significantly slower.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Midnight Oil &#187; Blog Archive &#187; Hacking the Ultrasphinx plugin to work with paginating_find</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2201</link>
		<dc:creator>Midnight Oil &#187; Blog Archive &#187; Hacking the Ultrasphinx plugin to work with paginating_find</dc:creator>
		<pubDate>Sat, 19 Apr 2008 22:59:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2201</guid>
		<description>[...] Getting started with acts_as_solr; acts_as_solr for development and production in one Tomcat instance; Optimizing Solr and Rails - Index in the background [...]</description>
		<content:encoded><![CDATA[<p>[...] Getting started with acts_as_solr; acts_as_solr for development and production in one Tomcat instance; Optimizing Solr and Rails &#8211; Index in the background [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Geoff</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2126</link>
		<dc:creator>Geoff</dc:creator>
		<pubDate>Sat, 01 Mar 2008 18:23:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2126</guid>
		<description>Preventing Solr from indexing on every save is good practice, but I suspect Solr has even deeper problems after reading this:

http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html</description>
		<content:encoded><![CDATA[<p>Preventing Solr from indexing on every save is good practice, but I suspect Solr has even deeper problems after reading this:</p>
<p><a href="http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html" rel="nofollow">http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Micah</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2097</link>
		<dc:creator>Micah</dc:creator>
		<pubDate>Tue, 05 Feb 2008 14:28:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2097</guid>
		<description>I may have to write a follow up to this article covering the use of message queue servers.  That seems to be a common theme with Solr.

On that note, what is the memory usage like for ActiveMQ?  Our VPS is already swapping like mad thanks to Solr and RMagick.  We may have to disable search entirely just to make sure that MySQL doesn&#039;t have to swap to disk all the time.  From a few mailing list posts I&#039;ve read, ActiveMQ (like most things Java) is not shy about grabbing memory, which is a big no-no for those of us trying to squeeze everything into limited RAM.

Which message server has the absolute smallest memory footprint?  That means more to me than fancy features.</description>
		<content:encoded><![CDATA[<p>I may have to write a follow up to this article covering the use of message queue servers.  That seems to be a common theme with Solr.</p>
<p>On that note, what is the memory usage like for ActiveMQ?  Our VPS is already swapping like mad thanks to Solr and RMagick.  We may have to disable search entirely just to make sure that MySQL doesn&#8217;t have to swap to disk all the time.  From a few mailing list posts I&#8217;ve read, ActiveMQ (like most things Java) is not shy about grabbing memory, which is a big no-no for those of us trying to squeeze everything into limited RAM.</p>
<p>Which message server has the absolute smallest memory footprint?  That means more to me than fancy features.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Payne</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2096</link>
		<dc:creator>Alex Payne</dc:creator>
		<pubDate>Tue, 05 Feb 2008 01:02:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2096</guid>
		<description>We use Solr for our &quot;people search&quot; feature at Twitter.  We use acts_as_modified to determine whether or not an ActiveRecord object has changed and thus should be updated in the Solr index.  Our Starling queue server stores the list of objects that need to be re-indexed.  A simple daemon reads from that queue and fires off messages to Solr.  Two cron jobs periodically commit and optimize the index.</description>
		<content:encoded><![CDATA[<p>We use Solr for our &#8220;people search&#8221; feature at Twitter.  We use acts_as_modified to determine whether or not an ActiveRecord object has changed and thus should be updated in the Solr index.  Our Starling queue server stores the list of objects that need to be re-indexed.  A simple daemon reads from that queue and fires off messages to Solr.  Two cron jobs periodically commit and optimize the index.</p>
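<p>For illustration only, each of those two cron jobs could post one of Solr&#8217;s standard XML update messages to the update handler. The URL (for example http://localhost:8983/solr/update) is an assumption about the install, and the body is sent with Content-Type text/xml.</p>
<pre><code>&lt;!-- frequent job: make recently queued adds and deletes visible to searchers --&gt;
&lt;commit/&gt;

&lt;!-- less frequent job: merge the index segments down (this also commits) --&gt;
&lt;optimize/&gt;</code></pre>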
]]></content:encoded>
	</item>
	<item>
		<title>By: Micah</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2093</link>
		<dc:creator>Micah</dc:creator>
		<pubDate>Mon, 04 Feb 2008 14:04:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2093</guid>
		<description>Jacob,

Thanks.  This is all great stuff.  I&#039;ll have to look into ActiveMessaging; the rake + cron solution works, but it seems inelegant.

This might actually solve another problem we&#039;re looking at: capturing deletes.  Currently, deletes from Solr still take place in the web request, because once they&#039;re gone from the database, we have no way to figure out what exactly was deleted.  From your post, it sounds like we could use ActiveMessaging to handle this.</description>
		<content:encoded><![CDATA[<p>Jacob,</p>
<p>Thanks.  This is all great stuff.  I&#8217;ll have to look into ActiveMessaging; the rake + cron solution works, but it seems inelegant.</p>
<p>This might actually solve another problem we&#8217;re looking at: capturing deletes.  Currently, deletes from Solr still take place in the web request, because once they&#8217;re gone from the database, we have no way to figure out what exactly was deleted.  From your post, it sounds like we could use ActiveMessaging to handle this.</p>
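<p>One way to sketch that: capture the record&#8217;s Solr id before the row is destroyed, then queue a delete message like the one below for the background processor to send. The id value is hypothetical; it assumes the schema&#8217;s uniqueKey field is <code>id</code> and that documents are keyed as ClassName:record_id.</p>
<pre><code>&lt;!-- sent by the queue processor, outside the web request --&gt;
&lt;delete&gt;
  &lt;id&gt;Product:42&lt;/id&gt;
&lt;/delete&gt;</code></pre>
<p>Solr also accepts a &lt;query&gt; element inside &lt;delete&gt; to delete by query. As with adds, the delete only becomes visible to searchers after the next commit.</p>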
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob Stetser</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2088</link>
		<dc:creator>Jacob Stetser</dc:creator>
		<pubDate>Sat, 02 Feb 2008 23:21:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2088</guid>
		<description>We had the same issue with Solr, and also traced it to acts_as_solr committing after every request. Eventually, we figured out the same thing - stop committing on every save. Then the problem became - newer stuff was not showing up in the index, until we figured out we needed to send a commit command every so often!

Newer versions of Solr also support autocommitting by elapsed time or by number of uncommitted documents, so now we let Solr handle its own commits.

That solved a lot of our issues, but we&#039;ve also moved to ActiveMQ and ActiveMessaging to offload the process of sending updates to the index. Each update goes into an indexing queue, and then a separate processor (written in Ruby with active_messaging) grabs it from the queue and ships it off to Solr.

Since then, we&#039;ve had much faster saves, up-to-date indexing and no save timeouts - even if the Solr service was offline. When we restart it, the processors resume where they left off.

Once you get to this point, being able to offload work that doesn&#039;t need to happen within a web request is vital to performance. The hardest part of ActiveMessaging with ActiveMQ is getting ActiveMQ running (Java!!!) - but there are extremely simple-to-use Ruby message queuing solutions around. The main idea is to move to a message queue architecture where you rapidly hand off the work request (update Solr with this record) to a queue, instead of waiting for a slow Solr response and possibly losing the input to a timeout error.

As for BackgrounDRb, I wouldn&#039;t recommend using it at this point - ActiveMessaging or &#039;bj&#039; are much newer and more reliable, and the author of bdrb has publicly recommended against using it anymore.</description>
		<content:encoded><![CDATA[<p>We had the same issue with Solr, and also traced it to acts_as_solr committing after every request. Eventually, we figured out the same thing &#8211; stop committing on every save. Then the problem became &#8211; newer stuff was not showing up in the index, until we figured out we needed to send a commit command every so often!</p>
<p>Newer versions of Solr also support autocommitting by elapsed time or by number of uncommitted documents, so now we let Solr handle its own commits.</p>
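<p>A minimal sketch of those autocommit settings, assuming a Solr release whose DirectUpdateHandler2 supports them (maxTime, in milliseconds, may be missing from older releases). Both thresholds below are hypothetical values.</p>
<pre><code>&lt;updateHandler class="solr.DirectUpdateHandler2"&gt;
  &lt;autoCommit&gt;
    &lt;!-- commit once this many documents are pending... --&gt;
    &lt;maxDocs&gt;1000&lt;/maxDocs&gt;
    &lt;!-- ...or once the oldest uncommitted update is this many ms old --&gt;
    &lt;maxTime&gt;60000&lt;/maxTime&gt;
  &lt;/autoCommit&gt;
&lt;/updateHandler&gt;</code></pre>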
<p>That solved a lot of our issues, but we&#8217;ve also moved to ActiveMQ and ActiveMessaging to offload the process of sending updates to the index. Each update goes into an indexing queue, and then a separate processor (written in Ruby with active_messaging) grabs it from the queue and ships it off to Solr.</p>
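<p>For illustration, the message such a processor ships off to Solr for each queued record could be a standard XML &lt;add&gt; update. The field names and values here are hypothetical and depend on your schema.</p>
<pre><code>&lt;!-- one queued record, posted to Solr by the background processor --&gt;
&lt;add&gt;
  &lt;doc&gt;
    &lt;field name="id"&gt;Product:42&lt;/field&gt;
    &lt;field name="name"&gt;Example product&lt;/field&gt;
  &lt;/doc&gt;
&lt;/add&gt;</code></pre>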
<p>Since then, we&#8217;ve had much faster saves, up-to-date indexing and no save timeouts &#8211; even if the Solr service was offline. When we restart it, the processors resume where they left off.</p>
<p>Once you get to this point, being able to offload work that doesn&#8217;t need to happen within a web request is vital to performance. The hardest part of ActiveMessaging with ActiveMQ is getting ActiveMQ running (Java!!!) &#8211; but there are extremely simple-to-use Ruby message queuing solutions around. The main idea is to move to a message queue architecture where you rapidly hand off the work request (update Solr with this record) to a queue, instead of waiting for a slow Solr response and possibly losing the input to a timeout error.</p>
<p>As for BackgrounDRb, I wouldn&#8217;t recommend using it at this point &#8211; ActiveMessaging or &#8216;bj&#8217; are much newer and more reliable, and the author of bdrb has publicly recommended against using it anymore.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nazar</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2080</link>
		<dc:creator>Nazar</dc:creator>
		<pubDate>Fri, 01 Feb 2008 11:38:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2080</guid>
		<description>Check out BackgrounDRb if you need to run one-off or scheduled background tasks. It offers a far more elegant solution than rake + cron.</description>
		<content:encoded><![CDATA[<p>Check out BackgrounDRb if you need to run one-off or scheduled background tasks. It offers a far more elegant solution than rake + cron.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Micah</title>
		<link>http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2076</link>
		<dc:creator>Micah</dc:creator>
		<pubDate>Mon, 28 Jan 2008 05:01:14 +0000</pubDate>
		<guid isPermaLink="false">http://blog.aisleten.com/2008/01/26/optimizing-solr-and-rails-index-in-the-background/#comment-2076</guid>
		<description>Hi Nima,

I&#039;ve never heard of Sphinx.  I&#039;ll have to take a closer look sometime.

As for active_messaging, that&#039;s also new to me.  We&#039;ve been using rake + cron, and while it works, it seems a little ungainly.  So, if there&#039;s a better way to do it (that&#039;s not a total pain to get working), then I&#039;m all over that.</description>
		<content:encoded><![CDATA[<p>Hi Nima,</p>
<p>I&#8217;ve never heard of Sphinx.  I&#8217;ll have to take a closer look sometime.</p>
<p>As for active_messaging, that&#8217;s also new to me.  We&#8217;ve been using rake + cron, and while it works, it seems a little ungainly.  So, if there&#8217;s a better way to do it (that&#8217;s not a total pain to get working), then I&#8217;m all over that.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

