Getting started with acts_as_solr


I recently downloaded and installed acts_as_solr, and I must say that I am quite impressed. I was dreading trying to include any sort of search service, but acts_as_solr made it a snap. As always with Rails plugins, there were a few snafus along the way, but I’ll try to help people get over them.

Note: This writeup deals with acts_as_solr 0.7, so be wary if you’re using a different version.

Config/Setup

Get the plugin

As always, the first step is to get the plugin. You can use the built-in plugin installation scripts like so:

script/plugin install svn://svn.railsfreaks.com/projects/acts_as_solr/trunk

However, I urge you to use a vendor drop instead. This plugin is unstable enough that you will probably want to edit the source, and a vendor drop makes that a little easier. Instructions for performing a vendor drop in Rails can be found in one of my earlier posts.

If you do a vendor drop, you skip one small step that the plugin installer would otherwise perform, so you’ll have to copy the lib/solr.yml file from the acts_as_solr plugin into your app’s config directory by hand. No big deal.
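If you’d rather script that copy than remember it, here is a minimal Ruby sketch. It assumes the plugin lives at vendor/plugins/acts_as_solr (the same path used for schema.xml below), and copy_solr_yml is a hypothetical helper name, not something acts_as_solr provides.

```ruby
require 'fileutils'

# Hypothetical helper: copies the plugin's lib/solr.yml into config/,
# mirroring the install step that a vendor drop skips.
# Assumes the plugin sits at vendor/plugins/acts_as_solr under rails_root.
def copy_solr_yml(rails_root)
  source = File.join(rails_root, 'vendor', 'plugins', 'acts_as_solr', 'lib', 'solr.yml')
  target = File.join(rails_root, 'config', 'solr.yml')

  # Leave an existing config/solr.yml alone so local edits are not overwritten.
  return target if File.exist?(target)

  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(source, target)
  target
end
```

Run from the root of your Rails app, copy_solr_yml(Dir.pwd) returns the path of the config copy.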

Get Solr

Solr is a Java app that is deployed inside a servlet container, for example, Tomcat. If you’re like me, that fact brings an unhappy frown to your face. One of the main reasons I switched to Rails was because I was tired of editing arcane servlet config files. Still, over the years, there have been a lot of great projects written as servlets, so sometimes we just have to grin and bear it.

Anyway, to get started fastest, just grab the latest Solr distribution from the Solr homepage. It comes with an embedded Jetty servlet container, which will get you going super fast.

After you unzip the Solr distribution somewhere, you’ll need to copy the schema.xml file into the right place:

cp myrailsproject/vendor/plugins/acts_as_solr/schema.xml apache_solr_directory/example/solr/conf/

At this point, you’re ready to rock.

cd apache_solr_directory/example
java -jar start.jar

Test it out

The basic installation given in the acts_as_solr documentation has you using Jetty on port 8983. So, to test it out, point your browser to http://localhost:8983/solr/admin. If that brings up the Solr admin interface, you’re 99% home. If not, it’s time to start debugging your servlet container. Good luck :(
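If you want to run the same check from a script (say, before kicking off an index rebuild), a small sketch using only Ruby’s standard Net::HTTP could look like this. The trailing slash on the default URL and treating a redirect as “up” are assumptions about how Jetty answers; solr_admin_up? is a hypothetical name.

```ruby
require 'net/http'
require 'uri'

# Returns true if the Solr admin URL answers with a 2xx or 3xx status,
# false if the connection fails. The default URL assumes the stock
# Jetty setup on port 8983.
def solr_admin_up?(url = 'http://localhost:8983/solr/admin/')
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue SystemCallError, SocketError, Timeout::Error
  # Connection refused, unknown host, or a timeout: treat Solr as down.
  false
end
```

If solr_admin_up? returns false while Jetty claims to be running, the port or path is the first thing to double-check.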

Using it in a model

There’s really no point in my explaining this part, as there is an excellent tutorial movie on the acts_as_solr homepage. Just look for “Up and running in less than 5 minutes.”

Still, there’s one special point I want to make at this stage. After everything is set up and you have Solr running, you might find that you cannot use it to find any of your model objects. This is because, by default, acts_as_solr only indexes your objects on save. This makes good sense, but it also means that all objects currently in your database will have to be loaded and re-saved in order to show up in the index. There is probably an easier way to do it, but as of right now, I don’t know what it is.

Update: In the comments section, Jason posted a great way to get all your model objects into Solr quickly using rebuild_solr_index. Assuming your model is called Book, you can index all the Book objects in your database with the following call:

Book.rebuild_solr_index
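With more than one indexed model, that call has to be repeated per class. Here is a minimal sketch of a helper that does it for an explicit list of classes; rebuild_all_solr_indexes is a hypothetical name, and it assumes that only models declaring acts_as_solr respond to rebuild_solr_index.

```ruby
# Hypothetical helper: calls rebuild_solr_index on each class in +models+
# that responds to it and returns the classes it rebuilt.
# Assumes only models declaring acts_as_solr respond to rebuild_solr_index.
def rebuild_all_solr_indexes(models)
  indexed = models.select { |model| model.respond_to?(:rebuild_solr_index) }
  indexed.each(&:rebuild_solr_index)
  indexed
end
```

From script/console, something like rebuild_all_solr_indexes([Book, GameCharacter]) would rebuild both indices in one go.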

Note, however, that with the current behavior, acts_as_solr calls commit after indexing each object, which can be a costly operation, according to the Solr docs. I will see if I can find a way to modify rebuild_solr_index to call commit only once, after all objects are indexed, or perhaps at user-specified intervals along the way.
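The batching idea can be sketched independently of acts_as_solr: send every document, but commit only once per batch. RecordingSolrConnection below is a hypothetical stand-in with add and commit methods, not the plugin’s real connection class, and the default batch size of 100 is an arbitrary assumption.

```ruby
# Hypothetical stand-in for a Solr connection that records what it was asked to do.
class RecordingSolrConnection
  attr_reader :added, :commits

  def initialize
    @added = []
    @commits = 0
  end

  def add(doc)
    @added << doc
  end

  def commit
    @commits += 1
  end
end

# Adds every document, committing once per +batch_size+ documents
# (including the final partial batch) instead of once per document.
def index_in_batches(docs, connection, batch_size: 100)
  docs.each_slice(batch_size) do |batch|
    batch.each { |doc| connection.add(doc) }
    connection.commit
  end
end
```

With 250 documents and a batch size of 100, this issues 3 commits instead of 250.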

To verify that objects are being indexed, watch the Jetty output in the terminal window you started it from. As soon as an object is indexed, Jetty should print some output about it. At that point, use the find_by_solr method to see if the object was successfully placed in the index.

Single Table Inheritance

Because of the way that acts_as_solr names the indices, version 0.7 is incompatible with single table inheritance (STI), and therefore does not work with model object hierarchies. However, it’s a very small bug, and I am guessing that it was simply an oversight on the part of the author. I looked into it and was able to create a fix in about 10 minutes.

Attached is my acts_as_solr STI patch to enable indexing of STI models. (see update)

Update: This patch has been merged into the trunk as part of acts_as_solr 0.8. So, if you’re using 0.8 or higher, the patch is unnecessary.

Development vs Production environment – Howto

WARNING! Do not have your development and production environments pointing to the same Solr installation. They will step all over each other, and find_by_solr will return incorrect results. It is roughly equivalent to using the same database for production and development. So, a general rule of thumb is that each separate database needs its own Solr index.
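One way to keep environments apart is to give each its own Solr URL in config/solr.yml and sanity-check that no two environments share one. The sketch below assumes a solr.yml layout with one url key per environment; the keys in your copy of the plugin’s solr.yml may differ, so treat this layout and the helper names as hypothetical.

```ruby
require 'yaml'

# Hypothetical solr.yml layout: one Solr URL per Rails environment,
# using the ports suggested in this post (production 8983, development 8984).
SOLR_YML = <<~YAML
  development:
    url: http://localhost:8984/solr
  production:
    url: http://localhost:8983/solr
YAML

# Returns the Solr URL configured for +env+, or raises KeyError.
def solr_url_for(env, yaml_text = SOLR_YML)
  YAML.safe_load(yaml_text).fetch(env).fetch('url')
end

# Returns every URL that more than one environment points at.
def shared_solr_urls(yaml_text = SOLR_YML)
  YAML.safe_load(yaml_text)
      .group_by { |_env, settings| settings['url'] }
      .select { |_url, entries| entries.size > 1 }
      .keys
end
```

A non-empty result from shared_solr_urls is exactly the “stepping on each other” situation described above.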

The easiest way to do this is to set up multiple servlet containers on different ports. You can have Jetty for production on 8983 and Jetty for development on 8984, for example. To enable this, you will want two separate copies of the Solr distribution. To change the port Jetty is listening on, edit the jetty.xml file at apache_solr_directory/example/etc/jetty.xml. Look for the line that says:

<Set name="Port"><SystemProperty name="jetty.port" default="8983"/></Set>
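In the second copy of the distribution, that default is the only thing that needs to change. Here is a small Ruby sketch that rewrites it in the contents of jetty.xml; set_jetty_port is a hypothetical helper, and it assumes the line looks exactly like the one quoted above. Because the port is read through a SystemProperty with a default, starting Jetty with java -Djetty.port=8984 -jar start.jar should also override it without editing the file.

```ruby
# Matches the jetty.port SystemProperty from example/etc/jetty.xml,
# capturing the text before and after the default port number.
JETTY_PORT_PATTERN = /(<SystemProperty name="jetty\.port" default=")\d+("\/>)/

# Returns a copy of +xml+ with the default Jetty port replaced by +port+.
def set_jetty_port(xml, port)
  raise ArgumentError, 'jetty.port SystemProperty not found' unless xml.match?(JETTY_PORT_PATTERN)

  xml.sub(JETTY_PORT_PATTERN) { "#{Regexp.last_match(1)}#{port}#{Regexp.last_match(2)}" }
end
```

Usage would be along the lines of File.write(path, set_jetty_port(File.read(path), 8984)), with path pointing at the development copy’s jetty.xml.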

Even though it’s easy, just adding more servlet containers is usually not the best solution, due to memory and sysadmin overhead. Instead, it’s probably a better idea to deploy multiple Solr servlets to the same container. Heck, it may even be possible to have a single Solr instance handle multiple distinct indices.

Update: I have found a way to deploy multiple Solr instances inside a single Tomcat container. The details are available in another blog post.

An example in the wild

I have acts_as_solr up and running on my site, Obsidian Portal. For a good example of it in action, take a look at the characters list page and search for ‘Nent’. If you look at the individual characters, you’ll see that each of them has ‘Nent’ in their content somewhere. This is the sort of functionality that acts_as_solr gives out of the box.

Gotchas

Coming soon…
Until then: did you forget to copy schema.xml into the Solr conf directory?

19 Responses to “Getting started with acts_as_solr”

  1. Midnight Oil » Blog Archive » acts_as_solr for development and production in one Tomcat instance Says:

    [...] Working late into the night, two guys are trying to make the Web a friendlier place for Dungeons & Dragons. « Getting started with acts_as_solr [...]

  2. Jason Says:

    To rebuild the index for a model you simply call rebuild_solr_index (ActsAsSolr::ClassMethods) for each model.

    http://acts-as-solr.rubyforge.org/classes/ActsAsSolr/ClassMethods.html#M000006

  3. Cody Caughlan Says:

    You can get all your model objects into Solr really fast using this small loop:

    http://pastie.caboo.se/55583

    Of course, if you have a LOT of records you might want to wrap this loop in another and do it in small batches, but you get the point.

  4. Micah Says:

    @Jason:

    Thanks for the tip about rebuild_solr_index. I’ve updated the main post to add this.

    Now I need to find a way to make it batch all the index operations together before calling commit…

  5. acts_as_solr Tutorial: More Search Goodness in Rails Says:

    [...] search features based on proven software. Micah Wedemeyer has put together a great resource on how to install and setup Solr and acts_as_solr and the official acts_as_solr homepage has examples of how to use it within your own Rails [...]

  6. Andrew Turner Says:

    Hrm, I’ve gotten it up and running – kudos for making it so easy. And I’ve rebuilt my index (watched the solr output too).

    However, when I search in solr (either via Ruby, or even the solr console) – no results are ever returned? How can I see what’s being actually stored (or has been stored) in the index?

  7. Micah Says:

    @Andrew

    I’m still very new to Solr itself, so I don’t know how to examine the indices.

    Is anything being displayed in the solr output when you do a search?

    Here is some sample output from my installation when I do a search:

    GameCharacter.find_by_solr('hero')

    shows up in the logs as:

    May 3, 2007 12:03:38 PM org.apache.solr.core.SolrCore execute
    INFO: wt=ruby&q=(hero)+AND+type_t:GameCharacter&fl=pk_i 0 919
    

    Is anything like this displaying when you do a search?

    Also, what environment are you running in? Test? Development? Production? Make sure that the index rebuilding and searching is happening in the same environment.

  8. Andy Pols Says:

    You can use http://www.getopt.org/luke/ to explore the search index to see what’s stored in it.

  9. Joerg Says:

    Hi, has anybody figured out why no results are returned? I’m using Luke, and the documents are definitely inside the index. Also, when I look at the admin interface, every time I do a search I get another “hit” … so SOLR is also finding the document inside the index. But still … acts_as_solr returns no results.

  10. Henrik N Says:

    I wrote a rake task that loops over all models acting_as_solr and does a rebuild_solr_index on each.

    Looking at the source snippets in the docs, it seems like it’d be pretty simple to have the code do commits less often. If I find the time, I might try that.

  11. Micah Says:

    @Joerg,

    I’m not clear on the problem. Can you be more specific? Also, a warning: I don’t think acts_as_solr will work with a pre-existing index, if that’s what you’re trying.

    @Henrik,

    Before you spend the time writing anything, search the source for the cron job stuff. I didn’t read much, but it seems that you can set up a ruby cron job to delay commits until a later date. Just do a search on the source for cron.

  12. Dr J Says:

    @Joerg & Micah,

    I’m having the same problem with no results being returned from my find_by_solr query.

    I built the index manually, without using the acts_as_solr plugin, and I get no results back. Why do you think it won’t work if the index wasn’t built through acts_as_solr?

  13. Nakul Says:

    @Dr J: This is because the schema.xml you built for your Solr is different from the one acts_as_solr uses. acts_as_solr uses dynamic fields to define its fields and their types, while in a hand-built Solr schema you specify the fields and their types manually.

  14. mahesh Says:

    Hi, it’s very helpful, but can you give me a clearer explanation of how to use it in the controller and integrate it into the application?

  15. A Fresh Cup » Blog Archive » Double Shot #37 Says:

    [...] Getting started with acts_as_solr – Another way to manage full-text searching in a Rails application. [...]

  16. Adhiraj Says:

    Do we need some different settings if we are using acts_as_solr along with mongrel_cluster?

  17. Midnight Oil » Blog Archive » Hacking the Ultrasphinx plugin to work with paginating_find Says:

    [...] Getting started with acts_as_solr acts_as_solr for development and production in one Tomcat instance Optimizing Solr and Rails – Index in the background [...]

  18. Frank Says:

    Just a small note to tell you that the acts_as_solr homepage is no longer available… you may want to update the link. Thanks!

  19. Brian Mulloy Says:

    After getting through initial setup and a few exceptions I was also having a problem with empty search results. Eventually I killed and restarted solr, ran setup and index again and I started getting results.
