Ferret On Rails

acts_as_ferret plugin

The primary source for information on acts_as_ferret is now http://projects.jkraemer.net/acts_as_ferret/wiki.

Please treat the information below as historical reference only.

======

SVN repository and a simple demo project

The combined efforts of Kasper Weibel, Thomas Lockney and Jens Kraemer on acts_as_ferret were unified by Jens in February 2006 and put into an SVN repository.

You should still feel free to use the code below as inspiration, but the intention is to use the SVN repository as the main acts_as_ferret source. This should minimize the confusion caused by having three separate, although very similar, versions of acts_as_ferret in circulation.

You can use

script/plugin install https://svn.jkraemer.net/svn/projects/ferret-demo/trunk/vendor/plugins/acts_as_ferret/

for easy installation of the plugin. You'll get a version based on those below, with various changes made by Jens.

The whole demo project (containing a simple model class, scaffolded CRUD GUI and a search form) is available at https://svn.jkraemer.net/svn/projects/ferret-demo/trunk/ .

UTF-8

For Unicode support in acts_as_ferret, see Albert Delamednolls code example on his blog.

Original code by Kasper Weibel

This code was taken from an email by Kasper Weibel on the Rails mailing list. It has been modified so that it works with multiple ActiveRecord models. It hasn't been thoroughly tested yet.

The result is the acts_as_ferret mixin for ActiveRecord.

Use it as follows: in any model class, call acts_as_ferret:

class Foo < ActiveRecord::Base
 acts_as_ferret
end

Every create, update and destroy is then applied both to the database through ActiveRecord (as usual) and to a Ferret index used for searching.

The following class method is then available on each such model, e.g. from your controllers:

Foo.find_by_contents(query) # query is a string in Ferret's query syntax
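The bookkeeping behind this can be illustrated without Ferret. The sketch below is a minimal, dependency-free stand-in (`TinyIndex` is a hypothetical name, not part of Ferret or the plugin). Like the `:key => [:id, :ferret_table]` option in the code below, it keys each entry on the pair (id, table name), so saving a record again replaces its entry instead of duplicating it; deleting removes exactly that key; and every search is restricted to one table, as the appended `+ferret_table:...` clause does. Matching is a naive all-words-must-match test, not Ferret's query language.

```ruby
# Hypothetical in-memory stand-in for the Ferret index used by acts_as_ferret.
# It is NOT Ferret's API: it only mimics the composite key and table filter.
class TinyIndex
  def initialize
    # [id, table] => array of lowercased word tokens
    @docs = {}
  end

  # Like `index << doc` with :key => [:id, :ferret_table]:
  # an existing entry with the same (id, table) key is replaced.
  def add(id, table, attributes)
    text = attributes.values.map(&:to_s).join(" ")
    @docs[[id.to_s, table]] = text.downcase.scan(/\w+/)
  end

  # Like query_delete("+id:... +ferret_table:...").
  def delete(id, table)
    @docs.delete([id.to_s, table])
  end

  # Ids in `table` whose tokens contain every query word.
  def search(query, table)
    terms = query.downcase.scan(/\w+/)
    @docs.select { |(_, t), tokens| t == table && (terms - tokens).empty? }
         .map { |(id, _), _| id }
  end
end

index = TinyIndex.new
index.add(1, "foos", { "title" => "Ferret on Rails", "body" => "full text search" })
index.add(2, "foos", { "title" => "Plain Rails" })
index.add(1, "bars", { "title" => "Ferret elsewhere" })
# An update re-adds the record and replaces the old entry
index.add(1, "foos", { "title" => "Ferret on Rails", "body" => "updated text" })
p index.search("ferret", "foos") # => ["1"]
index.delete(1, "foos")
p index.search("ferret", "foos") # => []
```

Because the key includes the table name, records from different models with the same id never overwrite each other in the shared index.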

The plugin follows the usual plugin structure and consists of 2 files:

{RAILS_ROOT}/vendor/plugins/acts_as_ferret/init.rb
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb

The Ferret index is stored in:

{RAILS_ROOT}/db/index.db

Here follows the code:

# CODE for init.rb
require 'acts_as_ferret'
# END init.rb
# Copyright (c) 2006 Kasper Weibel Nielsen-Refs

# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
        
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
        
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
     module ARFerret #:nodoc:

        def self.append_features(base)
           super
           base.extend(MacroMethods)
        end

        # declare the class level helper methods
        # which will load the relevant instance methods defined below when invoked

        module MacroMethods

           def acts_as_ferret
              extend FerretMixin::Acts::ARFerret::ClassMethods
              class_eval do
                 include FerretMixin::Acts::ARFerret::ClassMethods

                 after_create :ferret_create
                 after_update :ferret_update
                 after_destroy :ferret_destroy
              end
           end

        end

        module ClassMethods
           include Ferret

           INDEX_DIR = "#{RAILS_ROOT}/db/index.db"

           def self.reloadable?; false end

           # Finds instances whose indexed contents match the query.
           def find_by_contents(query, options = {})
              index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
              query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
              query = query_parser.parse(query + " +ferret_table:#{self.table_name}")

              result = []
              index_searcher.search_each(query) do |doc, score|
                 id = index_searcher.reader.get_document(doc)[:id]
                 res = self.find(id)
                 result << res if res
              end
              return result
           end

           # private

           def ferret_create
              # code to update or add to the index
              index ||= Index::Index.new(:key => [:id, :ferret_table],
                                         :path => INDEX_DIR,
                                         :auto_flush => true)
              index << self.to_doc
           end
           alias :ferret_update :ferret_create

           def ferret_destroy
              # code to delete from index
              index ||= Index::Index.new(:key => [:id, :ferret_table],
                                         :path => INDEX_DIR,
                                         :auto_flush => true)
              index.query_delete("+id:#{self.id} +ferret_table:#{self.table_name}")
           end

           def to_doc
              # Churn through the complete Active Record and add it to the Ferret document
              doc = Ferret::Document::Document.new
              doc << Ferret::Document::Field.new(:ferret_table, self.table_name, Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::UNTOKENIZED)
              self.attributes.each_pair do |key,val|
                 if key.to_s == 'id' # attributes has String keys, so `key == :id` never matched
                    doc << Ferret::Document::Field.new(key, val.to_s, Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::UNTOKENIZED)
                 else
                    doc << Ferret::Document::Field.new(key, val.to_s, Ferret::Document::Field::Store::NO, Ferret::Document::Field::Index::TOKENIZED)
                 end
              end
              return doc
           end
        end
     end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb
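Stripped of Rails and Ferret, the wiring above is a standard Ruby macro pattern: including the mixin into the base class adds a class-level macro, and calling that macro in a model mixes in the instance methods and registers the callbacks. The sketch below is a minimal, runnable illustration under stated assumptions: `FakeRecordBase` is a hypothetical stand-in for ActiveRecord::Base that supports only two callbacks, `acts_as_indexed` is a hypothetical macro name, and it uses Ruby's `included` hook rather than overriding `append_features`. Unlike the plugin, which both extends and includes a single ClassMethods module, it keeps class-level and instance-level methods in separate modules.

```ruby
# Hypothetical sketch of the acts_as_* macro pattern; no Rails or Ferret needed.
INDEX_LOG = [] # records what the fake "index" callbacks did

module IndexMixin
  # Ruby calls this when the module is included into a class
  # (the plugin overrides append_features, which runs at the same point).
  def self.included(base)
    base.extend(MacroMethods)
  end

  module MacroMethods
    # The class-level macro: mix in instance methods and register callbacks.
    def acts_as_indexed
      include InstanceMethods
      after_create :index_create
      after_destroy :index_destroy
    end
  end

  module InstanceMethods
    def index_create
      INDEX_LOG << "indexed #{id}"
    end

    def index_destroy
      INDEX_LOG << "removed #{id}"
    end
  end
end

# Hypothetical stand-in for ActiveRecord::Base with just two callback hooks.
class FakeRecordBase
  attr_reader :id

  # Per-class registry: callback kind => list of method names
  def self.callbacks
    @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def self.after_create(name)
    callbacks[:after_create] << name
  end

  def self.after_destroy(name)
    callbacks[:after_destroy] << name
  end

  def initialize(id)
    @id = id
  end

  def save
    self.class.callbacks[:after_create].each { |cb| send(cb) }
    self
  end

  def destroy
    self.class.callbacks[:after_destroy].each { |cb| send(cb) }
    self
  end
end

# Like reopening ActiveRecord::Base: every subclass can now call the macro.
FakeRecordBase.class_eval do
  include IndexMixin
end

class Foo < FakeRecordBase
  acts_as_indexed
end

Foo.new(1).save
Foo.new(2).save.destroy
p INDEX_LOG # => ["indexed 1", "indexed 2", "removed 2"]
```

Models that never call the macro get neither the callbacks nor the indexing methods, which is why the plugin can safely be included into every ActiveRecord class.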

Alternate Version by Thomas Lockney

The code listed above has a few issues as discussed in this email thread. I've been working on some enhancements, but it's still a work in progress. Here's the code I have so far. There are definitely bugs, but I'll update the code here as I work through them and add other features.

A couple of notes about this implementation:

  • The class based querying is broken, but then again so is the implementation in the code listed above.
  • It would be nice to allow for the use of both the filesystem based indexing AND the in-memory approach, but currently I only allow for a string path to the index. I think this should be a straightforward fix, but it's not in there yet.
  • I'm still working on implementing the code that allows for passing a Query object to the find_by_contents method.
  • There are certainly a lot of other options for the index that need to be allowed for. I'm thinking that this could be implemented as a hash that can be set in environment.rb and then overridden in the case of per-class indexes.
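The configuration idea in the last two notes could look something like the sketch below: one hash of global defaults (which could be assigned in environment.rb) merged with the options passed for a particular class, plus a check that accepts either a directory path or an in-memory marker. All names here (`IndexConfig`, the `:memory` marker, the option keys) are hypothetical, and the sketch only decides where the index should live; it does not construct a Ferret index.

```ruby
# Hypothetical configuration helper; none of these names are acts_as_ferret API.
module IndexConfig
  # Global defaults, e.g. assigned once in config/environment.rb.
  DEFAULTS = { index_dir: "db/index", auto_flush: true, fields: :all }.freeze

  # Per-class options win over the global defaults; neither hash is mutated.
  def self.for_class(options = {})
    DEFAULTS.merge(options)
  end

  # A String means a directory on disk; the symbol :memory means in-memory.
  def self.storage(config)
    case config[:index_dir]
    when :memory then [:memory, nil]
    when String  then [:filesystem, config[:index_dir]]
    else raise ArgumentError, "unsupported :index_dir #{config[:index_dir].inspect}"
    end
  end
end

post_config = IndexConfig.for_class(fields: [:title, :body])
p IndexConfig.storage(post_config)                                # => [:filesystem, "db/index"]
p IndexConfig.storage(IndexConfig.for_class(index_dir: :memory))  # => [:memory, nil]
```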

--Thomas Lockney

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:
        mattr_accessor :index_dir
        @@index_dir ||= "#{RAILS_ROOT}/index"

        def self.append_features(base)
          super
          base.extend(MacroMethods)
        end

        # declare the class level helper methods
        # which will load the relevant instance methods defined below when invoked
        module MacroMethods
        
          def define_to_field_method(field, options = {})
            # Fully qualified: Field cannot be resolved from inside MacroMethods
            default_opts = { :store => Ferret::Document::Field::Store::YES,
                             :index => Ferret::Document::Field::Index::UNTOKENIZED,
                             :term_vector => Ferret::Document::Field::TermVector::NO,
                             :binary => false,
                             :boost => 1.0 }
            default_opts.update(options) if options.is_a?(Hash)
            fields_for_ferret << field
            # defines e.g. title_to_ferret, returning a Ferret field for that attribute
            define_method("#{field}_to_ferret") do
              val = self[field] || self.instance_variable_get("@#{field}")
              logger.debug("Adding field #{field} with value '#{val}' to index")
              Ferret::Document::Field.new(field.to_s,
                                          val.to_s,
                                          default_opts[:store],
                                          default_opts[:index],
                                          default_opts[:term_vector],
                                          default_opts[:binary],
                                          default_opts[:boost])
            end
          end

          def acts_as_ferret(options={})
            configuration = {:fields => :all, :index_dir => FerretMixin::Acts::ARFerret::index_dir}
            configuration.update(options) if options.is_a?(Hash)
            extend FerretMixin::Acts::ARFerret::SingletonMethods
            # Block form (not a string) so the local `configuration` is visible inside
            class_eval do
              include FerretMixin::Acts::ARFerret::SingletonMethods

              after_create :ferret_create
              after_update :ferret_update
              after_destroy :ferret_destroy

              cattr_accessor :fields_for_ferret
              cattr_accessor :class_index_dir

              # Use the generated writers: @@vars in a block would bind to MacroMethods
              self.fields_for_ferret = Array.new
              self.class_index_dir = configuration[:index_dir]

              if configuration[:fields].respond_to?(:each_pair)
                configuration[:fields].each_pair do |key,val|
                  define_to_field_method(key,val)
                end
              elsif configuration[:fields].respond_to?(:each)
                configuration[:fields].each do |field|
                  define_to_field_method(field)
                end
              else
                # need to handle :all case
              end
            end
          end

        end

        module SingletonMethods
          include Ferret         

          def self.reloadable?; false end
          
          def ferret_index
            @@index ||= Index::Index.new(:key => [:id, :ferret_class],
                                         :path => class_index_dir,
                                         :auto_flush => true,
                                         :create_if_missing => true)                                            
          end  
          
          # Finds instances whose indexed contents match the query.
          def find_by_contents(q, options = {})
            # search the directory this class indexes into, not the global default
            index_searcher ||= Search::IndexSearcher.new(class_index_dir)
            query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
            query = Search::BooleanQuery.new
            if (q.is_a?(Search::Query))
              query << Search::BooleanClause.new(q)
            else
              query << Search::BooleanClause.new(query_parser.parse(q))
            end
            # self is the model class here, so use self.name (self.class.name is "Class")
            query << Search::BooleanClause.new(Search::TermQuery.new(Index::Term.new("ferret_class", self.name)))

            result = []
            index_searcher.search_each(query) do |doc, score|
              id = index_searcher.reader.get_document(doc)["id"]
              res = self.find(id)
              result << res
            end
            return result
          end
          
          def ferret_create
            ferret_index << self.to_doc
          end
          alias :ferret_update :ferret_create
          
          def ferret_destroy
            # code to delete from index
            begin
              ferret_index.query_delete("+id:#{self.id} +ferret_class:#{self.class.name}")
            rescue
              logger.warn("Could not find indexed value for this object")
            end
          end
          
          def to_doc
            # Build a Ferret document from the configured fields only
            doc = Document::Document.new
            # store the class name for every item indexed
            doc << Document::Field.new("ferret_class", "#{self.class.name}", Document::Field::Store::YES, Document::Field::Index::UNTOKENIZED)
            # store the id of each item (as a string, like every other field value)
            doc << Document::Field.new("id", self.id.to_s, Document::Field::Store::YES, Document::Field::Index::UNTOKENIZED)
            # iterate through the fields and add them to the document
            fields_for_ferret.each do |field|
                doc << self.send("#{field}_to_ferret")
            end
            return doc
          end
      
        end
     end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb
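The `define_to_field_method` approach generates one converter method per configured field with `define_method`, and `to_doc` then calls each of them. Here is a dependency-free sketch of the same pattern, under assumptions: `IndexedField` is a hypothetical Struct standing in for `Ferret::Document::Field`, the macro is given the hypothetical name `index_field`, option values are plain symbols rather than Ferret constants, and attributes are read through public readers rather than `self[field]`.

```ruby
# Hypothetical stand-in for Ferret::Document::Field.
IndexedField = Struct.new(:name, :value, :store, :indexing, :boost)

module FieldMacros
  # Mirrors the defaults in define_to_field_method, as plain symbols.
  DEFAULT_FIELD_OPTS = { store: :yes, indexing: :untokenized, boost: 1.0 }.freeze

  # Ordered list of fields configured on this class.
  def fields_for_index
    @fields_for_index ||= []
  end

  # Generates "<field>_to_index", closing over this field's merged options.
  def index_field(field, options = {})
    opts = DEFAULT_FIELD_OPTS.merge(options)
    fields_for_index << field
    define_method("#{field}_to_index") do
      IndexedField.new(field.to_s, public_send(field).to_s,
                       opts[:store], opts[:indexing], opts[:boost])
    end
  end
end

class Article
  extend FieldMacros
  attr_accessor :id, :title, :body

  index_field :title, indexing: :tokenized, boost: 2.0
  index_field :body, store: :no, indexing: :tokenized

  def initialize(id, title, body)
    @id, @title, @body = id, title, body
  end

  # Collects one field per configured attribute, in declaration order.
  def to_doc
    self.class.fields_for_index.map { |f| public_send("#{f}_to_index") }
  end
end

Article.new(1, "Ferret", "Full text search").to_doc.each do |f|
  puts "#{f.name}: #{f.value.inspect} (#{f.indexing}, boost #{f.boost})"
end
# title: "Ferret" (tokenized, boost 2.0)
# body: "Full text search" (tokenized, boost 1.0)
```

Each generated method closes over its own merged options hash, so per-field settings such as the boost are fixed when the macro runs and reused every time `to_doc` builds a document.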

Third Version by Jens Kraemer - integrating Ferret with Typo

I just integrated Ferret into my Typo installation, using the acts_as_ferret implementations above as a starting point. See this post for more information and the code.