Ferret On Rails
acts_as_ferret plugin
The primary source for information on acts_as_ferret is now http://projects.jkraemer.net/acts_as_ferret/wiki.
Please treat the information below as reference material only.
======
SVN repository and a simple demo project
The combined efforts of Kasper Weibel, Thomas Lockney and Jens Kraemer on acts_as_ferret were unified by Jens in February 2006 and put into an SVN repository.
You should still feel free to use the code below as inspiration, but the intention is to use the SVN repository as the main acts_as_ferret source. This should minimize the confusion of having 3 separate, although very similar, versions of acts_as_ferret in the public domain.
You can use
script/plugin install https://svn.jkraemer.net/svn/projects/ferret-demo/trunk/vendor/plugins/acts_as_ferret/
for easy installation of the plugin. You'll get a version based upon those below, with various changes made by Jens.
The whole demo project (containing a simple model class, scaffolded CRUD GUI and a search form) is available at https://svn.jkraemer.net/svn/projects/ferret-demo/trunk/ .
UTF-8
For acts_as_ferret Unicode support, see Albert Delamednoll's code example on his blog.
Original code by Kasper Weibel
This code was taken from an email on the rails mailing list by Kasper Weibel. It has been modified so that it will work on multiple ActiveRecord objects. It hasn't been thoroughly tested yet.
The result is the acts_as_ferret Mixin for ActiveRecord.
Use it as follows: in any model.rb, add acts_as_ferret:
class Foo < ActiveRecord::Base
  acts_as_ferret
end
All CRUD operations will be performed on both ActiveRecord (as usual) and a Ferret index for further searching.
The following class method is then available on the model (Foo in the example above), for instance from your controllers:
Foo.find_by_contents(query) # query is a string representing your query
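The mirroring idea can be sketched without Rails or Ferret. What follows is only a minimal illustration, not the plugin's code: ToyIndex, Article, ROWS and INDEX are hypothetical stand-ins, save and destroy play the role of the after_create / after_update / after_destroy callbacks, and a naive substring match stands in for a real Ferret query.

```ruby
# Toy stand-in for a search index. Not a Ferret API.
class ToyIndex
  def initialize
    @docs = {} # [table, id] => downcased text
  end

  # Adding the same [table, id] twice overwrites the entry (acts as an update).
  def add(table, id, text)
    @docs[[table, id]] = text.to_s.downcase
  end

  def delete(table, id)
    @docs.delete([table, id])
  end

  # Returns the ids from +table+ whose text contains every word of +query+.
  def search(table, query)
    words = query.downcase.split
    hits = @docs.select do |(doc_table, _id), text|
      doc_table == table && words.all? { |word| text.include?(word) }
    end
    hits.keys.map(&:last)
  end
end

# Toy model: every write goes to the "database" (ROWS) and to the index.
class Article
  INDEX = ToyIndex.new
  ROWS  = {} # id => Article, stands in for the database table

  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def save
    ROWS[id] = self
    INDEX.add("articles", id, title) # what ferret_create / ferret_update do
    self
  end

  def destroy
    ROWS.delete(id)
    INDEX.delete("articles", id)     # what ferret_destroy does
    self
  end

  # Search the index first, then load the matching records by id.
  def self.find_by_contents(query)
    INDEX.search("articles", query).map { |id| ROWS[id] }.compact
  end
end

Article.new(1, "Ferret on Rails").save
Article.new(2, "Searching with Lucene").save
puts Article.find_by_contents("ferret").map(&:title).inspect
```

The index only hands back ids, and the records themselves are loaded from the primary store afterwards, which is the same two-step lookup that find_by_contents performs in the listings on this page.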
The plugin follows the usual plugin structure and consists of 2 files:
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/init.rb
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb
The Ferret DB is stored in:
{RAILS_ROOT}/db/index.db
Here follows the code:
# CODE for init.rb
require 'acts_as_ferret'
# END init.rb
# Copyright (c) 2006 Kasper Weibel Nielsen-Refs

# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:

# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:

      def self.append_features(base)
        super
        base.extend(MacroMethods)
      end

      # declare the class level helper methods
      # which will load the relevant instance methods defined below when invoked
      module MacroMethods

        def acts_as_ferret
          extend FerretMixin::Acts::ARFerret::ClassMethods
          class_eval do
            include FerretMixin::Acts::ARFerret::ClassMethods

            after_create :ferret_create
            after_update :ferret_update
            after_destroy :ferret_destroy
          end
        end

      end

      module ClassMethods
        include Ferret

        INDEX_DIR = "#{RAILS_ROOT}/db/index.db"

        def self.reloadable?; false end

        # Finds instances by file contents.
        def find_by_contents(query, options = {})
          index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
          query_parser ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
          query = query_parser.parse(query + " +ferret_table:#{self.table_name}")

          result = []
          index_searcher.search_each(query) do |doc, score|
            id = index_searcher.reader.get_document(doc)[:id]
            res = self.find(id)
            result << res if res
          end
          return result
        end

        # private

        def ferret_create
          # code to update or add to the index
          index ||= Index::Index.new(:key => [:id, :ferret_table],
                                     :path => INDEX_DIR,
                                     :auto_flush => true)
          index << self.to_doc
        end
        alias :ferret_update :ferret_create

        def ferret_destroy
          # code to delete from index
          index ||= Index::Index.new(:key => [:id, :ferret_table],
                                     :path => INDEX_DIR,
                                     :auto_flush => true)
          index.query_delete("+id:#{self.id} +ferret_table:#{self.table_name}")
        end

        def to_doc
          # Churn through the complete Active Record and add it to the Ferret document
          doc = Ferret::Document::Document.new
          doc << Ferret::Document::Field.new(:ferret_table, self.table_name,
                                             Ferret::Document::Field::Store::YES,
                                             Ferret::Document::Field::Index::UNTOKENIZED)
          self.attributes.each_pair do |key,val|
            if key == :id
              doc << Ferret::Document::Field.new(key, val.to_s,
                                                 Ferret::Document::Field::Store::YES,
                                                 Ferret::Document::Field::Index::UNTOKENIZED)
            else
              doc << Ferret::Document::Field.new(key, val.to_s,
                                                 Ferret::Document::Field::Store::NO,
                                                 Ferret::Document::Field::Index::TOKENIZED)
            end
          end
          return doc
        end

      end
    end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end
# END acts_as_ferret.rb
Alternate Version by Thomas Lockney
The code listed above has a few issues as discussed in this email thread. I've been working on some enhancements, but it's still a work in progress. Here's the code I have so far. There are definitely bugs, but I'll update the code here as I work through them and add other features.
A couple of notes about this implementation:
- The class-based querying is broken, but then again so is the implementation in the code listed above.
- It would be nice to allow for the use of both the filesystem-based indexing AND the in-memory approach, but currently I only allow for a string path to the index. I think this should be a straightforward fix, but it's not in there yet.
- I'm still working on implementing the code that allows for passing a Query object to the find_by_contents method.
- There are certainly a lot of other options for the index that need to be allowed for. I'm thinking that this could be implemented as a hash that can be set in environment.rb and then overridden in the case of per-class indexes.
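The last point, global defaults that individual classes can override, could look roughly like the sketch below. It is only an illustration of the merging idea: DEFAULT_INDEX_OPTIONS and index_options_for are hypothetical names that do not exist in any version of the plugin, and the option keys are simply the ones already passed to Index::Index.new in the listings on this page.

```ruby
# Hypothetical global defaults, e.g. something that could be set in environment.rb.
DEFAULT_INDEX_OPTIONS = {
  :path              => "index",
  :auto_flush        => true,
  :create_if_missing => true
}.freeze

# Hypothetical helper: returns the defaults with per-class overrides applied.
# Anything that is not a Hash is ignored, mirroring the options.is_a?(Hash)
# guard used in the listing below.
def index_options_for(overrides = {})
  overrides = {} unless overrides.is_a?(Hash)
  DEFAULT_INDEX_OPTIONS.merge(overrides) # merge returns a new Hash
end

# A class with its own index directory overrides just the path.
article_options = index_options_for(:path => "index/articles")
puts article_options.inspect
```

Using merge rather than update leaves the shared defaults untouched, so one class's overrides cannot leak into another class's index settings.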
# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:
      mattr_accessor :index_dir
      @@index_dir ||= "#{RAILS_ROOT}/index"

      def self.append_features(base)
        super
        base.extend(MacroMethods)
      end

      # declare the class level helper methods
      # which will load the relevant instance methods defined below when invoked
      module MacroMethods

        def define_to_field_method(field, options = {})
          default_opts = { :store => Field::Store::YES,
                           :index => Field::Index::UNTOKENIZED,
                           :term_vector => Field::TermVector::NO,
                           :binary => false,
                           :boost => 1.0}
          default_opts.update(options) if options.is_a?(Hash)
          fields_for_ferret << field
          define_method ("#{field}_to_ferret".to_sym) do
            val = self[field] || self.instance_variable_get("@#{field.to_s}".to_sym)
            logger.debug("Adding field #{field} with value '#{val}' to index")
            Ferret::Document::Field.new(field.to_s, val,
                                        default_opts[:store],
                                        default_opts[:index],
                                        default_opts[:term_vector],
                                        default_opts[:binary],
                                        default_opts[:boost])
          end
        end

        def acts_as_ferret(options={})
          configuration = {:fields => :all, :index_dir => FerretMixin::Acts::ARFerret::index_dir}
          configuration.update(options) if options.is_a?(Hash)

          extend FerretMixin::Acts::ARFerret::SingletonMethods

          class_eval <<-EOV
            include FerretMixin::Acts::ARFerret::SingletonMethods

            after_create :ferret_create
            after_update :ferret_update
            after_destroy :ferret_destroy

            cattr_accessor :fields_for_ferret
            cattr_accessor :class_index_dir

            @@fields_for_ferret = Array.new
            @@class_index_dir = configuration[:index_dir]

            # private

            if configuration[:fields].respond_to?(:each_pair)
              configuration[:fields].each_pair do |key,val|
                define_to_field_method(key,val)
              end
            elsif configuration[:fields].respond_to?(:each)
              configuration[:fields].each do |field|
                define_to_field_method(field)
              end
            else
              #need to handle :all case
            end
          EOV
        end

      end

      module SingletonMethods
        include Ferret

        def self.reloadable?; false end

        def ferret_index
          @@index ||= Index::Index.new(:key => [:id, :ferret_class],
                                       :path => class_index_dir,
                                       :auto_flush => true,
                                       :create_if_missing => true)
        end

        # Finds instances by file contents.
        def find_by_contents(q, options = {})
          index_searcher ||= Search::IndexSearcher.new(FerretMixin::Acts::ARFerret::index_dir)
          query_parser ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
          query = Search::BooleanQuery.new
          if (q.is_a?(Search::Query))
            query << Search::BooleanClause.new(q)
          else
            query << Search::BooleanClause.new(query_parser.parse(q))
          end
          query << Search::BooleanClause.new(Search::TermQuery.new(Index::Term.new("ferret_class", self.class.name)))

          result = []
          index_searcher.search_each(query) do |doc, score|
            id = index_searcher.reader.get_document(doc)["id"]
            res = self.find(id)
            result << res
          end
          return result
        end

        def ferret_create
          ferret_index << self.to_doc
        end
        alias :ferret_update :ferret_create

        def ferret_destroy
          # code to delete from index
          begin
            ferret_index.query_delete("+id:#{self.id} +ferret_class:#{self.class.name}")
          rescue
            logger.warn("Could not find indexed value for this object")
          end
        end

        def to_doc
          # Churn through the complete Active Record and add it to the Ferret document
          doc = Document::Document.new
          # store the table_name for every item indexed
          doc << Document::Field.new("ferret_class", "#{self.class.name}",
                                     Document::Field::Store::YES,
                                     Document::Field::Index::UNTOKENIZED)
          # store the id of each item
          doc << Document::Field.new("id", self.id,
                                     Document::Field::Store::YES,
                                     Document::Field::Index::UNTOKENIZED)
          # iterate through the fields and add them to the document
          fields_for_ferret.each do |field|
            doc << self.send("#{field}_to_ferret")
          end
          return doc
        end

      end
    end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end
# END acts_as_ferret.rb
Third Version by Jens Kraemer - integrating Ferret with Typo
I just integrated Ferret into my Typo installation, using the acts_as_ferret implementations above as a starting point. See this post for more info and the code.
