jkraemer edited this page Sep 13, 2010 · 1 revision

Prior Versions of this plugin

These versions are kept here mainly for historical reasons; please use the svn repository instead.

Original code by Kasper Weibel

This code was originally published by Kasper Weibel in an email to the Ruby on Rails mailing list. It has been modified so that several ActiveRecord models can share one index, but it hasn’t been thoroughly tested yet. The result is the acts_as_ferret mixin for ActiveRecord. To use it, add acts_as_ferret to any model class:
class Foo < ActiveRecord::Base
  acts_as_ferret
end
Create, update and destroy operations are then applied both to the database through ActiveRecord (as usual) and to a Ferret index used for searching. The model also gains a class method that you can call from your controllers:
Foo.find_by_contents(query) # query is a string in Ferret's query syntax
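
The wiring can be illustrated without Ferret or Rails. In the pure-Ruby sketch below, `TinyIndex` and `Post` are hypothetical stand-ins: a Hash plays the Ferret index, and `save`/`destroy` call the same hook names that acts_as_ferret registers as `after_create`, `after_update` and `after_destroy` callbacks. It only shows how the index is kept in sync with the records; it is not the plugin's code.

```ruby
# Hypothetical stand-in for the Ferret index: id => list of lowercase words.
class TinyIndex
  def initialize
    @docs = {}
  end

  # Re-adding an id replaces the old entry, so create and update share one hook.
  def add(id, text)
    @docs[id] = text.downcase.split
  end

  def delete(id)
    @docs.delete(id)
  end

  # Ids of all documents containing +word+.
  def search(word)
    @docs.select { |_id, words| words.include?(word.downcase) }.keys.sort
  end
end

# Hypothetical model: save and destroy fire the hooks that acts_as_ferret
# registers as after_create/after_update/after_destroy callbacks.
class Post
  INDEX = TinyIndex.new
  @records = {}

  class << self
    attr_reader :records

    # Search the index, then load the matching records (like find_by_contents).
    def find_by_contents(word)
      INDEX.search(word).map { |id| records[id] }.compact
    end
  end

  attr_reader :id, :body

  def initialize(id, body)
    @id, @body = id, body
  end

  def save
    self.class.records[id] = self   # stands in for the database write
    ferret_create                   # after_create / after_update
    self
  end

  def destroy
    self.class.records.delete(id)
    ferret_destroy                  # after_destroy
  end

  private

  def ferret_create
    INDEX.add(id, body)
  end
  alias_method :ferret_update, :ferret_create

  def ferret_destroy
    INDEX.delete(id)
  end
end

Post.new(1, 'Ferret indexes text').save
Post.new(2, 'Rails models').save
p Post.find_by_contents('ferret').map(&:id)   # => [1]
```

Because the index is only touched from the callbacks, a record written without going through them (for example with raw SQL) would not be indexed.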

The plugin follows the usual plugin structure and consists of 2 files:

{RAILS_ROOT}/vendor/plugins/acts_as_ferret/init.rb
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb

The Ferret DB is stored in:

{RAILS_ROOT}/db/index.db

(Does this hurt scalability when several round-robin servers don't share a common disk? Would it be too intensive to keep the index in a central DB table instead?)
Here is the code:

# CODE for init.rb
require 'acts_as_ferret'
# END init.rb
# Copyright (c) 2006 Kasper Weibel Nielsen-Refs
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'
module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:
      def self.append_features(base)
        super
        base.extend(MacroMethods)
      end
      # declare the class level helper methods
      # which will load the relevant instance methods defined below when invoked
      module MacroMethods
        def acts_as_ferret
          extend FerretMixin::Acts::ARFerret::ClassMethods
          class_eval do
            include FerretMixin::Acts::ARFerret::ClassMethods
            after_create :ferret_create
            after_update :ferret_update
            after_destroy :ferret_destroy
          end
        end
      end
      module ClassMethods
        include Ferret
        INDEX_DIR = "#{RAILS_ROOT}/db/index.db"
        def self.reloadable?; false end
        # Finds instances by file contents.
        def find_by_contents(query, options = {})
          index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
          query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
          query = query_parser.parse(query + " +ferret_table:#{self.table_name}")
          result = []
          index_searcher.search_each(query) do |doc, score|
            id = index_searcher.reader.get_document(doc)[:id]
            res = self.find(id)
            result << res if res
          end
          return result
        end
        # private
        def ferret_create
          # code to update or add to the index
          index ||= Index::Index.new(:key => [:id, :ferret_table],
                                     :path => INDEX_DIR,
                                     :auto_flush => true)
          index << self.to_doc
        end
        alias :ferret_update :ferret_create
        def ferret_destroy
          # code to delete from index
          index ||= Index::Index.new(:key => [:id, :ferret_table],
                                     :path => INDEX_DIR,
                                     :auto_flush => true)
          index.query_delete("+id:#{self.id} +ferret_table:#{self.table_name}")
        end
        def to_doc
          # Churn through the complete Active Record and add it to the Ferret document
          doc = Ferret::Document::Document.new
          doc << Ferret::Document::Field.new(:ferret_table, self.table_name, Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::UNTOKENIZED)
          self.attributes.each_pair do |key,val|
            if key == :id
              doc << Ferret::Document::Field.new(key, val.to_s, Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::UNTOKENIZED)
            else
              doc << Ferret::Document::Field.new(key, val.to_s, Ferret::Document::Field::Store::NO, Ferret::Document::Field::Index::TOKENIZED)
            end
          end
          return doc
        end
      end
    end
  end
end
# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end
# END acts_as_ferret.rb
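
What lets the code above serve several models from a single index is the ferret_table field: it is part of the index key ([:id, :ferret_table]) and every query gets +ferret_table:<table> appended. The pure-Ruby sketch below shows why both parts matter. `SharedIndex` is a hypothetical stand-in (a Hash in place of the Ferret index), and it assumes, as Ferret's :key option is meant to do, that adding a document whose key fields match an existing one replaces it.

```ruby
# Hypothetical stand-in for one Ferret index shared by several models.
# Each document is a Hash of fields; key_fields plays the role of :key.
class SharedIndex
  def initialize(key_fields)
    @key_fields = key_fields
    @docs = {}
  end

  # A document whose key fields match an existing one replaces it.
  def <<(doc)
    @docs[doc.values_at(*@key_fields)] = doc
    self
  end

  def size
    @docs.size
  end

  # Mimics "<word> +ferret_table:<table>": a hit must contain the word
  # AND belong to the given table.
  def search(word, table)
    @docs.values
         .select { |d| d[:ferret_table] == table && d[:text].split.include?(word) }
         .map { |d| d[:id] }
  end
end

post    = { :id => '1', :ferret_table => 'posts',    :text => 'ferret search' }
comment = { :id => '1', :ferret_table => 'comments', :text => 'ferret reply' }

by_id = SharedIndex.new([:id])
by_id << post << comment
p by_id.size                                   # => 1 (the comment replaced the post)

by_id_and_table = SharedIndex.new([:id, :ferret_table])
by_id_and_table << post << comment
p by_id_and_table.size                         # => 2
p by_id_and_table.search('ferret', 'posts')    # => ["1"]
```

With only id as the key, Post 1 and Comment 1 overwrite each other; the table filter in turn keeps a Comment from being returned (and passed to Post.find) when searching posts.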

Alternate Version by Thomas Lockney

The code listed above has a few issues, as discussed in this email thread. I’ve been working on some enhancements, but it’s still a work in progress. Here’s the code I have so far. There are definitely bugs, but I’ll update the code here as I work through them and add other features.

A couple of notes about this implementation:

  • The class-based querying is broken, but then again so is the implementation in the code listed above.
  • It would be nice to allow for the use of both the filesystem-based indexing AND the in-memory approach, but currently I only allow for a string path to the index. I think this should be a straightforward fix, but it’s not in there yet.
  • I’m still working on implementing the code that allows for passing a Query object to the find_by_contents method.
  • There are certainly a lot of other options for the index that need to be allowed for. I’m thinking that this could be implemented as a hash that can be set in environment.rb and then overridden in the case of per-class indexes.
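
The last note suggests a hash of index options that is set in environment.rb and overridden per class. The code below already does this for the index directory (the index_dir module attribute merged into configuration). A sketch of the generalised idea in plain Ruby follows; `FerretDefaults`, `IndexConfig` and `configure_index` are hypothetical names, and the option keys merely mirror those passed to Index::Index.new in the code.

```ruby
# Hypothetical global defaults; environment.rb could adjust these, e.g.
#   FerretDefaults.options[:index_dir] = "/shared/index"
module FerretDefaults
  @options = { :index_dir => 'index', :auto_flush => true, :create_if_missing => true }

  class << self
    attr_reader :options
  end
end

# Hypothetical class macro: per-class settings override the global ones.
module IndexConfig
  def configure_index(overrides = {})
    # merge builds a new Hash, so the global defaults are never modified
    @index_options = FerretDefaults.options.merge(overrides)
  end

  attr_reader :index_options
end

class Article
  extend IndexConfig
  configure_index
end

class Comment
  extend IndexConfig
  configure_index :index_dir => 'index/comments', :auto_flush => false
end

p Article.index_options[:index_dir]   # => "index"
p Comment.index_options[:index_dir]   # => "index/comments"
```

Because the merge happens when the class body runs, any change made in environment.rb has to happen before the model classes are loaded.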

Thomas Lockney

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'
module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:
      mattr_accessor :index_dir
      @@index_dir ||= "#{RAILS_ROOT}/index"
      def self.append_features(base)
        super
        base.extend(MacroMethods)
      end
      # declare the class level helper methods
      # which will load the relevant instance methods defined below when invoked
      module MacroMethods
        def define_to_field_method(field, options = {})
          # Ferret is not included in this module, so the constants need their full path
          default_opts = { :store => Ferret::Document::Field::Store::YES,
                           :index => Ferret::Document::Field::Index::UNTOKENIZED,
                           :term_vector => Ferret::Document::Field::TermVector::NO,
                           :binary => false,
                           :boost => 1.0}
          default_opts.update(options) if options.is_a?(Hash)
          fields_for_ferret << field
          define_method("#{field}_to_ferret".to_sym) do
            val = self[field] || self.instance_variable_get("@#{field.to_s}".to_sym)
            logger.debug("Adding field #{field} with value '#{val}' to index")
            Ferret::Document::Field.new(field.to_s,
                                        val,
                                        default_opts[:store],
                                        default_opts[:index],
                                        default_opts[:term_vector],
                                        default_opts[:binary],
                                        default_opts[:boost])
          end
        end
        def acts_as_ferret(options={})
          configuration = {:fields => :all, :index_dir => FerretMixin::Acts::ARFerret::index_dir}
          configuration.update(options) if options.is_a?(Hash)
          extend FerretMixin::Acts::ARFerret::SingletonMethods
          class_eval <<-EOV
            include FerretMixin::Acts::ARFerret::SingletonMethods
            after_create :ferret_create
            after_update :ferret_update
            after_destroy :ferret_destroy
            cattr_accessor :fields_for_ferret
            cattr_accessor :class_index_dir
            @@fields_for_ferret = Array.new
            @@class_index_dir = configuration[:index_dir]
            # private
            if configuration[:fields].respond_to?(:each_pair)
              configuration[:fields].each_pair do |key,val|
                define_to_field_method(key,val)
              end
            elsif configuration[:fields].respond_to?(:each)
              configuration[:fields].each do |field|
                define_to_field_method(field)
              end
            else
              #need to handle :all case
            end
          EOV
        end
      end
      module SingletonMethods
        include Ferret
        def self.reloadable?; false end
        def ferret_index
          @@index ||= Index::Index.new(:key => [:id, :ferret_class],
                                       :path => class_index_dir,
                                       :auto_flush => true,
                                       :create_if_missing => true)
        end
        # Finds instances by file contents.
        def find_by_contents(q, options = {})
          index_searcher ||= Search::IndexSearcher.new(FerretMixin::Acts::ARFerret::index_dir)
          query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
          query = Search::BooleanQuery.new
          if (q.is_a?(Search::Query))
            query << Search::BooleanClause.new(q)
          else
            query << Search::BooleanClause.new(query_parser.parse(q))
          end
          query << Search::BooleanClause.new(Search::TermQuery.new(Index::Term.new("ferret_class", self.class.name)))
          result = []
          index_searcher.search_each(query) do |doc, score|
            id = index_searcher.reader.get_document(doc)["id"]
            res = self.find(id)
            result << res
          end
          return result
        end
        def ferret_create
          ferret_index << self.to_doc
        end
        alias :ferret_update :ferret_create
        def ferret_destroy
          # code to delete from index
          begin
            ferret_index.query_delete("+id:#{self.id} +ferret_class:#{self.class.name}")
          rescue
            logger.warn("Could not find indexed value for this object")
          end
        end
        def to_doc
          # Churn through the complete Active Record and add it to the Ferret document
          doc = Document::Document.new
          # store the table_name for every item indexed
          doc << Document::Field.new("ferret_class", "#{self.class.name}", Document::Field::Store::YES, Document::Field::Index::UNTOKENIZED)
          # store the id of each item
          doc << Document::Field.new("id", self.id, Document::Field::Store::YES, Document::Field::Index::UNTOKENIZED)
          # iterate through the fields and add them to the document
          fields_for_ferret.each do |field|
            doc << self.send("#{field}_to_ferret")
          end
          return doc
        end
      end
    end
  end
end
# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end
# END acts_as_ferret.rb
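
This version builds each document from per-field methods: acts_as_ferret(:fields => ...) calls define_to_field_method, which merges the per-field options over a set of defaults and uses define_method to create a <field>_to_ferret method; to_doc then calls those methods in order. The pattern can be tried without Ferret or Rails in the sketch below. `FieldMacros`, `index_fields`, `Page` and the Hash used in place of Ferret::Document::Field are hypothetical stand-ins, not part of the plugin.

```ruby
# Hypothetical sketch of the define_to_field_method pattern, using a plain
# Hash in place of Ferret::Document::Field so it runs without Ferret.
module FieldMacros
  DEFAULT_FIELD_OPTS = { :store => :yes, :index => :untokenized, :boost => 1.0 }

  # Names of the fields to index, in declaration order.
  def fields_for_index
    @fields_for_index ||= []
  end

  # Defines e.g. #title_to_field, which turns the attribute into a field Hash.
  def define_to_field_method(field, options = {})
    opts = DEFAULT_FIELD_OPTS.merge(options)
    fields_for_index << field
    define_method("#{field}_to_field") do
      { :name => field.to_s, :value => send(field).to_s }.merge(opts)
    end
  end

  # Accepts either a Hash of field => options or a plain list of fields.
  def index_fields(fields)
    if fields.respond_to?(:each_pair)
      fields.each_pair { |field, opts| define_to_field_method(field, opts) }
    else
      fields.each { |field| define_to_field_method(field) }
    end
  end
end

class Page
  extend FieldMacros
  attr_accessor :title, :body
  index_fields :title => { :index => :tokenized, :boost => 2.0 }, :body => {}

  # Builds the "document" from the generated per-field methods.
  def to_doc
    self.class.fields_for_index.map { |field| send("#{field}_to_field") }
  end
end

page = Page.new
page.title = 'Hello'
page.body = 'World'
p page.to_doc.map { |f| f[:name] }   # => ["title", "body"]
```

Checking each_pair before each matters here, since a Hash also responds to each; the plugin code orders its checks the same way.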

Third Version by Jens Kraemer – integrating Ferret with Typo

Jens integrated Ferret into his Typo installation, using the acts_as_ferret implementations above as a starting point.
See this post for more info and the code.