OriginalVersions
These versions are kept here mainly for historical reasons; please use the code in the svn repository instead.
This code was originally published by Kasper Weibel in an email to the Ruby on Rails mailing list. It has been modified so that it works on multiple ActiveRecord objects. It hasn’t been thoroughly tested yet. The result is the acts_as_ferret mixin for ActiveRecord. Use it as follows: in any model.rb, add acts_as_ferret:

class Foo < ActiveRecord::Base
  acts_as_ferret
end

All CRUD operations will be performed on both ActiveRecord (as usual) and a Ferret index for further searching. The following method is available in your controllers:
ActiveRecord::find_by_contents(query) # Query is a string representing your query
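The overall pattern can be sketched without Rails or Ferret. The following is a minimal, self-contained illustration and not the plugin's code: `ToyIndex`, `ToySearchable`, `acts_as_toy_searchable`, `index_save`, `index_destroy`, and the `Note` class are all hypothetical names, and a plain Ruby hash with substring matching stands in for the Ferret index. It only shows how a class-level macro can keep a search index in step with create, update, and destroy operations and expose a `find_by_contents`-style class method.

```ruby
# Hypothetical stand-in for a Ferret index: maps [class name, id] to lowercased text.
class ToyIndex
  def initialize
    @docs = {}
  end

  # Add or replace the document stored under the given key.
  def store(key, text)
    @docs[key] = text.downcase
  end

  # Remove the document stored under the given key, if any.
  def delete(key)
    @docs.delete(key)
  end

  # Return the ids of documents of the given class whose text contains the term.
  def search(class_name, term)
    needle = term.downcase
    hits = @docs.select { |(klass, _id), text| klass == class_name && text.include?(needle) }
    hits.keys.map(&:last)
  end
end

module ToySearchable
  INDEX = ToyIndex.new # one index shared by every class (an assumption of this sketch)

  def self.included(base)
    base.extend(MacroMethods)
  end

  module MacroMethods
    # Class-level macro, analogous in spirit to acts_as_ferret.
    def acts_as_toy_searchable
      extend ClassMethods
      include InstanceMethods
    end
  end

  module ClassMethods
    def find_by_contents(query)
      ToySearchable::INDEX.search(name, query)
    end
  end

  module InstanceMethods
    # In Rails these would be wired up as after_create/after_update/after_destroy callbacks.
    def index_save
      text = attributes.values.join(" ")
      ToySearchable::INDEX.store([self.class.name, id], text)
    end

    def index_destroy
      ToySearchable::INDEX.delete([self.class.name, id])
    end
  end
end

# Plain Ruby class standing in for an ActiveRecord model.
class Note
  include ToySearchable
  acts_as_toy_searchable

  attr_reader :id, :attributes

  def initialize(id, attributes)
    @id = id
    @attributes = attributes
  end
end

first = Note.new(1, "title" => "Ferret notes", "body" => "Full-text search for Rails")
second = Note.new(2, "title" => "Groceries", "body" => "Milk and eggs")
first.index_save
second.index_save

matches = Note.find_by_contents("search") # ids of notes mentioning "search"
second.index_destroy
after_delete = Note.find_by_contents("milk") # the deleted note is no longer found
```

Unlike the plugin, this sketch returns ids rather than loaded records, and the index calls are made explicitly instead of through ActiveRecord callbacks.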
The plugin follows the usual plugin structure and consists of 2 files:
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/init.rb
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb
The Ferret DB is stored in:
{RAILS_ROOT}/db/index.db
(Does this hurt scalability when multiple round-robin servers do not share a common disk? Would it be too intensive to keep the index in a central DB table instead?)
Here follows the code:
# CODE for init.rb
require 'acts_as_ferret'
# END init.rb
# Copyright (c) 2006 Kasper Weibel Nielsen-Refs
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:
      def self.append_features(base)
        super
        base.extend(MacroMethods)
      end

      # declare the class level helper methods
      # which will load the relevant instance methods defined below when invoked
      module MacroMethods
        def acts_as_ferret
          extend FerretMixin::Acts::ARFerret::ClassMethods
          class_eval do
            include FerretMixin::Acts::ARFerret::ClassMethods

            after_create :ferret_create
            after_update :ferret_update
            after_destroy :ferret_destroy
          end
        end
      end

      module ClassMethods
        include Ferret

        INDEX_DIR = "#{RAILS_ROOT}/db/index.db"

        def self.reloadable?; false end

        # Finds instances by file contents.
        def find_by_contents(query, options = {})
          index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
          query_parser ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
          query = query_parser.parse(query + " +ferret_table:#{self.table_name}")

          result = []
          index_searcher.search_each(query) do |doc, score|
            id = index_searcher.reader.get_document(doc)[:id]
            res = self.find(id)
            result << res if res
          end
          return result
        end

        # private

        def ferret_create
          # code to update or add to the index
          index ||= Index::Index.new(:key => [:id, :ferret_table], :path => INDEX_DIR, :auto_flush => true)
          index << self.to_doc
        end
        alias :ferret_update :ferret_create

        def ferret_destroy
          # code to delete from index
          index ||= Index::Index.new(:key => [:id, :ferret_table], :path => INDEX_DIR, :auto_flush => true)
          index.query_delete("+id:#{self.id} +ferret_table:#{self.table_name}")
        end

        def to_doc
          # Churn through the complete Active Record and add it to the Ferret document
          doc = Ferret::Document::Document.new
          doc << Ferret::Document::Field.new(:ferret_table, self.table_name, Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::UNTOKENIZED)
          self.attributes.each_pair do |key,val|
            if key == :id
              doc << Ferret::Document::Field.new(key, val.to_s, Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::UNTOKENIZED)
            else
              doc << Ferret::Document::Field.new(key, val.to_s, Ferret::Document::Field::Store::NO, Ferret::Document::Field::Index::TOKENIZED)
            end
          end
          return doc
        end
      end
    end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb
The code listed above has a few issues as discussed in this email thread. I’ve been working on some enhancements, but it’s still a work in progress. Here’s the code I have so far. There are definitely bugs, but I’ll update the code here as I work through them and add other features.
A couple of notes about this implementation:
- The class-based querying is broken, but then again so is the implementation in the code listed above.
- It would be nice to allow for both filesystem-based indexing AND the in-memory approach, but currently I only allow for a string path to the index. I think this should be a straightforward fix, but it’s not in there yet.
- I’m still working on implementing the code that allows for passing a Query object to the find_by_contents method.
- There are certainly a lot of other options for the index that need to be allowed for. I’m thinking that this could be implemented as a hash that can be set in environment.rb and then overridden in the case of per-class indexes.
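The last point could be sketched roughly as follows. This is only an assumption about how such a configuration might look and is not part of the code on this page: `FerretDefaults`, `override`, `options_for`, and the model names are hypothetical, and the option keys simply reuse those passed to `Index::Index.new` in the listings here.

```ruby
# Hypothetical sketch: application-wide index options with per-class overrides.
module FerretDefaults
  # Defaults that would be set once, e.g. in environment.rb.
  @options = {
    :path => "index",
    :auto_flush => true,
    :create_if_missing => true
  }
  @overrides = {}

  class << self
    attr_reader :options

    # Register options that apply to a single class only.
    def override(class_name, options)
      @overrides[class_name] = options
    end

    # Effective options for a class: the defaults merged with its overrides.
    def options_for(class_name)
      @options.merge(@overrides.fetch(class_name, {}))
    end
  end
end

# A class with its own index directory and flush behaviour.
FerretDefaults.override("Article", :path => "index/articles", :auto_flush => false)

article_options = FerretDefaults.options_for("Article") # overrides applied
comment_options = FerretDefaults.options_for("Comment") # plain defaults
```

Because `Hash#merge` returns a new hash, the shared defaults are never mutated by a per-class override.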
# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
    module ARFerret #:nodoc:
      mattr_accessor :index_dir
      @@index_dir ||= "#{RAILS_ROOT}/index"

      def self.append_features(base)
        super
        base.extend(MacroMethods)
      end

      # declare the class level helper methods
      # which will load the relevant instance methods defined below when invoked
      module MacroMethods

        def define_to_field_method(field, options = {})
          default_opts = { :store => Field::Store::YES,
                           :index => Field::Index::UNTOKENIZED,
                           :term_vector => Field::TermVector::NO,
                           :binary => false,
                           :boost => 1.0 }
          default_opts.update(options) if options.is_a?(Hash)
          fields_for_ferret << field
          define_method("#{field}_to_ferret".to_sym) do
            val = self[field] || self.instance_variable_get("@#{field.to_s}".to_sym)
            logger.debug("Adding field #{field} with value '#{val}' to index")
            Ferret::Document::Field.new(field.to_s, val, default_opts[:store], default_opts[:index], default_opts[:term_vector], default_opts[:binary], default_opts[:boost])
          end
        end

        def acts_as_ferret(options={})
          configuration = {:fields => :all, :index_dir => FerretMixin::Acts::ARFerret::index_dir}
          configuration.update(options) if options.is_a?(Hash)
          extend FerretMixin::Acts::ARFerret::SingletonMethods
          class_eval <<-EOV
            include FerretMixin::Acts::ARFerret::SingletonMethods

            after_create :ferret_create
            after_update :ferret_update
            after_destroy :ferret_destroy

            cattr_accessor :fields_for_ferret
            cattr_accessor :class_index_dir

            @@fields_for_ferret = Array.new
            @@class_index_dir = configuration[:index_dir]

            # private
            if configuration[:fields].respond_to?(:each_pair)
              configuration[:fields].each_pair do |key,val|
                define_to_field_method(key,val)
              end
            elsif configuration[:fields].respond_to?(:each)
              configuration[:fields].each do |field|
                define_to_field_method(field)
              end
            else
              #need to handle :all case
            end
          EOV
        end

      end

      module SingletonMethods
        include Ferret

        def self.reloadable?; false end

        def ferret_index
          @@index ||= Index::Index.new(:key => [:id, :ferret_class], :path => class_index_dir, :auto_flush => true, :create_if_missing => true)
        end

        # Finds instances by file contents.
        def find_by_contents(q, options = {})
          index_searcher ||= Search::IndexSearcher.new(FerretMixin::Acts::ARFerret::index_dir)
          query_parser ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
          query = Search::BooleanQuery.new
          if (q.is_a?(Search::Query))
            query << Search::BooleanClause.new(q)
          else
            query << Search::BooleanClause.new(query_parser.parse(q))
          end
          query << Search::BooleanClause.new(Search::TermQuery.new(Index::Term.new("ferret_class", self.class.name)))

          result = []
          index_searcher.search_each(query) do |doc, score|
            id = index_searcher.reader.get_document(doc)["id"]
            res = self.find(id)
            result << res
          end
          return result
        end

        def ferret_create
          ferret_index << self.to_doc
        end
        alias :ferret_update :ferret_create

        def ferret_destroy
          # code to delete from index
          begin
            ferret_index.query_delete("+id:#{self.id} +ferret_class:#{self.class.name}")
          rescue
            logger.warn("Could not find indexed value for this object")
          end
        end

        def to_doc
          # Churn through the complete Active Record and add it to the Ferret document
          doc = Document::Document.new
          # store the table_name for every item indexed
          doc << Document::Field.new("ferret_class", "#{self.class.name}", Document::Field::Store::YES, Document::Field::Index::UNTOKENIZED)
          # store the id of each item
          doc << Document::Field.new("id", self.id, Document::Field::Store::YES, Document::Field::Index::UNTOKENIZED)
          # iterate through the fields and add them to the document
          fields_for_ferret.each do |field|
            doc << self.send("#{field}_to_ferret")
          end
          return doc
        end

      end
    end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb
Jens integrated Ferret into his Typo installation, using the acts_as_ferret implementations above as a starting point.
See this post for more info and the code.