Class: Ferret::Index::Index

This is a simplified interface to the index. See the TUTORIAL for more information on how to use this class.

Attributes

Name     Read/write?
options  R

Public Class Methods


new (options = {}) {|self| ...}

If you create an Index without any options, it'll simply create an index in memory. But this class is highly configurable: every option that you can supply to IndexWriter and QueryParser can also be set here. Please look at the options for the constructors of these classes.

Options

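See IndexWriter and QueryParser for the options they accept. The following options are specific to this class: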

default_input_field: Default: "id". This specifies the default field that will be used when you add a simple string to the index using add_document or <<.
id_field: Default: "id". This field is used as the field to search when doing searches on a term. For example, if you do a lookup by term "cat", i.e. index["cat"], this is the field that will be searched.
key: Default: nil. Expert: This should only be used if you really know what you are doing. Basically you can set a field or an array of fields to be the key for the index. If you add a document with the same key as an existing document, the existing document will be replaced by the new one. Using a multiple-field key will slow down indexing, so it should not be done if performance is a concern. A single-field key (or id) should be fine, however. Also, you must make sure that your key fields are either untokenized or not broken up by the analyzer.
auto_flush: Default: false. Set this option to true if you want the index automatically flushed every time you do a write (includes delete) to the index. This is useful if you have multiple processes accessing the index and you don't want lock errors. Setting :auto_flush to true has a huge performance impact, so don't use it if you are concerned about performance. In that case you should think about setting up a DRb indexing service.
lock_retry_time: Default: 2 seconds. This parameter specifies how long to wait before retrying to obtain the commit lock when detecting if the IndexReader is at the latest version.
close_dir: Default: false. If you explicitly pass a Directory object to this class and you want Index to close it when it is closed itself, then set this to true.
use_typed_range_query: Default: true. Use TypedRangeQuery instead of the standard RangeQuery when parsing range queries. This is useful if you have number fields which you want to perform range queries on. You won't need to pad or normalize the data in the field in any way to get correct results. However, performance will be a lot slower for large indexes.
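
To see why a typed range query helps with number fields, here is a minimal plain-Ruby sketch that contrasts a string-based range check with a numeric one. It does not call Ferret at all; both helper names are hypothetical and only illustrate why unpadded numbers behave badly when compared as strings.

```ruby
# Hypothetical helpers for illustration only; not part of Ferret.

# Compares the raw strings, the way a term-based range would.
def string_range_match?(value, lower, upper)
  lower <= value && value <= upper
end

# Parses the values as numbers before comparing.
def typed_range_match?(value, lower, upper)
  Float(lower) <= Float(value) && Float(value) <= Float(upper)
end

prices = %w[5 10 25 100]
p prices.select { |price| string_range_match?(price, "10", "50") }
p prices.select { |price| typed_range_match?(price, "10", "50") }
```

With the string comparison every price falls inside "10".."50", because "5" and "100" both sort between those two strings. The numeric comparison keeps only "10" and "25".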

Examples

  index = Index::Index.new(:analyzer => WhiteSpaceAnalyzer.new())

  index = Index::Index.new(:path => '/path/to/index',
                           :create_if_missing => false,
                           :auto_flush => true)

  index = Index::Index.new(:dir => directory,
                           :default_slop => 2,
                           :handle_parse_errors => false)

You can also pass a block if you like. The index will be yielded and closed at the end of the block. For example:

  Ferret::I.new() do |index|
    # do stuff with index. Most of your actions will be cached.
  end

    # File lib/ferret/index.rb, line 105
    def initialize(options = {}, &block)
      super()

      if options[:key]
        @key = options[:key]
        if @key.is_a?(Array)
          @key.flatten.map {|k| k.to_s.intern}
        end
      else
        @key = nil
      end

      if (fi = options[:field_infos]).is_a?(String)
        options[:field_infos] = FieldInfos.load(fi)
      end

      @close_dir = options[:close_dir]
      if options[:dir].is_a?(String)
        options[:path] = options[:dir]
      end
      if options[:path]
        @close_dir = true
        begin
          @dir = FSDirectory.new(options[:path], options[:create])
        rescue IOError => io
          @dir = FSDirectory.new(options[:path],
                                 options[:create_if_missing] != false)
        end
      elsif options[:dir]
        @dir = options[:dir]
      else
        options[:create] = true # this should always be true for a new RAMDir
        @close_dir = true
        @dir = RAMDirectory.new
      end

      @dir.extend(MonitorMixin).extend(SynchroLockMixin)
      options[:dir] = @dir
      options[:lock_retry_time]||= 2
      @options = options
      if (!@dir.exists?("segments")) || options[:create]
        IndexWriter.new(options).close
      end
      options[:analyzer]||= Ferret::Analysis::StandardAnalyzer.new
      if options[:use_typed_range_query].nil?
        options[:use_typed_range_query] = true
      end

      @searcher = nil
      @writer = nil
      @reader = nil

      @options.delete(:create) # only create the first time if at all
      @auto_flush = @options[:auto_flush] || false
      if (@options[:id_field].nil? and @key.is_a?(Symbol))
        @id_field = @key
      else
        @id_field = @options[:id_field] || :id
      end
      @default_field = (@options[:default_field]||= :*)
      @default_input_field = options[:default_input_field] || @id_field

      if @default_input_field.respond_to?(:intern)
        @default_input_field = @default_input_field.intern
      end
      @open = true
      @qp = nil
      if block
        yield self
        self.close
      end
    end
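
The block form at the end of the constructor can be sketched in isolation as follows. ToyIndex is a hypothetical stand-in written for this illustration, not a Ferret class; it only mirrors the yield-then-close step shown in the listing.

```ruby
# ToyIndex is a hypothetical stand-in, not part of Ferret.
class ToyIndex
  def initialize
    @open = true
    if block_given?
      yield self # hand the freshly opened index to the caller
      close      # then close it once the block returns
    end
  end

  def open?
    @open
  end

  def close
    raise "tried to close an already closed index" unless @open
    @open = false
  end
end

seen_inside = nil
index = ToyIndex.new { |i| seen_inside = i.open? }
puts seen_inside   # state observed inside the block
puts index.open?   # state after the block has returned
```

As in the listing, the close call is not wrapped in an ensure clause, so an exception raised inside the block would skip it.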

Public Instance Methods



[] (*arg)

Alias for doc


add_document (doc, analyzer = nil)

Adds a document to this index, using the provided analyzer instead of the local analyzer if provided. If the document contains more than IndexWriter::MAX_FIELD_LENGTH terms for a given field, the remainder are discarded.

There are three ways to add a document to the index. To add a document you can simply add a string or an array of strings. This will store all the strings in the default input field, which is :id unless you set :default_input_field (or :id_field) when you create the index.

  index << "This is a new document to be indexed"
  index << ["And here", "is another", "new document", "to be indexed"]

But these are pretty simple documents. If this is all you want to index you could probably just use SimpleSearch. So let's give our documents some fields:

  index << {:title => "Programming Ruby", :content => "blah blah blah"}
  index << {:title => "Programming Ruby", :content => "yada yada yada"}

Or if you are indexing data stored in a database, you'll probably want to store the id:

  index << {:id => row.id, :title => row.title, :date => row.date}

See FieldInfos for more information on how to set field properties.

    # File lib/ferret/index.rb, line 277
    def add_document(doc, analyzer = nil)
      @dir.synchrolock do
        ensure_writer_open()
        if doc.is_a?(String) or doc.is_a?(Array)
          doc = {@default_input_field => doc}
        end

        # delete existing documents with the same key
        if @key
          if @key.is_a?(Array)
            query = @key.inject(BooleanQuery.new()) do |bq, field|
              bq.add_query(TermQuery.new(field, doc[field].to_s), :must)
              bq
            end
            query_delete(query)
          else
            id = doc[@key].to_s
            if id
              ensure_writer_open()
              @writer.delete(@key, id)
              @writer.commit
            end
          end
        end
        ensure_writer_open()

        if analyzer
          old_analyzer = @writer.analyzer
          @writer.analyzer = analyzer
          @writer.add_document(doc)
          @writer.analyzer = old_analyzer
        else
          @writer.add_document(doc)
        end

        flush() if @auto_flush
      end
    end
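
When the :key option is set, the listing above deletes any existing document with the same key values before adding the new one. The following is a rough plain-Ruby model of that replace-on-add behavior. KeyedStore is a hypothetical class that keeps documents in an Array; it does not use Ferret and ignores analysis entirely.

```ruby
# KeyedStore is a hypothetical in-memory model, not part of Ferret.
class KeyedStore
  attr_reader :docs

  def initialize(key: nil, default_input_field: :id)
    # Accept a single field or an array of fields as the key.
    @keys = key.nil? ? nil : Array(key).flatten.map { |k| k.to_s.to_sym }
    @default_input_field = default_input_field
    @docs = []
  end

  def <<(doc)
    # Plain strings and arrays go into the default input field.
    doc = { @default_input_field => doc } if doc.is_a?(String) || doc.is_a?(Array)
    if @keys
      # Drop every stored document whose key fields all match the new one.
      @docs.reject! { |old| @keys.all? { |k| old[k].to_s == doc[k].to_s } }
    end
    @docs << doc
    self
  end
end

store = KeyedStore.new(key: :id)
store << { id: 1, title: "Programming Ruby" }
store << { id: "1", title: "Programming Ruby, 2nd edition" }
store << { id: 2, title: "The Ruby Way" }
p store.docs
```

Key values are compared as strings, as in the listing, so the Integer 1 and the String "1" refer to the same document here.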

add_indexes (indexes)

Merges all segments from an index or an array of indexes into this index. You can pass a single Index::Index, Index::IndexReader or Store::Directory, or an array containing any one of these types.

This may be used to parallelize batch indexing. A large document collection can be broken into sub-collections. Each sub-collection can be indexed in parallel, on a different thread, process or machine and perhaps all in memory. The complete index can then be created by merging sub-collection indexes with this method.

After this completes, the index is optimized.

    # File lib/ferret/index.rb, line 772
    def add_indexes(indexes)
      @dir.synchrolock do
        ensure_writer_open()
        indexes = [indexes].flatten   # make sure we have an array
        return if indexes.size == 0 # nothing to do
        if indexes[0].is_a?(Index)
          indexes.delete(self) # don't merge with self
          indexes = indexes.map {|index| index.reader }
        elsif indexes[0].is_a?(Ferret::Store::Directory)
          indexes.delete(@dir) # don't merge with self
          indexes = indexes.map {|dir| IndexReader.new(dir) }
        elsif indexes[0].is_a?(IndexReader)
          indexes.delete(@reader) # don't merge with self
        else
          raise ArgumentError, "Unknown index type when trying to merge indexes"
        end
        ensure_writer_open
        @writer.add_readers(indexes)
      end
    end
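
The parallel batch indexing idea described above can be illustrated without Ferret. The sketch below builds toy inverted indexes (a Hash of term to document ids) for sub-collections on separate threads and then merges them. All three method names are hypothetical, and a Hash is only a stand-in for a real index segment.

```ruby
# Hypothetical helpers; a Hash of term => doc ids stands in for an index.

# Build a toy inverted index for one sub-collection of [id, text] pairs.
def build_partial_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    text.downcase.split.uniq.each { |term| index[term] << id }
  end
  index
end

# Merge several partial indexes into one, keeping doc ids sorted.
def merge_indexes(partials)
  merged = Hash.new { |hash, term| hash[term] = [] }
  partials.each do |partial|
    partial.each { |term, ids| merged[term].concat(ids) }
  end
  merged.each_value(&:sort!)
  merged
end

# Index each slice of the collection on its own thread, then merge.
def parallel_index(collection, slice_size)
  threads = collection.each_slice(slice_size).map do |slice|
    Thread.new { build_partial_index(slice) }
  end
  merge_indexes(threads.map(&:value))
end

collection = [[0, "red fox"], [1, "red dog"], [2, "blue fox"]]
p parallel_index(collection, 2)
```

In a real setup each worker would build its own Ferret index, and the final step would be a call to add_indexes rather than a Hash merge.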

batch_update (docs)

Batch updates the documents in an index. You can pass either a Hash or an Array.

Array (recommended)

If you pass an Array then each value needs to be a Document or a Hash and each of those documents must have an +:id_field+ which will be used to delete the old document that this document is replacing.

Hash

If you pass a Hash then the keys of the Hash will be considered the ids and the values will be the new documents to replace the old ones with. If the id is an Integer then it is considered a Ferret document number and the corresponding document will be deleted. If the id is a String or a Symbol then the id will be considered a term and the documents that contain that term in the +:id_field+ will be deleted.

Note: No error will be raised if the document does not currently exist. A new document will simply be created.

Examples

  # will replace the documents with the +id+'s id:133 and id:253
  @index.batch_update({
      '133' => {:id => '133', :content => 'yada yada yada'},
      '253' => {:id => '253', :content => 'bla bla bal'}
    })

  # will replace the documents with the Ferret Document numbers 2 and 92
  @index.batch_update({
      2  => {:id => '133', :content => 'yada yada yada'},
      92 => {:id => '253', :content => 'bla bla bal'}
    })

  # will replace the documents with the +id+'s id:133 and id:253
  # this is recommended as it guarantees no duplicate keys
  @index.batch_update([
      {:id => '133', :content => 'yada yada yada'},
      {:id => '253', :content => 'bla bla bal'}
    ])
docs: A Hash of id/document pairs, or an Array of documents. The set of documents to be updated.

    # File lib/ferret/index.rb, line 642
    def batch_update(docs)
      @dir.synchrolock do
        ids = values = nil
        case docs
        when Array
          ids = docs.collect{|doc| doc[@id_field].to_s}
          if ids.include?(nil)
            raise ArgumentError, "all documents must have an #{@id_field} " \
                                 "field when doing a batch update"
          end
        when Hash
          ids = docs.keys
          docs = docs.values
        else
          raise ArgumentError, "must pass Hash or Array, not #{docs.class}"
        end
        batch_delete(ids)
        ensure_writer_open()
        docs.each {|new_doc| @writer << new_doc }
        flush()
      end
    end
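
The way batch_update derives the ids to delete from either argument type can be sketched as a small standalone function. split_batch is a hypothetical name and the code below does not touch an index. Unlike the listing, this sketch checks for a missing id before converting to a String, which is an assumption about the intended behavior.

```ruby
# split_batch is a hypothetical helper, not part of Ferret.
# Returns [ids_to_delete, documents_to_add].
def split_batch(docs, id_field = :id)
  case docs
  when Array
    ids = docs.map { |doc| doc[id_field] }
    if ids.include?(nil)
      raise ArgumentError,
            "all documents must have an #{id_field} field when doing a batch update"
    end
    [ids.map(&:to_s), docs]
  when Hash
    # Keys are ids (terms or document numbers), values are the new documents.
    [docs.keys, docs.values]
  else
    raise ArgumentError, "must pass Hash or Array, not #{docs.class}"
  end
end

p split_batch([{ id: 133, content: "yada yada yada" }])
p split_batch({ 2 => { id: "133", content: "yada yada yada" } })
```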

close ()

Closes this index by closing its associated reader and writer objects.

    # File lib/ferret/index.rb, line 216
    def close
      @dir.synchronize do
        if not @open
          raise(StandardError, "tried to close an already closed directory")
        end
        @searcher.close() if @searcher
        @reader.close() if @reader
        @writer.close() if @writer
        @dir.close() if @close_dir

        @open = false
      end
    end

commit ()

Alias for flush


delete (arg)

Deletes a document/documents from the index. The method for determining the document to delete depends on the type of the argument passed.

If arg is an Integer then delete the document based on the internal document number. Will raise an error if the document does not exist.

If arg is a String or a Symbol then delete the documents with arg in the id field. The id field is either :id or whatever you set the +:id_field+ parameter to when you create the Index object. Will fail quietly if no such document exists.

If arg is a Hash or an Array then a batch delete will be performed. If arg is an Array then it will be considered an array of ids. If it is a Hash, then its keys will be used instead as the Array of document ids. If the id is an Integer then it is considered a Ferret document number and the corresponding document will be deleted. If the id is a String or a Symbol then the id will be considered a term and the documents that contain that term in the +:id_field+ will be deleted.

    # File lib/ferret/index.rb, line 533
    def delete(arg)
      @dir.synchrolock do
        if arg.is_a?(String) or arg.is_a?(Symbol)
          ensure_writer_open()
          @writer.delete(@id_field, arg.to_s)
        elsif arg.is_a?(Integer)
          ensure_reader_open()
          cnt = @reader.delete(arg)
        elsif arg.is_a?(Hash) or arg.is_a?(Array)
          batch_delete(arg)
        else
          raise ArgumentError, "Cannot delete for arg of type #{arg.class}"
        end
        flush() if @auto_flush
      end
      return self
    end
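
The batch-delete rules described above (Integers are document numbers, Strings and Symbols are terms in the id field) can be modelled on a plain Array whose positions play the role of document numbers. toy_batch_delete is a hypothetical helper and only approximates the semantics; it returns the surviving documents instead of mutating an index.

```ruby
# toy_batch_delete is a hypothetical model, not part of Ferret.
# docs: Array of Hash documents; the array position acts as the doc number.
# ids:  Array of ids, or a Hash whose keys are the ids.
def toy_batch_delete(docs, ids, id_field = :id)
  ids = ids.keys if ids.is_a?(Hash)
  survivors = docs.each_with_index.reject do |doc, doc_num|
    ids.any? do |id|
      if id.is_a?(Integer)
        id == doc_num                    # match by document number
      else
        doc[id_field].to_s == id.to_s    # match by term in the id field
      end
    end
  end
  survivors.map(&:first)
end

docs = [{ id: "a" }, { id: "b" }, { id: "c" }]
p toy_batch_delete(docs, [0, "c"])
p toy_batch_delete(docs, { "b" => {} })
```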

deleted? (n)

Returns true if document n has been deleted

    # File lib/ferret/index.rb, line 569
    def deleted?(n)
      @dir.synchronize do
        ensure_reader_open()
        return @reader.deleted?(n)
      end
    end

doc (*arg)

Retrieves a document/documents from the index. The method for retrieval depends on the type of the argument passed.

If arg is an Integer then return the document based on the internal document number.

If arg is a Range, then return the documents within the range based on internal document number.

If arg is a String or a Symbol then search for the first document with arg in the id field. The id field is either :id or whatever you set the +:id_field+ parameter to when you create the Index object.

    # File lib/ferret/index.rb, line 467
    def doc(*arg)
      @dir.synchronize do
        id = arg[0]
        if id.kind_of?(String) or id.kind_of?(Symbol)
          ensure_reader_open()
          term_doc_enum = @reader.term_docs_for(@id_field, id.to_s)
          return term_doc_enum.next? ? @reader[term_doc_enum.doc] : nil
        else
          ensure_reader_open(false)
          return @reader[*arg]
        end
      end
    end

each () {|doc| ...}

Iterates through all documents in the index, skipping deleted ones. This method preloads the documents so you don't need to call load on the document to load all the fields.

    # File lib/ferret/index.rb, line 505
    def each
      @dir.synchronize do
        ensure_reader_open
        (0...@reader.max_doc).each do |i|
          yield @reader[i].load unless @reader.deleted?(i)
        end
      end
    end

explain (query, doc)

Returns an Explanation that describes how doc scored against query.

This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index.

    # File lib/ferret/index.rb, line 838
    def explain(query, doc)
      @dir.synchronize do
        ensure_searcher_open()
        query = do_process_query(query)

        return @searcher.explain(query, doc)
      end
    end

field_infos ()

Returns the field_infos object so that you can add new fields to the index.

    # File lib/ferret/index.rb, line 857
    def field_infos
      @dir.synchrolock do
        ensure_writer_open()
        return @writer.field_infos
      end
    end

flush ()

Flushes all writes to the index. This will not optimize the index but it will make sure that all writes are written to it.

NOTE: this is not necessary if you are only using this class. All writes will automatically flush when you perform an operation that reads the index.

    # File lib/ferret/index.rb, line 727
    def flush()
      @dir.synchronize do
        if @reader
          if @searcher
            @searcher.close
            @searcher = nil
          end
          @reader.commit
        elsif @writer
          @writer.commit
        end
      end
    end

has_deletions? ()

Returns true if any documents have been deleted since the index was last flushed.

    # File lib/ferret/index.rb, line 714
    def has_deletions?()
      @dir.synchronize do
        ensure_reader_open()
        return @reader.has_deletions?
      end
    end

highlight (query, doc_id, options = {})

Returns an array of strings with the matches highlighted. The query can be either a query String or a Ferret::Search::Query object. The doc_id is the id of the document you want to highlight (usually returned by the search methods). There are also a number of options you can pass:

Options

field: Default: @options[:default_field]. The default_field is the field that is usually highlighted but you can specify which field you want to highlight here. If you want to highlight multiple fields then you will need to call this method multiple times.
excerpt_length: Default: 150. Length of excerpt to show. Highlighted terms will be in the centre of the excerpt. Set to :all to highlight the entire field.
num_excerpts: Default: 2. Number of excerpts to return.
pre_tag: Default: "<b>". Tag to place to the left of the match. You'll probably want to change this to a "<span>" tag with a class. Try "\033[36m" for use in a terminal.
post_tag: Default: "</b>". This tag should close the +:pre_tag+. Try "\033[m" in the terminal.
ellipsis: Default: "…". This is the string that is added at the beginning and end of excerpts (unless the excerpt hits the start or end of the field). Alternatively you may want to use the HTML entity &#8230; or the UTF-8 string "\342\200\246".

    # File lib/ferret/index.rb, line 205
    def highlight(query, doc_id, options = {})
      @dir.synchronize do
        ensure_searcher_open()
        @searcher.highlight(do_process_query(query),
                            doc_id,
                            options[:field]||@options[:default_field],
                            options)
      end
    end
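
To give a feel for what the :pre_tag and :post_tag options do, here is a toy highlighter in plain Ruby. toy_highlight is a hypothetical function; it wraps whole-word, case-insensitive matches of a single term and does not attempt excerpting, query parsing or analysis the way Ferret's highlighter does.

```ruby
# toy_highlight is a hypothetical illustration, not Ferret's highlighter.
def toy_highlight(text, term, pre_tag: "<b>", post_tag: "</b>")
  pattern = /\b#{Regexp.escape(term)}\b/i
  text.gsub(pattern) { |match| "#{pre_tag}#{match}#{post_tag}" }
end

# HTML-style tags (the defaults).
puts toy_highlight("The cat sat on the mat", "cat")

# ANSI escape codes for terminal output, as suggested in the options above.
puts toy_highlight("The cat sat on the mat", "cat",
                   pre_tag: "\033[36m", post_tag: "\033[m")
```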

optimize ()

Optimizes the index. This should only be called when the index will no longer be updated very often, but will be read a lot.

    # File lib/ferret/index.rb, line 744
    def optimize()
      @dir.synchrolock do
        ensure_writer_open()
        @writer.optimize()
        @writer.close()
        @writer = nil
      end
    end

persist (directory, create = true)

This is a simple utility method for saving an in memory or RAM index to the file system. The same thing can be achieved by using the Index::Index#add_indexes method and you will have more options when creating the new index, however this is a simple way to turn a RAM index into a file system index.

directory: This can either be a Store::Directory object or a String representing the path to the directory where you would like to store the index.
create: True if you'd like to create the directory if it doesn't exist or copy over an existing directory. False if you'd like to merge with the existing directory. This defaults to true.

    # File lib/ferret/index.rb, line 807
    def persist(directory, create = true)
      synchronize do
        close_all()
        old_dir = @dir
        if directory.is_a?(String)
          @dir = FSDirectory.new(directory, create)
        elsif directory.is_a?(Ferret::Store::Directory)
          @dir = directory
        end
        @dir.extend(MonitorMixin).extend(SynchroLockMixin)
        @options[:dir] = @dir
        @options[:create_if_missing] = true
        add_indexes([old_dir])
      end
    end

process_query (query)

Turns a query string into a Query object using the Index's QueryParser.

    # File lib/ferret/index.rb, line 848
    def process_query(query)
      @dir.synchronize do
        ensure_searcher_open()
        return do_process_query(query)
      end
    end

query_delete (query)

Delete all documents returned by the query.

query: The query to find documents you wish to delete. Can either be a string (in which case it is parsed by the standard query parser) or an actual query object.

    # File lib/ferret/index.rb, line 556
    def query_delete(query)
      @dir.synchrolock do
        ensure_writer_open()
        ensure_searcher_open()
        query = do_process_query(query)
        @searcher.search_each(query, :limit => :all) do |doc, score|
          @reader.delete(doc)
        end
        flush() if @auto_flush
      end
    end

query_update (query, new_val)

Update all the documents returned by the query.

query: The query to find documents you wish to update. Can either be a string (in which case it is parsed by the standard query parser) or an actual query object.
new_val: The values we are updating. This can be a string, in which case the default field is updated, or it can be a hash, in which case all fields in the hash are merged into the old hash. That is, the old fields are replaced by values in the new hash if they exist.

Example

  index << {:id => "26", :title => "Babylon", :artist => "David Grey"}
  index << {:id => "29", :title => "My Oh My", :artist => "David Grey"}

  # correct
  index.query_update('artist:"David Grey"', {:artist => "David Gray"})

  index["26"]
    #=> {:id => "26", :title => "Babylon", :artist => "David Gray"}
  index["29"]
    #=> {:id => "29", :title => "My Oh My", :artist => "David Gray"}

    # File lib/ferret/index.rb, line 690
    def query_update(query, new_val)
      @dir.synchrolock do
        ensure_writer_open()
        ensure_searcher_open()
        docs_to_add = []
        query = do_process_query(query)
        @searcher.search_each(query, :limit => :all) do |id, score|
          document = @searcher[id].load
          if new_val.is_a?(Hash)
            document.merge!(new_val)
          else new_val.is_a?(String) or new_val.is_a?(Symbol)
            document[@default_input_field] = new_val.to_s
          end
          docs_to_add << document
          @reader.delete(id)
        end
        ensure_writer_open()
        docs_to_add.each {|doc| @writer << doc }
        flush() if @auto_flush
      end
    end
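
The merge step inside query_update can be looked at on its own. The sketch below applies new_val to a single document Hash in the same two ways as the listing; apply_update is a hypothetical helper, and the default input field is assumed to be :id.

```ruby
# apply_update is a hypothetical helper mirroring the merge step above.
def apply_update(document, new_val, default_input_field = :id)
  if new_val.is_a?(Hash)
    # Fields present in new_val replace the old values; others are kept.
    document.merge(new_val)
  else
    # A plain value overwrites the default input field.
    document.merge(default_input_field => new_val.to_s)
  end
end

old_doc = { id: "26", title: "Babylon", artist: "David Grey" }
p apply_update(old_doc, { artist: "David Gray" })
p apply_update(old_doc, "27")
```

Unlike the listing, this sketch returns a new Hash instead of modifying the document in place.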

reader ()

Get the reader for this index.

NOTE: This will close the writer of this index.

    # File lib/ferret/index.rb, line 232
    def reader
      ensure_reader_open()
      return @reader
    end

scan (query, options = {})

Run a query through the Searcher on the index, ignoring scoring and starting at +:start_doc+ and stopping when +:limit+ matches have been found. It returns an array of the matching document numbers.

There is a big performance advantage when using this search method on a very large index when there are potentially thousands of matching documents and you only want, say, 50 of them. The other search methods need to look at every single match to decide which one has the highest score. This search method just needs to find +:limit+ matches before it returns.

Options

start_doc: Default: 0. The document to start the search from. NOTE very carefully that this is not the same as the +:offset+ parameter used in the other search methods, which refers to the offset in the result-set. This is the document to start the scan from. So if you are scanning through the index in increments of 50 documents at a time, you need to use the last matched doc in the previous search to start your next search. See the example below.
limit: Default: 50. This is the number of results you want returned, also called the page size. Set +:limit+ to +:all+ to return all results.

TODO: add option to return loaded documents instead

Example

  start_doc = 0
  begin
    results = @searcher.scan(query, :start_doc => start_doc)
    yield results # or do something with them
    start_doc = results.last
    # start_doc will be nil now if results is empty, ie no more matches
  end while start_doc

    # File lib/ferret/index.rb, line 446
    def scan(query, options = {})
      @dir.synchronize do
        ensure_searcher_open()
        query = do_process_query(query)

        @searcher.scan(query, options)
      end
    end
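
The difference between paging by +:start_doc+ and paging by result offset is easier to see on a toy model. In the sketch below an Array of booleans says whether each document number matches. toy_scan and scan_in_pages are hypothetical helpers; the toy treats start_doc as inclusive and therefore resumes from the last hit plus one, which is an assumption of this sketch and not a statement about Ferret's Searcher#scan.

```ruby
# Hypothetical toy model; doc_matches[n] is true when document n matches.

# Walk forward from start_doc and stop as soon as limit hits are found.
def toy_scan(doc_matches, start_doc: 0, limit: 50)
  hits = []
  (start_doc...doc_matches.size).each do |doc_num|
    next unless doc_matches[doc_num]
    hits << doc_num
    break if hits.size == limit
  end
  hits
end

# Page through all matches, resuming each scan after the previous last hit.
def scan_in_pages(doc_matches, limit)
  pages = []
  start_doc = 0
  loop do
    page = toy_scan(doc_matches, start_doc: start_doc, limit: limit)
    break if page.empty?
    pages << page
    start_doc = page.last + 1
  end
  pages
end

matches = [true, false, true, true, false, true]
p scan_in_pages(matches, 2)
```

Each call only inspects documents until it has collected enough hits, which is where the performance advantage described above comes from.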

search (query, options = {})

Run a query through the Searcher on the index. A TopDocs object is returned with the relevant results. The query is a built-in Query object or a query string that can be parsed by the Ferret::QueryParser. Here are the options:

Options

offset: Default: 0. The offset of the start of the section of the result-set to return. This is used for paging through results. Let's say you have a page size of 10. If you don't find the result you want among the first 10 results then set +:offset+ to 10 and look at the next 10 results, then 20 and so on.
limit: Default: 10. This is the number of results you want returned, also called the page size. Set +:limit+ to +:all+ to return all results.
sort: A Sort object or sort string describing how the field should be sorted. A sort string is made up of field names which cannot contain spaces and the word "DESC" if you want the field reversed, all separated by commas. For example: "rating DESC, author, title". Note that Ferret will try to determine a field's type by looking at the first term in the index and seeing if it can be parsed as an integer or a float. Keep this in mind as you may need to specify a field's type to sort it correctly. For more on this, see the documentation for SortField.
filter: A Filter object to filter the search results with.
filter_proc: A filter Proc is a Proc which takes the doc_id, the score and the Searcher object as its parameters and returns a Boolean value specifying whether the result should be included in the result set.

    # File lib/ferret/index.rb, line 348
    def search(query, options = {})
      @dir.synchronize do
        return do_search(query, options)
      end
    end
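
The +:offset+ and +:limit+ paging and the sort string format described above can be sketched in plain Ruby. Both functions below are hypothetical and work on ordinary Arrays and Strings; they only illustrate the conventions and are not how Ferret implements them.

```ruby
# Hypothetical helpers illustrating the option conventions; not Ferret code.

# Return one page of an already ranked result list.
def page(results, offset: 0, limit: 10)
  remaining = results.drop(offset)
  limit == :all ? remaining : remaining.first(limit)
end

# Split a sort string such as "rating DESC, author, title" into its parts.
def parse_sort_string(sort)
  sort.split(",").map do |part|
    field, direction = part.split
    { field: field.to_sym, reverse: direction == "DESC" }
  end
end

ranked = (1..25).to_a
p page(ranked)              # first page
p page(ranked, offset: 10)  # second page
p page(ranked, offset: 20)  # last, shorter page
p parse_sort_string("rating DESC, author, title")
```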

search_each (query, options = {}) {|doc, score| ...}

Run a query through the Searcher on the index, yielding each hit to the block. The total number of hits is returned. The query is a Query object or a query string that can be parsed by the Ferret::QueryParser. The Searcher#search_each method yields the internal document id (used to reference documents in the Searcher object like this: +searcher[doc_id]+) and the search score for that document. It is possible for the score to be greater than 1.0 for some queries when boosts are taken into account. This method will also normalize scores to the range 0.0..1.0 when the max-score is greater than 1.0. Here are the options:

Options

offset: Default: 0. The offset of the start of the section of the result-set to return. This is used for paging through results. Let's say you have a page size of 10. If you don't find the result you want among the first 10 results then set +:offset+ to 10 and look at the next 10 results, then 20 and so on.
limit: Default: 10. This is the number of results you want returned, also called the page size. Set +:limit+ to +:all+ to return all results.
sort: A Sort object or sort string describing how the field should be sorted. A sort string is made up of field names which cannot contain spaces and the word "DESC" if you want the field reversed, all separated by commas. For example: "rating DESC, author, title". Note that Ferret will try to determine a field's type by looking at the first term in the index and seeing if it can be parsed as an integer or a float. Keep this in mind as you may need to specify a field's type to sort it correctly. For more on this, see the documentation for SortField.
filter: A Filter object to filter the search results with.
filter_proc: A filter Proc is a Proc which takes the doc_id, the score and the Searcher object as its parameters and returns a Boolean value specifying whether the result should be included in the result set.
returns: The total number of hits.

Example

  index.search_each(query) do |doc, score|
    puts "hit document number #{doc} with a score of #{score}"
  end

    # File lib/ferret/index.rb, line 400
    def search_each(query, options = {}) # :yield: doc, score
      @dir.synchronize do
        ensure_searcher_open()
        query = do_process_query(query)

        @searcher.search_each(query, options) do |doc, score|
          yield doc, score
        end
      end
    end
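
The score normalization mentioned above can be pictured with a short sketch. It assumes that normalizing means dividing every score by the maximum score whenever that maximum exceeds 1.0; the exact method Ferret uses is not stated here, and normalize_scores is a hypothetical helper.

```ruby
# normalize_scores is a hypothetical helper. It assumes normalization is a
# division by the maximum score, which may differ from Ferret's own method.
# hits: Array of [doc_id, score] pairs.
def normalize_scores(hits)
  max_score = hits.map { |_doc, score| score }.max
  return hits if max_score.nil? || max_score <= 1.0
  hits.map { |doc, score| [doc, score / max_score] }
end

p normalize_scores([[3, 2.0], [7, 1.0]])
p normalize_scores([[1, 0.8], [4, 0.2]])
```

Scores are left alone when the best hit is already at or below 1.0, so only boosted result sets are rescaled.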

searcher ()

Get the searcher for this index.

NOTE: This will close the writer of this index.

    # File lib/ferret/index.rb, line 239
    def searcher
      ensure_searcher_open()
      return @searcher
    end

size ()

Returns the number of documents in the index.

    # File lib/ferret/index.rb, line 754
    def size()
      @dir.synchronize do
        ensure_reader_open()
        return @reader.num_docs()
      end
    end

term_vector (id, field)

Retrieves the term_vector for a document. The document can be referenced by either a string id to match the id field or an integer corresponding to Ferret's document number.

See Ferret::Index::IndexReader#term_vector

    # File lib/ferret/index.rb, line 487
    def term_vector(id, field)
      @dir.synchronize do
        ensure_reader_open()
        if id.kind_of?(String) or id.kind_of?(Symbol)
          term_doc_enum = @reader.term_docs_for(@id_field, id.to_s)
          if term_doc_enum.next?
            id = term_doc_enum.doc
          else
            return nil
          end
        end
        return @reader.term_vector(id, field)
      end
    end

to_s ()

    # File lib/ferret/index.rb, line 823
    def to_s
      buf = ""
      (0...(size)).each do |i|
        buf << self[i].to_s + "\n" if not deleted?(i)
      end
      buf
    end

update (id, new_doc)

Update the document referenced by the document number id if id is an integer, or all of the documents which have the term id if id is a term. To update a set of documents, use batch_update for better performance.

id: The number of the document to update. Can also be a string representing the value in the id field. Also consider using the :key attribute.
new_doc: The document to replace the old document with.

    # File lib/ferret/index.rb, line 585
    def update(id, new_doc)
      @dir.synchrolock do
        ensure_writer_open()
        delete(id)
        if id.is_a?(String) or id.is_a?(Symbol)
          @writer.commit
        else
          ensure_writer_open()
        end
        @writer << new_doc
        flush() if @auto_flush
      end
    end

writer ()

Get the writer for this index.

NOTE: This will close the reader of this index.

    # File lib/ferret/index.rb, line 246
    def writer
      ensure_writer_open()
      return @writer
    end

Protected Instance Methods


ensure_reader_open (get_latest = true)

Returns the new reader if one is opened, otherwise false.

    # File lib/ferret/index.rb, line 879
    def ensure_reader_open(get_latest = true)
      raise "tried to use a closed index" if not @open
      if @reader
        if get_latest
          latest = false
          begin
            latest = @reader.latest?
          rescue Lock::LockError => le
            sleep(@options[:lock_retry_time]) # sleep for 2 seconds and try again
            latest = @reader.latest?
          end
          if not latest
            @searcher.close if @searcher
            @reader.close
            return @reader = IndexReader.new(@dir)
          end
        end
      else
        if @writer
          @writer.close
          @writer = nil
        end
        return @reader = IndexReader.new(@dir)
      end
      return false
    end
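
The lock handling in ensure_reader_open follows a simple retry-once pattern: try the check, and if the lock cannot be obtained, wait for :lock_retry_time and try exactly one more time. A minimal standalone sketch of that pattern is shown below. ToyLockError and retry_once are hypothetical names used in place of Ferret's Lock::LockError and the inline begin/rescue.

```ruby
# ToyLockError and retry_once are hypothetical stand-ins for illustration.
class ToyLockError < StandardError; end

# Run the block; on a lock error wait and run it one more time.
# A second failure is not rescued and propagates to the caller.
def retry_once(wait_seconds)
  yield
rescue ToyLockError
  sleep(wait_seconds)
  yield
end

attempts = 0
result = retry_once(0) do
  attempts += 1
  raise ToyLockError, "commit lock is busy" if attempts == 1
  :latest
end
p result
p attempts
```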

ensure_searcher_open ()

    # File lib/ferret/index.rb, line 906
    def ensure_searcher_open()
      raise "tried to use a closed index" if not @open
      if ensure_reader_open() or not @searcher
        @searcher = Searcher.new(@reader)
      end
    end

ensure_writer_open ()

    # File lib/ferret/index.rb, line 866
    def ensure_writer_open()
      raise "tried to use a closed index" if not @open
      return if @writer
      if @reader
        @searcher.close if @searcher
        @reader.close
        @reader = nil
        @searcher = nil
      end
      @writer = IndexWriter.new(@options)
    end