Definitions:

What is Ferret?

Ferret is a high-performance, full-featured text search engine library written for Ruby. It is inspired by the Apache Lucene Java project. Most of it is written in C, which makes it one of the fastest search libraries available.

What is Lucene?

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

What is cFerret?

cFerret is the C library at the core of Ferret; the Ruby library is a set of bindings to it. cFerret contains no Ruby-specific code, so writing bindings for it in another language such as Perl or Python would be a reasonably simple task. It is still in development but should be quite stable as of Ferret 0.10. It improves indexing and search performance dramatically compared to the pure-Ruby version.

What is a document?

Documents are the unit of indexing and search. Usually documents are just Hashes, but Ferret also comes with a Document class which extends Hash by adding a boost attribute. Every time you wish to index some data, you index a document, and every time you search for a query, an array of documents is returned (actually, you get an array of LazyDoc objects, which extend Hash by lazily loading fields when they are requested). A Document is, in turn, a collection of fields. See the Indexing Section for some examples.
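The Hash-based document model can be sketched in plain Ruby without loading Ferret. In this minimal sketch, `BoostedDocument` is a hypothetical stand-in that mimics the description above (a Hash plus a `boost` attribute); it is not Ferret's own Document class, and the field names are made up for illustration.

```ruby
# Hypothetical stand-in for a document class: a Hash plus a boost.
# This is NOT Ferret's Document class; it only mirrors the idea.
class BoostedDocument < Hash
  attr_accessor :boost

  def initialize(fields = {}, boost = 1.0)
    super()          # start with an empty Hash
    update(fields)   # copy the field name => value pairs in
    @boost = boost
  end
end

# A plain Hash is already a usable document: field name => field value.
plain_doc = { title: "Ferret", content: "A search library for Ruby" }

# The boosted variant behaves like a Hash but also carries a relevance boost.
boosted_doc = BoostedDocument.new({ title: "Lucene", content: "A Java search library" }, 2.0)

puts plain_doc[:title]          # => Ferret
puts boosted_doc[:title]        # => Lucene
puts boosted_doc.boost          # => 2.0
puts boosted_doc.keys.inspect   # => [:title, :content]
```

Because the boosted class is still a Hash, any code that reads fields by key works the same on both kinds of document.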

What is a field?

A field is a section of a Document. Ferret's Field class extends Array by adding a boost property, but you can simply use a String or an Array of Strings in place of a Field. Whenever you wish to index a document, you add fields to the document before adding the document to the index. Fields matter for your queries: you might want to search only in certain fields, e.g. searching for 'ferret' in the field :title but not in other fields like :content. If you choose to store a field in the index (see field-info), the returned LazyDoc objects will include the values for that field. See the Indexing Section for some examples.
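Here is a minimal sketch of fields and field-restricted search in plain Ruby, assuming documents are Hashes whose values are a String, an Array of Strings, or an Array subclass with a boost. `BoostedField` and `search_field` are hypothetical names; the search is a naive whitespace-token scan, whereas Ferret itself looks up analyzed terms in an inverted index.

```ruby
# Hypothetical stand-in for a field: an Array of Strings plus a boost.
# This is NOT Ferret's Field class; it only mirrors the idea.
class BoostedField < Array
  attr_accessor :boost

  def initialize(values, boost = 1.0)
    super(values)   # copy the strings into this Array
    @boost = boost
  end
end

docs = [
  { title: "Ferret", content: "A Ruby search library" },
  { title: "Lucene", content: "Ferret was inspired by Lucene" },
  { title: BoostedField.new(["Ferret tips", "Ferret tricks"], 3.0), content: "Misc" }
]

# Return the indexes of documents whose given field contains the word.
# Naive matching: lowercase, split on whitespace, compare whole tokens.
def search_field(docs, field, word)
  docs.each_index.select do |i|
    values = Array(docs[i][field])   # String -> [String]; Arrays pass through
    values.any? { |v| v.downcase.split.include?(word.downcase) }
  end
end

p search_field(docs, :title, "ferret")    # => [0, 2]
p search_field(docs, :content, "ferret")  # => [1]
```

The same word gives different hits depending on the field searched, which is why it pays to decide on field names before indexing.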

What is a field-info?

A FieldInfo is a set of parameters describing a field. When you build your index, you should think about all the fields you are going to add. If you do not define a FieldInfo for a field you add to a document later on, it will be indexed using the default FieldInfo properties. You can set the properties :store (should the value of the field be stored in the index), :index (should this field be searchable, and should it be tokenized) and :term_vector (should term vectors be stored for this field).
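The "defaults unless overridden" behaviour can be modelled with two Hashes. This is a plain-Ruby sketch, not Ferret's FieldInfo API: the option names :store, :index and :term_vector come from the text above, while the values (:yes, :no, :untokenized, :with_positions_offsets), the constant names and `field_info_for` are illustrative assumptions.

```ruby
# Defaults applied to any field that has no settings of its own.
# Option names come from the text; the values are illustrative assumptions.
DEFAULT_FIELD_INFO = { store: :yes, index: :yes, term_vector: :no }.freeze

# Per-field overrides, decided up front when the index is designed.
FIELD_INFOS = {
  id:      { store: :yes, index: :untokenized },           # exact-match key, not split into words
  content: { store: :no,  term_vector: :with_positions_offsets }
}.freeze

# Effective settings for a field: overrides win, defaults fill the gaps.
def field_info_for(field)
  DEFAULT_FIELD_INFO.merge(FIELD_INFOS.fetch(field, {}))
end

p field_info_for(:id)
p field_info_for(:content)
p field_info_for(:author)   # never configured, so it gets the defaults
```

Here `:author` was never configured, so it silently receives the defaults. This is the situation the text warns about when a field shows up later without its own FieldInfo.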

What is a term-vector?

TermVectors hold a record of all the terms that appear in a field. They include the frequency of each term in the field, so you could use a TermVector to create a tag cloud for a document. They can also optionally include positions and term offsets for each term.
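To show what a term vector records, here is a minimal plain-Ruby sketch that builds one for a single field's text and then ranks terms the way a tag cloud might. The tokenizer (lowercase, `/\w+/`) is a simplifying assumption standing in for a real analyzer, and `term_vector` is a hypothetical helper, not a Ferret method.

```ruby
# For each term in one field's text, record its frequency and the token
# positions where it occurs. Tokenization is a simple lowercase word scan.
def term_vector(text)
  vector = Hash.new { |h, term| h[term] = { freq: 0, positions: [] } }
  text.downcase.scan(/\w+/).each_with_index do |term, pos|
    vector[term][:freq] += 1
    vector[term][:positions] << pos
  end
  vector
end

tv = term_vector("The cat sat on the mat with the hat")
p tv["the"][:positions]   # => [0, 4, 7]

# Tag-cloud style ranking: most frequent terms first, ties broken alphabetically.
tv.sort_by { |term, info| [-info[:freq], term] }.first(3).each do |term, info|
  puts "#{term}: #{info[:freq]}"
end
# the: 3
# cat: 1
# hat: 1
```

A real tag cloud would usually drop stop words such as "the" first. The frequencies themselves are exactly what the term vector stores.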

What are term-offsets?

Term offsets are the start and end byte offsets of a term in a field, where the end offset points one byte past the term's last byte. So in the phrase "the quick brown fox", the terms "the", "quick", "brown", "fox" would have the offsets (0,3), (4,9), (10,15), (16,19) respectively.
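The offsets in that example can be reproduced with a few lines of plain Ruby. This sketch assumes terms are runs of `/\w+/` characters, which is a stand-in for Ferret's analyzer, and `term_offsets` is a hypothetical helper. Byte positions are measured with `bytesize`, so the result is in bytes even for multibyte text.

```ruby
# Compute [term, start, end] byte offsets for each word in a string.
# The end offset is exclusive: it points one byte past the term.
def term_offsets(text)
  offsets = []
  text.scan(/\w+/) do
    m = Regexp.last_match
    start = m.pre_match.bytesize                      # bytes before this match
    offsets << [m[0], start, start + m[0].bytesize]
  end
  offsets
end

term_offsets("the quick brown fox").each do |term, s, e|
  puts "#{term}: (#{s},#{e})"
end
# the: (0,3)
# quick: (4,9)
# brown: (10,15)
# fox: (16,19)
```

For pure ASCII text, byte offsets and character offsets coincide. They only diverge once multibyte characters appear before a term.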