Ticket #340 (closed enhancement: fixed)
[PATCH] Batch processing functions
| Reported by: | francois.lagunas@… | Owned by: | somebody |
|---|---|---|---|
| Priority: | major | Milestone: | milestone2 |
| Component: | component1 | Version: | 2.0 |
| Keywords: | batch processing | Cc: | |
Description
This patch adds new functions to batch update and batch delete documents in a Ferret index.
They are directly modeled on their single-document versions.
The main advantage is that locking and committing costs (mainly file system access) are shared across a full set of documents. In practice, batch updating a few thousand documents at a time leads to a 10x speed-up in indexing.
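To illustrate why sharing the lock/commit cost matters, here is a minimal sketch using a toy in-memory index. `ToyIndex` and its methods are hypothetical and are not Ferret code; the commit counter simply stands in for the file system work described above.

```ruby
# Toy in-memory index, NOT Ferret. It only counts how often the
# expensive lock/commit cycle would run.
class ToyIndex
  attr_reader :commits, :docs

  def initialize
    @docs = {}
    @commits = 0
  end

  # Single-document update: one lock/commit cycle per call.
  def update(id, doc)
    with_commit { @docs[id] = doc }
  end

  # Batch update: one lock/commit cycle for the whole set.
  def batch_update(docs)
    with_commit { docs.each { |doc| @docs[doc[:id]] = doc } }
  end

  private

  # Stands in for acquiring the lock and flushing to disk.
  def with_commit
    yield
    @commits += 1
  end
end

docs = (1..1000).map { |i| {:id => i, :content => "doc #{i}"} }

one_by_one = ToyIndex.new
docs.each { |doc| one_by_one.update(doc[:id], doc) }

batched = ToyIndex.new
batched.batch_update(docs)

puts one_by_one.commits  # 1000
puts batched.commits     # 1
```

Both indexes end up with the same 1000 documents, but the batched one pays for a single commit. The real speed-up depends on how expensive each commit is on the target file system.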
The patch can be used in the acts_as_ferret Rails plugin to speed up re-indexing and other operations:
http://projects.jkraemer.net/acts_as_ferret/ticket/202
Francois Lagunas
Scientific Director, Dailymotion
http://www.tourteaser.com
Attachments
Change History
Changed 3 years ago by francois.lagunas@…
- attachment batch_processing.diff added
comment:1 Changed 3 years ago by dbalmain
- Status changed from new to closed
- Resolution set to fixed
Applied patch and then made significant modifications. Here are some examples.
To batch delete documents you can use the IndexWriter#delete method:
@index_writer.delete(:id, ['12', '34', '123'])
You can also batch delete with the Ferret::Index::Index#delete method:
# The field used is whatever the :id_field was set to
@index.delete(['12', '34', '123'])
# You can also batch delete by Ferret document number
@index.delete([12, 34, 123])
To batch update you need to use the Index#batch_update method:
@index.batch_update([
  {:id => 234, :content => 'yada yada yada'},
  {:id => 897, :content => 'blah blah blah'},
  {:id => 932, :content => 'nani nani nani'}
])
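For large re-indexing jobs, one possible way to feed documents to batch_update a few thousand at a time is sketched below. `reindex_in_batches` and `RecordingIndex` are hypothetical names introduced for illustration; the stand-in index only records batch sizes so the example runs without Ferret installed. The sketch assumes the real index accepts an array of hashes, as in the batch_update example above.

```ruby
# Hypothetical helper, not part of Ferret: passes documents to
# index.batch_update in fixed-size slices and returns the batch count.
def reindex_in_batches(index, docs, batch_size = 2000)
  batches = 0
  docs.each_slice(batch_size) do |slice|
    index.batch_update(slice)
    batches += 1
  end
  batches
end

# Stand-in for a real index (assumption): records the size of each batch.
class RecordingIndex
  attr_reader :calls

  def initialize
    @calls = []
  end

  def batch_update(docs)
    @calls << docs.size
  end
end

docs = (1..5000).map { |i| {:id => i, :content => "document #{i}"} }
index = RecordingIndex.new
batch_count = reindex_in_batches(index, docs, 2000)

puts batch_count  # 3
p index.calls     # [2000, 2000, 1000]
```

The batch size of 2000 is an arbitrary choice here; a suitable value depends on document size and available memory.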

Patch for batch processing