Ticket #347 (new defect)

Opened 4 months ago

Last modified 3 months ago

Seg Faults When indexing URL

Reported by: hattwj Owned by: somebody
Priority: major Milestone:
Component: component1 Version:
Keywords: Cc:

Description

Hello all,

I am working on indexing a large collection of HTML files. Although this bug won't stop me from working on my project, I thought I should let you guys know about it, as I have a fairly good idea of what is causing the problem.

The code I use to add a document to the index is:

@fer_index.add_document(:header => ff)

where ff is a long URL string, shown below.

I get the following error when I try to index a particular string:

/home/hattb/ruby1.8/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret/index.rb:298: [BUG] Segmentation fault
ruby 1.8.6 (2007-09-23) [x86_64-linux]

The string that I am trying to index is a rather long URL: "http://xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/x_x.xxxx"

Interestingly, the segmentation fault goes away if either "http" or ":" is removed from the URL. Shortening the URL or removing the file name from the end also avoids it. Examples:

"http://xxx.xxxx.xxx/xxxxxx/x_x.xxxx" Works!

"dddd://xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/x_x.xxxx" Works!

"http://xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/" Works!

"http//xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/x_x.xxxx" Works!
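The reductions tried above can be generated mechanically from any URL that crashes, which makes it easier to check other long URLs in the same way. Below is a minimal Ruby sketch. The helper name url_variants and its labels are hypothetical and not part of Ferret. The "shortened URL" case is not modeled, because the ticket does not say which path segments were removed.

```ruby
# Hypothetical helper (not part of Ferret): derives the reduced inputs
# tried in this ticket from a URL that triggers the crash, so each one
# can be indexed separately.
def url_variants(url)
  {
    "original"     => url,
    # Second example: the scheme "http" replaced by "dddd"
    "no http"      => url.sub(/\Ahttp/, "dddd"),
    # Fourth example: the ":" after the scheme removed
    "no colon"     => url.sub("://", "//"),
    # Third example: the trailing file name dropped, keeping the final "/"
    "no file name" => url.sub(%r{[^/]+\z}, ""),
  }
end

url_variants("http://xxx.xxxx.xxx/xxxxxx/x_x.xxxx").each do |label, variant|
  puts "#{label}: #{variant}"
end
```

Each variant can then be passed to add_document(:header => ...) in turn to see which ones still crash.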

Attachments

Change History

Changed 3 months ago by anonymous

Just tested this on a Mac (Ferret 0.11.6, ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0]) and it works fine, but I was able to reproduce the crash on a box running a 64-bit version of Ubuntu (Ferret 0.11.6, ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]). Weird stuff...

require 'rubygems'
require 'ferret'

ff = "http://xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/x_x.xxxx"

@fer_index = Ferret::I.new(:path => 'some_new_idx',
                           :create_if_missing => true, :auto_flush => true)

@fer_index.add_document(:header=>ff)
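The crash kills the whole Ruby process, so each candidate string normally needs its own run. Below is a minimal sketch that assumes a platform with fork(2), which both the x86_64 Linux box and the Mac above have. It runs the indexing call in a child process and reports whether the child was killed by a signal. Ruby's "[BUG] Segmentation fault" handler typically ends by aborting, so the signal seen by the parent may be SIGABRT rather than SIGSEGV. For that reason the sketch treats any fatal signal as a crash. The method name crash_signal is hypothetical.

```ruby
# Hypothetical helper: runs the block in a forked child process and
# returns the number of the signal that killed the child, or nil if the
# child exited on its own (with any exit status).
# Requires fork(2) (Linux, macOS, other Unixes).
def crash_signal
  pid = fork do
    yield
    # Leave immediately without running at_exit handlers inherited
    # from the parent process.
    exit!(0)
  end
  _, status = Process.wait2(pid)
  status.signaled? ? status.termsig : nil
end

# Self-check: a child that kills itself is reported, a normal one is not.
killed = crash_signal { Process.kill("KILL", Process.pid) }
normal = crash_signal { 1 + 1 }
puts "killed by signal #{killed}" if killed
puts "normal child returned #{normal.inspect}"
```

To use it against the bug, the block would open a fresh index and call add_document(:header => ...) with one of the URLs above. A non-nil result means that input brought the child process down.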
