Ticket #347 (new defect)
Seg Faults When indexing URL
| Reported by: | hattwj | Owned by: | somebody |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | component1 | Version: | |
| Keywords: | | Cc: | |
Description
Hello all,
I am working on indexing a large collection of HTML files. This bug won't stop me from working on my project, but I thought I should let you guys know about it, as I have a fairly good idea of what is causing the problem.
The actual code I use to add a document to the index is: @fer_index.add_document(:header=>ff) where ff = "Some long URL"
I get the following error when I try to index a particular string: /home/hattb/ruby1.8/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret/index.rb:298: [BUG] Segmentation fault ruby 1.8.6 (2007-09-23) [x86_64-linux]
The string that I am trying to index is a rather long URL: "http://xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/x_x.xxxx"
The interesting thing is that if the "http" or the ":" is removed, there is no longer a segmentation fault. Shortening the URL or removing the file name from the URL also avoids the seg fault. Examples:
"http://xxx.xxxx.xxx/xxxxxx/x_x.xxxx" Works!
"http//xxx.xxxx.xxx/xxxxxx/xxxxxxx_xx.xxxx/xxxxxxxx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxx/xxxxxxx_xx/xxxxxxxx/x_x.xxxx" Works!
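A minimal sketch of how the cases above could be turned into a reproduction script. The helper names `url_variants` and `index_variants` are hypothetical and not part of Ferret; the only Ferret call used is `add_document` with a `:header` field, as in the report. The in-memory index in the commented usage is an assumption about a convenient setup, not necessarily the configuration that crashed.

```ruby
# Hypothetical helpers for narrowing down the crashing input.
# Nothing here is part of Ferret itself.

# Build the URL variants described in the ticket from one full URL.
def url_variants(url)
  {
    :full        => url,
    :no_colon    => url.sub("://", "//"),     # ":" removed
    :no_scheme   => url.sub(/\Ahttp/, ""),    # "http" removed
    :no_filename => url[0..url.rindex("/")]   # file name removed
  }
end

# Index each variant in turn. The label is printed and flushed before the
# call, so the last line printed before a crash names the offending input.
def index_variants(index, variants, out = $stdout)
  variants.each do |label, url|
    out.puts "indexing #{label} (#{url.length} chars)"
    out.flush
    index.add_document(:header => url)
  end
end

# Usage (requires the ferret gem; assumes an in-memory index is acceptable):
#
#   require 'rubygems'
#   require 'ferret'
#   index = Ferret::Index::Index.new
#   index_variants(index, url_variants(long_url))  # long_url: the failing URL
```

Since a segmentation fault kills the interpreter, running each variant in a separate process may be more reliable than a single loop if several variants need to be checked after the first crash.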
