Ticket #68 (closed defect: wontfix)

Opened 4 years ago

Last modified 3 years ago

Exponential behaviour in StandardTokenizer

Reported by: anonymous Owned by: somebody
Priority: critical Milestone:
Component: component1 Version:
Keywords: Cc:

Description

The regular expression used in StandardTokenizer shows exponential behaviour on simple strings: every underscore added to the example input below doubles the processing time.

To reproduce:

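require 'test/unit'
require 'ferret'

class StandardTokenizerTest < Test::Unit::TestCase
  include Ferret::Analysis

  def test_lots_of_underscore
    sa = StandardAnalyzer.new
    input = "_________________________"

    t = sa.token_stream("field", input)

    start = Time.now
    # Drain the token stream; each extra underscore in the input roughly doubles this time
    while (token = t.next)
      puts token.text
    end

    assert Time.now - start < 1, "tokenizing taking too long"
  end
end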

See http://www.codinghorror.com/blog/archives/000488.html for more information.
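The doubling can be illustrated without Ferret. The ticket does not say which regular expression StandardTokenizer uses, so this sketch assumes, hypothetically, a nested quantifier such as /\A(_+)+!/, a classic cause of catastrophic backtracking. It simulates a naive backtracking matcher on a run of underscores with no "!" after it and counts how often the trailing "!" is tested. Because the match must fail, every way of grouping the underscores is tried. The name `bang_checks` is made up for this illustration.

```ruby
# Hypothetical pattern: /\A(_+)+!/ (NOT the actual StandardTokenizer regex).
# bang_checks(n) counts how many times a naive backtracking matcher tests
# for the trailing "!" while matching n underscores with no "!" after them.
# The match must fail, so every grouping of the underscores is tried.
def bang_checks(remaining)
  checks = 0
  # Greedy inner `_+`: consume k underscores, longest first, backtracking down to 1.
  remaining.downto(1) do |k|
    left = remaining - k
    # Greedy outer `+`: first try another group on what is left
    # (bang_checks(0) is 0 because `_+` cannot match an empty rest)...
    checks += bang_checks(left)
    # ...then fall back to testing for "!" here, which always fails.
    checks += 1
  end
  checks
end

(1..20).each do |n|
  puts "#{n} underscores: #{bang_checks(n)} checks"
end
```

The count satisfies bang_checks(n + 1) = 2 * bang_checks(n) + 1, i.e. it equals 2^n - 1, so it roughly doubles with every extra underscore, the same growth the ticket reports for StandardTokenizer.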
