Ticket #326 (closed defect: wontfix)

Opened 15 months ago

Last modified 4 days ago

Ferret 0.11.4 (windows) german umlauts in queries aren't case insensitive anymore

Reported by: neongrau@… Owned by: somebody
Priority: blocker Milestone:
Component: component1 Version:
Keywords: Cc:

Description (last modified by dbalmain) (diff)

It took me a few hours to figure out that this is a bug in the current version; after I downgraded to 0.10.9 it worked again.

here's a sample to reproduce the bug:

require 'rubygems'
require 'ferret'

i = Ferret::I.new

i << 'Übersicht'
i << 'übersicht'

for q in [ 'Übersicht', 'übersicht', '*bersicht' ]
  puts "#{q} : #{i.search(q).total_hits} hit(s)"
end

in 0.11.4 it returns:
Übersicht : 1 hit(s)
übersicht : 1 hit(s)
*bersicht : 2 hit(s)

in 0.10.9 it returns:
Übersicht : 2 hit(s)
übersicht : 2 hit(s)
*bersicht : 2 hit(s)

Change History

Changed 15 months ago by dbalmain

  • status changed from new to closed
  • resolution set to wontfix
  • description modified (diff)

This is actually a bug in 0.10.9 although I can see how it would appear to be the other way around. In Ferret 0.10.9 I was forcing the locale to the system locale when you load Ferret. This upset a few people, so in the current version of Ferret you need to make sure that you have the correct locale set in Ruby. Try this;

# Set the locale in Ruby. Make sure you use a locale installed on 
# your system. Any UTF-8 locale should work.
ENV["LANG"] = "de_DE.UTF-8"

require 'rubygems'
require 'ferret'

# If the line above didn't work you can try this;
Ferret.locale = "de_DE.UTF-8"

i = Ferret::I.new

i << 'Übersicht'
i << 'übersicht'

for q in [ 'Übersicht', 'übersicht', '*bersicht' ]
  puts "#{q} : #{i.search(q).total_hits} hit(s)"
end

Hope this helps. Let me know here if you still have any problems.
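
To illustrate why the locale matters for this ticket: case-insensitive matching relies on lowercasing terms, and a lowercasing routine that only knows ASCII leaves "Ü" untouched. The following is a minimal sketch in plain Ruby (2.4 or later). It does not use Ferret and is not Ferret's analyzer code; the helper names `fold_ascii` and `fold_unicode` are made up for this illustration, and the claim that a missing locale behaves like ASCII-only folding is an assumption.

```ruby
# Plain-Ruby illustration (Ruby >= 2.4); this is NOT Ferret's analyzer code.
# fold_ascii and fold_unicode are hypothetical helpers for this example only.

# Lowercase only A-Z. Assumption: this resembles what a lowercasing step
# does when no suitable locale is set, so "Ü" is left as it is.
def fold_ascii(term)
  term.downcase(:ascii)
end

# Lowercase with full Unicode case mapping, so "Ü" becomes "ü".
def fold_unicode(term)
  term.downcase
end

terms = ['Übersicht', 'übersicht']

# With ASCII-only folding the two terms stay distinct.
puts terms.map { |t| fold_ascii(t) }.uniq.size
# With Unicode-aware folding they collapse into a single term.
puts terms.map { |t| fold_unicode(t) }.uniq.size
```

If the two documents end up as distinct terms in the index, each exact query can only match one of them, which is consistent with the "1 hit(s)" results reported for 0.11.4.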

Changed 15 months ago by neongrau@…

I'm on Windows, where UTF-8 won't work with Ferret. And would it be correct to set ENV["LANG"] to ISO-8859-1 while the Rails app actually runs in UTF-8? It's only the problems with Ferret on Windows that made me Iconv everything that gets fed to Ferret.

and i've tried about every locale setting before i downgraded.

how am i supposed to set the locale on windows to make ferret use iso-8859-1 and leave ruby at UTF-8?

i tried

Ferret.locale = "de_DE.ISO8859-1"
Ferret.locale = "de_DE.ISO-8859-1"
Ferret.locale = "de_DE.ISO88591"

but none worked.

even the rails console confuses me. when i check "Ferret.locale" it always returns nil.
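
For reference, the conversion step described above (transcoding UTF-8 text to ISO-8859-1 before it is fed to the index) can be sketched in plain Ruby without Iconv, using `String#encode` (Ruby 1.9 or later). This is only an illustration of the transcoding itself, not the code used in the application; `to_latin1` is a hypothetical helper name.

```ruby
# Plain-Ruby sketch (Ruby >= 1.9); to_latin1 is a hypothetical helper name.
# Transcodes UTF-8 text to ISO-8859-1 before it is handed to an index.
def to_latin1(text)
  # Characters with no ISO-8859-1 equivalent are replaced with "?"
  # instead of raising Encoding::UndefinedConversionError.
  text.encode('ISO-8859-1', 'UTF-8', undef: :replace, replace: '?')
end

upper = to_latin1('Übersicht')
lower = to_latin1('übersicht')

# In ISO-8859-1 each umlaut is a single byte: "Ü" is 0xDC, "ü" is 0xFC.
puts upper.bytes.first.to_s(16)
puts lower.bytes.first.to_s(16)
```

After this conversion the upper- and lowercase umlauts are still different bytes, so whatever lowercases the terms has to know that the text is ISO-8859-1 in order to map one onto the other.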

Changed 15 months ago by dbalmain

Sorry, I just assumed you were on Linux. I should have guessed that you were on Windows by the fact that you rolled back to version 0.10.9. Anyway, try this;

Ferret.locale = ""

I don't know why this works. This is how I was setting Ferret to use the system locale in 0.10.9 (internally) and sure enough it seems to fix things in 0.11.4. Let me know if it works for you.

Changed 15 months ago by neongrau@…

still no joy :(

Actually I have two problems that are mutually exclusive. I've been trying for 2 days now to get this fixed, having rebuilt my index dozens and dozens of times, and I'm getting desperate because I'm totally out of ideas about what to do.

Problem 1: I can find the term "Übersicht" with the queries "übersicht" or "Übersicht", but somehow the umlauts aren't properly indexed, so queries that combine umlauts and wildcards, like "übers*", return 0 hits (while the query "*bers*" works).

Problem 2: umlauts are properly indexed, but umlauts in queries are case sensitive. "Übersicht" or "Übers*" work as queries, but "übersicht" or "übers*" don't return any hits.

With 0.11.4 installed and Ferret.locale set to an empty string I get problem number 1.

When setting Ferret.locale to something like 'de_DE.iso885915' or 'de_DE.iso88591' or 'en_US.iso88591' I'm stuck with problem number 2.

I also tested various settings of ENV["LANG"] in the environment.rb of my Rails app, but it looks like they aren't used at all on Windows, since I couldn't notice any change in behavior.

Changed 15 months ago by dbalmain

I'm sorry, it must have something to do with your environment. I'm not really a Windows user so I probably can't be much help. Running this code here;

require 'rubygems'
require 'ferret'
Ferret.locale = ''

i = Ferret::I.new

i << 'Übersicht'
i << 'übersicht'

for q in [ 'Übersicht', 'übersicht', '*bersicht', 'über*' ]
  puts "#{q} : #{i.search(q).total_hits} hit(s)"
end

I get;

in 0.11.4 it returns:
Übersicht : 2 hit(s)
übersicht : 2 hit(s)
*bersicht : 2 hit(s)
über* : 2 hit(s)

