Class: Ferret::Analysis::AsciiWhiteSpaceAnalyzer
Summary
The AsciiWhiteSpaceAnalyzer recognizes tokens as maximal strings of non-whitespace characters. If implemented in Ruby, the AsciiWhiteSpaceAnalyzer would look like this:
class AsciiWhiteSpaceAnalyzer
  def initialize(lower = false)
    @lower = lower
  end

  def token_stream(field, str)
    if @lower
      return AsciiLowerCaseFilter.new(AsciiWhiteSpaceTokenizer.new(str))
    else
      return AsciiWhiteSpaceTokenizer.new(str)
    end
  end
end
As you can see, it makes use of the AsciiWhiteSpaceTokenizer. Use WhiteSpaceAnalyzer instead if you need to recognize multibyte encodings such as UTF-8.
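To make the tokenization rule and the ASCII-only lowercasing concrete, here is a minimal plain-Ruby sketch. It does not use Ferret and is not the library's implementation: the method name ascii_whitespace_tokens is hypothetical, and it assumes that "whitespace" means the ASCII characters matched by Ruby's \s (space, tab, newline, carriage return, form feed, vertical tab).

```ruby
# Hypothetical stand-in for illustration only; not part of Ferret.
# Splits str into maximal runs of non-whitespace characters and,
# when lower is true, downcases ASCII letters A-Z only.
def ascii_whitespace_tokens(str, lower = false)
  tokens = str.scan(/\S+/)             # maximal non-whitespace runs
  return tokens unless lower

  # String#tr touches only the listed ASCII range, so accented or other
  # multibyte letters are left exactly as they were.
  tokens.map { |token| token.tr('A-Z', 'a-z') }
end

p ascii_whitespace_tokens("Quick  BROWN\tfox")        # => ["Quick", "BROWN", "fox"]
p ascii_whitespace_tokens("Quick  BROWN\tfox", true)  # => ["quick", "brown", "fox"]
```

Because only A-Z is mapped, a token such as "ÉCOLE" would come out as "École" rather than "école", which is the kind of result that motivates switching to WhiteSpaceAnalyzer for UTF-8 text.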
Public Class Methods
AsciiWhiteSpaceAnalyzer.new(lower = false) → analyzer
Create a new AsciiWhiteSpaceAnalyzer. As the signature shows, tokens keep their original case by default (lower = false); pass true to downcase them. Lowercasing is applied to ASCII characters only.
| lower: | set to false if you don't want the field's tokens to be downcased |
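The following usage sketch shows how the constructor argument might be exercised. The helper token_texts is a hypothetical name introduced here. It assumes only that the analyzer responds to token_stream(field, str), and that the returned stream yields tokens from next (nil when exhausted), each responding to text. The field name :content and the sample string are placeholders, and the Ferret calls run only if the ferret gem can be loaded.

```ruby
# Hypothetical helper: drains a token stream into an array of token texts.
def token_texts(analyzer, text, field = :content)
  stream = analyzer.token_stream(field, text)
  texts = []
  while (token = stream.next)          # assumed to return nil at end of stream
    texts << token.text
  end
  texts
end

begin
  require 'ferret'
rescue LoadError
  # ferret gem not installed; the usage lines below are skipped
end

if defined?(Ferret)
  as_is   = Ferret::Analysis::AsciiWhiteSpaceAnalyzer.new        # default argument
  lowered = Ferret::Analysis::AsciiWhiteSpaceAnalyzer.new(true)  # downcase ASCII letters
  p token_texts(as_is, "Quick BROWN fox")
  p token_texts(lowered, "Quick BROWN fox")
end
```

Comparing the two printed arrays is a quick way to confirm which default your installed version actually applies.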
/*
* call-seq:
* AsciiWhiteSpaceAnalyzer.new(lower = false) -> analyzer
*
* Create a new AsciiWhiteSpaceAnalyzer which downcases tokens by default
* but can optionally leave case as is. Lowercasing will only be done to
* ASCII characters.
*
* lower:: set to false if you don't want the field's tokens to be downcased
*/
static VALUE
frt_a_white_space_analyzer_init(int argc, VALUE *argv, VALUE self)
{
    Analyzer *a;
    GET_LOWER(false);
    a = whitespace_analyzer_new(lower);
    Frt_Wrap_Struct(self, NULL, &frt_analyzer_free, a);
    object_add(a, self);
    return self;
}