public final class CustomAnalyzer extends Analyzer
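A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.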
You can create an instance of this Analyzer using the builder by passing the SPI names (as defined by the ServiceLoader interface) to it:
    Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
        .withTokenizer(StandardTokenizerFactory.NAME)
        .addTokenFilter(LowerCaseFilterFactory.NAME)
        .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
        .build();
The parameters passed to components are also used by Apache Solr and are documented on their corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
The following is equivalent to the above, with the SPI names written as string literals:
    Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
        .build();
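The trailing string arguments in the examples above are alternating keys and values ("ignoreCase" then "false", "words" then "stopwords.txt", and so on). The following is a minimal plain-Java sketch of that convention, not Lucene's own code: the class ParamPairsSketch and its toMap method are hypothetical names introduced here for illustration, and the validation shown is an assumption about what a reasonable builder would check.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Lucene) that models the key/value varargs convention. */
final class ParamPairsSketch {

    /** Folds alternating key/value strings into a map that preserves argument order. */
    static Map<String, String> toMap(String... params) {
        // An odd number of arguments means a key is missing its value.
        if (params.length % 2 != 0) {
            throw new IllegalArgumentException("Expected key/value pairs, got " + params.length + " arguments");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < params.length; i += 2) {
            String key = params[i];
            String value = params[i + 1];
            if (key == null || value == null) {
                throw new IllegalArgumentException("Keys and values must not be null");
            }
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) {
        // Same arguments as the stop filter in the examples above.
        Map<String, String> stopParams =
            toMap("ignoreCase", "false", "words", "stopwords.txt", "format", "wordset");
        System.out.println(stopParams); // {ignoreCase=false, words=stopwords.txt, format=wordset}
    }
}
```

Reading the arguments this way makes it easier to match each key against the parameters documented on the corresponding factory class.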
The list of names to be used for components can be looked up through TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().
You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate):
    Analyzer ana = CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .whenTerm(t -> t.length() > 10)
            .addTokenFilter("reversestring")
        .endwhen()
        .build();
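To make the effect of the conditional branch concrete, here is a plain-Java model of what the chain above does to individual terms: every term is lowercased, and only terms longer than 10 characters go through the reversing step. This is a sketch under simplifying assumptions, not Lucene code: ConditionalBranchSketch and applyWhen are hypothetical names, terms are modeled as plain strings, and token attributes such as offsets and position increments are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical plain-Java model (not part of Lucene) of a conditional filter branch. */
final class ConditionalBranchSketch {

    /** Applies branch only to terms that satisfy condition; all other terms pass through unchanged. */
    static List<String> applyWhen(List<String> terms,
                                  Predicate<CharSequence> condition,
                                  UnaryOperator<String> branch) {
        List<String> out = new ArrayList<>(terms.size());
        for (String term : terms) {
            out.add(condition.test(term) ? branch.apply(term) : term);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the unconditional "lowercase" step.
        List<String> lowercased = new ArrayList<>();
        for (String t : Arrays.asList("Quick", "Internationalization")) {
            lowercased.add(t.toLowerCase(Locale.ROOT));
        }
        // Stand-in for whenTerm(t -> t.length() > 10) ... "reversestring" ... endwhen().
        List<String> result = applyWhen(
            lowercased,
            t -> t.length() > 10,
            s -> new StringBuilder(s).reverse().toString());
        System.out.println(result);
    }
}
```

Short terms such as "quick" come out lowercased only, while the long term is both lowercased and reversed.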
Modifier and Type | Class and Description |
---|---|
static class | CustomAnalyzer.Builder: Builder for CustomAnalyzer. |
static class | CustomAnalyzer.ConditionBuilder: Factory class for a ConditionalTokenFilter. |
Nested classes inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Modifier and Type | Method and Description |
---|---|
static CustomAnalyzer.Builder | builder(): Returns a builder for custom analyzers that loads all resources from Lucene's classloader. |
static CustomAnalyzer.Builder | builder(Path configDir): Returns a builder for custom analyzers that loads all resources from the given file system base directory. |
static CustomAnalyzer.Builder | builder(ResourceLoader loader): Returns a builder for custom analyzers that loads all resources using the given ResourceLoader. |
protected Analyzer.TokenStreamComponents | createComponents(String fieldName) |
List<CharFilterFactory> | getCharFilterFactories(): Returns the list of char filters that are used in this analyzer. |
int | getOffsetGap(String fieldName) |
int | getPositionIncrementGap(String fieldName) |
List<TokenFilterFactory> | getTokenFilterFactories(): Returns the list of token filters that are used in this analyzer. |
TokenizerFactory | getTokenizerFactory(): Returns the tokenizer that is used in this analyzer. |
protected Reader | initReader(String fieldName, Reader reader) |
protected Reader | initReaderForNormalization(String fieldName, Reader reader) |
protected TokenStream | normalize(String fieldName, TokenStream in) |
String | toString() |
Methods inherited from class Analyzer: attributeFactory, close, getReuseStrategy, getVersion, normalize, setVersion, tokenStream, tokenStream
public static CustomAnalyzer.Builder builder()
public static CustomAnalyzer.Builder builder(Path configDir)
public static CustomAnalyzer.Builder builder(ResourceLoader loader)
Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
protected Reader initReader(String fieldName, Reader reader)
Overrides: initReader in class Analyzer
protected Reader initReaderForNormalization(String fieldName, Reader reader)
Overrides: initReaderForNormalization in class Analyzer
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Specified by: createComponents in class Analyzer
protected TokenStream normalize(String fieldName, TokenStream in)
public int getPositionIncrementGap(String fieldName)
Overrides: getPositionIncrementGap in class Analyzer
public int getOffsetGap(String fieldName)
Overrides: getOffsetGap in class Analyzer
public List<CharFilterFactory> getCharFilterFactories()
public TokenizerFactory getTokenizerFactory()
public List<TokenFilterFactory> getTokenFilterFactories()
public String toString()
Overrides: toString in class Object
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.