Class LuceneAnalyzerUtil
java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil

All Implemented Interfaces:
Closeable, AutoCloseable

public class LuceneAnalyzerUtil extends org.apache.lucene.analysis.Analyzer
Creates a custom analyzer based on user-provided inputs.

Nested Class Summary

Nested Classes
Modifier and Type    Class                                Description
static class         LuceneAnalyzerUtil.StemFilterType

Constructor Summary

Constructors
Constructor    Description

LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, boolean useStopFilter)
    Creates an analyzer instance that uses the Lucene default stopword set if the param useStopFilter is set to true.

LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, List<String> stopWords, boolean addToDefault)
    Creates an analyzer instance based on user-provided stop words.

Method Summary

All Methods    Instance Methods    Concrete Methods
Modifier and Type    Method    Description

protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents
    createComponents(String fieldName)

Constructor Detail

LuceneAnalyzerUtil
public LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, boolean useStopFilter)
Creates an analyzer instance that uses the Lucene default stopword set if the param useStopFilter is set to true.
Parameters:
stemFilterType - a preferred LuceneAnalyzerUtil.StemFilterType to use. Can be one of LuceneAnalyzerUtil.StemFilterType.PORTERSTEM_FILTER, LuceneAnalyzerUtil.StemFilterType.ENGLISHMINIMALSTEM_FILTER, or LuceneAnalyzerUtil.StemFilterType.NONE
useStopFilter - if true, use the default Lucene stopword set; false otherwise

LuceneAnalyzerUtil
public LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, List<String> stopWords, boolean addToDefault)
Creates an analyzer instance based on user-provided stop words. If the param addToDefault is set to true, the user-provided stop words are added to the Lucene default stopword set.
Parameters:
stemFilterType - a preferred LuceneAnalyzerUtil.StemFilterType to use. Can be one of LuceneAnalyzerUtil.StemFilterType.PORTERSTEM_FILTER, LuceneAnalyzerUtil.StemFilterType.ENGLISHMINIMALSTEM_FILTER, or LuceneAnalyzerUtil.StemFilterType.NONE
stopWords - a List of stop word Strings
addToDefault - if true, the provided stop words are added to the default Lucene stopword set; false otherwise
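
The effect of addToDefault can be illustrated with a small, library-free sketch. It does not call Lucene or Nutch: ASSUMED_DEFAULTS is a made-up stand-in for the Lucene default stopword set, and effectiveStopWords and removeStopWords are hypothetical helpers introduced only for this illustration. It also assumes that when addToDefault is false only the user-provided words are used, which the description above implies but does not state outright.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class StopWordMergeSketch {

    // Hypothetical stand-in for the Lucene default stopword set (not the real list).
    static final List<String> ASSUMED_DEFAULTS = Arrays.asList("a", "an", "the", "of");

    // Assumed behaviour: with addToDefault == true the result is the union of the
    // defaults and the user words; with false, only the user words are used.
    static Set<String> effectiveStopWords(List<String> userWords, boolean addToDefault) {
        Set<String> result = new LinkedHashSet<>();
        if (addToDefault) {
            result.addAll(ASSUMED_DEFAULTS);
        }
        result.addAll(userWords);
        return result;
    }

    // Drops every token that appears in the stop word set, keeping token order.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> userWords = Arrays.asList("nutch", "crawl");
        List<String> tokens = Arrays.asList("the", "nutch", "crawler", "of", "apache");

        Set<String> merged = effectiveStopWords(userWords, true);
        Set<String> userOnly = effectiveStopWords(userWords, false);

        System.out.println("addToDefault=true  keeps " + removeStopWords(tokens, merged));
        System.out.println("addToDefault=false keeps " + removeStopWords(tokens, userOnly));
    }
}
```

Under these assumptions, passing true filters out both the default words and the user words, while passing false filters out only the user words. Check the class source for the actual handling before relying on this.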

Method Detail

createComponents
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
Specified by:
createComponents in class org.apache.lucene.analysis.Analyzer