Class StaticFieldIndexer
- java.lang.Object
  - org.apache.nutch.indexer.staticfield.StaticFieldIndexer
- All Implemented Interfaces: Configurable, IndexingFilter, Pluggable
public class StaticFieldIndexer extends Object implements IndexingFilter
A simple plugin, called at indexing time, that adds fields with static data to each document. You can specify a list of fieldname:fieldcontent pairs per Nutch job. This is useful when collections cannot be defined by URL patterns, as the subcollection plugin does, but can be defined on a per-job basis.
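To illustrate the fieldname:fieldcontent list described above, here is a minimal sketch of how such a list could be split into field names and values. It is not the plugin's actual code: the class name StaticFieldSketch, the method parse, and the choice of a comma between pairs and a colon between name and content are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch.
final class StaticFieldSketch {

  // Assumed separators for this sketch.
  static final String PAIR_SEPARATOR = ",";
  static final String KEY_SEPARATOR = ":";

  // Splits "name:content,name:content" into an ordered map of field name to values.
  static Map<String, List<String>> parse(String spec) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return fields;
    }
    for (String pair : spec.split(PAIR_SEPARATOR)) {
      int idx = pair.indexOf(KEY_SEPARATOR);
      if (idx <= 0) {
        continue; // no separator or no field name: skip the entry
      }
      String name = pair.substring(0, idx).trim();
      String content = pair.substring(idx + 1).trim();
      if (name.isEmpty()) {
        continue;
      }
      // A field name may appear more than once; keep every value.
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(content);
    }
    return fields;
  }

  public static void main(String[] args) {
    // prints {collection=[news], source=[crawl-a]}
    System.out.println(parse("collection:news,source:crawl-a"));
  }
}
```

In a real indexing filter, each parsed name and value would then be added to the document being indexed. The sketch stops at parsing so that it runs without any Nutch dependency.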
Field Summary
- Fields inherited from interface org.apache.nutch.indexer.IndexingFilter: X_POINT_ID
Constructor Summary
- StaticFieldIndexer()
Method Summary
- NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): The StaticFieldIndexer filter object, which adds fields as per the configuration setting.
- Configuration getConf(): Get the Configuration object.
- protected String regexEscape(String in): Escapes any character that needs escaping so it can be used in a regexp.
- void setConf(Configuration conf): Set the Configuration object.
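The regexEscape description above can be illustrated with the JDK's own facilities. The sketch below is not the plugin's implementation: it swaps in java.util.regex.Pattern.quote, which wraps a string so that every character in it is matched literally. The class name RegexEscapeSketch, its methods, and the sample separator are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch.
final class RegexEscapeSketch {

  // Returns a pattern string that matches 'in' literally.
  static String escape(String in) {
    return Pattern.quote(in);
  }

  // Splits text on a separator that is treated as plain text, not as a regex.
  static String[] splitLiteral(String text, String separator) {
    return text.split(escape(separator));
  }

  public static void main(String[] args) {
    // "|" is a regex metacharacter; unescaped, split("|") would split
    // between every character instead of on the pipe.
    System.out.println(Arrays.toString(splitLiteral("a|b|c", "|"))); // prints [a, b, c]
  }
}
```

Escaping matters whenever a user-configurable separator is passed to String.split, because split interprets its argument as a regular expression.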
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The StaticFieldIndexer filter object, which adds fields as per the configuration setting. See index.static in nutch-default.xml.
- Specified by: filter in interface IndexingFilter
- Parameters:
  doc - The NutchDocument object
  parse - The relevant Parse object passing through the filter
  url - URL to be filtered for anchor text
  datum - The CrawlDatum entry
  inlinks - The Inlinks containing anchor text
- Returns: filtered NutchDocument
- Throws: IndexingException - if an error occurs during filtering
-
setConf
public void setConf(Configuration conf)
Set the Configuration object.
- Specified by: setConf in interface Configurable
-
getConf
public Configuration getConf()
Get the Configuration object.
- Specified by: getConf in interface Configurable