Package org.creativecommons.nutch
Class CCIndexingFilter
- java.lang.Object
 - org.creativecommons.nutch.CCIndexingFilter
- All Implemented Interfaces:
 Configurable, IndexingFilter, Pluggable
public class CCIndexingFilter extends Object implements IndexingFilter
Adds basic searchable fields to a document. 
Field Summary
Modifier and Type, Field, Description:
- static String FIELD: The name of the document field we use.
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID 
Constructor Summary
Constructor, Description:
- CCIndexingFilter()
Method Summary
Modifier and Type, Method, Description:
- void addUrlFeatures(NutchDocument doc, String urlString): Add the features represented by a license URL.
- NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse.
- Configuration getConf()
- void setConf(Configuration conf)
Field Detail
FIELD
public static String FIELD
The name of the document field we use. 
Method Detail
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.
- Specified by:
 filter in interface IndexingFilter
- Parameters:
 doc - document instance for collecting fields
 parse - parse data instance
 url - page url
 datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
 inlinks - page inlinks
- Returns:
 modified (or a new) document instance, or null (meaning the document should be discarded)
- Throws:
 IndexingException - if an error occurs during filtering
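The null-return contract described for filter can be illustrated without the Nutch classes. The following is a minimal sketch under stated assumptions: Doc and DocFilter are hypothetical stand-ins for NutchDocument and IndexingFilter (reduced to the document argument only), runFilters is a hypothetical helper, and the field name "cc" is made up for the example. It is not the Nutch implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterContractSketch {

    // Hypothetical stand-in for NutchDocument: just a bag of named fields.
    public static class Doc {
        public final Map<String, String> fields = new HashMap<>();
    }

    // Hypothetical stand-in for IndexingFilter, reduced to the document argument.
    public interface DocFilter {
        Doc filter(Doc doc);
    }

    // Applies each filter in turn; a null result means the document is discarded,
    // so the remaining filters are skipped.
    public static Doc runFilters(List<DocFilter> filters, Doc doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Adds a field and passes the document on ("cc" is an assumed field name).
        DocFilter addField = d -> { d.fields.put("cc", "by"); return d; };
        // Discards any document that lacks the field.
        DocFilter dropUnlicensed = d -> d.fields.containsKey("cc") ? d : null;

        Doc kept = runFilters(List.of(addField, dropUnlicensed), new Doc());
        Doc dropped = runFilters(List.of(dropUnlicensed, addField), new Doc());
        System.out.println(kept != null);    // prints true
        System.out.println(dropped == null); // prints true
    }
}
```

In this sketch the order of the filters matters: a document rejected by an earlier filter never reaches the later ones.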
 
addUrlFeatures
public void addUrlFeatures(NutchDocument doc, String urlString)
Add the features represented by a license URL. URLs are of the form "http://creativecommons.org/licenses/xx-xx/xx/xx", where "xx" names a license feature.
- Parameters:
 doc - a NutchDocument to augment
 urlString - the url to extract features from
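The URL form described above suggests splitting the path at slashes and dashes. The following is a minimal sketch under that assumption, not the plugin's actual code: extractFeatures is a hypothetical helper that returns the feature names as a list instead of adding them to a NutchDocument, and it assumes the leading "licenses" path segment is skipped and that a malformed URL yields no features.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LicenseUrlFeatures {

    // Hypothetical helper: collects the "xx" components of a license URL.
    public static List<String> extractFeatures(String urlString) {
        List<String> features = new ArrayList<>();
        if (urlString == null) {
            return features;
        }
        String path;
        try {
            path = URI.create(urlString).getPath();
        } catch (IllegalArgumentException e) {
            return features; // assumption: a malformed URL contributes no features
        }
        if (path == null) {
            return features; // opaque URIs (e.g. mailto:) have no path
        }
        // Break the path at slashes and dashes.
        StringTokenizer names = new StringTokenizer(path, "/-");
        if (names.hasMoreTokens()) {
            names.nextToken(); // skip the leading "licenses" segment
        }
        while (names.hasMoreTokens()) {
            features.add(names.nextToken());
        }
        return features;
    }

    public static void main(String[] args) {
        // prints [by, nc, 2.0]
        System.out.println(extractFeatures("http://creativecommons.org/licenses/by-nc/2.0/"));
    }
}
```

Under this reading, every component after "licenses" counts as a feature, including the version segment.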
 
setConf
public void setConf(Configuration conf)
- Specified by:
 setConf in interface Configurable
 
getConf
public Configuration getConf()
- Specified by:
 getConf in interface Configurable
 