Package org.apache.nutch.scoring.metadata
Metadata Scoring Plugin
Propagates metadata from an injected or outlink URL in the crawldb to the URL's different processed objects. Moving any metadata item requires copying it in three steps:
- Crawldb to content: Copy a metadata entry stored in the crawldb record of the URL to the URL's fetched content object. Specify the entry in the scoring.db.md property.
- Content to parsedData: Copy a metadata entry stored in the Content object of a crawled URL to its parsedData. Specify the entry in the scoring.content.md property.
- ParsedData to outlink objects: Copy a metadata entry stored in the parsedData of a crawl item to the crawldb records of the URL's outlinks. Specify the entry in the scoring.parse.md property.
Note that you cannot move data directly from a crawldb record to parsedData or outlink objects. The metadata must move in the sequence crawldb -> content -> parsedData -> outlink objects, so an entry that should reach the outlinks has to be listed in all three properties.
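The three-step chain described above can be illustrated with a small stand-alone sketch. It does not use the Nutch API: the class MetadataPropagationSketch, its methods, the sample keys "category" and "lang", and the comma-separated value format are all assumptions made for illustration. Plain maps stand in for the crawldb record, the Content object, the parsedData and the outlink record, and each step copies only the keys listed in the corresponding property.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not the plugin's implementation.
class MetadataPropagationSketch {

    // Split a property value into keys (a comma-separated format is assumed here).
    static Set<String> keys(String propertyValue) {
        Set<String> out = new LinkedHashSet<>();
        for (String k : propertyValue.split(",")) {
            String trimmed = k.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Copy only the listed keys that are present in the source map.
    static Map<String, String> copyListed(Map<String, String> source, Set<String> listed) {
        Map<String, String> target = new HashMap<>();
        for (String key : listed) {
            String value = source.get(key);
            if (value != null) {
                target.put(key, value);
            }
        }
        return target;
    }

    // Run the chain crawldb -> content -> parsedData -> outlink.
    static Map<String, String> propagate(Map<String, String> crawlDbMeta,
                                         String dbMd, String contentMd, String parseMd) {
        Map<String, String> content = copyListed(crawlDbMeta, keys(dbMd));     // scoring.db.md
        Map<String, String> parsedData = copyListed(content, keys(contentMd)); // scoring.content.md
        return copyListed(parsedData, keys(parseMd));                          // scoring.parse.md
    }

    public static void main(String[] args) {
        Map<String, String> seed = new HashMap<>();
        seed.put("category", "news");
        seed.put("lang", "en");

        // "category" is listed at every step, so it reaches the outlink record.
        System.out.println(propagate(seed, "category,lang", "category,lang", "category"));
        // prints {category=news}

        // "category" is missing from the content step, so nothing reaches the outlink.
        System.out.println(propagate(seed, "category,lang", "lang", "category"));
        // prints {}
    }
}
```

The second call shows why a step cannot be skipped: a key dropped at any stage is unavailable to every later stage, even if a later property lists it.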
Class Summary
Class                   Description
MetadataScoringFilter   For documentation: org.apache.nutch.scoring.metadata