Uses of Class
org.apache.nutch.parse.ParseData
-
Packages that use ParseData
Package, Description:
org.apache.nutch.crawl
    Crawl control code and tools to run the crawler.
org.apache.nutch.parse
    The Parse interface and related classes.
org.apache.nutch.scoring
    The ScoringFilter interface.
org.apache.nutch.scoring.depth
    Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.metadata
    Metadata Scoring Plugin
org.apache.nutch.scoring.opic
    Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.similarity.cosine
    Implements the cosine similarity metric for scoring relevant documents
org.apache.nutch.scoring.urlmeta
    URL Meta Tag Scoring Plugin
org.apache.nutch.segment
    A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
org.apache.nutch.tools
    Miscellaneous tools.
-
Uses of ParseData in org.apache.nutch.crawl
Methods in org.apache.nutch.crawl with parameters of type ParseData
Modifier and Type, Method, Description:
void LinkDb.LinkDbMapper.map(Text key, ParseData parseData, Mapper.Context context)
-
Uses of ParseData in org.apache.nutch.parse
Methods in org.apache.nutch.parse that return ParseData
Modifier and Type, Method, Description:
ParseData Parse.getData()
    Other data extracted from the page.
ParseData ParseImpl.getData()
static ParseData ParseData.read(DataInput in)
Methods in org.apache.nutch.parse with parameters of type ParseData
Modifier and Type, Method, Description:
void ParseResult.put(String key, ParseText text, ParseData data)
    Store a result of parsing.
void ParseResult.put(Text key, ParseText text, ParseData data)
    Store a result of parsing.
Constructors in org.apache.nutch.parse with parameters of type ParseData
Constructor, Description:
ParseImpl(String text, ParseData data)
ParseImpl(ParseText text, ParseData data)
ParseImpl(ParseText text, ParseData data, boolean isCanonical)
-
Uses of ParseData in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum AbstractScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Distribute score value from the current page to all its outlinked pages.
CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
-
Uses of ParseData in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum DepthScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
-
Uses of ParseData in org.apache.nutch.scoring.metadata
Methods in org.apache.nutch.scoring.metadata with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum MetadataScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Takes the metadata listed in your "scoring.parse.md" property and looks for it inside the parseData object.
-
Uses of ParseData in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
-
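The OPICScoringFilter description above (divide a page's score by its number of outlinks and apply the result) can be illustrated with a minimal, Nutch-free sketch. The class name `OpicShareSketch`, both method names, and the choice to add the share to each target's existing score are assumptions made for illustration; they are not the plugin's actual code, and plain floats stand in for CrawlDatum scores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: illustrates the "divide by outlink count" idea only.
final class OpicShareSketch {

    /** Splits a page score evenly across its outlinks; yields 0 when there are none. */
    static float sharePerOutlink(float pageScore, int outlinkCount) {
        if (outlinkCount <= 0) {
            return 0.0f; // nothing to distribute to
        }
        return pageScore / outlinkCount;
    }

    /** Assumption: "apply" means adding the share to each target's existing score. */
    static List<Float> apply(float pageScore, List<Float> targetScores) {
        float share = sharePerOutlink(pageScore, targetScores.size());
        List<Float> updated = new ArrayList<>();
        for (float score : targetScores) {
            updated.add(score + share);
        }
        return updated;
    }

    public static void main(String[] args) {
        // A page with score 1.0 and four outlinks passes 0.25 to each of them.
        System.out.println(apply(1.0f, List.of(0.0f, 0.5f, 0.25f, 0.0f)));
    }
}
```

In the real filter the inputs arrive through the `parseData`, `targets`, and `allCount` parameters listed in the signature; the sketch only shows the arithmetic.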
Uses of ParseData in org.apache.nutch.scoring.similarity
Methods in org.apache.nutch.scoring.similarity with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum SimilarityModel.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
CrawlDatum SimilarityScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
-
Uses of ParseData in org.apache.nutch.scoring.similarity.cosine
Methods in org.apache.nutch.scoring.similarity.cosine with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum CosineSimilarity.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
-
Uses of ParseData in org.apache.nutch.scoring.urlmeta
Methods in org.apache.nutch.scoring.urlmeta with parameters of type ParseData
Modifier and Type, Method, Description:
CrawlDatum URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object.
-
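The lookup described for URLMetaScoringFilter (read the tag names from the "urlmeta.tags" property, then look each one up in the parse data) might look roughly like the sketch below. Everything here is a stand-in: `MetaTagLookupSketch` and `selectListedTags` are hypothetical names, a plain `Map<String, String>` replaces the metadata held by ParseData, and treating the property value as a comma-separated list is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the metadata inside ParseData.
final class MetaTagLookupSketch {

    /**
     * Returns the configured tags that are present in the parse metadata,
     * in the order they were listed. Assumes a comma-separated property value.
     */
    static Map<String, String> selectListedTags(String tagsProperty,
                                                Map<String, String> parseMeta) {
        Map<String, String> found = new LinkedHashMap<>();
        if (tagsProperty == null || tagsProperty.trim().isEmpty()) {
            return found; // property unset: nothing to look for
        }
        for (String raw : tagsProperty.split(",")) {
            String tag = raw.trim();
            if (tag.isEmpty()) {
                continue; // tolerate stray commas
            }
            String value = parseMeta.get(tag);
            if (value != null) {
                found.put(tag, value); // tag is listed and present on the page
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> meta = Map.of("topic", "news", "lang", "en");
        // "author" is listed but absent from the metadata, so it is skipped.
        System.out.println(selectListedTags("topic, author", meta));
    }
}
```

What the filter does with the values it finds is not stated in the description above, so the sketch stops at the lookup.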
Uses of ParseData in org.apache.nutch.segment
Methods in org.apache.nutch.segment with parameters of type ParseData
Modifier and Type, Method, Description:
boolean SegmentMergeFilter.filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
    The filtering method which gets all information being merged for a given key (URL).
boolean SegmentMergeFilters.filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
    Iterates over all SegmentMergeFilter extensions; if any of them returns false, it returns false as well.
-
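The SegmentMergeFilters behavior described above (ask every filter in turn, and reject as soon as one returns false) is a short-circuiting "all must accept" chain. The sketch below shows that pattern with `java.util.function.Predicate` standing in for SegmentMergeFilter; the class and method names are hypothetical, and accepting a record when no filters are registered is an assumption of the sketch.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in: Predicate<T> plays the role of a SegmentMergeFilter.
final class MergeFilterChainSketch {

    /** Returns false as soon as any filter rejects the record, true otherwise. */
    static <T> boolean acceptedByAll(List<Predicate<T>> filters, T record) {
        for (Predicate<T> filter : filters) {
            if (!filter.test(record)) {
                return false; // one rejection is enough; later filters are not consulted
            }
        }
        return true; // assumption: an empty chain accepts everything
    }

    public static void main(String[] args) {
        List<Predicate<String>> filters = List.of(
                url -> !url.isEmpty(),
                url -> url.startsWith("http"));
        System.out.println(acceptedByAll(filters, "http://example.org/"));
        System.out.println(acceptedByAll(filters, "ftp://example.org/"));
    }
}
```

A real SegmentMergeFilter receives the full set of merged data for a key (the eight parameters in the signature above) rather than a single value.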
Uses of ParseData in org.apache.nutch.tools
Methods in org.apache.nutch.tools with parameters of type ParseData
Modifier and Type, Method, Description:
String AbstractCommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
String CommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
    Returns a string representation of the JSON structure of the URL content.
String CommonCrawlFormatWARC.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
Constructors in org.apache.nutch.tools with parameters of type ParseData
Constructor, Description:
CommonCrawlFormatWARC(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config, ParseData parseData)
-