Uses of Class
org.apache.nutch.parse.ParseResult
-
Packages that use ParseResult
Package    Description
org.apache.nutch.analysis.lang    Text document language identifier.
org.apache.nutch.microformats.reltag    A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.parse    The Parse interface and related classes.
org.apache.nutch.parse.ext    Parse wrapper that runs an external command to do the parsing.
org.apache.nutch.parse.feed    Parse RSS feeds.
org.apache.nutch.parse.headings    Parse filter to extract headings (h1, h2, etc.) from the DOM parse tree.
org.apache.nutch.parse.html    An HTML document parsing plugin.
org.apache.nutch.parse.js    Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags    Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.tika    Parse various document formats with the help of Apache Tika.
org.apache.nutch.parse.zip    Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.parsefilter.debug    Adds the serialized DOM to the parse data. Useful for debugging, to understand how the parser implementation interprets a document (not only HTML).
org.apache.nutch.parsefilter.naivebayes    HTML parse filter that classifies the outlinks from the parse result as relevant or irrelevant based on the parse text's relevancy (using a training file where you can give positive and negative example texts; see the description of parsefilter.naivebayes.trainfile). If a link is found irrelevant, it gets a second chance if it contains any of the words from the list given in parsefilter.naivebayes.wordlist.
org.apache.nutch.parsefilter.regex    RegexParseFilter.
org.creativecommons.nutch    Sample plugins that parse and index Creative Commons metadata.
-
Uses of ParseResult in org.apache.nutch.analysis.lang
Methods in org.apache.nutch.analysis.lang that return ParseResult
Modifier and Type    Method    Description
ParseResult    HTMLLanguageParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Scan the HTML document looking at possible indications of content language.
Methods in org.apache.nutch.analysis.lang with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    HTMLLanguageParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Scan the HTML document looking at possible indications of content language.
-
Uses of ParseResult in org.apache.nutch.microformats.reltag
Methods in org.apache.nutch.microformats.reltag that return ParseResult
Modifier and Type    Method    Description
ParseResult    RelTagParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Scan the HTML document looking at possible rel-tags.
Methods in org.apache.nutch.microformats.reltag with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    RelTagParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Scan the HTML document looking at possible rel-tags.
-
Uses of ParseResult in org.apache.nutch.parse
Methods in org.apache.nutch.parse that return ParseResult
Modifier and Type    Method    Description
static ParseResult    ParseResult.createParseResult(String url, Parse parse)
    Convenience method for obtaining a ParseResult from a single Parse output.
ParseResult    HtmlParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
ParseResult    HtmlParseFilters.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Run all defined filters.
ParseResult    ParseStatus.getEmptyParseResult(String url, Configuration conf)
    Creates an empty ParseResult for a given URL.
ParseResult    Parser.getParse(Content c)
    This method parses the given content and returns a map of <key, parse> pairs.
ParseResult    ParseUtil.parse(Content content)
ParseResult    ParseUtil.parseByExtensionId(String extId, Content content)
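As an illustration of the entries above, the following minimal Java sketch models a parse result as a map of <key, parse> pairs, a convenience factory for a single parse, and a chain that runs all registered filters in order. The types SketchParseResult, SketchFilter, and SketchFilterChain are hypothetical stand-ins invented for this example and are not Nutch classes; parses are reduced to plain strings, and the real filter signature (Content, HTMLMetaTags, DocumentFragment) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse result: a map from URL (key) to parsed text.
// This is NOT Nutch's ParseResult; it only mirrors the "<key, parse> pairs" idea.
final class SketchParseResult {
    private final Map<String, String> parses = new LinkedHashMap<>();

    // Convenience factory for the common case of a single parse for one URL.
    static SketchParseResult ofSingle(String url, String text) {
        SketchParseResult result = new SketchParseResult();
        result.put(url, text);
        return result;
    }

    void put(String url, String text) { parses.put(url, text); }

    String get(String url) { return parses.get(url); }

    int size() { return parses.size(); }
}

// Hypothetical filter contract: take a result, return the (possibly modified) result.
interface SketchFilter {
    SketchParseResult filter(SketchParseResult result);
}

// Runs every registered filter in order, feeding each one the previous output.
final class SketchFilterChain implements SketchFilter {
    private final List<SketchFilter> filters = new ArrayList<>();

    SketchFilterChain add(SketchFilter f) {
        filters.add(f);
        return this;
    }

    @Override
    public SketchParseResult filter(SketchParseResult result) {
        SketchParseResult current = result;
        for (SketchFilter f : filters) {
            current = f.filter(current);
        }
        return current;
    }
}
```

A caller would build a result with SketchParseResult.ofSingle(url, text), register filters with add(...), and pass the result through filter(...). Returning the result from each filter, rather than mutating it silently, is what allows a filter to substitute a different result object.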
Methods in org.apache.nutch.parse with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    HtmlParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
ParseResult    HtmlParseFilters.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Run all defined filters.
-
Uses of ParseResult in org.apache.nutch.parse.ext
Methods in org.apache.nutch.parse.ext that return ParseResult
Modifier and Type    Method    Description
ParseResult    ExtParser.getParse(Content content)
-
Uses of ParseResult in org.apache.nutch.parse.feed
Methods in org.apache.nutch.parse.feed that return ParseResult
Modifier and Type    Method    Description
ParseResult    FeedParser.getParse(Content content)
    Parses the given feed, then extracts and parses all linked items within the feed, using the underlying ROME feed parsing library.
-
Uses of ParseResult in org.apache.nutch.parse.headings
Methods in org.apache.nutch.parse.headings that return ParseResult
Modifier and Type    Method    Description
ParseResult    HeadingsParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Methods in org.apache.nutch.parse.headings with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    HeadingsParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
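The package summary describes this filter as extracting headings (h1, h2, etc.) from the DOM parse tree. A rough sketch of that idea, using only the JDK's org.w3c.dom API, could look like the following. The class HeadingSketch and its methods are hypothetical names chosen for this example; this is not the HeadingsParseFilter implementation, and the input is assumed to be well-formed XHTML so that the JDK XML parser can read it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch.
final class HeadingSketch {

    // Collects the trimmed text of every element whose tag name matches one of
    // the given heading names (case-insensitive), in document order.
    static List<String> collectHeadings(Node root, List<String> headingNames) {
        List<String> found = new ArrayList<>();
        walk(root, headingNames, found);
        return found;
    }

    // Depth-first traversal of the DOM tree.
    private static void walk(Node node, List<String> names, List<String> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (String name : names) {
                if (name.equalsIgnoreCase(node.getNodeName())) {
                    found.add(node.getTextContent().trim());
                    break;
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), names, found);
        }
    }

    // Assumption: the input is well-formed XHTML; real-world HTML usually is not.
    static Document parseXhtml(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }
}
```

Calling HeadingSketch.collectHeadings(doc, List.of("h1", "h2")) on a parsed page returns the heading texts in document order; a filter built this way would then attach them to the parse metadata.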
-
Uses of ParseResult in org.apache.nutch.parse.html
Methods in org.apache.nutch.parse.html that return ParseResult
Modifier and Type    Method    Description
ParseResult    HtmlParser.getParse(Content content)
-
Uses of ParseResult in org.apache.nutch.parse.js
Methods in org.apache.nutch.parse.js that return ParseResult
Modifier and Type    Method    Description
ParseResult    JSParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Scan the JavaScript fragments of an HTML page looking for possible Outlinks.
ParseResult    JSParseFilter.getParse(Content c)
    Parse a JavaScript file and extract outlinks.
Methods in org.apache.nutch.parse.js with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    JSParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Scan the JavaScript fragments of an HTML page looking for possible Outlinks.
-
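For the org.apache.nutch.parse.js entries above, the idea of pulling possible links out of JavaScript source can be sketched with a regular expression over quoted string literals. This is only an illustration under simplifying assumptions: JsLinkSketch is a hypothetical class, the pattern recognizes only absolute http and https URLs, and it ignores escape sequences and relative paths. It does not reproduce how JSParseFilter actually finds outlinks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
final class JsLinkSketch {

    // Matches a single- or double-quoted string literal that starts with
    // http:// or https:// and is closed by the same quote character.
    private static final Pattern QUOTED_URL =
        Pattern.compile("([\"'])(https?://[^\"'\\s]+)\\1");

    // Returns every candidate URL found in the given JavaScript source, in order.
    static List<String> extractLinks(String javascript) {
        List<String> links = new ArrayList<>();
        Matcher m = QUOTED_URL.matcher(javascript);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the URL without its quotes
        }
        return links;
    }
}
```

Because the match is purely textual, it can report strings that are never used as links, which is consistent with the "(possible) links" wording in the package description.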
Uses of ParseResult in org.apache.nutch.parse.metatags
Methods in org.apache.nutch.parse.metatags that return ParseResult
Modifier and Type    Method    Description
ParseResult    MetaTagsParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Methods in org.apache.nutch.parse.metatags with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    MetaTagsParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
-
Uses of ParseResult in org.apache.nutch.parse.tika
Methods in org.apache.nutch.parse.tika that return ParseResult
Modifier and Type    Method    Description
ParseResult    TikaParser.getParse(Content content)
-
Uses of ParseResult in org.apache.nutch.parse.zip
Methods in org.apache.nutch.parse.zip that return ParseResult
Modifier and Type    Method    Description
ParseResult    ZipParser.getParse(Content content)
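The package summary says that files embedded in a ZIP archive are recursively passed to appropriate parsers. A loose sketch of the recursive walk, using only java.util.zip, is given below. ZipWalkSketch is a hypothetical class written for this example: it only collects entry names and bytes and recurses into nested .zip entries, whereas choosing a parser per entry is left out. It should not be read as the ZipParser implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Nutch.
final class ZipWalkSketch {

    // Reads every file entry of a ZIP archive and returns name -> bytes.
    // Entries whose names end in ".zip" are treated as nested archives and
    // walked recursively; their entries are keyed as "outer.zip!/inner-name".
    static Map<String, byte[]> readEntries(byte[] zipBytes) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        collect("", zipBytes, out);
        return out;
    }

    private static void collect(String prefix, byte[] zipBytes, Map<String, byte[]> out) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content to parse
                }
                byte[] data = zin.readAllBytes(); // reads the current entry only (Java 9+)
                String name = prefix + entry.getName();
                if (name.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                    collect(name + "!/", data, out); // recurse into the nested archive
                } else {
                    out.put(name, data); // a real parser would dispatch on content type here
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real parser, the out.put(...) step is where each embedded file would be handed to whichever parser matches its type, and the resulting parses would be merged into one result keyed by entry.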
-
Uses of ParseResult in org.apache.nutch.parsefilter.debug
Methods in org.apache.nutch.parsefilter.debug that return ParseResult
Modifier and Type    Method    Description
ParseResult    DebugParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Methods in org.apache.nutch.parsefilter.debug with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    DebugParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
-
Uses of ParseResult in org.apache.nutch.parsefilter.naivebayes
Methods in org.apache.nutch.parsefilter.naivebayes that return ParseResult
Modifier and Type    Method    Description
ParseResult    NaiveBayesParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Methods in org.apache.nutch.parsefilter.naivebayes with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    NaiveBayesParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
-
Uses of ParseResult in org.apache.nutch.parsefilter.regex
Methods in org.apache.nutch.parsefilter.regex that return ParseResult
Modifier and Type    Method    Description
ParseResult    RegexParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Methods in org.apache.nutch.parsefilter.regex with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    RegexParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
-
Uses of ParseResult in org.creativecommons.nutch
Methods in org.creativecommons.nutch that return ParseResult
Modifier and Type    Method    Description
ParseResult    CCParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
Methods in org.creativecommons.nutch with parameters of type ParseResult
Modifier and Type    Method    Description
ParseResult    CCParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
-