Uses of Package
org.apache.nutch.plugin
-
Packages that use org.apache.nutch.plugin
Package  Description
org.apache.nutch.analysis.lang  Text document language identifier.
org.apache.nutch.collection  Subcollection is a subset of an index.
org.apache.nutch.exchange  Control code for the exchange component, which acts in the indexing job and decides to which index writer a document should be routed, based on plugin behavior.
org.apache.nutch.exchange.jexl  Plugin of the Exchange component based on JEXL expressions.
org.apache.nutch.indexer  Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor  An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.arbitrary  Indexing filter to add arbitrary document data to the index from the output of a user-specified class.
org.apache.nutch.indexer.basic  A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed  Indexing filter to index metadata from RSS feeds.
org.apache.nutch.indexer.filter
org.apache.nutch.indexer.geoip  This plugin implements an indexing filter which takes advantage of the GeoIP2-java API.
org.apache.nutch.indexer.jexl  This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata.
org.apache.nutch.indexer.links
org.apache.nutch.indexer.metadata  Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more  A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.replace  Indexing filter to allow pattern replacements on metadata.
org.apache.nutch.indexer.staticfield  A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection  Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld  Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta  URL Meta Tag Indexing Plugin.
org.apache.nutch.indexwriter.cloudsearch
org.apache.nutch.indexwriter.csv  Index writer plugin to write a plain CSV file.
org.apache.nutch.indexwriter.dummy  Index writer plugin for debugging, writes pairs of <action, url> to a text file; action is one of "add", "update", or "delete".
org.apache.nutch.indexwriter.elastic  Index writer plugin for Elasticsearch.
org.apache.nutch.indexwriter.kafka  Index writer plugin to produce JSON messages to Kafka.
org.apache.nutch.indexwriter.opensearch1x  Index writer plugin for OpenSearch.
org.apache.nutch.indexwriter.rabbit
org.apache.nutch.indexwriter.solr  Index writer plugin for Apache Solr.
org.apache.nutch.microformats.reltag  A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.net  Web-related interfaces: URL filters and normalizers.
org.apache.nutch.parse  The Parse interface and related classes.
org.apache.nutch.parse.ext  Parse wrapper to run an external command to do the parsing.
org.apache.nutch.parse.feed  Parse RSS feeds.
org.apache.nutch.parse.headings  Parse filter to extract headings (h1, h2, etc.) from the DOM parse tree.
org.apache.nutch.parse.html  An HTML document parsing plugin.
org.apache.nutch.parse.js  Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags  Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.tika  Parse various document formats with the help of Apache Tika.
org.apache.nutch.parse.zip  Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.parsefilter.debug  Adds serialized DOM to parse data, useful for debugging, to understand how the parser implementation interprets a document (not only HTML).
org.apache.nutch.parsefilter.naivebayes  HTML parse filter that classifies the outlinks from the parse result as relevant or irrelevant based on the parse text's relevancy (using a training file where you can give positive and negative example texts; see the description of parsefilter.naivebayes.trainfile). If the text is found irrelevant, a link gets a second chance if it contains any of the words from the list given in parsefilter.naivebayes.wordlist.
org.apache.nutch.parsefilter.regex  RegexParseFilter.
org.apache.nutch.plugin  The Nutch Plugin System.
org.apache.nutch.protocol  Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file  Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp  Protocol plugin which supports retrieving documents via the FTP protocol.
org.apache.nutch.protocol.htmlunit  Protocol plugin which supports retrieving documents via HTTP/HTTPS using Selenium and the HtmlUnitDriver web driver for the HtmlUnit headless browser.
org.apache.nutch.protocol.http  Protocol plugin which supports retrieving documents via the HTTP protocol.
org.apache.nutch.protocol.http.api  Common API used by HTTP plugins (http, httpclient, etc.).
org.apache.nutch.protocol.httpclient  Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
org.apache.nutch.protocol.interactiveselenium  Protocol plugin which supports retrieving documents using and interacting with Selenium.
org.apache.nutch.protocol.okhttp  Protocol plugin for HTTP/HTTPS based on okhttp, supports HTTP 1.1 and/or HTTP/2.
org.apache.nutch.protocol.selenium  Protocol plugin which supports retrieving documents via Selenium.
org.apache.nutch.publisher
org.apache.nutch.publisher.rabbitmq  Publisher package to implement queues.
org.apache.nutch.scoring  The ScoringFilter interface.
org.apache.nutch.scoring.depth  Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link  Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.metadata  Metadata Scoring Plugin.
org.apache.nutch.scoring.opic  Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.orphan  Scoring filter to modify score or status of orphaned pages (no inlinks found for a configurable amount of time).
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.tld  Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta  URL Meta Tag Scoring Plugin.
org.apache.nutch.urlfilter.api  Generic URL filter library, abstracting away from regular expression implementations.
org.apache.nutch.urlfilter.automaton  URL filter plugin based on dk.brics.automaton Finite-State Automata for Java(TM).
org.apache.nutch.urlfilter.domain  URL filter plugin to include only URLs which match an element in a given list of domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.domaindenylist  URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.fast  URL filter plugin that first does fast exact suffix matches on host/domain names before applying regular expressions to the path component of a URL.
org.apache.nutch.urlfilter.ignoreexempt  URL filter plugin which identifies exemptions to external URLs when external URLs are set to ignore.
org.apache.nutch.urlfilter.prefix  URL filter plugin to include only URLs which match one of a given list of URL prefixes.
org.apache.nutch.urlfilter.regex  URL filter plugin to include and/or exclude URLs matching Java regular expressions.
org.apache.nutch.urlfilter.suffix  URL filter plugin to either exclude or include only URLs which match one of the given (path) suffixes.
org.apache.nutch.urlfilter.validator  URL filter plugin that validates given URLs.
org.creativecommons.nutch  Sample plugins that parse and index Creative Commons metadata.
-
Classes in org.apache.nutch.plugin used by org.apache.nutch.analysis.lang Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.collection Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.exchange Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.exchange.jexl Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.anchor Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.arbitrary Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.basic Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.feed Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.filter Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.geoip Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.jexl Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.links Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.metadata Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.more Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.replace Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.staticfield Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.subcollection Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.tld Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexer.urlmeta Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.cloudsearch Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.csv Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.dummy Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.elastic Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.kafka Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.opensearch1x Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.rabbit Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.indexwriter.solr Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.microformats.reltag Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.net Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse
Class  Description
Extension  An Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint that acts as a kind of Publisher.
Pluggable  Defines the capability of a class to be plugged into Nutch.
-
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.ext Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.feed Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.headings Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.html Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.js Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.metatags Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.tika Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parse.zip Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parsefilter.debug Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parsefilter.naivebayes Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.parsefilter.regex Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.plugin
Class  Description
Extension  An Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint that acts as a kind of Publisher.
ExtensionPoint  The ExtensionPoint provides meta information of an extension point.
Plugin  A nutch-plugin is a container for a set of custom logic that provides extensions to the Nutch core functionality or to another plugin that provides an API for extending.
PluginClassLoader  The PluginClassLoader is a child-first classloader that only contains classes of the runtime libraries set up in the plugin manifest file and exported libraries of plugins that are required plugins.
PluginDescriptor  The PluginDescriptor provides access to all meta information of a nutch-plugin, as well as to the internationalizable resources and the plugin's own classloader.
PluginRepository  The plugin repository is a registry of all plugins.
PluginRuntimeException  PluginRuntimeException will be thrown when an exception in the plugin management occurs.
URLStreamHandlerFactory  This URLStreamHandlerFactory knows about all the plugins in use and thus can create the correct URLStreamHandler even if it comes from a plugin classpath.
-
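The PluginClassLoader entry above describes a child-first classloader. As a rough illustration of that delegation order only, the following self-contained sketch uses nothing but the JDK. The class name ChildFirstClassLoader is hypothetical and this is not the Nutch implementation; it merely shows a loader that searches its own URLs before asking its parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first loader, written for illustration only.
 * It is NOT Nutch's PluginClassLoader; it just reverses the usual
 * parent-first delegation of java.lang.ClassLoader.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Child first: look in this loader's own URLs.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not found locally: fall back to normal parent delegation.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With an empty URL array the loader finds nothing locally, so a request such as loadClass("java.lang.String") falls through to the parent. When the URLs do contain a class of the same name as one visible to the parent, the local copy is the one returned, which is the isolation property a plugin system typically wants.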
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol
Class  Description
Pluggable  Defines the capability of a class to be plugged into Nutch.
PluginRuntimeException  PluginRuntimeException will be thrown when an exception in the plugin management occurs.
-
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.file Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.ftp Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.htmlunit Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.http Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.http.api Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.httpclient Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.interactiveselenium Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.okhttp Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.protocol.selenium Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.publisher Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.publisher.rabbitmq Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.depth Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.link Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.metadata Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.opic Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.orphan Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.similarity Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.tld Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.scoring.urlmeta Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.api Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.automaton Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.domain Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.domaindenylist Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.fast Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.ignoreexempt Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.prefix Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.regex Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.suffix Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.apache.nutch.urlfilter.validator Class Description Pluggable Defines the capability of a class to be plugged into Nutch. -
Classes in org.apache.nutch.plugin used by org.creativecommons.nutch Class Description Pluggable Defines the capability of a class to be plugged into Nutch.
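
Almost every package listed above uses Pluggable, and the plugin package describes extensions as listeners installed on an extension point. The toy model below sketches that relationship in plain Java. All names in it (ExtensionPointSketch, UrlFilter, install, extensionsOf, applyUrlFilters) are hypothetical and chosen for illustration; it does not reproduce the Nutch classes or their signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: mirrors the concepts above, not the real Nutch API. */
public class ExtensionPointSketch {

    /** Marker interface standing in for the role Pluggable plays. */
    public interface Pluggable {}

    /** Hypothetical pluggable contract: return the URL to keep it, null to drop it. */
    public interface UrlFilter extends Pluggable {
        String filter(String url);
    }

    /** Maps an extension point id to the extensions installed on it. */
    private final Map<String, List<Pluggable>> extensions = new HashMap<>();

    /** Installs an extension on the extension point with the given id. */
    public void install(String pointId, Pluggable extension) {
        extensions.computeIfAbsent(pointId, k -> new ArrayList<>()).add(extension);
    }

    /** Returns a read-only view of the extensions installed on a point. */
    public List<Pluggable> extensionsOf(String pointId) {
        return Collections.unmodifiableList(
                extensions.getOrDefault(pointId, Collections.emptyList()));
    }

    /** Runs every installed UrlFilter in order; stops once one rejects the URL. */
    public String applyUrlFilters(String pointId, String url) {
        String current = url;
        for (Pluggable p : extensionsOf(pointId)) {
            if (current == null) {
                break;
            }
            if (p instanceof UrlFilter) {
                current = ((UrlFilter) p).filter(current);
            }
        }
        return current;
    }
}
```

In this sketch the core code only knows the extension point id and the pluggable interface; which implementations run is decided by whatever has been installed, which is the decoupling the extension point idea is meant to provide.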