Uses of Class
org.apache.nutch.util.NutchTool

Packages that use NutchTool

Package
    Description

org.apache.nutch.crawl
    Crawl control code and tools to run the crawler.
org.apache.nutch.fetcher
    The Nutch multi-threaded fetching module
org.apache.nutch.indexer
    Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.parse
    The Parse interface and related classes.
org.apache.nutch.service.impl
org.apache.nutch.tools
    Miscellaneous tools.

Uses of NutchTool in org.apache.nutch.crawl
Subclasses of NutchTool in org.apache.nutch.crawl

Modifier and Type  Class
    Description

class CrawlDb
    This class takes the output of the fetcher and updates the crawldb accordingly.
class DeduplicationJob
    Generic deduplicator which groups fetched URLs with the same digest and marks all of them as duplicate except the one with the highest score (based on the score in the crawldb, which is not necessarily the same as the score indexed).
class Generator
    Generates a subset of a crawl db to fetch.
class Injector
    Injector takes a flat text file of URLs (or a folder containing text files) and merges ("injects") these URLs into the CrawlDb.
class LinkDb
    Maintains an inverted link map, listing incoming links for each url.

Uses of NutchTool in org.apache.nutch.fetcher
Subclasses of NutchTool in org.apache.nutch.fetcher

Modifier and Type  Class
    Description

class Fetcher
    A queue-based fetcher.

Uses of NutchTool in org.apache.nutch.indexer
Subclasses of NutchTool in org.apache.nutch.indexer

Modifier and Type  Class
    Description

class IndexingJob
    Generic indexer which relies on the plugins implementing IndexWriter

Uses of NutchTool in org.apache.nutch.parse
Subclasses of NutchTool in org.apache.nutch.parse

Modifier and Type  Class
    Description

class ParseSegment

Uses of NutchTool in org.apache.nutch.service.impl
Methods in org.apache.nutch.service.impl that return NutchTool

Modifier and Type  Method
    Description

NutchTool  JobFactory.createToolByClassName(String className, Configuration conf)
NutchTool  JobFactory.createToolByType(JobManager.JobType type, Configuration conf)

Constructors in org.apache.nutch.service.impl with parameters of type NutchTool

Constructor
    Description

JobWorker(JobConfig jobConfig, Configuration conf, NutchTool tool)
    Initializes the JobWorker thread with the job configuration provided by the user.
ServiceWorker(ServiceConfig serviceConfig, NutchTool tool)

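The createToolByClassName signature suggests that a tool is looked up by its class name and constructed with a configuration. The following is only a minimal, self-contained sketch of that general pattern using plain Java reflection. It is not the Nutch implementation: Tool, InjectorLike, and the Map-based configuration are hypothetical stand-ins for NutchTool, a concrete tool, and Hadoop's Configuration, and the real classes may be constructed differently.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

public class ToolFactorySketch {

    /** Hypothetical stand-in for NutchTool; not the real Nutch class. */
    public abstract static class Tool {
        protected final Map<String, String> conf;

        protected Tool(Map<String, String> conf) {
            this.conf = conf;
        }

        public abstract String name();
    }

    /** Hypothetical concrete tool used only for this illustration. */
    public static class InjectorLike extends Tool {
        public InjectorLike(Map<String, String> conf) {
            super(conf);
        }

        @Override
        public String name() {
            return "inject";
        }
    }

    /**
     * Loads the named class, checks that it extends Tool, and calls its
     * public (Map) constructor. Any failure is reported as an
     * IllegalArgumentException so callers need no checked-exception handling.
     */
    public static Tool createToolByClassName(String className, Map<String, String> conf) {
        try {
            Class<? extends Tool> toolClass = Class.forName(className).asSubclass(Tool.class);
            Constructor<? extends Tool> ctor = toolClass.getConstructor(Map.class);
            return ctor.newInstance(conf);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot create tool: " + className, e);
        }
    }

    public static void main(String[] args) {
        Tool tool = createToolByClassName(
                InjectorLike.class.getName(), Map.of("db", "crawl/crawldb"));
        // prints "inject InjectorLike"
        System.out.println(tool.name() + " " + tool.getClass().getSimpleName());
    }
}
```

A factory of this shape lets a service accept a job request that names a tool class as a string and hand the resulting instance to a worker, which is the role the JobWorker and ServiceWorker constructors above appear to play.
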
Uses of NutchTool in org.apache.nutch.tools
Subclasses of NutchTool in org.apache.nutch.tools

Modifier and Type  Class
    Description

class CommonCrawlDataDumper
    The Common Crawl Data Dumper tool enables one to reverse generate the raw content from Nutch segment data directories into a common crawling data format, consumed by many applications.