Package org.apache.nutch.util
Miscellaneous utility classes.
Class Summary

Class: Description
AbstractChecker: Scaffolding class for the various Checker implementations.
CommandRunner
CrawlCompletionStats: Extracts some simple crawl completion stats from the crawldb. Stats will be sorted by host/domain and will be of the form "1 www.spitzer.caltech.edu FETCHED", "50 www.spitzer.caltech.edu UNFETCHED".
CrawlCompletionStats.CrawlCompletionStatsCombiner
DeflateUtils: A collection of utility methods for working on deflated data.
DomUtil
DumpFileUtil
EncodingDetector: A simple class for detecting character encodings.
FSUtils: Utility methods for common filesystem operations.
GenericWritableConfigurable: A generic Writable wrapper that can inject Configuration to Configurables.
GZIPUtils: A collection of utility methods for working on GZIPed data.
HadoopFSUtil
JexlUtil: Utility methods for handling JEXL expressions.
LockUtil: Utility methods for handling application-level locking.
MimeUtil: This is a facade class to insulate Nutch from its underlying Mime Type substrate library, Apache Tika.
NodeWalker: A utility class that allows the walking of any DOM tree using a stack instead of recursion.
NutchConfiguration: Utility to create Hadoop Configurations that include Nutch-specific resources.
NutchJob: A Job for Nutch jobs.
NutchTool
ObjectCache
PrefixStringMatcher: A class for efficiently matching Strings against a set of prefixes.
ProtocolStatusStatistics: Extracts protocol status code information from the crawl database.
ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner
SegmentReaderUtil
SitemapProcessor: Performs sitemap processing by fetching sitemap links, parsing the content and merging the URLs from sitemaps (with the metadata) into the CrawlDb.
StringUtil: A collection of String processing utility methods.
SuffixStringMatcher: A class for efficiently matching Strings against a set of suffixes.
TableUtil
TimingUtil
TrieStringMatcher: TrieStringMatcher is a base class for simple tree-based string matching.
URLUtil: Utility class for URL analysis.
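
The NodeWalker entry describes walking a DOM tree with a stack instead of recursion. The sketch below illustrates that general technique using only the JDK's org.w3c.dom and javax.xml.parsers classes. It does not use or reproduce the Nutch NodeWalker API; the class name StackDomWalk and its methods are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of org.apache.nutch.util.
public class StackDomWalk {

  /** Collects element names in document (pre-)order without recursion. */
  public static List<String> elementNamesInOrder(Node root) {
    List<String> names = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node current = stack.pop();
      if (current.getNodeType() == Node.ELEMENT_NODE) {
        names.add(current.getNodeName());
      }
      // Push children in reverse so the first child is popped first.
      NodeList children = current.getChildNodes();
      for (int i = children.getLength() - 1; i >= 0; i--) {
        stack.push(children.item(i));
      }
    }
    return names;
  }

  /** Parses an XML string into a DOM Document using the JDK parser. */
  public static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse(
        "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>");
    // prints [html, head, title, body, p, p]
    System.out.println(elementNamesInOrder(doc));
  }
}
```

Because the pending nodes live in an explicit stack on the heap, the depth of the document does not consume call-stack frames, which is the usual motivation for avoiding recursion on deeply nested markup.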