Class WARCExporter

  • All Implemented Interfaces:
    Configurable, Tool

    public class WARCExporter
    extends Configured
    implements Tool
    MapReduce job to export Nutch segments as WARC files. The file format is documented in the [ISO Standard](http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1_latestdraft.pdf). Generates records of type 'response' if the configuration property 'store.http.headers' was set to true during fetching and the HTTP headers were stored verbatim; generates records of type 'resource' otherwise.
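
    The choice between the two record types can be illustrated with a minimal sketch. This is not the code used by WARCExporter: the class name WarcRecordTypeSketch and the method recordType are hypothetical, and the sketch assumes the stored headers arrive as a single string that is null or blank when 'store.http.headers' was not enabled at fetch time.

```java
// Hypothetical helper, not part of Nutch: it only mirrors the rule described above.
final class WarcRecordTypeSketch {

    // Returns "response" when verbatim HTTP headers were stored during fetching,
    // and "resource" when they are missing or blank (assumed representation).
    static String recordType(String storedHttpHeaders) {
        boolean hasHeaders = storedHttpHeaders != null
                && !storedHttpHeaders.trim().isEmpty();
        return hasHeaders ? "response" : "resource";
    }

    public static void main(String[] args) {
        // Headers were kept verbatim, so a 'response' record would be written.
        System.out.println(recordType("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"));
        // No headers were stored, so the content is exported as a 'resource' record.
        System.out.println(recordType(null));
    }
}
```

    Running the sketch prints "response" followed by "resource". Whether a segment yields one type or the other is decided at fetch time, so segments fetched without 'store.http.headers' cannot be exported as 'response' records later.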