Class ArcSegmentCreator

  • All Implemented Interfaces:
    Configurable, Tool

    public class ArcSegmentCreator
    extends Configured
    implements Tool

    The ArcSegmentCreator is a replacement for the fetcher: it takes ARC files as input and produces a Nutch segment as output.

    ARC files are tars of compressed gzips, produced by both the Internet Archive project and the Grub distributed crawler project.

    • Constructor Detail

      • ArcSegmentCreator

        public ArcSegmentCreator()
      • ArcSegmentCreator

        public ArcSegmentCreator(Configuration conf)
        Constructor that sets the job configuration.
        Parameters:
        conf - a populated Configuration
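
The two constructors above suggest two ways of giving the tool its configuration: pass it at construction time, or create the object with the no-argument constructor and set the configuration afterwards through the Configurable interface. The sketch below illustrates that pattern only. The real classes need Hadoop and Nutch on the classpath, so the types here (Conf, Configurable, Tool, Configured, SegmentCreator) are small hypothetical stand-ins that mirror the class shape documented on this page; they are not the Hadoop or Nutch API. The two-argument input/output convention and the -1 return code are also assumptions, inferred from the description of ARC files as input and a segment as output.

```java
import java.util.HashMap;
import java.util.Map;

public class ArcSegmentCreatorSketch {

    /** Hypothetical stand-in for a job configuration: a plain key/value map. */
    public static class Conf {
        private final Map<String, String> values = new HashMap<>();

        public void set(String key, String value) {
            values.put(key, value);
        }

        public String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Stand-in for the Configurable role: an object that carries a configuration. */
    public interface Configurable {
        void setConf(Conf conf);

        Conf getConf();
    }

    /** Stand-in for the Tool role: a configurable object with a command-line entry point. */
    public interface Tool extends Configurable {
        int run(String[] args) throws Exception;
    }

    /** Stand-in base class that simply stores the configuration. */
    public static class Configured implements Configurable {
        private Conf conf;

        @Override
        public void setConf(Conf conf) {
            this.conf = conf;
        }

        @Override
        public Conf getConf() {
            return conf;
        }
    }

    /** Mirrors the documented shape: extends Configured, implements Tool, two constructors. */
    public static class SegmentCreator extends Configured implements Tool {

        /** No-argument constructor: the configuration must be set later via setConf. */
        public SegmentCreator() {
        }

        /** Constructor that sets the job configuration. */
        public SegmentCreator(Conf conf) {
            setConf(conf);
        }

        @Override
        public int run(String[] args) {
            // Assumed convention: args[0] is the ARC input, args[1] the segment output.
            if (getConf() == null || args.length < 2) {
                return -1;
            }
            // A real tool would configure and submit its job here.
            return 0;
        }
    }

    public static void main(String[] args) throws Exception {
        Conf conf = new Conf();
        conf.set("example.key", "example-value"); // hypothetical key, for illustration only

        // Option 1: pass the populated configuration to the constructor.
        Tool viaConstructor = new SegmentCreator(conf);

        // Option 2: construct first, then set the configuration.
        Tool viaSetter = new SegmentCreator();
        viaSetter.setConf(conf);

        System.out.println(viaConstructor.getConf() == viaSetter.getConf());
        System.out.println(viaConstructor.run(new String[] {"arc-input-dir", "segments-out-dir"}));
    }
}
```

With the real library, an instance built either way would typically be handed to Hadoop's ToolRunner.run(conf, tool, args), which supplies the configuration and invokes the tool's run method.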