Class Injector.InjectMapper

  • Enclosing class:
    Injector

    public static class Injector.InjectMapper
    extends Mapper<Text,Writable,Text,CrawlDatum>
    InjectMapper reads two inputs:
    • the CrawlDb that the seeds are injected into
    • the plain-text seed files, parsing each line into the URL and its metadata
    Seed URLs are passed to the reducer with STATUS_INJECTED.
    Depending on configuration and command-line parameters, the URLs are normalized and filtered using the configured plugins.
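
    The seed-line handling described above can be illustrated with a minimal, standalone sketch. It is not the InjectMapper source and uses no Nutch or Hadoop classes. It assumes that a seed line is a URL followed by tab-separated name=value metadata pairs, that blank lines and lines starting with # are skipped, and that a normalizer or filter step rejects a URL by returning null. The names SeedLineParser, Seed, parse and normalizeAndFilter are hypothetical and chosen only for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical helper illustrating seed-line parsing; not part of Nutch. */
final class SeedLineParser {

    /** Hypothetical holder for one parsed seed line. */
    public static final class Seed {
        public final String url;
        public final Map<String, String> metadata;

        Seed(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /**
     * Splits one seed line into the URL and its metadata.
     * Assumed format: URL, then zero or more tab-separated name=value pairs.
     * Returns null for blank lines and for lines starting with '#'.
     */
    public static Seed parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] fields = trimmed.split("\t");
        String url = fields[0].trim();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                continue; // skip fields without a name or without '='
            }
            String name = fields[i].substring(0, eq).trim();
            String value = fields[i].substring(eq + 1).trim();
            metadata.put(name, value);
        }
        return new Seed(url, metadata);
    }

    /**
     * Applies normalizer and filter steps in order.
     * Assumption: a step returns null to reject the URL, which stops the chain.
     */
    public static String normalizeAndFilter(String url, List<UnaryOperator<String>> steps) {
        String current = url;
        for (UnaryOperator<String> step : steps) {
            if (current == null) {
                break;
            }
            current = step.apply(current);
        }
        return current;
    }
}
```

    In this sketch, a caller would pass each line of a seed file to parse, skip null results, run the URL through normalizeAndFilter, and emit only the URLs that survive. The real mapper additionally wraps each surviving URL in a CrawlDatum with STATUS_INJECTED before passing it to the reducer.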