Package org.apache.nutch.hostdb
Class FetchOverdueCrawlDatumProcessor
java.lang.Object
    org.apache.nutch.hostdb.FetchOverdueCrawlDatumProcessor

All Implemented Interfaces:
CrawlDatumProcessor
public class FetchOverdueCrawlDatumProcessor extends Object implements CrawlDatumProcessor
Simple custom crawl datum processor that counts the number of records that are overdue for fetching, e.g. new unfetched URLs that haven't been fetched within two days.
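The overdue check described above can be illustrated with a minimal stand-alone sketch. It does not use any Nutch classes, and it rests on assumptions that this page does not confirm: that overDueTime is a cutoff timestamp computed as the current time minus overDueTimeLimit, and that a record is overdue when it is still unfetched and its scheduled fetch time lies before that cutoff. The class name OverdueCountSketch, the two-argument count method, and the getter are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Stand-alone illustration only; not the Nutch implementation. */
class OverdueCountSketch {
    /** Example limit taken from the "two days" wording in the class description. */
    static final long TWO_DAYS_MILLIS = TimeUnit.DAYS.toMillis(2);

    // Field names mirror the documented fields; their semantics here are assumed.
    private final long overDueTimeLimit; // how long a record may wait, in milliseconds
    private final long overDueTime;      // cutoff timestamp: anything scheduled earlier is overdue
    private long numOverDue;             // running count of overdue records

    OverdueCountSketch(long nowMillis, long overDueTimeLimitMillis) {
        this.overDueTimeLimit = overDueTimeLimitMillis;
        this.overDueTime = nowMillis - overDueTimeLimitMillis;
    }

    /**
     * Hypothetical stand-in for count(CrawlDatum): the two arguments represent
     * the only properties of a crawl record that this sketch looks at.
     */
    void count(boolean unfetched, long scheduledFetchTimeMillis) {
        if (unfetched && scheduledFetchTimeMillis < overDueTime) {
            numOverDue++;
        }
    }

    long getNumOverDue() {
        return numOverDue;
    }

    long getOverDueTimeLimit() {
        return overDueTimeLimit;
    }
}
```

Passing the current time into the constructor, rather than reading the clock inside it, keeps the sketch deterministic and easy to test.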

Field Summary
Modifier and Type         Field               Description
protected Configuration   conf
protected long            numOverDue
protected long            overDueTime
protected long            overDueTimeLimit

Constructor Summary
Constructor                                           Description
FetchOverdueCrawlDatumProcessor(Configuration conf)

Method Summary
Modifier and Type   Method                          Description
void                count(CrawlDatum crawlDatum)    Process a single crawl datum instance to aggregate custom counts.
void                finalize(HostDatum hostDatum)   Process the final host datum instance and store the aggregated custom counts in the HostDatum.

Field Detail
conf
protected final Configuration conf

overDueTimeLimit
protected long overDueTimeLimit

overDueTime
protected long overDueTime

numOverDue
protected long numOverDue

Constructor Detail
FetchOverdueCrawlDatumProcessor
public FetchOverdueCrawlDatumProcessor(Configuration conf)

Method Detail
count
public void count(CrawlDatum crawlDatum)
Description copied from interface: CrawlDatumProcessor
Process a single crawl datum instance to aggregate custom counts.
Specified by:
count in interface CrawlDatumProcessor
Parameters:
crawlDatum - CrawlDatum instance to count information from

finalize
public void finalize(HostDatum hostDatum)
Description copied from interface: CrawlDatumProcessor
Process the final host datum instance and store the aggregated custom counts in the HostDatum.
Specified by:
finalize in interface CrawlDatumProcessor
Parameters:
hostDatum - HostDatum instance to hold the aggregated custom counts
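
The two methods above suggest a two-phase pattern: count is called once per crawl record, and finalize is called once at the end to write the aggregate into the host record. The following stand-alone sketch illustrates that pattern without any Nutch classes. Everything in it is an assumption made for illustration: the Processor interface, the finalizeHost method name, the use of a plain Map in place of HostDatum, and the "num_overdue" key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone illustration of the count-then-finalize pattern; not the Nutch API. */
class HostAggregationSketch {

    /** Hypothetical two-phase contract mirroring count(...) and finalize(...). */
    interface Processor<R> {
        void count(R record);                              // called once per record
        void finalizeHost(Map<String, Long> hostMetadata); // called once at the end
    }

    /** Counts records whose scheduled fetch time (in milliseconds) is before a cutoff. */
    static class OverdueProcessor implements Processor<Long> {
        private final long cutoffMillis;
        private long numOverDue;

        OverdueProcessor(long cutoffMillis) {
            this.cutoffMillis = cutoffMillis;
        }

        @Override
        public void count(Long fetchTimeMillis) {
            if (fetchTimeMillis < cutoffMillis) {
                numOverDue++;
            }
        }

        @Override
        public void finalizeHost(Map<String, Long> hostMetadata) {
            // "num_overdue" is a hypothetical key chosen for this sketch.
            hostMetadata.put("num_overdue", numOverDue);
        }
    }

    /** Runs both phases over the records of one host and returns the host-level map. */
    static Map<String, Long> aggregate(List<Long> fetchTimes, long cutoffMillis) {
        Processor<Long> processor = new OverdueProcessor(cutoffMillis);
        for (Long fetchTime : fetchTimes) {
            processor.count(fetchTime);
        }
        Map<String, Long> hostMetadata = new HashMap<>();
        processor.finalizeHost(hostMetadata);
        return hostMetadata;
    }
}
```

Keeping the running total inside the processor and writing it out only in the final step means the host-level record is touched once, however many crawl records are counted.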