Package org.apache.nutch.fetcher
Class FetchItemQueues
- java.lang.Object
  - org.apache.nutch.fetcher.FetchItemQueues

public class FetchItemQueues extends Object
A collection of queues that keeps track of the total number of items, and provides items eligible for fetching from any queue. 
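To illustrate the idea of "a collection of queues" with a running total and an eligibility check, here is a minimal, hypothetical model. It is not the Nutch implementation. The class `SimpleFetchQueues`, the single fixed `crawlDelayMs`, and the rule that the delay starts when an item is handed out (rather than when its fetch finishes) are all simplifying assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of "a collection of queues"; not the Nutch class.
final class SimpleFetchQueues {
  // One queue: pending URLs plus the earliest time (epoch ms) its next fetch may start.
  private static final class PolitenessQueue {
    final Deque<String> items = new ArrayDeque<>();
    long nextFetchTime = 0L;
  }

  private final Map<String, PolitenessQueue> queues = new LinkedHashMap<>();
  private final long crawlDelayMs; // assumed: one fixed delay shared by all queues
  private int totalSize = 0;       // items waiting across all queues

  SimpleFetchQueues(long crawlDelayMs) {
    this.crawlDelayMs = crawlDelayMs;
  }

  void add(String queueId, String url) {
    queues.computeIfAbsent(queueId, id -> new PolitenessQueue()).items.addLast(url);
    totalSize++;
  }

  // Returns a URL from the first queue that is eligible at time `now`, or null.
  String getFetchItem(long now) {
    for (PolitenessQueue q : queues.values()) {
      if (!q.items.isEmpty() && q.nextFetchTime <= now) {
        q.nextFetchTime = now + crawlDelayMs; // simplification: delay starts at hand-out
        totalSize--;
        return q.items.pollFirst();
      }
    }
    return null;
  }

  int getTotalSize() {
    return totalSize;
  }

  int getQueueCount() {
    return queues.size();
  }

  public static void main(String[] args) {
    SimpleFetchQueues fq = new SimpleFetchQueues(1000);
    fq.add("http://example.com", "http://example.com/a");
    fq.add("http://example.com", "http://example.com/b");
    fq.add("http://example.org", "http://example.org/");
    System.out.println(fq.getFetchItem(0));    // http://example.com/a
    System.out.println(fq.getFetchItem(0));    // http://example.org/
    System.out.println(fq.getFetchItem(0));    // null: both queues are waiting
    System.out.println(fq.getFetchItem(1000)); // http://example.com/b
    System.out.println(fq.getTotalSize());     // 0
  }
}
```

Keeping one counter for the whole collection lets a caller ask "is anything left?" in constant time instead of summing every queue.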
Field Summary
Fields (modifier and type, field):
- static String DEFAULT_ID
- static String QUEUE_MODE_DOMAIN
- static String QUEUE_MODE_HOST
- static String QUEUE_MODE_IP

Constructor Summary
Constructors (constructor, description):
- FetchItemQueues(Configuration conf)

Method Summary
Methods (modifier and type, method, description):
- org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(Text url, CrawlDatum datum)
- org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(FetchItem it)
- int checkExceptionThreshold(String queueid): Increment the exception counter of a queue in case of an exception, e.g. a timeout.
- int checkExceptionThreshold(String queueid, int maxExceptions, long delay): Increment the exception counter of a queue in case of an exception, e.g. a timeout.
- protected static String checkQueueMode(String queueMode): Check whether the queue mode is valid; fall back to the default mode if not.
- int checkTimelimit()
- void dump()
- int emptyQueues()
- void finishFetchItem(FetchItem it)
- void finishFetchItem(FetchItem it, boolean asap)
- FetchItem getFetchItem()
- FetchItemQueue getFetchItemQueue(String id)
- int getQueueCount()
- int getQueueCountMaxExceptions()
- int getTotalSize()
- boolean redirectIsQueuedRecently(Text redirUrl)
- void setTimeoutReached(): Signal that the hard timeout is reached because no new fetches / requests were made during half of the MapReduce task timeout (mapreduce.task.timeout, default value: 10 minutes).
- boolean timelimitExceeded()

Field Detail
DEFAULT_ID
public static final String DEFAULT_ID
- See Also:
  - Constant Field Values

QUEUE_MODE_HOST
public static final String QUEUE_MODE_HOST
- See Also:
  - Constant Field Values

QUEUE_MODE_DOMAIN
public static final String QUEUE_MODE_DOMAIN
- See Also:
  - Constant Field Values

QUEUE_MODE_IP
public static final String QUEUE_MODE_IP
- See Also:
  - Constant Field Values

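The three QUEUE_MODE_* fields decide which key URLs are grouped by. The hypothetical sketch below shows one way a queue id could be derived per mode. The string values ("byHost", "byDomain", "byIP") are assumptions here (check Constant Field Values), as is the "scheme://key" id format; the domain rule is a deliberately naive "last two labels" approximation, not a public-suffix-aware one.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical sketch: derive a queue id from a URL for each queue mode.
final class QueueIdSketch {
  static final String QUEUE_MODE_HOST = "byHost";     // assumed constant value
  static final String QUEUE_MODE_DOMAIN = "byDomain"; // assumed constant value
  static final String QUEUE_MODE_IP = "byIP";         // assumed constant value

  static String queueId(String url, String mode) {
    URI uri = URI.create(url);
    String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
    String host = uri.getHost().toLowerCase(Locale.ROOT);
    String key;
    switch (mode) {
      case QUEUE_MODE_IP:
        try {
          // Resolves the host; an IP literal is returned without a DNS query.
          key = InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
          key = host; // sketch-only fallback when resolution fails
        }
        break;
      case QUEUE_MODE_DOMAIN: {
        // Naive: keep the last two labels. Wrong for suffixes such as "co.uk";
        // a real implementation would consult the public suffix list.
        String[] labels = host.split("\\.");
        key = labels.length <= 2
            ? host
            : labels[labels.length - 2] + "." + labels[labels.length - 1];
        break;
      }
      default: // QUEUE_MODE_HOST
        key = host;
    }
    return scheme + "://" + key;
  }

  public static void main(String[] args) {
    String url = "http://www.Example.com/page";
    System.out.println(queueId(url, QUEUE_MODE_HOST));   // http://www.example.com
    System.out.println(queueId(url, QUEUE_MODE_DOMAIN)); // http://example.com
    System.out.println(queueId("http://127.0.0.1:8080/", QUEUE_MODE_IP)); // http://127.0.0.1
  }
}
```

Coarser keys (domain, IP) put more URLs into one queue, so politeness delays apply across subdomains or across hosts sharing a server.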
Constructor Detail
FetchItemQueues
public FetchItemQueues(Configuration conf)

Method Detail
checkQueueMode
protected static String checkQueueMode(String queueMode)
Check whether the queue mode is valid; fall back to the default mode if not.
- Parameters:
  queueMode - queue mode to check
- Returns:
  valid queue mode or default

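A minimal sketch of the documented validate-or-fall-back behaviour, under two assumptions not stated on this page: the mode strings are "byHost", "byDomain" and "byIP", and the default mode is the host mode. This re-creates the described contract; it is not the Nutch source.

```java
import java.util.Set;

// Hypothetical re-creation of the documented behaviour, not the Nutch source.
final class QueueModeCheck {
  static final String QUEUE_MODE_HOST = "byHost";     // assumed constant value
  static final String QUEUE_MODE_DOMAIN = "byDomain"; // assumed constant value
  static final String QUEUE_MODE_IP = "byIP";         // assumed constant value
  static final String DEFAULT_MODE = QUEUE_MODE_HOST; // assumed default

  private static final Set<String> VALID_MODES =
      Set.of(QUEUE_MODE_HOST, QUEUE_MODE_DOMAIN, QUEUE_MODE_IP);

  static String checkQueueMode(String queueMode) {
    // The null check comes first because Set.of(...).contains(null) throws.
    if (queueMode == null || !VALID_MODES.contains(queueMode)) {
      System.err.println("Unknown queue mode '" + queueMode + "', using " + DEFAULT_MODE);
      return DEFAULT_MODE;
    }
    return queueMode;
  }

  public static void main(String[] args) {
    System.out.println(checkQueueMode("byIP"));   // byIP
    System.out.println(checkQueueMode("byPath")); // byHost (plus a warning on stderr)
  }
}
```

Falling back instead of throwing keeps a misconfigured crawl running with the most conservative grouping, at the cost of silently ignoring a typo unless the warning is noticed.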
getTotalSize
public int getTotalSize()

getQueueCount
public int getQueueCount()

getQueueCountMaxExceptions
public int getQueueCountMaxExceptions()

addFetchItem
public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(Text url, CrawlDatum datum)

addFetchItem
public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(FetchItem it)

finishFetchItem
public void finishFetchItem(FetchItem it)

finishFetchItem
public void finishFetchItem(FetchItem it, boolean asap)

getFetchItemQueue
public FetchItemQueue getFetchItemQueue(String id)

getFetchItem
public FetchItem getFetchItem()

timelimitExceeded
public boolean timelimitExceeded()
- Returns:
  true if the fetcher time limit is defined and has been exceeded (fetcher.timelimit.mins minutes after fetching started)

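The time limit is an absolute deadline: fetch start plus fetcher.timelimit.mins minutes. The hypothetical sketch below models that check with an explicit clock value so it is testable. The class name, the constructor, and the assumption that a negative minutes value disables the limit are illustrative only.

```java
// Hypothetical model of the time limit check; names are illustrative.
final class TimeLimitSketch {
  private final long timelimit; // absolute deadline in epoch ms, or -1 when disabled

  // timelimitMins mirrors fetcher.timelimit.mins; a negative value disables the limit (assumed).
  TimeLimitSketch(long fetchStartMillis, long timelimitMins) {
    this.timelimit = timelimitMins < 0 ? -1L : fetchStartMillis + timelimitMins * 60_000L;
  }

  boolean timelimitExceeded(long nowMillis) {
    return timelimit != -1L && nowMillis >= timelimit;
  }

  public static void main(String[] args) {
    TimeLimitSketch limited = new TimeLimitSketch(0L, 30L);
    System.out.println(limited.timelimitExceeded(29 * 60_000L)); // false
    System.out.println(limited.timelimitExceeded(30 * 60_000L)); // true
    System.out.println(new TimeLimitSketch(0L, -1L).timelimitExceeded(Long.MAX_VALUE)); // false
  }
}
```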
checkTimelimit
public int checkTimelimit()

setTimeoutReached
public void setTimeoutReached()
Signal that the hard timeout is reached because no new fetches / requests were made during half of the MapReduce task timeout (mapreduce.task.timeout, default value: 10 minutes). To avoid hitting the task timeout, which would fail the fetcher job, fetching is stopped now. See also the property fetcher.threads.timeout.divisor.

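The arithmetic behind "half of the MapReduce task timeout" can be made concrete: the thread timeout is mapreduce.task.timeout divided by fetcher.threads.timeout.divisor, so 600000 ms / 2 = 300000 ms (5 minutes). The hypothetical watchdog check below assumes a divisor of 2 (matching "half") and a caller-tracked `lastRequestStartMillis`; both names are illustrative, not Nutch API.

```java
// Hypothetical watchdog arithmetic; method and variable names are illustrative.
final class HardTimeoutSketch {
  // Thread timeout = task timeout / divisor.
  static long threadTimeoutMillis(long taskTimeoutMillis, int divisor) {
    return taskTimeoutMillis / divisor;
  }

  // True if no new request started within the thread timeout.
  static boolean shouldStopFetching(long nowMillis, long lastRequestStartMillis,
      long threadTimeoutMillis) {
    return nowMillis - lastRequestStartMillis > threadTimeoutMillis;
  }

  public static void main(String[] args) {
    long taskTimeout = 10 * 60 * 1000L; // mapreduce.task.timeout default: 10 minutes
    int divisor = 2;                    // assumed value, matching "half" in the text
    long timeout = threadTimeoutMillis(taskTimeout, divisor);
    System.out.println(timeout); // 300000 (5 minutes)
    // 350000 ms have passed without a new request, which exceeds 300000 ms.
    System.out.println(shouldStopFetching(400_000L, 50_000L, timeout)); // true
  }
}
```

A larger divisor stops hung fetchers sooner but risks aborting threads that are merely slow.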
emptyQueues
public int emptyQueues()

checkExceptionThreshold
public int checkExceptionThreshold(String queueid, int maxExceptions, long delay)
Increment the exception counter of a queue in case of an exception, e.g. a timeout; when the counter is higher than a given threshold, simply empty the queue. The next fetch is delayed if specified by the param delay or configured by the property fetcher.exceptions.per.queue.delay.
- Parameters:
  queueid - a queue identifier to locate and check
  maxExceptions - custom-defined maximum number of exceptions; if negative, the value of the property fetcher.max.exceptions.per.queue is used.
  delay - a custom-defined time span in milliseconds to delay the next fetch in addition to the delay defined for the given queue. If a negative value is passed, the delay is chosen by fetcher.exceptions.per.queue.delay.
- Returns:
  number of purged items

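A single-queue, hypothetical model of the contract described above: negative arguments fall back to configured values, the delay is added to the queue's next fetch time, and the queue is purged once the counter is higher than the threshold. `configuredMax` and `configuredDelay` stand in for fetcher.max.exceptions.per.queue and fetcher.exceptions.per.queue.delay; treating a negative configured threshold as "no limit" is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical single-queue model of the exception threshold; not Nutch code.
final class ExceptionThresholdSketch {
  final Deque<String> items = new ArrayDeque<>();
  private final int configuredMax;    // stands in for fetcher.max.exceptions.per.queue
  private final long configuredDelay; // stands in for fetcher.exceptions.per.queue.delay (ms)
  private int exceptionCount = 0;
  long nextFetchTime = 0L;            // epoch ms when this queue may fetch again

  ExceptionThresholdSketch(int configuredMax, long configuredDelay) {
    this.configuredMax = configuredMax;
    this.configuredDelay = configuredDelay;
  }

  // Returns the number of purged items (0 if the queue is kept).
  int checkExceptionThreshold(int maxExceptions, long delay) {
    int max = maxExceptions < 0 ? configuredMax : maxExceptions;
    long extraDelay = delay < 0 ? configuredDelay : delay;
    exceptionCount++;
    nextFetchTime += extraDelay; // postpone the next fetch from this queue
    // Purge once the counter is higher than the threshold; a negative
    // threshold is treated as "no limit" (an assumption of this sketch).
    if (max >= 0 && exceptionCount > max) {
      int purged = items.size();
      items.clear();
      return purged;
    }
    return 0;
  }

  // Single-argument variant: fall back to the configured values.
  int checkExceptionThreshold() {
    return checkExceptionThreshold(-1, -1L);
  }

  public static void main(String[] args) {
    ExceptionThresholdSketch q = new ExceptionThresholdSketch(2, 1000L);
    q.items.addAll(List.of("u1", "u2", "u3", "u4"));
    System.out.println(q.checkExceptionThreshold()); // 0 (1 exception)
    System.out.println(q.checkExceptionThreshold()); // 0 (2 exceptions)
    System.out.println(q.checkExceptionThreshold()); // 4 (3 > 2, queue purged)
    System.out.println(q.nextFetchTime);             // 3000
  }
}
```

Purging a whole queue after repeated failures avoids spending the fetch budget on a host that is down or throttling the crawler.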
checkExceptionThreshold
public int checkExceptionThreshold(String queueid)
Increment the exception counter of a queue in case of an exception, e.g. a timeout; when the counter is higher than a given threshold, simply empty the queue.
- Parameters:
  queueid - queue identifier to locate and check
- Returns:
  number of purged items
- See Also:
  checkExceptionThreshold(String, int, long)

redirectIsQueuedRecently
public boolean redirectIsQueuedRecently(Text redirUrl)
- Parameters:
  redirUrl - redirect target
- Returns:
  true if redirects are deduplicated and redirUrl has been queued recently

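This page does not say how "queued recently" is tracked. The sketch below swaps in one plausible technique, a size-bounded map with a time-to-live built on `LinkedHashMap.removeEldestEntry`, purely as an illustration. `maxEntries`, `ttlMillis`, and the choice to record the URL inside the check are assumptions of this sketch, not Nutch behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical redirect de-duplication cache; technique and names are assumptions.
final class RedirectDedupSketch {
  private final boolean enabled;
  private final long ttlMillis;
  private final Map<String, Long> queuedAt; // redirect target -> time it was recorded

  RedirectDedupSketch(int maxEntries, long ttlMillis) {
    this.enabled = maxEntries > 0 && ttlMillis > 0;
    this.ttlMillis = ttlMillis;
    this.queuedAt = new LinkedHashMap<String, Long>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        return size() > maxEntries; // evict the oldest recorded target when full
      }
    };
  }

  // True if dedup is enabled and redirUrl was recorded less than ttlMillis ago;
  // otherwise records redirUrl (when enabled) and returns false.
  boolean redirectIsQueuedRecently(String redirUrl, long nowMillis) {
    if (!enabled) {
      return false;
    }
    Long recorded = queuedAt.get(redirUrl);
    if (recorded != null && nowMillis - recorded < ttlMillis) {
      return true;
    }
    queuedAt.remove(redirUrl); // re-insert so insertion order matches the new timestamp
    queuedAt.put(redirUrl, nowMillis);
    return false;
  }

  public static void main(String[] args) {
    RedirectDedupSketch cache = new RedirectDedupSketch(100, 60_000L);
    System.out.println(cache.redirectIsQueuedRecently("http://example.com/", 0L));      // false
    System.out.println(cache.redirectIsQueuedRecently("http://example.com/", 1_000L));  // true
    System.out.println(cache.redirectIsQueuedRecently("http://example.com/", 61_000L)); // false (expired)
  }
}
```

Bounding the map by size keeps memory flat on large crawls; the time-to-live lets a target be fetched again once the window has passed.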
dump
public void dump()