Package org.apache.nutch.fetcher
Class FetchItemQueues
- java.lang.Object
-
- org.apache.nutch.fetcher.FetchItemQueues
-
public class FetchItemQueues extends Object
A collection of queues that keeps track of the total number of items, and provides items eligible for fetching from any queue.
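As a rough illustration of the idea described above, the following sketch keeps one queue per identifier, tracks the total item count, and hands out the next item from any non-empty queue. It is a simplified stand-in under stated assumptions, not the Nutch implementation: the class `QueueCollectionSketch` and its methods `add` and `next` are hypothetical, URLs are plain strings, and politeness delays and in-progress tracking are ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of a collection of fetch queues. */
class QueueCollectionSketch {
    // One FIFO queue per queue id (for example, a host name).
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Total number of items across all queues.
    private final AtomicInteger totalSize = new AtomicInteger();

    synchronized void add(String queueId, String url) {
        queues.computeIfAbsent(queueId, id -> new ArrayDeque<>()).add(url);
        totalSize.incrementAndGet();
    }

    /** Returns an item from the first non-empty queue, or null if all are empty. */
    synchronized String next() {
        for (Deque<String> queue : queues.values()) {
            String url = queue.poll();
            if (url != null) {
                totalSize.decrementAndGet();
                return url;
            }
        }
        return null;
    }

    int getTotalSize() {
        return totalSize.get();
    }

    synchronized int getQueueCount() {
        return queues.size();
    }

    public static void main(String[] args) {
        QueueCollectionSketch sketch = new QueueCollectionSketch();
        sketch.add("a.example", "http://a.example/1");
        sketch.add("b.example", "http://b.example/1");
        System.out.println(sketch.next() + " taken, " + sketch.getTotalSize() + " left");
    }
}
```

Keeping the total in a separate counter means the size can be read without walking every queue.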
-
-
Field Summary
Modifier and Type, Field
static String
DEFAULT_ID
static String
QUEUE_MODE_DOMAIN
static String
QUEUE_MODE_HOST
static String
QUEUE_MODE_IP
-
Constructor Summary
Constructor
FetchItemQueues(Configuration conf)
-
Method Summary
Modifier and Type, Method, Description
org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus
addFetchItem(Text url, CrawlDatum datum)
org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus
addFetchItem(FetchItem it)
int
checkExceptionThreshold(String queueid)
Increment the exception counter of a queue in case of an exception, e.g. a timeout.
int
checkExceptionThreshold(String queueid, int maxExceptions, long delay)
Increment the exception counter of a queue in case of an exception, e.g. a timeout.
protected static String
checkQueueMode(String queueMode)
Check whether the queue mode is valid; fall back to the default mode if not.
int
checkTimelimit()
void
dump()
int
emptyQueues()
void
finishFetchItem(FetchItem it)
void
finishFetchItem(FetchItem it, boolean asap)
FetchItem
getFetchItem()
FetchItemQueue
getFetchItemQueue(String id)
int
getQueueCount()
int
getQueueCountMaxExceptions()
int
getTotalSize()
boolean
redirectIsQueuedRecently(Text redirUrl)
boolean
timelimitExceeded()
-
-
-
Field Detail
-
DEFAULT_ID
public static final String DEFAULT_ID
- See Also:
- Constant Field Values
-
QUEUE_MODE_HOST
public static final String QUEUE_MODE_HOST
- See Also:
- Constant Field Values
-
QUEUE_MODE_DOMAIN
public static final String QUEUE_MODE_DOMAIN
- See Also:
- Constant Field Values
-
QUEUE_MODE_IP
public static final String QUEUE_MODE_IP
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
FetchItemQueues
public FetchItemQueues(Configuration conf)
-
-
Method Detail
-
checkQueueMode
protected static String checkQueueMode(String queueMode)
Check whether the queue mode is valid; fall back to the default mode if not.
- Parameters:
queueMode - queue mode to check
- Returns:
- valid queue mode or default
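The validate-or-fall-back behaviour described for checkQueueMode can be sketched as follows. This is a minimal sketch, not the Nutch source: the constant values ("byHost", "byDomain", "byIP") and the choice of the host mode as the default are assumptions, since this page lists the fields without their values.

```java
/** Hypothetical sketch of queue mode validation with a fallback. */
class QueueModeSketch {
    // Assumed values; the page names these constants but does not show their values.
    static final String QUEUE_MODE_HOST = "byHost";
    static final String QUEUE_MODE_DOMAIN = "byDomain";
    static final String QUEUE_MODE_IP = "byIP";

    /** Returns queueMode if it is one of the known modes, otherwise the assumed default. */
    static String checkQueueMode(String queueMode) {
        if (QUEUE_MODE_HOST.equals(queueMode)
                || QUEUE_MODE_DOMAIN.equals(queueMode)
                || QUEUE_MODE_IP.equals(queueMode)) {
            return queueMode;
        }
        // Assumption: the host mode is the default.
        return QUEUE_MODE_HOST;
    }

    public static void main(String[] args) {
        System.out.println(checkQueueMode("byDomain"));
        System.out.println(checkQueueMode("unknown"));
    }
}
```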
-
getTotalSize
public int getTotalSize()
-
getQueueCount
public int getQueueCount()
-
getQueueCountMaxExceptions
public int getQueueCountMaxExceptions()
-
addFetchItem
public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(Text url, CrawlDatum datum)
-
addFetchItem
public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(FetchItem it)
-
finishFetchItem
public void finishFetchItem(FetchItem it)
-
finishFetchItem
public void finishFetchItem(FetchItem it, boolean asap)
-
getFetchItemQueue
public FetchItemQueue getFetchItemQueue(String id)
-
getFetchItem
public FetchItem getFetchItem()
-
timelimitExceeded
public boolean timelimitExceeded()
- Returns:
- true if the fetcher timelimit is defined and has been exceeded (fetcher.timelimit.mins minutes after fetching started)
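The return condition above combines two checks: a time limit must be defined, and the current time must be past the start time plus that many minutes. A minimal sketch, assuming that a negative number of minutes means "no limit defined" (the page does not say how an undefined limit is encoded) and using the hypothetical class `TimeLimitSketch`:

```java
/** Hypothetical sketch of a fetcher time limit check. */
class TimeLimitSketch {
    private final boolean defined;
    private final long deadlineMillis;

    /**
     * @param startMillis   time at which fetching started
     * @param timelimitMins limit in minutes; negative is assumed to mean "undefined"
     */
    TimeLimitSketch(long startMillis, int timelimitMins) {
        this.defined = timelimitMins >= 0;
        this.deadlineMillis = startMillis + timelimitMins * 60_000L;
    }

    /** True only if a limit is defined and nowMillis is past the deadline. */
    boolean timelimitExceeded(long nowMillis) {
        return defined && nowMillis > deadlineMillis;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        TimeLimitSketch limit = new TimeLimitSketch(start, 10);
        System.out.println(limit.timelimitExceeded(start + 11 * 60_000L));
    }
}
```

Passing the current time as a parameter keeps the check deterministic and easy to test.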
-
checkTimelimit
public int checkTimelimit()
-
emptyQueues
public int emptyQueues()
-
checkExceptionThreshold
public int checkExceptionThreshold(String queueid, int maxExceptions, long delay)
Increment the exception counter of a queue in case of an exception, e.g. a timeout; when the counter is higher than the given threshold, empty the queue. The next fetch is delayed if specified by the parameter delay or configured by the property fetcher.exceptions.per.queue.delay.
- Parameters:
queueid - a queue identifier to locate and check
maxExceptions - custom-defined maximum number of exceptions; if negative, the value of the property fetcher.max.exceptions.per.queue is used
delay - a custom-defined time span in milliseconds to delay the next fetch, in addition to the delay defined for the given queue; if a negative value is passed, the delay is taken from fetcher.exceptions.per.queue.delay
- Returns:
- number of purged items
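The counting-and-purging behaviour described above can be illustrated with a small model. This is a sketch under stated assumptions rather than the Nutch code: the class `ExceptionThresholdSketch`, its `add` method, and the default of 3 are hypothetical; the comparison follows the wording "higher than" on this page; and the delay handling is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a per-queue exception counter with purging. */
class ExceptionThresholdSketch {
    // Hypothetical stand-in for the property fetcher.max.exceptions.per.queue.
    static final int DEFAULT_MAX_EXCEPTIONS = 3;

    private final Map<String, Deque<String>> queues = new HashMap<>();
    private final Map<String, Integer> exceptionCounts = new HashMap<>();

    void add(String queueId, String url) {
        queues.computeIfAbsent(queueId, id -> new ArrayDeque<>()).add(url);
    }

    /** Counts one exception for the queue and purges it once the threshold is passed. */
    int checkExceptionThreshold(String queueId, int maxExceptions) {
        Deque<String> queue = queues.get(queueId);
        if (queue == null) {
            return 0; // unknown queue: nothing to purge
        }
        // A negative value selects the configured default.
        int limit = maxExceptions < 0 ? DEFAULT_MAX_EXCEPTIONS : maxExceptions;
        int count = exceptionCounts.merge(queueId, 1, Integer::sum);
        if (count <= limit) {
            return 0;
        }
        int purged = queue.size();
        queue.clear();
        return purged;
    }

    public static void main(String[] args) {
        ExceptionThresholdSketch sketch = new ExceptionThresholdSketch();
        sketch.add("a.example", "http://a.example/1");
        sketch.add("a.example", "http://a.example/2");
        System.out.println(sketch.checkExceptionThreshold("a.example", 1));
        System.out.println(sketch.checkExceptionThreshold("a.example", 1));
    }
}
```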
-
checkExceptionThreshold
public int checkExceptionThreshold(String queueid)
Increment the exception counter of a queue in case of an exception, e.g. a timeout; when the counter is higher than the given threshold, empty the queue.
- Parameters:
queueid - queue identifier to locate and check
- Returns:
- number of purged items
- See Also:
checkExceptionThreshold(String, int, long)
-
redirectIsQueuedRecently
public boolean redirectIsQueuedRecently(Text redirUrl)
- Parameters:
redirUrl - redirect target
- Returns:
- true if redirects are deduplicated and redirUrl has been queued recently
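One way to picture the deduplication check is a map from redirect target to the time it was last queued. The sketch below is an assumption-laden illustration, not the Nutch implementation: the class `RedirectDedupSketch`, the fixed time window, the explicit `nowMillis` parameter, and the decision to record the URL on every check are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of redirect deduplication over a time window. */
class RedirectDedupSketch {
    private final boolean dedupEnabled;
    private final long windowMillis;
    // Redirect target -> time it was last seen.
    private final Map<String, Long> lastQueued = new HashMap<>();

    RedirectDedupSketch(boolean dedupEnabled, long windowMillis) {
        this.dedupEnabled = dedupEnabled;
        this.windowMillis = windowMillis;
    }

    /** True if deduplication is on and redirUrl was seen within the window. */
    boolean redirectIsQueuedRecently(String redirUrl, long nowMillis) {
        if (!dedupEnabled) {
            return false;
        }
        // Assumption: every check also records the URL as queued now.
        Long previous = lastQueued.put(redirUrl, nowMillis);
        return previous != null && nowMillis - previous <= windowMillis;
    }

    public static void main(String[] args) {
        RedirectDedupSketch dedup = new RedirectDedupSketch(true, 1_000L);
        System.out.println(dedup.redirectIsQueuedRecently("http://a.example/", 0L));
        System.out.println(dedup.redirectIsQueuedRecently("http://a.example/", 500L));
    }
}
```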
-
dump
public void dump()
-
-