Class FetchItemQueues


  • public class FetchItemQueues
    extends Object
    A collection of queues that keeps track of the total number of items, and provides items eligible for fetching from any queue.
    • Constructor Detail

      • FetchItemQueues

        public FetchItemQueues(Configuration conf)
    • Method Detail

      • checkQueueMode

        protected static String checkQueueMode(String queueMode)
        Checks whether the queue mode is valid and falls back to the default mode if it is not.
        Parameters:
        queueMode - queue mode to check
        Returns:
        valid queue mode or default
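
        The fall-back behaviour described above can be sketched as follows. This is a minimal, standalone illustration and not the Nutch implementation: the class name QueueModeSketch, the set of valid mode names (byHost, byDomain, byIP) and the default (byHost) are assumptions that this page does not state.

```java
import java.util.Set;

public class QueueModeSketch {
    // Assumed values: neither the mode names nor the default appear on this page.
    static final String DEFAULT_MODE = "byHost";
    static final Set<String> VALID_MODES = Set.of("byHost", "byDomain", "byIP");

    // Returns the given mode if it is recognised, otherwise the default mode.
    static String checkQueueMode(String queueMode) {
        // The null check comes first because Set.of(...).contains(null) throws.
        if (queueMode != null && VALID_MODES.contains(queueMode)) {
            return queueMode;
        }
        return DEFAULT_MODE;
    }

    public static void main(String[] args) {
        System.out.println(checkQueueMode("byDomain")); // recognised mode is kept
        System.out.println(checkQueueMode("bogus"));    // unknown mode falls back
    }
}
```

        Returning a usable default instead of throwing keeps a misconfigured job running; whether the real method also logs the fall-back is not stated here.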
      • getTotalSize

        public int getTotalSize()
      • getQueueCount

        public int getQueueCount()
      • getQueueCountMaxExceptions

        public int getQueueCountMaxExceptions()
      • addFetchItem

        public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(Text url,
                                                                                   CrawlDatum datum)
      • addFetchItem

        public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(FetchItem it)
      • finishFetchItem

        public void finishFetchItem(FetchItem it)
      • finishFetchItem

        public void finishFetchItem(FetchItem it,
                                    boolean asap)
      • getFetchItem

        public FetchItem getFetchItem()
      • timelimitExceeded

        public boolean timelimitExceeded()
        Returns:
        true if the fetcher timelimit is defined and has been exceeded (fetcher.timelimit.mins minutes after fetching started)
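
        The condition "defined and exceeded" can be illustrated with a small standalone sketch. It is not the Nutch code: the class TimeLimitSketch, the helper deadlineMillis, the use of -1 to mean "no limit" and the explicit nowMillis parameter are all assumptions made for illustration.

```java
public class TimeLimitSketch {
    // Hypothetical helper: converts a limit in minutes into an absolute deadline
    // in epoch milliseconds, or -1 when no limit is configured (assumed convention).
    static long deadlineMillis(long startMillis, long timelimitMins) {
        return timelimitMins > 0 ? startMillis + timelimitMins * 60_000L : -1L;
    }

    // True only if a deadline is defined and the given time has reached it.
    static boolean timelimitExceeded(long deadlineMillis, long nowMillis) {
        return deadlineMillis > 0 && nowMillis >= deadlineMillis;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long deadline = deadlineMillis(start, 2); // two minutes after start
        System.out.println(timelimitExceeded(deadline, start));            // still within the limit
        System.out.println(timelimitExceeded(deadline, start + 120_000L)); // limit reached
        System.out.println(timelimitExceeded(-1L, start + 120_000L));      // no limit defined
    }
}
```

        Passing the current time as a parameter keeps the check deterministic in tests; a real caller would typically supply System.currentTimeMillis().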
      • checkTimelimit

        public int checkTimelimit()
      • emptyQueues

        public int emptyQueues()
      • checkExceptionThreshold

        public int checkExceptionThreshold(String queueid,
                                           int maxExceptions,
                                           long delay)
        Increments the exception counter of a queue when an exception occurs (e.g. a timeout). When the counter is higher than the given threshold, the queue is emptied. The next fetch is delayed if specified by the parameter delay or configured by the property fetcher.exceptions.per.queue.delay.
        Parameters:
        queueid - a queue identifier to locate and check
        maxExceptions - custom maximum number of exceptions; if negative, the value of the property fetcher.max.exceptions.per.queue is used.
        delay - a custom time span in milliseconds by which to delay the next fetch, in addition to the delay defined for the given queue. If a negative value is passed, the delay is taken from fetcher.exceptions.per.queue.delay.
        Returns:
        number of purged items
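
        The counting and purging logic described above can be sketched in isolation. This is a simplified model, not the Nutch implementation: the classes ExceptionThresholdSketch and ItemQueue, the addItem helper and the configuredMaxExceptions field (a stand-in for fetcher.max.exceptions.per.queue) are hypothetical. The delay handling is left out, and treating a negative effective threshold as "never purge" is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class ExceptionThresholdSketch {
    // Hypothetical per-queue state: pending items plus an exception counter.
    static class ItemQueue {
        final Deque<String> items = new ArrayDeque<>();
        int exceptionCount = 0;
    }

    private final Map<String, ItemQueue> queues = new HashMap<>();
    // Stand-in for the configured value of fetcher.max.exceptions.per.queue.
    private final int configuredMaxExceptions;

    ExceptionThresholdSketch(int configuredMaxExceptions) {
        this.configuredMaxExceptions = configuredMaxExceptions;
    }

    void addItem(String queueId, String url) {
        queues.computeIfAbsent(queueId, id -> new ItemQueue()).items.add(url);
    }

    // Increments the queue's exception counter and empties the queue once the
    // counter is higher than the threshold. Returns the number of purged items.
    int checkExceptionThreshold(String queueId, int maxExceptions) {
        ItemQueue q = queues.get(queueId);
        if (q == null) {
            return 0; // unknown queue: nothing to purge
        }
        // A negative argument selects the configured value, as the parameter text says.
        int threshold = maxExceptions < 0 ? configuredMaxExceptions : maxExceptions;
        q.exceptionCount++;
        // Assumption: a negative effective threshold disables purging.
        if (threshold < 0 || q.exceptionCount <= threshold) {
            return 0;
        }
        int purged = q.items.size();
        q.items.clear();
        return purged;
    }

    public static void main(String[] args) {
        ExceptionThresholdSketch sketch = new ExceptionThresholdSketch(5);
        sketch.addItem("example.org", "http://example.org/a");
        sketch.addItem("example.org", "http://example.org/b");
        for (int i = 1; i <= 3; i++) {
            System.out.println("exception " + i + ": purged "
                + sketch.checkExceptionThreshold("example.org", 2));
        }
    }
}
```

        The comparison follows this page's wording ("higher than"); the actual class may compare differently, so the exact call on which the purge happens should be checked against the source.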
      • checkExceptionThreshold

        public int checkExceptionThreshold(String queueid)
        Increments the exception counter of a queue when an exception occurs (e.g. a timeout). When the counter is higher than the threshold, the queue is emptied.
        Parameters:
        queueid - queue identifier to locate and check
        Returns:
        number of purged items
        See Also:
        checkExceptionThreshold(String, int, long)
      • redirectIsQueuedRecently

        public boolean redirectIsQueuedRecently(Text redirUrl)
        Parameters:
        redirUrl - redirect target
        Returns:
        true if redirects are deduplicated and redirUrl has been queued recently
      • dump

        public void dump()