Package org.apache.nutch.fetcher
Class FetchItem
- java.lang.Object
  - org.apache.nutch.fetcher.FetchItem
public class FetchItem extends Object
This class describes the item to be fetched.
Constructor Summary
FetchItem(Text url, URL u, CrawlDatum datum, String queueID)
FetchItem(Text url, URL u, CrawlDatum datum, String queueID, int outlinkDepth)
Method Summary
Modifier and Type | Method | Description
static FetchItem | create(Text url, CrawlDatum datum, String queueMode) | Create an item.
static FetchItem | create(Text url, CrawlDatum datum, String queueMode, int outlinkDepth) | Create an item.
CrawlDatum | getDatum() |
String | getQueueID() |
Text | getUrl() |
URL | getURL2() |
Constructor Detail
FetchItem
public FetchItem(Text url, URL u, CrawlDatum datum, String queueID)
FetchItem
public FetchItem(Text url, URL u, CrawlDatum datum, String queueID, int outlinkDepth)
Method Detail
create
public static FetchItem create(Text url, CrawlDatum datum, String queueMode)
Create an item. The queue ID is derived from the queueMode argument as a protocol + hostname pair, a protocol + IP address pair, or a protocol + domain pair. Sets the outlink depth to 0.
Parameters:
  url - URL of the fetch item
  datum - webpage information associated with the URL
  queueMode - one of byHost, byDomain or byIP
Returns:
  a FetchItem with an outlink depth of 0
See Also:
  FetchItemQueues.QUEUE_MODE_DOMAIN, FetchItemQueues.QUEUE_MODE_HOST, FetchItemQueues.QUEUE_MODE_IP
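The queue-ID rule described for create can be illustrated with a standalone sketch that does not depend on Nutch. QueueIdSketch, queueId and naiveDomain are hypothetical names; the "protocol://key" shape of the resulting ID is an assumption; and the domain mode uses a crude "last two labels" heuristic in place of a real public-suffix lookup. Only the three mode strings byHost, byDomain and byIP come from this page.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical standalone sketch; not Nutch's actual implementation.
class QueueIdSketch {

    // Mode names as listed for the queueMode parameter.
    static final String BY_HOST = "byHost";
    static final String BY_DOMAIN = "byDomain";
    static final String BY_IP = "byIP";

    // Builds an ID of the assumed form "protocol://key", where key is the
    // host name, the resolved IP address, or a crude domain guess.
    // Returns null if the URL cannot be parsed or the host cannot be resolved.
    static String queueId(String url, String queueMode) {
        URL u;
        try {
            u = new URI(url).toURL();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return null;
        }
        String proto = u.getProtocol().toLowerCase(Locale.ROOT);
        String host = u.getHost();
        String key;
        if (BY_IP.equalsIgnoreCase(queueMode)) {
            try {
                // May trigger a DNS lookup unless host is already an IP literal.
                key = InetAddress.getByName(host).getHostAddress();
            } catch (UnknownHostException e) {
                return null;
            }
        } else if (BY_DOMAIN.equalsIgnoreCase(queueMode)) {
            key = naiveDomain(host);
        } else {
            // Any other mode falls back to grouping by host name.
            key = host;
        }
        return proto + "://" + key.toLowerCase(Locale.ROOT);
    }

    // Crude stand-in for a real domain extractor: keeps the last two labels.
    // Wrong for suffixes such as "co.uk"; a real one would use the public suffix list.
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        String url = "https://News.Example.com/a/b.html";
        System.out.println(queueId(url, BY_HOST));                      // https://news.example.com
        System.out.println(queueId(url, BY_DOMAIN));                    // https://example.com
        System.out.println(queueId("http://127.0.0.1:8080/x", BY_IP)); // http://127.0.0.1
    }
}
```

The mode decides which URLs share a queue. Since a fetcher typically applies politeness delays per queue, byIP would presumably group several virtual hosts served from one address together, at the cost of resolving each host, while byDomain groups subdomains such as news.example.com and www.example.com.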
create
public static FetchItem create(Text url, CrawlDatum datum, String queueMode, int outlinkDepth)
Create an item. The queue ID is derived from the queueMode argument as a protocol + hostname pair, a protocol + IP address pair, or a protocol + domain pair. The outlink depth is configurable.
Parameters:
  url - URL of the fetch item
  datum - webpage information associated with the URL
  queueMode - one of byHost, byDomain or byIP
  outlinkDepth - the desired outlink depth for this FetchItem
Returns:
  a FetchItem with the given outlink depth
See Also:
  FetchItemQueues.QUEUE_MODE_DOMAIN, FetchItemQueues.QUEUE_MODE_HOST, FetchItemQueues.QUEUE_MODE_IP
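The two create overloads differ only in how the outlink depth is set: the shorter one fixes it at 0, the longer one takes it from the caller. The sketch below models that relationship with a hypothetical DepthItem class (not Nutch's FetchItem) and adds an assumed depth-limiting policy, in which outlinks discovered on an item are created at its depth + 1 and dropped once that depth would exceed a maximum. How Nutch itself uses outlinkDepth may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a fetch item that keeps only a URL and an outlink
// depth; it is not org.apache.nutch.fetcher.FetchItem.
final class DepthItem {
    final String url;
    final int outlinkDepth;

    private DepthItem(String url, int outlinkDepth) {
        if (outlinkDepth < 0) {
            throw new IllegalArgumentException("outlinkDepth must be >= 0");
        }
        this.url = url;
        this.outlinkDepth = outlinkDepth;
    }

    // Counterpart of the shorter create(): depth defaults to 0.
    static DepthItem create(String url) {
        return create(url, 0);
    }

    // Counterpart of the create() that takes outlinkDepth: the caller picks it.
    static DepthItem create(String url, int outlinkDepth) {
        return new DepthItem(url, outlinkDepth);
    }

    // Assumed policy: outlinks found on this item are created one level
    // deeper, and none are created once that level would exceed maxDepth.
    List<DepthItem> followOutlinks(List<String> outlinks, int maxDepth) {
        List<DepthItem> next = new ArrayList<>();
        int childDepth = outlinkDepth + 1;
        if (childDepth > maxDepth) {
            return next;
        }
        for (String link : outlinks) {
            next.add(create(link, childDepth));
        }
        return next;
    }

    public static void main(String[] args) {
        DepthItem seed = create("http://example.com/");
        List<DepthItem> level1 = seed.followOutlinks(
                List.of("http://example.com/a", "http://example.com/b"), 1);
        System.out.println(seed.outlinkDepth); // 0
        System.out.println(level1.size());     // 2
        System.out.println(level1.get(0).followOutlinks(
                List.of("http://example.com/c"), 1).size()); // 0
    }
}
```

Delegating the shorter overload to the longer one keeps a single construction path, so the default depth of 0 is stated in exactly one place.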
getDatum
public CrawlDatum getDatum()
getQueueID
public String getQueueID()
getUrl
public Text getUrl()
getURL2
public URL getURL2()