Package org.apache.nutch.tools
Interface CommonCrawlFormat
-
- All Superinterfaces:
AutoCloseable, Closeable
- All Known Implementing Classes:
AbstractCommonCrawlFormat, CommonCrawlFormatJackson, CommonCrawlFormatJettinson, CommonCrawlFormatSimple, CommonCrawlFormatWARC
public interface CommonCrawlFormat extends Closeable
Interface for all CommonCrawl formatters. It provides the signatures of the methods used to get JSON data.
- Author:
- gtotaro
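As a rough illustration of the contract described above, the sketch below defines a simplified stand-in interface and a toy implementation. `FormatSketch` and `ToyFormat` are hypothetical names, not Nutch types. The overloads that take Nutch's `Content`, `Metadata` and `ParseData` are left out so the example compiles on its own. The JSON layout (keys `url` and `inlinks`) is an assumption made for the demo, not the output of any real formatter.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CommonCrawlFormat; NOT the Nutch interface itself.
interface FormatSketch extends Closeable {
    String getJsonData() throws IOException;

    List<String> getInLinks();

    void setInLinks(List<String> inLinks);
}

// Toy implementation: emits the URL and the inlinks as a small JSON object.
// Assumes the values contain no characters that need JSON escaping.
class ToyFormat implements FormatSketch {
    private final String url;
    private List<String> inLinks = new ArrayList<>();

    ToyFormat(String url) {
        this.url = url;
    }

    @Override
    public String getJsonData() { // an override may omit the checked exception
        StringBuilder sb = new StringBuilder();
        sb.append("{\"url\":\"").append(url).append("\",\"inlinks\":[");
        for (int i = 0; i < inLinks.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(inLinks.get(i)).append('"');
        }
        return sb.append("]}").toString();
    }

    @Override
    public List<String> getInLinks() {
        return inLinks;
    }

    @Override
    public void setInLinks(List<String> inLinks) {
        this.inLinks = inLinks;
    }

    @Override
    public void close() {
        // Nothing to release in this toy format.
    }
}
```

A caller would typically set the inlinks first and then request the JSON string, closing the formatter once it is no longer needed.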
Method Summary
Modifier and Type | Method | Description
void | close() | Optional method that can be implemented if the format needs a close procedure.
List<String> | getInLinks() | Gets the inlinks of this document.
String | getJsonData() | Gets a string representation of the JSON structure of the URL content.
String | getJsonData(String url, Content content, Metadata metadata) | Returns a string representation of the JSON structure of the URL content.
String | getJsonData(String url, Content content, Metadata metadata, ParseData parseData) | Returns a string representation of the JSON structure of the URL content.
void | setInLinks(List<String> inLinks) | Sets the inlinks of this document.
Method Detail
-
getJsonData
String getJsonData() throws IOException
Gets a string representation of the JSON structure of the URL content.
- Returns:
- the JSON URL content string
- Throws:
IOException - if there is a fatal I/O error obtaining JSON data
-
getJsonData
String getJsonData(String url, Content content, Metadata metadata) throws IOException
Returns a string representation of the JSON structure of the URL content. Takes into consideration both the Content and the Metadata.
- Parameters:
url - the canonical URL
content - the URL's Content
metadata - the URL's Metadata
- Returns:
- the JSON URL content string
- Throws:
IOException - if there is a fatal I/O error obtaining JSON data
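To make the shape of this overload more concrete, here is a minimal sketch that builds a JSON string from a URL, its raw content and its metadata. It uses stand-in types: a `byte[]` instead of Nutch's `Content` and a `Map<String, String>` instead of `Metadata`. The class name `JsonDataSketch`, the helper `quote`, and the JSON key names are all assumptions for illustration. A real formatter would normally delegate to a JSON library instead of concatenating strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class JsonDataSketch {
    // Escapes backslashes and double quotes only; enough for this demo,
    // not a complete JSON string encoder.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Hypothetical analogue of getJsonData(url, content, metadata) using stand-in types.
    static String getJsonData(String url, byte[] content, Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder("{");
        sb.append(quote("url")).append(':').append(quote(url));
        sb.append(',').append(quote("content")).append(':')
          .append(quote(new String(content, StandardCharsets.UTF_8)));
        sb.append(',').append(quote("metadata")).append(":{");
        boolean first = true;
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("Content-Type", "text/html");
        byte[] body = "<p>hi</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(getJsonData("http://example.com/", body, md));
    }
}
```

The `LinkedHashMap` keeps the metadata keys in insertion order, which makes the output deterministic for this demo.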
-
getJsonData
String getJsonData(String url, Content content, Metadata metadata, ParseData parseData) throws IOException
Returns a string representation of the JSON structure of the URL content. Takes into consideration the Content, Metadata and ParseData.
- Parameters:
url - the canonical URL
content - the URL's Content
metadata - the URL's Metadata
parseData - the URL's ParseData
- Returns:
- the JSON URL content string
- Throws:
IOException - if there is a fatal I/O error obtaining JSON data
-
setInLinks
void setInLinks(List<String> inLinks)
Sets the inlinks of this document.
- Parameters:
inLinks - list of inlinks
-
close
void close()
Optional method that can be implemented if the format needs a close procedure.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
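Because the interface extends `Closeable`, a formatter can be managed with try-with-resources, which calls `close()` automatically when the block exits. The sketch below shows this with a hypothetical stand-in class (`TrackingFormat`, not a Nutch type) that only records whether `close()` ran.

```java
import java.io.Closeable;

class CloseSketch {
    // Hypothetical formatter stand-in that records whether close() was called.
    static class TrackingFormat implements Closeable {
        boolean closed = false;

        String getJsonData() {
            return "{}";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Uses the formatter inside try-with-resources and reports whether it was closed.
    static boolean closedAfterUse() {
        TrackingFormat format = new TrackingFormat();
        try (TrackingFormat f = format) {
            f.getJsonData();
        }
        return format.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + closedAfterUse());
    }
}
```

Formats that hold no resources can leave `close()` empty, in line with the "optional" wording above.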