Package org.apache.nutch.tools
Class CommonCrawlFormatSimple
- java.lang.Object
  - org.apache.nutch.tools.AbstractCommonCrawlFormat
    - org.apache.nutch.tools.CommonCrawlFormatSimple
All Implemented Interfaces:
Closeable, AutoCloseable, CommonCrawlFormat
public class CommonCrawlFormatSimple extends AbstractCommonCrawlFormat
This class provides methods to map crawled data to JSON using a StringBuilder object.
See Also:
- StringBuilder
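The description above, together with the "Specified by" notes in the method details, points to a template-method split: AbstractCommonCrawlFormat declares the protected hooks and this class renders each hook call into a StringBuilder. The following is a minimal sketch of that arrangement under stated assumptions, not Nutch's source. FormatBaseSketch, SimpleFormatSketch, the keys "url" and "title", and the order of hook calls in getJsonData() are all hypothetical, and the sketch skips escaping, indentation and arrays.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// HYPOTHETICAL base class: hook names and signatures follow this page, but the
// url field and the order of hook calls in getJsonData() are assumptions.
abstract class FormatBaseSketch {
    protected final String url;

    FormatBaseSketch(String url) {
        this.url = url;
    }

    // Template method: the base class fixes the sequence of hook calls,
    // the subclass decides how each call is rendered.
    String getJsonData() {
        try {
            startObject(null);          // root object has no key
            writeKeyValue("url", url);
            writeKeyNull("title");      // illustrative key with no value
            closeObject(null);
            return generateJson();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    protected abstract void startObject(String key) throws IOException;
    protected abstract void closeObject(String key) throws IOException;
    protected abstract void writeKeyValue(String key, String value) throws IOException;
    protected abstract void writeKeyNull(String key) throws IOException;
    protected abstract String generateJson() throws IOException;
}

// StringBuilder-backed subclass in the spirit of CommonCrawlFormatSimple.
// Output is compact and values are NOT escaped.
class SimpleFormatSketch extends FormatBaseSketch {
    private final StringBuilder sb = new StringBuilder();
    private boolean needComma = false; // true once the current object has a member

    SimpleFormatSketch(String url) {
        super(url);
    }

    // Writes the separator (if one is needed) followed by "key":
    private void writeKey(String key) {
        if (needComma) {
            sb.append(',');
        }
        needComma = true;
        sb.append('"').append(key).append("\":");
    }

    @Override
    protected void startObject(String key) {
        if (key != null) {
            writeKey(key);
        }
        sb.append('{');
        needComma = false; // a new object starts with no members
    }

    @Override
    protected void closeObject(String key) {
        sb.append('}');
        needComma = true; // the closed object counts as a member of its parent
    }

    @Override
    protected void writeKeyValue(String key, String value) {
        writeKey(key);
        sb.append('"').append(value).append('"');
    }

    @Override
    protected void writeKeyNull(String key) {
        writeKey(key);
        sb.append("null");
    }

    @Override
    protected String generateJson() {
        return sb.toString();
    }
}
```

Because needComma is cleared by every startObject and set by every closeObject, nested objects get correct separators without keeping a stack of open objects.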
Field Summary
Fields inherited from class org.apache.nutch.tools.AbstractCommonCrawlFormat
conf, content, inLinks, jsonArray, keyPrefix, LOG, metadata, reverseKey, reverseKeyValue, simpleDateFormat, url
Constructor Summary
Constructors
- CommonCrawlFormatSimple(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config)
Method Summary
Modifier and Type    Method
protected void       closeArray(String key, boolean nested, boolean newline)
protected void       closeObject(String key)
protected String     generateJson()
protected void       startArray(String key, boolean nested, boolean newline)
protected void       startObject(String key)
protected void       writeArrayValue(String value)
protected void       writeKeyNull(String key)
protected void       writeKeyValue(String key, String value)
Methods inherited from class org.apache.nutch.tools.AbstractCommonCrawlFormat
close, getImported, getInLinks, getJsonData, getJsonData, getJsonData, getKey, getMethod, getRequestAccept, getRequestAcceptEncoding, getRequestAcceptLanguage, getRequestContactEmail, getRequestContactName, getRequestHostAddress, getRequestHostName, getRequestRobots, getRequestSoftware, getRequestUserAgent, getResponseAddress, getResponseContent, getResponseContentEncoding, getResponseContentType, getResponseDate, getResponseHostName, getResponseServer, getResponseStatus, getTimestamp, getUrl, setInLinks
Constructor Detail
CommonCrawlFormatSimple
public CommonCrawlFormatSimple(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config) throws IOException
- Throws:
IOException
Method Detail
writeKeyValue
protected void writeKeyValue(String key, String value) throws IOException
- Specified by:
writeKeyValue
in class AbstractCommonCrawlFormat
- Throws:
IOException
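writeKeyValue receives the value as a raw String, and this page does not say whether CommonCrawlFormatSimple escapes it. Any StringBuilder-based JSON writer has to escape quotation marks, backslashes and control characters (RFC 8259, section 7), or crawled text containing them will produce invalid JSON. A minimal sketch of such a helper follows; the class JsonQuote and its quote method are hypothetical and not part of Nutch.

```java
// HYPOTHETICAL helper (not part of Nutch): renders a Java String as a JSON string
// literal, escaping the characters RFC 8259 requires.
final class JsonQuote {
    private JsonQuote() {
    }

    static String quote(String value) {
        if (value == null) {
            return "null"; // a missing value becomes the JSON null literal
        }
        StringBuilder sb = new StringBuilder(value.length() + 2);
        sb.append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                case '\t':
                    sb.append("\\t");
                    break;
                default:
                    if (c < 0x20) {
                        // remaining control characters (below U+0020) need a 4-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Returning the JSON null literal for a Java null mirrors the purpose of writeKeyNull; a writer could equally route null values to that hook instead.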
writeKeyNull
protected void writeKeyNull(String key) throws IOException
- Specified by:
writeKeyNull
in class AbstractCommonCrawlFormat
- Throws:
IOException
startArray
protected void startArray(String key, boolean nested, boolean newline) throws IOException
- Specified by:
startArray
in class AbstractCommonCrawlFormat
- Throws:
IOException
closeArray
protected void closeArray(String key, boolean nested, boolean newline) throws IOException
- Specified by:
closeArray
in class AbstractCommonCrawlFormat
- Throws:
IOException
writeArrayValue
protected void writeArrayValue(String value)
- Specified by:
writeArrayValue
in class AbstractCommonCrawlFormat
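The nested and newline parameters of startArray and closeArray are not described on this page. The sketch below assumes that a null key means the array has no key, that nested marks an array which is itself an element of an enclosing array, and that newline adds a line break after the opening and before the closing bracket; ArraySketch is a hypothetical class, not Nutch's implementation. It shows one common StringBuilder technique for these three hooks: write a comma after every element and delete the dangling one when the array closes.

```java
// HYPOTHETICAL sketch (not Nutch's code). ASSUMPTIONS: key == null means the array
// has no key; nested == true means the array is an element of an enclosing array;
// newline == true adds a line break after '[' and before ']'.
class ArraySketch {
    private final StringBuilder sb = new StringBuilder();

    void startArray(String key, boolean nested, boolean newline) {
        if (key != null) {
            sb.append('"').append(key).append("\":");
        }
        sb.append('[');
        if (newline) {
            sb.append('\n');
        }
    }

    // Every element is followed by a comma; closeArray removes the last one.
    void writeArrayValue(String value) {
        sb.append('"').append(value).append("\",");
    }

    void closeArray(String key, boolean nested, boolean newline) {
        int last = sb.length() - 1;
        if (last >= 0 && sb.charAt(last) == ',') {
            sb.deleteCharAt(last); // drop the dangling comma after the final element
        }
        if (newline) {
            sb.append('\n');
        }
        sb.append(']');
        if (nested) {
            sb.append(','); // a nested array is itself an element of its parent
        }
    }

    String generateJson() {
        return sb.toString();
    }
}
```

Compared with tracking a "first element" flag, trimming the dangling comma keeps writeArrayValue free of state, at the cost of inspecting the buffer each time an array closes.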
startObject
protected void startObject(String key) throws IOException
- Specified by:
startObject
in class AbstractCommonCrawlFormat
- Throws:
IOException
closeObject
protected void closeObject(String key) throws IOException
- Specified by:
closeObject
in class AbstractCommonCrawlFormat
- Throws:
IOException
generateJson
protected String generateJson() throws IOException
- Specified by:
generateJson
in class AbstractCommonCrawlFormat
- Throws:
IOException