Class Http
- java.lang.Object
  - org.apache.nutch.protocol.http.api.HttpBase
    - org.apache.nutch.protocol.httpclient.Http
- All Implemented Interfaces:
  Configurable, Pluggable, Protocol
public class Http extends HttpBase
This class is a protocol plugin that configures an HTTP client for the Basic, Digest and NTLM authentication schemes, for both web servers and proxy servers. It also handles the HTTPS protocol and keeps cookies within a single fetch session.
Documentation can be found on the Nutch HttpAuthenticationSchemes wiki page.
The original description of the motivation for supporting HttpPostAuthentication is also available on the Nutch wiki. The development of HttpPostAuthentication is documented in the NUTCH-827 Jira issue.
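As an illustration of the simplest scheme named above, the following sketch builds a Basic `Authorization` header value using only the JDK. It is not taken from this plugin, which according to the description configures an HTTP client to handle authentication. The class `BasicAuthSketch` and its method `headerValue` are hypothetical names introduced here. It assumes the RFC 7617 format: user name and password joined by a colon, then Base64-encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only; not part of Nutch.
final class BasicAuthSketch {

    // Builds the value of an "Authorization" header for the Basic scheme.
    static String headerValue(String user, String password) {
        // RFC 7617 does not allow a colon in the user name, because the
        // colon separates the user name from the password.
        if (user.indexOf(':') >= 0) {
            throw new IllegalArgumentException("user name must not contain ':'");
        }
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // prints "Basic dXNlcjpwYXNz"
        System.out.println(headerValue("user", "pass"));
    }
}
```

Digest and NTLM involve challenge and response exchanges with the server, so they are usually left to an HTTP client library instead of being built by hand.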
- Author:
  Susam Pal
-
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected static org.slf4j.Logger | LOG | |
-
Fields inherited from class org.apache.nutch.protocol.http.api.HttpBase
accept, acceptCharset, acceptLanguage, BUFFER_SIZE, COOKIE, enableCookieHeader, enableIfModifiedsinceHeader, maxContent, maxCrawlDelay, maxDuration, partialAsTruncated, proxyException, proxyHost, proxyPort, proxyType, RESPONSE_TIME, responseTime, storeHttpHeaders, storeHttpRequest, storeIPAddress, timeout, tlsCheckCertificate, tlsPreferredCipherSuites, tlsPreferredProtocols, useHttp11, useHttp2, useProxy, userAgent
-
Fields inherited from interface org.apache.nutch.protocol.Protocol
X_POINT_ID
-
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| Http() | Constructs this plugin. |
-
Method Summary
Methods
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected Response | getResponse(URL url, CrawlDatum datum, boolean redirect) | Fetches the url with a configured HTTP client and gets the response. |
| static void | main(String[] args) | Main method. |
| void | setConf(Configuration conf) | Reads the configuration from the Nutch configuration files and sets the configuration. |
-
Methods inherited from class org.apache.nutch.protocol.http.api.HttpBase
getAccept, getAcceptCharset, getAcceptLanguage, getConf, getCookie, getMaxContent, getMaxDuration, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getTlsPreferredCipherSuites, getTlsPreferredProtocols, getUseHttp11, getUserAgent, isCookieEnabled, isIfModifiedSinceEnabled, isStoreHttpHeaders, isStoreHttpRequest, isStoreIPAddress, isStorePartialAsTruncated, isTlsCheckCertificates, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy, useProxy, useProxy
-
Method Detail
-
setConf
public void setConf(Configuration conf)
Reads the configuration from the Nutch configuration files and sets the configuration.
- Specified by:
  setConf in interface Configurable
- Overrides:
  setConf in class HttpBase
- Parameters:
  conf - Configuration
-
main
public static void main(String[] args) throws Exception
Main method.
- Parameters:
  args - Command line arguments
- Throws:
  Exception - if a fatal error is encountered whilst running the program
-
getResponse
protected Response getResponse(URL url, CrawlDatum datum, boolean redirect) throws ProtocolException, IOException
Fetches the url with a configured HTTP client and gets the response.
- Specified by:
  getResponse in class HttpBase
- Parameters:
  url - URL to be fetched
  datum - Crawl data
  redirect - Follow redirects if and only if true
- Returns:
  HTTP response
- Throws:
  ProtocolException
  IOException
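The contract of the `redirect` parameter (follow redirects if and only if true) can be pictured with the per-connection switch in the JDK's `HttpURLConnection`. This is only an analogy built on the standard library, not the code this plugin runs. The class `RedirectFlagSketch` and its method `prepare` are hypothetical names. No request is sent; the connection object is only configured.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, for illustration only; not part of Nutch.
final class RedirectFlagSketch {

    // Creates an unconnected HTTP connection whose redirect handling
    // mirrors the flag: redirects are followed if and only if it is true.
    static HttpURLConnection prepare(String url, boolean redirect) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(redirect);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://example.com/", false);
        // prints "false"
        System.out.println(conn.getInstanceFollowRedirects());
    }
}
```

With the flag off, a caller would receive the 3xx response itself and could decide how to handle the redirect target.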