Class ProtocolURLNormalizer
- java.lang.Object
  - org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
- All Implemented Interfaces: Configurable, URLNormalizer
public class ProtocolURLNormalizer extends Object implements URLNormalizer
URL normalizer that normalizes the protocol for all URLs of a given host or domain. For example, it rewrites http://nutch.apache.org/path/ to https://nutch.apache.org/path/ if it is known that the host nutch.apache.org supports https and that http URLs either cause duplicate content or are redirected to https. See org.apache.nutch.net.urlnormalizer.protocol for details and configuration.
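The idea described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: the class name ProtocolNormalizationSketch, the hard-coded PROTOCOL_BY_HOST map, and the single-argument normalize method are assumptions made for illustration only. The real plugin reads its rules from configuration, as described in the package documentation, and its normalize method also takes a scope argument.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

public class ProtocolNormalizationSketch {

    // Hypothetical host-to-protocol rules, hard-coded for this sketch only.
    private static final Map<String, String> PROTOCOL_BY_HOST =
            Map.of("nutch.apache.org", "https");

    // Rewrites only the scheme of the URL when a rule exists for its host.
    static String normalize(String url) throws MalformedURLException {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new MalformedURLException(e.getMessage());
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return url; // nothing to normalize
        }
        String target = PROTOCOL_BY_HOST.get(host.toLowerCase(Locale.ROOT));
        if (target == null || target.equalsIgnoreCase(scheme)) {
            return url; // no rule for this host, or protocol already matches
        }
        // Swap the scheme and keep the rest of the URL verbatim.
        return target + url.substring(scheme.length());
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(normalize("http://nutch.apache.org/path/"));
        System.out.println(normalize("http://example.org/path/"));
    }
}
```

In this sketch, URLs of hosts without a rule are returned unchanged, and an unparsable URL surfaces as a MalformedURLException, mirroring the exception declared by the normalize method of this class.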
Field Summary
Fields inherited from interface org.apache.nutch.net.URLNormalizer
X_POINT_ID
Constructor Summary
Constructor                Description
ProtocolURLNormalizer()
Method Summary
Modifier and Type    Method                                 Description
Configuration        getConf()
String               normalize(String url, String scope)
void                 setConf(Configuration conf)
Method Detail
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
normalize
public String normalize(String url, String scope) throws MalformedURLException
- Specified by: normalize in interface URLNormalizer
- Throws: MalformedURLException