Class ProtocolURLNormalizer

  • All Implemented Interfaces:
    Configurable, URLNormalizer

    public class ProtocolURLNormalizer
    extends Object
    implements URLNormalizer
    URL normalizer to normalize the protocol for all URLs of a given host or domain, e.g. normalize http://nutch.apache.org/path/ to https://www.apache.org/path/ if it's known that the host nutch.apache.org supports https and http-URLs either cause duplicate content or are redirected to https. See org.apache.nutch.net.urlnormalizer.protocol for details and configuration.