Interface CommonCrawlFormat

    • Method Detail

      • getJsonData

        String getJsonData()
                    throws IOException
        Returns a string representation of the JSON structure of the URL content.
        Returns:
        the JSON URL content string
        Throws:
        IOException - if there is a fatal I/O error obtaining JSON data
      • getJsonData

        String getJsonData(String url,
                           Content content,
                           Metadata metadata)
                    throws IOException
        Returns a string representation of the JSON structure of the URL content, taking both the Content and the Metadata into account.
        Parameters:
        url - the canonical URL
        content - the Content of the URL
        metadata - the Metadata of the URL
        Returns:
        the JSON URL content string
        Throws:
        IOException - if there is a fatal I/O error obtaining JSON data
      • setInLinks

        void setInLinks(List<String> inLinks)
        Sets the inlinks of this document.
        Parameters:
        inLinks - the list of inlinks
      • getInLinks

        List<String> getInLinks()
        Gets the inlinks of this document.
        Returns:
        the list of inlinks of this document
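
        The following is a minimal sketch of how the inlink accessors might be used together with getJsonData(). The InLinkFormat interface and the SimpleFormat class are hypothetical stand-ins that mirror only the method signatures documented here; the JSON layout, the defensive copy in setInLinks, and the example URLs are assumptions for illustration, not the behavior of any actual implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InLinksSketch {

    /** Hypothetical stand-in mirroring three of the documented methods. */
    interface InLinkFormat {
        String getJsonData() throws IOException;
        void setInLinks(List<String> inLinks);
        List<String> getInLinks();
    }

    /** Toy implementation; the JSON layout below is an assumption. */
    static class SimpleFormat implements InLinkFormat {
        private final String url;
        private List<String> inLinks = new ArrayList<>();

        SimpleFormat(String url) {
            this.url = url;
        }

        @Override
        public void setInLinks(List<String> inLinks) {
            // Copy the list so later changes by the caller do not leak in (assumed design choice).
            this.inLinks = new ArrayList<>(inLinks);
        }

        @Override
        public List<String> getInLinks() {
            return inLinks;
        }

        @Override
        public String getJsonData() throws IOException {
            StringBuilder sb = new StringBuilder();
            sb.append("{\"url\":").append(quote(url)).append(",\"inlinks\":[");
            for (int i = 0; i < inLinks.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(quote(inLinks.get(i)));
            }
            return sb.append("]}").toString();
        }

        /** Wraps a string in double quotes, escaping quotes, backslashes and control characters. */
        private static String quote(String s) {
            StringBuilder out = new StringBuilder("\"");
            for (char c : s.toCharArray()) {
                if (c == '"' || c == '\\') {
                    out.append('\\').append(c);
                } else if (c < 0x20) {
                    out.append(String.format("\\u%04x", (int) c));
                } else {
                    out.append(c);
                }
            }
            return out.append('"').toString();
        }
    }

    /** Sets two inlinks on a toy document and returns its JSON string. */
    static String demo() throws IOException {
        SimpleFormat format = new SimpleFormat("http://example.com/");
        format.setInLinks(Arrays.asList("http://a.example/", "http://b.example/"));
        return format.getJsonData();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

        In this sketch the inlinks set on the object are read back unchanged by getInLinks() and also appear in the JSON string; whether a real format includes inlinks in its output depends on the implementation.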
      • close

        void close()
        Optional method; implement it if the concrete format needs to perform a close procedure.
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface Closeable
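
        Because close() is specified by Closeable (and therefore AutoCloseable), an implementation can be managed with a try-with-resources statement. The sketch below illustrates this under stated assumptions: ClosableFormat and TrackingFormat are hypothetical stand-ins, not the actual interface or any real implementation. As in the signature documented above, close() declares no checked exception, so no catch block is needed.

```java
import java.io.Closeable;

class CloseSketch {

    /** Hypothetical stand-in: close() is redeclared without a throws clause. */
    interface ClosableFormat extends Closeable {
        @Override
        void close();
    }

    /** Toy format that records whether its close procedure ran. */
    static class TrackingFormat implements ClosableFormat {
        boolean closed = false;

        @Override
        public void close() {
            // A real format might flush buffers or release streams here.
            closed = true;
        }
    }

    /** Opens a toy format in try-with-resources and reports whether close() was called. */
    static boolean demo() {
        TrackingFormat format = new TrackingFormat();
        try (TrackingFormat f = format) {
            // Work with f here; close() is invoked automatically when the block exits.
        }
        return format.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed = " + demo());
    }
}
```

        A format with nothing to clean up can leave close() empty, which matches the optional nature of the method described above.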