Class TikaUtils


  • public final class TikaUtils
    extends Object
    Apache Tika utilities.
    • Method Detail

      • parsePdfToString

        public static String parsePdfToString(InputStream pdfStream)
                                       throws IOException,
                                              org.apache.tika.exception.TikaException
        Parses the given document and returns the extracted text content.
        Parameters:
        pdfStream - PDF input stream
        Returns:
        extracted text content
        Throws:
        IOException - if an I/O error occurs while reading the stream
        org.apache.tika.exception.TikaException - if Tika fails to parse the document
      • parsePdfToString

        public static String parsePdfToString(InputStream pdfStream,
                                              org.apache.tika.metadata.Metadata metadata)
                                       throws IOException,
                                              org.apache.tika.exception.TikaException
        Parses the given document and returns the extracted text content.
        Parameters:
        pdfStream - PDF input stream
        metadata - document metadata
        Returns:
        extracted text content
        Throws:
        IOException - if an I/O error occurs while reading the stream
        org.apache.tika.exception.TikaException - if Tika fails to parse the document
      • parsePdfToString

        public static String parsePdfToString(InputStream pdfStream,
                                              org.apache.tika.metadata.Metadata metadata,
                                              int maxLength)
                                       throws IOException,
                                              org.apache.tika.exception.TikaException
        Parses the given document and returns the extracted text content, limited to at most maxLength characters.
        Parameters:
        pdfStream - PDF input stream
        metadata - document metadata
        maxLength - maximum length of the returned string
        Returns:
        extracted text content
        Throws:
        IOException - if an I/O error occurs while reading the stream
        org.apache.tika.exception.TikaException - if Tika fails to parse the document
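The maxLength contract described above can be illustrated without Tika on the classpath. The following is a minimal sketch, not the implementation of this class: LengthCapSketch and readCapped are hypothetical names, and a plain UTF-8 text stream stands in for a PDF. It only shows a reader that stops once maxLength characters have been collected, so the returned string is never longer than the limit. Whether TikaUtils truncates silently or reports that the limit was reached is not stated in this documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not part of TikaUtils.
final class LengthCapSketch {

    private LengthCapSketch() {
    }

    // Reads UTF-8 text from the stream and returns at most maxLength chars.
    static String readCapped(InputStream in, int maxLength) throws IOException {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        StringBuilder out = new StringBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] buffer = new char[1024];
        while (out.length() < maxLength) {
            // Never request more characters than the cap still allows.
            int wanted = Math.min(buffer.length, maxLength - out.length());
            int read = reader.read(buffer, 0, wanted);
            if (read < 0) {
                break; // end of stream before the cap was reached
            }
            out.append(buffer, 0, read);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "extracted text content".getBytes(StandardCharsets.UTF_8);
        // The caller opens the stream and closes it via try-with-resources.
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(readCapped(in, 9));
        }
    }
}
```

A caller of the three-argument overload would follow the same shape: open the stream in a try-with-resources block, pass the limit, and treat the result as possibly truncated.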
      • parsePdfToString

        public static String parsePdfToString(File pdfFile)
                                       throws IOException,
                                              org.apache.tika.exception.TikaException
        Parses the given PDF file and returns the extracted text content.
        Parameters:
        pdfFile - PDF file
        Returns:
        extracted text content
        Throws:
        IOException - if an I/O error occurs while reading the file
        org.apache.tika.exception.TikaException - if Tika fails to parse the document
      • parsePdfToString

        public static String parsePdfToString(URL pdfURL)
                                       throws IOException,
                                              org.apache.tika.exception.TikaException
        Parses the PDF resource at the given URL and returns the extracted text content.
        Parameters:
        pdfURL - PDF resource URL
        Returns:
        extracted text content
        Throws:
        IOException - if an I/O error occurs while reading the resource
        org.apache.tika.exception.TikaException - if Tika fails to parse the document
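The File and URL overloads accept sources that can also be opened as an InputStream with the standard library. The sketch below shows that pattern, assuming a stream-based parser is available. SourceOverloadsSketch, StreamParser, parseFile and parseUrl are hypothetical names introduced for illustration, the stand-in parser simply decodes UTF-8 text, and this documentation does not say how TikaUtils implements these overloads internally. The real methods also declare TikaException, which the stand-in omits to stay free of dependencies.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for illustration only; not part of TikaUtils.
final class SourceOverloadsSketch {

    // Stands in for a stream-based parser such as parsePdfToString(InputStream).
    interface StreamParser {
        String parse(InputStream in) throws IOException;
    }

    private SourceOverloadsSketch() {
    }

    // Opens the file as a stream, parses it, and always closes the stream.
    static String parseFile(File file, StreamParser parser) throws IOException {
        try (InputStream in = Files.newInputStream(file.toPath())) {
            return parser.parse(in);
        }
    }

    // Opens the URL as a stream, parses it, and always closes the stream.
    static String parseUrl(URL url, StreamParser parser) throws IOException {
        try (InputStream in = url.openStream()) {
            return parser.parse(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in parser: decodes the whole stream as UTF-8 text (Java 9+).
        StreamParser asText = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        File tmp = File.createTempFile("sketch", ".txt");
        try {
            Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(parseFile(tmp, asText));
            System.out.println(parseUrl(tmp.toURI().toURL(), asText));
        } finally {
            Files.delete(tmp.toPath());
        }
    }
}
```

Callers who already hold a File or URL can use the corresponding TikaUtils overload directly; the stream overloads remain the option for any other source.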