java.lang.Object
org.onehippo.forge.content.exim.core.util.TikaUtils

public final class TikaUtils extends Object
Utilities for extracting text content from PDF documents with Apache Tika.
  • Method Details

    • parsePdfToString

      public static String parsePdfToString(InputStream pdfStream) throws IOException, org.apache.tika.exception.TikaException
      Parses the PDF document read from the given input stream and returns the extracted text content.
      Parameters:
      pdfStream - PDF input stream
      Returns:
      extracted text content
      Throws:
      IOException - if an I/O error occurs while reading the document
      org.apache.tika.exception.TikaException - if Tika fails to parse the document
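
      The following is a minimal sketch of calling the stream overload while keeping the stream lifecycle on the caller's side. This page does not say whether parsePdfToString closes the stream, so the sketch closes it with try-with-resources. PdfStreamExample, PdfTextParser and extractText are hypothetical names introduced only for illustration; PdfTextParser stands in for TikaUtils.parsePdfToString(InputStream) so that the block compiles without the library on the classpath.

      ```java
      import java.io.InputStream;
      import java.nio.file.Files;
      import java.nio.file.Path;

      public class PdfStreamExample {

          /**
           * Hypothetical stand-in for TikaUtils.parsePdfToString(InputStream).
           * It declares "throws Exception" so that a method which throws
           * IOException and TikaException can be plugged in.
           */
          @FunctionalInterface
          public interface PdfTextParser {
              String parse(InputStream pdfStream) throws Exception;
          }

          /**
           * Opens the file, hands the stream to the parser, and closes the
           * stream afterwards, whether parsing succeeds or fails.
           */
          public static String extractText(Path pdfPath, PdfTextParser parser) throws Exception {
              try (InputStream in = Files.newInputStream(pdfPath)) {
                  return parser.parse(in);
              }
          }
      }
      ```

      With the library available, the call would presumably look like PdfStreamExample.extractText(path, TikaUtils::parsePdfToString), where the method reference is expected to resolve to the InputStream overload.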
    • parsePdfToString

      public static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata) throws IOException, org.apache.tika.exception.TikaException
      Parses the PDF document read from the given input stream, with the given document metadata, and returns the extracted text content.
      Parameters:
      pdfStream - PDF input stream
      metadata - document metadata
      Returns:
      extracted text content
      Throws:
      IOException - if an I/O error occurs while reading the document
      org.apache.tika.exception.TikaException - if Tika fails to parse the document
    • parsePdfToString

      public static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata, int maxLength) throws IOException, org.apache.tika.exception.TikaException
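      Parses the PDF document read from the given input stream, with the given document metadata, and returns the extracted text content, limited to at most maxLength characters.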
      Parameters:
      pdfStream - PDF input stream
      metadata - document metadata
      maxLength - maximum length of the returned string
      Returns:
      extracted text content
      Throws:
      IOException - if an I/O error occurs while reading the document
      org.apache.tika.exception.TikaException - if Tika fails to parse the document
    • parsePdfToString

      public static String parsePdfToString(File pdfFile) throws IOException, org.apache.tika.exception.TikaException
      Parses the given PDF file and returns the extracted text content.
      Parameters:
      pdfFile - PDF file
      Returns:
      extracted text content
      Throws:
      IOException - if an I/O error occurs while reading the file
      org.apache.tika.exception.TikaException - if Tika fails to parse the document
    • parsePdfToString

      public static String parsePdfToString(URL pdfURL) throws IOException, org.apache.tika.exception.TikaException
      Parses the PDF document at the given URL and returns the extracted text content.
      Parameters:
      pdfURL - PDF resource URL
      Returns:
      extracted text content
      Throws:
      IOException - if an I/O error occurs while reading the resource
      org.apache.tika.exception.TikaException - if Tika fails to parse the document
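
      The File and URL overloads take different source types for what may be the same document. As a rough sketch, a java.nio.file.Path can be converted to either argument type using only JDK calls. PdfSourceExample, asFile and asUrl are hypothetical names introduced for illustration and are not part of TikaUtils.

      ```java
      import java.io.File;
      import java.net.MalformedURLException;
      import java.net.URL;
      import java.nio.file.Path;

      public class PdfSourceExample {

          /** Returns a File suitable for an overload that takes a File argument. */
          public static File asFile(Path pdfPath) {
              return pdfPath.toFile();
          }

          /** Returns a file: URL suitable for an overload that takes a URL argument. */
          public static URL asUrl(Path pdfPath) throws MalformedURLException {
              // toUri() resolves the path to an absolute file: URI first.
              return pdfPath.toUri().toURL();
          }
      }
      ```

      Assuming the library is on the classpath, the results could then be passed to TikaUtils.parsePdfToString(File) or TikaUtils.parsePdfToString(URL) respectively.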