Class TikaUtils
java.lang.Object
org.onehippo.forge.content.exim.core.util.TikaUtils
Apache Tika utilities.
-
Method Summary
Modifier and Type | Method | Description
static String | parsePdfToString(File pdfFile) | Parses the given document and returns the extracted text content.
static String | parsePdfToString(InputStream pdfStream) | Parses the given document and returns the extracted text content.
static String | parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata) | Parses the given document and returns the extracted text content.
static String | parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata, int maxLength) | Parses the given document and returns the extracted text content.
static String | parsePdfToString(URL pdfURL) | Parses the given document and returns the extracted text content.
-
Method Details
-
parsePdfToString
public static String parsePdfToString(InputStream pdfStream) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
pdfStream - PDF input stream
Returns:
extracted text content
Throws:
IOException - if IO exception occurs
org.apache.tika.exception.TikaException - if Tika exception occurs
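The stream variant declares two checked exceptions, so a caller has to decide who closes the stream and what happens on failure. The following is a minimal sketch of one way to do that, not part of this class. `TextExtractor`, `PdfTextExample`, and `extractOrDefault` are hypothetical names introduced here so the snippet compiles without Tika on the classpath; it also assumes the caller is responsible for closing the stream, which the documentation above does not state.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a call such as TikaUtils.parsePdfToString(InputStream). */
interface TextExtractor {
    String extract(InputStream in) throws Exception;
}

final class PdfTextExample {

    /**
     * Runs the extractor on the stream, always closes the stream,
     * and returns the fallback if extraction fails or yields null.
     */
    static String extractOrDefault(TextExtractor extractor, InputStream source, String fallback) {
        try (InputStream in = source) {
            String text = extractor.extract(in);
            return text != null ? text.trim() : fallback;
        } catch (Exception e) {
            // Covers both the I/O and the parser failure cases in this sketch.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Fake extractor used only for demonstration: decodes the bytes as UTF-8.
        TextExtractor fake = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream("  hello pdf \n".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractOrDefault(fake, in, ""));
    }
}
```

With the library on the classpath, a method reference such as `TikaUtils::parsePdfToString` should be usable where the fake extractor appears, since its declared exceptions are subtypes of `Exception`.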
-
parsePdfToString
public static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
pdfStream - PDF input stream
metadata - document metadata
Returns:
extracted text content
Throws:
IOException - if IO exception occurs
org.apache.tika.exception.TikaException - if Tika exception occurs
-
parsePdfToString
public static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata, int maxLength) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
pdfStream - PDF input stream
metadata - document metadata
maxLength - maximum length of the returned string
Returns:
extracted text content
Throws:
IOException - if IO exception occurs
org.apache.tika.exception.TikaException - if Tika exception occurs
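The description of `maxLength` does not say whether longer text is truncated or reported as an error. As an illustration only, the plain-Java sketch below shows what a simple truncating cap looks like; it does not call Tika and is not taken from this class. `TextCap` and `cap` are hypothetical names, and treating a negative value as "no cap" is a convention chosen for this sketch.

```java
final class TextCap {

    /**
     * Returns at most maxLength chars of text.
     * Assumption for this sketch: a negative maxLength means "no cap".
     */
    static String cap(String text, int maxLength) {
        if (maxLength < 0 || text.length() <= maxLength) {
            return text;
        }
        int end = maxLength;
        // Avoid cutting a surrogate pair in half at the boundary.
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }
}
```

A caller that needs a hard limit regardless of the library's behavior could apply such a cap to the returned string.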
-
parsePdfToString
public static String parsePdfToString(File pdfFile) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
pdfFile - PDF file
Returns:
extracted text content
Throws:
IOException - if IO exception occurs
org.apache.tika.exception.TikaException - if Tika exception occurs
-
parsePdfToString
public static String parsePdfToString(URL pdfURL) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
pdfURL - PDF resource URL
Returns:
extracted text content
Throws:
IOException - if IO exception occurs
org.apache.tika.exception.TikaException - if Tika exception occurs
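The `File` and `URL` variants take a location instead of an open stream. The documentation above does not describe how they are implemented; assuming they act as conveniences over the stream variant, the caller-side equivalent might look like the sketch below. `StreamTextExtractor`, `PdfLocationExample`, `fromFile`, and `fromUrl` are hypothetical names used so the snippet compiles without Tika.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;

/** Hypothetical stand-in for a stream-based extractor such as parsePdfToString(InputStream). */
interface StreamTextExtractor {
    String extract(InputStream in) throws Exception;
}

final class PdfLocationExample {

    /** Opens the file, hands the stream to the extractor, and closes it afterwards. */
    static String fromFile(StreamTextExtractor extractor, File pdfFile) throws Exception {
        try (InputStream in = Files.newInputStream(pdfFile.toPath())) {
            return extractor.extract(in);
        }
    }

    /** Opens the URL, hands the stream to the extractor, and closes it afterwards. */
    static String fromUrl(StreamTextExtractor extractor, URL pdfURL) throws Exception {
        try (InputStream in = pdfURL.openStream()) {
            return extractor.extract(in);
        }
    }
}
```

Both helpers use try-with-resources, so the stream is closed whether extraction succeeds or throws.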
-