Class TikaUtils
java.lang.Object
    org.onehippo.forge.content.exim.core.util.TikaUtils
public final class TikaUtils extends Object
Apache Tika utilities.
Method Summary
Modifier and Type, Method, Description

static String parsePdfToString(File pdfFile)
    Parses the given document and returns the extracted text content.
static String parsePdfToString(InputStream pdfStream)
    Parses the given document and returns the extracted text content.
static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata)
    Parses the given document and returns the extracted text content.
static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata, int maxLength)
    Parses the given document and returns the extracted text content.
static String parsePdfToString(URL pdfURL)
    Parses the given document and returns the extracted text content.
Method Detail
parsePdfToString
public static String parsePdfToString(InputStream pdfStream) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
    pdfStream - PDF input stream
Returns:
    extracted text content
Throws:
    IOException - if an I/O exception occurs
    org.apache.tika.exception.TikaException - if a Tika exception occurs
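The description above does not say whether the method closes the stream it is given, so a cautious caller can keep ownership of the stream. The following is a minimal sketch of that calling pattern, not the library's implementation. `StreamParseSketch`, `PdfTextParser`, and `extract` are hypothetical names introduced for illustration. The functional interface stands in for the stream overload so that the example compiles without the library on the classpath; in a real project, `TikaUtils::parsePdfToString` would be expected to fit in its place.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamParseSketch {

    /**
     * Hypothetical stand-in for parsePdfToString(InputStream).
     * Declared with "throws Exception" so that a method throwing both
     * IOException and TikaException could be plugged in.
     */
    @FunctionalInterface
    interface PdfTextParser {
        String parse(InputStream pdfStream) throws Exception;
    }

    /**
     * Runs the parser and closes the stream afterwards, whether or not
     * the parser closed it already (closing twice is harmless for
     * standard streams).
     */
    static String extract(InputStream source, PdfTextParser parser) throws Exception {
        try (InputStream in = source) {
            return parser.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub parser used only for this demo: it decodes the bytes as UTF-8.
        // It does NOT parse PDF; replace it with the real method reference.
        PdfTextParser stub = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);

        InputStream source =
                new ByteArrayInputStream("sample text".getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(source, stub));
    }
}
```

Keeping the try-with-resources block on the caller side means the stream is released even when parsing fails with either of the documented exceptions.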
parsePdfToString
public static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
    pdfStream - PDF input stream
    metadata - document metadata
Returns:
    extracted text content
Throws:
    IOException - if an I/O exception occurs
    org.apache.tika.exception.TikaException - if a Tika exception occurs
parsePdfToString
public static String parsePdfToString(InputStream pdfStream, org.apache.tika.metadata.Metadata metadata, int maxLength) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
    pdfStream - PDF input stream
    metadata - document metadata
    maxLength - maximum length of the returned string
Returns:
    extracted text content
Throws:
    IOException - if an I/O exception occurs
    org.apache.tika.exception.TikaException - if a Tika exception occurs
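Because `maxLength` caps the length of the returned string, a caller may want to know whether the text it received was possibly cut short. The sketch below shows one caller-side heuristic. It assumes, which the documentation above does not confirm, that overlong content is silently truncated rather than reported through an exception. `TruncationCheckSketch` and `mayBeTruncated` are hypothetical names, and the sample string is a placeholder for a real return value.

```java
final class TruncationCheckSketch {

    /**
     * Heuristic: if the returned text has reached the cap, some content
     * may have been dropped. A document whose text is exactly maxLength
     * characters long triggers a false positive, so treat the result as
     * a hint only. Negative caps are treated as "no cap" here, which is
     * an assumption of this sketch.
     */
    static boolean mayBeTruncated(String extractedText, int maxLength) {
        return maxLength >= 0 && extractedText.length() >= maxLength;
    }

    public static void main(String[] args) {
        int maxLength = 10;
        // Placeholder for the value returned by
        // parsePdfToString(pdfStream, metadata, maxLength).
        String extractedText = "0123456789";

        if (mayBeTruncated(extractedText, maxLength)) {
            System.out.println("Text reached the cap; content may be incomplete.");
        }
    }
}
```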
parsePdfToString
public static String parsePdfToString(File pdfFile) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
    pdfFile - PDF file
Returns:
    extracted text content
Throws:
    IOException - if an I/O exception occurs
    org.apache.tika.exception.TikaException - if a Tika exception occurs
parsePdfToString
public static String parsePdfToString(URL pdfURL) throws IOException, org.apache.tika.exception.TikaException
Parses the given document and returns the extracted text content.
Parameters:
    pdfURL - PDF resource URL
Returns:
    extracted text content
Throws:
    IOException - if an I/O exception occurs
    org.apache.tika.exception.TikaException - if a Tika exception occurs
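The overloads accept the same document as a `File`, a `URL`, or an `InputStream`. As a rough illustration of how a caller might obtain each argument form from a single `java.nio.file.Path`, the sketch below uses only standard library calls. `PdfInputFormsSketch` and its helper methods are hypothetical names, and the calls to `parsePdfToString` appear only in comments because the library is not assumed to be on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfInputFormsSketch {

    /** Argument form for parsePdfToString(File pdfFile). */
    static File asFile(Path pdf) {
        return pdf.toFile();
    }

    /** Argument form for parsePdfToString(URL pdfURL); yields a file: URL. */
    static URL asUrl(Path pdf) throws MalformedURLException {
        return pdf.toUri().toURL();
    }

    /** Argument form for the InputStream overloads; the caller closes it. */
    static InputStream asStream(Path pdf) throws IOException {
        return Files.newInputStream(pdf);
    }

    public static void main(String[] args) throws IOException {
        // An empty temporary file stands in for a real PDF document.
        Path pdf = Files.createTempFile("sample", ".pdf");
        try {
            System.out.println(asFile(pdf)); // e.g. TikaUtils.parsePdfToString(asFile(pdf))
            System.out.println(asUrl(pdf));  // e.g. TikaUtils.parsePdfToString(asUrl(pdf))
            try (InputStream in = asStream(pdf)) {
                // e.g. TikaUtils.parsePdfToString(in)
                System.out.println(in.available());
            }
        } finally {
            Files.deleteIfExists(pdf);
        }
    }
}
```

Which form to prefer depends on where the document lives: the `File` and `URL` forms leave resource handling to the method, while the stream forms suit content that is already open or held in memory.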