Extracts specific content from a PDF document and returns it as text.
Common use cases
Data Manipulation Wrkflows
Application
PDF
Inputs (what you have)
NAME | DESCRIPTION | TYPE | REQUIRED | EXAMPLE |
File name | The name of the document | Text | Yes | File-1 |
Type of object(s) to extract | The type of object(s) to extract | Predefined Choice List | Yes | Page 1 |
Number of object(s) to extract | Specify the number of objects to be extracted | Integer | Yes | 1 cell, 2 rows |
Search location keyword | The keyword to be used to locate the content to be extracted | Text | No | Data |
Page number | The document page where the content to be extracted is located | Integer | No | 1 |
Section heading name | The section where the content to be extracted is located | Text | No | Heading 1 |
Paragraph number | The paragraph where the content to be extracted is located | Integer | No | 5 |
Line number | The paragraph line where the content to be extracted is located | Integer | No | 3 |
Cell row | Row to extract, if the object is a table | Integer | No | 5 |
Cell column | Column to extract, if the object is a table | Integer | No | 2 |
Note: The value of inputs can either be a set value in the configuration of the Wrk Action within the Wrkflow, or a variable from the Data Library. These variables in the Data Library are the outputs of previous Wrk Actions in the Wrkflow.
How it works
Extraction always begins at the start of the object being extracted. The object is located using the first instance of the keyword relative to the page number, paragraph number, or section heading provided.
Note: A paragraph is one or more sentences beginning on a new line.
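The paragraph definition and the first-instance keyword rule can be illustrated with a minimal Python sketch. It is not the Wrk Action's actual implementation: the function names are hypothetical, the page text is assumed to be already extracted as a plain string, and the case-insensitive keyword match is an assumption.

```python
from typing import List, Optional


def split_paragraphs(page_text: str) -> List[str]:
    """Split text into paragraphs, assuming each paragraph starts on a new line."""
    # Blank lines are skipped so they do not count as empty paragraphs.
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def first_paragraph_with_keyword(page_text: str, keyword: str) -> Optional[str]:
    """Return the first paragraph containing the keyword, or None if there is none."""
    needle = keyword.lower()  # assumption: matching ignores case
    for paragraph in split_paragraphs(page_text):
        if needle in paragraph.lower():
            return paragraph
    return None
```

With the text "Intro line.\n\nData is here. More.\nData again." and the keyword "Data", this sketch returns the second paragraph, because only the first instance of the keyword is used.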
Text to extract | Optional inputs to configure |
Pages | Only page number |
Paragraphs | Provide at least two of the following: Page number, paragraph number, section heading, search keyword |
Table cells | Cell row, cell column, or both |
Bulleted lists | Provide a search keyword with a page number or section heading |
Sentences | Provide a search keyword, and at least one of the following: page number, paragraph number, line number |
Lines | Provide a search keyword and any or all optional inputs except cell row and cell column |
Words | Provide a search keyword and any or all optional inputs except cell row and cell column |
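The rules in the table above can be expressed as a small validation check. The Python sketch below is one possible reading of those rules, not the product's own logic: the input identifiers and the function name are hypothetical, and "Only page number" is interpreted strictly, meaning no other optional input is supplied.

```python
from typing import Set

# Hypothetical identifiers for the optional inputs listed in the Inputs table.
KEYWORD, PAGE, HEADING = "keyword", "page_number", "section_heading"
PARAGRAPH, LINE = "paragraph_number", "line_number"
ROW, COLUMN = "cell_row", "cell_column"


def inputs_are_valid(object_type: str, provided: Set[str]) -> bool:
    """Check the provided optional inputs against the rule for the object type."""
    if object_type == "Pages":
        # Assumption: "only page number" means nothing else is supplied.
        return provided == {PAGE}
    if object_type == "Paragraphs":
        return len(provided & {PAGE, PARAGRAPH, HEADING, KEYWORD}) >= 2
    if object_type == "Table cells":
        return bool(provided & {ROW, COLUMN})
    if object_type == "Bulleted lists":
        return KEYWORD in provided and bool(provided & {PAGE, HEADING})
    if object_type == "Sentences":
        return KEYWORD in provided and bool(provided & {PAGE, PARAGRAPH, LINE})
    if object_type in ("Lines", "Words"):
        # Keyword is required; cell row and cell column are not allowed.
        return KEYWORD in provided and not (provided & {ROW, COLUMN})
    raise ValueError(f"Unknown object type: {object_type}")
```

For example, `inputs_are_valid("Paragraphs", {"page_number", "keyword"})` passes, while `inputs_are_valid("Sentences", {"keyword"})` fails because a sentence also needs a page, paragraph, or line number.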
Outputs (what you get)
NAME | DESCRIPTION | TYPE | REQUIRED | EXAMPLE |
Extracted text | Text retrieved from the PDF document | Text | Yes | wrk technologies |
Outcomes
NAME | DESCRIPTION |
Success | This status is selected when the text is successfully retrieved from the PDF document. |
No Result | This status is selected when the information cannot be found in the PDF document. |
Unsuccessful | This status is selected in either of the following scenarios: the file cannot be opened, or the file is not a PDF document. |
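The way the three outcomes relate to each other can be sketched in Python. This is an illustrative assumption, not how the Wrk Action is implemented: `run_extraction` and the `extract` callback are hypothetical names, and checking for the `%PDF-` marker at the start of the file is a simplified way to decide whether a file is a PDF.

```python
from typing import Callable, Optional, Tuple


def run_extraction(
    path: str, extract: Callable[[bytes], Optional[str]]
) -> Tuple[str, Optional[str]]:
    """Return an (outcome, extracted text) pair for the file at path."""
    try:
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError:
        return "Unsuccessful", None  # the file cannot be opened
    if not data.startswith(b"%PDF-"):
        return "Unsuccessful", None  # the file is not a PDF document
    text = extract(data)  # hypothetical extraction step supplied by the caller
    if not text:
        return "No Result", None  # the information was not found
    return "Success", text
```

The order of the checks matters in this sketch: file problems are reported as Unsuccessful before any extraction is attempted, so No Result only ever describes a readable PDF that did not contain the requested content.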
Requirements
N/A