Retrieve text from a PDF
Written by Wrk Product
Updated over a year ago

Extracts specific content from a PDF document and returns it as text.

Common use cases

  • Data Manipulation Wrkflows

Application

  • PDF

Inputs (what you have)

| NAME | DESCRIPTION | TYPE | REQUIRED | EXAMPLE |
| --- | --- | --- | --- | --- |
| File name | The name of the document | Text | Yes | File-1 |
| Type of object(s) to extract | The type of object(s) to extract: Pages, Paragraphs, Table cell, Bulleted list, Numbered list, Sentences, Words, Lines | Predefined Choice List | Yes | Page 1 |
| Number of object(s) to extract | Specify the number of objects to be extracted | Integer | Yes | 1 cell, 2 rows |
| Search location keyword | The keyword to be used to locate the content to be extracted. | Text | No | Data |
| Page number | The document page where the content to be extracted is located | Integer | No | 1 |
| Section heading name | The section where the content to be extracted is located | Text | No | Heading 1 |
| Paragraph number | The paragraph where the content to be extracted is located | Integer | No | 5 |
| Line number | The paragraph line where the content to be extracted is located | Integer | No | 3 |
| Cell row | Row to extract, if object is a table | Integer | No | 5 |
| Cell column | Column to extract, if the object is a table | Integer | No | 2 |

Note: Each input value can be either a fixed value set in the configuration of the Wrk Action within the Wrkflow or a variable from the Data Library. Data Library variables are the outputs of previous Wrk Actions in the Wrkflow.

How it works

Extraction always begins at the start of the object being extracted. The object is located using the first instance of the keyword in relation to the page number, paragraph number, or section heading.

Please note:

  • A paragraph is one or more sentences beginning on a new line.

| Text to extract | Optional inputs to configure |
| --- | --- |
| Pages | Only page number |
| Paragraphs | Provide at least two of the following: page number, paragraph number, section heading, search keyword |
| Table cells | Row or column or both |
| Bulleted lists | Provide a search keyword with a page number or section heading |
| Sentences | Provide a search keyword, and at least one of the following: page number, paragraph number, line number |
| Lines | Provide a search keyword and any or all optional inputs except cell row and cell column |
| Words | Provide a search keyword and any or all optional inputs except cell row and cell column |
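The combinations listed above can be checked before a Wrkflow is configured. The following Python sketch is one possible reading of those rules, not part of the Wrk product: the `Locators` class, its field names, and `has_required_locators` are hypothetical names chosen for illustration. It treats "Only page number" as "a page number must be provided", and it raises an error for Numbered list because no rule is listed for that type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locators:
    """Hypothetical container for the optional inputs; None means 'not provided'."""
    keyword: Optional[str] = None
    page_number: Optional[int] = None
    section_heading: Optional[str] = None
    paragraph_number: Optional[int] = None
    line_number: Optional[int] = None
    cell_row: Optional[int] = None
    cell_column: Optional[int] = None


def _provided(*values) -> int:
    """Count how many of the given optional inputs were provided."""
    return sum(v is not None for v in values)


def has_required_locators(object_type: str, loc: Locators) -> bool:
    """Return True if the optional inputs satisfy the listed rule for object_type."""
    if object_type == "Pages":
        # Assumption: "Only page number" means the page number is needed.
        return loc.page_number is not None
    if object_type == "Paragraphs":
        # At least two of: page number, paragraph number, section heading, keyword.
        return _provided(loc.page_number, loc.paragraph_number,
                         loc.section_heading, loc.keyword) >= 2
    if object_type == "Table cell":
        # Row or column or both.
        return _provided(loc.cell_row, loc.cell_column) >= 1
    if object_type == "Bulleted list":
        # Keyword plus a page number or a section heading.
        return (loc.keyword is not None
                and _provided(loc.page_number, loc.section_heading) >= 1)
    if object_type == "Sentences":
        # Keyword plus at least one of: page, paragraph, or line number.
        return (loc.keyword is not None
                and _provided(loc.page_number, loc.paragraph_number,
                              loc.line_number) >= 1)
    if object_type in ("Lines", "Words"):
        # Keyword required; cell row and cell column must not be used.
        return (loc.keyword is not None
                and _provided(loc.cell_row, loc.cell_column) == 0)
    raise ValueError(f"No rule listed for object type: {object_type!r}")
```

For example, `has_required_locators("Paragraphs", Locators(keyword="Data", page_number=1))` passes the check, while the same call with only the keyword does not.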

Outputs (what you get)

| NAME | DESCRIPTION | TYPE | REQUIRED | EXAMPLE |
| --- | --- | --- | --- | --- |
| Extracted text | Text retrieved from the PDF document | Text | Yes | wrk technologies |

Outcomes

| NAME | DESCRIPTION |
| --- | --- |
| Success | This status is selected when the text was successfully retrieved from a PDF. |
| No Result | This status is selected in the event of the following scenarios: information cannot be found in the PDF document |
| Unsuccessful | This status is selected in the event of the following scenarios: the file cannot be opened; the file is not a PDF document |
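The three outcomes above can be summarized as a small decision function. This Python sketch only illustrates the order of the checks. It is not how Wrk implements the action, and `classify_outcome` and its parameters are hypothetical. As a simplifying assumption, a file counts as a PDF when its bytes start with the `%PDF-` header.

```python
from typing import Optional


def classify_outcome(file_bytes: Optional[bytes],
                     extracted_text: Optional[str]) -> str:
    """Map the documented scenarios to an outcome name.

    file_bytes is None when the file could not be opened (assumption).
    extracted_text is None or empty when nothing matched in the document.
    """
    if file_bytes is None:
        # The file cannot be opened.
        return "Unsuccessful"
    if not file_bytes.startswith(b"%PDF-"):
        # The file is not a PDF document (simplified header check).
        return "Unsuccessful"
    if not extracted_text:
        # Information cannot be found in the PDF document.
        return "No Result"
    return "Success"
```

A Wrkflow that branches on the outcome would typically treat No Result as a normal path (the document was readable but held no match) and Unsuccessful as a file problem.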

Requirements

  • N/A
