Import text data from PDF document

You can import text data from a PDF document using the pdftools package. Here is how.

  • Open the “Manage R package” dialog from the project menu.

  • Click the “Install New Packages” tab, type in pdftools and click “Install”.

Click the “+” icon next to Data Frames and select “R Script” to create a new R Script Data Frame.

Type in the following script, replacing <FILE_PATH_TO_PDF> with the full path to your PDF file.

# pdf_text() returns a character vector with one element per page
pdf.text <- pdftools::pdf_text("<FILE_PATH_TO_PDF>")
data.frame(text = pdf.text, stringsAsFactors = FALSE)

The result is a data frame with a single column called ‘text’, holding the full text of each page in its own row. To extract the specific data you need from that text, see the Text Wrangling document.
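Because each row holds a whole page, it can help to break the text into one row per line before wrangling it further. The following is a minimal base R sketch, not an Exploratory feature: the helper name split_pdf_lines is hypothetical, and it assumes lines within each page are separated by newline characters, as in pdf_text() output.

```r
# Hypothetical helper (not part of pdftools or Exploratory): split the
# per-page text returned by pdftools::pdf_text() into one row per line,
# keeping track of the page each line came from.
split_pdf_lines <- function(pages) {
  # Break each page's text at newline characters
  lines_per_page <- strsplit(pages, "\n", fixed = TRUE)
  result <- data.frame(
    page = rep(seq_along(pages), lengths(lines_per_page)),
    line = unlist(lines_per_page, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  # Trim leading/trailing whitespace (including any stray \r) left by PDF layout
  result$line <- trimws(result$line)
  # Drop lines that are empty after trimming
  result <- result[result$line != "", , drop = FALSE]
  rownames(result) <- NULL
  result
}

# Usage inside the R Script Data Frame (replace the placeholder path):
# pdf.text <- pdftools::pdf_text("<FILE_PATH_TO_PDF>")
# split_pdf_lines(pdf.text)
```

Keeping the page number as its own column lets you filter to specific pages later, even after blank lines have been removed.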
