Here is a great cheat sheet for regular expressions in R, from the folks at RStudio.
If you have any questions about using RegEx, feel free to ask!
This book is a great reference for regular expressions, as is the stringr site (which I wish I had known about earlier):
http://stringr.tidyverse.org/articles/regular-expressions.html
Hi
I am stuck with some PDFs that I had to convert to text, and they lost their tabular format.
The data (which is a .txt file) has this structure:
Question A
This is the text that I want to extract.
Question B
This is the text in answer B.
Question N
Answer N
I am wondering how to extract the text between questions, e.g. between A and B, and then reshape it into a tabular format. For instance:
Question   | Answer
Question A | This is the text that I want to extract
Question B | This is the answer B
Question N | Answer N
Any clues?
Many thanks in advance.
Alan
Hi Alan, can you send me the text file so that I can look into it?
@Alan_Ponce I think something like this might be what you are after. I had a similar situation trying to analyze a large txt file, and I think this approach works pretty well.
See this:
Assuming your data has two lines for each chunk, something like this might work for you, provided there is a delimiter between "Question A" and your text.
read_lines from readr reads your text into a large character vector, which makes it easy to parse and work with.
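"\\s" (the regex \s, with the backslash doubled because it sits inside an R string) matches the whitespace that acts as the delimiter, and separate() splits each line into columns at that delimiter:
> library(dplyr)
> library(readr)
> library(tidyr)
>
> ReadChunkFile <- function(x) {
>   tibble(text = read_lines(x)) %>%
>     filter(text != "") %>%
>     # split at the first whitespace only; extra = "merge" keeps the rest in "value"
>     separate(text, c("var", "value"), "\\s", extra = "merge") %>%
>     mutate(
>       # 13 is the number of lines per chunk assumed here; change it to match your file
>       chunk_id = rep(1:(nrow(.) / 13), each = 13),
>       value = trimws(value)
>     ) %>%
>     spread(var, value)
> }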
Now you can call your function to read in the data, where dataset is the file path to your data:
> df <- ReadChunkFile(dataset)
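For the layout in Alan's post, where each "Question X" line is followed by its answer text, a minimal base R sketch might look like the following. It assumes every question line starts with the word "Question" and that all lines up to the next question belong to that answer; the sample vector is made up to mirror the post, and in practice you would read the lines from your file (e.g. with readLines).

```r
# Hypothetical sample mirroring the structure described in the question;
# in practice: lines <- readLines("your_file.txt")
lines <- c(
  "Question A",
  "This is the text that I want to extract.",
  "",
  "Question B",
  "This is the text in answer B.",
  "Question N",
  "Answer N"
)

# Drop blank lines
lines <- lines[nzchar(trimws(lines))]

# Flag lines that look like a question label (assumed pattern)
is_question <- grepl("^Question\\s+\\S+", lines)

# Each line belongs to the most recent question seen so far
group <- cumsum(is_question)

# Collapse the non-question lines of each group into one answer string
answers <- vapply(
  seq_len(sum(is_question)),
  function(i) paste(lines[group == i & !is_question], collapse = " "),
  character(1)
)

qa <- data.frame(
  Question = lines[is_question],
  Answer = answers,
  stringsAsFactors = FALSE
)

print(qa)
```

Because answers are grouped by the preceding question rather than by a fixed number of lines, this should also cope with answers that wrap over several lines. A question with no answer text ends up with an empty string.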
@Kan_Nishida It might be worthwhile to have a tutorial on how to do this using Exploratory (with functions from tidyr in Exploratory)?