How to read in multiple pipe-delimited CSV files?

Stephanie_Brady · February 1, 2021, 10:33pm

I have 500+ text files to read into Exploratory, and they are pipe-delimited:

This blog post gives an elegant solution for reading in multiple CSV files:

How to Read Multiple Excel or CSV Files Together

But when I try it, here is the result:

How can I specify that the delimiter is “|”?

sugiaki · February 2, 2021, 1:00am

How about the following solution? Please adjust the file path to your environment and run the following script in the R script data frame of Exploratory.

It may take a bit of time to load 500+ text files at once, so another solution may be needed…I hope this helps.

read_bar_csv <- function(files, ...){
  read_delim(file = files, delim = "|", skip = 1, ...)
}

files <- list.files(path = "~/Desktop/multiple_files/", pattern = "*.csv", full.names = TRUE)
df <- sapply(X = files, simplify=FALSE, FUN = read_bar_csv) %>% bind_rows(.id = "id")
df

I have prepared three sample csv data with pipe-delimited in the format you specify, as shown below.

> cat ~/Desktop/multiple_files/test001.csv
Number of observations used = 33|
facility|outfall|date|bod|nh3|flow|rain|sensor
TX00211|001 - 1 - A|05/31/2018|3.2|1.42|0.418|1|1150
TX00211|001 - 1 - A|05/31/2018|3.3|1.43|0.419|2|1151
TX00211|001 - 1 - A|05/31/2018|3.4|1.44|0.420|3|1152
TX00211|001 - 1 - A|05/31/2018|3.5|1.45|0.421|4|1153
~ 
> cat ~/Desktop/multiple_files/test002.csv
Number of observations used = 35|
facility|outfall|date|bod|nh3|flow|rain|sensor
TX00212|002 - 1 - A|05/31/2018|3.1|1.52|1.418|1|2150
TX00212|002 - 1 - A|06/31/2018|3.3|1.63|3.419|2|2151
TX00212|002 - 1 - A|07/31/2018|3.2|1.74|2.420|3|2152
TX00212|002 - 1 - A|08/31/2018|3.5|1.85|1.421|4|2153
~ 
> cat ~/Desktop/multiple_files/test003.csv
Number of observations used = 33|
facility|outfall|date|bod|nh3|flow|rain|sensor
TX00213|003 - 1 - A|05/31/2018|3.2|1.42|0.418|1|1150
TX00213|003 - 1 - A|05/31/2018|3.3|1.43|0.419|2|1151
TX00213|003 - 1 - A|05/31/2018|3.4|1.44|0.420|3|1152
TX00213|003 - 1 - A|05/31/2018|3.5|1.45|0.421|4|1153

Hide_Kojima1 · February 2, 2021, 5:07pm

Hi Stephanie,
if you use Windows, the path part would look like below. (i.e. please use \\ instead of /)

path = "C:\\Test\\Folder"

Stephanie_Brady · February 2, 2021, 5:10pm

@sugiaki thank you so much for the detailed post!

I made a few different attempts relating to replacing the file path and keep getting this error:

Any ideas what I can try next?

Stephanie_Brady · February 2, 2021, 5:23pm

@hide_kojima1

Thank you, I am on a Windows machine.
I have made the changes you suggested and I am still getting the “subscript out of bounds” error.

Stephanie_Brady · February 2, 2021, 6:01pm

My apologies, I actually have “.txt” files. Changing the extension (pattern = “*.txt”) worked great!

read_bar_csv <- function(files, ...){
  read_delim(file = files, delim = "|", skip = 1, ...)
}

files <- list.files(path = "C:/.../.../Test/Folder", pattern = "*.txt", full.names = TRUE)
df <- sapply(X = files, simplify=FALSE, FUN = read_bar_csv) %>% bind_rows(.id = "id")
df

Stephanie_Brady · February 3, 2021, 4:11am

One more update. There was some variation in my input files, and I had trouble importing all of them at once due to issues with how the data type was detected. I clicked the “Auto Detect Data Type” option off and it didn’t force the data types to character.

I looked into the read_delim function and learned how to set the col_type , as shown below:

@hide_kojima1 indicated that

if you uncheck the “Auto Detect Data Type”, since you use read_delim in your R script, it honors the the data type detected by the read_delim.

Thank you both for your help!