Convert text file into a comma delimited string - bash

I couldn't find an SO question that matches this exact problem.
I have a text file that has one text token per line, without any commas, tabs, or quotes. I want to create a comma delimited string based on the file content.
Input:
one
two
three
Output:
one,two,three
I am using this command:
csv_string=$(tr '\n' ',' < file | sed 's/,$//')
Is there a more efficient way to do this?

The usual command for this is paste:
csv_string=$(paste -sd, file.txt)
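In paste, -s (serial) joins all the lines of an input into a single line, and -d, sets the delimiter to a comma. Here is a minimal sketch that assumes the tokens arrive on standard input instead of sitting in file.txt. The operand - tells paste to read stdin, and the command substitution strips the newline that paste adds at the end.

```bash
#!/usr/bin/env bash
# -s: join all lines of the input into one line; -d,: use a comma as the delimiter.
# The operand "-" makes paste read standard input instead of a named file.
csv_string=$(printf '%s\n' one two three | paste -sd, -)
printf '%s\n' "$csv_string"   # one,two,three
```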

You can do it entirely with bash parameter expansion operators instead of using tr and sed.
csv_string=$(<file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
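The three steps can be wrapped in a function. This is a sketch, and the name file_to_csv is hypothetical. Note that $(<file) already strips all trailing newlines, so the newline substitution never leaves a trailing comma behind. The ${csv_string%,} step only removes a comma that was already at the end of the last line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the lines of file "$1" joined by commas, using only
# bash parameter expansion (no tr/sed/paste processes).
file_to_csv() {
  local s
  s=$(<"$1")          # read the file; trailing newlines are stripped here
  s=${s//$'\n'/,}     # replace every remaining newline with a comma
  printf '%s\n' "$s"
}

# Demo on a temporary file
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"
file_to_csv "$tmp"    # one,two,three
```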

One way with Awk is to set RS to the empty string, so that records are separated by blank lines (paragraph mode), and to set FS to a newline, so that each line is one field. This handles tokens containing spaces and joins them into CSV format as expected.
awk '{$1=$1}1' FS='\n' OFS=',' RS= file
The {$1=$1} assignment forces awk to rebuild each record ($0) using the current field and record separators (FS/OFS and RS/ORS). The trailing 1 is an always-true pattern with no action, so every record is printed with the modifications made inside {..}.
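Because RS= puts awk in paragraph mode, each blank-line-separated block becomes one record, so the same command prints one comma-separated line per block. Here is a short sketch with made-up input, including a token that contains a space:

```bash
#!/usr/bin/env bash
# RS= (empty) -> paragraph mode: records are separated by blank lines.
# FS='\n'     -> each line of a record is one field, so spaces inside a line are kept.
# $1=$1       -> forces awk to rebuild the record with OFS=',' between fields.
printf 'one\ntwo words\n\nthree\nfour\n' |
  awk '{$1=$1}1' FS='\n' OFS=',' RS=
# one,two words
# three,four
```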

With Perl one-liner:
$ cat csv_2_text
one
two
three
$ perl -ne '{ chomp; push(@lines,$_) } END { $x=join(",",@lines); print "$x" }' csv_2_text
one,two,three
$ perl -ne ' { chomp; $_="$_," if not eof ;printf("%s",$_) } ' csv_2_text
one,two,three
$
From @codeforester:
$ perl -ne 'BEGIN { my $delim = "" } { chomp; printf("%s%s", $delim, $_); $delim="," } END { printf("\n") }' csv_2_text
one,two,three
$

I tested the four approaches (pure Bash, paste, Awk and Perl) on a Linux box, as well as the tr | sed approach shown in the question:
#!/bin/bash
# generate test data
seq 1 10000 > test.file
times=${1:-50}
printf '%s\n' "Testing paste solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(paste -sd, test.file)
done
}
printf -- '----\n%s\n' "Testing pure Bash solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(<test.file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
done
}
printf -- '----\n%s\n' "Testing Awk solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= test.file)
done
}
printf -- '----\n%s\n' "Testing Perl solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(perl -ne '{ chomp; $_="$_," if not eof; printf("%s",$_) }' test.file)
done
}
printf -- '----\n%s\n' "Testing tr | sed solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(tr '\n' ',' < test.file | sed 's/,$//')
done
}
Surprisingly, the pure Bash solution does quite poorly. paste comes out on top, followed by tr | sed, Awk, and Perl:
Testing paste solution
real 0m0.109s
user 0m0.052s
sys 0m0.075s
----
Testing pure Bash solution
real 1m57.777s
user 1m57.113s
sys 0m0.341s
----
Testing Awk solution
real 0m0.221s
user 0m0.152s
sys 0m0.077s
----
Testing Perl solution
real 0m0.424s
user 0m0.388s
sys 0m0.080s
----
Testing tr | sed solution
real 0m0.162s
user 0m0.092s
sys 0m0.141s
For some reason, csv_string=${csv_string//$'\n'/,} hangs on macOS Mojave running Bash 4.4.23.
Related posts:
How to join multiple lines of file names into one with custom delimiter?
Concise and portable “join” on the Unix command-line
Turning multi-line string into single comma-separated

Related

Replace one character by the other (and vice-versa) in shell

Say I have strings that look like this:
$ a='/o\\'
$ echo $a
/o\
$ b='\//\\\\/'
$ echo $b
\//\\/
I'd like a shell script (ideally a one-liner) to replace / occurrences by \ and vice-versa.
If the command were called invert, it would yield (at a shell prompt):
$ invert $a
\o/
$ invert $b
/\\//\
With sed, for example, it seems unavoidable to go through a temporary character, which is not great:
$ echo $a | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
\o/
$ echo $b | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
/\\//\
For some context, this is useful for proper printing of git log --graph --all | tac (I like to see newer commits at the bottom).
tr is your friend:
% echo 'abc' | tr ab ba
bac
% echo '/o\' | tr '\\/' '/\\'
\o/
(escaping the backslashes in the output might require a separate step)
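Here is a sketch of the invert command from the question built on tr. The name invert comes from the question and is not a standard command. Printing the argument with printf '%s\n' instead of echo sidesteps echo implementations (such as zsh's or dash's) that interpret backslashes, so the swapped output should not need a separate escaping step.

```bash
#!/usr/bin/env bash
# invert: swap every "/" with "\" and vice versa in the first argument.
# In tr's sets, '\\' stands for one literal backslash.
invert() {
  printf '%s\n' "$1" | tr '/\\' '\\/'
}

invert '/o\'       # \o/
invert '\//\\/'    # /\\//\
```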
I think this can be done with (g)awk:
$ echo a/\\b\\/c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a\/b/\c
$ echo a\\/b/\\c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a/\b\/c
$
-F "/" sets the field separator: the input is split on "/", so no field contains a "/" character.
for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); replaces every backslash (\) in each field with a slash (/). Because the fields were modified, print $0 prints the record rebuilt with OFS="\\", so each original "/" separator comes out as a backslash.
If you want to replace every instance of / with \ and vice versa, you can use the y command of sed, which works much like tr:
$ a='/o\'
$ echo "$a"
/o\
$ echo "$a" | sed 'y|/\\|\\/|'
\o/
$ b='\//\\/'
$ echo "$b"
\//\\/
$ echo "$b" | sed 'y|/\\|\\/|'
/\\//\
If you are limited to GNU AWK, you can get the desired result as follows. Let file.txt contain
\//\\\\/
then
awk 'BEGIN{FPAT=".";OFS="";arr["/"]="\\";arr["\\"]="/"}{for(i=1;i<=NF;i+=1){if($i in arr){$i=arr[$i]}};print}' file.txt
gives output
/\\////\
Explanation: using the FPAT built-in variable, I tell GNU AWK that a field is any single character, set the output field separator (OFS) to the empty string, and create an array arr whose key-value pairs map each character to be replaced to its replacement; \ has to be escaped, so \\ denotes a literal \. Then, for each line, a for loop iterates over all fields, and if a field holds a character that is a key of arr, it is exchanged for the corresponding value. After the loop, the line is printed.
(tested in gawk 4.2.1)

Removing newlines in a txt file

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3
Assuming that “using Bash” means “without any external processes”:
if IFS= read -r line; then
printf '%s' "$line"
while IFS= read -r line; do
printf ',%s' "$line"
done
echo
fi
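The loop reads standard input, so in practice it is wrapped up and fed a file by redirection. Here is a sketch using a hypothetical join_lines function that also takes the delimiter as a parameter:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join the lines read from stdin with delimiter "$1"
# (default ","), using only bash builtins.
join_lines() {
  local delim=${1-,} line
  if IFS= read -r line; then       # first line: print it without a delimiter
    printf '%s' "$line"
    while IFS= read -r line; do    # remaining lines: delimiter first
      printf '%s%s' "$delim" "$line"
    done
    echo                           # final newline
  fi
}

printf '%s\n' test1 test2 test3 | join_lines        # test1,test2,test3
printf '%s\n' test1 test2 test3 | join_lines ' | '  # test1 | test2 | test3
```

Typical use is join_lines < file.txt. Like the original loop, it ignores a final line that has no trailing newline, because read returns a failure status for it.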
TL;DR:
paste -sd "," "export.txt"
Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
local -a lines
readarray -t lines <"$1"
local IFS=,
printf "%s\n" "${lines[*]}"
}
file2csv input.txt
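"${lines[*]}" joins the elements with only the first character of IFS, so this approach cannot produce a multi-character separator such as ", ". Here is a sketch of a commonly used workaround. The name join_by is hypothetical. It prints the first element as-is and prefixes every later element with the separator:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join the remaining arguments with the (possibly
# multi-character) separator given as the first argument.
join_by() {
  local sep=$1
  shift
  [ "$#" -gt 0 ] || return 0          # nothing to join
  printf '%s' "$1"                    # first element as-is
  shift
  printf '%s' "${@/#/$sep}"           # ${@/#/$sep} prefixes each remaining element with $sep
  printf '\n'
}

readarray -t lines < <(printf '%s\n' test1 test2 test3)
join_by ', ' "${lines[@]}"            # test1, test2, test3
```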
You can use awk. If the file name is test.txt then
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas, producing test1,test2,test3, (with a trailing comma).
The second awk command just deletes the last comma from the string.
Use tr (translate) to turn the newlines into commas, and sed to remove the trailing comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"
Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
I think this is where I originally picked it up.
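Here is how the sed script works. :a defines a label, N appends the next input line to the pattern space (with an embedded newline), and $!ba branches back to a on every line except the last. Once the whole file is in the pattern space, s/\n/,/g replaces the embedded newlines. The sketch below passes each command as a separate -e expression. This keeps the label and the branch on their own lines, which some non-GNU seds (BSD/macOS sed, for instance) need because they do not accept a semicolon after a label.

```bash
#!/usr/bin/env bash
# Same algorithm as sed ':a;N;$!ba;s/\n/,/g', one command per -e:
#   :a             define label "a"
#   N              append the next line to the pattern space
#   $!ba           if this is not the last line, branch back to "a"
#   s/\n/,/g       all lines are now in the pattern space: newlines -> commas
printf '%s\n' test1 test2 test3 |
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g'
# test1,test2,test3
```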
If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3
Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.

Bash - Transpose a single field keeping the rest same and repeat it across

I have a file with pipe separated fields.
E.g.:
1,2,3|xyz|abc
I need the output in below format:
1|xyz|abc
2|xyz|abc
3|xyz|abc
I have a working code in bash:
while read i
do
f1=`echo $i | cut -d'|' -f1`
f2=`echo $i | cut -d'|' -f2-`
echo $f1 | tr ',' '\n' | sed "s:$:|$f2:" >> output.txt
done < pipe_delimited_file.txt
Can anyone suggest a way to achieve this without using a loop?
The file contains a large number of records.
Uses a loop, but it's inside awk, so very fast:
awk -F\| 'BEGIN{OFS="|"}{n = split($1, a, ","); $1=""; for(i=1; i<=n; i++) {print a[i] $0}}' pipe_delimited_file.txt
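Here is the same awk program spread over several lines with comments. This is a sketch. The name explode_first_field is hypothetical, and the function reads standard input. The key step is $1 = "". Assigning to a field makes awk rebuild $0 with OFS, so $0 becomes "|xyz|abc" and each value from the first field only needs to be prepended.

```bash
#!/usr/bin/env bash
# Hypothetical helper wrapping the one-liner; reads pipe-delimited lines on stdin.
explode_first_field() {
  awk -F'|' 'BEGIN { OFS = "|" }
  {
    n = split($1, a, ",")   # a[1..n] = comma-separated values of field 1
    $1 = ""                 # empty field 1; $0 is rebuilt as "|xyz|abc"
    for (i = 1; i <= n; i++)
      print a[i] $0         # prepend each value to the rebuilt remainder
  }'
}

printf '%s\n' '1,2,3|xyz|abc' | explode_first_field
# 1|xyz|abc
# 2|xyz|abc
# 3|xyz|abc
```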
Perl may be a bit faster than awk:
perl -F'[|]' -ane 'for $n (split /,/, $F[0]) {$F[0] = $n; print join "|", @F}' file
Bash is slow at this kind of processing, but here's a quicker way to use it than running cut, tr and sed for every line. It uses plain bash without calling any external programs:
( # in a subshell:
IFS=, # use comma as field separator
set -f # turn off filename generation
while IFS='|' read -r f1 rest; do # temporarily using pipe as field separator,
# read first field and rest of line
for word in $f1; do # iterate over comma-separated words
echo "$word|$rest"
done
done
) < file

How to print "\n" character using bash?

I have a .csv file where some strings have some special characters like "\n" (new line).
I'm using this script to extract the data from column 1 and 3:
while IFS=";" read f1 f2 f3 f4
do
echo "\"$f1\" = \"$f3\";"
done < file.csv >file.txt
The main problem is that in some $f3 I have the \n special character and I need to print it.
At the moment, this script is omitting this character.
e.g. If I have
\nXPTO
it will print
XPTO
and I would expect that would print
\nXPTO
Thanks
awk to the rescue!
$ echo "a;b;\nXPTO;d" | awk -F';' '{print $1 "=" $3}'
a=\nXPTO
or with file in/out
$ awk ... input_file > output_file
Use read -r to prevent read from interpreting escape sequences:
while IFS=";" read -r f1 f2 f3 f4
do
echo "\"$f1\" = \"$f3\";"
done < file.csv >file.txt
Side note: while this can be done in bash as shown above, I agree with Karafka that awk is ideal for this kind of problem and performs very well, better than bash itself when the input file is large.
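Here is a quick way to see what -r changes. Without -r, read treats a backslash as an escape character and removes it, so \nXPTO arrives as nXPTO. With -r, the backslash is kept. The sketch prints with printf '%s\n' so that the shell's echo cannot reinterpret the backslash either.

```bash
#!/usr/bin/env bash
line='a;b;\nXPTO;d'   # single quotes: the backslash is a literal character

# Without -r: the backslash escapes the following "n" and is dropped.
IFS=';' read f1 f2 f3 f4 <<< "$line"
printf '%s\n' "$f3"      # nXPTO

# With -r: backslashes are taken literally.
IFS=';' read -r f1 f2 f3 f4 <<< "$line"
printf '%s\n' "$f3"      # \nXPTO
```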

printing results in one line separated by commas in bash

How can I print all the text file locations on one line, separated by commas? Can I do this in a for loop?
Here is an example of files.
/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt
/data/home/files/txt_files_2/file3.txt
output would look like
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Thanks
Here is working code:
#!/bin/bash
delim=""
for i in /data/home/files/txt_files_1/file*
do
printf "%s%s" "$delim" "$i"
delim=","
done
printf ' \\\n'   # end the first line with " \" and a newline
delim=""         # reset so the second line does not start with a comma
for i in /data/home/files/txt_files_2/file*
do
printf "%s%s" "$delim" "$i"
delim=","
done
printf "\n"
For single-file input, with a blank line separating the two groups of paths:
awk -v OFS=, -v RS= 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Or, to separate the output lines with a blank line:
awk -v OFS=, -v RS= -v ORS='\n\n' 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
You can use printf "%s," "$file" to print several names on a single line. To get the delimiters right, I use this trick:
delim=""
for file in /data/home/files/txt_files_1/file*; do
printf "%s%s" "$delim" "$file"
delim=","
done
printf "\n"
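The same delimiter trick can be applied once per directory, resetting delim for each group so that no line starts with a comma. This is a sketch. The name print_groups is hypothetical, and it assumes the files match file* inside each directory passed as an argument. If a directory has no matching files, bash prints the unexpanded pattern unless shopt -s nullglob is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the files of each directory on one comma-separated
# line, ending every line but the last with " \" (as in the desired output).
print_groups() {
  local dir file delim sep=""
  for dir in "$@"; do
    printf '%s' "$sep"          # " \" plus newline before every group but the first
    delim=""                    # reset so each line starts without a comma
    for file in "$dir"/file*; do
      printf '%s%s' "$delim" "$file"
      delim=","
    done
    sep=$' \\\n'
  done
  printf '\n'
}

# Demo on a throwaway directory tree (paths are illustrative only)
root=$(mktemp -d)
mkdir -p "$root/txt_files_1" "$root/txt_files_2"
touch "$root"/txt_files_{1,2}/file{1,2,3}.txt
print_groups "$root/txt_files_1" "$root/txt_files_2"
rm -r "$root"
```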
<command to generate lines of paths> | tr '\n' ','
example:
echo "/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt" | tr '\n' ','
outputs:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt,,/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt
Assuming your input is in a file called list, this Perl one-liner does the job:
perl -F'\n' -00 -ane 'push @a, join(",", @F) }{ print(join(" \\\n\n", @a), "\n")' list
explanation
-00, in combination with -n, reads the file one block (paragraph) at a time.
The -a switch, combined with -F'\n', auto-splits each paragraph on newlines. The result goes into the array @F.
push @a, join(",", @F) adds one element to the array @a per paragraph: the comma-separated list of the elements in @F.
The }{ closes the implicit loop that -n wraps around the code, so the final print runs once, after the whole file has been processed. It prints all the elements of @a, joined together as you specified. The additional "\n" at the end is optional.
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
