Parsing and storing the values of a csv file using shell script outputs :::: instead of actual characters - bash

I am trying to read a CSV file using a shell script, with the following command:
cat file.csv | while read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
When I run this command, the first column in the file is not read properly.
For example, if the 1st column's contents are
number1,
number2,
number3,
number4,
(so on)
It outputs:
::::er1,
::::er2,
::::er3,
::::er4,
Some characters are replaced by ':'. This happens only for the first column's contents. Where am I going wrong?

The problem is most likely due to a couple of issues:
You are reading the file without setting IFS=,
Your CSV file likely has carriage returns (\r), which can mangle how the read command processes the input stream.
To remove the carriage returns (\r), use tr -d '\r' < oldFile.csv > newFile.csv and do the parsing on the new file, as described below.
Without setting the Internal Field Separator (IFS=","), read doesn't know where to delimit the words it reads from the input stream. Add it to the command as below:
cat file.csv | while IFS="," read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
You can see it working as below. I have the contents of the file.csv as follows.
$ cat file.csv
abc,def,ghi,ijk,lmn,opz
1,2,3,4,5,6
$ cat file.csv | while IFS="," read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
abc:def:ghi:ijk:lmn:opz
1:2:3:4:5:6
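If your file also contains carriage returns, you can strip them on the fly with process substitution instead of writing a new file first (a minimal sketch combining both fixes):
while IFS=, read -r a b c d e f; do
echo "$a:$b:$c:$d:$e:$f"
done < <(tr -d '\r' < file.csv)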
Moreover, using cat and looping over its output is not recommended; bash enthusiasts often call it UUOC - Useless Use Of Cat.
You can avoid this by doing
#!/bin/bash
while IFS="," read -r a b c d e f;
do
echo "$a:$b:$c:$d:$e:$f"
done < file.csv

Related

Extracting file content using a for loop [duplicate]

I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv|while read line
do
read -d, col1 col2 < <(echo $line)
echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
To skip a given number of header lines:
skip_headers=3
while IFS=, read -r col1 col2
do
if ((skip_headers))
then
((skip_headers--))
else
echo "I got:$col1|$col2"
fi
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
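For instance, with csvkit installed, csvcut extracts columns while respecting quoting (a sketch; the column indices are illustrative):
csvcut -c 1,2 myfile.csv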
How to parse a CSV file in Bash?
Coming late to this question: bash does offer new features, this question is specifically about bash, and none of the already posted answers shows this powerful and standards-compliant way of doing precisely this.
Parsing CSV files under bash, using loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be split as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Bash loadable C compiled modules.
Under bash, you can create, edit, and use loadable C compiled modules. Once loaded, they work like any other builtin! (You may find more information in the source tree. ;)
The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is a full working CSV parser ready to use in the examples/loadables directory: csv.c!
On Debian GNU/Linux based systems, you may have to install the bash-builtins package:
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line;do
csv -a aVar "$line"
printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv
This approach is clearly the quickest and most robust compared with any other combination of bash builtins or forking out to any binary.
Unfortunately, depending on your system's implementation, if your version of bash was compiled without loadable builtin support, this may not work...
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:
12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42
should be split as
1 12
2 22.45
3 Hello "man",
This is a good day, today!
4 A, b.
5 42
Full sample script for parsing a CSV containing multiline fields
Here is a small sample file with 1 header line, 4 columns and 3 rows. Because two fields contain newlines, the file is 6 lines long.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
#!/bin/bash
enable -f /usr/lib/bash/csv csv
file="sample.csv"
exec {FD}<"$file"
read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"
numcols=${#headline[@]}
while read -ru $FD line;do
while csv -a row "$line" ; (( ${#row[@]} < numcols )) ;do
read -ru $FD sline || break
line+=$'\n'"$sline"
done
printf "$fieldfmt\\n" "${row[@]}"
done
This may render: (I've used printf "%q" to represent non-printable characters, like newlines, as $'\n')
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You could find a full working sample there: csvsample.sh.txt or
csvsample.sh.
Note:
In this sample, I use the header line to determine the row width (number of columns). If your header line could hold newlines (or if your CSV uses more than one header line), you will have to pass the number of columns as an argument to your script (along with the number of header lines), as sketched below.
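A minimal sketch of that variant (hypothetical: the column count arrives as $1, and a single header line is assumed and skipped):
#!/bin/bash
enable -f /usr/lib/bash/csv csv
numcols=${1:?usage: numcols, with the CSV on stdin}
read -r line                  # skip the single header line
while read -r line; do
while csv -a row "$line"; (( ${#row[@]} < numcols )); do
read -r sline || break        # field continues on the next physical line
line+=$'\n'"$sline"
done
printf '%s\n' "${row[@]}"
done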
Warning:
Of course, parsing CSV this way is not perfect! It works for many simple CSV files, but take care with encoding and security! For example, this module won't be able to handle binary fields!
Read the csv.c source code comments and RFC 4180 carefully!
From the man page:
-d delim
The first character of delim is used to terminate the input line,
rather than newline.
You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
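A quick demo of that behavior (not from the original answer): read stops at the first comma, so only one word is available for splitting.
$ read -d, x y < <(echo a,b,)
$ echo "x=$x y=$y"
x=a y=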
We can parse CSV files with quoted strings, delimited by, say, |, with the following code:
while read -r line
do
field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
echo "$field1 $field2"
done < "$csvFile"
awk parses the string fields into variables and tr removes the quotes.
This is slightly slower, as awk is executed for each field.
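A single awk invocation over the whole file avoids that cost; a sketch under the same assumptions (|-delimited, quotes to strip):
awk -F'|' '{gsub(/"/, ""); print $1, $2}' "$csvFile"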
In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
read
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done
} < myfile.csv
If you want to read a CSV file while skipping its header line, this is a solution:
i=1
while IFS=, read -ra line
do
test $i -eq 1 && ((i=i+1)) && continue
for col_val in "${line[@]}"
do
echo -n "$col_val|"
done
echo
done < "$csvFile"

Convert data from a simple JSON format to a DSV format

I have a file in Unix, with data sample like the following:
{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}
The desired output is
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexico
456|Americas|Canada
567|APAC|Japan
I tried a few sed commands. I could remove the following: '{', '}', '"', ':'
There are 2 issues with the output file:
All rows from the input appear on a single line in the output.
Adding the pipe ('|') as the delimiter.
Any pointers are highly appreciated.
I recommend the tool jq (http://stedolan.github.io/jq/); jq is a lightweight and flexible command-line JSON processor.
jq -r '"\(.ID)|\(.Region)|\(.Location)"' < infile
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Explanation
-r is --raw-output
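If you also want the header row from the desired output, one option (a sketch) is to emit it before the jq call:
{ echo 'ID|Region|Location'; jq -r '"\(.ID)|\(.Region)|\(.Location)"' < infile; }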
Through awk,
awk -F'"' -v OFS="|" 'BEGIN{print "ID|Region|Location"}{print $4,$8,$12}' file
Example:
$ cat file
{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}
$ awk -F'"' -v OFS="|" 'BEGIN{print "ID|Region|Location"}{print $4,$8,$12}' file
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Explanation:
-F'"' sets " as the field separator value.
OFS="|" sets | as the output field separator value.
First, awk executes the block inside BEGIN; it prints the header line.
This sed one-liner does what you want. It captures the field values using parenthesized expressions, and then puts them into the output using \1, \2, and \3.
s/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/
Invoke it like:
$ sed -f one-liner.sed input.txt
Or you can invoke it within a Bash script, producing the header:
echo 'ID|Region|Location'
sed -e 's/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/' "$input"
It is a JSON file so it is best to use a JSON parser. Here is a perl implementation of it.
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
open my $fh, '<', 'path/to/your/file';
#keys of your structure
my @key = qw(ID Region Location);
print join ("|", @key), "\n";
#iterate over your file, decode it and print in order of your key structure
while (my $json = <$fh>) {
my $text = decode_json($json);
print join ("|", map { $$text{$_} } @key ),"\n";
}
Output:
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Using sed as follows
Command line
echo "my_string" |
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
-e '1 s#^.*$#ID Region Location\n&#' -e 's# #|#g'
or
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
-e '1 s#^.*$#ID Region Location\n&#' -e 's# #|#g' my_file
I tried this in a terminal as follows:
echo '{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}' |
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
-e '1 s#^.*$#ID Region Location\n&#' -e 's# #|#g'
Output
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Many thanks for your responses; the pointers and solutions did help a lot.
For some mysterious reason, I couldn't get any of the sed commands to work. So I devised my own solution. Although it's not elegant, it still worked.
Here is the script I prepared which resolved the issue.
#!/bin/bash
# source file path.
infile=/home/exfile.txt
# remove these temp files if they already exist.
rm -f ./efile.txt ./xfile.txt ./yfile.txt ./zfile.txt
# removing the curly braces from input file.
cat exfile.txt | cut -d "{" -f2 | cut -d "}" -f1 >> ./efile.txt
# setting input file name to different value.
infile=./efile.txt
# remove double quotes from the file.
while IFS= read -r line
do
echo $line | sed 's/\"//g' >> ./xfile.txt
done < "$infile"
# creating another temp file.
infile2=./xfile.txt
# remove colon from file.
while IFS= read -r line
do
echo $line | sed 's/\:/,/g' >> ./yfile.txt
done < "$infile2"
# set input file path to new temp file.
infile3=yfile.txt
# initialize variables to hold header column values.
t1=0
t3=0
t5=0
# read each of the line to extract header row. Exit loop after reading 1st row.
once=1
while IFS=',' read -r f1 f2 f3 f4 f5 f6
do
"$f1 $f2 $f3 $f4 $f5 $f6"
t1=$f1
t3=$f3
t5=$f5
if [ "$once" -eq 1 ]; then
break
fi
done < "$infile3"
# Read each of the line from input file. Write only the value to another output file.
while IFS=',' read -r f1 f2 f3 f4 f5 f6
do
echo "$f2|$f4|$f6" >> ./zfile.txt
done < "$infile3"
# insert the header row into the file generated in the step above.
frstline="$t1|$t3|$t5"
sed -i "1i $frstline" ./zfile.txt

Append index variable at the end of each line

I am trying to append an index variable at the end of each line of a file that I have. However, I don't want to lose the escape characters in the text file, and thus cannot simply echo into the file again.
Here's what I tried:
while read p; do
tempCom+=$p
tempCom+=$indexVar
echo $tempCom >> otherFile.txt
tempCom=""
done < result.txt
What I am after:
Read:
"asdasdasdasdasdasd\ asdasd/asda"
"qweqweqweqweqweqwe\ qweqwe/qweq"
Output:
"asdasdasdasdasdasd\ asdasd/asda" 1
"qweqweqweqweqweqwe\ qweqwe/qweq" 2
Note that indexVar is an index that is stored elsewhere and does not necessarily correspond to the line that its being appended to.
Your problem is very likely a quoting problem. Note the IFS= and the -r option in the read statement too.
while IFS= read -r p; do
tempCom+=$p$indexVar
printf '%s\n' "$tempCom" >> otherFile.txt # Observe the quotes
tempCom=
done < result.txt
If you just want to append the line number to the end, why not use awk?
awk '{print $0, "\t", NR}' < file.txt
EDIT 1: It sounds like you want to use paste then (assuming you want to just join line by line)
paste file1.txt file2.txt > fileresults.txt
EDIT 2: You can use sed then:
sed "s|$|${indexVar}|" input
Use the -r option of the read command, so that the backslashes are preserved.
while read -r p; do

Stopping paste after any input is exhausted

I have two programs that produce data on stdout, and I'd like to paste their output together. I can successfully do this like so:
paste <(./prog1) <(./prog2)
But I find that this method will print all lines from both inputs,
and what I really want is to stop paste after either input program is finished.
So if ./prog1 produces the output:
a
b
c
But ./prog2 produces:
Hello
World
I would expect the output:
a Hello
b World
Also note that one of the input programs may actually produce infinite output, and I want to be able to handle that case as well. For example, if my inputs are yes and ./prog2, I should get:
y Hello
y World
Use join instead, with a variation on the Schwartzian transform:
numbered () {
nl -s- -ba -nrz
}
join -j 1 <(prog1 | numbered) <(prog2 | numbered) | sed 's/^[^-]*-//'
Piping to nl numbers each line, and join -j 1 joins corresponding lines with the same number. The extra lines in the longer file have no join partner and are omitted. Once the join is complete, the sed pipe removes the line numbers.
Here's one solution:
while IFS= read -r -u7 a && IFS= read -r -u8 b; do echo "$a $b"; done 7<$file1 8<$file2
This has the slightly annoying effect of ignoring the last line of an input file if it is not terminated with a newline (but such a file is not a valid text file).
You can wrap this in a function, of course:
paste_short() {
(
while IFS= read -r -u7 a && IFS= read -r -u8 b; do
echo "$a $b"
done
) 7<"$1" 8<"$2"
}
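The function also accepts process substitutions, so the infinite-input case from the question works too (a usage sketch, reusing the hypothetical ./prog2 from above; expected output shown):
paste_short <(yes) <(./prog2)
y Hello
y World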
Consider using awk:
awk 'FNR==NR{a[++i]=$0;next} FNR>i{exit}
{print a[FNR], $0}' <(printf "hello\nworld\n") <(printf "a\nb\nc\n")
hello a
world b
Keep the program that produces the longer output as your 2nd input.
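For instance, with the question's infinite input placed second (a sketch; note that the first file is read fully into memory, so it must be the finite one):
awk 'FNR==NR{a[++i]=$0;next} FNR>i{exit}
{print a[FNR], $0}' <(printf "Hello\nWorld\n") <(yes)
Hello y
World y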

inserting text into a specific line

I've got a text file, and using Bash I wish to insert text into a specific line.
For example, the text to be inserted is !comment: http://www.test.com into line 5, so that
!aaaa
!bbbb
!cccc
!dddd
!eeee
!ffff
becomes,
!aaaa
!bbbb
!cccc
!dddd
!comment: http://www.test.com
!eeee
!ffff
sed '4a\
!comment: http://www.test.com' file.txt > result.txt
i inserts before the current line; a appends after it.
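Equivalently (a sketch), you can address line 5 directly and insert before it:
sed '5i\
!comment: http://www.test.com' file.txt > result.txt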
You can use awk as well:
$ awk 'NR==5{$0="!comment: http://www.test.com\n"$0}1' file
!aaaa
!bbbb
!cccc
!dddd
!comment: http://www.test.com
!eeee
!ffff
Using man 1 ed (which reads the entire file into memory and performs in-place file editing without a previous backup):
# cf. http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
line='!comment: http://www.test.com'
#printf '%s\n' H '/!eeee/i' "$line" . wq | ed -s file
printf '%s\n' H 5i "$line" . wq | ed -s file
