I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv|while read line
do
read -d, col1 col2 < <(echo $line)
echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
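If the number of columns is not known in advance, the same idea can fill an array instead of named variables. This is only a minimal sketch for simple CSV without quoted fields; the helper name split_csv_line and the global array fields are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: split one simple CSV line (no quoted fields)
# into the global array "fields".
split_csv_line() {
    IFS=, read -r -a fields <<< "$1"
}

split_csv_line 'a,b,c'
echo "count=${#fields[@]} second=${fields[1]}"
```

Setting IFS on the same line as read limits the change to that single command, so the rest of the script keeps the default word splitting.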
To skip a given number of header lines:
skip_headers=3
while IFS=, read -r col1 col2
do
if ((skip_headers))
then
((skip_headers--))
else
echo "I got:$col1|$col2"
fi
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool that can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
How to parse a CSV file in Bash?
I'm coming late to this question, but bash now offers new features, this question is specifically about bash, and none of the already posted answers shows this powerful and compliant way of doing precisely this.
Parsing CSV files under bash, using loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be split as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Bash loadable compiled C modules.
Under bash, you can create, edit, and use loadable compiled C modules. Once loaded, they work like any other builtin! (You can find more information in the source tree.)
The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is a fully working CSV parser ready to use in the examples/loadables directory: csv.c!
On a Debian GNU/Linux based system, you may have to install the bash-builtins package with
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line;do
csv -a aVar "$line"
printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv
This approach is clearly quicker and more robust than any other combination of bash builtins or forking to an external binary.
Unfortunately, depending on your system, if your version of bash was compiled without support for loadable builtins, this may not work...
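A script can check whether the loadable is usable before relying on it. The following is only a sketch: the path /usr/lib/bash/csv is the Debian location used above and may differ on your system, and the variable names csv_loadable and have_csv_builtin are made up for illustration:

```shell
#!/bin/bash
# Assumption: the loadable lives at the Debian path used above;
# adjust csv_loadable for your system.
csv_loadable=/usr/lib/bash/csv

# enable -f returns non-zero if the file is missing or if this bash
# was built without dynamic loading.
if enable -f "$csv_loadable" csv 2>/dev/null; then
    have_csv_builtin=1
else
    have_csv_builtin=0
    echo "csv loadable not available, use another parser" >&2
fi
echo "have_csv_builtin=$have_csv_builtin"
```

What to do in the failure branch (abort, or fall back to an external CSV tool) is up to the script.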
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:
12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42
should be split as
1 12
2 22.45
3 Hello "man",
This is a good day, today!
4 A, b.
5 42
Full sample script for parsing CSV containing multiline fields
Here is a small sample file with 1 header line, 4 columns and 3 rows. Because two fields contain a newline, the file is 6 lines long.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
#!/bin/bash
enable -f /usr/lib/bash/csv csv
file="sample.csv"
exec {FD}<"$file"
# The first line holds the column names
read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"
numcols=${#headline[@]}
while read -ru $FD line;do
    # Append physical lines until the row has all its columns
    while csv -a row "$line" ; (( ${#row[@]} < numcols )) ;do
        read -ru $FD sline || break
        line+=$'\n'"$sline"
    done
    printf "$fieldfmt\\n" "${row[@]}"
done
This may render: (I've used printf "%q" to represent non-printable characters like newlines as $'\n')
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You can find a full working sample there: csvsample.sh.txt or csvsample.sh.
Note:
In this sample, I use the header line to determine the row width (number of columns). If your header line could hold newlines (or if your CSV uses more than one header line), you will have to pass the number of columns as an argument to your script (and the number of header lines).
Warning:
Of course, parsing CSV this way is not perfect! It works for many simple CSV files, but take care with encoding and security! For example, this module won't be able to handle binary fields!
Carefully read the csv.c source code comments and RFC 4180!
From the man page:
-d delim
The first character of delim is used to terminate the input line,
rather than newline.
You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
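The difference can be seen side by side in a small test. This is just an illustrative sketch; the variable names x2 and y2 are added here so that both results stay visible:

```shell
#!/bin/bash
# -d, makes read stop at the first comma: only "a" is ever read,
# so x gets "a" and y stays empty.
read -r -d, x y <<< 'a,b,'
echo "with -d:  x='$x' y='$y'"

# IFS=, keeps newline as the line terminator and splits the line on commas.
IFS=, read -r x2 y2 <<< 'a,b'
echo "with IFS: x='$x2' y='$y2'"
```

In other words, -d changes where the input line ends, while IFS changes how the line that was read is split into the variables.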
We can parse CSV files with quoted strings, delimited by, say, |, with the following code:
while read -r line
do
field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
echo "$field1 $field2"
done < "$csvFile"
awk extracts each field into a variable and tr removes the quotes.
This is slightly slower, as awk is executed for each field of every line.
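If the per-field awk calls become a bottleneck, the same result can be had with bash builtins only. This is a sketch under the same assumption as above (the delimiter never appears inside a quoted field); the function name print_two_fields is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print the first two |-delimited fields of each input
# line with double quotes removed, without starting awk or tr.
print_two_fields() {
    local field1 field2 rest
    while IFS='|' read -r field1 field2 rest; do
        field1=${field1//\"/}   # delete every double quote
        field2=${field2//\"/}
        echo "$field1 $field2"
    done
}

printf '%s\n' '"foo"|"bar"|"baz"' 'x|"y z"' | print_two_fields
```

The extra variable rest collects any remaining fields, so they do not end up glued onto field2.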
In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
read
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done
} < myfile.csv
If you want to read a CSV file while skipping its first line, this is one solution:
i=1
while IFS=, read -ra line
do
    # skip the first (header) line
    test "$i" -eq 1 && ((i=i+1)) && continue
    for col_val in "${line[@]}"
    do
        echo -n "$col_val|"
    done
    echo
done < "$csvFile"
The sample.text file:
var1=https://www.process.com
var2=https://www.hp.com
var3=http://www.google.com
:
:
varz=https://www.sample.com
I am sending this sample text file as input to a script.
The script should split the lines and assign the names and values to different parameters,
like
$varn= $var1,....$varn
$value=https://www.sample.com (all the variables' values)
I am trying the script below, but it is not working.
#!/bin/bash
for $1 in ( cat sample.txt );
do
echo $1 #var1=https://www.process.com
sed 's/=/\n/g' $1 | awk 'NR%2==0'
done
The main aim is to assign all URLs to one variable and all variable names to another, and then process the file.
If sample.text already contains your variable assignments for you, e.g.
var1=https://www.process.com
var2=https://www.hp.com
var3=http://www.google.com
and you want access to var1, var2, ... varn, then you are making things difficult on yourself by trying to read and parse sample.text instead of simply sourcing it with '.' or source.
For example, given sample.text containing:
$ cat sample.text
var1=https://www.process.com
var2=https://www.hp.com
var3=http://www.google.com
varz=https://www.sample.com
You need only source the file to access the variable, e.g.
#!/bin/bash
. sample.text || {
printf "error sourcing sample.text\n"
exit 1
}
printf "%s\n" $var{1..3} $varz
Example Use/Output
$ bash source_sample.sh
https://www.process.com
https://www.hp.com
http://www.google.com
https://www.sample.com
Look things over and let me know if you have further questions.
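Keep in mind that sourcing runs the file as shell code. If the file is not fully trusted, or if you want all names in one variable and all URLs in another as the question asks, a parsing loop is an alternative. This is only a sketch; the function name load_pairs and the arrays names and values are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: read name=value lines from stdin into two parallel
# arrays, "names" and "values", without executing the file.
load_pairs() {
    names=()
    values=()
    local name value
    while IFS='=' read -r name value; do
        # skip blank or malformed lines such as ":"
        [[ $name && $value ]] || continue
        names+=("$name")
        values+=("$value")
    done
}

load_pairs < <(printf '%s\n' 'var1=https://www.process.com' 'var2=https://www.hp.com')
printf '%s -> %s\n' "${names[0]}" "${values[0]}"
```

With a real file you would call it as load_pairs < sample.text and then loop over "${values[@]}".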
I am creating a bash script to modify and summarize information with grep and sed. But it gets stuck.
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
#Extract lines starting with ">#HWI"
ONLY=`grep -v ^\>#HWI`
#replaces A and G with R in lines
ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`
grep R $ONLYR | wc -l
The correct way to write a shell script to do what you seem to be trying to do is:
awk '
!/^>#HWI/ {
    gsub(/[AG]/,"R")
    if (/R/) {
        ++cnt
    }
}
END { print cnt+0 }
' "$@"
Just put that in the file myscript.sh and execute it as you do today.
To be clear - the bulk of the above code is an awk script, the shell script part is the first and last lines where the shell just calls awk and passes it the input file names.
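To try the counting logic without a real input file, the awk program can be wrapped in a function and fed a few lines on standard input. This is only a test sketch; the function name count_r_lines and the sample lines are invented for illustration:

```shell
#!/bin/bash
# Hypothetical wrapper around the awk program, so it can be fed test input.
# With no file arguments, awk reads standard input.
count_r_lines() {
    awk '
    !/^>#HWI/ {
        gsub(/[AG]/,"R")
        if (/R/) {
            ++cnt
        }
    }
    END { print cnt+0 }
    ' "$@"
}

# Three input lines: a header, one line with A/G, one without.
printf '%s\n' '>#HWI-EAS1' 'ACGT' 'CCTT' | count_r_lines
```

Only the second line contains an R after the substitution, so the count printed here is 1.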
If you WANT to have intermediate variables then you can create/print them with:
awk '
!/^>#HWI/ {
    only = $0
    onlyR = only
    gsub(/[AG]/,"R",onlyR)
    print "only:", only
    print "onlyR:", onlyR
    if (onlyR ~ /R/) {
        ++cnt
    }
}
END { print cnt+0 }
' "$@"
The above will work robustly, portably, and efficiently on all UNIX systems.
First of all, and as @fedorqui commented - you're not providing grep with a source of input, against which it will perform line matching.
Second, there are some problems in your script, which will result in unwanted behavior in the future, when you decide to manipulate some data:
Store matching lines in an array, or a file from which you'll later read values. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since variable names are case-sensitive, this convention avoids accidentally overriding environment and internal variables.
Here's a better version of your script, considering these points, but with an open question regarding what you were trying to do in the last line : grep R $ONLYR | wc -l :
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
input_file=$1
# Read lines not matching the provided regex, from $input_file
mapfile -t only < <(grep -v '^>#HWI' "$input_file")
#replaces A and G with R in lines
for((i=0;i<${#only[@]};i++)); do
    only[i]="${only[i]//[AG]/R}"
done
# DEBUG
printf '%s\n' "Here are the lines, after replace:"
printf '%s\n' "${only[@]}"
# I'm not sure what you were trying to do here. Am I guessing right that you wanted
# to count the number of R's in ALL lines ?
# grep R $ONLYR | wc -l
There's a getStrings() function that calls getPage() function that returns some html page. That html is piped through egrep and sed combination to get only 3 strings. Then I try to put every string into separate variable link, profile, gallery respectively using while read.. construction. But it works only inside the while...done loop because it runs in subprocess. What should I do to use those variables outside the getStrings() function?
getStrings() {
local i=2
local C=0
getPage $(getPageLink 1 $i) |
egrep *some expression that results in 3 strings* |
while read line; do
if (( (C % 3) == 0 )); then
link=$line
elif (( (C % 3) == 1 )); then
profile=$line
else
gallery=$line
fi
C=$((C+1)) #Counter
done
}
Simple: don't run the loop in a subprocess :)
To actually accomplish that, you can use process substitution.
while read line; do
...
done < <(getPage $(getPageLink 1 $i) | egrep ...)
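A self-contained way to convince yourself that the variables survive the loop is shown below. It is only a sketch: produce_lines is a made-up stand-in for the real getPage ... | egrep ... pipeline, and the data is invented:

```shell
#!/bin/bash
# Stand-in for the real "getPage ... | egrep ..." pipeline (hypothetical data).
produce_lines() {
    printf '%s\n' 'http://example.com/link' 'profile-42' 'gallery-7'
}

count=0
while read -r line; do
    last=$line
    count=$((count + 1))
done < <(produce_lines)

# The loop ran in the current shell, so both variables are still set here.
echo "count=$count last=$last"
```

The producer still runs in its own process; only the loop moves into the current shell, which is what matters for the assignments.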
For the curious, a POSIX-compatible way is to use a named pipe (and it's possible that bash uses named pipes to implement process substitution):
mkfifo pipe
getPage $(getPageLink 1 $i) | egrep ... > pipe &
while read line; do
...
done < pipe
Starting in bash 4.2, you can just set the lastpipe option, which causes the last command in a pipeline to run in the current shell, rather than a subshell.
shopt -s lastpipe
getPage $(getPageLink 1 $i) | egrep ... | while read line; do
...
done
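One caveat worth knowing: lastpipe only takes effect when job control is inactive, which is the normal situation in a script but not at an interactive prompt. A minimal sketch with invented input, assuming bash 4.2 or later:

```shell
#!/bin/bash
# lastpipe only takes effect when job control is off, which is the default
# in scripts (an interactive shell would need "set +m" first).
shopt -s lastpipe

n=0
printf '%s\n' 'one' 'two' 'three' | while read -r line; do
    n=$((n + 1))
    last=$line
done

# Without lastpipe, n would still be 0 and last would be unset here.
echo "n=$n last=$last"
```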
However, using a while loop is not the best way to set the three variables. It's easier to just call read three times within a command group, so that they all read from the same stream. In any of the three scenarios above, replace the while loop with
{ read link; read profile; read gallery; }
If you want to be a little more flexible, put the names of the variables you might want to read in an array:
fields=( link profile gallery )
then replace the while loop with this for loop instead:
for var in "${fields[@]}"; do read $var; done
This lets you easily adjust your code, should the pipeline ever return more or fewer lines, by just editing the fields array to have the appropriate field names.
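Put together with process substitution, the array-driven version might look like the sketch below. The three input lines are invented placeholders for whatever the real pipeline prints:

```shell
#!/bin/bash
fields=( link profile gallery )

# Hypothetical three-line input standing in for the real pipeline output.
# The redirection applies to the whole loop, so every read consumes the
# next line of the same stream.
for var in "${fields[@]}"; do
    read -r "$var"
done < <(printf '%s\n' 'http://example.com/u/1' 'profile-1' 'gallery-1')

echo "link=$link"
echo "profile=$profile"
echo "gallery=$gallery"
```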
One more solution, using an array:
getStrings() {
array_3=( `getPage | #some function
egrep | ...` ) #pipe conveyor
}
I'm relatively new to shell scripting and am writing a script to organize my music library. I'm using awk to parse the id3 tag info and am generating a newline separated list like so:
Kanye West
College Dropout
All Falls Down
I want to store each field in a separate variable so I can easily compose some mkdir and mv commands. I've tried piping the output to IFS=$'\n' read artist album title but each variable remains empty. I'm open to producing a different output from awk, but I still want to know how to parse a newline separated list using bash.
Edit:
It turns out that piping directly to read, like this:
id3info "$filename" | awk "$awkscript" | { read artist; read album; read title; }
WILL NOT WORK. The variables end up existing in a different scope. I found that using a here-string works best:
{ read artist; read album; read title; } <<< "$(id3info "$filename" | awk "$awkscript")"
read normally reads one line at a time. So, if your id3 info is in the file testfile.txt, you can read it in as follows:
{ read artist ; read album ; read song ; } <testfile.txt
echo "artist='$artist' album='$album' song='$song'"
# insert your mkdir and mv commands....
When run on your test file, the above outputs:
artist='Kanye West' album='College Dropout' song='All Falls Down'
You can just read the file into a bash array and loop through the array like so:
IFS=$'\r\n' content=($(cat ${filepath}))
for ((idx = 0; idx < ${#content[@]}; idx+=3)); do
artist=${content[idx]}
album=${content[idx+1]}
title=${content[idx+2]}
done
Or read three lines in a loop.
yourscript |
while read artist; do # read first line of input
read album # read second line of input
read song # read third line of input
: self-destruct if the genre is rap
done
This loop consumes input lines in groups of three. If the number of input lines is not an exact multiple of three, the remaining reads inside the loop will simply fail and their variables will be empty.
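If a trailing partial group should be ignored rather than processed with empty variables, the three reads can be chained in the loop condition. A sketch, with a made-up function name print_triples and invented input:

```shell
#!/bin/bash
# Hypothetical helper: print one line per complete artist/album/song triple.
# The loop body only runs when all three reads succeed.
print_triples() {
    local artist album song
    while read -r artist && read -r album && read -r song; do
        echo "$artist / $album / $song"
    done
}

# Four input lines: one complete triple plus one leftover line.
printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down' 'Leftover' | print_triples
```

Here the leftover fourth line is read but never printed, because the second read of that round hits end of input.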
You can read the output from awk into an array. E.g.
readarray -t array <<< "$(printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down')"
for ((i=0; i<${#array[@]}; i++ )) ; do
echo "array[$i]=${array[$i]}"
done
Produces:
array[0]=Kanye West
array[1]=College Dropout
array[2]=All Falls Down
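From there, naming the fields is a matter of plain assignments. In this sketch the printf stands in for the real id3info | awk pipeline, and the array name tags is made up for illustration:

```shell
#!/bin/bash
# Hypothetical input standing in for the id3info | awk output.
readarray -t tags < <(printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down')

artist=${tags[0]}
album=${tags[1]}
title=${tags[2]}

# Only print the command that would be run, instead of creating directories.
echo "mkdir -p \"$artist/$album\""
```

Reading through process substitution rather than a pipe keeps the array in the current shell, for the same scope reason mentioned in the question's edit.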