Read value from csv - bash

For a csv file appearing as:
variables = cl, cd, clp, clv, cdp, cdv, ...
-0.00000002, 0.01023266, -0.00000002, 0.00000000, 0.00985099, 0.00038167, ...
-0.00000000, 0.01023305, -0.00000000, 0.00000000, 0.00985080, 0.00038225,
-0.00000002, 0.01023390, -0.00000002, 0.00000000, 0.00985075, 0.00038315,
0.00000002, 0.01023482, 0.00000002, 0.00000000, 0.00985070, 0.00038412,
-0.00000004, 0.01023574, -0.00000004, 0.00000000, 0.00985065, 0.00038509,
...
I have a short script to read values from a csv file, but as it is, it returns the entire column. I need it to assign a single value to each variable.
export IFS=","
cat file.csv | while read a b c d e f; do echo "$b"; done
This returns:
cd
0.01023266
0.01023305
0.01023390
0.01023482
0.01023574
How do I make this command return only:
0.01023482
Edit:
The following line is supposed to return a single value from my file:
row=4
col=2
str=$(awk 'BEGIN{FS=OFS=","} FNR==$row' history.out | awk '{print substr($col, 1, length($col)-1)}')
echo $str
It works when I hard-code $4, but not with $row; as it is, $str comes back empty. What's the final fix?
Thanks in advance.

EDIT: Since OP has now shown samples, adding a solution tailored to them.
row=4
col=2
awk -v R="$row" -v C="$col" 'BEGIN{FS=","} FNR==R{print $C}' Input_file
Change FS="," to FS=", " in case you have space in your comma delimiters too.
This could be easily done in awk.
awk 'BEGIN{FS=OFS=","} FNR==4' Input_file

Related

How can one dynamically create a new csv from selected columns of another csv file?

I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas.
I save these indexes to an array that looks like
echo "${cols_needed[#]}"
1 3 4 7 8
I then need to write these columns to a new file. I've tried the following cut and awk commands; however, as the array is dynamically created, I can't seem to find the right command to select them all at once. I have tried cut, awk, and paste.
awk -v fields=${cols_needed[@]} 'BEGIN{ n = split(fields,f) }
{ for (i=1; i<=n; ++i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv
This throws an error as it cannot split the fields unless I hard-code them (and even then it can only do two), split on spaces:
fields="1 2"
I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so
for item in "${cols_needed[#]}";
do
cat test.csv | cut -f$item
done
which outputs one column at a time.
And I have tried to dynamically create it with commas - input as 1,3,4,7...
cat test.csv | cut -f${cols_needed[@]};
which also does not work!
Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.
Your first approach is ok, just:
change -v fields=${cols_needed[@]} to -v fields="${cols_needed[*]}", to pass the array as a single shell word
add FS=OFS="," to BEGIN, after splitting (you want to split on spaces, before FS is changed to ,)
i.e. BEGIN {n = split(fields, f); FS=OFS=","}
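Putting both fixes together, the corrected command would look like this (a sketch, assuming the cols_needed array and test.csv from the question):
awk -v fields="${cols_needed[*]}" '
BEGIN { n = split(fields, f); FS=OFS="," }    # split on whitespace first, then switch FS to ,
{ for (i=1; i<=n; ++i) printf "%s%s", $(f[i]), (i<n?OFS:ORS) }
' test.csv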
Also, if there are no commas embedded in quoted csv fields, you can use cut:
IFS=,; cut -d, -f "${cols_needed[*]}" test.csv
If there are embedded commas, you can use gawk's FPAT, to only split fields on unquoted commas.
Here's an example using that.
# prepend $ to each number
for i in "${cols_needed[#]}"; do
fields[j++]="\$$i"
done
IFS=,
gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, "{print ${fields[*]}}" test.csv
Injecting shell code into an awk command is generally not great practice, but it's OK here IMO.
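If you would rather avoid the injection altogether, the same FPAT splitting can be combined with the -v array-passing approach (a sketch under the same assumptions as above):
gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, -v fields="${cols_needed[*]}" '
BEGIN { n = split(fields, f, " ") }
{ for (i=1; i<=n; i++) printf "%s%s", $(f[i]), (i<n ? OFS : ORS) }
' test.csv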
Expanding on my comments re: passing the bash array into awk:
Passing the array in as an awk variable:
$ cols_needed=(1 3 4 7 8)
$ typeset -p cols_needed
declare -a cols_needed=([0]="1" [1]="3" [2]="4" [3]="7" [4]="8")
$ awk -v fields="${cols_needed[*]}" 'BEGIN{n=split(fields,f); for (i=1;i<=n;i++) print i,f[i]}'
1 1
2 3
3 4
4 7
5 8
Passing the array in as a 'file' via process substitution:
$ awk 'FNR==NR{f[++n]=$1;next} END {for (i=1;i<=n;i++) print i,f[i]}' <(printf "%s\n" "${cols_needed[@]}")
1 1
2 3
3 4
4 7
5 8
As for OP's main question of extracting a specific set of columns from a .csv file ...
Borrowing dawg's .csv file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
Expanding on the suggestion for passing the bash array in as an awk variable:
awk -v fields="${cols_needed[*]}" '
BEGIN { FS=OFS=","
n=split(fields,f," ")
}
{ pfx=""
for (i=1;i<=n;i++) {
printf "%s%s", pfx, $(f[i])
pfx=OFS
}
print ""
}
' file.csv
NOTE: this assumes OP has provided a valid list of column numbers. If there's some doubt as to the validity of the input (column) numbers, OP can add some logic to address said doubts (e.g., are they integers? are they positive integers? do they reference a field that actually exists in file.csv? etc.); see the sketch after the sample output below.
This generates:
1,3,4,7,8
11,13,14,17,18
21,23,24,27,28
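Picking up the NOTE above, a minimal sanity check could be spliced into the BEGIN block; this sketch only verifies that the numbers are positive integers (a per-record range check such as f[i] <= NF would have to go in the main block):
BEGIN { FS=OFS=","
        n=split(fields,f," ")
        for (i=1;i<=n;i++)
            if (f[i] !~ /^[1-9][0-9]*$/) {    # reject anything that is not a positive integer
                print "invalid column number: " f[i] > "/dev/stderr"
                exit 1
            }
}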
Suppose you have this variable in bash:
$ echo "${cols_needed[#]}"
3 4 7 8
And this CSV file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
You can select columns of that csv file in awk this way:
awk '
BEGIN{FS=OFS=","}
FNR==NR{split($0, cols," "); next}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' <(echo "${cols_needed[@]}") file.csv
Prints:
3,4,7,8
13,14,17,18
23,24,27,28
Or, you can do:
awk -v cw="${cols_needed[*]}" '
BEGIN{FS=OFS=","; split(cw, cols," ")}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' file.csv
# same output
BTW, you can do this entirely with cut:
cut -d ',' -f $(IFS=, ; echo "${cols_needed[*]}") file.csv
3,4,7,8
13,14,17,18
23,24,27,28

Awk add a variable to part of a column string

Objective
add "67" to column 1 of the output file with 67 being the variable ($iv) classified on the difference between 2 dates.
File1.csv
display,dc,client,20572431,5383594
display,dc,client,20589101,4932821
display,dc,client,23030494,4795549
display,dc,client,22973424,5844194
display,dc,client,21489000,4251031
display,dc,client,23150347,3123945
display,dc,client,23194965,2503875
display,dc,client,20578983,1522448
display,dc,client,22243554,920166
display,dc,client,20572149,118865
display,dc,client,23077785,28077
display,dc,client,21811100,5439
Current Output 3_file1.csv
BOB-UK-,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Desired Output 3_file1.csv
BOB-UK-67,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-67,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-67,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-67,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-67,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-67,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-67,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-67,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-67,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-67,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-67,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-67,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Current Code
#!/bin/sh
set -eu
de=$(date +"%d-%m-%Y" -d "1 month ago")
ds="15-04-2014"
iv=$(awk -vdate1=$de -vdate2=$ds 'BEGIN{split(date1, A,"-");split(date2, B,"-");year_diff=A[3]-B[3];if(year_diff){months_diff=A[2] + 12 * year_diff - B[2] + 1;} else {months_diff=A[2]>B[2]?A[2]-B[2]+1:B[2]-A[2]+1};print months_diff}')
for f in $(find *.csv); do
awk -F"," -v OFS=',' '{print "BOB-UK-"$iv,$0,0.05}' $f > "1_$f.csv" ##PROBLEM LINE##
awk -F"," -v OFS=',' '{print $0,$6*$7/1000}' "1_$f.csv" > "2_$f.csv" ##calculate price
awk -F"," -v OFS=',' '{print $0}; {sum+=$6}{sum2+=$8} END {print "TOTAL,,,,," (sum)",,"(sum2)}' "2_$f.csv" > "3_$f.csv" ##calculate total
done
Issue
When I run the first awk line (Marked as "## PROBLEM LINE##") the loop doesn't change column $1 to include the "67" after "BOB-UK-". This should be done with the print "BOB-UK-"$iv but instead it doesn't do anything. I suspect this is due to the way print works in awk but I haven't been able to work out a way to treat it within this row. Does anyone know if this is possible or do I need to create a new row to achieve this?
You have to pass the variable value to awk. awk does not inherit variables from the shell and does not expand $variable references the way the shell does. It is a separate tool with its own internal language.
awk -v iv="$iv" -F"," -v OFS=',' '{print "BOB-UK-"iv,$0,0.05}' "$f"
Tested in repl with the input provided.
for f in $(find *.csv)
is a useless use of find and makes no sense; just use
for f in *.csv
Also note that you are creating 1_$f.csv, 2_$f.csv and 3_$f.csv files in the current directory in your loop, so the next time you run your script there will be 4 times more .csv files to iterate through. Dunno if that's relevant.
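As an aside, the three awk passes (and the intermediate 1_/2_ files) could be collapsed into a single pass. A sketch, assuming the same column layout as File1.csv (iv=67 stands in for the month difference computed earlier in the script):
iv=67
for f in *.csv; do
awk -v iv="$iv" -F',' -v OFS=',' '
{ price = $5 * 0.05 / 1000                 # $5 is the count column of the source file
  sum += $5; sum2 += price
  print "BOB-UK-" iv, $0, 0.05, price }
END { print "TOTAL,,,,," sum ",," sum2 }
' "$f" > "3_$f"
done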
How does $iv work in awk?
In awk, $<number> is field number <number> of the current line. So, for example, $1 is the first field of the line and $2 is the second. $0 is special: it is the whole line.
The $iv expands to $ + the value of iv. So for example:
echo a b c | awk '{iv=2; print $iv}'
will output b, as $iv expands to $2, and $2 expands to the second field of the input, i.e. b.
Uninitialized variables in awk evaluate to 0 (or the empty string). So $iv is substituted with $0 in your awk line, which expands to the whole line.

Replace a field with a value in the input data

I have a data
A BC 3 CD
note that the spaces in between the fields are not constant
Now I want to replace the third field with another number which is stored in another variable v.
I have used awk in this way:
echo "A BC 3 CD" | awk '{$3 = $v; print}'
The output is wrong: the third field gets replaced with the entire line.
Is there any possible way to get the desired output without changing the spacing in the original data?
Thanks for your help!!
Try this:
$ v=25
$ echo "A BC 3 CD" | gawk '{print gensub(/[^ \t]+/, v, 3)}' v="$v"
A BC 25 CD
In your code, $v is evaluated by awk, not bash, and v is 0 there. Hence $3 gets replaced by $0, which is the entire line.
Note that gensub is a gawk extension.
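If gawk is not available, a portable sketch using plain awk's match() and substr() can achieve the same result while preserving the original spacing (the loop bound 3 here is the hard-coded field number, an assumption for illustration):
v=25
echo "A BC 3 CD" | awk -v v="$v" '{
    s = $0; out = ""
    for (i = 1; i < 3; i++) {                # copy everything up to and including field 2
        match(s, /[^ \t]+/)
        out = out substr(s, 1, RSTART + RLENGTH - 1)
        s = substr(s, RSTART + RLENGTH)
    }
    match(s, /[^ \t]+/)                       # locate field 3 and splice in the new value
    print out substr(s, 1, RSTART - 1) v substr(s, RSTART + RLENGTH)
}'
This prints: A BC 25 CD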

Find and update (append) csv with shell script

Input file:
ID,Name,Values
1,A,vA|A2
2,B,VB
Expected output:
1,A,vA|A2|vA3
2,B,VB
Search the file for a given ID and then append a given value to the Values field.
Use case: append 'testvalue' to the Values field of ID = 1.
The problem is: how to cache the line found?
sed's s command can be used for substitution; I tried sed's p (print), but to no avail.
Just set n to ID of the row you want to update and x to the value:
# vA3 to entry with ID==1
$ awk -F, '$1==n{$0=$0"|"x}1' n=1 x="vA3" file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
# TEST_VALUE to entry with ID==2
$ awk -F, '$1==x{$0=$0"|"v}1' x=2 v="TEST_VALUE" file
ID,Name,Values
1,A,vA|A2
2,B,VB|TEST_VALUE
Explanation:
-F, sets the field separator to be a comma.
$1==x checks whether the line we are looking at contains the ID we want to change, where $1 is the first field of the line and x is the variable we defined.
If that condition is true, the following block gets executed: {$0=$0"|"v}, where $0 is the variable containing the whole line, so we are just appending the string "|" and the value of the variable v to the end of the line.
The trailing 1 is just an awk shortcut meaning print the line: 1 is a condition that always evaluates to true, and since no block is provided, awk executes the default block {print $0}. Written out explicitly, the first script would be awk -F, '$1==n{$0=$0"|"x}{print $0}' n=1 x="vA3" file.
The following script does something similar to what you need. It is pure bash.
#!/usr/bin/bash
[ $# -ne 2 ] && echo "Arg missing" && exit 1
while read -r l; do
[ "${l%%,*}" == "$1" ] && l="$l|$2"
echo "$l"
done <infile
Use it as script <ID> <VALUE>. Example:
$ ./script 1 va3
ID,Name,Values
1,A,vA|A2|va3
2,B,VB
$ cat infile
ID,Name,Values
1,A,vA|A2
2,B,VB
Or maybe this?
awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
$ echo "1,A,vA
2,B,VB" | awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
1,A,vA|VA2
2,B,VB
Edit 1: gawk has recently gained support for in-place editing. But for your requirement it is best to go with the sed solution that Kent posted above.
$ cat file
ID,Name,Values
1,A,vA|A2
2,B,VB
$ awk '$1==1 { $NF=$NF"|vA3" } 1' FS=, OFS=, file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
Are you looking for this?
kent$ echo "1,A,vA
2,B,VB"|sed '/vA/s/$/|VA2/'
1,A,vA|VA2
2,B,VB
EDIT: check the ID, then replace:
kent$ echo "ID,Name,Values
1,A,vA|A2
2,B,VB"|sed 's/^1,.*/&|vA3/'
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
The & stands for the matched part; that would be what you meant by "cache".
sed ' 1 a\ |VA2|vA3 ' file1.txt

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns and I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo "#userId,itemId" > $TARGET
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you want.
awk 'BEGIN { FS="::"; OFS=","; } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
