how to pass in a variable to awk commandline - bash

I'm having some trouble passing bash script variables into awk command-line.
Here is pseudocode:
for FILE in $INPUT_DIR/*.txt; do
filename=`echo $FILE | sed -n 's/^.*\(chr[0-9A-Z]*\).*.vcf$/\1/p'`
OUTPUT_FILE=$OUTPUT_DIR/$filename.snps.txt
egrep -v "^#" $FILE | awk '{print $2,$4,$5}' > $OUTPUT_FILE
done
The final line where I awk the columns, I would like it to be flexible or user input. For example, the user could want columns 6,7,and 8 as well, or column 133 and 138, or column 245 through 248. So how do I custom this so I can have that 'print $2 .... $5' be a user input thing? For example the user would run this script like : bash script.sh input_dir output_dir [user inputs whatever string of columns], and then I would get those columns in the output. I tried passing it in, but I guess I'm not getting the syntax right.

With awk, you should declare the variable before use it. This is better than the escape method (awk '{print $'$var'}'):
awk -v var1="$col1" -v var2="$col2" 'BEGIN {print var1,var2 }'
Where $col1 and $col2 would be the input variables.
Maybe you can try an input variable as string with "$2,$4,$5" and print this variable to get the values (I am not sure if this works)

The following test works for me:
A="\$3" ; ls -l | awk "{ print $A }"

Related

Awk add a variable to part of a column string

Objective
add "67" to column 1 of the output file with 67 being the variable ($iv) classified on the difference between 2 dates.
File1.csv
display,dc,client,20572431,5383594
display,dc,client,20589101,4932821
display,dc,client,23030494,4795549
display,dc,client,22973424,5844194
display,dc,client,21489000,4251031
display,dc,client,23150347,3123945
display,dc,client,23194965,2503875
display,dc,client,20578983,1522448
display,dc,client,22243554,920166
display,dc,client,20572149,118865
display,dc,client,23077785,28077
display,dc,client,21811100,5439
Current Output 3_file1.csv
BOB-UK-,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Desired Output 3_file1.csv
BOB-UK-67,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-67,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-67,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-67,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-67,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-67,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-67,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-67,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-67,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-67,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-67,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-67,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Current Code
#! bin/sh
set -eu
de=$(date +"%d-%m-%Y" -d "1 month ago")
ds="15-04-2014"
iv=$(awk -vdate1=$de -vdate2=$ds 'BEGIN{split(date1, A,"-");split(date2, B,"-");year_diff=A[3]-B[3];if(year_diff){months_diff=A[2] + 12 * year_diff - B[2] + 1;} else {months_diff=A[2]>B[2]?A[2]-B[2]+1:B[2]-A[2]+1};print months_diff}')
for f in $(find *.csv); do
awk -F"," -v OFS=',' '{print "BOB-UK-"$iv,$0,0.05}' $f > "1_$f.csv" ##PROBLEM LINE##
awk -F"," -v OFS=',' '{print $0,$6*$7/1000}' "1_$f.csv" > "2_$f.csv" ##calculate price
awk -F"," -v OFS=',' '{print $0}; {sum+=$6}{sum2+=$8} END {print "TOTAL,,,,," (sum)",,"(sum2)}' "2_$f.csv" > "3_$f.csv" ##calculate total
done
Issue
When I run the first awk line (Marked as "## PROBLEM LINE##") the loop doesn't change column $1 to include the "67" after "BOB-UK-". This should be done with the print "BOB-UK-"$iv but instead it doesn't do anything. I suspect this is due to the way print works in awk but I haven't been able to work out a way to treat it within this row. Does anyone know if this is possible or do I need to create a new row to achieve this?
You have to pass the variable value to awk. awk does not inherit variables from the shell and does not expand $variable variables like shell. It is another tool with it's internal language.
awk -v iv="$iv" -F"," -v OFS=',' '{print "BOB-UK-"iv,$0,0.05}' "$f"
Tested in repl with the input provided.
for f in $(find *.csv)
Is useless use of find, makes no sense, just
for f in *.csv
Also note that you are creating 1_$f.csv, 2_$f.csv and 3_$f.csv files in the current directory in your loop, so the next time you run your script there will be 4 times more .csv files to iterate through. Dunno if that's relevant.
How $iv works in awk?
The $<number> is the field number <number> from the line in awk. So for example the $1 is the first field of the line in awk. The $2 is the second field. The $0 is special and it is the whole line.
The $iv expands to $ + the value of iv. So for example:
echo a b c | awk '{iv=2; print $iv}'
will output b, as the $iv expands to $2 then $2 expands to the second field from the input - ie. b.
Uninitialized variables in awk are initialized with 0. So $iv is substituted for $0 in your awk line, so it expands for the whole line.

Store String var in awk command

I have txt file like this:
Dupont Charles
Martin Paul
Dupuis Jean
I want, for each line, to make a login corresponding to first 2 caracters of each names.
For instance : Dupont Charles ==> duch
awk '{print tolower(substr($1,1,1)substr($2,1,1))}' liste.txt
works perfectly.
but i want to store each login in var and call a bash script (uscript) with that var...
awk '{login=tolower(substr($1,1,1)substr($2,1,1));print $login; script1 $login; }' liste.txt
But it does not work and the content of login is not what I want.
The way to use variables in awk is different from Bash. In awk, a variable does not have a leading $. So what you want to say is:
awk '{login=tolower(substr($1,1,1)substr($2,1,1));print login; }' liste.txt
# ^
# intead of $login
However, you are willing to use a script script1 with this value. For this, you may want to use system() to call an external command............... or use a while block to handle all in one.
while IFS= read -r name surname
do
login=${name:0:2}${surname:0:2}
script1 "$login"
done < file
This uses the same logic to get the 3 first characters of a variable:
$ v="123456789"
$ echo ${v:0:3}
123
awk '{print tolower(substr($1,1,2)substr($2,1,2))}' liste.txt | xargs script1
e.g. using echo instead of script1 which of course I don't have:
$ awk '{print tolower(substr($1,1,2)substr($2,1,2))}' liste.txt | xargs echo
duch mapa duje
or if script1 requires 1 arg at a time:
awk '{print tolower(substr($1,1,2)substr($2,1,2))}' liste.txt | xargs -n1 script1
e.g.
$ awk '{print tolower(substr($1,1,2)substr($2,1,2))}' liste.txt | xargs -n1 echo
duch
mapa
duje

Get the part of string after the delimiter

The string format is
Executed: variable_name
What is the simplest way to get the *variable_name* sub-string?
foo="Executed: variable_name"
echo ${foo##* } # Strip everything up to and including the rightmost space.
or
set -- $foo
echo $2
Unlike other solutions using awk, sed, and whatnot, these don't fork other programs and save thousands of CPU cycles since they execute completely in your shell. They are also more portable (unlike ${i/* /} which is a bashism).
With sed:
echo "Executed: variable_name" | sed 's/[^:]*: //'
Using awk:
echo 'Executed: variable_name' | awk -F' *: *' '{print $2}'
variable_name
If you have the string in a variable:
$ i="Executed: variable_name"
$ echo ${i/* /}
variable_name
If you have the string as output of a command
$ cmd
Executed: variable_name
$ cmd | awk '{print $NF}'
variable_name
Note that 'NF' means "number of fields", so $NF is always the last field on the line. Fields are assumed to be separated by spaces (unless -F is specified)
If your variable_name could have spaces in it, the
-F' *: *'
mentioned previously ensures that only the ": " is used as a field separator. However, this will preserve spaces at the end of the line if there are any.
If the line is mixed in with other output, you might need to filter.
Either grep..
$ cmd | grep '^Executed: ' | awk '{print $NF}'
or more clever awk..
$ cmd | awk '/^Executed: /{print $NF}'

How to execute awk command in shell script

I have an awk command that extracts the 16th column from 3rd line in a csv file and prints the first 4 characters.
awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
This works fine.
But when I execute it from a shell script, I get and error
#!/bin/ksh
YEAR=awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
Error message:
-F,: not found
Use command substitution to assign the output of a command to a variable, as shown below:
YEAR=$(awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}')
you are asking the shell to do :
VAR=value command [arguments...]
which means: launch command but pass it the VAR=value environment first
(ex: LC_ALL=C grep '[0-9]*' /some/file.txt : will grep a number in file.txt (and this with the LC_ALL variable set to C just for the duration of the call of grep)
So here : you ask the shell to launch the -F"," command (ie, -F, once the shell interpret the "," into , with arguments 'NR==3.......... and with the variable YEAR set to the value awk for the duration of the command invocation.
Just replace it with :
#!/bin/ksh
YEAR="$(awk -F',' 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,1,4)}')"
(I didn't try it, but I hope they work for you and your sample.csv file)
(Note that you use "0" to match character position 1, which works in many awk implementation but not all (ie most (but not all) assume 1 when you write 0))
From your description, it looks like you want to extract the year from the 16th field, which might contain leading spaces. You can accomplish it by calling AWK once:
YEAR=$(awk -F, 'NR==3{sub(/^[ \t]*/, "", $16); print ">" substr($16,1,4) "<" }')
Better yet, you don't even have to use awk. Since you are already writing shell script, let's do it all in shell script:
{ read line; read line; read line; } < sample.csv # Get the third line
IFS=, set $line # Breaks line into comma-separated fields
IFS=" " set ${16} # Trick to remove leading spaces, field 16 becomes field 1
YEAR=${1:0:4} # Extract the first 4 char from field 1
Do this:
year=$(awk -F, 'NR==3{sub(/^[ \t]+/,"",$16); print substr($16,1,4); exit }' sample.csv)

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns and I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo #userId,itemId > $TARGET
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace with
a comma-separated list of the numbered columns you do want.
awk 'BEGIN { IFS='::'; OFS=","; } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$

Resources