Bash, awk, two arguments for one column

I need to match two values with one awk command against one column.
My script, named todo:
#!/bin/bash
folder_main="$( cd $( dirname "${BASH_SOURCE[0]}" ) >/dev/null 2>&1 && pwd )"
if [ $1 = 'e' ]; then
mcedit $folder_main/kb/todo.kb
else
awk -F ',' '$1=="'$1'"' $folder_main/kb/todo.kb
fi
My expectation is that when I run todo i, it will print the lines whose first comma-separated column is i OR c.
I tried this.
awk -F ',' '$1=="{c|'$1'}"' $folder_main/kb/todo.kb
But it prints nothing.
Thanks.

You should pass your shell variable to awk using -v and fix your awk syntax:
awk -F, -v a="$1" '$1 == "c" || $1 == a' "$folder_main/kb/todo.kb"
This sets the awk variable a to the value of the shell positional parameter $1, and prints the line if the first column is either "c" or whatever you passed as the first argument to the script.
You could also shorten the line slightly by using a regular expression match instead of two ==:
awk -F, -v a="$1" '$1 ~ "^(c|"a")$"' "$folder_main/kb/todo.kb"
Personally, I think the first option is easier to read, though. It is also safer to use, as a character with special meaning inside a regular expression (such as *, [, ( or {) could cause the script to either fail or behave in an unexpected way.
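To make that pitfall concrete, here is a minimal sketch with made-up data (the file path and tags are invented, not from the question's real todo.kb):

```shell
# Made-up sample data in the tag,task format:
printf '%s\n' 'i,buy milk' 'c,call bank' 'i.,oddly tagged' > /tmp/demo-todo.kb

# String equality: only the exact tags "c" and "i" match.
awk -F, -v a="i" '$1 == "c" || $1 == a' /tmp/demo-todo.kb

# Regex match: if the argument were "i.", the "." would match any
# character, so the tag "i." matches but the plain tag "i" does not.
awk -F, -v a="i." '$1 ~ "^(c|"a")$"' /tmp/demo-todo.kb
```

With the equality form, "i." would only ever match a column that is literally "i."; with the regex form it also matches "ix", "iy", and so on.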

You can't use shell variables directly in awk like this. Instead you pass them into your awk script using the -v flag:
awk -F ',' -v searchterm="$1" '$1==searchterm' "$folder_main/kb/todo.kb"

Related

Search the first and second variable (with spaces and special character like $) using awk

I have a dataset where I need to search for two variables in it. Both vars should be present; otherwise, ignore the lines.
inputfile.txt:
IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01001B.brz.com Z$
IFRA-SCN-01001B.brz.com Pre-code$
IFRA-SCN-01001B.brz.com Technical Stuff
IFRA-SCN-01001B.brz.com expired$
IFRA-SCN-01001B.brz.com AA$
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01002B.brz.com Build Docs
BigFile.txt:
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn#brz.com
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn#brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\DOC TRIGGER,Peter.Salez#brz.com
\\IFRA-SCN-01001B.brz.com\bitshare,Peter.Salez#brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\PFM FRAUD,Peter.Salez#brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
It works if I use the actual strings, but not if they are assigned to variables.
[root#brzmgmt]$ awk '/Build Docs/{ok=1;s=NR}ok && NR<=s+2 && /IFRA-SCN-01002B.brz.com/{print $0}' BigFile.txt
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
Any idea what I am missing?
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 -v b=$var2 '/$b/{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
I see a number of problems here. First, where you split the fields from inputfile.txt with
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
When the line is something like "IFRA-SCN-01002B.brz.com Build Docs", var1 will be set correctly, but var2 will only get "Build", not "Build Docs". I assume you want the latter? If so, I'd let read do the splitting for you:
while read -r var1 var2
...which will automatically include any "extra fields" (e.g. "Docs") in the last variable. If you don't want the full remainder of the line, just add an extra variable to hold anything beyond the second field:
while read -r var1 var2 ignoredstuff
See BashFAQ #1: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
As for the awk commands, the first one doesn't work because the shell doesn't expand variables inside single-quotes. You could switch to double-quotes, but then you'd have to escape $0 to keep the shell from expanding that, and you'd also have to worry about the search strings possibly including awk syntax, and it's generally a mess. The second method, with -v, is a lot better, but you still have to fix a couple of things.
In the -v a=$var1 b=$var2 part, you should double-quote the variables so the shell doesn't split them if they contain spaces (like "Build Docs"): -v a="$var1" -v b="$var2". You should pretty much always double-quote variable references to prevent problems like this.
Also, the way you use those a and b variables in the awk command isn't right. $ in awk doesn't mean "substitute a variable" like it does in the shell; it generally means "get a field by number" (e.g. $2 gets the second field, and $(x+2) gets the x-plus-second field). Also, in a /.../ pattern, variables (and field references) don't get substituted anyway. So what you probably want instead of /$a/ and /$b/ is $0~a and $0~b (note that ~ is awk's regex match operator).
So the command should be something like this:
awk -v a="$var1" -v b="$var2" '$0~b{ok=1;s=NR}ok && NR<=s+2 && $0~a{print $0}' BigFile.txt
Except... you might not want that, because it treats the strings as regular expressions rather than plain strings. So the . characters in "IFRA-SCN-01001B.brz.com" will match any single character, and the $ in "Pre-code$" will be treated as an end-of-string anchor rather than a literal character. If you just want them matched as literal strings, use e.g. index($0,b) instead:
awk -v a="$var1" -v b="$var2" 'index($0,b){ok=1;s=NR}ok && NR<=s+2 && index($0,a){print $0}' BigFile.txt
I'd also recommend running your scripts through shellcheck.net to catch common mistakes and bad practices.
Finally, I have to ask what's up with all the ok and s stuff. That looks like it's going to insert some weird inter-record dependencies that don't make any sense. Also, if the fields are always going to be in that same order, would a grep search be simpler?
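To make the regex-vs-literal point above concrete, here's a small sketch with an invented line (the server name and address are made up, not from the question's real files):

```shell
line='\\SERVER.example.com\Pre-code$,someone#example.com'

# As a regex, the "$" in "Pre-code$" is an end-of-string anchor, so this
# prints nothing: the line does not end with "Pre-code".
printf '%s\n' "$line" | awk -v b='Pre-code$' '$0 ~ b'

# As a literal substring via index(), the "$" is just another character,
# so this prints the line (index() returns 0 only when absent).
printf '%s\n' "$line" | awk -v b='Pre-code$' 'index($0, b)'
```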
Your code is:
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
I won't address the logic of the code (eg. you should probably be using match($0,"...") instead of /.../, and I don't know what the test NR<=s+2 is for) but here are some syntax and efficiency issues:
You appear to want to read a line of whitespace-delimited text into two variables. This is more simply done with just: read -r var1 var2 or read -r var1 var2 junk
print is not a standard shell command. Perhaps this is meant to be an awk script (awk '{print $1}', etc)? But just use simple read instead.
Single-quotes prevent variable expansion so, inside the script argument passed to awk, /$var/ will literally look for dollar, v, a, r. Pass variables using awk's -v option as you do in the second awk line.
Each variable passed to awk needs a separate -v option.
awk does not use $name to reference variable values, simply name. To use a variable as a regex, just use it in the right place: eg. $0 ~ name.
So:
while read -r var1 var2 junk; do
# quote variables to prevent globbing, word-splitting, etc
awk -v var1="$var1" -v var2="$var2" '
$0 ~ var2 { ok=1; s=NR }
ok && NR<=s+2 && $0 ~ var1 ; # print is default action
' BigFile.txt
done <inputfile.txt
Note that the more var1/var2 pairs you want to check, the longer the runtime (O(mn): m pairs of var1/var2 and n lines of input to check). There may be more efficient algorithms if the problem is better specified.

Bash scripting | awk with variables

Works:
repquota $HOME | awk "{if(\$3 > $MIN && \$3 < $MAX )print}"
But if I try to capture this in a variable, it doesn't work:
VARIABLE=`repquota $FULL_HOME | awk "{if(\$3 > $MIN && \$3 < $MAX )print}"`
awk: {if( > 1572864 && < 302118056)print}
awk: ^ syntax error
Your bash syntax is way off. You're not quoting variables, wrongly quoting an awk script, and using deprecated backticks. What you seem to be trying to do would be:
VARIABLE=$(repquota "$FULL_HOME" | awk -v min="$MIN" -v max="$MAX" '($3>min) && ($3<max)')
but since you didn't provide any sample input and expected output it's an untested guess and it's always hard to tell what you DO want from reading a script that doesn't do what you want.
Use the new command substitution syntax $(command):
VARIABLE=$(repquota $FULL_HOME | awk "{if(\$3 > $MIN && \$3 < $MAX )print}")
Explanation
From man bash:
When the old-style backquote form of substitution is used, backslash
retains its literal meaning except when followed by $, `, or \. The
first backquote not preceded by a backslash terminates the command sub‐
stitution. When using the $(command) form, all characters between the
parentheses make up the command; none are treated specially.
Within backquotes, the backslash in \$var is consumed by the command-substitution processing, so the double-quoted string contains a plain $var and the shell substitutes its value; awk therefore does not see $3, as you expected it to.
You can see it with these commands:
var="I am a test string"
echo `echo "\$var"` # output: I am a test string
echo $(echo "\$var") # output: $var
Edit: As Ed Morton comments, you should not pass awk variables from shell that way, instead use the -v switch of awk:
VARIABLE=$(repquota $FULL_HOME | awk -v min="$MIN" -v max="$MAX" '{if($3 > min && $3 < max )print}')
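Since real repquota output isn't available here, a sketch with a fake stand-in shows the -v version in action (fake_quota, its field layout, and the user names are all invented for illustration):

```shell
# Invented stand-in for repquota output; assume field 3 is block usage.
fake_quota() {
    printf '%s\n' \
        'alice -- 2000000 0 0' \
        'bob -- 100 0 0' \
        'carol -- 500000000 0 0'
}

MIN=1572864
MAX=302118056

# Keep only lines whose third field lies strictly between MIN and MAX.
VARIABLE=$(fake_quota | awk -v min="$MIN" -v max="$MAX" '($3 > min) && ($3 < max)')
printf '%s\n' "$VARIABLE"
```

Only alice's line survives: 100 is below MIN and 500000000 is above MAX.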

Set variable in current shell from awk

Is there a way to set a variable in my current shell from within awk?
I'd like to do some processing on a file and print out some data; since I'll read the whole file through, I'd like to save the number of lines -- in this case, FNR.
However, I can't seem to find a way to set a shell variable to the FNR value; failing that, I'd have to read FNR from my output file to set, say, num_lines.
I've tried some combinations using awk 'END{system(...)}', but could not manage it to work. Any way around this?
Here's another way.
This is especially useful when you've got the values of your variables in a single variable and you want to split them up. For example, you have a list of values from a single row in a database that you want to create variables out of.
val="hello|beautiful|world" # assume this string comes from a database query
read a b c <<< "$( echo "${val}" | awk -F"|" '{print $1" "$2" "$3}' )"
echo $a #hello
echo $b #beautiful
echo $c #world
We need the 'here string', i.e. <<<, in this case because if read took its input from a pipe it would run in a subshell, and the variables it sets would not survive in the current shell.
$ echo "$var"
$ declare $( awk 'BEGIN{print "var=17"}' )
$ echo "$var"
17
Here's why you should use declare instead of eval:
$ eval $( awk 'BEGIN{print "echo \"removing all of your files, ha ha ha....\""}' )
removing all of your files, ha ha ha....
$ declare $( awk 'BEGIN{print "echo \"removing all of your files\""}' )
bash: declare: `"removing': not a valid identifier
bash: declare: `files"': not a valid identifier
Note in the first case that eval executes whatever string awk prints, which could accidentally be a very bad thing!
You can't export variables from a subshell to its parent shell. You have some other choices, though, including:
Make another pass of the file using AWK to count records, and use command substitution to capture the result. For example:
FNR=$(awk 'END {print FNR}' filename)
Print FNR in the subshell, and parse the output in your other process.
If FNR is the same as number of lines, you can call wc -l < filename to get your count.
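Putting the command-substitution option together, a minimal sketch (the file name and contents are invented):

```shell
printf '%s\n' 'one' 'two' 'three' > /tmp/demo-lines.txt

# Run awk once just to count records, capturing the result in the shell:
num_lines=$(awk 'END {print FNR}' /tmp/demo-lines.txt)
echo "$num_lines"    # 3

# Equivalent here, since FNR at END equals the line count:
wc -l < /tmp/demo-lines.txt
```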
A warning for anyone trying to use declare as suggested by several answers.
eval does not have this problem.
If the awk (or other expression) provided to declare results in an empty string then declare will dump the current environment.
This is almost certainly not what you would want.
e.g. if your awk pattern doesn't match anything in the input, you will never print any output, and you will end up with unexpected behaviour.
An example of this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {print "var=17"}' )
echo "var=$var"
var=99
The current environment as seen by declare is printed
and $var is not changed
A minor change to store the value to set in an awk variable and print it at the end solves this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=
This time $var is unset ie: set to the null string var=''
and there is no unwanted output.
To show this working with a matching pattern
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=17
This time $var is set to 17 as intended,
and there is no unwanted output.
Make awk print out the assignment statement:
MYVAR=NewValue
Then in your shell script, eval the output of your awk script:
eval $(awk ....)
# then use $MYVAR
EDIT: people recommend using declare instead of eval, to be slightly less error-prone if something other than the assignment is printed by the inner script. It's bash-only, but it's okay when the shell is bash and the script has #!/bin/bash, correctly stating this dependency.
The eval $(...) variant is widely used, with existing programs generating output suitable for eval but not for declare (lesspipe is an example); that's why it's important to understand it, and the bash-only variant is "too localized".
To synthesize everything here so far I'll share what I find is useful to set a shell environment variable from a script that reads a one-line file using awk. Obviously a /pattern/ could be used instead of NR==1 to find the needed variable.
# export a variable from a script (such as in a .dotfile)
declare $( awk 'NR==1 {tmp=$1} END {print "SHELL_VAR=" tmp}' /path/to/file )
export SHELL_VAR
This will avoid a massive output of variables if a declare command is issued with no argument, as well as the security risks of a blind eval.
echo "First arg: $1"
for ((i=0 ; i < $1 ; i++)); do
echo "inside"
echo "Welcome $i times."
cat man.xml | awk '{ x[NR] = $0 } END { for ( i=2 ; i<=NR ; i++ ) { if (x[i] ~ // ) {x[i+1]=" '$i'"}print x[i] }} ' > $i.xml
done
echo "completed"

Passing bash input variables to awk

Trying to pass a variable into awk from user input:
I have tried variations of awk -v, with errors stating 'awk: invalid -v option', even though the option is listed in the man pages.
#! /bin/bash
read -p "Enter ClassID:" CLASS
read -p "Enter FacultyName:" FACULTY
awk '/FacultyName/ {print}' data-new.csv > $FACULTY.csv
awk -vclass=${CLASS} '/class/ {print}' data-new.csv >> $FACULTY.csv
echo Class is $CLASS
echo Faculty member is $FACULTY
Some versions of awk require a space between the -v and the variable assignment. Also, you should put the bash variable in double-quotes to prevent unwanted parsing by the shell (e.g. word splitting, wildcard expansion) before it's passed to awk. Finally, in awk /.../ is a constant regular expression (i.e. /class/ will search for the string "class", not the value of the variable "class"). With all of this corrected, here's the awk command that I think will do what you want:
awk -v class="${CLASS}" '$0 ~ class {print}' data-new.csv >> $FACULTY.csv
Now, is there any reason you're using this instead of:
grep "$CLASS" data-new.csv >> $FACULTY.csv
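A minimal sketch of the /class/ vs $0 ~ class difference, with invented CSV data (the class ID and file name are made up):

```shell
printf '%s\n' 'ec123,alice' 'ec999,bob' > /tmp/demo-classes.csv
CLASS=ec123

# /class/ searches for the literal text "class" -- nothing here matches:
awk -v class="$CLASS" '/class/ {print}' /tmp/demo-classes.csv

# $0 ~ class matches against the awk variable's value:
awk -v class="$CLASS" '$0 ~ class {print}' /tmp/demo-classes.csv    # ec123,alice
```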
Your script is not clear to me, but these all work:
CLASS=ec123
echo | awk -vclass=$CLASS '{print class}'
echo | awk -vclass=${CLASS} '{print class}'

Calling Awk in a shell script

I have this command which executes correctly if run directly on the terminal.
awk '/word/ {print NR}' file.txt | head -n 1
The purpose is to find the line number of the line on which the word 'word' first appears in file.txt.
But when I put it in a script file, it doesn't seem to work.
#! /bin/sh
if [ $# -ne 2 ]
then
echo "Usage: $0 <word> <filename>"
exit 1
fi
awk '/$1/ {print NR}' $2 | head -n 1
So what did I do wrong?
Thanks,
Replace the single quotes with double quotes so that the $1 is evaluated by the shell:
awk "/$1/ {print NR}" $2 | head -n 1
In the shell, single-quotes prevent parameter-substitution; so if your script is invoked like this:
script.sh word
then you want to run this AWK program:
/word/ {print NR}
but you're actually running this one:
/$1/ {print NR}
and needless to say, AWK has no idea what $1 is supposed to be.
To fix this, change your single-quotes to double-quotes:
awk "/$1/ {print NR}" $2 | head -n 1
so that the shell will substitute word for $1.
You should use AWK's variable passing feature:
awk -v patt="$1" '$0 ~ patt {print NR; exit}' "$2"
The exit makes the head -1 unnecessary.
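A quick sketch of the exit form with made-up input (the file contents and search word are invented):

```shell
printf '%s\n' 'alpha' 'a word here' 'word again' > /tmp/demo-file.txt

# Print the line number of the first match, then stop reading the file:
awk -v patt='word' '$0 ~ patt {print NR; exit}' /tmp/demo-file.txt    # 2
```

Besides dropping the head -n 1, exit means awk stops reading as soon as it finds the match, which matters on large files.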
you could also pass the value as a variable to awk:
awk -v varA="$1" '{if (match($0, varA) > 0) {print NR}}' "$2" | head -n 1
Seems more cumbersome than the above, but illustrates passing vars.
