Search for the first and second variable (with spaces and special characters like $) using awk - bash

I have a dataset where I need to search for two variables in it. Both variables must be present; otherwise the line should be ignored.
inputfile.txt:
IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01001B.brz.com Z$
IFRA-SCN-01001B.brz.com Pre-code$
IFRA-SCN-01001B.brz.com Technical Stuff
IFRA-SCN-01001B.brz.com expired$
IFRA-SCN-01001B.brz.com AA$
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01002B.brz.com Build Docs
BigFile.txt:
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn#brz.com
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn#brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\DOC TRIGGER,Peter.Salez#brz.com
\\IFRA-SCN-01001B.brz.com\bitshare,Peter.Salez#brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\PFM FRAUD,Peter.Salez#brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
It works if I use the actual string, but not if the string is assigned to a variable.
[root#brzmgmt]$ awk '/Build Docs/{ok=1;s=NR}ok && NR<=s+2 && /IFRA-SCN-01002B.brz.com/{print $0}' BigFile.txt
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
Any idea what I am missing?
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 -v b=$var2 '/$b/{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING

I see a number of problems here. First, where you split the fields from inputfile.txt with
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
When the line is something like "IFRA-SCN-01002B.brz.com Build Docs", var1 will be set correctly, but var2 will only get "Build", not "Build Docs". I assume you want the latter? If so, I'd let read do the splitting for you:
while read -r var1 var2
...which will automatically include any "extra fields" (e.g. "Docs") in the last variable. If you don't want the full remainder of the line, just add an extra variable to hold anything beyond the second field:
while read -r var1 var2 ignoredstuff
See BashFAQ #1: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
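For example, fed the sample line from inputfile.txt, read splits on whitespace and the last named variable absorbs the rest of the line:

```shell
# read splits on IFS (whitespace by default); the last variable
# keeps everything that's left, including embedded spaces.
while read -r var1 var2; do
    printf 'var1=%s var2=%s\n' "$var1" "$var2"
done <<'EOF'
IFRA-SCN-01002B.brz.com Build Docs
EOF
# prints: var1=IFRA-SCN-01002B.brz.com var2=Build Docs
```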
As for the awk commands, the first one doesn't work because the shell doesn't expand variables inside single-quotes. You could switch to double-quotes, but then you'd have to escape $0 to keep the shell from expanding that, and you'd also have to worry about the search strings possibly including awk syntax, and it's generally a mess. The second method, with -v, is a lot better, but you still have to fix a couple of things.
In the -v a=$var1 b=$var2 part, you should double-quote the variables so the shell doesn't split them if they contain spaces (like "Build Docs"): -v a="$var1" -v b="$var2". You should pretty much always double-quote variable references to prevent problems like this.
Also, the way you use those a and b variables in the awk command isn't right. $ in awk doesn't mean "substitute a variable" like it does in shell, it generally means "get a field by number" (e.g. $2 gets the second field, and $(x+2) gets the x-plus-second field). Also, in a /.../ pattern, variables (and field references) don't get substituted anyway. So what you probably want instead of /$a/ and /$b/ is $0~a and $0~b (note that ~ is awk's regex match operator).
So the command should be something like this:
awk -v a="$var1" -v b="$var2" '$0~b{ok=1;s=NR}ok && NR<=s+2 && $0~a{print $0}' BigFile.txt
Except... you might not want that, because it treats the strings as regular expressions rather than plain strings. So the . characters in "IFRA-SCN-01001B.brz.com" will match any single character, and the $ in "Pre-code$" will be treated as an end-of-string anchor rather than a literal character. If you just want them matched as literal strings, use e.g. index($0,b) instead:
awk -v a="$var1" -v b="$var2" 'index($0,b){ok=1;s=NR}ok && NR<=s+2 && index($0,a){print $0}' BigFile.txt
I'd also recommend running your scripts through shellcheck.net to catch common mistakes and bad practices.
Finally, I have to ask what's up with all the ok and s stuff. That looks like it's going to insert some weird inter-record dependencies that don't make any sense. Also, if the fields are always going to be in that same order, would a grep search be simpler?
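Putting the pieces together, here is a minimal end-to-end sketch, assuming you simply want lines of BigFile.txt that contain both strings literally (the here-documents stand in for your real files):

```shell
# Stand-ins for the real files (assumption: your data looks like this)
cat > inputfile.txt <<'EOF'
IFRA-SCN-01002B.brz.com Build Docs
EOF
cat > BigFile.txt <<'EOF'
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz#brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz#brz.com
EOF

# Print lines of BigFile.txt that contain both strings literally;
# index() avoids treating "." and "$" as regex metacharacters.
while read -r var1 var2; do
    awk -v a="$var1" -v b="$var2" 'index($0,a) && index($0,b)' BigFile.txt
done < inputfile.txt
```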

Your code is:
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
I won't address the logic of the code (eg. you should probably be using match($0,"...") instead of /.../, and I don't know what the test NR<=s+2 is for) but here are some syntax and efficiency issues:
You appear to want to read a line of whitespace-delimited text into two variables. This is more simply done with just: read -r var1 var2 or read -r var1 var2 junk
print is not a standard shell command. Perhaps this is meant to be an awk script (awk '{print $1}', etc)? But just use simple read instead.
Single-quotes prevent variable expansion so, inside the script argument passed to awk, /$var/ will literally look for dollar, v, a, r. Pass variables using awk's -v option as you do in the second awk line.
Each variable passed to awk needs a separate -v option.
awk does not use $name to reference variable values, simply name. To use a variable as a regex, just use it in the right place: eg. $0 ~ name.
So:
while read -r var1 var2 junk; do
# quote variables to prevent globbing, word-splitting, etc
awk -v var1="$var1" -v var2="$var2" '
$0 ~ var2 { ok=1; s=NR }
ok && NR<=s+2 && $0 ~ var1 ; # print is default action
' BigFile.txt
done <inputfile.txt
Note that the more var1/var2 you want to check, the longer the runtime ( O(mn) : m sets of var1/var2 and n lines of input to check ). There may be more efficient algorithms if the problem is better-specified.
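For what it's worth, one way to at least avoid re-reading BigFile.txt for every pair is to load all the pairs in a single awk invocation and scan the file once. This is only a sketch, and it assumes the two fields in inputfile.txt are separated by a single space:

```shell
# First file: remember each (server, share) pair.
# Second file: print any line containing both strings of some pair.
# Still O(m*n) comparisons, but one pass and one awk process.
awk 'FNR==NR { server[FNR]=$1; share[FNR]=substr($0, length($1)+2); np=FNR; next }
     { for (i=1; i<=np; i++)
           if (index($0, server[i]) && index($0, share[i])) { print; break } }' \
    inputfile.txt BigFile.txt
```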

Related

awk issue inside for loop

I have many files with different names that end with txt.
rtfgtq56.txt
fgutr567.txt
..
So I am running this command
for i in *txt
do
awk -F "\t" '{print $2}' $i | grep "K" | awk '{print}' ORS=';' | awk -F "\t" '{OFS="\t"; print $i, $1}' > ${i%.txt*}.k
done
My problem is that I want to add the name of every file in the first column, so I run this part:
awk -F "\t" '{OFS="\t"; print $i, $1}' > ${i%.txt*}
$i means the file in the for loop,
but it did not work because awk can't read the $i in the for loop.
Do you know how I can solve it?
You want to refactor everything into a single Awk script anyway, and take care to quote your shell variables.
for i in *.txt
do
awk -F "\t" '/K/{a = a ";" $2}
END { print FILENAME, substr(a, 2) }' "$i" > "${i%.txt*}.k"
done
... assuming I untangled your logic correctly. The FILENAME Awk variable contains the current input file name.
More generally, if you genuinely want to pass a variable from a shell script to Awk, you can use
awk -v awkvar="$shellvar" ' .... # your awk script here
# Use awkvar to refer to the Awk variable'
Perhaps see also useless use of grep.
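For instance, with one of the file names from the question and a made-up tab-separated line:

```shell
# Create a small sample file, then print its name next to field 2
printf 'x\tKeep\n' > rtfgtq56.txt
awk -F '\t' '/K/ { print FILENAME, $2 }' rtfgtq56.txt
# prints: rtfgtq56.txt Keep
```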
Using the -v option of awk, you can create an awk Variable based on a shell variable.
awk -v i="$i" ....
Another possibility would be to make i an environment variable, which means that awk can access it via the predefined ENVIRON array, i.e. as ENVIRON["i"].
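A minimal illustration (the variable name i and its value are just for demonstration):

```shell
# Exported shell variables show up in awk's ENVIRON array
export i="rtfgtq56.txt"
echo | awk '{ print ENVIRON["i"] }'
# prints: rtfgtq56.txt
```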

Bash, awk, two arguments for one column

Need 2 arguments in awk command for one column.
Script, name todo.
#!/bin/bash
folder_main="$( cd $( dirname "${BASH_SOURCE[0]}" ) >/dev/null 2>&1 && pwd )"
if [ $1 = 'e' ]; then
mcedit $folder_main/kb/todo.kb
else
awk -F ',' '$1=="'$1'"' $folder_main/kb/todo.kb
fi
The expectation is that when I run todo i, it will print the lines whose first comma-delimited column is i OR c.
I tried this.
awk -F ',' '$1=="{c|'$1'}"' $folder_main/kb/todo.kb
But nothing.
Thanks.
You should pass your shell variable to awk using -v and fix your awk syntax:
awk -F, -v a="$1" '$1 == "c" || $1 == a' "$folder_main/kb/todo.kb"
This sets the awk variable a to the value of the shell positional parameter $1, and prints the line if the first column is either "c" or whatever you passed as the first argument to the script.
You could also shorten the line slightly by using a regular expression match instead of two ==:
awk -F, -v a="$1" '$1 ~ "^(c|"a")$"' "$folder_main/kb/todo.kb"
Although I think that the first option is easier to read, personally. It is also safer to use, as a character with special meaning inside a regular expression (such as *, [, ( or {) could cause the script to either fail or behave in an unexpected way.
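To see the difference, here's a contrived example where the first column contains a literal *:

```shell
printf 'c*,x\ncc,y\n' > todo.kb
# String equality matches the literal "c*" row:
awk -F, -v a='c*' '$1 == a' todo.kb          # prints: c*,x
# Regex match interprets * as "zero or more c's", so it matches
# "cc" instead of the literal "c*":
awk -F, -v a='c*' '$1 ~ "^(c|"a")$"' todo.kb # prints: cc,y
```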
You can't use shell variables directly in awk like this. Instead you pass them into your awk script using the -v flag:
awk -F ',' -v searchterm="$1" '$1==searchterm' "$folder_main/kb/todo.kb"

Script returned '/usr/bin/awk: Argument list too long' in using -v in awk command

Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but when I tried to get data to two or more files like this.
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error.
/usr/bin/awk: Argument list too long
As I researched, it is not caused by the number of files, but by the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.
You could use an environment variable to pass the data to awk. In awk the environment variables are accessible via an array ENVIRON.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As #hek2mgl alludes in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
There are two problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific; in other awks you need to leave a space between -v and data=. So if you aren't running gawk, I don't know what your awk will make of that statement, but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind by leaving them unquoted. If $input_file contains globbing chars or spaces then you leaving it unquoted will cause them to be expanded into potentially multiple files/args.
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}'
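A quick check of that concise version against made-up data (in.csv and the ids value are illustrative):

```shell
# Header row plus two data rows; the ids list contains only "b"
printf 'h1,h2\na,b\nc,d\n' > in.csv
ids="b"
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}' in.csv
# prints:
# a,b,true
# c,d,false
```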

Passing bash input variables to awk

Trying to pass a variable into awk from user input:
I have tried variations of awk -v, getting errors stating 'awk: invalid -v option', even though the option is listed in the man pages.
#! /bin/bash
read -p "Enter ClassID:" CLASS
read -p "Enter FacultyName:" FACULTY
awk '/FacultyName/ {print}' data-new.csv > $FACULTY.csv
awk -vclass=${CLASS} '/class/ {print}' data-new.csv >> $FACULTY.csv
echo Class is $CLASS
echo Faculty member is $FACULTY
Some versions of awk require a space between the -v and the variable assignment. Also, you should put the bash variable in double-quotes to prevent unwanted parsing by the shell (e.g. word splitting, wildcard expansion) before it's passed to awk. Finally, in awk /.../ is a constant regular expression (i.e. /class/ will search for the string "class", not the value of the variable "class"). With all of this corrected, here's the awk command that I think will do what you want:
awk -v class="${CLASS}" '$0 ~ class {print}' data-new.csv >> $FACULTY.csv
Now, is there any reason you're using this instead of:
grep "$CLASS" data-new.csv >> $FACULTY.csv
Your script is not clear to me, but these all work:
CLASS=ec123
echo | awk -vclass=$CLASS '{print class}'
echo | awk -vclass=${CLASS} '{print class}'

How to assign the output of a command to a variable?

This is probably a very stupid question; in a bash script, given the output of, for instance:
awk '{print $7}' temp
it gives 0.54546
I would like to assign this to a variable, so I tried:
read ENE <<< $(awk '{print $7}' temp)
but I get
Syntax error: redirection unexpected
Could you tell me why, and what is the easiest way to do this assignment?
Thanks
You can do command substitution as:
ENE=$(awk '{print $7}' temp)
or
ENE=`awk '{print $7}' temp`
This will assign the value 0.54546 to the variable ENE
your syntax should be
read ENE <<<$(awk '{print $1}' file)
you can directly assign the value as well
ENE=$(awk '{print $7}' temp)
you can also use the shell
$ var=$(< temp)
$ set -- $var
$ echo $7
or you can read it into array
$ declare -a array
$ read -a array <<<$(<file)
$ echo ${array[6]}
In general, Bash is kinda sensitive to spaces (requiring them in some places, and breaking if they are added in other places), which in my opinion is too bad. Just remember that there should be no space on either side of an equals sign, there should be no space after a dollar sign, and some parentheses should be padded with spaces ( like this ) (not like this).
`command` and $( command ) are the same thing, but $( this version can be $( nested ) ) whereas "this version can be `embedded in strings.` "
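For example:

```shell
# $() nests without any escaping; the inner substitution runs first
outer=$(echo "inner is $(echo nested)")
echo "$outer"
# prints: inner is nested
```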
