Awk dealing with variables containing whitespace - bash

I want to pass a string containing whitespace as a variable to awk from a Bash script, but no matter how I escape it, awk complains. Please consider the following example:
list1:
one
two
three
four
The output:
[user@actual ~]$ ./dator.sh list1
1470054866 two (...)
A working script:
CMD='awk'
DATE=$(date +%s)
VARIABLES="-v time=$DATE"
SCRIPT='NR>=2 {printf "%s %s\n", time, $1;}'
$CMD $VARIABLES "$SCRIPT" $1
And only changing the date-formatting will break it:
CMD='awk'
DATE=$(date -u)
VARIABLES="-v time=$DATE"
SCRIPT='NR>=2 {printf "%s %s\n", time, $1;}'
$CMD $VARIABLES "$SCRIPT" $1
How should I escape it?
Every kind of quoting I'm aware of doesn't work.
Translating and inserting escaping "\" before whitespace doesn't make a difference.
Printing the variable via a function as suggested in another solution didn't work.

Arrays were designed for storing arbitrary arguments.
current_date=$(date -u)
variables=( -v "time=$current_date")
script='NR >= 2 {printf "%s %s\n", time, $1;}'
awk "${variables[@]}" "$script" "$1"
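As a minimal sketch of why the array helps, the following uses a fixed sample timestamp (an assumption, standing in for live date -u output) and a temporary file in place of list1. The names stamp, awk_args and list are illustrative only and do not come from the question.

```shell
#!/usr/bin/env bash
# Temporary file standing in for list1 from the question.
list=$(mktemp)
printf '%s\n' one two three four > "$list"

# Fixed sample value with embedded spaces (assumed; stands in for $(date -u)).
stamp='Mon Aug  1 12:34:26 UTC 2016'

# Each array element becomes exactly one argument to awk, spaces and all.
awk_args=( -v "time=$stamp" )
script='NR >= 2 { printf "%s %s\n", time, $1 }'

result=$(awk "${awk_args[@]}" "$script" "$list")
printf '%s\n' "$result"
rm -f "$list"
```

With the unquoted string form ($VARIABLES), the shell splits the value at every space, so awk receives the words after the first one as separate arguments.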

Related

Assign bash value from value in specific line

I have a file that looks like:
>ref_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSD
>ref_frame=2
HQGLDISTMCFHRDGKDHQQYSKVA*QKS*SLLENKIQT*LSINTWMICM*DLT
>ref_frame=3
TRD*ISVQCASTGMERITSNIPK*HDKNLRAF*KTKSRHSYLSIHG*FVCRI*
>test_3_2960_3_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPSRKQNPDIVIYQYMDDLYVGSD
I want to assign a bash variable so that echo $variable gives test_3_2960
The line/row that I want to assign the variable to will always be line 7. How can I accomplish this using bash?
so far I have:
variable=`cat file.txt | awk 'NR==7'`
echo $variable = >test_3_2960_3_frame=1
Using sed
$ variable=$(sed -En '7s/>(([^_]*_){2}[0-9]+).*/\1/p' input_file)
$ echo "$variable"
test_3_2960
No pipes needed here...
$: variable=$(awk -F'[>_]' 'NR==7{ OFS="_"; print $2, $3, $4; exit; }' file)
$: echo $variable
test_3_2960
-F is using either > or _ as field separators, so your data starts in field 2.
OFS="_" sets the Output Field Separator, but you could also just use "_" instead of commas.
exit keeps it from wasting time bothering to read beyond line 7.
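To see where the fields fall, here is a small sketch that lists every field of line 7 and then joins fields 2 to 4 with literal underscores. The sample file is a shortened stand-in for the question's data, and the names sample and fields are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sample file mirroring the question's layout (sequence lines shortened).
sample=$(mktemp)
printf '%s\n' '>ref_frame=1' 'TPGIRY' '>ref_frame=2' 'HQGLDI' \
              '>ref_frame=3' 'TRDISV' '>test_3_2960_3_frame=1' 'TPGIRY' > "$sample"

# List every field of line 7; field 1 is empty because the line starts with ">".
fields=$(awk -F'[>_]' 'NR == 7 { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i; exit }' "$sample")
printf '%s\n' "$fields"

# Join fields 2-4 with literal underscores instead of setting OFS.
variable=$(awk -F'[>_]' 'NR == 7 { print $2 "_" $3 "_" $4; exit }' "$sample")
printf '%s\n' "$variable"
rm -f "$sample"
```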
If you wish to continue with awk
$ variable=$(awk 'NR==7' file.txt | awk -F "[>_]" '{print $2"_"$3"_"$4}')
$ echo $variable
test_3_2960

Search the first and second variable (with spaces and special character like $) using awk

I have a dataset where i need to search for the 2 variables in it. Both vars should be present, otherwise ignore them.
inputfile.txt:
IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01001B.brz.com Z$
IFRA-SCN-01001B.brz.com Pre-code$
IFRA-SCN-01001B.brz.com Technical Stuff
IFRA-SCN-01001B.brz.com expired$
IFRA-SCN-01001B.brz.com AA$
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01002B.brz.com Build Docs
BigFile.txt:
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn@brz.com
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn@brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\DOC TRIGGER,Peter.Salez@brz.com
\\IFRA-SCN-01001B.brz.com\bitshare,Peter.Salez@brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\PFM FRAUD,Peter.Salez@brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
it is working if i use the actual string but not if assigned to a variable.
[root@brzmgmt]$ awk '/Build Docs/{ok=1;s=NR}ok && NR<=s+2 && /IFRA-SCN-01002B.brz.com/{print $0}' BigFile.txt
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
any idea what am i missing?
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 -v b=$var2 '/$b/{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
I see a number of problems here. First, where you split the fields from inputfile.txt with
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
When the line is something like "IFRA-SCN-01002B.brz.com Build Docs", var1 will be set correctly, but var2 will only get "Build", not "Build Docs". I assume you want the latter? If so, I'd let read do the splitting for you:
while read -r var1 var2
...which will automatically include any "extra fields" (e.g. "Docs") in the last variable. If you don't want the full remainder of the line, just add an extra variable to hold anything beyond the second field:
while read -r var1 var2 ignoredstuff
See BashFAQ #1: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
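A short sketch of how read distributes the fields, using one line from inputfile.txt fed through a here-document. The variable names first, second and rest are illustrative.

```shell
# Two variables: the last one receives the whole remainder of the line.
read -r var1 var2 <<'EOF'
IFRA-SCN-01002B.brz.com Build Docs
EOF
printf 'var1=[%s] var2=[%s]\n' "$var1" "$var2"

# Three variables: anything past the second field lands in "rest".
read -r first second rest <<'EOF'
IFRA-SCN-01002B.brz.com Build Docs
EOF
printf 'first=[%s] second=[%s] rest=[%s]\n' "$first" "$second" "$rest"
```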
As for the awk commands, the first one doesn't work because the shell doesn't expand variables inside single-quotes. You could switch to double-quotes, but then you'd have to escape $0 to keep the shell from expanding that, and you'd also have to worry about the search strings possibly including awk syntax, and it's generally a mess. The second method, with -v, is a lot better, but you still have to fix a couple of things.
In the -v a=$var1 b=$var2 part, you should double-quote the variables so the shell doesn't split them if they contain spaces (like "Build Docs"): -v a="$var1" -v b="$var2". You should pretty much always double-quote variable references to prevent problems like this.
Also, the way you use those a and b variables in the awk command isn't right. $ in awk doesn't mean "substitute a variable" like it does in shell, it generally means "get a field by number" (e.g. $2 gets the second field, and $(x+2) gets the x-plus-second field). Also, in a /.../ pattern, variables (and field references) don't get substituted anyway. So what you probably want instead of /$a/ and /$b/ is $0~a and $0~b (note that ~ is awk's regex match operator).
So the command should be something like this:
awk -v a="$var1" -v b="$var2" '$0~b{ok=1;s=NR}ok && NR<=s+2 && $0~a{print $0}' BigFile.txt
Except... you might not want that, because it treats the strings as regular expressions rather than plain strings. So the . characters in "IFRA-SCN-01001B.brz.com" will match any single character, and the $ in "Pre-code$" will be treated as an end-of-string anchor rather than a literal character. If you just want them matched as literal strings, use e.g. index($0,b) instead:
awk -v a="$var1" -v b="$var2" 'index($0,b){ok=1;s=NR}ok && NR<=s+2 && index($0,a){print $0}' BigFile.txt
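The difference between the regex match and index() can be checked with a small sketch. The sample line and the names needle, regex_hit, literal_hit and dot_hit are made up for the illustration; only the search strings come from the question.

```shell
needle='Pre-code$'
line='IFRA-SCN-01001B.brz.com Pre-code$ share'

# As a regex, the trailing $ anchors "Pre-code" to the end of the line: no match here.
regex_hit=$(printf '%s\n' "$line" | awk -v b="$needle" '$0 ~ b { print "hit" }')

# As a literal substring, the $ is just another character: match.
literal_hit=$(printf '%s\n' "$line" | awk -v b="$needle" 'index($0, b) { print "hit" }')
printf 'regex=[%s] literal=[%s]\n' "$regex_hit" "$literal_hit"

# As a regex, each "." matches any character, so an unrelated string also matches.
dot_hit=$(printf '%s\n' 'brzXcom' | awk -v a='brz.com' '$0 ~ a { print "hit" }')
printf 'dot=[%s]\n' "$dot_hit"
```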
I'd also recommend running your scripts through shellcheck.net to catch common mistakes and bad practices.
Finally, I have to ask what's up with all the ok and s stuff. That looks like it's going to insert some weird inter-record dependencies that don't make any sense. Also, if the fields are always going to be in that same order, would a grep search be simpler?
Your code is:
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
I won't address the logic of the code (eg. you should probably be using match($0,"...") instead of /.../, and I don't know what the test NR<=s+2 is for) but here are some syntax and efficiency issues:
You appear to want to read a line of whitespace-delimited text into two variables. This is more simply done with just: read -r var1 var2 or read -r var1 var2 junk
print is not a standard shell command. Perhaps this is meant to be an awk script (awk '{print $1}', etc)? But just use simple read instead.
Single-quotes prevent variable expansion so, inside the script argument passed to awk, /$var/ will literally look for dollar, v, a, r. Pass variables using awk's -v option as you do in the second awk line.
Each variable passed to awk needs a separate -v option.
awk does not use $name to reference variable values, simply name. To use a variable as a regex, just use it in the right place: eg. $0 ~ name.
So:
while read -r var1 var2 junk; do
# quote variables to prevent globbing, word-splitting, etc
awk -v var1="$var1" -v var2="$var2" '
$0 ~ var2 { ok=1; s=NR }
ok && NR<=s+2 && $0 ~ var1 ; # print is default action
' BigFile.txt
done <inputfile.txt
Note that the more var1/var2 you want to check, the longer the runtime ( O(mn) : m sets of var1/var2 and n lines of input to check ). There may be more efficient algorithms if the problem is better-specified.

Bash, awk, two arguments for one column

Need 2 arguments in awk command for one column.
Script, name todo.
#!/bin/bash
folder_main="$( cd $( dirname "${BASH_SOURCE[0]}" ) >/dev/null 2>&1 && pwd )"
if [ $1 = 'e' ]; then
mcedit $folder_main/kb/todo.kb
else
awk -F ',' '$1=="'$1'"' $folder_main/kb/todo.kb
fi
The expectation is that when I run todo i, it prints the lines whose first column (columns are separated by ,) is either i or c.
I tried this.
awk -F ',' '$1=="{c|'$1'}"' $folder_main/kb/todo.kb
But nothing.
Thanks.
You should pass your shell variable to awk using -v and fix your awk syntax:
awk -F, -v a="$1" '$1 == "c" || $1 == a' "$folder_main/kb/todo.kb"
This sets the awk variable a to the value of the shell positional parameter $1, and prints the line if the first column is either "c" or whatever you passed as the first argument to the script.
You could also shorten the line slightly by using a regular expression match instead of two ==:
awk -F, -v a="$1" '$1 ~ "^(c|"a")$"' "$folder_main/kb/todo.kb"
Although I think that the first option is easier to read, personally. It is also safer to use, as a character with special meaning inside a regular expression (such as *, [, ( or {) could cause the script to either fail or behave in an unexpected way.
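A quick sketch of the safety point, assuming a hypothetical user argument of "." and a three-line stand-in for todo.kb. The names kb, arg, exact and regex are illustrative.

```shell
kb=$(mktemp)
printf '%s\n' 'c,common task' 'i,inbox task' 'x,other task' > "$kb"
arg='.'   # hypothetical user input that is also a regex metacharacter

# String comparison: only the "c" line matches, since no first column equals ".".
exact=$(awk -F, -v a="$arg" '$1 == "c" || $1 == a { n++ } END { print n + 0 }' "$kb")

# Regex match: "^(c|.)$" accepts any single-character first column.
regex=$(awk -F, -v a="$arg" '$1 ~ "^(c|" a ")$" { n++ } END { print n + 0 }' "$kb")

printf 'exact=%s regex=%s\n' "$exact" "$regex"
rm -f "$kb"
```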
You can't use shell variables directly in awk like this. Instead you pass them into your awk script using the -v flag:
awk -F ',' -v searchterm="$1" '$1==searchterm' "$folder_main/kb/todo.kb"

Passing bash input variables to awk

Trying to pass a variable into awk from user input:
Have tried variations of awk -v with errors stating 'awk: invalid -v option', even though the option is listed in man files.
#! /bin/bash
read -p "Enter ClassID:" CLASS
read -p "Enter FacultyName:" FACULTY
awk '/FacultyName/ {print}' data-new.csv > $FACULTY.csv
awk -vclass=${CLASS} '/class/ {print}' data-new.csv >> $FACULTY.csv
echo Class is $CLASS
echo Faculty member is $FACULTY
Some versions of awk require a space between the -v and the variable assignment. Also, you should put the bash variable in double-quotes to prevent unwanted parsing by the shell (e.g. word splitting, wildcard expansion) before it's passed to awk. Finally, in awk /.../ is a constant regular expression (i.e. /class/ will search for the string "class", not the value of the variable "class"). With all of this corrected, here's the awk command that I think will do what you want:
awk -v class="${CLASS}" '$0 ~ class {print}' data-new.csv >> "$FACULTY.csv"
Now, is there any reason you're using this instead of:
grep "$CLASS" data-new.csv >> "$FACULTY.csv"
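The constant-regex point can be seen in a small sketch. The CSV rows here are invented for the illustration (the real data-new.csv is not shown in the question), as are the names data, constant and dynamic.

```shell
data=$(mktemp)
printf '%s\n' 'ec123,Smith' 'ma201,Jones' 'class,placeholder' > "$data"
CLASS=ec123

# /class/ is a constant regex: it looks for the literal text "class".
constant=$(awk -v class="$CLASS" '/class/' "$data")

# $0 ~ class uses the value of the awk variable as the regex.
dynamic=$(awk -v class="$CLASS" '$0 ~ class' "$data")

printf 'constant=[%s] dynamic=[%s]\n' "$constant" "$dynamic"
rm -f "$data"
```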
Your script is not clear to me, but these all work:
CLASS=ec123
echo | awk -vclass=$CLASS '{print class}'
echo | awk -vclass=${CLASS} '{print class}'

Why awk '{ print }' doesn't start a new line but loops on space char

I have this shell script
#!/bin/bash
LINES=$(awk '{ print }' filename.txt)
for LINE in $LINES; do
echo "$LINE"
done
And filename.txt has this content
Loreum ipsum dolores
Loreum perche non se imortale
The shell script is iterating all spaces of the lines in filename.txt while it is supposed to loop only those two lines.
But when I type the "awk '{ print }' filename.txt" in terminal then it loops ok.
Any explanations?
Thanks in advance!
The $(...) construct absorbs all the output from awk as one large string, and then for LINE in $LINES splits on whitespace. You want this construct instead:
#! /bin/sh
while IFS= read -r LINE; do
printf '%s\n' "$LINE"
done < filename.txt
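To make the difference concrete, this sketch counts loop iterations for both forms against a temporary copy of the question's two lines. The names f, words and lines are illustrative.

```shell
f=$(mktemp)
printf '%s\n' 'Loreum ipsum dolores' 'Loreum perche non se imortale' > "$f"

# Unquoted command substitution: the shell splits the text on every space and newline.
words=0
for w in $(cat "$f"); do
  words=$((words + 1))
done

# while/read: one iteration per line; IFS= and -r keep each line unmodified.
lines=0
while IFS= read -r line; do
  lines=$((lines + 1))
done < "$f"

printf 'for loop iterations: %s, while loop iterations: %s\n' "$words" "$lines"
rm -f "$f"
```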
The other answers are good. Another thing you can do is temporarily change your IFS (Internal Field Separator) variable, by updating your shell script to look like this:
#!/bin/bash
IFS="
"
LINES=$(awk '{ print }' filename.txt)
for LINE in $LINES; do
echo "$LINE"
done
This sets IFS to a newline only, instead of the default space, tab and newline, so the unquoted $LINES is split per line, which should also do what you want.
Just another suggestion.
Another option is to loop over the lines as an array. Note that LINES=$(...) stores a single string, so the lines have to be read into an array first.
Here's an example of how to loop over array elements:
http://tldp.org/LDP/abs/html/arrays.html#SCRIPTARRAY
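A minimal sketch of the array approach, assuming Bash 4 or later for the mapfile builtin. The temporary file stands in for filename.txt, and the names f, lines and count are illustrative.

```shell
#!/usr/bin/env bash
f=$(mktemp)
printf '%s\n' 'Loreum ipsum dolores' 'Loreum perche non se imortale' > "$f"

# mapfile (Bash 4+) stores one line per array element; -t drops the trailing newline.
mapfile -t lines < <(awk '{ print }' "$f")

# Quoting "${lines[@]}" keeps each element intact, spaces included.
for line in "${lines[@]}"; do
  printf '%s\n' "$line"
done

count=${#lines[@]}
rm -f "$f"
```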
