Setting awk to variable - bash

I have code that looks like this:
awk -F'|' '{if($1 in a)print "duplicate found:" $2 " AND "a[$1];else a[$1]=$2 }' dump.txt
I need to set $2 and a[$2] to a variable. How would I go about doing this?
I am taking a file that contains: a value "|" filename and then I want to set the filename and the value to two different variables.

What do you mean by "set those variables"? Are these environment variables? In Awk, variables that consist of a dollar sign followed by a number are reserved by Awk: they hold the field values of the current line. For example:
test.txt =
this is line one
this is line two
this is line three
The command awk '{print $4}' test.txt will print out the fourth field:
$ awk '{print $4}' test.txt
one
two
three
$ awk '{print $3}' test.txt
line
line
line
As you can see, they don't have to be set. Awk sets them automatically.
If you want to set your own Awk variables from the command line, you can use the -v parameter:
awk -v search="foo" '{
if (search == $1) {
print "Found your string in record " NR
}
}'
In the above, search is an Awk variable that is set equal to foo.
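As a rough sketch of the two usual ways to hand a shell value to Awk: -v creates an Awk variable, while a real environment variable can be read through the ENVIRON array. The sample input and the names search and SEARCH below are made up for illustration.

```shell
# Hypothetical value to look for
search="two"

# 1) -v copies the shell value into an Awk variable named search
via_v=$(printf '%s\n' 'one 1' 'two 2' |
  awk -v search="$search" '$1 == search { print "Found your string in record " NR }')

# 2) A real environment variable is visible inside Awk as ENVIRON["SEARCH"]
via_env=$(printf '%s\n' 'one 1' 'two 2' |
  SEARCH="$search" awk '$1 == ENVIRON["SEARCH"] { print "Found your string in record " NR }')

echo "$via_v"
echo "$via_env"
```

Both commands should report record 2. Note the == in the comparison: a single = would assign $1 to the variable instead of comparing.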
Since Awk is a programming language, it is sometimes easier to see what's going on by correctly formatting your program:
awk -F'|' '{
if($1 in a) {
print "duplicate found:" $2 " AND " a[$1]
}
else {
a[$1] = $2
}
}' dump.txt
The program processes each line. Each line consists of two parts separated by the |. It appears that the first part is the key and the second part is the data. I've created a text file that looks like this:
This is a|test
That is a|line
who moans for|anchovies
whom moans for|anchovies
This is a|test again
The first and last lines share the same key, so the last line should show up as a duplicate of the first.
I took your program and added a few debug lines. This will help me trace where it is in your program:
awk -F\| '{
if ($1 in a) {
print "DEBUG: In If clause"
print "duplicate found:" $2 " and " a[$1]
} else {
print "DEBUG: In else clause"
a[$1] = $2
print "DEBUG: a[" $1 "] = " a[$1]
}
print "DEBUG: $1 = " $1
print "DEBUG: $2 = " $2 "\n"
}' test.txt
And, this is my output
DEBUG: In else clause
DEBUG: a[This is a] = test
DEBUG: $1 = This is a
DEBUG: $2 = test
DEBUG: In else clause
DEBUG: a[That is a] = line
DEBUG: $1 = That is a
DEBUG: $2 = line
DEBUG: In else clause
DEBUG: a[who moans for] = anchovies
DEBUG: $1 = who moans for
DEBUG: $2 = anchovies
DEBUG: In else clause
DEBUG: a[whom moans for] = anchovies
DEBUG: $1 = whom moans for
DEBUG: $2 = anchovies
DEBUG: In If clause
duplicate found: test again and test
DEBUG: $1 = This is a
DEBUG: $2 = test again
Taking out the debug lines:
awk -F\| '{
if ($1 in a) {
print "duplicate found:" $2 " and " a[$1]
} else {
a[$1] = $2
}
}' test.txt
duplicate found: test again and test
As they say:
IT WORKS ON MY COMPUTER
(rimshot)
Seriously, what is your program supposed to be doing, and what do you see it do? Are there any errors? Your program appears to work as advertised.

Set them from a shell variable like this:
awk -F'|' '{if($1 in a)print "duplicate found:" '"$2"' " AND "a[$1];else a[$1]='"$2"' }' dump.txt
Note that what I did was to "disable" the single quotes around the $2 parts, so that the shell expands them, and add double quotes around them in case the shell variable contains spaces (which would otherwise split awk's program argument into pieces).
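If the goal is to end up with the two filenames in shell variables, one possible approach is to let awk print them on a single line with a separator and split that line in the shell. This is only a sketch: the sample data, the temporary file, and the variable names pair, duplicate and original are assumptions, and it stops at the first duplicate and assumes filenames contain no | character.

```shell
# Hypothetical sample data in "value|filename" form
dump=$(mktemp)
printf '%s\n' 'This is a|test' 'That is a|line' 'This is a|test again' > "$dump"

# Print "duplicate|original" for the first repeated value, then stop
pair=$(awk -F'|' '$1 in a { print $2 "|" a[$1]; exit } { a[$1] = $2 }' "$dump")

# Split the captured line on the first | with parameter expansion
duplicate=${pair%%|*}
original=${pair#*|}

echo "duplicate=$duplicate original=$original"
rm -f "$dump"
```

Awk variables never leak back into the shell by themselves, so capturing awk's output with $(...) is the usual bridge.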

Related

How to print a pattern using AWK?

I need to find the words in a file that match a regex pattern.
So if I have this line:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
It prints only:
register GW_USER=BB
I want to get:
register GW_USER=BB GW_CHANNEL=AA
How to print GW_USER and GW_CHANNEL columns?
Your if condition isn't right; you can use regex alternation instead:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need to use -F" " or " " in print, as a space is the default field separator and the comma in print inserts the default output separator.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
will match GW_USER against $i but will match GW_CHANNEL against the whole line.
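To see the difference between the two conditions, here is a small sketch that runs both against the sample log line from the question. The shell variable names line, original and both are just for illustration.

```shell
# The sample log line from the question
line='00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}'

# Original condition: the bare /GW_CHANNEL/ is shorthand for $0 ~ /GW_CHANNEL/,
# so it is always true for this line and only the GW_USER field is selected
original=$(printf '%s\n' "$line" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /GW_USER/ && /GW_CHANNEL/) print $5, $i }')

# Alternation: each field is tested against either name
both=$(printf '%s\n' "$line" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /GW_USER|GW_CHANNEL/) print $5, $i }')

printf 'original condition:\n%s\n' "$original"
printf 'alternation:\n%s\n' "$both"
```

The matched fields still carry the surrounding braces and commas from the log line, and each match is printed on its own line.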
Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
str = substr($0,RSTART+1,RLENGTH-1)
split(str,arr,/[ ,=]+/)
delete n2v
for (i=1; i in arr; i+=2) {
n2v[arr[i]] = arr[i+1]
}
print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
That way you can trivially print, or do anything else you want with, any other field in the future.

concatenate two variables from 2 different awk commands in a single echo

Job = grep 'Job:' | awk '{ print $3 }'
Status = grep 'Job Status:' | awk '{ print $3 }'
Both variables are printed correctly when I use two echo statements. I want a result like "Job name - status" on a single line. I have tried the commands below, but they print only the second variable, like "- status".
echo "$Job - $Status"
echo "${Job} - ${Status}"
echo -e "${Job} - ${Status}"
please help!
You can do it with a single awk command:
awk '/Job:/ { job = $3 } /Job Status:/ { status = $3 } END { print job " - " status }' file
If Job: comes before Job Status:
awk '/Job:/ { job = $3 } /Job Status:/ { print job " - " $3; exit }' file
Or vice versa:
awk '/Job Status:/ { status = $3 } /Job:/ { print $3 " - " status; exit }' file
I think that should work:
echo $(awk ' /Job:/ { print $3} ' file)" - "$(awk ' /Job Status:/ { print $3} ' file)
but konsolebox's version is probably better, as there is only one awk invocation.
I think you are trying to find out how to get the result of running some command and store it in a variable. Then you want to do that twice and print both variables on the same line.
So the basic syntax is:
result=$(some command)
e.g. if
date +'%Y'
tells you the year is 2014, but you want 2014 in a variable called year, you can do
year=$(date +'%Y')
then you can echo $year like this:
echo $year
2014
So, coming to your actual question, you want two variables, one for the output of each of two commands:
job=$(grep "Job:" someFile | awk '{print $3}')
status=$(grep "Job Status:" someFile | awk '{print $3}')
then you can do:
echo $job $status
and get both things on the same line.
The other answers are saying you can avoid invoking awk twice, which is true, but doesn't explain how to capture the result of running a command into a variable. In general, you don't need to use awk and grep, because this:
grep xyz | awk ...
is equivalent to
awk '/xyz/ {...}'
but uses one fewer process (i.e. no grep) and therefore fewer resources.
And by the way, you must not put any spaces either side of = in bash either. It is
variable=something
not
variable = something
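Putting the pieces together, a minimal sketch might look like the following. The log format is an assumption (the question does not show the file), chosen so that the third field holds the value on both lines. The names logfile and summary are hypothetical.

```shell
# Hypothetical log file; the real format is not shown in the question
logfile=$(mktemp)
printf '%s\n' 'Backup Job: nightly' 'Job Status: Completed' > "$logfile"

# One awk call, no grep, no spaces around =, output captured with $(...)
summary=$(awk '/Job:/ { job = $3 } /Job Status:/ { status = $3 } END { print job " - " status }' "$logfile")

# Quote the variable so the shell prints it exactly as captured
echo "$summary"
rm -f "$logfile"
```

With this sample input the script prints "nightly - Completed".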

Pass shell variables to AWK for a IF statement

This is what I'm trying to break my head over... but it still doesn't work
This is what works:
awk '{
if ( $2 ~ /FULL/ )
print "TRUE" $0
}' FILE1
But now I want the "/FULL/" part to be dynamic, so that I can change it through a shell variable.
So this is what I want:
shellfull="/FULL/"
awk -v awkfull=$shellfull '{
if ( $2 ~ awkfull )
print "TRUE" $0
}' FILE1
Where FILE1 has the content:
84738273 RTF-Ref-FULL-Monday
38473847 The-Wed-DLL-DIFF-Fri-FULL
84839489 FULL
83945940 Schedule_FULL
84928049 Schedule_DIFF
It looks like awk doesn't let me use awkfull inside the "if" statement, even though I can print awkfull and it does show the shell variable's content.
What I mean:
print awkfull # this works
if ( $1 ~ awkfull ) # this does NOT work... why?
UPDATE 1
Thanks, all, for the responses; some gave really good ideas. But I feel I need to describe my exact situation, because I really need the "/" in order to find a word at a random place in a string, and I don't know in advance what the string is going to look like.
So here is what I want to use it for:
//Set Variable Content:
FULL="/FULL/||/Full/||/full/||/SQL-Backup_Fri-Last-Of-Month/"
DIFF="/DIFF/||/Diff/||/diff/||/San-Element/"
Now I have a big logfile in which one of the strings on each line contains some words that I'm going to use to determine whether the line counts as a FULL, a DIFF, or maybe both, depending on whether a word in $FULL or $DIFF matches at a random place in that string.
So if the strings in the file look like this:
84738273 RTF-Ref-full-Monday
38473847 The-Wed-DLL-DIFF-Fri-FULL
84839489 FULL
83945940 Schedule_Full_backup
84928049 Schedule_DIFF
83940392 2_SQL-Backup_Fri-Last-Of-Month-23049
84828348 Schedule_new-build
I want it to become:
FULL 84738273 RTF-Ref-full-Monday
DIFFFULL 38473847 The-Wed-DLL-DIFF-Fri-FULL
FULL 84839489 FULL
FULL 83945940 Schedule_Full_backup
DIFF 84928049 Schedule_DIFF
FULL 83940392 2_SQL-Backup_Fri-Last-Of-Month-23049
UNKNOWN 84828348 Schedule_new-build
Now it is important that the list of words to search for is set up dynamically, so that changing the variable $DIFF or $FULL makes it search for different words in the file.
This is the script concept with which I wanted to achieve this:
# set filter patterns
FULL="/FULL/||/Full/||/full/||/SQL-Backup_Fri-Last-Of-Month/"
DIFF="/DIFF/||/Diff/||/diff/||/San-Element/"
awk -v full=$FULL -v diff=$DIFF '{ # link the shell variables to awk variables, since using a shell variable directly in awk did not work
if ( $2 ~ diff ) # find all strings with words defined in $DIFF
print "DIFF" $0 # print "DIFF" in front of the line
if ( $2 ~ full ) # find all strings with words defined in $FULL
print "FULL" $0 # print "FULL" in front of the line
if ( $2 !~ full||diff) # if a line contains words of neither $DIFF nor $FULL
print "UNKNOWN" $0 # print "UNKNOWN" in front of the line
}' FILE1 # load the file that needs to be filtered
This script needs to make clear to me, and to itself, when it finds schedules and policies that are not yet defined, so that the script knows when a new unknown schedule or policy has been created and can warn me that I need to adjust its filter. It also gives the script the ability to calculate how many of the calculated Kbytes are going to an "unknown schedules and policies" pool.
I hope this makes things a bit clearer. I need to search for multiple random words [FULL, full, SQL-DB, etc.] at a random place inside a string [that's why I wanted to use /$full/||/$diff/, or something like this].
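Your string should not contain the / for the comparison: when the pattern is stored in a variable, awk uses the whole string as the regex, so the slashes would have to appear literally in the field. With the following it works: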
shellfull="FULL"
Test
$ shellfull="EMPTY"
$ awk -v patt=$shellfull '$1 ~ patt {print "TRUE" $0}' a
TRUEEMPTY 3928304
$ shellfull="FULL"
$ awk -v patt=$shellfull '$1 ~ patt {print "TRUE" $0}' a
TRUEFULL 2930429
TRUEFULL 3940229
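A small sketch of why the slashes get in the way, using two lines taken from the question's FILE1. The temporary file and the variable names are only for illustration.

```shell
# Two sample lines taken from the question's input
data=$(mktemp)
printf '%s\n' '84839489 FULL' '84928049 Schedule_DIFF' > "$data"

# With slashes: the dynamic regex looks for the literal text "/FULL/"
with_slashes=$(awk -v patt='/FULL/' '$2 ~ patt' "$data")

# Without slashes: the string "FULL" is the regex itself
without_slashes=$(awk -v patt='FULL' '$2 ~ patt' "$data")

echo "with slashes: [$with_slashes]"
echo "without slashes: [$without_slashes]"
rm -f "$data"
```

The slashes are only delimiters for regex literals written inside the awk program; in a string they are ordinary characters.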
Update
Based on the input file you just posted:
$ shellfull="FULL"
$ awk -v patt=$shellfull '$2 ~ patt {print "TRUE " $0}' a
TRUE 84738273 RTF-Ref-FULL-Monday
TRUE 38473847 The-Wed-DLL-DIFF-Fri-FULL
TRUE 84839489 FULL
TRUE 83945940 Schedule_FULL
In case you want exact matches:
$ awk -v patt=$shellfull '$2==patt {print "TRUE " $0}' a
TRUE 84839489 FULL
Update 2
If you want to match different words that you pass in a single string, let's do it like this:
awk -v patt=$shellfull 'BEGIN{split(patt,a,"|")}
{for (i in a) if ($2==a[i]) {print "TRUE " $0; next}}' a
It gets the string, explodes it by the delimiter | (you can define another) and then looks for exact matches of each one of the given strings.
Test
$ shellfull="Schedule_DIFF|FULL"
$ awk -v patt=$shellfull 'BEGIN{split(patt,a,"|")} {for (i in a) if ($2==a[i]) {print "TRUE " $0; next}}' a
TRUE 84839489 FULL
TRUE 84928049 Schedule_DIFF
$ shellfull="FULL"
$ awk -v patt=$shellfull 'BEGIN{split(patt,a,"|")} {for (i in a) if ($2==a[i]) {print "TRUE " $0; next}}' a
TRUE 84839489 FULL
You can simply do it with:
shellfull="/FULL/"
awk '{
if ( $1 ~ '$shellfull' )
print "TRUE" $0
}' FILE1
This looks like it's now quoting the $shellfull variable, but it's actually "unquoting" it so that the shell will look at it.
Based on your latest update, it sounds like all you need is:
$ FULL="FULL|Full|full|SQL-Backup_Fri-Last-Of-Month"
$ DIFF="DIFF|Diff|diff|San-Element"
$ cat file
84738273 RTF-Ref-full-Monday
38473847 The-Wed-DLL-DIFF-Fri-FULL
84839489 FULL
83945940 Schedule_Full_backup
84928049 Schedule_DIFF
83940392 2_SQL-Backup_Fri-Last-Of-Month-23049
84828348 Schedule_new-build
$ awk -v full="$FULL" -v diff="$DIFF" '{
if ( ( $2 ~ diff ) && ( $2 ~ full ) )
print "DIFFFULL", $0
else if ( $2 ~ diff )
print "DIFF", $0
else if ( $2 ~ full )
print "FULL", $0
else
print "UNKNOWN", $0
}' file
FULL 84738273 RTF-Ref-full-Monday
DIFFFULL 38473847 The-Wed-DLL-DIFF-Fri-FULL
FULL 84839489 FULL
FULL 83945940 Schedule_Full_backup
DIFF 84928049 Schedule_DIFF
FULL 83940392 2_SQL-Backup_Fri-Last-Of-Month-23049
UNKNOWN 84828348 Schedule_new-build
but you need to decide if it's OK for words like "Different" to match with "Diff" if they happen to show up.
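If words like "Different" should not count, one possible way (a sketch, not the only option) is to require a non-letter or the start/end of the field on both sides of the word. The pattern below sticks to plain bracket expressions so it does not depend on GNU-only word-boundary operators. The sample lines and the names loose, strict and re are assumptions.

```shell
# Hypothetical input: one real DIFF schedule and one false friend
logfile=$(mktemp)
printf '%s\n' '84928049 Schedule_DIFF' '84828348 Different-build' > "$logfile"

DIFF="DIFF|Diff|diff|San-Element"

# Loose match: "Diff" inside "Different" also counts
loose=$(awk -v diff="$DIFF" '$2 ~ diff { print $1 }' "$logfile")

# Strict match: the word must be delimited by a non-letter or by the field edge
strict=$(awk -v diff="$DIFF" '
  BEGIN { re = "(^|[^A-Za-z])(" diff ")([^A-Za-z]|$)" }
  $2 ~ re { print $1 }' "$logfile")

echo "loose:  $loose"
echo "strict: $strict"
rm -f "$logfile"
```

Here the underscore in Schedule_DIFF counts as a delimiter, so that line still matches, while Different-build is rejected by the strict pattern.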
I am getting an error while running the command below.
Spacedata_1 is a file.
mountpoint is an external variable with some value.
grep "$mountpoint" Spacedata_1 |awk -v mountpoint=$mountpoint 'if ($5==mountpoint ) { print $4 }'
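A likely cause of the error is that the if statement sits outside any { ... } action, which awk does not accept at the top level of a program. A sketch of one way to write it, using the comparison as the pattern and dropping the extra grep, is shown below. The file layout is an assumption, since the real Spacedata_1 is not shown.

```shell
# Hypothetical stand-in for Spacedata_1: device, size, used, available, mount point
spacefile=$(mktemp)
printf '%s\n' '/dev/sda1 100G 40G 60G /' '/dev/sdb1 500G 100G 400G /data' > "$spacefile"

mountpoint=/data

# The condition is the pattern; the action in braces runs only when it is true
avail=$(awk -v mountpoint="$mountpoint" '$5 == mountpoint { print $4 }' "$spacefile")

echo "$avail"
rm -f "$spacefile"
```

Quoting "$mountpoint" in the -v assignment keeps values with spaces intact.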

getting result from awk and printing it into a file

I am using awk to compute some sums and I want to store them in a file.
here is my input file:
misses 15
hit 18
misses 20
hit 31
And I want to print the total misses, and total hits in a file.
If I run this:
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt
I see them in the terminal.
Now I want to write them to a file:
I tried this:
#!/bin/bash
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt
echo misses $misses > $1; # the first one overwrites the file; $1 is the argument given on the command line
echo hits $hits>> $1; # all the others append to the .txt file
but $misses and $hits do not have a value.
I also tried this:
#!/bin/bash
result= $(echo $output | awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt )
# $NF if I want the last column
echo $result
that I saw on the web, in order to see what $result will return me but I get this error:
./test2.sh: line 2: Hits:: command not found
hits and misses are only variables inside awk, not in the shell after awk exits. Just do the following:
#!/bin/bash
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt > $1
In your second attempt, you cannot put a space after the '=':
result=$(echo $output | awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt )
simply redirect the output of the awk command:
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt >file.txt
the redirection operator > can be appended to any shell command to redirect its standard output to a file. changing it to >> appends the command's output to the file instead of completely overwriting the file, as you noticed.
edit:
the reason this didn't work:
#!/bin/bash
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt
echo misses $misses > $1; # the first one overwrites the file; $1 is the argument given on the command line
echo hits $hits>> $1; # all the others append to the .txt file
is because misses and hits are variables inside the awk script only. thus the shell has no knowledge of them outside that statement, so when you try to echo $misses and $hits, you get blanks.
and this doesn't work:
#!/bin/bash
result= $(echo $output | awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt )
# $NF if I want the last column
echo $result
for multiple reasons.
1) when assigning variables in bash, you cannot have whitespace around the equal sign, so the second line must begin:
`result=$(echo...`
2) the echo statement inside your substitution (echo $output) is unnecessary. this is because a) $output is undefined so echo produces no output, and b) the second statement in the pipeline (the awk statement) completely ignores the standard output of the command preceding it in the pipeline anyway since you specified a filename for it to act on (t.txt). so the second line could just be:
result=$(awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt)
3) the echo statement at the end will display the results all on one line, despite the fact that the awk statement prints two lines. this is because you did not quote the variable in your echo statement. try this instead:
echo "$result"
as a rule in bash scripting, you should ALWAYS quote the variables you are passing or printing, unless you know for sure why you don't want to.
hope that helps you learn a bit more about what you were trying!
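If you do want the totals as shell variables (for example to write them in your own format), a minimal sketch is to have awk print both numbers on one line, capture that line, and split it in the shell. The temporary files and the names totals, hits, misses and outfile are assumptions. In the real script the output file would be "$1".

```shell
# Sample input from the question, written to a temporary file
input=$(mktemp)
printf '%s\n' 'misses 15' 'hit 18' 'misses 20' 'hit 31' > "$input"

# awk prints "hits misses" on one line; +0 forces a number even if nothing matched
totals=$(awk '/^hit/ { h += $2 } /^misses/ { m += $2 } END { print h + 0, m + 0 }' "$input")

# Split the captured line on the space
hits=${totals%% *}
misses=${totals##* }

# Hypothetical output file standing in for "$1"
outfile=$(mktemp)
echo "misses $misses" > "$outfile"
echo "hits $hits" >> "$outfile"

cat "$outfile"
rm -f "$input" "$outfile"
```

For the sample data this gives 35 misses and 49 hits.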
Here is a more compact solution:
#!/bin/bash
awk '
{tally[$1] += $2}
END {
for (outcome in tally) {
print outcome ":", tally[outcome]
}
}' t.txt > "$1"
You don't have to initialize variables in AWK. The first time you use a variable, AWK will assume 0 for a number or "" for a string, depending on the context.
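A quick way to convince yourself of this, as a small sketch (unset_var is just a made-up name that is never assigned):

```shell
# Used in arithmetic, an unassigned variable behaves like 0
numeric=$(awk 'BEGIN { print unset_var + 0 }')

# Used in string concatenation, it behaves like the empty string
string=$(awk 'BEGIN { print "[" unset_var "]" }')

echo "numeric context: $numeric"
echo "string context: $string"
```

This is also why tally[$1] += $2 works without first setting each array element to 0.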

set shell variable in awk and reuse

How can I pass a shell variable to awk, set it, use it in another awk in same line and print it?
I want to save $0 (all fields) into a variable first, parse $6 (ABC 123456M123000) to get '123000', do a range check on it and, if it passes, print all fields ($0).
part 1: I am trying to do:
line="hello"
java class .... | awk -F, -v '{line=$0}' | awk 'begin my range check code' | if(p>100) print $line }
part2:
$6="ABC 123456M123000" ( string that I will parse)
Once I store all fields into a variable, I can parse $6 using this:
awk 'begin {FS=" "} { print $2; len=length($2); p=substr($2,8,len)+0 ; print len,p ; if(p>100) print $line }'
But my question is in part1: how to store $0 into a variable so that after my check is done, I can print them?
It's not clear why you need multiple invocations of awk. From your description, it looks like you are just trying to do:
... | awk -F, '{split( $6, f, "M" )} f[2] > min' min=100
or, if you can't split on 'M' but need to use substr (or some other method to extract the desired value):
... | awk -F, '{ split( $6, f, " " )} 0+substr( f[2], 8 ) > min' min=100
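To try the single-awk approach without the Java program, here is a sketch with two made-up comma-separated lines standing in for its output. Only the sixth field follows the format from the question. The variable names are hypothetical, and min is passed with -v here rather than as a trailing assignment.

```shell
# Hypothetical stand-in for the Java program's output
sample=$(printf '%s\n' 'a,b,c,d,e,ABC 123456M123000' 'a,b,c,d,e,ABC 123456M000050')

# Split field 6 on "M" and keep the whole line when the part after it exceeds min;
# +0 forces a numeric comparison
kept=$(printf '%s\n' "$sample" | awk -F, -v min=100 '{ split($6, f, "M") } f[2] + 0 > min')

echo "$kept"
```

There is no need to save $0 in a variable: a pattern without an action prints the current line unchanged, so only the first sample line comes out.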
With the shell:
java ... | while IFS= read -r line ; do
sixth=$(IFS=,; set -- $line; echo "$6")
val=${sixth:11}
(( $val > 100 )) && echo "$line"
done
Some bash-isms there.

Resources