awk: add previously initialized variables (bash)

I've got a problem with passing variables into AWK.
I've got a file with the following input:
MainComputer>
Device ID: name1
Interface: 1
random text...
Device ID: name2
Interface: 2
random text....
Device ID: name3
Interface: 3
random text....
Now I want to print all the variables. What I already have:
#!/bin/bash
line=$(head -n 1 file)
var=$(echo $line | cut -d ">" -f1)
var2=$(awk '/Interface/ {print $2}' file)
awk -v var3="$var" '/Device/ {print var3, "->", $2, "[Label: "$var2"]}' file
But $var2 doesn't produce any output, and if I write var2 instead, it gives an error.
Output I want:
MainComputer -> name1 [Label: 1]
MainComputer -> name2 [Label: 2]
MainComputer -> name3 [Label: 3]
And so on for the other 40 inputs....
But it only gives MainComputer -> name1 and an error on the interface label...
So I'm looking for a way to print more than one variable that I already initialized.
Thanks anyway!

If I understand what you're attempting to do correctly, this could be better achieved in pure awk:
awk -F ': *' 'NR == 1 { sub(/>.*/, ""); name = $0; next } $1 == "Device ID" { dev = $2 } $1 == "Interface" { print name " -> " dev " [Label: " $2 "]" }' file
-F ': *' sets the field separator to the regex : * (matching a colon followed by an arbitrary number of spaces). The code works as follows:
NR == 1 { # in the first line:
sub(/>.*/, "") # remove > and everything after it
name = $0 # remember as machine name
next # do nothing else
}
$1 == "Device ID" { # after that: In device lines
dev = $2 # remember device name
}
# In interface lines: print remembered info
$1 == "Interface" {
print name " -> " dev " [Label: " $2 "]"
}
This will print all devices in the file formatted the way you show in the question. If you want to filter for a particular device, it can be amended as follows:
awk -v filter="name2" -F ': *' 'NR == 1 { sub(/>.*/, ""); name = $0; next } $1 == "Device ID" { dev = $2 } $1 == "Interface" && dev == filter { print name " -> " dev " [Label: " $2 "]" }' file
That is, make a variable filter known to awk with -v filter=value, and amend the code
NR == 1 {
sub(/>.*/, "")
name = $0
next
}
$1 == "Device ID" {
dev = $2
}
$1 == "Interface" && dev == filter { # <-- here
print name " -> " dev " [Label: " $2 "]"
}
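For example, with the sample input above, the filtered version should print only the matching device:
$ awk -v filter="name2" -F ': *' 'NR == 1 { sub(/>.*/, ""); name = $0; next } $1 == "Device ID" { dev = $2 } $1 == "Interface" && dev == filter { print name " -> " dev " [Label: " $2 "]" }' file
MainComputer -> name2 [Label: 2]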

Related

AWK: pass multiple commands in a single line

I would like to combine the following multiple awk commands into a single awk program:
awk -F 'FS' '{ $1 = ($1 == "}" ? "" : $1) } 1' sorce > destfil
awk -F 'FS' '{ $3 = ($3 == "]" ? "" : $3) } 1' sorce > destfil
awk -F 'FS' '{ $5 = ($5 == "}" ? "}," : $5) } 1' sorce > destfil
I have tried to accomplish this using && but the result is not what I expected.
awk -F 'FS' '{ $1 = ($1 == "}" ? "" : $1) && $3 = ($3 == "]" ? "" : $3) && $5 = ($5 == "}" ? "}," : $5) } 1' sorce > destfil
The output seems to have various ZEROs in it.
Question:
How can I merge these lines?
What is the origin of the ZEROS?
Thank you!
@RavinderSingh13, I tried your code; my sample input file and output file are below.
[user@test]$ tail source
{
}
]
}
{
" e t
{
}
]
}
[user@test]$ awk -F 'FS' '{$1=($1=="}"?"":$1); $3=($3=="]" ? "" : $3) ; $5=($5=="}" ? "}," :$5);} 1' source > target
[user@test]$ tail target
{
}
]
}
{
" e t
{
}
]
}
I think the issue is related to the field separator -F 'FS', but I am not sure.
@kvantour, below are my sample input file, the command I am running, the output I am getting, and the output I require.
Source file content:
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
The command I am running:
$ awk '($1=="}"){$1="First Column"}
($3=="]"){$3="third Column"}
($5=="}"){$5="Fifth Column"}
{$1=$1}1' sample.json > out
Output I am getting:
[root#centos-src ~]# cat out
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
but the output I am expecting is:
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
In a nice awk structure, one would write:
awk -F 'FS' '($1=="}"){$1=""}
($3=="]"){$3=""}
($5=="}"){$5="},"}
{$1=$1}1' <file>
The reason I add $1=$1 to the list is to reprocess $0 so it uses the correct OFS even when none of the above conditions was satisfied. If you don't do this, some lines will be printed with FS as the field separator and others with OFS.
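You can see the effect of $1=$1 in a minimal example (hypothetical input; OFS is set to , so the rebuild is visible):
$ echo 'a|b|c' | awk -F'|' -v OFS=',' '1'
a|b|c
$ echo 'a|b|c' | awk -F'|' -v OFS=',' '{$1=$1}1'
a,b,c
The first program never touches the record, so it is printed verbatim with the original | separators; the assignment in the second forces awk to rebuild $0 using OFS.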
So why are you getting a bunch of zeros?
Let's look at your one-liner:
$1 = ($1 == "}" ? "" : $1) && $3 = ($3 == "]" ? "" : $3) && $5 = ($5 == "}" ? "}," : $5)
And simplify it by assuming that the ternary operators between brackets return a variable. So we can rewrite it as:
$1 = var1 && $3 = var3 && $5 = var5
Taking into account that:
expr1 && expr2 has a higher precedence than lvalue = expr.
lvalue = expr returns the value of expr
We can see that awk interprets this as
$1 = var1 && ($3 = (var3 && ($5 = var5) ) )
So the result will be:
$5 = var5
$3 = var3 && $5 equalling var3 && var5
$1 = var1 && $3 equalling var1 && var5
This is visible in the following example:
$ echo "a b c d e f" | awk '{ $1="p" && $3 = "q" && $5 = "r"}1'
1 b 1 d r f
Finally, in awk an empty string and a numeric zero have the logical value false, and anything else is true. Since two of your original ternary operators can return empty strings, they can make the logical AND return false, which is equivalent to the number ZERO. Hence $1 and $3 will both be set to ZERO if the original $3 equals ]
Update (after receiving [mcve])
What you try to achieve is not that easy. First off, it seems you assume that the column number implies the character number in the line. This is sadly not the case. Awk, in default mode, assumes that field $n is the nth word in the line where a word is a sequence of characters not containing any blank. So in the following text,
}
]
}
all characters are actually referenced by $1.
Under the assumption that your JSON file is perfectly indented, one could use the following:
awk '/^} *$/{$0="First Column"}
/^ ] *$/{$0=" Third Column"}
/^ } *$/{$0=" Fifth Column"}
{print $0}' <file>
However, if your JSON file is not indented uniformly, things become rather messy. The easiest would be to parse the file first with jq as
jq . <json-file> | awk ...
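A sketch of that pipeline, assuming jq's default two-space indentation (illustrative only; jq will reject input that is not valid JSON, and data.json is a placeholder name):
jq . data.json | awk '/^}$/ {$0="First Column"} /^  ]$/ {$0="  Third Column"} /^    }$/ {$0="    Fifth Column"} 1'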
Is this what you're trying to do (given your source input file)?
$ awk '
BEGIN{ FS="[ ]"; map[1,"}"]=map[3,"]"]=map[5,"}"]="" }
{ for (i=1;i<=NF;i++) $i=((i,$i) in map ? map[i,$i] : $i); print }
' file
{
{
" e t
{
Use ; to separate statements:
awk ... '{ $1 = ($1 == "}" ? "" : $1); $3 = ($3 == "]" ? "" : $3); $5 = ($5 == "}" ? "}," : $5); } 1' ...
Since you haven't shown your sample Input_file, I couldn't test it; could you please try the following.
awk -F 'FS' '{$1=($1=="}"?"":$1);$3=($3=="]"?"":$3);$5=($5=="}"?"},":$5);} 1' sorce > destfil

Got stuck with multiple-value validation against a particular column in awk

I have a text file where I'm trying to validate a particular column (column 5). If that column contains a value like ACT, LFP, TST or EPO, the file goes on to further processing; otherwise the script should exit. In other words, if column 5 contains one of the values ACT, LFP, TST or EPO, processing continues; if the column contains anything apart from those four values, the script terminates.
Code
cat test.txt \
| awk -F '~' -v ERR="/a/x/ERROR" -v NAME="/a/x/z/" -v WRKD="/a/x/b/" -v DATE="23_09_16" -v PD="234" -v FILE_NAME="FILENAME" \
'{ if ($5 != "ACT" || $5 != "LFP" || $5 != "EPO" || $5 != "TST")
system("mv "NAME" "ERR);
system("rm -f"" "WRKD);
print DATE" " PD " " "[" FILE_NAME "]" " ERROR: Panel status contains invalid value due to this file move to error folder";
print DATE" " PD " " "[" FILE_NAME "]" " INFO: Script is exited";
system("exit");
}' >>log.txt
Txt file: test.txt (Note: this file should be processed successfully)
161518~CHEM~ACT~IRPMR~ACT~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~LFP~UD
030767~CHEM~ACT~IRPMR~LFP~UD
Txt file: test1.txt (Note: this file should not be processed successfully; it contains one invalid value)
161518~CHEM~ACT~IRPMR~**ACT1**~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
awk to the rescue!
Let's assume the following input file:
010282~CHEM~ACT~IRPMR~ACT~UD
121212~CHEM~ACT~IRPMR~ZZZ~UD
162794~CHEM~ACT~IRPMR~TST~UD
020202~CHEM~ACT~IRPMR~YYY~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
010101~CHEM~ACT~IRPMR~XXX~UD
123456~CHEM~ACT~IRPMR~TST~UD
1) This example illustrates how to check for invalid lines/records in the input file:
#!/bin/awk
BEGIN {
FS = "~"
s = "ACT,LFP,TST,EPO"
n = split( s, a, "," )
}
{
for( i = 1; i <= n; i++ )
if( a[i] == $5 )
next
print "Unexpected value # line " NR " [" $5 "]"
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
Unexpected value # line 2 [ZZZ]
Unexpected value # line 4 [YYY]
Unexpected value # line 7 [XXX]
2) This example illustrates how to filter out (remove) invalid lines/records from the input file:
#!/bin/awk
BEGIN {
FS = "~"
s = "ACT,LFP,TST,EPO"
n = split( s, a, "," )
}
{
for( i = 1; i <= n; i++ )
{
if( a[i] == $5 )
{
print $0
next
}
}
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
123456~CHEM~ACT~IRPMR~TST~UD
3) This example illustrates how to display the invalid lines/records from the input file:
#!/bin/awk
BEGIN {
FS = "~"
s = "ACT,LFP,TST,EPO"
n = split( s, a, "," )
}
{
for( i = 1; i <= n; i++ )
if( a[i] == $5 )
next
print $0
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
121212~CHEM~ACT~IRPMR~ZZZ~UD
020202~CHEM~ACT~IRPMR~YYY~UD
010101~CHEM~ACT~IRPMR~XXX~UD
Hope it Helps!
Without getting into the calls to system, this will show you an answer.
awk -F"~" '{ if (! ($5 == "ACT" || $5 == "LFP" || $5 == "EPO" || $5 == "TST")) print $0}' data.txt
output
161518~CHEM~ACT~IRPMR~**ACT1**~UD
This version tests whether $5 matches at least one item in the list. If it doesn't (that's what the ! in front of the || chain tests), it prints the record as an error.
Of course, $5 will match only one from that list at a time, but that is all you need.
By contrast, when you say
if ($5 != "ACT" || $5 != "LFP" ...)
You're creating a logic test that can never be false. If $5 does not equal "ACT" (because it is "LFP", say), the first comparison is already true, and because of short-circuit evaluation the remaining || operands are not even checked.
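The corrected "not in the list" test therefore needs && instead of ||. As a minimal sketch of the go/no-go behaviour you describe, you can let awk's exit status drive the shell (the echo commands stand in for your mv/rm/logging):
$ awk -F'~' '$5 != "ACT" && $5 != "LFP" && $5 != "EPO" && $5 != "TST" { bad = 1; exit 1 } END { exit bad }' test.txt && echo "valid: continue processing" || echo "invalid value: move file to error folder"
For test.txt this prints the "valid" message; for test1.txt the ACT1 record trips the test and you get the "error folder" branch.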
IHTH

Finding Contiguous Ranges

I would like to find the contiguous ranges in a set of dates (one per day).
Given the following sample:
2016-01-01
2016-01-02
2016-01-03
2016-01-04
2016-01-05
2016-01-06
2016-01-08
2016-01-09
2016-01-10
2016-01-11
2016-01-12
2016-01-15
2016-01-16
2016-01-17
2016-01-20
2016-01-21
2016-01-30
2016-01-31
2016-02-01
I expect the following result
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
I have already come across this question, which is almost the opposite of what I want, but with integers.
I have formulated the following, which works with integers:
awk 'NR==1 {l=$1; n=$1} {if ($1==n){n=$1+1} else{print l"-"n-1; l=$1 ;n=$1+1} } END {print l"-"$1}' file.txt
With GNU awk for mktime():
$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currSecs = mktime( $1" "$2" "$3" 0 0 0" ) }
(currSecs - prevSecs) > (24*60*60) {
if (NR>1) {
print startDate, prevDate
}
startDate = $0
}
{ prevSecs = currSecs; prevDate = $0 }
END { print startDate, prevDate }
$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-02-01
With any awk, if you don't mind ranges restarting when months change (as is apparent in your expected output and the comment under your question):
$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currYrMth = $1 FS $2; currDay = $3 }
(currYrMth != prevYrMth) || ((currDay - prevDay) > 1) {
if (NR>1) {
print startDate, prevDate
}
startDate = $0
}
{ prevYrMth = currYrMth; prevDay = currDay; prevDate = $0 }
END { print startDate, prevDate }
$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
If you have GNU Awk you can use its time functions.
gawk -F - 'NR==1 || $1 "-" $2 "-" $3 != following {
if (following != "") print start "-" latest;
start = $1 "-" $2 "-" $3
this = mktime($1 " " $2 " " $3 " 0 0 0")
}
{
this += 24*60*60
following = strftime("%F", this)
latest = $1 "-" $2 "-" $3 }
END { if (start != latest) print start "-" latest }' filename
Unit ranges will print like "2016-04-15-2016-04-15" which is a bit of a wart, but easy to fix if you need to. Also the END block has a bug in this case, but again, this should at least get you started.
gawk:
#!/bin/awk -f
BEGIN{
FS="-"
}
{
a[NR]=mktime($1" "$2" "$3" 0 0 0")
b[NR]=$2;
if ( (a[NR-1]+86400) != a[NR] || b[NR-1]!=b[NR] ) {
if(NR!=1){
print s" - "strftime("%Y-%m-%d",a[NR-1])
};
s=$0
}
}
END{
print s" - "$0
}
Create array a with index NR and, as value, the epoch time derived from $0 using the awk time function mktime.
Create array b with index NR and, as value, the month from $2.
If either the epoch time of the previous line + 86400 (+1 day) is not equal to the epoch time of the current line, or the months of the previous and current lines differ, then (except on the first line) print s" - "strftime("%Y-%m-%d",a[NR-1]) and reassign s, the start date, to $0.
END:
Print the last start date s and the last line.
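For reference, saving this as script.awk (an assumed filename) and running it against the sample input should produce the month-aware ranges; note this script separates the two dates with " - " rather than a bare hyphen:
$ gawk -f script.awk file
2016-01-01 - 2016-01-06
2016-01-08 - 2016-01-12
2016-01-15 - 2016-01-17
2016-01-20 - 2016-01-21
2016-01-30 - 2016-01-31
2016-02-01 - 2016-02-01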

Add info to output -- obtained from a shell command execution

I have files containing indented lines such as:
table 't'
field 'abc'
field 'def' and #enabled=true
field 'ghi'
table 'u'
I want to transform it to:
table 't'
field 'abc' [info about ABC]
field 'def' [info about DEF] and #enabled=true
field 'ghi' [info about GHI]
table 'u'
where the bracketed string is obtained by calling a shell script (get-info, which fetches the definition of the terms 'abc', 'def' and 'ghi').
I tried with AWK (via the cmd | getline output mechanism):
awk '$1 == "field" {
$2 = substr($2, 2, length($2) - 2)
cmd = "get-info \"" $2 "\" 2>&1 | head -n 1" # results or error
while (cmd | getline output) {
print $0 " [" output "]";
}
close(cmd)
next
}
// { print $0 }'
but it does not respect the indentation!
How could I fulfil my wish?
It looks like what you're trying to do would be:
$1 == "field" {
cmd = "get-info \"" substr($2,2,length($2)-2) "\" 2>&1" # results or error
if ( (cmd | getline output) > 0 ) {
sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/,"& ["output"]")
}
close(cmd)
}
{ print }
Note you don't need the head -1; just don't read the output in a loop. As for why your version lost the indentation: assigning to $2 makes awk rebuild $0 with the fields joined by OFS (a single blank by default), which throws the leading whitespace away; the sub() above edits $0 directly without touching any field, so the indentation survives.
e.g.:
$ cat tst.awk
$1 == "field" {
cmd = "echo \"--->" substr($2,2,length($2)-2) "<---\" 2>&1"
if ( (cmd | getline output) > 0 ) {
sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/,"& ["output"]")
}
close(cmd)
}
{ print }
$ awk -f tst.awk file
table 't'
field 'abc'
field 'def' [--->def<---] and #enabled=true
field 'ghi'
table 'u'
This is a rare occasion where use of getline is probably appropriate but make sure you read and understand all of the getline caveats at http://awk.info/?tip/getline if you're considering using getline again.

Setting awk to variable

I have code that looks like this:
awk -F'|' '{if($1 in a)print "duplicate found:" $2 " AND "a[$1];else a[$1]=$2 }' dump.txt
I need to set $2 and a[$2] to a variable. How would I go about doing this?
I am taking a file that contains a value, a "|", and a filename, and I want to assign the filename and the value to two different variables.
What do you mean, set those variables? Are these environment variables? In Awk, variables that consist of a dollar sign followed by a number are reserved for Awk: these are the field values for each line. For example:
test.txt =
this is line one
this is line two
this is line three
The command awk '{print $4}' test.txt will print out the fourth field:
$ awk '{print $4}' test.txt
one
two
three
$ awk '{print $3}' test.txt
line
line
line
As you can see: They don't have to be set. They're automatically set by Awk.
If you want to set an Awk variable from the shell, you can use the -v parameter:
awk -v search="foo" '{
    if (search == $1) {
        print "Found your string in record " NR
    }
}'
In the above, search is an Awk variable that is set equal to foo.
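For example (a hypothetical two-line input):
$ printf 'foo bar\nbaz qux\n' | awk -v search="foo" '$1 == search { print "Found your string in record " NR }'
Found your string in record 1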
Since Awk is a programming language, it is sometimes easier to see what's going on by correctly formatting your program:
awk -F'|' '{
if($1 in a) {
print "duplicate found:" $2 " AND " a[$1]
}
else {
a[$1] = $2
}
}' dump.txt
The program is taking each line. Each line consists of two parts separated by the |. It appears that the first part is the key and the second part is the data. I've created a text file that looks like this:
This is a|test
That is a|line
who moans for|anchovies
whom moans for|anchovies
This is a|test again
The first and last line should show up as duplicates
I took your program and added a few debug lines. This will help me trace where it is in your program:
awk -F\| '{
if ($1 in a) {
print "DEBUG: In If clause"
print "duplicate found:" $2 " and " a[$1]
} else {
print "DEBUG: In else clause"
a[$1] = $2
print "DEBUG: a[" $1 "] = " a[$1]
}
print "DEBUG: $1 = " $1
print "DEBUG: $2 = " $2 "\n"
}' test.txt
And this is my output:
DEBUG: In else clause
DEBUG: a[This is a] = test
DEBUG: $1 = This is a
DEBUG: $2 = test
DEBUG: In else clause
DEBUG: a[That is a] = line
DEBUG: $1 = That is a
DEBUG: $2 = line
DEBUG: In else clause
DEBUG: a[who moans for] = anchovies
DEBUG: $1 = who moans for
DEBUG: $2 = anchovies
DEBUG: In else clause
DEBUG: a[whom moans for] = anchovies
DEBUG: $1 = whom moans for
DEBUG: $2 = anchovies
DEBUG: In If clause
duplicate found:test again and test
DEBUG: $1 = This is a
DEBUG: $2 = test again
Taking out the debug lines:
awk -F\| '{
if ($1 in a) {
print "duplicate found:" $2 " and " a[$1]
} else {
a[$1] = $2
}
}' test.txt
duplicate found:test again and test
As they say:
IT WORKS ON MY COMPUTER
(rimshot)
Seriously, what is your program supposed to be doing, and what do you see it do? Are there any errors? Your program appears to work as advertised.
Set them from a shell variable like this:
awk -F'|' '{if($1 in a)print "duplicate found:" '"$2"' " AND "a[$1];else a[$1]='"$2"' }' dump.txt
Note that what I did was to "disable" the single-quotes around the $2 parts, and add double-quotes around them in case the environment variable contains spaces (which awk wouldn't want to see splitting its argument into pieces).
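That said, if the goal is the other direction, reading each value and filename out of the file into two shell variables, you don't need awk at all; a minimal sketch (the variable names are illustrative):
while IFS='|' read -r value filename; do
    echo "value=$value filename=$filename"   # use the two variables here
done < dump.txt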
