Add info to output -- obtained from a shell command execution - shell

I have files containing indented lines such as:
table 't'
    field 'abc'
    field 'def' and #enabled=true
    field 'ghi'
table 'u'
I want to transform it to:
table 't'
    field 'abc' [info about ABC]
    field 'def' [info about DEF] and #enabled=true
    field 'ghi' [info about GHI]
table 'u'
where the string between brackets is obtained from a call to a shell script (get-info, which fetches the definitions of the terms 'abc', 'def' and 'ghi').
I tried with AWK (via the cmd | getline output mechanism):
awk '$1 == "field" {
$2 = substr($2, 2, length($2) - 2)
cmd = "get-info \"" $2 "\" 2>&1 | head -n 1" # results or error
while (cmd | getline output) {
print $0 " [" output "]";
}
close(cmd)
next
}
// { print $0 }'
but it does not respect the indentation!
How could I fulfil my wish?

It looks like what you're trying to do would be:
$1 == "field" {
cmd = "get-info \"" substr($2,2,length($2)-2) "\" 2>&1" # results or error
if ( (cmd | getline output) > 0 ) {
sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/,"& ["output"]")
}
close(cmd)
}
{ print }
Note that you don't need the head -n 1; just don't read the output in a loop.
e.g.:
$ cat tst.awk
$1 == "field" {
cmd = "echo \"--->" substr($2,2,length($2)-2) "<---\" 2>&1"
if ( (cmd | getline output) > 0 ) {
sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/,"& ["output"]")
}
close(cmd)
}
{ print }
$ awk -f tst.awk file
table 't'
    field 'abc' [--->abc<---]
    field 'def' [--->def<---] and #enabled=true
    field 'ghi' [--->ghi<---]
table 'u'
This is a rare occasion where use of getline is probably appropriate but make sure you read and understand all of the getline caveats at http://awk.info/?tip/getline if you're considering using getline again.
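For instance, the most common caveat is calling getline without testing its return value; here is a minimal sketch of the safer pattern (reusing the hypothetical get-info command from the question):
$ awk 'BEGIN {
    cmd = "get-info \"abc\" 2>&1"            # hypothetical command from the question
    if ( (cmd | getline output) > 0 ) {      # 1 = got a line, 0 = no output, -1 = error
        print "got: " output
    } else {
        print "no output (or an error) from: " cmd
    }
    close(cmd)                               # always close so the command can be rerun later
}'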

Related

Can text be sorted twice?

I have an awk array that aggregates bytes up and downloaded. I can sort the output by either bytes down or up and pipe that to head for the top talkers; is it possible to output two sorts using different keys?
zgrep ^1 20211014T00*.gz|awk '{print$3,$11,$6,$(NF-7)}'| awk 'NR>1{bytesDown[$1 " " $2]+=$3;bytesUp[$1 " " $2]+=$4} END {for(i in bytesDown) print bytesDown[i], bytesUp[i], i}'|sort -rn|head
Rather than parsing the source again to get the top uploads, I would like to be able to output the array again to "sort -rnk2|head".
I can see how I'd do it with a scratch file but is it possible/desirable to do it in memory? It's a bash shell on a 2 CPU Linux VM with 4GB of memory.
Your question isn't clear and there's no sample input/output to test with but this MAY be what you're trying to do:
zgrep '^1' 20211014T00*.gz|
awk '
    NR > 1 {
        key = $3 " " $11
        bytesdown[key] += $6
        bytesup[key] += $(NF-7)
    }
    END {
        cmd = "sort -rn | head"
        for ( key in bytesdown ) {
            print bytesdown[key], bytesup[key], key | cmd
        }
        close(cmd)
        cmd = "sort -rnk2 | head"
        for ( key in bytesdown ) {
            print bytesdown[key], bytesup[key], key | cmd
        }
        close(cmd)
    }
'
which could be written more concisely and efficiently as:
zgrep '^1' 20211014T00*.gz|
awk '
    NR > 1 {
        key = $3 " " $11
        bytesdown[key] += $6
        bytesup[key] += $(NF-7)
        if ( NR == 2 ) {
            max_bytesdown_key = key
            max_bytesup_key = key
        }
        else {
            if ( bytesdown[key] > bytesdown[max_bytesdown_key] ) {
                max_bytesdown_key = key
            }
            if ( bytesup[key] > bytesup[max_bytesup_key] ) {
                max_bytesup_key = key
            }
        }
    }
    END {
        print bytesdown[max_bytesdown_key], bytesup[max_bytesdown_key], max_bytesdown_key
        print bytesdown[max_bytesup_key], bytesup[max_bytesup_key], max_bytesup_key
    }
'
Bash allows you to do that with process substitutions. It's not clear what you expect it to do with the data; printing both results to standard output is unlikely to be useful, so I send each to a separate file for later inspection.
zgrep ^1 20211014T00*.gz |
awk '{print$3,$11,$6,$(NF-7)}' |
awk 'NR>1{bytesDown[$1 " " $2]+=$3;bytesUp[$1 " " $2]+=$4}
END {for(i in bytesDown) print bytesDown[i], bytesUp[i], i}' |
tee >(sort -rn | head >first) |
sort -rnk2 | head >second
The double Awks could easily be refactored to a single Awk script.
Something like this?
awk 'NR>1{bytesDown[$3 " " $11]+=$6;bytesUp[$3 " " $11]+=$(NF-7)}
END { for(i in bytesDown) print bytesDown[i], bytesUp[i], i }'
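Putting that together with the process substitutions above, the full refactored pipeline would look something like this (a sketch keeping the field positions from the question; it still writes the two top-talker lists to first and second):
zgrep '^1' 20211014T00*.gz |
awk 'NR>1 { bytesDown[$3 " " $11] += $6; bytesUp[$3 " " $11] += $(NF-7) }
     END  { for (i in bytesDown) print bytesDown[i], bytesUp[i], i }' |
tee >(sort -rn | head > first) |
sort -rnk2 | head > second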

convert table into comma separated in text file using bash

I have a text file like this:
+------------------+------------+----------+
| col_name         | data_type  | comment  |
+------------------+------------+----------+
| _id              | bigint     |          |
| starttime        | string     |          |
+------------------+------------+----------+
How can I get a result like this using bash:
(_id bigint, starttime string )
so just the column names and types.
# remove first 3 lines
sed -e '1,3d' < columnnames.txt > clean.txt
# remove first character from each line (in place; redirecting back into the file being read would truncate it)
sed -i 's/^.//' clean.txt
# remove last character from each line
sed -i 's/.$//' clean.txt
# remove the +, - and | characters
sed -i 's/[+|-]//g' clean.txt
# remove last line
sed -i '$ d' clean.txt
So this is what I have so far; if there is a better implementation let me know!
Something similar, using only awk:
awk -F ' *[|]' 'BEGIN {printf("(")} NR>3 && NF>1 {printf("%s%s%s", NR>4 ? "," : "", $2, $3)} END {printf(" )\n")}' columnnames.txt
# Set the field separator to vertical bar surrounded by any number of spaces.
# BEGIN and END blocks print the opening and closing parens
# The line between skips the header lines and any line starting with '+'
$ awk -F"[[:space:]]*[|][[[:space:]]*" '
BEGIN { printf "%s", "( "}
NR > 3 && $0 !~ /^[+]/ { printf("%s%s %s", c, $2, $3); c = ", " }
END { print " )" }' file
( _id bigint, starttime string )
$ awk -F'[| ]+' 'NR>3 && NF>1{v=v s $2" "$3; s=", "} END{print "("v")"}' file
(_id bigint, starttime string)
I would do this:
cat input.txt \
| tail -n +4 \
| awk -F'[^a-zA-Z_]+' '{ for(i=1;i<=NF;i++) { printf $i" " }}'
It's a little bit shorter.
Another way to implement Diego Torres Milano's solution as a stand-alone awk program:
tableconvert
#!/usr/bin/env -S awk -f
BEGIN {
    FS = "[[:space:]]*[|][[:space:]]*"
    printf "%s", "( "
}
{
    if (FNR <= 3 || match($0, /^[+]/))
        next
    else {
        printf("%s%s %s", c, $2, $3)
        c = ", "
    }
}
END {
    print " )"
}
Make tableconvert an executable:
chmod +x tableconvert
Run tableconvert on intablefile.txt
./tableconvert intablefile.txt
( _id bigint, starttime string )
An added bonus: using FNR instead of NR allows the awk program to process multiple input files passed as arguments:
./tableconvert infile1.txt infile2.txt infile3.txt ...
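If the FNR/NR distinction is unfamiliar: NR counts records across all input files, while FNR restarts at 1 for each file, for example:
$ printf 'a\nb\n' > f1; printf 'c\n' > f2
$ awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' f1 f2
f1 NR=1 FNR=1
f1 NR=2 FNR=2
f2 NR=3 FNR=1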
A variation on the other answers: use awk with the field separator set to '|' with optional spaces on either side, take fields 2 and 3 from each record, and format the output as described in the question, with the closing " )" provided in the END rule:
$ awk -F' *\\| *' '
NR>3 && $1~/^[+]/{exit} # exit condition first line w/^+
NR==4{$1=$1; printf "(%s %s", $2,$3} # 1st data record is 4
NR>4{$1=$1; printf ", %s %s", $2,$3} # process all remainng records
END{print " )"} # output closing " )"
' table
(_id bigint, starttime string )
(note: if you don't want the two-spaces before the closing ")", just remove them from the print in the END rule)
Rather than using a BEGIN rule, the first record of interest (record 4) is used to provide the opening "(". Look things over and let me know if you have questions.

AWK in single line pass multiple commands

I would like to combine the following multiple awk commands into a single awk program:
awk -F 'FS' '{ $1 = ($1 == "}" ? "" : $1) } 1' sorce > destfil
awk -F 'FS' '{ $3 = ($3 == "]" ? "" : $3) } 1' sorce > destfil
awk -F 'FS' '{ $5 = ($5 == "}" ? "}," : $5) } 1' sorce > destfil
I have tried to accomplish this using && but the result is not what I expected.
awk -F 'FS' '{ $1 = ($1 == "}" ? "" : $1) && $3 = ($3 == "]" ? "" : $3) && $5 = ($5 == "}" ? "}," : $5) } 1' sorce > destfil
The output seems to have various ZEROs in it.
Question:
How can I merge these lines?
What is the origin of the ZEROS?
Thank you!
@RavinderSingh13, I tried your code; the sample input file and output file are below.
[user@test]$ tail source
{
}
]
}
{
" e t
{
}
]
}
[user@test]$ awk -F 'FS' '{$1=($1=="}"?"":$1); $3=($3=="]" ? "" : $3) ; $5=($5=="}" ? "}," :$5);} 1' source > target
[user@test]$ tail target
{
}
]
}
{
" e t
{
}
]
}
I think the issue is related to the field separator -F 'FS', but I'm not sure.
@kvantour, below I have given my sample input file, the command I am running, the output I am getting, and the output I require.
Source file content:
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
The command I am running:
$ awk '($1=="}"){$1="First Column"}
($3=="]"){$3="third Column"}
($5=="}"){$5="Fifth Column"}
{$1=$1}1' sample.json > out
Output I am getting:
[root@centos-src ~]# cat out
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
but the output I am expecting is:
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
In a nice awk structure, one would write:
awk -F 'FS' '($1=="}"){$1=""}
($3=="]"){$3=""}
($5=="}"){$5="},"}
{$1=$1}1' <file>
The reason I add $1=$1 to the list is to reprocess $0 with the correct OFS in case none of the above conditions was satisfied. If you don't do this, you will have some lines printed with FS as the field separator and others with OFS.
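A quick way to see that rebuild in action, on a line where no field was otherwise changed:
$ echo 'a,b,c' | awk -F',' -v OFS=';' '1'
a,b,c
$ echo 'a,b,c' | awk -F',' -v OFS=';' '{ $1 = $1 } 1'
a;b;c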
So why are you getting a bunch of zeros?
Let's look at your one-liner:
$1 = ($1 == "}" ? "" : $1) && $3 = ($3 == "]" ? "" : $3) && $5 = ($5 == "}" ? "}," : $5)
And simplify it by assuming that the ternary operators between brackets return a variable. So we can rewrite it as:
$1 = var1 && $3 = var3 && $5 = var5
Taking into account that:
expr1 && expr2 has a higher precedence than lvalue = expr.
lvalue = expr returns the value of expr
We can see that awk interprets this as
$1 = var1 && ($3 = (var3 && ($5 = var5) ) )
So the result will be:
$5 = var5
$3 = var3 && $5 equalling var3 && var5
$1 = var1 && $3 equalling var1 && var3 && var5
This is visible in the following example:
$ echo "a b c d e f" | awk '{ $1="p" && $3 = "q" && $5 = "r"}1'
1 b 1 d r f
Finally, in awk an empty string and a numeric zero have the logical value false, and anything else is true. Since two of your original ternary operators can return empty strings, they will make the logical AND return false, which awk represents as the number ZERO. Hence $1 and $3 will both be set to ZERO if the original $3 equals ].
Update (after receiving [mcve])
What you are trying to achieve is not that easy. First off, it seems you assume that the field number corresponds to a fixed character position in the line. This is sadly not the case. Awk, in default mode, treats field $n as the nth word in the line, where a word is a sequence of characters not containing any blanks. So in the following text,
}
]
}
all characters are actually referenced by $1.
Under the assumption that your JSON file is perfectly indented, one could use the following:
awk '/^} *$/{$0="First Column"}
/^ ] *$/{$0=" Thrid Column"}
/^ } *$/{$0=" Fifth Column"}
{print $0}' <file>
However, if your JSON file is not indented uniformly, things become rather messy. The easiest would be to parse the file first with jq as
jq . <json-file> | awk ...
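For example, assuming the real file is valid JSON (the placeholder "sample content line" rows in the excerpt above are not) and taking file.json as a stand-in name for it, a sketch could be:
# jq pretty-prints with a consistent 2-space indentation, so the line-based
# matching from above becomes reliable regardless of the original indentation
jq . file.json |
awk '/^}$/     { $0 = "First Column" }
     /^  ]$/   { $0 = "  Third Column" }
     /^    }$/ { $0 = "    Fifth Column" }
     { print }'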
Is this what you're trying to do (given your source input file)?
$ awk '
BEGIN{ FS="[ ]"; map[1,"}"]=map[3,"]"]=map[5,"}"]="" }
{ for (i=1;i<=NF;i++) $i=((i,$i) in map ? map[i,$i] : $i); print }
' file
{
{
" e t
{
Use ; to separate statements:
awk ... '{ $1 = ($1 == "}" ? "" : $1); $3 = ($3 == "]" ? "" : $3); $5 = ($5 == "}" ? "}," : $5); } 1' ...
Since you haven't shown your sample Input_file I couldn't test it, but could you please try the following:
awk -F 'FS' '{$1=($1=="}"?"":$1);$3=($3=="]"?"":$3);$5=($5=="}"?"":$5);} 1' sorce > destfil

Got stuck with multiple value validation against in particular columns in awk?

I have a text file and I'm trying to validate a particular column (5). If that column contains one of the values ACT, LFP, TST or EPO, the file goes on to further processing; otherwise the script should exit. In other words, if column 5 contains only those four values the file is processed; if it contains anything else, the script terminates.
Code
cat test.txt \
| awk -F '~' -v ERR="/a/x/ERROR" -v NAME="/a/x/z/" -v WRKD="/a/x/b/" -v DATE="23_09_16" -v PD="234" -v FILE_NAME="FILENAME" \
'{ if ($5 != "ACT" || $5 != "LFP" || $5 != "EPO" || $5 != "TST")
system("mv "NAME" "ERR);
system("rm -f"" "WRKD);
print DATE" " PD " " "[" FILE_NAME "]" " ERROR: Panel status contains invalid value due to this file move to error folder";
print DATE" " PD " " "[" FILE_NAME "]" " INFO: Script is exited";
system("exit");
}' >>log.txt
Txt file: test.txt (Note: this file should be processed successfully)
161518~CHEM~ACT~IRPMR~ACT~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~LFP~UD
030767~CHEM~ACT~IRPMR~LFP~UD
Txt file: test1.txt (Note: this file should not be processed successfully; it contains one invalid value)
161518~CHEM~ACT~IRPMR~**ACT1**~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
awk to the rescue!
Let's assume the following input file:
010282~CHEM~ACT~IRPMR~ACT~UD
121212~CHEM~ACT~IRPMR~ZZZ~UD
162794~CHEM~ACT~IRPMR~TST~UD
020202~CHEM~ACT~IRPMR~YYY~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
010101~CHEM~ACT~IRPMR~XXX~UD
123456~CHEM~ACT~IRPMR~TST~UD
1) This example illustrates how to check for invalid lines/records in the input file:
#!/bin/awk -f
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    print "Unexpected value # line " NR " [" $5 "]"
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
Unexpected value # line 2 [ZZZ]
Unexpected value # line 4 [YYY]
Unexpected value # line 7 [XXX]
2) This example illustrates how to filter out (remove) invalid lines/records from the input file:
#!/bin/awk -f
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
    {
        if( a[i] == $5 )
        {
            print $0
            next
        }
    }
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
123456~CHEM~ACT~IRPMR~TST~UD
3) This example illustrates how to display the invalid lines/records from the input file:
#!/bin/awk -f
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    print $0
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
121212~CHEM~ACT~IRPMR~ZZZ~UD
020202~CHEM~ACT~IRPMR~YYY~UD
010101~CHEM~ACT~IRPMR~XXX~UD
Hope it Helps!
Without getting into the calls to system, this will show you an answer.
awk -F"~" '{ if (! ($5 == "ACT" || $5 == "LFP" || $5 == "EPO" || $5 == "TST")) print $0}' data.txt
output
161518~CHEM~ACT~IRPMR~**ACT1**~UD
This version tests whether $5 matches at least one item in the list. If it doesn't (that's what the ! in front of the || chain tests), it prints the record as an error.
Of course, $5 will match only one from that list at a time, but that is all you need.
By contrast, when you say
if ($5 != "ACT" || $5 != "LFP" ...)
you're creating a logic test that can never be false. If $5 equals "ACT", then $5 != "LFP" is true; and if $5 does not equal "ACT", the first comparison already succeeds and the remaining || operands are not even checked. Either way the condition is true for every record.
IHTH
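To wire the validation back into the original goal (only continue processing when every row is valid), one option is to let awk's exit status drive the shell; a minimal sketch, with the mv/logging from the original script left as hypothetical comments:
if awk -F'~' '$5 != "ACT" && $5 != "LFP" && $5 != "EPO" && $5 != "TST" { exit 1 }' test.txt
then
    echo "column 5 valid - continue processing"
else
    echo "invalid value in column 5 - stopping" >&2
    # mv "$NAME" "$ERR"   # hypothetical: move the file to the error folder as in the question
    exit 1
fi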

awk add previously initiated variable bash

I've got a problem with adding variables to AWK.
I've got a file with the following input:
MainComputer>
Device ID: name1
Interface: 1
random text...
Device ID: name2
Interface: 2
random text....
Device ID: name3
Interface: 3
random text....
Now I want to print all the variables. What I already have:
#!/bin/bash
line=$(head -n 1 file)
var=$(echo $line | cut -d ">" -f1)
var2=$(awk '/Interface/ {print $2}' file)
awk -v var3="$var" '/Device/ {print var3, "->", $2, "[Label: "$var2"]}' file
But $var2 doesn't show any output, and if I put var2 instead, it gives an error.
Output I want:
MainComputer -> name1 [Label: 1]
MainComputer -> name2 [Label: 2]
MainComputer -> name3 [Label: 3]
And so on for the other 40 inputs....
But it only gives MainComputer -> name1 and an error on the interface label...
So I'm looking for a way to print more than one variable that I've already initialized.
Thanks anyway!
If I understand what you're attempting to do correctly, this could be better achieved in pure awk:
awk -F ': *' 'NR == 1 { sub(/>.*/, ""); name = $0; next } $1 == "Device ID" { dev = $2 } $1 == "Interface" { print name " -> " dev " [Label: " $2 "]" }' file
-F ': *' sets the field separator to the regex : * (matching a colon followed by an arbitrary number of spaces). The code works as follows:
NR == 1 {                  # in the first line:
    sub(/>.*/, "")         # remove > and everything after it
    name = $0              # remember as machine name
    next                   # do nothing else
}
$1 == "Device ID" {        # after that: in device lines
    dev = $2               # remember device name
}
$1 == "Interface" {        # in interface lines: print remembered info
    print name " -> " dev " [Label: " $2 "]"
}
This will show all lines in the file formatted the way you show in the question. If you want to filter for a particular line, this could be amended as follows:
awk -v filter="name2" -F ': *' 'NR == 1 { sub(/>.*/, ""); name = $0; next } $1 == "Device ID" { dev = $2 } $1 == "Interface" && dev == filter { print name " -> " dev " [Label: " $2 "]" }' file
That is, make a variable filter known to awk with -v filter=value, and amend the code
NR == 1 {
    sub(/>.*/, "")
    name = $0
    next
}
$1 == "Device ID" {
    dev = $2
}
$1 == "Interface" && dev == filter {   # <-- here
    print name " -> " dev " [Label: " $2 "]"
}
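For example, assuming the question's input is saved as file, filtering for name2 gives:
$ awk -v filter="name2" -F ': *' 'NR == 1 { sub(/>.*/, ""); name = $0; next } $1 == "Device ID" { dev = $2 } $1 == "Interface" && dev == filter { print name " -> " dev " [Label: " $2 "]" }' file
MainComputer -> name2 [Label: 2]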
