Read a variable with multiple occurrences from a file using AWK - bash

I want to get the value (usually a string) of var1 in a file my_file.dat and save this value to x.
I managed to do this using the following command:
x=`awk '$1 == "var1" {print $2}' my_file.dat`
It now turns out that there can be several occurrences of var1 in my_file.dat, e.g.:
Series1
var1 = temp/data/
Series2
var1 = lost/oldfiles/
My question is then how can I get only the value of the 'var1' which is located right after the line 'Series1', such that 'x' returns 'temp/data/'?

Given the sample you posted, all you need is:
x=$(awk 'prev=="Series1" {print $NF} {prev=$0}' file)
but more robustly:
x=$(awk '
    { name=value=$0; sub(/[[:space:]]*=.*/,"",name); sub(/[^=]+=[[:space:]]*/,"",value) }
    (prev=="Series1") && (name=="var1") { print value }
    { prev=$0 }
' file)
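For reference, running either version against the four-line sample above prints temp/data/, so x ends up holding that value (assuming the sample is saved as my_file.dat):
x=$(awk 'prev=="Series1" {print $NF} {prev=$0}' my_file.dat)
echo "$x"
temp/data/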

What about a two-state machine to solve the problem:
#!/usr/bin/awk -f
BEGIN {
    state = 0;
}
{
    if( state == 0 )
    {
        if( index( $1, "Series" nseries ) )
        {
            state = 1
        }
    }
    else
    {
        if( index( $1, "Series" ) > 0 )
        {
            exit
        }
        if( index( $1, "var1" ) > 0 )
        {
            idx = index( $0, "=" )
            str = substr( $0, idx + 1 )
            gsub(/^[ \t]+/, "", str )
            print str
            exit
        }
    }
}
# eof #
Test file:
Series1
var1 = temp/data/
Series2
var1 = lost/oldfiles/
Series3
var1 = foo/bar/
Series4
var1 = alpha/betta/
Series5
var1 = /foo/this=bad
Series6
var1 = /foo/Series/
Reading var1 from Series1:
x=$(awk -v nseries=1 -f solution.awk -- ./my_file.dat)
echo $x
temp/data/
Reading var1 from Series5:
x=$(awk -v nseries=5 -f solution.awk -- ./my_file.dat)
echo $x
/foo/this=bad
Reading var1 from Series6:
x=$(awk -v nseries=6 -f solution.awk -- ./my_file.dat)
echo $x
/foo/Series/
Hope it Helps!

Unix Convert Rows to Columns [closed]

I have an output from a program that I need to convert to columns.
I know you can do this with awk or sed but I can't seem to figure out how.
This is how the output looks:
insert_job: aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa job_type: c
box_name: sss-eee-ess-saturday
command: $${qqqq-eee-eat-cmd} $${qqqq-eee-nas-cntrl-dir}\eee\CMS\CMS_C3.xml $${qqqq-eee-nas-log}\eee\AFG\AFG_Build_Qwer.log buildProcess
machine: qqqq-eee-cntl
owner: system_uu_gggg_p#ad
permission: gx,wx
condition: s(qqqq-rtl-etl-40-datamart-load-cms) & s(qqqq-eee-ess)
std_out_file: >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.out
std_err_file: >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.err
max_run_alarm: 420
alarm_if_fail: 1
application: qqqq-M9887
I need it to look like this:
insert_job: job_type: box_name: command: machine:
aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa c sss-eee-ess-saturday $${qqqq-eee-eat-cmd} qqqq-eee-cntl
Or like this:
insert_job:;job_type:;box_name:;command:;machine:;
aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa;c;sss-eee-ess-saturday;$${qqqq-eee-eat-cmd};qqqq-eee-cntl;
Basically either tab-separated or in CSV format.
Thanks for the help
You haven't shown us what the actual expected output looks like, so I've assumed you want it tab-separated and unquoted, and I've made some other assumptions about how your input records are separated, etc.:
$ cat tst.awk
BEGIN { OFS="\t" }
{
    if ( numTags == 0 ) {
        tag = $1
        val = $2
        sub(/:$/,"",tag)
        tags[++numTags] = tag
        tag2val[tag] = val
        sub(/[^:]+: +[^ ]+ +/,"")
    }
    tag = val = $0
    sub(/: .*/,"",tag)
    sub(/[^:]+: /,"",val)
    tags[++numTags] = tag
    tag2val[tag] = val
}
tag == "application" {
    if ( !cnt++ ) {
        for (tagNr=1; tagNr<=numTags; tagNr++ ) {
            tag = tags[tagNr]
            printf "%s%s", tag, (tagNr<numTags ? OFS : ORS)
        }
    }
    for (tagNr=1; tagNr<=numTags; tagNr++ ) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "%s%s", val, (tagNr<numTags ? OFS : ORS)
    }
    numTags = 0
}
$ awk -f tst.awk file
insert_job job_type box_name command machine owner permission condition std_out_file std_err_file max_run_alarm alarm_if_fail application
aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa c sss-eee-ess-saturday $${qqqq-eee-eat-cmd} $${qqqq-eee-nas-cntrl-dir}\eee\CMS\CMS_C3.xml $${qqqq-eee-nas-log}\eee\AFG\AFG_Build_Qwer.log buildProcess qqqq-eee-cntl system_uu_gggg_p#ad gx,wx s(qqqq-rtl-etl-40-datamart-load-cms) & s(qqqq-eee-ess) >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.out >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.err 420 1 qqqq-M9887
If you want something easier for Excel to handle, this will produce a CSV that Excel will be able to open just by double clicking on the output file name:
$ cat tst.awk
BEGIN { OFS="," }
{
    if ( numTags == 0 ) {
        tag = $1
        val = $2
        sub(/:$/,"",tag)
        tags[++numTags] = tag
        tag2val[tag] = val
        sub(/[^:]+: +[^ ]+ +/,"")
    }
    tag = val = $0
    sub(/: .*/,"",tag)
    sub(/[^:]+: /,"",val)
    tags[++numTags] = tag
    tag2val[tag] = val
}
tag == "application" {
    if ( !cnt++ ) {
        for (tagNr=1; tagNr<=numTags; tagNr++ ) {
            tag = tags[tagNr]
            printf "\"%s\"%s", tag, (tagNr<numTags ? OFS : ORS)
        }
    }
    for (tagNr=1; tagNr<=numTags; tagNr++ ) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "\"%s\"%s", val, (tagNr<numTags ? OFS : ORS)
    }
    numTags = 0
}
$ awk -f tst.awk file
"insert_job","job_type","box_name","command","machine","owner","permission","condition","std_out_file","std_err_file","max_run_alarm","alarm_if_fail","application"
"aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa","c","sss-eee-ess-saturday","$${qqqq-eee-eat-cmd} $${qqqq-eee-nas-cntrl-dir}\eee\CMS\CMS_C3.xml $${qqqq-eee-nas-log}\eee\AFG\AFG_Build_Qwer.log buildProcess","qqqq-eee-cntl","system_uu_gggg_p#ad","gx,wx","s(qqqq-rtl-etl-40-datamart-load-cms) & s(qqqq-eee-ess)",">E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.out",">E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.err","420","1","qqqq-M9887"
If this is a one-off job, and you don't have to worry about double quotes in the source data, try something like this. I have assumed you want comma-separated values to put in a spreadsheet and the data is in a file called foo.txt.
echo $(sed 's/^\([^:]*\): \(.*\)$/"\1",/g' foo.txt)
echo $(sed 's/^\([^:]*\): \(.*\)$/"\2",/g' foo.txt)

bash keeping the highest number as variable

I am trying to get the longest_length set to the highest number.
#!/bin/bash
for i in $(cat list_of_aways)
do
    echo $i | while IFS=, read -r area name host
    do
        printf "%s\n" $name
        sleep 1
        longest_length=${#name}
        printf "%s\n" "$longest_length"
    done
done
This is the data. The name containing 9999999 is the longest string; I want the variable set from that entry, because it is the longest.
__DATA__
HOME,script_name_12345,USAhost.com
AWAY,script_name_123,USAhost.com
HOME,script_name_1,EUROhost.com
AWAY,script_name_123,USAhost.com
HOME,script_name_123456,EUROhost.com
AWAY,script_name_12345678999999,USAhost.com
HOME,script_name_1234,USAhost.com
AWAY,script_name_1234578,USAhost.com
HOME,script_name_12,EUROhost.com
AWAY,script_name_123456789,USAhost.com
You can actually do it with an awk command:
awk -F '_|,' 'NR>1 {if ( length($4) > length(max) ) { max=$4 }} END { print max }' INPUTFILE
-F '_|,' = we are using 2 delimiters, _ and , so the numeric suffix of each name ends up in the 4th field; NR>1 skips the __DATA__ header line.
if ( length($4) > length(max) ) { max=$4 } = the 4th field of each line is compared with the current max, and the longer one is kept in max.
print max = the longest value is printed in the END block.
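Assuming the data above (including the __DATA__ line) is saved as list_of_aways, running it prints the longest value:
awk -F '_|,' 'NR>1 {if ( length($4) > length(max) ) { max=$4 }} END { print max }' list_of_aways
12345678999999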
If you want to set the result into a var called longest_length:
longest_length=`awk -F '_|,' 'NR>1 {if ( length($4) > length(max) ) { max=$4 }} END { print max }' INPUTFILE`
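If you'd rather stay in pure bash and track the length itself, here is a minimal sketch (assuming the data, including the __DATA__ line, lives in list_of_aways):
#!/bin/bash
# Track the longest name length seen so far; the __DATA__ header has no
# second field, so it contributes a length of 0 and is harmless.
longest_length=0
while IFS=, read -r area name host
do
    if (( ${#name} > longest_length )); then
        longest_length=${#name}
    fi
done < list_of_aways
echo "$longest_length"
Note that this reads the file directly rather than piping echo into while as in the original script; piping into while runs the loop in a subshell, so any longest_length set there is lost when the loop ends.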

Got stuck with multiple-value validation against a particular column in awk?

I have a text file and I'm trying to validate a particular column (column 5). If that column contains only the values ACT, LFP, TST, or EPO, the file should go on to further processing; otherwise the script should exit. In other words, if column 5 holds one of those four values on every line, processing continues; if it contains anything else, the script terminates.
Code
cat test.txt \
| awk -F '~' -v ERR="/a/x/ERROR" -v NAME="/a/x/z/" -v WRKD="/a/x/b/" -v DATE="23_09_16" -v PD="234" -v FILE_NAME="FILENAME" \
'{ if ($5 != "ACT" || $5 != "LFP" || $5 != "EPO" || $5 != "TST")
system("mv "NAME" "ERR);
system("rm -f"" "WRKD);
print DATE" " PD " " "[" FILE_NAME "]" " ERROR: Panel status contains invalid value due to this file move to error folder";
print DATE" " PD " " "[" FILE_NAME "]" " INFO: Script is exited";
system("exit");
}' >>log.txt
Txt file: test.txt (Note: this file should be processed successfully)
161518~CHEM~ACT~IRPMR~ACT~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~LFP~UD
030767~CHEM~ACT~IRPMR~LFP~UD
Txt file: test1.txt (Note: this file should not be processed successfully; it contains one invalid value)
161518~CHEM~ACT~IRPMR~**ACT1**~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
awk to the rescue!
Let's assume the following input file:
010282~CHEM~ACT~IRPMR~ACT~UD
121212~CHEM~ACT~IRPMR~ZZZ~UD
162794~CHEM~ACT~IRPMR~TST~UD
020202~CHEM~ACT~IRPMR~YYY~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
010101~CHEM~ACT~IRPMR~XXX~UD
123456~CHEM~ACT~IRPMR~TST~UD
1) This example illustrates how to check for invalid lines/records in the input file:
#!/usr/bin/awk -f
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    print "Unexpected value # line " NR " [" $5 "]"
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
Unexpected value # line 2 [ZZZ]
Unexpected value # line 4 [YYY]
Unexpected value # line 7 [XXX]
2) This example illustrates how to filter out (remove) invalid lines/records from the input file:
#!/usr/bin/awk -f
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
    {
        if( a[i] == $5 )
        {
            print $0
            next
        }
    }
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
123456~CHEM~ACT~IRPMR~TST~UD
3) This example illustrates how to display the invalid lines/records from the input file:
#!/usr/bin/awk -f
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    print $0
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
121212~CHEM~ACT~IRPMR~ZZZ~UD
020202~CHEM~ACT~IRPMR~YYY~UD
010101~CHEM~ACT~IRPMR~XXX~UD
Hope it Helps!
Without getting into the calls to system, this will show you an answer.
awk -F"~" '{ if (! ($5 == "ACT" || $5 == "LFP" || $5 == "EPO" || $5 == "TST")) print $0}' data.txt
output
161518~CHEM~ACT~IRPMR~**ACT1**~UD
This version tests whether $5 matches at least one item in the list. If it doesn't (that is what the ! in front of the || chain tests), then it prints the record as an error.
Of course, $5 will match only one from that list at a time, but that is all you need.
By contrast, when you say
if ($5 != "ACT" || $5 != "LFP" ...)
you're creating a logic test that can never be false. If $5 does not equal "ACT" (because it is, say, "LFP"), the first comparison is already true, the || chain short-circuits to true, and the remaining comparisons are never even checked, so the body runs for every record.
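To get the process-or-exit behaviour the question asks for, here is a minimal sketch using awk's exit status (the file name and messages are placeholders):
# Exit 1 as soon as column 5 holds something other than ACT, LFP, TST or EPO;
# the shell then decides whether to carry on.
if awk -F '~' '$5 != "ACT" && $5 != "LFP" && $5 != "TST" && $5 != "EPO" { exit 1 }' test.txt
then
    echo "all values valid - continue processing"
else
    echo "invalid value found - stopping"
    exit 1
fi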
IHTH

awk parse data based on value in middle of text

I have the following input:
adm.cd.rrn.vme.abcd.name = foo
adm.cd.rrn.vme.abcd.test = no
adm.cd.rrn.vme.abcd.id = 123456
adm.cd.rrn.vme.abcd.option = no
adm.cd.rrn.vme.asfa.name = bar
adm.cd.rrn.vme.asfa.test = no
adm.cd.rrn.vme.asfa.id = 324523
adm.cd.rrn.vme.asfa.option = yes
adm.cd.rrn.vme.xxxx.name = blah
adm.cd.rrn.vme.xxxx.test = no
adm.cd.rrn.vme.xxxx.id = 666666
adm.cd.rrn.vme.xxxx.option = no
How can I extract all the values associated with a specific id?
For example, if I have id == 324523, I'd like it to print the values of name, test, and option:
bar no yes
Is it possible to achieve this in a single awk command (or anything similar in bash)?
EDIT: Based on the input, here's my solution so far:
MYID=$(awk -F. '/'"${ID}"'/{print $5}' ${TMP_LIST})
awk -F'[ .]' '{
    if ($5 == "'${MYID}'") {
        if ($6 == "name") {name=$NF}
        if ($6 == "test") {test=$NF}
        if ($6 == "option") {option=$NF}
    }
} END {print name,test,option}' ${TMP_LIST}
Thanks
$ cat tst.awk
{ rec = rec $0 RS }
/option/ {
    if (rec ~ "id = "tgt"\n") {
        printf "%s", rec
    }
    rec = ""
    next
}
$ awk -v tgt=324523 -f tst.awk file
adm.cd.rrn.vme.asfa.name = bar
adm.cd.rrn.vme.asfa.test = no
adm.cd.rrn.vme.asfa.id = 324523
adm.cd.rrn.vme.asfa.option = yes
or if you prefer:
$ cat tst.awk
BEGIN { FS="[. ]" }
$(NF-2) == "id" { found = ($NF == tgt ? 1 : 0); next }
{ rec = (rec ? rec OFS : "") $NF }
$(NF-2) == "option" { if (found) print rec; rec = ""; next }
$ awk -v tgt=324523 -f tst.awk file
bar no yes
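If you want the result in a shell variable, as in the question's edit, just capture it:
x=$(awk -v tgt=324523 -f tst.awk file)
echo "$x"
bar no yes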
First, I convert each record into a single line with xargs, then I look for lines that contain the regular expression and print the wanted columns:
cat input | xargs -n 12 | awk '{if($0~/id\s=\s324523\s/){ print $3, $6, $12}}'
A more general solution:
awk 'BEGIN{ FS="\\.|\\s" }      # field separator is a dot \. or whitespace \s
{
    a[$5"."$6]=$8;              # store records in associative array a
    if($8=="324523" && $6=="id"){
        reg[$5]=1;              # if the record is found, add it to associative array reg
    }
}END{
    for(k2 in reg){
        s=""
        for(k in a){
            if(k~"^"k2"\\."){   # if the record belongs to an entry of "reg", add it to output "s"
                s=k":"a[k]" "s
            }
        }
        print s;
    }
}' input
If your input format is fixed, you can do it this way:
grep -A1 -B2 'id\s*=\s*324523$' file|awk 'NR!=3{printf "%s ",$NF}END{print ""}'
You can add -F'=' to the awk part too.
It could be done with awk alone, but grep saves some typing...

addition of variables combined with >/< test BASH

So I am trying to write a bash script to check if all values in a data set are within a certain margin of the average.
so far:
#!/bin/bash
cat massbuild.csv
while IFS=, read col1 col2
do
    x=$(grep "$col2" $col1.pdb | grep "HETATM" | awk '{ sum += $7; n++ } END { if (n > 0) print sum / n; }')
    i=$(grep "$col2" $col1.pdb | grep "HETATM" | awk '{print $7;}')
    if $(($i > $[$x + 15])); then
        echo "OUTSIDE THE RANGE!"
    fi
done < massbuild.csv
So far, I have broken it down into components to test, and have found that the values of x and i are read correctly, but adding 15 to x and the comparison to i don't seem to work.
I have read around online and I am stumped =/
Without sample input and expected output we're just guessing but MAYBE this is the right starting point for your script (untested, of course, since no in/out provided):
#!/bin/bash
awk -F, '
NR==FNR {
    file = $1 ".pdb"
    ARGV[ARGC++] = file
    file2col2s[file] = (file2col2s[file] ? file2col2s[file] FS : "") $2
    next
}
FNR==1 { split(file2col2s[FILENAME],col2s) }
/HETATM/ {
    for (i=1;i in col2s;i++) {
        col2 = col2s[i]
        if ($0 ~ col2) {
            sum[FILENAME,col2] += $7
            cnt[FILENAME,col2]++
        }
    }
}
END {
    for (file in file2col2s) {
        split(file2col2s[file],col2s)
        for (i=1;i in col2s;i++) {
            col2 = col2s[i]
            print sum[file,col2]
            print cnt[file,col2]
        }
    }
}
' massbuild.csv
Does this help?
a=4; b=0; if [ "$a" -lt "$(( $b + 5 ))" ]; then echo "a < b + 5"; else echo "a >= b + 5"; fi
Ref: http://www.tldp.org/LDP/abs/html/comparison-ops.html
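Note that x in the original script is an average and may well be a fraction, and bash arithmetic is integer-only, so even a corrected [ ... ] test can fail on it; i can also hold several values, one per matching line. Here is a minimal sketch that keeps the whole check inside awk instead (field number and the +15 margin taken from the question; the rest is an assumption about the file layout):
#!/bin/bash
# For each col1,col2 pair, average field 7 of the matching HETATM lines in
# "$col1.pdb" and flag any value more than 15 above that average.
while IFS=, read -r col1 col2
do
    awk -v pat="$col2" '
        $0 ~ pat && /HETATM/ { vals[++n] = $7; sum += $7 }
        END {
            if (n == 0) exit
            avg = sum / n
            for (j = 1; j <= n; j++)
                if (vals[j] > avg + 15)
                    print FILENAME ": " vals[j] " is OUTSIDE THE RANGE!"
        }
    ' "$col1.pdb"
done < massbuild.csv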
