How to select two specific lines with awk? - bash

/!\ The question is basically solved, see my own answer below for more details and a subsidiary question /!\
I'm trying to add the values from two lines, each selected by a specific word, but all I could find is how to select everything after some pattern: How to select lines between two marker patterns which may occur multiple times with awk/sed
That is not what I'm looking for.
Consider the following output:
aji 1
bsk 2
cmq 3
doh 4
enr 5
fwp 6
gzx 7
What I'm trying to get is something like cmq + fwp, whose output should be:
9
I do know how to add values, but I'm missing the part that selects the line containing cmq and then the line containing fwp.
So, is there a way awk could strictly select two specific lines independently (then add them) ?
Edit:
As far as I know, matching words is awk '/cmq/', but I need to do that for let's say "fwp" too so I can add them.

$ awk '$1 ~ /^(cmq|fwp)$/{sum+=$2} END { print sum}' infile
Explanation:
awk '$1 ~ /^(cmq|fwp)$/{ # look for the match in first field
sum+=$2 # sum up 2nd field ($2) value,where sum is variable
}
END{ # at the end
print sum # print variable sum
}' infile
Test Results:
$ cat infile
aji 1
bsk 2
cmq 3
doh 4
enr 5
fwp 6
gzx 7
$ awk '$1 ~ /^(cmq|fwp)$/{sum+=$2} END { print sum}' infile
9

Now, for a more generic way, which even works for subtracting:
awk '/cmq/{x=$2} /fwp/{y=$2} END {print x+y}'
Where:
awk ' # Invoke awk; its program starts here
/cmq/{x=$2} # On the line matching "cmq", store the 2nd field in x. Pattern and action must stay together
/fwp/{y=$2} # On the line matching "fwp", store the 2nd field in y. Pattern and action must stay together
END {print x+y} # After all input is read, print the calculated result. END needs its action on the same line
' # End of the awk program
Unfortunately, two variables are used (x and y).
So, I'm still interested in finding how to do it without any variable, or with only one at the very most.
I do have a single-variable way for summing:
awk '/cmq|fwp/ {x+=$2} END {print x}'
But doing this for subtracting:
awk '/cmq|fwp/ {x-=$2} END {print x}'
doesn't work.
As a subsidiary question: does anyone know how to do such a subtraction with no variable, or with only one?
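One possible single-variable sketch for the subtraction: `x-=$2` applied to both lines computes 0 - 3 - 6 = -9, because both values are subtracted from an initially empty x. Giving each pattern its own sign avoids that. The function name `cmq_minus_fwp` is made up for illustration, and the exact match on the first field is an assumption about the input layout.

```shell
# Hypothetical helper: value on the "cmq" line minus value on the "fwp" line.
cmq_minus_fwp() {
  awk '$1 == "cmq" { x += $2 }   # add the first operand
       $1 == "fwp" { x -= $2 }   # subtract the second operand
       END { print x + 0 }'      # + 0 prints 0 if neither line was found
}

printf 'aji 1\nbsk 2\ncmq 3\ndoh 4\nenr 5\nfwp 6\ngzx 7\n' | cmq_minus_fwp
# prints -3
```

Swapping the two operators gives fwp - cmq instead.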

Related

Can we use begin and end with condition?

I have a CSV file with multiple columns. What I am trying to achieve is to print the details only if column 5<=25. If not, print just a single line with the string "No certificate will expire in 25 days".
The expectation is to print columns 3 and 6, wrapped in the Error_begin and Error_end strings, if the condition matches.
If not, I only need to print date - info - "somestring" on one line.
Output should go to the same log file in both cases.
Command I am using:
awk -F ',' -v date="$(date +'%Y-%m-%d')" 'BEGIN {if ($5<=25) {print date,"ERROR----"}else { print date,"INFO-- No certificate will expire in 25 days" }} {if ($5<=25) {print $3,$6}} END {if ($5<=25) {print "ERROR_END"}}' /tmp/cert_details.csv
BEGIN is executed before any line is read, so accessing a field there is equivalent to accessing an unset variable, which is treated as 0 in a numeric comparison. Therefore this piece
BEGIN {if ($5<=25) {print date,"ERROR----"}else { print date,"INFO-- No certificate will expire in 25 days" }}
will behave like
BEGIN {if (0<=25) {print date,"ERROR----"}else { print date,"INFO-- No certificate will expire in 25 days" }}
which I suppose is not what you want.
END is executed after all lines have been read, and a field accessed there refers to the last line, so
END {if ($5<=25) {print "ERROR_END"}}
depends solely on the last line of your file, which I suppose again is not what you want.
Note that if the first line of output depends on something that requires processing every line of the file, you need to hold off printing until line processing has finished. Consider a simple example: say you want to print the number of lines which have 5 or more characters, then print said lines, and file.txt content is
Able
Baker
Charlie
then possible solution is
awk 'length>=5{cnt+=1;lines=lines "\n" $0}END{printf "Found %d lines:%s", cnt, lines}' file.txt
output
Found 2 lines:
Baker
Charlie
Observe that you must not print lines as you go, because until you reach the end you do not know the number to put in the first line of output. Thus I store the lines of output to be printed, separated by \n (newlines), and do the printf inside END.
tested in gawk 4.2.1
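Applied to the certificate file, the same buffering idea might look like the sketch below. It assumes a header-less CSV in which column 5 is always a number of days and columns 3 and 6 are the details to print; the function name `cert_report` is made up for illustration.

```shell
# Hypothetical wrapper around the buffering technique shown above.
# Assumes: no header row, and column 5 always holds a number.
cert_report() {
  awk -F ',' -v date="$(date +'%Y-%m-%d')" '
    $5 <= 25 { rows = rows $3 " " $6 "\n"; n++ }   # buffer matching rows
    END {
      if (n) {                      # at least one row matched
        print date, "ERROR----"
        printf "%s", rows
        print "ERROR_END"
      } else {
        print date, "INFO-- No certificate will expire in 25 days"
      }
    }' "$1"
}
```

It could then be called as `cert_report /tmp/cert_details.csv >> /tmp/cert.log`, where the log path is only a placeholder.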

Searching for a string between two characters

I need to find two numbers from lines which look like this
>Chr14:453901-458800
I have a large quantity of those lines mixed with lines that don't contain ":", so we can search for a colon to find the lines with numbers. Every line has different numbers.
I need to find both numbers after ":", which are separated by "-", then subtract the first number from the second one and print the result on the screen for each line.
I'd like this to be done using awk.
I managed to do something like this:
awk -e '$1 ~ /\:/ {print $0}' file.txt
but it's nowhere near the end result.
For the example I showed above, my result would be:
4899
Because it is the result of 458800 - 453901 = 4899
I can't figure it out on my own and would appreciate some help
With GNU awk. Separate the row into multiple columns using the : and - separators. In each row containing :, subtract the contents of column 2 from the contents of column 3 and print result.
awk -F '[:-]' '/:/{print $3-$2}' file
Output:
4899
Using awk
$ awk -F: '/:/ {split($2,a,"-"); print a[2] - a[1]}' input_file
4899
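A slightly more defensive variant, as a sketch: it only acts on lines that end in ":number-number" and takes the last two fields, so a "-" inside the name before the colon does not shift the columns. The name `span_lengths` and the sample line `>Chr-2:10-25` are made up for illustration.

```shell
# Hypothetical helper: prints end - start for every "name:start-end" line.
span_lengths() {
  awk -F '[:-]' '/:[0-9]+-[0-9]+$/ { print $NF - $(NF - 1) }' "$@"
}

printf '>Chr14:453901-458800\nACGT\n>Chr-2:10-25\n' | span_lengths
# prints 4899, then 15
```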

How $0 is used in awk, how it works?

read n
awk '
BEGIN {sum=0;}{if( $0%2==0 ){sum+=$0;
}
}
END { print sum}'
Here I add up the even numbers. What I want is to first give as input how many numbers there are (the count), and then the numbers I want to check for evenness and add.
e.g.
3
6
7
8
output is : 14
Here 3 is the count, followed by the numbers I want to check. The code executes correctly and the output is correct, but I want to know how $0 skipped the count value (3) and computed with the remaining numbers only.
Please update your question to be meaningful: There is no relationship between $0 and the Unix operating system, as choroba already pointed out in his comment. You obviously want to know the meaning of $0 in the awk programming language. From the awk man-page in the section about Fields:
$0 is the whole record, including leading and trailing whitespace.
You're reading the count but not using it in the script.
A rewrite can be
$ awk 'NR==1 {n=$1; next} # read the count from the first line and skip the remaining rules
!($1%2) {sum+=$1} # add up even numbers
NR>n {print sum; exit}' file # done once the line number passes the count
In awk, $0 corresponds to the whole record (here the line), and $i to the fields, i=1,2,3...
An even number is one that leaves remainder 0 when divided by 2. NR is the current line number.
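As for why the count never reaches $0 in the script from the question: there, `read n` runs in the shell and consumes the first line of standard input before awk starts, so awk only ever sees the remaining lines. A minimal sketch of that behaviour, with the made-up function name `sum_even` and input assumed to arrive on a pipe:

```shell
# Hypothetical wrapper reproducing the structure of the question's script.
sum_even() {
  read -r n    # the shell, not awk, consumes the first line (the count)
  awk '$0 % 2 == 0 { sum += $0 } END { print sum + 0 }'   # awk sees lines 2 onward
}

printf '3\n6\n7\n8\n' | sum_even
# prints 14
```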

Need first two characters from a file and make sure that they both aren't below 6 then return line

I have a text file like this
17 Blue
45 Purple
And I need to make sure the first two digits aren't both less than 6, and then print the line. So, for example, the first line would print since the first digit 1 is lower than 6 and the second digit 7 is higher than 6, so they aren't both lower than 6. The next line would not print because 4 is lower than 6 and 5 is also lower than 6. I'm trying to use awk and not having any success; this is what I have so far. It's just crashing in terminal/bash.
awk 'BEGIN { FS = "";} {if ($1 < 6 && $2 < 6) else print}' file.txt
I'm using FS = "" to separate the first two digits by columns ($1 and $2) not sure if there's an easier way to do this.
awk '!/^[0-5][0-5]/' file.txt
One more approach could be:
awk -v val="6" 'substr($1,1,1)>=val || substr($1,2,1)>=val' Input_file
Here I check whether either the 1st or the 2nd character of the 1st field is at least 6 (a digit equal to 6 is not below 6, hence >=). I created a variable named val, set to 6, which one can change as needed.
About OP's approach: yes, one could set FS="", but POSIX leaves the behavior of an empty FS unspecified. GNU awk splits the record into individual characters, while other awks may not, so the approach may fail there. It is therefore better to use either substr or a regex for this problem (to keep the solution portable).
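A quick side-by-side check of the regex and substr approaches, including a boundary line whose second digit is exactly 6. The line "36 Green" and the function names are made up for illustration; the substr variant uses `>=` so that a digit equal to 6 counts as not below 6.

```shell
# Sample input; the third line is an invented boundary case.
sample='17 Blue
45 Purple
36 Green'

# Hypothetical helpers wrapping the two approaches.
by_regex()  { awk '!/^[0-5][0-5]/'; }
by_substr() { awk -v val=6 'substr($1,1,1) >= val || substr($1,2,1) >= val'; }

printf '%s\n' "$sample" | by_regex
printf '%s\n' "$sample" | by_substr
# each command prints "17 Blue" and "36 Green"
```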

Extracting field from last row of given table using sed

I would like to write a bash script to extract a field in the last row of a table. I will illustrate by example. I have a text file containing tables with space delimited fields like ...
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him

Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
I want to extract the first field in the last row of Table 1, which is 3. Ideally I would like to be able to use any given table's title to do this. There is guaranteed to be a blank line (carriage return) between tables. What would be the best way to accomplish this, preferably using sed and grep?
Awk is perfect for this, print the first field in the last row for each record:
$ awk '!$1{print a}{a=$1}END{print a}' file
3
2
Just from the first record:
$ awk '!$1{print a;exit}{a=$1}' file
3
Edit:
For a given table title:
$ awk -v t="Table 1" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
3
$ awk -v t="Table 2" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
2
This sed line seems to work for your sample.
table='Table 2'
sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
Explanation: When we find a line matching the table name in $table, we skip that line, and the next (the field labels). Starting at :next we push the current line into the hold space, get the next line and see if it is blank or the end of the file, if not we go back to :next, push the current line into hold and get another. If it is blank or EOF, we skip to :last, pull the hold space (the last line of the table) into pattern space, chop out all but the first field and print it.
Just read each block as a record with each line as a field and then print the first sub-field of the last field of whichever record you care about:
$ awk -v RS= -F'\n' '/^Table 1/{split($NF,a," "); print a[1]}' file
3
$ awk -v RS= -F'\n' '/^Table 2/{split($NF,a," "); print a[1]}' file
2
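The paragraph-mode approach can be wrapped so the title is passed as a parameter. This is only a sketch: the function name `last_row_id` is made up, it assumes tables are separated by blank lines, and it uses a plain prefix comparison via index() so that characters such as parentheses in a title are not interpreted as a regex. A prefix like "Table 1" would also match a title such as "Table 10".

```shell
# Hypothetical helper: first field of the last row of the table whose
# title line starts with the given text.
# Usage: last_row_id "Table 1" file
last_row_id() {
  awk -v RS= -F '\n' -v t="$1" '
    index($1, t) == 1 {        # title line begins with the requested text
      split($NF, a, " ")       # $NF is the last line of the block
      print a[1]
    }' "$2"
}
```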
A better tool for that is awk!
Here is some reasonably legible code:
awk '{
if(NR==1) {
row=$0;
next;
}
if($0=="") {
$0=row;
print $1;
} else {
row=$0;
}
} END {
if(row!="") {
$0=row;
print $1;
}
}' input.txt
