How not to process the first line using awk?


I want to change the ps virtual memory size output from KiB to MiB and add a unit suffix (MB) to it, but I neither want to change the header (first line) nor remove it from the result.
For example, from:
PID COMMAND VSZ
9 bash 6304
537 ps 7476
to:
PID COMMAND VSZ
9 bash 6MB
537 ps 7MB

the default implicit print only happens if no action is provided
(...) is not the same as {...}
NR!=1 ($2=$2"MB")
whole line is treated as a single pattern
concatenate $2 and "MB" and assign to $2
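concatenate 1 and the result of the parenthesised expression (i.e. the new value of $2)
test if NR is not equal to that
As NR is always a number and the concatenated string always ends in "MB", the two can never be equal, so the test always succeeds (i.e. the pattern matches every line, including the header, whose $2 has already been modified).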
No action has been provided, so the default print is done.
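To see this parsing in action, here is a minimal sketch that feeds the sample from the question (reproduced with printf) to that first attempt. Note that the header's $2 gets "MB" appended too, because the assignment runs on every line before the comparison:

```shell
# Reproduce the sample from the question and run the first attempt on it.
# NR!=1 ($2=$2"MB") assigns to $2 on every record, then compares NR with the
# string "1<new $2>", which is never equal, so every modified record is printed.
printf '%s\n' 'PID COMMAND VSZ' '9 bash 6304' '537 ps 7476' |
  awk 'NR!=1 ($2=$2"MB")'
```

This prints:
PID COMMANDMB VSZ
9 bashMB 6304
537 psMB 7476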
{ if(NR==1){ print } else {$2=$2"mb"} }
whole line is treated as single action
no pattern provided, so this action is performed for every record
record is (explicitly) printed if NR equals 1
otherwise the value of $2 is updated, but because an action was given there is no default print, so nothing is output
Probably you want something like:
NR!=1 { $2 = int($2/1024) "MB" }
{ print }
or equivalently:
NR!=1 { $2 = int($2/1024) "MB" } 1
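A minimal sketch of that command run on the sample from the question. In that sample VSZ is the third column, so this version converts $3 rather than $2; adjust the field number to match the column order you give ps. The function name to_mb is hypothetical, used only for illustration:

```shell
# Convert the VSZ column (KiB) to whole MiB with an "MB" suffix, leaving the header alone.
# Assumes VSZ is field 3, as in the question's sample (PID COMMAND VSZ).
to_mb() {
  awk 'NR != 1 { $3 = int($3 / 1024) "MB" } 1'
}

printf '%s\n' 'PID COMMAND VSZ' '9 bash 6304' '537 ps 7476' | to_mb
```

With a real ps, something like ps -o pid,comm,vsz | to_mb should give the same layout, although reassigning a field rebuilds the line with single spaces, so ps's column alignment is lost on the converted lines.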

I tried it out and found the answer: awk 'NR==1 {print} NR>1 {$2=$2"MB"; print}'

Related

awk to get the first column if a specific number in the line is greater than a given value

I have a data file (file.txt) that contains the lines below:
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=22:00,dom=sss.co.uk,user2=lis
I expect to get the first column ($1) only if the number after ETA= is greater than 15; here, only the first columns of the 2nd and 3rd lines are expected:
345
456
I tried cat file.txt | awk -F '[,TPF=]' '{print $1}', but it prints the whole line which has ETA at the end.

Using awk
$ awk -F"[=, ]" '{for (i=1;i<NF;i++) if ($i=="ETA") if ($(i+1)+0 > 15) print $1}' input_file
345
456

With your shown samples, please try the following GNU awk code. It uses GNU awk's match function with the regex (^[0-9]+).*\<ETA=([0-9]+):[0-9]+, which creates 2 capturing groups and saves their values into the array arr. It then checks whether the 2nd element of arr is greater than 15 and, if so, prints the 1st element of arr, as per the requirement.
awk '
match($0,/(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/,arr) && arr[2]+0>15{
print arr[1]
}
' Input_file

I would harness GNU AWK for this task in the following way. Let file.txt content be
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=02:00,dom=sss.co.uk,user2=lis
then
awk 'substr($0,index($0,"ETA=")+4,2)+0>15{print $1}' file.txt
gives output
345
Explanation: I use the string functions index, to find where ETA= is, and substr, to get the 2 characters after ETA= (4 is added because ETA= is 4 characters long and index gives its start position). I use +0 to convert the result to a number and then compare it with 15. Disclaimer: this solution assumes every row has ETA= followed by exactly 2 digits.
(tested in GNU Awk 5.0.1)

Whenever input contains tag=value pairs as yours does, it's best to first create an array of those mappings (v[] below), and then you can just access the values by their tags (names):
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
v["ETA"]+0 > 15 {
print $1
}
$ awk -f tst.awk file
345
456
With that approach you can trivially enhance the script in future to access whatever values you like by their names, test them in whatever combinations you like, output them in whatever order you like, etc. For example:
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
(v["pro"] ~ /b/) && (v["ETA"]+0 > 15) {
print $1, v["team"], v["dom"]
}
$ awk -f tst.awk file
345,abc,sbc.int
456,efg,sss.co.uk
Think about how you'd enhance any other solution to do the above or anything remotely similar.

It's unclear why you think your attempt would do anything of the sort. Your attempt uses a completely different field separator and does not compare anything against the number 15.
You'll also want to get rid of the useless use of cat.
When you specify a field separator with -F, that changes what the first column $1 actually means: it is then everything before the first occurrence of the separator. So you probably want to split the line separately to obtain the first, space-separated, column.
awk -F 'ETA=' '$2+0 > 15 { split($0, n, /[ \t]+/); print n[1] }' file.txt
The value in $2 will be the data after the first separator (and up until the next one). Adding +0 forces a numeric conversion, which uses the number at the beginning of the field and ignores any non-numeric text after it. So for example, on the first line, $2 is literally 12:00, team=xyz,user1=tom,dom=dby.com, but $2+0 is 12, so we effectively check whether 12 is larger than 15 (which is obviously false). Without the +0, $2 is not a numeric string, so awk would compare it with 15 as a string instead.
When the condition is true, we split the original line $0 into the array n on sequences of whitespace, and then print the first element of this array.
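The +0 matters once an hour has a single digit. A minimal sketch with one hypothetical record (ETA=3:00 does not appear in the question's data) shows the difference: the field is not a numeric string, so without +0 awk compares it with 15 as text:

```shell
# Hypothetical record with a single-digit hour (not from the question's data)
rec='789 pro=abc, ETA=3:00, team=xyz'

# String comparison: "3:00, team=xyz" sorts after "15", so the record is wrongly selected
printf '%s\n' "$rec" | awk -F 'ETA=' '$2 > 15 { print "selected" }'

# Numeric comparison: $2+0 is 3, and 3 > 15 is false, so nothing is printed
printf '%s\n' "$rec" | awk -F 'ETA=' '$2+0 > 15 { print "selected" }'
```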

Using awk, you could match ETA= followed by 1 or more digits, then take the match without the ETA= part, check whether the number is greater than 15, and print the first field.
awk 'match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file
Output
345
456
If the first field should start with a number:
awk '/^[0-9]/ && match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file
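To make RSTART and RLENGTH concrete, this minimal sketch prints what match() records for the second sample line from the question, together with the substring that the answer extracts from it:

```shell
# match() sets RSTART (1-based start of the match) and RLENGTH (its length);
# skipping the 4 characters of "ETA=" leaves just the digits.
printf '%s\n' '345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00' |
  awk 'match($0, /ETA=[0-9]+/) {
    print RSTART, RLENGTH, substr($0, RSTART + 4, RLENGTH - 4)
  }'
```

This prints 46 6 23: the match ETA=23 starts at character 46 and is 6 characters long.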

How do pipes inside awk work (Sort with keeping header)

The following command outputs the header of a file and sorts the records after the header. But how does it work? Can anyone explain this command?
awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}'

Could you please go through the following once (for explanation purposes only). To learn more awk concepts, I suggest going through Stack Overflow's nice awk learning section.
awk ' ##Starting awk program from here.
NR == 1; ##If this is the first line, print it.
##awk works as condition-then-action; since NO ACTION is given here, the default action (printing the current line) happens.
NR > 1{ ##If the line number is greater than 1, do the following.
print $0 | "sort -k3" ##Write the line into a pipe to sort -k3; sort outputs all lines, sorted on their 3rd field, once the pipe is closed (here, when awk exits).
}'

Understanding the awk command:
Overall, an awk program is built out of (pattern){action} pairs, which state that if pattern evaluates to true (a non-zero number or a non-empty string), action is executed. One does not necessarily need to write both. If pattern is omitted, it defaults to 1 (always true), and if action is omitted, it defaults to print $0.
When looking at the command in question:
awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}'
We notice that there are two pattern-action pairs. The first reads NR == 1 and states that if we are processing the first record (pattern), then print the record (default action). The second is a bit more tricky: the pattern is clear, but the action needs some explaining.
awk supports several output redirections for print and printf (>, >> and |, plus |& in gawk). One of them is expression | cmd. It means that awk writes the output to a stream that is piped as input to the command cmd. It keeps writing to that same stream until the stream is explicitly closed with a close(cmd) statement, or until awk terminates.
In the OP's case, the action reads { print $0 | "sort -k3" }, meaning that every record $0 after the first is printed to a single stream that is used as the input of the shell command sort -k3. sort can only write its output once that stream is closed, which here happens when the program finishes.
Recap: the OP's command prints the first line of a file, followed by the remaining lines sorted according to the third column (strictly, sort -k3 sorts on the key from field 3 to the end of the line).
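To illustrate the close(cmd) behaviour, here is a minimal sketch on hypothetical data. Closing the pipe in END forces sort to write its output before a hypothetical footer line, and fflush() is called after the header as a precaution, so that awk's own buffered output cannot end up after sort's output:

```shell
# Hypothetical data: a header plus three records to be sorted on field 3
printf '%s\n' 'id name score' '1 bob 30' '2 amy 10' '3 cid 20' |
  awk '
    NR == 1 { print; fflush() }     # print the header and flush it before sort writes anything
    NR > 1  { print | "sort -k3" }  # every other record goes into a single sort process
    END {
      close("sort -k3")             # closing the pipe makes sort finish and write its output now
      print "-- end --"             # hypothetical footer, printed after the sorted block
    }'
```

Expected output: the header, then 2 amy 10, 3 cid 20, 1 bob 30, then -- end --.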
Alternative solutions:
Using GNU awk, you can sort inside awk instead (note that this keeps only the last line for each distinct value of $3):
awk 'FNR==1 {print; next} {a[$3]=$0}
END{PROCINFO["sorted_in"]="@ind_str_asc"
for(i in a) print a[i]
}' file
Using pure shell, it is better to do:
(read -r; printf "%s\n" "$REPLY"; sort -k3) < file
Related questions:
Is there a way to ignore header lines in a UNIX sort?

| is one of the redirections supported by print and printf; in this case it pipes to the command sort -k3. You might also use redirection to write to a file using >:
awk 'NR == 1; NR > 1 {print $0 > "output.txt"}'
or append to file using >>:
awk 'NR == 1; NR > 1 {print $0 >> "output.txt"}'
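The first writes every line except the first to output.txt (awk truncates the file only once, when it first opens it); the second appends every line except the first to output.txt. In both cases the first line is still printed to standard output.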
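A minimal sketch on hypothetical input shows the > case: the header goes to standard output, and output.txt ends up with all three remaining lines, because awk truncates the file when it first opens it, not on every print:

```shell
# Hypothetical four-line input: a header plus three records
printf '%s\n' header a b c | awk 'NR == 1; NR > 1 { print $0 > "output.txt" }'

# output.txt now contains a, b and c, one per line
cat output.txt
```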

How to add an if statement before calculation in AWK

I have a series of files that I am looping through, calculating the mean of a column within each file after performing a series of filters. Each filter is piped into the next, BEFORE calculating the mean on the final output. All of this is done within a subshell (command substitution) so the result can be assigned to a variable for later use.
For example:
variable=$(filter1 | filter2 | filter3 | calculate mean)
to calculate the mean I use the following code
... | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}'
So, my problem is that depending on the file, the number of rows after the final filter is reduced to 0, i.e. the pipe passes nothing to AWK and I end up with awk: fatal: division by zero attempted printed to screen, and the variable then remains empty. I later print the variable to file and in this case I end up with BLANK in a text file. Instead what I am attempting to do is state that if NR==0 then assign 0 to the variable so that my final output in the text file is 0.
To do this I have tried to add an if statement at the start of my awk command
... | awk '{if (NR==0) print 0}BEGIN{s=0;}{s=s+$5;}END{print s/NR;}'
but this doesn't change the output or the error, and I am left with BLANKs.
I did move the BEGIN statement, but this caused other errors (syntax and output errors).
Expected results:
Given that the column from a file has 5 lines and looks like this, I would filter on apple and pipe the result into the calculation:
apple 10
apple 10
apple 10
apple 10
apple 10
code:
variable=$(awk -F"\t" '{OFS="\t"; if($1 ~ /apple/) print $0}' file.in | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}')
then I would expect the variable to be set to 10 (10*5/5 = 10)
In the following scenario where I filter on banana
variable=$(awk -F"\t" '{OFS="\t"; if($1 ~ /banana/) print $0}' file.in | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}')
given that the pipe passes nothing to AWK I would want the variable to be 0
Is it just easier to accept the blank space and change it later when printing to file, i.e. replace BLANK with 0?

The default value of a variable which you treat as a number in AWK is 0, so you don't need BEGIN {s=0}.
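You should put the condition in the END block. NR is not the total number of rows but the number of the current row, so only in the END block does it hold the number of rows that were read. In a main rule such as {if (NR==0) print 0}, NR is never 0, because that rule only runs after a record has been read.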
awk '{s += $5} END { if (NR == 0) { print 0 } else { print s/NR } }'
Or, using a ternary:
awk '{s += $5} END { print (NR == 0 ? 0 : s/NR) }'
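Here is a minimal sketch of how this fits the question's variable=$(...) pattern. It wraps the same END-block test in a hypothetical function, mean5 (column 5, as in the question), so that an empty pipe yields 0 instead of a division-by-zero error; the five-column rows are made up for the demonstration:

```shell
# Mean of field 5; prints 0 when no records arrive instead of dividing by zero.
# mean5 is a hypothetical helper name used only for this illustration.
mean5() {
  awk '{ s += $5 } END { print (NR ? s / NR : 0) }'
}

# An empty pipe, as when the final filter removes every line
variable=$(printf '' | mean5)
echo "$variable"   # prints 0

# Two hypothetical five-column rows whose fifth fields are 10 and 20
printf '%s\n' 'apple a b c 10' 'apple a b c 20' | mean5   # prints 15
```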
Also, a side note about your {OFS="\t"; if($1 ~ /banana/) print $0} examples: most of that code is unnecessary. You can just pass the condition:
awk -F'\t' '$1 ~ /banana/'
When an awk program is only a condition, it uses that as a condition for whether or not to print a line. So you can use conditions as a quick way to filter through the text.

The correct way to write:
awk -F"\t" '{OFS="\t"; if($1 ~ /banana/) print $0}' file.in | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}'
is (assuming a regexp comparison for $1 really is appropriate, which it probably isn't):
awk 'BEGIN{FS=OFS="\t"} $1 ~ /banana/{ s+=$5; c++ } END{print (c ? s/c : 0)}' file.in
Is that what you're looking for?
Or are you trying to get the mean per column 1 like this:
awk 'BEGIN{FS=OFS="\t"} { s[$1]+=$5; c[$1]++ } END{ for (k in s) print k, s[k]/c[k] }' file.in
or something else?

Printing contents with a specific range with Awk

I have a text file containing:
Location 1 40.733596 -74.003139
Location 2 43.758102 -73.975734
Location 3 41.732456 -74.003755
Location 4 42.345907 -71.087001
where the first column is just a location count, the second column represents the latitude and third represents the longitude.
I'm trying to write an awk command to only print out the location within a specific latitude and longitude range.
awk -F '\t' '$2>40,$2<=42,$3>=-71,$3<=74 {print $1,$2,$3}'LatLon.txt
In the pattern segment of the awk command, I'm trying to specify the range for columns 2 and 3 so that awk only prints the locations within the 40 to 42 latitude and -71 to -74 longitude range.
I'm getting an error mentioning:
awk: bailing out at source line 1
due to the pattern segment of my awk line. How do I properly specify the range?

Your code:
awk -F '\t' '$2>40,$2<=42,$3>=-71,$3<=74 {print $1,$2,$3}'LatLon.txt
This has a few errors in it:
You need to combine conditionals with && rather than commas
Even once the commas are replaced, your test on $3 won't pass, since you're asking for values between -71 and 74 yet all the given values are lower than -71 (you presumably want -74 <= $3 <= -71)
You need a space between the awk code and your file.
This code should work for you:
awk -F '\t' '(40 < $2 && $2 <= 42) && (-74 <= $3 && $3 <= -71)' LatLon.txt
You may notice the lack of an action here. The default action is to print the line as-is, so this is roughly comparable to the action you gave (though {print $1,$2,$3} re-concatenates those fields using OFS which defaults to a space rather than a tab; you could do OFS="\t"; print $1,$2,$3 to preserve that or just print $0 which is what happens by default without an action.)
The parentheses are technically unnecessary. They are provided for legibility.
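A minimal sketch to try the corrected command. It rebuilds LatLon.txt with tabs from the four rows in the question. None of those rows falls inside the box (rows 1 and 3 lie just west of -74, rows 2 and 4 lie north of 42), so one hypothetical row, Location 5, is added to show a match:

```shell
# The four rows from the question (tab-separated), plus one hypothetical row inside the box
printf 'Location %d\t%s\t%s\n' \
  1 40.733596 -74.003139 \
  2 43.758102 -73.975734 \
  3 41.732456 -74.003755 \
  4 42.345907 -71.087001 \
  5 41.000000 -72.500000 > LatLon.txt

# Conditions are combined with &&; only the hypothetical Location 5 row is printed
awk -F '\t' '(40 < $2 && $2 <= 42) && (-74 <= $3 && $3 <= -71)' LatLon.txt
```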

How can I make awk process the BEGIN block for each file it parses?

I have an awk script that I'm running against a pair of files. I'm calling it like this:
awk -f script.awk file1 file2
script.awk looks something like this:
BEGIN {FS=":"}
{ if( NR == 1 )
{
var=$2
FS=" "
}
else print var,"|",$0
}
The first line of each file is colon-delimited. For every other line, I want it to return to the default whitespace field separator.
This works fine for the first file, but fails for the second because FS is not reset to : after each file, since the BEGIN block is only processed once.
TL;DR: is there a way to make awk process the BEGIN block once for each file I pass it?
I'm running this on Cygwin bash, in case that matters.

If you're using gawk version 4 or later there's the BEGINFILE block. From the manual:
BEGINFILE and ENDFILE are additional special patterns whose bodies are executed before reading the first
record of each command line input file and after reading the last record of each file. Inside the BEGINFILE
rule, the value of ERRNO will be the empty string if the file could be opened successfully. Otherwise, there
is some problem with the file and the code should use nextfile to skip it. If that is not done, gawk produces
its usual fatal error for files that cannot be opened.
For example:
touch a b c
awk 'BEGINFILE { print "Processing: " FILENAME }' a b c
Output:
Processing: a
Processing: b
Processing: c
Edit - a more portable way
As noted by DennisWilliamson you can achieve a similar effect with FNR == 1 at the beginning of your script. In addition to this you could change FS from the command-line directly, e.g.:
awk -f script.awk FS=':' file1 FS=' ' file2
Each assignment takes effect just before the file that follows it is read; in between, the FS variable retains whatever value it had previously.

Instead of:
BEGIN {FS=":"}
use:
FNR == 1 {FS=":"; $0=$0}

The FNR variable should do the trick for you. It's the same as NR except it is scoped within the file, so it resets to 1 for every input file.
http://unstableme.blogspot.ca/2009/01/difference-between-awk-nr-and-fnr.html
http://www.unix.com/shell-programming-scripting/46931-awk-different-between-nr-fnr.html

If you want a POSIX-compliant version, the best is to do:
(FNR == 1) { FS=":"; $0=$0 }
This states that if the file record number (FNR) equals one, we set the field separator FS. However, because the current record has already been split with the old FS, you also need to reassign $0, which re-splits the record and recomputes all the fields and the NF built-in variable.
This behaves like the GNU awk 4.x BEGINFILE as long as the record separator (RS) stays unchanged; unlike BEGINFILE, it never runs for an empty file.
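Putting the FNR == 1 approach together with the question's script, a minimal portable sketch might look like the following. It is wrapped in a hypothetical shell function, per_file_header, uses next instead of the NR == 1 / else structure, and the two input files are made up for the demonstration:

```shell
# Per-file header handling without BEGINFILE: FNR restarts at 1 for every input file.
per_file_header() {
  awk '
    FNR == 1 {          # first record of each input file
      FS = ":"          # the header line is colon-delimited
      $0 = $0           # re-split the current record using the new FS
      var = $2
      FS = " "          # the remaining records use default whitespace splitting
      next
    }
    { print var, "|", $0 }
  ' "$@"
}

# Two hypothetical input files whose first lines are colon-delimited
printf '%s\n' 'title:alpha' 'one two' > file1
printf '%s\n' 'title:beta' 'three four' > file2
per_file_header file1 file2
```

This should print alpha | one two followed by beta | three four, showing that the header of the second file is split on : again.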
