Regex negative match on multiple column with and operator in AWK - bash

I have simple file with sample log:
192.168.1.117 60921 224.0.0.252 5355 yahoo.com
192.168.1.117 60900 192.168.1.118 5356 118.1.168.192.in-addr.arpa
192.29.1.221 5943 211.29.132.12 4325 webex.com
192.168.1.221 5943 192.29.132.12 4325 lizzard.com
I am using awk in following way to filter internal traffic and arpa domain:
awk '($1 !~ ^([192.168]|[192.29]) && $3 !~ ^([192.168]|[192.29])) || $5 ~ (in\-addr\.arpa$)' < filein
I am not sure what I am doing wrong as I am not getting any output?

You are missing to wrap the regular expression between //. Furthermore the use of [] is wrong.
It should be:
awk '($1 !~ /^192\.(168|29)/ && $3 !~ /^192\.(168|29)/) || $5 ~ /in\-addr\.arpa$/' file

Related

How to match a unique patter using awk?

I have a text file called 'file.txt' with the content like,
test:one
test_test:two
test_test_test:three
If the pattern is test, then the expected output should be one and similarly for the other two lines.
This is what I have tried.
pattern=test && awk '{split($0,i,":"); if (i[1] ~ /'"$pattern"'$/) print i[2]}'
This command gives the output like,
one
two
three
and pattern=test_test && awk '{split($0,i,":"); if (i[1] ~ /'"$pattern"'$/) print i[2]}'
two
three
How can I match the unique pattern being "test" for "test" and not for "test_test" and so on.
How can I match the unique pattern being test for test and not for test_test and so on.
Don't use a regex for comparing the value, just use equality:
awk -F: -v pat='test' '$1 == pat {print $2}' file
one
awk -F: -v pat='test_test' '$1 == pat {print $2}' file
two
If you really want to use regex, then use it like this with anchors:
awk -F: -v pat='test' '$1 ~ "^" pat "$" {print $2}' file
one
If you want to use a regex, you can create it dynamically with pattern and optionally repeating _ followed by pattern until matching a :
If it matches the start of the string, then you can print the second field.
awk -v pattern='test' -F: '
$0 ~ "^"pattern"(_"pattern")*:" {
print $2
}
' file
Output
one
two
three
Or if only matching the part before the first underscore is also ok, then splitting field 1 on _ and printing field 2:
awk -v pattern='test' -F: ' {
split($1, a, "_")
if(a[1] == pattern) print $2
}' file
Using GNU sed with word boundaries
$ sed -n '/\<test\>/s/[^:]*://p' input_file
one

How to do a if else match on pattern in awk

I've tried the below command:
awk '/search-pattern/ {print $1}'
How do I write the else part for the above command?
Classic way:
awk '{if ($0 ~ /pattern/) {then_actions} else {else_actions}}' file
$0 represents the whole input record.
Another idiomatic way
based on the ternary operator syntax selector ? if-true-exp : if-false-exp
awk '{print ($0 ~ /pattern/)?text_for_true:text_for_false}'
awk '{x == y ? a[i++] : b[i++]}'
awk '{print ($0 ~ /two/)?NR "yes":NR "No"}' <<<$'one two\nthree four\nfive six\nseven two'
1yes
2No
3No
4yes
A straightforward method is,
/REGEX/ {action-if-matches...}
! /REGEX/ {action-if-does-not-match}
Here's a simple example,
$ cat test.txt
123
456
$ awk '/123/{print "O",$0} !/123/{print "X",$0}' test.txt
O 123
X 456
Equivalent to the above, but without violating the DRY principle:
awk '/123/{print "O",$0}{print "X",$0}' test.txt
This is functionally equivalent to awk '/123/{print "O",$0} !/123/{print "X",$0}' test.txt
Depending what you want to do in the else part and other things about your script, choose between these options:
awk '/regexp/{print "true"; next} {print "false"}'
awk '{if (/regexp/) {print "true"} else {print "false"}}'
awk '{print (/regexp/ ? "true" : "false")}'
The default action of awk is to print a line. You're encouraged to use more idiomatic awk
awk '/pattern/' filename
#prints all lines that contain the pattern.
awk '!/pattern/' filename
#prints all lines that do not contain the pattern.
# If you find if(condition){}else{} an overkill to use
awk '/pattern/{print "yes";next}{print "no"}' filename
# Same as if(pattern){print "yes"}else{print "no"}
This command will check whether the values in the $1 $2 and $7-th column are greater than 1, 2, and 5.
!IF! the values do not mach they will be ignored by the filter we declared in awk.
(You can use logical Operators and = "&&"; or= "||".)
awk '($1 > 1) && ($2 > 1) && ($7 > 5)'
You can monitoring your system with the "vmstat 3" command, where "3" means a 3 second delay between the new values
vmstat 3 | awk '($1 > 1) && ($2 > 1) && ($7 > 5)'
I stressed my computer with 13GB copy between USB connected HardDisks, and scrolling youtube video in Chrome browser.

Convert slurm accounting output

I'm looking for a way to get the elapsed time output to always include days, at the moment I can't see away in defining an output format so I'm looking at using cut, awk, sed or similar command(s) to do this after the output has been generated.
So any ideas how I can change output such as:
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40
968491|interactive-a|bob|COMPLETED|12:49:20
970801|interactive-a|sam|COMPLETED|07:00:46
912973|interactive-a|tom|COMPLETED|41-02:34:41
971356|interactive-a|mat|COMPLETED|04:36:35
971912|interactive-a|mat|COMPLETED|02:12:02
972668|interactive-a|mat|COMPLETED|00:09:06
Into this format (the last column has 0- added where needed)
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40|
968491|interactive-a|bob|COMPLETED|0-12:49:20|
970801|interactive-a|sam|COMPLETED|0-07:00:46|
912973|interactive-a|tom|COMPLETED|41-02:34:41|
971356|interactive-a|mat|COMPLETED|0-04:36:35|
971912|interactive-a|mat|COMPLETED|0-02:12:02|
972668|interactive-a|mat|COMPLETED|0-00:09:06|
Thanks
$ sed 's/|\([0-9:]\{1,\}\)$/|0-\1/' file
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40
968491|interactive-a|bob|COMPLETED|0-12:49:20
970801|interactive-a|sam|COMPLETED|0-07:00:46
912973|interactive-a|tom|COMPLETED|41-02:34:41
971356|interactive-a|mat|COMPLETED|0-04:36:35
971912|interactive-a|mat|COMPLETED|0-02:12:02
972668|interactive-a|mat|COMPLETED|0-00:09:06
In awk:
$ awk -F\| '$5 ~ /-|E/ || ($5 = "0-" $5) && gsub(/ /,"|")' file
-F\| set FS to |
$5 ~ /-|E/ matches and prints records with - OR E in fifth field
|| logical OR, ie. if previous didn't match, then:
($5 = "0-" $5) prepend 0- to fifth field
&& gsub(/ /,"|") AND replace those space-replaced field separators with |s.
above could be removed if -v OFS="|" was used:
$ awk -v OFS=\| -F\| '$5 ~ /-|E/ || ($5 = "0-" $5)' file
$ awk -v OFS=\| -F\| '$5 ~ /-|E/ || ($5 = "0-" $5)' file

Multiple pattern matching

I have an input file with columns seperated by | as follows.
[3yu23yuoi]|$name
!$fjkdjl|[kkklkl]
$hjhj|$mmkj
I want the output as
0 $name
!$fjkdjl 0
$hjhj $mmkj
Whenever the string begins with $ or !$ or "any", i want it to get printed as such else 0.
I have tried the following command.It prints verything same as input file only.
awk -F="|" '{if (($1 ~ /^.*\$/) || ($1 ~ /^.*\!$/) || ($1 ~ /^any/)) {print $1} else if ($1 ~ /^\[.*/){print "0"} else if (($2 ~ /^.*\$/) || ($2 ~ /^.*\!$/) || ($2 ~ /^any/)) {print $2} else if($2 ~ /^\[.*/){print "0"}}' input > output
This should do:
awk -F\| '{$1=$1;for (i=1;i<=NF;i++) if ($i!~/^(\$|!\$|any)/) $i=0}1' file
0 $name
!$fjkdjl 0
$hjhj $mmkj
If data does not start with $ !$ or any, set it to 0
Or if you like tab as separator:
awk -F\| '{$1=$1;for (i=1;i<=NF;i++) if ($i!~/^(\$|!\$|^any)/) $i=0}1' OFS="\t" file
0 $name
!$fjkdjl 0
$hjhj $mmkj
$1=$1 make sure all line have same output, even if no data is changed.

Creating an array with awk and passing it to a second awk operation

I have a column file and I want to print all the lines that do not contain the string SOL, and to print only the lines that do contain SOL but has the 5th column <1.2 or >4.8.
The file is structured as: MOLECULENAME ATOMNAME X Y Z
Example:
151SOL OW 6554 5.160 2.323 4.956
151SOL HW1 6555 5.188 2.254 4.690 ----> as you can see this atom is out of the
151SOL HW2 6556 5.115 2.279 5.034 threshold, but it need to be printed
What I thought is to save a vector with all the MOLECULENAME that I want, and then tell awk to match all the MOLECULENAME saved in vector "a" with the file, and print the complete output. ( if I only do the first awk i end up having bad atom linkage near the thershold)
The problem is that i have to pass the vector from the first awk to the second... I tried like this with a[], but of course it doesn't work.
How can i do this ?
Here is the code I have so far:
a[] = (awk 'BEGIN{i=0} $1 !~ /SOL/{a[i]=$1;i++}; /SOL/ && $5 > 4.8 {a[i]=$1;i++};/SOL/ &&$5<1.2 {a[i]=$1;i++}')
awk -v a="$a[$i]" 'BEGIN{i=0} $1 ~ $a[i] {if (NR>6540) {for (j=0;j<3;j++) {print $0}} else {print $0}
You can put all of the same molecule names in one row by using sort on the file and then running this AWK which basically uses printf to print on the same line until a different molecule name is found. Then, a new line starts. The second AWK script is used to detect which molecules names have 3 valid lines in the original file. I hope this can help you to solve your problem
sort your_file | awk 'BEGIN{ molname=""; } ( $0 !~ "SOL" || ( $0 ~ "SOL" && ( $5<1.2 || $5>4.8 ) ) ){ if($1!=molname){printf("\n");molname=$1}for(i=1;i<=NF;i++){printf("%s ",$i);}}' | awk 'NF>12 {print $0}'
awk '!/SOL/ || $5 < 1.2 || $5 > 4.8' inputfile.txt
Print (default behaviour) lines where:
"SOL" is not found
SOL is found and fifth column < 1.2
SOL is found and fifth column > 4.8
SOLVED! Thanks to all, here is how i solved it.
#!/bin/bash
file=$1
awk 'BEGIN {molecola="";i=0;j=1;}
{if ($1 !~ /SOL/) {print $0}
else if ( $1 != molecola && $1 ~ /SOL/ ) {
for (j in arr_comp) {if( arr_comp[j] < 1.2 || arr_comp[j] > 5) {for(j in arr_comp) {print arr_mol[j] };break}}
delete(arr_comp)
delete(arr_mol)
arr_mol[0]=$0
arr_comp[0]=$5
molecola=$1
j=1
}
else {arr_mol[j]=$0;arr_comp[j]=$5;j++} }' $file

Resources