Multiple pattern matching - shell

I have an input file with columns seperated by | as follows.
[3yu23yuoi]|$name
!$fjkdjl|[kkklkl]
$hjhj|$mmkj
I want the output as
0 $name
!$fjkdjl 0
$hjhj $mmkj
Whenever the string begins with $ or !$ or "any", i want it to get printed as such else 0.
I have tried the following command.It prints verything same as input file only.
awk -F="|" '{if (($1 ~ /^.*\$/) || ($1 ~ /^.*\!$/) || ($1 ~ /^any/)) {print $1} else if ($1 ~ /^\[.*/){print "0"} else if (($2 ~ /^.*\$/) || ($2 ~ /^.*\!$/) || ($2 ~ /^any/)) {print $2} else if($2 ~ /^\[.*/){print "0"}}' input > output

This should do:
awk -F\| '{$1=$1;for (i=1;i<=NF;i++) if ($i!~/^(\$|!\$|any)/) $i=0}1' file
0 $name
!$fjkdjl 0
$hjhj $mmkj
If data does not start with $ !$ or any, set it to 0
Or if you like tab as separator:
awk -F\| '{$1=$1;for (i=1;i<=NF;i++) if ($i!~/^(\$|!\$|^any)/) $i=0}1' OFS="\t" file
0 $name
!$fjkdjl 0
$hjhj $mmkj
$1=$1 make sure all line have same output, even if no data is changed.

Related

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected but I am basicaly travesing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this everyday, so, I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, no even errors. Which sort of tell me that my code is simply not matching anything or my print / pipe statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
$ awk -f myfilter.awk dataset.csv

Error with awk (newline or end of string)

I am having an issue with the following command:
awk ‘{if ($1 ~ /^##contig/) {next}else if ($1 ~ /^#/) {print $0; next}else {print $0 | “sort -k1,1V -k2,2n”}’ file.vcf > out.vcf
It gives the following error:
^ unexpected newline or end of string
Your command contains "fancy quotes" instead of normal ones, in addition to a missing }.
awk '{if ($1 ~ /^##contig/) {next} else if ($1 ~ /^#/) {print $0; next} else {print $0 | "sort -k1,1V -k2,2n"} }' file.vcf > out.vcf
Changing your command to the above should work as expected.

How to do a if else match on pattern in awk

I've tried the below command:
awk '/search-pattern/ {print $1}'
How do I write the else part for the above command?
Classic way:
awk '{if ($0 ~ /pattern/) {then_actions} else {else_actions}}' file
$0 represents the whole input record.
Another idiomatic way
based on the ternary operator syntax selector ? if-true-exp : if-false-exp
awk '{print ($0 ~ /pattern/)?text_for_true:text_for_false}'
awk '{x == y ? a[i++] : b[i++]}'
awk '{print ($0 ~ /two/)?NR "yes":NR "No"}' <<<$'one two\nthree four\nfive six\nseven two'
1yes
2No
3No
4yes
A straightforward method is,
/REGEX/ {action-if-matches...}
! /REGEX/ {action-if-does-not-match}
Here's a simple example,
$ cat test.txt
123
456
$ awk '/123/{print "O",$0} !/123/{print "X",$0}' test.txt
O 123
X 456
Equivalent to the above, but without violating the DRY principle:
awk '/123/{print "O",$0}{print "X",$0}' test.txt
This is functionally equivalent to awk '/123/{print "O",$0} !/123/{print "X",$0}' test.txt
Depending what you want to do in the else part and other things about your script, choose between these options:
awk '/regexp/{print "true"; next} {print "false"}'
awk '{if (/regexp/) {print "true"} else {print "false"}}'
awk '{print (/regexp/ ? "true" : "false")}'
The default action of awk is to print a line. You're encouraged to use more idiomatic awk
awk '/pattern/' filename
#prints all lines that contain the pattern.
awk '!/pattern/' filename
#prints all lines that do not contain the pattern.
# If you find if(condition){}else{} an overkill to use
awk '/pattern/{print "yes";next}{print "no"}' filename
# Same as if(pattern){print "yes"}else{print "no"}
This command will check whether the values in the $1 $2 and $7-th column are greater than 1, 2, and 5.
!IF! the values do not mach they will be ignored by the filter we declared in awk.
(You can use logical Operators and = "&&"; or= "||".)
awk '($1 > 1) && ($2 > 1) && ($7 > 5)'
You can monitoring your system with the "vmstat 3" command, where "3" means a 3 second delay between the new values
vmstat 3 | awk '($1 > 1) && ($2 > 1) && ($7 > 5)'
I stressed my computer with 13GB copy between USB connected HardDisks, and scrolling youtube video in Chrome browser.

Convert slurm accounting output

I'm looking for a way to get the elapsed time output to always include days, at the moment I can't see away in defining an output format so I'm looking at using cut, awk, sed or similar command(s) to do this after the output has been generated.
So any ideas how I can change output such as:
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40
968491|interactive-a|bob|COMPLETED|12:49:20
970801|interactive-a|sam|COMPLETED|07:00:46
912973|interactive-a|tom|COMPLETED|41-02:34:41
971356|interactive-a|mat|COMPLETED|04:36:35
971912|interactive-a|mat|COMPLETED|02:12:02
972668|interactive-a|mat|COMPLETED|00:09:06
Into this format (the last column has 0- added where needed)
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40|
968491|interactive-a|bob|COMPLETED|0-12:49:20|
970801|interactive-a|sam|COMPLETED|0-07:00:46|
912973|interactive-a|tom|COMPLETED|41-02:34:41|
971356|interactive-a|mat|COMPLETED|0-04:36:35|
971912|interactive-a|mat|COMPLETED|0-02:12:02|
972668|interactive-a|mat|COMPLETED|0-00:09:06|
Thanks
$ sed 's/|\([0-9:]\{1,\}\)$/|0-\1/' file
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40
968491|interactive-a|bob|COMPLETED|0-12:49:20
970801|interactive-a|sam|COMPLETED|0-07:00:46
912973|interactive-a|tom|COMPLETED|41-02:34:41
971356|interactive-a|mat|COMPLETED|0-04:36:35
971912|interactive-a|mat|COMPLETED|0-02:12:02
972668|interactive-a|mat|COMPLETED|0-00:09:06
In awk:
$ awk -F\| '$5 ~ /-|E/ || ($5 = "0-" $5) && gsub(/ /,"|")' file
-F\| set FS to |
$5 ~ /-|E/ matches and prints records with - OR E in fifth field
|| logical OR, ie. if previous didn't match, then:
($5 = "0-" $5) prepend 0- to fifth field
&& gsub(/ /,"|") AND replace those space-replaced field separators with |s.
above could be removed if -v OFS="|" was used:
$ awk -v OFS=\| -F\| '$5 ~ /-|E/ || ($5 = "0-" $5)' file
$ awk -v OFS=\| -F\| '$5 ~ /-|E/ || ($5 = "0-" $5)' file

Creating an array with awk and passing it to a second awk operation

I have a column file and I want to print all the lines that do not contain the string SOL, and to print only the lines that do contain SOL but has the 5th column <1.2 or >4.8.
The file is structured as: MOLECULENAME ATOMNAME X Y Z
Example:
151SOL OW 6554 5.160 2.323 4.956
151SOL HW1 6555 5.188 2.254 4.690 ----> as you can see this atom is out of the
151SOL HW2 6556 5.115 2.279 5.034 threshold, but it need to be printed
What I thought is to save a vector with all the MOLECULENAME that I want, and then tell awk to match all the MOLECULENAME saved in vector "a" with the file, and print the complete output. ( if I only do the first awk i end up having bad atom linkage near the thershold)
The problem is that i have to pass the vector from the first awk to the second... I tried like this with a[], but of course it doesn't work.
How can i do this ?
Here is the code I have so far:
a[] = (awk 'BEGIN{i=0} $1 !~ /SOL/{a[i]=$1;i++}; /SOL/ && $5 > 4.8 {a[i]=$1;i++};/SOL/ &&$5<1.2 {a[i]=$1;i++}')
awk -v a="$a[$i]" 'BEGIN{i=0} $1 ~ $a[i] {if (NR>6540) {for (j=0;j<3;j++) {print $0}} else {print $0}
You can put all of the same molecule names in one row by using sort on the file and then running this AWK which basically uses printf to print on the same line until a different molecule name is found. Then, a new line starts. The second AWK script is used to detect which molecules names have 3 valid lines in the original file. I hope this can help you to solve your problem
sort your_file | awk 'BEGIN{ molname=""; } ( $0 !~ "SOL" || ( $0 ~ "SOL" && ( $5<1.2 || $5>4.8 ) ) ){ if($1!=molname){printf("\n");molname=$1}for(i=1;i<=NF;i++){printf("%s ",$i);}}' | awk 'NF>12 {print $0}'
awk '!/SOL/ || $5 < 1.2 || $5 > 4.8' inputfile.txt
Print (default behaviour) lines where:
"SOL" is not found
SOL is found and fifth column < 1.2
SOL is found and fifth column > 4.8
SOLVED! Thanks to all, here is how i solved it.
#!/bin/bash
file=$1
awk 'BEGIN {molecola="";i=0;j=1;}
{if ($1 !~ /SOL/) {print $0}
else if ( $1 != molecola && $1 ~ /SOL/ ) {
for (j in arr_comp) {if( arr_comp[j] < 1.2 || arr_comp[j] > 5) {for(j in arr_comp) {print arr_mol[j] };break}}
delete(arr_comp)
delete(arr_mol)
arr_mol[0]=$0
arr_comp[0]=$5
molecola=$1
j=1
}
else {arr_mol[j]=$0;arr_comp[j]=$5;j++} }' $file

Resources