Escape awk special character in Python - bash

I have a Fabric task as follows:
@task
def getCrons():
    timeStampNowServer = sudo("date +%s%3N", pty=False)
    cronLogFiles = sudo(
        "find /home/logs/cron/ -maxdepth 2 -type f -mtime -1 -name '*.log'", pty=False)
    cronLogFiles = cronLogFiles.splitlines(True)
    for cronLog in cronLogFiles:
        info = sudo(
            "awk '/END$/ {prev=$0; next}; /^#RETURN/ && $2>0 {cur=$0; pr=1; next}; pr {printf \"%s\n%s\n%s\n\", prev, cur, $0; pr=0}'{0}".format(cronLog), pty=False)
        print(info)
I get the following traceback:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/fabric/main.py", line 743, in main
    *args, **kwargs
  File "/usr/lib/python2.7/site-packages/fabric/tasks.py", line 379, in execute
    multiprocessing
  File "/usr/lib/python2.7/site-packages/fabric/tasks.py", line 274, in _execute
    return task.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/home/lbn/k.sewnundun/fabfile/kse/test.py", line 18, in getCrons
    "awk '/END$/ {prev=$0; next}; /^#RETURN/ && $2>0 {cur=$0; pr=1; next}; pr {printf \"%s\n%s\n%s\n\", prev, cur, $0; pr=0}'{0}".format(cronLog), pty=False)
KeyError: 'prev=$0; next'
The command I want to execute on the server is:
awk '/END$/ {prev=$0; next}; /^#RETURN/ && $2>0 {cur=$0; pr=1; next}; pr {printf "%s\n%s\n%s\n", prev, cur, $0; pr=0}' mylog.LOG
However, I am unable to escape the characters in this line:
    info = sudo(
        "awk '/END$/ {prev=$0; next}; /^#RETURN/ && $2>0 {cur=$0; pr=1; next}; pr {printf \"%s\n%s\n%s\n\", prev, cur, $0; pr=0}'{0}".format(cronLog), pty=False)
How do I make it run correctly?

The issue was solved by doubling the braces and escaping the newlines in the awk printf:
info = sudo("awk '/END$/ {{prev=$0; next}}; /^#RETURN/ && $2>0 {{cur=$0; pr=1; next}}; pr {{printf \"%s\\n%s\\n%s\\n\", prev, cur, $0; pr=0}}' {0}".format(cronLog), pty=False)
https://docs.python.org/2/library/string.html#format-string-syntax
Format strings contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}.
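The doubling rule is easy to check in isolation; a minimal sketch (python3 is used here for convenience, though str.format behaves the same way in the question's Python 2):

```shell
# Doubled braces survive str.format as literal braces; {0} is substituted.
python3 - <<'EOF'
cmd = "awk '{{print $0}}' {0}".format("mylog.LOG")
print(cmd)
EOF
```

This prints awk '{print $0}' mylog.LOG: the awk braces reach the shell intact while the filename placeholder is filled in.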

Related

Error with awk (newline or end of string)

I am having an issue with the following command:
awk ‘{if ($1 ~ /^##contig/) {next}else if ($1 ~ /^#/) {print $0; next}else {print $0 | “sort -k1,1V -k2,2n”}’ file.vcf > out.vcf
It gives the following error:
^ unexpected newline or end of string
Your command contains "fancy quotes" instead of normal ones, in addition to a missing }.
awk '{if ($1 ~ /^##contig/) {next} else if ($1 ~ /^#/) {print $0; next} else {print $0 | "sort -k1,1V -k2,2n"} }' file.vcf > out.vcf
Changing your command to the above should work as expected.
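With straight quotes and the missing } restored, the command runs; a quick check on a couple of made-up VCF-like lines (GNU sort's -V version-sort flag is assumed):

```shell
# Header lines are printed directly; data lines are piped to an external sort.
printf '##contig=<ID=chr1>\n#CHROM\tPOS\nchr2\t5\nchr1\t3\n' |
awk '{if ($1 ~ /^##contig/) {next} else if ($1 ~ /^#/) {print $0; next} else {print $0 | "sort -k1,1V -k2,2n"} }'
```

The ##contig line is dropped, #CHROM passes through, and chr1/chr2 come out sorted. Note that awk's own output and the pipe's output are buffered separately, so their relative order can vary when stdout is not a terminal.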

AWK to display a column based on Column name and remove header and last delimiter

Id,responseId,name,test1,test2,bcid,stype
213,A_123456,abc,test,zzz,987654321,alpha
412,A_234566,xyz,test,xxx,897564322,gama
125,A_456314,ttt,qa,yyy,786950473,delta
222,A_243445,hds,test,fff,643528290,alpha
456,A_466875,sed,test,hhh,543819101,beta
I want to extract the responseId and bcid columns from the above. I found an answer which is really close:
awk -F ',' -v cols=responseID,bcid '(NR==1){n=split(cols,cs,",");for(c=1;c<=n;c++){for(i=1;i<=NF;i++)if($(i)==cs[c])ci[c]=i}}{for(i=1;i<=n;i++)printf "%s" FS,$(ci[i]);printf "\n"}' <file_name>
However, it prints a trailing "," and also the header, as shown below:
responseId,bcid,
A_123456,987654321,
A_234566,897564322,
A_456314,786950473,
A_243445,643528290,
A_466875,543819101,
How can I make it not print the header and the trailing "," after bcid?
Input
$ cat infile
Id,responseId,name,test1,test2,bcid,stype
213, A_123456, abc, test, zzz, 987654321, alpha
412, A_234566, xyz, test, xxx, 897564322, gama
125, A_456314, ttt, qa, yyy, 786950473, delta
222, A_243445, hds, test, fff, 643528290, alpha
456, A_466875, sed, test, hhh, 543819101, beta
Script
$ cat byname.awk
FNR==1{
    split(header,h,/,/);
    for(i=1; i in h; i++)
    {
        for(j=1; j<=NF; j++)
        {
            if(tolower(h[i])==tolower($j)){ d[i]=j; break }
        }
    }
    next
}
{
    for(i=1; i in h; i++)
        printf("%s%s", i>1 ? OFS : "", i in d ? $(d[i]) : "");
    print "";
}
How to execute?
$ awk -v FS=, -v OFS=, -v header="responseID,bcid" -f byname.awk infile
A_123456, 987654321
A_234566, 897564322
A_456314, 786950473
A_243445, 643528290
A_466875, 543819101
One-liner
$ awk -v FS=, -v OFS=, -v header="responseID,bcid" 'FNR==1{split(header,h,/,/);for(i=1; i in h; i++){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){ d[i]=j; break }}}next}{for(i=1; i in h; i++)printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"");print "";}' infile
A_123456, 987654321
A_234566, 897564322
A_456314, 786950473
A_243445, 643528290
A_466875, 543819101
try:
awk '{NR==1?FS=",":FS=", ";$0=$0} {print $2 OFS $(NF-1)}' OFS=, Input_file
This checks whether the line is the 1st line: if so, the field separator is set to ","; for all other lines it is ", ". It then prints the 2nd field and the 2nd-to-last field, with OFS (the output field separator) set to ,.
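Both answers boil down to the same lookup: scan the header once to map a column name to a field index, then print that field for every data line. A single-column sketch using the question's header and bcid values:

```shell
# NR==1: find which field's header matches col, remember its index, skip the header.
# All other lines: print the remembered field.
printf 'Id,responseId,bcid\n213,A_123456,987654321\n412,A_234566,897564322\n' |
awk -F, -v col=bcid 'NR==1{for(i=1;i<=NF;i++) if($i==col) c=i; next} {print $c}'
```

The next on the header line is what suppresses the header in the output, which was the asker's first complaint.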

Subtract single largest number from multiple specific columns in awk

I have a comma delimited file that looks like
R,F,TE,K,G,R
1,0,12,f,1,18
2,1,17,t, ,17
3,1, , ,1,
4,0,15, ,0,16
Some items are missing, and the first row is a header that I want to ignore. I want to find the second-smallest number in certain columns and subtract it from every element in that column, unless that element is the column's minimum. In this example I want to subtract the second minimum from columns 3 and 6, so my final values would be:
R,F,TE,K,G,R
1,0,12,f,1,1
2,1, 2,t, ,0
3,1, , ,0,
4,0, 0, ,0,16
I tried it on single columns individually, hand-coding a threshold (> 12) so that the minimum is skipped and the second smallest is found:
awk 'BEGIN {FS=OFS=","}
{
    min=1000000;
    if ($3<min && $3 != "" && $3>12) min = $3;
    if ($3>0) $3 = $3-min+1;
    print
}
END {print min}
' try1.txt
It finds the min alright but the output is not as expected. There should be an easier way in awk.
I'd loop over the file twice, once to find the minima, once to adjust the values. It's a trade-off of time versus memory.
awk -F, -v OFS=, '
NR == 1 {min3 = $3; min6 = $6}
NR == FNR {if ($3 < min3) min3 = $3; if ($6 < min6) min6 = $6; next}
$3 != min3 {$3 -= min3}
$6 != min6 {$6 -= min6}
{print}
' try1.txt try1.txt
For prettier output:
awk -F, -v OFS=, '
NR == 1 {min3 = $3; min6 = $6; next}
NR == FNR {if ($3 < min3) min3 = $3; if ($6 < min6) min6 = $6; next}
FNR == 1 {len3 = length("" min3); len6 = length("" min6)}
$3 != min3 {$3 = sprintf("%*d", len3, $3-min3)}
$6 != min6 {$6 = sprintf("%*d", len6, $6-min6)}
{print}
' try1.txt try1.txt
Given the new requirements:
min2_3=$(cut -d, -f3 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
min2_6=$(cut -d, -f6 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
awk -F, -v OFS=, -v min2_3=$min2_3 -v min2_6=$min2_6 '
NR==1 {print; next}
$3 !~ /^ *$/ && $3 >= min2_3 {$3 -= min2_3}
$6 !~ /^ *$/ && $6 >= min2_6 {$6 -= min2_6}
{print}
' try1.txt
R,F,TE,K,G,R
1,0,12,f,1,1
2,1,2,t, ,0
3,1, , ,1,
4,0,0, ,0,16
An alternative using GNU awk's asort() to sort each column's collected values, so the smallest and second-smallest can be picked out by index:
BEGIN{
    FS=OFS=","
}
{
    if(NR==1){print;next}
    if(+$3)a[NR]=$3
    if(+$6)b[NR]=$6
    s[NR]=$0
}
END{
    asort(a,c)
    asort(b,d)
    for(i=2;i<=NR;i++){
        split(s[i],t)
        if(t[3]!=c[1]&&+t[3]!=0)t[3]=t[3]-c[2]
        if(t[6]!=d[1]&&+t[6]!=0)t[6]=t[6]-d[2]
        print t[1],t[2],t[3],t[4],t[5],t[6]
    }
}
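The "second smallest" can also be found in a single awk pass by tracking the two smallest values seen so far; a sketch on the non-empty column-3 values from the question (12, 17, 15 — empty fields would have to be filtered out first, as the cut and sort pipeline above does):

```shell
# m1 holds the minimum seen so far, m2 the second minimum.
printf '12\n17\n15\n' |
awk 'NR==1{m1=$1; next}
     NR==2{if($1<m1){m2=m1; m1=$1} else m2=$1; next}
     $1<m1{m2=m1; m1=$1; next}
     $1<m2{m2=$1}
     END{print m2}'
```

This prints 15, the same value the sort-based pipeline extracts with sed -n '2p'.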

How to convert date with awk

My file temp.txt
ID53,20150918,2015-09-19,,0,CENTER<br>
ID54,20150911,2015-09-14,,0,CENTER<br>
ID55,20150911,2015-09-14,,0,CENTER
I need to convert the 2nd field (yyyymmdd) to seconds since the epoch and replace it.
I tried this, but only the first line is replaced:
awk -F"," '{ ("date -j -f ""%Y%m%d"" ""20150918"" ""+%s""") | getline $2; print }' OFS="," temp.txt
and I also tried this:
awk -F"," '{system("date -j -f ""%Y%m%d"" "$2" ""+%s""") | getline $2; print }' temp.txt
the output is:
1442619474
sh: 0: command not found
ID53,20150918,2015-09-19,,0,CENTER
1442014674
ID54,20150911,2015-09-14,,0,CENTER
1442014674
ID55,20150911,2015-09-14,,0,CENTER
Using gsub did not work either:
awk -F"," '{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" ""+%s""")",$2); print}' OFS="," temp.txt
awk: syntax error at source line 1
context is
{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" >>> ""+% <<< s""")",$2); print}
awk: illegal statement at source line 1
extra )
I need the output to be as follows. How can I do this?
ID53,1442619376,2015-09-19,,0,CENTER
ID54,1442014576,2015-09-14,,0,CENTER
ID55,1442014576,2015-09-14,,0,CENTER
This GNU awk script should do it. If GNU awk is not yet installed on your Mac, I suggest installing MacPorts and then GNU awk. You can also install decent versions of bash, date and other important utilities, for which the defaults on OS X are really disappointing.
BEGIN { FS = ","; OFS = FS; }
{
    y = substr($2, 1, 4);
    m = substr($2, 5, 2);
    d = substr($2, 7, 2);
    $2 = mktime(y " " m " " d " 00 00 00");
    print;
}
Put it in a file (e.g. txt2ts.awk) and process your file with:
$ awk -f txt2ts.awk data.txt
ID53,1442527200,2015-09-19,,0,CENTER<br>
ID54,1441922400,2015-09-14,,0,CENTER<br>
ID55,1441922400,2015-09-14,,0,CENTER
Note that we do not get the same timestamps. I will let you work out where the difference comes from; it is another problem.
Explanations: substr(s, m, n) returns the n-character substring of s that starts at position m (positions start at 1). mktime("YYYY MM DD HH MM SS") converts the date string into a timestamp (seconds since the epoch). FS and OFS are the input and output field separators, respectively. The commands between the curly braces of the BEGIN pattern are executed once at the beginning, while the others are executed on each line of the file.
You could use substr:
printf "%s-%s-%s", substr($6,1,4), substr($6,5,2), substr($6,7,2)
Assuming that the 6th field was 20150914, this would produce 2015-09-14
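If you would rather shell out to date, as the asker originally tried, the pipe has to be close()d after each getline; otherwise a repeated date (the two 20150911 lines) re-reads an exhausted pipe and the field is left unchanged. A sketch assuming GNU date (-d and -u; BSD/macOS date spells this -j -f instead):

```shell
# Build a date command per line, read its output into $2, then close the
# pipe so the next identical command spawns a fresh process.
printf 'ID53,20150918,2015-09-19,,0,CENTER\nID54,20150911,2015-09-14,,0,CENTER\nID55,20150911,2015-09-14,,0,CENTER\n' |
awk -F, -v OFS=, '{cmd = "date -u -d " $2 " +%s"; cmd | getline $2; close(cmd); print}'
```

With -u the timestamps are midnight UTC (1442534400 and 1441929600); the answer's values above differ by the local timezone offset.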

Creating an array with awk and passing it to a second awk operation

I have a column file and I want to print all the lines that do not contain the string SOL, plus only those lines containing SOL whose 5th column is <1.2 or >4.8.
The file is structured as: MOLECULENAME ATOMNAME X Y Z
Example:
151SOL OW 6554 5.160 2.323 4.956
151SOL HW1 6555 5.188 2.254 4.690 ----> as you can see this atom is out of the
151SOL HW2 6556 5.115 2.279 5.034       threshold, but it needs to be printed
What I thought was to save a vector with all the MOLECULENAMEs that I want, and then tell awk to match every MOLECULENAME saved in vector "a" against the file and print the complete output (if I only do the first awk, I end up with bad atom linkage near the threshold).
The problem is that I have to pass the vector from the first awk to the second... I tried it like this with a[], but of course it doesn't work.
How can I do this?
Here is the code I have so far:
a[] = (awk 'BEGIN{i=0} $1 !~ /SOL/{a[i]=$1;i++}; /SOL/ && $5 > 4.8 {a[i]=$1;i++};/SOL/ &&$5<1.2 {a[i]=$1;i++}')
awk -v a="$a[$i]" 'BEGIN{i=0} $1 ~ $a[i] {if (NR>6540) {for (j=0;j<3;j++) {print $0}} else {print $0}
You can put all atoms of the same molecule name on one row by sorting the file and then running this awk script, which uses printf to keep printing on the same line until a different molecule name is found; then a new line starts. The second awk script detects which molecule names have 3 valid lines in the original file. I hope this helps you solve your problem:
sort your_file | awk 'BEGIN{ molname=""; } ( $0 !~ "SOL" || ( $0 ~ "SOL" && ( $5<1.2 || $5>4.8 ) ) ){ if($1!=molname){printf("\n");molname=$1}for(i=1;i<=NF;i++){printf("%s ",$i);}}' | awk 'NF>12 {print $0}'
awk '!/SOL/ || $5 < 1.2 || $5 > 4.8' inputfile.txt
Print (default behaviour) lines where:
"SOL" is not found
SOL is found and fifth column < 1.2
SOL is found and fifth column > 4.8
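A quick check of that filter on made-up atom lines (coordinates invented; the thresholds are the question's):

```shell
# Keep every non-SOL line, and SOL lines whose 5th column is outside [1.2, 4.8].
printf '151SOL OW 6554 1.0 4.956\n151SOL HW1 6555 1.0 2.500\n100ALA CA 10 1.0 1.000\n' |
awk '!/SOL/ || $5 < 1.2 || $5 > 4.8'
```

The first line survives (4.956 > 4.8), the non-SOL line survives unconditionally, and the in-range SOL line is dropped.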
SOLVED! Thanks to all; here is how I solved it.
#!/bin/bash
file=$1
awk 'BEGIN {molecola=""; i=0; j=1}
{
    if ($1 !~ /SOL/) {print $0}
    else if ($1 != molecola && $1 ~ /SOL/) {
        for (j in arr_comp) {
            if (arr_comp[j] < 1.2 || arr_comp[j] > 5) {
                for (j in arr_comp) {print arr_mol[j]}
                break
            }
        }
        delete(arr_comp)
        delete(arr_mol)
        arr_mol[0]=$0
        arr_comp[0]=$5
        molecola=$1
        j=1
    }
    else {arr_mol[j]=$0; arr_comp[j]=$5; j++}
}' $file
