calculate percentage between columns in bash? - bash

I have a long tab-formatted file with many columns. I would like to calculate, for the 3rd and 4th columns, the percentage each represents of the 2nd column, and print this percentage next to the corresponding number in this format: (46.00%).
input:
file1 323 434 45 767 254235 275 2345 467
file1 294 584 43 7457 254565 345 235445 4635
file1 224 524 4343 12457 2542165 345 124445 41257
Desired output:
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
I tried:
cat test_file.txt | awk '{printf "%s (%.2f%)\n",$0,($4/$2)*100}' OFS="\t" | awk '{printf "%s (%.2f%)\n",$0,($3/$2)*100}' | awk '{print $1,$2,$3,$11,$4,$10,$5,$6,$7,$8,$9}' - | sed 's/ (/(/g' | sed 's/ /\t/g' >out.txt
It works, but I want a shorter way of doing this.

I would say:
$ awk '{$3=sprintf("%d(%.2f%%)", $3, ($3/$2)*100); $4=sprintf("%d(%.2f%%)", $4, ($4/$2)*100)}1' file
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
With a function to avoid duplication:
awk 'function print_nice (num1, num2) {
return sprintf("%d(%.2f%%)", num1, (num1/num2)*100)
}
{$3=print_nice($3,$2); $4=print_nice($4,$2)}1' file
This uses sprintf to apply a specific format and store the result back into the field. The calculations themselves are straightforward.
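Since the question says the file is tab-formatted, a tab-preserving variant may be useful. This is a minimal sketch: it assumes the data is tab-separated in a file named file and that the percentages are taken relative to the 2nd column, as in the attempt above.

```shell
# Build a one-line tab-separated sample, then format columns 3 and 4
# as "value(percent-of-column-2)" while keeping tabs on output.
printf 'file1\t323\t434\t45\t767\n' > file
awk -F'\t' -v OFS='\t' '{
  $3 = sprintf("%d(%.2f%%)", $3, ($3/$2)*100)
  $4 = sprintf("%d(%.2f%%)", $4, ($4/$2)*100)
}1' file
# -> file1  323  434(134.37%)  45(13.93%)  767   (tab-separated)
```

Setting both -F and OFS to a tab keeps the delimiter intact when awk rebuilds the record after assigning to $3 and $4.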

Related

Apply multiple subtract commands between two columns in a text file in bash

I would like to subtract 2x two columns in a text file and add the results as two new columns in a tab-delimited text file in bash using awk.
I would like to subtract column 1 (h1) from column 3 (h3), and name the newly added column "count1".
I would like to subtract column 2 (h2) from column 4 (h4), and name the newly added column "count2".
I don't want to build a new text file, but edit the old one.
My text file:
h1 h2 h3 h4 h5
343 100 856 216 536
283 96 858 220 539
346 111 858 220 539
283 89 860 220 540
280 89 862 220 541
76 32 860 220 540
352 105 856 220 538
57 16 860 220 540
144 31 858 220 539
222 63 860 220 540
305 81 858 220 539
My command at the moment looks like this:
awk '{$6 = $3 - $1}1' file.txt
awk '{$6 = $4 - $2}1' file.txt
But I don't know how to name the newly added columns, and maybe there is a smarter way to run both commands in a single awk command?
Pretty simple in awk. Use NR==1 to modify the first line.
awk -F '\t' -v OFS='\t' '
NR==1 {print $0,"count1","count2"}
NR!=1 {print $0,$3-$1,$4-$2}' file.txt > tmp && mv tmp file.txt
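That can be sanity-checked on a trimmed-down sample (a sketch with two hypothetical data rows; the real file has more):

```shell
# Header plus two tab-separated data rows; count1 = h3-h1, count2 = h4-h2.
printf 'h1\th2\th3\th4\th5\n343\t100\t856\t216\t536\n283\t96\t858\t220\t539\n' > file.txt
awk -F '\t' -v OFS='\t' '
NR==1 {print $0,"count1","count2"}
NR!=1 {print $0,$3-$1,$4-$2}' file.txt
# first data row gains 513 and 116; second gains 575 and 124
```

The tmp && mv step from the answer is what "edits the old file": awk itself cannot rewrite its input in place, so the output is written to a temporary file which then replaces the original.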

Trying to execute unix command in awk but receiving error

I have a file with
1|siva
2|krishna
3| syz 5
I am trying to find the ASCII values of field 2, but the command below gives me an error:
awk 'BEGIN{FS="|"; } {print $2|"od -An -vtu1"| tr -d "\n"}' test1.txt
awk: BEGIN{FS="|"; } {print $2|"od -An -vtu1 tr -d "\n"}
awk: ^ backslash not last character on line
Expected output
115 105 118 97
107 114 105 115 104 110 97
32 115 121 122 32 53
You're not really using any awk; perhaps this is easier...
$ while IFS='|' read -r _ f;
do echo -n "$f" | od -An -vtu1;
done < file
115 105 118 97
107 114 105 115 104 110 97
32 32 115 121 122 32 53
It sounds like this is what you're trying to do:
$ awk '
BEGIN { FS="|" }
{
cmd = "printf \047%s\047 \047" $2 "\047 | od -An -vtu1"
system(cmd)
}
' file
115 105 118 97
107 114 105 115 104 110 97
32 32 115 121 122 32 53
or an alternative syntax so the output comes from awk rather than by the shell called by system():
$ awk '
BEGIN { FS=OFS="|" }
{
cmd = "printf \047%s\047 \047" $2 "\047 | od -An -vtu1"
rslt = ( (cmd | getline line) > 0 ? line : "N/A" )
close(cmd)
print $0, rslt
}
' file
1|siva| 115 105 118 97
2|krishna| 107 114 105 115 104 110 97
3| syz 5| 32 32 115 121 122 32 53
Massage to suit. You don't NEED to save the result in a variable, you could just print it, but I figured you'll want to know how to do that at some point, and you don't NEED to print $0 of course.
I also assume you have some reason for wanting to do this in awk, e.g. it's part of a larger script, otherwise using awk to call system to call shell to execute shell commands is just a bad idea vs using shell to execute shell commands.
Having said that, the best shell command I can come up with to do what you want is this, using GNU awk for multi-char RS:
$ awk -F'|' -v ORS='\0' '{print $2}' file |
od -An -vtu1 |
awk -v RS=' 0\\s' '{gsub(/\n/,"")}1'
115 105 118 97
107 114 105 115 104 110 97
32 32 115 121 122 32 53
See the comments below for how that's more robust than the first awk approach if the input contains '$(command)', but it does assume there are no NUL chars in your input.
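Another injection-safe sketch (my own variant, not one of the answers above): instead of interpolating $2 into a shell command string, write the field into a pipe to od, so data like $(command) is never parsed by a shell; closing the pipe on each line gives one od run per record:

```shell
printf '1|siva\n2|krishna\n' > file   # sample input
awk -F'|' '{
  cmd = "od -An -vtu1"
  printf "%s", $2 | cmd   # the field travels through the pipe, not a shell command line
  close(cmd)              # flush and restart od for the next record
}' file
# prints the byte values per record, e.g. 115 105 118 97 for siva
# (od pads each value with spaces)
```

As with the other approaches, this assumes the fields contain no NUL bytes.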

output to a variable file name in for loop in bash

I am doing some tasks inside the for loop and trying to redirect stdout to a variable file name on every iteration. But it is giving me only a single file, containing just part of the expected output.
This is my script:
#!/bin/sh
me1_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/h3k4me1_data"
me3_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/h3k4me3_data"
dnase_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/dnase_data"
index=(003 004)
#index=(003 004 005 006 007 008 017 021 022 028 029 032 033 034 046 050 051 055 056 057 059 080 081 082 083 084 085 086 088 089 090 091 092 093 094 097 098 100 109)
#index=(006 007 008 017 021 022 028 029 032 033 034 046 050 051 055 056 057 059 080 081 082 083 084 085 086 088 089 090 091 092 093 094 097 098 100 109)
for i in "${index[@]}"; do
dnase_file="$dnase_dir/E$i-DNase.hotspot.fdr0.01.broad.bed"
me1_fil="$me1_dir/E$i-H3K4me1.broadPeak"
me3_fil="$me3_dir/E$i-H3K4me3.broadPeak"
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $me1_fil > me1_file.bed
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $me3_fil > me3_file.bed
ctcf_file="CTCFsites_hg19_sorted_bedmerged.bed"
tss_file="TSS_gene_2kbupstrm_0.5kbdownstrm.bed"
cat me1_file.bed me3_file.bed $ctcf_file $tss_file | sort -k1,1 -k2,2n > file2.bed
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $dnase_file | sort -k1,1 -k2,2n > file1.bed
bedtools intersect -v -a file1.bed -b file2.bed > E$i_file.txt;
done
It gives only the output file "E.txt" from the last line of the for loop. I am expecting E003_file.txt and E004_file.txt.
I am a newbie, please help me out.
Thank you
When you write
E$i_file.txt
the shell is looking for a variable named i_file, because _ is a valid character in a variable name, not a delimiter. You need to use braces to delimit the variable name:
bedtools intersect -v -a file1.bed -b file2.bed > "E${i}_file.txt"
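A two-line illustration of the difference (i=003 is just an example value):

```shell
i=003
unset i_file              # make sure the misparsed name is empty, as in the loop
echo "E$i_file.txt"       # parsed as E${i_file}.txt; i_file is unset, so this prints "E.txt"
echo "E${i}_file.txt"     # braces delimit the variable name, so this prints "E003_file.txt"
```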

awk script to return results recursively [duplicate]

This question already has an answer here:
awk script along with for loop
(1 answer)
Closed 7 years ago.
I have a data set as below (t.txt):
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I have second file as below (t1.txt):
752
728
856
693
713
792
812
706
737
751
745
I am trying to extract, for each value in the second file, the matching column-2 element from the first data set, using a for loop.
I have tried :
for i in `cat t1.txt`
do
awk -F " " '$1=i {print $2}' t.txt > t0.txt
done
Desired output is :
694
820
774
Unfortunately I am getting a blank file.
I have tried to do it manually like : awk -F " " '$1==752 {print $2}' t.txt > t0.txt
Results obtained are
694
774
How can I do it for the entire t1 file in one go?
Simplest way: using join
$ join -o 1.2 <(sort t.txt) <(sort t1.txt)
694
774
820
join requires the files to be lexically sorted on the comparison field (by default, the first field). The -o option instructs join to output the 2nd field of the 1st file.
With awk
$ awk 'NR==FNR {key[$1]; next} $1 in key {print $2}' t1.txt t.txt
694
820
774
This remembers the keys from t1.txt, then loops over t.txt (reached once the cumulative record number NR no longer equals the per-file record number FNR); if the first field occurred in t1.txt, it prints the second field.
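The idiom can be sanity-checked inline with a trimmed-down pair of files (a sketch using here-documents):

```shell
cat > t1.txt <<'EOF'
752
812
EOF
cat > t.txt <<'EOF'
752 694
812 820
752 774
EOF
awk 'NR==FNR {key[$1]; next} $1 in key {print $2}' t1.txt t.txt
# prints 694, 820 and 774: every t.txt line whose first field is listed in t1.txt
```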

awk script along with for loop

I have a data set t.txt:
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I also have a second file t1.txt:
752
728
856
693
713
792
812
706
737
751
745
I am trying to extract the corresponding column-2 elements for each value in the second file, sequentially, from the data set.
I have used: awk -F " " '$1==752 {print $2}' t.txt >> t2.txt
How can I use a for loop for the above instruction and populate one text file, instead of doing it one value at a time?
The output for 752 will be 694. This 694 should be written to a different text file. For 812, it should give me 820. Both 694 and 820 should be written to the same text file. It should parse till the end of the input file.
I was trying :
for i in `cat t1.txt` | awk -F " " '$1==$i {print $2}' t.txt >> t2.txt
which is throwing a syntax error.
Answer for 3rd Version of This Question
$ awk 'FNR==NR{a[$1]=1;next;} $1 in a {print $2;}' t1.txt t.txt
694
820
774
Answer for 2nd Version of This Question
For every line in t1.txt, this checks to see if the same number appears in column 1 of t.txt. If it does, the number stored for that key is printed:
$ awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt t1.txt
774
820
To save the output in file t2.txt, use:
awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt t1.txt >t2.txt
How it works
FNR==NR{a[$1]=$2;next}
This reads through t.txt and builds an array a that maps each first-column value to the corresponding second-column value (if a key repeats, the last value wins).
$1 in a {print a[$1]}
For each number in file t1.txt, this checks to see if the number appears in array a and, if so, prints out the corresponding value.
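Putting it together on a minimal sample (a sketch; note that in this version the key order of t1.txt, not t.txt, drives the output order):

```shell
printf '752 694\n812 820\n' > t.txt
printf '812\n999\n752\n' > t1.txt
awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt t1.txt
# prints 820 then 694: values follow the order of keys in t1.txt; 999 has no match
```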
