output to a variable file name in for loop in bash - bash

I am doing some tasks inside a for loop and trying to redirect stdout to a different file name on every iteration. But I get only one file, and its name contains just part of what I assigned.
This is my script:
#!/bin/sh
me1_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/h3k4me1_data"
me3_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/h3k4me3_data"
dnase_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/dnase_data"
index=(003 004)
#index=(003 004 005 006 007 008 017 021 022 028 029 032 033 034 046 050 051 055 056 057 059 080 081 082 083 084 085 086 088 089 090 091 092 093 094 097 098 100 109)
#index=(006 007 008 017 021 022 028 029 032 033 034 046 050 051 055 056 057 059 080 081 082 083 084 085 086 088 089 090 091 092 093 094 097 098 100 109)
for i in "${index[@]}"; do
dnase_file="$dnase_dir/E$i-DNase.hotspot.fdr0.01.broad.bed"
me1_fil="$me1_dir/E$i-H3K4me1.broadPeak"
me3_fil="$me3_dir/E$i-H3K4me3.broadPeak"
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $me1_fil > me1_file.bed
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $me3_fil > me3_file.bed
ctcf_file="CTCFsites_hg19_sorted_bedmerged.bed"
tss_file="TSS_gene_2kbupstrm_0.5kbdownstrm.bed"
cat me1_file.bed me3_file.bed $ctcf_file $tss_file | sort -k1,1 -k2,2n > file2.bed
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $dnase_file | sort -k1,1 -k2,2n > file1.bed
bedtools intersect -v -a file1.bed -b file2.bed > E$i_file.txt;
done
It produces only one output file, "E.txt", from the last line in the for loop. I am expecting E003_file.txt and E004_file.txt.
I am a newbie, please help me out.
Thank you

When you write
E$i_file.txt
the shell is looking for a variable named i_file, because _ is a valid character in a variable name, not a delimiter. That variable is unset, so it expands to nothing and every iteration writes to E.txt. You need to use braces to delimit the variable name:
bedtools intersect -v -a file1.bed -b file2.bed > "E${i}_file.txt"
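The difference can be seen without running bedtools at all. This is only a minimal sketch: the loop values come from the question's index array, and wrong and right are hypothetical variable names used for illustration.

```shell
# Make sure no variable named i_file leaks in from the environment.
unset i_file
for i in 003 004; do
    wrong="E$i_file.txt"     # expands the unset variable i_file, giving "E.txt"
    right="E${i}_file.txt"   # expands i, then appends the literal "_file.txt"
    printf '%s %s\n' "$wrong" "$right"
done
```

Each iteration prints the broken name next to the intended one, which shows why the original loop kept overwriting a single file.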

Related

Trying to execute unix command in awk but receiving error

I have a file with
1|siva
2|krishna
3| syz 5
I am trying to find the ASCII values of field 2, but the command below gives me an error
awk 'BEGIN{FS="|"; } {print $2|"od -An -vtu1"| tr -d "\n"}' test1.txt
awk: BEGIN{FS="|"; } {print $2|"od -An -vtu1 tr -d "\n"}
awk: ^ backslash not last character on line
Expected output
115 105 118 97
107 114 105 115 104 110 97
32 115 121 122 32 53
You're not really using any awk, so perhaps this is easier:
$ while IFS='|' read -r _ f;
do echo -n "$f" | od -An -vtu1;
done < file
115 105 118 97
107 114 105 115 104 110 97
32 32 115 121 122 32 53
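A variation on the loop above, as a rough sketch: it uses printf '%s' instead of echo -n (whose behaviour differs between shells) and squeezes od's column padding into single spaces. The helper name bytes_of and the two sample lines are assumptions made for illustration.

```shell
# bytes_of is a hypothetical helper name, not part of od or awk.
bytes_of() {
    # printf '%s' avoids the trailing newline that plain echo would add.
    printf '%s' "$1" | od -An -vtu1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Sample lines mirror the question's input format: id|text
printf '1|siva\n2|krishna\n' |
while IFS='|' read -r id text; do
    printf '%s: %s\n' "$id" "$(bytes_of "$text")"
done
```

Because od wraps its output every 16 bytes, joining the lines with tr keeps longer fields on one line as well.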
It sounds like this is what you're trying to do:
$ awk '
BEGIN { FS="|" }
{
cmd = "printf \047%s\047 \047" $2 "\047 | od -An -vtu1"
system(cmd)
}
' file
115 105 118 97
107 114 105 115 104 110 97
32 32 115 121 122 32 53
or an alternative syntax so the output comes from awk rather than by the shell called by system():
$ awk '
BEGIN { FS=OFS="|" }
{
cmd = "printf \047%s\047 \047" $2 "\047 | od -An -vtu1"
rslt = ( (cmd | getline line) > 0 ? line : "N/A" )
close(cmd)
print $0, rslt
}
' file
1|siva| 115 105 118 97
2|krishna| 107 114 105 115 104 110 97
3| syz 5| 32 32 115 121 122 32 53
Massage to suit. You don't NEED to save the result in a variable, you could just print it, but I figured you'll want to know how to do that at some point, and you don't NEED to print $0 of course.
I also assume you have some reason for wanting to do this in awk, e.g. it's part of a larger script, otherwise using awk to call system to call shell to execute shell commands is just a bad idea vs using shell to execute shell commands.
Having said that, the best shell command I can come up with to do what you want is this, using GNU awk for multi-char RS:
$ awk -F'|' -v ORS='\0' '{print $2}' file |
od -An -vtu1 |
awk -v RS=' 0\\s' '{gsub(/\n/,"")}1'
115 105 118 97
107 114 105 115 104 110 97
32 32 115 121 122 32 53
See the comments below for how that's more robust than the first awk approach if the input contains '$(command)' but it does assume there's no NUL chars in your input.

Easy way to make a loop in shell

I would like to print an array as
001 002 003 .. 010 021 022 .. 030 041 042 .. 050
I had written the following script to do that. This is working well, but it is printing like
001 021 041 002 022 042 ....
#!/bin/sh
for i in {1..10}; do
while [ $i -le 50 ]; do
if [[ $i -le 9 ]];then n=00$i;else n=0$i;fi
echo $n
i=$(( i + 20 ))
done
done
I am looking for an easy way so that it will print like
001 002 003 .. 010 021 022 .. 030 041 042 .. 050
I assume that you are using Linux (since seq is often not installed in something like FreeBSD).
You can use seq with -f option.
first seq prints 001 .. 010
second seq prints 021 .. 030
and the last seq prints 041 .. 050
for i in {0..2}
do seq -f '%03g' $((i*20+1)) $((i*20+10))
done
A for-loop in a shell needs to operate on a static set, but you change $i in your loop.
Instead, use a while-loop:
i=1
while [ $i -le 50 ]; do
printf "%03d " $i
if [ $( expr $i % 10 ) -eq 0 ]; then
i=$(( i + 11 ))
else
i=$(( i + 1 ))
fi
done
echo
Or, with bash or ksh:
i=1
while (( i <= 50 )); do
printf "%03d " $i
if (( (i % 10) == 0 )); then
(( i += 11 ))
else
(( ++i ))
fi
done
echo
bash:
echo {001..010} {021..030} {041..050}
Output:
001 002 003 004 005 006 007 008 009 010 021 022 023 024 025 026 027 028 029 030 041 042 043 044 045 046 047 048 049 050
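If brace expansion is not available (it is not part of plain sh), the same grouping can be sketched with two nested loops. The variable names out, start and n are illustrative only.

```shell
# Three blocks of ten numbers, each block starting 20 after the previous one.
out=""
for start in 1 21 41; do
    n=$start
    while [ "$n" -lt $((start + 10)) ]; do
        # %03d pads each number to three digits with leading zeros.
        out="$out$(printf '%03d' "$n") "
        n=$((n + 1))
    done
done
echo "$out"
```

The outer loop iterates over a fixed list and only the inner counter changes, so neither loop modifies the other's variable.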
Using numrange:
numrange /001..010,021..030,041..050/
Output (space delimited):
001 002 003 004 005 006 007 008 009 010 021 022 023 024 025 026 027
028 029 030 041 042 043 044 045 046 047 048 049 050
For linefeed delimiters add the -N option (30 line output not shown):
numrange -N /001..010,021..030,041..050/

calculate percentage between columns in bash?

I have a long tab-separated file with many columns. I would like to calculate the percentage of two columns (3rd and 4th) relative to the 2nd column and print it next to the corresponding number, in the format shown below (for example 46.00%).
input:
file1 323 434 45 767 254235 275 2345 467
file1 294 584 43 7457 254565 345 235445 4635
file1 224 524 4343 12457 2542165 345 124445 41257
Desired output:
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
i tried:
cat test_file.txt | awk '{printf "%s (%.2f%)\n",$0,($4/$2)*100}' OFS="\t" | awk '{printf "%s (%.2f%)\n",$0,($3/$2)*100}' | awk '{print $1,$2,$3,$11,$4,$10,$5,$6,$7,$8,$9}' - | sed 's/ (/(/g' | sed 's/ /\t/g' >out.txt
It works, but I want something shorter than this.
I would say:
$ awk '{$3=sprintf("%d(%.2f%%)", $3, ($3/$2)*100); $4=sprintf("%d(%.2f%%)", $4, ($4/$2)*100)}1' file
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
With a function to avoid duplication:
awk 'function print_nice (num1, num2) {
return sprintf("%d(%.2f%%)", num1, (num1/num2)*100)
}
{$3=print_nice($3,$2); $4=print_nice($4,$2)}1' file
This uses sprintf to build a string in a specific format and store it back in the field. A literal percent sign inside the format has to be written as %%. The calculations are the obvious ones.
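Since the question says the file is tab-separated, one point worth checking is that assigning to a field makes awk rebuild the line with OFS, which is a single space by default. The sketch below sets both FS and OFS to a tab. The wrapper name add_pct and the awk function name pct are hypothetical.

```shell
# add_pct is a hypothetical wrapper; it reads the files given as arguments, or stdin.
add_pct() {
    awk 'BEGIN { FS = OFS = "\t" }   # keep tabs on input and on the rebuilt line
         # Note: a zero in column 2 would cause a division-by-zero error.
         function pct(num, den) { return sprintf("%d(%.2f%%)", num, (num / den) * 100) }
         { $3 = pct($3, $2); $4 = pct($4, $2) } 1' "$@"
}

# First five columns of the first sample line from the question.
printf 'file1\t323\t434\t45\t767\n' | add_pct
```

It can be called as add_pct test_file.txt > out.txt, replacing the whole cat/awk/sed pipeline from the question.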

unexpected result: grep from a changing line

I wrote a bash command to test grep from a changing line:
for i in $(seq 0 9); do echo -e -n "\r"$i; sleep 0.1; done | grep 5
The result shows:
9
Update
The real problem is as follows:
mplayer shows and refreshes a single-line playing progress when playing a media file. A sample result is:
A: 17.2 (17.2) of 213.0 (03:33.0) 0.5%
And I'm trying to grep this playing progress and ignore other lines. I used this command:
mplayer xxx.mp3 | grep ^A:
The result does not contain the line expected.
Update 2
mplayer xxx.mp3 | od -xda
shows:
0002140 4a5b 410d 203a 2020 2e31 2033 3028 2e31
[ J \r A : 1 . 3 ( 0 1 .
133 112 015 101 072 040 040 040 061 056 063 040 050 060 061 056
0002160 2932 6f20 2066 3132 2e33 2030 3028 3a33
2 ) o f 2 1 3 . 0 ( 0 3 :
062 051 040 157 146 040 062 061 063 056 060 040 050 060 063 072
0002200 3333 302e 2029 3020 342e 2025 5b1b 0d4a
3 3 . 0 ) 0 . 4 % 033 [ J \r
063 063 056 060 051 040 040 060 056 064 045 040 033 133 112 015
0002220 3a41 2020 3120 352e 2820 3130 342e 2029
A : 1 . 5 ( 0 1 . 4 )
101 072 040 040 040 061 056 065 040 050 060 061 056 064 051 040
0002240 666f 3220 3331 302e 2820 3330 333a 2e33
o f 2 1 3 . 0 ( 0 3 : 3 3 .
157 146 040 062 061 063 056 060 040 050 060 063 072 063 063 056
And
mplayer xxx.mp3 | tr '\r' '\n'
shows
A: 0.2 (00.1) of 213.0 (03:33.0) 0.3%
A: 0.3 (00.3) of 213.0 (03:33.0) 0.3%
A: 0.5 (00.5) of 213.0 (03:33.0) 0.4%
A: 0.6 (00.6) of 213.0 (03:33.0) 0.4%
A: 0.8 (00.8) of 213.0 (03:33.0) 0.4%
A: 1.0 (01.0) of 213.0 (03:33.0) 0.4%
While,
mplayer xxx.mp3 | tr '\r' '\n' | grep ^A
shows empty result.
Any tip will be appreciated.
It's your definition of "line" that's causing the problem here. The -n means that all the numbers are output on a single line, according to the definition used by grep (a series of characters, terminated by the \n character):
\r0\r1\r2\r3\r4\r5\r6\r7\r8\r9
If you pipe the output through something like a hex dump, you can see what's happening:
$ for i in $(seq 0 9); do echo -e -n "\r"$i; sleep 0.1; done | grep 5 | od -xcb
0000000 300d 310d 320d 330d 340d 350d 360d 370d
\r 0 \r 1 \r 2 \r 3 \r 4 \r 5 \r 6 \r 7
015 060 015 061 015 062 015 063 015 064 015 065 015 066 015 067
0000020 380d 390d 000a
\r 8 \r 9 \n
015 070 015 071 012
0000025
That single line containing all the carriage returns (and not newlines) will, when output, appear to be a single line with just the 9 on it. Removing the -n will result instead in:
$ for i in $(seq 0 9); do echo -e "\r"$i; sleep 0.1; done | grep 5 | od -xcb
0000000 350d 000a
\r 5 \n
015 065 012
0000003
which would look like just the 5 was being output.
If you have a process that outputs "lines" separated by carriage returns rather than newlines, there's nothing to stop you changing them on the fly so as to be able to handle them as real lines:
$ echo -e "junk\rA: good 1\rjunk\rA: good 2\rjunk" | tr '\r' '\n' | grep '^A'
A: good 1
A: good 2
Applying that back to your original question, it would be (with the sleep removed since it's irrelevant):
$ for i in $(seq 0 9); do echo -e -n "\r"$i; done | tr '\r' '\n' | grep 5
5
$ for i in $(seq 0 9); do echo -e -n "\r"$i; done | tr '\r' '\n' | grep 5 | od -xcb
0000000 0a35
5 \n
065 012
0000002
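An alternative sketch of the same idea lets awk split on carriage returns itself, so the translation and the filter happen in one process. The function name progress_lines is made up for this example, and the sample text is the one used above.

```shell
# progress_lines is a hypothetical helper name.
progress_lines() {
    # Treat each carriage return as the end of a record and
    # print only the records that start with "A:".
    awk -v RS='\r' '/^A:/'
}

printf 'junk\rA: good 1\rjunk\rA: good 2\rjunk' | progress_lines
```

awk prints every matching record followed by a newline, so the result can be piped on to tools that expect ordinary lines.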

unexpected result from gnu sort

When I try to sort the following text file 'input':
test1 3
test3 2
test 4
with the command
sort input
the output is exactly the input. Here is the output of od -bc input:
0000000 164 145 163 164 061 011 063 012 164 145 163 164 063 011 062 012
t e s t 1 \t 3 \n t e s t 3 \t 2 \n
0000020 164 145 163 164 011 064 012
t e s t \t 4 \n
0000027
It's just a tab separated file with two columns. When I do
sort -k 2
The output changes to
test3 2
test1 3
test 4
which is what I would expect. But if I do
sort -k 1
nothing changes with respect to the input, whereas I would expect 'test' to sort before 'test1'. Finally, if I do
cat input | cut -f 1 | sort
I get
test
test1
test3
as expected. Is there a logical explanation for this? What exactly is sort supposed to do by default, something like:
sort -k 1
?
My version of sort:
sort (GNU coreutils) 7.4
From the man pages:
*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
So setting LC_ALL=C (for example with export LC_ALL=C) should help.
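A small sketch of both fixes, using the three lines from the question. Two points are assumptions on my part rather than something the man page excerpt states: -k 1 means "from field 1 to the end of the line", so it compares whole lines just like plain sort, and in many non-C locales the tab is probably ignored during comparison, which would make the lines compare roughly like test13, test32 and test4. The variable names are illustrative.

```shell
# The sample data from the question, tab-separated.
sample=$(printf 'test1\t3\ntest3\t2\ntest\t4')

# Fix 1: compare raw bytes; a tab (byte 9) sorts before the digit "1" (byte 49).
sorted_c=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# Fix 2: restrict the key to field 1 only, with an explicit tab separator.
tab=$(printf '\t')
sorted_key=$(printf '%s\n' "$sample" | sort -t "$tab" -k 1,1)

printf '%s\n' "$sorted_c"
```

Both variants should list the "test" line first, followed by test1 and test3.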
