Use an awk loop to subset a file - shell

I have a file with lots of pieces of information that I want to split on the first column.
Example (example.gen):
1 rs3094315 752566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
1 rs2094315 752999 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3044315 759996 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3054375 799966 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3094375 999566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
3 rs3078315 799866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
3 rs4054315 759986 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs4900215 752998 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs5094315 759886 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs6094315 798866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Desired output:
Chr1.gen
1 rs3094315 752566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
1 rs2094315 752999 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Chr2.gen
2 rs3044315 759996 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3054375 799966 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3094375 999566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Chr3.gen
3 rs3078315 799866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
3 rs4054315 759986 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Chr4.gen
4 rs4900215 752998 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs5094315 759886 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs6094315 798866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
I've tried to do this with the following shell scripts, but they don't work: I can't work out how to get awk to recognise a variable defined outside the awk script itself.
First script attempt (no awk loop):
for i in {1..23}
do
awk '{$1 = $i}' example.gen > Chr$i.gen
done
Second script attempt (with awk loop):
for i in {1..23}
do
awk '{for (i = 1; i <= 23; i++) $1 = $i}' example.gen > Chr$i.gen
done
I'm sure it's probably quite basic, but I just can't work it out...
Thank you!

With awk:
awk '{print > ("Chr" $1 ".gen")}' file
This prints every line and redirects it to a file whose name is built as "Chr" + first_column + ".gen". The parentheses around the file name keep the concatenation unambiguous across awk implementations.
With your sample input it creates 4 files. For example the 4th is:
$ cat Chr4.gen
4 rs4900215 752998 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs5094315 759886 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs6094315 798866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
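If the first column can take many distinct values, some awk implementations may run out of open file descriptors. The following is only a sketch of one way around that, assuming the input is already grouped by its first column (as in the sample); split_by_chr is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: split FILE into Chr<key>.gen files in the current directory.
# Assumes lines with the same first column are adjacent; a key that reappears
# later would overwrite the file written earlier.
split_by_chr() {
  awk '
    NR == 1 || $1 != prev {          # first column changed: switch output file
      if (out != "") close(out)      # release the previous file descriptor
      out = "Chr" $1 ".gen"
      prev = $1
    }
    { print > out }
  ' "$1"
}

# Usage: split_by_chr example.gen
```

For unsorted input, the one-liner above is the safer choice because it keeps every output file open until awk exits.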

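First, use @fedorqui's answer, as that is best. But to understand the mistake you made with your first attempt (which was close), read on.
Your first attempt failed because you put the test inside the action (in the braces), not preceding it as a pattern. In addition, = assigns rather than compares (the comparison operator is ==), and the shell does not expand $i inside single quotes. The minimal fix: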
awk "\$1 == $i" example.gen > Chr$i.gen
This uses double quotes to allow the value of i to be seen by the awk script, but that requires you to then escape the dollar sign for $1 so that you don't substitute the value of the shell's first positional argument. Cleaner but longer:
awk -v i=$i '$1 == i' example.gen > Chr$i.gen
This creates a variable i inside the awk script with the same value as the shell's i variable.
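Put together, the corrected loop could look like the sketch below. The wrapper name extract_chrs and the awk variable name chr are illustrative choices, not part of the original scripts. Note that this reads the input once per key, whereas the single awk command in the other answer reads it only once.

```shell
# Hypothetical wrapper around the corrected loop.
# First argument: input file; remaining arguments: first-column values to extract.
extract_chrs() {
  input=$1
  shift
  for i in "$@"; do
    # -v copies the shell value into the awk variable chr;
    # the pattern '$1 == chr' prints only the matching lines.
    awk -v chr="$i" '$1 == chr' "$input" > "Chr$i.gen"
  done
}

# Usage: extract_chrs example.gen 1 2 3 4
```

A key with no matching lines still produces an empty Chr file, just as the original {1..23} loop would.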

Related

Bash: Pipe output into a table

I have a program that prints out the following:
bash-3.2$ ./drawgrid
0
1 1 0
1 1 0
0 0 0
1
0 1 1
0 1 1
0 0 0
2
0 0 0
1 1 0
1 1 0
3
0 0 0
0 1 1
0 1 1
Is it possible to pipe the output of this command such that I get all the 3x3 matrices (together with their number) displayed on a table, for example a 2x2 like this?
0 1
1 1 0 0 1 1
1 1 0 0 1 1
0 0 0 0 0 0
2 3
0 0 0 0 0 0
1 1 0 0 1 1
1 1 0 0 1 1
I tried searching, and came across the column command, but I did not figure it out.
Thank you
You can use pr -2T to get the following output, which is close to what you expected:
0 2
1 1 0 0 0 0
1 1 0 1 1 0
0 0 0 1 1 0
1 3
0 1 1 0 0 0
0 1 1 0 1 1
0 0 0 0 1 1
You could use an awk script:
NF == 1 {
if ($NF % 2 == 0) {
delete line
line[1]=$1
f=1
} else {
print line[1]"\t"$1
f=0
}
n=1
}
NF > 1 {
n++
if (f)
line[n]=$0
else
print line[n]"\t"$0
}
And pipe to it like so:
$ ./drawgrid | awk -f 2x2.awk
0 1
1 1 0 0 1 1
1 1 0 0 1 1
0 0 0 0 0 0
2 3
0 0 0 0 0 0
1 1 0 0 1 1
1 1 0 0 1 1
You can get exactly what you expect with a short bash script and a little array index thought:
#!/bin/bash
declare -a idx
declare -a acont
declare -i cnt=0
declare -i offset=0
while IFS=$'\n'; read -r line ; do
[ ${#line} -eq 1 ] && { idx+=( $line ); ((cnt++)); }
[ ${#line} -gt 1 ] && { acont+=( $line );((cnt++)); }
done
for ((i = 0; i < ${#idx[@]}; i+=2)); do
printf "%4s%8s\n" ${idx[i]} ${idx[i+1]}
for ((j = offset; j < offset + 3; j++)); do
printf " %8s%8s\n" ${acont[j]} ${acont[j+3]}
done
offset=$((j + 3))
done
exit 0
Output
$ bash array_cols.sh <dat/cols.txt
0 1
1 1 0 0 1 1
1 1 0 0 1 1
0 0 0 0 0 0
2 3
0 0 0 0 0 0
1 1 0 0 1 1
1 1 0 0 1 1

Sort each column independently

cat sanger.* | tr '\-ACGT' '01234' | sed -e 's/\([[:digit:]]\)/\1 /g'
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0
0 2 2 0 0 0 0 2 2 2 2 0 2 0 0 0 0 0 2 2 2 0 2 0 0 0 0 0 0 0 2
0 0 0 0 0 0 3 0 0 0 0 3 0 0 3 0 0 3 0 0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 4 4 0 0 0 0 0 0 0 0 4 0 4 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0
This is my current output, now I want to sort each column independently, so all the numbers are on the same line.
How can I do that?
I am not sorting here, but extracting the non-0 digits.
Here is an awk filter that "updates" each field with the only (actually, the latest) non-"0" content it sees:
# short version
awk '/./ { if ( NF > maxNF ) { maxNF=NF }
for(i=1;i<=NF;i++) { if ( $i!="0" ) { result[i]=$i } }
}
END { for(i=1;i<=maxNF;i++) { printf "%s ",result[i] } }
'
# expanded version (i.e., the same as above, with different indentation to help reading)
awk '/./ { if ( NF > maxNF )
{ maxNF=NF }
for(i=1;i<=NF;i++)
{ if ( $i!="0" )
{ result[i]=$i }
}
}
END { for(i=1;i<=maxNF;i++)
{ printf "%s ",result[i]
}
}
'
so if I paste your posted result into that filter:
echo "
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0
0 2 2 0 0 0 0 2 2 2 2 0 2 0 0 0 0 0 2 2 2 0 2 0 0 0 0 0 0 0 2
0 0 0 0 0 0 3 0 0 0 0 3 0 0 3 0 0 3 0 0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 4 4 0 0 0 0 0 0 0 0 4 0 4 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0
" | awk '/./ { if ( NF > maxNF ) { maxNF=NF }
for(i=1;i<=NF;i++) { if ( $i!="0" ) { result[i]=$i } }
}
END { for(i=1;i<=maxNF;i++) { printf "%s ",result[i] } }
'
it outputs:
1 2 2 4 4 1 3 2 2 2 2 3 2 4 3 4 1 3 2 2 2 1 2 1 3 4 1 1 1 1 2
(note: with an extra " " at the end, here...)
A note of warning, however: very old versions of the original awk (and maybe some nawk) are limited to 99 fields. (This is rarely encountered nowadays, and if you use GNU's version, you will be fine.)
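To avoid the trailing space mentioned above, one possible variant prints the separator only between fields and ends the line with a newline. This is a sketch of the same filter, wrapped in a hypothetical function name (merge_nonzero) for convenience:

```shell
# Hypothetical wrapper: reads rows on stdin, prints one merged row.
merge_nonzero() {
  awk '
    NF {
      if (NF > maxNF) maxNF = NF
      for (i = 1; i <= NF; i++)
        if ($i != "0") result[i] = $i    # keep the latest non-"0" value per column
    }
    END {
      # separator between fields, newline after the last one
      for (i = 1; i <= maxNF; i++)
        printf "%s%s", result[i], (i < maxNF ? " " : "\n")
    }
  '
}

printf '%s\n' '1 0 0 1' '0 2 0 0' '0 0 3 0' | merge_nonzero   # prints: 1 2 3 1
```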

Sorting the connected component in order

I have a question about the ordering of connected components. I have a binary image (only 0 and 1) and I run this function from MATLAB on it:
f=
1 0 0 1 0 0 0 1 0 0
1 1 0 1 1 1 0 0 1 0
0 0 0 0 0 0 0 1 1 1
1 0 0 0 1 0 1 0 1 1
1 1 0 0 0 0 0 1 1 1
0 0 0 1 0 0 1 0 0 0
0 0 0 1 0 1 1 0 1 1
1 1 0 0 1 0 0 0 1 0
1 1 0 1 1 1 0 1 0 0
1 1 0 0 1 0 0 0 1 0
[L num]=bwlabel(f);
Suppose that it gives me the matrix:
1 0 0 4 0 0 0 5 0 0
1 1 0 4 4 4 0 0 5 0
0 0 0 0 0 0 0 5 5 5
2 0 0 0 6 0 5 0 5 5
2 2 0 0 0 0 0 5 5 5
0 0 0 5 0 0 5 0 0 0
0 0 0 5 0 5 5 0 7 7
3 3 0 0 5 0 0 0 7 0
3 3 0 5 5 5 0 7 0 0
3 3 0 0 5 0 0 0 7 0
But as you can see in this result, the labels are numbered column by column. I want them numbered row by row instead, so that label 4 becomes 2, label 5 becomes 3, and so on.
The reading order should be left to right, then top to bottom. How can I do that?
Thank you so much
f=f';
[L num]=bwlabel(f);
L=L';
Does this solve your problem?

What operator can return the median of close pixels?

I have a binarized image like the following matrix:
1 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1
1 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1
1 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1
1 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1
The problem is that the image starts and ends with 101, so how can I turn it into this?
1 0 1 0 1 0 0 1 1 0 1 0 1
1 0 1 0 1 0 0 1 1 0 1 0 1
1 0 1 0 1 0 0 1 1 0 1 0 1
1 0 1 0 1 0 0 1 1 0 1 0 1
I am trying to decode the image's binary code.
It seems like you want to resize the original image with scale (0.5, 1) using the nearest-neighbor method.
If you are using Matlab or any other language with similar array processing capabilities (APL, Fortran 90, Mathematica, C++ +Boost, ...) you could turn your input into your desired output with a statement similar to this:
arr(:,1:2:end)
if your array of pixels is called arr of course.
This does not return the median of close pixels, but then nor does the suggested output in the question.
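If the pixels are stored as whitespace-separated text rather than in an array language, the same "every other column" selection can be sketched with awk. The function name every_other_column is made up for this illustration, and the input row is the one from the question:

```shell
# Hypothetical helper: keep columns 1, 3, 5, ... of each input line
# (the text-file counterpart of arr(:,1:2:end)).
every_other_column() {
  awk '{
    out = $1
    for (i = 3; i <= NF; i += 2) out = out " " $i
    print out
  }'
}

printf '%s\n' '1 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1' | every_other_column
# prints: 1 0 1 0 1 0 0 1 1 0 1 0 1
```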

How to convert this 7-segment decoder to Boolean expressions

How do I convert this 7-segment decoder truth table to Boolean expressions?
BCD 7-Segment decoder
A B C D a b c d e f g
0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 1 0 0 1 1 1 1
0 0 1 0 0 0 1 0 0 1 0
0 0 1 1 0 0 0 0 1 1 0
0 1 0 0 1 0 0 1 1 0 0
0 1 0 1 0 1 0 0 1 0 0
0 1 1 0 0 1 0 0 0 0 0
0 1 1 1 0 0 0 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 1 0 0
I suggest you use a Karnaugh map.
You'll need one for each output column, so seven 4x4 tables.
There are even a few Karnaugh map generators on the web that you can use.
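Before filling in the maps, it can help to list, for each output column, which input rows produce a 1 (the minterms). The following awk sketch does only that bookkeeping for the table in the question; it does not simplify anything, and the function name minterms is a hypothetical choice. Input combinations 1010 to 1111 are not in the table, so they can be treated as don't-cares in the maps.

```shell
# Hypothetical helper: read rows "A B C D a b c d e f g" on stdin and print,
# for every output column, the input rows (as decimal minterm indices) where it is 1.
minterms() {
  awk '
    BEGIN { split("a b c d e f g", seg, " ") }
    NF == 11 {
      m = 8*$1 + 4*$2 + 2*$3 + $4                 # decimal value of inputs A B C D
      for (j = 1; j <= 7; j++)
        if ($(j + 4) == 1)
          list[j] = list[j] (list[j] == "" ? "" : ",") m
    }
    END { for (j = 1; j <= 7; j++) print seg[j] ": " list[j] }
  '
}

minterms <<'EOF'
0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 1 0 0 1 1 1 1
0 0 1 0 0 0 1 0 0 1 0
0 0 1 1 0 0 0 0 1 1 0
0 1 0 0 1 0 0 1 1 0 0
0 1 0 1 0 1 0 0 1 0 0
0 1 1 0 0 1 0 0 0 0 0
0 1 1 1 0 0 0 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 1 0 0
EOF
# prints:
# a: 1,4
# b: 5,6
# c: 2
# d: 1,4,7
# e: 1,3,4,5,7,9
# f: 1,2,3,7
# g: 0,1,7
```

Each list gives the cells to mark with 1 in that column's Karnaugh map.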

Resources