Print each pair of columns together from a matrix - shell

I have a matrix:
$ cat ifile.txt
2 3 4 5 10 0 2 2 0 1 0 0 0 1
0 3 4 6 2 0 2 0 0 0 0 1 2 3
0 0 0 2 3 0 3 0 3 1 2 3 1 0
It has 14 columns in total, e.g. A1 B1 A2 B2 A3 B3 A4 B4 A5 B5 A6 B6 A7 B7. The odd-numbered columns correspond to A and the even-numbered columns correspond to B.
I would like to print all the A values in one column and all the B values in another. So my desired file looks like:
$ cat ofile.txt
2 3
0 3
0 0
4 5
4 6
0 2
10 0
2 0
3 0
2 2
2 0
3 0
....
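I can do this manually in the following way, but I am looking for an easier way to do it.
for c in 1 3 5 7 9 11 13; do
    # pass the shell variable into awk with -v; end each record with a newline
    awk -v c="$c" '{printf "%5s %5s\n", $c, $(c+1)}' ifile.txt > A$c.txt
done
cat A1.txt A3.txt A5.txt A7.txt A9.txt A11.txt A13.txt > ofile.txt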

$ cat tst.awk
{
    for ( i=1; i<=NF; i++ ) {
        a[NR,i] = $i
    }
}
END {
    for ( i=1; i<=NF; i+=2 ) {
        for ( j=1; j<=NR; j++ ) {
            print a[j,i], a[j,i+1]
        }
    }
}
$ awk -f tst.awk file
2 3
0 3
0 0
4 5
4 6
0 2
10 0
2 0
3 0
2 2
2 0
3 0
0 1
0 0
3 1
0 0
0 1
2 3
0 1
2 3
1 0
If you want to generalize for more than 2 output columns:
$ cat tst.awk
BEGIN { n=(n ? n : 2) }
{
    for ( i=1; i<=NF; i++ ) {
        a[NR,i] = $i
    }
}
END {
    for ( i=1; i<=NF; i+=n ) {
        for ( j=1; j<=NR; j++ ) {
            for ( k=1; k<=n; k++ ) {
                printf "%s%s", a[j,i+k-1], (k<n ? OFS : ORS)
            }
        }
    }
}
$ awk -v n=2 -f tst.awk file
2 3
0 3
0 0
4 5
4 6
0 2
10 0
2 0
3 0
2 2
2 0
3 0
0 1
0 0
3 1
0 0
0 1
2 3
0 1
2 3
1 0
$ awk -v n=7 -f tst.awk file
2 3 4 5 10 0 2
0 3 4 6 2 0 2
0 0 0 2 3 0 3
2 0 1 0 0 0 1
0 0 0 0 1 2 3
0 3 1 2 3 1 0
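If holding the whole matrix in memory is a concern, the same output can be produced by re-reading the file once per column group instead. This is only a sketch, not part of the answer above: the names `work`, `ncols`, and `n` are made up for illustration, and it assumes every row has the same number of fields and that this number is a multiple of `n`.

```shell
work=$(mktemp -d)    # scratch directory (hypothetical name) for the demo files

# Recreate the sample matrix from the question.
cat > "$work/ifile.txt" <<'EOF'
2 3 4 5 10 0 2 2 0 1 0 0 0 1
0 3 4 6 2 0 2 0 0 0 0 1 2 3
0 0 0 2 3 0 3 0 3 1 2 3 1 0
EOF

n=2    # number of output columns per group

# Take the column count from the first row (assumes all rows are equally wide).
ncols=$(awk 'NR == 1 { print NF; exit }' "$work/ifile.txt")

# One pass over the file per column group; awk never stores more than one row.
c=1
while [ "$c" -le "$ncols" ]; do
    awk -v c="$c" -v n="$n" '{
        for (k = 0; k < n; k++)
            printf "%s%s", $(c + k), (k < n - 1 ? OFS : ORS)
    }' "$work/ifile.txt"
    c=$((c + n))
done > "$work/ofile.txt"

cat "$work/ofile.txt"
```

The trade-off is one read of the input per column group, which is usually fine for a file with a handful of columns but slower than the array approach for very wide matrices.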

Related

numeric vs alphanumeric sort on ubuntu 18.04.2

I am getting some strange behavior from the sort utility on Ubuntu 18.04.2. Here is the sequence of commands I issued. How can I ensure a numeric sort on all the columns? Columns 1, 2, 3, and 4 should each be in order.
$ cat zz
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
2 2 2 2
10 10 10 10
1 1 10 1
1 1 100 1
$ cat zz | sort
0 0 0 0
0 1 0 0
1 0 0 0
10 10 10 10
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
$ cat zz | sort -n
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
10 10 10 10
$ cat zz | sort -n -k1,3
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
10 10 10 10
Desired output (with numeric sorting):
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
1 1 10 1
1 1 100 1
2 2 2 2
10 10 10 10
What options should I pass to sort to get my desired output, i.e. sorted in numeric order?
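No answer was captured for this question, so the following is a sketch rather than an accepted solution. As far as I can tell, `-k1,3` defines a single key running from field 1 through field 3, and `-n` reads only the leading number of that key, so fields 2 and 3 are never compared numerically; ties then fall back to a whole-line comparison. Giving each field its own numeric key avoids that. The `work` directory and the `zz.sorted` file name are made up for the demo.

```shell
work=$(mktemp -d)    # scratch directory (hypothetical name) for the demo files

# Sample data from the question.
cat > "$work/zz" <<'EOF'
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
2 2 2 2
10 10 10 10
1 1 10 1
1 1 100 1
EOF

# One key per field, each restricted to that field and compared numerically.
sort -k1,1n -k2,2n -k3,3n -k4,4n "$work/zz" > "$work/zz.sorted"

cat "$work/zz.sorted"
```

Each `-kN,Nn` key starts and ends at field N, so a later field is consulted only when all earlier fields compare equal.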

Renaming file based on a value in a tsv file

My input is a tsv file with 6 columns. It has the column names 'Position', 'A', 'B' and so on, which repeat every now and then in the tsv. How can I split this tsv file so that each output file has one set of column headers and the data underneath, but not the next set of column headers?
Input:
Position A B C D Seg2
1 9 0 0 0 0
2 0 0 16 0 0
3 0 19 0 0 0
4 0 0 18 0 0
Position A B C D Seg1
1 9 0 0 0 1
2 0 0 22 0 0
3 0 19 0 0 0
4 0 0 19 0 0
5 39 0 0 0 0
6 43 0 0 0 0
The ideal output would be the above split into two tsv files, one named Seg1.tsv and the other Seg2.tsv.
What I have:
awk '/Position/{x="F"++i;}{print > x;}' file.tsv
How can I modify the above to rename the files?
You should just derive the filename from the last column:
awk '/Position/{x=$6".tsv"}{print > x;}' file.tsv
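A slightly more defensive variant of the same idea, as a sketch: it anchors the header match at the start of the line, uses `$NF` so the column count does not matter, and closes each output file before opening the next one, which helps when there are many segments. The sample input is a shortened copy of the data in the question, and `work` and `out` are names invented for the demo.

```shell
work=$(mktemp -d)    # scratch directory (hypothetical name) for the demo files

# Shortened, tab-separated copy of the sample input.
{
    printf 'Position\tA\tB\tC\tD\tSeg2\n'
    printf '1\t9\t0\t0\t0\t0\n'
    printf '2\t0\t0\t16\t0\t0\n'
    printf 'Position\tA\tB\tC\tD\tSeg1\n'
    printf '1\t9\t0\t0\t0\t1\n'
    printf '2\t0\t0\t22\t0\t0\n'
    printf '3\t0\t19\t0\t0\t0\n'
} > "$work/file.tsv"

(
    cd "$work" || exit 1
    awk -F'\t' '
        /^Position/ {
            if (out != "") close(out)   # release the previous output file
            out = $NF ".tsv"            # name the file after the last header field
        }
        out != "" { print > out }       # header and data rows go to the current file
    ' file.tsv
)

ls "$work"
```

One caveat: if the same segment name appears in two separate header lines, reopening the file with `>` after `close()` overwrites the earlier block, so `>>` would be needed in that case.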

how to record properties of other variables in stata

I have to generate variables entry_1, entry_2 and entry_3 which will adopt the value 1 if id_i for that particular month had entry=1.
Example.
id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
Would anyone be so kind to propose an idea of how to implement a loop in order to do this?
I am thinking of something like this:
forvalues i=1(1)3 {
gen entry`i'=0
replace entry`i'=1 if on that particular month id=`i' had entry=1
}
You could do something like this (although your data don't quite look right for the question you're asking):
forvalues i = 1/3 {
gen entry_`i' = id == `i' & entry == 1
}
This generates a dummy variable entry_i for each i in the forvalues loop where entry_i = 1 if id is i and entry is 1, and 0 otherwise.
The code can be simplified down to at most one loop.
clear
input id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
end
forval j = 1/4 {
egen entry`j' = total(entry & id == `j'), by(month)
}
list id month entry entry? , sepby(id)
+--------------------------------------------------------+
| id month entry entry1 entry2 entry3 entry4 |
|--------------------------------------------------------|
1. | 1 1 1 1 0 0 0 |
2. | 1 2 0 0 0 0 0 |
3. | 1 3 0 0 1 1 0 |
4. | 1 4 0 0 0 0 0 |
|--------------------------------------------------------|
5. | 2 1 0 1 0 0 0 |
6. | 2 2 0 0 0 0 0 |
7. | 2 3 1 0 1 1 0 |
8. | 2 4 0 0 0 0 0 |
|--------------------------------------------------------|
9. | 3 1 0 1 0 0 0 |
10. | 3 2 0 0 0 0 0 |
11. | 3 3 1 0 1 1 0 |
12. | 3 4 0 0 0 0 0 |
+--------------------------------------------------------+

Bash: Pipe output into a table

I have a program that prints out the following:
bash-3.2$ ./drawgrid
0
1 1 0
1 1 0
0 0 0
1
0 1 1
0 1 1
0 0 0
2
0 0 0
1 1 0
1 1 0
3
0 0 0
0 1 1
0 1 1
Is it possible to pipe the output of this command such that I get all the 3x3 matrices (together with their number) displayed on a table, for example a 2x2 like this?
0 1
1 1 0 0 1 1
1 1 0 0 1 1
0 0 0 0 0 0
2 3
0 0 0 0 0 0
1 1 0 0 1 1
1 1 0 0 1 1
I tried searching, and came across the column command, but I did not figure it out.
Thank you
You can use pr -2T to get the following output, which is close to what you expected:
0 2
1 1 0 0 0 0
1 1 0 1 1 0
0 0 0 1 1 0
1 3
0 1 1 0 0 0
0 1 1 0 1 1
0 0 0 0 1 1
You could use an awk script:
NF == 1 {
    if ($NF % 2 == 0) {
        delete line
        line[1]=$1
        f=1
    } else {
        print line[1]"\t"$1
        f=0
    }
    n=1
}
NF > 1 {
    n++
    if (f)
        line[n]=$0
    else
        print line[n]"\t"$0
}
And pipe to it like so:
$ ./drawgrid | awk -f 2x2.awk
0 1
1 1 0 0 1 1
1 1 0 0 1 1
0 0 0 0 0 0
2 3
0 0 0 0 0 0
1 1 0 0 1 1
1 1 0 0 1 1
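The script above decides left versus right from the parity of the label. A variant that relies only on line position, so the labels need not be consecutive integers, might look like the sketch below. The `printf` pipeline stands in for `./drawgrid`, and `h`, `work`, and `grid.txt` are names invented for the demo; it assumes every block is exactly `h` lines long (one label plus three rows).

```shell
work=$(mktemp -d)    # scratch directory (hypothetical name) for the demo output

# Stand-in for ./drawgrid: four labels, each followed by a 3x3 grid.
printf '%s\n' 0 '1 1 0' '1 1 0' '0 0 0' \
              1 '0 1 1' '0 1 1' '0 0 0' \
              2 '0 0 0' '1 1 0' '1 1 0' \
              3 '0 0 0' '0 1 1' '0 1 1' |
awk -v h=4 '
{
    row = (NR - 1) % h               # position inside the current block
    blk = int((NR - 1) / h)          # zero-based block number
    if (blk % 2 == 0) {
        left[row] = $0               # hold the left-hand block
        pending = row + 1            # number of held lines not yet printed
    } else {
        print left[row] "\t" $0      # emit it beside the right-hand block
        pending = 0
    }
}
END {
    # An odd number of blocks leaves one unpaired; print it on its own.
    for (r = 0; r < pending; r++) print left[r]
}' > "$work/grid.txt"

cat "$work/grid.txt"
```

Changing `h` adapts the sketch to grids of a different height; widening the table beyond two blocks per row would need one held array per column.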
You can get exactly what you expect with a short bash script and a little thought about array indexing:
#!/bin/bash
declare -a idx
declare -a acont
declare -i cnt=0
declare -i offset=0
while IFS= read -r line; do
    # a one-character line is a grid label; anything longer is a grid row
    [ ${#line} -eq 1 ] && { idx+=( "$line" ); ((cnt++)); }
    [ ${#line} -gt 1 ] && { acont+=( "$line" ); ((cnt++)); }
done
for ((i = 0; i < ${#idx[@]}; i+=2)); do
    printf "%4s%8s\n" "${idx[i]}" "${idx[i+1]}"
    # print row j of the left grid beside row j of the right grid
    for ((j = offset; j < offset + 3; j++)); do
        printf " %8s%8s\n" "${acont[j]}" "${acont[j+3]}"
    done
    offset=$((j + 3))
done
exit 0
Output
$ bash array_cols.sh <dat/cols.txt
0 1
1 1 0 0 1 1
1 1 0 0 1 1
0 0 0 0 0 0
2 3
0 0 0 0 0 0
1 1 0 0 1 1
1 1 0 0 1 1

Sort each column independently

cat sanger.* | tr '\-ACGT' '01234' | sed -e 's/\([[:digit:]]\)/\1 /g'
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0
0 2 2 0 0 0 0 2 2 2 2 0 2 0 0 0 0 0 2 2 2 0 2 0 0 0 0 0 0 0 2
0 0 0 0 0 0 3 0 0 0 0 3 0 0 3 0 0 3 0 0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 4 4 0 0 0 0 0 0 0 0 4 0 4 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0
This is my current output. Now I want to sort each column independently, so that all the numbers end up on the same line.
How can I do that?
I am not sorting, here, but extracting the non-0 digits.
Here is an awk filter that "updates" each field with the only (actually, the latest) non-"0" content it sees:
# short version
awk '/./ { if ( NF > maxNF ) { maxNF=NF }
for(i=1;i<=NF;i++) { if ( $i!="0" ) { result[i]=$i } }
}
END { for(i=1;i<=maxNF;i++) { printf "%s ",result[i] } }
'
# expanded version (i.e., the same as above, with different indentation to help reading)
awk '/./ { if ( NF > maxNF )
             { maxNF=NF }
           for(i=1;i<=NF;i++)
             { if ( $i!="0" )
                 { result[i]=$i }
             }
         }
     END { for(i=1;i<=maxNF;i++)
             { printf "%s ",result[i]
             }
         }
'
So if I paste your posted result into that filter:
echo "
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0
0 2 2 0 0 0 0 2 2 2 2 0 2 0 0 0 0 0 2 2 2 0 2 0 0 0 0 0 0 0 2
0 0 0 0 0 0 3 0 0 0 0 3 0 0 3 0 0 3 0 0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 4 4 0 0 0 0 0 0 0 0 4 0 4 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0
" | awk '/./ { if ( NF > maxNF ) { maxNF=NF }
for(i=1;i<=NF;i++) { if ( $i!="0" ) { result[i]=$i } }
}
END { for(i=1;i<=maxNF;i++) { printf "%s ",result[i] } }
'
it outputs:
1 2 2 4 4 1 3 2 2 2 2 3 2 4 3 4 1 3 2 2 2 1 2 1 3 4 1 1 1 1 2
(note: with an extra " " at the end, here...)
A note of warning, however: very old versions of the original awk (and maybe some nawk) are limited to 99 fields. This is rarely encountered nowadays, and if you use GNU's version you will be fine.
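To avoid the trailing space and finish the line with a newline, the END block can pick the separator per field. The sketch below is a variation on the filter above, not a replacement for it: printing "0" for a column that never holds a non-zero value is my assumption about the desired behavior, and `work` and `merged.txt` are names invented for the demo.

```shell
work=$(mktemp -d)    # scratch directory (hypothetical name) for the demo output

# The four rows from the question.
printf '%s\n' \
  '1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0' \
  '0 2 2 0 0 0 0 2 2 2 2 0 2 0 0 0 0 0 2 2 2 0 2 0 0 0 0 0 0 0 2' \
  '0 0 0 0 0 0 3 0 0 0 0 3 0 0 3 0 0 3 0 0 0 0 0 0 3 0 0 0 0 0 0' \
  '0 0 0 4 4 0 0 0 0 0 0 0 0 4 0 4 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0' |
awk '
NF {
    if (NF > maxNF) maxNF = NF
    for (i = 1; i <= NF; i++)
        if ($i != "0") result[i] = $i       # keep the latest non-zero value per column
}
END {
    for (i = 1; i <= maxNF; i++)
        # OFS between fields, ORS after the last one; all-zero columns print 0 (assumption)
        printf "%s%s", (i in result ? result[i] : 0), (i < maxNF ? OFS : ORS)
}' > "$work/merged.txt"

cat "$work/merged.txt"
```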

Resources