unix, merging several text file contents, horizontally (problems with paste) - bash

OK, let's say I have a txt file like this...
X 1 : D i s t a n c e [ m m ]
Y 1 : I n t e n s i t y
X 2 : D i s t a n c e [ m m ]
Y 2 : I n t e n s i t y
I m a g e ( 2 3 7 . 2 3 u )
X 1 Y 1
0 . 0 0 0 0 0 0 4 0 . 0 0 0 0 0 0
0 . 0 0 2 0 0 0 5 7 . 0 0 0 0 0 0
...etc
And several others similar to this...
X 1 : D i s t a n c e [ m m ]
Y 1 : I n t e n s i t y
X 2 : D i s t a n c e [ m m ]
Y 2 : I n t e n s i t y
I m a g e ( 2 6 5 . 2 7 u )
X 1 Y 1
0 . 0 0 0 0 0 0 3 6 . 0 0 0 0 0 0
0 . 0 0 2 0 0 0 3 4 . 0 0 0 0 0 0
0 . 0 0 4 0 0 0 4 0 . 0 0 0 0 0 0
When I use paste, to merge horizontally the content of these files...
#! /bin/bash
zeta=$(ls)
paste $zeta >> file_1.txt
I get this (example if there were two files):
X 1 : D i s t a n c e [ m m ]
X 1 : D i s t a n c e [ m m ]
Y 1 : I n t e n s i t y
Y 1 : I n t e n s i t y
X 2 : D i s t a n c e [ m m ]
X 2 : D i s t a n c e [ m m ]
Y 2 : I n t e n s i t y
Y 2 : I n t e n s i t y
I m a g e ( 2 3 7 . 2 3 u )
I m a g e ( 2 6 5 . 2 7 u )
X 1 Y 1
X 1 Y 1
0 . 0 0 0 0 0 0 4 0 . 0 0 0 0 0 0
0 . 0 0 0 0 0 0 3 6 . 0 0 0 0 0 0
0 . 0 0 2 0 0 0 5 7 . 0 0 0 0 0 0
0 . 0 0 2 0 0 0 3 4 . 0 0 0 0 0 0
0 . 0 0 4 0 0 0 4 1 . 0 0 0 0 0 0
0 . 0 0 4 0 0 0 4 0 . 0 0 0 0 0 0
Why do I have this intermingle of lines?
How can I put the content of one txt file right beside the content of the other txt file? In this case, columns 1 and 2 would come from my first file and columns 3 and 4 from my second file. And how can I then do this in bulk for several files?
Thanks for any hint,

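Maybe you can append several '\t' characters to the end of each line, just before the '\n'. Note that tr only maps single characters to single characters, so it cannot do this; sed can (GNU sed shown here), and the output should go to a new file, because redirecting into text1.txt itself would truncate it before it is read:
sed 's/$/\t\t/' text1.txt > text1_padded.txt
After this processing, you can use your old method to paste the files together. :)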
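Before padding lines, it may be worth ruling out a line-ending problem. If the files were written by Windows software, each line may end in a carriage return, and a terminal would then draw the pasted columns over or under each other instead of side by side. That is an assumption about these files, not something the question confirms. A minimal Python sketch of the same side-by-side merge, with a made-up helper name `merge_side_by_side` and shortened sample data:

```python
from itertools import zip_longest


def merge_side_by_side(texts, sep="\t"):
    """Join the i-th line of every text into one output line.

    `texts` is a list of strings (the decoded content of each file).
    Trailing carriage returns are stripped, on the assumption that the
    inputs may use Windows (CRLF) line endings.
    """
    columns = [[line.rstrip("\r") for line in text.split("\n")] for text in texts]
    # Drop the empty string that a final newline leaves behind.
    columns = [col[:-1] if col and col[-1] == "" else col for col in columns]
    # Shorter files are padded with empty cells.
    rows = zip_longest(*columns, fillvalue="")
    return [sep.join(row) for row in rows]


first = "X1 Y1\r\n0.000000 40.000000\r\n0.002000 57.000000\r\n"
second = "X1 Y1\r\n0.000000 36.000000\r\n"
merged = merge_side_by_side([first, second])
for line in merged:
    print(line)
```

For real files, the strings would come from reading each file with whatever encoding it actually uses; the character-by-character spacing in the samples above might point to a wide encoding, but that is a guess.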


Is there a way to show the index of atoms in rdkit.Chem.rdmolops.GetAdjacencyMatrix?

I'm trying to convert a compound from a mol to an adjacency matrix. However, I encountered a problem: rdkit.Chem.rdmolops.GetAdjacencyMatrix() doesn't provide the atom indices for the adjacency matrix. Is there any way to include the index data for the adjacency matrix in RDKit?
rdkit.Chem.rdmolops.GetAdjacencyMatrix((Mol)mol)
As the RDKit AdjacencyMatrix is ordered from zero upwards, you can convert it to a Pandas dataframe.
from rdkit import Chem
import pandas as pd
s = 'CCC(C(O)C)CN'
mol = Chem.MolFromSmiles(s)
am = Chem.GetAdjacencyMatrix(mol)
print(am)
[[0 1 0 0 0 0 0 0]
[1 0 1 0 0 0 0 0]
[0 1 0 1 0 0 1 0]
[0 0 1 0 1 1 0 0]
[0 0 0 1 0 0 0 0]
[0 0 0 1 0 0 0 0]
[0 0 1 0 0 0 0 1]
[0 0 0 0 0 0 1 0]]
df = pd.DataFrame(am)
print(df)
   0  1  2  3  4  5  6  7
0  0  1  0  0  0  0  0  0
1  1  0  1  0  0  0  0  0
2  0  1  0  1  0  0  1  0
3  0  0  1  0  1  1  0  0
4  0  0  0  1  0  0  0  0
5  0  0  0  1  0  0  0  0
6  0  0  1  0  0  0  0  1
7  0  0  0  0  0  0  1  0
If you want elements instead of indices:
element = [atom.GetSymbol() for atom in mol.GetAtoms()]
print(element)
['C', 'C', 'C', 'C', 'O', 'C', 'C', 'N']
df_e = pd.DataFrame(am, index=element, columns=element)
print(df_e)
   C  C  C  C  O  C  C  N
C  0  1  0  0  0  0  0  0
C  1  0  1  0  0  0  0  0
C  0  1  0  1  0  0  1  0
C  0  0  1  0  1  1  0  0
O  0  0  0  1  0  0  0  0
C  0  0  0  1  0  0  0  0
C  0  0  1  0  0  0  0  1
N  0  0  0  0  0  0  1  0
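Because element symbols repeat, the element-labelled table no longer says which carbon is which. One option is to combine symbol and index into a single label. The sketch below uses plain Python and the matrix and element list copied from the output above, so it runs without RDKit; the names `labels` and `neighbours` are made up for illustration.

```python
# Adjacency matrix and element symbols copied from the output above
# for the SMILES 'CCC(C(O)C)CN'.
am = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
element = ['C', 'C', 'C', 'C', 'O', 'C', 'C', 'N']

# Label each atom as symbol + row index, e.g. "C0", "O4".
labels = [f"{sym}{idx}" for idx, sym in enumerate(element)]

# Map each labelled atom to the labels of the atoms it is bonded to.
neighbours = {
    labels[i]: [labels[j] for j, bonded in enumerate(row) if bonded]
    for i, row in enumerate(am)
}

for atom, bonded_to in neighbours.items():
    print(atom, "->", ", ".join(bonded_to))
```

The same `labels` list could be passed as `index` and `columns` to `pd.DataFrame`, exactly as `element` is above.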

split a file keeping some columns

I need to split a file with ~5 million rows based on some columns, i.e., I need to keep some columns in every chunk. I am aware of the split command for row-wise splitting, but I don't know if there is a similar tool that splits column-wise the way I would like. My file has 196 ANN columns:
SNPID CHR POS Z F N LNBF ANN1 ANN2 ANN3
rs367896724 1 10177 0 0 0 -3.36827717630604 0 0 0
rs555500075 1 10352 0 0 0 -2.30999509213213 0 1 0
rs575272151 1 11008 0 0 0 -1.14611711529388 0 0 1
rs544419019 1 11012 0 0 0 -1.14611711529388 1 1 1
The desired output will be
#chunk1
SNPID CHR POS Z F N LNBF ANN1
rs367896724 1 10177 0 0 0 -3.36827717630604 0
rs555500075 1 10352 0 0 0 -2.30999509213213 0
rs575272151 1 11008 0 0 0 -1.14611711529388 0
rs544419019 1 11012 0 0 0 -1.14611711529388 1
#chunk2
SNPID CHR POS Z F N LNBF ANN2
rs367896724 1 10177 0 0 0 -3.36827717630604 0
rs555500075 1 10352 0 0 0 -2.30999509213213 1
rs575272151 1 11008 0 0 0 -1.14611711529388 0
rs544419019 1 11012 0 0 0 -1.14611711529388 1
#chunk3
SNPID CHR POS Z F N LNBF ANN3
rs367896724 1 10177 0 0 0 -3.36827717630604 0
rs555500075 1 10352 0 0 0 -2.30999509213213 0
rs575272151 1 11008 0 0 0 -1.14611711529388 1
rs544419019 1 11012 0 0 0 -1.14611711529388 1
The names of my ANN columns are not actually ANN1, ANN2 and so on; they are quite different from each other. I have just used ANN for simplicity.
Speed would be an issue, since the file is huge.
UPDATE: if it would be possible I would like to split the files every 10 or 20 ANN columns (the total number of ANN is 196)
Something like this might work:
% cat script.awk
{
for (i=8;i<=NF;i++) {
print $1, $2, $3, $4, $5, $6, $7, $i >> "chunk"(i-7)".txt"
}
}
This will write 8 columns for each ANN column into chunk1.txt, chunk2.txt, ... chunkN.txt (the first 7 columns and then one ANN column). Run it with:
awk -f script.awk input_file
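As far as I know, awk opens each output file on the first write and keeps it open until close() is called or awk exits, so >> does not reopen the file for every line. With many ANN columns, the number of simultaneously open files may become the limiting factor in some awk implementations, so it's probably possible to optimize this.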
A solution with perl:
The initial file, with a few extra columns
$ cat file
SNPID CHR POS Z F N LNBF ANN1 ANN2 ANN3 ANN4 ANN5 ANN6 ANN7 ANN8
rs367896724 1 10177 0 0 0 -3.36827717630604 0 0 0 a b c d e
rs555500075 1 10352 0 0 0 -2.30999509213213 0 1 0 f g h i j
rs575272151 1 11008 0 0 0 -1.14611711529388 0 0 1 k l m n o
rs544419019 1 11012 0 0 0 -1.14611711529388 1 1 1 p q r s t
A perl script to split it up
$ perl -alne '
$n=4; # how many data columns to put into the "split" files
for ( ($i,$j)=(7,1); $i < @F; $i+=$n,$j++ ) {
open($fh{$j}, ">", "file.$j") unless $fh{$j};
@data = (@F[0..6], @F[$i .. $i+$n-1]);
print {$fh{$j}} "@data";
}
' file
The results
$ cat file.1
SNPID CHR POS Z F N LNBF ANN1 ANN2 ANN3 ANN4
rs367896724 1 10177 0 0 0 -3.36827717630604 0 0 0 a
rs555500075 1 10352 0 0 0 -2.30999509213213 0 1 0 f
rs575272151 1 11008 0 0 0 -1.14611711529388 0 0 1 k
rs544419019 1 11012 0 0 0 -1.14611711529388 1 1 1 p
$ cat file.2
SNPID CHR POS Z F N LNBF ANN5 ANN6 ANN7 ANN8
rs367896724 1 10177 0 0 0 -3.36827717630604 b c d e
rs555500075 1 10352 0 0 0 -2.30999509213213 g h i j
rs575272151 1 11008 0 0 0 -1.14611711529388 l m n o
rs544419019 1 11012 0 0 0 -1.14611711529388 q r s t
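The same chunking idea, sketched in Python for comparison. This is only an illustration under assumptions: the function name `split_columns` and its parameters are made up, and it collects rows in memory, which is fine for a small sample but not for 5 million rows (there one would write each row straight to an open file per chunk instead).

```python
def split_columns(lines, n_fixed=7, chunk_size=4):
    """Return {chunk_number: [rows]}.

    Every chunk keeps the first `n_fixed` columns and adds up to
    `chunk_size` of the remaining (ANN) columns.
    """
    chunks = {}
    for line in lines:
        fields = line.split()
        fixed, rest = fields[:n_fixed], fields[n_fixed:]
        # Walk over the ANN columns in steps of chunk_size.
        for k, start in enumerate(range(0, len(rest), chunk_size), start=1):
            row = fixed + rest[start:start + chunk_size]
            chunks.setdefault(k, []).append(" ".join(row))
    return chunks


sample = [
    "SNPID CHR POS Z F N LNBF ANN1 ANN2 ANN3",
    "rs367896724 1 10177 0 0 0 -3.36827717630604 0 0 0",
    "rs555500075 1 10352 0 0 0 -2.30999509213213 0 1 0",
]
chunks = split_columns(sample, chunk_size=2)
for number, rows in chunks.items():
    print(f"#chunk{number}")
    print("\n".join(rows))
```

With `chunk_size=2` the three ANN columns end up as two chunks: one with ANN1 and ANN2, and one with ANN3 alone.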

full adder gate

I don't know if this question is considered to be related to stackoverflow (I'm sorry if it's not but I have searched and did not find an answer anywhere).
I have coded a full adder
Output:
Truth Table :
a1 a2 b1 b2 S1 S2 C
______________________________
0 0 0 0 0 0 0
0 0 0 1 0 1 0
0 0 1 0 1 0 0
0 0 1 1 1 1 0
0 1 0 0 0 1 0
0 1 0 1 0 0 1
0 1 1 0 1 1 0
0 1 1 1 1 0 1
1 0 0 0 1 0 0
1 0 0 1 1 1 0
1 0 1 0 0 1 0
1 0 1 1 0 0 1
1 1 0 0 1 1 0
1 1 0 1 1 0 1
1 1 1 0 0 0 1
1 1 1 1 0 1 1
If somebody has ever calculated this, can they tell me if my output is correct?
a1 a2 b1 b2 S1 S2 C a b s c
______________________________
0 0 0 0 0 0 0 0 0 0 0 nothing plus nothing is nothing
0 0 0 1 0 1 0 0 2 2 0 nothing plus two is two
0 0 1 0 1 0 0 0 1 1 0 nothing plus one is one
0 0 1 1 1 1 0 0 3 3 0 nothing plus three is three
0 1 0 0 0 1 0 2 0 2 0 two plus nothing is two
0 1 0 1 0 0 1 2 2 0 1 two plus two is four (four not in 0-3)
0 1 1 0 1 1 0 2 1 3 0 two plus 1 is three
0 1 1 1 1 0 1 2 3 1 1 two plus three is five (one and four)
1 0 0 0 1 0 0 1 0 1 0 one plus nothing is one
1 0 0 1 1 1 0 1 2 3 0 one plus two is three
1 0 1 0 0 1 0 1 1 2 0 one plus one is two
1 0 1 1 0 0 1 1 3 0 1 one plus three is four
1 1 0 0 1 1 0 3 0 3 0 three plus nothing is three
1 1 0 1 1 0 1 3 2 1 1 three plus two is five (one and four)
1 1 1 0 0 0 1 3 1 0 1 three plus one is four
1 1 1 1 0 1 1 3 3 2 1 three plus three is 6 (two and four)
Looks right. Ordering your 16 rows a little differently would make them flow in a more logical order.
It's an adder! Just check if it's adding. Let's take this row:
a2 a1 b2 b1 C S2 S1
1 0 1 1 1 0 1
Here I have reordered the columns in an easier to read manner: higher order bits first.
The a input is 10 = 2 (base 10). The b input is 11 = 3 (base 10). The output is 101, which
is 5 (base 10). So this one is right: 2 + 3 == 5.
I'll let you check the other rows.
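Checking the remaining rows by hand is tedious, so here is a small Python sketch that does the same comparison for all 16 rows. It assumes, as in the annotated table, that index 1 is the low bit and index 2 the high bit; the helper name `two_bit_add` is made up for illustration.

```python
def two_bit_add(a, b):
    """Add two 2-bit numbers; return (carry, sum) with sum in 0..3."""
    total = a + b
    return total >> 2, total & 0b11


# Rows from the question, as (a1, a2, b1, b2, S1, S2, C).
table = [
    (0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 1, 0, 1, 0),
    (0, 0, 1, 0, 1, 0, 0), (0, 0, 1, 1, 1, 1, 0),
    (0, 1, 0, 0, 0, 1, 0), (0, 1, 0, 1, 0, 0, 1),
    (0, 1, 1, 0, 1, 1, 0), (0, 1, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 1, 0, 0), (1, 0, 0, 1, 1, 1, 0),
    (1, 0, 1, 0, 0, 1, 0), (1, 0, 1, 1, 0, 0, 1),
    (1, 1, 0, 0, 1, 1, 0), (1, 1, 0, 1, 1, 0, 1),
    (1, 1, 1, 0, 0, 0, 1), (1, 1, 1, 1, 0, 1, 1),
]

mismatches = []
for a1, a2, b1, b2, s1, s2, c in table:
    a = 2 * a2 + a1  # decode the inputs, high bit first
    b = 2 * b2 + b1
    carry, s = two_bit_add(a, b)
    if (carry, s) != (c, 2 * s2 + s1):
        mismatches.append((a, b))

print("mismatches:", mismatches)
```

An empty list means every row of the table agrees with ordinary addition.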

Use an awk loop to subset a file

I have a file with lots of pieces of information that I want to split on the first column.
Example (example.gen):
1 rs3094315 752566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
1 rs2094315 752999 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3044315 759996 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3054375 799966 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3094375 999566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
3 rs3078315 799866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
3 rs4054315 759986 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs4900215 752998 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs5094315 759886 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs6094315 798866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Desired output:
Chr1.gen
1 rs3094315 752566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
1 rs2094315 752999 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Chr2.gen
2 rs3044315 759996 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3054375 799966 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
2 rs3094375 999566 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Chr3.gen
3 rs3078315 799866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
3 rs4054315 759986 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
Chr4.gen
4 rs4900215 752998 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs5094315 759886 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs6094315 798866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
I've tried to do this with the following shell scripts, but it doesn't work - I can't work out how to get awk to recognise a variable defined outside the awk script itself.
First script attempt (no awk loop):
for i in {1..23}
do
awk '{$1 = $i}' example.gen > Chr$i.gen
done
Second script attempt (with awk loop):
for i in {1..23}
do
awk '{for (i = 1; i <= 23; i++) $1 = $i}' example.gen > Chr$i.gen
done
I'm sure it's probably quite basic, but I just can't work it out...
Thank you!
With awk:
awk '{print > "Chr"$1".gen"}' file
It just prints and redirects it to a file. And how is this file defined? With "Chr" + first_column + ".gen".
With your sample input it creates 4 files. For example the 4th is:
$ cat Chr4.gen
4 rs4900215 752998 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs5094315 759886 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
4 rs6094315 798866 A G 0 1 0 1 0 0 1 0 0 0 1 0 0 1
First, use @fedorqui's answer, as that is best. But to understand the mistake you made with your first attempt (which was close), read on.
Your first attempt failed because you put the test inside the action (in the braces), not preceding it. The minimal fix:
awk "\$1 == $i" example.gen > Chr$i.gen
This uses double quotes to allow the value of i to be seen by the awk script, but that requires you to then escape the dollar sign for $1 so that you don't substitute the value of the shell's first positional argument. Cleaner but longer:
awk -v i="$i" '$1 == i' example.gen > "Chr$i.gen"
This creates a variable i inside the awk script with the same value as the shell's i variable.
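For comparison, the one-pass idea behind the awk one-liner (derive the output name from the first column instead of looping over 1..23) can be sketched in Python. The function name `group_by_first_column` is made up, and the sample rows are shortened versions of the ones in the question.

```python
from collections import defaultdict


def group_by_first_column(lines):
    """Map 'Chr<N>.gen' to the lines whose first field is N."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom = line.split(maxsplit=1)[0]
        groups[f"Chr{chrom}.gen"].append(line)
    return dict(groups)


sample = [
    "1 rs3094315 752566 A G 0 1 0",
    "1 rs2094315 752999 A G 0 1 0",
    "2 rs3044315 759996 A G 0 1 0",
    "4 rs4900215 752998 A G 0 1 0",
]
groups = group_by_first_column(sample)
for name, rows in groups.items():
    print(name, len(rows))
```

Only file names for chromosomes that actually occur are produced, so no empty Chr files are created for missing numbers.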

How do I implement this function using XOR gates?

I have to implement the function below with XOR gates. I drew the Karnaugh map and wrote down the resulting minimized function, but now I'm stuck with AND and OR gates. What should I do in order to get XOR gates?
My solution looks as follows:
F = Sum(1,2,4,7,8,11,13,14);
F = A' B' C' D + A' B' C D' + A' B C' D' + A' B C D + A B' C' D' + A B' C D + A B C' D + A B C D';
F = XOR(C, D) & A' B' + XOR(C, D) & A B + XOR(A, B) & C' D' + XOR(A, B) & C D;
F = XOR(C, D) & XOR(A, B)' + XOR(A, B) & XOR(C, D) ';
F = XOR(XOR(A, B), XOR(C, D));
    A B C D   XOR(A, B)   XOR(C, D)   F
00  0 0 0 0       0           0       0
01  0 0 0 1       0           1       1
02  0 0 1 0       0           1       1
03  0 0 1 1       0           0       0
04  0 1 0 0       1           0       1
05  0 1 0 1       1           1       0
06  0 1 1 0       1           1       0
07  0 1 1 1       1           0       1
08  1 0 0 0       1           0       1
09  1 0 0 1       1           1       0
10  1 0 1 0       1           1       0
11  1 0 1 1       1           0       1
12  1 1 0 0       0           0       0
13  1 1 0 1       0           1       1
14  1 1 1 0       0           1       1
15  1 1 1 1       0           0       0
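The final XOR form can also be checked mechanically against the original minterm list. A minimal Python sketch, assuming A is the most significant bit of the minterm number (as in the table); the function names `f_xor` and `f_sop` are made up for illustration.

```python
from itertools import product

MINTERMS = {1, 2, 4, 7, 8, 11, 13, 14}  # F = Sum(...) from the derivation


def f_xor(a, b, c, d):
    """F = XOR(XOR(A, B), XOR(C, D))."""
    return (a ^ b) ^ (c ^ d)


def f_sop(a, b, c, d):
    """F as the original sum of minterms, with A as the high bit."""
    return int((a << 3 | b << 2 | c << 1 | d) in MINTERMS)


# Compare both forms over all 16 input combinations.
agree = all(
    f_xor(*bits) == f_sop(*bits) for bits in product((0, 1), repeat=4)
)
print("XOR form matches the minterm list:", agree)
```

The two forms agree because the listed minterms are exactly the inputs with an odd number of ones, which is what a chain of XOR gates computes.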
A handy tool for such questions is "Logic Friday 1"
