What constitutes side chain atoms in pdb file? - bioinformatics

I previously calculated phi, psi, and omega pretty easily from a .pdb file, because their definitions are rather straightforward. For instance, I know that each requires four atoms (four sets of Cartesian coordinates), taken in this order:
phi: C-N-CA-C
psi: N-CA-C-N
omega: CA-C-N-CA
Now I am trying to calculate side-chain angles. I know this is similar to phi, psi, and omega (in that I will need 4 atoms per angle). However, I am having difficulty reading the .pdb file and determining what atoms in the first place constitute the side chains? For instance, in the following segment (I removed hydrogens and the one carbon per residue without a subscript):
1 N -14.152 0.961 4.712
1 CA -13.296 0.028 3.924
1 O -11.358 1.432 3.941
1 CB -13.571 0.173 2.426
1 CG -15.046 -0.135 2.144
1 SD -16.174 1.270 1.982
1 CE -17.702 0.313 1.823
2 N -11.121 -0.642 4.703
2 CA -9.669 -0.447 4.998
2 O -9.036 -2.736 4.724
2 CB -9.462 -0.447 6.516
2 OG1 -10.399 0.505 7.010
2 CG2 -8.090 0.103 6.896
3 N -7.990 -1.247 3.462
3 CA -7.173 -2.314 2.811
3 O -5.487 -1.663 4.367
3 CB -6.881 -1.930 1.359
3 CG -8.162 -1.388 0.715
3 CD1 -8.594 -0.102 0.975
3 CD2 -8.903 -2.180 -0.135
3 CE1 -9.749 0.380 0.392
3 CE2 -10.057 -1.699 -0.718
3 CZ -10.490 -0.415 -0.457
3 OH -11.645 0.066 -1.038
4 N -5.204 -3.598 3.323
4 CA -3.922 -3.881 4.044
4 O -2.647 -4.537 2.142
4 CB -4.003 -5.297 4.612
4 CG -3.169 -5.399 5.890
4 CD -2.632 -6.837 6.002
4 CE -2.044 -7.084 7.401
4 NZ -2.526 -8.390 7.935
Would the first few angles be between atoms as such:
N-CA-O-CB
CA-O-CB-CG
O-CB-CG-SD
CB-CG-SD-CE
In other words, would I be including atoms like O, SD, etc? Or do I only include subscripts in the order A, B, G, D, E, Z (anything else)? So that my first few angles would be:
N-CA-CB-CG
CA-CB-CG-CE
CG-CE-N-CA
CE-N-CA-CB

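As I understand it, you walk outward along the side chain from the backbone, following the lettered positions (A, B, G, D, E, Z), so the first angle in your second list, N-CA-CB-CG, is correct, and the backbone O is never part of a side-chain angle. The atom at each position is usually a carbon, but when a position is occupied by another heavy atom you use that atom instead. For instance, in cysteine you use the sulfur (SG) instead of a gamma carbon, and in your first residue the delta position is SD, so the next angles are CA-CB-CG-SD and CB-CG-SD-CE. A side-chain angle never runs on into the N or CA of the next residue. You can find a list of the standard side chain angles for each amino acid here.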
For more information, see this page.
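As a minimal sketch of how such an angle could be computed, the Python below evaluates a signed dihedral from four points and applies it to the first residue of the excerpt. The residue names in `CHI_ATOMS` are inferred from the atom names in the question (they are not given there), the table only covers those four residues, and the helper names `dihedral` and `chi_angles` are made up for this illustration.

```python
import math

# Assumed chi-angle atom names for the four residues in the excerpt
# (inferred as methionine, threonine, tyrosine, lysine).
CHI_ATOMS = {
    "MET": [("N", "CA", "CB", "CG"), ("CA", "CB", "CG", "SD"), ("CB", "CG", "SD", "CE")],
    "THR": [("N", "CA", "CB", "OG1")],
    "TYR": [("N", "CA", "CB", "CG"), ("CA", "CB", "CG", "CD1")],
    "LYS": [("N", "CA", "CB", "CG"), ("CA", "CB", "CG", "CD"),
            ("CB", "CG", "CD", "CE"), ("CG", "CD", "CE", "NZ")],
}

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, in (-180, 180]."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def chi_angles(resname, coords):
    """coords maps atom name -> (x, y, z) for one residue."""
    return [dihedral(*(coords[name] for name in quad)) for quad in CHI_ATOMS[resname]]

# Residue 1 from the question, side-chain atoms plus N and CA.
met = {
    "N": (-14.152, 0.961, 4.712), "CA": (-13.296, 0.028, 3.924),
    "CB": (-13.571, 0.173, 2.426), "CG": (-15.046, -0.135, 2.144),
    "SD": (-16.174, 1.270, 1.982), "CE": (-17.702, 0.313, 1.823),
}
print(chi_angles("MET", met))  # chi1, chi2, chi3 in degrees
```

The same `dihedral` function works for phi, psi and omega if it is given the backbone atoms listed in the question.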


Converting to and from a number system that doesn't have a zero digit

Consider Microsoft Excel's column-numbering system. Columns are "numbered" A, B, C, ... , Y, Z, AA, AB, AC, ... where A is 1.
The column system is similar to the base-10 numbering system that we're familiar with in that when any digit has its maximum value and is incremented, its value is set to the lowest possible digit value and the digit to its left is incremented, or a new digit is added at the minimum value. The difference is that there isn't a digit that represents zero in the letter numbering system. So if the "digit alphabet" contained ABC or 123, we could count like this:
(base 3 with zeros added for comparison)
base 3 no 0 (letters)  base 3 no 0 (digits)  base 3 with 0  base 10 with 0
---------------------  --------------------  -------------  --------------
-                      -                     0              0
A                      1                     1              1
B                      2                     2              2
C                      3                     10             3
AA                     11                    11             4
AB                     12                    12             5
AC                     13                    20             6
BA                     21                    21             7
BB                     22                    22             8
BC                     23                    100            9
CA                     31                    101            10
CB                     32                    102            11
CC                     33                    110            12
AAA                    111                   111            13
Converting from the zeroless system to our base 10 system is fairly simple; it's still a matter of multiplying the power of that space by the value in that space and adding it to the total. So in the case of AAA with the alphabet ABC, it's equivalent to (1*3^2) + (1*3^1) + (1*3^0) = 9 + 3 + 1 = 13.
I'm having trouble converting inversely, though. With a zero-based system, you can use a greedy algorithm moving from largest to smallest digit and grabbing whatever fits. This will not work for a zeroless system, however. For example, converting the base-10 number 10 to the base-3 zeroless system: Though 9 (the third digit slot: 3^2) would fit into 10, this would leave no possible configuration of the final two digits since their minimum values are 1*3^1 = 3 and 1*3^0 = 1 respectively.
Realistically, my digit alphabet will contain A-Z, so I'm looking for a quick, generalized conversion method that can do this without trial and error or counting up from zero.
Edit
The accepted answer by n.m. is primarily a string-manipulation-based solution.
For a purely mathematical solution see kennytm's links:
What is the algorithm to convert an Excel Column Letter into its Number?
How to convert a column number (eg. 127) into an excel column (eg. AA)
Convert to base-3-with-zeroes first (digits 0AB), and from there, convert to base-3-without-zeroes (ABC), using these string substitutions:
A0 => 0C
B0 => AC
C0 => BC
Each substitution either removes a zero, or pushes one to the left. In the end, discard leading zeroes.
It is also possible, as an optimisation, to process longer strings of zeros at once:
A000...000 = 0BBB...BBC
B000...000 = ABBB...BBC
C000...000 = BBBB...BBC
Generalizable to any base.
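For comparison with the string-substitution method, here is a minimal Python sketch of the purely arithmetic route (the one the linked Excel-column questions take, not the substitution method of this answer). The function names and the default alphabet are chosen for this example only. The trick is to subtract 1 before each division so that the remainders 0..k-1 map onto the digits 1..k.

```python
def to_zeroless(n, alphabet="ABC"):
    """Convert a positive integer to the zeroless (bijective) system over `alphabet`."""
    k = len(alphabet)
    digits = []
    while n > 0:
        # Shift by one so remainder 0 means the first letter, k-1 the last.
        n, r = divmod(n - 1, k)
        digits.append(alphabet[r])
    return "".join(reversed(digits))

def from_zeroless(s, alphabet="ABC"):
    """Inverse conversion: each letter is worth its 1-based position."""
    k = len(alphabet)
    value = 0
    for ch in s:
        value = value * k + alphabet.index(ch) + 1
    return value

print(to_zeroless(10))   # CA
print(to_zeroless(13))   # AAA
print(to_zeroless(27, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))  # AA
```

No trial and error is needed: each loop iteration produces one digit, so the running time is proportional to the length of the result.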

Diagonal Matrices in J

I do a lot of work with eigenvalues and hence building / unbuilding diagonal matrices is something I do a lot. In the spirit of J, I've come up with some simple definitions, but wonder if I have missed a simpler way? I couldn't find anything in the phrasebook, but may have been looking in the wrong place.
Make Diagonal matrix from list of diagonal entries:
diag =: * =@i.@#
Extract diagonal entries from a matrix:
extract =: +/@(* =@i.@#)
Diagonal entries of a matrix have a standard definition in J:
extract =: (<0 1)&|:
This is, unfortunately, hidden somewhere in the vocabulary. (It is mentioned in passing under transpose.)
I usually use diag as
diag =: 3 :'(2##y) $ ,_1 (((#y)#0),~])\y'
but I no longer remember why. Your version is better.
(* =) 2 3 4
2 0 0
0 3 0
0 0 4
If you are working with unique elements.
diag=: * = NB. a hook defined tacitly
diag 89 3 56.6
89 0 0
0 3 0
0 0 56.6
The = breaks down if the elements are not unique as the matrix is no longer square
diag 3 4 4
|length error: diag
| diag 3 4 4
Another solution involves using "copy-fill".
diag =: (2 ##) $ (#~ 1 j. #)
This is longer than OP's original formulation, but it works for both numbers and characters (as long as you want spaces to play the role of zero).
Short explanation (primarily for "future me", as I'm fairly new to J):
Consider the following example (with y =: 1 2 5 7 representing the diagonal entries):
4 4 $ 1j4 # y NB. the required diagonal matrix
The complex number argument 1j4 to the left of # inserts 4 zeros after every copied item from y. Reshaping this into a 4 x 4 matrix gives the diagonal matrix.
The 4's above are nothing but the number of items in y: #y. So we can generalise as (2 # #y) $ (1 j. #y) # y. The tacit equivalent of this is given at the top.
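For readers who do not know J, the copy-fill idea can be mimicked in Python as a rough sketch (plain lists, no libraries; the name `diag_copy_fill` is made up here). It appends n fill values after every item and then cuts the flat list into n rows of n, which is what the reshape does in the J version.

```python
def diag_copy_fill(items, fill=0):
    """Build a diagonal matrix the 'copy-fill' way: item, n fills, item, n fills, ..."""
    n = len(items)
    flat = []
    for x in items:
        flat.append(x)
        flat.extend([fill] * n)
    # Like J's reshape, keep only the first n*n entries and cut them into rows.
    return [flat[i * n:(i + 1) * n] for i in range(n)]

for row in diag_copy_fill([1, 2, 5, 7]):
    print(row)
```

Item i lands at flat position i*(n+1), which is row i, column i after cutting into rows of length n. Passing `fill=" "` gives the character variant.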

Strategy with regard to how to approach this algorithm?

I was asked this question in a test and I need help with regards to how I should approach the solution, not the actual answer. The question is
You have been given a 7 digit number(with each digit being distinct and 0-9). The number has this property
product of first 3 digits = product of last 3 digits = product of central 3 digits
Identify the middle digit.
Now, I can do this on paper by brute force(trial and error), the product is 72 and digits being
8,1,9,2,4,3,6
Now how do I approach the problem in a no brute force way?
Let the number be: a b c d e f g
So as per the rule (1):
axbxc = cxdxe = exfxg
Moreover, cancelling the shared digit on each side, we have (2):
axb = dxe and
cxd = fxg
This question can be solved with factorization and a little bit of trial and error.
Out of the digits from 1 to 9, 5 and 7 can be rejected straight away: they are prime and no other digit is a multiple of them, so they cannot appear on one side of the above two equations without appearing on the other.
The digits 1 to 9 can be factored as:
1 = 1, 2 = 2, 3 = 3, 4 = 2X2, 6 = 2X3, 8 = 2X2X2, 9 = 3X3
After factorization we are left with a total of seven 2's, four 3's and the number 1.
For rule (2) only 4 possibilities remain. These 4 equations follow from the factorization, since we know we have seven 2's and four 3's overall.
1: 1X8(2x2x2) = 2X4(2x2)
2: 1X6(3x2) = 3X2
3: 4(2x2)X3 = 6(3x2)X2
4: 9(3x3)X2 = 6(3x2)X3
Skipping 5 and 7, we are left with 7 digits.
Each of the above equations fixes 4 digits and leaves 3 digits, which can be tested by trial and error. For example, if we consider the first case we have:
1X8 = 2X4 and are left with 3,6,9.
Since axbxc = cxdxe, c can be any of these 3 options, and the products would be 24, 48 and 72 respectively.
24 can't be correct, since the last three digits would then be 6,9,4 (=216)
48 can't be correct, since the last three digits would then be 3,9,4 (=108)
72 could be a valid option, since the last three digits in that case would be 3,6,4 (=72)
This question is good to solve with Relational Programming. I think it very clearly lets the programmer see what's going on and how the problem is solved. While it may not be the most efficient way to solve problems, it can still bring desired clarity and handle problems up to a certain size. Consider this small example from Oz:
fun {FindDigits}
D1 = {Digit}
D2 = {Digit}
D3 = {Digit}
D4 = {Digit}
D5 = {Digit}
D6 = {Digit}
D7 = {Digit}
L = [D1 D2 D3] M = [D3 D4 D5] E= [D5 D6 D7] TotL in
TotL = [D1 D2 D3 D4 D5 D6 D7]
{Unique TotL} = true
{ProductList L} = {ProductList M} = {ProductList E}
TotL
end
(This could be parameterized further, but it is left unoptimized to illustrate the point.)
Here you first pick 7 digits with a function Digit/0. Then you create three lists, L, M and E, consisting of the segments, as well as a total list to return (you could also return the concatenation, but I found this better for illustration).
Then comes the point: you specify relations that have to hold. First, that TotL is unique (distinct, in your task's wording). Then, that the segment products have to be equal.
What happens now is that a search is conducted for your answers. This is a depth-first search strategy, but it could also be breadth-first, and a solver is called to bring out all solutions. The search strategy is found inside the SolveAll/1 function.
{Browse {SolveAll FindDigits}}
This in turn returns the following list of answers:
[[1 8 9 2 4 3 6] [1 8 9 2 4 6 3] [3 6 4 2 9 1 8]
[3 6 4 2 9 8 1] [6 3 4 2 9 1 8] [6 3 4 2 9 8 1]
[8 1 9 2 4 3 6] [8 1 9 2 4 6 3]]
At least this way forward is not using brute force. Essentially you are searching for answers here. There might be heuristics that let you find the correct answer sooner (some mathematical magic, perhaps), or you can use genetic algorithms to search the space or other well-known strategies.
Prime factor of distinct digit (if possible)
0 = 0
1 = 1
2 = 2
3 = 3
4 = 2 x 2
5 = 5
6 = 2 x 3
7 = 7
8 = 2 x 2 x 2
9 = 3 x 3
In total:
7 2's + 4 3's + 1 5's + 1 7's
Call the three products A, B and C. When A=B=C, the prime factorization of A must be the same as that of B and that of C. So 0, 5 and 7 are excluded, since each has a factor that no other digit can supply.
Hence seven 2's and four 3's are left, and we have 7 digits (1,2,3,4,6,8,9). As the number has exactly 7 digits, it is formed from exactly these digits.
Recall that A, B and C must have the same prime factorization. This implies that A, B and C contain the same number of 2's and the same number of 3's. So, in total over A, B and C, we should try to achieve:
9 OR 12 2's AND
6 3's
(Each count must be a multiple of 3; the lower bound is the total number of that prime factor over all digits, and the upper bound is twice the lower bound, since a digit can be counted in at most two products.)
Consider the second point, the 3's (as it has only one possibility): A has two 3's, and the same holds for B and C. To get more prime factors in total than the digits supply once, we need to put digits in the connecting positions shared by two products (the third or fifth digit), where they count twice. Split the digits with prime factor 3 into two groups, {3,6} and {9}. The only possible way is to put 9 in a connecting position and 3,6 in the product that does not contain it. That means xx9xx36 or 36xx9xx (the order of 3,6 is not important)
With this result, we get 9 x middle x connection digit = connection digit x 3 x 6. Thus, middle = (3 x 6) / 9 = 2
My answer actually extends @Ansh's answer.
Let abcdefg be the digits of the number. Then
ab=de
cd=fg
From these relations we can exclude 0, 5 and 7, because there are no other multiples of these numbers among the digits 0 to 9. So we are left with seven numbers, and each number is included once in each answer. We are going to examine how we can pair the numbers (ab, de, cd, fg).
What happens with 9? It can't be combined with 3 or 6, since their product would then contain three factors of 3 and we have only 4 factors of 3 in total. Similarly, 3 and 6 must be combined together at least once, to match the two factors of 3 in 9. This gives a product of 18, and so 9 must be combined at least once with 2.
Now if 9x2 is in a corner, then 3x6 must be in the middle, meaning that the other corner would need yet another multiple of 3. So 9 and 2 are in the middle.
Let's suppose ab=3x6 (the other case is symmetric). Then d must be 9 or 2. But if d is 9, then f or g must be a multiple of 3. So d is 2 and e is 9. We can stop here and answer that the middle digit is
2
Now we have 2c = fg and the remaining choices are 1, 4, 8. We see that the only solutions are c = 4, f = 1, g = 8 and c = 4, f = 8, g = 1.
So if 3x6 is in the left corner we have the following solutions:
3642918, 3642981, 6342918, 6342981
If 3x6 is in the right corner we have the following solutions which are the reverse of the above:
8192463, 1892463, 8192436, 1892436
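As a cross-check on the eight solutions derived above, a short exhaustive search in Python (brute force, so only a verification and not the reasoning the question asks for) enumerates every arrangement of 7 distinct digits; the function name is made up for this sketch.

```python
from itertools import permutations

def equal_product_numbers():
    """All 7-tuples of distinct digits whose first, central and last triple products agree."""
    found = []
    for a, b, c, d, e, f, g in permutations(range(10), 7):
        if a * b * c == c * d * e == e * f * g:
            found.append((a, b, c, d, e, f, g))
    return found

solutions = equal_product_numbers()
print(len(solutions))              # 8
print({s[3] for s in solutions})   # {2}
```

The search confirms that the middle digit is 2 in every solution and that the common product is always 72.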
Here is how you can consider the problem:
Let's write the final solution as N1 N2 N3 N4 N5 N6 N7, with the 3 products N1*N2*N3, N3*N4*N5 and N5*N6*N7.
0, 5 and 7 are to be excluded, because no other digit is a multiple of them. If one of them divided one of the 3 products, it could not divide the others.
So we get the 7 remaining digits: 1234689
where the product of the digits is 2^7*3^4
(N1*N2*N3) and (N5*N6*N7) are equal, so their product is a square number. Together they contain every digit except N4, so removing one digit (N4) from the product in the previous point must leave a square number (i.e. even exponents on both primes)
N4 can't be 1, 3, 4, 6, 9.
We conclude N4 is 2 or 8
If N4 is 8, then 8 divides (N3*N4*N5), and we can't use the remaining even digits (2, 4, 6) to make both (N1*N2*N3) and (N5*N6*N7) divisible by 8. So N4 is 2 and 8 does not belong to the second group (let's put it in N1).
Now, we have: 1st grp: 8XX, 2nd group: X2X 3rd group: XXX
Note: at this point we know that the product is 72, because it is 2^3*3^2 (the square root of 2^6*3^4), but the value is not really important. We have done the difficult part: we know the 7 digits and the middle position.
Then, we know that we have to give a factor 2^3 to each of (N1*N2*N3), (N3*N4*N5), (N5*N6*N7).
We already gave 8 to N1 and 2 to N4, and we place 6 at N6 and 4 at N5, so that each of the 3 products is a multiple of 8.
Now, we have: 1st grp: 8XX, 2nd group: X24 3rd group: 46X
The same reasoning applies to the factors of 3: we give 3^2 to each product, knowing that we already have a 6 in the last group.
The last group will then get the 3, and the first and second groups share the 9.
Now, we have: 1st grp: 8X9, 2nd group: 924 3rd group: 463
And, then 1 at N2, which is the remaining position.
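The perfect-square step of this argument is easy to check mechanically. The Python sketch below lists the digits whose removal leaves a square; the second filter (the common product must be divisible by N4, since N4 is one of its factors) is a shortcut added for this illustration and is not the divisibility-by-8 argument used above.

```python
from math import isqrt, prod

DIGITS = [1, 2, 3, 4, 6, 8, 9]
TOTAL = prod(DIGITS)  # 2**7 * 3**4 = 10368

def square_candidates():
    """(N4, common product) pairs for which TOTAL / N4 is a perfect square."""
    out = []
    for d in DIGITS:
        rest = TOTAL // d          # product of the two outer triples
        root = isqrt(rest)
        if root * root == rest:
            out.append((d, root))
    return out

candidates = square_candidates()
print(candidates)  # [(2, 72), (8, 36)]

# N4 is a factor of the middle product, so it must divide the common product.
print([d for d, p in candidates if p % d == 0])  # [2]
```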
This problem is pretty easy if you look at the number 72 more carefully.
We have our number with this form abcdefg
and axbxc = cxdxe = exfxg = 72, with the digits 8,1,9,2,4,3,6
So, first, we can conclude that 8,1,9 must be one of the triples, because there is no other way 1 can combine with two of the other digits to form 72.
We can also conclude that 1 must be at the start/end of the whole number or in the middle of its triple.
So now we have 819defg or 918defg ...
Checking the remaining digits, we can see that only 819defg is possible: with c = 9 we need dxe = 72/9 = 8, which 2 and 4 provide, whereas with c = 8 we would need dxe = 72/8 = 9, which cannot be made from the digits 2,4,3,6. So we get 81924fg or 81942fg, and 819 must be the triple that starts or ends our number.
So the rest of the job is easy, we need either 72/4 = 18 or 72/2 = 36, now, we can have our answers: 8192436 or 8192463.
7 digits: 8,1,9,2,4,3,6
say XxYxZ = 72
1) pick any two from above 7 digits. say X,Y
2) divide 72 by X and then Y.. you will get the 3rd number i.e Z.
we found XYZ set of 3-digits which gives result 72.
now repeat 1) and 2) with remaining 4 digits.
this time we found ABC which multiplies to 72.
lets say, 7th digit left out is I.
3) divide 72 by I. result R
4) divide R by one of XYZ. check if result is in ABC.
if No, repeat the step 3)
if yes, found the third pair.(assume you divided R by Y and the result is B)
YIB is the third pair.
so... solution will be.
XZYIBAC
You have your 7 numbers - instead of looking at it in groups of 3 divide up the number as such:
AB | C | D | E | FG
Get the value of AB and use it to get the value of C like so: C = ABC/AB
Next you want to do the same thing with the trailing 2 digits to find E using FG. E = EFG/FG
Now that you have C & E you can solve for D
Since CDE = ABC then D = ABC/CE
Remember your formulas - instead of looking at numbers create a formula aka an algorithm that you know will work every time.
ABC = CDE = EFG However, you have to remember that your = signs have to balance. You can see that D = ABC/CE = EFG/CE Once you know that, you can figure out what you need in order to solve the problem.
Made a quick example in a fiddle of the code:
http://jsfiddle.net/4ykxx9ve/1/
var findMidNum = function() {
  var num = [8, 1, 9, 2, 4, 3, 6];
  var ab = num[0] * num[1];
  var fg = num[5] * num[6];
  var abc = num[0] * num[1] * num[2];
  var cde = num[2] * num[3] * num[4];
  var efg = num[4] * num[5] * num[6];
  var c = abc / ab;
  var e = efg / fg;
  var ce = c * e;
  var d = abc / ce;
  console.log(d); // 2
}();
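Use LINQ to take windows of the digits.
For example, var item = array.Skip(3).Take(3) picks three consecutive digits, so you can loop over the three overlapping windows:
// untested sketch; assumes `digits` is an int[] of length 7 and `using System.Linq;`
for (int f = 0; f + 3 <= digits.Length; f += 2) {
    // Aggregate multiplies the window; Sum would add the digits instead
    int product = digits.Skip(f).Take(3).Aggregate(1, (acc, d) => acc * d);
}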

What kind of algorithm is used to generate a square matrix?

I need to generate a matrix filled with numbers and inactive cells, such that the sums of all columns and rows are equal. I know about magic squares and sudoku, but this is different. Can you help me, please? What kind of algorithm do I need to generate this matrix?
E.g.
X = 0 = inactive block
Matrix ( 4x4 )
0 8 4 X | 12
2 0 8 2 | 12
10 1 X 1 | 12
0 3 X 9 | 12
____________|
12 12 12 12
Other example:
Matrix ( 5x5 )
0 2 2 3 5 | 12
2 4 0 5 1 | 12
8 2 0 2 0 | 12
0 4 2 0 6 | 12
2 0 8 2 0 | 12
______________|
12 12 12 12 12
The sum can be any other number; it is not always 12. I used 12 only because it made the examples easier for me to write. The matrix does not have to be symmetrical.
Note: This is not a magic square, and it is not sudoku either.
Conclusion:
1) I need to build this box and fill it with numbers and inactive blocks.
2) The matrix is always square (3x3, 4x4, 5x5, NxN, ...)
3) Cells that are not blocks can hold numbers of one, two or three digits.
4) The sums of all rows and columns must be equal.
5) In the above example, X is a block. A block is a cell that the player cannot use.
6) An inactive block can be treated as 0, so it does not affect the sum.
7) There is no restriction on how many inactive blocks there are.
8) Numbers may be repeated when filling the cells. There is no restriction.
9) The matrix is always square and may be of different dimensions (see 2).
Thanks guys for your help. Sorry that the problem description is incomplete and that my English is bad, but that's all.
In terms of algorithms, I would approach it as a system of linear equations. You can put the box as a matrix of variables:
x11 x12 x13 x14
x21 x22 x23 x24
x31 x32 x33 x34
x41 x42 x43 x44
Then you would make the equations as:
row1 = row2 (x11 + x12 + x13 + x14 = x21 + x22 + x23 + x24)
row1 = row3 (...)
row1 = row4
row1 = col1
row1 = col2
row1 = col3
row1 = col4
For N = 4, you would have 16 variables and 7 equations, so you would have a solution with a number of degrees of freedom (at least 9, as pointed out by @JamesMcLeod, and exactly 9, as stated by @Chris), so you could generate every possible matrix satisfying the restrictions just by giving values to every free parameter. In the resulting matrix, you could mark every cell with 0 as an inactive cell.
To do this however you would need a library or software package with the ability to solve systems of linear equations with degrees of freedom (several math software packages can do this, but right now only Maple comes to my mind).
PS: I've just read that numbers must have one, two or three digits (and be positive, too?). To address this, you could just "take care" when choosing the values for the free parameters once the system of equations is solved, or you could add inequalities to the problem like:
x11 < 1000
x11 >= 0 (if values must be positive)
x12 < 1000
(...)
But then it would be a linear programming problem. You may approach it like this too.
PS2: You can also make simple cases with diagonal matrices:
7 X X X
X 7 X X
X X 7 X
X X X 7
But I guess you already knew that...
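The diagonal case suggests a simple generator that avoids solving the linear system altogether: a diagonal matrix is a permutation matrix times a weight, and adding several weighted permutation matrices keeps every row and column sum equal to the total weight. This is a different technique from the linear-equation approach above, sketched here in Python under that assumption; `equal_sum_matrix` and its parameters are names invented for this example, and it needs `total >= pieces`.

```python
import random

def equal_sum_matrix(n, total, pieces=3, rng=random):
    """n x n grid of non-negative integers whose rows and columns all sum to `total`."""
    # Split `total` into `pieces` positive integer weights.
    cuts = sorted(rng.sample(range(1, total), pieces - 1))
    weights = [hi - lo for lo, hi in zip([0] + cuts, cuts + [total])]
    grid = [[0] * n for _ in range(n)]
    for w in weights:
        # A random permutation puts exactly one cell per row and per column.
        perm = list(range(n))
        rng.shuffle(perm)
        for row, col in enumerate(perm):
            grid[row][col] += w
    return grid

grid = equal_sum_matrix(4, 12, rng=random.Random(1))
for row in grid:
    print(row, "|", sum(row))
print([sum(col) for col in zip(*grid)])
```

Cells left at 0 can be shown as inactive blocks (X). Fewer pieces give more inactive cells; more pieces give a denser matrix.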
Edit: Thanks James McLeod and Chris for your corrections.
Do you fill the matrix with random numbers? You need a function that takes a one-dimensional vector as its argument and verifies whether the sum of its elements is 12. You can then reuse this function for the columns (with a loop) in your main routine.

How can I take the modulus of two very large numbers?

I need an algorithm for A mod B with
A is a very big integer and it contains digit 1 only (ex: 1111, 1111111111111111)
B is a very big integer (ex: 1231, 1231231823127312918923)
Big, I mean 1000 digits.
To compute a number mod n, given a function to get quotient and remainder when dividing by (n+1), start by adding one to the number. Then, as long as the number is bigger than 'n', iterate:
number = (number div (n+1)) + (number mod (n+1))
Finally, at the end, subtract one. An alternative to adding one at the beginning and subtracting one at the end is checking whether the result equals n and returning zero if so.
For example, given a function to divide by ten, one can compute 12345678 mod 9 thusly:
12345679 -> 1234567 + 9
1234576 -> 123457 + 6
123463 -> 12346 + 3
12349 -> 1234 + 9
1243 -> 124 + 3
127 -> 12 + 7
19 -> 1 + 9
10 -> 1
Subtract 1, and the result is zero.
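The iteration can be sketched in Python as follows, using the built-in `divmod` to stand in for the "divide by (n+1)" primitive; the function name is made up for this example, and n is assumed to be at least 1 with a non-negative number.

```python
def mod_via_next_divisor(number, n):
    """Compute number mod n using only division by n + 1."""
    number += 1                       # shift so the loop ends in 1..n
    while number > n:
        q, r = divmod(number, n + 1)  # (n + 1) is congruent to 1 mod n,
        number = q + r                # so q*(n+1) + r is congruent to q + r
    return number - 1                 # undo the shift: result is in 0..n-1

print(mod_via_next_divisor(12345678, 9))  # 0
```

Each pass strictly shrinks the number, because q is at least 1 whenever the number exceeds n, so the loop always terminates.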
1000 digits isn't really big, use any big integer library to get rather fast results.
If you really worry about performance, A can be written as 1111...1 = (10^n - 1)/9 for some n, so computing A mod B can be reduced to computing ((10^n - 1) mod (9*B)) / 9, and you can do that faster.
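In Python this reduction is a one-liner, because the built-in three-argument `pow` performs modular exponentiation without ever building the n-digit number. A minimal sketch, with a function name chosen for this example:

```python
def repunit_mod(n, b):
    """(the number made of n ones) mod b, via 10**n mod 9*b."""
    m = 9 * b
    # (10**n - 1) mod 9b is a multiple of 9, so the division is exact.
    return ((pow(10, n, m) - 1) % m) // 9

print(repunit_mod(4, 1231))  # 1111
print(repunit_mod(1000, 1231231823127312918923))
```

The division by 9 is exact because 10^n - 1 is a multiple of 9 and so is the modulus 9*B.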
Try Montgomery reduction on how to find modulo on large numbers - http://en.wikipedia.org/wiki/Montgomery_reduction
1) Just find a language or package that does arbitrary precision arithmetic - in my case I'd try java.math.BigDecimal.
2) If you are doing this yourself, you can avoid having to do division by using doubling and subtraction. E.g. 10 mod 3 = 10 - 3 - 3 - 3 = 1 (repeatedly subtracting 3 until you can't any more) - which is incredibly slow, so double 3 until it is just smaller than 10 (e.g. to 6), subtract to leave 4, and repeat.
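The doubling-and-subtraction idea from point 2 might look like this in Python. It is only a sketch, with a made-up function name and non-negative a and positive b assumed; in a hand-rolled big-number implementation the additions, comparisons and subtractions would be your own digit-array routines.

```python
def mod_by_doubling(a, b):
    """a mod b using only comparison, addition (doubling) and subtraction."""
    if b <= 0:
        raise ValueError("b must be positive")
    while a >= b:
        d = b
        # Double d until one more doubling would overshoot a.
        while d + d <= a:
            d += d
        a -= d
    return a

print(mod_by_doubling(10, 3))  # 1
```

For 10 mod 3 this doubles 3 to 6, subtracts to leave 4, then subtracts 3 to leave 1, matching the example in the text.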

Resources