Generate all possible DNA sequences from a few given sets - Ruby

I have been trying to wrap my head around this for a while now but have not been able to come up with a good solution. Here goes:
Given a number of sets:
set1: A, T
set2: C
set3: A, C, G
set4: T
set5: G
I want to generate all possible sequences from a list of sets. In this example the length of the sequence is 5, but it can be any length up to around 20. For position 1 the candidates are 'A' and 'T', for position 2 the only option is 'C', and so on.
The answer for the example above would be:
ACATG, ACCTG, ACGTG, TCATG, TCCTG, TCGTG
I am doing this in ruby and I have the different sets as arrays within a master array:
[[A, T], [C], [A, C, G], [T], [G]]
At first I thought a recursive solution would be best, but I was unable to figure out how to set it up properly.
My second idea was to create another array of the same size with an index for each set. So 00000 would correspond to the first sequence above 'ACATG' and 10200 would correspond to 'TCGTG'. Beginning with 00000 I would increase the last index by one and modulo it with the length of the set in question (2 for set1, 1 for set2 above) and if the counter wrapped around I would zero it and increase the previous one by one.
But the more I thought about this solution, the more it seemed too complex for this very small problem. There must be a more straightforward solution that I am missing. Could anyone help me out?
/Nick

The Array class in Ruby 1.8.7 has an Array#product method, which returns the Cartesian product of the receiver and the arrays passed as arguments.
irb(main):001:0> ['A', 'T'].product(['C'], ['A', 'C', 'G'], ['T'], ['G'])
=> [["A", "C", "A", "T", "G"], ["A", "C", "C", "T", "G"], ["A", "C", "G", "T", "G"], ["T", "C", "A", "T", "G"], ["T", "C", "C", "T", "G"], ["T", "C", "G", "T", "G"]]
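Since the question keeps the sets in a master array and wants strings rather than arrays, the call above could be adapted roughly as follows. This is a minimal sketch: the variable name sets and the method name sequences are made up for illustration, and flat_map assumes Ruby 1.9.2 or later. The second half shows one way the recursive idea mentioned in the question might be set up.

```ruby
# Master array as described in the question (the name `sets` is an assumption).
sets = [%w[A T], %w[C], %w[A C G], %w[T], %w[G]]

# Splat the remaining sets into Array#product, then join each row into a string.
with_product = sets.first.product(*sets.drop(1)).map(&:join)

# Recursive alternative: prefix every sequence built from the remaining sets
# with each candidate of the first set.
def sequences(sets)
  return [''] if sets.empty?
  head, *tail = sets
  suffixes = sequences(tail)
  head.flat_map { |base| suffixes.map { |suffix| base + suffix } }
end

puts with_product.inspect
puts sequences(sets).inspect
```

Both expressions produce the six sequences listed in the question, in the same order.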

Related

Can preorder, postorder and in order traversals be beneficial to a real time application?

Sequences ["A", "B", "C", "D", "E", "F"] (preorder)
and ["B", "A", "E", "D", "F", "C"] (inorder)
What real-life applications do these traversals relate to?
Postorder: Used to delete a tree, since both children are visited before their parent.
Inorder: Used to print the data of a binary search tree in ascending order.
Preorder: Used to create a copy of the tree.
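To make the three orders concrete, here is a small sketch that rebuilds the binary tree described by the two sequences in the question and then walks it in postorder. It assumes all labels are distinct; Node, build and postorder are illustrative names, not part of any library.

```ruby
Node = Struct.new(:value, :left, :right)

# Rebuild a binary tree from its preorder and inorder sequences.
# Assumes every label occurs exactly once.
def build(preorder, inorder)
  return nil if inorder.empty?
  root_value = preorder.first
  split = inorder.index(root_value)        # position of the root in the inorder sequence
  left_in = inorder[0...split]             # everything left of the root
  right_in = inorder[(split + 1)..-1]      # everything right of the root
  left_pre = preorder[1, left_in.size]
  right_pre = preorder[(1 + left_in.size)..-1]
  Node.new(root_value, build(left_pre, left_in), build(right_pre, right_in))
end

# Postorder: left subtree, right subtree, then the node itself.
def postorder(node, out = [])
  return out if node.nil?
  postorder(node.left, out)
  postorder(node.right, out)
  out << node.value
end

tree = build(%w[A B C D E F], %w[B A E D F C])
puts postorder(tree).join(' ')
```

For the sequences above this prints B E F D C A: every node appears after its children, which is why postorder suits deleting a tree.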

Ruby sort subarray in place

If I have an array in Ruby, foo, how can I sort foo[i..j] in-place?
I tried calling foo[i..j].sort! but it didn't sort the original array, just returned a sorted part of it.
If you want to sort part of an array you need to reinject the sorted parts. The in-place modifier won't help you here because foo[i..j] returns a copy. You're sorting the copy in place, which really doesn't mean anything to the original array.
So instead, replace the original slice with a sorted version of same:
test = %w[ z b f d c h k z ]
test[2..6] = test[2..6].sort
# => ["c", "d", "f", "h", "k"]
test
# => ["z", "b", "c", "d", "f", "h", "k", "z"]
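The same idea can be wrapped in a small helper. This is only a sketch: sort_range! is a hypothetical name, not a built-in Array method. It also shows that the slice is a separate object, while slice assignment changes the original array itself.

```ruby
# Hypothetical helper: sorts the elements covered by `range` and writes them
# back into the same array object.
def sort_range!(array, range)
  array[range] = array[range].sort
  array
end

foo = %w[z b f d c h k z]

slice = foo[2..6]
slice.sort!                 # sorts only the copy
puts foo.inspect            # foo is still in its original order

sort_range!(foo, 2..6)
puts foo.inspect            # the middle section is now sorted
```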

How do I create every combination of single elements selected from multiple arrays?

I have 5 arrays:
["A", "B", "C"]
["A", "B", "C", "D", "E"]
["A"]
["A", "B", "C", "D", "E", "F"]
["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"]
I would like to create a list of each combination as such:
["AAAAA","AAAAB","AAAAC", "AAAAD"...
"BAAAA","BAAAB","BAAAC", "BAAAD"...]
a = [
["A", "B", "C"],
["A", "B", "C", "D", "E"],
["A"],
["A", "B", "C", "D", "E", "F"],
["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"]
]
a.inject(&:product).map(&:join)
# => ["AAAAA", "AAAAB", "AAAAC", ..., "CEAFM", "CEAFN", "CEAFO"]
Thanks to bluexuemei for the improved answer. The original solution was a.shift.product(*a).map(&:join).
A More Traditional Solution
With such a convenient library, these ruby one-liners seem almost like cheating.
Here is a more traditional way to solve this common problem that can be readily coded into other programming languages:
N = a.reduce(1) { |product,list| product * list.size } # 1350
combinations = []
0.upto(N-1) do |q|
  combo = []
  a.reverse.each do |list|
    q, r = q.divmod list.size
    combo << list[r]
  end
  combinations.push combo.reverse.join
end
combinations
# => ["AAAAA", "AAAAB", "AAAAC", ..., "CEAFM", "CEAFN", "CEAFO"]
The basic idea is to first calculate the total number of combinations N, which is just the product of the lengths of all the lists. Each integer from 0 to N-1 then encodes all the information needed to provide unique indices into each list to produce each combination. One way to think of it is that the index variable q can be expressed as a 5-digit number in which each digit has a different base: the size of the corresponding list. That is, the first digit is base-3, the second digit is base-5, the 3rd is base-1 (always 0), the 4th is base-6, and the 5th is base-15. To extract these digits from q, take a series of repeated integer divisions and remainders, starting from the last list, as done in the inner loop. Naturally this requires some homework, perhaps looking at simpler examples, to fully digest.
a.reduce(&:product).map(&:join).size
# => 1350
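As one of the simpler examples the explanation suggests, the mixed-base decoding can be pulled out into a function that turns a single index into its combination. This is a sketch under the same assumptions as the loop above; combination_at and lists are names chosen for illustration.

```ruby
lists = [
  %w[A B C],
  %w[A B C D E],
  %w[A],
  %w[A B C D E F],
  %w[A B C D E F G H I J K L M N O]
]

# Decode index q as a mixed-base number: the last list supplies the least
# significant digit, so the lists are walked from last to first.
def combination_at(q, lists)
  digits = []
  lists.reverse_each do |list|
    q, r = q.divmod(list.size)   # r is this position's digit, q carries on
    digits << list[r]
  end
  digits.reverse.join
end

puts combination_at(0, lists)     # AAAAA
puts combination_at(17, lists)    # AAABC
puts combination_at(1349, lists)  # CEAFO
```

For q = 17: 17 divided by 15 gives quotient 1 and remainder 2 (letter C), then 1 divided by 6 gives quotient 0 and remainder 1 (letter B), and every remaining digit is 0 (letter A), hence AAABC.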

Confusion in String Manipulation

InputForm[{a, b, c, d, e, f}] gives {a, b, c, d, e, f}
InputForm[Characters["SOMETHING"]] gives {"S", "O", "M", "E", "T", "H", "I", "N", "G"}
But why does Drop[InputForm[Characters["SOMETHING"]],1] not give {"O", "M", "E", "T", "H", "I", "N", "G"}
and instead give an InputForm[] and nothing else?
How can I achieve this?
Thank You
When you evaluate
InputForm[Characters["SOMETHING"]]
Mathematica internally produces the result
InputForm[List["S","O","M","E","T","H","I","N","G"]]
i.e. it's an expression with InputForm as a head, which contains List["S","O","M","E","T","H","I","N","G"] as its first subexpression. You don't see the InputForm head when Mathematica displays the expression, because the front end only uses it as a hint as to how the expression should be shown, but it's still there behind the scenes.
Then when you use Drop[..., 1], it looks at the expression it's given, picks out the first subexpression, which is List["S","O","M","E","T","H","I","N","G"], and discards it. That leaves just InputForm[].
To make an analogy: if you evaluated
Drop[List[List["S","O","M","E","T","H","I","N","G"]], 1]
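you would understand why you'd get an empty list back, right? It's the same thing going on. To get the result you expected, apply Drop to the list first and wrap the result afterwards: InputForm[Drop[Characters["SOMETHING"], 1]].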

Algorithm for matching all items with another item in same list, where some have restrictions

Given array [a, b, c, d, e, f]
I want to match each letter with any other letter except itself, resulting in something like:
a - c
b - f
d - e
The catch is that each letter may be restricted to being matched with one or more of the other letters.
So let's say for example,
a cannot be matched with c, d
c cannot be matched with e, f
e cannot be matched with a
Any guidance on how to go about this? I'm using Ruby, but any pseudo code would be helpful.
Thanks!
The problem you are describing is a graph problem called maximum matching (or more specifically perfect matching). Each letter is a vertex, and the restrictions correspond to pairs of vertices that have no edge between them.
You can use Edmonds' matching algorithm (also known as the blossom algorithm).
Let's assume for now that a solution exists. It may not.
Pick one of your elements, and try to match it.
If the match breaks one of your rules, try a different partner until it doesn't.
Choose another element, and try to match that. If you run through all other elements and break a rule each time, then go back, unmatch your previous match, and try another one.
Continue until all of your elements are used up.
If you don't know whether a solution exists or not, then you'll need to keep track of your attempts and figure out when you've tried them all. Or, use some checking at the beginning to see if there are any obvious contradictions in your rule set.
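The steps above amount to a backtracking search, which could be sketched in Ruby as follows. The names allowed?, match_all and forbidden are made up for this example, and restrictions are assumed to apply in both directions (if a cannot be matched with c, then c cannot be matched with a). Because the search is exhaustive, it returns nil when no complete matching exists, which covers the "keep track of your attempts" case.

```ruby
# forbidden maps a letter to the letters it may not be paired with.
# A restriction is treated as symmetric (an assumption of this sketch).
def allowed?(x, y, forbidden)
  !(forbidden.fetch(x, []).include?(y) || forbidden.fetch(y, []).include?(x))
end

# Returns an array of pairs covering every item, or nil if none exists.
def match_all(items, forbidden)
  return [] if items.empty?
  first, *rest = items
  rest.each do |partner|
    next unless allowed?(first, partner, forbidden)
    # Tentatively pair first with partner and try to match everything else.
    remaining = match_all(rest - [partner], forbidden)
    return [[first, partner]] + remaining if remaining
    # Otherwise fall through: unmatch and try the next partner.
  end
  nil
end

forbidden = { 'a' => %w[c d], 'c' => %w[e f], 'e' => %w[a] }
pairs = match_all(%w[a b c d e f], forbidden)
puts pairs.map { |x, y| "#{x} - #{y}" }
```

With the restrictions from the question this finds a - b, c - d, e - f. The worst-case running time grows quickly with the number of items, so for large inputs a dedicated matching algorithm is the better choice.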
I'm not sure I understand the problem, but this seems to fit the question:
%w[a b c d e f].combination(2).to_a - [%w[a c],%w[a d],%w[c e],%w[c f],%w[a e]]
# => [["a", "b"], ["a", "f"], ["b", "c"], ["b", "d"], ["b", "e"], ["b", "f"], ["c", "d"], ["d", "e"], ["d", "f"], ["e", "f"]]
$letters = array('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j');
$exclusions = array('a' => array('e', 'd', 'c'), 'b' => array('a', 'b', 'c', 'd'));
$match = array();
foreach ($letters as $matching) {
    // Letters without an entry in $exclusions have no restrictions.
    $excluded = isset($exclusions[$matching]) ? $exclusions[$matching] : array();
    foreach ($letters as $search) {
        if (!in_array($search, $excluded)) {
            if ($search != $matching) {
                $match[$matching][] = $search;
            }
        }
    }
}
print_r($match);
The innermost condition could be merged into the one enclosing it...
you can see this in action at
http://craigslist.fatherstorm.com/stackoverflow2.php
