Given a pair of numbers (A, B), you can perform an operation that replaces it with either (A + B, B) or (A, A + B).
(A, B) is initialized to (1, 1).
For any N > 0, find the minimum number of operations you need to perform on (A, B) until A = N or B = N.
I came across this question in an interview summary on Glassdoor. I thought through a couple of approaches and searched online, but couldn't find any articles or answers solving it. I have a brute-force method shown below, but it must traverse O(2^N) paths; I'm wondering if there is a more elegant solution I am not seeing.
def pairsum(N):
    A = 1
    B = 1
    return helper(N, A, B, 0)

def helper(N, A, B, ops):
    # Solution found
    if A == N or B == N:
        return ops
    # We've gone over, invalid path taken
    if A > N or B > N:
        return float("inf")
    return min(helper(N, A + B, B, ops + 1), helper(N, A, A + B, ops + 1))
Given a target number N, it's possible to compute the minimum number of operations in approximately O(N log(N)) basic arithmetic operations (though I suspect there are faster ways). Here's how:
For this problem, I think it's easier to work backwards than forwards. Suppose that we're trying to reach a target pair (a, b) of positive integers. We start with (a, b) and work backwards towards (1, 1), counting steps as we go. The reason that this is easy is that there's only ever a single path from a pair (a, b) back to (1, 1): if a > b, then the pair (a, b) can't be the result of the second operation, so the only way we can possibly reach this pair is by applying the first operation to (a - b, b). Similarly, if a < b, we can only have reached the pair via the second operation applied to (a, b - a). What about the case a = b? Well, if a = b = 1, there's nothing to do. If a = b and a > 1, then there's no way we can reach the pair at all: note that both operations take coprime pairs of integers to coprime pairs of integers, so if we start with (1, 1), we can never reach a pair of integers that has a greatest common divisor bigger than 1.
This leads to the following code to count the number of steps to get from (1, 1) to (a, b), for any pair of positive integers a and b:
def steps_to_reach(a, b):
    """
    Given a pair of positive integers, return the number of steps required
    to reach that pair from (1, 1), or None if no path exists.
    """
    steps = 0
    while True:
        if a > b:
            a -= b
        elif b > a:
            b -= a
        elif a == 1:  # must also have b == 1 here
            break
        else:
            return None  # no path, gcd(a, b) > 1
        steps += 1
    return steps
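For example, a couple of quick checks at the prompt (both values are easy to confirm by hand by running the forward process):
>>> steps_to_reach(3, 5)   # (1, 1) -> (1, 2) -> (3, 2) -> (3, 5)
3
>>> steps_to_reach(2, 4) is None   # gcd(2, 4) > 1, so no path exists
True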
Looking at the code above, it bears a strong resemblance to the Euclidean algorithm for computing greatest common divisors, except that we're doing things very inefficiently, by using repeated subtractions instead of going directly to the remainder with a Euclidean division step. So it's possible to replace the above with the following equivalent, simpler, faster version:
def steps_to_reach_fast(a, b):
    """
    Given a pair of positive integers, return the number of steps required
    to reach that pair from (1, 1), or None if no path exists.

    Faster version of steps_to_reach.
    """
    steps = -1
    while b:
        a, (q, b) = b, divmod(a, b)
        steps += q
    return None if a > 1 else steps
I leave it to you to check that the two pieces of code are equivalent: it's not hard to prove, but if you don't feel like getting out pen and paper then a quick check at the prompt should be convincing:
>>> all(steps_to_reach(a, b) == steps_to_reach_fast(a, b) for a in range(1, 1001) for b in range(1, 1001))
True
The call steps_to_reach_fast(a, b) needs O(log(max(a, b))) arithmetic operations. (This follows from standard analysis of the Euclidean algorithm.)
Now it's straightforward to find the minimum number of operations for a given n:
def min_steps_to_reach(n):
    """
    Find the minimum number of steps to reach a pair (*, n) or (n, *).
    """
    # Count steps in all paths to (n, a). By symmetry, no need to
    # check (a, n) too.
    all_steps = (steps_to_reach_fast(n, a) for a in range(1, n+1))
    return min(steps for steps in all_steps if steps is not None)
This function runs reasonably quickly up to n = 1000000 or so. Let's print out the first few values:
>>> min_steps_to_reach(10**6) # takes ~1 second on my laptop
30
>>> [min_steps_to_reach(n) for n in range(1, 50)]
[0, 1, 2, 3, 3, 5, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 6, 7, 7, 7, 7, 7, 7, 8, 7, 7, 7, 8, 8, 7, 8, 8, 8, 9, 8, 8, 8, 9, 8, 8, 8, 8, 8, 9, 8]
A search at the Online Encyclopedia of Integer Sequences quickly yields the sequence A178047, which matches our sequence perfectly. The sequence is described as follows:
Consider the Farey tree A006842/A006843; a(n) = row at which the
denominator n first appears (assumes first row is labeled row 0).
And indeed, if you look at the tree generated by your two operations, starting at (1, 1), and you regard each pair as a fraction, you get something that's very similar to the Stern-Brocot tree (another name for the Farey tree): the contents of each row are the same, but the ordering within each row is different. As it turns out, it's the Stern-Brocot tree in disguise!
This observation gives us an easily computable lower bound on min_steps_to_reach: it's easy to show that the largest integer appearing as either a numerator or denominator in the ith row of the Stern-Brocot tree is the (i+2)nd Fibonacci number. So if n > Fib(i+2), then min_steps_to_reach(n) > i (and if n == Fib(i+2), then min_steps_to_reach(n) is exactly i). Getting an upper bound (or an exact value without an exhaustive search) seems to be a bit harder. Here are the worst cases: for each integer s >= 0, the smallest n requiring s steps (so, for example, 506 is the first number requiring 15 steps):
[1, 2, 3, 4, 7, 6, 14, 20, 28, 38, 54, 90, 150, 216, 350, 506, 876, 1230, 2034, 3160, 4470, 7764]
If there's a pattern here, I'm not spotting it (but it's essentially sequence A135510 on OEIS).
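In case it's useful, here is a rough sketch of how a table like the one above can be generated from the functions defined earlier. first_n_requiring is just an illustrative name, and the search is deliberately naive (it calls min_steps_to_reach for every n in turn), so it's only practical for small step counts:
def first_n_requiring(max_steps):
    """For each s in 0..max_steps, find the smallest n with
    min_steps_to_reach(n) == s.  Naive brute-force sketch."""
    firsts = {}
    n = 1
    while len(firsts) <= max_steps:
        s = min_steps_to_reach(n)
        if s <= max_steps and s not in firsts:
            firsts[s] = n
        n += 1
    return [firsts[s] for s in range(max_steps + 1)]
which reproduces the start of the list above:
>>> first_n_requiring(8)
[1, 2, 3, 4, 7, 6, 14, 20, 28]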
[I wrote this before I realized Mark Dickinson had answered; his answer is much better than mine, but I'm providing mine for reference anyway.]
The problem is fairly easy to solve if you work backwards. As an example, suppose N=65:
That means our current pair is either {65, x} or {y, 65} for some unknown values of x and y.
If {A,B} was the previous pair, this means either {A, A+B} or {A+B, B} is equal to either {65, x} or {y, 65}, which gives us 4 possible cases:
{A,A+B} = {65,x}, which would mean A=65. However, if A=65, we would've already hit A=N at an earlier step, and we're assuming this is the first step at which A=N or B=N, so we discard this possibility.
{A,A+B} = {y,65} which means A+B=65
{A+B,B} = {65,x} which means A+B=65
{A+B,B} = {y,65} which means B=65. However, if B=65, we already had a solution at a previous step, we also discard this possibility.
Therefore, A+B=65. There are 65 ways in which this can happen (actually, you can ignore the cases where A=0 or B=0, and also choose B>A by symmetry, but the solution is easy even without these assumptions).
We now examine all 65 cases. As an example, let's use A=25 and B=40.
If {C,D} was the pair that generated {25,40}, there are two possible cases:
{C+D,D} = {25,40} so D=40 and C=-15, which is impossible, since, starting at {1,1}, we will never get negative numbers.
{C,C+D} = {25,40} so C=25, and D=15.
Therefore, the "predecessor" of {25,40} is necessarily {25,15}.
By similar analysis, the predecessor of {25,15}, let's call it {E,F}, must have the property that either:
{E,E+F} = {25,15}, impossible since this would mean F=-10
{E+F,F} = {25,15} meaning E=10 and F=15.
Similarly the predecessor of {10,15} is {10,5}, whose predecessor is {5,5}.
The predecessor of {5,5} is either {0,5} or {5,0}. These two pairs are their own predecessors, but have no other predecessors.
Since we never hit {1,1} in this sequence, we know that {1,1} will never generate {25, 40}, so we continue computing for other pairs {A,B} such that A+B=65.
If we did hit {1,1}, we'd count the number of steps it took to get there, store the value, compute it for all other values of {A,B} such that A+B=65, and take the minimum.
Note that once we've chosen a value of A (and thus a value of B), we are effectively running Euclid's Algorithm backwards; if the repeated subtractions are collapsed into division steps (as in the usual Euclidean algorithm), each choice of A takes O(log(N)) arithmetic operations. Since you do this for N choices of A, the algorithm is O(N*log(N)), much better than your O(2^N).
Of course, you may be able to find shortcuts to make the method even faster.
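Here is a minimal Python sketch of the backward procedure described above (the function names are just for illustration, and it uses the slow one-subtraction-at-a-time walk rather than the Euclidean-division shortcut, so it is O(N) per pair in the worst case):
def steps_back_to_1_1(a, b):
    """Walk the pair {a, b} backwards, subtracting the smaller component
    from the larger, until {1, 1} is reached; return the number of steps,
    or None if the walk gets stuck (as it does for {25, 40} above)."""
    steps = 0
    while (a, b) != (1, 1):
        if a > b:
            a -= b
        elif b > a:
            b -= a
        else:
            return None  # a == b but not (1, 1): unreachable from (1, 1)
        steps += 1
    return steps

def min_ops(n):
    """Try every split A + B = n with A, B >= 1; the best backward walk,
    plus the one final step that produced n, gives the minimum."""
    if n == 1:
        return 0
    walks = (steps_back_to_1_1(a, n - a) for a in range(1, n))
    return 1 + min(s for s in walks if s is not None)
For example, min_ops(65) comes out as 9, and steps_back_to_1_1(25, 40) returns None, precisely because the walk gets stuck at {5, 5} as described above.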
Interesting Notes
If you start with {1,1}, here are the pairs you can generate in k steps (we use k=0 for {1,1} itself), after removing duplicates:
k=0: {1,1}
k=1: {2, 1}, {1, 2}
k=2: {3, 1}, {2, 3}, {3, 2}, {1, 3}
k=3: {4, 1}, {3, 4}, {5, 3}, {2, 5}, {5, 2}, {3, 5}, {4, 3}, {1, 4}
k=4: {5, 1}, {4, 5}, {7, 4}, {3, 7}, {8, 3}, {5, 8}, {7, 5}, {2, 7}, {7, 2}, {5, 7}, {8, 5}, {3, 8}, {7, 3}, {4, 7}, {5, 4}, {1, 5}
k=5: {6, 1}, {5, 6}, {9, 5}, {4, 9}, {11, 4}, {7, 11}, {10, 7}, {3, 10}, {11, 3}, {8, 11}, {13, 8}, {5, 13}, {12, 5}, {7, 12}, {9, 7}, {2, 9}, {9, 2}, {7, 9}, {12, 7}, {5, 12}, {13, 5}, {8, 13}, {11, 8}, {3, 11}, {10, 3}, {7, 10}, {11, 7}, {4, 11}, {9, 4}, {5, 9}, {6, 5}, {1, 6}
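(The listing above can be regenerated with a short sketch like the following; pairs_after is just an illustrative name, and it returns an unordered set rather than the ordering shown above.)
def pairs_after(k):
    """All pairs reachable from (1, 1) in exactly k steps."""
    level = {(1, 1)}
    for _ in range(k):
        level = {p for a, b in level for p in ((a + b, b), (a, a + b))}
    return level
For example:
>>> len(pairs_after(5))                  # 2^5 distinct pairs
32
>>> max(max(p) for p in pairs_after(4))  # the (4+2)nd Fibonacci number
8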
Things to note:
You can generate N=7 and N=8 in 4 steps, but not N=6, which requires 5 steps.
The number of pairs generated is 2^k
The smallest number of steps (k) required to reach a given N is:
N=1: k=0
N=2: k=1
N=3: k=2
N=4: k=3
N=5: k=3
N=6: k=5
N=7: k=4
N=8: k=4
N=9: k=5
N=10: k=5
N=11: k=5
The resulting sequence, {0,1,2,3,3,5,4,4,5,5,5,...} is https://oeis.org/A178047
The highest number generated in k steps is the (k+2)nd Fibonacci number, http://oeis.org/A000045
The number of distinct integers you can reach in k steps is now the (k+1)st element of http://oeis.org/A293160
As an example for k=20:
There are 2^20 or 1048576 pairs when k=20
The highest number in any of the 1048576 pairs above is 17711, the 22nd (20+2) Fibonacci number
However, you can't reach all of the first 17711 integers with these pairs. You can only reach 11552 of them, the 21st (20+1) element of A293160
For details on how I worked this problem out, see https://github.com/barrycarter/bcapps/blob/master/STACK/bc-add-sets.m
I have been trying to solve the Maximum Clique problem with the algorithm described below, and so far I have not been able to find a case in which it fails.
Algorithm:
For a given graph, each node numbered from 1 to N.
1. Consider a node as the permanent node and form the set of nodes such that each node in the set is connected to this permanent node (the set includes the permanent node as well).
2. Now form the subgraph of the original graph that contains all the nodes in this set and only those edges whose endpoints are both in the set.
3. Find the degree of each node.
4. If all the nodes have the same degree, then we have a clique.
5. Otherwise, delete the node of least degree from this subgraph and repeat from step 3.
6. Repeat steps 1-5 for all the nodes in the graph.
Can anyone point out a flaw in this algorithm?
Here is my code http://pastebin.com/tN149P9m.
Here's a family of counterexamples. Start with a k-clique. For each node in this clique, connect it to each node of a fresh copy of K_{k-1,k-1}, i.e., the complete bipartite graph on k-1 plus k-1 nodes. For every permanent node in the clique, the residual graph is its copy of K_{k-1,k-1} and the clique. The nodes in K_{k-1,k-1} have degree k and the other clique nodes have degree k - 1, so the latter get deleted.
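If you want to experiment with this, here is a rough Python sketch of the construction (counterexample is just an illustrative name; it returns an adjacency-set dict in the same style as the example below). For k >= 4 the planted k-clique is the unique maximum clique, yet in the subgraph around any permanent clique node the other clique nodes have the minimum degree (k - 1) and are deleted first, so the procedure reports a much smaller clique instead.
from itertools import combinations

def counterexample(k):
    """A k-clique in which every clique node is also joined to all nodes
    of its own private copy of K_{k-1,k-1}.  Returns {node: neighbours}."""
    clique = [("clique", i) for i in range(k)]
    adj = {v: set() for v in clique}
    for u, v in combinations(clique, 2):      # the planted k-clique
        adj[u].add(v)
        adj[v].add(u)
    for i, c in enumerate(clique):            # one private K_{k-1,k-1} per clique node
        left = [("L", i, x) for x in range(k - 1)]
        right = [("R", i, x) for x in range(k - 1)]
        for w in left + right:
            adj[w] = {c}
            adj[c].add(w)
        for u in left:
            for v in right:
                adj[u].add(v)
                adj[v].add(u)
    return adj
With k = 4 this produces a 28-node graph; the 16-node example below is a compressed version of the same idea.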
Here's a 16-node counterexample, obtained by setting k = 4 and identifying parts of the K_{3,3}s in a ring:
{0: {1, 2, 3, 4, 5, 6, 7, 8, 9},
1: {0, 2, 3, 7, 8, 9, 10, 11, 12},
2: {0, 1, 3, 10, 11, 12, 13, 14, 15},
3: {0, 1, 2, 4, 5, 6, 13, 14, 15},
4: {0, 3, 7, 8, 9, 13, 14, 15},
5: {0, 3, 7, 8, 9, 13, 14, 15},
6: {0, 3, 7, 8, 9, 13, 14, 15},
7: {0, 1, 4, 5, 6, 10, 11, 12},
8: {0, 1, 4, 5, 6, 10, 11, 12},
9: {0, 1, 4, 5, 6, 10, 11, 12},
10: {1, 2, 7, 8, 9, 13, 14, 15},
11: {1, 2, 7, 8, 9, 13, 14, 15},
12: {1, 2, 7, 8, 9, 13, 14, 15},
13: {2, 3, 4, 5, 6, 10, 11, 12},
14: {2, 3, 4, 5, 6, 10, 11, 12},
15: {2, 3, 4, 5, 6, 10, 11, 12}}
What you propose looks very much like the following sorting algorithm combined with a greedy clique search:
Consider a simple undirected graph G=(V,E)
Initial sorting
Pick the vertex with minimum degree and place it first in the new list L. From the remaining vertices pick the vertex with minimum degree and place it in the second position in L. Repeat the operations until all vertices in V are in L.
Find cliques greedily
Start from the last vertex in L and move in reverse order. For each vertex v in L compute cliques like this:
Add v to the new clique C
Compute the neighbor set of v in L: N(v)
Pick the last vertex w in N(v)
Set v = w and L = L intersected with N(v)
Repeat steps 1 to 4
Actually, the proposed initial sorting is called a degeneracy ordering and decomposes G into k-cores (see Batagelj et al. 2002). A k-core is a maximal subgraph such that all of its vertices have degree at least k. The initial sorting leaves the highest cores (those with the largest k) at the end, so when vertices are picked in reverse order you are picking vertices in the highest cores first (similar to your step 4) and trying to find cliques there. There are a number of other ways to find cliques greedily based on k-cores, but you can never guarantee an optimum unless you do a full enumeration.
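For reference, here is a minimal Python sketch of that initial sorting (the simple quadratic version; bucket-based implementations run in linear time). adj is assumed to map each vertex to a set of its neighbours:
def degeneracy_ordering(adj):
    """Repeatedly remove a vertex of minimum remaining degree; the
    removal order is the list L described above."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))       # minimum remaining degree
        order.append(v)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return order
Vertices taken from the end of the returned list lie in the highest cores, which is where any clique of size c must live (it is contained in the (c-1)-core).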
The proposed initial sorting is used, for example, when searching for an exact maximum clique, and has been described in many research papers, such as [Carraghan and Pardalos 90].
Suppose you have a list of subsets S1,...,Sn of the integer range R={1,2,...,N}, and an integer k. Is there an efficient way to find a subset C of R of size k such that C is a subset of a maximal number of the Si?
As an example, let R={1,2,3,4} and k=2
S1={1,2,3}
S2={1,2,3}
S3={1,2,4}
S4={1,3,4}
Then I want to return either C={1,2} or C={1,3} (doesn't matter which).
I think your problem is NP-hard. Consider the bipartite graph whose left nodes are your sets and whose right nodes are the integers {1, ..., N}, with an edge between a set and an integer if the set contains the integer. Then finding a size-k subset C that is contained in a maximal number i of the Si is equivalent to finding a complete bipartite subgraph K(i, k) with the maximal number of edges i*k. If you could do this in polynomial time, then you could find the complete bipartite subgraph K(i, j) maximizing i*j in polynomial time, by trying each fixed k in turn. But that problem is NP-complete (the maximum edge biclique problem).
So, unless P=NP, your problem does not have a polynomial time algorithm.
Assuming I understand your question I believe this is straightforward for fairly small sets.
I will use Mathematica code for illustration, but the concept is universal.
I generate 10 random subsets of length 4, from the set {1 .. 8}:
ss = Subsets[Range@8, {4}] ~RandomSample~ 10
{{1, 3, 4, 6}, {2, 6, 7, 8}, {3, 5, 6, 7}, {2, 4, 6, 7}, {1, 4, 5, 8},
{2, 4, 6, 8}, {1, 2, 3, 8}, {1, 6, 7, 8}, {1, 2, 4, 7}, {1, 2, 5, 7}}
I convert these to a binary array of the presence of each number in each subset:
a = Normal@SparseArray[Join @@ MapIndexed[Tuples[{##}] &, ss] -> 1];
Grid[a]
That is ten columns for ten subsets, and eight rows for elements {1 .. 8}.
Now generate all possible target subsets (size 3):
keys = Subsets[Union @@ ss, {3}];
Take a "key" and extract those rows from the array and do a BitAnd operation (return 1 iff all columns equal 1), then count the number of ones. For example, for key {1, 6, 8} we have:
a[[{1, 6, 8}]]
After BitAnd:
Do this for each key:
counts = Tr[BitAnd @@ a[[#]]] & /@ keys;
Then find the position(s) of the maximum element of that list, and extract the corresponding parts of keys:
keys ~Extract~ Position[counts, Max@counts]
{{1, 2, 7}, {2, 4, 6}, {2, 4, 7}, {2, 6, 7}, {2, 6, 8}, {6, 7, 8}}
With adequate memory this process works quickly for a larger set. Starting with 50,000 randomly selected subsets of length 7 from {1 .. 30}:
ss = Subsets[Range@30, {7}] ~RandomSample~ 50000;
The maximum sub-subsets of length 4 are calculated in about nine seconds:
AbsoluteTiming[
 a = Normal@SparseArray[Join @@ MapIndexed[Tuples[{##}] &, ss] -> 1];
 keys = Subsets[Union @@ ss, {4}];
 counts = Tr[BitAnd @@ a[[#]]] & /@ keys;
 keys ~Extract~ Position[counts, Max@counts]
]
{8.8205045, {{2, 3, 4, 20},
{7, 10, 15, 18},
{7, 13, 16, 26},
{11, 21, 26, 28}}}
I should add that Mathematica is a high-level language and these operations act on generic objects, so if this were done truly at the binary level it should be much faster and more memory efficient.
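As a rough illustration of the same brute force at the bit level, here is a Python sketch (the names are hypothetical; each subset is packed into an integer mask, so testing a candidate against a subset is a single AND):
from itertools import combinations

def best_common_subset(subsets, n, k):
    """Return (C, count): a size-k subset of {1..n} contained in the most
    input subsets, found by exhaustive search over all C(n, k) candidates."""
    masks = [sum(1 << (x - 1) for x in s) for s in subsets]
    best, best_count = None, -1
    for cand in combinations(range(1, n + 1), k):
        cmask = sum(1 << (x - 1) for x in cand)
        count = sum(1 for m in masks if m & cmask == cmask)
        if count > best_count:
            best, best_count = set(cand), count
    return best, best_count
On the small example from the original question, best_common_subset([{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}], 4, 2) returns ({1, 2}, 3): {1, 2} is contained in three of the four subsets (the tie with {1, 3} is broken by enumeration order).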
I hope I don't misunderstand the problem... Here is a solution in SWI-Prolog:
:- module(subsets, [solve/0]).

:- [library(pairs),
    library(aggregate)].

solve :-
    problem(R, K, Subsets),
    once(subset_of_maximal_number(R, K, Subsets, Subset)),
    writeln(Subset).

problem(4, 2,
        [[1,2,3], [1,2,3], [1,2,4], [1,3,4]]).
problem(8, 3,
        [[1, 3, 4, 6], [2, 6, 7, 8], [3, 5, 6, 7], [2, 4, 6, 7], [1, 4, 5, 8],
         [2, 4, 6, 8], [1, 2, 3, 8], [1, 6, 7, 8], [1, 2, 4, 7], [1, 2, 5, 7]]).

subset_of_maximal_number(R, K, Subsets, Subset) :-
    flatten(Subsets, Numbers),
    findall(Num-Count,
            ( between(1, R, Num),
              aggregate_all(count, member(Num, Numbers), Count)
            ), NumToCount),
    transpose_pairs(NumToCount, CountToNumSortedR),
    reverse(CountToNumSortedR, CountToNumSorted),
    length(Subset, K),    % list of free vars
    prefix(SolutionsK, CountToNumSorted),
    pairs_values(SolutionsK, Subset).
test output:
?- solve.
[1,3]
true ;
[7,6,2]
true.
Edit: I think the above solution is wrong, in the sense that what it returns might not be a subset of any of the inputs. Here is a (commented) solution without this problem:
:- module(subsets, [solve/0]).

:- [library(pairs),
    library(aggregate),
    library(ordsets)].

solve :-
    problem(R, K, Subsets),
    once(subset_of_maximal_number(R, K, Subsets, Subset)),
    writeln(Subset).

problem(4, 2,
        [[1,2,3], [1,2,3], [1,2,4], [1,3,4]]).
problem(8, 3,
        [[1, 3, 4, 6], [2, 6, 7, 8], [3, 5, 6, 7], [2, 4, 6, 7], [1, 4, 5, 8],
         [2, 4, 6, 8], [1, 2, 3, 8], [1, 6, 7, 8], [1, 2, 4, 7], [1, 2, 5, 7]]).

subset_of_maximal_number(R, K, Subsets, Subset) :-
    flatten(Subsets, Numbers),
    findall(Num-Count,
            ( between(1, R, Num),
              aggregate_all(count, member(Num, Numbers), Count)
            ), NumToCount),
    % actually sort by ascending # of occurrences
    transpose_pairs(NumToCount, CountToNumSorted),
    pairs_values(CountToNumSorted, PreferredRev),
    % we need higher values first
    reverse(PreferredRev, Preferred),
    % empty slots to fill, preferred first
    length(SubsetP, K),
    select_k(Preferred, SubsetP),
    % verify that our selection is an actual subset of one of the input subsets
    sort(SubsetP, Subset),
    once((member(S, Subsets), ord_subtract(Subset, S, []))).

select_k(_Subset, []).
select_k(Subset, [E|R]) :-
    select(E, Subset, WithoutE),
    select_k(WithoutE, R).
test:
?- solve.
[1,3]
true ;
[2,6,7]
true.
I need to obtain the matrix vvT formed from a column vector v, i.e. the column vector v times its transpose.
I found that Mathematica doesn't seem to support column vectors directly. Please help.
Does this do what you want?
v = List /@ Range@5;
vT = Transpose[v];
vvT = v.vT;
v // MatrixForm
vT // MatrixForm
vvT // MatrixForm
To get {1, 2, 3, 4, 5} into {{1}, {2}, {3}, {4}, {5}} you can use any of:
List /@ {1, 2, 3, 4, 5}
{ {1, 2, 3, 4, 5} }\[Transpose]
Partition[{1, 2, 3, 4, 5}, 1]
You may find one of these more convenient than the others. Usually on long lists you will find Partition to be the fastest.
Also, your specific operation can be done in different ways:
x = {1, 2, 3, 4, 5};
Outer[Times, x, x]
Syntactically shortest:
I have a matrix {{2, 1, 2, 2, 1}, {1, 3, 0, 1, 2}, {3, 3, 0, 3, 1}, {1, 1, 2, 1, 1}}, and I want to generate a 3d plot such as there are a total of 4*5=20 bars.
There is a bar of height 2 based at the little square (1, 1) (i.e. the square formed on the x-y plane by the points {{0,0},{0,1},{1,1},{1,0}}),
another bar of height 1 based at the little square (1,2) (i.e. the square formed on the x-y plane by the points {{0,1},{0,2},{1,2},{1,1}}),
...
another bar of height 3 based at the little square (2,2) (i.e. the square formed on the x-y plane by the points {{1,1},{2,1},{2,2},{1,2}})
...
and another bar of height 1 based at the little square (4,5) (i.e. the square formed on the x-y plane by the points {{3,4},{4,4},{4,5},{3,5}})
I cannot find an easy way to do this. Thanks a lot for your help!
What you want is BarChart3D.
Note, this function exists in two incarnations:
There is a BarChart3D in the BarCharts package. This function does what you want out of the box, but is deprecated in Mathematica 7+.
Then there's a BarChart3D in the main namespace (Mathematica 7+ only), which can do what you want as well, but needs to be passed the option ChartLayout -> "Grid" to display the result you want.
Here is some example code for both of these:
Mathematica 6 and prior
<<BarCharts`;
data = {{2, 1, 2, 2, 1}, {1, 3, 0, 1, 2}, {3, 3, 0, 3, 1}, {1, 1, 2, 1, 1}};
BarChart3D[data]
Mathematica 7 and later
data = {{2, 1, 2, 2, 1}, {1, 3, 0, 1, 2}, {3, 3, 0, 3, 1}, {1, 1, 2, 1, 1}};
BarChart3D[data, ChartLayout -> "Grid"]
data = {{2,1,2,2,1}, {1,3,0,1,2}, {3,3,0,3,1}, {1,1,2,1,1}};
BarChart3D[data, ChartLayout -> "Grid", BarSpacing -> 0]
Edit
Updating after the question was made more specific:
BarChart3D[data, ChartLayout -> "Grid", BarSpacing -> {0, 0},
LabelingFunction -> (Row[{#1, Reverse[#2 - 1], Reverse[#2]}] &),
AxesLabel -> {"x", "y", "z"}]
Here both the x- and y-spacings vanish. Placing the cursor on a given bar shows z {x_min, y_min} {x_max, y_max}; on the top red bar, for example: 2 {4, 1} {5, 2}