Determine if a graph contains a triangle? - algorithm

This problem has an easy solution if our target time complexity is O(|V| * |E|), O(|V|^3), or the like. However, my professor recently gave us an assignment with the problem statement being:
Let G = (V, E) be a connected undirected graph. Write an algorithm that determines if G contains a triangle in O(|V| + |E|).
At this point, I'm stumped. Wikipedia says:
It is possible to test whether a graph with m edges is triangle-free in time O(m^1.41).
There was no mention of a faster algorithm, other than one that runs on a quantum computer. I then turned to better sources. A question on Math.SE linked me to this paper that says:
The fastest algorithm known for finding and counting triangles relies on fast matrix product and has an O(n^ω) time complexity, where ω < 2.376 is the fast matrix product exponent.
And that's where I started to realize that maybe, we're being tricked into working on an unsolved problem! That dastardly professor!
However, I'm still a bit skeptical. The paper says "finding and counting". Is that equivalent to the problem I'm trying to solve?
TL;DR: Am I being fooled, or am I overlooking something so trivial?

Well, it turns out this really isn't doable in O(|V| + |E|), or at least no such algorithm is known. I read 4 papers to reach this conclusion. I stopped half-way into one of them, because I realized it was more focused on distributed computing than graph theory. One of them even gave probabilistic algorithms to determine triangle-freeness in "almost linear" time. The three relevant papers are:
Finding and counting given length cycles by Alon, Yuster & Zwick.
Testing Triangle-Freeness in General Graphs by Alon, Kaufman, Krivelevich & Ron.
Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs by Latapy
I wrote about 2 pages of LaTeX for the assignment, quoting the papers with proper citations. The relevant statements in the papers are boxed:
In the end, I spoke to my professor and it turns out, it was in fact an unintended dire mistake. He then changed the required complexity to O(|V| * |E|). I don't blame him, he got me to learn more graph theory!

Here's the code for the O(|E|*|V|) version.
When you constrain |V|, the bit mask intersect-any operation is effectively O(1), which gets you O(|E|), but that's cheating.
Realistically the complexity is O(|E| * (|V| / C)), where C is some architecture-specific constant (e.g. 32, 64, 128).
function hasTriangle(v, e) {
    // assumes a simple graph (no self-loops) with vertices numbered 0..|V|-1
    if (v.length > 32) throw new Error("|V| too big, we can't pretend bit mask intersection is O(1) if |V| is too big!");
    // adjacency bit masks: bit j of adj[i] is set when {i, j} is an edge
    var adj = new Uint32Array(v.length);
    var i, l = e.length;
    // first pass over the edges, O(|E|): record every edge in both directions
    for (i = 0; i < l; i++) {
        adj[e[i][0]] |= (1 << e[i][1]);
        adj[e[i][1]] |= (1 << e[i][0]);
    }
    // second pass, O(|E|): edge {a, b} lies on a triangle iff a and b share a neighbour
    for (i = 0; i < l; i++) {
        // bit mask intersection-any O(1), but unfortunately considered O(|V|)
        if ((adj[e[i][0]] & adj[e[i][1]]) !== 0) {
            return true;
        }
    }
    return false;
}
var V = [0, 1, 2, 3, 4, 5];
var E_no_triangle = [[0, 4], [0, 5], [1, 2], [1, 3], [2, 5]];
var E_triangle = [[0, 1], [0, 2], [0, 3], [1, 4], [2, 1], [2, 3], [4, 5]]; // Triangle(0, 2, 3)
console.log(hasTriangle(V, E_no_triangle)); // false
console.log(hasTriangle(V, E_triangle)); // true
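The |V| / C factor is easier to see in a language whose integers are not limited to one machine word. Below is a minimal Python sketch of the same neighbour-mask idea; it is not a translation of the code above, `has_triangle` is an illustrative name, and it assumes a simple graph (no self-loops) with vertices numbered 0..n-1. Each `&` on a Python integer costs time proportional to its length in words, which is where O(|E| * |V| / C) comes from.

```python
def has_triangle(num_vertices, edges):
    """Return True if the undirected simple graph contains a triangle.

    Assumes vertices are numbered 0..num_vertices-1 and there are no self-loops.
    """
    # adj[i] is an arbitrary-precision bit mask of the neighbours of vertex i
    adj = [0] * num_vertices
    for a, b in edges:
        adj[a] |= 1 << b
        adj[b] |= 1 << a
    # an edge (a, b) closes a triangle exactly when a and b share a neighbour
    return any(adj[a] & adj[b] for a, b in edges)


print(has_triangle(6, [(0, 4), (0, 5), (1, 2), (1, 3), (2, 5)]))
print(has_triangle(6, [(0, 1), (0, 2), (0, 3), (1, 4), (2, 1), (2, 3), (4, 5)]))
```

With the two edge lists from the example above, the first call reports no triangle and the second reports one.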

Related

Frog jumps across a river with stones

This is different from the classic codility Frog-River-One with leaves falling at different times problem.
Problem statement
There is a part where it got cut off: If the monkey can just jump across river, the function returns 0. If it's impossible to jump across river, then -1.
Some test cases include:
[[-1, 5, -1, 5, -1, 10], 3] -> returns 5
[[1, -1, 0, 2, 3, 5], 3] -> returns 2
[[0, 0, 0, 0, 0, 0], 3] -> returns 0
The image has a problem description. I did this in a brute-force way using recursion, and although I believe it returned correct answers, it probably wasn't good enough because it would yield run time of O(n^D).
Is there a way to solve this problem more efficiently? What am I not seeing? I feel that there might be a DP solution or like a simple math trick... I am attaching my solution for reference.
My recursive solution with explanation
Note that the earliest time you can reach x = i can be expressed by the following recurrence relation:
shortest[i] = if A[i] = -1 then +inf
else max(A[i], min{shortest[j] | i - D <= j < i})
So first there is a simple O(ND) solution using only dynamic programming.
This can actually be reduced to O(N + D) by using a double-ended queue to maintain the minimum of shortest over the sliding window [i-D ... i-1].
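A minimal Python sketch of the recurrence with a monotonic deque follows. The problem statement itself is only available as an image, so the setup here is an assumption inferred from the test cases in the question: the jumper starts on the bank just before index 0, must reach the bank just after the last index, and can jump at most D positions. The name `earliest_crossing` is illustrative.

```python
from collections import deque
import math


def earliest_crossing(A, D):
    """Earliest time the far bank is reachable, or -1 if it never is.

    Assumed model: position 0 is the near bank, positions 1..n are the
    stones A[0..n-1], position n+1 is the far bank; A[i] == -1 means the
    stone never appears.
    """
    n = len(A)
    best = [math.inf] * (n + 2)
    best[0] = 0  # standing on the near bank at time 0
    window = deque([0])  # indices whose best values increase from front to back
    for p in range(1, n + 2):
        # drop positions that are more than D behind p
        while window and window[0] < p - D:
            window.popleft()
        reach = best[window[0]] if window else math.inf
        if p == n + 1:
            ready = 0  # the far bank is always there
        elif A[p - 1] == -1:
            ready = math.inf
        else:
            ready = A[p - 1]
        best[p] = max(ready, reach)
        # keep the deque monotonic so its front is always the window minimum
        while window and best[window[-1]] >= best[p]:
            window.pop()
        window.append(p)
    return -1 if best[n + 1] == math.inf else best[n + 1]


print(earliest_crossing([-1, 5, -1, 5, -1, 10], 3))
print(earliest_crossing([1, -1, 0, 2, 3, 5], 3))
print(earliest_crossing([0, 0, 0, 0, 0, 0], 3))
```

Each index enters and leaves the deque at most once, so the loop is linear in the number of positions.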

Maximum sum of n intervals in a sequence

I'm doing some programming "kata" which are skill building exercises for programming (and martial arts). I want to learn how to solve for algorithms like these in shorter amounts of time, so I need to develop my knowledge of the patterns. Eventually I want to solve in increasingly efficient time complexities (O(n), O(n^2), etc), but for now I'm fine with figuring out the solution with any efficiency to start.
The problem:
Given arr[10] = [4, 5, 0, 2, 5, 6, 4, 0, 3, 5]
Given various segment lengths, for example one 3-length segment, and two 2-length segments, find the optimal position of (or maximum sum contained by) the segments without overlapping the segments.
For example, solution to this array and these segments is 2, because:
{4 5} 0 2 {5 6 4} 0 {3 5}
What I have tried before posting on stackoverflow.com:
I've read through:
Algorithm to find maximum coverage of non-overlapping sequences. (I.e., the Weighted Interval Scheduling Prob.)
algorithm to find longest non-overlapping sequences
and I've watched MIT opencourseware and read about general steps for solving complex problems with dynamic programming, and completed a dynamic programming tutorial for finding Fibonacci numbers with memoization. I thought I could apply memoization to this problem, but I haven't found a way yet.
The theme of dynamic programming is to break the problem down into sub-problems which can be iterated to find the optimal solution.
What I have come up with (in an OO way) is
foreach (segment) {
    - find the greatest sum interval with length of this segment
}
This produces incorrect results, because the segments will not always fit with this approach. For example:
Given arr[7] = [0, 3, 5, 5, 5, 1, 0] and two 3-length segments,
The first segment will take 5, 5, 5, leaving no room for the second segment. Ideally I should memoize this scenario and try the algorithm again, this time avoiding 5, 5, 5, as a first pick. Is this the right path?
How can I approach this in a "dynamic programming" way?
If you place the first segment, you get two smaller sub-arrays: placing one or both of the two remaining segments into one of these sub-arrays is a sub-problem of just the same form as the original one.
So this suggests a recursion: you place the first segment, then try out the various combinations of assigning remaining segments to sub-arrays, and maximize over those combinations. Then you memoize: the sub-problems all take an array and a list of segment sizes, just like the original problem.
I'm not sure this is the best algorithm but it is the one suggested by a "direct" dynamic programming approach.
EDIT: In more detail:
The arguments to the valuation function should have two parts: one is a pair of numbers which represent the sub-array being analysed (initially [0,6] in this example) and the second is a multi-set of numbers representing the lengths of the segments to be allocated ({3,3} in this example). Then in pseudo-code you do something like this:
valuation( array_ends, the_segments):
    if the_segments is empty:
        return 0
    if sum of the_segments > array_ends[1] - array_ends[0]:
        return -infinity
    segment_length = length of chosen segment from the_segments
    remaining_segments = the_segments with chosen segment removed
    best_option = 0
    for segment_placement = array_ends[0] to array_ends[1] - segment_length:
        value1 = value of placing the chosen segment at segment_placement
        new_array1 = [array_ends[0],segment_placement]
        new_array2 = [segment_placement + segment_length,array_ends[1]]
        for each partition of remaining segments into seg1 and seg2:
            sub_value1 = valuation( new_array1, seg1)
            sub_value2 = valuation( new_array2, seg2)
            if value1 + sub_value1 + sub_value2 > best_option:
                best_option = value1 + sub_value1 + sub_value2
    return best_option
This code (modulo off by one errors and typos) calculates the valuation but it calls the valuation function more than once with the same arguments. So the idea of the memoization is to cache those results and avoid re-traversing equivalent parts of the tree. So we can do this just by wrapping the valuation function:
memoized_valuation(args):
    if args in memo_dictionary:
        return memo_dictionary[args]
    else:
        result = valuation(args)
        memo_dictionary[args] = result
        return result
Of course, you need to change the recursive call now to call memoized_valuation.
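For reference, here is one way the memoized recursion might look in Python. It is a sketch under a few assumptions that are not in the pseudo-code: sub-arrays are half-open ranges [lo, hi) to avoid the off-by-one issues, segment sums come from a prefix-sum table, and `functools.lru_cache` plays the role of the memo dictionary. `max_segment_sum` is an illustrative name.

```python
from functools import lru_cache


def max_segment_sum(arr, segments):
    """Maximum total covered by non-overlapping segments of the given lengths."""
    # prefix[i] is the sum of arr[:i], so a segment sum is a difference of two entries
    prefix = [0]
    for x in arr:
        prefix.append(prefix[-1] + x)

    @lru_cache(maxsize=None)
    def valuation(lo, hi, segs):
        # segs is a sorted tuple of segment lengths to place inside arr[lo:hi]
        if not segs:
            return 0
        if sum(segs) > hi - lo:
            return float("-inf")
        length, rest = segs[0], segs[1:]
        best = float("-inf")
        for start in range(lo, hi - length + 1):
            placed = prefix[start + length] - prefix[start]
            # try every split of the remaining segments into left and right parts
            for mask in range(1 << len(rest)):
                left = tuple(s for i, s in enumerate(rest) if (mask >> i) & 1)
                right = tuple(s for i, s in enumerate(rest) if not (mask >> i) & 1)
                total = (placed
                         + valuation(lo, start, left)
                         + valuation(start + length, hi, right))
                best = max(best, total)
        return best

    return valuation(0, len(arr), tuple(sorted(segments)))


print(max_segment_sum([0, 3, 5, 5, 5, 1, 0], [3, 3]))
print(max_segment_sum([4, 5, 0, 2, 5, 6, 4, 0, 3, 5], [3, 2, 2]))
```

Sorting the segment lengths before caching means equivalent multi-sets hit the same cache entry. The number of splits is still exponential in the number of segments, so this is only practical for a handful of segments.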

Importance of order of the operation in backtracking algorithms

How important is the order of operations in each recursive step of a backtracking algorithm for the efficiency of that algorithm?
For example, in the Knight's Tour problem:
The knight is placed on the first block of an empty board and, moving
according to the rules of chess, must visit each square exactly once.
In each step there are 8 possible (in general) ways to move.
int xMove[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };
int yMove[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
If I change this order like...
int xmove[8] = { -2, -2, 2, 2, -1, -1, 1, 1};
int ymove[8] = { -1, 1,-1, 1, -2, 2, -2, 2};
Now, for an n*n board with n up to 6, the two move orders show no visible difference in execution time.
But for n >= 7, the first move order runs much faster than the second.
In such cases, it is not feasible to generate all m! orderings of the m operations and test the algorithm on each. So how do I determine how such an algorithm performs with a specific move order, or rather, how can I arrive at one (or a set) of move orders for which the algorithm is more efficient in terms of execution time?
This is an interesting problem from a Math/CS perspective. There definitely exists a permutation (or set of permutations) that would be most efficient for a given n. I don't know if there is a permutation that is most efficient among all n. I would guess not. There could be a permutation that is better 'on average' (however you define that) across all n.
If I was tasked to find an efficient permutation I might try doing the following: I would generate a fixed number x of random move orders and measure their efficiency. For every one of those move orders, randomly create a fixed number of permutations that are near the original and compute their efficiencies. Now you have many more permutations than you started with. Take the top x performers and repeat. This will provide some locally maxed algorithms, but I don't know if it leads up to the globally maxed algorithm(s).
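A rough Python sketch of that search loop is below. Everything in it is an assumption made for illustration: "efficiency" is measured as the number of squares tried by a plain backtracking tour from the corner (capped by `limit` so bad orders cannot run forever), and a "near" permutation is one obtained by swapping two moves. The names `tour_cost`, `neighbour` and `search_orders` are hypothetical.

```python
import random

BASE_MOVES = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]


def tour_cost(order, n, limit=5_000):
    """Squares tried by a backtracking knight's tour from (0, 0); capped at limit."""
    board = [[False] * n for _ in range(n)]
    board[0][0] = True
    steps = 0

    def solve(x, y, visited):
        nonlocal steps
        if visited == n * n:
            return True
        for dx, dy in order:
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and not board[nx][ny]:
                steps += 1
                if steps >= limit:
                    return True  # give up: this order is treated as costing `limit`
                board[nx][ny] = True
                if solve(nx, ny, visited + 1):
                    return True
                board[nx][ny] = False
        return False

    solve(0, 0, 1)
    return steps


def neighbour(order, rng):
    """A permutation near `order`: the same moves with two positions swapped."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return tuple(new)


def search_orders(n, population=4, neighbours=3, rounds=3, seed=0, limit=5_000):
    rng = random.Random(seed)
    pool = [tuple(rng.sample(BASE_MOVES, len(BASE_MOVES))) for _ in range(population)]
    for _ in range(rounds):
        candidates = set(pool)
        for order in pool:
            for _ in range(neighbours):
                candidates.add(neighbour(order, rng))
        # keep the cheapest `population` orders and repeat from them
        pool = sorted(candidates, key=lambda o: tour_cost(o, n, limit))[:population]
    return pool[0]


best = search_orders(5)
print(best, tour_cost(best, 5))
```

As noted above, this only finds locally good orders; nothing here guarantees a global optimum, and the result depends on the seed and on how cost is defined.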

Create expression trees from given sets of numbers and operations and find those that evaluate to a target number in Mathematica 8 or above

Given a set of numbers and a set of binary operations,
what is the fastest way to create random expression trees or exhaustively check every possible combination in Mathematica?
What I am trying to solve is given:
numbers={25,50,75,100,3,6} (* each can ONLY be used ONCE *)
operators={Plus,Subtract,Times,Divide} (* each can be used repeatedly *)
target=99
find expression trees that would evaluate to target.
I have two solutions whose performances I give for the case where expression trees contain exactly 4 of the numbers and 3 operators:
random sample & choice: ~25K trees / second
exhaustive scan: 806400 trees in ~2.15 seconds
(timed on a laptop with: Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz, 3 GB RAM, no parallelization used yet but would be most welcome in answers)
My notebooks are a bit messy at the moment. So I would first love to pose the question and hope for original ideas and answers while I clean up my code for sharing.
Largest possible case is where every expression tree uses up all the (6) numbers and 'Length[numbers]-1' (5) operators.
Performance of methods in the largest case is:
random sample & choice: ~21K trees / second
exhaustive scan: 23224320 trees in ~100 seconds
Also I am using Mathematica 8.0.1 so I am more than all ears if there are any ways to do it in OpenCL or using compiled functions with CompilationTarget->"C", etc.
OK, this is not elegant or fast, and it's buggy, but it works (sometimes). It uses a Monte Carlo method, implementing the Metropolis algorithm for a weight function that I (arbitrarily) selected just to see if this would work. This was some time ago for a similar problem; I suppose my Mathematica skills have improved as it looks ugly now, but I have no time to fix it at the moment.
Execute this (it looks more reasonable when you paste it into a notebook):
ClearAll[swap];
swap[lst_, {p1_, p2_}] :=
ReplacePart[
lst, {p1 \[Rule] lst\[LeftDoubleBracket]p2\[RightDoubleBracket],
p2 \[Rule] lst\[LeftDoubleBracket]p1\[RightDoubleBracket]}]
ClearAll[evalops];
(*first element of opslst is Identity*)
evalops[opslst_, ord_, nums_] :=
Module[{curval}, curval = First@nums;
Do[curval =
opslst\[LeftDoubleBracket]p\[RightDoubleBracket][curval,
nums\[LeftDoubleBracket]ord\[LeftDoubleBracket]p\
\[RightDoubleBracket]\[RightDoubleBracket]], {p, 2, Length@nums}];
curval]
ClearAll[randomizeOrder];
randomizeOrder[ordlst_] :=
swap[ordlst, RandomInteger[{1, Length@ordlst}, 2]]
ClearAll[randomizeOps];
(*never touch the first element*)
randomizeOps[oplst_, allowedOps_] :=
ReplacePart[
oplst, {RandomInteger[{2, Length@oplst}] \[Rule] RandomChoice[allowedOps]}]
ClearAll[takeMCstep];
takeMCstep[goal_, opslst_, ord_, nums_, allowedops_] :=
Module[{curres, newres, newops, neword, p},
curres = evalops[opslst, ord, nums];
newops = randomizeOps[opslst, allowedops];
neword = randomizeOrder[ord];
newres = evalops[newops, neword, nums];
Switch[Abs[newres - goal],
0, {newops,
neword}, _, (p = Abs[curres - goal]/Abs[newres - goal];
If[RandomReal[] < p, {newops, neword}, {opslst, ord}])]]
then to solve your actual problem, do
ops = {Times, Plus, Subtract, Divide}
nums = {25, 50, 75, 100, 3, 6}
ord = Range[Length@nums]
(*the first element is identity to simplify the logic later*)
oplist = {Identity}~Join~RandomChoice[ops, Length@nums - 1]
out = NestList[
takeMCstep[
99, #\[LeftDoubleBracket]1\[RightDoubleBracket], #\
\[LeftDoubleBracket]2\[RightDoubleBracket], nums, ops] &, {oplist,
ord}, 10000]
and then to see that it worked,
ev = Map[evalops[#\[LeftDoubleBracket]1\[RightDoubleBracket], #\
\[LeftDoubleBracket]2\[RightDoubleBracket], nums] &, out];
ev // Last // N
ev // ListPlot[#, PlotMarkers \[Rule] None] &
giving
thus, it obtained the correct order of operators and numbers after around 2000 tries.
As I said, it's ugly, inefficient, and badly programmed as it was a quick-and-dirty adaptation of a quick-and-dirty hack. If you're interested I can clean up and explain the code.
This was a fun question. Here's my full solution:
ExprEval[nums_, ops_] := Fold[
  #2[[1]][#1, #2[[2]]] &,
  First@nums,
  Transpose[{ops, Rest@nums}]]
SymbolicEval[nums_, ops_] := ExprEval[nums, ToString /@ ops]
GetExpression[nums_, ops_, target_] := Select[
  Tuples[ops, Length@nums - 1],
  (target == ExprEval[nums, #]) &]
Usage example:
nums = {-1, 1, 2, 3};
ops = {Plus, Subtract, Times, Divide};
solutions = GetExpression[nums, ops, 3]
ExprEval[nums, #] & /@ solutions
SymbolicEval[nums, #] & /@ solutions
Outputs:
{{Plus, Times, Plus}, {Plus, Divide, Plus}, {Subtract, Plus,
Plus}, {Times, Plus, Times}, {Divide, Plus, Times}}
{3, 3, 3, 3, 3}
{"Plus"["Times"["Plus"[-1, 1], 2], 3],
"Plus"["Divide"["Plus"[-1, 1], 2], 3],
"Plus"["Plus"["Subtract"[-1, 1], 2], 3],
"Times"["Plus"["Times"[-1, 1], 2], 3],
"Times"["Plus"["Divide"[-1, 1], 2], 3]}
How it works
The ExprEval function takes in the numbers and operations, and applies them strictly left to right (a left fold that ignores the usual operator precedence, rather than RPN):
ExprEval[{1, 2, 3}, {Plus, Times}] == (1 + 2) * 3
It does this by continually folding pairs of numbers using the appropriate operation.
Now that I have a way to evaluate an expression tree, I just needed to generate them. Using Tuples, I'm able to generate all the different operators that I would intersperse between the numbers.
Once I had all possible operator sequences, I used Select to pick out the ones that evaluate to the target number.
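For readers without Mathematica, the same fold-and-select idea can be sketched in Python. This is an illustrative translation, not the original code: `expr_eval` and `get_expressions` are made-up names, `Fraction` is used so that division stays exact, and sequences that divide by zero are simply skipped.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
import operator

OPS = {
    "Plus": operator.add,
    "Subtract": operator.sub,
    "Times": operator.mul,
    "Divide": operator.truediv,
}


def expr_eval(nums, op_names):
    """Left fold: ((n0 op1 n1) op2 n2) ..., ignoring operator precedence."""
    def step(acc, pair):
        name, x = pair
        return OPS[name](acc, Fraction(x))
    return reduce(step, zip(op_names, nums[1:]), Fraction(nums[0]))


def get_expressions(nums, target, op_names=tuple(OPS)):
    """All operator sequences that make the fixed number order evaluate to target."""
    hits = []
    for combo in product(op_names, repeat=len(nums) - 1):
        try:
            if expr_eval(nums, combo) == target:
                hits.append(combo)
        except ZeroDivisionError:
            continue  # this sequence divides by zero somewhere; skip it
    return hits


for combo in get_expressions([-1, 1, 2, 3], 3):
    print(combo)
```

On the usage example above ({-1, 1, 2, 3} with target 3) this finds the same five operator sequences as the Mathematica output.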
Drawbacks
The solution above is really slow. Generating all the possible tuples is exponential in time. If there are k operations and n numbers, it's on the order of O(k^n).
For n = 10, it took 6 seconds to complete on Win 7 x64, Core i7 860, 12 GB RAM. The timings of the runs match the theoretical time complexity almost exactly:
Red line is the theoretical, blue is experimental. The x-axis is size of the nums input and the y-axis is the time in seconds to enumerate all solutions.
The above solution also solves the problem using a functional programming style. It looks pretty, but the thing also sucks up a butt ton of memory since it's storing the full results at nearly every step.
It doesn't even make use of parallelization, and I'm not entirely certain how you would even parallelize the solution I produced.
Some limitations
Mr. Wizard brought to my attention that this code only finds a particular subset of the solutions. Given some input such as {a, b, c, d, e, ... } it only permutes the operators in between the numbers. It doesn't permute the ordering of the numbers. If it were to permute the numbers as well, the time complexity would rise to O(k^n * n!), where k is the number of operators and n is the length of the input number array.
The following will produce the set of solutions for any permutation of the input numbers and operators:
(* generates a list of the form
{
  {number permutation, {{op order 1}, {op order 2}, ... }
  }, ...
}*)
GetAllExpressions[nums_, ops_, target_] :=
  ParallelMap[{#, GetExpression[#, ops, target]} &,
    Permutations[nums]]

For every vertex in a graph, find all vertices within a distance d

In my particular case, the graph is represented as an adjacency list and is undirected and sparse, n can be in the millions, and d is 3. Calculating A^d (where A is the adjacency matrix) and picking out the non-zero entries works, but I'd like something that doesn't involve matrix multiplication. A breadth-first search on every vertex is also an option, but it is slow.
def find_d(graph, start, st, d=0):
    # st accumulates every vertex reached so far
    st.add(start)
    if d > 0:
        for edge in graph[start]:
            find_d(graph, edge, st, d-1)
    return st

graph = { 1 : [2, 3],
          2 : [1, 4, 5, 6],
          3 : [1, 4],
          4 : [2, 3, 5],
          5 : [2, 4, 6],
          6 : [2, 5]
        }
print(find_d(graph, 1, set(), 2))
Let's say that we have a function verticesWithin(d,x) that finds all vertices within distance d of vertex x.
One good strategy for a problem such as this, to expose caching/memoisation opportunities, is to ask the question: How are the subproblems of this problem related to each other?
In this case, we can see that verticesWithin(d,x) for d >= 1 is the union of verticesWithin(d-1,y[i]) for all i within range, where y=verticesWithin(1,x). If d == 0 then it's simply {x}. (I'm assuming that a vertex is deemed to be of distance 0 from itself.)
In practice you'll want to look at the adjacency list for the case d == 1, rather than using that relation, to avoid an infinite loop. You'll also want to avoid the redundancy of considering x itself as a member of y.
Also, if the return type of verticesWithin(d,x) is changed from a simple list or set, to a list of d sets representing increasing distance from x, then
verticesWithin(d,x) = init(verticesWithin(d+1,x))
where init is the function that yields all elements of a list except the last one. Obviously this would be a non-terminating recursive relation if transcribed literally into code, so you have to be a little bit clever about how you implement it.
Equipped with these relations between the subproblems, we can now cache the results of verticesWithin, and use these cached results to avoid performing redundant traversals (albeit at the cost of performing some set operations - I'm not entirely sure that this is a win). I'll leave it as an exercise to fill in the implementation details.
You already mention the option of calculating A^d, but this is much, much more than you need (as you already remark).
There is, however, a much cheaper way of using this idea. Suppose you have a (column) vector v of zeros and ones, representing a set of vertices. The vector w := A v now has a positive entry at every node that can be reached from that set in exactly one step. Iterating, u := A w has a positive entry for every node you can reach in exactly two steps, etc.
For d=3, you could do the following (MATLAB pseudo-code):
v = j'th unit vector
w = v
for i = (1:d)
    v = A*v
    w = w + v
end
the vector w now has a positive entry for each node that can be accessed from the jth node in at most d steps.
Breadth-first search starting from the given vertex is an optimal solution in this case. You will find all the vertices that are within distance d, and you will never even visit any vertices at distance >= d + 2.
Here is recursive code, although recursion can be easily done away with if so desired by using a queue.
// Returns the Set of nodes within distance d of x (x itself included)
Set<Node> getNodesWithinDist(Node x, int d)
{
    Set<Node> s = new HashSet<Node>(); // our return value
    s.add(x); // x is at distance 0 from itself
    if (d > 0) {
        for (Node y: adjList(x)) {
            s.addAll(getNodesWithinDist(y, d-1));
        }
    }
    return s;
}
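The queue-based variant mentioned above could look roughly like this in Python. It is a sketch that assumes the graph is a dict mapping each vertex to its adjacency list, as in the question; `nodes_within` is an illustrative name. Unlike the recursive version, it visits every vertex at most once, because each vertex is recorded with its distance the first time it is seen.

```python
from collections import deque


def nodes_within(graph, start, d):
    """Set of vertices at distance <= d from start, found by a depth-limited BFS."""
    dist = {start: 0}  # vertex -> shortest distance from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue  # neighbours of this node would be too far away
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


graph = {1: [2, 3],
         2: [1, 4, 5, 6],
         3: [1, 4],
         4: [2, 3, 5],
         5: [2, 4, 6],
         6: [2, 5]}
print(nodes_within(graph, 1, 1))
print(nodes_within(graph, 1, 2))
```

Running this once per vertex gives the full answer; for a sparse graph and d = 3 each run only touches the small neighbourhood around its start vertex.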

Resources