I have a question which says
Let a directed graph G model a relation R. If G is represented using adjacency lists, then,
the worst case time complexity for finding whether R is symmetric would be (as a
function of |R| ) _____?
As I see it, to check whether the symmetric property holds for a pair we just need to go to both vertices and check that each appears in the other's adjacency list.
So the worst case would be O(degree(vertex)), but the given answer is O(|R|*|R|), i.e. O(|E|*|E|).
How?
What do you mean by O(degree(vertex))?
Besides, consider that EVERY pair in the relation must be checked for a reverse pair. So for each of the |R| pairs in the relation there is going to be a search. The search itself, if it does not use any tricks, visits up to |R| pairs, so it does make sense that the worst case is |R|*|R|.
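The argument above can be sketched in Python. This is only an illustration, assuming the graph is given as a dict mapping each vertex to a list of its successors; the function names are made up for this example. The first function does the plain scan described above, and the second shows one possible "trick" (a hash set of pairs) that avoids the inner scan.

```python
def is_symmetric_naive(adj):
    """Check symmetry by scanning the adjacency list of v for every edge (u, v)."""
    for u, neighbours in adj.items():
        for v in neighbours:
            # Linear scan of v's list: up to |R| steps per edge in the worst case.
            if u not in adj.get(v, []):
                return False
    return True


def is_symmetric_hashed(adj):
    """Same check with a hash set of all pairs: expected O(|R|) overall."""
    pairs = {(u, v) for u, neighbours in adj.items() for v in neighbours}
    return all((v, u) in pairs for (u, v) in pairs)
```

The naive version performs one list scan per pair in R, which is where the |R|*|R| bound comes from.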
Let's say I have N objects and each of them has associated values A and B. This could be represented as a list of tuples like:
[(3,10), (8,4), (0,0), (20,7),...]
where each tuple is an object and the two values are A and B.
What I want to do is select M of these objects (where M < N) such that the sums of A and B in the selected subset are as balanced as possible. M here is a parameter of the problem; I don't want to find the optimal M. I want to be able to say "give me 100 objects, and make them as balanced as possible".
Any idea if there is an efficient algorithm which can solve this problem (not necessarily completely optimally)? I think this might be related to bin-packing, but I'm not really sure.
This is a disguised variant of subset-sum. Replace each (A,B) by A-B, and then the absolute value of the sum of all selected A-B values is the "unbalancedness" of the sums. So you really have to select M of those scalars and try to have a sum as close to 0 as possible.
The "variant" bit is because you have to select exactly M items. (I think this is why your mind went to bin-packing rather than subset-sum.) If you have a black-box subset-sum solver you can account for this too: if the maximum single-pair absolute difference is D, replace each (A,B) by (A-B+D) and have the target sum be M*D. (Don't do that if you're doing a dynamic programming approach, of course, since it increases the magnitude of the numbers you're working with.)
Presuming that you're fine with an approximation (and if you're not, you're gonna have a real bad day) I would tend to use Simulated Annealing or Late Acceptance Hill Climbing as a basic approach, starting with a greedy initial solution (iteratively add whichever object results in the minimal difference), and then in each round, considering randomly replacing one object by one not-currently-selected object.
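A minimal sketch of that approach in Python follows. It uses the greedy start described above, then plain hill climbing with random swaps (a simpler stand-in for Simulated Annealing or Late Acceptance Hill Climbing, not either of those). The function names, the `rounds` budget and the fixed `seed` are assumptions made for this example.

```python
import random


def imbalance(items, chosen):
    """Absolute difference between the sum of A and the sum of B over the chosen indices."""
    return abs(sum(items[i][0] - items[i][1] for i in chosen))


def select_balanced(items, m, rounds=10_000, seed=0):
    """Pick m indices whose A-sum and B-sum are close (heuristic, not guaranteed optimal)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in items]

    # Greedy start: repeatedly add the object that keeps |sum of (A - B)| smallest.
    chosen, rest, total = [], list(range(len(items))), 0
    for _ in range(m):
        best = min(rest, key=lambda i: abs(total + diffs[i]))
        rest.remove(best)
        chosen.append(best)
        total += diffs[best]

    # Local search: try swapping one chosen object for one unchosen object.
    for _ in range(rounds):
        if total == 0 or not rest:
            break
        ci, ri = rng.randrange(len(chosen)), rng.randrange(len(rest))
        new_total = total - diffs[chosen[ci]] + diffs[rest[ri]]
        if abs(new_total) <= abs(total):  # keep the swap only if it is no worse
            chosen[ci], rest[ri] = rest[ri], chosen[ci]
            total = new_total
    return chosen
```

Calling `select_balanced(items, 100)` on the list of tuples returns 100 distinct indices; `imbalance(items, chosen)` reports how far apart the two sums ended up.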
I have a graph and a heuristic table: a list of connections with their costs, and a heuristic value for each node.
Graph:
Heuristic table:
They're represented in prolog as follows.
s(a,b,2).
s(a,c,1).
s(b,e,4).
s(b,g,2).
s(c,d,1).
s(c,x,3).
s(x,g,1).
h(a,9).
h(b,3).
h(c,2).
h(d,8).
h(e,4).
h(g,0).
h(x,2).
My question: how do I perform a greedy search that uses the heuristic values, e.g. h(a,9), to find the next node at each iteration?
I know how to use DFS to find a path and store that path in a list. I don't know how to take the heuristic value at each node into account in a greedy search. I do know that it expands the node with the lowest h value to find its next neighbor.
DFS:
depthfirstsearch(GoalN, Path, [GoalN|Path]) :- goal(GoalN).
depthfirstsearch(Node, Path, Solution) :-
    s(Node, NextNode, _),
    \+ member(NextNode, Path),   % do not revisit nodes already on the path
    depthfirstsearch(NextNode, [Node|Path], Solution).
I've searched SO and the web and found notes on how greedy search works, but nothing that explains how it actually works using Prolog code. They explain how to do this in Java, C++, etc., but I'm not using those languages.
Can anyone point me in the right direction? I read somewhere that findall could be used, but how do I combine the heuristic value with the node? For example, node b is reached from a at a cost of 2. Do I replace the cost of 2 with the heuristic value of 3? Please explain in more detail or direct me to another resource that would be helpful.
Could I maybe create a predicate to help find the next node at each iteration (using the heuristic value, of course)?
Bear in mind I'm a beginner at Prolog; I'm trying things out but struggling to piece it all together.
Update: This link is where I found most of the info on searches
I think you need to know what a heuristic value is and how it can be used in your search algorithm.
In my answer:
s is the source node and n is any node of the graph.
h() is the heuristic function. h(n) is an estimate of the cost of reaching the goal from n. A heuristic is called admissible only if it never over-estimates that cost.
w(a,b) is the actual cost of going from a to b.
g() is the function giving the actual cost of reaching node n from s, obtained by summing w(a,b) over consecutive nodes a and b on the path from s to n.
Now to answer your question, AFAIK you can use this heuristic in 2 ways:
1. As you have said, you can ignore the w(a,b) values and use the h(b) values to sort the successor nodes (where b is any successor node). This is called the greedy best-first search algorithm.
2. Alternatively, you can sort the successor nodes by the value h(b) + g(b) (where b is any successor node). This is called the A* search algorithm.
Recommended Reading:
Stuart J. Russell and Peter Norvig, "Artificial Intelligence: A Modern Approach", Third Edition.
Ivan Bratko, "Prolog Programming for Artificial Intelligence", Pearson Education, August 2011.
P.S. findall is the right thing to use in prolog to implement these 2 searches.
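To make the difference between the two orderings concrete, here is a rough sketch in Python rather than Prolog (it would still need translating, e.g. with findall plus a sort over the successors). The `EDGES` and `H` tables copy the s/3 and h/2 facts from the question; the function name, the `use_path_cost` flag and the choice of g as the goal are assumptions made for this example.

```python
import heapq

EDGES = {  # the s/3 facts: node -> [(successor, step cost)]
    "a": [("b", 2), ("c", 1)],
    "b": [("e", 4), ("g", 2)],
    "c": [("d", 1), ("x", 3)],
    "x": [("g", 1)],
}
H = {"a": 9, "b": 3, "c": 2, "d": 8, "e": 4, "g": 0, "x": 2}  # the h/2 facts


def best_first(start, goal, use_path_cost=False):
    """Greedy best-first search; with use_path_cost=True it orders by g + h (A*)."""
    frontier = [(H[start], 0, [start])]  # (priority, cost so far, path)
    visited = set()
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, cost
        if node in visited:
            continue
        visited.add(node)
        for nxt, step in EDGES.get(node, []):
            if nxt not in visited:
                new_cost = cost + step
                # Greedy looks at h only; A* adds the actual cost so far.
                priority = H[nxt] + (new_cost if use_path_cost else 0)
                heapq.heappush(frontier, (priority, new_cost, path + [nxt]))
    return None
```

On these facts, `best_first("a", "g")` follows a, c, x, g with a total cost of 5, while `best_first("a", "g", use_path_cost=True)` finds a, b, g with a total cost of 4.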
I'm sure there is an abundance of information on how to do exactly what I'm after, but it's a matter of not knowing the technical term for it. Basically what I want to create is an adjacency matrix for a directed graph, however rather than simply storing whether or not each vertex pair has a direct adjacency, for every vertex pair in the matrix I want to store if there is ANY path connecting the two (and what those paths are).
This would give me constant time complexity for lookups which is desirable, however what's not immediately clear to me is what the expected optimal time complexity of building this matrix will be.
Also, is there a formal name for such a matrix?
Playing this out in my head, it seems like a dynamic programming problem. If I want to know if A is connected to Z, I should be able to ask each of A's neighbors, B, C and D if they are (in some way) connected to Z, and if so, then I know A is. And if B doesn't have this answer stored, then he would ask the same question of his direct neighbors, and so on. I would memoize the results along the way, so subsequent lookups would be constant.
I haven't spent time to implement this yet, because it feels like ϴ(n^n) to build a complete matrix, so my question is whether or not I'm going about this the right way, and if indeed there is a lower-cost way to build such a matrix?
The transitive closure of a graph (https://en.wikipedia.org/wiki/Transitive_closure#In_graph_theory) can indeed be computed by dynamic programming with a variation of the Floyd-Warshall algorithm (https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm), which takes O(|V|^3) time.
Running a DFS (or BFS) from each of the |V| vertices takes O(|V| * (|V| + |E|)) time, which is more efficient on sparse graphs.
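The one-search-per-vertex idea can be sketched as follows, assuming the directed graph is a dict that maps every vertex to a list of its successors (the function name is made up for this example). It records only reachability, not the paths themselves.

```python
from collections import deque


def transitive_closure(adj):
    """Return {u: set of vertices reachable from u by a path of one or more edges}."""
    closure = {}
    for source in adj:
        seen = set()
        queue = deque(adj[source])  # start from the direct successors
        while queue:
            v = queue.popleft()
            if v in seen:
                continue
            seen.add(v)
            queue.extend(adj.get(v, []))
        closure[source] = seen
    return closure
```

After building `closure`, the test `z in closure[a]` answers "is there any path from a to z" in expected constant time.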
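Using networkx connected components:
    import networkx as nx

    G = nx.path_graph(4)
    nx.add_path(G, [10, 11, 12])

    # Map every node to the index of its connected component.
    d = {}
    for idx, group in enumerate(nx.connected_components(G)):
        for node in group:
            d[node] = idx

    def connected(node1, node2):
        # Two nodes are connected iff they lie in the same component.
        return d[node1] == d[node2]
Building the dictionary should be O(V + E); each lookup should be O(1). Note that connected_components only works on undirected graphs.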
This is the question [From CLRS]:
Define the optimization problem LONGEST-PATH-LENGTH as the relation that
associates each instance of an undirected graph and two vertices with the number
of edges in a longest simple path between the two vertices. Define the decision
problem LONGEST-PATH = {⟨G, u, v, k⟩ : G=(V,E) is an undirected
graph, u,v contained in V, k >= 0 is an integer, and there exists a simple path
from u to v in G consisting of at least k edges}. Show that the optimization problem
LONGEST-PATH-LENGTH can be solved in polynomial time if and only if
LONGEST-PATH is contained in P.
My solution:
Given an algorithm A that can solve G(u,v) in polytime, we run A on G(u,v). If it returns "YES" and k' such that k' is the length of the longest path in G(u,v), all we have to do is check whether
k =< k'
If so, the longest-path problem is solved. If we receive "NO" or k>=k', then there exists no solution.
So it takes polytime to run A plus constant time for the comparison, and finding the longest path length takes polytime. Also, this is only possible since G(u,v) runs in polytime (in P); thus G(u,v,k) also runs in polytime (in P). Therefore, since longest path can be reduced to longest-path-length, longest-path-length is in P.
We can solve it the opposite way: run G(u,v,k') for k'=0 to n, and each time check whether k==k'; if so, we have solved it.
Run-time analysis for this:
n*polytime + n*(constant comparison) = polytime
Can someone tell me if my answer is reasonable? If not, please tell me where I've gone wrong.
Also, can you give me some advice on how to study algorithms, and on what approach I should take to solve an algorithm question (or a graph question)?
Please and thank you.
Your answer is reasonable, but I would try to shore it up a little formally (present the two cases separately and clearly, be more precise about what polynomial time means, that kind of thing).
The only thing I would like to point out is that in your second reduction (showing that the decision problem solves the optimization problem) the "for k=0 to n" solution is not general. Polynomial time is measured relative to the length of the input, so in problems where n is a general number (such as a weight) rather than a count of items in the input (as in this case), you need to use binary search to be sure.
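The binary-search variant of the second reduction might look like the sketch below. The decision procedure here is a brute-force stand-in (exponential time, usable only on tiny graphs) so that the example runs; in the actual argument it would be the assumed polynomial-time decider. The function names are hypothetical, the graph is assumed to be a dict mapping each vertex to the set of its neighbours, and u and v are assumed to be distinct.

```python
from itertools import permutations


def longest_path_decision(adj, u, v, k):
    """Brute-force stand-in for a LONGEST-PATH decider (exponential, tiny graphs only)."""
    others = [x for x in adj if x not in (u, v)]
    for r in range(len(others) + 1):
        for middle in permutations(others, r):
            path = [u, *middle, v]
            if len(path) - 1 >= k and all(b in adj[a] for a, b in zip(path, path[1:])):
                return True
    return False


def longest_path_length(adj, u, v, decide=longest_path_decision):
    """Recover the optimum with O(log |V|) calls to the decision procedure."""
    if not decide(adj, u, v, 0):
        return None  # no simple path from u to v at all
    lo, hi = 0, len(adj) - 1  # a simple path has at most |V| - 1 edges
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if decide(adj, u, v, mid):
            lo = mid       # a path with at least mid edges exists
        else:
            hi = mid - 1   # every simple path is shorter than mid
    return lo
```

The search is valid because the answers are monotone in k: if a simple path with at least k edges exists, then one with at least k - 1 edges exists too.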
A and B are sets of N-dimensional vectors (N=10), |B|>=|A| (|A|=10^2, |B|=10^5). The similarity measure sim(a,b) is the dot product (required). The task is the following: for each vector a in A, find a vector b in B such that the sum of similarities ss over all pairs is maximal.
My first attempt was a greedy algorithm:
1. find the pair with the highest similarity and remove that pair from A, B
2. repeat (1) until A is empty
But such greedy algorithm is suboptimal in this case:
a_1=[1, 0]
a_2=[.5, .4]
b_1=[1, 1]
b_2=[.9, 0]
sim(a_1,b_1)=1
sim(a_1,b_2)=.9
sim(a_2,b_1)=.9
sim(a_2, b_2)=.45
Algorithm returns [a_1,b_1] and [a_2, b_2], ss=1.45, but optimal solution yields ss=1.8.
Is there an efficient algorithm to solve this problem? Thanks
This is essentially a matching problem in a weighted bipartite graph. Just assume that the weight function f is the dot product (|ab|).
I don't think the special structure of your weight function will simplify the problem much, so you're pretty much down to finding a maximum-weight matching.
You can find some basic algorithms for this problem in this Wikipedia article. Although at first glance they don't seem viable for your data (V = 10^5, E = 10^7), I would still research them: some of them might allow you to take advantage of your lopsided set of vertices, with one part orders of magnitude smaller than the other.
This article also seems relevant, although it doesn't list any algorithms.
Not exactly a solution, but hope it helps.
I second Nikita here: it is an assignment (or matching) problem. I'm not sure this is computationally feasible for your problem, but you could use the Hungarian algorithm, also known as Munkres' assignment algorithm, where the cost of assignment (i,j) is the negative of the dot product of a_i and b_j. Unless you happen to know how the elements of A and B are formed, I think this is the most efficient known algorithm for your problem.
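The toy example from the question can be checked with a few lines of Python. This sketch is not the Hungarian algorithm: the "optimal" function simply tries every assignment, which is only workable for tiny inputs, and both function names are made up for this example. It does show the gap between the greedy result and the true optimum.

```python
from itertools import permutations


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def greedy_assignment(A, B):
    """Repeatedly take the most similar remaining pair (the approach from the question)."""
    free_a, free_b, total = set(range(len(A))), set(range(len(B))), 0.0
    while free_a:
        i, j = max(((i, j) for i in free_a for j in free_b),
                   key=lambda p: dot(A[p[0]], B[p[1]]))
        total += dot(A[i], B[j])
        free_a.remove(i)
        free_b.remove(j)
    return total


def optimal_assignment(A, B):
    """Try every injective assignment of A into B; exponential, for tiny inputs only."""
    return max(sum(dot(a, B[j]) for a, j in zip(A, perm))
               for perm in permutations(range(len(B)), len(A)))
```

With A = [[1, 0], [.5, .4]] and B = [[1, 1], [.9, 0]], the greedy total is about 1.45 and the exhaustive optimum is about 1.8, matching the numbers in the question.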