Would this scenario classify as "not very dense graph"? - data-structures

Based on this information: 181,047 vertices and 907,601 edges, does this classify as "not very dense"? (In French we say "not very dense", but I am not sure of the usual term in English.)
Let me explain why I am asking this.
I currently have a shortest-path algorithm implemented with Dijkstra's algorithm.
I have to optimize it, and I found that Dijkstra with a min-heap beats plain Dijkstra by far (at least for my problem, since I tried both).
But one of the properties required to justify Dijkstra with a min-heap is that the graph must not be very dense, and I am confused about what that means; I have seen a lot of definitions but nothing answers my question.
I also saw this definition somewhere: you can use Dijkstra with a min-heap if E is sufficiently small compared to V (as in E << V² / log V). By this definition it would mean that I can't use Dijkstra with a min-heap, but I did, and got a much faster algorithm, so I am confused.
But "E << V²" doesn't tell me anything. (What does that mean?)
So I would like someone to tell me, based on the number of edges and vertices, whether the graph qualifies as "not so dense"; if not, is it dense or sparse, and why?
I would also like to know why we can use Dijkstra with a min-heap, based on the number of edges and vertices.
Thanks a lot.
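For reference, here is a quick back-of-the-envelope check of the E << V² / log V condition with these numbers (a sketch in Python, taking the logarithm as base 2):

from math import log2

V, E = 181_047, 907_601

max_edges = V * (V - 1) // 2       # about 1.6e10 possible undirected edges
density = E / max_edges            # about 5.5e-5, i.e. very few of the possible edges exist
threshold = V * V / log2(V)        # about 1.9e9, the E << V^2 / log V bound

print(f"density = {density:.2e}, E < V^2/log V: {E < threshold}")

With these numbers E sits roughly three orders of magnitude below V² / log V, so the graph counts as sparse, and the O((V + E) log V) binary-heap version of Dijkstra is expected to beat the O(V²) array version.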

Related

How to find the size of maximal clique or clique number?

Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size (see https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges), and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
Finding the largest clique in a graph is the clique number of the graph and is also known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-Hard so no polynomial time algorithm is expected to be found to solve it in the general case (there are particular graph configurations which do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. find a number close to the clique number).
If you are interested in exact MCP algorithms, there have been a number of important improvements in the past decade, which have increased performance by around two orders of magnitude. The current leading family of algorithms is branch and bound, using approximate coloring to compute bounds. I name the most important improvements and the algorithms that introduced them:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring (MCS)
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First: find nodes with very low degree (can be done in O(n)). Test whether they are part of any clique, and then remove them. With a little "luck" you can split the graph into a few separate components and then solve each one independently (which is much, much faster).
(Identifying the components takes O(n) time.)
Second: for each component, you can check whether it even makes sense to look for a clique of a given size. How? Let's say you want to find a clique of size 19. Then there have to be at least 19 nodes with degree at least 18; otherwise such a clique cannot exist and you don't have to test for it.
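A minimal sketch of that second check (hypothetical names; degrees is assumed to map each vertex of the component to its degree):

def clique_of_size_possible(degrees, k):
    # Cheap necessary condition: a clique of size k needs at least k
    # vertices whose degree is at least k - 1.
    return sum(1 for d in degrees.values() if d >= k - 1) >= k

# Example: the largest k that is even worth searching for.
# upper_bound = max(k for k in range(1, len(degrees) + 1)
#                   if clique_of_size_possible(degrees, k))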

Back Tracking Vs. Greedy Algorithm Maximum Independent Set

I implemented a maximum independent set solver using both a greedy algorithm and a backtracking algorithm.
The backtracking algorithm is as follows:
MIS(G = (V, E): a graph): largest set of independent vertices
if |V| = 0
    then return ∅
end if
if |V| = 1
    then return V
end if
pick u ∈ V
Gout ← G − {u}              {remove u from V and E}
Gin  ← G − {u} − N(u)       {N(u) are the neighbors of u}
Sout ← MIS(Gout)
Sin  ← MIS(Gin) ∪ {u}
return maxsize(Sout, Sin)   {return Sin if there is a tie; there is a reason for this}
The greedy algorithm is to iteratively pick the node with the smallest degree, place it in the MIS and then remove it and its neighbors from G.
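A minimal sketch of that greedy rule, assuming the graph is given as a dict mapping each vertex to the set of its neighbors (hypothetical representation):

def greedy_mis(adj):
    # Repeatedly take a minimum-degree vertex, put it in the independent
    # set, then delete it and its neighbors from the remaining graph.
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    independent = set()
    while adj:
        u = min(adj, key=lambda v: len(adj[v]))   # smallest remaining degree
        independent.add(u)
        removed = adj[u] | {u}
        for v in removed:
            adj.pop(v, None)
        for ns in adj.values():
            ns -= removed
    return independent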
After running both algorithms on graphs of varying size where the probability of an edge existing is 0.5, I have empirically found that the backtracking algorithm always found a smaller maximum independent set than the greedy algorithm. Is this expected?
Your solution is strange. Backtracking is usually used for yes/no problems, not optimization. The algorithm you wrote depends heavily on how you pick u. And it definitely is not backtracking, because you never backtrack.
Such a problem can be solved in a number of ways, e.g.:
genetic programming,
exhaustive searching,
solving the problem on the complement graph (the maximum clique problem).
According to Wikipedia, this is a NP-hard problem:
A maximum independent set is an independent set of the largest possible size for a given graph G.
This size is called the independence number of G, and denoted α(G).
The problem of finding such a set is called the maximum independent set problem and is an NP-hard optimization problem.
As such, it is unlikely that there exists an efficient algorithm for finding a maximum independent set of a graph.
So, to find the maximum independent set of a graph, you have to test all available states (with an algorithm whose time complexity is exponential). The faster algorithms (greedy, genetic, or randomized ones) cannot guarantee the exact answer: they are guaranteed to find a maximal independent set, but not a maximum one.
In conclusion, your backtracking approach is slower but exact, while the greedy approach is only an approximation algorithm.

Maximum simple path with budget on edges

This is part of a self-formulated question, so I have not been able to "Google" it, and my own attempts have been futile so far.
You are given a graph G(V, E); each node of V has a profit wi and each edge of E has a cost ci. Given a budget C, the task is to find a single path such that the sum of the edge costs is at most C and the sum of the wi is maximum. "Path" has the normal definition here, i.e. it may not contain repeated vertices (a simple path).
It is obvious that Hamiltonian path is a special case of this (set the cost of each edge to 1 and the budget C = |V| − 1), and hence this is an NP-hard problem, so I am looking for approximation solutions and heuristics.
Mathematically:
Given a graph G(V, E) with
    c_i >= 0 for each edge e_i,
    w_i >= 0 for each vertex v_i,
find a simple path P such that
    Sum of c_i over all edges e_i in P <= C,
maximising
    Sum of w_i over all vertices v_i in P.
This is known as the Selective Travelling Salesman Problem, or Travelling Salesman with profits. Google Scholar should be able to give you some references. Metaheuristics such as genetic programming or tabu search are often used. If you want to solve the problem optimally, linear programming techniques would probably work (unfortunately, you don't state the size of the instances you're dealing with). If the length of the path is small (say 15 vertices), also color-coding might work.
One simple heuristic that comes to mind is a variation of stochastic hill climbing combined with a greedy algorithm.
Define a value function that is increasing in the weight and decreasing in the cost. For example:
value(u,v) = w(v) / [c(u,v) + epsilon]
(+ epsilon for the case of c(u,v) = 0)
Now, the idea is:
From a vertex u, proceed to vertex v with probability:
P(v|u) = value(u,v) / sum of value(u,x) over all feasible moves (u,x)
Repeat until you cannot continue.
This will give you one solution quickly, but it is probably not near-optimal. However, since it is stochastic, you can re-run it again and again while you have time.
This gives you an anytime algorithm for the problem, meaning the more time you have, the better your solution gets.
Some optimizations:
You can try to learn macros to accelerate each search, which will allow more searches in the same amount of time and probably better solutions.
Usually the first search is not stochastic but purely greedy, following max{value(u,v)}.
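A minimal sketch of one such stochastic walk, assuming the graph is a dict of neighbor sets, profit maps each vertex to w(v), and cost maps each ordered pair to c(u, v) (all hypothetical names):

import random

def random_greedy_path(graph, profit, cost, budget, start, eps=1e-9):
    # One stochastic greedy walk: move to a feasible neighbor with
    # probability proportional to value(u, v) = w(v) / (c(u, v) + eps).
    path, visited, spent = [start], {start}, 0.0
    u = start
    while True:
        moves = [v for v in graph[u]
                 if v not in visited and spent + cost[(u, v)] <= budget]
        if not moves:
            break
        weights = [profit[v] / (cost[(u, v)] + eps) + eps for v in moves]
        v = random.choices(moves, weights=weights)[0]
        spent += cost[(u, v)]
        visited.add(v)
        path.append(v)
        u = v
    return path, sum(profit[v] for v in path)

# Anytime use: keep re-running this and remember the best path found so far.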

Algorithm for finding optimal node pairs in hexagonal graph

I'm searching for an algorithm to find pairs of adjacent nodes on a hexagonal (honeycomb) graph that minimizes a cost function.
each node is connected to three adjacent nodes
each node "i" should be paired with exactly one neighbor node "j".
each pair of nodes defines a cost function
c = pairCost( i, j )
The total cost is then computed as
totalCost = 1/2 sum_{i=1:N} ( pairCost(i, pair(i) ) )
Where pair(i) returns the index of the node that "i" is paired with. (The sum is divided by two because the sum counts each node twice). My question is, how do I find node pairs that minimize the totalCost?
The linked image should make it clearer what a solution would look like (thick red line indicates a pairing):
Some further notes:
I don't really care about the outermost nodes
My cost function is something like || v(i) - v(j) || (distance between vectors associated with the nodes)
I'm guessing the problem might be NP-hard, but I don't really need the truly optimal solution, a good one would suffice.
Naive algorithms tend to leave nodes that are "locked in", i.e. all of their neighbors are already taken.
Note: I'm not familiar with the usual nomenclature in this field (is it graph theory?). If you could help with that, then maybe that could enable me to search for a solution in the literature.
This is an instance of the maximum weight matching problem in a general graph - of course you'll have to negate your weights to make it a minimum weight matching problem. Edmonds's paths, trees and flowers algorithm (Wikipedia link) solves this for you (there is also a public Python implementation). The naive implementation is O(n^4) for n vertices, but it can be pushed down to O(n^(1/2) m) for n vertices and m edges using the algorithm of Micali and Vazirani (sorry, couldn't find a PDF for that).
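A minimal sketch of that reduction using NetworkX's implementation of the blossom algorithm (pairs and pairCost are placeholders for the honeycomb adjacency list and the cost function from the question):

import networkx as nx

G = nx.Graph()
for i, j in pairs:                              # all adjacent node pairs of the honeycomb
    G.add_edge(i, j, weight=-pairCost(i, j))    # negate: max-weight matching == min total cost

# maxcardinality=True first maximises the number of paired nodes, then
# picks the best total weight among those matchings.
matching = nx.max_weight_matching(G, maxcardinality=True)

totalCost = sum(pairCost(i, j) for i, j in matching)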
This seems related to the minimum edge cover problem, with the additional constraint that there can only be one edge per node, and that you're trying to minimize the cost rather than the number of edges. Maybe you can find some answers by searching for that phrase.
Failing that, your problem can be phrased as an integer linear programming problem, which is NP-complete, which means that you might get dreadful performance for even medium-sized problems. (This does not necessarily mean that the problem itself is NP-complete, though.)

How can I prove the "Six Degrees of Separation" concept programmatically?

I have a database of 20 million users and connections between those people. How can I prove the "Six degrees of separation" concept in the most efficient way programmatically?
link to the article about Six degrees of separation
You just want to measure the diameter of the graph.
This is exactly the metric for finding the separation between the most-distantly-connected nodes in a graph.
There are lots of algorithms on Google, and the Boost Graph Library has some too.
You can probably fit the graph in memory (in the representation that each vertex knows a list of its neighbors).
Then, from each vertex n, you can run a breadth-first search (using a queue) to a depth of 6 and count the number of vertices visited. If not all vertices are visited, you have disproved the theorem. Otherwise, continue with the next vertex n.
This is O(N*(N + #edges)) = N*(N + 100*N) ≈ 100*N^2, if a user has 100 connections on average, which is not ideal for N = 20 million. I wonder if the mentioned libraries can compute the diameter with a better time complexity (the general algorithm is O(N^3)).
The computations for individual vertices are independent, so they could be done in parallel.
A little heuristic: start with vertices that have the lowest degree (better chance to disprove the theorem).
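A minimal sketch of that per-vertex check, assuming adj is a dict mapping each user id to a list of their connections:

from collections import deque

def everyone_within_six(adj, source):
    # Breadth-first search from 'source', stopping at depth 6.
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == 6:
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, depth + 1))
    return len(seen) == len(adj)

# The claim is disproved as soon as this returns False for some start vertex:
# disproved = any(not everyone_within_six(adj, s) for s in adj)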
I think the most efficient way (worst case) is almost N^3. Build an adjacency matrix, and then take that matrix to the powers ^2, ^3, ^4, ^5 and ^6. Look for any pair whose entry is 0 in the matrix itself and in every power up to matrix^6.
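For a small graph (certainly not 20 million users), that check looks roughly like the following NumPy sketch, where A is a 0/1 adjacency matrix:

import numpy as np

def all_pairs_within_six(A):
    # Accumulate I + A + A^2 + ... + A^6, clamping each power to 0/1 so the
    # integers stay small, and check that no entry of the sum is zero.
    A = np.asarray(A, dtype=np.int64)
    n = A.shape[0]
    reach = np.eye(n, dtype=np.int64)
    power = np.eye(n, dtype=np.int64)
    for _ in range(6):
        power = (power @ A > 0).astype(np.int64)
        reach += power
    return bool((reach > 0).all())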
Heuristically you can try to single out subgraphs ( large clumps of people who are only connected to other clumps by a relatively small number of "bridge" nodes ) but there's absolutely no guarantee you'll have any.
Well a better answer has already been given, but off the top of my head I would have gone with the Floyd-Warshall all pairs shortest path algorithm, which is O(n^3). I'm unsure of the complexity of the graph diameter algorithm, but it "sounds" like this would also be O(n^3). I'd like clarification on this if anyone knows.
On a side note, do you really have such a database? Scary.

Resources