Why does Prim's algorithm need a distance array? - algorithm

I have some questions about Prim's algorithm.
Prim's algorithm finds an MST. In the usual implementation, all nodes are initialized to INF, but I don't know why this initialization is needed.
Here is my implementation
#include <iostream>
#include <tuple>
#include <algorithm>
#include <vector>
#include <queue>
using namespace std;

typedef tuple<int,int,int> ti;   // {weight, from, to}

int main(void)
{
    ios::sync_with_stdio(0);
    cin.tie(0);
    bool vis[1005] = {false};
    vector<pair<int,int>> vertex[1005];
    int V, E;
    int u, v, w;
    int sum = 0;
    int cnt = 0;
    priority_queue<ti, vector<ti>, greater<ti>> pq;
    cin >> V >> E;
    for (int i = 0; i < E; i++)
    {
        cin >> u >> v >> w;
        vertex[u].push_back({v, w});
        vertex[v].push_back({u, w});
    }
    for (auto i : vertex[1]) {
        pq.push({i.second, 1, i.first});
    }
    vis[1] = true;
    while (!pq.empty())
    {
        tie(w, u, v) = pq.top(); pq.pop();
        if (vis[v]) continue;
        vis[v] = true;
        sum += w;
        cnt++;
        for (auto i : vertex[v]) {
            if (!vis[i.first])
                pq.push({i.second, v, i.first});
        }
        if (cnt == V-1) break;
    }
    // VlogV
    cout << sum;
    return 0;
}
Please ignore the indentation (code paste error).
This code finds the sum of the MST, in O(V log V) I think. We can also record each vertex's parent (when setting vis[v] = true, also set pre[v] = u), so we can recover the structure of the MST.
So when we don't need a distance array, we can implement Prim's algorithm in O(V log V), and in almost all cases it would be faster than Kruskal's.
I know I'm wrong somewhere, and I want to know where.
So is there any reason why we use a distance array?

Your conclusion that this algorithm works in O(V log V) seems to be wrong. Here is why:
while(!pq.empty()) // runs once per popped entry, up to |E| times
{
    tie(w,u,v) = pq.top();
    pq.pop(); // log(|V|) for each pop operation
    if(vis[v]) continue;
    vis[v] = true;
    sum += w;
    cnt++;
    for(auto i : vertex[v]){ // over the whole run this touches every edge at most twice, |E| iterations in total
        if(!vis[i.first])
            pq.push({i.second,v,i.first}); // log(|V|) for each push operation
    }
    if(cnt == V-1) break;
}
First of all, notice that the while loop does not run just |V| times: every push adds one entry to the pq, so the loop body can run once per pushed entry, and the vis check merely skips the stale ones. Each pop costs log(|V|).
However, also notice that you have to traverse all the neighbours in the line:
for(auto i : vertex[v])
Since each vertex is accepted only once, these inner loops touch every edge at most twice, which is O(|E|) operations in total.
Notice that each push and pop operation takes approximately log(|V|) operations.
So what do we have?
We have at most |E| pops, each costing log(|V|), which makes E * log(V) operations for the pops.
On the other hand, we have |E| neighbour iterations in total, with a log(|V|) push in each, which makes another E * log(V) operations.
In conclusion, we have O(E*log(V)) operations in total. For a connected graph E >= V - 1, so the V*log(V) part is dominated by this term, and the time complexity is stated as O(E*log(V)).
So the time complexity of Prim's algorithm doesn't depend on keeping a distance array; you still have to do the iterations described above.
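For comparison, here is a minimal sketch of the variant the question asks about: Prim's algorithm with an explicit dist[]/parent[] array (illustrative code, not taken from either post; it assumes the same 1-indexed vertices and input format as the code above). The array lets you push an edge only when it improves the best known connection of a vertex, and it records each vertex's parent, but the bound is still O(E log V):
#include <iostream>
#include <vector>
#include <queue>
using namespace std;

int main() {
    int V, E;
    cin >> V >> E;
    vector<vector<pair<int,int>>> adj(V + 1);          // adj[u] holds {v, w}
    for (int i = 0; i < E; ++i) {
        int u, v, w;
        cin >> u >> v >> w;
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }

    const int INF = 1e9;
    vector<int> dist(V + 1, INF), parent(V + 1, 0);
    vector<bool> vis(V + 1, false);
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;  // {dist, vertex}

    long long sum = 0;
    dist[1] = 0;
    pq.push({0, 1});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (vis[u]) continue;                          // stale entry
        vis[u] = true;
        sum += d;
        for (auto [v, w] : adj[u]) {
            if (!vis[v] && w < dist[v]) {              // only push improving edges
                dist[v] = w;
                parent[v] = u;
                pq.push({w, v});
            }
        }
    }
    cout << sum << '\n';
    return 0;
}
So the distance array mainly reduces the number of heap entries and gives you the tree structure (via parent[]) for free; it does not change the asymptotic complexity.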

Related

Understanding time complexity of Dijkstra priority queue vs set implementation

Consider the following two implementations of Dijkstra's algorithm:
Using set:
// V - total number of vertices
// S - source node
void dijkstra(int V, vector<vector<int>> edge[], int S)
{
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    set<pair<int, int>> s;
    for (int i = 0; i < V; i++)
        s.insert({dist[i], i});                 // log(V) time
    while (!s.empty())                          // exactly V times
    {
        auto top = *(s.begin());                // constant time
        int dis = top.first;
        int node = top.second;
        s.erase(top);                           // amortised constant time
        for (auto it : edge[node])              // over all iterations of the outer while loop this sums up to E, the total number of edges in the graph
        {
            int nb = it[0];
            int edge_weight = it[1];
            if (dist[nb] > dis + edge_weight)
            {
                s.erase({dist[nb], nb});        // log(V) time
                dist[nb] = dis + edge_weight;
                s.insert({dist[nb], nb});       // log(V) time
            }
        }
    }
}
Using priority queue:
// V - total number of vertices
// S - source node
void dijkstra(int V, vector<vector<int>> edge[], int S)
{
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    priority_queue<pair<int, int>, vector<pair<int, int>>, greater<pair<int, int>>> pq;
    pq.push({dist[S], S});
    while (!pq.empty())                              // can be more than V times; call it heap_size times
    {
        int node = pq.top().second;                  // O(1) time
        pq.pop();                                    // log(heap_size) time
        for (int i = 0; i < edge[node].size(); i++)  // can be approximated by (E/V), where E is the total number of edges in the graph
        {
            int nb = edge[node][i][0];
            int edge_weight = edge[node][i][1];
            if (dist[nb] > dist[node] + edge_weight) {
                dist[nb] = dist[node] + edge_weight;
                pq.push({dist[nb], nb});             // log(heap_size) time
            }
        }
    }
}
Finding the time complexity of the set-based approach is easy: the set holds exactly V elements (one per vertex) and the inner for loop runs once per edge, so its time complexity is O(V*log(V) + V + E*log(V)), which is equivalent to O(E*log(V)) (reason: What's the time complexity of Dijkstra's Algorithm).
But I have trouble understanding the time complexity of the priority_queue approach. Here the same node can be present multiple times in the priority_queue with different distances. How do I calculate the upper bound on the number of nodes that are added to the heap?
I also want to decide which implementation to use based on the nature of the graph (sparse vs dense); are both these implementations equivalent for any type of graph?
Your priority_queue version isn't quite right.
Your while loop should start like this:
auto nextPair = pq.top();
pq.pop();
if (dist[nextPair.second] != nextPair.first) {
    continue;
}
int node = nextPair.second;
This ensures that each vertex is processed only once, when its current record is popped from the priority queue.
The complexity analysis then becomes easy, since each edge will then be processed at most once, and there are then at most |E| inserts into the priority queue.
Total complexity is then O(E log E), and since E < V^2, that's the same as O(E log V).
The major disadvantage of the priority queue method is that it can consume O(E) space. This is usually OK, since it's on par with the space consumed by the graph itself. Since the priority_queue is a lot faster than set, the priority_queue version is the way that it is commonly done in practice.
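Putting that check into the asker's priority_queue version, a complete sketch might look like this (assumptions: same edge format as the question, i.e. edge[u] is a list of {neighbour, weight} entries; the function now returns the distance vector):
#include <vector>
#include <queue>
using namespace std;

vector<int> dijkstra(int V, vector<vector<int>> edge[], int S) {
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<pair<int,int>>> pq;
    pq.push({0, S});
    while (!pq.empty()) {
        auto [d, node] = pq.top(); pq.pop();
        if (d != dist[node]) continue;          // stale entry: a shorter path was already found
        for (auto& it : edge[node]) {
            int nb = it[0], w = it[1];
            if (dist[nb] > d + w) {
                dist[nb] = d + w;
                pq.push({dist[nb], nb});        // at most one push per successful relaxation, <= E pushes total
            }
        }
    }
    return dist;
}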

Dijkstra algorithm with negative weights

Here's an implementation of Dijkstra's algorithm I found in an online course. Can anyone cite an example with negative edges where this might not work?
#include <queue>
#include <vector>
#include <utility>
using namespace std;

// The following two definitions are implied by the snippet but were not shown in the course code:
const int INF = 1000000000;
struct edge { int v, weight; };

vector<edge> adj[100];
vector<int> dist(100, INF);

void dijkstra(int start) {
    dist[start] = 0;
    priority_queue<pair<int, int>,
                   vector<pair<int, int> >,
                   greater<pair<int, int> > > pq;
    pq.push(make_pair(dist[start], start));
    while (!pq.empty()) {
        int u = pq.top().second,
            d = pq.top().first;
        pq.pop();
        if (d > dist[u]) continue;
        for (int i = 0; i < adj[u].size(); i++) {
            int v = adj[u][i].v,
                w = adj[u][i].weight;
            if (w + dist[u] < dist[v]) {
                dist[v] = w + dist[u];
                pq.push(make_pair(dist[v], v));
            }
        }
    }
}
Normally Dijkstra's algorithm includes a visited set, which means that we can avoid revisiting nodes and can terminate as soon as the destination has been visited.
This implementation does not use such a set. This means that it may revisit nodes and adjust the distance if it decreases.
The good thing is that this will always return the correct answer if it terminates. However, the algorithm can take a long time to terminate if negative weights are included; in particular, if there is a negative-weight cycle this algorithm will never terminate.
The example is a cycle with negative total weight: Dijkstra's algorithm never leaves this cycle.
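As a small, self-contained illustration of that last point (a hypothetical example, not from the original post): nodes 1 and 2 below form a cycle of total weight -4, so each pass around the cycle lowers their distances again and the queue never drains. A step cap is added only so the demo terminates:
#include <iostream>
#include <vector>
#include <queue>
using namespace std;

int main() {
    const int INF = 1000000000;
    // Graph: 0 -> 1 (weight 1), 1 -> 2 (weight -2), 2 -> 1 (weight -2); nodes 1 and 2 form a negative cycle.
    vector<vector<pair<int,int>>> adj = {{{1, 1}}, {{2, -2}}, {{1, -2}}};
    vector<int> dist(3, INF);
    dist[0] = 0;
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
    pq.push({0, 0});
    int steps = 0;
    while (!pq.empty() && steps++ < 20) {            // without the cap, this loop never ends
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;
        for (auto [v, w] : adj[u]) {
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
        }
    }
    cout << "dist[1] after 20 pops: " << dist[1] << '\n';   // keeps drifting toward minus infinity
    return 0;
}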

Algorithm that will maintain top n items in the past k days?

I would like to implement a data structure maintaining a set S for a leaderboard that can answer the following queries efficiently, while also being memory-efficient:
add(x, t) Add a new item with score x to set S with an associated time t.
query(u) List the top n items (sorted by score) in the set S which have an associated time t such that t + k >= u. Each subsequent query will have a u no smaller than previous queries.
In standard English, highscores can be added to this leaderboard individually, and I'd like an algorithm that can efficiently query the top n items on the leaderboard within the past k days (where k and n are fixed constants).
n can be assumed to be much less than the total number of items, and scores may be assumed to be random.
A naïve algorithm would be to store all elements as they are added into a balanced binary search tree sorted by score, and remove elements from the tree when they are more than k days old. Detecting elements that are more than k days old can be done with another balanced binary search tree sorted by time. This algorithm would yield a good time complexity of O(log(h)) where h is the total number of scores added in the past k days. However, the space complexity is O(h), and it is easy to see that most of the data saved will never be reported in a query even if no new scores are added for the next k days.
If n is 1, a simple double-ended queue is all that is necessary. Before adding a new item to the front of the queue, remove items from the front that have a smaller score than the new item, because they will never be reported in a query. Before querying, remove items from the back of the queue that are too old, then return the item that is left at the back of the queue. All operations would be amortized constant time complexity, and I wouldn't be storing items that would never be reported.
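For concreteness, the n = 1 idea just described might look like this (a minimal sketch; the struct name and the -1 sentinel for an empty window are illustrative):
#include <deque>
#include <utility>
using namespace std;

struct Top1Window {
    int k;                                  // window length in "days"
    deque<pair<int,int>> q;                 // {score, time}, newest at the front, scores grow toward the back

    explicit Top1Window(int k) : k(k) {}

    void add(int score, int t) {
        // Items at the front with a smaller score can never be the answer again.
        while (!q.empty() && q.front().first <= score) q.pop_front();
        q.push_front({score, t});
    }

    int query(int u) {
        // Drop items at the back that have fallen out of the window.
        while (!q.empty() && q.back().second + k < u) q.pop_back();
        return q.empty() ? -1 : q.back().first;   // -1 means the window is empty
    }
};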
When n is more than 1, I can't seem to formulate an algorithm that has a good time complexity and only stores items that could possibly be reported. An algorithm with time complexity O(log(h)) would be great, but n is small enough that O(log(h) + n) is acceptable too.
Any ideas? Thanks!
This solution is based on the double-ended queue solution and I assume t is ascending.
The idea is that a record can be removed if there are n records with both larger t and larger x than it, which is implemented by Record.count in the sample code.
As each record is moved from S to temp at most n times, the amortized time complexity is O(n) per add.
The space complexity is hard to bound analytically. However, it looks fine in the simulation: S.size() is about 400 when h = 10000 and n = 50.
#include <iostream>
#include <vector>
#include <deque>
#include <cstdlib>
using namespace std;

const int k = 10000, n = 50;

class Record {
public:
    Record(int _x, int _t): x(_x), t(_t), count(n) {}
    int x, t, count;
};

deque<Record> S;

void add(int x, int t)
{
    Record record(x, t);
    vector<Record> temp;
    while (!S.empty() && record.x >= S.back().x) {
        if (--S.back().count > 0) temp.push_back(S.back());
        S.pop_back();
    }
    S.push_back(record);
    while (!temp.empty()) {
        S.push_back(temp.back());
        temp.pop_back();
    }
}

vector<int> query(int u)
{
    while (!S.empty() && S.front().t + k < u)
        S.pop_front();
    vector<int> xs;
    for (int i = 0; i < (int)S.size() && i < n; ++i)
        xs.push_back(S[i].x);
    return xs;
}

int main()
{
    for (int t = 1; t <= 1000000; ++t) {
        add(rand(), t);
        vector<int> xs = query(t);
        if (t % k == 0) {
            cout << "t = " << t << endl;
            cout << "S.size() = " << S.size() << endl;
            for (auto x: xs) cout << x << " ";
            cout << endl;
        }
    }
    return 0;
}

Reconstructing the list of items from a space optimized 0/1 knapsack implementation

A space optimization for the 0/1 knapsack dynamic programming algorithm is to use a 1-d array (say, A) of size equal to the knapsack capacity, and simply overwrite A[w] (if required) at each iteration i, where A[w] denotes the total value if the first i items are considered and knapsack capacity is w.
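For reference, a minimal sketch of that 1-d optimization (values only, no item reconstruction; the function name is illustrative):
#include <vector>
#include <algorithm>
using namespace std;

// A[x] plays the role of the array described above: best total value with capacity x.
int knapsack_1d(const vector<int>& v, const vector<int>& w, int cap) {
    vector<int> A(cap + 1, 0);
    for (size_t i = 0; i < v.size(); ++i)
        for (int x = cap; x >= w[i]; --x)          // iterate capacity downwards so item i is used at most once
            A[x] = max(A[x], A[x - w[i]] + v[i]);
    return A[cap];
}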
If this optimization is used, is there a way to reconstruct the list of items picked, perhaps by saving some extra information at each iteration of the DP algorithm? For example, in the Bellman-Ford algorithm a similar space optimization can be implemented, and the shortest path can still be reconstructed as long as we keep a list of predecessor pointers, i.e. the last hop (or the first, depending on whether a source- or destination-oriented algorithm is being written).
For reference, here is my C++ function for the 0/1 knapsack problem using dynamic programming where I construct a 2-d vector ans such that ans[i][j] denotes the total value considering the first i items and knapsack capacity j. I reconstruct the items picked by reverse traversing this vector ans:
void knapsack(vector<int> v, vector<int> w, int cap){
    // v[i-1] = value of item i
    // w[i-1] = weight of item i, cap = knapsack capacity
    // ans[i][j] = total value if considering the 1st i items and capacity j
    vector<vector<int>> ans(v.size()+1, vector<int>(cap+1));
    // value with 0 items is 0
    ans[0] = vector<int>(cap+1, 0);
    // value with 0 capacity is 0
    for (size_t i = 1; i < v.size()+1; i++){
        ans[i][0] = 0;
    }
    // dp
    for (size_t i = 1; i < v.size()+1; i++) {
        for (int x = 1; x < cap+1; x++) {
            if (x < w[i-1] || ans[i-1][x] >= ans[i-1][x-w[i-1]] + v[i-1])  // test x < w[i-1] first to avoid a negative index
                ans[i][x] = ans[i-1][x];
            else {
                ans[i][x] = ans[i-1][x-w[i-1]] + v[i-1];
            }
        }
    }
    cout << "Total value: " << ans[v.size()][cap] << endl;
    // reconstruction
    cout << "Items to carry: \n";
    for (size_t i = v.size(); i > 0; i--) {
        for (int x = cap; x > 0; x--) {
            if (ans[i][x] == ans[i-1][x]) // item i not in knapsack
                break;
            else if (ans[i][x] == ans[i-1][x-w[i-1]] + v[i-1]) { // item i in knapsack
                cap -= w[i-1];
                cout << i << "(" << v[i-1] << "), ";
                break;
            }
        }
    }
    cout << endl;
}
The following is a C++ implementation of yildizkabaran's answer. It adapts Hirschberg's clever divide & conquer idea to compute the answer to a knapsack instance with n items and capacity c in O(nc) time and just O(c) space:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

// Returns a vector of (cost, elem) pairs.
vector<pair<int, int>> optimal_cost(vector<int> const& v, vector<int> const& w, int cap) {
    vector<pair<int, int>> dp(cap + 1, { 0, -1 });
    for (int i = 0; i < size(v); ++i) {
        for (int j = cap; j >= 0; --j) {
            if (w[i] <= j && dp[j].first < dp[j - w[i]].first + v[i]) {
                dp[j] = { dp[j - w[i]].first + v[i], i };
            }
        }
    }
    return dp;
}

// Returns a vector of item labels corresponding to an optimal solution, in increasing order.
vector<int> knapsack_hirschberg(vector<int> const& v, vector<int> const& w, int cap, int offset = 0) {
    if (empty(v)) {
        return {};
    }
    int mid = size(v) / 2;
    auto subSol1 = optimal_cost(vector<int>(begin(v), begin(v) + mid), vector<int>(begin(w), begin(w) + mid), cap);
    auto subSol2 = optimal_cost(vector<int>(begin(v) + mid, end(v)), vector<int>(begin(w) + mid, end(w)), cap);
    pair<int, int> best = { -1, -1 };
    for (int i = 0; i <= cap; ++i) {
        best = max(best, { subSol1[i].first + subSol2[cap - i].first, i });
    }
    vector<int> solution;
    if (subSol1[best.second].second != -1) {
        int iChosen = subSol1[best.second].second;
        solution = knapsack_hirschberg(vector<int>(begin(v), begin(v) + iChosen), vector<int>(begin(w), begin(w) + iChosen), best.second - w[iChosen], offset);
        solution.push_back(subSol1[best.second].second + offset);
    }
    if (subSol2[cap - best.second].second != -1) {
        int iChosen = mid + subSol2[cap - best.second].second;
        auto subSolution = knapsack_hirschberg(vector<int>(begin(v) + mid, begin(v) + iChosen), vector<int>(begin(w) + mid, begin(w) + iChosen), cap - best.second - w[iChosen], offset + mid);
        copy(begin(subSolution), end(subSolution), back_inserter(solution));
        solution.push_back(iChosen + offset);
    }
    return solution;
}
Even though this is an old question, I recently ran into the same problem, so I figured I would write my solution here. What you need is Hirschberg's algorithm. Although this algorithm is written for reconstructing edit distances, the same principle applies here. The idea is that when searching for n items in capacity c, the knapsack state at the (n/2)th item corresponding to the final maximum value is determined in the first scan. Let's call this state weight_m and value_m. This can be done by keeping track of an additional 1-d array of size c, so the memory is still O(c). Then the problem is divided into two parts: items 0 to n/2 with a capacity of weight_m, and items n/2 to n with a capacity of c-weight_m. The reduced problems in total are of size nc/2. Continuing this approach, we can determine the knapsack state (occupied weight and current value) after each item, after which we can simply check which items were included. This algorithm completes in O(2nc) while using O(c) memory, so in terms of big-O nothing is changed even though the algorithm is at least twice as slow. I hope this helps anyone who is facing a similar problem.
To my understanding, with the proposed solution it is effectively impossible to obtain the set of involved items for a certain objective value. The set of items can be obtained by either generating the discarded rows again or maintaining a suitable auxiliary data structure. The latter could be done by associating each entry in A with the list of items from which it was generated; however, this would require more memory than the initially proposed solution. Approaches to backtracking for knapsack problems are also briefly discussed in this journal paper.
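As a rough sketch of the "suitable auxiliary data structure" idea above (illustrative only), each entry of the 1-d array can carry the list of item indices it was built from; this recovers the chosen items but, as noted, costs extra memory:
#include <vector>
#include <utility>
using namespace std;

pair<int, vector<int>> knapsack_with_items(const vector<int>& v, const vector<int>& w, int cap) {
    vector<int> val(cap + 1, 0);
    vector<vector<int>> items(cap + 1);          // items[x] = indices of the items achieving val[x]
    for (size_t i = 0; i < v.size(); ++i) {
        for (int x = cap; x >= w[i]; --x) {
            if (val[x - w[i]] + v[i] > val[x]) {
                val[x] = val[x - w[i]] + v[i];
                items[x] = items[x - w[i]];      // copy the supporting set, then add item i
                items[x].push_back((int)i);
            }
        }
    }
    return {val[cap], items[cap]};
}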

Find the minimum number of steps to make a number from a pair of numbers

Let's assume that we have a pair of numbers (a, b). We can get a new pair (a + b, b) or (a, a + b) from the given pair in a single step.
Let the initial pair of numbers be (1,1). Our task is to find the number k, that is, the least number of steps needed to transform (1,1) into a pair where at least one number equals n.
I solved it by finding all possible pairs and then returning the minimum number of steps in which the given number is formed, but it takes quite a long time to compute. I guess this must be somehow related to finding the gcd. Can someone please help or provide me a link to the concept?
Here is a program that solves the problem, but it is not clear to me...
#include <iostream>
#include <algorithm>
using namespace std;
#define INF 1000000000

int n, r = INF;

// f(a,b): number of steps needed to reach the pair (a,b) from (1,1),
// counted backwards with the Euclidean algorithm (INF if unreachable).
int f(int a, int b){
    if (b <= 0) return INF;
    if (a > 1 && b == 1) return a - 1;
    return f(b, a - a/b*b) + a/b;
}

int main(){
    cin >> n;
    for (int i = 1; i <= n/2; i++){
        r = min(r, f(n, i));
    }
    cout << (n == 1 ? 0 : r) << endl;
}
My approach to such problems (one I got from projecteuler.net) is to calculate the first few terms of the sequence and then search OEIS for a sequence with the same terms. This can yield a solution an order of magnitude faster. In your case the sequence is probably http://oeis.org/A178031, but unfortunately it has no easy-to-use formula.
As the constraint on n is relatively small, you can do a DP on the minimum number of steps required to get to the pair (a,b) from (1,1). You take a two-dimensional array that stores the answer for a given pair and then do a recursion with memoization:
#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;

const int INF = 1000000000;
int mem[5001][5001];

// Minimum number of steps needed to reach the pair (a,b) from (1,1),
// or INF if it is unreachable (which happens exactly when gcd(a,b) != 1).
int solve(int a, int b) {
    if (a > b) {
        swap(a, b);                       // keep a <= b
    }
    if (a == 0) {
        return INF;                       // gcd > 1: this pair can never be reached from (1,1)
    }
    if (a == 1) {
        return mem[a][b] = b - 1;         // (1,1) -> (1,2) -> ... -> (1,b)
    }
    if (mem[a][b] != -1) {                // already calculated
        return mem[a][b];
    }
    // Undo the b/a consecutive additions of a in one shot (Euclidean-style).
    return mem[a][b] = solve(a, b % a) + b / a;
}

int main() {
    memset(mem, -1, sizeof(mem));
    int n;
    cin >> n;
    int best = -1;
    for (int i = 1; i <= n; ++i) {
        int temp = solve(n, i);
        if (best == -1 || temp < best) {
            best = temp;
        }
    }
    cout << best << endl;
}
In fact, in this case there is not much difference between the DP and a BFS, but this is the general approach to such problems. Hope this helps.
EDIT: return a big enough value in the dp if a is zero
You can use breadth-first search to do this. At each step you generate all possible next pairs that you haven't seen before. If the set of next pairs contains the result, you're done; if not, repeat. The number of times you repeat this is the minimum number of transformations.
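A minimal sketch of that BFS (illustrative; it prunes pairs in which a component already exceeds n, since the numbers in a pair only ever grow):
#include <queue>
#include <set>
#include <utility>
using namespace std;

int min_steps_bfs(int n) {
    if (n == 1) return 0;                      // (1,1) already contains n
    queue<pair<int,int>> q;
    set<pair<int,int>> seen;
    q.push({1, 1});
    seen.insert({1, 1});
    int steps = 0;
    while (!q.empty()) {
        int layer = (int)q.size();
        ++steps;                               // all pairs generated in this round are `steps` moves away
        while (layer--) {
            auto [a, b] = q.front(); q.pop();
            for (auto nxt : {make_pair(a + b, b), make_pair(a, a + b)}) {
                if (nxt.first == n || nxt.second == n) return steps;
                if (nxt.first > n || nxt.second > n) continue;   // components only grow, so this pair can never hit n
                if (seen.insert(nxt).second) q.push(nxt);
            }
        }
    }
    return -1;                                 // not reached for n >= 1
}
This is only practical for small n, since the pruned state space still has on the order of n^2 pairs, which matches the spirit of the DP answer above.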
First of all, the maximum number you can get after k-3 steps is the kth Fibonacci number. Let t be the golden ratio.
Now, for n, start with (n, upper(n/t)).
If x > y:
    NumSteps(x, y) = NumSteps(x - y, y) + 1
Else:
    NumSteps(x, y) = NumSteps(x, y - x) + 1
Iteratively calculate NumSteps(n, upper(n/t)).
PS: Using upper(n/t) might not always provide the optimal solution, so you can do some local search around this value. To ensure optimality you can try ALL the values from 0 to n-1, in which case the worst-case complexity is O(n^2). But if the optimal value results from a value close to upper(n/t), the solution is O(n log n).
