Is it possible to do a breadth-first search using only the memory holding the graph itself plus a constant amount of extra memory -- in other words, without recording which nodes have already been visited?
No. You always need to remember which nodes you have already visited, so in the worst case you must record the visited state of every node. The branching factor and depth of the graph determine how much of that bookkeeping you actually use: if the graph barely branches you need far less, while a highly branching graph pushes you toward the worst case.
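For reference, a minimal BFS sketch showing where that per-node memory lives; the adjacency mapping is an assumption of the example.

```python
from collections import deque

def bfs_order(adjacency, start):
    """Plain BFS; `adjacency` maps each node to an iterable of neighbours.
    The `visited` set is exactly the per-node memory discussed above: drop
    the membership test and any cycle (or even two converging paths) makes
    nodes get re-enqueued over and over."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in visited:   # the O(|V|) bookkeeping in question
                visited.add(neighbour)
                queue.append(neighbour)
    return order
```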
Is it possible to modify Floyd's Cycle Detection Algorithm to find all numbers having a repetition in an array?
I cannot modify the array.
Given an array of size n with integers 1, ..., n-1, by the pigeonhole principle there exists a duplicate integer. (*)
An implementation of Floyd's cycle detection algorithm successfully helps in finding a duplicate in O(n) time and O(1) space.
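Since that standard O(n) time / O(1) space construction is the starting point here, a minimal sketch of it for the size-n, values-in-1..n-1 setup from (*); it covers both the meeting phase and the entry-point phase described a few paragraphs below.

```python
def find_duplicate(nums):
    """Floyd's cycle detection on the implicit sequence i -> nums[i], for an
    array of size n holding values in 1..n-1 (condition * above). Index 0 has
    no incoming edge, so the walk from 0 is a tail leading into a cycle, and
    the cycle's entry point is a duplicated value. O(n) time, O(1) space."""
    # Phase 1: slow/fast pointers meet somewhere inside the cycle.
    slow = fast = 0
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    # Phase 2: restart one pointer at the start of the tail; advancing both
    # one step at a time, they meet at the cycle entry, i.e. a duplicate.
    slow = 0
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow
```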
I want to handle the case when there is more than one duplicate. These are my thoughts:
Each repetition of a number results in a cycle (assuming the condition *). But no two different integers having a repetition can be in the same cycle. So we have to find all such cycles.
Each cycle path would be of the form --------o, where -------- is the tail on which we start the two pointers for our cycle detection algorithm and o is the cycle.
We have to find the entry point of the cycle, so we keep one pointer at the first - of our cycle path and the other where the pointers initially met; moving them together at the same speed makes them meet at the entry point of the cycle.
So if we have the first - of our "Cycle path", we can find the duplicate which resulted in that cycle.
How do I find the first - optimally for each cycle path? Storing the visited nodes would just mean losing the O(1) space complexity, and at that point it would be better to just use a hash table instead.
I would need to store some information to tell that a cycle path has already been visited, so space cannot be O(1) as per my understanding, but how do I minimize space in any case?
Is this approach even feasible, or am I wrong somewhere in the ideas above? What other method would be feasible, utilizing cycle detection, for finding multiple duplicates optimally?
Update:
If the approach above is feasible, the problem comes down to exiting the cycle and finding an unvisited cycle. I can tell whether a cycle has been visited by storing the number where our pointers meet initially, so technically this takes O(k) space for k duplicates. Finding another cycle could be done using a randomized start and checking whether that cycle has been visited or not. But as the number of duplicates found gets closer to k, the randomized start would get slower and may never terminate. Also, would there be any smart way to pick an index for the randomized approach such that it guarantees termination?
I recently had an interview for a position dealing with extremely large distributed systems, and one of the questions I was asked was to write a function that could count the nodes in a binary tree entirely in place, meaning no recursion and no queue or stack for an iterative approach.
I don't think I have ever seen a solution that does not use at least one of the above, either when I was in school or after.
I mentioned that having a "parent" pointer would trivialize the problem somewhat but adding even a single simple field to each node in a tree with a million nodes is not trivial in terms of memory cost.
How can this be done?
If an exact solution is required, then the prerequisite of being a binary tree may be a red herring. Each node in the cluster may simply count allocations in its backing collection, which takes either constant or linear time depending on whether that count has been tracked.
If no exact solution was asked for, but the given tree is balanced, then a simple deep probe to determine the tree height, in combination with the placement rules, lets you estimate an upper and a lower bound for the total node count. Be aware that the probe may have ended on a leaf at depth log2(n) or at depth log2(n) - 1, so your estimate can be off by up to a factor of 2 in either direction. Constant space, O(log(n)) time.
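A minimal sketch of such a probe, assuming nodes expose a `left` pointer and the tree is filled like a complete tree:

```python
def estimate_node_count(root):
    """Probe one root-to-leaf path to get a height h; for a balanced,
    left-filled tree the true count lies between 2**(h-1) and 2**h - 1,
    i.e. the estimate can be off by roughly a factor of 2. Constant space,
    O(log n) time. The node layout (`.left`) is an assumption."""
    h = 0
    node = root
    while node is not None:       # walk the leftmost path only
        h += 1
        node = node.left
    if h == 0:
        return (0, 0)
    return (2 ** (h - 1), 2 ** h - 1)   # (lower bound, upper bound)
```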
If the placement rules dictate special properties about the bottommost layer (e.g. filled from left to right, unlike, say, a red-black tree), then you may perform log(n) probes in a binary search pattern to find the exact count, in constant space and O(log(n)^2) time.
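A sketch of that idea for a left-to-right-filled (complete) tree, with no recursion and no stack or queue; the `left`/`right` node fields are assumptions:

```python
def count_nodes_complete(root):
    """Exact node count of a complete tree (every level full except possibly
    the last, which is filled left to right). Repeated left-spine probes act
    as a binary search over the last level: O(log^2 n) time, O(1) space."""
    def left_height(node):
        h = 0
        while node is not None:
            h += 1
            node = node.left
        return h

    count = 0
    node = root
    while node is not None:
        lh = left_height(node.left)
        rh = left_height(node.right)
        if lh == rh:
            # Left subtree is perfect: 2**lh - 1 nodes, plus the current node.
            count += 1 << lh
            node = node.right
        else:
            # Right subtree is perfect: 2**rh - 1 nodes, plus the current node.
            count += 1 << rh
            node = node.left
    return count
```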
In the decrease-key operation of a Fibonacci heap, if a node is allowed to lose s > 1 children before being cut and melded into the root list (promoted), does this alter the overall runtime complexity? I think there are no changes in the complexity, since the change in potential will be the same. But I am not sure if I am right.
And how can this be proved by amortized analysis?
Changing the number of children that a node in the Fibonacci heap can lose does affect the runtime, but my suspicion is that if you're careful with how you do it you'll still get the same asymptotic runtime.
You're correct that the potential function will be unchanged if you allow each node to lose multiple children before being promoted back up to the root. However, the potential function isn't the source of the Fibonacci heap's efficiency. The reason that we perform cascading cuts (promoting multiple nodes back up to the root level during a decrease-key) is to ensure that a tree that has order n has a number of nodes in it that is exponential in n. That way, when doing a dequeue-min operation and coalescing trees together such that there is at most one tree of each order, the total number of trees required to store all the nodes is logarithmic in the number of nodes. The standard marking scheme ensures that each tree of order n has at least Θ(φ^n) nodes, where φ is the Golden Ratio (around 1.618).
If you allow more nodes to be removed out of each tree before promoting them back to the root, my suspicion is that if you cap the number of missing children at some constant, you should still get the same asymptotic time bounds, but probably with a higher constant factor (because each tree holds fewer nodes and therefore more trees will be required). It might be worth writing out the math to see what recurrence relation you get for the number of nodes in each tree, in case you want an exact value.
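One way that recurrence could be set up (a sketch under the assumption that a node may lose up to a constant s children before being cut), writing S(k) for the minimum number of nodes in a subtree whose root has k children:

```latex
% y_1, \dots, y_k are the children of a node x, in the order they were linked.
% When y_i was linked, x already had at least i-1 children, so y_i had degree
% at least i-1 at that moment; afterwards it may lose at most s children
% before being cut, hence \deg(y_i) \ge i - 1 - s.  Therefore
S(k) \;\ge\; 1 + \sum_{i=1}^{k} S\bigl(\max(i - 1 - s,\; 0)\bigr).
% For s = 1 this is the classic Fibonacci-style bound S(k) \ge F_{k+2} =
% \Theta(\varphi^k); for a larger constant s it becomes a higher-order linear
% recurrence whose solution still grows like c^k for some constant c > 1
% depending on s, so the maximum order stays O(\log n), just with a larger
% constant factor.
```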
Hope this helps!
How does the weight order affect the computing cost in a backtracking algorithm? The number of nodes and search trees is the same, but when it's non-ordered it takes more time, so it must be doing something.
Thanks!
Sometimes in backtracking algorithms, when you know a certain branch cannot contain an answer, you can trim it. This is very common with agents for games, and is called alpha-beta pruning.
Thus, when you reorder the visited nodes, you can increase your pruning rate and thereby decrease the actual number of nodes you visit, without affecting the correctness of your answer.
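A small sketch of minimax with alpha-beta pruning and an optional move-ordering hook, showing where better ordering creates earlier cutoffs; `children`, `evaluate`, and `order_key` are illustrative names, not a fixed API.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate,
              order_key=None):
    """Minimax with alpha-beta pruning. `children(state)` yields successor
    states, `evaluate(state)` scores a leaf. If `order_key` is given, the
    more promising children are tried first, which tends to trigger cutoffs
    earlier and so reduces the number of nodes actually visited."""
    succ = list(children(state))
    if depth == 0 or not succ:
        return evaluate(state)
    if order_key is not None:
        succ.sort(key=order_key, reverse=maximizing)  # promising moves first
    if maximizing:
        best = -math.inf
        for child in succ:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       children, evaluate, order_key))
            alpha = max(alpha, best)
            if alpha >= beta:       # remaining siblings cannot matter:
                break               # beta cutoff -- this is the "trimming"
        return best
    else:
        best = math.inf
        for child in succ:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                       children, evaluate, order_key))
            beta = min(beta, best)
            if alpha >= beta:       # alpha cutoff
                break
        return best
```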
One more possibility, if there is no pruning, is cache performance. Trees are sometimes stored as arrays (especially complete trees). Arrays are most efficient when iterated sequentially rather than "jumped around" randomly, so the reordering might change the access pattern, resulting in better or worse cache behavior.
The essence of backtracking is precisely not looking at all possibilities (nodes, in this case). However, if the nodes are not ordered, the algorithm cannot "prune" a branch, because it is not known with certainty whether the element is actually in that branch.
This is unlike an ordered tree, where if the searched element is greater or smaller than the root of a subtree, it must lie to the right or to the left respectively. That is why an unordered tree costs the same as brute force, while an ordered tree is still equivalent to brute force in the worst case but typically runs in less time.
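As a small illustration of that pruning, an iterative lookup in an ordered binary tree (the node fields `key`, `left`, `right` are assumed for this sketch):

```python
def contains(root, key):
    """Search in an ordered (binary search) tree: the ordering is what lets
    us discard ("prune") the half that cannot contain `key` at every step,
    instead of exploring both children as brute force would."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        # The ordering guarantees the target can only be on one side.
        node = node.left if key < node.key else node.right
    return False
```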
Problem: finding shortest paths in an unweighted, undirected graph.
Breadth-first search can find the shortest path between two nodes, but this can take up to O(|V| + |E|) time. A precomputed lookup table would allow requests to be answered in O(1) time, but at the cost of O(|V|^2) space.
What I'm wondering: Is there an algorithm which offers a space-time tradeoff that's more fine-grained? In other words, is there an algorithm which:
Finds shortest paths in more time than O(1), but is faster than a bidirectional breadth-first search
Uses precomputed data which takes up less space than O(|V|^2)?
On the practical side: the graph has 800,000 nodes and is believed to be a small-world network. The all-pairs shortest paths table would be on the order of gigabytes -- that's not outrageous these days, but it doesn't suit our requirements.
However, I am asking my question out of curiosity. What's keeping me up at night is not "how can I reduce cache misses for an all-pairs lookup table?", but "Is there a completely different algorithm out there that I've never heard of?"
The answer may be no, and that's okay.
You should start by looking at Dijkstra's algorithm for finding the shortest path. The A* algorithm is a variant that uses a heuristic (such as the Euclidean distance) to reduce the time taken to calculate the optimal route between the start and goal nodes. You can trade the heuristic off between performance and accuracy.
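A sketch of A* on an unweighted graph with a Euclidean heuristic; `neighbors` and the `coord` position table are assumptions of this example, and the heuristic is only admissible if adjacent nodes are never more than distance 1 apart in those coordinates.

```python
import heapq
import itertools
import math

def a_star_unweighted(neighbors, coord, start, goal):
    """A* over an unweighted graph (every edge costs 1). `neighbors(v)`
    yields adjacent nodes and `coord[v] = (x, y)` gives node positions."""
    def h(v):
        (x1, y1), (x2, y2) = coord[v], coord[goal]
        return math.hypot(x1 - x2, y1 - y2)

    tie = itertools.count()              # tiebreaker so the heap never compares nodes
    dist = {start: 0}                    # best known distance from start
    parent = {start: None}
    frontier = [(h(start), next(tie), start)]
    done = set()
    while frontier:
        _, _, v = heapq.heappop(frontier)
        if v == goal:                    # reconstruct and return the path
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        if v in done:
            continue
        done.add(v)
        for w in neighbors(v):
            d = dist[v] + 1
            if d < dist.get(w, math.inf):
                dist[w] = d
                parent[w] = v
                heapq.heappush(frontier, (d + h(w), next(tie), w))
    return None                          # goal not reachable
```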
It seems as if your input set must be very large if a lookup table would be too large to store on disk. I assume that the data will not fit in RAM then, which means that whatever algorithm you use should be tuned to minimize the number of reads and writes. Whenever disks are involved, space == time, because writing to disk is so slow.
The exact algorithm you should use depends on what kind of graph you have. This research paper might be of interest to you. Full disclosure: I have not read it myself, but it seems like it might be what you are looking for.
Edit:
If the graph is (almost) connected, which a small-world network is, a lookup table can't be smaller than V^2 entries. This means that all lookups will require disk access. If the edges fit in main memory, it might be faster to just compute the path every time. Otherwise, you might compute the path from a table containing only the lengths of all shortest paths; you can reconstruct the path from that table.
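A sketch of that reconstruction for an unweighted graph, assuming a `neighbors` function and a consistent all-pairs length table `dist`:

```python
def reconstruct_path(neighbors, dist, source, target):
    """Rebuild a shortest path from the all-pairs length table alone (no
    predecessor table): from the current node, step to any neighbour that
    is exactly one unit closer to the target."""
    path = [source]
    node = source
    while node != target:
        node = next(v for v in neighbors(node)
                    if dist[v][target] == dist[node][target] - 1)
        path.append(node)
    return path
```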
The key is to make sure that the entries in the table which are close to each other in either direction are also close to each other on the disk. This storage pattern accomplishes that:
1 2     1  2  5  6
3 4     3  4  7  8
        9 10 13 14
       11 12 15 16
It will also work well with the cache hierarchy.
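The 4x4 pattern above is what a Z-order (Morton) layout produces; a minimal, purely illustrative sketch of mapping a (row, col) cell to its position in that order:

```python
def morton_index(row, col, bits=16):
    """Z-order (Morton) index of a (row, col) cell: interleave the bits of
    the two coordinates (col in the even positions, row in the odd ones).
    Cells that are close in either direction land close together on disk."""
    idx = 0
    for b in range(bits):
        idx |= ((col >> b) & 1) << (2 * b)
        idx |= ((row >> b) & 1) << (2 * b + 1)
    return idx

# The 4x4 example above is exactly this order:
# [[morton_index(r, c) + 1 for c in range(4)] for r in range(4)]
# -> [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]
```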
In order to compute the table you might use a modified Floyd-Warshall, where you process the data in blocks. This would let you perform the computation in a reasonable amount of time, especially if you parallelize it.
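A sketch of what such a blocked (tiled) Floyd-Warshall could look like; the in-memory list-of-lists representation and the tile size are assumptions, and this is an illustration of the tiling idea rather than a tuned implementation.

```python
INF = float("inf")

def blocked_floyd_warshall(dist, block=64):
    """In-place blocked Floyd-Warshall on an n x n matrix of path lengths
    (dist[i][i] == 0, missing edges == INF). `block` is a tile size picked
    to fit the cache; 64 here is just a placeholder."""
    n = len(dist)

    def relax_tile(ci, cj, kb):
        # Relax dist[i][j] for i in the ci tile and j in the cj tile,
        # using intermediate vertices k from the kb tile.
        for k in range(kb, min(kb + block, n)):
            row_k = dist[k]
            for i in range(ci, min(ci + block, n)):
                dik = dist[i][k]
                if dik == INF:
                    continue
                row_i = dist[i]
                for j in range(cj, min(cj + block, n)):
                    if dik + row_k[j] < row_i[j]:
                        row_i[j] = dik + row_k[j]

    for kb in range(0, n, block):
        relax_tile(kb, kb, kb)                 # phase 1: the diagonal tile
        for jb in range(0, n, block):          # phase 2: tiles in kb's row
            if jb != kb:
                relax_tile(kb, jb, kb)
        for ib in range(0, n, block):          # phase 2: tiles in kb's column
            if ib != kb:
                relax_tile(ib, kb, kb)
        for ib in range(0, n, block):          # phase 3: all remaining tiles
            if ib == kb:
                continue
            for jb in range(0, n, block):
                if jb != kb:
                    relax_tile(ib, jb, kb)
    return dist
```

The phase 3 loops over independent tiles are the natural place to parallelize.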