Reference counting alone does not collect cycles, but there are additional techniques that can. What is the simplest such technique?
I'd like to compare the complexity of augmented reference counting with tracing GC.
It's better for code to be cycle-free, but if cycles do occur and you want to find an isolated cycle in a graph with edge set E and vertex set V, it takes O(|E| + |V|) time. It's similar to running a connected-components algorithm and then finding all cycles of the graph with BFS, and even if you only consider |V| it can be very heavy at runtime, even in compiled code. So it's better to prevent cycles in the first place, and that's why runtimes leave this to developers.
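For illustration, here is a minimal sketch of that O(|V| + |E|) detection, assuming the reference graph is given as a plain adjacency dict; the representation and the function name are illustrative, not any runtime's actual heap layout:

    from collections import deque

    # Peel off every object that nothing (remaining) points to; whatever
    # survives is on a cycle or kept alive only by one. Assumes every
    # object appears as a key in `graph`.
    def find_cyclic_garbage(graph):
        indegree = {v: 0 for v in graph}
        for v, refs in graph.items():
            for w in refs:
                indegree[w] += 1
        queue = deque(v for v, d in indegree.items() if d == 0)
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                indegree[w] -= 1
                if indegree[w] == 0:
                    queue.append(w)
        return [v for v, d in indegree.items() if d > 0]

    # Example: a <-> b is an isolated cycle; c is acyclic.
    print(find_cyclic_garbage({"a": ["b"], "b": ["a"], "c": ["a"]}))
    # -> ['a', 'b']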
Graph theory algorithm problem
Consider a set of, say, p single-core processors which have been assigned q programs, given
along with:
– start times for the programs
– end times for the programs
– total processing times to complete the programs
Programs can be stopped, restarted, and moved between processors without any penalty.
i. Devise an algorithm to schedule the running of programs on processors such that
the deadlines are met.
ii. Trace the algorithm for a set of values of your choice.
I don't know which algorithm to use: Bellman-Ford, Floyd-Warshall, Ford-Fulkerson, Dijkstra's, Kruskal's, or Prim's.
What algorithms could be used here, and what would be the correct way to formulate this problem using graph theory language?
You can use Dijkstra to find the critical path if you set the edge costs to the reciprocal of the task duration. Then always choose to run a ready task if it is on the critical path; otherwise choose a non-critical task at random.
This is the bare bones of the algorithm. Lots of details to sort out. You can see all the details at https://github.com/JamesBremner/TaskGraphScheduler
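For a concrete picture (this is not the code behind the linked repo), here is a minimal sketch that computes the critical path directly with a longest-path pass in topological order, a standard alternative to the reciprocal-weight trick above; the task names and durations are made up:

    from graphlib import TopologicalSorter  # Python 3.9+

    duration = {"A": 3, "B": 2, "C": 4, "D": 1}
    deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

    def critical_path(duration, deps):
        finish = {}   # earliest finish time of each task
        prev = {}     # predecessor on the longest path to each task
        for t in TopologicalSorter(deps).static_order():
            start = 0
            prev[t] = None
            for d in deps[t]:
                if finish[d] > start:
                    start, prev[t] = finish[d], d
            finish[t] = start + duration[t]
        # Walk back from the task that finishes last.
        t = max(finish, key=finish.get)
        path = []
        while t is not None:
            path.append(t)
            t = prev[t]
        return path[::-1], max(finish.values())

    print(critical_path(duration, deps))  # (['A', 'C', 'D'], 8)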
Course Schedule leetcode: https://leetcode.com/problems/course-schedule/
This problem involves detecting a cycle; if there is one, then you cannot complete all courses.
I've heard that DFS is most recommended for detecting a cycle, yet Kahn's algorithm, which is a BFS solution, is recommended for the course schedule problem.
So, which is it? Is DFS better for detecting cycles, or is BFS?
Both have a time complexity of O(V+E) and both have a space complexity of O(V+E). So in those terms there is no winner.
One uses a queue, the other a stack. One tracks an in-degree per node, the other a visited mark per node.
One difference is that DFS can use an implicit stack using recursion, which may make the code a bit more compact. But then again, you're then limited by the available call stack space, so using recursion may not be a viable solution for large inputs, and using an explicit stack may in the end be the option to take for a DFS-based solution.
So all in all, they come out equal. Choose either.
In real work, it's always a bad idea to use O(N) stack space, so practical DFS-based solutions would use an explicit stack.
The explicit stack implementation is a little complicated, because the stack isn't just a stack of nodes -- you also need to store the current position in the child list for each open node -- and the traversal logic gets somewhat involved.
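Concretely, each stack frame can pair a node with an iterator over its child list; a minimal sketch (the names are illustrative):

    # Iterative three-color DFS cycle detection. Each frame holds a node
    # plus an iterator over its children -- the "current position in the
    # child list" mentioned above.
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on stack / finished

    def has_cycle(graph):
        color = {v: WHITE for v in graph}
        for root in graph:
            if color[root] != WHITE:
                continue
            color[root] = GREY
            stack = [(root, iter(graph[root]))]
            while stack:
                node, children = stack[-1]
                for child in children:
                    if color[child] == GREY:
                        return True          # back edge: cycle found
                    if color[child] == WHITE:
                        color[child] = GREY
                        stack.append((child, iter(graph[child])))
                        break                # descend; resume here later
                else:
                    color[node] = BLACK      # all children done
                    stack.pop()
        return False

    print(has_cycle({0: [1], 1: [2], 2: [0]}))  # True
    print(has_cycle({0: [1], 1: [2], 2: []}))   # False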
The DFS solution is not awful, but if you want to write a good solid solution, then Kahn's algorithm will end up simpler and faster in the worst case. It will also use less memory, because the list of pending nodes is just a list of nodes. (It doesn't matter whether you use that list like a stack or a queue; in most cases using it like a stack is faster and easier.)
So if you're going to explicitly check a DAG to see if it has cycles, Kahn's algorithm is usually best. The DFS technique is useful if you're already doing a DFS for some other reason and want to detect cycles along the way.
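A minimal Kahn-style sketch of the course-schedule check, using the pending list as a stack as suggested above:

    def can_finish(num_courses, prerequisites):
        # Kahn's algorithm: repeatedly take a course with no remaining
        # prerequisites. All courses are completable iff none are left.
        adj = [[] for _ in range(num_courses)]
        indegree = [0] * num_courses
        for course, prereq in prerequisites:
            adj[prereq].append(course)
            indegree[course] += 1
        ready = [c for c in range(num_courses) if indegree[c] == 0]
        taken = 0
        while ready:
            c = ready.pop()          # used as a stack
            taken += 1
            for nxt in adj[c]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        return taken == num_courses

    print(can_finish(2, [[1, 0]]))          # True
    print(can_finish(2, [[1, 0], [0, 1]]))  # False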
Is it possible to modify Floyd's Cycle Detection Algorithm to find all numbers having a repetition in an array?
I cannot modify the array.
Given an array of size n with integers drawn from 1, ..., n-1, by the pigeonhole principle there exists a duplicate integer. (*)
An implementation of Floyd's cycle detection algorithm finds such a duplicate in O(n) time and O(1) space.
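For reference, a sketch of that single-duplicate version, treating each index i as a pointer to index a[i]:

    def find_duplicate(a):
        # With n entries drawn from 1..n-1 (condition * above), i -> a[i]
        # is a function on 0..n-1, and index 0 has no incoming pointer,
        # so following it must enter a cycle whose entry is a duplicate.
        slow = fast = 0
        while True:                      # phase 1: find a meeting point
            slow = a[slow]
            fast = a[a[fast]]
            if slow == fast:
                break
        slow = 0                         # phase 2: find the cycle entry
        while slow != fast:
            slow = a[slow]
            fast = a[fast]
        return slow                      # the duplicated value

    print(find_duplicate([1, 3, 4, 2, 2]))  # 2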
I want to handle the case when there are more than one duplicates. These are my thoughts:
Each repetition of a number results in a cycle (assuming condition (*)). But no two distinct repeated integers can be in the same cycle. So we have to find all such cycles.
Each cycle path would be of the form --------o, where -------- is the tail on which we start the two pointers for our cycle detection algorithm and o is the cycle.
We have to find the entry point of the cycle, so we keep one pointer at the first - of our cycle path and the other where the pointers first met; moving them together at the same speed makes them meet at the entry point of the cycle.
So if we have the first - of our cycle path, we can find the duplicate which resulted in that cycle.
How do I find the first - optimally for each cycle path? Storing the visited nodes would forfeit the O(1) space complexity, at which point it would be better to just use a hash table instead.
I would need to store some information to tell that a cycle path has already been visited, so the space cannot be O(1) as far as I understand; but how do I minimize space in any case?
Is this approach even feasible, or am I wrong somewhere in the ideas above? What other method utilizing cycle detection would be feasible for finding multiple duplicates optimally?
Update:
If the approach above is feasible, the problem comes down to exiting the current cycle and finding an unvisited one. I can tell whether a cycle has been visited by storing the number where our pointers initially meet, so technically this takes O(k) space for k duplicates. Finding another cycle could be done by picking a randomized start and checking whether its cycle has been visited. But as the number of duplicates found approaches k, the randomized start gets slower and may never terminate. Also, would there be any smart way to pick an index for the randomized approach that guarantees termination?
Is it possible to do a breadth-first search using only (size of graph) + a constant amount of memory -- in other words, without recording which nodes have already been visited?
No. You always need to remember where you have visited, so in the worst case you need to record the visited state of all nodes. The branching factor and depth of the graph are the main factors, though: if the graph doesn't branch much, you won't need anything like that; if it is highly branching, you tend toward the worst case.
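For concreteness, the visited set in a textbook BFS is exactly the memory in question; a minimal sketch:

    from collections import deque

    def bfs(graph, start):
        # The `visited` set is the per-node memory the answer refers to:
        # without it, any cycle would make the traversal loop forever.
        visited = {start}
        queue = deque([start])
        order = []
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
        return order

    print(bfs({0: [1, 2], 1: [0, 2], 2: [0]}, 0))  # [0, 1, 2]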
Problem: finding shortest paths in an unweighted, undirected graph.
Breadth-first search can find the shortest path between two nodes, but this can take up to O(|V| + |E|) time. A precomputed lookup table would allow requests to be answered in O(1) time, but at the cost of O(|V|^2) space.
What I'm wondering: Is there an algorithm which offers a space-time tradeoff that's more fine-grained? In other words, is there an algorithm which:
Finds shortest paths in more time than O(1), but is faster than a bidirectional breadth-first search
Uses precomputed data which takes up less space than O(|V|^2)?
On the practical side: the graph has 800,000 nodes and is believed to be a small-world network. The all-pairs shortest-paths table would be on the order of gigabytes -- not outrageous these days, but it doesn't suit our requirements.
However, I am asking my question out of curiosity. What's keeping me up at night is not "how can I reduce cache misses for an all-pairs lookup table?", but "Is there a completely different algorithm out there that I've never heard of?"
The answer may be no, and that's okay.
You should start by looking at Dijkstra's algorithm for finding the shortest path. The A* algorithm is a variant that uses a heuristic, such as the Euclidean distance, to reduce the time taken to compute the optimal route between the start and goal nodes. You can adjust this heuristic to trade accuracy for performance.
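A generic A* sketch, assuming every node carries coordinates for the Euclidean heuristic (a small-world network may not have such an embedding, so treat this as illustrative):

    import heapq, math

    def a_star(graph, pos, start, goal):
        # graph: node -> list of (neighbor, edge_cost); pos: node -> (x, y).
        # The Euclidean heuristic is admissible when every edge cost is at
        # least the straight-line distance between its endpoints.
        def h(v):
            return math.dist(pos[v], pos[goal])
        g = {start: 0.0}
        frontier = [(h(start), start)]
        while frontier:
            f, v = heapq.heappop(frontier)
            if v == goal:
                return g[v]
            if f > g[v] + h(v):          # stale queue entry, skip
                continue
            for w, cost in graph[v]:
                new_g = g[v] + cost
                if new_g < g.get(w, math.inf):
                    g[w] = new_g
                    heapq.heappush(frontier, (new_g + h(w), w))
        return math.inf

    pos = {"s": (0, 0), "m": (1, 0), "t": (2, 0)}
    graph = {"s": [("m", 1.0)], "m": [("t", 1.0)], "t": []}
    print(a_star(graph, pos, "s", "t"))  # 2.0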
It seems as if your input set must be very large, if a lookup table will be too large to store on disk. I assume that the data will not fit in RAM then, which means that whatever algorithm you use should be tuned to minimize the number of reads and writes. Whenever disks are involved, space == time, because writing to disk is so slow.
The exact algorithm you should use depends on what kind of graph you have. This research paper might be of interest to you. Full disclosure: I have not read it myself, but it seems like it might be what you are looking for.
Edit:
If the graph is (almost) connected, which a small-world network is, a lookup table can't be smaller than V^2. This means that all lookups will require disk access. If the edges fit in main memory, it might be faster to just compute the path every time. Otherwise, you might compute the path from a table containing the lengths of all shortest paths. You can reconstruct the path from that table.
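Reconstruction works because in an unweighted graph every node on a shortest path has a neighbor that is exactly one step closer to the target; a sketch, assuming dist is the precomputed table of lengths:

    def reconstruct_path(graph, dist, u, v):
        # Walk downhill toward v: some neighbor w of u always satisfies
        # dist[w][v] == dist[u][v] - 1, so the length table alone suffices.
        path = [u]
        while u != v:
            u = next(w for w in graph[u] if dist[w][v] == dist[u][v] - 1)
            path.append(u)
        return path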
The key is to make sure that the entries in the table which are close to each other in either direction are also close to each other on the disk. This storage pattern accomplishes that:
 1  2  5  6
 3  4  7  8
 9 10 13 14
11 12 15 16
It will also work well with the cache hierarchy.
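That recursive 2x2 blocking is a Z-order (Morton) curve: the linear index interleaves the bits of the row and column. A sketch:

    def morton_index(row, col, bits=16):
        # Interleave the bits of (row, col): row bits land in the odd
        # positions of the index, col bits in the even ones.
        idx = 0
        for b in range(bits):
            idx |= ((row >> b) & 1) << (2 * b + 1)
            idx |= ((col >> b) & 1) << (2 * b)
        return idx

    # Reproduces the 4x4 layout above (each entry is index + 1):
    for r in range(4):
        print([morton_index(r, c) + 1 for c in range(4)])
    # [1, 2, 5, 6]
    # [3, 4, 7, 8]
    # [9, 10, 13, 14]
    # [11, 12, 15, 16]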
In order to compute the table you might use a modified Floyd-Warshall, where you process the data in blocks. This would let you perform the computation in a reasonable amount of time, especially if you parallelize it.
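A sketch of that blocked variant, assuming the whole matrix is addressable and the block size B is tuned so a few blocks fit in memory at once:

    def blocked_floyd_warshall(dist, B):
        # dist is an n x n matrix of path lengths (math.inf where there is
        # no edge), updated in place one B x B block at a time. Each phase
        # touches only a few blocks, which is what makes it friendly to
        # disk-resident tables and to parallelization.
        n = len(dist)

        def relax(i0, j0, k0):
            for k in range(k0, min(k0 + B, n)):
                row_k = dist[k]
                for i in range(i0, min(i0 + B, n)):
                    row_i = dist[i]
                    dik = row_i[k]
                    for j in range(j0, min(j0 + B, n)):
                        if dik + row_k[j] < row_i[j]:
                            row_i[j] = dik + row_k[j]

        for k0 in range(0, n, B):
            relax(k0, k0, k0)                      # diagonal block first
            for j0 in range(0, n, B):              # its block row
                if j0 != k0:
                    relax(k0, j0, k0)
            for i0 in range(0, n, B):              # its block column
                if i0 != k0:
                    relax(i0, k0, k0)
            for i0 in range(0, n, B):              # all remaining blocks
                for j0 in range(0, n, B):
                    if i0 != k0 and j0 != k0:
                        relax(i0, j0, k0)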