Given multiple abstract objects, but not their values, and a list of which objects are bigger than other objects, find the biggest object.
One thought I had was representing the objects in a directed graph. An outgoing connection indicates that the node the connection originates from is bigger than the node that receives the connection.
Therefore a node with only outgoing connections has no known node bigger than itself, meaning it's the biggest of them all. If there are multiple nodes with only outgoing connections, then there's not enough information to definitively pick the biggest node.
For example, given this graph we know that nodes 4 and 5 are currently the biggest, but it would require a connection between them to tell which of the two is bigger.
On the other hand in this graph node 4 is the sole node with only outgoing connections and therefore is the biggest.
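The check described above can be sketched in a few lines of Python. This is only a sketch under one assumption that the question does not state: the comparisons are consistent, i.e. the graph has no cycles such as a > b > a. Under that assumption a unique node without incoming connections really is the biggest, because from any other node you can keep stepping to a bigger node until you run out, and the only place to run out is that node. The function name and the edge lists in the usage lines are made up for illustration; they are not the graphs from the pictures.

```python
def find_biggest(nodes, bigger_than):
    """Return the unique biggest node, or None if the data cannot decide.

    bigger_than is a list of (a, b) pairs meaning "a is bigger than b".
    Assumes the comparisons are consistent (the graph has no cycles).
    """
    # Any node that receives a connection has a known bigger node.
    has_bigger = {smaller for _, smaller in bigger_than}
    candidates = [n for n in nodes if n not in has_bigger]
    # Exactly one candidate: it is the biggest. Several: undecided.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical data: 4 and 5 are both candidates until they are compared.
pairs = [(4, 1), (4, 2), (5, 3), (5, 2)]
undecided = find_biggest([1, 2, 3, 4, 5], pairs)
decided = find_biggest([1, 2, 3, 4, 5], pairs + [(4, 5)])
```

With the first list the result is None, and adding the pair (4, 5) makes node 4 the answer.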
This method has worked on the examples I've tried, but I'd like to know whether it is actually correct. Can somebody prove it, or think of an example that shows it to be wrong? Alternatively, does somebody have a proven approach to this problem?
Thanks in advance!
The problem statement
Given a directed graph of N nodes where each node is pointing to any one of the N nodes (can possibly point to itself). Ishu, the coder, is bored and he has discovered a problem out of it to keep himself busy. Problem is as follows:
A node is 'good' if it satisfies one of the following:
It is the special node (marked as node 1)
It is pointing to the special node (node 1)
It is pointing to a good node.
Ishu is going to change pointers of some nodes to make them all 'good'. You have to find the minimum number of pointers to change in order to make all the nodes good (Thus, a Good Graph).
NOTE: Resultant Graph should hold the property that all nodes are good and each node must point to exactly one node.
Problem Constraints
1 <= N <= 10^5
Input Format
First and only argument is an integer array A containing N numbers all between 1 to N, where i-th number is the number of node that i-th node is pointing to.
Output Format
An Integer denoting minimum number of pointer changes.
My Attempted Solution
I tried building a graph by reversing the edges and then colouring connected nodes; the answer would be colors - 1. In short, I was attempting to find the number of connected components of a directed graph, which does not make sense to me (as directed graphs do not have any concept of connected components). Other solutions on the web, like this and this, also say to find the number of connected components, but again, connected components of a directed graph do not make any sense to me. The question looks trickier than a first look at it suggests.
@mcdowella gives an accurate characterization of the graph -- in each connected component, all chains of pointers end up in the same dead end or cycle.
There's a complication if the special node has a non-null pointer, though. If the special node has a non-null pointer, then it may or may not be in the terminal cycle for its connected component. First set the special node pointer to null to make this problem go away, and then:
If the graph has m connected components, then m-1 pointers would have to be changed to make all nodes good.
Since you don't have to actually change the pointers, all you need to do to solve the problem is count the connected components in the graph.
There are many ways to do this. If you think of the edges as undirected, then you could use BFS or DFS to trace each connected component, for example.
With the kind of graph you actually have, though, where each node has a directed pointer, the easiest way to solve this is with a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Initially you just put each node in its own set, then union the sets on either side of each pointer, and then count the number of sets by counting the number of roots in the disjoint set structure.
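Here is a minimal sketch of that counting step, assuming the input is the array A from the problem statement (1-based node numbers, A[i-1] is the node that node i points to). The function name is made up. Setting the special node's pointer to null is done by simply skipping node 1 when unioning.

```python
def min_pointer_changes(a):
    """Count connected components (ignoring node 1's pointer) minus one."""
    n = len(a)
    parent = list(range(n + 1))  # index 0 is unused; nodes are 1..n

    def find(x):
        # Find the set root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the two ends of every pointer except the special node's.
    for node in range(2, n + 1):
        root_a, root_b = find(node), find(a[node - 1])
        if root_a != root_b:
            parent[root_a] = root_b

    components = sum(1 for v in range(1, n + 1) if find(v) == v)
    return components - 1
```

For example, with A = [1, 2, 1] node 2 points to itself and forms its own component, so one pointer has to change.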
If you haven't implemented a disjoint set structure before, it's a little tough to appreciate just how easy it is. There are a couple of simple implementations here: How to properly implement disjoint set data structure for finding spanning forests in Python?
I think this is a very similar question to Graphic: Rerouting problem test in python language. Because each node points to just one other node, your graph consists of a set of cycles (counting a node pointing to itself as a cycle) with trees feeding in to them. You can find a cycle by following pointers between nodes and checking for nodes that you have already visited. You need to break one link in each cycle, redirecting it to point to the special node. This routes everything else in that cycle, and all the trees that feed into it, to the special node.
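A rough sketch of that pointer-following idea, again assuming the 1-based array A from the problem statement (the function name is made up). It counts the cycles that do not contain the special node, since each of those needs exactly one link redirected. A cycle that contains node 1 needs no change.

```python
def cycles_to_break(a):
    """Count cycles of the pointer graph that do not contain node 1."""
    n = len(a)
    UNSEEN, ON_WALK, DONE = 0, 1, 2
    state = [UNSEEN] * (n + 1)  # index 0 is unused
    breaks = 0
    for start in range(1, n + 1):
        path = []
        node = start
        # Follow pointers until we reach something already visited.
        while state[node] == UNSEEN:
            state[node] = ON_WALK
            path.append(node)
            node = a[node - 1]
        if state[node] == ON_WALK:
            # We ran into our own walk, so this is a newly found cycle.
            cycle = path[path.index(node):]
            if 1 not in cycle:
                breaks += 1
        for visited in path:
            state[visited] = DONE
    return breaks
```

Every node is put on a walk exactly once, so the whole thing is linear in N.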
What I want to solve:
I want to detect the newly created unreachable subgraph (i.e. mark the nodes that become unreachable) in a given directed graph when I cut a specific edge.
The restrictions of this problem:
The given graph is a directed graph (see the useful information below).
The number of nodes is more than 100,000.
The number of edges is around 1.5x the number of nodes.
The running time of the solution should be less than a second.
The information that might be useful:
The given graph was made by connecting numerous cycles, and there is at least one route from any node to any other node.
A few (~10%) of the nodes have a branch. No node in the graph has more than 3 edges.
The meaning of "unreachable area" includes "not connected", but you can ignore this if you think it mixes two different problems into one.
My trials
When I ran into this problem, I tried four approaches, but none of them worked out.
1. Find another path that can replace the one that was cut.
This method was rejected because of its running time. We currently use Dijkstra's algorithm for path finding, and when I tried this method by putting the searches into a job queue, the queue was flooded in less than an hour.
2. Check the level of edges (like a packet's Time-To-Live in networking).
Search from the edge node with a given threshold level.
When I meet a branch, keep the previous level. Otherwise, decrease the level.
If the level is 0, do nothing.
This is our current temporary solution, but it obviously ignores a lot of corner cases.
3. Simulate a flow network on the graph.
It's simple:
Give a threshold (like 100) to every node and simulate its flow.
When I meet a branch, split the number between the branches.
Check for values that are lower than 1.
But this method was also rejected because of its time complexity.
4. SCC and topological sorting.
Lastly, I checked the strongly connected components together with topological orders. (Of course I know I'm using the wrong word, see below.)
The idea is this: topological sorting is meant for DAGs (directed acyclic graphs), but if I add some rules (like "if I detect a cycle, treat that cycle as a virtual node, recursively", using SCCs), I can check the "topological order" of a general directed graph. If I find a topological order, this means that there is an unreachable area. (It's hard to explain; think of it in terms of method 3, simulating a flow network.)
I think this approach is the best one and might solve the problem, but I don't know which keywords I should search for to learn about it, or how to implement it.
EDIT
I forgot to explain what "unreachable" means. If there is no route from a node (node 'A') to any other node, node 'A' is "unreachable". Initially, the given digraph has no unreachable nodes.
In this problem, let's assume that node 1 is the master node. If there is no route from node 1 to node 2, then node 2 is unreachable.
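To make the definition with the master node concrete, here is a baseline sketch: a plain breadth-first search from the master node that skips the cut edge, then reports every node it never reached. It is not one of the four methods above, just a single O(nodes + edges) pass. The function name and the adjacency-dict layout are assumptions for illustration: every node appears as a key, and the edge is given as a (from, to) pair, so parallel edges between the same two nodes would all be treated as cut.

```python
from collections import deque


def unreachable_after_cut(adjacency, master, cut_edge):
    """Return the nodes with no route from master once cut_edge is removed.

    adjacency maps every node to the list of nodes it points to.
    cut_edge is a (source, target) pair.
    """
    seen = {master}
    queue = deque([master])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            # Skip the removed edge and anything already reached.
            if (node, nxt) == cut_edge or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return set(adjacency) - seen


# Two cycles joined at node 3: 1 -> 2 -> 3 -> 1 and 3 -> 4 -> 5 -> 3.
example = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [3]}
lost = unreachable_after_cut(example, master=1, cut_edge=(3, 4))
```

In this small example, cutting 3 -> 4 leaves nodes 4 and 5 without a route from node 1.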
I need an algorithm to find ANY path from point A to point B in a graph.
The problem is that finding out which nodes can follow a specific one requires a quite lengthy MATLAB simulation, so I want to access as few nodes as possible.
I know some heuristics about the graph, i.e. every node has coordinates and follow-up nodes are always "near" the previous one, but there is not always a connection between two close nodes.
I am not searching for an optimal path, or even a short one. I just need any connection.
My first try was a simple greedy algorithm that always picks the follow-up node closest to the final node, but this ended in dead ends very often. This wouldn't be a problem, but I have no idea how to efficiently move out of a dead end; currently I simply move through all nodes inside the dead end until I find a better way.
Here is a drawing of an example where I already know the solution:
There are many nodes, so calculating the edges for every node in this small dead end on the top takes about 1h20min. (You can assume every pixel in the picture is a node.)
In short: how do I find a good way around the obstacle without looking at every node inside a whole area?
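For reference, the greedy idea can be written as a greedy best-first search, sketched below. Instead of walking back out of a dead end step by step, it keeps every discovered but unexpanded node in a priority queue ordered by straight-line distance to the goal and always continues from the closest one. The names are placeholders: neighbors stands in for the expensive simulation and position for the coordinate lookup. It does not guarantee that a dead end is skipped entirely; it only guarantees that neighbors is called at most once per node and never on nodes the search has no reason to visit.

```python
import heapq
import math


def find_any_path(start, goal, neighbors, position):
    """Greedy best-first search; returns a list of nodes or None.

    neighbors(node) -> iterable of follow-up nodes (the expensive call).
    position(node)  -> coordinates of the node, used as the heuristic.
    """
    def distance_to_goal(node):
        return math.dist(position(node), position(goal))

    frontier = [(distance_to_goal(start), start)]
    came_from = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (distance_to_goal(nxt), nxt))
    return None
```

The path it returns is usually not the shortest one, which is fine when any connection will do.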
Sorry if this is a silly question, but I'm an engineer and never had a formal education in programming aside from making an LED blink.
Thanks in advance!
I'm trying to optimize a graph-traversal problem, but can't figure out the best way to tackle it. It seems to be neither an A* search problem (because we want to maximize the path rather than minimize it) nor a traveling salesman problem (because we don't have to visit all cities). The simplified version of it is something along these lines:
We have a set of nodes and connections/edges. Connections are arbitrary and nodes can have one or more of them. Connections also have an interface/type associated with them, and interfaces can't support more than a single connection. So for example, if node A can connect to nodes B or C via interface alpha, and we decide to connect it to node B, that interface on node A can no longer support other connections, so C can't be connected to A anymore. However, we could connect node C to node D, if it happens to have the same alpha interface.
I should also mention that these interfaces work like lock-and-key, so A can connect to either B or C, but B and C can't connect to each other (the interface is like a mirror). Also, while A can no longer connect to anything via the alpha interface because it's used by B, if it happens to have another interface (bravo) and something else can connect to bravo, then we can connect more than one node to A. The goal is to obtain the largest group of connected nodes (discarding all smaller groups).
There are a few heuristics I'm considering:
prefer nodes with more interfaces (I already discarded interfaces without pairs)
prefer interfaces that are more popular
The above two rules can be useful for prioritizing which node to try connecting to next (for now I naively grouped them into one rank - total number of connectable nodes), but my gut is telling me I can do better. Moreover, I don't think this would favor an optimal solution.
I was trying to figure out if I can invert the heuristic somehow to create a variation of A* search such that the A* 'optimistic heuristic cost' rule still applies (i.e. heuristic cost = number of nodes discarded). However, this breaks the actual cost computation, since we'd be starting with all but one node discarded.
Another idea I had was computing the distance (number of intermediate nodes) to each node from the starting node and using the average of that as a heuristic, with goal being all nodes connected. However, I'm not guaranteed that all nodes will connect.
EDIT:
Here is an example
dashed lines represent allowed (but not activated/traveled) connections
interfaces are not allowed to connect to the interface with identical name, but can connect to the ' version of itself
interface can only be used once (if we connect A to B via α, we can no longer connect A to C because A no longer has interface α available)
number of nodes is arbitrary (but constant during the algorithm's execution), and should be assumed to be very large
number of interfaces per node is going to be at least one; we could assume an upper limit if it makes the problem easier, e.g. 3
number of possible connections is simply a function of interface compatibility, interface defines what the node can connect to, whether/how you use that interface is up to you
direction/order of activating the connections doesn't matter
the goal is to generate the largest set of connected nodes (we don't care about number of connections or interfaces used)
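To pin down the rules in the list above, here is a small sketch of the bookkeeping only (it does not solve the optimization). It assumes a made-up representation: each node maps to a set of interface names, and the mirror of an interface is the same name with a trailing apostrophe. It also assumes a node never connects to itself. All function names are hypothetical.

```python
def mate(interface):
    """Return the mirror of an interface: "alpha" <-> "alpha'"."""
    return interface[:-1] if interface.endswith("'") else interface + "'"


def allowed_connections(nodes):
    """List every connection (u, iface_u, v, iface_v) that may be activated."""
    by_interface = {}
    for name, interfaces in nodes.items():
        for iface in interfaces:
            by_interface.setdefault(iface, []).append(name)
    result = []
    for name, interfaces in nodes.items():
        for iface in interfaces:
            if iface.endswith("'"):
                continue  # list each pair once, from the unprimed side
            for other in by_interface.get(mate(iface), []):
                if other != name:
                    result.append((name, iface, other, mate(iface)))
    return result


def is_valid_selection(selection):
    """Check that no interface on any node is used more than once."""
    used = set()
    for u, iface_u, v, iface_v in selection:
        if (u, iface_u) in used or (v, iface_v) in used:
            return False
        used.update({(u, iface_u), (v, iface_v)})
    return True


# A and D carry alpha; B and C carry its mirror alpha'.
demo = {"A": {"alpha"}, "B": {"alpha'"}, "C": {"alpha'"}, "D": {"alpha"}}
options = sorted(allowed_connections(demo))
```

In the demo, A-B together with A-C is rejected because A's alpha would be used twice, while A-B together with D-C is fine.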
Here's my situation. I have a graph that has different sets of data being added at different times. For example, set1 might have a few thousand nodes, and then set2 comes in later and we apply business logic to create edges from set1 to set2 (and discard any vertices from set1 that do not have edges to set2). Then at a later point we get set3, set4, and so on, and the same process applies between each set and its previous set.
Question: what's the best way to organize this? What I did before was name the nodes set1-xx, set2-xx, etc. The problem I faced was that when I was trying to run analytics between the current set and the previous set, I would have to loop through the entire graph and look for all the nodes whose names started with 'setx'. It took a long time as the graph grew, so I thought of another solution, which was to create a node called 'set1' and have it connected to all nodes of that particular set. I am testing it, but I was wondering if there was a more efficient way or a built-in way of handling data structures like this. Is there a way to somehow segment data like this?
I think a general solution would be applicable, but if it helps, I'm using neo4j (so any solution specific to that database would be good as well).
You have a very special type of a directed graph, called a layered graph.
The choice of the data structure depends primarily on the expected graph density (how many nodes from a previous set/layer are typically connected to a node in the current set/layer) and on the operations that you need to perform on it most of the time. It is definitely a good idea to have each layer directly represented by a numeric index (that is, the outermost structure will be an array of sets/layers), and presumably you can also use one array of vertices per layer. However, the list of edges per vertex (out only, or in and out sets of edges depending on whether you ever traverse the layers backward) may be any of the following:
Linked list of vertex identifiers; this is good if the graph is very sparse and edges are often added/removed.
Sorted array of vertex identifiers; this is good if the graph is quite sparse and immutable.
Array of booleans, indexed by vertex identifiers, determining whether a given vertex is or is not linked by an edge from the current vertex; this is good if the graph is dense.
The "vertex identifier" can take many forms. For example, it can be an index into the array of vertices on the next layer.
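As an illustration of one of these combinations, here is a rough in-memory sketch: an outer list of layers, one list of vertices per layer, and for each vertex a sorted list of indices into the next layer. The class and method names are invented for the example, and it only stores out-edges, so it assumes you never traverse the layers backward.

```python
import bisect


class LayeredGraph:
    def __init__(self):
        self.layers = []     # layers[k] holds the vertices of layer k
        self.out_edges = []  # out_edges[k][i]: sorted indices into layer k + 1

    def add_layer(self, vertices):
        """Append a new layer and return its numeric index."""
        self.layers.append(list(vertices))
        self.out_edges.append([[] for _ in self.layers[-1]])
        return len(self.layers) - 1

    def add_edge(self, layer, src, dst):
        """Link vertex src of `layer` to vertex dst of the next layer."""
        if not 0 <= dst < len(self.layers[layer + 1]):
            raise IndexError("dst is not a vertex of the next layer")
        targets = self.out_edges[layer][src]
        pos = bisect.bisect_left(targets, dst)
        if pos == len(targets) or targets[pos] != dst:
            targets.insert(pos, dst)  # keep the list sorted, skip duplicates

    def linked_vertices(self, layer):
        """Vertices of `layer` that have at least one edge to the next layer."""
        return [self.layers[layer][i]
                for i, targets in enumerate(self.out_edges[layer]) if targets]


graph = LayeredGraph()
set1 = graph.add_layer(["a", "b", "c"])
set2 = graph.add_layer(["x", "y"])
graph.add_edge(set1, 0, 1)  # a -> y
graph.add_edge(set1, 2, 0)  # c -> x
kept = graph.linked_vertices(set1)
```

Reaching all vertices of a set is then a single index lookup, and kept lists the set1 vertices that survive the "discard vertices without edges" rule.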
Your second solution is what I would do: create a setX node and connect all nodes belonging to that set to setX. That way your data is partitioned and it is easier to query.