Is there a way to turn a DCG to a DAG without losing any connections? - algorithm

I'm currently working on a homework problem that asks us to write an algorithm that checks whether a directed cyclic graph is semi-connected or not. My current thought process is to (if possible) turn the DCG into a DAG and then use a topological sort. I'm stuck on whether or not it's possible to turn a DCG into a DAG without losing any connections. Everything I've come up with causes a connection to be lost.
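One standard way to make that idea work is to contract each strongly connected component into a single node: the resulting "condensation" is a DAG that preserves reachability between components, which is exactly what a semi-connectedness check needs. A minimal Python sketch of that approach (function names are illustrative, not from the question) might look like this:

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps every node to a list of successors.
    (Recursive DFS keeps the sketch short; very deep graphs would need an
    iterative version or a higher recursion limit.)"""
    visited, order = set(), []

    def dfs(v, g, out):
        visited.add(v)
        for w in g[v]:
            if w not in visited:
                dfs(w, g, out)
        out.append(v)

    for v in graph:
        if v not in visited:
            dfs(v, graph, order)

    reverse = defaultdict(list)
    for v in graph:
        for w in graph[v]:
            reverse[w].append(v)

    visited.clear()
    components = []
    for v in reversed(order):
        if v not in visited:
            comp = []
            dfs(v, reverse, comp)
            components.append(comp)
    return components


def is_semi_connected(graph):
    """Semi-connected iff, in a topological order of the condensation,
    every consecutive pair of components is joined by an edge."""
    sccs = strongly_connected_components(graph)
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}

    # Condensation: contract every SCC to one node; the result is a DAG.
    dag = {i: set() for i in range(len(sccs))}
    for v in graph:
        for w in graph[v]:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]].add(comp_of[w])

    # Kahn's algorithm for a topological order of the condensation.
    indegree = {c: 0 for c in dag}
    for u in dag:
        for w in dag[u]:
            indegree[w] += 1
    queue = [c for c in dag if indegree[c] == 0]
    order = []
    while queue:
        u = queue.pop()
        order.append(u)
        for w in dag[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    # The condensation has a Hamiltonian path (and the original graph is
    # semi-connected) only if each component links directly to the next one.
    return all(order[i + 1] in dag[order[i]] for i in range(len(order) - 1))
```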

Related

How do I process a graph that is constantly updating, with low latency?

I am working on a project that involves many clients connecting to a server (or servers if need be) that contains a bunch of graph info (node attributes and edges). They will have the option to introduce a new node or edge at any time and then request some information from the graph as a whole (shortest distance between two nodes, graph coloring, etc.).
It's obviously quite easy to develop the naive algorithm for this, but I am trying to learn how to scale it so that it can handle many users updating the graph at the same time, many users requesting information from the graph, and the possibility of a very large number of nodes (500k+) and possibly a very large number of edges as well.
The challenges I can foresee:
with a constantly updating graph, I need to process the whole graph every time someone requests information... which will increase computation time and latency quite a bit
with a very large graph, the computation time and latency will obviously be a lot higher (I read that some companies remedy this by batch-processing a ton of results and storing them with an index for later use... but since my graph is constantly updated and users want the most up-to-date info, this is not a viable solution)
a large number of users requesting information will be quite a load on the servers, since they have to process the graph that many times
How do I start facing these challenges? I looked at Hadoop and Spark, but they seem to offer either high-latency, batch-processing solutions or solutions aimed at problems where the graph is not constantly changing.
I had the idea of maybe processing different parts of the graph and indexing them, then keeping track of where the graph is updated and re-processing that section (a kind of distributed dynamic-programming approach), but I'm not sure how feasible that is.
Thanks!
How do I start facing these challenges?
I'm going to answer this question, because it's the important one. You've enumerated a number of valid concerns, all of which you'll need to deal with and none of which I'll address directly.
In order to start, you need to finish defining your semantics. You might think you're done, but you're not. When you say "users want the most up to date info", does "up to date" mean
1. "everything in the past", which leads to total serialization of each transaction to the graph, so that answers reflect every possible piece of information?
2. Or "everything transacted more than X seconds ago", which leads to partial serialization, with multiple database states in the present that are progressively serialized into the past?
If 1. is required, you may well have unavoidable hot spots in your code, depending on the application. You have immediate information about when to roll back a transaction because of inconsistency.
If 2. is acceptable, you have the possibility of much better performance. There are tradeoffs, though. You'll have situations where you have to roll back a transaction after initial acceptance.
Once you've answered this question, you've started facing your challenges and, I assume, will have further questions.
I don't know much about graphs, but I do understand a bit of networking.
One rule I try to keep in mind is... don't do work on the server side if you can get the client to do it.
All your server needs to do is maintain the raw data, serve raw data to clients, and notify connected clients when data changes.
The clients can have their own copy of raw data and then generate calculations/visualizations based on what they know and the updates they receive.
Clients only need to know if there are new records or if old records have changed.
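As a rough illustration of that split (the class, method, and event names below are made up, not from the question), the server side can be little more than a store-and-notify layer while clients keep their own copy and do the analysis:

```python
from collections import defaultdict

class GraphStore:
    """Sketch of a server-side store that only keeps raw data and notifies
    subscribers of changes; all graph analysis happens client-side."""

    def __init__(self):
        self.edges = defaultdict(set)   # node -> set of neighbours
        self.attrs = {}                 # node -> attribute dict
        self.subscribers = []           # callbacks invoked on every change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _notify(self, event):
        for cb in self.subscribers:
            cb(event)

    def add_node(self, node, **attrs):
        self.attrs[node] = attrs
        self._notify(("node_added", node, attrs))

    def add_edge(self, u, v):
        self.edges[u].add(v)
        self.edges[v].add(u)
        self._notify(("edge_added", u, v))


# A client keeps its own copy and recomputes what it needs when told.
store = GraphStore()
local_copy = {"nodes": {}, "edges": []}

def on_change(event):
    kind, *payload = event
    if kind == "node_added":
        node, attrs = payload
        local_copy["nodes"][node] = attrs
    elif kind == "edge_added":
        local_copy["edges"].append(tuple(payload))
    # ...re-run shortest path / colouring locally here as needed...

store.subscribe(on_change)
store.add_node("a")
store.add_node("b")
store.add_edge("a", "b")
```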
If, for some reason, you ABSOLUTELY have to process data server-side and send it to the client (for example, the client is third-party software you don't control and it expects processed data, not raw data), THEN you do have a bit of an issue, so get a bad-ass server... or 3, or 30. In this case, I would have to know exactly what the data is and how it's being processed in order to make any kind of suggestion about a scaled configuration.

How does Paxos handle packet loss and new node joining?

I'm currently learning Paxos, and so far I have a basic understanding of how it works. But can anyone explain how Paxos handles packet loss and a new node joining? It would be even better if a simple example were provided.
The classical Paxos algorithm does not have a concept of "new nodes joining". Some Paxos variants do, such as "Vertical Paxos", but the classic algorithm requires that all nodes be statically defined before running the algorithm. With respect to packet loss, Paxos uses a very simple infinite loop: "try a round of the algorithm; if anything at all goes wrong, try another round". So if too many packets are lost in the first attempt at achieving resolution (which can be detected via a simple timeout while waiting for replies), a second round can be attempted. If the timeout for that round expires, try again, and so on.
Exactly how packet loss is to be detected and handled is something the Paxos algorithm leaves undefined. It's an implementation-specific detail. This is actually a good thing for production environments since how this is handled can have a pretty big performance impact on Paxos-based systems.
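To make the "try a round, and if anything goes wrong try another round" loop concrete, the proposer's outer driver can be as simple as the following sketch (the `propose_round` helper is a placeholder for one prepare/accept exchange, not a call into any real Paxos library):

```python
import time

def run_proposer(propose_round, timeout_s=1.0, backoff=2.0, max_rounds=None):
    """Sketch of the outer retry loop around a single Paxos round.
    `propose_round(round_no, timeout_s)` is a placeholder that should run one
    prepare/accept exchange and return the chosen value, or raise TimeoutError
    if not enough replies arrive in time."""
    round_no = 0
    while max_rounds is None or round_no < max_rounds:
        round_no += 1
        try:
            return propose_round(round_no, timeout_s)
        except TimeoutError:
            # Too many packets (or acceptors) lost this round: wait a little
            # and try again with a fresh, higher-numbered round.
            time.sleep(min(timeout_s, 0.1 * backoff ** round_no))
    raise RuntimeError("gave up after %d rounds" % round_no)
```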
Regarding packet loss, Paxos makes the following assumption about the network:
Messages may be lost, reordered, or duplicated.
This is solved via quorums. At least X of all Acceptors must accept a value in order for the system to accept it. This also covers the case where a node is failing.
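The quorum test itself is tiny; a sketch with a simple majority quorum (names are illustrative):

```python
def majority_quorum(num_acceptors):
    """Smallest number of acceptors that forms a majority quorum."""
    return num_acceptors // 2 + 1

def value_is_chosen(accepted_replies, num_acceptors):
    """A value is chosen once some single (round, value) pair has been
    accepted by at least a quorum of acceptors, even if other replies
    were lost, duplicated, or arrived out of order."""
    counts = {}
    for round_no, value in accepted_replies:
        counts[(round_no, value)] = counts.get((round_no, value), 0) + 1
    return any(c >= majority_quorum(num_acceptors) for c in counts.values())
```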
Regarding new nodes joining, Paxos does not concern itself with how a node detects other nodes. That is a problem solved by other algorithms.
Nodes automagically know all the other nodes and each one's role.
For a production implementation, you can use ZooKeeper to handle this new-node detection.
As pointed out in other answers, message loss and message reordering are handled by the algorithm: it is designed exactly to handle those cases.
New nodes joining is a matter of "cluster membership changes". There is a common misconception that cluster membership changes are not covered by Paxos, yet they are described in the last paragraph of the 2001 paper Paxos Made Simple. In this blog post I discuss it. There is also the question of how a new node gets a copy of all the state when it joins the cluster; that is discussed in this answer.

Pathfinding with limited knowledge and no distance heuristic

I'm having trouble writing the pathfinding routine for the AI in a simple Elite-esque game I'm writing.
The world in this game is a handful of "systems" connected by "wormholes", and a ship can jump from the system it's in to any system it's linked to once per turn. The AI is limited to only knowing things that it should know; it doesn't know what links come from a system it hasn't been to (though it can work it out from the systems it has seen, since links are two-way). Other parts of the AI decide which system the ship needs to get to based on what goods it has in its inventory and how much it remembers things being worth on systems it has passed through.
The problem is, I don't know how to approach the problem of finding a path to the target system. I can't use A*; there's no way to determine the 'distance' to another system without pathing to it. I also need this algorithm to be efficient, since it'll need to run about 100 times every time the player takes his turn.
Does anyone know of a suitable algorithm?
I ended up implementing a bidirectional, greedy version of breadth-first search, which suits the purpose well enough. To put it simply, I just had the program look through each node its starting node connected to, then each node those nodes connected to, then each node those connected to... until the destination node was found.
Normally one would build a list of appropriate paths and pick the shortest one, but I tried a different method; I had the program run two searches in parallel, one from the starting point, and one from the end point. When the 'from' search found the last node of the 'to' search, the path was considered found.
It then optimizes the path by checking if each node on the path connects to a node further up in the path, and deleting each node in between them.
Whether or not this funky algorithm is actually any better than a straight BFS remains to be seen.
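For reference, a stripped-down Python version of the search and the clean-up pass described above might look like this (the graph is assumed to be a dict mapping each known system to the systems it links to; names are illustrative):

```python
from collections import deque

def bidirectional_bfs(graph, start, goal):
    """Expand one BFS from `start` and one from `goal` in lock-step; stop as
    soon as the two searches meet, then splice the two half-paths together."""
    if start == goal:
        return [start]
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        for _ in range(len(frontier)):          # expand one whole layer
            node = frontier.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in parents:
                    parents[nxt] = node
                    if nxt in other_parents:    # the two searches have met
                        return nxt
                    frontier.append(nxt)
        return None

    while frontier_fwd and frontier_bwd:
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            # Walk back from the meeting node to both ends.
            left, node = [], meet
            while node is not None:
                left.append(node)
                node = parents_fwd[node]
            right, node = [], parents_bwd[meet]
            while node is not None:
                right.append(node)
                node = parents_bwd[node]
            return list(reversed(left)) + right
    return None                                  # no route known


def shortcut(path, graph):
    """The clean-up pass described above: if a node links to a later node on
    the path, cut out everything in between."""
    i = 0
    while i < len(path) - 2:
        for j in range(len(path) - 1, i + 1, -1):
            if path[j] in graph.get(path[i], ()):
                path = path[:i + 1] + path[j:]
                break
        i += 1
    return path
```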
When it comes to unknown environments, I usually use an evolutionary-algorithm approach. It doesn't guarantee that you'll find the best solution in the small timeframe you have, but it is a way to approach such a problem.
Have a look at Partially Observable Markov Decision Problems (POMDP). You should be able to express your problem with this model.
Then you can use an algorithm that solves these problems to try to find a good solution. Note that solving POMDPs is usually very expensive and you probably have to resort to approximate methods.
The easiest way to cheat this would be to go through, or at least attempt to access, as many systems as possible, then implement the distance heuristic as the sum of all the systems you've been to.
Alternatively, and way cooler:
I've implemented something similar using ACO (ant colony optimization), and it worked pretty well combined with PSO (particle swarm optimization). However, the additional constraints your system imposes mean that you'll have to spend a few sessions (at least one) figuring out the environment layout, and if it's dynamic... well... tough.
The good thing is that this algorithm completely bypasses the need for heuristic generation which is what you need since you are flying blind. Be advised though, that this is a bad idea if your search space (number of runs) is small. (100 may be acceptable, but 10 or 5 ... not so much).
This scales up quite nicely when dealing with large numbers of nodes (systems) and it bypasses the heuristic distance computational need for every node-to-node relationship, thereby making it more efficient.
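To give a feel for what that looks like in code, here is a deliberately tiny ACO sketch for finding a route with no distance heuristic at all (parameter values and names are arbitrary, not taken from the answer above):

```python
import random
from collections import defaultdict

def aco_route(graph, start, goal, n_ants=20, n_iterations=50,
              evaporation=0.5, deposit=1.0, max_hops=50):
    """Minimal ant-colony sketch: ants wander from `start`, edges on short
    successful walks get more pheromone, and later ants prefer strong trails.
    Only the link structure is needed, no distance heuristic."""
    pheromone = defaultdict(lambda: 1.0)        # (u, v) -> trail strength
    best_path = None

    def walk():
        path, seen = [start], {start}
        while path[-1] != goal and len(path) < max_hops:
            here = path[-1]
            options = [n for n in graph.get(here, ()) if n not in seen]
            if not options:
                return None                     # dead end, this ant gives up
            weights = [pheromone[(here, n)] for n in options]
            nxt = random.choices(options, weights=weights)[0]
            path.append(nxt)
            seen.add(nxt)
        return path if path[-1] == goal else None

    for _ in range(n_iterations):
        walks = [p for p in (walk() for _ in range(n_ants)) if p]
        # Evaporate, then let each successful ant reinforce its trail;
        # shorter walks deposit more pheromone per edge.
        for key in list(pheromone):
            pheromone[key] *= (1.0 - evaporation)
        for p in walks:
            for u, v in zip(p, p[1:]):
                pheromone[(u, v)] += deposit / len(p)
            if best_path is None or len(p) < len(best_path):
                best_path = p
    return best_path
```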
Good luck.

Is there a well-known algorithm to fill in the grid given a set of points?

I saw this game, Flow, and it looks quite interesting.
Connect matching colors with pipe to create a flow. Pair all colors,
and cover the entire board to solve each puzzle. But watch out, pipes
will break if they cross or overlap.
Given a set of pairs (x, y), is there an algorithm to solve the puzzle, i.e. fill in the whole grid (assuming there is a solution) that I'm not aware of?
This is a very specific instance of the global routing problem. Global routing is a well-studied problem in VLSI CAD (where one needs to route millions of nets in an integrated circuit). The problem is NP-complete and can be solved in many ways, depending on the tradeoff you need between runtime and quality. The following wiki is a good starting point:
https://en.wikipedia.org/wiki/Routing_(electronic_design_automation)
This paper gives a survey of various techniques:
http://dropzone.tamu.edu/~jhu/publications/HuIntegration01.pdf
Bear in mind that the pointers I've given typically try to solve a far more complex version of the problem you stated. Nevertheless, the mathematical concepts remain the same.
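For a puzzle-sized instance you can also get away with plain backtracking rather than full global-routing machinery. A minimal Python sketch (exponential in the worst case, so only practical for small boards; the representation is my own, not from the game):

```python
def solve_flow(rows, cols, endpoints):
    """Backtracking sketch for a Flow-style puzzle. `endpoints` maps each
    colour to its pair of cells, e.g. {"R": ((0, 0), (4, 1)), ...}. Grows one
    colour's pipe cell by cell until it reaches its second endpoint, then
    moves on to the next colour; a solution must also cover every cell."""
    board = {}                                  # (row, col) -> colour
    for colour, (a, b) in endpoints.items():
        board[a] = colour
        board[b] = colour
    colours = list(endpoints)

    def neighbours(cell):
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc)

    def extend(idx, head):
        colour = colours[idx]
        target = endpoints[colour][1]
        if head == target:                      # this colour is fully routed
            if idx + 1 < len(colours):
                return extend(idx + 1, endpoints[colours[idx + 1]][0])
            return len(board) == rows * cols    # solved only if board is full
        for nxt in neighbours(head):
            if nxt == target or nxt not in board:
                if nxt != target:
                    board[nxt] = colour
                if extend(idx, nxt):
                    return True
                if nxt != target:
                    del board[nxt]
        return False

    return board if extend(0, endpoints[colours[0]][0]) else None
```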

critical path analysis

I'm trying to write a VB6 program (for a laugh) that will compute event times + the critical path JUST BASED ON A PRECEDENCE TABLE. I want my students to use it as a checking mechanism, i.e. to do everything without drawing the activity network. I'm happy that I can do all this once I've got start and finish events for each activity. How do I allocate events without drawing the network? Everything I come up with works for a specific example and then doesn't work for another one. I need a more general algorithm and it's driving me mental. Help!
I am not a professional programmer - I do this in my spare time to create teaching resources - simple English would really be appreciated.
Okay, so you have a precedence table, which I take to be a table of pairs like
A→B
B→C
and so forth, for activities {A,B,C}. Each of the activities also has a duration and (maybe) a distribution on the duration, so you know A takes 3 days, B takes 2, and so on. This would be interpreted as "A must be finished before B which must be finished before C".
Right?
Now, the obvious thing to do is construct the graph of activities and arrows -- in fact, you basically have the graph there in edge-list form. The critical path is the greatest-weight (biggest sum of times) path. This is a longest-path problem, and assuming your chart isn't cyclic (which would be bad anyway) it can be solved with a topological sort or transitive closure.
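In code that boils down to a topological sort, one forward pass for earliest finish times, and a walk back to recover the critical path. The question mentions VB6, but the Python sketch below (with made-up durations) shows the shape of the algorithm:

```python
from graphlib import TopologicalSorter   # Python 3.9+

def critical_path(durations, precedences):
    """durations: activity -> time taken.
    precedences: list of (before, after) pairs from the precedence table.
    Returns (project_length, list of activities on the critical path)."""
    preds = {a: set() for a in durations}
    for before, after in precedences:
        preds[after].add(before)

    earliest_finish, best_pred = {}, {}
    for a in TopologicalSorter(preds).static_order():
        start = max((earliest_finish[p] for p in preds[a]), default=0)
        earliest_finish[a] = start + durations[a]
        best_pred[a] = max(preds[a], key=earliest_finish.get) if preds[a] else None

    # The critical path ends at the activity with the largest finish time;
    # walk back through the heaviest predecessors to recover it.
    end = max(earliest_finish, key=earliest_finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return earliest_finish[end], list(reversed(path))


# The A -> B -> C example above, with A = 3 days, B = 2, and (say) C = 4.
length, path = critical_path({"A": 3, "B": 2, "C": 4}, [("A", "B"), ("B", "C")])
print(length, path)   # 9 ['A', 'B', 'C']
```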
