Using graph to translate differing representations


I have a data translation problem, and would like guidance on how I can crack it:
I have an inbound list of items that represent train cars assigned to segments along a journey. Each item has an index, a car reference, an origin and a destination.
E.g.
Index Car Origin Destination
1 C1 L1 L2
2 C2 L1 L2
3 C3 L1 L2
4 C1 L2 L3
5 C2 L2 L3
6 C4 L2 L3
The example above shows four cars (C1, C2, C3, C4). Cars C1 and C2 travel from L1 to L3. C3 travels from L1 to L2. C4 travels from L2 to L3. The index maintains the order of the cars within each 'leg', but it is relative: for the second leg (L2-L3), the first index in use is 4.
I need to translate this into a different model that provides a distinct list of cars, while maintaining the order of the cars within the train.
E.g.
Index Car Origin Destination
1 C1 L1 L3
2 C2 L1 L3
3 C3 L1 L2
4 C4 L2 L3
The second model doesn't allow for a complete re-ordering of the train cars mid journey. I.e. I can't allow cars A, B, C, D to change order to A, C, B, D. I would have to apply some heuristic to obtain the resulting car order, and it would not reflect reality. I'm happy to accept this drawback.
Also, although the target model index specifies car order within the train, it doesn't matter whether I index from the front or the back of the train. It would be nice to use lower indices for car assignments earlier in the journey.
So, for the solution: I think I need to employ graphs to make this translation, but I'm unsure where to start. I think I should be modeling a car as a vertex, and a coupling of two cars in the same leg as an edge. But I'm not sure where to go from there.
I'd be very grateful for any pointers on how to approach the problem: Modelling tips, merging algorithms...
Edit
One more complication: On some legs the order of cars in a train may completely reverse. This is used to indicate the train changed direction. I don't need to capture that reversal, but I do need to retain the inter-connected order of the cars.

In essence, and leaving aside the question of reversed segments, this problem reduces to a topological sort for which a number of simple and efficient algorithms exist (see the Wikipedia link for examples). To construct the graph, we use the cars as vertices and insert an edge from Ci to Cj if Ci immediately precedes Cj in some leg. (This minimizes the number of edges, which reduces the cost of an O(V+E) topological sort.)
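As a minimal sketch of the construction described above, assuming each leg is already given as a list of cars in train order and that no leg is reversed: the function name merge_legs and the tie-break on first appearance (to give lower indices to cars that join earlier in the journey) are illustrative choices, not part of the original answer.

```python
from collections import defaultdict
import heapq

def merge_legs(legs):
    """legs: list of car lists, each in train order, given in journey order."""
    first_seen = {}            # car -> rank of first appearance in the journey
    succ = defaultdict(set)    # edge Ci -> Cj if Ci immediately precedes Cj in some leg
    indeg = defaultdict(int)
    for leg in legs:
        for car in leg:
            first_seen.setdefault(car, len(first_seen))
        for a, b in zip(leg, leg[1:]):
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    # Kahn's algorithm; among the cars that are ready, prefer the one seen earliest.
    heap = [(rank, car) for car, rank in first_seen.items() if indeg[car] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, car = heapq.heappop(heap)
        order.append(car)
        for nxt in succ[car]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(heap, (first_seen[nxt], nxt))
    if len(order) != len(first_seen):
        raise ValueError("legs are contradictory (cycle in the ordering graph)")
    return order

print(merge_legs([["C1", "C2", "C3"], ["C1", "C2", "C4"]]))
```

On the example from the question this yields C1, C2, C3, C4. A cycle, which is what a reversed leg would typically cause, is reported as an error rather than silently reordered.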
But that won't work with "reversed" legs; these will cause the topological sort to fail. So the other part of the problem is to detect reversed legs. Here I'm assuming that there is no definite list of reversed legs; if there were, the solution would be obvious.
I think the following will work reasonably efficiently, but it may well not be optimal.
Let's say that two legs are forward compatible if they share at least two cars and the order of the shared cars is identical in the two legs. Similarly, two legs are reverse compatible if they share at least two cars and the order of the shared cars in one is the exact reverse of the order of the shared cars in the other. Finally, two legs are bidirectionally compatible if they share at most one car. (It's possible that two legs don't fit into any of these three categories, in which case they are incompatible and the problem has no solution.)
It's easy to categorize the relationship between two legs. With the right data structure (a hash table, for example), finding the list of shared cars between two legs is O(min(m,n)) where m and n are the sizes of the legs (in number of cars), as is checking if the shared cars appear in the same or reversed order in the two lists. So constructing the entire array of relationships between all possible pairs of legs should be O(L·N) where L is the number of legs and N the number of cars. (I don't have a proof of this assertion, so it could be wrong. But it seems reasonable.)
With the graph of compatibilities, we need to assign a direction to each leg. We do this with a traversal of the graph, using the following recursive procedures:
# Direction is either Forward or Reverse. We assume a function reverse(D) which
# returns Reverse if D is Forward, and Forward if D is Reverse
setDirection(Leg, Direction):
    + If the Direction of Leg is Direction, return.
    + If the Direction of Leg is set and not the same as Direction, fail.
    + Otherwise:
        + Set the Direction of Leg to Direction.
        + For each L such that Leg is forward compatible with L:
            + Call setDirection(L, Direction)
        + For each L such that Leg is reverse compatible with L:
            + Call setDirection(L, reverse(Direction))
setAllDirections():
    + while some Leg L does not have its direction set:
        + setDirection(L, Forward)
Now, we can reverse the order of the cars in the legs which are marked as reversed, and apply the topological sort.
Note that it is possible for the above procedure to produce a consistent set of leg directions which does not correspond to "reality", because the decision to set the initial direction of a new Leg in the last line is totally arbitrary. But I think it is the best that we can do.
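The compatibility test and the direction assignment could be sketched as follows. This is only an illustration under some assumptions: legs are lists of car names, the traversal is written iteratively instead of recursively, and the pairwise relations are computed by a plain double loop (so it does not attempt the O(L·N) bound discussed above). The names relation and assign_directions are made up for the example.

```python
def relation(leg_a, leg_b):
    """Classify two legs as 'forward', 'reverse' or 'either' (at most one shared car)."""
    pos_b = {car: i for i, car in enumerate(leg_b)}
    # Positions in leg_b of the shared cars, listed in leg_a order.
    shared = [pos_b[car] for car in leg_a if car in pos_b]
    if len(shared) < 2:
        return "either"
    pairs = list(zip(shared, shared[1:]))
    if all(x < y for x, y in pairs):
        return "forward"
    if all(x > y for x, y in pairs):
        return "reverse"
    raise ValueError("incompatible legs: shared cars are neither in the same nor in reversed order")

def assign_directions(legs):
    """Return the legs with the ones marked Reverse flipped, or raise if inconsistent."""
    n = len(legs)
    rel = {}
    for i in range(n):
        for j in range(i + 1, n):
            rel[i, j] = rel[j, i] = relation(legs[i], legs[j])

    flipped = [None] * n               # None = direction not set yet
    for start in range(n):
        if flipped[start] is not None:
            continue
        flipped[start] = False         # arbitrary choice, as noted above
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if j == i or rel[i, j] == "either":
                    continue
                want = flipped[i] if rel[i, j] == "forward" else not flipped[i]
                if flipped[j] is None:
                    flipped[j] = want
                    stack.append(j)
                elif flipped[j] != want:
                    raise ValueError("no consistent set of leg directions")
    return [leg[::-1] if f else list(leg) for leg, f in zip(legs, flipped)]

print(assign_directions([["A", "B", "C"], ["C", "B", "D"], ["B", "D", "E"]]))
```

The output of assign_directions can then be fed to whatever topological sort is used for the forward-only case.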

Related

Connecting a set of points with horizontally aligned polylines without them crossing

I have a set of 2D points where all values are integers. No points are identical. I want to draw polylines/paths/whatever through all the points with a few restrictions:
1: A line should always move in the positive x-direction. p1.x < p2.x < ...
2: Lines may never cross each other.
3: All polylines need to begin at x = 0 and end at x-max.
4: It should use as few polylines as possible (or a number defined by me).
I have attached an image of a sample set of points. And a hand made solution I drew with a pencil and ruler.
It's trivial to find a solution by hand, but I have no idea how to describe my process in logical terms. I don't need an optimal solution (whatever that means). It doesn't need to be fast.
Point set (Disregard colors)
Connected Points
My current solution is to step through the set along the x-axis, try all viable combinations, and choose the one with the lowest total vertical movement. This works in some cases but not all. And it seems to over-complicate the problem.
My next idea is a brute-force approach with backtracking when collisions occur. But that also seems a bit much.
For anyone wondering the points are actually notes on sheet music. The x-axis is time and the y-axis is pitch. The polylines represent the movement of robotic fingers playing a piano.

We will find a solution which uses the minimum number of robotic fingers (least number of polylines). The trick is to consider your input as a partially ordered set (or poset). Each point in your input is an element of the poset, with the relation (p1.x, p1.y) < (p2.x, p2.y) if and only if p1.x < p2.x. This basically means that two points which have the same x-coordinate are incomparable with each other.
For now, let us forget this constraint: "Lines may never cross each other". We'll get back to it at the end.
What you are looking for is a partition of this poset into chains. This is done using Dilworth's Theorem. It should be clear that if there are 5 points with the same x-coordinate, then we need at least 5 different polylines. What Dilworth's Theorem says is that if there is no x-coordinate with more than 5 points on it, then we can get 5 polylines (chains) which cover all the points. And it also gives us a way to find these polylines, which I'm summarizing here:
You just create a bipartite graph G = (U,V,E) where U = V = Set of all input points and where (u,v) is an edge in G if u.x < v.x. Then find a maximum matching, M, in this graph, and consider the set of polylines formed by including u and v in the same polyline whenever there is an edge (u,v) in M.
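A rough sketch of the matching step, assuming the points are (x, y) tuples. It uses a simple augmenting-path matching (often called Kuhn's algorithm) rather than a faster method, ignores the crossing constraint, and does not force chains to start at x = 0 or end at x-max; chain_cover is a hypothetical name.

```python
def chain_cover(points):
    """Split points into the fewest chains with strictly increasing x."""
    pts = sorted(points)
    n = len(pts)
    pred = [-1] * n                    # pred[v] = u if edge (u, v) is in the matching

    def try_augment(u, seen):
        # Look for a v with u.x < v.x that is free, or whose current
        # predecessor can be re-matched to some other point.
        for v in range(n):
            if pts[u][0] < pts[v][0] and not seen[v]:
                seen[v] = True
                if pred[v] == -1 or try_augment(pred[v], seen):
                    pred[v] = u
                    return True
        return False

    for u in range(n):
        try_augment(u, [False] * n)

    succ = [-1] * n
    for v, u in enumerate(pred):
        if u != -1:
            succ[u] = v

    # Every point without a predecessor starts a chain; follow matched edges.
    chains = []
    for start in range(n):
        if pred[start] == -1:
            chain, cur = [], start
            while cur != -1:
                chain.append(pts[cur])
                cur = succ[cur]
            chains.append(chain)
    return chains

for chain in chain_cover([(0, 0), (0, 1), (1, 0), (1, 2), (2, 1)]):
    print(chain)
```

The number of chains equals the number of points minus the size of the matching, which for this relation is the largest number of points sharing one x-coordinate.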
The only issue now, is that some of these polylines could cross each other. We'll see how to fix that:
First, let us assume that there are only two polylines, L1 and L2. You find the first instance (minimum x-coordinate) of their crossing. Suppose the two line segments which cross each other are AB and CD:
We delete AB and CD and instead add AD and CB:
The polylines still cross each other, but their point of crossing has been delayed. So we can keep repeating this process until there is no crossing left. This takes at most n iterations. Thus we know how to 'untangle' two polylines.
[The edge case of B lying on the segment CD is also handled in the exact same way]
Now, suppose we have k different polylines which the maximum matching has given us: L1, L2, ..., Lk. WLOG, let us assume that at x = 0, L1's y-coordinate is lower than L2's y-coordinate, which is lower than L3's and so on.
Take L1 and find the first time that it crosses any other polyline. At that crossing, apply the swapping operation as above. Keep repeating this until L1 does not cross any other polyline. Now, L1 is at the 'bottom', and doesn't cross any other line. We now output L1 as one of the final polylines, and delete it from our algorithm. We then repeat the same process with L2, and after outputting it, delete it, and repeat with L3, and so on.

How to build a graph from cops and robber problem?

This is a 2 part problem that I have given some thought on.
Problem Statement:
In an m by n rectangular field there is a robber R and two cops C1 and C2. Each of the three starts off at some initial square, and at the beginning of the chase, R, C1 and C2 all know each other's positions.
R makes the first move, then C1 and C2. They can only move up, down, left or right. Some squares are inaccessible because there is an obstacle present. If C1 or C2 reaches a square that R is on, then they catch R.
In order to escape, R must reach a square X on the perimeter of the grid. If R reaches the square X before it's caught by C1 or C2, then R successfully escapes. Else, R is unable to escape.
As input we are provided: Values of m (number of rows) and n (number of columns), initial coordinates for R, C1, C2, and a list of inaccessible squares.
I) Using the input provided, how can you use an adjacency list to construct a graph to solve the problem. Analyze the runtime of graph creation.
I was actually thinking of using an adjacency matrix because of the grid representation, but we are asked to use an adjacency list. As a result, I'm confused about what should be considered a vertex and an edge in this problem. I was thinking that every square in the grid will be a vertex and its edges will lead to all of its neighboring squares, at least the ones it can reach, 4 squares being the maximum. So should my adjacency list store ALL m by n pairs and then for every pair maintain a linked list of neighbors, i.e. reachable squares? If I went this route there would be (m * n) vertices, and for each of those I would have to check which squares are reachable (up, down, left, right) and whether that square is inaccessible, so I would have to scan through the inaccessible list provided as input, which would take O(n) time. So I guess that would put me up to O(m*n) running time for graph creation. Can I do better than this?
II) Given the graph you create in part (I) describe an algorithm to check if R can escape.
*Assumption: The strategy that R, C1 and C2 use is negligible. It doesn't matter if R, C1 and C2 move in the "smart" way or completely randomly.
Since R declares its destination before the chase begins, I think it's just a matter of whether there exists a path from where R starts to its destination square. So can I get away with running DFS and checking if R can reach its destination? But I don't know whether R will be able to avoid C1 and C2.
Guidance is appreciated.

Sounds like you pretty much know how to build the graph, but it's better to give each vertex a number instead of maintaining (m,n) tuples.
Allocate an array of m * n lists. Each position (x,y) on the grid will correspond to slot x+n*y in that array. That slot will contain a list of adjacent accessible numbers, or null if it's an obstacle.
For now, initialize the array with an empty list at every position.
For each obstacle, set its corresponding array slot to null.
For grid position (x,y), if it's a vertex (array[x+n*y]!=null), then check its neighbors to fill out its adjacency list. If array[x+1+n*y]!=null, for example, then the list at [x+n*y] would include [x+1+n*y].
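The steps above might look like this in Python, as a sketch that assumes x is the column (0 to n-1), y is the row (0 to m-1), and obstacles arrive as (x, y) pairs; build_graph is an illustrative name, and None plays the role of null.

```python
def build_graph(m, n, obstacles):
    """m rows, n columns. Returns a list where slot x + n*y holds the
    neighbours of square (x, y), or None if that square is an obstacle."""
    adj = [[] for _ in range(m * n)]
    for x, y in obstacles:
        adj[x + n * y] = None

    for y in range(m):
        for x in range(n):
            if adj[x + n * y] is None:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                # Stay on the grid (this also prevents wrapping between rows).
                if 0 <= nx < n and 0 <= ny < m and adj[nx + n * ny] is not None:
                    adj[x + n * y].append(nx + n * ny)
    return adj

# 2 rows, 3 columns, obstacle at column 1 of row 0.
print(build_graph(2, 3, [(1, 0)]))
```

Marking obstacles directly in the array means no square ever scans the obstacle list, so construction takes time proportional to m*n plus the number of obstacles.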
The resulting representation is pretty compact and good for many purposes. Since the vertexes have degree <= 4, an adjacency list is much more efficient than an adjacency matrix.
The remaining part of your program will be greatly simplified as well, since it doesn't have to deal with coordinates or know anything about the original grid.
Unfortunately, the "*Assumption" takes all the fun out of the second part.

Approximate nearest neighbor algorithm for Octrees

Does anyone know the origin (paper, book, etc.) of this approximate nearest neighbor technique for octrees?:
http://www.cs.ucdavis.edu/~amenta/w11/nnLecture.pdf
I am having trouble implementing it from the provided pseudo code. I am also having trouble finding the original publication of the technique, which I am hoping has a little more detail.
Thanks for any help.

This is not the exact answer, but an approximate one (to use the terms of the subject :) ).
It was too big to fit in a comment, and I think it is good information for a start.
The paper mentions that Voronoi diagrams don't extend to dimensions higher than three and it implies that octrees do. That's wrong, in terms of terminology.
An octree is defined in R^3. Simply put, you see the same kind of data structure in 2D, where we have a quadtree. These kinds of trees have 2^D children per node, where D is the dimension. This means:
1. 2D: 2^D children per node, i.e. 4 children per node.
2. 3D: 2^D children per node, i.e. 8 children per node.
3. and so on.
For example, octree comes from the Greek word octo, which means 8, and it implies that this tree has 8 children per node.
I had implemented this kind of tree for NN (Nearest Neighbor) and, even though I had made the tree a polymorphic one to not waste any amount of memory, this wouldn't scale above 10 dimensions.
Moreover, the paper mentions kd-trees. Notice that when the number of dimensions gets high, the query time is no longer O(log n); it becomes only slightly better than the brute-force approach (i.e. checking all points). The higher the dimension, the worse kd-trees will perform.
A kd-tree is actually a binary tree, embedded in geometry. By that I mean that every node has two children and at every level we halve the dataset (usually at the median of the coordinate with the greatest variance, so that we can exploit the structure of the dataset). This results in a perfect tree.
Here you can see a kd-tree, a friend of mine made, of 64 points in 8D. In this version, we store 4 points per leaf.
The numbers in the boxes refer to the point number (starting with 1,
i.e. line numbers in test.points file).
The notation "8 # 0.532" refers to an inner node, where the data is
split at 0.532 along the eighth dimension (again, dimensions starting with
1, for easier human understanding).
That's why we turn our interest to approximate NN, which means that we pay some loss in accuracy, but we obtain some speedup. (As you may know, everything is a trade-off.)
By Box, it probably means a minimum bounding box.
This is simple and here is an example:
Suppose you have, in 2D, this dataset:
-1 -2
0 5
8 -5
In order to construct the bounding box, we need to find the minimum and the maximum coordinate in every dimension. Note that for storing the bounding box, it is enough to store its min and max corners.
Here, we have min = (-1, -5) and max = (8, 5). The bounding box is then the rectangle whose corners, in clockwise order starting from the max corner, are:
( 8, 5) // ( max.x, max.y)
( 8, -5) // ( max.x, min.y)
(-1, -5) // ( min.x, min.y)
(-1, 5) // ( min.x, max.y)
Observe, that all the points of the dataset, lie inside this bounding box.
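A small sketch of that computation, assuming points are tuples of equal length; bounding_box and contains are illustrative names, and the code works for any number of dimensions, not just 2D.

```python
def bounding_box(points):
    """Return the (min corner, max corner) of the axis-aligned bounding box."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi

def contains(box, point):
    """True if point lies inside or on the border of the box."""
    lo, hi = box
    return all(l <= c <= h for l, c, h in zip(lo, point, hi))

data = [(-1, -2), (0, 5), (8, -5)]
box = bounding_box(data)
print(box)                                  # ((-1, -5), (8, 5))
print(all(contains(box, p) for p in data))  # True
```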
As for the paper, it's actually a lecture, not a paper. It doesn't explain how one should write the algorithm. Moreover, it doesn't provide any unique information, in order to try to find another .pdf, that explains in more details the .pdf in your link.
[EDIT] for the OP's comment.
1) Q: dequeue box B, containing representative point p
I would say that dequeue means extract the "first" element of the queue. Enqueue means push an element onto the back of the queue. The queue seems to hold bounding boxes as elements.
2) Q: r = d(q,B)
Maybe, he means from the representative point the box contains. Not clear.
You can compute the (Euclidean) distance from the query point to the closest corner of the box, or to the representative of the box.
3) for all children B' of B containing points in P
P is the dataset. Every box, is partitioned in 8 sub-boxes at every level (in the case of octree).
4) Q: while dN >= (1+e)r do
Approximation error e, is actually, what we call epsilon. It is usually a parameter and it means, that when you check:
while delta >= r do
you are less strict and you do
while delta >= (1 + e)*r do
which means that you are going into the loop fewer times than with the exact condition above.
So, I think it says, to insert every sub-box of box B, in the queue. This is not so clever, IMHO.
About the last comment with e = 0.01, just do the math in the condition above. You will see that the answer is no, since, as the link you posted states, e is a multiplicative factor.

Algo. to pick best combination out of possibilities

I have to solve the following problem, which is basically to pick the best combination out of a huge number of possible ones: I have to pick the best soccer team out of the available players. Each player is given a score, and I have to pick the team whose selected players have the highest total score.
There is a restriction on the players that I can select: at maximum I can only pick N (=2 for instance) players from a club. E.g if I have picked G1 (from Chelsea) as goalkeeper, then only 1 slot left for Chelsea.
Say I have to pick a best formation of 1-4-4-2
Goalkeeper (1): g1, g2, g3, g4 ... (name of players in this position, and their scores are correspondingly gc1, gc2, gc3, gc4 ...)
Defenders (4): d1, d2, d3, ...
Midfielders (4): m1, m2, m3, ...
Strikers (2): s1, s2, s3, s4, ...
What algorithm can I use here? I am looking at the so-called Hungarian algo:
http://en.wikipedia.org/wiki/Hungarian_algorithm
But it looks complicated and not sure if good for this case.
Any help would be much appreciated.
Best,

It's solvable via the minimum-cost flow problem, a generalization of the problem that the Hungarian algorithm solves.
Create a network with (1) one node per club, (2) one node per player, and (3) one node per open position. Each club node sources up to two units of flow (the per-club limit N), and each open-position node sinks units equal to however many players of that position are needed. There are club-to-player arcs for each such relationship, with capacity one and cost zero. There are player-to-open-position arcs also, with capacity one and cost equal to minus that player's value at that position (multiple such arcs involving one player are possible if there are "flex" spots as in fantasy American football).
Find a min-cost integral flow of value eleven. Each player is used as transit for either zero or one unit of flow. The best team comprises the players that carry one unit.
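To make the network concrete, here is a small self-contained sketch. It is not tied to any particular library: the MinCostFlow class is a plain successive-shortest-paths implementation written for this example, an extra source and sink node are added in front of the clubs and behind the positions, and each player is assumed to have exactly one position. All names (MinCostFlow, pick_team, the sample players) are made up.

```python
from collections import deque

class MinCostFlow:
    """Successive shortest paths; the queue-based search copes with negative costs."""

    def __init__(self, n):
        # Each edge is [to, remaining capacity, cost, index of its reverse edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])
        return len(self.graph[u]) - 1  # position of the forward edge in graph[u]

    def flow(self, s, t, want):
        sent = total_cost = 0
        n = len(self.graph)
        while sent < want:
            dist = [float("inf")] * n
            dist[s] = 0
            prev = [None] * n
            queued = [False] * n
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:
                break  # no augmenting path left
            push, v = want - sent, t
            while v != s:  # bottleneck capacity along the path
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:  # apply the augmentation
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            sent += push
            total_cost += push * dist[t]
        return sent, total_cost

def pick_team(players, formation, max_per_club=2):
    """players: (name, club, position, score) tuples; formation: {position: count}."""
    clubs = sorted({club for _, club, _, _ in players})
    positions = sorted(formation)
    source, sink = 0, 1
    club_node = {c: 2 + i for i, c in enumerate(clubs)}
    pos_node = {p: 2 + len(clubs) + i for i, p in enumerate(positions)}
    first_player = 2 + len(clubs) + len(positions)
    net = MinCostFlow(first_player + len(players))
    for c in clubs:
        net.add_edge(source, club_node[c], max_per_club, 0)
    for p in positions:
        net.add_edge(pos_node[p], sink, formation[p], 0)
    club_edges = []
    for k, (name, club, pos, score) in enumerate(players):
        node = first_player + k
        club_edges.append((club_node[club], net.add_edge(club_node[club], node, 1, 0)))
        if pos in pos_node:
            net.add_edge(node, pos_node[pos], 1, -score)
    need = sum(formation.values())
    sent, cost = net.flow(source, sink, need)
    if sent < need:
        raise ValueError("not enough eligible players for this formation")
    # A player is selected when the club-to-player arc is saturated.
    team = [players[k][0] for k, (u, i) in enumerate(club_edges) if net.graph[u][i][1] == 0]
    return sorted(team), -cost

squad = [("g1", "Chelsea", "G", 9), ("g2", "Arsenal", "G", 8),
         ("d1", "Chelsea", "D", 9), ("d2", "Chelsea", "D", 8),
         ("d3", "Arsenal", "D", 5), ("d4", "Spurs", "D", 4)]
print(pick_team(squad, {"G": 1, "D": 2}))
```

In the sample squad the three Chelsea players have the highest scores, but the per-club limit forces the second goalkeeper in, for a total of 25. A full 1-4-4-2 would just use formation {"G": 1, "D": 4, "M": 4, "S": 2}.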

General Design Question

I have a general design question:
There is a junction, with four roads connecting to it. Each road has 2 lanes.
What would be the best way to design a program to handle such a junction? It should allow 2 cars to go through the junction if they don't interfere with each other. If 1 car came in before the other, and they both need to use the same part of the junction, the 1st car should get priority. Also, 2 cars may arrive at the junction at the exact same time.
What would be the best design for this problem? what exactly should you lock, in order to allow best use of the junction?
Thanks!

Each car should lock the parts of the lane it's going to pass through. If one of the parts is locked, then the car should wait until it is released.
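One way to picture this, as a sketch only: treat each part of the junction as a named lock and have a car take all the locks it needs in one fixed global order, so two cars can never deadlock by each holding half of what the other needs. The Junction class and the part names ("NE", "SE", ...) are hypothetical, and plain locks do not by themselves guarantee that the first car to arrive goes first.

```python
import threading

class Junction:
    def __init__(self, parts):
        self._locks = {name: threading.Lock() for name in parts}

    def cross(self, needed, action):
        """Lock every needed part, run action(), then release the parts."""
        ordered = sorted(needed)       # same order for every car avoids deadlock
        for name in ordered:
            self._locks[name].acquire()
        try:
            return action()
        finally:
            for name in reversed(ordered):
                self._locks[name].release()

junction = Junction(["NE", "NW", "SE", "SW"])
log = []

def car(name, parts):
    junction.cross(parts, lambda: log.append(name))

threads = [threading.Thread(target=car, args=("A", ["NE", "SE"])),
           threading.Thread(target=car, args=("B", ["SW", "NW"])),
           threading.Thread(target=car, args=("C", ["SE", "SW"]))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(log))  # ['A', 'B', 'C']
```

Cars A and B use disjoint parts and may cross at the same time, while C has to wait for whichever of them still holds a part it needs.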

What do you think about having 4 different queues for each part of the junction. each car that comes in enters the relevant queue (should enter to more than one queue?), and only after the car leaves all queue it can go through the junction..
Still not sure what is the best implementation for it though.

Create a circle buffer with two entries for each road (one for inbound, one for outbound) meeting at the intersection.
For each car that you have to route, put its name into the circle buffer for its source (inbound) and its destination (outbound). Then iterate through the circle buffer; if you get two instances of the same car together, then that car may travel. After that, pick at random from the other cars.
I have a feeling that's pretty unclear, so consider an intersection with 4 roads that we'll call N, E, S and W. To that we'll add 3 cars: A coming from the East turning South, B from the South travelling North and C from the West travelling East.
The circle buffer can be built as such (i=inbound, o=outbound):
Ni No Ei Eo Si So Wi Wo
B - C A A B - C
As we iterate through from left to right we realise that the two A's are adjacent so they can go, but the B's and C's are not adjacent so these cars block each other. Pick one at random for this light cycle and let the other go in the next light cycle. So either A and B can go or A and C can go.
Note1: testing adjacent ignores blanks so in the case of
Ni No Ei Eo Si So Wi Wo
D E - - E D - -
which models a car travelling North and another travelling South both E's and D's are adjacent.
Note2: I've mapped this out for driving on the left because that's what I do. You'll have to mirror it for driving on the right.
Note3: You can't overwrite a position in the buffer, if two cars want the same destination they're automatically blocking and you should just leave the first one in there and consider the other one next time.
