Consider a graph connecting various cities through railways. Every node is a city with various railway lines (edges) leading to other cities. You need to find whether a meeting point exists, i.e. a route (a sequence of lines) which, when taken, always arrives at the same city no matter which city you start from.
E.g.
Consider the graph G = [[2, 1], [2, 0], [3, 1], [1, 0]]. The k^th element of G (counting from 0) gives the list of stations directly reachable from station k.
The outgoing lines are numbered 0, 1, 2, ... The r^th element of the list for station k gives the number of the station directly reachable by taking line r from station k.
Then one could take the path [1, 0]. That is, from the starting station, take the second direction, then the first. If the first direction was the red line, and the second was the green line, you could phrase this as:
if you are lost, take the green line for 1 stop, then the red line for 1 stop.
So, consider following the directions starting at each station:
0 -> 1 -> 2.
1 -> 0 -> 2.
2 -> 1 -> 2.
3 -> 0 -> 2.
So, no matter the starting station, the path leads to station 2.
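The check above is easy to script. Below is a minimal Python sketch using the question's list-of-lists encoding; the helper names `follow` and `is_meeting_path` are hypothetical and not part of the question:

```python
# The example graph from the question: G[k][r] is the station reached
# from station k by taking line r.
G = [[2, 1], [2, 0], [3, 1], [1, 0]]


def follow(graph, start, path):
    """Return the station reached from `start` by taking the lines in `path` in order."""
    station = start
    for line in path:
        station = graph[station][line]
    return station


def is_meeting_path(graph, path):
    """True if every starting station ends at the same station after following `path`."""
    return len({follow(graph, s, path) for s in range(len(graph))}) == 1


print([follow(G, s, [1, 0]) for s in range(len(G))])  # [2, 2, 2, 2]
print(is_meeting_path(G, [1, 0]))                    # True
```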
Lines are limited to 0 to 5 and stations to 2 to 50. So in the worst case there might be 2^(49*5) subsets of routes, and brute force is out of the question.
Edit 1:
After mcdowella mentioned that this problem is known as finding synchronising sequences in DFAs (and since I am only interested in whether a meeting path exists or not), I found this PDF, which states on slide 5:
Adler and Weiss, 1970 (Conjecture)
Every finite strongly connected aperiodic directed graph of uniform outdegree has a synchronizing coloring.
Alternatively:
Every strongly connected graph whose cycle lengths have gcd 1 (which is what aperiodicity means) has a meeting path.
This works for most cases. However, it's not hard to come up with something like this:
This graph is not even strongly connected, so aperiodicity is out of the question, and yet it still has a meeting path [0 -> 1]. So what am I missing here?
You don't say what to do if the path says to go out along x and there is no outgoing link labelled x, so I am going to suppose that all nodes have a full set of outgoing links, or we treat such missing links as links back to the current node, or copies of the link labelled 0, or something.
I start with a set of possible nodes that we may be on, initialised to the set of all nodes.
For each label, take the set of possible nodes and compute the set of nodes that you get by going from any node in the current set, following the current label, to another node. If, for each possible label, the result is always the same set as the current set of possible nodes, give up. This means that each label maps each node in the current set to a different node, and, given any node in the current set and any path, of whatever length, you can find a unique node in the current set with a path that ends in the chosen node, so the situation looks hopeless to me.
If, for some label, the set of nodes after applying this label to the current set is smaller than the current set, note down that label, make the new smaller set the current set, and repeat.
If this process terminates in a set of size one, you have worked out a path that ends at that node, i.e. a meeting point. The path is no longer than the number of nodes in the original graph, since each step reduces the size of the current set by at least one. Each step costs you at most the number of edges in the graph, so for a graph with N nodes and K labels per node, the cost is at most KN^2.
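The set-shrinking procedure can be sketched in Python as follows. This assumes every station has the same number of labels (the "full set of outgoing links" supposed above). The answer does not say what to do when no label shrinks the set but some label moves it to a different set of the same size; this sketch simply gives up in that case, so it can return None even for graphs that do have a meeting path. The function name is hypothetical.

```python
def greedy_meeting_path(graph):
    """Greedy set-shrinking search for a meeting path.

    graph[v][label] is the node reached from v along `label`; every node is
    assumed to have the same number of labels. Returns (path, meeting_node),
    or None when no single label shrinks the current set (this sketch gives
    up there, so it may miss meeting paths that need a longer detour).
    """
    current = frozenset(range(len(graph)))  # all nodes we might be on
    num_labels = len(graph[0])
    path = []
    while len(current) > 1:
        for label in range(num_labels):
            image = frozenset(graph[v][label] for v in current)
            if len(image) < len(current):
                # This label merges at least two possible positions: keep it.
                path.append(label)
                current = image
                break
        else:
            return None  # no label shrinks the set
    return path, next(iter(current))
```

On the question's example graph this returns `([0, 1, 0], 2)`: a different meeting path from `[1, 0]`, but one that also ends at station 2.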
In fact, since the check at each stage amounts to looking for at least one node in the current set with two incoming edges with the same label on it, and then removing all nodes in the current set which don't have an incoming edge with the chosen label, I would hope that you can make the cost at each step linear in the number of nodes discarded, and argue that the total cost is something below O(KN^2)
(I'm pretty sure that I have seen this worked out properly somewhere as an exercise in robot navigation or something so a web search might be more reliable than reading this, but I've had fun writing it, and it looks plausible to me).
Edit -
It appears that the problem is referred to as a search for synchronizing sequences for finite automata. math.uni.wroc.pl/~kisiel/auto/Homing.pdf looks very promising but I haven't gone through it in detail.
TL;DR: There is an O(n^2) algorithm which determines if a meeting path exists, which is described here:
Consider P(G), the power graph of the original graph G. The power graph is created by taking all subsets of the set of nodes of the original graph G, and making each of those subsets into nodes themselves. The edges connect nodes as follows:
(using G = [[2, 1], [2, 0], [3, 1], [1, 0]] and looking at edge 1, or line 1 as your problem states):
{0, 1, 2} -> {1, 0, 1} = {0, 1}, since, when taking line 1, 0 -> 1, 2 -> 1, 1 -> 0.
{0, 3} -> {1, 0}, since, when taking line 1, 0 -> 1 and 3 -> 0.
etc.
Now, if there are n stations and there exists a path in P(G) from the node {0, 1, ..., n-1} to a singleton (one-element) node, then there is a meeting point: if you take that path as the series of lines, starting at any of the stations ends in the same station. Creating the whole power graph is of course very expensive (O(2^n) nodes), but one important remark reduces the amount of computation to O(n^2).
This remark is very similar to that of (Černý, 1964) on the synchronizing word problem for DFAs. A proof sketch of this remark is at the end of this answer. The remark is that, when looking at the power graph of G:
Every node representing a subset of size 2 has a path to a singleton node if and only if there exists a meeting path
That is, when we create P(G), we only need to create nodes that represent subsets of size 2 or less. This means that P(G) will have only O(n^2) nodes (n(n+1)/2 of them).
So, essentially the algorithm is:
1. Create P_2(G), the power graph of G in which each node represents a subset of size 2 or less.
2. For each node representing a subset of size 2 in P_2(G):
   if there is no path from this node to a singleton node, return False.
3. Return True (this only happens if, for every node representing a subset of size 2, there is a path to a singleton node).
Part 2 of the algorithm can be done with DFS: you can reverse all of the edges of P_2(G) and begin the DFS stack with all of the singleton nodes. Then, if the DFS tree contains all of the nodes representing subsets of size 2, then all of the nodes representing subsets of size 2 have a path to a singleton node.
Part 1 is O(n^2) and Part 2 can be done in O(n^2) by reversing the edges of the graph and performing DFS as described above.
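A minimal Python sketch of both parts, assuming every station has the same number of lines. Storing a pair as a sorted tuple and the singleton {v} as `(v, v)` is a representation choice of this sketch, and the function name is hypothetical. Part 2 uses an explicit stack (an iterative DFS); any traversal that visits everything reachable would work equally well.

```python
from itertools import combinations


def has_meeting_path(graph):
    """Decide whether a meeting path exists using only pairs of stations.

    graph[v][label] is the station reached from v along `label`; every
    station is assumed to have the same number of labels.
    """
    n, k = len(graph), len(graph[0])

    def node(a, b):
        # A node of P_2(G): an unordered pair, with (v, v) standing for {v}.
        return (a, b) if a <= b else (b, a)

    # Part 1: build the reversed edges of P_2(G), but only out of pair nodes
    # (singletons only ever map to singletons, so their edges are not needed).
    reverse = {}
    for a, b in combinations(range(n), 2):
        for label in range(k):
            target = node(graph[a][label], graph[b][label])
            reverse.setdefault(target, []).append((a, b))

    # Part 2: DFS on the reversed edges, starting from every singleton node.
    seen = {(v, v) for v in range(n)}
    stack = list(seen)
    while stack:
        current = stack.pop()
        for source in reverse.get(current, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)

    # A meeting path exists iff every pair node can reach a singleton.
    return all(pair in seen for pair in combinations(range(n), 2))
```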
I hope this has been mostly clear.
Proof of the remark:
We first treat the direction in which the existence of a meeting path implies that every node representing a subset of size 2 has a path to a singleton node. Just take the meeting path: since it's a meeting path, starting at any node in G ends up in the same node, so taking the meeting path from any such node reaches a singleton node.
Now suppose that every node representing a subset of size 2 has a path to a singleton node; then one can construct a meeting path. Take two different nodes representing subsets of size 2 and name them A and B. Take A's path to a singleton. Following this path, A -> {i} and B -> C, for some C and {i}. Then take C's path to a singleton (if C is already a singleton, skip this step): {i} -> {j} and C -> {k}. Then, if j and k differ, there is a meeting path for the node representing {j, k}. So we can find a meeting path for the union of A and B. Thus we can do this for any pair of nodes. Inductively, you can find a meeting path for the union of any set of nodes representing subsets of size 2, so this can be done until the entire set of nodes is the starting point and a singleton node is the end.
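The constructive half of the proof can be sketched in Python: first record, for every pair, a label that moves it one step closer to a singleton (a BFS over the reversed pair edges), then repeatedly pick two stations that are still possible and apply that pair's path to the whole set of possible stations. The names and the pair encoding are assumptions of this sketch. It assumes uniform outdegree and returns None when some pair can never be collapsed.

```python
from itertools import combinations


def construct_meeting_path(graph):
    """Build a meeting path by repeatedly collapsing two stations, as in the proof.

    Returns a list of labels, or None if some pair can never be collapsed.
    """
    n, k = len(graph), len(graph[0])

    def node(a, b):
        return (a, b) if a <= b else (b, a)

    reverse = {}
    for a, b in combinations(range(n), 2):
        for label in range(k):
            target = node(graph[a][label], graph[b][label])
            reverse.setdefault(target, []).append(((a, b), label))

    # BFS from the singletons over reversed edges; step[pair] is a label that
    # moves `pair` one step closer to a singleton.
    step = {}
    seen = {(v, v) for v in range(n)}
    frontier = list(seen)
    while frontier:
        next_frontier = []
        for target in frontier:
            for source, label in reverse.get(target, []):
                if source not in seen:
                    seen.add(source)
                    step[source] = label
                    next_frontier.append(source)
        frontier = next_frontier

    current, word = set(range(n)), []
    while len(current) > 1:
        pair = tuple(sorted(current)[:2])
        if pair not in step:
            return None
        # Follow the pair's path to a singleton, moving the whole set along.
        while pair[0] != pair[1]:
            label = step[pair]
            word.append(label)
            current = {graph[v][label] for v in current}
            pair = node(graph[pair[0]][label], graph[pair[1]][label])
    return word
```

Each outer iteration merges two stations that were both still possible, so the set of possible stations shrinks by at least one every time, mirroring the induction in the proof.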
As I already wrote in a comment on mcdowella's comment, I suspect this problem to be NP-complete. But a reasonable heuristic may allow you to get to the target in a reasonable time.
The following idea is based on A*. You represent each set of stations you might currently be at as a node (i.e. every set corresponds to one node). Since you have at most 50 stations, every such set can be represented as a 64-bit number (where each bit describes whether the corresponding station is in the set).
You want to maintain a list O of open nodes and a list V of visited nodes. Start with V and O each containing a single node that represents all stations.
The algorithm then has the following structure:
1. Choose the node n in O with the fewest stations.
2. Remove n from O.
3. For every line:
   - Transform n into the set of stations nt that results from travelling along that line from every station in n (and count the number of stations to speed things up).
   - If nt has a single station, you have found a meeting point (there may be more than one).
   - If nt has not been visited before, add it to O and V and set its path to the path of n extended by this line.
   - If nt has already been visited, you have two choices: either update the path of nt to achieve a path of minimal length, or just ignore it.
4. Go to 1.
The size of this graph is exponential in the length of the path. Therefore, this algorithm has exponential worst-case time and space complexity. However, since you always pick the node with the fewest stations, you take a step that is assumed to take you to the target on the fastest route. This may be a wrong step, which is why we need to keep the remaining graph. You will also calculate the graph on the fly, which avoids keeping the entire graph in memory (unless there is no solution).
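A minimal Python sketch of this best-first search, assuming uniform outdegree. Python integers serve as the bitmasks, a heap keyed by the number of stations plays the role of O, and a dict from set to path plays the role of V. Already-visited sets are simply ignored (the second option above). All names are hypothetical.

```python
import heapq


def best_first_meeting_path(graph):
    """Best-first search over sets of stations, always expanding the smallest set.

    Sets are bitmasks (bit i set = station i possible). Returns a list of
    lines leading every station to the same station, or None.
    """
    n, k = len(graph), len(graph[0])
    start = (1 << n) - 1  # all stations
    if n == 1:
        return []

    def travel(mask, line):
        # Set of stations reached by taking `line` from every station in `mask`.
        result = 0
        for v in range(n):
            if mask >> v & 1:
                result |= 1 << graph[v][line]
        return result

    paths = {start: []}          # visited sets V, with their paths
    open_heap = [(n, 0, start)]  # O, ordered by number of stations
    counter = 1                  # tie-breaker for sets of equal size
    while open_heap:
        _, _, mask = heapq.heappop(open_heap)
        for line in range(k):
            nt = travel(mask, line)
            if nt in paths:
                continue  # already visited: just ignore it
            path = paths[mask] + [line]
            if nt & (nt - 1) == 0:  # exactly one station left
                return path
            paths[nt] = path
            heapq.heappush(open_heap, (bin(nt).count("1"), counter, nt))
            counter += 1
    return None
```

On the question's example graph this returns `[1, 0]`, the path given in the question.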
Related
We have a directed weighted graph where an edge between two nodes can have more than one possible cost value (more precisely, at most 2 costs). I need to use a time-dependent variant of Dijkstra's algorithm that can handle two possible ways of getting from one node to another, the cost between the nodes (edge cost) being dependent on the time at which we arrive at the source node and on the type of edge we are about to use. When traversing from one node to the other, only one of these edges is picked and its cost is added to the total cost.
I currently model the two possible costs for an edge as two separate edges between the same nodes.
There is a similar problem I found here and it was suggested to augment the graph by duplicating the nodes. However, this does not allow returning to the original graph and implies the overhead of, well, duplicating all the nodes and possibly edges between them and original nodes.
Do you have any suggestions as to how to tackle this problem with as little overhead as possible? (The original graph is expected to be huge)
Thanks
Edit:
I provided more details about the problem in the first paragraph
You can safely ignore the larger of the two costs for algorithm purposes.
Assume there is a shortest path that uses the larger cost between two vertices: you can change it to use the smaller cost and the path will cost less, which contradicts the assumption.
I think you can hack step 3 of Dijkstra's algorithm:
For the current node, consider all of its unvisited neighbors and calculate their tentative distances. Compare the newly calculated tentative distance to the current assigned value and assign the smaller one. For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.
In your setup, you have two distances from A to B, depending on how late it is. You use the second one if your current distance to A is above your time threshold.
This step becomes:
if current distance to A above threshold :
current distance to B = min(current distance to B, current distance to A + d2(A, B))
else:
current distance to B = min(current distance to B, current distance to A + d1(A, B))
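A runnable Python sketch of this modified relaxation step inside a heap-based Dijkstra. The adjacency format `adj[u] = [(v, d1, d2), ...]` and the single global `threshold` are assumptions made for illustration. As with any time-dependent Dijkstra, the result is only guaranteed correct when arriving at a node later never lets you reach a neighbour earlier (the so-called FIFO property); a threshold switch from an expensive d1 to a cheap d2 can violate that.

```python
import heapq


def time_dependent_dijkstra(adj, source, threshold):
    """Dijkstra where each edge has two costs, chosen by the arrival time at its tail.

    adj[u] is a list of (v, d1, d2): d1 is used if we reach u at time <= threshold,
    d2 otherwise. Returns a dict of earliest arrival times.
    """
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        for v, d1, d2 in adj.get(u, []):
            cost = d2 if d > threshold else d1  # the modified step 3
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist
```

For example, with `adj = {"A": [("B", 2, 5)], "B": [("C", 1, 10)]}` and a threshold of 1, B is reached at time 2 (before the threshold) but C costs the late price, giving `{"A": 0, "B": 2, "C": 12}`.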
I'm trying to generate an undirected graph in which each node has a maximum degree associated with it. That is, if a node has a maximum degree of 2, it can connect to at most two nodes (one connected node would be allowed, but not zero). My problem is that I'm trying to generate a graph in which it's possible to get from any node to any other. Currently, I can have nodes "randomly" connect to one another, but the problem is that it's possible to create divided graphs, i.e. if you have 10 nodes, then sometimes two separate graphs of 5 nodes each are inadvertently formed. If anyone knows of an efficient solution, I'd love to hear it!
EDIT: Suppose that I have a graph with ten nodes, and I specify a maximum degree of 2. In this case, here is something that would be desirable:
Whereas this is what I'm trying to avoid:
Both graphs have a maximum degree of 2 per node, but in the second image, it's not possible to select an arbitrary node and be able to get to any other arbitrary node.
This is a pretty well-known problem in graph theory, solvable in polynomial time, whose name I forget (probably something like "find a graph given its degree sequence"). Anyhow, Király's solution is a nice way to do it, explained much better here than by me. This algorithm finds exactly the graphs that satisfy the given degree sequence, but it should be easy to modify for your looser constraints.
The obvious solution would be to build it as an N-way tree -- if the maximum degree is two, you end up with a binary tree.
To make it undirected, you'll have pointers not only to the "child" nodes, but also a backward pointer to the "parent" node. At least presumably, that one doesn't count toward the degree of the node (if it does, your degree of two basically ends up as a doubly-linked linear list instead of a tree).
Edit: post-clarification, it appears that the latter really is the case. Although it is drawn differently (with links going in various directions), your first picture showing the desired result is topologically just a linear linked list. As noted above, since you want an undirected graph, it ends up as a doubly linked list.
It sounds like you already know what the graph should look like, so I believe you can use a depth-first search approach, although breadth-first search can be used to avoid recursion.
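One way to realize the tree idea is random attachment: add nodes one at a time and join each new node to a random already-placed node that still has spare degree. The result is always connected because it is a spanning tree. This is a sketch of that technique, not code from the answer; the function name, the per-node `max_degree` list, and the highest-capacity-first ordering are all assumptions.

```python
import random


def random_connected_graph(max_degree, rng=random):
    """Random spanning tree respecting per-node maximum degrees.

    max_degree[v] is the cap for node v. Returns a list of undirected edges
    (u, v). Raises ValueError if the caps run out before every node is attached.
    """
    n = len(max_degree)
    nodes = list(range(n))
    rng.shuffle(nodes)  # random order among nodes with equal caps
    # Attach high-capacity nodes first so the tree has room to grow.
    order = sorted(nodes, key=lambda v: max_degree[v], reverse=True)
    degree = [0] * n
    edges = []
    placed = [order[0]] if n else []
    for v in order[1:]:
        candidates = [u for u in placed if degree[u] < max_degree[u]]
        if not candidates or max_degree[v] < 1:
            raise ValueError("degree caps too small for a connected graph")
        parent = rng.choice(candidates)
        edges.append((parent, v))
        degree[parent] += 1
        degree[v] += 1
        placed.append(v)
    return edges
```

With a cap of 2 on every node, the only nodes with spare capacity are the two ends of what has been built so far, so the output is a random path, which matches the "linked list" observation in the edit.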
For example, if you have the nodes 1-5, and k=2, then you can build a graph by starting at node 1, and then simply randomly choosing an unvisited node. Like so:
1 [Start at 1]
1-2 [expand 2, add edge(1,2) to graph]
1-2-3 [expand 3, add edge(2,3) to graph]
1-2-3-4 [expand 4, add edge(3,4) to graph]
1-2-3-4-5 [expand 5, add edge(4,5) to graph]
1-2-3-4-5-1 [expand 1, add edge(5,1) to graph] (this step may or may not be done)
If an edge is never used twice, then p paths lead to degree 2p overall, with the degree of the start and end nodes depending on whether the paths are really tours. To avoid duplicate work, it is probably easier to just label the vertices with the integers 1 through N, then create edges such that each vertex v connects to vertex ((v + j - 1) mod N) + 1, where j < N - 1 and j is co-prime to N. The co-prime condition makes things a bit problematic, as the number of integers co-prime to N can be limited if N is not prime. This means solutions don't exist for certain values, at least in the form of new Hamiltonian paths/tours. However, if you ignore the co-prime aspect and simply let j range over the integers 1 through p, then go through each vertex and create the edges (instead of using the path approach), you can make all the vertices have degree k = 2p, where k is an even number >= 2. This is achievable in O(N*k), although it may be pushed back as far as O(N^2) if the co-prime method is used.
Thus the path for k=4 would look like this, if started at 1, with j=2:
1 [Start at 1]
1-3 [expand 3, add edge(1,3) to graph]
1-3-5 [expand 5, add edge(3,5) to graph]
1-3-5-2 [expand 2, add edge(5,2) to graph]
1-3-5-2-4 [expand 4, add edge(2,4) to graph]
1-3-5-2-4-1 [expand 1, add edge(4,1) to graph] (this step may or may not be done)
Since |V| = 5 and k = 4, the resulting edges form a complete graph, which is expected. It also works out since 2 and 5 are co-prime.
Obtaining an odd degree is a bit more difficult. First obtain degree k-1, then add edges in such a way that an odd degree is obtained overall. It seems fairly easy to get very close (with one or two exceptions) to all vertices having odd degree, but this is impossible with an odd number of vertices (every graph has an even number of odd-degree vertices), and it requires a careful selection of edges with an even number of vertices. Such a selection isn't easy to put into an algorithm. However, it can be approximated by simply picking two unused vertices and creating an edge between them, such that no vertex is used twice and no edge is used twice.
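The "ignore the co-prime aspect" variant can be sketched in Python as a circulant graph: vertex v is joined to the vertices 1 through p steps ahead (mod N). The j = 1 edges alone form a Hamiltonian cycle, so the graph is connected, and each vertex gets degree k = 2p. The function name and the bound check are assumptions of this sketch; the bound 2p < N is what keeps the generated edges distinct.

```python
def circulant_edges(n, p):
    """Vertices 1..n; vertex v is joined to the vertices j steps ahead, j = 1..p (mod n).

    Every vertex ends up with degree k = 2p, provided 2p < n (otherwise
    some of the generated edges would coincide).
    """
    if not 1 <= p <= (n - 1) // 2:
        raise ValueError("need 1 <= p <= (n - 1) // 2")
    edges = set()
    for v in range(1, n + 1):
        for j in range(1, p + 1):
            u = (v + j - 1) % n + 1  # 1-based "v + j mod n"
            edges.add((min(u, v), max(u, v)))
    return edges
```

For N = 5 and p = 2 (so k = 4) this yields all 10 edges of the complete graph, as in the example above, in O(N*k) time.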
For a project of mine, I'm attempting to create a solver that, given a random set of weighted nodes with weighted paths, will find the highest scoring path with a finite number of moves. I've created a visual to help describe the problem.
This example has all the connection edges shown for completeness. The numbers on edges are traversal costs and the numbers inside nodes are scores. A node is only counted when traversed to, and you cannot traverse from a node to itself.
As you can see from the description in the image, there is a start/finish node with randomly placed nodes that each have an arbitrary score. Every node is connected to all other nodes and every connection has an arbitrary weight that subtracts from the total number of move units remaining. For simplicity, you could assume the weight of a connection is a function of distance. Nodes can be traveled to more than once and their score is applied again. The goal is to find a loop path that has the highest score for the given move limit.
The solver will never be dealing with more than 30 nodes, usually dealing with 10-15 nodes. I still need to try and make it as fast as possible.
Any ideas on algorithms or methods that would help me solve this problem other than pure brute force methods?
Here's an O(m n^2)-time algorithm, where m is the number of moves and n is the number of nodes.
For every time t in {0, 1, ..., m} and every node v, compute the maximum score of a t-step walk that begins at the start node and ends at v as follows. If t = 0, then there's only one walk, namely, doing nothing at the start node, so the maximum for (0, v) is 0 if v is the start node and -infinity (i.e., impossible) otherwise.
For t > 0, we use the entries for t - 1 to compute the entries for t. To compute the (t, v) entry, we add the score of v to the maximum, over all nodes w, of the (t - 1, w) entry minus the transition penalty from w to v. In other words, an optimal t-step walk to v consists of a step from some node w to v preceded by a (t - 1)-step walk to w, and this (t - 1)-step walk must be optimal because history does not influence future scoring.
At the end, we look at the (m, start node) entry. Recovering the actual walk involves working backward and determining repeatedly which w was the best node to have come from.
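A direct Python transcription of this O(m n^2) table, under the answer's framing: exactly m steps, `score[v]` gained each time v is entered, and `cost[w][v]` subtracted for the step w -> v. Disallowing w == v reflects the question's "cannot traverse to itself" rule. The names are hypothetical.

```python
NEG_INF = float("-inf")


def best_closed_walk(score, cost, start, m):
    """Maximum-score m-step walk from `start` back to `start`.

    score[v] is added each time v is entered; cost[w][v] is the penalty for the
    step w -> v. Staying put (w == v) is not allowed. Returns (best, walk), or
    (-inf, None) if no such walk exists.
    """
    n = len(score)
    best = [[NEG_INF] * n for _ in range(m + 1)]
    parent = [[None] * n for _ in range(m + 1)]
    best[0][start] = 0  # the empty walk
    for t in range(1, m + 1):
        for v in range(n):
            for w in range(n):
                if w == v or best[t - 1][w] == NEG_INF:
                    continue
                candidate = best[t - 1][w] - cost[w][v] + score[v]
                if candidate > best[t][v]:
                    best[t][v] = candidate
                    parent[t][v] = w
    if best[m][start] == NEG_INF:
        return NEG_INF, None
    walk, v = [start], start
    for t in range(m, 0, -1):  # work backward through the stored choices
        v = parent[t][v]
        walk.append(v)
    walk.reverse()
    return best[m][start], walk
```

For example, with `score = [0, 10, 1]`, every step costing 1 and m = 2, the best closed walk from node 0 is `[0, 1, 0]` with score 8.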
I recently came across this interesting problem (Edit: Problem A) from Spotify's hacker challenge earlier this year, which involves determining the switching at train track junctions to route a train back to its starting point. The train must arrive facing the same direction it left and the train can never reverse on the tracks.
As I understand it, the problem can be modeled as an undirected(?) graph where we must find the shortest cycle from a certain vertex, or detect that no such cycle exists. However, the interesting part is that for a vertex, v, the vertices adjacent to v are dependent on the path taken to v, so in a sense the graph could be considered directed, though this direction is path-dependent.
My first thought was to model each node as 3 separate vertices, A, B and C, where A <-> B and A <-> C, and then use a breadth-first search to build a search tree until we find the original vertex, but this is complicated by the caveat above, namely that the adjacencies for a given vertex depend on the previous vertex we visited. This means that in our BFS tree, nodes can have multiple parents.
Obviously a simple BFS search won't be sufficient to solve this problem. I know there are algorithms that exist to detect cycles in a graph. One approach might be to detect all the cycles, then for each cycle, detect whether the path is valid. (i.e., does not reverse direction)
Does anyone else have any insights on approaches to solving this problem?
UPDATE:
I followed the approach suggested by @Karussell in the comments.
Here is my solution on github.
The trick was to model the situation using an edge-based graph, not a traditional vertex-based graph. The input file supplied in the contest is conveniently specified in terms of edges already, so this file can be easily used to build an edge-based graph.
The program uses two important classes: Road and Solver. A Road has two integer fields, j1 and j2. j1 represents the source junction and j2 represents the target junction. Each road is one-way, meaning that you can only travel from j1 to j2. Each Road also includes a LinkedList of adjacent Roads and a parent Road. The Road class also includes static methods to convert between the Strings used in the input file and integer indexes representing the A, B, and C points at each junction.
For each entry in the input file, we add two Roads to a HashMap, one Road for each direction between the two junctions. We now have a list of all of the Roads that run between junctions. We just need to connect the roads together at the junctions through the A, B and C switches. If a Road ends at Junction.A, we look up the roads that begin at Junction.B and Junction.C and add these roads as adjacencies. The buildGraph() function returns the Road whose target junction (j2) is "1A" == index 0.
At this point, our graph is constructed. To find the shortest path I simply used a BFS to traverse the graph. We leave the root unmarked and begin by queueing the root's adjacencies. If we find a road whose target junction is "1A" (index 0) then we have found the shortest cycle through the starting point. Once we reconstruct the path using each Road's parent property, it's a trivial matter to set the switches appropriately as required in the problem.
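The edge-based idea can be sketched compactly in Python. The contest's actual input format is not reproduced here, so this sketch assumes tracks are given as pairs of points `(junction, letter)` with letters "A", "B", "C". A state is a directed road `(from_point, to_point)`. Arriving at an A point lets you leave through B or C of the same junction, and arriving at B or C forces you out through A. BFS then finds the return route with the fewest roads. This is not the author's `Road`/`Solver` code; all names are hypothetical.

```python
from collections import deque


def shortest_return(tracks, start_junction):
    """BFS over directed roads; returns the road list of the shortest valid loop, or None."""
    out = {}  # point -> points connected to it by a track
    for p, q in tracks:
        out.setdefault(p, []).append(q)
        out.setdefault(q, []).append(p)

    def exits(point):
        # Where a train arriving at `point` can leave its junction.
        junction, letter = point
        return [(junction, "B"), (junction, "C")] if letter == "A" else [(junction, "A")]

    goal = (start_junction, "A")
    parent, queue = {}, deque()
    for e in exits(goal):  # we start as if we had just arrived at the A point
        for q in out.get(e, []):
            if (e, q) not in parent:
                parent[(e, q)] = None
                queue.append((e, q))
    while queue:
        road = queue.popleft()
        if road[1] == goal:  # back at the start, facing the same way
            path = []
            while road is not None:
                path.append(road)
                road = parent[road]
            return path[::-1]
        for e in exits(road[1]):
            for q in out.get(e, []):
                if (e, q) not in parent:
                    parent[(e, q)] = road
                    queue.append((e, q))
    return None
```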
Thanks to Karussell for suggesting this approach. If you want to put your comment in answer form with a short explanation, I will accept it. Thanks to @Origin, as well. I must admit that I did not fully follow the logic of your answer, but that is certainly not to say that it is not correct. If anyone solves this problem using your solution, I would be very interested to see it.
As my comment suggested: I think that you can solve this via edge based graph or via an improvement which is more or less an 'enhanced' node based graph.
Details:
Your situation is similar to turn restrictions in road networks. Those can be modeled if you create one node per (directed!) street and connect those nodes depending on the allowed turns.
So, store not only your current position but also the direction and any further relevant 'situation', so that the same position after a 180° turn counts as a different state from your current one.
Instead of modeling your 'state' (which is directed!) into the graph you could also assign possible outcomes to every junction - now the algorithm needs to be more clever and needs to decide per junction what to do depending on your earlier state (including direction). I think, this is the main idea of the 'enhanced' node based graph which should be less memory intensive (not that important in your case).
One possible approach: first construct some kind of graph to model all connections (graph G). Then construct another graph in which we will find the cycle (graph H). For each node A in G, we will add a node to graph H. Each A node also has 2 outgoing edges (to the B and C nodes in graph G). In H, these edges will go to the next A node that would be encountered in G. For example, the A node in H corresponding to the A node of the switch with ID 3 would have an outgoing edge to node 9 and node 6 in H. The weight of each edge is the number of switches passed on that route (including the starting switch).
This will yield a graph in which we can grow a forward shortest path tree. If we reach the start again, the cycle is complete.
The key is that a switch is only a decision point if it is traversed in the A-> direction. It is not necessary to model the backward direction as this would only complicate the search.
edit: some more clarification
The problem consists of determining the shortest path from A to A (again). The definition of shortest is here the number of switches passed. This will be used in a Dijkstra based search algorithm. We basically are going to do Dijkstra on a graph H in which the cost of the edges is equal to the number of switches in that edge.
In the H graph, we will have a node for each switch. Each node will have 2 outgoing edges, corresponding to the 2 paths one can take (B and C directions). The edges in H will correspond to an entire route between 2 A nodes in the original graph. For the example in the problem description, we get the following:
A node corresponding to switch 1:
1 outgoing link to node 2, weight 2, corresponding to taking the C direction when leaving switch 1. The weight is 2 because we pass switch 1 and switch 3 if we go from A1->C1->C3->A3->A2
1 outgoing link to node 3, weight 2, corresponding to taking the B direction
A node corresponding to switch 2:
1 outgoing link to node 6, weight 2, corresponding to taking the B direction
no second link as the C direction is a dead end
A node corresponding to switch 3:
1 outgoing link to node 6, weight 2, corresponding to taking the C direction
1 outgoing link to node 9, weight 3, corresponding to taking the B direction and passing switches 3, 7 and 8
and so on for every switch. This yields a graph with 10 nodes, each having at most 2 directed edges.
Now we can start building our Dijkstra tree. We start at node 1 and have 2 possible directions, B and C. We put those on a priority queue. The queue then contains [node 2, weight 2] and [node 3, weight 2] as we can reach the A entrance of switch 2 after passing 2 switches and the A entrance of switch 3 after passing 2 switches. We then continue the search by taking the lowest weight entry from the queue:
[node 2, weight 2]: only the B direction to take, so put [node 6, weight 4] on the queue
[node 3, weight 2]: 2 directions to take, so add [node 6, weight 4] and [node 9, weight 5] to the queue.
[node 6, weight 4]: 2 directions possible, add [node 4, weight 5] and [node 8, weight 8] to the queue
[node 9, weight 5]: only the C direction, add [node 10, weight 6]
[node 4, weight 5]: add [node 5, weight 7] for the C direction and [node 1, weight 9] for the B direction
[node 10, weight 6]: add [node 1, weight 8] for the C direction and [node 1, weight 10] for the B direction
[node 5, weight 7]: add [node 1, weight 11] and [node 8, weight 10]
[node 8, weight 8]: add [node 7, weight 9]
[node 1, weight 8]: we found our way back so we can stop
(mistakes are possible, I'm just doing this by hand)
The algorithm then stops with a final length of 8 for a cycle. Determining the followed path is then just a matter of maintaining parent pointers for the nodes when you settle them and unpacking the path.
We can use Dijkstra because each node in H corresponds to traversing an original node (in G) in the right direction. Each node in H can then be settled in a Dijkstra fashion so the complexity of the algorithm is limited to that of Dijkstra (which can handle the 100k upper limit for the number of switches).
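A Python sketch of this search on H. The dictionary `H` below is reconstructed from the hand trace above (node 7 has no listed outgoing edges), so its weights inherit whatever mistakes the trace contains. The start node is deliberately left unsettled so that it can be reached again. Heap entries are `(weight, node, predecessor)` tuples, which assumes node ids are mutually comparable. All names are hypothetical.

```python
import heapq

# Graph H reconstructed from the hand trace above: H[node] = [(next_node, weight), ...].
H = {
    1: [(2, 2), (3, 2)],
    2: [(6, 2)],
    3: [(6, 2), (9, 3)],
    4: [(5, 2), (1, 4)],
    5: [(1, 4), (8, 3)],
    6: [(4, 1), (8, 4)],
    8: [(7, 1)],
    9: [(10, 1)],
    10: [(1, 2), (1, 4)],
}


def shortest_cycle(h, start):
    """Dijkstra that leaves `start` unsettled so it can be reached again.

    Returns (length, path) of the cheapest cycle through `start`, or None.
    """
    heap = [(w, nxt, start) for nxt, w in h.get(start, [])]
    heapq.heapify(heap)
    parent = {}  # settled node -> predecessor on its shortest route
    while heap:
        d, node, prev = heapq.heappop(heap)
        if node in parent:
            continue  # already settled with a smaller weight
        parent[node] = prev
        if node == start:  # back at the start: the cycle is complete
            path, cur = [start], prev
            while cur != start:
                path.append(cur)
                cur = parent[cur]
            path.append(start)
            return d, path[::-1]
        for nxt, w in h.get(node, []):
            if nxt not in parent:
                heapq.heappush(heap, (d + w, nxt, node))
    return None
```

On this `H`, `shortest_cycle(H, 1)` returns `(8, [1, 3, 9, 10, 1])`, consistent with the final length of 8 found by hand.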
The DFA must have the following four properties:
The DFA has N nodes
Each node has 2 outgoing transitions.
Each node is reachable from every other node.
The DFA is chosen with perfectly uniform randomness from all possibilities
This is what I have so far:
1. Start with a collection of N nodes.
2. Choose a node that has not already been chosen.
3. Connect its outputs to 2 other randomly selected nodes.
4. Label one transition 1 and the other transition 0.
5. Go to 2, unless all nodes have been chosen.
6. Determine if there is a node with no incoming connections.
7. If so, steal an incoming connection from a node with more than 1 incoming connection.
8. Go to 6, unless there are no nodes with no incoming connections.
However, this algorithm is not correct. Consider the graph where node 1 has its two connections going to node 2 (and vice versa), while node 3 has its two connections going to node 4 (and vice versa). That is something like:
1 <==> 2
3 <==> 4
Here, by <==> I mean two outgoing connections in each direction (so a total of 4 connections). This forms 2 separate components, which means that not every state is reachable from every other state.
Does anyone know how to complete the algorithm? Or, does anyone know another algorithm? I seem to vaguely recall that a binary tree can be used to construct this, but I am not sure about that.
Strong connectivity is a difficult constraint. Let's generate uniform random surjective transition functions and then test them with e.g. Tarjan's linear-time SCC algorithm until we get one that's strongly connected. This process has the right distribution, but it's not clear that it's efficient; my researcher's intuition is that the limiting probability of strong connectivity is less than 1 but greater than 0, which would imply only O(1) iterations are necessary in expectation.
Generating surjective transition functions is itself nontrivial. Unfortunately, without that constraint it is exponentially unlikely that every state has an incoming transition. Use the algorithm described in the answers to this question to sample a uniform random partition of {(1, a), (1, b), (2, a), (2, b), …, (N, a), (N, b)} with N parts. Permute the nodes randomly and assign them to parts.
For example, let N = 3 and suppose that the random partition is
{{(1, a), (2, a), (3, b)}, {(2, b)}, {(1, b), (3, a)}}.
We choose a random permutation 2, 3, 1 and derive a transition function
(1, a) |-> 2
(1, b) |-> 1
(2, a) |-> 2
(2, b) |-> 3
(3, a) |-> 1
(3, b) |-> 2
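A Python sketch of the whole procedure under stated assumptions. A uniform set partition into exactly N blocks is sampled with the Stirling-number recurrence S(m, k) = S(m-1, k-1) + k*S(m-1, k): the last item is either alone or joins a uniformly chosen block. A random permutation then assigns nodes to blocks, and samples are rejected until strongly connected. Instead of Tarjan's algorithm, strong connectivity is checked with two reachability searches from node 0 (forward and backward), which is also linear time. All names are hypothetical, and the recursion depth grows with 2N, so very large N would need an iterative rewrite.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(m, k):
    """Stirling number of the second kind: partitions of m items into k non-empty blocks."""
    if m == k:
        return 1
    if k <= 0 or k > m:
        return 0
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)


def random_partition(items, k, rng):
    """Uniformly random partition of `items` into exactly k non-empty blocks."""
    if not items:
        return []
    *rest, last = items
    m = len(items)
    # S(m, k) = S(m-1, k-1) [last is alone] + k * S(m-1, k) [last joins a block]
    if rng.randrange(stirling2(m, k)) < stirling2(m - 1, k - 1):
        return random_partition(rest, k - 1, rng) + [[last]]
    blocks = random_partition(rest, k, rng)
    rng.choice(blocks).append(last)
    return blocks


def random_surjection(n, rng):
    """Uniform surjective transition function {0..n-1} x {a, b} -> {0..n-1}."""
    items = [(v, c) for v in range(n) for c in "ab"]
    blocks = random_partition(items, n, rng)
    targets = list(range(n))
    rng.shuffle(targets)  # random permutation assigning nodes to parts
    return {item: t for block, t in zip(blocks, targets) for item in block}


def is_strongly_connected(n, delta):
    """Every node reachable from node 0, and node 0 reachable from every node."""
    forward = {v: [] for v in range(n)}
    backward = {v: [] for v in range(n)}
    for (v, _), w in delta.items():
        forward[v].append(w)
        backward[w].append(v)

    def reach_count(adj):
        seen, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen)

    return reach_count(forward) == n and reach_count(backward) == n


def random_strongly_connected_dfa(n, rng=random):
    while True:  # rejection sampling: keep the first strongly connected sample
        delta = random_surjection(n, rng)
        if is_strongly_connected(n, delta):
            return delta
```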
In what follows I'll use the basic terminology of graph theory.
You could:
Start with a directed graph with N vertices and no arcs.
Generate a random permutation of the N vertices to produce a random Hamiltonian cycle, and add it to the graph.
For each vertex add one outgoing arc to a randomly chosen vertex.
The result will satisfy all three requirements.
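A short Python sketch of this construction. Which of the two labels carries the cycle arc is chosen at random per node, which is a detail this sketch adds. The output is strongly connected by construction but, as the answer implies, not uniform over all strongly connected DFAs. The function name is hypothetical.

```python
import random


def hamiltonian_dfa(n, rng=random):
    """2-out transition function that is strongly connected by construction.

    A random Hamiltonian cycle supplies one outgoing arc per node; the other
    arc goes to a uniformly random node.
    """
    order = list(range(n))
    rng.shuffle(order)  # random permutation = random Hamiltonian cycle
    delta = {}
    for i, v in enumerate(order):
        cycle_label = rng.randrange(2)
        delta[(v, cycle_label)] = order[(i + 1) % n]    # arc of the Hamiltonian cycle
        delta[(v, 1 - cycle_label)] = rng.randrange(n)  # extra random arc
    return delta
```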
There is an algorithm with expected running time O(n^{3/2}).
If you generate a uniform random digraph with m vertices such that each vertex has k labelled out-arcs (a k-out digraph), then with high probability the largest SCC (strongly connected component) in this digraph is of size around c_k m, where c_k is a constant depending on k. Actually, there is about 1/\sqrt{m} probability that the size of this SCC is exactly c_k m (rounded to an integer).
So you can generate a uniform random 2-out digraph of size n/c_k, and check the size of the largest SCC. If its size is not exactly n, just try again until success. The expected number of trials needed is O(\sqrt{n}). Generating each digraph can be done in O(n) time, so in total the algorithm has expected running time O(n^{3/2}). See this paper for more details.
Just keep growing a set of nodes which are all reachable. Once they're all reachable, fill in the blanks.
Start with a set of N nodes called A.
Choose a node from A and put it in set B.
While there are nodes left in set A:
    Choose a node x from set A.
    Choose a node y from set B with fewer than two outgoing transitions.
    Choose a node z from set B.
    Add a transition from y to x.
    Add a transition from x to z.
    Move x to set B.
For each node n in B:
    While n has fewer than two outgoing transitions:
        Choose a node m in B.
        Add a transition from n to m.
Choose a node to be the start node.
Choose some number of nodes to be accepting nodes.
Every node in set B can reach every node in set B. As long as a node can be reached from a node in set B and that node can reach a node in set B, it can be added to the set.
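A Python sketch of the transition-building part of these steps (choosing the start and accepting nodes is left out). Transitions are labelled 0 and 1 by the order in which they are added, which is an assumption of this sketch, as is the function name. A node y with a free slot always exists: after i iterations, B holds i + 1 nodes but only 2i transitions.

```python
import random


def grow_strongly_connected(n, rng=random):
    """2-out transition lists built by growing a strongly connected set B."""
    a = list(range(n))
    rng.shuffle(a)
    out = {v: [] for v in range(n)}
    b = [a.pop()] if a else []
    while a:
        x = a.pop()
        y = rng.choice([v for v in b if len(out[v]) < 2])  # never empty, see above
        z = rng.choice(b)
        out[y].append(x)  # x becomes reachable from B
        out[x].append(z)  # and x can reach B again
        b.append(x)
    for v in b:  # fill in the remaining transitions inside B
        while len(out[v]) < 2:
            out[v].append(rng.choice(b))
    return out  # out[v][0] / out[v][1] are the transitions labelled 0 / 1
```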
The simplest way that I can think of is to (uniformly) generate a random DFA with N nodes and two outgoing edges per node, ignoring the other constraints, and then throw away any that are not strongly connected (which is easy to test using a strongly connected components algorithm). Generating uniform DFAs should be straightforward without the reachability constraint. The one thing that could be problematic performance-wise is how many DFAs you would need to skip before you found one with the reachability property. You should try this algorithm first, though, and see how long it ends up taking to generate an acceptable DFA.
We can start with a random number of states N1 between N and 2N.
Take state number 1 as the initial state.
For each state, for each character in the input alphabet we generate a random transition (between 1 and N1).
We take the accessible part of the automaton, i.e. the states reachable from the initial state. We check its number of states, and after a few tries we get one with N states.
If we also want a minimal automaton, all that remains is the assignment of final states, and there is a good chance that a random assignment yields a minimal automaton as well.
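A Python sketch of these steps. States are numbered from 0 here, so state 0 plays the role of "state number 1". The `max_tries` cap and the relabelling of the kept states to 0..N-1 are assumptions of this sketch, as is the function name. Note that this guarantees every state is reachable from the initial state; it does not by itself guarantee the strong connectivity asked for in the question.

```python
import random


def random_accessible_dfa(n, alphabet="ab", rng=random, max_tries=10_000):
    """Random DFA on exactly n states, all reachable from the initial state 0."""
    for _ in range(max_tries):
        n1 = rng.randint(n, 2 * n)  # random size between N and 2N
        delta = {(q, c): rng.randrange(n1) for q in range(n1) for c in alphabet}
        seen, stack = {0}, [0]  # states reachable from state 0
        while stack:
            q = stack.pop()
            for c in alphabet:
                r = delta[(q, c)]
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        if len(seen) == n:
            kept = sorted(seen)  # 0 stays first, so it remains the initial state
            index = {q: i for i, q in enumerate(kept)}
            return {(index[q], c): index[delta[(q, c)]] for q in kept for c in alphabet}
    return None
```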
The following references seem to be relevant to your question:
F. Bassino, J. David and C. Nicaud, Enumeration and random generation of possibly incomplete deterministic automata, Pure Mathematics and Applications 19 (2-3) (2009) 1-16.
F. Bassino and C. Nicaud, Enumeration and random generation of accessible automata, Theoretical Computer Science 381 (2007) 86-104.