A graph algorithm - algorithm

There is an algorithm question that I really can't figure out. The question may use Dijkstra's algorithm.
There is a network of n computers that you will hack to take control. Initially, you have already hacked computer c0. There are m connections between computers, which you can use to take down an uncontrolled computer from a hacked one. Each connection is described as a triple (ca; cb; t), which means that if ca is hacked, then you can successfully hack cb at a cost of t minutes.
A large group of your hacker friends join you in hacking (they are as good as you and as many as the computers in the network). They are all at your command, which means you can assign them hacking tasks on multiple computers simultaneously. Describe an efficient algorithm to determine how many minutes you would need to successfully hack all the computers in the network. State the running time in terms of n, m.

Let the computers be labeled c_0, c_1, ..., c_{n - 1}. After running Dijkstra's algorithm from c_0, the answer you are looking for is max { d[i] | 0 <= i <= n - 1 }, where d[i] denotes the shortest-path distance from c_0 to c_i. This is true because: 1) you need at least as much time as the maximum of those distances in order to hack the most distant computer; 2) that much time is also sufficient. Take a look at the shortest-path tree we get after applying Dijkstra's algorithm (c_0 is the root of that tree). We can apply the following strategy: start by hacking all the children of c_0 at once, and as soon as a computer is hacked, immediately start hacking all of its children in the tree. We do this until all the computers have been hacked. The time needed for this is equal to the maximum weighted depth of the tree (note that the edges of this tree have the same weights as in the original graph), which is exactly the number mentioned before. So the total running time is that of Dijkstra's algorithm: O(m + n log n) with a Fibonacci heap.
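A minimal sketch of this answer in Python, assuming the computers are numbered 0..n-1 and each connection is a directed triple (a, b, t). The function name and input format are illustrative, not from the question. It uses the standard library's binary heap, so this version runs in O((n + m) log n) rather than the Fibonacci-heap bound.

```python
import heapq

def time_to_hack_all(n, connections, source=0):
    """Return the largest shortest-path distance from `source`.

    `connections` is a list of (a, b, t) triples: once a is hacked,
    b can be hacked in t minutes. Returns float('inf') if some
    computer cannot be reached at all.
    """
    graph = [[] for _ in range(n)]
    for a, b, t in connections:
        graph[a].append((b, t))

    dist = [float("inf")] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, t in graph[u]:
            nd = d + t
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return max(dist)
```

For example, with connections (0,1,5), (0,2,2), (2,1,1), (1,3,4) on four computers, the distances are 0, 3, 2 and 7, so the whole network is hacked after 7 minutes.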

Related

Pathfinding task - how can I find next vertex on the shortest path from A to B faster than O(n)?

I have a quite tricky task to solve:
You are given an N * M board (1 <= N, M <= 256). You can move from each field to its neighbouring field (moving diagonally is not allowed). At the beginning, there are two types of fields: active and blocked. You can pass through an active field, but you can't enter a blocked one. You have Q queries (1 <= Q <= 200). There are two types of queries:
1) find the next field (neighbouring to A) that lies on the shortest path from field A to B
2) change field A from active to blocked or conversely.
The first type query can be easily solved with simple BFS in O(N * M) time. We can represent active and blocked fields as 0 or 1, so the second query could be done in constant time.
The total time of that algorithm would be O(Q (number of queries) * N * M).
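As a baseline, the per-query BFS described above could be sketched like this. The grid representation (a list of rows with 1 for blocked and 0 for active) and the function name are assumptions made for illustration. The search runs backwards from B, so the next move from A is simply any neighbour whose distance to B is one less.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # no diagonal moves

def next_step(blocked, a, b):
    """Return a neighbour of `a` on a shortest path to `b`, or None.

    `blocked[r][c]` is 1 for a blocked field and 0 for an active one;
    `a` and `b` are (row, col) tuples of active fields.
    """
    rows, cols = len(blocked), len(blocked[0])
    dist = [[-1] * cols for _ in range(rows)]
    dist[b[0]][b[1]] = 0
    queue = deque([b])
    while queue:
        r, c = queue.popleft()
        if (r, c) == a:
            break  # everything closer to b has already been labelled
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not blocked[nr][nc] and dist[nr][nc] == -1):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))

    d = dist[a[0]][a[1]]
    if d <= 0:
        return None  # unreachable, or a is already b
    for dr, dc in MOVES:
        nr, nc = a[0] + dr, a[1] + dc
        if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == d - 1:
            return (nr, nc)
    return None
```

A type 2 query is then just `blocked[r][c] ^= 1`, which is the constant-time update mentioned above.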
So what's the problem? I have 1/60 of a second to solve all the queries. If we consider 1 second as 10^8 calculations, we are left with about 1.5 * 10^6 calculations. One BFS may take up to N * M * 4 time, which is about 2.5 * 10^5. So if Q is 200, the needed calculations may be up to 5 * 10^7, which is way too slow.
As far as I know, there are no better pathfinding algorithms than BFS in this case (well, I could go for A*, but I'm not sure if it's much quicker than BFS; it's still worst-case O(|E|), according to Wikipedia). So there's not much to optimize in this area. However, I could change my graph in some way to reduce the number of edges that the algorithm would have to process (I don't need to know the full shortest path, only the next move I should make, so the rest of the shortest path can be greatly simplified). I was thinking about some preprocessing: grouping vertices into groups and making a graph of graphs, but I'm not sure how to handle the blocked fields that way.
How can I optimize it better? Or is it even possible?
EDIT: The actual problem: I have some units on the board. I want to start moving them to the selected destination. Units can't share the same field, so one can block others' paths or open new, better paths for them. There can be a lot of units, which is why I need a better optimization.
If I understand the problem correctly, you want to find the shortest path on a grid from A to B, with the added ability that your path-finder can remove walls for an additional movement cost?
You can treat this as a directed graph problem, where you can move into any wall-node for a cost of 2, and into any normal node for a cost of 1. Then just use any directed-graph pathfinding algorithm such as Dijkstra's or A* (the usual heuristic, Manhattan distance, will still work).
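A rough sketch of that weighted-grid idea, using Dijkstra's algorithm with Python's heapq. The grid format (1 for a wall, 0 for a normal field), the function name and the `wall_cost` parameter are assumptions made for illustration, not part of the answer.

```python
import heapq

def weighted_grid_cost(blocked, a, b, wall_cost=2):
    """Cheapest cost from `a` to `b` when entering a wall field costs
    `wall_cost` and entering a normal field costs 1."""
    rows, cols = len(blocked), len(blocked[0])
    best = {a: 0}
    heap = [(0, a)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == b:
            return d
        if d > best[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # The edge weight depends only on the field being entered.
                nd = d + (wall_cost if blocked[nr][nc] else 1)
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None
```

With the grid [[0, 1, 0], [0, 0, 0]], going from (0, 0) to (0, 2) straight through the wall costs 3, while the detour around it costs 4, so the wall is only worth breaking while `wall_cost` stays below 3.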

Cost minimizing algorithm (time limited)

Let's say I have a group of N people who are going to travel by train. I need to organize them in a line to the ticket office in a way that minimizes the total ticket cost. The cost can be minimized if families buy family tickets and people travelling to the same destination buy group tickets.
I do not know which of these people are families or where they are travelling.
All I can do is send any M (1 <= M <= N) of them to the ticket office and get the answer: how much it will cost for these M people.
I also have limited time, as the train is leaving in a few minutes, so a near-best solution is good enough for me.
The brute force solution is O(N!) and so is obviously unacceptable.
EDIT
The answer from the ticket office is always the total sum for M people, no details.
Group and/or family may start at 2 people and may include all N of them.
The cashier in the ticket office will always know who is a family and who is not.
EDIT
If I send a family and some more people to the ticket office, the family will not get a family ticket; they will all get their regular tickets.
Since you mentioned you can settle for 'good enough', and you are time limited - here is a greedy any-time approach:
1. p <- create a random permutation.
2. Estimate the cost of this permutation (let that be cost(p)).
3. Find a new permutation p' such that cost(p') < cost(p) that is obtained from p using a single swap of two people (there are n(n-1)/2 such possible swaps). (1)
4. If such a p' exists: p <- p' and return to 2.
5. Else, store p as a local minimum, and return to 1.
When time is up, choose the best local minimum found.
This is basically a variation of steepest ascent hill climbing with random restarts.
Note that as your time -> infinity, you will find the optimal solution, because the probability of having checked any given permutation gets closer and closer to 1 as time passes.
(1) Getting the price can be done in O(n) by first checking who is a family member or going to the same destination and is adjacent to each other in the permutation, using the fact that d(passenger1, passenger2) < d(passenger1) + d(passenger2), and then checking each group separately.
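The any-time procedure above could be sketched as follows. This is only an illustration under some assumptions: `cost` is a hypothetical stand-in for whatever sequence of ticket-office queries prices a whole line, a fixed number of restarts replaces the wall-clock deadline so the run is repeatable, and the first improving swap is accepted (as in the step list) rather than the best one.

```python
import random

def hill_climb(people, cost, restarts=20, rng=None):
    """Random-restart hill climbing over orderings of `people`.

    `cost(order)` must return the total price of a line-up.
    Returns (best_order, best_cost) over all restarts.
    """
    rng = rng or random.Random()
    best, best_cost = None, float("inf")
    for _ in range(restarts):
        p = list(people)
        rng.shuffle(p)               # random starting permutation
        current = cost(p)
        improved = True
        while improved:              # repeat until no swap helps
            improved = False
            for i in range(len(p) - 1):
                for j in range(i + 1, len(p)):
                    p[i], p[j] = p[j], p[i]
                    c = cost(p)
                    if c < current:  # keep the first improving swap
                        current, improved = c, True
                        break
                    p[i], p[j] = p[j], p[i]  # undo the swap
                if improved:
                    break
        if current < best_cost:      # p is a local minimum here
            best, best_cost = list(p), current
    return best, best_cost
```

Each pass over the swaps calls `cost` up to n(n-1)/2 times, so in practice the number of ticket-office queries per evaluation dominates the running time.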

Graph Algorithm load distribution

I came across the following problem of distributing load over a number of machines in a network. The problem is an interview question. The candidate should specify which algorithm and which data structures are the best for this problem.
We have N machines in a network. Each machine can accept up to 5 units of load. The requested algorithm receives as input a list of machines with their current load (ranging from 0 to 5), the distance matrix between the machines, and the new load M that we want to distribute on the network.
The algorithm returns the list of machines that can service the M units of load and have the minimum collective distance. The collective distance is the sum of the distances between the machines in the resulting list.
For example if the resulting list contains three machines A, B and C these machines can service collectively the M units of load (if M=5, A can service 3, B can service 1, C can service 1) and the sum of distances SUM = AB + BC is the smallest path that can collectively service the M units of load.
Do you have any proposals on how to approach it?
The simplest approach I can think of is defining a value for every machine, something like the sum of inverted distances between this machine and all its adjacent machines:
v_i = sum(1/dist(i, j) for j in A_i)
(Sorry I couldn't manage to put math formula here)
You can invert the summation again, and call it machine's crowd value (or something like that), but you don't need to.
Then sort machines based on this value (descending if you have inverted the summation value).
Start with the machine with the minimum value (maximum crowd) and add as much load as you can. Then go to the next machine and do the same until you have assigned all of the load you want.
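A small sketch of this greedy heuristic, under a few assumptions not stated in the answer: every other machine counts as adjacent (since the input is a full distance matrix), the crowd value is the un-inverted sum v_i, and ties are broken by machine index. It is a heuristic and gives no guarantee of a minimum collective distance.

```python
def assign_load(loads, dist, m, capacity=5):
    """Greedily place `m` units of load on the most 'crowded' machines.

    `loads[i]` is the current load of machine i, `dist[i][j]` the distance
    between machines i and j. Returns {machine: units placed}, or None if
    the network does not have enough free capacity.
    """
    n = len(loads)
    # v_i = sum of inverse distances to the other machines (larger = more crowded)
    crowd = [sum(1.0 / dist[i][j] for j in range(n) if j != i and dist[i][j] > 0)
             for i in range(n)]
    order = sorted(range(n), key=lambda i: crowd[i], reverse=True)

    assignment, remaining = {}, m
    for i in order:
        if remaining == 0:
            break
        take = min(capacity - loads[i], remaining)
        if take > 0:
            assignment[i] = take
            remaining -= take
    return assignment if remaining == 0 else None
```

For instance, with loads [2, 5, 0, 4] and M = 4, the most crowded machine may already be full, in which case the load spills over to the next machines in the ordering.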
It sounds like every machine is able to process the same amount of load -- namely 5 units. And the cost measure you state depends only on the set of machines that have nonzero load (i.e. adding more load to a machine that already has nonzero load will not increase the cost). Therefore the problem can be decomposed:
(1) Find the smallest number k <= n of machines that can perform all jobs. For this step we can ignore the individual identities of the machines and how they are connected.
(2) Once you know the minimum number k of machines necessary, decide which k of the n machines offers the lowest cost.
(1) is a straightforward bin packing problem. Although this problem is NP-hard, excellent heuristics exist and nearly all instances can be quickly solved to optimality in practice.
There may be linear algebra methods for solving (2) more quickly (if anyone knows of one, feel free to edit or suggest in the comments), but without thinking too hard about it you can always just use branch and bound. This potentially takes time exponential in n, but should be OK if n is low enough, or if you can get a decent heuristic solution that bounds out most of the search space.
(I did try thinking of a DP in which we calculate f(i, j), the lowest cost way of choosing i machines from among machines 1, ..., j, but this runs into the problem that when we try adding the jth machine to f(i - 1, j - 1), the total cost of the edges from the new machine to all existing machines depends on exactly which machines are in the solution for f(i - 1, j - 1), and not just on the cost of this solution, thus violating optimal substructure.)

Is there a well understood algorithm or solution model for this meeting scheduling scenario?

I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai,Aj) which indicates that attendant Ai wishes to meet with attendant Aj, and Aj accepted that invitation.
Output:
For each attendant A, a schedule of all the events he will attend. The main criterion is that each attendant should meet as many of the attendants who accepted his invites as possible, satisfying the space constraints.
So far, we thought of solving with backtracking (trying out all possible solutions), and using linear programming (i.e. defining a model and solving with the simplex algorithm)
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is as hard as the minimum maximal matching problem in interval graphs. W.l.o.g., assume the capacity of each room is 2, which means a room can handle only one meeting at a time. You can model your problem with interval graphs: each interval (for each person) is one node. There is an edge between A_i and A_j if they have common time and also want to see each other; set the weight of the edge to the amount of time they should see each other. If you find the minimum maximal matching in this graph, you have the solution for your restricted case. But notice that this graph is n-partite, and each part is an interval graph.
P.S.: note that if the amount of time that people should be with each other is fixed, this will be easier than the weighted case.
If you have access to a good MIP solver (CPLEX/Gurobi via the academic initiative, but COIN-OR and lp_solve are open-source, and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch and cut and price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. An advantage is that they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event m in {1,2,...,N}.
Define z(i,j,m) (binary) to indicate they are together in event m.
Define z(i,m) (binary) to indicate that i is attending m.
z(i,j) and z(i,j,m) only exist if i and j are supposed to meet.
For each t, M^t is a subset of time events that are held simultaneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2} and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Maximize sum_(i,j) z(i,j)
subject to:
z(i,j) <= sum_m z(i,j,m) (for every i,j)
(i and j can meet if they are in the same event m at least once)
z(i,j,m)<= z(i,m) (for every i,j,m)
(if i and j attend m, then i attends m)
z(i,j,m)<= z(j,m) (for every i,j,m)
(if i and j attend m, then j attends m)
sum_i z(i,m) <= C(m) (for every m)
(only C(m) persons can visit event m)
sum_(m in M^t) z(i,m) <= 1 (for every t and i)
(if m and m' are both overlapping time t, then no person can visit them both. )
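The sets M^t used in the last constraint can be derived from the calendar with a simple sweep over start and end times. The sketch below is one possible way to do it, assuming half-open intervals (an event ending at 11 does not clash with one starting at 11, as in the example above); the function name and the dictionary input are illustrative only.

```python
def overlap_groups(events):
    """Return the maximal sets of mutually overlapping events.

    `events` maps an event id to (start, end), treated as [start, end).
    Each returned set corresponds to one M^t in the formulation.
    """
    points = []
    for name, (start, end) in events.items():
        points.append((end, 0, name))    # at equal times, ends sort before starts
        points.append((start, 1, name))
    points.sort()

    active, groups, grew = set(), [], False
    for _, kind, name in points:
        if kind == 1:
            active.add(name)
            grew = True
        else:
            if grew:                     # active set is maximal right before it shrinks
                groups.append(set(active))
                grew = False
            active.discard(name)
    return groups
```

An event that overlaps nothing ends up in a group of its own, which only yields the trivial constraint z(i,m) <= 1 and can be dropped before building the model.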
As pointed out by @SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of attendants grows a bit (maybe in the order of tens of attendants).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer: why do you need to solve this problem? What would be the typical number of attendees? The number of rooms?

An algorithm for solving Google Code Jam tutorial problem C

I'd like to understand the algorithm that solves Google Code Jam, Tutorial, Problem C. So far I wrote my own basic implementation that solves the small problem. I find that it's unable to deal with the large problem (complexity O(min(n, 2*k)!), which is 30! in the larger data set).
I found this solution page, but the solutions are of course not documented (there's a time limit to the contest). I saw that at least one of the solutions used the Union Find data structure, but I don't understand how it's applied here.
Does anyone know of a page with the algorithms that solve these problems, not just code?
Not sure if there's a better way to deal with this near duplicate of GCJ - Hamiltonian Cycles, but here's my answer from there:
The O(2^k)-based solution uses the inclusion-exclusion principle. Given that there are k forbidden edges, there are 2^k subsets of those edges, including the set itself and the empty set. For instance, if there were 3 forbidden edges: {A, B, C}, there would be 2^3 = 8 subsets: {}, {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C}.
For each subset, you calculate the number of cycles that include at least all the edges in that subset. If the number of cycles containing the edges in s is f(s) and S is the set of all forbidden edges, then by the inclusion-exclusion principle, the number of cycles without any forbidden edges is:
sum, for each subset s of S: f(s) * (-1)^|s|
where |s| is the number of elements in s. Put another way, the sum of the number of cycles with any edges minus the number of cycles with at least 1 forbidden edge plus the number with at least 2 forbidden edges, ...
Calculating f(s) is not trivial -- at least I didn't find an easy way to do it. You might stop and ponder it before reading on.
To calculate f(s), start with the number of permutations of the nodes that are not touched by any edge in s. If there are m such nodes, there are m! permutations, as you know. Call the number of permutations c.
Now examine the edges in s for chains. If there are any impossible combinations, such as a node involved with 3 edges or a subcycle within s, then f(s) is 0.
Otherwise, for each chain, increment m by 1 and multiply c by 2*m. (There are m places to put the chain in the existing permutations, and the factor of 2 is because the chain can be forwards or backwards.) Finally, f(s) is c/(2*m). The last division converts permutations to cycles.
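Here is one way the whole computation could look in Python. It is a sketch, assuming undirected cycles on the complete graph with nodes 0..n-1, n >= 3, and distinct forbidden edges given as pairs; the function names are made up for illustration. A union-find structure is used only to detect when edges of s close a subcycle, and the closed form (items-1)! * 2^(chains-1) is the same value as c/(2*m) in the description above.

```python
from itertools import combinations
from math import factorial

def cycles_containing(n, edges):
    """f(s): Hamiltonian cycles of K_n that contain every edge in `edges`."""
    if not edges:
        return factorial(n - 1) // 2
    parent = list(range(n))
    degree = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    closed = 0
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        if degree[a] > 2 or degree[b] > 2:
            return 0                       # a node with 3 edges is impossible
        ra, rb = find(a), find(b)
        if ra == rb:
            closed += 1                    # this edge closes a cycle within s
        else:
            parent[ra] = rb
    if closed:
        # Only acceptable if s is itself one full Hamiltonian cycle.
        return 1 if closed == 1 and len(edges) == n else 0

    touched = sum(1 for d in degree if d > 0)
    chains = touched - len(edges)          # each chain has one more node than edges
    items = (n - touched) + chains         # free nodes plus contracted chains
    return factorial(items - 1) * 2 ** (chains - 1)

def count_cycles_avoiding(n, forbidden):
    """Inclusion-exclusion over all subsets of the forbidden edges."""
    total = 0
    for size in range(len(forbidden) + 1):
        sign = -1 if size % 2 else 1
        for subset in combinations(forbidden, size):
            total += sign * cycles_containing(n, subset)
    return total
```

As a quick check, K_5 has 12 Hamiltonian cycles; forbidding edges (0,1) and (1,2) leaves 12 - 6 - 6 + 2 = 2 of them.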
The important limit imposed on the input data is that the number of forbidden edges k<=15.
You can proceed by inclusion and exclusion:
calculate the number of all cycles ((n-1)!/2),
for each forbidden edge e, subtract the number of cycles that contain it ((n-2)!, unless n is very small),
for each pair of forbidden edges e,f, add the number of cycles that contain both of them (this will depend on whether e and f touch),
for each triple ..., subtract ...,
etc.
Since there are only 2^k <= 32768 subsets of F (the set of all forbidden edges), you will get a reasonable bound on the running time.
The analysis of the google code jam problem Endless Knight uses a similar idea.
The Hamiltonian cycle problem is a special case of the travelling salesman problem (obtained by setting the distance between two cities to a finite constant if they are adjacent and infinity otherwise).
These are NP-complete problems, which in simple words means that no fast solution to them is known.
To solve Google Code Jam questions you should know algorithm analysis and design, even though some of them can be solved in exponential time; don't worry, Google knows that very well. ;)
The following sources should be enough to get you started:
MIT Lecture Videos: "Introduction to Algorithms"
TopCoder Tutorials
http://dustycodes.wordpress.com
Book: Introduction to Algorithms [Cormen, Leiserson, Rivest & Stein]
The Algorithm Design Manual [Steven S. Skiena]

Resources