calendar scheduler algorithm - algorithm

I'm looking for an algorithm that, given a set of items each containing a start time, end time, type, and id, returns the set of all sets of items that fit together (no overlapping times, and all types are represented in each set).
S = [("8:00AM", "9:00AM", "Breakfast With Mindy", 234),
("11:40AM", "12:40PM", "Go to Gym", 219),
("12:00PM", "1:00PM", "Lunch With Steve", 079),
("12:40PM", "1:20PM", "Lunch With Steve", 189)]
Algorithm(S) => [[("8:00AM", "9:00AM", "Breakfast With Mindy", 234),
("11:40AM", "12:40PM", "Go to Gym", 219),
("12:40PM", "1:20PM", "Lunch With Steve", 189)]]
Thanks!

This can be solved using graph theory. I would create an array that contains the items sorted by start time, and by end time for equal start times (I added some more items to the example):
no.: id: [ start - end ] type
---------------------------------------------------------
0: 234: [08:00AM - 09:00AM] Breakfast With Mindy
1: 400: [09:00AM - 07:00PM] Check out stackoverflow.com
2: 219: [11:40AM - 12:40PM] Go to Gym
3: 79: [12:00PM - 01:00PM] Lunch With Steve
4: 189: [12:40PM - 01:20PM] Lunch With Steve
5: 270: [01:00PM - 05:00PM] Go to Tennis
6: 300: [06:40PM - 07:20PM] Dinner With Family
7: 250: [07:20PM - 08:00PM] Check out stackoverflow.com
After that I would create a list that holds, for each item, the array no. of the earliest item that could possibly come next, i.e. the first item that starts no earlier than this item ends. If there is no next item, -1 is stored:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 7 | 4 | 5 | 6 | 6 | 7 | -1
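As a rough sketch, the next-item list can be computed with a binary search over the sorted start times. The function name `next_compatible` and the representation of times as minutes since midnight are assumptions made for illustration; they are not part of the answer itself.

```python
from bisect import bisect_left

def next_compatible(items):
    """items: (start, end) pairs in minutes, sorted by start, then end."""
    starts = [start for start, _ in items]
    result = []
    for i, (_, end) in enumerate(items):
        # First later item that starts no earlier than item i ends.
        j = bisect_left(starts, end, lo=i + 1)
        result.append(j if j < len(items) else -1)
    return result

# The eight example items, converted to minutes since midnight.
example = [(480, 540), (540, 1140), (700, 760), (720, 780),
           (760, 800), (780, 1020), (1120, 1160), (1160, 1200)]
print(next_compatible(example))  # [1, 7, 4, 5, 6, 6, 7, -1]
```

Sorting dominates the cost, so this step stays within O(n log n).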
With that list it is possible to generate a directed acyclic graph. Every vertex gets an edge to each vertex from its next item onward, except that no edge is made to a vertex that can only be reached after another vertex fits in between.
I'll try to explain with the example. For vertex 0 the next item is 1, so an edge 0 -> 1 is made. The next item of 1 is 7, which means the range of vertices that can be connected from vertex 0 is now 1 to (7-1). Because vertex 2 is in the range 1 to 6, another edge 0 -> 2 is made and the range shrinks to 1 to (4-1) (because 4 is the next item of 2). Because vertex 3 is in the range 1 to 3, one more edge 0 -> 3 is made. That was the last edge for vertex 0. Repeating this for every vertex yields the full graph.
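The edge rule described here could be sketched as follows. This is one reading of the description, not the answerer's code: `build_edges` is a hypothetical name, and the input is the next-item list from the table (hard-coded here for the example).

```python
def build_edges(next_item):
    """next_item[v] is the index of the earliest item that can follow v, or -1."""
    n = len(next_item)
    edges = {v: [] for v in range(n)}
    for v in range(n):
        if next_item[v] == -1:
            continue
        limit = n            # exclusive upper bound of the candidate range
        j = next_item[v]
        while j < limit:
            edges[v].append(j)
            if next_item[j] != -1:
                # Everything from next_item[j] onward fits after j, so it is
                # reached through j rather than directly from v.
                limit = min(limit, next_item[j])
            j += 1
    return edges

next_item = [1, 7, 4, 5, 6, 6, 7, -1]   # the list from the example
edges = build_edges(next_item)
print(edges[0])  # [1, 2, 3]
```

For vertex 0 this reproduces the three edges 0 -> 1, 0 -> 2 and 0 -> 3 from the walkthrough.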
Up to this point we are in O(n^2). After that, all paths can be found using a depth-first-search-like algorithm, and the duplicated types are then eliminated from each path.
For that example there are 4 solutions, but none of them has all types because it is not possible for the example to do Go to Gym, Lunch With Steve and Go to Tennis.
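A minimal sketch of the path search, assuming the graph is given as an adjacency dict. The adjacency below is what the edge rule gives for the eight-item example when worked out by hand, so treat it as an assumption rather than as the answerer's original figure; `all_paths` is a hypothetical name.

```python
def all_paths(edges):
    """Every path from a vertex without incoming edges to one without outgoing edges."""
    targets = {w for ws in edges.values() for w in ws}
    sources = [v for v in edges if v not in targets]
    paths = []

    def walk(v, path):
        if not edges[v]:            # end vertex reached
            paths.append(path)
            return
        for w in edges[v]:
            walk(w, path + [w])

    for s in sources:
        walk(s, [s])
    return paths

edges = {0: [1, 2, 3], 1: [7], 2: [4, 5], 3: [5],
         4: [6], 5: [6], 6: [7], 7: []}
for path in all_paths(edges):
    print(path)
```

With this adjacency the search yields four paths, which agrees with the count of 4 solutions stated in the answer.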
Also, this search for all paths has a worst-case complexity of O(2^n). For example, the following graph has 2^(n/2) possible paths from a start vertex to an end vertex.
(source: archive.org)
Some further optimisation is possible, like merging some vertices before searching for all paths, but that is not always possible. In the first example, vertices 3 and 4 can't be merged even though they are of the same type. But in the last example, vertices 4 and 5 can be merged if they are of the same type, which means it doesn't matter which activity you choose; both are valid. This can speed up the calculation of all paths dramatically.
Maybe there is also a clever way to consider duplicate types earlier in order to eliminate them, but the worst case is still O(2^n) if you want all possible paths.
EDIT1:
It is possible to determine whether there are sets that contain all types, and to get at least one such solution, in polynomial time. I found an algorithm with a worst-case time of O(n^4) and O(n^2) space. I'll take a new example which has a solution with all types, but is more complex.
no.: id: [ start - end ] type
---------------------------------------------------------
0: 234: [08:00AM - 09:00AM] A
1: 400: [10:00AM - 11:00AM] B
2: 219: [10:20AM - 11:20AM] C
3: 79: [10:40AM - 11:40AM] D
4: 189: [11:30AM - 12:30PM] D
5: 270: [12:00PM - 06:00PM] B
6: 300: [02:00PM - 03:00PM] E
7: 250: [02:20PM - 03:20PM] B
8: 325: [02:40PM - 03:40PM] F
9: 150: [03:30PM - 04:30PM] F
10: 175: [05:40PM - 06:40PM] E
11: 275: [07:00PM - 08:00PM] G
1.) Count the different types in the item set. This is possible in O(n log n). It is 7 for that example.
2.) Create an n*n matrix that represents which nodes can reach the current node and which can be reached from the current node. For example, if position (2,4) is set to 1, there is a path from node 2 to node 4 in the graph, and (4,2) is set to 1 too, because node 4 can be reached from node 2. This is possible in O(n^2). For the example the matrix would look like this:
111111111111
110011111111
101011111111
100101111111
111010111111
111101000001
111110100111
111110010111
111110001011
111110110111
111110111111
111111111111
3.) Now every row tells us which nodes can be reached. In each row we can also mark every node that is not yet marked if it is of the same type as a node that can be reached. We change those matrix positions from 0 to 2. This is possible in O(n^3). In the example there is no path from node 1 to node 3, but node 4 has the same type D as node 3 and there is a path from node 1 to node 4. So we get this matrix:
111111111111
110211111111
121211111111
120121111111
111212111111
111121020001
111112122111
111112212111
111112221211
111112112111
111112111111
111111111111
4.) The nodes that still contain 0's (in the corresponding rows) can't be part of the solution, and we can remove them from the graph. If there was at least one node to remove, we start again at step 2.) with the smaller graph. Because we remove at least one node each time, we have to go back to step 2.) at most n times, but most often this will only happen a few times. If there are no 0's left in the matrix we can continue with step 5.). This is possible in O(n^2). For the example it is not possible to build a path with node 1 that also contains a node of type C. Therefore its row contains a 0 and it is removed, as are node 3 and node 5. In the next loop with the smaller graph, node 6 and node 8 will be removed.
5.) Count the different types in the remaining set of items/nodes. If it is smaller than the first count, there is no solution that can represent all types, so we have to find another way to get a good solution. If it is the same as the first count, we now have a smaller graph which still holds all the possible solutions. O(n log n)
6.) To get one solution we pick a start node (it doesn't matter which, because all nodes that are left in the graph are part of a solution). O(1)
7.) We remove every node that can't be reached from the chosen node. O(n)
8.) We create a matrix as in steps 2.) and 3.) for that graph and remove the nodes that cannot reach nodes of every type, as in step 4.). O(n^3)
9.) We choose one of the next nodes of the node we chose before and continue with 7.) until we are at an end node and the graph has only one path left.
That way it is also possible to get all paths, but there can still be exponentially many. Overall it should be faster than finding solutions in the original graph.
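Steps 2.) to 4.) can be sketched roughly as below. Two simplifications are assumptions of this sketch, not part of the answer: times are minutes since midnight, and "there is a path between two nodes" is taken to mean that the two items do not overlap. The names `compatible` and `prune` are hypothetical. Instead of storing the 0/1/2 matrix, each row is reduced to the set of types it can be combined with.

```python
def compatible(a, b):
    """True if items a and b (start, end, type) do not overlap in time."""
    return a[1] <= b[0] or b[1] <= a[0]

def prune(items):
    """Repeat steps 2-4 until no row contains a 0; return surviving indices."""
    alive = list(range(len(items)))
    while True:
        types_present = {items[i][2] for i in alive}
        doomed = []
        for i in alive:
            # Types of the node itself and of every node it can be combined with.
            reachable_types = {items[j][2] for j in alive
                               if j == i or compatible(items[i], items[j])}
            if reachable_types != types_present:   # the row would contain a 0
                doomed.append(i)
        if not doomed:
            return alive
        alive = [i for i in alive if i not in doomed]

example = [(480, 540, "A"), (600, 660, "B"), (620, 680, "C"), (640, 700, "D"),
           (690, 750, "D"), (720, 1080, "B"), (840, 900, "E"), (860, 920, "B"),
           (880, 940, "F"), (930, 990, "F"), (1060, 1120, "E"), (1140, 1200, "G")]
print(prune(example))  # [0, 2, 4, 7, 9, 10, 11]
```

On the example, the first pass drops nodes 1, 3 and 5 and the second pass drops 6 and 8, as in step 4.). The seven survivors still cover all 7 types, so step 5.) succeeds.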

Hmmm, this reminds me of a task at university. I'll describe what I can remember.
The run time is O(n log n), which is pretty good.
This is a greedy approach.
I will refine your request a bit; tell me if I'm wrong.
The algorithm should return the maximum subset of non-colliding tasks (in terms of total length, or number of activities? I guessed total length, but note that the earliest-finish-time greedy below maximizes the number of activities).
I would first order the list by finishing time (first = minimum finishing time, last = maximum): O(n log n).
Find_set(A):
    G <- Empty set
    S <- A                          \\ sorted by finish time
    f <- 0
    while S != 'Empty set' do
        i <- index of activity with earliest finish time (**O(1)**)
        if S(i).start_time >= f     \\ does not collide with the last chosen activity
            G.insert(S(i))          \\ add this to the result set
            f = S(i).finish_time
        S.removeAt(i)               \\ remove the activity from the original set
    od
    return G
Run time analysis:
initial ordering: O(n log n)
each iteration is O(1), times n iterations = O(n)
Total: O(n log n) + O(n) ~ O(n log n) (well, given the weakness of O notation at representing real complexity on small inputs; but as the scale grows, this is a good algorithm)
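For comparison, the same greedy idea as a short runnable sketch. It assumes activities are `(start, finish, label)` tuples with times in minutes since midnight, and that an activity may start exactly when the previous one finishes; `select_activities` is a hypothetical name.

```python
def select_activities(activities):
    """Earliest-finish-time greedy: picks a largest set of non-overlapping activities."""
    chosen = []
    last_finish = float("-inf")
    for start, finish, label in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:      # no collision with the last chosen activity
            chosen.append((start, finish, label))
            last_finish = finish
    return chosen

# The four items from the question.
items = [(480, 540, "Breakfast With Mindy"), (700, 760, "Go to Gym"),
         (720, 780, "Lunch With Steve"), (760, 800, "Lunch With Steve")]
for activity in select_activities(items):
    print(activity)
```

This returns a single schedule with the maximum number of activities. It does not enumerate all schedules and it ignores types, so it answers only part of the question.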
Enjoy.
Update:
OK, it seems I misread the post. You can alternatively use dynamic programming to reduce the running time; there is a solution in link text, pages 7-19.
You need to tweak the algorithm a bit: first you should build the table, then you can get all variations from it fairly easily.
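The linked solution is not available here, so the following is only a guess at the kind of table meant: the textbook dynamic program for weighted interval scheduling, with each interval weighted by its duration. The name `max_total_length` and the minute-based times are assumptions.

```python
from bisect import bisect_right

def max_total_length(intervals):
    """Largest total duration of a set of pairwise non-overlapping (start, end) intervals."""
    intervals = sorted(intervals, key=lambda iv: iv[1])
    ends = [end for _, end in intervals]
    # best[k] = optimum using only the first k intervals (in end-time order).
    best = [0] * (len(intervals) + 1)
    for k, (start, end) in enumerate(intervals, 1):
        # p = how many of the earlier intervals end no later than this one starts.
        p = bisect_right(ends, start, 0, k - 1)
        best[k] = max(best[k - 1], best[p] + (end - start))
    return best[-1]

print(max_total_length([(480, 540), (700, 760), (720, 780), (760, 800)]))  # 160
```

The `best` table is what one would walk backwards to recover the chosen intervals, or all optimal variations.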

I would use an Interval Tree for this.
After you build the data structure, you can iterate each event and perform an intersection query. If no intersections are found, it is added to your schedule.

Yes, exhaustive search might be an option:
initialise the partial schedules with the earliest tasks that overlap (e.g. 9-9.30 and 9.15-9.45)
for each partial schedule generated so far, generate a list of new partial schedules by appending the earliest task that doesn't overlap (generate more than one in case of ties)
recur with the new partial schedules
In your case initialisation would produce only (8-9 breakfast)
After the first iteration: (8-9 brekkie, 11.40-12.40 gym) (no ties)
After the second iteration: (8-9 brekkie, 11.40-12.40 gym, 12.40-1.20 lunch) (no ties again)
This is a tree search, but it's greedy. It leaves out possibilities like skipping the gym and going to an early lunch.

Since you're looking for every possible schedule, I think the best solution you will find will be a simple exhaustive search.
The only thing I can say algorithmically is that your data structure of lists of strings is pretty terrible.
The implementation is hugely language dependent so I don't even think pseudo-code would make sense, but I'll try to give the steps for the basic algorithm.
Pop off the first n items of the same type and put them in list.
For each item in list, add that item to schedule set.
Pop off next n items of same type off list.
For each item that starts after the first item ends, put on list. (If none, fail)
Continue until done.
Hardest part is deciding exactly how to construct the lists/recursion so it's most elegant.
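One way to arrange the recursion is to pick one item per type in turn and backtrack on overlap. This sketch assumes that a valid schedule contains exactly one item of each type and that items are `(start, end, type, id)` tuples with times in minutes since midnight; `all_schedules` and `overlaps` are hypothetical names.

```python
def overlaps(a, b):
    """True if the two items share any time (touching endpoints are allowed)."""
    return a[0] < b[1] and b[0] < a[1]

def all_schedules(items):
    """All ways to pick one non-overlapping item per type."""
    by_type = {}
    for item in items:
        by_type.setdefault(item[2], []).append(item)
    schedules = []

    def extend(types, chosen):
        if not types:
            schedules.append(sorted(chosen))
            return
        for candidate in by_type[types[0]]:
            if not any(overlaps(candidate, c) for c in chosen):
                extend(types[1:], chosen + [candidate])

    extend(list(by_type), [])
    return schedules

S = [(480, 540, "Breakfast With Mindy", 234), (700, 760, "Go to Gym", 219),
     (720, 780, "Lunch With Steve", 79), (760, 800, "Lunch With Steve", 189)]
for schedule in all_schedules(S):
    print([item[3] for item in schedule])  # [234, 219, 189]
```

The worst case is still exponential in the number of types, which is unavoidable when every valid schedule has to be listed.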

Related

With multiple cycles between 2 nodes

Trying to find all cycles in a directed graph, via DFS. But got an issue.
Issue
When there are multiple cycles between 2 nodes, sometimes only the longest one can be detected, the shorter one is skipped.
This is due to when a node is visited, I will skip it, thus that shorter cycle is skipped.
But, if I don't skip visited node, the DFS search will repeat forever.
Example
Graph:
1 -> [2, 4]
2 -> [3]
3 -> [4]
4 -> [1]
There are 2 cycles between 1 and 4:
(A) 1 -> 2 -> 3 -> 4 -> 1
(B) 1 -> 4 -> 1
Cycle B can't be detected, if A is detected first, because 4 will be skipped due to visited, and it never goes back to 1.
Current ideas
One possible solution is to start from every node, even if it's already visited. But I want a better solution.
Calculate and remember a hash of the path, and skip only when the same hash exists? That would take some memory, right? And there is still the possibility that 2 different paths with the same hash lead to the same node, so it can't solve the problem entirely.
Any idea?
Credits: https://www.hackerearth.com/practice/notes/finding-all-elementry-cycles-in-a-directed-graph/
integer list array A(n); logical array marked(n); integer stack current_stack; integer stack marked_stack;
/* A(n) is the array of lists which is the adjacency list representation */

procedure initialize(); /* initialize the values */
begin
    for i in n do
        marked(i) := false
end initialize()

procedure print_cycles();
begin
    for i in current_stack do
        print i;
end print_cycles()

logical procedure backtrack(k);
begin
    logical flag := false;
    current_stack->push(k);
    marked_stack->push(k);
    marked(k) := true;
    for i in A(k) do
        if i < s then /* to find only distinct cycles: skip vertices ordered before s */
            delete i from A(k);
        else if i == s then /* cycle found */
            print_cycles();
            flag := true;
        else if marked(i) == false then
            if backtrack(i) then /* continue dfs */
                flag := true;
    if flag == true then
        /* unmark the elements marked since k was pushed: they can lie on a cycle through s */
        while marked_stack->top() != k do
            marked(marked_stack->pop()) := false;
        marked_stack->pop();
        marked(k) := false;
    current_stack->pop();
    backtrack := flag
end backtrack(k)

procedure backtrack_Util();
begin
    initialize();
    for s in n do
        backtrack(s);
        while not marked_stack->empty do
            marked(marked_stack->pop()) := false
end backtrack_Util()
We want to find distinct cycles. Hence, we need to visit vertices in some order. In that order, we find the cycles starting from each vertex, and a cycle must not contain any vertex that comes before its starting vertex in the ordering. How do we obtain that ordering? The words "topological manner" in one of the comments above could be confused with topological sort, which is not what is meant. I think we can pick an ordering as simple as the vertex number, like 0,1,2,..,v. Let us say we wish to find a cycle starting from 2. To avoid finding duplicate cycles, we will not use vertices 0 and 1. If there is any cycle that contains an edge from 2 to 1 or from 2 to 0, it would already have been considered when finding the cycles starting from 0 or 1.
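A minimal sketch of this ordering idea, using a plain DFS that keeps a node marked only while it is on the current path. It is deliberately simpler than the marked-stack pseudocode and is not Johnson's algorithm, so it can take exponential time on dense graphs. The name `simple_cycles` is hypothetical, and the graph is assumed to be a dict of adjacency lists with comparable vertex labels.

```python
def simple_cycles(graph):
    """List every elementary cycle once, starting at its smallest vertex."""
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path[:])          # closed a cycle back to start
            elif nxt > start and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                dfs(start, nxt, path, on_path)
                on_path.remove(nxt)             # unmark when backtracking
                path.pop()

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles

graph = {1: [2, 4], 2: [3], 3: [4], 4: [1]}
print(simple_cycles(graph))  # [[1, 2, 3, 4], [1, 4]]
```

Because node 4 is unmarked again once the search backs out of 1 -> 2 -> 3 -> 4, the shorter cycle 1 -> 4 -> 1 from the question is found as well.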
Let me introduce a concrete reference that will help you with the task. It is Johnson's algorithm. Apparently, it is the fastest way to accomplish the task.
On page 3, it mentions:
To avoid duplicating circuits, a vertex v is blocked when it is added
to some elementary path beginning in s. It stays blocked as long as
every path from v to s intersects the current elementary path at a
vertex other than s. Furthermore, a vertex does not become a root
vertex for constructing elementary paths unless it is the least vertex
in at least one elementary circuit.
You can also watch this youtube video for more understanding.
I think this information is acceptable.

Algorithmic Strategy for selection of minimum number of baskets

Example:
You have 4 baskets named P,Q,R,S.
You have 4 items in those baskets named A,B,C,D.
The composition of baskets are as follows
    A  B  C  D
P   6  4  0  7
Q   6  4  1  1
R   4  6  3  6
S   4  6  2  3
Basket P has 6A, 4B, No C's and 7D.
Suppose you get following requests:
You have to give out 10A, 10B, 3C and 8D.
The minimum number of basket required to process the request is 2 (P,R).
How can I reach this algorithmically. What algo should I use, what should be the strategy?
Make directed graph (network) like this:
Source has edges with cost=1 and capacity=bigvalue to P,Q,R,S nodes
P has edges with cost=0 and capacity 6,4,7 to A,B,D, same for other baskets.
A,B,C,D have edges with cost=0 and capacity=10,10,3,8 to sink
Now solve Minimum-cost flow problem for 10+10+3+8 flow.
There is a well-known problem about placing queens on a chess board so that they do not threaten each other (the N-queens problem). Your problem looks like that one to me. You can create a recursive, backtracking structure like the one below:
Find the first rows that meet the requirement: in your example P and Q (because 6+6 >= 10).
So you have handled the first column; then go to the second one and check whether the capacity of baskets P and Q can meet the requirement. They don't in your case (because 4+4 < 10).
Here, go back to the first step (call the same recursive function for the first column, advancing the pointer to the next basket) and find the next rows that meet the requirement: P and R in your example (6+4 = 10). Then do the second step for P and R.
So the idea is: for every column, find the baskets that meet the requirement and then go to the next column. If you can find rows that meet the requirement, go on to the 3rd column. If you cannot find any in the 3rd step, go back to the 2nd step, and if no combination of the rows you chose in the 2nd step meets the requirement either, go back to the first step and iterate again.
I could not give you proper pseudocode, but I think the main idea is clear and not that hard to implement.
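For a handful of baskets, the search can be sketched even more bluntly by trying basket subsets in order of increasing size. This is a brute-force stand-in for the column-by-column backtracking described here, not the same procedure, and it is exponential in the number of baskets. The name `min_baskets` is hypothetical.

```python
from itertools import combinations

def min_baskets(baskets, demand):
    """Smallest combination of baskets whose summed contents cover the demand."""
    names = sorted(baskets)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            totals = [sum(baskets[name][k] for name in combo)
                      for k in range(len(demand))]
            if all(have >= need for have, need in zip(totals, demand)):
                return combo
    return None   # even all baskets together are not enough

baskets = {"P": [6, 4, 0, 7], "Q": [6, 4, 1, 1],
           "R": [4, 6, 3, 6], "S": [4, 6, 2, 3]}
print(min_baskets(baskets, [10, 10, 3, 8]))  # ('P', 'R')
```

Because sizes are tried smallest first, the first combination that covers the request is guaranteed to use the minimum number of baskets.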

Understanding Nauty algorithm

I am trying to understand the Nauty algorithm.
Following this article: http://www.math.unl.edu/~aradcliffe1/Papers/Canonical.pdf
In this algorithm the vertices are distinguished based on their degree and the relative degree of a group corresponding to other groups(group action). In this way we get the groups as:
1379|2468|5
After this step, splitting is done as mentioned in this paper - page 7.
One image from this article is:
I am unable to understand how the splitting is done from
1379|2468|5 to 1|9|37|68|24|5
Why did 1 and 9 go to different groups, and 37 to another group?
Briefly, you are individualizing vertices and then 'shattering' the cells of the resulting partition until the partition becomes equitable.
As it says in section 5:
Having reached an
equitable partition, we need to introduce artificial distinctions between vertices
This is described in definition 9. So we have chosen {1} from the cell {1379} and then refined the resulting partition until it is equitable (see definition 6 and the example below it).
Thus cell 1 - {1} - shatters cell 3 - {2468} - into two cells {68|24} due to 1 having 0 neighbours in {68} and one in {24}. Similarly, {379} is split by {24} into {9|37}.
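The shattering can be replayed with a naive refinement loop. Important assumption: the graph is not given in this thread, so the sketch uses a 3x3 grid graph (vertices 1 to 9, row by row), which is consistent with the degree classes 1379|2468|5 but is a guess. The function `refine` is a hypothetical, simplified refinement and not nauty's actual procedure; in particular the order of the resulting cells may differ from the paper's.

```python
def refine(adj, partition):
    """Split cells by neighbour counts per cell until the partition is equitable."""
    cells = [list(cell) for cell in partition]
    while True:
        cell_of = {v: i for i, cell in enumerate(cells) for v in cell}
        new_cells = []
        for cell in cells:
            groups = {}
            for v in cell:
                signature = [0] * len(cells)
                for w in adj[v]:
                    signature[cell_of[w]] += 1   # neighbours of v in each cell
                groups.setdefault(tuple(signature), []).append(v)
            new_cells.extend(groups[key] for key in sorted(groups))
        if len(new_cells) == len(cells):         # nothing was split: equitable
            return new_cells
        cells = new_cells

# Assumed graph: 3x3 grid, numbered 1..9 row by row.
grid = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6], 4: [1, 5, 7], 5: [2, 4, 6, 8],
        6: [3, 5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}
# Partition after individualizing vertex 1 out of the cell {1379}.
print(refine(grid, [[1], [3, 7, 9], [2, 4, 6, 8], [5]]))
```

Under that assumption the loop first splits {2468} into {68} and {24}, then splits {379} into {37} and {9}, giving the same cells as 1|9|37|68|24|5.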

fully connection algorithm

I have encountered an algorithm question:
Fully Connection
Given n cities which spreads along a line, let Xi be the position of city i and Pi be its population.
Now we begin to lay cables between every two of the cities based on their distance and population. Given two cities i and j, the cost to lay cable between them is |Xi-Xj|*max(Pi,Pj). How much does it cost to lay all the cables?
For example, given:
i Xi Pi
- -- --
1 1 4
2 2 5
3 3 6
Then the total cost can be calculated as:
i j |Xi-Xj| max(Pi, Pj) Segment Cost
- - ------ ----------- ------------
1 2 1 5 5
2 3 1 6 6
1 3 2 6 12
So that the total cost is 5+6+12 = 23.
While this can clearly be done in O(n^2) time, can it be done in asymptotically less time?
I can think of a faster solution. If I am not wrong it runs in O(n log n). First, sort all the cities according to Pi. This is O(n log n). Then process the cities in increasing order of Pi. The reason is that you then always know that max(Pi, Pj) = Pi for every city j processed before city i. For each city we only want to add the segments that connect it to cities processed earlier. Those that connect it to cities later in the order will be counted when those cities are processed.
Now, the thing I was able to think of was to use several index trees in order to reduce the complexity of the algorithm. The first index tree counts nodes and can answer queries of the kind "how many nodes are to the right of xi" in logarithmic time. Let's call this number NR. The second index tree can answer queries of the kind "what is the sum of distances from all the points to the right of a given x". The distances are measured to a fixed point that is guaranteed to be to the right of the rightmost point; let's call its x XR. Let's call this sum SUMD. Then the sum of the distances from our point to all points to its right is NR * dist(Xi, XR) - SUMD, and those points contribute (NR * dist(Xi, XR) - SUMD) * Pi to the result. Do the same for the points to the left and you get the answer. After you process the ith point you add it to the index trees and go on.
Edit: Here is one article about binary indexed trees: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees
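A sketch of this approach with two binary indexed trees. It is a variant rather than a literal transcription: the second tree stores plain coordinate sums instead of distances to a reference point XR, which gives the same left and right distance sums. The names `Fenwick` and `total_cable_cost` are hypothetical.

```python
class Fenwick:
    """Binary indexed tree over positions 0..n-1 supporting prefix sums."""
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of the entries at positions 0..i-1."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def total_cable_cost(cities):
    """cities: list of (x, population). Sum of |xi - xj| * max(pi, pj) over all pairs."""
    xs = sorted({x for x, _ in cities})
    rank = {x: r for r, x in enumerate(xs)}
    count, coord_sum = Fenwick(len(xs)), Fenwick(len(xs))
    seen, seen_sum, total = 0, 0, 0
    for x, p in sorted(cities, key=lambda c: c[1]):   # increasing population
        r = rank[x]
        left_n, left_sum = count.prefix(r), coord_sum.prefix(r)
        right_n = seen - count.prefix(r + 1)
        right_sum = seen_sum - coord_sum.prefix(r + 1)
        # Distances to already processed cities on the left and on the right.
        total += p * ((left_n * x - left_sum) + (right_sum - right_n * x))
        count.add(r, 1)
        coord_sum.add(r, x)
        seen += 1
        seen_sum += x
    return total

print(total_cable_cost([(1, 4), (2, 5), (3, 6)]))  # 23
```

Each city costs a constant number of O(log n) tree operations, so the total is O(n log n) including the two sorts.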
This is the direct connections problem from codesprint 2.
They will be posting worked solutions to all problems within a week on their website.
(They have said "Now that the contest is over, we're totally cool with everyone discussing their solutions to the problems.")

skiplist - I really need an explanation of how it inserts and deletes

I really don't understand the probability thing of this list, in addition to the statement "we have to examine no more than n/2 + 1 nodes (where n is the length of the list). Also giving every fourth node a pointer four ahead (Figure 1c) requires that no more than n/4 + 2 nodes be examined".
I read this statement at the following link: ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf
What you're not understanding is that every node has a link at level 1. That is, at the lowest level, the data structure is essentially a linked list. Searching for a node using this is of course an O(n) operation.
Each node of the skip list has at least one link: the one at level 1. On average, half of the nodes also have a link at level 2. If this was the highest level at which a link existed, then you could find an arbitrary node in O(n/2). Basically, you follow the 2nd level nodes until either you find the item you're looking for, or until you get to a node whose value is larger than the one you're looking for. At that point, you step down to the level 1 nodes and search forward from the previous node (i.e. the one that's less than the one you're looking for).
Again on average, 1/4 of the nodes have a link at level 3. Using these, you can find an arbitrary node in O(n/4). You first search the level 3 nodes until you either find the node or go past it, then drop down to the level 2 nodes from that point, and to the level 1 nodes if you don't find the node at level 2.
If you follow the math, you can see that if your maximum level is m, then as long as you have less than 2^m nodes in the skip list, your amortized average search time will be O(log2(n)), where n is the number of items in the list.
So the structure of a skip list node is like this:
SkiplistNode
{
int level;
SomeType data; // the data held in the node
SkiplistNode* forwards[]; // an array of 'level' forward references
}
If a node has a level value of 1, then there will be just one item in the forwards array. If it's at level 4, then there will be four entries: one for each of levels 4, 3, 2, and 1.
As it turns out, the average size of the forwards array is 2. That follows from the progression 1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ... = 2. That is:
Every node has a link at level 1
1/2 of the nodes have a link at level 2
1/4 of the nodes have a link at level 3
1/8 of the nodes have a link at level 4
etc.
Is that more clear now?
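Since the question asks how insert and delete work, here is a compact sketch. The cap of 16 levels, the coin-flip probability of 1/2, the rejection of duplicate keys and all names are assumptions of this sketch rather than requirements of the data structure. Both operations first collect, for every level, the last node before the key (`update`), then splice the node in or out of each level's linked list.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level     # forward[i] = next node at level i+1

class SkipList:
    MAX_LEVEL = 16

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1                          # every node has a level-1 link
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1                     # half of those also get the next level
        return level

    def _predecessors(self, key):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):   # top level down to level 1
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return False                   # duplicates are not stored
        level = self._random_level()
        self.level = max(self.level, level)
        node = Node(key, level)
        for i in range(level):             # splice into each level's list
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def delete(self, key):
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):   # unlink from each level it is on
            update[i].forward[i] = target.forward[i]
        return True

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

The random level is the only place probability enters: it decides how tall a new node is, and the expected search cost follows from that distribution.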
Skip lists are explained quite well in their Wikipedia article. If you have a specific question about the data structure itself feel free to ask them though.
Lecture from MIT about skip lists: http://video.google.com/videoplay?docid=-6710586843601387849#
A somewhat easier to understand explanation can be found here : http://igoro.com/archive/skip-lists-are-fascinating/

Resources