Algorithm to find longest non-overlapping sequences

I am trying to find the best way to solve the following problem; by best I mean least complex.
As input, take a list of tuples (start, length) such as:
[(0,5),(0,1),(1,9),(5,5),(5,7),(10,1)]
Each element represents a sequence by its start and length; for example, (5,7) is equivalent to the sequence (5,6,7,8,9,10,11), a list of 7 elements starting at 5. One can assume that the tuples are sorted by the start element.
The output should be a non-overlapping combination of tuples that represents the longest continuous sequence(s). That is, a solution is a subset of the ranges with no overlaps and no gaps that is as long as possible; there could be more than one such solution.
For example, for the given input the solution is:
[(0,5),(5,7)] equivalent to (0,1,2,3,4,5,6,7,8,9,10,11)
Is backtracking the best approach to solving this problem?
I'm interested in any different approaches that people could suggest.
Also, if anyone knows of a formal reference for this problem, or for a similar one, I'd like to get pointers to it.
BTW - this is not homework.
Edit
To avoid some misunderstandings, here is another example of the expected behaviour:
For an input like [(0,1),(1,7),(3,20),(8,5)] the right answer is [(3,20)], equivalent to (3,4,5,...,22), with length 20. Some of the answers received would give [(0,1),(1,7),(8,5)], equivalent to (0,1,2,...,11,12), as the right answer. But this last answer is not correct, because it is shorter than [(3,20)].

Iterate over the list of tuples in the given order (sorted by start element), using a hashmap to keep track of the length of the longest continuous sequence ending at a certain index.
Pseudo-code, skipping details such as keys missing from the hashmap (assume 0 is returned if a key is not found):
int bestEnd = 0;
hashmap<int,int> seq // seq[key] = length of the longest sequence ending just before index key, or 0 if not found
foreach (tuple in orderedTuples) {
    int seqLength = seq[tuple.start] + tuple.length;
    int tupleEnd = tuple.start + tuple.length;
    seq[tupleEnd] = max(seq[tupleEnd], seqLength);
    if (seqLength > seq[bestEnd]) bestEnd = tupleEnd;
}
return new tuple(bestEnd - seq[bestEnd], seq[bestEnd])
This is an O(N) algorithm.
If you need the actual tuples making up this sequence, you'd also need to keep a linked list of tuples hashed by end index, updating it whenever the max length is updated for that end point.
UPDATE: My knowledge of Python is rather limited, but based on the Python code you pasted, I created this version that returns the actual sequence instead of just the length:
def get_longest(arr):
    bestEnd = 0
    seqLengths = dict()  # seqLengths[key] = length of the longest sequence ending on key-1, or 0 if not found
    seqTuples = dict()   # seqTuples[key] = the last tuple used in this longest sequence
    for t in arr:
        seqLength = seqLengths.get(t[0], 0) + t[1]
        tupleEnd = t[0] + t[1]
        if seqLength > seqLengths.get(tupleEnd, 0):
            seqLengths[tupleEnd] = seqLength
            seqTuples[tupleEnd] = t
            if seqLength > seqLengths.get(bestEnd, 0):
                bestEnd = tupleEnd
    longestSeq = []
    while bestEnd in seqTuples:
        longestSeq.append(seqTuples[bestEnd])
        bestEnd -= seqTuples[bestEnd][1]
    longestSeq.reverse()
    return longestSeq

if __name__ == "__main__":
    a = [(0,3),(1,4),(1,1),(1,8),(5,2),(5,5),(5,6),(10,2)]
    print(get_longest(a))

Revised algorithm:
create a hashtable of start -> list of tuples that start there
put all tuples in a queue of tupleSets
set the longestTupleSet to the first tuple
while the queue is not empty
    take a tupleSet from the queue
    if any tuples start where the tupleSet ends
        foreach tuple that starts where the tupleSet ends
            enqueue a new tupleSet of tupleSet + tuple
        continue
    if tupleSet is longer than longestTupleSet
        replace longestTupleSet with tupleSet
return longestTupleSet
C# implementation:
public static IList<Pair<int, int>> FindLongestNonOverlappingRangeSet(IList<Pair<int, int>> input)
{
    var rangeStarts = input.ToLookup(x => x.First, x => x);
    var adjacentTuples = new Queue<List<Pair<int, int>>>(
        input.Select(x => new List<Pair<int, int>> { x }));

    var longest = new List<Pair<int, int>> { input[0] };
    int longestLength = input[0].Second; // a single tuple's length is its Second value

    while (adjacentTuples.Count > 0)
    {
        var tupleSet = adjacentTuples.Dequeue();
        var last = tupleSet.Last();
        int end = last.First + last.Second;
        var sameStart = rangeStarts[end];
        if (sameStart.Any())
        {
            foreach (var nextTuple in sameStart)
            {
                adjacentTuples.Enqueue(tupleSet.Concat(new[] { nextTuple }).ToList());
            }
            continue;
        }
        int length = end - tupleSet.First().First;
        if (length > longestLength)
        {
            longestLength = length;
            longest = tupleSet;
        }
    }
    return longest;
}
Tests:
[Test]
public void Given_the_first_problem_sample()
{
    var input = new[]
    {
        new Pair<int, int>(0, 5),
        new Pair<int, int>(0, 1),
        new Pair<int, int>(1, 9),
        new Pair<int, int>(5, 5),
        new Pair<int, int>(5, 7),
        new Pair<int, int>(10, 1)
    };
    var result = FindLongestNonOverlappingRangeSet(input);
    result.Count.ShouldBeEqualTo(2);
    result.First().ShouldBeSameInstanceAs(input[0]);
    result.Last().ShouldBeSameInstanceAs(input[4]);
}

[Test]
public void Given_the_second_problem_sample()
{
    var input = new[]
    {
        new Pair<int, int>(0, 1),
        new Pair<int, int>(1, 7),
        new Pair<int, int>(3, 20),
        new Pair<int, int>(8, 5)
    };
    var result = FindLongestNonOverlappingRangeSet(input);
    result.Count.ShouldBeEqualTo(1);
    result.First().ShouldBeSameInstanceAs(input[2]);
}

This is a special case of the longest path problem for weighted directed acyclic graphs.
The nodes of the graph are the start points, plus the points just after the last element of each sequence, where the next sequence could begin.
The problem is special because the distance between two nodes must be the same regardless of the path taken.
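To make the graph view concrete, here is a small sketch (my own code, not from any of the answers' implementations) that builds the vertices and weighted edges for the question's first example:
tuples = [(0, 5), (0, 1), (1, 9), (5, 5), (5, 7), (10, 1)]
nodes = sorted({s for s, l in tuples} | {s + l for s, l in tuples})
edges = [(s, s + l, l) for s, l in tuples]  # (from, to, weight = length)
print(nodes)  # [0, 1, 5, 10, 11, 12]
print(edges)  # [(0, 5, 5), (0, 1, 1), (1, 10, 9), (5, 10, 5), (5, 12, 7), (10, 11, 1)]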

Just thinking about the algorithm in basic terms, would this work?
(apologies for horrible syntax but I'm trying to stay language-independent here)
First the simplest form: Find the longest contiguous pair.
Cycle through every member and compare it to every other member with a higher startpos. If the startpos of the second member equals the sum of the startpos and length of the first member, they are contiguous; if so, form a new member in a new set, with the lower startpos and the combined length, to represent this.
Then take each of these pairs and compare them to all of the single members with a higher startpos, and repeat, forming a new set of contiguous triples (if any exist).
Continue this pattern until you have no new sets.
The tricky part is that you then have to compare the lengths of every member across all of your sets to find the real longest chain.
I'm pretty sure this is not as efficient as other methods, but I believe it is a viable way to brute-force this solution.
I'd appreciate feedback on this and any errors I may have overlooked.
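For what it's worth, here is a rough Python sketch of the level-by-level idea described above (my own code, with each "member" collapsed to a single combined (start, length) tuple; the worst case is exponential, as noted):
def brute_force(tuples):
    members = list(tuples)          # level 1: the single tuples
    all_members = list(members)
    while members:
        next_level = []
        for (s1, l1) in members:
            for (s2, l2) in tuples:
                if s2 == s1 + l1:   # contiguous: merge into one combined member
                    next_level.append((s1, l1 + l2))
        members = next_level
        all_members.extend(members)
    return max(all_members, key=lambda m: m[1])

print(brute_force([(0, 1), (1, 7), (3, 20), (8, 5)]))  # (3, 20)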

Edited to replace the pseudocode with actual Python code.
Edited AGAIN to change the code; the original algorithm was on the right track, but I misunderstood what the second value in the pairs was! Fortunately the basic algorithm is the same, and I was able to adapt it.
Here's an idea that solves the problem in O(N log N) and doesn't use a hash map (so no hidden costs). For memory we're going to use 2N extra "things".
We're going to add two more values to each tuple: (BackCount, BackLink). In the successful combination, BackLink links right to left from the right-most tuple to the left-most tuple, and BackCount is the accumulated count for the given BackLink.
Here's some Python code:
def FindTuplesStartingWith(tuples, frm):
    # The O(log N) algorithm is left as an exercise for the reader
    ret = []
    for i in range(len(tuples)):
        if tuples[i][0] == frm: ret.append(i)
    return ret

def FindLongestSequence(tuples):
    # Prepare the (BackCount, BackLink) array
    bb = []  # (BackCount, BackLink)
    for OneTuple in tuples: bb.append((-1, -1))
    # Prepare
    LongestSequenceLen = -1
    LongestSequenceTail = -1
    # Algorithm
    for i in range(len(tuples)):
        if bb[i][0] == -1: bb[i] = (0, bb[i][1])
        # Is this single pair the longest possible pair all by itself?
        if (tuples[i][1] + bb[i][0]) > LongestSequenceLen:
            LongestSequenceLen = tuples[i][1] + bb[i][0]
            LongestSequenceTail = i
        # Find the next segment
        for j in FindTuplesStartingWith(tuples, tuples[i][0] + tuples[i][1]):
            # THIS IS THE KEY: any longer chain ending at j would already have been
            # found, because it must start at a smaller From and the array is sorted on From
            if (bb[j][0] == -1) or (bb[j][0] < (bb[i][0] + tuples[i][1])):
                # can be linked
                bb[j] = (bb[i][0] + tuples[i][1], i)
                if (bb[j][0] + tuples[j][1]) > LongestSequenceLen:
                    LongestSequenceLen = bb[j][0] + tuples[j][1]
                    LongestSequenceTail = j
    # Done! Now build up the solution
    ret = []
    while LongestSequenceTail > -1:
        ret.insert(0, tuples[LongestSequenceTail])
        LongestSequenceTail = bb[LongestSequenceTail][1]
    return ret

# Call the algorithm
print(FindLongestSequence([(0,5), (0,1), (1,9), (5,5), (5,7), (10,1)]))
# >>>>>> [(0, 5), (5, 7)]
print(FindLongestSequence([(0,1), (1,7), (3,20), (8,5)]))
# >>>>>> [(3, 20)]
The key to the whole algorithm is the "THIS IS THE KEY" comment in the code. We know our current StartTuple can be linked to EndTuple. If a longer sequence ending at EndTuple.To existed, it would have been found by the time we got to this point, because it would have to start at a smaller StartTuple.From, and the array is sorted on "From"!

I removed the previous solution because it was not tested.
The problem is finding the longest path in a weighted directed acyclic graph, which can be solved in linear time:
http://en.wikipedia.org/wiki/Longest_path_problem#Weighted_directed_acyclic_graphs
Take as vertices the set {start positions} union {start position + length}. For your example it would be {0, 1, 5, 10, 11, 12}.
For vertices v0, v1: if there is a tuple of length w such that v0 + w = v1, add a directed edge from v0 to v1 with weight w.
Now follow the pseudocode on the Wikipedia page. Since the number of vertices is at most 2n (where n is the number of tuples), the problem can still be solved in linear time.
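As a rough Python sketch of that reduction (my own naming, assuming (start, length) tuples): sorting the vertices already yields a topological order, so a single pass computes the longest path.
def longest_path_length(tuples):
    verts = sorted({s for s, l in tuples} | {s + l for s, l in tuples})
    edges = {}
    for s, l in tuples:
        edges.setdefault(s, []).append((s + l, l))  # edge s -> s+l with weight l
    dist = {v: 0 for v in verts}  # longest path ending at each vertex
    for v in verts:               # sorted order is a topological order here
        for w, weight in edges.get(v, []):
            dist[w] = max(dist[w], dist[v] + weight)
    return max(dist.values())

print(longest_path_length([(0, 1), (1, 7), (3, 20), (8, 5)]))  # 20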

This is a simple reduce operation. Given a pair of consecutive tuples, they either can or can't be combined, so define the pairwise combination function:
def combo(first, second):
    if first[0] + first[1] == second[0]:
        return [(first[0], first[1] + second[1])]
    else:
        return [first, second]
This just returns a list of either one element combining the two arguments, or the original two elements.
Then define a function to iterate over the first list and combine pairs:
def collapse(tupleList):
    first = tupleList.pop(0)
    newList = []
    for item in tupleList:
        collapsed = combo(first, item)
        if len(collapsed) == 2:
            newList.append(collapsed[0])
        first = collapsed.pop()
    newList.append(first)
    return newList
This keeps a first element to compare with the current item in the list (starting at the second item), and when it can't combine them it drops the first into a new list and replaces first with the second of the two.
Then just call collapse with the list of tuples:
>>> collapse( [(5, 7), (12, 3), (0, 5), (0, 7), (7, 2), (9, 3)] )
[(5, 10), (0, 5), (0, 12)]
[Edit] Finally, iterate over the result to get the longest sequence.
def longest(seqs):
    collapsed = collapse(seqs)
    return max(collapsed, key=lambda x: x[1])
[/Edit]
Complexity O(N). For bonus marks, do it in reverse so that the initial pop(0) becomes a pop() and you don't have to reindex the array, or move the iterator instead. For top marks make it run as a pairwise reduce operation for multithreaded goodness.
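For illustration, here is a sketch of the suggested reverse variant (my own code, reusing the combo function above): popping from the end is O(1) on a Python list, unlike pop(0), which reindexes.
def collapse_reversed(tupleList):
    first = tupleList.pop()              # O(1), unlike pop(0)
    newList = []
    while tupleList:
        collapsed = combo(tupleList.pop(), first)
        if len(collapsed) == 2:          # could not merge; emit the right-hand piece
            newList.append(collapsed[1])
        first = collapsed[0]
    newList.append(first)
    newList.reverse()                    # restore the original left-to-right order
    return newList

print(collapse_reversed([(5, 7), (12, 3), (0, 5), (0, 7), (7, 2), (9, 3)]))
# [(5, 10), (0, 5), (0, 12)]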

This sounds like a perfect "dynamic programming" problem...
The simplest approach would be brute force (e.g. recursion), but that has exponential complexity.
With dynamic programming you can set up an array a of length n, where n is the maximum of all (start + length) values of your problem, and where a[i] denotes the length of the longest non-overlapping sequence ending at position i. You can then step through all the tuples, updating a. The complexity of this algorithm is O(n*k), where k is the number of input values.
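A minimal sketch of that DP in Python (my interpretation: a[i] holds the length of the longest gap-free combination ending exactly at position i):
def longest_dp(tuples):
    n = max(s + l for s, l in tuples)
    a = [0] * (n + 1)                # a[i] = longest combination ending exactly at i
    for s, l in sorted(tuples):      # sorted by start
        a[s + l] = max(a[s + l], a[s] + l)
    return max(a)

print(longest_dp([(0, 1), (1, 7), (3, 20), (8, 5)]))  # 20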

Create an ordered array of all start and end points and initialise all of them to one.
For each item in your tuple list, compare its endpoints (start and end) to the ordered items in your array; if any point lies strictly between them (e.g. a point in the array is 5 and you have start 2 with length 4), change its value to zero.
After finishing the loop, move across the ordered array: open a strip when you see a 1, extend the strip while you keep seeing 1s, and close the strip on any zero, and so on.
At the end, check the lengths of the strips.
I think the complexity is around O(4-5*N)
(SEE UPDATE)
with N being the number of items in the tuple list.
UPDATE
As you figured out, that complexity is not accurate, but it is definitely very small, since it is a function of the number of line stretches (tuple items).
So if N is the number of line stretches, sorting is O(2N * log 2N), comparison is O(2N), and finding line stretches is also O(2N). So all in all it is O(2N(log 2N + 2)).

Related

Design a recursive algorithm that finds the minimum of an array

I was thinking about a recursive algorithm (it's a theoretical question, so the programming language is not important). It consists of finding the minimum of a set of numbers.
I was thinking of it this way: let n be the number of elements in the set, and rearrange the set as:
(a, (b, c, ..., z))
The function moves from left to right, and the first element is assumed to be the minimum in the first phase (it is, of course, the 0-th element, a). The next steps are defined as follows:
(a, min(b, c, ..., z)): check whether a is still the minimum, or whether b is to be assumed as the minimum; then (a or b, min(c, d, ..., z)), another check condition; then (a or b or c, min(d, e, ..., z)), check condition; etc.
I think the theoretical pseudocode may be as follows:
f(x) {
    // base case
    if I've reached the last element, assume it's a possible minimum, and check if y < z. Then return a value to stop the recursive calls.
    // inductive step
    if ( f(i-th element) < f(i+1-th element) ) {
        /* just assume the current element is the current minimum */
    }
}
I'm having trouble with the base case. I don't know how to formalize it. I think I've understood the basic idea: it's basically what I've written in the pseudocode, right?
Does what I've written so far make sense? Sorry if it's not clear, but I'm a beginner, studying recursion for the first time, and I personally find it confusing. So I've tried my best to explain it; if it's not clear, let me know, and I'll try to explain it better with different words.
Recursive problems can be hard to visualize. Let's take an example: arr = [3,5,1,6]
This is a relatively small array, but it's still not easy to visualize how recursion will work here from start to end.
Tip: Try to reduce the size of the input. This will make it easier to visualize and will help you find the base case. First decide what our function should do. In our case it finds the minimum number in an array. If our function works for an array of size n, then it should also work for an array of size n-1 (the recursive leap of faith). Now, using this, we can reduce the size of the input until we cannot reduce it any further, which should give us our base case.
Let's use the above example: arr = [3,5,1,6]
Let's create a function findMin(arr, start) which takes an array and a start index and returns the minimum number from the start index to the end of the array.
1st Iteration : [3,5,1,6]
// arr[start] = 3, If we can somehow find minimum from the remaining array,
// then we can compare it with current element and return the minimum of the two.
// so our input is now reduced to the remaining array [5,1,6]
2nd Iteration : [5,1,6]
// arr[start] = 5, If we can somehow find minimum from the remaining array,
// then we can compare it with current element and return the minimum of the two.
// so our input is now reduced to the remaining array [1,6]
3rd Iteration : [1,6]
// arr[start] = 1, If we can somehow find minimum from the remaining array,
// then we can compare it with current element and return the minimum of the two.
// so our input is now reduced to the remaining array [6]
4th Iteration : [6]
// arr[start] = 6, Since it is the only element in the array, it is the minimum.
// This is our base case as we cannot reduce the input any further.
// We will simply return 6.
------------ Tracking Back ------------
3rd Iteration : [1,6]
// 1 will be compared with whatever the 4th Iteration returned (6 in this case).
// This iteration will return minimum(1, 4th Iteration) => minimum(1,6) => 1
2nd Iteration : [5,1,6]
// 5 will be compared with whatever the 3rd Iteration returned (1 in this case).
// This iteration will return minimum(5, 3rd Iteration) => minimum(5,1) => 1
1st Iteration : [3,5,1,6]
// 3 will be compared with whatever the 2nd Iteration returned (1 in this case).
// This iteration will return minimum(3, 2nd Iteration) => minimum(3,1) => 1
Final answer = 1
function findMin(arr, start) {
    if (start === arr.length - 1) return arr[start];
    return Math.min(arr[start], findMin(arr, start + 1));
}
const arr = [3, 5, 1, 6];
const min = findMin(arr, 0);
console.log('Minimum element = ', min);
This is a good problem for practicing recursion for beginners. You can also try these problems for practice.
Reverse a string using recursion.
Reverse a stack using recursion.
Sort a stack using recursion.
To me, it's more like this:
int f(int[] x)
{
    var minimum = head of x;
    if (x has tail)
    {
        var remainder = f(tail of x);
        if (remainder < minimum)
        {
            minimum = remainder;
        }
    }
    return minimum;
}
You have the right idea.
You've correctly observed that
min_recursive(array) = min(array[0], min_recursive(array[1:]))
The function doesn't care about who's calling it or what's going on outside of it -- it just needs to return the minimum of the array passed in. The base case is when the array has a single value. That value is the minimum of the array so it should just return it. Otherwise find the minimum of the rest of the array by calling itself again and compare the result with the head of the array.
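That recurrence transcribes almost directly into code; a minimal Python sketch, assuming a non-empty array:
def min_recursive(array):
    if len(array) == 1:              # base case: a single value is the minimum
        return array[0]
    return min(array[0], min_recursive(array[1:]))

print(min_recursive([3, 5, 1, 6]))   # 1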
The other answers show some coding examples.
This is a recursive solution to the problem that you posed, using JavaScript:
const a = [5, 12, 3, 5, 34, 12]

const min = a => {
    if (!a.length) { return 0 }
    if (a.length === 1) { return a[0] }
    return Math.min(a[0], min(a.slice(1)))
}

min(a)
Note the approach which is to first detect the simplest case (empty array), then a more complex case (single element array), then finally a recursive call which will reduce more complex cases to functions of simpler ones.
However, you don't need recursion to traverse a one dimensional array.
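For comparison, a sketch of the iterative traversal (in Python here, though the answer above uses JavaScript):
def find_min(arr):
    minimum = arr[0]
    for x in arr[1:]:                # one pass, no recursion and no O(n) call stack
        if x < minimum:
            minimum = x
    return minimum

print(find_min([5, 12, 3, 5, 34, 12]))  # 3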

What is the most efficient algorithm/data structure for finding the smallest range containing a point?

Given a data set of a few million price ranges, we need to find the smallest range that contains a given price.
The following rules apply:
Ranges can be fully nested (i.e., 1-10 and 5-10 is valid)
Ranges cannot be partially nested (i.e., 1-10 and 5-15 is invalid)
Example:
Given the following price ranges:
1-100
50-100
100-120
5-10
5-20
The result for searching price 7 should be 5-10
The result for searching price 100 should be 100-120 (smallest range containing 100).
What's the most efficient algorithm/data structure to implement this?
Searching the web, I only found solutions for searching for ranges within ranges.
I've been looking at Morton codes and the Hilbert curve, but can't wrap my head around how to use them for this case.
Thanks.
Because you did not mention this ad hoc algorithm, I'll propose it as a simple answer to your question.
This is a Python function, but it's fairly easy to understand and convert to another language.
import math

def min_range(ranges, value):
    # ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)]
    # value = 100
    # INIT
    best_range = None
    best_range_len = math.inf
    # LOOP THROUGH ALL RANGES
    for b, e in ranges:
        # PICK THE SMALLEST
        if b <= value <= e and e - b < best_range_len:
            best_range = (b, e)
            best_range_len = e - b
    print(f'Minimal range containing {value} = {best_range}')
I believe there are more efficient and more complicated solutions (if you can do some precomputation, for example), but this is the first step you must take.
EDIT: Here is a better solution, probably O(log(n)) per query, but it's not trivial. It is a tree where each node is an interval and has a child list of all strictly non-overlapping intervals contained inside it.
Preprocessing is done in O(n log(n)) time, and queries are O(n) in the worst case (when no two ranges are disjoint) and probably O(log(n)) on average.
Two classes: tree, which holds the tree and answers queries:
import bisect

class tree:
    def __init__(self, ranges):
        # sort the ranges by lowest start and then greatest end
        ranges = sorted(ranges, key=lambda i: (i[0], -i[1]))
        # recursive building -> might want to optimize that in python
        self.node = node((-float('inf'), float('inf')), ranges)

    def __str__(self):
        return str(self.node)

    def query(self, value):
        # bisect is used for binary search
        curr_sol = self.node.inter
        node_list = self.node.child_list
        while True:
            # which of the child ranges can include our value?
            i = bisect.bisect_left(node_list, (value, float('inf'))) - 1
            # does it include it?
            if i < 0 or i == len(node_list):
                return curr_sol
            if value > node_list[i].inter[1]:
                return curr_sol
            else:
                # if it does then go deeper
                curr_sol = node_list[i].inter
                node_list = node_list[i].child_list
Node that holds the structure and information:
class node:
    def __init__(self, inter, ranges):
        # all elements in ranges will be descendants of this node!
        self.inter = inter
        self.child_list = []
        for i, r in enumerate(ranges):
            if len(self.child_list) == 0:
                # append a new child when the list is empty
                self.child_list.append(node(r, ranges[i + 1:bisect.bisect_left(ranges, (r[1], r[1] - 1))]))
            else:
                # the current range r is included in a previous range:
                # r is not a child of self but a descendant!
                if r[0] < self.child_list[-1].inter[1]:
                    continue
                # else -> this is a new child
                self.child_list.append(node(r, ranges[i + 1:bisect.bisect_left(ranges, (r[1], r[1] - 1))]))

    def __str__(self):
        # fancy
        return f'{self.inter} : [{", ".join(str(n) for n in self.child_list)}]'

    def __lt__(self, other):
        # this is the '<' operator -> lets bisect compare our items
        return self.inter < other
and to test that:
ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20), (50, 51)]
t = tree(ranges)
print(t)
print(t.query(10))
print(t.query(5))
print(t.query(40))
print(t.query(50))
Preprocessing that generates disjoint intervals
(I call the source segments ranges and the resulting segments intervals.)
For every range border (both start and end) make a tuple (value, start/end field, range length, id) and put them all in an array/list.
Sort these tuples by the first field. In case of a tie, put the longer range to the left for starts and to the right for ends.
Make a stack.
Make a StartValue variable.
Walk through the list:
    if the current tuple contains a start:
        if an interval is open:                 // we close it
            if current value > StartValue:      // the interval is not empty
                make an interval with           // note: id remains on the stack
                    (start = StartValue, end = current value, id = stack.peek)
                add the interval to the result list
        StartValue = current value              // we open a new interval
        push the id from the current tuple onto the stack
    else:                                       // end of a range
        if current value > StartValue:          // the interval is not empty
            make an interval with               // note: id is removed from the stack
                (start = StartValue, end = current value, id = stack.pop)
            add the interval to the result list
        if the stack is not empty:
            StartValue = current value          // we open a new interval
After that we have a sorted list of disjoint intervals containing start/end values and the id of the source range (note that many intervals may correspond to the same source range), so we can use binary search easily.
If we add source ranges one by one in nesting order (each nested range after its parent), we can see that every new range generates at most two new intervals, so the overall number of intervals is M <= 2N and the overall complexity is O(N log N + Q log N), where Q is the number of queries.
Edit:
Added the "if the stack is not empty" section.
Result for your example 1-100, 50-100, 100-120, 5-10, 5-20 is
1-5(0), 5-10(3), 10-20(4), 20-50(0), 50-100(1), 100-120(2)
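Here is a Python sketch of this preprocessing (my own code, assuming the ranges are (start, end) pairs that nest fully or not at all, as the question requires):
def build_intervals(ranges):
    events = []
    for rid, (s, e) in enumerate(ranges):
        events.append((s, 1, -(e - s), rid))   # start event: longer ranges sort first
        events.append((e, 0, e - s, rid))      # end event: shorter ranges sort first
    events.sort()
    result, stack, start_value = [], [], None
    for value, is_start, _, rid in events:
        if is_start:
            if stack and value > start_value:             # close the open interval
                result.append((start_value, value, stack[-1]))
            start_value = value                           # open a new interval
            stack.append(rid)
        else:
            if value > start_value:                       # close the open interval
                result.append((start_value, value, stack[-1]))
            stack.pop()
            if stack:
                start_value = value
    return result

print(build_intervals([(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)]))
# [(1, 5, 0), (5, 10, 3), (10, 20, 4), (20, 50, 0), (50, 100, 1), (100, 120, 2)]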
Since pLOPeGG already covered the ad hoc case, I will answer the question under the premise that preprocessing is performed in order to support multiple queries efficiently.
General data structures for efficient queries on intervals are the Interval Tree and the Segment Tree
What about an approach like this? Since we only allow full nesting and not partial nesting, it looks do-able:
Split the segments into (left, val) and (right, val) pairs.
Order them with respect to their vals and their left/right relation.
Search the list with binary search. We get two outcomes: found and not found.
If found, check whether it is a left or a right. If it is a left, go right until you find a right without finding another left. If it is a right, go left until you find a left without finding another right. Pick the smallest.
If not found, stop when high - low is 1 or 0. Then compare the queried value with the value of the node you are at, and according to that, search right and left of it just like before.
As an example:
We would have (l,10) (l,20) (l,30) (r,45) (r,60) (r,100). When searching for, say, 65, you land on (r,100), so you go left and can't find an (l,x) such that x >= 65, so you keep going left until the lefts and rights balance out; the first right and the last left bound your interval. The preprocessing part will be long, but you build the structure once and keep it that way. Queries are still O(n) in the worst case, but that worst case requires everything to be nested inside each other and you searching for the outermost range.

Fast set overlap matching algorithm

Let's say I have two sets:
A = [1, 3, 5, 7, 9, 11]
and
B = [1, 3, 9, 11, 12, 13, 14]
Both sets can be of arbitrary (and differing) sizes.
I am writing a performance critical application that requires me to perform a search to determine the number of elements which both sets have in common. I don't actually need to return the matches, only the number of matches.
Obviously, a naive method would be a brute force, but I suspect that is nowhere near optimal. Is there an algorithm for performing this type of operation?
If it helps, in all cases the sets will consist of integers.
If both sets are roughly the same size, walking over them in sync, similar to the merge step of a merge sort, is about as fast as it gets.
Look at the first elements.
If they match, you add that element to your result, and move both pointers forward.
Otherwise, you move the pointer that points to the smallest value forward.
Some pseudo-Python:
def intersect(a, b):
    res = []
    ai = 0
    bi = 0
    while ai < len(a) and bi < len(b):
        if a[ai] == b[bi]:
            res.append(a[ai])
            ai += 1
            bi += 1
        elif a[ai] < b[bi]:
            ai += 1
        else:
            bi += 1
    return res
If one set is significantly larger than the other, you can use binary search to look for each item from the smaller in the larger.
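A sketch of that asymmetric case using Python's standard bisect module (my own code; it only counts matches, as the question asks):
import bisect

def count_common(small, large):
    # both lists must be sorted; binary-search each element of the smaller one
    count = 0
    for x in small:
        i = bisect.bisect_left(large, x)
        if i < len(large) and large[i] == x:
            count += 1
    return count

print(count_common([1, 3, 5, 7, 9, 11], [1, 3, 9, 11, 12, 13, 14]))  # 4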
Here is the idea (very high level description though).
By the way, I'll take the liberty of assuming that the numbers in each set do not appear more than once; for instance, [1,3,5,5,7,7,9,11] will not occur.
You define two variables that will hold the indices you are examining in each array.
You start with the first number of each set and compare them. There are two possible outcomes: they are equal, or one is bigger than the other.
If they are equal, you count the event and move the pointers in both arrays to the next element.
If they differ, you move the pointer of the lower value to the next element in its array and repeat the process (compare both values).
The loop ends when you reach the last element of either array.
Hope I was able to explain it in a clear way.
If both sets are sorted, the smallest element of the two sets is either the minimum of the first set or the minimum of the second set. If it's the minimum of the first set, then the next smallest element is either the minimum of the second set or the second minimum of the first set. If you repeat this until the end of both sets, you will have ordered both sets. For your specific problem you just need to check whether the elements are also equal.
You can iterate over the union of both sets with the following algorithm:
intersection_set_cardinality(s1, s2)
{
    iterator i = begin(s1);
    iterator j = begin(s2);
    count = 0;
    while(i != end(s1) && j != end(s2))
    {
        if(elt(i) == elt(j))
        {
            count = count + 1;
            i = i + 1;
            j = j + 1;
        }
        else if(elt(i) < elt(j))
        {
            i = i + 1;
        }
        else
        {
            j = j + 1;
        }
    }
    return count
}

Number of distinct sequences of fixed length which can be generated using a given set of numbers

I am trying to find the number of different sequences of fixed length which can be generated using the numbers from a given set (of distinct elements) such that each element from the set appears in the sequence. Below is my logic:
e.g. Let the set consist of S elements, and we have to generate sequences of length K (K >= S).
1) First we choose S places out of K and place each element from the set there, in some order. So, C(K,S)*S!
2) After that, the remaining places can be filled with any values from the set. So the factor (K-S)^S should be multiplied in.
So the overall result is
C(K,S)*S!*((K-S)^S)
But I am getting the wrong answer. Please help.
PS: C(K,S) is the number of ways of selecting S elements out of K elements (K >= S) irrespective of order. Also, ^ is the power symbol, i.e. 2^3 = 8.
Here is my code in Python:
# m is the number of elements to select from a set of n elements
# fact is a list containing factorial values, i.e. fact[0] = 1, fact[3] = 6, and so on
def ways(m, n):
    res = fact[n] / fact[n - m + 1] * ((n - m) ** m)
    return res
What you are looking for is the number of surjective functions whose domain is a set of K elements (the K positions that we are filling in the output sequence) and whose image is a set of S elements (your input set). I think this should work:
static int Count(int K, int S)
{
    int sum = 0;
    for (int i = 1; i <= S; i++)
    {
        sum += Pow(-1, (S - i)) * Fact(S) / (Fact(i) * Fact(S - i)) * Pow(i, K);
    }
    return sum;
}
...where Pow and Fact are what you would expect.
Check out this math.se question.
Here's why your approach won't work. I didn't check the code, just your explanation of the logic behind it, but I'm pretty sure I understand what you're trying to do. Let's take, for example, K = 4, S = {7,8,9}, and examine the sequence 7,8,9,7. It is a unique sequence, but you can get to it by:
Randomly choosing positions 1,2,3, filling them randomly with 7,8,9 (your step 1), then randomly choosing 7 for the remaining position 4 (your step 2).
Randomly choosing positions 2,3,4, filling them randomly with 8,9,7 (your step 1), then randomly choosing 7 for the remaining position 1 (your step 2).
By your logic, you will count it both ways, even though it should be counted only once, as the end result is the same. And so on...
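You can sanity-check the inclusion-exclusion count against brute-force enumeration in Python (a sketch; math.comb needs Python 3.8+, and the enumeration is only feasible for tiny K):
from itertools import product
from math import comb

def count_sequences(K, S):
    # number of surjections from K positions onto S values
    return sum((-1) ** (S - i) * comb(S, i) * i ** K for i in range(1, S + 1))

def count_by_enumeration(K, values):
    return sum(1 for seq in product(values, repeat=K) if set(seq) == set(values))

print(count_sequences(4, 3))               # 36
print(count_by_enumeration(4, [7, 8, 9]))  # 36, the K = 4, S = {7,8,9} example above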

Is it possible to rearrange an array in place in O(N)?

If I have a size N array of objects, and I have an array of unique numbers in the range 1...N, is there any algorithm to rearrange the object array in-place in the order specified by the list of numbers, and yet do this in O(N) time?
Context: I am doing a quick-sort-ish algorithm on objects that are fairly large in size, so it would be faster to do the swaps on indices than on the objects themselves, and only move the objects in one final pass. I'd just like to know if I could do this last pass without allocating memory for a separate array.
Edit: I am not asking how to do a sort in O(N) time, but rather how to do the post-sort rearranging in O(N) time with O(1) space. Sorry for not making this clear.
I think this should do:
static <T> void arrange(T[] data, int[] p) {
    boolean[] done = new boolean[p.length];
    for (int i = 0; i < p.length; i++) {
        if (!done[i]) {
            T t = data[i];
            for (int j = i;;) {
                done[j] = true;
                if (p[j] != i) {
                    data[j] = data[p[j]];
                    j = p[j];
                } else {
                    data[j] = t;
                    break;
                }
            }
        }
    }
}
Note: This is Java. If you do this in a language without garbage collection, be sure to delete done.
If you care about space, you can use a BitSet for done. I assume you can afford an additional bit per element because you seem willing to work with a permutation array, which is several times that size.
This algorithm copies instances of T n + k times, where k is the number of cycles in the permutation. You can reduce this to the optimal number of copies by skipping those i where p[i] = i.
The approach is to follow the "permutation cycles" of the permutation rather than indexing the array left to right. But since you do have to begin somewhere, every time a new permutation cycle is needed, the search for unpermuted elements is left to right:
// Pseudo-code
N : integer, N > 0 // N is the number of elements
swaps : integer [0..N]
data[N] : array of object
permute[N] : array of integer [-1..N] denoting the permutation (a used element is -1)
next_scan_start : integer;
next_scan_start = 0;
while (swaps < N)
{
    // Search for the next index that is not yet permuted.
    for (idx_cycle_search = next_scan_start;
         idx_cycle_search < N;
         ++idx_cycle_search)
        if (permute[idx_cycle_search] >= 0)
            break;
    next_scan_start = idx_cycle_search + 1;
    // This is a provable invariant. In short, the number of non-negative
    // elements in permute[] equals (N - swaps)
    assert( idx_cycle_search < N );
    // Completely permute one permutation cycle, 'following the
    // permutation cycle's trail'. This is O(N) overall.
    while (permute[idx_cycle_search] >= 0)
    {
        swap( data[idx_cycle_search], data[permute[idx_cycle_search]] );
        swaps++;
        old_idx = idx_cycle_search;
        idx_cycle_search = permute[idx_cycle_search];
        permute[old_idx] = -1;
        // Also '= -idx_cycle_search - 1' could be used rather than '-1'
        // and would allow reversal of these changes to the permute[] array
    }
}
Do you mean that you have an array of objects O[1..N] and then you have an array P[1..N] that contains a permutation of numbers 1..N and in the end you want to get an array O1 of objects such that O1[k] = O[P[k]] for all k=1..N ?
As an example, if your objects are letters A,B,C...,Y,Z and your array P is [26,25,24,..,2,1] is your desired output Z,Y,...C,B,A ?
If yes, I believe you can do it in linear time using only O(1) additional memory. Reversing the elements of an array is a special case of this scenario. In general, I think you would need to consider the decomposition of your permutation P into cycles and then use it to move the elements of your original array O[] around.
If that's what you are looking for, I can elaborate more.
EDIT: Others already presented excellent solutions while I was sleeping, so no need to repeat it here. ^_^
EDIT: My claim of O(1) additional space is indeed not entirely correct. I was thinking only about "data" elements, but in fact you also need to store one bit per permutation element, so if we are precise we need O(n) extra bits for that. But most of the time using a sign bit (as suggested by J.F. Sebastian) is fine, so in practice we may not need anything more than we already have.
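Here is a sketch of that sign-bit trick in Python (my own code; it marks a visited entry by storing -(p + 1) in the permutation array and restores the array afterwards):
def rearrange_signbit(objects, perm):
    # applies objects[k] = old objects[perm[k]], cycle by cycle
    for i in range(len(perm)):
        if perm[i] < 0:
            continue                 # already handled in an earlier cycle
        first, j = objects[i], i
        while True:
            p = perm[j]
            perm[j] = -(p + 1)       # mark index j as done
            if p == i:               # cycle closed
                objects[j] = first
                break
            objects[j], j = objects[p], p
    for i in range(len(perm)):       # undo the marking
        perm[i] = -perm[i] - 1

data, perm = ['a', 'b', 'c', 'd'], [2, 0, 3, 1]
rearrange_signbit(data, perm)
print(data, perm)  # ['c', 'a', 'd', 'b'] [2, 0, 3, 1]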
If you didn't mind allocating memory for an extra hash of indexes, you could keep a mapping of original location to current location to get a time complexity of near O(n). Here's an example in Ruby, since it's readable and pseudocode-ish. (This could be shorter or more idiomatically Ruby-ish, but I've written it out for clarity.)
#!/usr/bin/ruby
objects = ['d', 'e', 'a', 'c', 'b']
order = [2, 4, 3, 0, 1]
cur_locations = {}

order.each_with_index do |orig_location, ordinality|
  # Find the current location of the item.
  cur_location = orig_location
  while not cur_locations[cur_location].nil? do
    cur_location = cur_locations[cur_location]
  end
  # Swap the items and keep track of whatever we swapped forward.
  objects[ordinality], objects[cur_location] = objects[cur_location], objects[ordinality]
  cur_locations[ordinality] = orig_location
end

puts objects.join(' ')
That obviously does involve some extra memory for the hash, but since it's just for indexes and not your "fairly large" objects, hopefully that's acceptable. Since hash lookups are O(1), even though there is a slight bump to the complexity due to the case where an item has been swapped forward more than once and you have to rewrite cur_location multiple times, the algorithm as a whole should be reasonably close to O(n).
If you wanted you could build a full hash of original to current positions ahead of time, or keep a reverse hash of current to original, and modify the algorithm a bit to get it down to strictly O(n). It'd be a little more complicated and take a little more space, so this is the version I wrote out, but the modifications shouldn't be difficult.
EDIT: Actually, I'm fairly certain the time complexity is just O(n), since each ordinality can have at most one hop associated, and thus the maximum number of lookups is limited to n.
#!/usr/bin/env python
def rearrange(objects, permutation):
    """Rearrange `objects` in place according to `permutation`.

    ``result = [objects[p] for p in permutation]``
    """
    seen = [False] * len(permutation)
    for i, already_seen in enumerate(seen):
        if not already_seen:  # start of a permutation cycle
            first_obj, j = objects[i], i
            while True:
                seen[j] = True
                p = permutation[j]
                if p == i:  # end of the permutation cycle
                    objects[j] = first_obj  # [old] p -> j
                    break
                objects[j], j = objects[p], p  # p -> j
The algorithm (as I've noticed after I wrote it) is the same as the one from #meriton's answer in Java.
Here's a test function for the code:
def test():
    import itertools
    N = 9
    for perm in itertools.permutations(range(N)):
        L = list(range(N))  # list() needed on Python 3, where range() is not a list
        LL = L[:]
        rearrange(L, perm)
        assert L == [LL[i] for i in perm] == list(perm), (L, list(perm), LL)
    # test whether assertions are enabled
    try:
        assert 0
    except AssertionError:
        pass
    else:
        raise RuntimeError("assertions must be enabled for the test")

if __name__ == "__main__":
    test()
There's histogram sort, though its running time is given as a bit higher than O(N): O(N log log N).
I can do it given O(N) scratch space -- copy to a new array and copy back.
EDIT: I am aware of the existence of an algorithm that will proceed through the array. The idea is to perform the swaps on the array of integers 1..N while at the same time mirroring the swaps on your array of large objects. I just cannot find the algorithm right now.
The problem is one of applying a permutation in place with minimal O(1) extra storage: "in-situ permutation".
It is solvable, but an algorithm is not obvious beforehand.
It is described briefly as an exercise in Knuth, and for work I had to decipher it and figure out how it worked. Look at 5.2 #13.
For some more modern work on this problem, with pseudocode:
http://www.fernuni-hagen.de/imperia/md/content/fakultaetfuermathematikundinformatik/forschung/berichte/bericht_273.pdf
I ended up writing a different algorithm for this, which first generates a list of swaps needed to apply an ordering, and then runs through the swaps to apply it. The advantage is that if you're applying the ordering to multiple lists, you can reuse the swap list, since the swap pass is extremely simple.
#include <string>
#include <utility>
#include <vector>
using namespace std;

void make_swaps(vector<int> order, vector<pair<int,int>> &swaps)
{
    // order[0] is the index in the old list of the new list's first value.
    // Invert the mapping: inverse[0] is the index in the new list of the
    // old list's first value.
    vector<int> inverse(order.size());
    for(int i = 0; i < order.size(); ++i)
        inverse[order[i]] = i;

    swaps.resize(0);
    for(int idx1 = 0; idx1 < order.size(); ++idx1)
    {
        // Swap list[idx1] with list[order[idx1]], and record this swap.
        int idx2 = order[idx1];
        if(idx1 == idx2)
            continue;
        swaps.push_back(make_pair(idx1, idx2));

        // list[idx1] is now in the correct place, but whoever wanted the value we moved out
        // of idx2 now needs to look in its new position.
        int idx1_dep = inverse[idx1];
        order[idx1_dep] = idx2;
        inverse[idx2] = idx1_dep;
    }
}

template<typename T>
void run_swaps(T data, const vector<pair<int,int>> &swaps)
{
    for(const auto &s: swaps)
    {
        int src = s.first;
        int dst = s.second;
        swap(data[src], data[dst]);
    }
}

void test()
{
    vector<int> order = { 2, 3, 1, 4, 0 };
    vector<pair<int,int>> swaps;
    make_swaps(order, swaps);

    vector<string> data = { "a", "b", "c", "d", "e" };
    run_swaps(data, swaps);
}
