A data structure to find a predecessor in a given range - algorithm

Given a list of keys, say [2, 6, 4, 9, 3], how can I find the predecessor of an element, considering only the elements at indices to its left? For example:
The predecessor of 6 should be 2, not 4, because 4 is on the right of 6.
The predecessor of 4 should be 2, not 3.
We know that in a balanced binary search tree, we can find predecessor and successor for given key in O(log n) time complexity, but that is not exactly what I wanted.
It seems like I want a data structure that combines the functions of both a BST and an interval tree.
But I don't know how to combine them.

Related

Minimum operations to make K Non Decreasing Array [duplicate]

We know about an algorithm that will find the Longest Increasing subsequence in O(nlogn). I was wondering whether we can find the Longest non-decreasing subsequence with similar time complexity?
For example, consider an array : (4,10,4,8,9).
The longest increasing subsequence is (4,8,9).
And a longest non-decreasing subsequence would be (4,4,8,9).
First, here’s a “black box” approach that will let you find the longest nondecreasing subsequence using an off-the-shelf solver for longest increasing subsequences. Let’s take your sample array:
4, 10, 4, 8, 9
Now, imagine we transformed this array as follows by adding a tiny fraction to each number:
4.0, 10.1, 4.2, 8.3, 9.4
Changing the numbers this way will not change the results of any comparisons between two different integers, since any two different integers differ by at least 1, while the added fractions are all smaller than 1. However, if you compare the two 4s now, the later 4 compares bigger than the earlier one. If you now find the longest increasing subsequence, you get back [4.0, 4.2, 8.3, 9.4], which you can then map back to [4, 4, 8, 9].
More generally, if you’re working with an array of n integer values, you can add i / n to each of the numbers, where i is its index, and you’ll be left with a sequence of distinct numbers. From there running a regular LIS algorithm will do the trick.
If you can’t work with fractions this way, you could alternatively multiply each number by n and then add in i, which also works.
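The integer variant of this trick can be sketched as follows. This is only a minimal sketch, assuming integer inputs; the function names lis_length and longest_nondecreasing_length are made up for illustration, and lis_length stands in for whatever off-the-shelf LIS solver you already have.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    # tails[k] is the smallest possible last element of an
    # increasing subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def longest_nondecreasing_length(values):
    """Reduce the non-decreasing problem to strict LIS (integers only)."""
    n = len(values)
    # value * n + index keeps the order of different values and makes
    # a later copy of an equal value compare strictly larger.
    keys = [v * n + i for i, v in enumerate(values)]
    return lis_length(keys)


print(lis_length([4, 10, 4, 8, 9]))                    # 3
print(longest_nondecreasing_length([4, 10, 4, 8, 9]))  # 4
```

For the sample array the keys become 20, 51, 22, 43, 49, so the strict LIS 20, 22, 43, 49 corresponds to 4, 4, 8, 9.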
On the other hand, suppose you have the code for a solver for LIS and want to convert it to one that solves the longest nondecreasing subsequence problem. The reasoning above shows that if you treat later copies of numbers as being “larger” than earlier copies, then you can just use a regular LIS. Given that, just read over the code for LIS and find spots where comparisons are made. When a comparison is made between two equal values, break the tie by considering the later appearance to be bigger than the earlier one.
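As an illustration of that second route, here is a sketch based on the common binary-search formulation of LIS (an assumption on my part, your solver may look different). The only comparison that changes is the binary search: bisect_right instead of bisect_left lets an equal value extend a subsequence rather than replace its tail. The function name is hypothetical.

```python
from bisect import bisect_right


def longest_nondecreasing_subsequence(values):
    """Return one longest non-decreasing subsequence in O(n log n)."""
    tail_vals = []              # tail_vals[k]: smallest tail of a run of length k + 1
    tail_idx = []               # index in `values` of that tail
    prev = [-1] * len(values)   # back-pointers for reconstruction

    for i, x in enumerate(values):
        # bisect_right treats an equal tail as "not larger than x",
        # which is exactly the tie-break described above.
        pos = bisect_right(tail_vals, x)
        prev[i] = tail_idx[pos - 1] if pos > 0 else -1
        if pos == len(tail_vals):
            tail_vals.append(x)
            tail_idx.append(i)
        else:
            tail_vals[pos] = x
            tail_idx[pos] = i

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(values[i])
        i = prev[i]
    return result[::-1]


print(longest_nondecreasing_subsequence([4, 10, 4, 8, 9]))  # [4, 4, 8, 9]
```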
I think the following will work in O(nlogn):
Scan the array from right to left, and for each element solve a subproblem of finding a longest subsequence starting from the given element of the array. E.g. if your array has indices from 0 to 4, then you start with the subarray [4,4] and check what's the longest sequence starting from 4, then you check subarray [3,4] and what's the longest subsequence starting from 3, next [2,4], and so on, until [0,4]. Finally, you choose the longest subsequence established in either of the steps.
For the last element (so subarray [4,4]) the longest sequence is always of length 1.
When in the next iteration you consider another element to the left (e.g., in the second step you consider the subarray [3,4], so the new element is the element with index 3 in the original array), you check whether that element is less than or equal to any of the elements to its right. If so, you take the best (longest) result among those elements and add one.
For instance:
[4,4] -> longest sequence of length 1 (9)
[3,4] -> longest sequence of length 2 (8,9) 1+1 (you take the longest sequence from above which starts with 9 and add one to its length)
[2,4] -> longest sequence of length 3 (4,8,9) 2+1 (you take the longest sequence from above, i.e. (8,9), and add one to its length)
[1,4] -> longest sequence of length 1 (10) nothing to add to (10 is greater than all the elements to its right)
[0,4] -> longest sequence of length 4 (4,4,8,9) 3+1 (you take the longest sequence above, i.e. (4,8,9), and add one to its length)
The main issue is how to browse all the candidates to the right in logarithmic time. For that you keep a sorted map (a balanced binary tree). The keys are the already visited elements of the array. The values are the longest sequence lengths obtainable from that element. No need to store duplicates - among duplicate keys store the entry with largest value.
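A rough sketch of the right-to-left scan. Note the substitution: Python has no built-in balanced tree, so this swaps in a Fenwick (binary indexed) tree over value ranks to answer "best length among already visited elements that are >= x" in O(log n). That choice and the function name are assumptions for illustration, not the sorted map described above.

```python
def longest_nondecreasing_length_rtl(values):
    """Scan right to left; best[i] = 1 + max(best[j]) for j > i, values[j] >= values[i]."""
    # Rank 1 is the largest value, so "all values >= x" is the
    # prefix of ranks 1..rank[x], which a Fenwick tree can query.
    distinct = sorted(set(values), reverse=True)
    rank = {v: r for r, v in enumerate(distinct, start=1)}
    size = len(distinct)
    tree = [0] * (size + 1)  # Fenwick tree storing prefix maxima

    def update(pos, length):
        while pos <= size:
            if tree[pos] < length:
                tree[pos] = length
            pos += pos & -pos

    def query(pos):
        best = 0
        while pos > 0:
            if tree[pos] > best:
                best = tree[pos]
            pos -= pos & -pos
        return best

    answer = 0
    for x in reversed(values):
        r = rank[x]
        length = query(r) + 1   # extend the best run starting to the right
        update(r, length)       # duplicates keep only the largest length
        answer = max(answer, length)
    return answer


print(longest_nondecreasing_length_rtl([4, 10, 4, 8, 9]))  # 4
```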

Create a binary tree in O(n)

I have a sequence of n numbers and I want to create a data structure that answers the following query:
sequence a = [5, 7, 4, 24, 8, 3, 12, 34]
For min(2,5) the answer is 3, because a2=7, a3=4, a4=24, a5=8 and the smallest of these is a3. So min(i,j) returns the position of the minimum number between positions i and j (inclusive).
I thought that a good data structure for this would be a complete binary tree that stores the sequence numbers at its leaves. But how can I build this structure in O(n)?
All you need is a segment tree with range minimum queries. Here is a detailed explanation of it. Building it takes O(n) time, because the tree has no more than 2 * n nodes and each node is computed once.
If you need to find not only the minimum value but also its position, then each node has to store not only the minimum but also the position where it is attained. How to maintain such a structure is straightforward: when you recalculate the minimum in the parent, you check which child it came from and take the corresponding position from that child. For leaves, the positions are the positions of the leaves themselves.
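A minimal sketch of such a tree, assuming the iterative array layout (leaves stored at indices n..2n-1) and 1-based positions as in the question. The class and method names are made up for illustration.

```python
class MinPositionSegmentTree:
    """Segment tree whose nodes store (minimum value, 1-based position)."""

    def __init__(self, values):
        self.n = len(values)
        # Index 0 is unused; leaves occupy indices n .. 2n-1.
        self.tree = [None] * self.n + [(v, i + 1) for i, v in enumerate(values)]
        # Each internal node is computed once from its two children: O(n) build.
        for node in range(self.n - 1, 0, -1):
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])

    def argmin(self, i, j):
        """Position of the minimum among positions i..j (1-based, inclusive)."""
        lo = i - 1 + self.n   # first leaf of the range
        hi = j + self.n       # one past the last leaf of the range
        best = (float("inf"), -1)
        while lo < hi:
            if lo & 1:
                best = min(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = min(best, self.tree[hi])
            lo >>= 1
            hi >>= 1
        return best[1]


tree = MinPositionSegmentTree([5, 7, 4, 24, 8, 3, 12, 34])
print(tree.argmin(2, 5))  # 3
```

Comparing (value, position) tuples means ties on the value are broken in favour of the leftmost position.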

binary search in array that contains range

Let's say that we have an ordered array containing elements like this:
[1, 2-5, 6, 8-9, 11-13], where 2-5 is a range that represents 2, 3, 4 and 5. If we want to find "4", then index 1 (counting from 0) is the answer we need.
Is it possible to apply binary search to this type of element with constant space and O(log n) time?
You can just use binary search, the concept will also work with the ranges like a charm. Actually this is a concept commonly used to reduce time and space complexity, for example in gap encoding.
However you need to write it on your own instead of using any library as the library-method will probably not accept the ranges.
Let us briefly go through the execution of a binary search on your given input of [1, 2-5, 6, 8-9, 11-13] searching for the value 4 which is at index 1.
The array [1, 2-5, 6, 8-9, 11-13] has length 5, so we pick the middle index, which is 2. The element there is 6. We are searching for the value 4, which is smaller, so we continue the search to the left.
We have now reduced the search interval to [1, 2-5], length 2, and we pick index 1. The element there is 2-5. As 4 is inside that range, we are finished and return index 1 as the result.
If, for example, the element there were 5-7, we would continue the search to the left, as 4 is below the start of 5-7. Analogously, we would continue the search to the right if it were 1-3, as 4 is above the end of that range.
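The walkthrough above could be sketched like this. It assumes each element is stored as a (start, end) tuple, with single values such as 6 written as (6, 6), and that the ranges are sorted and do not overlap. The representation and the function name are my assumptions, not something given in the question.

```python
def find_in_ranges(ranges, target):
    """Binary search over sorted, non-overlapping (start, end) ranges.

    Returns the index of the range containing target, or -1 if none does.
    Uses O(1) extra space and O(log n) comparisons.
    """
    left, right = 0, len(ranges) - 1
    while left <= right:
        mid = (left + right) // 2
        start, end = ranges[mid]
        if target < start:
            right = mid - 1      # target lies before this range: go left
        elif target > end:
            left = mid + 1       # target lies after this range: go right
        else:
            return mid           # start <= target <= end
    return -1


data = [(1, 1), (2, 5), (6, 6), (8, 9), (11, 13)]
print(find_in_ranges(data, 4))  # 1
print(find_in_ranges(data, 7))  # -1
```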
Here is an explanation of binary search with some pseudo-code: Binary search algorithm at Wikipedia
If you have problems implementing it, then just edit your question and show us what you have done so far, and we will adapt it and help.

Algorithm for finding change in progression of two sets of arrays

I'm looking for an algorithm that will search for a change in the progression of two sets of numbers with the same length. The set starts at the same number all the time. For example:
Assumptions:
1. Arrays 1 and 2 are the same length
2. Progressions are not available at the start and need to be computed, but computing them is expensive in terms of resources.
Array 1 [1, 3, 5, 7, 10]
Progression: +2, +2, +2, +3
Array 2 [1, 2, 4, 6, 5]
Progression: +1, +2, +2, -1
Result: Array of numbers deviates on the first progression by -1 and on the last progression by -4.
Is there a way to do this without resorting to any sort of linear search?
Since you are only concerned with the first and last deviations you can search from the front of the list until you find the first deviation (if you don't find one then you obviously don't need to do the second step), and then search from the end of the list until you either find the last one, or reach the position where you found the first one (in which case there is only 1 deviation).
However, I don't believe you can do it without any sort of linear search, and in the worst case you would end up having to check every single deviation.
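A sketch of that two-ended scan, assuming "deviation" means the step in Array 2 minus the corresponding step in Array 1 (this matches the -1 and -4 in the question). Steps are computed lazily, so only the ones actually inspected cost anything. The function name is made up.

```python
def first_and_last_deviation(a, b):
    """Return ((first_index, deviation), (last_index, deviation)) or None."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")

    def deviation(i):
        # Step i of b minus step i of a, computed only on demand.
        return (b[i + 1] - b[i]) - (a[i + 1] - a[i])

    steps = len(a) - 1

    # Scan from the front until the first non-zero deviation.
    first = next((i for i in range(steps) if deviation(i) != 0), None)
    if first is None:
        return None  # the progressions never differ

    # Scan from the back, stopping at `first` at the latest.
    last = next(i for i in range(steps - 1, first - 1, -1) if deviation(i) != 0)
    return (first, deviation(first)), (last, deviation(last))


print(first_and_last_deviation([1, 3, 5, 7, 10], [1, 2, 4, 6, 5]))
# ((0, -1), (3, -4))
```

In the worst case (no deviation at all, or a single one in the middle) this still inspects every step, which is the linear behaviour mentioned above.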

Check for duplicate subsequences of length >= N in sequence

I have a sequence of values and I want to know if it contains any repeated subsequences of a certain minimum length. For instance:
1, 2, 3, 4, 5, 100, 99, 101, 3, 4, 5, 100, 44, 99, 101
This contains the subsequence 3, 4, 5, 100 twice. It also contains the subsequence 99, 101 twice, but that subsequence is too short to care about.
Is there an efficient algorithm for checking the existence of such a subsequence? I'm not especially interested in locating the sequences (though that would be helpful for verification); I'm primarily just interested in a True/False answer, given a sequence and a minimum subsequence length.
My only approach so far is to brute force search it: for each item in the sequence, find all the other locations where the item occurs (already at O(N^2)), and then walk forward one step at a time from each location and see if the next item matches, and keep going until I find a mismatch or find a matching subsequence of sufficient length.
Another thought I had, but haven't been able to develop into an actual approach, is to build a tree of all the sequences, so that each number is a node and is a child of the number that preceded it, wherever that node happens to already be in the tree.
There are O(k)-time solutions (where k is the length of the whole sequence) for any value of N.
Solution #1: Build a suffix tree for the input sequence (using Ukkonen's algorithm). Iterate over the nodes with two or more children and check if at least one of them has depth >= N.
Solution #2: Build a suffix automaton for the input sequence. Iterate over all the states whose right context contains at least two different strings and check if at least one of those states has distance >= N from the initial state of the automaton.
Solution #3: A suffix array and the longest common prefix technique can also be used (build the suffix array for the input sequence, compute the longest common prefix array, and check whether there is a pair of adjacent suffixes with a common prefix of length at least N).
These solutions have O(k) time complexity under the assumption that the alphabet size is constant (the alphabet consists of all elements of the input sequence).
If that is not the case, it is still possible to obtain O(k log k) worst-case time complexity (by storing all transitions of the tree or the automaton in a map) or O(k) on average using a hash map.
P.S I use terms string and sequence interchangeably here.
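To make Solution #3 concrete, here is a deliberately naive sketch: it builds the suffix array by sorting whole suffixes and compares adjacent suffixes element by element, so it is nowhere near O(k). It only illustrates the adjacent-suffix check; a real implementation would use a linear or O(k log k) suffix array construction plus an LCP array algorithm. The function name is hypothetical, and the input is assumed to be a list or tuple of comparable values.

```python
def has_repeated_run(seq, min_len):
    """True if some contiguous run of length >= min_len occurs at least twice."""
    k = len(seq)
    # Naive suffix array: start positions sorted by the suffix they begin.
    suffixes = sorted(range(k), key=lambda i: seq[i:])

    # The longest repeated run always shows up as the common prefix
    # of two suffixes that are adjacent in sorted order.
    for a, b in zip(suffixes, suffixes[1:]):
        common = 0
        while (a + common < k and b + common < k
               and seq[a + common] == seq[b + common]):
            common += 1
        if common >= min_len:
            return True
    return False


data = [1, 2, 3, 4, 5, 100, 99, 101, 3, 4, 5, 100, 44, 99, 101]
print(has_repeated_run(data, 4))  # True  (3, 4, 5, 100)
print(has_repeated_run(data, 5))  # False
```

Note that the two occurrences are allowed to overlap, e.g. [1, 1, 1] counts as containing 1, 1 twice.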
If you only care about subsequences of length exactly N (for example, if you just want to check that there are no duplicates), then there is a quadratic solution: use the KMP algorithm for every subsequence.
Let's assume that there are k elements in the whole sequence.
For every subsequence of length N (O(k) of them):
Build its failure function (takes O(N))
Search for it in the remainder of the sequence (takes O(k))
So, assuming N << k, the whole algorithm is indeed O(k^2).
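The steps above might look roughly like this. It is a sketch, assuming "the remainder of the sequence" means everything after the window's own start (so overlapping matches count) and N >= 1; the function names are made up.

```python
def failure_function(pattern):
    """Standard KMP failure (prefix) function, O(len(pattern))."""
    fail = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    return fail


def kmp_contains(text, start, pattern, fail):
    """True if pattern occurs in text[start:], O(len(text))."""
    j = 0
    for i in range(start, len(text)):
        while j > 0 and text[i] != pattern[j]:
            j = fail[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == len(pattern):
            return True
    return False


def has_duplicate_window(seq, n):
    """True if some window of exactly n consecutive items occurs twice."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for start in range(len(seq) - n):
        pattern = seq[start:start + n]
        if kmp_contains(seq, start + 1, pattern, failure_function(pattern)):
            return True
    return False


data = [1, 2, 3, 4, 5, 100, 99, 101, 3, 4, 5, 100, 44, 99, 101]
print(has_duplicate_window(data, 4))  # True
print(has_duplicate_window(data, 5))  # False
```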
Since your list is unordered, you're going to have to visit every item at least once.
What I'm thinking is that you first go through your list and create a dictionary where you store each number as a key along with all the indices at which it appears in your sequence. Like:
Key: Indices
1: 0
2: 1
3: 2, 8
....
Where the number 1 appears at index 0, the number 2 appears at index 1, the number 3 appears at indices 2 and 8, and so on.
With that created you can then go through the dictionary keys and start comparing it against the sequences at the other locations. This should save on some of the brute force since you don't have to revisit each number through the initial sequence each time.
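A sketch of this idea, assuming the values are hashable. It returns the two start positions of the first repeat it finds (handy for verification) or None. It is still quadratic or worse in the worst case, e.g. when the same value repeats many times; the function name is made up.

```python
from collections import defaultdict
from itertools import combinations


def find_repeat_via_index(seq, min_len):
    """Return (first_start, second_start) of a repeated run, or None."""
    # Map each value to the list of indices where it appears.
    positions = defaultdict(list)
    for i, value in enumerate(seq):
        positions[value].append(i)

    # Only pairs of positions holding the same value can start a repeat.
    for indices in positions.values():
        for a, b in combinations(indices, 2):   # a < b
            length = 0
            while b + length < len(seq) and seq[a + length] == seq[b + length]:
                length += 1
            if length >= min_len:
                return a, b
    return None


data = [1, 2, 3, 4, 5, 100, 99, 101, 3, 4, 5, 100, 44, 99, 101]
print(find_repeat_via_index(data, 4))  # (2, 8)
print(find_repeat_via_index(data, 5))  # None
```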
