Closest equal numbers - algorithm

Suppose you have a1..an numbers and some queries [l, k] (1 < l, k < n). The problem is to find in [l, k] interval minimum distance between two equal numbers.
Examples: (interval l,k shown as |...|)
1 2 2 |1 0 1| 2 3 0 1 2 3
Answer 2 (101)
1 |2 2| 1 0 1 2 3 0 1 2 3
Answer 1 (22)
1 2 2 1 0 |1 2 3 0 3 2 3|
Answer 2 (303) or (323)
I have thought about segment tree, but it is hard to join results from each tree node, when query is shared between several nodes. I have tried some ways to join them, but it looks ugly. Can somebody give me a hint?
Clarification
Thanks for your answers.
The problem is that there are a lot of queries, so o(n) is not good. I do not accidentally mentioned a segment tree. It performs [l, r] query for finding [l, r]SUM or [l, r]MIN in array with log(n) complexity. Can we do some preprocessing to fit in o(logn) here?

Call an interval minimal if its first number equals its last but each of the numbers in between appears exactly once in the interval. 11 and 101 are minimal, but 12021 and 10101 are not.
In linear time (assuming constant-time hashing), enumerate all of the minimal intervals. This can be done by keeping two indices, l and k, and a hash map that maps each symbol in between l and k to its index. Initially, l = 1 and k = 0. Repeatedly do the following. Increment k (if it's too large, we stop). If the symbol at the new value of k is in the map, then advance l to the map value, deleting stuff from the map as we go. Yield the interval [l, k] and increment l once more. In all cases, write k as the map value of the symbol.
Because of minimality, the minimal intervals are ordered the same way by their left and right endpoints. To answer a query, we look up the first interval that it could contain and the last and then issue a range-minimum query of the lengths of the range of intervals. The result is, in theory, an online algorithm that does linear-time preprocessing and answers queries in constant time, though for convenience you may not implement it that way.

We can do it in O(nlog(n)) with a sort. First, mark all the elements in [l,k] with their original indices. Then, sort the elements in [l,k], first based on value, and second based on original index, both ascending.
Then you can loop over the sorted list, keeping a currentValue variable, and checking adjacent values that are the same for distance and setting minDistance if necessary. currentValue is updated when you reach a new value in the sorted list.
Suppose we have this [l,k] range from your second example:
1 2 3 0 3 2 3
We can mark them as
1(1) 2(2) 3(3) 0(4) 3(5) 2(6) 3(7)
and sort them as
0(4) 1(1) 2(2) 2(6) 3(3) 3(5) 3(7)
Looping over this, there are no ranges for 0 and 1. The minimum distance for 2s is 4, and the minimum distance for 3s is 2 ([3,5] or [3,7], depending on if you reset minDistance when the new minimum distance is equal to the current minimum distance).
Thus we get
[3,5] in [l,k] or [5,7] in [l,k]
EDIT
Since you mention some queries, you can preprocess the list in O(nlog(n)) time, and then only use O(n) time for each individual query. You would just ignore indices that are not in [l,k] while looping over the sorted list.
EDIT 2
This is addressing the clarification in the question, which now states that there will always be lots of queries to run. We can preprocess in O(n^2) time using dynamic programming and then run each query in O(1) time.
First, perform the preprocessing on the entire list that I described above. Then form links in O(n) time from the original list into the sorted list.
We can imagine that:
[l,k] = min([l+1,k], [l,k-1], /*some other sequence starting at l or ending at k*/)
We have one base case
[l,k] = infinity where l = k
If [l,k] is not min([l+1,k], [l,k-1]), then it either starts at l or ends at k. We can take each of these, look into the sorted list and look at the adjacent element in the correct direction and check the distances (making sure we're in bounds). We only have to check 2 elements, so it is a constant factor.
Using this algorithm, we can run the following
for l = n downto 1
for k = l to n
M[l,k] = min(M[l+1,k], M[l,k-1], sequence starting at l, sequence ending at k)
You can also store the solutions in the matrix (which is actually a pyramid). Then, when you are given a query [l,k], you just look it up in the matrix.

Related

How to find length of largest contiguous segment which forms an arithmetic progression in range [L R] of a given array of length N?

Consider Array 1 2 3 5 5
for query [L R D]=[1 5 1], output is 3
for query [1 1 1], output is 1
Also there are Q queries to this questions where 0<Q<10^6 so Brute Forces is not working !
Note: indexing starts at 1
Note2: D represents common difference of AP
Your question can be simplified to find the length longest arithmetic progression with the given common difference, with the array = [L, R] and sizen = R - L + 1
You can then find to solution from Geeksforgeeks
Naive Approach: For each element calculate the length of the longest AP it could form and print the maximum among them. It involves
O(n^2) time complexity.
An efficient approach is to use Hashing.
Create a map where the key is the starting element of an AP and its
value is the number of elements in that AP. The idea is to update the
value at key ‘a'(which is at index i and whose starting element is not
yet there in the map) by 1 whenever the element at index j(>i) could
be in the AP of ‘a'(as the starting element). Then we print the
maximum value among all values in the map.

Is it possible to query number of distinct integers in a range in O(lg N)?

I have read through some tutorials about two common data structure which can achieve range update and query in O(lg N): Segment tree and Binary Indexed Tree (BIT / Fenwick Tree).
Most of the examples I have found is about some associative and commutative operation like "Sum of integers in a range", "XOR integers in a range", etc.
I wonder if these two data structures (or any other data structures / algorithm, please propose) can achieve the below query in O(lg N)? (If no, how about O(sqrt N))
Given an array of integer A, query the number of distinct integer in a range [l,r]
PS: Assuming the number of available integer is ~ 10^5, so used[color] = true or bitmask is not possible
For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range index is 0-based.
Yes, this is possible to do in O(log n), even if you should answer queries online. However, this requires some rather complex techniques.
First, let's solve the following problem: given an array, answer the queries of form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure which is sometimes called Merge Sort Tree. It is basically a segment tree where each node stores a sorted subarray. This structure requires O(n log n) memory (because there are log n layers and each of them requires storing n numbers). It is built in O(n log n) as well: you just go bottom-up and for each inner vertex merge sorted lists of its children.
Here is an example. Say 1 5 2 6 8 4 7 1 be an original array.
|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|
Now you can answer for those queries in O(log^2 n time): just make a reqular query to a segment tree (traversing O(log n) nodes) and make a binary search to know how many numbers <= x are there in that node (additional O(log n) from here).
This can be speed up to O(log n) using Fractional Cascading technique, which basically allows you to do the binary search not in each node but only in the root. However it is complex enough to be described in the post.
Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.
Example (1-indexed):
a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞
Now let's count numbers in [l, r]. For each unique number we'll count its last occurrence in the segment. With b_i notion you can see that the occurrence of the number is last if and only if b_i > r. So the problem boils down to "how many numbers > r are there in the segment [l, r]" which is trivially reduced to what I described above.
Hope it helps.
If you're willing to answer queries offline, then plain old Segment Trees/ BIT can still help.
Sort queries based on r values.
Make a Segment Tree for range sum queries [0, n]
For each value in input array from left to right:
Increment by 1 at current index i in the segment tree.
For current element, if it's been seen before, decrement by 1 in
segment tree at it's previous position.
Answer queries ending at current index i, by querying for sum in range [l, r == i].
The idea in short is to keep marking rightward indexes, the latest occurrence of each individual element, and setting previous occurrences back to 0. The sum of range would give the count of unique elements.
Overall time complexity again would be nLogn.
There is a well-known offline method to solve this problem. If you have n size array and q queries on it and in each query, you need to know the count of distinct number in that range then you can solve this whole thing in O(n log n + q log n) time complexity. Which is similar to solve every query in O(log n) time.
Let's solve the problem using the RSQ( Range sum query) technique. For the RSQ technique, you can use a segment tree or BIT. Let's discuss the segment tree technique.
For solving this problem you need an offline technique and a segment tree. Now, what is an offline technique?? The offline technique is doing something offline. In problem-solving an example of the offline technique is, You take input all queries first and then reorder them is a way so that you can answer them correctly and easily and finally output the answers in the given input order.
Solution Idea:
First, take input for a test case and store the given n numbers in an array. Let the array name is array[] and take input q queries and store them in a vector v. where every element of v hold three field- l, r, idx. where l is the start point of a query and r is the endpoint of a query and idx is the number of queries. like this one is n^th query.
Now sort the vector v on the basis of the endpoint of a query.
Let we have a segment tree which can store the information of at least 10^5 element. and we also have an areay called last[100005]. which stores the last position of a number in the array[].
Initially, all elements of the tree are zero and all elements of the last are -1.
now run a loop on the array[]. now inside the loop, you have to check this thing for every index of array[].
last[array[i]] is -1 or not? if it is -1 then write last[array[i]]=i and call update() function of which will add +1 in the last[array[i]] th position of segment tree.
if last[array[i]] is not -1 then call update() function of segment tree which will subtract 1 or add -1 in the last[array[i]] th position of segment tree. Now you need to store current position as last position for future. so that you need to write last[array[i]]=i and call update() function which will add +1 in the last[array[i]] th position of segment tree.
Now you have to check whether a query is finished in the current index. that is if(v[current].r==i). if this is true then call query() function of segment tree which will return and sum of the range v[current].l to v[current].r and store the result in the v[current].idx^th index of the answer[] array. you also need to increment the value of current by 1.
6. Now print the answer[] array which contains your final answer in the given input order.
the complexity of the algorithm is O(n log n).
The given problem can also be solved using Mo's (offline) algorithm also called Square Root decomposition algorithm.
Overall time complexity is O(N*SQRT(N)).
Refer mos-algorithm for detailed explanation, it even has complexity analysis and a SPOJ problem that can be solved with this approach.
kd-trees provide range queries in O(logn), where n is the number of points.
If you want faster query than a kd-tree, and you are willing to pay the memory cost, then Range trees are your friends, offering a query of:
O(logdn + k)
where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.
Bentley is an important name when it comes to this field. :)

Inversion distance

First of all let's recall definition of inversion.
Inversion of some sequence S which contains numbers is situation when S[i] > S[j] and i < j or frankly speaking it's situation when we have disordered elements. For instance for sequence:
1 4 3 7 5 6 2
We have following inversions (4,3), (4,2), (3,2), (7,5), etc.
We state problem as follows: distance of inversion is maximum (in terms of indexing) distance between two values that are inversion. For out example we can perform human-brain searching that gives us pair (4,2) <=> (S[1], S[6]) and there for index distance is 6-1 = 5 which is maximum possible for this case.
This problem can be solved trivial way in O(n^2) by finding all inversions and keeping max distance (or updated if we find better option)
We can also perform better inversion searching using merge sort and therefore do the same in O(nlogn). Is there any possibility for existence of O(n) algorithm? Take in mind that we just want maximum distance, we don't want to find all inversions. Elaborate please.
Yes, O(n) algorithm is possible.
We could extract strictly increasing subsequence with greedy algorithm:
source: 1 4 3 7 5 6 2
strictly increasing subsequence: 1 4 7
Then we could extract strictly decreasing subsequence going backwards:
source: 1 4 3 7 5 6 2
strictly decreasing subsequence: 1 2
Note that after this strictly decreasing subsequence is found we could interpret it as increasing sequence (in normal direction).
For each element of these subsequences we need to store their index in source sequence.
Now "inversion distance" could be found by merging these two subsequences (similar to merge sort mentioned in OP, but only one merge pass is needed):
merge 1 & 1 ... no inversion, advance both indices
merge 4 & 2 ... inversion found, distance=5, should advance second index,
but here is end of subsequence, so we are done, max distance = 5
Maybe my idea is the same as #Evgeny.
Here is the explanation:
make a strictly increasing array from the beginning we call it array1
make a strictly decreasing array from the ending which is array2 (But keep the values in increasing order)
***Keep track of original indexes of the values of both arrays.
Now start from the beginning of both arrays.
Do this loop following untill array1 or array2 checking is complete
While( array1[index] > arry2[index] )
{
check the original distance between array1 index and arry2 index.
Update result accordingly.
increase array2 index.
}
increase both array index
Continue with the loop
At the end of this process you will have the maximum result. Proof of this solution is not that complex, you can try it yourself.

Number of Ranges that lie completely in a particular Range

Let us say we have n ranges each specified by
[l_i,r_i] where 1<=i<=n
And we have a query of type [L,R] in which we have to find the number of ranges among those n ranges that lie completely in the given range i.e. [L,R]
Example:
n ranges are:
here n is 2.
2 4
3 3
for query 3 5, output should be 1.
for query 2 5 output should be 2.
I know method for O(m*n), where n is number of ranges and m is number of queries, but feels like there must be a more efficient implementation.
Yes, there is. The data structure that you want is called an interval tree.
For the following solution each query has complexity O(log(n)) but it requires to store (n^2/2) ranges and preprocessing requires to sort the ranges (complexity O(n*n log(n)) using quicksort):
Preprocessing:
Sort the n ranges by L_i
For each range [L_i,R_i]:
a) find the subset S_i of all ranges [L_j,R_j] for which L_j >= L_i is true.
b) sort S_i by R_j
Query [L,R]:
Find the S_i with the smallest L_i so that L_i >=L using binary search (complexity O(log(n)). S_i contains all candidate ranges.
Find in S_i the largest R_j so that R_j <= R again using binary search. The index of this entry corresponds to the number of ranges which satisfy your condition.
Lets get back to your example:
S_1: [3 3] [2 4]
S_2: [3 3]
query 3 5 ends up in S_2. Apparently, there is one entry satisfying also the second condition
query 2 5 ends up in S_2. The second entry is the largest one satisfying the R condition and thus the result is 2

How to find number of expected swaps in bubble sort in better than O(n^2) time

I am stuck on problem http://www.codechef.com/JULY12/problems/LEBOBBLE
Here it is required to find number of expected swaps.
I tried an O(n^2) solution but it is timing out.
The code is like:
swaps = 0
for(i = 0;i < n-1;i++)
for(j = i+1;j<n;j++)
{
swaps += expected swap of A[i] and A[j]
}
Since probabilities of elements are varying, so every pair is needed to be compared. So according to me the above code snippet must be most efficient but it is timing out.
Can it be done in O(nlogn) or it any complexity better than O(n^2).
Give me any hint if possible.
Alright, let's think about this.
We realize that every number needs to be eventually swapped with every number after it that's less than it, sooner or later. Thus, the total number of swaps for a given number is the total number of numbers after it which are less than it. However, this is still O(n^2) time.
For every pass of the outer loop of bubble sort, one element gets put in the correct position. Without loss of generality, we'll say that for every pass, the largest element remaining gets sorted to the end of the list.
So, in the first pass of the outer loop, the largest number is put at the end. This takes q swaps, where q is the number of positions the number started away from the final position.
Thus, we can say that it will take q1+q2+ ... +qn swaps to complete this bubble sort. However, keep in mind that with every swap, one number will be taken either one position closer or one position farther away from their final positions. In our specific case, if a number is in front of a larger number, and at or in front of its correct position, one more swap will be required. However, if a number is behind a larger number and behind it's correct position, one less swap will be required.
We can see that this is true with the following example:
5 3 1 2 4
=> 3 5 1 2 4
=> 3 1 5 2 4
=> 3 1 2 5 4
=> 3 1 2 4 5
=> 1 3 2 4 5
=> 1 2 3 4 5 (6 swaps total)
"5" moves 4 spaces. "3" moves 1 space. "1" moves 2 spaces. "2" moves 2 spaces. "4" moves 1 space. Total: 10 spaces.
Note that 3 is behind 5 and in front of its correct position. Thus one more swap will be needed. 1 and 2 are behind 3 and 5 -- four less swaps will be needed. 4 is behind 5 and behind its correct position, thus one less swap will be needed. We can see now that the expected value of 6 matches the actual value.
We can compute Σq by sorting the list first, keeping the original positions of each of the elements in memory while doing the sort. This is possible in O(nlogn + n) time.
We can also see what numbers are behind what other numbers, but this is impossible to do in faster than O(n^2) time. However, we can get a faster solution.
Every swap effectively moves two numbers number needs to their correct positions, but some swaps actually do nothing, because one be eventually swapped with every number gets closerafter it that's less than it, and another gets farthersooner or later. The first swap in our previous exampleThus, between "3" and "5" is the only example of this in our example.
We have to calculate how many total number of said swaps that there are. This is left as an exercise to the reader, but here's one last hint: you only have to loop through the first half of the list. Though this for a given number is still, in the end O(n^2), we only have to do O(n^2) operations on the first half total number of the list, making numbers after it much faster overall.
Use divide and conquer
divide: size of sequence n to two lists of size n/2
conquer: count recursively two lists
combine: this is a trick part (to do it in linear time)
combine use merge-and-count. Suppose the two lists are A, B. They are already sorted. Produce an output list L from A, B while also counting the number of inversions, (a,b) where a is-in A, b is-in B and a > b.
The idea is similar to "merge" in merge-sort. Merge two sorted lists into one output list, but we also count the inversion.
Everytime a_i is appended to the output, no new inversions are encountered, since a_i is smaller than everything left in list B. If b_j is appended to the output, then it is smaller than all the remaining items in A, we increase the number of count of inversions by the number of elements remaining in A.
merge-and-count(A,B)
; A,B two input lists (sorted)
; C output list
; i,j current pointers to each list, start at beginning
; a_i, b_j elements pointed by i, j
; count number of inversion, initially 0
while A,B != empty
append min(a_i,b_j) to C
if b_j < a_i
count += number of element remaining in A
j++
else
i++
; now one list is empty
append the remainder of the list to C
return count, C
With merge-and-count, we can design the count inversion algorithm as follows:
sort-and-count(L)
if L has one element return 0
else
divide L into A, B
(rA, A) = sort-and-count(A)
(rB, B) = sort-and-count(B)
(r, L) = merge-and-count(A,B)
return r = rA+rB+r, L
T(n) = O(n lg n)

Resources