Number of different marks - algorithm

I came across an interesting problem and I can't solve it in any complexity better than O(qn):
There are n persons in a row. Initially every person in this row has some value, let's say that the i-th person has value a_i. These values are pairwise distinct.
Every person gets a mark. There are two conditions:
If a_i < a_j, then the j-th person can't get a worse mark than the i-th person.
If i < j, then the j-th person can't get a worse mark than the i-th person (this condition tells us that the sequence of marks is non-decreasing).
There are q operations. In every operation two persons are swapped (they swap their values).
After each operation you have to tell the maximal number of different marks that these n persons can get.
Do you have any idea?

Consider any two groups, J and I (with j < i and a_j < a_i for all j in J and i in I). In any swap scenario, a_i becomes the new max for J and a_j the new min for I, and J gets extended to the right at least up to and including i.
Now if there were any group of indices to the right of i whose values were all greater than the values in the left segment of I up to i, this group would not have been part of I, but rather its own group or part of another group denoting a higher mark.
So this kind of swap reduces the mark count by the number of groups between J and I and merges the groups from J up to I.
Now consider an in-group swap. The only time a mark would be added is if a_i and a_j (j < i) are the minimum and maximum respectively of two adjacent segments, leading to the group splitting into those two segments. Banana123 showed in a comment below that this condition is not sufficient (e.g., 3,6,4,5,1,2 => 3,1,4,5,6,2). We can address this by also checking before the swap that the second smallest value on the i side is greater than the second largest value on the j side.
Banana123 also showed in a comment below that more than one mark could be added in this instance, for example 6,2,3,4,5,1. We can handle this by keeping in a segment tree a record of (min, max, group count), where the groups correspond to a count of sequential maxes.
Example 1:
(1,6,1) // (min, max, group_count)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 2 and 5. Updates happen in O(log n) time along the intervals containing 2 and 5.
To add the group counts of a larger interval, the left group's max must be lower than the right group's min. But if it's not, as in the second example, we must check one level down in the tree.
(1,6,1)
(2,6,1) (1,5,1)
(6,6,1) (2,3,2) (4,4,1) (1,5,1)
6 2 3 4 5 1
Swap 1 and 6:
(1,6,6)
(1,3,3) (4,6,3)
(1,1,1) (2,3,2) (4,4,1) (5,6,2)
1 2 3 4 5 6
Example 2:
(1,6,1)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 1 and 6. On the right side, we have two groups where the left group's max is greater than the right group's min, (4,4,1) (2,6,2). To get an accurate mark count, we go down a level and move 2 into 4's group to arrive at a count of two marks. A similar examination is then done in the level before the top.
(1,6,3)
(1,5,2) (2,6,2)
(1,1,1) (3,5,1) (4,4,1) (2,6,2)
1 5 3 4 2 6

Here's an O(n log n) solution:
If n = 0 or n = 1, then there are n distinct marks.
Otherwise, consider the two "halves" of the list, LEFT = [1, n/2] and RIGHT = [n/2 + 1, n]. (If the list has an odd number of elements, the middle element can go in either half, it doesn't matter.)
Find the greatest value in LEFT — call it aLEFT_MAX — and the least value in the second half — call it aRIGHT_MIN.
If aLEFT_MAX < aRIGHT_MIN, then there's no need for any marks to overlap between the two, so you can just recurse into each half and return the sum of the two results.
Otherwise, we know that there's some segment, extending at least from LEFT_MAX to RIGHT_MIN, where all elements have to have the same mark.
To find the leftmost extent of this segment, we can scan leftward from RIGHT_MIN down to 1, keeping track of the minimum value we've seen so far and the position of the leftmost element we've found to be greater than some further-rightward value. (This can actually be optimized a bit more, but I don't think we can improve the algorithmic complexity by doing so, so I won't worry about that.) We do the converse to find the rightmost extent of the segment.
Suppose the segment in question extends from LEFTMOST to RIGHTMOST. Then we just need to recursively compute the number of distinct marks in [1, LEFTMOST) and in (RIGHTMOST, n], and return the sum of the two results plus 1.
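A minimal sketch of this recursion in Python (my own transcription; the segment-growing loop below is written naively for clarity, so this sketch does not by itself achieve the O(n log n) bound, which needs the incremental scan described above):

def count_marks(a):
    def rec(lo, hi):  # number of distinct marks for the half-open slice a[lo:hi]
        if hi - lo <= 1:
            return hi - lo
        mid = (lo + hi) // 2
        p = max(range(lo, mid), key=a.__getitem__)   # position of LEFT_MAX
        q = min(range(mid, hi), key=a.__getitem__)   # position of RIGHT_MIN
        if a[p] < a[q]:
            return rec(lo, mid) + rec(mid, hi)
        # grow the forced single-mark segment [l, r] to a fixpoint
        l, r = p, q
        while True:
            seg_min, seg_max = min(a[l:r + 1]), max(a[l:r + 1])
            if l > lo and max(a[lo:l]) > seg_min:
                l -= 1          # a larger value to the left must share the mark
            elif r + 1 < hi and min(a[r + 1:hi]) < seg_max:
                r += 1          # a smaller value to the right must share the mark
            else:
                break
        return rec(lo, l) + 1 + rec(r + 1, hi)

    return rec(0, len(a))

print(count_marks([6, 5, 3, 4, 2, 1]))  # 1
print(count_marks([1, 2, 3, 4, 5, 6]))  # 6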

I wasn't able to get a complete solution, but here are a few ideas about what can and can't be done.
First: it's impossible to find the number of marks in O(log n) from the array alone - otherwise you could use your algorithm to check whether the array is sorted faster than in O(n), and that's clearly impossible.
General idea: spend O(n log n) to create some additional data which lets you compute the number of marks in O(log n) time, and which can be updated after a swap in O(log n) time. One possibly useful piece to include is the current number of marks (i.e., finding how the number of marks changed may be easier than computing it from scratch).
Since update time is O(log n), you can't afford to store anything mark-related (such as "the last person with the same mark") for each person - otherwise taking the array 1 2 3 ... n and repeatedly swapping the first and last elements would require you to update this additional data for every element in the array.
Geometric interpretation: taking your sequence 4 1 3 2 5 7 6 8 as an example, we can draw points (i, a_i):
 8 |                      *
 7 |                *
 6 |                   *
 5 |             *
 4 | *
 3 |          *
 2 |             *
 1 |    *
   +------------------------
     1  2  3  4  5  6  7  8
Here the points are covered by the four squares [1..4]x[1..4], [5..5]x[5..5], [6..7]x[6..7] and [8..8]x[8..8], each with its diagonal on the line a_i = i.
In other words, you need to cover all points by a maximal number of squares. Corollary: exchanging points from different squares a and b reduces the total number of squares by |a-b|.
Index squares approach: let n = 2^k (otherwise you can add fewer than n fictional persons who will never participate in exchanges), and let 0 <= a_i < n. We can create O(n log n) objects - "index squares" - which are "responsible" for points (i, a_i) with a*2^b <= i < (a+1)*2^b or a*2^b <= a_i < (a+1)*2^b (on our plane, this would look like a cross with its center on the diagonal line a_i = i). Every swap affects only O(log n) index squares.
The problem is, I can't find what information to store for each index square so that it would allow finding the number of marks fast enough. All I have is a feeling that such an approach may be effective.
Hope this helps.

Let's normalize the problem first, so that a_i is in the range 0 to n-1 (this can be achieved in O(n log n) by sorting a, but it only has to be done once, so we are fine).
function normalize(a) {
  let b = [];
  for (let i = 0; i < a.length; i++)
    b[i] = [i, a[i]];
  b.sort(function(x, y) {
    return x[1] < y[1] ? -1 : 1;
  });
  for (let i = 0; i < a.length; i++)
    a[b[i][0]] = i;
  return a;
}
To get the maximal number of marks we can count how many times
i + 1 == mex(a[0..i]), for integer i in [0, n-1]
a[0..i] denotes the sub-array of all the values from index 0 to i.
mex() is the minimum excludant, which is the smallest value missing from the sequence 0, 1, 2, 3, ... For example, for a = [2,0,1,3] the condition holds at i = 2 (mex = 3) and i = 3 (mex = 4), so there are 2 marks.
This allows us to solve a single instance of the problem (ignoring the swaps for the moment) in O(n), e.g. by using the following algorithm:
// assuming values are normalized to be element of [0, n-1]
function maxMarks(a) {
  let visited = new Array(a.length + 1);
  let smallestMissing = 0, marks = 0;
  for (let i = 0; i < a.length; i++) {
    visited[a[i]] = true;
    if (a[i] == smallestMissing) {
      // advance the mex past all values seen so far
      smallestMissing++;
      while (visited[smallestMissing])
        smallestMissing++;
      if (i + 1 == smallestMissing)
        marks++;
    }
  }
  return marks;
}
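For example, maxMarks(normalize([4,1,3,2,5,7,6,8])) returns 4 (the prefixes ending at indices 3, 4, 6 and 7 are complete).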
If we swap the values at indices x and y (x < y), then the mex of the prefixes ending at i < x or i >= y doesn't change, so only the prefixes in between need recomputation. That is an optimization, but unfortunately it doesn't improve the complexity: it is still O(qn).
We can observe that the hits (where the mark count increases) are always at the beginning of an increasing sequence, and that all matches within the same sequence have to satisfy a[i] == i, except for the first one, but I couldn't derive an algorithm from this yet:
0 6 2 3 4 5 1 7
*--|-------|*-*
3 0 2 1 4 6 5 7
-|---|*-*--|*-*

Related

Find minimum no of swaps required to move all 1's together in a binary array

E.g. array: [0,1,0,1,1,0,0]
Final array: [0,0,1,1,1,0,0], so swaps required = 1.
I need an O(n) or O(n log n) solution.
You can do it in O(n):
In one pass through the data, determine the number of 1s. Call this k (it is just the sum of the elements in the list).
In a second pass through the data, use a sliding window of width k to find the number m, which is the maximum number of 1s in any window of size k. Since this is homework, I'll leave the details to you, but it can be done in O(n) (a sketch follows the edit note below).
Then: the minimal number of swaps is k-m.
EDIT This answer assumes that the distance between the two swapped elements can be arbitrary. If only two neighboring cells can be swapped, see #JohnColeman's answer.
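A minimal sketch of the sliding-window count in Python (the function name is mine):

def min_swaps_to_group(a):
    k = sum(a)                    # total number of 1s
    if k == 0:
        return 0
    ones = sum(a[:k])             # 1s inside the initial window of width k
    best = ones
    for i in range(k, len(a)):    # slide the window one step at a time
        ones += a[i] - a[i - k]
        best = max(best, ones)
    return k - best               # one swap per 1 missing from the best window

min_swaps_to_group([0, 1, 0, 1, 1, 0, 0]) returns 1, matching the example above.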
This can be done easily in linear time.
Suppose that the array is called a and its size is n.
Allocate integer array b of size n. Walk from left to right, save in b[i] the number of ones seen so far in a[0], ..., a[i].
Allocate integer array c of size n. Walk from right to left, save in c[i] the number of ones seen so far in a[i], ..., a[n - 1].
Initialize integer res = 0. Walk through a one last time. For each i with a[i] = 0, add res += min(b[i], c[i]).
Output res.
Why does this work? Each zero must somehow bubble out of the block of ones. So, every zero must either "bubble up" past all ones to the right of it, or "bubble down" past all ones to the left of it. Swapping zeros with zeros is a waste of time, therefore the process of zero-eviction from the homogeneous block of ones must start with those zeros that are as close to the first 1 or the last 1 as possible. This means that every zero will have to make exactly min(b[i], c[i]) swaps with 1s to exit the homogeneous block of ones.
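A sketch in Python (names b, c, res as above):

def min_swaps(a):
    n = len(a)
    b = [0] * n                   # b[i]: ones in a[0..i]
    ones = 0
    for i in range(n):
        ones += a[i]
        b[i] = ones
    c = [0] * n                   # c[i]: ones in a[i..n-1]
    ones = 0
    for i in range(n - 1, -1, -1):
        ones += a[i]
        c[i] = ones
    # each zero bubbles out past the nearer side's ones
    return sum(min(b[i], c[i]) for i in range(n) if a[i] == 0)

For the example below it returns 7.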
Example:
a = [0,1,0,1,1,0,1,0,1,0,1,0]
b = [0,1,1,2,3,3,4,4,5,5,6,6]
c = [6,6,5,5,4,3,3,2,2,1,1,0]
now, min(b,c) would be (no need to compute it explicitly):
m = [0,1,1,2,3,3,3,2,2,1,1,0]
     ^   ^     ^   ^   ^   ^
The interesting values of min(b[i], c[i]) which correspond to 0s are marked with ^. Summing it up yields: 0 + 1 + 3 + 2 + 1 + 0 = 7.
Indeed:
[0,1,0,1,1,0,1,0,1,0,1,0]
[0,0,1,1,1,0,1,0,1,0,1,0] 1
[0,0,1,1,1,0,1,0,1,1,0,0] 2 = 1 + 1
[0,0,1,1,1,0,1,1,0,1,0,0] 3
[0,0,1,1,1,0,1,1,1,0,0,0] 4 = 1 + 1 + 2
[0,0,1,1,0,1,1,1,1,0,0,0] 5
[0,0,1,0,1,1,1,1,1,0,0,0] 6
[0,0,0,1,1,1,1,1,1,0,0,0] 7 = 1 + 1 + 2 + 3
done: block of ones homogeneous.
Runtime for computation of the number res of swaps is obviously O(n). (Note: it does NOT say that the number of swaps is itself O(n)).
Let's consider each 1 as a potential static point, i.e., a 1 that keeps its position while the block forms around it. Then the cost for the left side of the static point would be the number of 1's to the left minus the number of 1's already inside the section they would naturally extend into (the length of which is the number of 1's on the left). Similarly for the right side.
Now find a way to do it efficiently for each potential static 1 :) Hint: think about how we could update those values as we iterate across the array (a prefix-sum sketch follows the example below).
1 0 1 0 1 1 0 0 1 0 1 1
          x               potential static point
    <-----                would extend to (left)
           ----->         would extend to (right)
left cost at x:  3 - 2 = 1
right cost at x: 3 - 1 = 2
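One way to realize the hint with prefix sums (a sketch; the function name is mine, and each unit of cost is one swap that brings a 1 into the block):

def min_swaps_static(a):
    n = len(a)
    pre = [0] * (n + 1)           # pre[i]: number of ones in a[:i]
    for i in range(n):
        pre[i + 1] = pre[i] + a[i]
    best = None
    for i in range(n):
        if a[i] != 1:
            continue
        left = pre[i]                               # ones strictly left of i
        right = pre[n] - pre[i + 1]                 # ones strictly right of i
        in_left = pre[i] - pre[i - left]            # ones already in the left extension
        in_right = pre[i + 1 + right] - pre[i + 1]  # ones already in the right extension
        cost = (left - in_left) + (right - in_right)
        if best is None or cost < best:
            best = cost
    return best if best is not None else 0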

Maximise the minimum difference [duplicate]

We are given N elements in the form of an array A. Now we have to choose K indices from the N given indices such that for any 2 chosen indices i and j the minimum value of |A[i]-A[j]| is as large as possible. We need to report this maximum value.
Let's take an example: let N=5 and K=2 and the array be [1,5,3,7,11]; then the answer is 10, as we can simply choose the first and last positions and the difference = 11-1 = 10.
Example 2: let N=7 and K=3 and the array A be [3 9 6 11 15 20 23]; then the answer will be 8, as we can select [3,11,23] or [3,15,23].
Now given N, K and the array A we need to find this maximum difference.
We are given that 1 ≤ N ≤ 10^5 and 1 ≤ S ≤ 10^7
Let's sort the array.
Now we can do a binary search over the answer.
For a fixed candidate x, we can just pick the elements greedily (iterating over the sorted array and taking each element if it is at least x greater than the last one taken). If the number of elements we have picked is not less than K, then x is feasible. Otherwise, it is not.
The time complexity is O(N * log N + N * log (MAX_ELEMENT - MIN_ELEMENT))
The pseudocode, as runnable Python:

def is_feasible(x, a, k):
    # greedily take elements that are at least x apart
    cnt = 1
    last = a[0]
    for i in range(1, len(a)):
        if a[i] - last >= x:
            last = a[i]
            cnt += 1
    return cnt >= k

a.sort()
low = 0                      # always feasible
high = a[-1] - a[0] + 1      # never feasible
while high - low > 1:        # invariant: low is feasible, high is not
    mid = low + (high - low) // 2
    if is_feasible(mid, a, k):
        low = mid
    else:
        high = mid
print(low)
I think this can be dealt with as a dynamic programming problem. Start off by sorting A, and then the problem is to mark K elements in A such that the minimum difference between adjacent marked items is as large as possible. As a starter, you can always mark the first and last elements.
Moving from left to right, work out at each position, for each k, the largest minimum difference you can get by marking k elements in the sub-array terminating at this position. You can work out the largest minimum difference for k items terminating at a given position by considering the largest minimum difference for k-1 items terminating at each position to the left of the one you are working on. The obvious thing to do is to consider each possible position as the end of a stretch of k-1 items with its minimum difference, but you may be able to do a binary search here to speed things up.
Once you have worked all the way to the right hand end you know the maximum possible value for the original problem. If you need to know where to put the K elements, you can take notes as you go along so that you can backtrack to find out the elements chosen that lead to this solution, working from right to left.
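A naive transcription of this DP in Python (my own sketch, without the binary-search speed-up, so it runs in O(K*N^2)):

def max_min_gap(a, k):
    a = sorted(a)
    n = len(a)
    INF = float('inf')
    # dp[j][i]: largest minimum gap achievable by marking j elements,
    # the last of which is a[i]
    dp = [[-INF] * n for _ in range(k + 1)]
    for i in range(n):
        dp[1][i] = INF            # a single mark imposes no gap yet
    for j in range(2, k + 1):
        for i in range(n):
            for p in range(i):
                cand = min(dp[j - 1][p], a[i] - a[p])
                if cand > dp[j][i]:
                    dp[j][i] = cand
    return max(dp[k])             # best over all possible last marks

print(max_min_gap([1, 5, 3, 7, 11], 2))           # 10
print(max_min_gap([3, 9, 6, 11, 15, 20, 23], 3))  # 8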

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int K[].
The task at hand is to generate an equal-probability random number between 0 and N-1 which doesn't exist in K.
N is strictly an integer >= 0.
And K.length is < N-1. And 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1, and assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal
probability.
I could get an O(N) solution for this: first generate a random number between 0 and N - K.length - 1, and then map the generated random number to a number not in K. The second step takes the complexity to O(N). Can it be done better, in maybe O(log N)?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This will give you the number of numbers missing from the array in the range 0 to K[i].
For example K[2] = 5. So 3 elements are missing from K[0] to K[2] (2,3,4)
Hence you can decide whether you have to conduct the remaining search in the first part of array K or the next part. This is because you know r.
This search will give you a complexity of log(K.length)
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
Random number generated between 0 and N-K.length = random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing up to i = 1. Hence we search in the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element, so we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between, say, K[j] and K[j+1], then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th missing element, of which we have already accounted for 2. Hence we will choose the second element above K[2], which is 6.
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N, which gives us a point where all values to the left are < N even if N itself cannot be found (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
We also define K[-1] = -1 (consistent with D[-1] = K[-1] - (-1) = 0).
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], we need to find the rightmost occurrence of r' in D (say it occurs at position j), where r' is the largest number in D which is < r. Such an r' exists, because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r - r' + K[j].
Proof: Basically the definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and there are at least r-r' of them), and the number we seek is among them, given by K[j] + (r - r').
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log K) algorithm.
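A sketch of this search in Python (my own transcription; it uses a 0-based rank r, so the answer becomes r + j where j is the number of elements of K not exceeding the answer):

import random

def rand_not_in(N, K):
    # 0-based rank of the missing number; stand-in for uniformRand(N - |K|)
    r = random.randrange(N - len(K))
    # binary search for the smallest j with K[j] - j > r;
    # exactly j elements of K are <= the number we return
    lo, hi = 0, len(K)
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid <= r:
            lo = mid + 1
        else:
            hi = mid
    return r + lo

# N = 7, K = [0, 1, 5]: uniform over {2, 3, 4, 6}
print(rand_not_in(7, [0, 1, 5]))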
If you are running this many times, it probably pays to speed up your generation operation: O(log N) time just isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K, add it to G; if it is in K, don't add it and advance your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random

class PRNG:
    def __init__(self, K, N):
        # precompute G, the list of all numbers in [0, N) that are not in K
        self.G = []
        kptr = 0
        for i in range(N):
            if kptr < len(K) and K[kptr] == i:
                kptr += 1            # i is excluded, skip it
            else:
                self.G.append(i)     # i is pickable

    def getRand(self):
        rn = random.randint(0, len(self.G) - 1)
        return self.G[rn]

prng = PRNG([0, 1, 5], 7)
for i in range(20):
    print(prng.getRand())

Levenshtein distance from particular group of numbers

My input is three numbers - a number s and the beginning b and end e of a range, with 0 <= s,b,e <= 10^1000. The task is to find the minimal Levenshtein distance between s and all numbers in the range [b, e]. It is not necessary to find the number minimizing the distance; the minimal distance is sufficient.
Obviously I have to read the numbers as strings, because standard C++ types will not handle such large numbers. Calculating the Levenshtein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which b and e have the same number of digits. Suppose e has k >= 0 more digits than b: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is m-n if n < m, n-u if n > u, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
1. Set x to the first q digits of b.
2. Append a randomly-chosen digit d between b[q+1] and e[q+1] to x.
3. If d == b[q+1], we "hug" the lower bound:
   For i from q+2 to m:
     If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger than b[i], and there is no digit larger than 9!]
     Otherwise, flip a coin:
       Heads: Append b[i].
       Tails: Append a randomly-chosen digit d > b[i], then goto 6.
   Stop.
4. Else if d == e[q+1], we "hug" the upper bound:
   For i from q+2 to m:
     If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller than e[i], and there is no digit smaller than 0!]
     Otherwise, flip a coin:
       Heads: Append e[i].
       Tails: Append a randomly-chosen digit d < e[i], then goto 6.
   Stop.
5. Otherwise (if d is strictly between b[q+1] and e[q+1]), drop through to step 6.
6. Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 1 if the character a is in the range b..c, and 0 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j) + abs(n-i-m+j)
           else infinity

shc(i, j) = if j == q then
              min(hb(i, j-1)+1,
                  hb(i-1, j-1) + 1 - inrange(s[i], (b[j]+1)..(e[j]-1)))
            else
              min(hb(i, j-1)+1,
                  hb(i-1, j-1) + 1 - inrange(s[i], (b[j]+1)..9),
                  he(i, j-1)+1,
                  he(i-1, j-1) + 1 - inrange(s[i], 0..(e[j]-1)))
Here 1 - inrange(...) is the substitution cost: it is 0 exactly when s[i] itself is one of the digits we are allowed to choose as the break-away digit.
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levenshtein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit distance of s = 12007 and n = 12296, you already have the table for n = 12297; you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300 and you need to update the last 3 columns of the previous table, but this happens just once every 100 iterations.
In general, you have to:
update the last column on every iteration (so, length(s) cells)
update the second-to-last column too, once every 10 iterations
update the third-to-last column too, once every 100 iterations
and so on. So let L = length(s) and D = e-b. First you compute the edit distance between s and b. Then you can find the minimum Levenshtein distance over [b,e] by looping over every integer in the interval. There are D of them, so the execution time is about

D * (L + L/10 + L/100 + ...) = D * L * (1 + 1/10 + 1/100 + ...)

Now since

1 + 1/10 + 1/100 + ... <= 10/9

we have an algorithm which is O(D * L).
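A sketch of this incremental update in Python (my own transcription; it assumes b and e have the same number of digits and that e - b is small enough to iterate over):

def edit_table(s, t):
    # full (len(s)+1) x (len(t)+1) Levenshtein table
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    update_columns(d, s, t, 0)
    return d

def update_columns(d, s, t, start):
    # recompute columns start+1 .. len(t); columns <= start are still valid
    for j in range(start + 1, len(t) + 1):
        for i in range(1, len(s) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))

def min_dist_over_range(s, b, e):
    t = str(b)
    d = edit_table(s, t)
    best = d[len(s)][len(t)]
    for n in range(b + 1, e + 1):
        t2 = str(n)
        k = next(i for i in range(len(t2)) if t[i] != t2[i])  # first changed digit
        update_columns(d, s, t2, k)
        t = t2
        best = min(best, d[len(s)][len(t)])
    return best

print(min_dist_over_range("12007", 12290, 12310))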

How to find number of expected swaps in bubble sort in better than O(n^2) time

I am stuck on problem http://www.codechef.com/JULY12/problems/LEBOBBLE
Here it is required to find the expected number of swaps.
I tried an O(n^2) solution but it is timing out.
The code is like:
swaps = 0
for (i = 0; i < n-1; i++)
    for (j = i+1; j < n; j++)
    {
        swaps += expected swap of A[i] and A[j]
    }
Since the probabilities of the elements vary, every pair needs to be compared, so according to me the above code snippet must be the most efficient approach, but it is timing out.
Can it be done in O(n log n) or in any complexity better than O(n^2)?
Give me any hint if possible.
Alright, let's think about this.
We realize that every number needs to be swapped, sooner or later, with every number after it that's less than it. Thus, the total number of swaps for a given number is the count of numbers after it which are less than it. However, computing this naively is still O(n^2) time.
For every pass of the outer loop of bubble sort, one element gets put in the correct position. Without loss of generality, we'll say that for every pass, the largest element remaining gets sorted to the end of the list.
So, in the first pass of the outer loop, the largest number is put at the end. This takes q swaps, where q is the number of positions the number started away from the final position.
Thus, we can say that it will take q1+q2+ ... +qn swaps to complete this bubble sort. However, keep in mind that with every swap, one number will be moved either one position closer to or one position farther away from its final position. In our specific case, if a number is in front of a larger number, and at or in front of its correct position, one more swap will be required. However, if a number is behind a larger number and behind its correct position, one less swap will be required.
We can see that this is true with the following example:
5 3 1 2 4
=> 3 5 1 2 4
=> 3 1 5 2 4
=> 3 1 2 5 4
=> 3 1 2 4 5
=> 1 3 2 4 5
=> 1 2 3 4 5 (6 swaps total)
"5" moves 4 spaces. "3" moves 1 space. "1" moves 2 spaces. "2" moves 2 spaces. "4" moves 1 space. Total: 10 spaces.
Note that 3 is behind 5 and in front of its correct position. Thus one more swap will be needed. 1 and 2 are behind 3 and 5 -- four less swaps will be needed. 4 is behind 5 and behind its correct position, thus one less swap will be needed. We can see now that the expected value of 6 matches the actual value.
We can compute Σq by sorting the list first, keeping the original positions of each of the elements in memory while doing the sort. This is possible in O(nlogn + n) time.
We can also see what numbers are behind what other numbers, but this is impossible to do in faster than O(n^2) time. However, we can get a faster solution.
Every swap effectively moves two numbers toward their correct positions, but some swaps actually do nothing, because one number gets closer while another gets farther. The first swap in our previous example, between "3" and "5", is the only example of this in our example.
We have to calculate how many of said swaps there are. This is left as an exercise to the reader, but here's one last hint: you only have to loop through the first half of the list. Though this is still, in the end, O(n^2), we only have to do the O(n^2) work on the first half of the list, making it much faster overall.
Use divide and conquer
divide: split the sequence of size n into two lists of size n/2
conquer: count recursively in the two lists
combine: this is the tricky part (to do it in linear time)
For the combine step, use merge-and-count. Suppose the two lists are A and B. They are already sorted. Produce an output list L from A and B while also counting the number of inversions (a,b), where a is in A, b is in B and a > b.
The idea is similar to "merge" in merge-sort: merge two sorted lists into one output list, but also count the inversions along the way.
Every time a_i is appended to the output, no new inversions are encountered, since a_i is smaller than everything left in list B. If b_j is appended to the output, then it is smaller than all the remaining items in A, so we increase the inversion count by the number of elements remaining in A.
merge-and-count(A, B)
    ; A, B: two sorted input lists
    ; C: output list
    ; i, j: current pointers into each list, starting at the beginning
    ; a_i, b_j: elements pointed to by i, j
    ; count: number of inversions, initially 0
    while A, B != empty
        append min(a_i, b_j) to C
        if b_j < a_i
            count += number of elements remaining in A
            j++
        else
            i++
    ; now one list is empty
    append the remainder of the non-empty list to C
    return count, C
With merge-and-count, we can design the count inversion algorithm as follows:
sort-and-count(L)
    if L has one element
        return 0, L
    else
        divide L into A, B
        (rA, A) = sort-and-count(A)
        (rB, B) = sort-and-count(B)
        (r, L) = merge-and-count(A, B)
        return rA + rB + r, L
T(n) = O(n lg n)
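The same in runnable Python (a direct transcription):

def merge_and_count(A, B):
    # merge two sorted lists, counting pairs (a, b) with a in A, b in B, a > b
    i = j = count = 0
    C = []
    while i < len(A) and j < len(B):
        if B[j] < A[i]:
            count += len(A) - i       # B[j] is smaller than everything left in A
            C.append(B[j])
            j += 1
        else:
            C.append(A[i])
            i += 1
    C.extend(A[i:])                   # one of these two is already empty
    C.extend(B[j:])
    return count, C

def sort_and_count(L):
    if len(L) <= 1:
        return 0, L
    mid = len(L) // 2
    ra, A = sort_and_count(L[:mid])
    rb, B = sort_and_count(L[mid:])
    r, C = merge_and_count(A, B)
    return ra + rb + r, C

print(sort_and_count([5, 3, 1, 2, 4])[0])  # 6 inversions, as in the example above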
