Find the highest distance in vector - algorithm

I have an array of positive integers. The problem is to find the highest
distance in the vector. The distance is calculated as A[p] + A[q] + (q - p), where A is the vector, p and q are indexes, and p <= q. The complexity of the solution must be O(n). I'm able to solve this problem with an O(n^2) solution, but I can't find an O(n) algorithm for it.
Could someone help me? Thanks in advance. Which language is used for the solution doesn't matter.

Rearrange the objective as (A[p] - p) + (A[q] + q). The first term is a function only of p, and the second term is a function only of q. Thus they can be optimized separately subject to p ≤ q. As we increase q from 0 to n-1, the best choice of p can be computed from the previous best and A[q] - q.
def highest_distance(A):
    highest = float('-inf')
    max_Ap_minus_p = float('-inf')   # best A[p] - p over all p seen so far
    for q in range(len(A)):
        max_Ap_minus_p = max(max_Ap_minus_p, A[q] - q)   # p = q is allowed
        highest = max(highest, max_Ap_minus_p + (A[q] + q))
    return highest
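For a quick sanity check, here is a comparison against the O(n^2) brute force on random inputs (the helper name is mine):

import random

def highest_distance_naive(A):
    # direct O(n^2) evaluation of max over p <= q of A[p] + A[q] + (q - p)
    return max(A[p] + A[q] + (q - p)
               for q in range(len(A))
               for p in range(q + 1))

for _ in range(1000):
    A = [random.randint(1, 100) for _ in range(random.randint(1, 30))]
    assert highest_distance(A) == highest_distance_naive(A)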

Related

Updating query a[i] = max(a[i], x) and Minimum query for intervals

I want to know a fast algorithm to solve this simple problem:
You are given a sequence a[0], a[1], ..., a[N - 1].
You are given Q queries, each of type 1 or 2. Process all queries.
Type 1: you are given integers l, r, x; update a[i] = max(a[i], x) for l <= i <= r.
Type 2: you are given integers l, r; find a[l] + a[l + 1] + ... + a[r].
I only have a naive O(NQ) algorithm, so please suggest a faster one, because I want to solve it for N <= 200000, Q <= 200000.
This problem can be easily solved by an interval tree (or segment tree) with lazy tags. If you are not familiar with interval trees, you may refer to this article. The time complexity is O(Q log N).
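For concreteness, here is a minimal runnable sketch of such a tree in Python. Because the update is a[i] = max(a[i], x) combined with sum queries, the lazy tag below follows the "Segment Tree Beats" refinement (each node tracks its minimum, strict second minimum, and the multiplicity of the minimum), giving amortized O(log N) per operation for this chmax+sum combination. All class and method names are illustrative, not from a library.

import math

class ChmaxSumTree:
    def __init__(self, a):
        self.n = len(a)
        size = 4 * self.n
        self.mn = [0] * size            # minimum of the node's range
        self.mn2 = [math.inf] * size    # strictly second minimum (inf if none)
        self.cnt = [0] * size           # multiplicity of the minimum
        self.sm = [0] * size            # sum of the node's range
        self._build(1, 0, self.n - 1, a)

    def _build(self, v, lo, hi, a):
        if lo == hi:
            self.mn[v] = self.sm[v] = a[lo]
            self.cnt[v] = 1
            return
        mid = (lo + hi) // 2
        self._build(2*v, lo, mid, a)
        self._build(2*v + 1, mid + 1, hi, a)
        self._pull(v)

    def _pull(self, v):
        l, r = 2*v, 2*v + 1
        self.sm[v] = self.sm[l] + self.sm[r]
        if self.mn[l] < self.mn[r]:
            self.mn[v], self.cnt[v] = self.mn[l], self.cnt[l]
            self.mn2[v] = min(self.mn2[l], self.mn[r])
        elif self.mn[l] > self.mn[r]:
            self.mn[v], self.cnt[v] = self.mn[r], self.cnt[r]
            self.mn2[v] = min(self.mn[l], self.mn2[r])
        else:
            self.mn[v], self.cnt[v] = self.mn[l], self.cnt[l] + self.cnt[r]
            self.mn2[v] = min(self.mn2[l], self.mn2[r])

    def _apply(self, v, x):
        # raise this node's minimum to x; only called when x < mn2[v]
        if x > self.mn[v]:
            self.sm[v] += (x - self.mn[v]) * self.cnt[v]
            self.mn[v] = x

    def _push(self, v):
        self._apply(2*v, self.mn[v])
        self._apply(2*v + 1, self.mn[v])

    def chmax(self, l, r, x, v=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l or x <= self.mn[v]:
            return
        if l <= lo and hi <= r and x < self.mn2[v]:
            self._apply(v, x)          # only the node's minima rise to x
            return
        self._push(v)
        mid = (lo + hi) // 2
        self.chmax(l, r, x, 2*v, lo, mid)
        self.chmax(l, r, x, 2*v + 1, mid + 1, hi)
        self._pull(v)

    def query_sum(self, l, r, v=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            return self.sm[v]
        self._push(v)
        mid = (lo + hi) // 2
        return (self.query_sum(l, r, 2*v, lo, mid)
                + self.query_sum(l, r, 2*v + 1, mid + 1, hi))

t = ChmaxSumTree([5, 1, 3, 2])
t.chmax(0, 2, 4)           # array becomes [5, 4, 4, 2]
print(t.query_sum(0, 3))   # -> 15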

How to solve the equation sum{max(a_i, x)}=y with variable x? Is there any algorithm with O(n) time complexity?

I am trying to find an algorithm to solve the following equation:
∑ max(a_i, x) = y
in which the a_i are constants and x is the variable.
I can find an algorithm with O(n log n) time complexity as follows:
First of all, sort the a_i in O(n log n) time, and arrange the intervals
(−∞, a_0), (a_0, a_1), …, (a_i, a_(i+1)), …, (a_(n−1), a_n), (a_n, ∞)
Then, for each interval, assume x belongs to this interval, and solve the equation. We get a candidate x̂ and test whether x̂ belongs to this interval or not. If x̂ belongs to the corresponding interval, we assign x̂ to x and return x. Otherwise, we try the next interval until we find the solution.
The above method is an O(n log n) algorithm due to the sort. Given the definition of the equation-solving problem, I expect an algorithm with O(n) time complexity. Is there any reference for this problem?
First of all, this only has a solution if the sum of all a_i is smaller than y. You should check this first, because the algorithm below depends on this property.
Assume that we have chosen some pivot p from all a_i and want to calculate the x that corresponds to the interval [p, q), where q is the next larger a_i. This is:
x = (y − Σ_{a_i > p} a_i) / n
where n is the number of a_i that are smaller or equal to p. If you move p to the next larger a_i, x changes as follows:
x' = (n ⋅ x + p') / (n + 1)
where p' is the new pivot and n is the old number of a_i that are smaller or equal to p. Under the assumption that the sum of all a_i is smaller than y, this clearly leads to a decrease of x. Similarly, if we choose a smaller p, x is increased.
Coming back to the first equation, we can observe the following: if x is smaller than p, we should choose a smaller p. If x is greater than the smallest of the a_i greater than p, we should choose a larger p. In every other case, we have found the right x.
This can be utilized in a quick select procedure. #MvG's comment brought me onto this track. All credits for the quick select idea go to him. Here is some pseudo code (modified version from Wikipedia):
findX(list, y)
    left := 0
    right := length(list) - 1
    sumGreater := 0     // the sum of all a_i greater than the current interval
    numSmaller := 0     // the number of all a_i smaller than the current interval
    minGreater := inf   // the minimum of all a_i greater than the current interval
    loop
        if left = right
            return (y - sumGreater) / (numSmaller + 1)
        pivotIndex := medianOfMedians(list, left, right)
        // the partition function will also sum the elements larger than the pivot,
        // count the elements smaller than the pivot, and find the minimum of the
        // larger elements
        (pivotIndex, partialSumGreater, partialNumSmaller, partialMinGreater)
            := partition(list, left, right, pivotIndex)
        x := (y - sumGreater - partialSumGreater) / (numSmaller + partialNumSmaller + 1)
        if (x >= list[pivotIndex] && x < min(partialMinGreater, minGreater))
            return x
        else if x < list[pivotIndex]
            right := pivotIndex - 1
            minGreater := list[pivotIndex]
            sumGreater += partialSumGreater + list[pivotIndex]
        else
            left := pivotIndex + 1
            numSmaller += partialNumSmaller + 1
The key idea is that the partitioning function gathers some additional statistics. This does not change the time complexity of the partitioning function because it requires O(n) additional operations, leaving a total time complexity of O(n) for the partitioning function. The medianOfMedians function is also linear in time. The remaining operations in the loop are constant time. Assuming that the median of medians yields good pivots, the total time of the entire algorithm is approximately O(n + n/2 + n/4 + n/8 ...) = O(n).
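For reference, here is a runnable Python sketch of this procedure. It follows the pseudo code above but, for brevity, picks a random pivot instead of the median of medians, so it runs in expected rather than guaranteed O(n); the helper names are mine.

import random

def find_x(a, y):
    # Solve sum(max(a_i, x) for a_i in a) == y for x; requires y >= sum(a).
    if y < sum(a):
        raise ValueError("Cannot satisfy equation")
    items = list(a)
    sum_greater = 0              # sum of all a_i above the current interval
    num_smaller = 0              # count of all a_i below the current interval
    min_greater = float('inf')   # minimum of all a_i above the current interval
    while items:
        pivot = random.choice(items)
        smaller = [v for v in items if v < pivot]
        greater = [v for v in items if v > pivot]
        num_equal = len(items) - len(smaller) - len(greater)
        part_sum_greater = sum(greater)
        part_min_greater = min(greater) if greater else float('inf')
        # candidate x for the interval starting at the pivot
        x = (y - sum_greater - part_sum_greater) / (num_smaller + len(smaller) + num_equal)
        if pivot <= x < min(part_min_greater, min_greater):
            return x
        if x < pivot:            # true x lies below the pivot
            items = smaller
            min_greater = pivot
            sum_greater += part_sum_greater + pivot * num_equal
        else:                    # true x lies at or above the next larger a_i
            items = greater
            num_smaller += len(smaller) + num_equal
    return (y - sum_greater) / num_smaller   # every remaining a_i sits below x

# quick check: the returned x should satisfy the original equation
a = [3, 1, 4, 1, 5]
x = find_x(a, 100)
assert abs(sum(max(v, x) for v in a) - 100) < 1e-9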
Since comments might get deleted, I'm turning my own comments into a coherent answer. Contrary to the original question, I'm using indices 1 through n, avoiding the a0 originally used. So this is consistent one-based indexing using inclusive indices.
Assume for the moment that the b_i are the coefficients from your input, but in sorted order, so b_i ≤ b_(i+1). As you essentially already wrote, if b_i ≤ x ≤ b_(i+1) then the result is i ⋅ x + b_(i+1) + ⋯ + b_n, since the first i terms will use the x and the other terms will use the b_j. Solving for x you get x = (y − b_(i+1) − ⋯ − b_n) / i, and putting that back into your inequality you have i ⋅ b_i ≤ y − b_(i+1) − ⋯ − b_n ≤ i ⋅ b_(i+1). Concentrating on one of the inequalities, you want the largest i such that
i ⋅ b_i ≤ y − b_(i+1) − ⋯ − b_n       (subsequently called “the inequality”)
But in order to make this work on unsorted ai, you'd need something similar to the median of medians. That is an algorithm which achieves O(n) guaranteed worst-case behavior for the problem of selecting a median, where the typical quickselect would take O(n²) in the worst case although it usually does quite well in practice.
Actually your problem is not that different from quickselect. You can pick a pivot coefficient, and split the remainder into larger and smaller values. Then you evaluate the inequality for the pivot element. If it is satisfied, you recurse into the list of larger elements, otherwise you recurse into the list of smaller elements, until at some point you have two adjacent elements, one which satisfies the inequality and one which does not.
This is O(n²) in the worst case, since you might need O(n) recursive calls, each of them taking O(n) time to process its input. Just like the O(n²) quickselect itself is suboptimal. The median-of-medians shows that that problem can indeed be solved in O(n). So we either need to find a similar solution here, or reformulate this problem here in terms of finding the median, or write some algorithm which makes use of the median in a reasonable way.
Actually Nico Schertler found a way to achieve that last option: Take the algorithm I outlined above, but choose the pivot element to be the median. That way you can guarantee that each recursive call will process at most half as much input as the previous call. Since the median of medians itself is O(n) this can be done without exceeding the O(n) bound for each recursive call.
So in pseudocode it's like this (using inclusive indices throughout):
# f: Process whole problem with coefficients a_1 through a_n
f(y, a, n) := begin
    if y < (sum of a_i for i from 1 through n):   # O(n)
        throw Error "Cannot satisfy equation"     # Or omit check and risk division by zero
    return g(a, 1, n, y)                          # O(n)
end

# g: Recursively process part of the problem, namely a_l through a_r
# Precondition: we know inequality holds for i = l - 1 and fails for i = r + 1
# a: the array as provided to f; will get modified in place
# l: left index (inclusive)
# r: right index (inclusive)
# y: (original y) - (sum of a_j for j from r + 1 through n)
g(a, l, r, y) := begin                  # process a_l through a_r              O(r-l)
    if r < l:                           # inequality holds in r but fails in l  O(1)
        return y / r                    # compute x for the case of i = r       O(1)
    m = median(a, l, r)                 # computed using median of medians      O(r-l)
    i = floor((l + r) / 2)              # index of median, with same tie breaks O(1)
    partition(a, l, r, m)               # so a_l…a_(i-1) ≤ a_i=m ≤ a_(i+1)…a_r  O(r-l)
    rhs = y - (sum of a_j for j from i + 1 to r)                              # O((r-l)/2)
    if i * a_i ≤ rhs:                   # condition holds, check larger i
        return g(a, i + 1, r, y)        # recurse in right half of list       O((r-l)/2)
    else:                               # condition fails, check smaller i
        return g(a, l, i - 1, rhs - m)  # recurse in left half of list        O((r-l)/2)
end

Floor summing algorithm

Is there a fast algorithm that could compute
sum_{x=A}^{B} floor(p*x/q + k/l)
where p, q, k, l, A and B are integers? By "fast" I mean that it should be much faster than a simple O(B-A) loop.
Related:
If we set k = 0, there is an O(log(p) + log(q)) algorithm that solves the problem.
Here is a solution in time O(q). (I assume p and q are relatively prime; otherwise you should just reduce the fraction first.)
For each integer r in the range [0, q), you can determine the value of
n(r) = floor[p*(A + r)/q + k/l]
Every x in [A, B] can be written uniquely as x = A + r + q*y with such an r and an integer y >= 0, and since p*y is an integer, floor[p*x/q + k/l] = n(r) + p*y. Then you can re-express your sum as
sum_{r=0}^{q-1} sum_{y=0}^{y_max(r)} (n(r) + p*y),
where
y_max(r) = floor[(B - A - r)/q]
Now each inner sum can be computed in O(1) as it is fully explicit, and you get O(q) in total.
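A runnable version of this O(q) method (the function name is mine; Fraction is used so the floor stays exact for large integers):

from fractions import Fraction
from math import floor

def floor_sum(A, B, p, q, k, l):
    # computes sum over x = A..B of floor(p*x/q + k/l) in O(q) time
    total = 0
    for r in range(q):
        n_r = floor(Fraction(p * (A + r), q) + Fraction(k, l))   # n(r)
        y_max = (B - A - r) // q                                 # y_max(r)
        if y_max < 0:
            continue              # no x of the form A + r + q*y lies in [A, B]
        # inner sum: sum_{y=0}^{y_max} (n(r) + p*y)
        total += (y_max + 1) * n_r + p * y_max * (y_max + 1) // 2
    return total

# sanity check against the naive loop: 7x/4 + 2/5 = (35x + 8)/20
assert floor_sum(3, 50, 7, 4, 2, 5) == sum((35*x + 8) // 20 for x in range(3, 51))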

How can I find a faster algorithm for this special case of Longest Common Sub-sequence (LCS)?

I know the LCS problem needs time ~O(mn), where m and n are the lengths of the two sequences X and Y respectively. But my problem is a little bit easier, so I expect a faster algorithm than ~O(mn).
Here is my problem:
Input:
a positive integer Q and two sequences X = x_1, x_2, x_3, ..., x_n and Y = y_1, y_2, y_3, ..., y_n, both of length n.
Output:
True, if the length of the LCS of X and Y is at least n - Q;
False, otherwise.
The well-known algorithm costs O(n^2) here, but actually we can do better, because once more than Q elements have been eliminated from either sequence without finding a match, the result must be False. Someone said there should be an algorithm as good as O(Q*n), but I cannot figure it out.
UPDATE:
Already found an answer!
I was told I can just calculate the diagonal band of the table c[i,j], because |i-j| > Q means there are already more than Q unmatched elements in both sequences. So we only need to calculate c[i,j] when |i-j| <= Q.
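A hedged sketch of that banded DP (the function name is mine; dictionaries keep the band indexing readable, at O(n*Q) time and space):

def lcs_at_least(X, Y, Q):
    # True iff LCS(X, Y) >= n - Q, where n = len(X) = len(Y).
    # Only the diagonal band |i - j| <= Q of the usual table is computed.
    n = len(X)
    band = lambda i: range(max(0, i - Q), min(n, i + Q) + 1)
    prev = {j: 0 for j in band(0)}                 # row 0: c[0][j] = 0
    for i in range(1, n + 1):
        cur = {}
        for j in band(i):
            if j == 0:
                cur[j] = 0
                continue
            best = prev.get(j - 1, 0) + 1 if X[i - 1] == Y[j - 1] else 0
            # cells just outside the band cannot matter; treat them as -1
            cur[j] = max(best, prev.get(j, -1), cur.get(j - 1, -1))
        prev = cur
    return prev[n] >= n - Q

print(lcs_at_least("abcde", "axcxe", 2))   # True:  LCS = 3 >= 5 - 2
print(lcs_at_least("abcde", "axcxe", 1))   # False: LCS = 3 <  5 - 1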
Here is one possible way to do it:
1. Let's assume that f(prefix_len, deleted_cnt) is the leftmost position in Y such that prefix_len elements of X have been processed and exactly deleted_cnt of them were deleted. There are only O(N * Q) states, because deleted_cnt cannot exceed Q.
2. The base case is f(0, 0) = 0 (nothing was processed, thus nothing was deleted).
3. Transitions:
a) Delete the current element: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element with the leftmost element of Y that is equal to it and located after f(i, j) (let's assume that it has index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. So the only question remaining is how to find the leftmost matching element to the right of a given position. Let's precompute the pairs (position in Y, element of X) -> leftmost occurrence in Y, to the right of that position, of an element equal to that element of X, and put them into a hash table. This looks like O(n^2), but it is not: for a fixed position in Y, we never need to look more than Q + 1 positions to its right. Why? If we go further, we skip more than Q elements! So we can use this fact to examine only O(N * Q) pairs and get the desired time complexity. With this hash table, finding pos in step 3 is just one lookup. Here is pseudo code for this step:
map = EmptyHashMap()
for i = 0 ... n - 1:
    for j = i + 1 ... min(n - 1, i + q + 1):
        map[(i, Y[j])] = min(map[(i, Y[j])], j)
Unfortunately, this solution uses hash tables so it has O(N * Q) time complexity on average, not in the worst case, but it should be feasible.
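Here is a runnable sketch of the whole solution (variable names are mine). Positions are measured as "number of leading elements of Y consumed", so f(0, 0) = 0 as above:

def lcs_at_least_dp(X, Y, Q):
    # f[j] = fewest leading elements of Y consumed after processing the
    # current prefix of X with exactly j deletions; INF means unreachable
    n = len(X)
    INF = n + 1
    # nxt[(pos, c)]: leftmost index >= pos in Y holding c, scanning at most
    # Q + 1 positions ahead (a longer skip would already exceed Q mismatches)
    nxt = {}
    for pos in range(n + 1):
        for j in range(pos, min(n, pos + Q + 1)):
            if (pos, Y[j]) not in nxt:
                nxt[(pos, Y[j])] = j
    f = [0] + [INF] * Q                   # base case: f(0, 0) = 0
    for i in range(n):                    # process X[i]
        g = [INF] * (Q + 1)
        for j in range(Q + 1):
            if f[j] >= INF:
                continue
            if j + 1 <= Q:                # transition a): delete X[i]
                g[j + 1] = min(g[j + 1], f[j])
            pos = nxt.get((f[j], X[i]))   # transition b): match X[i]
            if pos is not None:
                g[j] = min(g[j], pos + 1)
        f = g
    return min(f) <= n                    # reachable with at most Q deletions

print(lcs_at_least_dp("abcde", "axcxe", 2))   # True
print(lcs_at_least_dp("abcde", "axcxe", 1))   # False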
You can also phrase this as an edit distance problem: the cost of making the strings equal must not be greater than Q; if it is greater than Q, the answer must be False.
Suppose the size of string x is m and the size of string y is n. Then we create a two-dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of x and the j-length prefix of y.
The computation of array d is done using dynamic programming, which uses the following recurrence:
d[i][0] = i, for i <= m
d[0][j] = j, for j <= n
d[i][j] = d[i - 1][j - 1], if x[i] == y[j]
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + 1), otherwise
The answer in terms of the LCS: if m > n, it is m - d[m][m - n].
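A plain implementation of that recurrence (O(m*n) time; restricting it to a diagonal band as in the accepted idea above would bring it down to O(n*Q)):

def edit_distance(x, y):
    # d[i][j] = edit distance between x[:i] and y[:j]
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]) + 1
    return d[m][n]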

How to minimise integer function that's known to be U-shaped?

Let f be a function defined on the non-negative integers n ≥ 0. Suppose f is known to be U-shaped (convex and eventually increasing). How to find its minimum? That is, m such that f(m) ≤ f(n) for all n.
Examples of U-shaped functions:
n**2 - 1000*n + 100
(1 + 1/2 + ... + 1/n) + 1000/sqrt(1+n)
Of course, a human mathematician can try to minimise these particular functions using calculus. For my computer though, I want a general search algorithm that can minimise any U-shaped function.
Those functions again, in Python, to help anyone who wants to test an algorithm.
from math import sqrt

f = lambda n: n**2 - 1000*n + 100
g = lambda n: sum(1/i for i in range(1,n+1)) + 1000/sqrt(1+n)
An answer doesn't necessarily need code (in any language); a description of an algorithm is enough. It would interest me, though, to see its answers for these specific functions.
You are probably looking for ternary search.
Ternary search will find m (and hence f(m)) in O(log N) time, where N is the number of points on the curve.
It basically takes two points m1 and m2 in the range (l, r) and then recursively discards one third of the range.
code in python (from wikipedia) :
def ternarySearch(f, left, right, absolutePrecision):
while True:
#left and right are the current bounds; the maximum is between them
if abs(right - left) < absolutePrecision:
return (left + right)/2
leftThird = (2*left + right)/3
rightThird = (left + 2*right)/3
if f(leftThird) < f(rightThird):
right = rightThird
else:
left = leftThird
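On an integer domain the thirds don't land on integers, so the probes need rounding. A hedged sketch of an integer variant (names are mine, and it assumes bracketing bounds lo and hi are known):

def ternary_search_int(f, lo, hi):
    # narrow [lo, hi] while it still contains at least 4 points
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            hi = m2      # for convex f, the minimum cannot lie right of m2
        else:
            lo = m1      # for convex f, the minimum cannot lie left of m1
    return min(range(lo, hi + 1), key=f)

print(ternary_search_int(f, 0, 10**6))   # -> 500 for the example f above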
If your function is known to be unimodal, use Fibonacci search. http://en.wikipedia.org/wiki/Fibonacci_search_technique
For a discrete domain, the way the new "test points" are chosen must be slightly adapted, as the formulas for the continuous domain don't yield integers. The working principle remains the same, though.
As regards the number of tests required, we have the following hierarchy:
#Fibonacci < #Golden < #Ternary < #Dichotomic
This also works: use binary search on the derivative to find the largest n with f'(n) <= 0.
def minimise_convex(f):
    """Given a U-shaped (convex and eventually increasing) function f, find
    its minimum over the non-negative integers. That is m such that
    f(m) <= f(n) for all n. If there exist multiple solutions, return the
    largest. Uses binary search on the discrete derivative."""
    f_prime = lambda n: (f(n) - f(n-1)) if n > 0 else 0
    return binary_search(f_prime, 0)
Where binary_search is defined, e.g. via exponential search for an upper bound followed by bisection:
def binary_search(f, t):
    """Given an increasing function f, find the greatest non-negative
    integer n such that f(n) <= t. If f(n) > t for all n, return None."""
    if f(0) > t:
        return None
    hi = 1
    while f(hi) <= t:       # exponential search: find hi with f(hi) > t
        hi *= 2
    lo = hi // 2            # invariant: f(lo) <= t < f(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f(mid) <= t:
            lo = mid
        else:
            hi = mid
    return lo

Resources