I am trying to find an algorithm to solve the following equation:
∑ max(a_i, x) = y
in which the a_i are constants and x is the variable.
I can find an algorithm with O(n log n) time complexity as follows:
First of all, sort the a_i in O(n log n) time, and form the intervals
(−∞, a_0), (a_0, a_1), …, (a_i, a_{i+1}), …, (a_{n−1}, a_n), (a_n, ∞)
Then, for each interval, assume x belongs to that interval and solve the equation. We obtain a candidate x̂ and test whether it actually lies in the interval. If it does, we assign x̂ to x and return it; otherwise we try the next interval, until we find the solution.
The above method takes O(n log n) time due to the sort. Given the structure of the problem, I expect an O(n)-time algorithm should be possible. Is there any reference for this problem?
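For reference, here is a minimal Python sketch of the O(n log n) method just described (the scan order, the suffix-sum bookkeeping, and the assumption sum(a) < y are details chosen for illustration):

import math

def solve_sorted(a, y):
    # Sort, then test each interval in turn, as described above.
    # Assumes sum(a) < y so that the solution lies above min(a).
    b = sorted(a)
    n = len(b)
    suffix = 0.0                     # sum of the b[j] lying above the interval being tested
    for i in range(n, 0, -1):        # i = number of terms that contribute x
        x = (y - suffix) / i         # candidate for the interval [b[i-1], b[i])
        upper = b[i] if i < n else math.inf
        if b[i - 1] <= x <= upper:
            return x
        suffix += b[i - 1]
    raise ValueError("no solution above min(a); need sum(a) < y")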
First of all, this only has a solution if the sum of all a_i is smaller than y. You should check this first, because the algorithm below depends on this property.
Assume that we have chosen some pivot p from all a_i and want to calculate the x that corresponds to the interval [p, q), where q is the next larger a_i. This is:

x = (y − (sum of all a_i greater than p)) / (number of a_i less than or equal to p)

If you move p to the next larger a_i, x changes as follows:

x' = (y − (sum of all a_i greater than p) + p') / (n + 1)

where p' is the new pivot and n is the old number of a_i that are smaller than or equal to p. Under the assumption that the sum of all a_i is smaller than y, this clearly leads to a decrease of x. Similarly, if we choose a smaller p, x is increased.
Coming back to the first equation, we can observe the following: if x is smaller than p, we should choose a smaller p. If x is greater than the smallest of the larger a_i, we should choose a larger p. In every other case, we have found the right x.
This can be utilized in a quickselect-like procedure. @MvG's comment brought me onto this track. All credits for the quickselect idea go to him. Here is some pseudo code (modified version from Wikipedia):
findX(list, y)
    left := 0
    right := length(list) - 1
    sumGreater := 0     // the sum of all a_i greater than the current interval
    numSmaller := 0     // the number of all a_i smaller than the current interval
    minGreater := inf   // the minimum of all a_i greater than the current interval
    loop
        if left = right
            return (y - sumGreater) / (numSmaller + 1)
        pivotIndex := medianOfMedians(list, left, right)
        // the partition function will also sum the elements larger than the pivot,
        // count the elements smaller than the pivot, and find the minimum of the
        // larger elements
        (pivotIndex, partialSumGreater, partialNumSmaller, partialMinGreater)
            := partition(list, left, right, pivotIndex)
        x := (y - sumGreater - partialSumGreater) / (numSmaller + partialNumSmaller + 1)
        if x >= list[pivotIndex] && x < min(partialMinGreater, minGreater)
            return x
        else if x < list[pivotIndex]
            right := pivotIndex - 1
            minGreater := list[pivotIndex]
            sumGreater += partialSumGreater + list[pivotIndex]
        else
            left := pivotIndex + 1
            numSmaller += partialNumSmaller + 1
The key idea is that the partitioning function gathers some additional statistics. This does not change the time complexity of the partitioning function because it requires O(n) additional operations, leaving a total time complexity of O(n) for the partitioning function. The medianOfMedians function is also linear in time. The remaining operations in the loop are constant time. Assuming that the median of medians yields good pivots, the total time of the entire algorithm is approximately O(n + n/2 + n/4 + n/8 ...) = O(n).
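Here is a runnable Python sketch of the same idea, with a random pivot standing in for medianOfMedians (so the bound is expected rather than worst-case O(n)); the name find_x and the duplicate handling via n_eq are my own additions:

import random

def find_x(a, y):
    # Solve sum(max(a_i, x)) == y, assuming sum(a) < y.
    if sum(a) >= y:
        raise ValueError("need sum(a) < y")
    vals = list(a)
    sum_greater = 0.0               # sum of a_i already known to be greater than x
    num_smaller = 0                 # number of a_i already known to be <= x
    min_greater = float('inf')      # smallest a_i already known to be greater than x
    while vals:
        pivot = random.choice(vals)
        smaller = [v for v in vals if v < pivot]
        larger = [v for v in vals if v > pivot]
        n_eq = len(vals) - len(smaller) - len(larger)   # pivot plus any duplicates
        part_sum_greater = sum(larger)
        part_min_greater = min(larger) if larger else float('inf')
        # candidate x for the interval starting at the pivot
        x = (y - sum_greater - part_sum_greater) / (num_smaller + len(smaller) + n_eq)
        if pivot <= x <= min(part_min_greater, min_greater):
            return x
        if x < pivot:
            # the true x lies below the pivot: the pivot and everything above it keep their a_i values
            sum_greater += part_sum_greater + n_eq * pivot
            min_greater = min(min_greater, pivot)
            vals = smaller
        else:
            # the true x lies above the smallest larger a_i: everything <= pivot contributes x
            num_smaller += len(smaller) + n_eq
            vals = larger
    # not reached when sum(a) < y: the a_i defining the solution interval stays in vals until chosen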
Since comments might get deleted, I'm turning my own comments into a coherent answer. Contrary to the original question, I'm using indices 1 through n, avoiding the a0 originally used. So this is consistent one-based indexing using inclusive indices.
Assume for the moment that the b_i are the coefficients from your input, but in sorted order, so b_i ≤ b_{i+1}. As you essentially already wrote, if b_i ≤ x ≤ b_{i+1} then the result is i ⋅ x + b_{i+1} + ⋯ + b_n, since the first i terms will use the x and the other terms will use the b_j. Solving for x you get x = (y − b_{i+1} − ⋯ − b_n) / i, and putting that back into your inequality you have i ⋅ b_i ≤ y − b_{i+1} − ⋯ − b_n ≤ i ⋅ b_{i+1}. Concentrating on one of the inequalities, you want the largest i such that
i ⋅ b_i ≤ y − b_{i+1} − ⋯ − b_n (subsequently called “the inequality”)
But in order to make this work on unsorted a_i, you'd need something similar to the median of medians. That is an algorithm which achieves O(n) guaranteed worst-case behavior for the problem of selecting a median, where the typical quickselect would take O(n²) in the worst case although it usually does quite well in practice.
Actually your problem is not that different from quickselect. You can pick a pivot coefficient, and split the remainder into larger and smaller values. Then you evaluate the inequality for the pivot element. If it is satisfied, you recurse into the list of larger elements, otherwise you recurse into the list of smaller elements, until at some point you have two adjacent elements, one which satisfies the inequality and one which does not.
This is O(n²) in the worst case, since you might need O(n) recursive calls, each of them taking O(n) time to process its input. Just like the O(n²) quickselect itself is suboptimal. The median-of-medians shows that that problem can indeed be solved in O(n). So we either need to find a similar solution here, or reformulate this problem in terms of finding the median, or write some algorithm which makes use of the median in a reasonable way.
Actually Nico Schertler found a way to achieve that last option: Take the algorithm I outlined above, but choose the pivot element to be the median. That way you can guarantee that each recursive call will process at most half as much input as the previous call. Since the median of medians itself is O(n) this can be done without exceeding the O(n) bound for each recursive call.
So in pseudocode it's like this (using inclusive indices throughout):
# f: Process whole problem with coefficients a_1 through a_n
f(y, a, n) := begin
    if y < (sum of a_i for i from 1 through n):   # O(n)
        throw Error "Cannot satisfy equation"     # Or omit check and risk division by zero
    return g(a, 1, n, y)                          # O(n)
end

# g: Recursively process part of the problem, namely a_l through a_r
# Precondition: we know inequality holds for i = l - 1 and fails for i = r + 1
# a: the array as provided to f; will get modified in place
# l: left index (inclusive)
# r: right index (inclusive)
# y: (original y) - (sum of a_j for j from r + 1 through n)
g(a, l, r, y) := begin                            # process a_l through a_r           O(r-l)
    if r < l:                                     # inequality holds in r but fails in l  O(1)
        return y / r                              # compute x for the case of i = r   O(1)
    m = median(a, l, r)                           # computed using median of medians  O(r-l)
    i = floor((l + r) / 2)                        # index of median, with same tie breaks  O(1)
    partition(a, l, r, m)                         # so a_l…a_(i-1) ≤ a_i=m ≤ a_(i+1)…a_r  O(r-l)
    rhs = y - (sum of a_j for j from i + 1 to r)  # O((r-l)/2)
    if i * a_i ≤ rhs:                             # condition holds, check larger i
        return g(a, i + 1, r, y)                  # recurse in right half of list  O((r-l)/2)
    else:                                         # condition fails, check smaller i
        return g(a, l, i - 1, rhs - m)            # recurse in left half of list   O((r-l)/2)
end
It's all in the title. Suppose $X$ is an array of n floats. The empirical CDF is the function (of t):
F_n(t) = (1/n) Σ_{i=1..n} 1{X_i ≤ t}
This has to be computed for t_1 < t_2 < … < t_m (i.e. for m different, sorted values of t). My question is: what is the numerical complexity of computing this? I think it is O(n log(n)) + O(m log(n)) [sort the array, then perform m binary searches, one for each value of t],
but I may be naive. Can anyone confirm?
Edit:
Sorry for the mess. While writing the question, I realized that I was imposing some constraints that are not in the original problem. I respond to Yves's question below.
The Xi are not sorted.
The t_j are sorted and equi-spaced.
m is smaller than n, but not by orders of magnitude: typically m ~ n/4.
The given expression, a sum of N 0/1 terms, is clearly O(N).
UPDATE:
If the Xi are presorted, the function is trivially CDFi = CDF(Xi) = i/N, and the computation is in a way O(0)!
If the Xi are unsorted, you'll need to sort first in O(N.Log(N)), unless the range of the variable allows a faster sorting such as Counting sort.
If you only need to evaluate at a small number of points, say K, you can consider using the naïve summation, as K.N can beat N.Log(N).
UPDATE: (second change by the OP)
Else, sort the Xi if necessary and sort the tj if necessary. Then a single linear pass will suffice. Total complexity will be one of:
O(n.Log(n) + m.Log(m))
O(n.Log(n) + m)
O(n + m.Log(m))
O(n + m).
If m < Log(n) and the Xi are unsorted, use the naïve formula. Complexity O(m.n).
Possibly there could be better options when m>n.
UPDATE: final specs: Xi unsorted, Tj sorted, m < n.
The solution I would choose is as follows:
1) Sort the Xi.
2) "Merge" the sorted Xi and Tj. This means, progress simultaneously in the X and T lists, keeping two running indexes; make sure to always increment the index that causes the shortest move; use CDF(Tj)=i/n. This is a linear process. (Very close to a merge in mergesort.)
Global complexity is O(n.Log(n)), the merging term O(n) being absorbed in the former.
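A minimal sketch of that merge pass (my own illustration; the names are placeholders):

def ecdf_merge(X, T):
    # CDF of X evaluated at the sorted query points T, with one linear pass
    # over the sorted X values (very close to the merge step of mergesort).
    Xs = sorted(X)                 # O(n log n)
    n = len(Xs)
    out = []
    i = 0                          # number of X values <= the current t
    for t in T:                    # T is assumed sorted in increasing order
        while i < n and Xs[i] <= t:
            i += 1
        out.append(i / n)          # CDF(t) = i / n
    return out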
UPDATE: uniform sampling.
When the Tj values are equi-spaced, say Tj = T0 + D.j, you can use a histogram approach.
Allocate an array of m+1 counters, initially 0. For every Xi, compute a bin index as Floor((Xi - T0) / D). Clamp negative values to 0 and values larger than m to m. Increment that bin. In the end, every bin will tell you how many X values are in range [Tj, Tj+1[.
Compute the prefix sum of the counters. They will now tell you how many X values are smaller than Tj+1, so CDF(Tj+1) = Counter[j]/n.
[Caution, this is an unchecked sketch, can be wrong in details.]
Total computation will take n bin incrementations followed by a prefix sum on m elements, i.e. O(n) operations.
# Input data
X= [0.125, 6, 3.25, 9, 1.4375, 6, 3.125, 7]
n= len(X)

# Sampling points (1 to 6)
T0= 1
DT= 1
m= 6

# Initialize the counters: O(m)
C= [0] * m

# Accumulate the histogram: O(n)
for x in X:
    i= max(0, int((x - T0) / DT))
    if i < m:
        C[i]+= 1

# Compute the prefix sum: O(m)
for i in range(m - 1):
    C[i + 1]+= C[i]

# Reduce: O(m)
for i in range(m):
    C[i]/= float(n)

# Display
print("T=", C)
T= [0.25, 0.25, 0.5, 0.5, 0.5, 0.75]
A CDF Fn(t) is always a non-decreasing function in [0..1]. Therefore I assume your notation is saying to count the number of elements Xi <= t and return that count divided by n.
Thus if t is very large, you have n/n = 1. For very small, it's 0/n = 0 as we'd expect.
This is a poor definition of an empirical CDF. See, for example, Law, Averill M., Simulation & Modeling, 4th ed., p. 301 for some more advanced ideas.
The simplest efficient way to compute your function (given that m, the number of Fn(t) values you need, is unknown) is first to sort the inputs Xi. This requires O(n log n) time, but needs to be done only once no matter how many t values you're processing.
Let's call the sorted values Yi. Finding the count of Yi values <= t is the same as finding i such that Yi <= t < Yi+1. This can be done by binary search in O(log n) time for a given value of t. Divide by n and you have the Fn(t) value required. Of course you can repeat this m times to get the job done in O(m log n) time.
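As a concrete illustration of this binary-search variant (a sketch; bisect_right returns the number of sorted values <= t):

from bisect import bisect_right

def ecdf_binary_search(X, ts):
    Y = sorted(X)                                   # O(n log n), done once
    n = len(Y)
    return [bisect_right(Y, t) / n for t in ts]     # O(log n) per query point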
However you say your special case is m presorted values of t_j. You can find all the i values with a single pass over the Yi and simultaneously over the t_j, in the fashion of the merge operation in mergesort. With this you find all the answers in O(m + n) time.
Putting this together with the sorting cost, you have O(m + n + n log n) = O(m + n log n).
Note this is always faster than using the binary search lookup m times, O(n log n + m log n) = O((m + n) log n).
The only case you'd want to skip the presorting is when m < O(log n). This is because with no presorting, processing all the t_j needs O(mn) time - you must touch all n elements to count the number <= t_j. Consequently, if m < O(log n), then skipping the presort leads to less than O(n log n), i.e. asymptotically faster than the presort method.
Given two numbers N and p, let k be the maximum power of p such that p^k divides N! and let d = N!/(p^k). So d and p are coprimes.
How do I find d mod p? Direct iterations will be impractical as N! will be very high when N is high. A more efficient algorithm is required to find the expression.
Here is an O(N) algorithm:

long long d = 1;                 // running product mod p, with all factors of p stripped out
for (long long i = 1; i <= N; ++i)
{
    d *= i;                      // d < p before this step, so 64-bit arithmetic avoids most overflow risk
    while (d % p == 0)           // remove every factor p contributed by i
        d /= p;
    d %= p;
}
It doesn't require storing huge numbers, so it may be acceptable. I suspect that an O(p) algorithm is possible (because the factors repeat modulo p with period p), but the code would be a bit more complex.
While working on an image processing task I have come across the following problem: There are n points in the unit square with coordinates $x_i$ and $y_i$, each assigned with a positive or negative weight $w_i$. Find a rectangle such that the sum of all weights of those points lying within the rectangle is positive and maximal.
By defining a proper grid, the problem can be rephrased as finding a submatrix in an n-by-n matrix A whose sum of elements is maximal. This is also known as the "maximal subrectangle problem" and has been discussed on SO before. While a brute force approach has a run-time of O(n^5), there is a kind of tricky solution with a run-time of O(n^3). It utilizes a solution for the corresponding one-dimensional problem, called "maximal subarray problem", with an O(n) run-time.
I have implemented both algorithms in R and can solve 100s of points in a few seconds. But with thousands of points it will be much too slow, probably even when outsourcing the loops to some Fortran or C code.
Now look at the matrix A. When assuming (w/o loss of generality) that all points have different x- or y-coordinates, A has a special form: In each row and column of A there is exactly one non-zero element. For matrices with this special property I assume there should be an algorithm performing the task in O(n^2) time, or even better.
Here is an example with the optimal rectangle added:
set.seed(723)
N <- 50; w <- rnorm(N)
x <- runif(N); y <- runif(N)
clr <- ifelse (w >= 0, "blue", "red")
plot(x, y, pch = 20, col = clr, xlim = c(0, 1), ylim = c(0, 1))
rect(0.075, 0.45, 0.31, 0.95, border="gray")
You see that there can be red, i.e. negative, points in the optimal rectangle. It also shows that it will not suffice to solve the one-dimensional cases for the x- and y-coordinates separately.
I will translate the standard solution into Fortran, but I would surely like to have a more efficient algorithm at hand.
These guys (found from the wiki page) claim to have a simpler sub-cubic solution for the 2-dimensional case. It may be the one you're already aware of.
See the accepted answer for "Maximum sum subrectangle in a sparse matrix". For an nxn matrix with m non-zero elements, the solution there takes O(nm log n) time. So, for you, since you have exactly n non-zero elements, this would give O(n^2 log n) time. Probably you'll be able to handle cases with n being 50 times larger or more, vs. the standard O(n^3) solution.
The best I can do is O(n^2 log n).
If we look at the n+1 choose 2 calls made by Kadane's 2D algorithm to Kadane's 1D algorithm on an input of your type, all but O(n) successive pairs are on 1D arrays that differ only in one element. I'm going to present a divide-and-conquer variant of Kadane's 1D; by caching the outcomes of each recursive call, only the O(log n) that involve the changed array element have to be recomputed, reducing the (amortized) running time of the inner loop from Theta(n) to Theta(log n).
def maxsubarray(arr, a, b):
    # this function returns a 4-tuple
    # element 0 is the max over intervals of the form [i, j)
    # element 1 is the max over intervals of the form [i, b)
    # element 2 is the max over intervals of the form [a, j)
    # element 3 is the max over intervals of the form [a, b), i.e., sum(arr[a:b])
    n = b - a
    if n == 0:
        return (0, 0, 0, 0)
    elif n == 1:
        x = arr[a]
        y = max(x, 0)
        return (y, y, y, x)
    else:
        m = a + n // 2
        l = maxsubarray(arr, a, m)
        r = maxsubarray(arr, m, b)
        return (max(l[0], r[0], l[1] + r[2]),
                max(r[1], l[1] + r[3]),
                max(l[2], l[3] + r[2]),
                l[3] + r[3])
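To make the caching idea concrete, here is one way (my own sketch; the class and method names are assumptions) to store those 4-tuples in a segment-tree-like structure, so that changing a single array element refreshes only the O(log n) cached nodes along its root-to-leaf path:

class MaxSubarrayTree:
    # Caches the 4-tuple of maxsubarray() for every recursion interval, so a
    # single-element update only recomputes the O(log n) nodes on its path.
    def __init__(self, arr):
        self.arr = list(arr)
        self.cache = {}
        self._build(0, len(self.arr))

    def _combine(self, l, r):
        return (max(l[0], r[0], l[1] + r[2]),
                max(r[1], l[1] + r[3]),
                max(l[2], l[3] + r[2]),
                l[3] + r[3])

    def _leaf(self, a):
        x = self.arr[a]
        y = max(x, 0)
        return (y, y, y, x)

    def _build(self, a, b):
        if b - a == 0:
            t = (0, 0, 0, 0)
        elif b - a == 1:
            t = self._leaf(a)
        else:
            m = a + (b - a) // 2
            t = self._combine(self._build(a, m), self._build(m, b))
        self.cache[(a, b)] = t
        return t

    def update(self, i, value):
        # set arr[i] = value and refresh only the cached tuples whose interval contains i
        self.arr[i] = value
        self._refresh(0, len(self.arr), i)

    def _refresh(self, a, b, i):
        if b - a == 1:
            t = self._leaf(a)
        else:
            m = a + (b - a) // 2
            if i < m:
                self._refresh(a, m, i)
            else:
                self._refresh(m, b, i)
            t = self._combine(self.cache[(a, m)], self.cache[(m, b)])
        self.cache[(a, b)] = t

    def best(self):
        # maximum subarray sum (empty subarray allowed, as in maxsubarray)
        return self.cache[(0, len(self.arr))][0]

In the scan over column pairs of the 2D algorithm, each of the successive 1D problems that differ in a single element can then be handled with one update() call instead of a full recomputation.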
I have a series
S = i^m + i^(2m) + … + i^(km) (mod m)
0 <= i < m, k may be very large (up to 100,000,000), m <= 300000
I want to find the sum. I cannot apply the Geometric Progression (GP) formula because then result will have denominator and then I will have to find modular inverse which may not exist (if the denominator and m are not coprime).
So I tried an alternate approach, assuming that these powers form a cycle whose length is much smaller than k (since the computation is modular, I expected something like 2, 7, 9, 1, 2, 7, 9, 1, …) and that this cycle repeats throughout the series. Instead of iterating over all k terms, I would find the sum of the numbers in one cycle, count how many cycles the series contains, and multiply. So I first computed i^m (mod m) and then multiplied by this number again and again, taking the modulus at each step, until I reached the first element again.
But when I actually coded the algorithm, for some values of i the cycles turned out to be very large, so the computation took far too long before terminating; hence my assumption is incorrect.
So is there any other pattern we can find out? (Basically I don't want to iterate over k.)
So please give me an idea of an efficient algorithm to find the sum.
This is the algorithm I used for a similar problem I encountered:
You probably know that one can calculate the power of a number in logarithmic time. You can also do so for calculating the sum of the geometric series. Since it holds that
1 + a + a^2 + ... + a^(2*n+1) = (1 + a) * (1 + (a^2) + (a^2)^2 + ... + (a^2)^n),
you can recursively calculate the geometric series on the right hand to get the result.
This way you do not need division, so you can take the remainder of the sum (and of intermediate results) modulo any number you want.
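A small recursive Python sketch of that identity (my own illustration), computing 1 + a + a^2 + ... + a^n mod m without any division:

def geom_mod(a, n, m):
    # 1 + a + a^2 + ... + a^n (mod m)
    if n == 0:
        return 1 % m
    if n % 2 == 1:
        # odd highest exponent: 1 + a + ... + a^n = (1 + a) * (1 + a^2 + ... + (a^2)^((n-1)/2))
        return ((1 + a) * geom_mod(a * a % m, (n - 1) // 2, m)) % m
    # even highest exponent: peel off the last term and fall back to the odd case
    return (geom_mod(a, n - 1, m) + pow(a, n, m)) % m

For the series in the question, with b = pow(i, m, m), the value wanted would be (geom_mod(b, k, m) - 1) % m, since the series starts at the j = 1 term rather than the j = 0 term.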
As you've noted, doing the calculation for an arbitrary modulus m is difficult because many values might not have a multiplicative inverse mod m. However, if you can solve it for a carefully selected set of alternate moduli, you can combine them to obtain a solution mod m.
Factor m into p_1, p_2, p_3 ... p_n such that each p_i is a power of a distinct prime
Since each p is a distinct prime power, they are pairwise coprime. If we can calculate the sum of the series with respect to each modulus p_i, we can use the Chinese Remainder Theorem to reassemble them into a solution mod m.
For each prime power modulus, there are two trivial special cases:
If i^m is congruent to 0 mod p_i, the sum is trivially 0.
If i^m is congruent to 1 mod p_i, then the sum is congruent to k mod p_i.
For other values, one can apply the usual formula for the sum of a geometric sequence:
S = sum(j=0 to k, (i^m)^j) = ((i^m)^(k+1) - 1) / (i^m - 1)
TODO: Prove that (i^m - 1) is coprime to p_i, or find an alternate solution for when they have a nontrivial GCD. Hopefully the fact that p_i is a prime power and also a divisor of m will be of some use... If p_i is a divisor of i, the condition holds. If p_i is prime (as opposed to a prime power), then either the special case i^m = 1 applies, or (i^m - 1) has a multiplicative inverse.
If the geometric sum formula isn't usable for some p_i, you could rearrange the calculation so you only need to iterate from 1 to p_i instead of 1 to k, taking advantage of the fact that the terms repeat with a period no longer than p_i.
(Since your series doesn't contain a j=0 term, the value you want is actually S-1.)
This yields a set of congruences mod p_i, which satisfy the requirements of the CRT.
The procedure for combining them into a solution mod m is described in the above link, so I won't repeat it here.
This can be done via the method of repeated squaring, which is O(log(k)) time, or O(log(k)log(m)) time, if you consider m a variable.
In general, a[n] = 1 + b + b^2 + ... + b^(n-1) mod m can be computed by noting that:
a[j+k]==b^{j}a[k]+a[j]
a[2n]==(b^n+1)a[n]
The second is just a corollary of the first.
In your case, b=i^m can be computed in O(log m) time.
The following Python code implements this:

def geometric(n, b, m):
    # returns 1 + b + b^2 + ... + b^(n-1) mod m
    T = 1
    e = b % m
    total = 0
    while n > 0:
        if n & 1 == 1:
            total = (e * total + T) % m
        T = ((e + 1) * T) % m
        e = (e * e) % m
        n = n // 2
        # print('{} {} {}'.format(total, T, e))
    return total
This bit of magic has a mathematical reason: the operation on pairs defined as
(a,r) # (b,s) = (ab, as + r)
is associative, and the first rule above basically means that:
(b,1)#(b,1)#... n times ... #(b,1)=(b^n,1+b+b^2+...+b^(n-1))
Repeated squaring always works when operations are associative. In this case, the # operator is O(log(m)) time, so repeated squaring takes O(log(n)log(m)).
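To illustrate that point, here is a generic repeated-squaring sketch over an arbitrary associative operation, specialized to the pair operator # above (my own illustration; the helper names are assumptions):

def pow_assoc(op, identity, x, n):
    # computes x op x op ... op x (n times) for any associative op with the given identity
    result = identity
    while n > 0:
        if n & 1:
            result = op(result, x)
        x = op(x, x)
        n >>= 1
    return result

def geometric_via_pairs(n, b, m):
    # (a, r) # (b, s) = (a*b, a*s + r), taken mod m; the identity pair is (1, 0)
    op = lambda p, q: (p[0] * q[0] % m, (p[0] * q[1] + p[1]) % m)
    return pow_assoc(op, (1, 0), (b % m, 1), n)[1]   # second component of (b,1)^n = 1 + b + ... + b^(n-1)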
One way to look at this is via matrix exponentiation:
[[b,1],[0,1]]^n == [[b^n, 1+b+...+b^(n-1)],[0,1]]
You can use a similar method to compute (a^n-b^n)/(a-b) modulo m because matrix exponentiation gives:
[[b,1],[0,a]]^n == [[b^n,a^(n-1)+a^(n-2)b+...+ab^(n-2)+b^(n-1)],[0,a^n]]
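And a quick sketch of that last identity (my own illustration): exponentiate the 2x2 matrix mod m and read off the top-right entry.

def mat_mult(A, B, m):
    # 2x2 matrix product mod m
    return [[(A[0][0]*B[0][0] + A[0][1]*B[1][0]) % m, (A[0][0]*B[0][1] + A[0][1]*B[1][1]) % m],
            [(A[1][0]*B[0][0] + A[1][1]*B[1][0]) % m, (A[1][0]*B[0][1] + A[1][1]*B[1][1]) % m]]

def power_sum(a, b, n, m):
    # a^(n-1) + a^(n-2)*b + ... + b^(n-1) mod m, i.e. (a^n - b^n)/(a - b) without division
    M = [[b % m, 1], [0, a % m]]
    R = [[1, 0], [0, 1]]                 # identity matrix
    while n > 0:
        if n & 1:
            R = mat_mult(R, M, m)
        M = mat_mult(M, M, m)
        n >>= 1
    return R[0][1]                       # top-right entry of [[b,1],[0,a]]^n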
Based on the approach of @braindoper, a complete algorithm which calculates
1 + a + a^2 + ... +a^n mod m
looks like this in Mathematica:
geometricSeriesMod[a_, n_, m_] :=
    Module[{q = a, exp = n, factor = 1, sum = 0, temp},
        While[And[exp > 0, q != 0],
            If[EvenQ[exp],
                temp = Mod[factor*PowerMod[q, exp, m], m];
                sum = Mod[sum + temp, m];
                exp--];
            factor = Mod[Mod[1 + q, m]*factor, m];
            q = Mod[q*q, m];
            exp = Floor[exp/2];
        ];
        Return[Mod[sum + factor, m]]
    ]
Parameters:
a is the "ratio" of the series. It can be any integer (including zero and negative values).
n is the highest exponent of the series. Allowed are integers >= 0.
m is the integer modulus != 0.
Note: The algorithm performs a Mod operation after every arithmetic operation. This is essential, if you transcribe this algorithm to a language with a limited word length for integers.