Levenstein distance from particular group of numbers - algorithm

My input are three numbers - a number s and the beginning b and end e of a range with 0 <= s,b,e <= 10^1000. The task is to find the minimal Levenstein distance between s and all numbers in range [b, e]. It is not necessary to find the number minimizing the distance, the minimal distance is sufficient.
Obviously I have to read the numbers as string, because standard C++ type will not handle such large numbers. Calculating the Levenstein distance for every number in the possibly huge range is not feasible.
Any ideas?

[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which each b and e have the same number of digits. Suppose e has k >= 0 more digits than s: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is m-n if n < m, n-u if n > u, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
Set x to the first q digits of b.
Append a randomly-chosen digit d between b[i] and e[i] to x.
If d == b[i], we "hug" the lower bound:
For i from q+1 to m:
If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger then b[i], and there is no digit larger than 9!]
Otherwise, flip a coin:
Heads: Append b[i].
Tails: Append a randomly-chosen digit d > b[i], then goto 6.
Stop.
Else if d == e[i], we "hug" the upper bound:
For i from q+1 to m:
If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller then e[i], and there is no digit smaller than 0!]
Otherwise, flip a coin:
Heads: Append e[i].
Tails: Append a randomly-chosen digit d < e[i], then goto 6.
Stop.
Otherwise (if d is strictly between b[i] and e[i]), drop through to step 6.
Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 1 if the character a is in the range b..c, and 0 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j)+abs(n-i-m+j)
else infinity
shc(i, j) = if j == q then
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..(e[j]-1)))
else
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..9),
he(i, j-1)+1, he(i-1, j-1)+inrange(s[i], (0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.

A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levestein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit-distance of s = 12001 and n = 12296, you already have the table for n = 12297, you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300 and you need to update the last 3 columns of the previous table.. but this happens just once every 100 iteration.
In general, you have to
update the last column on every iterations (so, length(s) cells)
update the second-to-last too, once every 10 iterations
update the third-to-last, too, once every 100 iterations
so let L = length(s) and D = e-b. First you compute the edit-distance between s and b. Then you can find the minimum Levenstein distance over [b,e] looping over every integer in the interval. There are D of them, so the execution time is about:
Now since
we have an algorithm wich is

Related

Number of different marks

I came across an interesting problem and I can't solve it in a good complexity (better than O(qn)):
There are n persons in a row. Initially every person in this row has some value - lets say that i-th person has value a_i. These values are pairwise distinct.
Every person gets a mark. There are two conditions:
If a_i < a_j then j-th person cant get worse mark than i-th person.
If i < j then j-th person can't get worse mark than i-th person (this condition tells us that sequence of marks is non-decreasing sequence).
There are q operations. In every operation two person are swapped (they swap their values).
After each operation you have tell what is maximal number of diffrent marks that these n persons can get.
Do you have any idea?
Consider any two groups, J and I (j < i and a_j < a_i for all j and i). In any swap scenario, a_i is the new max for J and a_j is the new min for I, and J gets extended to the right at least up to and including i.
Now if there was any group of is to the right of i whos values were all greater than the values in the left segment of I up to i, this group would not have been part of I, but rather its own group or part of another group denoting a higher mark.
So this kind of swap would reduce the mark count by the count of groups between J and I and merge groups J up to I.
Now consider an in-group swap. The only time a mark would be added is if a_i and a_j (j < i), are the minimum and maximum respectively of two adjacent segments, leading to the group splitting into those two segments. Banana123 showed in a comment below that this condition is not sufficient (e.g., 3,6,4,5,1,2 => 3,1,4,5,6,2). We can address this by also checking before the switch that the second smallest i is greater than the second largest j.
Banana123 also showed in a comment below that more than one mark could be added in this instance, for example 6,2,3,4,5,1. We can handle this by keeping in a segment tree a record of min,max and number of groups, which correspond with a count of sequential maxes.
Example 1:
(1,6,1) // (min, max, group_count)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 2 and 5. Updates happen in log(n) along the intervals containing 2 and 5.
To add group counts in a larger interval the left group's max must be lower than the right group's min. But if it's not, as in the second example, we must check one level down in the tree.
(1,6,1)
(2,6,1) (1,5,1)
(6,6,1) (2,3,2) (4,4,1) (1,5,1)
6 2 3 4 5 1
Swap 1 and 6:
(1,6,6)
(1,3,3) (4,6,3)
(1,1,1) (2,3,2) (4,4,1) (5,6,2)
1 2 3 4 5 6
Example 2:
(1,6,1)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 1 and 6. On the right side, we have two groups where the left group's max is greater than the right group's min, (4,4,1) (2,6,2). To get an accurate mark count, we go down a level and move 2 into 4's group to arrive at a count of two marks. A similar examination is then done in the level before the top.
(1,6,3)
(1,5,2) (2,6,2)
(1,1,1) (3,5,1) (4,4,1) (2,6,2)
1 5 3 4 2 6
Here's an O(n log n) solution:
If n = 0 or n = 1, then there are n distinct marks.
Otherwise, consider the two "halves" of the list, LEFT = [1, n/2] and RIGHT = [n/2 + 1, n]. (If the list has an odd number of elements, the middle element can go in either half, it doesn't matter.)
Find the greatest value in LEFT — call it aLEFT_MAX — and the least value in the second half — call it aRIGHT_MIN.
If aLEFT_MAX < aRIGHT_MIN, then there's no need for any marks to overlap between the two, so you can just recurse into each half and return the sum of the two results.
Otherwise, we know that there's some segment, extending at least from LEFT_MAX to RIGHT_MIN, where all elements have to have the same mark.
To find the leftmost extent of this segment, we can scan leftward from RIGHT_MIN down to 1, keeping track of the minimum value we've seen so far and the position of the leftmost element we've found to be greater than some further-rightward value. (This can actually be optimized a bit more, but I don't think we can improve the algorithmic complexity by doing so, so I won't worry about that.) And, conversely to find the rightmost extent of the segment.
Suppose the segment in question extends from LEFTMOST to RIGHTMOST. Then we just need to recursively compute the number of distinct marks in [1, LEFTMOST) and in (RIGHTMOST, n], and return the sum of the two results plus 1.
I wasn't able to get a complete solution, but here are a few ideas about what can and can't be done.
First: it's impossible to find the number of marks in O(log n) from the array alone - otherwise you could use your algorithm to check if the array is sorted faster than O(n), and that's clearly impossible.
General idea: spend O(n log n) to create any additional data which would let you to compute number of marks in O(log n) time and said data can be updated after a swap in O(log n) time. One possibly useful piece to include is the current number of marks (i.e. finding how number of marks changed may be easier than to compute what it is).
Since update time is O(log n), you can't afford to store anything mark-related (such as "the last person with the same mark") for each person - otherwise taking an array 1 2 3 ... n and repeatedly swapping first and last element would require you to update this additional data for every element in the array.
Geometric interpretation: taking your sequence 4 1 3 2 5 7 6 8 as an example, we can draw points (i, a_i):
|8
+---+-
|7 |
| 6|
+-+---+
|5|
-------+-+
4 |
3 |
2|
1 |
In other words, you need to cover all points by a maximal number of squares. Corollary: exchanging points from different squares a and b reduces total number of squares by |a-b|.
Index squares approach: let n = 2^k (otherwise you can add less than n fictional persons who will never participate in exchanges), let 0 <= a_i < n. We can create O(n log n) objects - "index squares" - which are "responsible" for points (i, a_i) : a*2^b <= i < (a+1)*2^b or a*2^b <= a_i < (a+1)*2^b (on our plane, this would look like a cross with center on the diagonal line a_i=i). Every swap affects only O(log n) index squares.
The problem is, I can't find what information to store for each index square so that it would allow to find number of marks fast enough? all I have is a feeling that such approach may be effective.
Hope this helps.
Let's normalize the problem first, so that a_i is in the range of 0 to n-1 (can be achieved in O(n*logn) by sorting a, but just hast to be done once so we are fine).
function normalize(a) {
let b = [];
for (let i = 0; i < a.length; i++)
b[i] = [i, a[i]];
b.sort(function(x, y) {
return x[1] < y[1] ? -1 : 1;
});
for (let i = 0; i < a.length; i++)
a[b[i][0]] = i;
return a;
}
To get the maximal number of marks we can count how many times
i + 1 == mex(a[0..i]) , i integer element [0, n-1]
a[0..1] denotes the sub-array of all the values from index 0 to i.
mex() is the minimal exclusive, which is the smallest value missing in the sequence 0, 1, 2, 3, ...
This allows us to solve a single instance of the problem (ignoring the swaps for the moment) in O(n), e.g. by using the following algorithm:
// assuming values are normalized to be element [0,n-1]
function maxMarks(a) {
let visited = new Array(a.length + 1);
let smallestMissing = 0, marks = 0;
for (let i = 0; i < a.length; i++) {
visited[a[i]] = true;
if (a[i] == smallestMissing) {
smallestMissing++;
while (visited[smallestMissing])
smallestMissing++;
if (i + 1 == smallestMissing)
marks++;
}
}
return marks;
}
If we swap the values at indices x and y (x < y) then the mex for all values i < x and i > y doesn't change, although it is an optimization, unfortunately that doesn't improve complexity and it is still O(qn).
We can observe that the hits (where mark is increased) are always at the beginning of an increasing sequence and all matches within the same sequence have to be a[i] == i, except for the first one, but couldn't derive an algorithm from it yet:
0 6 2 3 4 5 1 7
*--|-------|*-*
3 0 2 1 4 6 5 7
-|---|*-*--|*-*

Find the highest minimum difference of a number from a given range

Say I have an array of N numbers {A1, A2, ... , An} and 2 numbers P, Q
I have to find an integer M between P and Q , such that, min {|Ai-M|, 1 ≤ i ≤ N} is maximised.
1 < N < P ≤ Q ≤ 10^6
in simpler words:
for each number, find the minimum absolute difference between this number and the array.
then out of all those minimum differences, find the number who has the highest minimum difference.
I have to do this in O(NlogN) or less.
I have tried the following:
sort the array A (NlogN)
iterate over all the numbers between P and Q and for each number find the minimum difference using modified binary search and keep track who has the highest difference - O((Q-P)logN)
I'm guessing there is some kind of math "trick" like using average I'm missing..
edit (add example):
for example if you have the array
{5 8 14}
and P=4 Q=9
the answer is 4,6,7, or 9.
lets look at the numbers 4-9
|4-5| = 1
|4-8| = 4
|4-14| = 10
so minimum diff for 4 is 1
|5-5| = 0
|5-8| = 3
|5-14| = 9
so minimum diff for 4 is 0
we keep going and find minimum diff for all the numbers and then we need to say which number (4/5/6/7/8/9) had the highest minimum diff (in this example 4,6,7 and 9 have 1 minimum difference which is max among all minimum differences)
First you have to sort an array. Then you have to notice that your solution is either P or Q or some point x[i] = (A[i] + A[i+1]) // 2. Basically x[i] is in the middle between consecutive elements in the array (if this x[i] is between P, Q).
Because N is really small, this will run basically in O(1).

Dividing N items in p groups

You are given N total number of item, P group in which you have to divide the N items.
Condition is the product of number of item held by each group should be max.
example N=10 and P=3 you can divide the 10 item in {3,4,3} since 3x3x4=36 max possible product.
You will want to form P groups of roughly N / P elements. However, this will not always be possible, as N might not be divisible by P, as is the case for your example.
So form groups of floor(N / P) elements initially. For your example, you'd form:
floor(10 / 3) = 3
=> groups = {3, 3, 3}
Now, take the remainder of the division of N by P:
10 mod 3 = 1
This means you have to distribute 1 more item to your groups (you can have up to P - 1 items left to distribute in general):
for i = 0 up to (N mod P) - 1:
groups[i]++
=> groups = {4, 3, 3} for your example
Which is also a valid solution.
For fun I worked out a proof of the fact that it in an optimal solution either all numbers = N/P or the numbers are some combination of floor(N/P) and ceiling(N/P). The proof is somewhat long, but proving optimality in a discrete context is seldom trivial. I would be interested if anybody can shorten the proof.
Lemma: For P = 2 the optimal way to divide N is into {N/2, N/2} if N is even and {floor(N/2), ceiling(N/2)} if N is odd.
This follows since the constraint that the two numbers sum to N means that the two numbers are of the form x, N-x.
The resulting product is (N-x)x = Nx - x^2. This is a parabola that opens down. Its max is at its vertex at x = N/2. If N is even this max is an integer. If N is odd, then x = N/2 is a fraction, but such parabolas are strictly unimodal, so the closer x gets to N/2 the larger the product. x = floor(N/2) (or ceiling, it doesn't matter by symmetry) is the closest an integer can get to N/2, hence {floor(N/2),ceiling(N/2)} is optimal for integers.
General case: First of all, a global max exists since there are only finitely many integer partitions and a finite list of numbers always has a max. Suppose that {x_1, x_2, ..., x_P} is globally optimal. Claim: given and i,j we have
|x_i - x_ j| <= 1
In other words: any two numbers in an optimal solution differ by at most 1. This follows immediately from the P = 2 lemma (applied to N = x_i + x_ j).
From this claim it follows that there are at most two distinct numbers among the x_i. If there is only 1 number, that number is clearly N/P. If there are two numbers, they are of the form a and a+1. Let k = the number of x_i which equal a+1, hence P-k of the x_i = a. Hence
(P-k)a + k(a+1) = N, where k is an integer with 1 <= k < P
But simple algebra yields that a = (N-k)/P = N/P - k/P.
Hence -- a is an integer < N/P which differs from N/P by less than 1 (k/P < 1)
Thus a = floor(N/P) and a+1 = ceiling(N/P).
QED

Count of numbers between A and B (inclusive) that have sum of digits equal to S

The problems is to find the count of numbers between A and B (inclusive) that have sum of digits equal to S.
Also print the smallest such number between A and B (inclusive).
Input:
Single line consisting of A,B,S.
Output:
Two lines.
In first line the number of integers between A and B having sum of digits equal to S.
In second line the smallest such number between A and B.
Constraints:
1 <= A <= B < 10^15
1 <= S <= 135
Source: Hacker Earth
My solution works for only 30 pc of their inputs. What could be the best possible solution to this?
The algorithm I am using now computes the sum of the smallest digit and then upon every change of the tens digit computes the sum again.
Below is the solution in Python:
def sum(n):
if (n<10):return n
return n%10 + sum(n/10)
stri = raw_input()
min = 99999
stri = stri.split(" ")
a= long (stri[0])
b= long (stri[1])
s= long (stri[2])
count= 0
su = sum(a)
while a<=b :
if (a % 10 == 0 ):
su = sum(a)
print a
if ( s == su):
count+=1
if (a<= min):
min=a
a+=1
su+=1
print count
print min
There are two separate problems here: finding the smallest number between those numbers that has the right digit sum and finding the number of values in the range with that digit sum. I'll talk about those problems separately.
Counting values between A and B with digit sum S.
The general approach for solving this problem will be the following:
Compute the number of values less than or equal to A - 1 with digit sum S.
Compute the number of values less than or equal to B with digit sum S.
Subtract the first number from the second.
To do this, we should be able to use a dynamic programming approach. We're going to try to answer queries of the following form:
How many D-digit numbers are there, whose first digit is k, whose digits that sum up to S?
We'll create a table N[D, k, S] to hold these values. We know that D is going to be at most 16 and that S is going to be at most 136, so this table will have only 10 × 16 × 136 = 21,760 entries, which isn't too bad. To fill it in, we can use the following base cases:
N[1, S, S] = 1 for 0 ≤ S ≤ 9, since there's only one one-digit number that sums up to any value less than ten.
N[1, k, S] = 0 for 0 ≤ S ≤ 9 if k ≠ S, since no one-digit number whose first digit isn't a particular sum sums up to some value.
N[1, k, S] = 0 for 10 ≤ S ≤ 135, since no one-digit number sums up to exactly S for any k greater than a single digit.
N[1, k, S] = 0 for any S < 0.
Then, we can use the following logic to fill in the other table entries:
N[D + 1, k, S] = sum(i from 0 to 9) N[D, i, S - k].
This says that the number of (D+1)-digit numbers whose first digit is k that sum up to S is given by the number of D-digit numbers that sum up to S - k. The number of D-digit numbers that sum up to S - k is given by the number of D-digit numbers that sum up to S - k whose first digit is 0, 1, 2, ..., 9, so we have to sum up over them.
Filling in this DP table takes time only O(1), and in fact you could conceivably precompute it and hardcode it into the program if you were really concerned about time.
So how can we use this table? Well, suppose we want to know how many numbers that sum up to S are less than or equal to some number X. To do this, we can process the digits of X one at a time. Let's write X one digit at a time as d1 ... dn. We can start off by looking at N[n, d1, S]. This gives us the number of n-digit numbers whose first digit is d1 that sum up to S. This may overestimate the number of values less than or equal to X that sum up to S. For example, if our number is 21,111 and we want the number of values that sum up to exactly 12, then looking up this table value will give us false positives for numbers like 29,100 that start with a 2 and are five digits long, but which are still greater than X. To handle this, we can move to the next digit of the number X. Since the first digit was a 2, the rest of the digits in the number must sum up to 10. Moreover, since the next digit of X (21,111) is a 1, we can now subtract from our total the number of 4-digit numbers starting with 2, 3, 4, 5, ..., 9 that add up to 10. We can then repeat this process one digit at a time.
More generally, our algorithm will be as follows. Let X be our number and S the target sum. Write X = d1d2...dn and compute the following:
# Begin by starting with all numbers whose first digit is less than d[1].
# Those count as well.
result = 0
for i from 0 to d[1]:
result += N[n, i, S]
# Now, exclude everything whose first digit is d[1] that is too large.
S -= d[1]
for i = 2 to n:
for j = d[i] to 8:
result -= N[n, d[i], S]
S -= d[i]
The value of result will then be the number of values less than or equal to X that sum up to exactly S. This algorithm will only run for at most 16 iterations, so it should be very quick. Moreover, using this algorithm and the earlier subtraction trick, we can use it to compute how many values between A and B sum up to exactly S.
Finding the smallest value in [A, B] with digit sum S.
We can use a similar trick with our DP table to find the smallest number greater than A number that sums up to exactly S. I'll leave the details as an exercise, but as a hint, work one digit at a time, trying to find the smallest number for which the DP table returns a nonzero value.
Hope this helps!

Number of Positive Solutions to a1 x1+a2 x2+......+an xn=k (k<=10^18)

The question is Number of solutions to a1 x1+a2 x2+....+an xn=k with constraints: 1)ai>0 and ai<=15 2)n>0 and n<=15 3)xi>=0 I was able to formulate a Dynamic programming solution but it is running too long for n>10^10. Please guide me to get a more efficient soution.
The code
int dp[]=new int[16];
dp[0]=1;
BigInteger seen=new BigInteger("0");
while(true)
{
for(int i=0;i<arr[0];i++)
{
if(dp[0]==0)
break;
dp[arr[i+1]]=(dp[arr[i+1]]+dp[0])%1000000007;
}
for(int i=1;i<15;i++)
dp[i-1]=dp[i];
seen=seen.add(new BigInteger("1"));
if(seen.compareTo(n)==0)
break;
}
System.out.println(dp[0]);
arr is the array containing coefficients and answer should be mod 1000000007 as the number of ways donot fit into an int.
Update for real problem:
The actual problem is much simpler. However, it's hard to be helpful without spoiling it entirely.
Stripping it down to the bare essentials, the problem is
Given k distinct positive integers L1, ... , Lk and a nonnegative integer n, how many different finite sequences (a1, ..., ar) are there such that 1. for all i (1 <= i <= r), ai is one of the Lj, and 2. a1 + ... + ar = n. (In other words, the number of compositions of n using only the given Lj.)
For convenience, you are also told that all the Lj are <= 15 (and hence k <= 15), and n <= 10^18. And, so that the entire computation can be carried out using 64-bit integers (the number of sequences grows exponentially with n, you wouldn't have enough memory to store the exact number for large n), you should only calculate the remainder of the sequence count modulo 1000000007.
To solve such a problem, start by looking at the simplest cases first. The very simplest cases are when only one L is given, then evidently there is one admissible sequence if n is a multiple of L and no admissible sequence if n mod L != 0. That doesn't help yet. So consider the next simplest cases, two L values given. Suppose those are 1 and 2.
0 has one composition, the empty sequence: N(0) = 1
1 has one composition, (1): N(1) = 1
2 has two compositions, (1,1); (2): N(2) = 2
3 has three compositions, (1,1,1);(1,2);(2,1): N(3) = 3
4 has five compositions, (1,1,1,1);(1,1,2);(1,2,1);(2,1,1);(2,2): N(4) = 5
5 has eight compositions, (1,1,1,1,1);(1,1,1,2);(1,1,2,1);(1,2,1,1);(2,1,1,1);(1,2,2);(2,1,2);(2,2,1): N(5) = 8
You may see it now, or need a few more terms, but you'll notice that you get the Fibonacci sequence (shifted by one), N(n) = F(n+1), thus the sequence N(n) satisfies the recurrence relation
N(n) = N(n-1) + N(n-2) (for n >= 2; we have not yet proved that, so far it's a hypothesis based on pattern-spotting). Now, can we see that without calculating many values? Of course, there are two types of admissible sequences, those ending with 1 and those ending with 2. Since that partitioning of the admissible sequences restricts only the last element, the number of ad. seq. summing to n and ending with 1 is N(n-1) and the number of ad. seq. summing to n and ending with 2 is N(n-2).
That reasoning immediately generalises, given L1 < L2 < ... < Lk, for all n >= Lk, we have
N(n) = N(n-L1) + N(n-L2) + ... + N(n-Lk)
with the obvious interpretation if we're only interested in N(n) % m.
Umm, that linear recurrence still leaves calculating N(n) as an O(n) task?
Yes, but researching a few of the mentioned keywords quickly leads to an algorithm needing only O(log n) steps ;)
Algorithm for misinterpreted problem, no longer relevant, but may still be interesting:
The question looks a little SPOJish, so I won't give a complete algorithm (at least, not before I've googled around a bit to check if it's a contest question). I hope no restriction has been omitted in the description, such as that permutations of such representations should only contribute one to the count, that would considerably complicate the matter. So I count 1*3 + 2*4 = 11 and 2*4 + 1*3 = 11 as two different solutions.
Some notations first. For m-tuples of numbers, let < | > denote the canonical bilinear pairing, i.e.
<a|x> = a_1*x_1 + ... + a_m*x_m. For a positive integer B, let A_B = {1, 2, ..., B} be the set of positive integers not exceeding B. Let N denote the set of natural numbers, i.e. of nonnegative integers.
For 0 <= m, k and B > 0, let C(B,m,k) = card { (a,x) \in A_B^m × N^m : <a|x> = k }.
Your problem is then to find \sum_{m = 1}^15 C(15,m,k) (modulo 1000000007).
For completeness, let us mention that C(B,0,k) = if k == 0 then 1 else 0, which can be helpful in theoretical considerations. For the case of a positive number of summands, we easily find the recursion formula
C(B,m+1,k) = \sum_{j = 0}^k C(B,1,j) * C(B,m,k-j)
By induction, C(B,m,_) is the convolution¹ of m factors C(B,1,_). Calculating the convolution of two known functions up to k is O(k^2), so if C(B,1,_) is known, that gives an O(n*k^2) algorithm to compute C(B,m,k), 1 <= m <= n. Okay for small k, but our galaxy won't live to see you calculating C(15,15,10^18) that way. So, can we do better? Well, if you're familiar with the Laplace-transformation, you'll know that an analogous transformation will convert the convolution product to a pointwise product, which is much easier to calculate. However, although the transformation is in this case easy to compute, the inverse is not. Any other idea? Why, yes, let's take a closer look at C(B,1,_).
C(B,1,k) = card { a \in A_B : (k/a) is an integer }
In other words, C(B,1,k) is the number of divisors of k not exceeding B. Let us denote that by d_B(k). It is immediately clear that 1 <= d_B(k) <= B. For B = 2, evidently d_2(k) = 1 if k is odd, 2 if k is even. d_3(k) = 3 if and only if k is divisible by 2 and by 3, hence iff k is a multiple of 6, d_3(k) = 2 if and only if one of 2, 3 divides k but not the other, that is, iff k % 6 \in {2,3,4} and finally, d_3(k) = 1 iff neither 2 nor 3 divides k, i.e. iff gcd(k,6) = 1, iff k % 6 \in {1,5}. So we've seen that d_2 is periodic with period 2, d_3 is periodic with period 6. Generally, like reasoning shows that d_B is periodic for all B, and the minimal positive period divides B!.
Given any positive period P of C(B,1,_) = d_B, we can split the sum in the convolution (k = q*P+r, 0 <= r < P):
C(B,m+1, q*P+r) = \sum_{c = 0}^{q-1} (\sum_{j = 0}^{P-1} d_B(j)*C(B,m,(q-c)*P + (r-j)))
+ \sum_{j = 0}^r d_B(j)*C(B,m,r-j)
The functions C(B,m,_) are no longer periodic for m >= 2, but there are simple formulae to obtain C(B,m,q*P+r) from C(B,m,r). Thus, with C(B,1,_) = d_B and C(B,m,_) known up to P, calculating C(B,m+1,_) up to P is an O(P^2) task², getting the data necessary for calculating C(B,m+1,k) for arbitrarily large k, needs m such convolutions, hence that's O(m*P^2).
Then finding C(B,m,k) for 1 <= m <= n and arbitrarily large k is O(n^2*P^2), in time and O(n^2*P) in space.
For B = 15, we have 15! = 1.307674368 * 10^12, so using that for P isn't feasible. Fortunately, the smallest positive period of d_15 is much smaller, so you get something workable. From a rough estimate, I would still expect the calculation of C(15,15,k) to take time more appropriately measured in hours than seconds, but it's an improvement over O(k) which would take years (for k in the region of 10^18).
¹ The convolution used here is (f \ast g)(k) = \sum_{j = 0}^k f(j)*g(k-j).
² Assuming all arithmetic operations are O(1); if, as in the OP, only the residue modulo some M > 0 is desired, that holds if all intermediate calculations are done modulo M.

Resources