Sum of last k digits same as sum of first k digits - algorithm

I want to find whether the sum of the first k digits of numbers in a given range is equal to the sum of their last k digits. Here the range is very large and k is less than 20.
One way to do this is by brute force. Can someone suggest a more efficient algorithm for this?

If it is a range, the first digits will not change often and the last digits will change in a simple way. Let S be the sum of the first k digits. While none of the higher digits change, the sum of the last digits increases by one when you go to the next number. So if all your digits except the last one are fixed, and the sum of the last k digits when the last digit equals i is S_i, then the only good last digit is n = S - S_i + i. You then have to check that n is between 0 and 9, and that the resulting number is in the interval. This decreases the number of lookups by a factor of ten.
You can do the same check one digit higher, on the second-lowest digit.
If the required n is lower than 0, you need to decrease the second digit by -n. Call this second digit n2. If n2 >= 0, the good numbers will end in (n2, 0), (n2 - 1, 1), ..., (0, n2). This decreases the complexity by a factor of 100.
If n is bigger than 9, you increase the second digit by n - 9. Call the second digit n2. If n2 <= 9, the good numbers end in (n2, 9), (n2 + 1, 8), ..., (9, n2).
This also decreases the complexity by a factor of 100.
You can do the same for the third digit, and then the fourth, up to the k-th. This results in just one sum, and a complexity of O(number of solutions), so it is minimal. For coding, be careful that your leading digits can change: do one computation per group of numbers sharing the same leading digits.
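To make the first (factor-of-ten) step concrete, here is a minimal C++ sketch (untested, helper names are mine; it assumes every number in the range has more than k digits, so the first k digits are constant within each block of ten):

#include <cstdint>

// Sum of the k leading digits of x, where x has len digits.
int firstDigitsSum(int64_t x, int len, int k) {
    for (int i = 0; i < len - k; ++i) x /= 10; // drop everything but the k leading digits
    int s = 0;
    while (x > 0) { s += x % 10; x /= 10; }
    return s;
}

// Sum of the (up to) k trailing digits of x.
int lastDigitsSum(int64_t x, int k) {
    int s = 0;
    while (k-- > 0) { s += x % 10; x /= 10; }
    return s;
}

int decimalLength(int64_t x) {
    int len = 0;
    while (x > 0) { ++len; x /= 10; }
    return len;
}

// Count numbers in [lo, hi] whose first k digits and last k digits have
// equal sums, testing only one candidate per block of ten numbers.
int64_t countMatches(int64_t lo, int64_t hi, int k) {
    int64_t count = 0;
    for (int64_t prefix = lo / 10; prefix <= hi / 10; ++prefix) {
        int64_t base = prefix * 10;                      // block base: last digit 0
        int sFirst = firstDigitsSum(base, decimalLength(base), k);
        int sLastPartial = lastDigitsSum(prefix, k - 1); // last k digits minus the final one
        int d = sFirst - sLastPartial;                   // the only possible last digit
        if (d >= 0 && d <= 9 && base + d >= lo && base + d <= hi)
            ++count;
    }
    return count;
}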

One theoretical improvement to the brute force method:
1) sum up the first k digits, store the result in sumFirst
2) sum up the last k digits, but stop if the sum exceeds sumFirst.
Point 2 could save summing up some of the last few digits.
But you have to measure whether the additional logic costs more than simply adding all k digits.

Optimization N-k
One way to improve the algorithm applies when the number, having N digits, satisfies N < 2k.
For instance if N = 5 and k = 3 (5 < 2x3), the digits being
abcde
you only have to compare ab against de (i.e. there is no need to check all k (3) digits, since the 3rd is shared by the k first and the k last digits). In other words, the number of digits to be compared on each side is only
min(k, N-k), assuming N >= k
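In code, the observation means the comparison only has to touch min(k, N-k) digits on each side, since the shared middle digits cancel. A small sketch (digits stored most significant first; names are mine):

#include <algorithm>
#include <vector>

// True iff the first k and the last k digits have equal sums. Assumes N >= k.
bool sumsMatch(const std::vector<int>& digits, int k) {
    int N = (int)digits.size();
    int t = std::min(k, N - k);        // digits to compare on each side
    int front = 0, back = 0;
    for (int i = 0; i < t; ++i) {
        front += digits[i];            // belongs only to the first k digits
        back  += digits[N - 1 - i];    // belongs only to the last k digits
    }
    return front == back;
}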

If you are going to run this query multiple times on the same array, you can replace every element with the sum of itself and all previous elements, which is O(n) where the size of the array is n, i.e.
for (int i = 1; i < n; i++)
    arr[i] = arr[i] + arr[i-1];
This converts your array from a density into a cumulative sum (for discrete numbers, the analogue of turning a probability density function into a cumulative distribution function). Each query is then O(1), i.e.
if (arr[k-1] == (arr[n-1] - arr[n-k-1])) // arr[k-1]: sum of the first k elements;
    return true;                         // arr[n-1] - arr[n-k-1]: sum of the last k (assumes k < n)
return false;
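Putting the two pieces together, a minimal self-contained sketch (the function name and example are mine):

#include <iostream>
#include <vector>

// Prefix sums over the digit array, then an O(1) comparison of the
// first-k and last-k sums. Assumes 0 < k < n.
bool firstKEqualsLastK(const std::vector<int>& digits, int k) {
    int n = (int)digits.size();
    std::vector<long long> pre(n);
    pre[0] = digits[0];
    for (int i = 1; i < n; ++i) pre[i] = pre[i - 1] + digits[i]; // cumulative sums
    long long first = pre[k - 1];                 // sum of the first k digits
    long long last = pre[n - 1] - pre[n - k - 1]; // sum of the last k digits
    return first == last;
}

int main() {
    std::vector<int> digits = {1, 2, 3, 2, 1}; // the number 12321
    std::cout << std::boolalpha << firstKEqualsLastK(digits, 2) << "\n"; // true: 1+2 == 2+1
}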

Another improvement over the brute force:
i = 0, T = 0
while |T| < 9 * (k - i)    # the remaining k - i digit pairs can still cancel T
    T = T + last[i] - first[i]
    i = i + 1
return T == 0
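A direct C++ rendering of that pseudocode (my arrangement: first[] holds the first k digits left to right, last[] the last k digits read from the right):

#include <cstdlib>

// Early-exit comparison: stop as soon as the running difference T can no
// longer be cancelled by the remaining digit pairs (each digit is at most 9).
bool sumsEqual(const int* first, const int* last, int k) {
    int T = 0;
    for (int i = 0; i < k; ++i) {
        if (std::abs(T) >= 9 * (k - i)) break; // |T| is already too large to cancel
        T += last[i] - first[i];
    }
    return T == 0;
}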

Related

How can we count the number of pairs of coprime integers in an array of integers? (CSES) [duplicate]

Having a sequence of n <= 10^6 integers, all not exceeding m <= 3*10^6, I'd like to count how many coprime pairs are in it. Two numbers are coprime if their greatest common divisor is 1.
It can be done trivially in O(n^2 log n), but this is obviously way too slow, as the limit suggests something closer to O(n log n). One thing that can be done quickly is factoring all the numbers, and also throwing out multiple occurrences of the same prime in each, but that doesn't lead to any significant improvement. I also thought of counting the opposite - pairs that have a common divisor. It could be done in groups - first counting all the pairs whose smallest common prime divisor is 2, then 3, then 5, and so on - but it seems to me like another dead end.
I've come up with a slightly faster alternative based on your answer. On my work PC my C++ implementation (bottom) takes about 350ms to solve any problem instance; on my old laptop, it takes just over 1s. This algorithm avoids all division and modulo operations, and uses only O(m) space.
As with your algorithm, the basic idea is to apply the Inclusion-Exclusion Principle by enumerating every number 2 <= i <= m that contains no repeated factors exactly once, and for each such i, counting the number of numbers in the input that are divisible by i and either adding or subtracting this from the total. The key difference is that we can do the counting part "stupidly", simply by testing whether each possible multiple of i appears in the input, and this still takes just O(m log m) time.
How many times does the innermost line c += v[j].freq; in countCoprimes() repeat? The body of the outer loop is executed once for each number 2 <= i <= m that contains no repeated prime factors; this iteration count is trivially upper-bounded by m. The inner loop advances i steps at a time through the range [2..m], so the number of operations it performs during a single outer loop iteration is upper-bounded by m / i. Therefore the total number of iterations of the innermost line is upper-bounded by the sum from i=2 to m of m/i. The m factor can be moved outside the sum to get an upper bound of
m * sum{i=2..m}(1/i)
That sum is a partial sum in a harmonic series, and it is upper-bounded by log(m), so the total number of innermost loop iterations is O(m log m).
extendedEratosthenes() is designed to reduce constant factors by avoiding all divisions and keeping to O(m) memory usage. All countCoprimes() actually needs to know for a number 2 <= i <= m is (a) whether it has repeated prime factors, and if it doesn't, (b) whether it has an even or odd number of prime factors.
To calculate (b) we can make use of the fact that the Sieve of Eratosthenes effectively "hits" any given i with its distinct prime factors in increasing order, so we can just flip a bit (the parity field in struct entry) to keep track of whether i has an even or odd number of factors. Each number starts with a prod field equal to 1; to record (a) we simply "knock out" any number that contains the square of a prime number as a factor by setting its prod field to 0. This field serves a dual purpose: if v[i].prod == 0, it indicates that i was discovered to have repeated factors; otherwise it contains the product of the (necessarily distinct) factors discovered so far.
The (fairly minor) utility of this is that it allows us to stop the main sieve loop at the square root of m, instead of going all the way up to m: by then, for any given i that has no repeated factors, either v[i].prod == i, in which case we have found all the factors of i, or v[i].prod < i, in which case i must have exactly one factor > sqrt(3000000) that we have not yet accounted for. We can find all such remaining "large factors" with a second, non-nested loop.
#include <iostream>
#include <vector>
using namespace std;

struct entry {
    int freq;   // Frequency that this number occurs in the input list
    int parity; // 0 for even number of factors, 1 for odd number
    int prod;   // Product of distinct prime factors
};

const int m = 3000000; // Maximum input value
int n = 0;             // Will be number of input values
vector<entry> v;

void extendedEratosthenes() {
    int i;
    for (i = 2; i * i <= m; ++i) {
        if (v[i].prod == 1) {
            for (int j = i, k = i; j <= m; j += i) {
                if (--k) {
                    v[j].parity ^= 1;
                    v[j].prod *= i;
                } else {
                    // j has a repeated factor of i: knock it out.
                    v[j].prod = 0;
                    k = i;
                }
            }
        }
    }
    // Fix up numbers with a prime factor above their square root.
    for (; i <= m; ++i) {
        if (v[i].prod && v[i].prod != i) {
            v[i].parity ^= 1;
        }
    }
}

void readInput() {
    int i;
    while (cin >> i) {
        ++v[i].freq;
        ++n;
    }
}

void countCoprimes() {
    long long total = static_cast<long long>(n) * (n - 1) / 2;
    for (int i = 2; i <= m; ++i) {
        if (v[i].prod) {
            // i must have no repeated factors.
            int c = 0;
            for (int j = i; j <= m; j += i) {
                c += v[j].freq;
            }
            total -= (v[i].parity * 2 - 1) * static_cast<long long>(c) * (c - 1) / 2;
        }
    }
    cerr << "Total number of coprime pairs: " << total << "\n";
}

int main(int argc, char **argv) {
    cerr << "Initialising array...\n";
    entry initialElem = { 0, 0, 1 };
    v.assign(m + 1, initialElem);
    cerr << "Performing extended Sieve of Eratosthenes...\n";
    extendedEratosthenes();
    cerr << "Reading input...\n";
    readInput();
    cerr << "Counting coprimes...\n";
    countCoprimes();
    return 0;
}
Further exploiting the ideas I mentioned in my question, I actually managed to come up with a solution myself. As some of you may be interested in it, I will describe it briefly. It runs in O(m log m + n); I've already implemented it in C++ and tested it - it solves the biggest cases (10^6 integers) in less than 5 seconds.
We have n integers, all not greater than m. We start by running the Sieve of Eratosthenes, mapping each integer up to m to its smallest prime factor, which lets us factor any number not greater than m in O(log m) time. Then for each given number A[i], as long as there is some prime p that divides A[i] to a power greater than one, we divide A[i] by it, because when asking whether two numbers are coprime we can ignore the exponents. That leaves all A[i] as products of distinct primes.
Now, let us assume that we were able to construct, in reasonable time, a table T such that T[i] is the number of entries A[j] divisible by i. This is somewhat similar to the approach @Brainless took in his second answer. Constructing the table T quickly was the technique I spoke about in the comments below my question.
From here we work by the Inclusion-Exclusion Principle. Having T, for each i we calculate P[i] - the number of pairs (j,k) such that A[j] and A[k] are both divisible by i. Then sum all P[i] for i >= 2, taking a minus sign before those P[i] for which i has an even number of prime divisors; this sum counts each non-coprime pair exactly once, so subtracting it from the total number of pairs gives the answer. Note that we only need i whose prime divisors are all distinct, because for all other indices P[i] equals 0. By Inclusion-Exclusion each pair is counted exactly once. To see this differently, take a pair A[i] and A[j], and assume that they share exactly k common prime divisors. This pair will be counted k times, then discounted C(k,2) times, counted C(k,3) times, discounted C(k,4) times... (C(k,i) is the binomial coefficient, a.k.a. Newton's symbol). A little manipulation shows that the pair is counted 1 - (1-1)^k = 1 times, which concludes the proof.
The steps so far require O(m log log m) for the sieve and O(m) for computing the result. The last thing to do is to construct the array T. We could, for every A[i], just increment T[j] for all j dividing A[i]. As A[i] can have at most O(sqrt(A[i])) divisors (and in practice far fewer), we could construct T in O(n sqrt m). But we can do better than that!
Take a two-dimensional array W. At every moment the following invariant holds: if for each non-zero W[i][j] we incremented the counter in table T by W[i][j] for all numbers that divide i and also share i's exact exponents in its j smallest prime divisors, then T would be constructed properly. As this may seem a little confusing, let's see it in action. At the start, to make the invariant true, for each A[i] we just increment W[A[i]][0]. Also note that a number not exceeding m can have at most O(log m) prime divisors, so the overall size of W is O(m log m).
Now observe that the information stored in W[i][j] can be "pushed forward" in the following way: let p be the (j+1)-th smallest prime divisor of i, assuming it has one. Then any divisor of i either has p with the same exponent as i, or a lower one. The first case is W[i][j+1] - we add another prime that has to be "fully taken" by a divisor. The second case is W[i/p][j], as a divisor of i that doesn't take p at its highest exponent must also divide i/p. And that's it! We consider all i in descending order, then j in ascending order, and "push forward" the information from W[i][j].
Note that if i has exactly j prime divisors, the information from W[i][j] cannot be pushed - but we don't need it to be! If i has j prime divisors, then W[i][j] basically says: increment index i in array T by W[i][j]. So once all the information has been pushed to the "last row" of each W[i], we pass through those rows and finish constructing T. As each cell W[i][j] is visited once, this algorithm takes O(m log m) time, plus O(n) at the beginning. That concludes the construction. Here's some C++ code from the actual implementation:
for (int i = (int)W.size() - 1; i >= 2; --i) // i in descending order
{
    int v = i, p;
    for (int j = 0; j <= (int)W[i].size() - 2; ++j) // exclude the last row
    {
        p = S[v]; // the (j+1)-th smallest prime divisor of i; S[v] = smallest prime factor of v
        while (v % p == 0) v /= p;
        W[i][j+1] += W[i][j];
        W[i/p][j] += W[i][j];
    }
    T[i] = W[i].back();
}
At the end I'd say that I think array T can be constructed faster and simpler than what I've shown. If anyone has some neat idea about how it could be done, I would appreciate all feedback.
Here's an idea based on the formula for the complete sequence 1..n, found on http://oeis.org/A018805:
a(n) = 2*( Sum phi(j), j=1..n ) - 1, where phi is Euler's totient function
Iterate over the sequence, S. For each term, S_i:
for each of the prime factors, p, of S_i:
if a hash for p does not exist:
create a hash with index p that points to a set of all indexes of S except i,
and a counter set to 1, representing how many terms of S are divisible by p so far
else:
delete i in the existing set of indexes and increment the counter
Sort the hashes for S_i's prime factors by their counters in descending order. Starting with
the largest counter (which means the smallest set), make a list of indexes up to i that are also
members of the next smallest set, until the sets are exhausted. Add the remaining number of
indexes in the list to the cumulative total.
Example:
sum phi' [4,7,10,15,21]
S_0: 4
prime-hash [2:1-4], counters [2:1]
0 indexes up to i in the set for prime 2
total 0
S_1: 7
prime hash [2:1-4; 7:0,2-4], counters [2:1, 7:1]
1 index up to i in the set for prime 7
total 1
S_2: 10
prime hash [2:1,3-4; 5:0-1,3-4; 7:0,2-4], counters [2:2, 5:1, 7:1]
1 index up to i in the set for prime 2, which is also a member
of the set for prime 5
total 2
S_3: 15
prime hash [2:1,3-4; 5:0-1,4; 7:0,2-4; 3:0-2,4], counters [2:2, 5:2, 7:1, 3:1]
2 indexes up to i in the set for prime 5, which are also members
of the set for prime 3
total 4
S_4: 21
prime hash [2:1,3-4; 5:0-1,4; 7:0,2-3; 3:0-2], counters [2:2, 5:2, 7:2, 3:2]
2 indexes up to i in the set for prime 7, which are also members
of the set for prime 3
total 6
6 coprime pairs:
(4,7),(4,15),(4,21),(7,10),(7,15),(10,21)
I would suggest:
1) Use Eratosthenes' sieve to get a sorted list of the prime numbers under 10^6.
2) For each number n in the list, get its prime factors. Associate to it another number f(n) in the following way: say the prime factors of n are 3, 7 and 17. Then the binary representation of f(n) is:
`0 1 0 1 0 0 1`
The first digit (0 here) is associated to the prime number 2, the second (1 here) to the prime number 3, etc...
Therefore 2 numbers n and m are coprime iff f(n) & f(m) = 0.
3) It's easy to see that there is an N such that for each n: f(n) <= (2^N) - 1. This means that the biggest number f(n) is smaller than or equal to a number whose binary representation is:
`1 1 1 1 1 1 1 1 1 1 1 1 1 1 1`
Here N is the number of 1s in the above sequence. Get this N and sort the list of numbers f(n). Let's call this list L.
If you want to optimize: in this list, instead of keeping duplicates, store a pair containing f(n) and the number of times f(n) is duplicated.
4) Iterate from 1 to N in this way: initialize i = 1 0 0 0 0 (in binary), and at each iteration move the digit 1 one place to the right, with all other bits kept at 0 (implement it using a bitshift).
At each iteration, iterate over L to get the number d(i) of elements l in L such that i & l != 0 (be careful if you use the above optimization). In other words, for each i, get the number of elements in L which are not coprime with the prime corresponding to i, and name this number d(i). Compute the total
D = d(1) + d(2) + ... + d(N)
5) This number D is the number of pairs which are not coprime in the original list. The number of coprime pairs is:
M*(M-1)/2 - D
where M is the number of elements in the original list. The complexity of this method is O(n log(n)).
Good luck!
My previous answer was wrong, apologies. I propose here a modification:
Once you get the prime divisors of each number in the list, associate to each prime number p the number l(p) of numbers in the list which have p as a divisor. For example, consider the prime number 5, and say the numbers in the list divisible by 5 are 15, 100 and 255. Then l(5) = 3.
To achieve this in O(n log n), iterate over the list and, for each number in it, iterate over its prime factors; for each prime factor p, increment its l(p).
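As a sketch of just this counting step (names are mine; it assumes a smallest-prime-factor sieve up to the maximum value m):

#include <vector>

// l[p] = how many list entries are divisible by the prime p.
std::vector<int> countDivisibleByPrime(const std::vector<int>& a, int m) {
    std::vector<int> spf(m + 1, 0);                  // spf[x] = smallest prime factor of x
    for (int i = 2; i <= m; ++i)
        if (spf[i] == 0)                             // i is prime
            for (int j = i; j <= m; j += i)
                if (spf[j] == 0) spf[j] = i;
    std::vector<int> l(m + 1, 0);
    for (int x : a) {
        while (x > 1) {
            int p = spf[x];
            ++l[p];                                  // one more entry divisible by p
            while (x % p == 0) x /= p;               // skip repeated occurrences of p
        }
    }
    return l;
}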
Then the number of pairs which are not coprime because both elements can be divided by p is
l(p)*(l(p) - 1) / 2
Sum this number over all primes p, and you will get the number of pairs in the list which are not coprime (note that l(p) can be 0). Say this sum is D; then the answer is
M*(M-1)/2 - D
where M is the length of the list. Good luck!

find the biggest possible number comprised of the digits of a given number

Given a number N, find the biggest possible number X that can be created from the given number's digits.
Example: N=231, then X will be 321.
The restrictions are time complexity O(1) and space complexity O(1).
I think this has to be done with counting sort.
Best I can do is O(1) space and O(log(N)) time. Pretty sure it's impossible to do any better because at the bare minimum you have to analyze each digit in the input, which is log(N) right there.
The short answer is: sort the digits of N in descending order.
Pseudocode:
Create an array of 10 integers, all initialized to zero.
Iterate through each digit of N; increment the slot in the array that corresponds to each digit.
Iterate through the array. Add k instances of the character C to the beginning of the result string, where k is the count stored in slot C of the array.
Sample Python implementation:
N = 231
slots = [0,0,0,0,0,0,0,0,0,0]
while N > 0:
    slots[N % 10] += 1
    N = N // 10
result = ""
for slot_idx in range(10):
    for i in range(slots[slot_idx]):
        result = str(slot_idx) + result
print(result)
Result:
321

write a number as a sum of consecutive primes

How can I check whether n can be written as the sum of a sequence of consecutive prime numbers?
For example, 12 equals 5+7, where 5 and 7 are consecutive primes, but 20 equals 3+17, where 3 and 17 are not consecutive.
Note that repetition is not allowed.
My idea is to find and list all primes below n, then use 2 loops to sum runs of primes: the first 2 numbers, second 2 numbers, third 2 numbers, etc., and then the first 3 numbers, second 3 numbers, and so forth. But this takes a lot of time and memory.
Realize that a consecutive list of primes is defined by just two pieces of information: the starting and the ending prime number. You just have to find these two numbers.
I assume that you have all the primes at your disposal, sorted in an array called primes. Keep three variables in memory: sum, which is initially 2 (the smallest prime), and first_index and last_index, which are initially 0 (the index of the smallest prime in the array primes).
Now you have to "tweak" these two indices, and "travel" the array along the way, in a loop:
If sum == n then finish. You have found your sequence of primes.
If sum < n then enlarge the list by adding the next available prime. Increment last_index by one, and then increment sum by the value of the new prime, which is primes[last_index]. Repeat the loop. But if primes[last_index] is larger than n then there is no solution, and you must finish.
If sum > n then reduce the list by removing the smallest prime from it. Decrement sum by that value, which is primes[first_index], and then increment first_index by one. Repeat the loop.
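A compact C++ sketch of this loop, assuming primes is the sorted list of primes up to n:

#include <vector>

// True iff n is a sum of one or more consecutive primes.
bool isConsecutivePrimeSum(long long n, const std::vector<int>& primes) {
    if (primes.empty()) return false;
    std::size_t first_index = 0, last_index = 0;
    long long sum = primes[0];                        // start with the smallest prime
    while (true) {
        if (sum == n) return true;
        if (sum < n) {                                // enlarge the window on the right
            ++last_index;
            if (last_index >= primes.size() || primes[last_index] > n) return false;
            sum += primes[last_index];
        } else {                                      // drop the smallest prime on the left
            sum -= primes[first_index++];
        }
    }
}

For n = 12 the window grows to 2+3+5+7 = 17, then shrinks to 5+7 = 12 and reports success; for n = 20 it eventually exhausts the primes and reports failure.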
Dialecticus's algorithm is the classic O(m)-time, O(1)-space way to solve this type of problem (here I'll use m to represent the number of prime numbers less than n). It doesn't depend on any mysterious properties of prime numbers. (Interestingly, for the particular case of prime numbers, AlexAlvarez's algorithm is also linear time!) Dialecticus gives a clear and correct description, but seems at a loss to explain why it is correct, so I'll try to do this here. I really think it's valuable to take the time to understand this particular algorithm's proof of correctness: although I had to read a number of explanations before it finally "sank in", it was a real "Aha!" moment when it did! :) (Also, problems that can be efficiently solved in the same manner crop up quite a lot.)
The candidate solutions this algorithm tries can be represented as number ranges (i, j), where i and j are just the indexes of the first and last prime number in a list of prime numbers. The algorithm gets its efficiency by ruling out (that is, not considering) sets of number ranges in two different ways. To prove that it always gives the right answer, we need to show that it never rules out the only range with the right sum. To that end, it suffices to prove that it never rules out the first (leftmost) range with the right sum, which is what we'll do here.
The first rule it applies is that whenever we find a range (i, j) with sum(i, j) > n, we rule out all ranges (i, k) having k > j. It's easy to see why this is justified: the sum can only get bigger as we add more terms, and we have determined that it's already too big.
The second, trickier rule, crucial to the linear time complexity, is that whenever we advance the starting point of a range (i, j) from i to i+1, instead of "starting again" from (i+1, i+1), we start from (i+1, j) -- that is, we avoid considering (i+1, k) for all i+1 <= k < j. Why is it OK to do this? (To put the question the other way: Couldn't it be that doing this causes us to skip over some range with the right sum?)
[EDIT: The original version of the next paragraph glossed over a subtlety: we might have advanced the range end point to j on any previous step.]
To see that it never skips a valid range, we need to think about the range (i, j-1). For the algorithm to advance the starting point of the current range, so that it changes from (i, j) to (i+1, j), it must have been that sum(i, j) > n; and as we will see, to get to a program state in which the range (i, j) is being considered in the first place, it must have been that sum(i, j-1) < n. That second claim is subtle, because there are two different ways to arrive in such a program state: either we just incremented the end point, meaning that the previous range was (i, j-1) and this range was found to be too small (in which case our desired property sum(i, j-1) < n obviously holds); or we just incremented the start point after considering (i-1, j) and finding it to be too large (in which case it's not obvious that the property still holds).
What we do know, however, is that regardless of whether the end point was increased from j-1 to j on the previous step, it was definitely increased at some time before the current step -- so let's call the range that triggered this end point increase (k, j-1). Clearly sum(k, j-1) < n, since this was (by definition) the range that caused us to increase the end point from j-1 to j; and just as clearly k <= i, since we only process ranges in increasing order of their start points. Since i >= k, sum(i, j-1) is just the same as sum(k, j-1) but with zero or more terms removed from the left end, and all of these terms are positive, so it must be that sum(i, j-1) <= sum(k, j-1) < n.
So we have established that whenever we increase i to i+1, we know that sum(i, j-1) < n. To finish the analysis of this rule, what we (again) need to make use of is that dropping terms from either end of this sum can't make it any bigger. Removing the first term leaves us with sum(i+1, j-1) <= sum(i, j-1) < n. Starting from that sum and successively removing terms from the other end leaves us with sum(i+1, j-2), sum(i+1, j-3), ..., sum(i+1, i+1), all of which we know must be less than n -- that is, none of the ranges corresponding to these sums can be valid solutions. Therefore we can safely avoid considering them in the first place, and that's exactly what the algorithm does.
One final potential stumbling block is that it might seem that, since we are advancing two loop indexes, the time complexity should be O(m^2). But notice that every time through the loop body, we advance one of the indexes (i or j) by one, and we never move either of them backwards, so if we are still running after 2m loop iterations we must have i + j = 2m. Since neither index can ever exceed m, the only way for this to hold is if i = j = m, which means that we have reached the end: i.e. we are guaranteed to terminate after at most 2m iterations.
The fact that the primes have to be consecutive allows us to solve this problem quite efficiently in terms of n. Suppose we have previously computed all the primes less than or equal to n. Then we can easily compute sum(i), the sum of the first i primes.
With this function precomputed, we can loop over the primes less than or equal to n and ask whether there exists a length such that, starting with that prime, we can sum up to n. Notice that for a fixed starting prime the sequence of sums is monotone, so we can binary search over the length.
Thus, let k be the number of primes less than or equal to n. Precomputing the sums costs O(k) and the loop costs O(k log k), which dominates. By the Prime Number Theorem, k = O(n/log n), so the whole algorithm costs O(n/log n * log(n/log n)) = O(n).
Let me put the code in C++ to make it clearer; I hope there are no bugs:
#include <iostream>
#include <vector>
using namespace std;
typedef long long ll;

int main() {
    //Get the limit for the numbers
    int MAX_N;
    cin >> MAX_N;
    //Compute the primes less than or equal to MAX_N
    vector<bool> is_prime(MAX_N + 1, true);
    for (int i = 2; i*i <= MAX_N; ++i) {
        if (is_prime[i]) {
            for (int j = i*i; j <= MAX_N; j += i) is_prime[j] = false;
        }
    }
    vector<int> prime;
    for (int i = 2; i <= MAX_N; ++i) if (is_prime[i]) prime.push_back(i);
    //Compute the prefixed sums
    vector<ll> sum(prime.size() + 1, 0);
    for (int i = 0; i < (int)prime.size(); ++i) sum[i + 1] = sum[i] + prime[i];
    //Get the number of queries
    int n_queries;
    cin >> n_queries;
    for (int z = 1; z <= n_queries; ++z) {
        int n;
        cin >> n;
        //Solve the query
        bool found = false;
        for (int i = 0; i < (int)prime.size() and prime[i] <= n and not found; ++i) {
            //Do binary search over the length of the sum:
            //For all x < ini, [i, x] sums <= n
            int ini = i, fin = int(prime.size()) - 1;
            while (ini <= fin) {
                int mid = (ini + fin)/2;
                ll value = sum[mid + 1] - sum[i]; //ll, so long prefix sums don't overflow
                if (value <= n) ini = mid + 1;
                else fin = mid - 1;
            }
            //Check the candidate of the binary search
            int candidate = ini - 1;
            if (candidate >= i and sum[candidate + 1] - sum[i] == n) {
                found = true;
                cout << n << " =";
                for (int j = i; j <= candidate; ++j) {
                    cout << " ";
                    if (j > i) cout << "+ ";
                    cout << prime[j];
                }
                cout << endl;
            }
        }
        if (not found) cout << "No solution" << endl;
    }
}
Sample input:
1000
5
12
20
28
17
29
Sample output:
12 = 5 + 7
No solution
28 = 2 + 3 + 5 + 7 + 11
17 = 2 + 3 + 5 + 7
29 = 29
I'd start by noting that for a pair of consecutive primes to sum to the number, one of the primes must be less than N/2, and the other prime must be greater than N/2. For them to be consecutive primes, they must be the primes closest to N/2, one smaller and the other larger.
If you're starting with a table of prime numbers, you basically do a binary search for N/2. Look at the primes immediately larger and smaller than that. Add those numbers together and see if they sum to your target number. If they don't, then it can't be the sum of two consecutive primes.
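A sketch of that two-prime check, assuming a sorted vector of primes covering at least [2, n]:

#include <algorithm>
#include <vector>

// True iff n is the sum of two consecutive primes. The only candidates are
// the largest prime at most n/2 and the prime immediately after it.
bool isSumOfTwoConsecutivePrimes(int n, const std::vector<int>& primes) {
    std::vector<int>::const_iterator hi =
        std::upper_bound(primes.begin(), primes.end(), n / 2); // first prime > n/2
    if (hi == primes.begin() || hi == primes.end()) return false;
    return *(hi - 1) + *hi == n;
}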
If you don't start with a table of primes, it works out pretty much the same way--you still start from N/2 and find the next larger prime (we'll call that prime1). Then you compute N - prime1 to get a candidate for prime2. Check if that's prime, and if it is, search the range prime2...N/2 for other primes to see if there was a prime in between. If there's a prime in between, your number is a sum of non-consecutive primes. If there's no other prime in that range, then it is a sum of consecutive primes.
The same basic idea applies for sequences of 3 or more primes, except that (of course) your search starts from N/3 (or whatever number of primes you want to sum to get to the number).
So, for three consecutive primes to sum to N, 2 of the three must be the first prime smaller than N/3 and the first prime larger than N/3. So, we start by finding those, then compute N-(prime1+prime2). That gives us our third candidate. We know these three numbers sum to N. We still need to prove that this third number is a prime. If it is prime, we need to verify that it's consecutive to the other two.
To give a concrete example, for 10 we'd start from 3.333. The next smaller prime is 3 and the next larger is 5. Those add to 8. 10-8 = 2. 2 is prime and consecutive to 3, so we've found the three consecutive primes that add to 10.
There are some other refinements you can make as well. The most obvious would be based on the fact that all primes (other than 2) are odd numbers. Therefore (assuming we can ignore 2), an even number can only be the sum of an even number of primes, and an odd number can only be a sum of an odd number of primes. So, given 123456789, we know immediately that it can't possibly be the sum of 2 (or 4, 6, 8, 10, ...) consecutive primes, so the only candidates to consider are 3, 5, 7, 9, ... primes. Of course, the opposite works as well: given, say, 12345678, the simple fact that it's even lets us immediately rule out the possibility that it could be the sum of 3, 5, 7 or 9 consecutive primes; we only need to consider sequences of 2, 4, 6, 8, ... primes. We violate this basic rule only when we get to a large enough number of primes that we could include 2 as part of the sequence.
I haven't worked through the math to figure out exactly how many that would be for a given number, but I'm pretty sure it should be fairly easy, and it's something we want to know anyway (because it's the upper limit on the number of consecutive primes to look for, for a given number). If we use M for the number of primes, the limit should be approximately M <= sqrt(N), but that's definitely only an approximation.
I know that this question is a little old, but I cannot refrain from replying to the analysis made in the previous answers. Indeed, it has been emphasized that all three proposed algorithms have a run-time that is essentially linear in n. But in fact, it is not difficult to produce an algorithm that runs at a strictly smaller power of n.
To see how, let us choose a parameter K between 1 and n and suppose that the primes we need are already tabulated (if they must be computed from scratch, see below). Then, here is what we are going to do, to search for a representation of n as a sum of k consecutive primes:
First we search for k < K using the idea present in the answer of Jerry Coffin; that is, we search for k primes located around n/k.
Then to explore the sums of k >= K primes we use the algorithm explained in the answer of Dialecticus; that is, we begin with a sum whose first element is 2, then we advance the first element one step at a time.
The first part, which concerns short sums of big primes, requires O(log n) operations to binary search for one prime close to n/k and then O(k) operations to find the other k primes (there are a few simple possible implementations). In total this makes a running time of
R_1 = O(K^2) + O(K log n).
The second part, which is about long sums of small primes, requires us to consider sums of consecutive primes p_1 < ... < p_k where the first element is at most n/K.
Thus, it requires visiting at most n/K + K primes (one can actually save a log factor by a weak version of the prime number theorem). Since in the algorithm every prime is visited at most O(1) times, the running time is
R_2 = O(n/K) + O(K).
Now, if log n < K < sqrt(n), the first part runs in O(K^2) operations and the second part runs in O(n/K). We optimize with the choice K = n^(1/3), so that the overall running time is
R_1 + R_2 = O(n^(2/3)).
If the primes are not tabulated
If we also have to find the primes, here is how we do it.
First we use Eratosthenes, which in C_2 = O(T log log T) operations finds all the primes up to T, where T = O(n/K) is the upper bound for the small primes visited in the second part of the algorithm.
In order to perform the first part of the algorithm we need, for every k < K, to find O(k) primes located around n/k. The Riemann hypothesis implies that there are at least k primes in the interval [x, x+y] if y > c log(x) * (k + sqrt(x)) for some constant c > 0. Therefore a priori we need to find the primes contained in an interval I_k centered at n/k with width |I_k| = O(k log n) + O(sqrt(n/k) log n).
Using the sieve of Eratosthenes to sieve the interval I_k requires O(|I_k| log log n) + O(sqrt(n)) operations. If k < K < sqrt(n) we get a time complexity C_1 = O(sqrt(n) log n log log n) for every k < K.
Summing up, the time complexity C_1 + C_2 + R_1 + R_2 is minimized when
K = n^(1/4) / (log n * sqrt(log log n)).
With this choice we have the sublinear time complexity
R_1 + R_2 + C_1 + C_2 = O(n^(3/4) * sqrt(log log n)).
If we do not assume the Riemann Hypothesis we will have to search over larger intervals, but we still end up with a sublinear time complexity. If instead we assume stronger conjectures on prime gaps, we may only need to search over intervals I_k of width |I_k| = k (log n)^A for some A > 0. Then, instead of Eratosthenes, we can use other deterministic primality tests. For example, suppose that you can test a single number for primality in O((log n)^B) operations, for some B > 0.
Then you can search the interval I_k in O(k (log n)^(A+B)) operations. In this case the optimal K is still K ~ n^(1/3), up to logarithmic factors, and so the total complexity is O(n^(2/3) (log n)^D) for some D > 0.

Count of numbers between A and B (inclusive) that have sum of digits equal to S

The problem is to find the count of numbers between A and B (inclusive) that have a sum of digits equal to S.
Also print the smallest such number between A and B (inclusive).
Input:
Single line consisting of A,B,S.
Output:
Two lines.
In first line the number of integers between A and B having sum of digits equal to S.
In second line the smallest such number between A and B.
Constraints:
1 <= A <= B < 10^15
1 <= S <= 135
Source: Hacker Earth
My solution works for only 30% of their inputs. What could be the best possible solution to this?
The algorithm I am using computes the digit sum of the starting number once, increments it as the last digit increases, and recomputes the full sum whenever the last digit wraps around to 0.
Below is the solution in Python:
def sum(n):
    if (n < 10): return n
    return n % 10 + sum(n / 10)

stri = raw_input()
min = 99999
stri = stri.split(" ")
a = long(stri[0])
b = long(stri[1])
s = long(stri[2])
count = 0
su = sum(a)
while a <= b:
    if (a % 10 == 0):
        su = sum(a)
        print a
    if (s == su):
        count += 1
        if (a <= min):
            min = a
    a += 1
    su += 1
print count
print min
There are two separate problems here: finding the smallest number in the range that has the right digit sum, and finding the number of values in the range with that digit sum. I'll talk about those problems separately.
Counting values between A and B with digit sum S.
The general approach for solving this problem will be the following:
Compute the number of values less than or equal to A - 1 with digit sum S.
Compute the number of values less than or equal to B with digit sum S.
Subtract the first number from the second.
To do this, we should be able to use a dynamic programming approach. We're going to try to answer queries of the following form:
How many D-digit numbers are there whose first digit is k and whose digits sum up to S?
We'll create a table N[D, k, S] to hold these values. We know that D is going to be at most 16 and that S is going to be at most 136, so this table will have only 10 × 16 × 136 = 21,760 entries, which isn't too bad. To fill it in, we can use the following base cases:
N[1, S, S] = 1 for 0 ≤ S ≤ 9, since there's only one one-digit number that sums up to any value less than ten.
N[1, k, S] = 0 for 0 ≤ S ≤ 9 if k ≠ S, since the only one-digit number whose first digit is k sums up to k itself.
N[1, k, S] = 0 for 10 ≤ S ≤ 135 and any k, since no one-digit number sums up to ten or more.
N[1, k, S] = 0 for any S < 0.
Then, we can use the following logic to fill in the other table entries:
N[D + 1, k, S] = sum(i from 0 to 9) N[D, i, S - k].
This says that the number of (D+1)-digit numbers whose first digit is k that sum up to S is given by the number of D-digit numbers that sum up to S - k. The number of D-digit numbers that sum up to S - k is given by the number of D-digit numbers that sum up to S - k whose first digit is 0, 1, 2, ..., 9, so we have to sum up over them.
Filling in this DP table takes only O(1) time (its dimensions are fixed constants independent of the input), and in fact you could conceivably precompute it and hardcode it into the program if you were really concerned about time.
So how can we use this table? Well, suppose we want to know how many numbers that sum up to S are less than or equal to some number X. To do this, we can process the digits of X one at a time. Let's write X one digit at a time as d1 ... dn. We can start off by looking at N[n, d1, S]. This gives us the number of n-digit numbers whose first digit is d1 that sum up to S. This may overestimate the number of values less than or equal to X that sum up to S. For example, if our number is 21,111 and we want the number of values that sum up to exactly 12, then looking up this table value will give us false positives for numbers like 29,100 that start with a 2 and are five digits long, but which are still greater than X. To handle this, we can move to the next digit of the number X. Since the first digit was a 2, the rest of the digits in the number must sum up to 10. Moreover, since the next digit of X (21,111) is a 1, we can now subtract from our total the number of 4-digit numbers starting with 2, 3, 4, 5, ..., 9 that add up to 10. We can then repeat this process one digit at a time.
More generally, our algorithm will be as follows. Let X be our number and S the target sum. Write X = d1d2...dn and compute the following:
# Begin with all n-digit strings whose first digit is at most d[1]
# (a smaller first digit, including a leading zero, gives a smaller number,
# so those all count).
result = 0
for i from 0 to d[1]:
    result += N[n, i, S]
# Now, exclude everything whose first digit is d[1] but that is too large.
S -= d[1]
for i = 2 to n:
    for j = d[i] + 1 to 9:
        result -= N[n - i + 1, j, S]
    S -= d[i]
The value of result will then be the number of values less than or equal to X that sum up to exactly S. This algorithm will only run for at most 16 iterations, so it should be very quick. Moreover, using this algorithm and the earlier subtraction trick, we can use it to compute how many values between A and B sum up to exactly S.
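For concreteness, here is a rough C++ sketch of the counting half. The table layout, names, and digit handling are my own reading of the description above, so treat it as a sketch rather than a reference implementation; countUpTo(X, S) counts values in [0, X] whose digits sum to exactly S:

#include <algorithm>
#include <iostream>

// N[D][k][s]: D-digit strings (leading zeros allowed) whose first digit is k
// and whose digits sum to s.
const int MAXD = 16, MAXS = 135;
long long N[MAXD + 1][10][MAXS + 1];

void buildTable() {
    for (int k = 0; k <= 9; ++k) N[1][k][k] = 1;     // base cases
    for (int D = 1; D < MAXD; ++D)
        for (int k = 0; k <= 9; ++k)
            for (int s = k; s <= MAXS; ++s)
                for (int i = 0; i <= 9; ++i)         // N[D+1][k][s] = sum_i N[D][i][s-k]
                    N[D + 1][k][s] += N[D][i][s - k];
}

// Count values y with 0 <= y <= X whose digits sum to exactly S.
long long countUpTo(long long X, int S) {
    if (X < 0 || S < 0 || S > MAXS) return 0;
    int d[MAXD], n = 0;
    for (long long t = X; t > 0; t /= 10) d[n++] = int(t % 10);
    std::reverse(d, d + n);                          // d[0] = most significant digit
    long long result = 0;
    int rem = S;                                     // sum still needed from here on
    for (int i = 0; i < n && rem >= 0; ++i) {
        for (int j = 0; j < d[i]; ++j)               // smaller digit here, suffix is free
            result += N[n - i][j][rem];
        rem -= d[i];                                 // match X's digit and continue
    }
    if (rem == 0) ++result;                          // X itself
    return result;
}

int main() {
    buildTable();
    long long A, B; int S;
    std::cin >> A >> B >> S;
    std::cout << countUpTo(B, S) - countUpTo(A - 1, S) << "\n";
}

The subtraction in main answers the counting part; the smallest-value search described next can reuse the same table.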
Finding the smallest value in [A, B] with digit sum S.
We can use a similar trick with our DP table to find the smallest number in [A, B] that sums up to exactly S. I'll leave the details as an exercise, but as a hint, work one digit at a time, trying to find the smallest number for which the DP table returns a nonzero value.
Hope this helps!

Minimum sum that can't be obtained from a set

Given a set S of positive integers, whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example: if S = {1, 1, 3, 7}, we can get 0 as (S' = {}), 1 as (S' = {1}), 2 as (S' = {1, 1}), 3 as (S' = {3}), 4 as (S' = {1, 3}), 5 as (S' = {1, 1, 3}), but we can't get 6.
Now we are given an array A consisting of N positive integers. There are M queries, each consisting of two integers Li and Ri describing the i-th query: we need to find this sum that can't be obtained from the array elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it by a brute force approach in O(2^n). But given 1 ≤ N, M ≤ 100,000, this can't be done.
So is there any effective approach to do it?
Concept
Suppose we had an array of bool representing which numbers have been found so far (by way of summing).
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
Refinement
Obviously, we can't have a solution like this because the array would have to be infinite in order to work for all sets of numbers.
The concept could be improved by making a few observations. With an input of 1, 1, 3, the array becomes (in sequence):
1
1 2
1 2 3 4 5
(numbers represent true values)
An important observation can be made:
(3) For each next number, if the previous numbers had already been found it will be added to all those numbers. This implies that if there were no gaps before a number, there will be no gaps after that number has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, there will be no number less than 7
(5) If there is no number less than 7, then 6 cannot be obtained
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array, we only need a single value, max to represent the maximum number found so far.
This way, if the next number n is greater than max + 1, then a gap would have been made, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
static int Calculate(int[] S)
{
    int max = 0;
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1;
    }
    return max + 1;
}
It should run pretty fast, since it's obviously linear time (O(n)). Since the input to the function has to be sorted, with quicksort this becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
With a numbers upper limit of 10^9, a radix sort could be used to approximate O(n) time for the sorting; however, this would still be way over 2 seconds because of the sheer number of sorts required.
But we can use the statistical probability of a 1 being drawn to eliminate subsets before sorting. At the start, check if 1 exists in S; if not, then every query's result is 1, because 1 cannot be obtained.
Statistically, if we draw from 10^9 numbers 10^5 times, we have a 99.9% chance of not getting a single 1.
Before each sort, check if that subset contains 1; if not, then its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code: http://pastebin.com/rF6VddTx
This is a variation of the subset-sum problem, which is NP-Complete, but there is a pseudo-polynomial Dynamic Programming solution you can adopt here, based on the recursive formula:
f(S, i) = f(S - arr[i], i-1) OR f(S, i-1)
f(S, i) = false   for any S < 0
f(S, i) = false   if i < 0 and S > 0
f(0, i) = true
The recursive formula is basically an exhaustive search: each sum can be achieved either with element i OR without element i.
The dynamic programming is achieved by building a (SUM+1) x (n+1) table (where SUM is the sum of all elements, and n is the number of elements), and filling it bottom-up.
Something like:
table <- (SUM+1) x (n+1) table
//init:
for each j from 0 to n:
    table[0][j] = true      // the empty subset always achieves sum 0
for each i from 1 to SUM:
    table[i][0] = false     // a positive sum needs at least one element
//fill the table:
for each i from 1 to SUM:
    for each j from 1 to n:
        if i < arr[j]:
            table[i][j] = table[i][j-1]
        else:
            table[i][j] = table[i-arr[j]][j-1] OR table[i][j-1]
Once you have the table, you need the smallest i such that for all j: table[i][j] = false.
The complexity of the solution is O(n*SUM), where SUM is the sum of all elements. Note that the algorithm can actually stop as soon as the required number is found, without filling the remaining rows, which are not needed for the answer.
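A direct C++ translation of that pseudocode, as a sketch (arr is 0-indexed here, and checking only the last column suffices, since adding elements never removes achievable sums):

#include <numeric>
#include <vector>

// Smallest non-negative sum not obtainable from any subset of arr.
int minUnobtainableSum(const std::vector<int>& arr) {
    int n = (int)arr.size();
    int SUM = std::accumulate(arr.begin(), arr.end(), 0);
    // table[i][j]: can sum i be formed from the first j elements?
    std::vector<std::vector<bool> > table(SUM + 1, std::vector<bool>(n + 1, false));
    for (int j = 0; j <= n; ++j) table[0][j] = true;   // empty subset gives 0
    for (int i = 1; i <= SUM; ++i)
        for (int j = 1; j <= n; ++j)
            table[i][j] = (i >= arr[j - 1] && table[i - arr[j - 1]][j - 1])
                          || table[i][j - 1];
    for (int i = 1; i <= SUM; ++i)
        if (!table[i][n]) return i;                    // first unreachable sum
    return SUM + 1;                                    // all sums up to SUM are reachable
}

With {1, 1, 3, 7} this returns 6, matching the example in the question.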
