Return the count of all prime numbers in range [a,b] such that all the digits are from set {1,5,9} . 1<=a<=b<=10⁹ - performance

Return the count of all prime numbers in range [a,b] such that all the digits are from set {1,5,9} . 1<=a<=b<=10⁹.
My approach -
I was trying to generate all the numbers which are from set {1,5,9}. which comes out to be 3^9(19683) and after that I am checking for is it prime or not.
Can I do this in a better time complexity?

Never generate a large set and after check all elements of the set, ruling out most. That requires a lot of memory to store things you'll be discarding. Instead, find a single number with "valid" digits, check for primeness, and only then store it in a set. Accessing large arrays of memory is very time-intense on modern computers compared to doing math.
"I produced all the numbers": I hope you're doing this smartly! You never have to check a number with a last digit being 5 for primeness (there's only a single prime that ends in 5; that's 5 itself!), for example. Also, you hopefully don't just build all combinations of digits "manually". Say, you find a number 19551, then 19559 is also a candidate, you never have to manually "combine" digits to try out the last digit.
Of course, your prime-checking algorithm needs to be matching your kind of problem: You can remove the initial check for divisibility by 2 (you never produce even numbers), for example. You never need to check for divisibility by 5, because you never use 5 or 0 as last digit. Depending on your prime checking algorithm, you also would want to save the factor that "killed" the xxxx1 – that's one factor you don't have to check xxxx9 against. Do your 3-factor-checking based on the count of 1,5 and 9 in your number; you can directly infer cross-sum and hence 3-divisibility from that.

Related

Is there a better way to generate all equal arithmetic sequences using numbers 1 to 10?

Problem:
The numbers from 1 to 10 are given. Put the equal sign(somewhere between
them) and any arithmetic operator {+ - * /} so that a perfect integer
equality is obtained(both the final result and the partial results must be
integer)
Example:
1*2*3*4*5/6+7=8+9+10
1*2*3*4*5/6+7-8=9+10
My first idea to resolve this was using backtracking:
Generate all possibilities of putting operators between the numbers
For one such possibility replace all the operators, one by one, with the equal sign and check if we have two equal results
But this solution takes a lot of time.
So, my question is: Is there a faster solution, maybe something that uses the operator properties or some other cool math trick ?
I'd start with the equals sign. Pick a possible location for that, and split your sequence there. For left and right side independently, find all possible results you could get for each, and store them in a dict. Then match them up later on.
Finding all 226 solutions took my Python program, based on this approach, less than 0.15 seconds. So there certainly is no need to optimize further, is there? Along the way, I computed a total of 20683 subexpressions for a single side of one equation. They are fairly well balenced: 10327 expressions for left hand sides and 10356 expressions for right hand sides.
If you want to be a bit more clever, you can try reduce the places where you even attempt division. In order to allov for division without remainder, the prime factors of the divisor must be contained in those of the dividend. So the dividend must be some product and that product must contain the factors of number by which you divide. 2, 3, 5 and 7 are prime numbers, so they can never be such divisors. 4 will never have two even numbers before it. So the only possible ways are 2*3*4*5/6, 4*5*6*7/8 and 3*4*5*6*7*8/9. But I'd say it's far easier to check whether a given division is possible as you go, without any need for cleverness.

Given a true random number generator which outputs either a 1 or 0 per call, how do you use this to pick a number from an arbitrary range?

If I have a true random number generator (TRNG) which can give me either a 0 or a 1 each time I call it, then it is trivial to then generate any number in a range with a length equal to a power of 2. For example, if I wanted to generate a random number between 0 and 63, I would simply poll the TRNG 5 times, for a maximum value of 11111 and a minimum value of 00000. The problem is when I want a number in a rangle not equal to 2^n. Say I wanted to simulate the roll of a dice. I would need a range between 1 and 6, with equal weighting. Clearly, I would need three bits to store the result, but polling the TRNG 3 times would introduce two eroneous values. We could simply ignore them, but then that would give one side of the dice a much lower odds of being rolled.
My question of ome most effectively deals with this.
The easiest way to get a perfectly accurate result is by rejection sampling. For example, generate a random value from 1 to 8 (3 bits), rejecting and generating a new value (3 new bits) whenever you get a 7 or 8. Do this in a loop.
You can get arbitrarily close to accurate just by generating a large number of bits, doing the mod 6, and living with the bias. In cases like 32-bit values mod 6, the bias will be so small that it will be almost impossible to detect, even after simulating millions of rolls.
If you want a number in range 0 .. R - 1, pick least n such that R is less or equal to 2n. Then generate a random number r in the range 0 .. 2n-1 using your method. If it is greater or equal to R, discard it and generate again. The probability that your generation fails in this manner is at most 1/2, you will get a number in your desired range with less than two attempts on the average. This method is balanced and does not impair the randomness of the result in any fashion.
As you've observed, you can repeatedly double the range of a possible random values through powers of two by concatenating bits, but if you start with an integer number of bits (like zero) then you cannot obtain any range with prime factors other than two.
There are several ways out; none of which are ideal:
Simply produce the first reachable range which is larger than what you need, and to discard results and start again if the random value falls outside the desired range.
Produce a very large range, and distribute that as evenly as possible amongst your desired outputs, and overlook the small bias that you get.
Produce a very large range, distribute what you can evenly amongst your desired outputs, and if you hit upon one of the [proportionally] few values which fall outside of the set which distributes evenly, then discard the result and start again.
As with 3, but recycle the parts of the value that you did not convert into a result.
The first option isn't always a good idea. Numbers 2 and 3 are pretty common. If your random bits are cheap then 3 is normally the fastest solution with a fairly small chance of repeating often.
For the last one; supposing that you have built a random value r in [0,31], and from that you need to produce a result x [0,5]. Values of r in [0,29] could be mapped to the required output without any bias using mod 6, while values [30,31] would have to be dropped on the floor to avoid bias.
In the former case, you produce a valid result x, but there's some more randomness left over -- the difference between the ranges [0,5], [6,11], etc., (five possible values in this case). You can use this to start building your new r for the next random value you'll need to produce.
In the latter case, you don't get any x and are going to have to try again, but you don't have to throw away all of r. The specific value picked from the illegal range [30,31] is left-over and free to be used as a starting value for your next r (two possible values).
The random range you have from that point on needn't be a power of two. That doesn't mean it'll magically reach the range you need at the time, but it does mean you can minimise what you throw away.
The larger you make r, the more bits you may need to throw away if it overflows, but the smaller the chances of that happening. Adding one bit halves your risk but increases the cost only linearly, so it's best to use the largest r you can handle.

Find if any permutation of a number is within a range

I need to find if any permutation of the number exists within a specified range, i just need to return Yes or No.
For eg : Number = 122, and Range = [200, 250]. The answer would be Yes, as 221 exists within the range.
PS:
For the problem that i have in hand, the number to be searched
will only have two different digits (It will only contain 1 and 2,
Eg : 1112221121).
This is not a homework question. It was asked in an interview.
The approach I suggested was to find all permutations of the given number and check. Or loop through the range and check if we find any permutation of the number.
Checking every permutation is too expensive and unnecessary.
First, you need to look at them as strings, not numbers,
Consider each digit position as a seperate variable.
Consider how the set of possible digits each variable can hold is restricted by the range. Each digit/variable pair will be either (a) always valid (b) always invalid; or (c) its validity is conditionally dependent on specific other variables.
Now model these dependencies and independencies as a graph. As case (c) is rare, it will be easy to search in time proportional to O(10N) = O(N)
Numbers have a great property which I think can help you here:
For a given number a of value KXXXX, where K is given, we can
deduce that K0000 <= a < K9999.
Using this property, we can try to build a permutation which is within the range:
Let's take your example:
Range = [200, 250]
Number = 122
First, we can define that the first number must be 2. We have two 2's so we are good so far.
The second number must be be between 0 and 5. We have two candidate, 1 and 2. Still not bad.
Let's check the first value 1:
Any number would be good here, and we still have an unused 2. We have found our permutation (212) and therefor the answer is Yes.
If we did find a contradiction with the value 1, we need to backtrack and try the value 2 and so on.
If none of the solutions are valid, return No.
This Algorithm can be implemented using backtracking and should be very efficient since you only have 2 values to test on each position.
The complexity of this algorithm is 2^l where l is the number of elements.
You could try to implement some kind of binary search:
If you have 6 ones and 4 twos in your number, then first you have the interval
[1111112222; 2222111111]
If your range does not overlap with this interval, you are finished. Now split this interval in the middle, you get
(1111112222 + 222211111) / 2
Now find the largest number consisting of 1's and 2's of the respective number that is smaller than the split point. (Probably this step could be improved by calculating the split directly in some efficient way based on the 1 and 2 or by interpreting 1 and 2 as 0 and 1 of a binary number. One could also consider taking the geometric mean of the two numbers, as the candidates might then be more evenly distributed between left and right.)
[Edit: I think I've got it: Suppose the bounds have the form pq and pr (i.e. p is a common prefix), then build from q and r a symmetric string s with the 1's at the beginning and the end of the string and the 2's in the middle and take ps as the split point (so from 1111112222 and 1122221111 you would build 111122222211, prefix is p=11).]
If this number is contained in the range, you are finished.
If not, look whether the range is above or below and repeat with [old lower bound;split] or [split;old upper bound].
Suppose the range given to you is: ABC and DEF (each character is a digit).
Algorithm permutationExists(range_start, range_end, range_index, nos1, nos2)
if (nos1>0 AND range_start[range_index] < 1 < range_end[range_index] and
permutationExists(range_start, range_end, range_index+1, nos1-1, nos2))
return true
elif (nos2>0 AND range_start[range_index] < 2 < range_end[range_index] and
permutationExists(range_start, range_end, range_index+1, nos1, nos2-1))
return true
else
return false
I am assuming every single number to be a series of digits. The given number is represented as {numberOf1s, numberOf2s}. I am trying to fit the digits (first 1s and then 2s) within the range, if not the procudure returns a false.
PS: I might be really wrong. I dont know if this sort of thing can work. I haven't given it much thought, really..
UPDATE
I am wrong in the way I express the algorithm. There are a few changes that need to be done in it. Here is a working code (It worked for most of my test cases): http://ideone.com/1aOa4
You really only need to check at most TWO of the possible permutations.
Suppose your input number contains only the digits X and Y, with X<Y. In your example, X=1 and Y=2. I'll ignore all the special cases where you've run out of one digit or the other.
Phase 1: Handle the common prefix.
Let A be the first digit in the lower bound of the range, and let B be the first digit in the upper bound of the range. If A<B, then we are done with Phase 1 and move on to Phase 2.
Otherwise, A=B. If X=A=B, then use X as the first digit of the permutation and repeat Phase 1 on the next digit. If Y=A=B, then use Y as the first digit of the permutation and repeat Phase 1 on the next digit.
If neither X nor Y is equal to A and B, then stop. The answer is No.
Phase 2: Done with the common prefix.
At this point, A<B. If A<X<B, then use X as the first digit of the permutation and fill in the remaining digits however you want. The answer is Yes. (And similarly if A<Y<B.)
Otherwise, check the following four cases. At most two of the cases will require real work.
If A=X, then try using X as the first digit of the permutation, followed by all the Y's, followed by the rest of the X's. In other words, make the rest of the permutation as large as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
If B=X, then try using X as the first digit of the permutation, followed by the rest of the X's, followed by all the Y's. In other words, make the rest of the permutation as small as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
Similar cases if A=Y or B=Y.
If none of these four cases succeed, then the answer is No. Notice that at most one of the X cases and at most one of the Y cases can match.
In this solution, I've assumed that the input number and the two numbers in the range all contain the same number of digits. With a little extra work, the approach can be extended to cases where the numbers of digits differ.

Quickly determine if number divides any element in a set

Is there an algorithm that can quickly determine if a number is a factor of a given set of numbers ?
For example, 12 is a factor of [24,33,52] while 5 is not.
Is there a better approach than linear search O(n)? The set will contain a few million elements. I don't need to find the number, just a true or false result.
If a large number of numbers are checked against a constant list one possible approach to speed up the process is to factorize the numbers in the list into their prime factors first. Then put the list members in a dictionary and have the prime factors as the keys. Then when a number (potential factor) comes you first factorize it into its prime factors and then use the constructed dictionary to check whether the number is a factor of the numbers which can be potentially multiples of the given number.
I think in general O(n) search is what you will end up with. However, depending on how large the numbers are in general, you can speed up the search considerably assuming that the set is sorted (you mention that it can be) by observing that if you are searching to find a number divisible by D and you have currently scanned x and x is not divisible by D, the next possible candidate is obviously at floor([x + D] / D) * D. That is, if D = 12 and the list is
5 11 13 19 22 25 27
and you are scanning at 13, the next possible candidate number would be 24. Now depending on the distribution of your input, you can scan forwards using binary search instead of linear search, as you are searching now for the least number not less than 24 in the list, and the list is sorted. If D is large then you might save lots of comparisons in this way.
However from pure computational complexity point of view, sorting and then searching is going to be O(n log n), whereas just a linear scan is O(n).
For testing many potential factors against a constant set you should realize that if one element of the set is just a multiple of two others, it is irrelevant and can be removed. This approach is a variation of an ancient algorithm known as the Sieve of Eratosthenes. Trading start-up time for run-time when testing a huge number of candidates:
Pick the smallest number >1 in the set
Remove any multiples of that number, except itself, from the set
Repeat 2 for the next smallest number, for a certain number of iterations. The number of iterations will depend on the trade-off with start-up time
You are now left with a much smaller set to exhaustively test against. For this to be efficient you either want a data structure for your set that allows O(1) removal, like a linked-list, or just replace "removed" elements with zero and then copy non-zero elements into a new container.
I'm not sure of the question, so let me ask another: Is 12 a factor of [6,33,52]? It is clear that 12 does not divide 6, 33, or 52. But the factors of 12 are 2*2*3 and the factors of 6, 33 and 52 are 2*2*2*3*3*11*13. All of the factors of 12 are present in the set [6,33,52] in sufficient multiplicity, so you could say that 12 is a factor of [6,33,52].
If you say that 12 is not a factor of [6,33,52], then there is no better solution than testing each number for divisibility by 12; simply perform the division and check the remainder. Thus 6%12=6, 33%12=9, and 52%12=4, so 12 is not a factor of [6.33.52]. But if you say that 12 is a factor of [6,33,52], then to determine if a number f is a factor of a set ns, just multiply the numbers ns together sequentially, after each multiplication take the remainder modulo f, report true immediately if the remainder is ever 0, and report false if you reach the end of the list of numbers ns without a remainder of 0.
Let's take two examples. First, is 12 a factor of [6,33,52]? The first (trivial) multiplication results in 6 and gives a remainder of 6. Now 6*33=198, dividing by 12 gives a remainder of 6, and we continue. Now 6*52=312 and 312/12=26r0, so we have a remainder of 0 and the result is true. Second, is 5 a factor of [24,33,52]? The multiplication chain is 24%5=5, (5*33)%5=2, and (2*52)%5=4, so 5 is not a factor of [24,33,52].
A variant of this algorithm was recently used to attack the RSA cryptosystem; you can read about how the attack worked here.
Since the set to be searched is fixed any time spent organising the set for search will be time well spent. If you can get the set in memory, then I expect that a binary tree structure will suit just fine. On average searching for an element in a binary tree is an O(log n) operation.
If you have reason to believe that the numbers in the set are evenly distributed throughout the range [0..10^12] then a binary search of a sorted set in memory ought to perform as well as searching a binary tree. On the other hand, if the middle element in the set (or any subset of the set) is not expected to be close to the middle value in the range encompassed by the set (or subset) then I think the binary tree will have better (practical) performance.
If you can't get the entire set in memory then decomposing it into chunks which will fit into memory and storing those chunks on disk is probably the way to go. You would store the root and upper branches of the set in memory and use them to index onto the disk. The depth of the part of the tree which is kept in memory is something you should decide for yourself, but I'd be surprised if you needed more than the root and 2 levels of branch, giving 8 chunks on disk.
Of course, this only solves part of your problem, finding whether a given number is in the set; you really want to find whether the given number is the factor of any number in the set. As I've suggested in comments I think any approach based on factorising the numbers in the set is hopeless, giving an expected running time beyond polynomial time.
I'd approach this part of the problem the other way round: generate the multiples of the given number and search for each of them. If your set has 10^7 elements then any given number N will have about (10^7)/N multiples in the set. If the given number is drawn at random from the range [0..10^12] the mean value of N is 0.5*10^12, which suggests (counter-intuitively) that in most cases you will only have to search for N itself.
And yes, I am aware that in many cases you would have to search for many more values.
This approach would parallelise relatively easily.
A fast solution which requires some precomputation:
Organize your set in a binary tree with the following rules:
Numbers of the set are on the leaves.
The root of the tree contains r the minimum of all prime numbers that divide a number of the set.
The left subtree correspond to the subset of multiples of r (divided by r so that r won't be repeated infinitly).
The right subtree correspond to the subset of numbers not multiple of r.
If you want to test if a number N divides some element of the set, compute its prime decomposition and go through the tree until you reach a leaf. If the leaf contains a number then N divides it, else if the leaf is empty then N divides no element in the set.
Simply calculate the product of the set and mod the result with the test factor.
In your example
{24,33,52} P=41184
Tf 12: 41184 mod 12 = 0 True
Tf 5: 41184 mod 5 = 4 False
The set can be broken into chunks if calculating the product would overflow the arithmetic of the calculator, but huge numbers are possible by storing a strings.

Finding numbers whose digits sum to a prime

I was trying to solve this problem on SPOJ, in which I have to find how many numbers are there in a range whose digits sum up to a prime. This range can be very big, (upper bound of 10^8 is given). The naive solution timed out, I just looped over the entire range and checked the required condition. I cant seem find a pattern or a formula too. Could someone please give a direction to proceed in??
Thanks in advance...
Here are some tips:
try to write a function that finds how many numbers in a given range have a given sum of the digits. Easiest way to implement this is to write a function that returns the number of numbers with a given sum of digits up to a given value a(call this f(sum,a)) and then the number of such numbers in the range a to b will be f(sum,b) - f(sum, a - 1)
Pay attention that the sum of the digits itself will not be too high - up to 8 * 9 < 100 so the number of prime sums to check is really small
Hope this helps.
I (seriously) doubt whether this 'opposite' approach will be any faster than #izomorphius's suggestion, but it might prompt some thoughts about improving the performance of your program:
1) Get the list of primes in the range 2..71 (you can omit 1 and 72 from any consideration since neither is prime).
2) Enumerate the integer partitions of each of the prime numbers in the list. Here's some Python code. You'd want to modify this so as not to generate partitions which were invalid, such as those containing numbers larger than 9.
3) For each of those partitions, pad out with 0s to make a set of 8 digits, then enumerate all the permutations of the padded set.
Now you have the list of numbers you require.
Generate the primes using the sieve of Eratosthenes up to the maximum sum (9 + 9...). Put them in a Hash table. Then you could likely loop quickly through 10^8 numbers and add up their sums. There might be more efficient methods, but this should be quick enough.

Resources