How to know whether a number includes a specific digit without converting it to a string? - algorithm

For example, I want to find the numbers from 1 to 7000 that include the digit 2 or 7, without converting them to strings.
2 7 12 17 ... 20 21 22 23...7000
Is there a good algorithm with math?
thank you in advance...

Something like that
while ( n > 0 ) {
    digit = n % 10;
    // check the digit
    n = n / 10;
}
Example with 523
In the first iteration you will have digit = 3 (523 % 10)
In the second iteration you will have digit = 2 (52 % 10)
In the third iteration you will have digit = 5 (5 % 10)
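As a minimal sketch of the loop above applied to the original question (the function name `contains_digit` and the set-based `wanted` parameter are illustrative choices, not part of the answer), the following Python lists the numbers from 1 to 7000 that contain a 2 or a 7 using only integer arithmetic:

```python
def contains_digit(n, wanted):
    """Return True if any decimal digit of the non-negative integer n is in `wanted`."""
    if n == 0:
        return 0 in wanted
    while n > 0:
        digit = n % 10          # peel off the last digit
        if digit in wanted:
            return True
        n //= 10                # drop the last digit
    return False


matches = [n for n in range(1, 7001) if contains_digit(n, {2, 7})]
print(matches[:6])              # [2, 7, 12, 17, 20, 21]
```

The early `return True` means a number is abandoned as soon as one matching digit is found.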

Consider that if you put Matteo's code in a loop, it works.
By the way, you can improve performance by skipping obvious numbers.
For example, if you find a 7 in the third digit, as in 15783, you know that all of 157XX are valid, so you can accept them without checking and jump straight to 15800.
You can also directly build them. From 1 to 7000 they are:
xxx2 xxx7
xx2x xx7x
x2xx x7xx
2xxx 7000
Replace each x with a digit 0-9, taking care of overlaps: for example xxx2 and xx7x both produce 0072, 0172, ...
EDIT:
TIP: You don't need strings to do this. 1332 == 1 * 10^3 + 3 * 10^2 + 3 * 10^1 + 2 * 10^0

Related

how to convert decimal value to binary bits?

Basically I want to learn the algorithm on how to convert decimal to binary, I found this:
int convert(int dec)
{
    if (dec == 0)
    {
        return 0;
    }
    else
    {
        return (dec % 2 + 10 * convert(dec / 2));
    }
}
It works just fine, but I am not able to understand dec % 2 + 10 * convert(dec / 2). Can you please explain this in a way that is understandable for people with basic math? E.g. which operation is performed first, and how does dec = 50 turn into the binary 110010?
FYI: I can do it, this way: 50=(2^5=32)+(2^4=16)+(2^1)=50
Thanks in advance.
I won't implement it for you, but I am happy to describe the algorithm and give an example.
Converting from base 10 to base b ultimately follows the same series of steps which includes repeatedly dividing by b then saving the remainder.
An example of what this looks like for 50 (base10) to base2 would be:
            Quotient  Remainder
----------------------------
50 / 2 =    25        0
25 / 2 =    12        1
12 / 2 =    6         0
 6 / 2 =    3         0
 3 / 2 =    1         1
 1 / 2 =    0         1
Examining the remainders in reverse (bottom to top) gives you the correct representation in base b (2 in this case): 110010
For information on why this works, take a look at this question: https://math.stackexchange.com/questions/86207/converting-decimalbase-10-numbers-to-binary-by-repeatedly-dividing-by-2
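The repeated-division procedure described above can be sketched in Python as follows. The function name `to_base` and its list-of-digits return type are assumptions made for illustration; the answer itself deliberately gave no implementation.

```python
def to_base(n, b):
    """Digits of the non-negative integer n in base b (b >= 2), most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, b)     # the quotient carries on, the remainder is the next digit
        digits.append(r)
    digits.reverse()            # remainders come out least significant first
    return digits


print(to_base(50, 2))           # [1, 1, 0, 0, 1, 0]
```

Returning a list of digits rather than an integer avoids confusing the result with a base-10 number that merely looks like binary.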
Let's look at dec % 2 + 10 * convert(dec / 2). The first part dec % 2 is a modulo operation, and this is what decides if a digit should be 1 or 0. The rest, 10 * convert(dec / 2) finds the next (and next and next recursively) digit and puts it on the left of the current digit.
You could quite easily see what is going on by slightly modifying your code. Change the else to:
else
{
    int ret = (dec % 2 + 10 * convert(dec / 2));
    printf("%d %d\n", dec, ret);
    return ret;
}
and then convert(50) will print this:
$ ./a.out
1 1
3 11
6 110
12 1100
25 11001
50 110010
But as pointed out in the comments, this is not a real base conversion. You have converted the number 50 to a completely different number that looks like the binary representation.
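To make that last point concrete, here is a small Python port of the recursive function from the question (a sketch for illustration only). It shows that the return value is an ordinary decimal integer whose digits happen to spell the binary form:

```python
def convert(dec):
    """Python port of the recursive C function from the question."""
    if dec == 0:
        return 0
    # dec % 2 is the lowest binary digit; multiplying the rest by 10
    # shifts it one decimal place to the left.
    return dec % 2 + 10 * convert(dec // 2)


result = convert(50)
print(result)                   # 110010 (one hundred ten thousand and ten)
print(int(str(result), 2))      # 50, after reading those digits as binary
```

Because the result is stored in base 10, it grows quickly; in C the original function would overflow a 32-bit int for inputs of 1024 and above.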
Here is an algorithm that, given a non-negative integer N, produces a string of characters S representing N in binary notation.
do
{
    if N is odd
    {
        add '1' to the beginning of S
    }
    else
    {
        add '0' to the beginning of S
    }
    divide N by 2
}
while N is non-zero
Using the requested example:
initially N=50 and S is empty
50 is even: S="0"
divide N by 2: N=25
25 is odd: S="10"
divide N by 2: N=12
12 is even: S="010"
divide N by 2: N=6
6 is even: S="0010"
divide N by 2: N=3
3 is odd: S="10010"
divide N by 2: N=1
1 is odd: S="110010"
divide N by 2: N=0
stop looping

Consolidate 10 bit Value into a Unique Byte

As part of an algorithm I'm writing, I need to find a way to convert a 10-bit word into a unique 8-bit word. The 10-bit word is made up of 5 pairs, where each pair can only ever equal 0, 1 or 2 (never 3). For example:
|00|10|00|01|10|
This value needs to somehow be consolidated into a single, unique byte.
As each pair can never equal 3, there is a wide range of values that this 10-bit word will never represent, which makes me think that it is possible to create an algorithm to perform this conversion. The simplest way to do this would be to use a lookup table, but it seems like a waste of resources to store ~680 values which will only be used once in my program. I've already tried to incorporate one of the pairs into the others somehow, but every attempt I've made has resulted in a non-unique value, and I'm now very quickly running out of ideas!
Any help?
The number you have is essentially base 3. You just need to convert this to base 2.
There are 5 pairs, so 3^5 = 243 numbers. And 8 bits is 2^8 = 256 numbers, so it's possible.
The simplest way to convert between bases is to go to base 10 first.
So, for your example:
00|10|00|01|10
Base 3: 02012
Base 10: 2*3^3 + 1*3^1 + 2*3^0
= 54 + 3 + 2
= 59
Base 2:
     59 % 2 = 1
/2   29 % 2 = 1
/2   14 % 2 = 0
/2    7 % 2 = 1
/2    3 % 2 = 1
/2    1 % 2 = 1
So 111011 is your number in binary
This explains the above process in a bit more detail.
Note that once you have 59 above stored in a 1-byte integer, you'll probably already have what you want, thus explicitly converting to base 2 might not be necessary.
What you basically have is a base 3 number, and you want to convert it to a single number in 0 - 255. Luckily, 5 digits in ternary (base 3) give 243 combinations.
What you'll need to do is:
Digit Action
( 1st x 3^4)
+ (2nd x 3^3)
+ (3rd x 3^2)
+ (4th x 3)
+ (5th)
This will give you a number 0 to 242.
You want to store some information in a byte. A byte can hold at most 2 ^ 8 = 256 states.
Your data has 3 ^ 5 = 243 < 256 states in total. That makes the mapping possible.
Suppose your pairs are ABCDE (each character can be 0, 1 or 2).
You can just calculate A*3^4 + B*3^3 + C*3^2 + D*3 + E as your result. The result is guaranteed to be in the range 0 to 242, so it fits in a byte.
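The base-3 packing described in these answers can be sketched in Python as below. The names `pack_pairs` and `unpack_pairs` are hypothetical, and the sketch assumes the 10-bit word stores the leftmost pair in the highest bits, as in the |00|10|00|01|10| example.

```python
def pack_pairs(word10):
    """Map a 10-bit word made of five 2-bit pairs (each 0..2) to a value in 0..242."""
    value = 0
    for shift in (8, 6, 4, 2, 0):           # leftmost pair first
        pair = (word10 >> shift) & 0b11
        if pair == 3:
            raise ValueError("pair value 3 is not allowed")
        value = value * 3 + pair            # Horner's rule in base 3
    return value


def unpack_pairs(value):
    """Inverse of pack_pairs: rebuild the 10-bit word from a value in 0..242."""
    word10 = 0
    for shift in (0, 2, 4, 6, 8):           # rightmost pair first
        value, pair = divmod(value, 3)
        word10 |= pair << shift
    return word10


packed = pack_pairs(0b0010000110)
print(packed)                               # 59
print(unpack_pairs(packed) == 0b0010000110) # True
```

Since every valid word maps to a distinct value below 243, the result is unique and reversible, which is what the question asked for.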

How to find the units digit of a certain power in a simplest way

How to find out the units digit of a certain number (e.g. 3 power 2011). What logic should I use to find the answer to this problem?
For base 3:
3^1 = 3
3^2 = 9
3^3 = 27
3^4 = 81
3^5 = 243
3^6 = 729
3^7 = 2187
...
That is, the units digit takes only 4 different values and then repeats in the same cycle forever.
With the help of Euler's theorem we can show that something similar holds for any integer n: its units digit repeats with a period of at most 4 consecutive exponents. Looking only at the units digit of an arbitrary product is equivalent to taking the remainder of the multiplication modulo 10, for example:
2^7 % 10 = 128 % 10 = 8
It can also be shown (and is quite intuitive) that for an arbitrary base, the units digit of any power will only depend on the units digit of the base itself - that is 2013^2013 has the same units digit as 3^2013.
We can exploit both facts to come up with an extremely fast algorithm (thanks for the help - with kind permission I may present a much faster version).
The idea is this: As we know that for any number 0-9 there will be at most 4 different outcomes, we can as well store them in a lookup table:
{ 0,0,0,0, 1,1,1,1, 6,2,4,8, 1,3,9,7, 6,4,6,4,
5,5,5,5, 6,6,6,6, 1,7,9,3, 6,8,4,2, 1,9,1,9 }
Those are the possible outcomes for 0-9 in that order, grouped in fours. The idea is now, for an exponentiation n^a, to
first take the base mod 10 (call the result i)
go to index 4*i in our table (it's the starting offset of that particular digit)
take the exponent mod 4 (call the result off); as stated above, we only have four possible outcomes
add off to 4*i to get the index of the result
Now to make this as efficient as possible, some tweaks are applied to the basic arithmetic operations:
Multiplying by 4 is equivalent to shifting left by two bits ('<< 2')
Computing a % 4 for a non-negative a is equivalent to a & 3 (masking the two lowest bits, which form the remainder modulo 4)
The algorithm in C:
static int table[] = {
    0, 0, 0, 0, 1, 1, 1, 1, 6, 2, 4, 8, 1, 3, 9, 7, 6, 4, 6, 4,
    5, 5, 5, 5, 6, 6, 6, 6, 1, 7, 9, 3, 6, 8, 4, 2, 1, 9, 1, 9
};
int /* assume n>=0, a>0 */
unit_digit(int n, int a)
{
    return table[((n%10)<<2)+(a&3)];
}
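As a sanity check, the table lookup can be ported to Python and compared against the built-in three-argument `pow`. This is only a verification sketch; the name `TABLE` and the tested ranges are arbitrary choices.

```python
TABLE = [
    0, 0, 0, 0,  1, 1, 1, 1,  6, 2, 4, 8,  1, 3, 9, 7,  6, 4, 6, 4,
    5, 5, 5, 5,  6, 6, 6, 6,  1, 7, 9, 3,  6, 8, 4, 2,  1, 9, 1, 9,
]


def unit_digit(n, a):
    """Units digit of n**a for n >= 0 and a > 0, via the lookup table."""
    return TABLE[((n % 10) << 2) + (a & 3)]


# Cross-check against Python's built-in modular exponentiation.
mismatches = [(n, a) for n in range(200) for a in range(1, 50)
              if unit_digit(n, a) != pow(n, a, 10)]
print(mismatches)               # []
print(unit_digit(3, 2011))      # 7
```

Note that the first entry of each group of four corresponds to exponents that are multiples of 4, which is why the function requires a > 0.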
Proof for the initial claims
From observing we noticed that the units digit for 3^x repeats every fourth power. The claim was that this holds for any integer. But how is this actually proven? As it turns out, it's quite easy using modular arithmetic. If we are only interested in the units digit, we can perform our calculations modulo 10. For a base a that is coprime to 10, saying that the units digit cycles after 4 exponents is equivalent to saying
a^4 congruent 1 mod 10
If this holds, then for example
a^5 mod 10 = a^4 * a^1 mod 10 = a^4 mod 10 * a^1 mod 10 = a^1 mod 10
that is, a^5 yields the same units digit as a^1 and so on.
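From Euler's theorem we know that, for any a coprime to 10,
a^phi(10) mod 10 = 1 mod 10
where phi(10) is the count of numbers between 1 and 10 that are co-prime to 10 (i.e. their gcd with 10 is equal to 1). The numbers < 10 co-prime to 10 are 1,3,7 and 9. So phi(10) = 4 and this proves that really a^4 mod 10 = 1 mod 10 for such a. For the remaining digits (0, 2, 4, 5, 6 and 8) Euler's theorem does not apply, but a^5 mod 10 = a mod 10 still holds, as the lookup table shows, so their cycle length is also at most 4.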
The last claim to prove is that for exponentiations where the base is >= 10 it suffices to just look at the base's units digit. Let's say our base is x >= 10, so we can write x = x_0 + 10*x_1 + 100*x_2 + ... (base 10 representation), or x = x_0 + 10*q with q = x_1 + 10*x_2 + ...
Using the binomial theorem it's easy to see that indeed
x ^ y mod 10
= (x_0 + 10*q) ^ y mod 10
= x_0^y + a_1 * x_0^(y-1) * (10*q) + a_2 * x_0^(y-2) * (10*q)^2 + ... + a_y * (10*q)^y mod 10
= x_0^y mod 10
where the a_i are binomial coefficients. They are not relevant in the end, since every term a_i * x_0^(y-i) * (10*q)^i with i >= 1 is divisible by 10.
You should look at modular exponentiation. What you want is the same as calculating n^e (mod m) with m = 10, that is, the remainder of n^e when divided by ten.
You are probably interested in the right-to-left binary method to calculate it, since it's the most time-efficient one and not too hard to implement. Here is the pseudocode, from Wikipedia:
function modular_pow(base, exponent, modulus)
    result := 1
    while exponent > 0
        if (exponent & 1) equals 1:
            result = (result * base) mod modulus
        exponent := exponent >> 1
        base = (base * base) mod modulus
    return result
After that, just call it with modulus = 10 for your desired base and exponent and there's your answer.
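A runnable Python rendering of that pseudocode might look like the sketch below. The `modulus == 1` guard and the initial `base %= modulus` are small additions of mine so the function agrees with Python's built-in `pow(base, exponent, modulus)` in edge cases.

```python
def modular_pow(base, exponent, modulus):
    """Right-to-left binary exponentiation: (base ** exponent) % modulus."""
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                        # current low bit of the exponent is set
            result = (result * base) % modulus
        exponent >>= 1                          # move on to the next bit
        base = (base * base) % modulus          # square the base for that next bit
    return result


print(modular_pow(3, 2011, 10))                 # 7
```

The loop runs once per bit of the exponent, so even a 1000-bit exponent needs only about a thousand multiplications.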
EDIT: for an even simpler method, less efficient CPU-wise but more memory-efficient, check out the Memory-efficient section of the article on Wikipedia. The logic is straightforward enough:
function modular_pow(base, exponent, modulus)
    c := 1
    for e_prime = 1 to exponent
        c := (c * base) mod modulus
    return c
I'm sure there's a proper mathematical way to solve this, but I would suggest that since you only care about the last digit and since in theory every number multiplied by itself repeatedly should generate a repeating pattern eventually (when looking only at the last digit), you could simply perform the multiplications until you detect the first repetition and then map your exponent into the appropriate position in the pattern that you built.
Note that because you only care about the last digit, you can further simplify things by truncating your input number down to its ones-digit before you start building your pattern mapping. This will let you determine the last digit even for arbitrarily large inputs that would otherwise cause an overflow on the first or second multiplication.
Here's a basic example in JavaScript: http://jsfiddle.net/dtyuA/2/
function lastDigit(base, exponent) {
    if (exponent < 0) {
        alert("stupid user, negative values are not supported");
        return 0;
    }
    if (exponent == 0) {
        return 1;
    }
    var baseString = base + '';
    var lastBaseDigit = baseString.substring(baseString.length - 1);
    var lastDigit = lastBaseDigit;
    var pattern = [];
    do {
        pattern.push(lastDigit);
        var nextProduct = (lastDigit * lastBaseDigit) + '';
        lastDigit = nextProduct.substring(nextProduct.length - 1);
    } while (lastDigit != lastBaseDigit);
    return pattern[(exponent - 1) % pattern.length];
};
function doMath() {
    var base = parseInt(document.getElementById("base").value, 10);
    var exp = parseInt(document.getElementById("exp").value, 10);
    console.log(lastDigit(base, exp));
};
console.log(lastDigit(3003, 5));
Base: <input id="base" type="text" value="3" /> <br>
Exponent: <input id="exp" type="text" value="2011"><br>
<input type="button" value="Submit" onclick="doMath();" />
And the last digit in 3^2011 is 7, by the way.
We can start by inspecting the last digit of each result obtained by raising the base 10 digits to successive powers:
 d   d^2  d^3  d^4  d^5  d^6  d^7  d^8  d^9   (mod 10)
---  ---  ---  ---  ---  ---  ---  ---  ---
 0    0    0    0    0    0    0    0    0
 1    1    1    1    1    1    1    1    1
 2    4    8    6    2    4    8    6    2
 3    9    7    1    3    9    7    1    3
 4    6    4    6    4    6    4    6    4
 5    5    5    5    5    5    5    5    5
 6    6    6    6    6    6    6    6    6
 7    9    3    1    7    9    3    1    7
 8    4    2    6    8    4    2    6    8
 9    1    9    1    9    1    9    1    9
We can see that in all cases the last digit cycles through no more than four distinct values. Using this fact, and assuming that n is a non-negative integer and p is a positive integer, we can compute the result fairly directly (e.g. in Javascript):
function lastDigit(n, p) {
    var d = n % 10;
    return [d, (d*d)%10, (d*d*d)%10, (d*d*d*d)%10][(p-1) % 4];
}
... or even more simply:
function lastDigit(n, p) {
    return Math.pow(n % 10, (p-1) % 4 + 1) % 10;
}
lastDigit(3, 2011)
/* 7 */
The second function is equivalent to the first. Note that even though it uses exponentiation, it never works with a number larger than nine to the fourth power (6561).
The key to solving this type of question lies in Euler's theorem.
This theorem allows us to say that a^phi(m) mod m = 1 mod m whenever a and m are coprime, that is, when they share no common factor other than 1. If this is the case (and for your example it is), we can solve the problem on paper, without any programming whatsoever.
Let's solve for the unit digit of 3^2011, as in your example. This is equivalent to 3^2011 mod 10.
The first step is to check that 3 and 10 are coprime. Their only common factor is 1, so we can use Euler's theorem.
We also need to compute what the totient, or phi value, is for 10. For 10, it is 4. For 100 phi is 40, for 1000 it is 400, etc.
Using Euler's theorem, we can see that 3^4 mod 10 = 1. We can then re-write the original example as:
3^2011 mod 10 = 3^(4*502 + 3) mod 10 = (3^(4*502) mod 10) * (3^3 mod 10) mod 10 = 1^502 * 3^3 mod 10 = 27 mod 10 = 7
Thus, the last digit of 3^2011 is 7.
As you saw, this required no programming whatsoever and I solved this example on a piece of scratch paper.
You people are making a simple thing complicated.
Suppose you want to find the unit digit of abc ^ xyz (with xyz > 0).
Divide the power xyz by 4. If the remainder is 1, the answer is c^1 = c.
If xyz%4=2, the answer is the unit digit of c^2.
Else if xyz%4=3, the answer is the unit digit of c^3.
If xyz%4=0:
if c is 0, the answer is 0
if c is 5, the answer is 5
if c is even (other than 0), the answer is 6
if c is odd (other than 5), the answer is 1.
Below is a table with the power and the unit digit of 3 to that power.
0 1
1 3
2 9
3 7
4 1
5 3
6 9
7 7
Using this table you can see that the unit digit can be 1, 3, 9, 7 and the sequence repeats in this order for higher powers of 3. Using this logic you can find that the unit digit of (3 power 2011) is 7. You can use the same algorithm for the general case.
Here's a trick that works for numbers that share no factor with the base (for base 10, the number can't be a multiple of 2 or 5). Let's use 3 as the example. What you're trying to find is 3^2011 mod 10. Find powers of 3, starting with 3^1, until you find one with the last digit 1. For 3, you get 3^4=81. Write the original power as (3^4)^502*3^3. Using modular arithmetic, (3^4)^502*3^3 is congruent to (has the same last digit as) 1^502*3^3. So 3^2011 and 3^3 have the same last digit, which is 7.
Here's some pseudocode to explain it in general. This finds the last digit of b^n in base B.
// Find the smallest power of b ending in 1.
i=1
while ((b^i % B) != 1) {
    i++
}
// b^i has the last digit 1
a=n % i
// For some value of j, b^n == (b^i)^j * b^a, which is congruent to b^a
return b^a % B
You'd need to be careful to prevent an infinite loop, if no power of b ends in 1 (in base 10, multiples of 2 or 5 don't work.)
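One way to turn that pseudocode into runnable Python while guarding against the infinite loop is sketched below. The step limit and the fallback to the built-in `pow` are my own assumptions about how to handle bases where no power ends in 1; the answer does not specify this.

```python
def last_digit(b, n, base=10):
    """Last digit of b**n written in `base`, for b >= 0 and n >= 0."""
    if n == 0:
        return 1 % base
    d = b % base
    # Look for the smallest i >= 1 with d**i ending in 1. Stop after `base`
    # steps: by then the sequence of last digits must have started repeating.
    power, i = d, 1
    while power != 1 and i <= base:
        power = (power * d) % base
        i += 1
    if power != 1:
        # No power of d ends in 1 (d shares a factor with base), so the
        # trick does not apply; fall back to built-in modular exponentiation.
        return pow(d, n, base)
    a = n % i                   # b**n is congruent to b**a
    return pow(d, a, base)


print(last_digit(3, 2011))      # 7
```

For base 10 the search loop finishes within four steps for the digits 1, 3, 7 and 9.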
Find the repeating set. In this case it is 3, 9, 7, 1, and it repeats in the same order forever. So divide 2011 by 4, which gives a remainder of 3. That points to the 3rd element in the repeating set, which is 7. This is the easiest way to find the unit digit for any given number. Say you are asked for 3^31: the remainder of 31/4 is 3, so 7 is the unit digit. For 3^9, the remainder of 9/4 is 1, so the unit digit is 3. For 3^100, the remainder is 0, which corresponds to the 4th element, so the unit digit is 1.
If you have the number and the exponent as separate values, it's easy.
Let n1 be the number and n2 be the power. ** represents exponentiation.
Assume n1>0.
% means the modulo operation.
The pseudocode will look like this:
def last_digit(n1, n2)
  if n2==0 then return 1 end
  last = n1%10
  mod = (n2%4).zero? ? 4 : (n2%4)
  last_digit = (last**mod)%10
end
Explanation:
We need to consider only the last digit of the number, because that determines the last digit of the power.
It is a mathematical property that the last digit of the powers of each digit (0-9) takes at most 4 different values.
1) If the exponent is zero, we know the last digit is 1.
2) Get the last digit by applying %10 to the number (n1).
3) Apply %4 to the exponent (n2). If the result is zero we have to treat it as 4, because the case n2 == 0 was already handled in step 1 and a positive multiple of 4 lands on the 4th element of the cycle. If the result is non-zero, we use it as is.
4) Now we have at most 9**4, which is easy for the computer to calculate.
Take %10 of that number and you have the last digit.

How can I take the modulus of two very large numbers?

I need an algorithm for A mod B, where
A is a very big integer that contains only the digit 1 (ex: 1111, 1111111111111111)
B is a very big integer (ex: 1231, 1231231823127312918923)
By big, I mean 1000 digits.
To compute a number mod n, given a function to get quotient and remainder when dividing by (n+1), start by adding one to the number. Then, as long as the number is bigger than n, iterate: number = (number div (n+1)) + (number mod (n+1)). Finally, at the end, subtract one. An alternative to adding one at the beginning and subtracting one at the end is checking whether the result equals n and returning zero if so.
For example, given a function to divide by ten, one can compute 12345678 mod 9 thusly:
12345679 -> 1234567 + 9
1234576 -> 123457 + 6
123463 -> 12346 + 3
12349 -> 1234 + 9
1243 -> 124 + 3
127 -> 12 + 7
19 -> 1 + 9
10 -> 1 + 0
Subtract 1, and the result is zero.
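The folding procedure above can be sketched in Python as follows (the function name `mod_by_folding` is made up for this example). Python's `divmod` stands in for the assumed "divide by n+1" primitive.

```python
def mod_by_folding(number, n):
    """number % n for number >= 0 and n >= 1, dividing only by n + 1."""
    x = number + 1
    while x > n:
        quotient, remainder = divmod(x, n + 1)
        # n + 1 is congruent to 1 modulo n, so replacing
        # quotient * (n + 1) + remainder by quotient + remainder keeps the residue.
        x = quotient + remainder
    return x - 1


print(mod_by_folding(12345678, 9))      # 0
```

Adding one at the start keeps x in the range 1..n at the end, so the final subtraction yields a value in 0..n-1.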
1000 digits isn't really big, use any big integer library to get rather fast results.
If you really worry about performance, A can be written as 1111...1 = (10^n - 1)/9 for some n, so computing A mod B can be reduced to computing ((10^n - 1) mod (9*B)) / 9, and you can do that faster.
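A short Python sketch of that reduction is shown below, using the built-in three-argument `pow` for the modular power. The function name `repunit_mod` is an illustrative choice.

```python
def repunit_mod(n, b):
    """(the number written as n ones) % b, without building the n-digit number."""
    m = 9 * b
    # 10**n - 1 is n nines, i.e. 9 * A. Reducing it modulo 9*b gives
    # 9 * (A mod b), so the final division by 9 is exact.
    return ((pow(10, n, m) - 1) % m) // 9


print(repunit_mod(16, 1231) == int("1" * 16) % 1231)    # True
```

For 1000-digit inputs this needs roughly log2(n) modular multiplications instead of arithmetic on the full repunit.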
Try Montgomery reduction on how to find modulo on large numbers - http://en.wikipedia.org/wiki/Montgomery_reduction
1) Just find a language or package that does arbitrary precision arithmetic - in my case I'd try java.math.BigDecimal.
2) If you are doing this yourself, you can avoid having to do division by using doubling and subtraction. E.g. 10 mod 3 = 10 - 3 - 3 - 3 = 1 (repeatedly subtracting 3 until you can't any more) - which is incredibly slow, so double 3 until it is just smaller than 10 (e.g. to 6), subtract to leave 4, and repeat.
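The doubling-and-subtraction idea from point 2 might be sketched like this in Python. It is a minimal illustration with a made-up name, restarting the doubling from b after each subtraction, which is simple but not the fastest possible arrangement.

```python
def mod_by_doubling(a, b):
    """a % b for a >= 0 and b >= 1 using only addition, subtraction and comparison."""
    while a >= b:
        chunk = b
        while chunk + chunk <= a:   # double b for as long as it still fits into a
            chunk += chunk
        a -= chunk                  # remove the largest doubled multiple of b
    return a


print(mod_by_doubling(10, 3))       # 1
```

Each outer pass removes more than half of a, so the number of subtractions grows with the number of bits rather than with the quotient.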

How to count each digit in a range of integers?

Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:
1 to 100
51 to 300
1 to 2,000 with zeros to the left
The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.
I wonder if there is a better way to solve this, without having to loop through the entire integers range.
Solutions in any language or pseudocode are welcome.
Edit:
Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem.
Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one million seems to indicate there is a proportion, but the results for other numbers show different proportions.
strainer found some formulas that may be used to count digits for numbers which are a power of ten.
Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas originating from MathOverflow and found a flaw in them (point to Stackoverflow :).
noahlavine developed an algorithm and presented it in pseudocode.
A new solution
After reading all the answers, and doing some experiments, I found that for a range of integers from 1 to 10^n - 1:
For digits 1 to 9, n*10^(n-1) pieces are needed
For digit 0, if not using leading zeros, n*10^(n-1) - ((10^n - 1) / 9) are needed
For digit 0, if using leading zeros, n*10^(n-1) - n are needed
The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers).
For example, if n = 6, range is 1 to 999,999:
For digits 1 to 9 we need 6*10^5 = 600,000 of each one
For digit 0, without leading zeros, we need 6*10^5 - (10^6 - 1)/9 = 600,000 - 111,111 = 488,889
For digit 0, with leading zeros, we need 6*10^5 - 6 = 599,994
These numbers can be checked using High-Performance Mark results.
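The three formulas can also be checked by brute force for small n. The following Python sketch does that; the helper names are invented for the check, and string conversion is used here only because it is the simplest reference implementation.

```python
def brute_force_counts(n, leading_zeros):
    """Count each digit over 1 .. 10**n - 1, optionally zero-padded to n digits."""
    counts = [0] * 10
    for number in range(1, 10 ** n):
        text = str(number).zfill(n) if leading_zeros else str(number)
        for ch in text:
            counts[int(ch)] += 1
    return counts


def formula_counts(n, leading_zeros):
    """The three closed forms quoted above, as a list indexed by digit."""
    per_digit = n * 10 ** (n - 1)
    if leading_zeros:
        zeros = per_digit - n
    else:
        zeros = per_digit - (10 ** n - 1) // 9
    return [zeros] + [per_digit] * 9


for n in range(1, 5):
    for padded in (False, True):
        assert brute_force_counts(n, padded) == formula_counts(n, padded)
print(formula_counts(6, False)[0], formula_counts(6, True)[0])  # 488889 599994
```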
Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but, if it finds a number which is a power of ten, it uses the formulas to add to the digits count the quantity for a full range of 1 to 9 or 1 to 99 or 1 to 999 etc. Here's the algorithm in pseudocode:
integer First,Last //First and last number in the range
integer Number     //Current number in the loop
integer Power      //Power is the n in 10^n in the formulas
integer Nines      //Nines is the result of 10^n - 1, 10^5 - 1 = 99999
integer Prefix     //First digits in a number. For 14,200, prefix is 142
array 0..9 Digits  //Will hold the count for all the digits
FOR Number = First TO Last
  CALL TallyDigitsForOneNumber WITH Number,1 //Tally the count of each digit
                                             //in the number, increment by 1
  //Start of optimization. Comments are for Number = 1,000 and Last = 8,000.
  Power = Zeros at the end of number //For 1,000, Power = 3
  IF Power > 0                       //The number ends in 0 00 000 etc
    Nines = 10^Power-1               //Nines = 10^3 - 1 = 1000 - 1 = 999
    IF Number+Nines <= Last          //If 1,000+999 < 8,000, add a full set
      Digits[0-9] += Power*10^(Power-1) //Add 3*10^(3-1) = 300 to digits 0 to 9
      Digits[0] -= Power                //Adjust digit 0 (leading zeros formula)
      Prefix = First digits of Number   //For 1000, prefix is 1
      CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each
                                                     //digit in prefix,
                                                     //increment by 999
      Number += Nines                   //Increment the loop counter 999 cycles
    ENDIF
  ENDIF
  //End of optimization
ENDFOR
SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count
  REPEAT
    Digits [ Number % 10 ] += Count
    Number = Number / 10
  UNTIL Number = 0
For example, for range 786 to 3,021, the counter will be incremented:
By 1 from 786 to 790 (5 cycles)
By 9 from 790 to 799 (1 cycle)
By 1 from 799 to 800
By 99 from 800 to 899
By 1 from 899 to 900
By 99 from 900 to 999
By 1 from 999 to 1000
By 999 from 1000 to 1999
By 1 from 1999 to 2000
By 999 from 2000 to 2999
By 1 from 2999 to 3000
By 1 from 3000 to 3010 (10 cycles)
By 9 from 3010 to 3019 (1 cycle)
By 1 from 3019 to 3021 (2 cycles)
Total: 28 cycles
Without optimization: 2,235 cycles
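For readers who want to experiment, here is a Python sketch of the same powers-of-ten shortcut. It is my reading of the pseudocode rather than the author's Clarion source: it assumes First >= 1, uses a while loop because the counter is advanced inside the body, and subtracts Power from the zero count as the leading-zeros formula n*10^(n-1) - n requires.

```python
def tally(digits, number, count):
    """Add `count` to the tally of every decimal digit of `number`."""
    while True:
        digits[number % 10] += count
        number //= 10
        if number == 0:
            break


def count_digits(first, last):
    """Digit tallies for first..last inclusive (first >= 1), without leading zeros."""
    digits = [0] * 10
    number = first
    while number <= last:
        tally(digits, number, 1)
        power, rest = 0, number
        while rest % 10 == 0:               # count the zeros at the end of number
            rest //= 10
            power += 1
        if power > 0:
            nines = 10 ** power - 1
            if number + nines <= last:      # a full block 0..0 to 9..9 fits
                for d in range(10):
                    digits[d] += power * 10 ** (power - 1)
                digits[0] -= power          # the all-zero suffix was tallied above
                tally(digits, rest, nines)  # prefix digits repeat for the block
                number += nines
        number += 1
    return digits


print(count_digits(786, 3021))
```

Comparing the result with a plain loop over the same range is an easy way to confirm the shortcut for any range of interest.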
Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:
If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then subtract the extra leading 1s, one per number in the range (1,000 - 700 + 1 = 301), from the count of digit 1.
Benchmark and Source code
I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results:
Original 104.78 seconds
With %10 83.66
With Powers of Ten 0.07
[Screenshot of the benchmark application; source: clarion.sca.mx]
If you would like to see the full source code or run the benchmark, use these links:
Complete Source code (in Clarion): http://sca.mx/ftp/countdigits.txt
Compilable project and win32 exe: http://sca.mx/ftp/countdigits.zip
Accepted answer
noahlavine's solution may be correct, but I just couldn’t follow the pseudocode. I think there are some details missing or not completely explained.
Aaronaught solution seems to be correct, but the code is just too complex for my taste.
I accepted strainer’s answer, because his line of thought guided me to develop this new solution.
There's a clear mathematical solution to a problem like this. Let's assume the value is zero-padded to the maximum number of digits (it's not, but we'll compensate for that later), and reason through it:
From 0-9, each digit occurs once
From 0-99, each digit occurs 20 times (10x in position 1 and 10x in position 2)
From 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)
The obvious pattern for any given digit, if the range is from 0 to a power of 10, is N * 10^(N-1), where N is the power of 10.
What if the range is not a power of 10? Start with the lowest power of 10, then work up. The easiest case to deal with is a maximum like 399. We know that for each multiple of 100, each digit occurs at least 20 times, but we have to compensate for the number of times it appears in the most-significant-digit position, which is going to be exactly 100 for digits 0-3, and exactly zero for all other digits. Specifically, the extra amount to add is 10^N for the relevant digits.
Putting this into a formula, for upper bounds that are 1 less than some multiple of a power of 10 (i.e. 399, 6999, etc.) it becomes: M * N * 10^(N-1) + iif(d <= M, 10^N, 0)
Now you just have to deal with the remainder (which we'll call R). Take 445 as an example. This is whatever the result is for 399, plus the range 400-445. In this range, the MSD occurs R more times, and all digits (including the MSD) also occur at the same frequencies they would from range [0 - R].
Now we just have to compensate for the leading zeros. This pattern is easy - it's just:
10^N + 10^(N-1) + 10^(N-2) + ... + 10^0
Update: This version correctly takes into account "padding zeros", i.e. the zeros in middle positions when dealing with the remainder ([400, 401, 402, ...]). Figuring out the padding zeros is a bit ugly, but the revised code (C-style pseudocode) handles it:
function countdigits(int d, int low, int high) {
    return countdigits(d, low, high, false);
}
function countdigits(int d, int low, int high, bool inner) {
    if (high == 0)
        return (d == 0) ? 1 : 0;
    if (low > 0)
        return countdigits(d, 0, high) - countdigits(d, 0, low);
    int n = floor(log10(high));
    int m = floor((high + 1) / pow(10, n));
    int r = high - m * pow(10, n);
    return
        (max(m, 1) * n * pow(10, n-1)) + // (1)
        ((d < m) ? pow(10, n) : 0) + // (2)
        (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) + // (3)
        (((r >= 0) && (d == m)) ? (r + 1) : 0) + // (4)
        (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) - // (5)
        (((d == 0) && !inner) ? countleadingzeros(n) : 0); // (6)
}
function countleadingzeros(int n) {
    int tmp = 0;
    do {
        tmp = pow(10, n) + tmp;
        --n;
    } while (n > 0);
    return tmp;
}
function countpaddingzeros(int n, int r) {
    return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1);
}
As you can see, it's gotten a bit uglier but it still runs in O(log n) time, so if you need to handle numbers in the billions, this will still give you instant results. :-) And if you run it on the range [0 - 1000000], you get the exact same distribution as the one posted by High-Performance Mark, so I'm almost positive that it's correct.
FYI, the reason for the inner variable is that the leading-zero function is already recursive, so it can only be counted in the first execution of countdigits.
Update 2: In case the code is hard to read, here's a reference for what each line of the countdigits return statement means (I tried inline comments but they made the code even harder to read):
(1) Frequency of any digit up to highest power of 10 (0-99, etc.)
(2) Frequency of MSD above any multiple of highest power of 10 (100-399)
(3) Frequency of any digits in remainder (400-445, R = 45)
(4) Additional frequency of MSD in remainder
(5) Count zeros in middle position for remainder range (404, 405...)
(6) Subtract leading zeros only once (on outermost loop)
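For comparison, here is a different and widely used way to get the same counts in O(log n) time: examine each decimal position separately and count how often the digit appears there. This Python sketch is not a port of the code above; the function names are hypothetical, and the range helper treats both ends as inclusive.

```python
def count_digit(d, n):
    """Occurrences of digit d (0..9) in the decimal numerals of 1..n, no leading zeros."""
    count = 0
    p = 1                                   # place value of the position examined
    while p <= n:
        higher, cur, lower = n // (p * 10), (n // p) % 10, n % p
        if d > 0:
            count += higher * p             # full cycles of this position
            if cur > d:
                count += p                  # one more complete run of d
            elif cur == d:
                count += lower + 1          # a partial run of d
        elif higher > 0:                    # a zero counts only under a nonzero prefix
            count += (higher - 1) * p
            count += p if cur > 0 else lower + 1
        p *= 10
    return count


def count_digit_in_range(d, low, high):
    """Occurrences of d in low..high inclusive, for 1 <= low <= high."""
    return count_digit(d, high) - count_digit(d, low - 1)


print([count_digit_in_range(d, 1, 100) for d in range(10)])
```

For the range 1 to 100 this prints 11 for digit 0, 21 for digit 1 and 20 for each of the digits 2 to 9.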
I'm assuming you want a solution where the numbers are in a range, and you have the starting and ending number. Imagine starting with the start number and counting up until you reach the end number - it would work, but it would be slow. I think the trick to a fast algorithm is to realize that in order to go up one digit in the 10^x place and keep everything else the same, you need to use all of the digits before it 10^x times plus all digits 0-9 10^(x-1) times. (Except that your counting may have involved a carry past the x-th digit - I correct for this below.)
Here's an example. Say you're counting from 523 to 1004.
First, you count from 523 to 524. This uses the digits 5, 2, and 4 once each.
Second, count from 524 to 604. The rightmost digit does 6 cycles through all of the digits, so you need 6 copies of each digit. The second digit goes through digits 2 through 0, 10 times each. The third digit is 6, 5 times, and 5, 100-24 times.
Third, count from 604 to 1004. The rightmost digit does 40 cycles, so add 40 copies of each digit. The second from right digit does 4 cycles, so add 4 copies of each digit. The leftmost digit does 100 each of 7, 8, and 9, plus 5 of 0 and 100 - 5 of 6. The last digit is 1, 5 times.
To speed up the last bit, look at the part about the rightmost two places. It uses each digit 10 + 1 times. In general, 1 + 10 + ... + 10^n = (10^(n+1) - 1)/9, which we can use to speed up counting even more.
My algorithm is to count up from the start number to the end number (using base-10 counting), but use the fact above to do it quickly. You iterate through the digits of the starting number from least to most significant, and at each place you count up so that that digit is the same as the one in the ending number. At each point, n is the number of up-counts you need to do before you get to a carry, and m the number you need to do afterwards.
Now let's assume pseudocode counts as a language. Here, then, is what I would do:
convert start and end numbers to digit arrays start[] and end[]
create an array counts[] with 10 elements which stores the number of copies of
    each digit that you need
iterate through start number from right to left. at the i-th digit,
    let d be the number of digits you must count up to get from this digit
        to the i-th digit in the ending number. (i.e. subtract the equivalent
        digits mod 10)
    add d * (10^i - 1)/9 to each entry in count.
    let m be the numerical value of all the digits to the right of this digit,
        n be 10^i - m.
    for each digit e from the left of the starting number up to and including the
        i-th digit, add n to the count for that digit.
    for j in 1 to d
        increment the i-th digit by one, including doing any carries
        for each digit e from the left of the starting number up to and including
            the i-th digit, add 10^i to the count for that digit
    for each digit e from the left of the starting number up to and including the
        i-th digit, add m to the count for that digit.
    set the i-th digit of the starting number to be the i-th digit of the ending
        number.
Oh, and since the value of i increases by one each time, keep track of your old 10^i and just multiply it by 10 to get the new one, instead of exponentiating each time.
To reel off the digits of a number, we would only ever need a costly string conversion if we couldn't do a mod. Digits can most quickly be pulled off a number like this:
feed = number;
do {
    digit = feed % 10;
    feed /= 10;
    // use digit... e.g. digitTally[digit]++;
} while (feed > 0);
That loop should be very fast, and the simplest way to tally the digits is to place it inside a loop over the numbers from start to end.
To go faster for larger ranges of numbers, I'm looking for an optimised method of tallying all digits from 0 to number*10^significance
(going from a start to an end still baffles me).
Here is a table showing the digit tallies for some numbers with a single significant digit.
The tallies include 0 but not the top value itself. That was an oversight,
but it maybe makes the patterns a bit easier to see (with the top value's digits absent here).
These tallies don't include leading zeros.
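As a rough illustration of that brute-force approach, here is a minimal Python sketch of the same mod-and-divide loop wrapped in a loop over the range. The function name `tally_digits` is made up for this example, and the range is assumed to be inclusive at both ends.

```python
def tally_digits(start, end):
    """Count how often each decimal digit occurs in start..end (inclusive)."""
    tally = [0] * 10
    for number in range(start, end + 1):
        feed = number
        while True:
            # Emulates do/while: the body runs at least once,
            # so the number 0 contributes a single '0' digit.
            feed, digit = divmod(feed, 10)
            tally[digit] += 1
            if feed == 0:
                break
    return tally


print(tally_digits(1, 49))
```

No string conversion is involved; each number costs one `divmod` per digit.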
digit     1    10   100  1000 10000     2    20    30    40    60    90   200   600  2000  6000
    0     1     1    10   190  2890     1     2     3     4     6     9    30   110   490  1690
    1     0     1    20   300  4000     1    12    13    14    16    19   140   220  1600  2800
    2     0     1    20   300  4000     0     2    13    14    16    19    40   220   600  2800
    3     0     1    20   300  4000     0     2     3    14    16    19    40   220   600  2800
    4     0     1    20   300  4000     0     2     3     4    16    19    40   220   600  2800
    5     0     1    20   300  4000     0     2     3     4    16    19    40   220   600  2800
    6     0     1    20   300  4000     0     2     3     4     6    19    40   120   600  1800
    7     0     1    20   300  4000     0     2     3     4     6    19    40   120   600  1800
    8     0     1    20   300  4000     0     2     3     4     6    19    40   120   600  1800
    9     0     1    20   300  4000     0     2     3     4     6     9    40   120   600  1800
Edit: clearing up my original thoughts.
From the brute-force table showing tallies from 0 (included) to a power of ten (not included), it is visible that a major digit md at ten-power tp:
    increments tally[0 to 9] by md*tp*10^(tp-1)
    increments tally[1 to md-1] by 10^tp
    decrements tally[0] by (10^tp - 10)
        (to remove leading 0s if tp > leadingzeros)
    can increment tally[more significant digits] by self (md*10^tp)
        (to complete an effect)
If these tally adjustments were applied for each significant digit, the tally should be modified as though counted from 0 to end-1.
The adjustments can be inverted to remove the preceding range (the start number).
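The simplest case of the first adjustment (md = 1, so the range is 0 up to but not including 10^tp) can be checked by brute force. The sketch below only confirms that each non-zero digit occurs tp*10^(tp-1) times in that range; it does not implement the full set of adjustments, and `tally_below` is a name invented for this example.

```python
def tally_below(limit):
    """Tally the digits of 0, 1, ..., limit - 1 without leading zeros."""
    tally = [0] * 10
    for number in range(limit):
        feed = number
        while True:
            feed, digit = divmod(feed, 10)
            tally[digit] += 1
            if feed == 0:
                break
    return tally


for tp in range(1, 5):
    tally = tally_below(10 ** tp)
    expected = tp * 10 ** (tp - 1)
    # Digits 1..9 all hit the closed form; zero falls short because
    # leading zeros are not counted.
    assert all(count == expected for count in tally[1:])
    print(10 ** tp, "zeros:", tally[0], "each other digit:", expected)
```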
Thanks Aaronaught for your complete and tested answer.
Here's a very bad answer, I'm ashamed to post it. I asked Mathematica to tally the digits used in all numbers from 1 to 1,000,000, no leading 0s. Here's what I got:
0 488895
1 600001
2 600000
3 600000
4 600000
5 600000
6 600000
7 600000
8 600000
9 600000
Next time you're ordering sticky digits for selling in your hardware store, order in these proportions, you won't be far wrong.
I asked this question on Math Overflow, and got spanked for asking such a simple question. One of the users took pity on me and said if I posted it to The Art of Problem Solving, he would answer it; so I did.
Here is the answer he posted:
http://www.artofproblemsolving.com/Forum/viewtopic.php?p=1741600#1741600
Embarrassingly, my math-fu is inadequate to understand what he posted (the guy is 19 years old...that is so depressing). I really need to take some math classes.
On the bright side, the equation is recursive, so it should be a simple matter to turn it into a recursive function with a few lines of code, by someone who understands the math.
I know this question has an accepted answer but I was tasked with writing this code for a job interview and I think I came up with an alternative solution that is fast, requires no loops and can use or discard leading zeroes as required.
It is in fact quite simple but not easy to explain.
If you list out the first n numbers
1
2
3
.
.
.
9
10
11
It is usual to start counting the digits required from the start room number to the end room number in a left to right fashion, so for the above we have one 1, one 2, one 3 ... one 9, two 1's one zero, four 1's etc. Most solutions I have seen used this approach with some optimisation to speed it up.
What I did was to count vertically in columns, as in hundreds, tens, and units. You know the highest room number so we can calculate how many of each digit there are in the hundreds column via a single division, then recurse and calculate how many in the tens column etc. Then we can subtract the leading zeros if we like.
Easier to visualize if you use Excel to write out the numbers but use a separate column for each digit of the number
A B C
- - -
0 0 1 (assuming room numbers do not start at zero)
0 0 2
0 0 3
.
.
.
3 6 4
3 6 5
.
.
.
6 6 9
6 7 0
6 7 1
^
sum in columns not rows
So if the highest room number is 671, the hundreds column will have 100 zeroes vertically (99 if numbering starts at 1), followed by 100 ones and so on up to 72 sixes (600 to 671). Ignore the zeroes if required, as we know these are all leading.
Then recurse down to the tens and perform the same operation: we know there will be 10 zeroes followed by 10 ones etc., repeated six times, then a final partial pass down to 2 sevens. Again we can ignore the first block of zeroes, as we know they are leading. Finally, of course, do the units, ignoring the first zero as required.
So there are no loops over the room numbers; everything is calculated with division. I use recursion for travelling "up" the columns until the max one is reached (in this case hundreds) and then back down, totalling as it goes.
I wrote this in C# and can post the code if anyone is interested. I haven't done any benchmark timings, but it is essentially instant for values up to 10^18 rooms.
Could not find this approach mentioned here or elsewhere so thought it might be useful for someone.
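Since the C# code was not posted, here is a minimal Python sketch of the column-wise idea. It is not the author's implementation: it walks the columns with a short loop rather than recursion, and it always discards leading zeros. The names `digit_counts_upto` and `digit_counts_range` are invented for this example, and rooms are assumed to be numbered from 1.

```python
def digit_counts_upto(n):
    """Tally digits over 1..n (no leading zeros), one column at a time."""
    counts = [0] * 10
    p = 1  # place value of the current column: 1, 10, 100, ...
    while p <= n:
        high, rest = divmod(n, p * 10)  # the digits to the left of this column
        cur, low = divmod(rest, p)      # this column's digit, digits to its right
        for d in range(10):
            # Complete runs of p consecutive numbers showing d in this column.
            # For d == 0 the run under an all-zero prefix is leading zeros,
            # so one run is dropped.
            runs = high if d > 0 else high - 1
            if cur > d:
                partial = p        # the last run is complete as well
            elif cur == d:
                partial = low + 1  # the last run is cut off at n
            else:
                partial = 0
            counts[d] += runs * p + partial
        p *= 10
    return counts


def digit_counts_range(first, last):
    """Tally digits over first..last inclusive, assuming 1 <= first <= last."""
    upper = digit_counts_upto(last)
    lower = digit_counts_upto(first - 1)
    return [u - l for u, l in zip(upper, lower)]


print(digit_counts_upto(49))
```

The work is proportional to the number of columns, not the number of rooms, which is why very large room numbers are handled essentially instantly.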
Your approach is fine. I'm not sure why you would ever need anything faster than what you've described.
Or, this would give you an instantaneous solution: Before you actually need it, calculate what you would need from 1 to some maximum number. You can store the numbers needed at each step. If you have a range like your second example, it would be what's needed for 1 to 300, minus what's needed for 1 to 50.
Now you have a lookup table that can be called at will. Doing up to 10,000 would only take a few MB and, what, a few minutes to compute, once?
This doesn't answer your exact question, but it's interesting to note the distribution of first digits according to Benford's Law. For example, in many real-world sets of numbers that span several orders of magnitude, about 30% of the values start with "1", which is somewhat counter-intuitive.
I don't know of any distributions describing subsequent digits, but you might be able to determine this empirically and come up with a simple formula for computing an approximate number of digits required for any range of numbers.
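For reference, Benford's Law gives the probability of a leading digit d as log10(1 + 1/d). The short Python sketch below just evaluates that formula. Note that it describes data sets spread over several orders of magnitude, so treating a contiguous range of room numbers as Benford-distributed is only an assumption to test, not something the law guarantees.

```python
import math


def benford_probability(d):
    """Probability that the leading digit is d (1..9) under Benford's Law."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```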
If "better" means "clearer," then I doubt it. If it means "faster," then yes, but I wouldn't use a faster algorithm in place of a clearer one without a compelling need.
#!/usr/bin/ruby1.8
def digits_for_range(min, max, leading_zeros)
  bins = [0] * 10
  format = [
    '%',
    ('0' if leading_zeros),
    max.to_s.size,
    'd',
  ].compact.join
  (min..max).each do |i|
    s = format % i
    for digit in s.scan(/./)
      bins[digit.to_i] += 1 unless digit == ' '
    end
  end
  bins
end
p digits_for_range(1, 49, false)
# => [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
p digits_for_range(1, 49, true)
# => [13, 15, 15, 15, 15, 5, 5, 5, 5, 5]
p digits_for_range(1, 10000, false)
# => [2893, 4001, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]
Ruby 1.8, a language known to be "dog slow," runs the above code in 0.135 seconds. That includes loading the interpreter. Don't give up an obvious algorithm unless you need more speed.
If you need raw speed over many iterations, try a lookup table:
Build an array with 2 dimensions: 10 x max-house-number
    int nDigits[10000][10]; // Don't try this on the stack, kids!
Fill each row with the count of digits required to get to that number from zero.
Hint: Use the previous row as a start:
    n=0..9999:
        if (n>0) nDigits[n] = nDigits[n-1]
        d=0..9:
            nDigits[n][d] += countOccurrencesOf(n,d)
Number of digits "between" two numbers becomes simple subtraction.
For range=51 to 300, take the counts for 300 and subtract the counts for 50.
    0's = nDigits[300][0] - nDigits[50][0]
    1's = nDigits[300][1] - nDigits[50][1]
    2's = nDigits[300][2] - nDigits[50][2]
    3's = nDigits[300][3] - nDigits[50][3]
    etc.
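A small Python sketch of this lookup table might look like the following. The names `build_digit_table` and `digits_between` are invented for the example, and as an assumption the table starts at house number 1 with an all-zero row 0, so that a range beginning at 1 needs no special case.

```python
def build_digit_table(max_number):
    """table[n][d] = occurrences of digit d in 1..n; table[0] is all zeros."""
    table = [[0] * 10]
    for n in range(1, max_number + 1):
        row = table[-1][:]  # start from a copy of the previous row
        feed = n
        while feed > 0:
            feed, digit = divmod(feed, 10)
            row[digit] += 1
        table.append(row)
    return table


def digits_between(table, first, last):
    """Digits needed for first..last inclusive, by subtracting two rows."""
    return [hi - lo for hi, lo in zip(table[last], table[first - 1])]


table = build_digit_table(9999)
print(digits_between(table, 51, 300))
```

Building the table costs one pass over all the numbers; every later query is ten subtractions.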
You can separate each digit (look here for an example), create a histogram with entries from 0..9 (which will count how many digits appeared in a number) and multiply by the number of 'numbers' asked.
But if that isn't what you are looking for, can you give a better example?
Edited:
Now I think I got the problem. I think you can reckon this (pseudo C):
int histogram[10];
memset(histogram, 0, sizeof(histogram));
for (i = startNumber; i <= endNumber; ++i)
{
    array = separateDigits(i);
    for (j = 0; j < array.length; ++j)
    {
        histogram[array[j]]++;  /* count this digit of i */
    }
}
Separate digits implements the function in the link.
Each position of the histogram will have the amount of each digit. For example
histogram[0] == total of zeros
histogram[1] == total of ones
...
Regards

Resources