sql periodic function - oracle

There is an SQL function with date as argument
f(p_date) = mod(to_char(p_date,'mm')+1,2)*39 + to_char(p_date,'dd')
The values of f(p_date) repeat themselves with a period of 2 months, i.e.
f(Feb 7th) = 46
f(Feb 8th) = 47
...
f(Apr 7th) = 46
...
f(Jun 7th) = 46
...
I don't see the pattern here. Why is the multiplier equal to 39? Where do the 2 months come from?
What I eventually need is the same sort of function, but with a period of 40 days (or 1.5 months):
f(Feb 7th) = 46
..
f(Mar 19th) = 46
..
f(Apr 28th) = 46, etc
Thanks for any help.

Why is the multiplier equal to 39?
The modulo expression evaluates to 0 for odd months and 1 for even months. Multiplied by 39, this gives either 0 or 39. Adding the day, the function returns the day for odd months and 39 + day for even months.
Thus,
odd (january)
1, 2, 3, ..., last-of-month
even (february)
40, 41, 42, ... 39+last-of-month
Where do the 2 months come from?
The 2 is the divisor passed to the modulus function. For the inputs 1, 2, 3, 4, 5, ... the function returns the remainders 1, 0, 1, 0, 1, ... and so on. This is what creates the odd/even periodicity.
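To see the two-month period concretely, here is a minimal Python sketch of the same expression. The name f mirrors the question; the use of datetime.date and the year 2018 are assumptions made only for illustration.

```python
from datetime import date

def f(d: date) -> int:
    # Mirrors the Oracle expression:
    #   mod(to_char(p_date,'mm') + 1, 2) * 39 + to_char(p_date,'dd')
    # (month + 1) % 2 is 0 for odd months and 1 for even months.
    return ((d.month + 1) % 2) * 39 + d.day

print(f(date(2018, 1, 7)))   # odd month: just the day
print(f(date(2018, 2, 7)))   # even month: 39 + day
print(f(date(2018, 4, 7)))   # next even month: same value as February
```

Because only the parity of the month enters the formula, every even month maps to the same range of values and every odd month to the other range, hence the two-month repetition.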

@AlexeyKryuchkov, can you give more background about what you're trying to achieve and why?
1.5 months does not map to 40 days (or to any fixed number of days).
If you're trying to define a "40-day month", the easiest solution is to convert a date into an absolute day, then mod by 40.
I wrote a Q&A recently about the complexity of working with calendars: https://stackoverflow.com/a/48611348/9129668.
And adapting some of the code in that answer (which is based on SQL Server, not Oracle), the function you may be looking for would be something like:
((((DATEDIFF(DD, CONVERT(DATETIME2(0),'0001-01-01',102), p_date) + 1) - 1) % 40) + 1) AS day_of_40_day_mth
But if you give me a bit more explanation, I might be able to be more specific.
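The same "absolute day, then mod 40" idea can be sketched in Python. This is only an illustration: the function name day_of_40_day_cycle is hypothetical, and anchoring the cycle at 0001-01-01 (where date.toordinal() returns 1) is an assumption that matches the SQL Server expression above; any other anchor date would shift the values by a constant.

```python
from datetime import date, timedelta

def day_of_40_day_cycle(d: date) -> int:
    # date.toordinal() is 1 for 0001-01-01, so subtracting 1 gives the
    # number of days elapsed since that anchor date.
    return (d.toordinal() - 1) % 40 + 1

start = date(2018, 2, 7)
for k in range(3):
    d = start + timedelta(days=40 * k)
    print(d, day_of_40_day_cycle(d))  # same cycle value every 40 days
```

In a non-leap year the three dates printed are Feb 7th, Mar 19th and Apr 28th, the dates from the question.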

Related

DateTime subtraction in ruby 2?

I need to subtract two DateTime objects in order to find out the difference in hours between them.
I try to do the following:
a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 16)
puts a - b
I get (-1/1), the object of class Rational.
So, the question is, how do I find out what the difference between the two dates is? In hours or days, or whatever.
And what does this Rational mean/represent when I subtract DateTimes just like that?
BTW:
When I try to subtract DateTime's with the difference of 1 year, I get (366/1), so when I do (366/1).to_i, I get the number of days. But when I tried subtracting two DateTime's with the difference of 1 hour, it gave me -1, the number of hours. So, how do I also find out the meaning of the returned value (hours, days, years, seconds)?
When you subtract two DateTimes, you get the difference in days, not hours.
The result is a Rational so that fractions of a day stay exact (many fractional values cannot be represented exactly as floats).
To get the number of hours, multiply the result by 24; for minutes, multiply by 24*60, and so on.
a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 16)
(a - b).to_i
# days
# => -1
((a - b)* 24).to_i
# hours
# => -24
# ...
Here's a link to the official doc
If you do subtraction on them as a Time object it will return the result in seconds and then you can multiply accordingly to get minutes/hours/days/whatever.
a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 16)
diff = b.to_time - a.to_time # 86400.0 (seconds, as a Float)
hours = diff / 60 / 60 # 24.0

Ruby project Euler #12 efficiency

I'm attempting to do problem #12 on project euler (see quote below)
The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7
= 28. The first ten terms would be:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
Let us list the factors of the first seven triangle numbers:
1: 1 3: 1,3 6: 1,2,3,6 10: 1,2,5,10 15: 1,3,5,15 21: 1,3,7,21 28:
1,2,4,7,14,28 We can see that 28 is the first triangle number to have
over five divisors.
What is the value of the first triangle number to have over five hundred divisors?
I have written out what I "think" is a valid solution in Ruby, however the runtime is incredibly slow. See code below:
def num_divisors_of(num)
  sum = 0
  for i in 1..num/2 do
    if num % i == 0 then sum += 1 end
  end
  return sum += 1
end

currentSum = 0
for i in 1..10000 do
  currentSum += i
  if num_divisors_of(currentSum) > 500
    puts currentSum
    break
  end
end
Basically, I start at 1, add it to a running total, and check the number of divisors of that total. If the number is over 500, I stop and return the number.
I'm wondering if there's another way to look at this that I haven't thought of yet? I've thought of finding prime factors (I'm pretty sure my method to find the number of divisors of a number is what's bogging me down), but otherwise I really have no clue where to make it more efficient.
Any thoughts/ideas?
EDIT: Okay, I found a way to save some runtime. When searching for divisors, I only look up to the square root of the number, and add 2 every time I find a divisor (e.g. for divisors of 625, I find 5; 5 * 125 = 625, so those are 2 divisors). Then, if the square root IS an exact divisor, I subtract 1, as I'll have counted it twice (for example, 25 * 25 = 625, but that's just 1 divisor).
Really sped up my runtime, and I got the answer. Woohoo!
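The square-root trick from the edit can be sketched as follows. This is a Python illustration rather than the asker's Ruby code; the function names are made up for the example, and math.isqrt requires Python 3.8 or later.

```python
import math
from itertools import count

def num_divisors(n: int) -> int:
    """Count divisors of n by pairing each divisor i <= sqrt(n) with n // i."""
    total = 0
    root = math.isqrt(n)
    for i in range(1, root + 1):
        if n % i == 0:
            total += 2          # i and its partner n // i
    if root * root == n:
        total -= 1              # the exact square root was counted twice
    return total

def first_triangle_with_more_divisors_than(limit: int) -> int:
    """Walk the triangle numbers until one has more than `limit` divisors."""
    triangle = 0
    for i in count(1):
        triangle += i
        if num_divisors(triangle) > limit:
            return triangle

print(first_triangle_with_more_divisors_than(5))   # the example from the problem statement
```

Each divisor count now costs about sqrt(n) steps instead of n/2, which is where the speed-up comes from.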
Look at this answer:
All factors of a given number
Then you just count the number of elements in the array until you find one with more than 500 divisors.

How to check if a given number is of the form x^y?

I'm preparing for my interviews and came across this question:
Write a program to check if a number n is of x^y form. It is known that n, x and y are integers and that x and y are greater than 2.
I thought of taking log and stuff but couldn't certainly figure out how to check if the number is of the form. Could any of you please help? :)
"Taking the log and stuff" is the way to go. Note that N > 1 is never a^b for integer a and b > log_2(N). So you can check floor(N^(1/b))^b = N for each integer b between 2 and log_2(N). You have to do about log(N) many exponentiations, each of which produces a number at most the size of N.
This is far faster than @dasblinkenlight's solution, which requires you to factor N first. (No polynomial-time algorithm, that is, polynomial in the number of bits in N, is known for integer factorisation. However, integer exponentiation with a small exponent can be done in polynomial time.)
One way to solve this would be to factorize n, count the individual factors, and find the greatest common divisor (GCD) of the counts. If the GCD is 1, the answer is "no". Otherwise, the answer is "yes".
Here are some examples:
7, prime factor 7 (one time). We have one factor repeated once. Answer "no", because the GCD is 1.
8, prime factors 2 (3 times). We have one factor with the count of three. Answer "yes", because GCD is 3.
144, prime factors 2 (4 times) 3 (2 times). GCD of 4 and 2 is 2, so the answer is "yes".
72, prime factors 2 (3 times) 3 (2 times). GCD of 3 and 2 is 1, so the answer is "no".
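The examples above can be reproduced with a short Python sketch. It uses plain trial division for the factorization, which is an assumption chosen for brevity and is only practical for small n; the function names are illustrative.

```python
from functools import reduce
from math import gcd

def prime_multiplicities(n: int) -> list:
    """Return the exponents in the prime factorization of n (trial division)."""
    counts = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            k = 0
            while n % p == 0:
                n //= p
                k += 1
            counts.append(k)
        p += 1
    if n > 1:
        counts.append(1)        # one remaining prime factor
    return counts

def is_perfect_power(n: int) -> bool:
    """True if n = x**y with y >= 2, i.e. the GCD of the exponents exceeds 1."""
    if n < 2:
        return False            # assumption: 0 and 1 are not treated as x**y here
    return reduce(gcd, prime_multiplicities(n)) > 1

for n in (7, 8, 144, 72):
    print(n, is_perfect_power(n))
```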
There are a lot of good answers, but I see modular arithmetic is still missing.
Depending on the magnitude of the numbers to check, it might be useful to classify them by their last bits. We can easily create a table with possible candidates.
To show how it works, let us create such a table for 4 last bits. In that case we have 16 cases to consider:
0^2, 0^3, ... : 0 mod 16
1^2, 1^3, ... : 1 mod 16
2^2, 2^3, ... : 0, 4, 8 mod 16
3^2, 3^3, ... : 9, 11, 1, 3 mod 16
4^2, 4^3, ... : 0 mod 16
5^2, 5^3, ... : 9, 13, 1, 5 mod 16
6^2, 6^3, ... : 4, 8, 0 mod 16
7^2, 7^3, ... : 1, 7 mod 16
8^2, 8^3, ... : 0 mod 16
9^2, 9^3, ... : 9, 1 mod 16
10^2,10^3, ... : 4, 8, 0 mod 16
11^2,11^3, ... : 9, 3, 1, 11 mod 16
12^2,12^3, ... : 0 mod 16
13^2,13^3, ... : 9, 5, 1, 13 mod 16
14^2,14^3, ... : 4, 8, 0 mod 16
15^2,15^3, ... : 1, 15 mod 16
The table is more useful the other way round; which bases x are possible for a given number n = x^y.
0: 0, 2, 4, 6, 8, 10, 12, 14 mod 16
1: 1, 3, 5, 7, 9, 11, 13, 15
2: -
3: 3, 11
4: 2, 6, 10, 14
5: 5, 13
6: -
7: 7
8: 2, 6, 10, 14
9: 3, 5, 9, 11, 13
10: -
11: 3, 11
12: -
13: 5, 13
14: -
15: 15
So, just by looking at the four last bits over one quarter of numbers can be discarded immediately.
If we take number 13726423, its remainder by 16 is 7, and thus if it is of the form we are interested in, it must be (16 n+7)^y.
For most numbers the number of divisors to try is quite limited. In practice, the table could be much larger, e.g., 16 bits.
A simple optimization with binary numbers is to remove the trailing zeros. This makes it unnecessary to worry about even numbers, and y must be a factor of the number of the zeros removed.
If we still have too much work, we can create another modulo table. The other could be, e.g. modulo 15. The equivalent table looks like this:
0: 0
1: 1, 2, 4, 7, 8, 11, 13, 14
2: 2, 8
3: 3, 12
4: 2, 4, 7, 8, 13
5: 5
6: 3, 6, 9, 12
7: 7, 13
8: 2, 8
9: 3, 9, 12
10: 5, 10
11: 11
12: 3, 12
13: 7, 13
14: 14
As our number from the previous example (13726423) is 13 modulo 15, then x = (15 m +7) or (15 m +13). As there are no common factors in 15 and 16, the valid numbers are 240 p + 7 and 240 p + 103. By two integer divisions and two table lookups we have managed to limit the possible values of x to 1/120 of numbers.
If the tables are largish, the number of possible x s is easy to limit to a very low number. For example, with tables of 65536 and 65535 elements the cycle is 4294901760, so for any number below approximately 1.6 x 10^19 the two tables give a short unique list of possible values of x.
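The two lookup tables and their combination can be sketched in Python as below. The function names are hypothetical, and combining the residues by scanning one full period is a simplification chosen for clarity; for large tables one would combine residues with the Chinese remainder theorem instead.

```python
def base_table(modulus: int) -> dict:
    """Map each residue r to the set of base residues x with x**y % modulus == r for some y >= 2."""
    table = {r: set() for r in range(modulus)}
    for x in range(modulus):
        power = (x * x) % modulus
        seen = set()
        while power not in seen:        # powers of x are eventually periodic
            seen.add(power)
            table[power].add(x)
            power = (power * x) % modulus
    return table

def candidate_bases(n: int, moduli=(16, 15)):
    """Return (period, residues): x must be congruent to one of the residues modulo period."""
    tables = {m: base_table(m) for m in moduli}
    period = 1
    for m in moduli:
        period *= m
    residues = [x for x in range(period)
                if all(x % m in tables[m][n % m] for m in moduli)]
    return period, residues

print(candidate_bases(13726423))
```

For the example number this leaves only two residues modulo 240, matching the 240 p + 7 and 240 p + 103 candidates derived above.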
If you can factor n, then it is easy to find an answer by examining the multiplicities of the factors. But the usual use for determining if a number is a perfect power is as a preliminary test for some factoring algorithms, in which case it is not realistic to find the factors of n.
The trick to determining if a number is a perfect power is to know that, if the number is a perfect power, then the exponent e must be at most log2 n, because if e is greater then 2^e will be greater than n. Further, it is only necessary to test prime values of e, because if a number is a perfect power with a composite exponent it will also be a perfect power with each prime factor of that exponent; for instance, 2^15 = 32768 = 32^3 = 8^5 is a perfect cube and also a perfect fifth power. Here is pseudocode for a function that returns b if there is some exponent e such that b^e = n or 0 if there is not; the function root(e,n) returns the e-th root of n:
function perfectPower(n)
    for p in primes(log2(n))
        b = floor(root(p,n))
        if b**p == n return b
    return 0
I discuss this function at my blog.
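A runnable Python version of the pseudocode might look like this. The helpers integer_root and primes_up_to are assumptions standing in for root and primes; the root is found by integer binary search so that floating-point rounding cannot produce a wrong answer for large n, and n.bit_length() is used as a safe upper bound for log2(n).

```python
def integer_root(n: int, e: int) -> int:
    """Largest integer b with b**e <= n, found by binary search (n >= 1)."""
    lo, hi = 1, 1 << (n.bit_length() // e + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** e <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def primes_up_to(limit: int) -> list:
    """Small primes by trial division; the limit is only about log2(n)."""
    return [p for p in range(2, limit + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def perfect_power(n: int) -> int:
    """Return b if n == b**p for some prime p, else 0."""
    if n < 4:
        return 0
    for p in primes_up_to(n.bit_length()):
        b = integer_root(n, p)
        if b ** p == n:
            return b
    return 0

print(perfect_power(32768), perfect_power(72))
```

Since the primes are tried in increasing order, the returned base is the one belonging to the smallest prime exponent that works, so 32768 is reported as a cube rather than as a fifth power.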
Alternatively, if factorization is too hard, you can exploit your maths library and try many values of x or y until you find one that works.
Trying for y will be less work, if you have an operation "y-th root of n" available (it could be masquerading under the name of "x to the power of 1/y"). Just try all integer values of y larger than 2 until either you find one that gives an integer answer, or the result drops below 2. If n is a standard 32-bit integer, then it will take no more than 32 attempts (and, more generally, if n is a m-bit integer, then it will take no more than m attempts).
If you do not have "y-th root of n" available, you can try all x's with the operation "log base x of n", until you get an integer answer or the result drops below 2. This will take more work since you need to check all values of x up to the square root of n. I think it should be possible to optimize this somehow and "home in" on potential integer results.
The exponent y is easily bounded 2 ≤ y ≤ log_2(n) . Test each y in that range. If it exists, x will be the integer yth root of n.
The point is while x determines y and vice versa, the search space for y is much smaller, so you should search y rather than x (which could be as large as sqrt(n)).

Amount of "jumping" numbers from 101 to 10^60? [closed]

Let's say number is "ascending" if its digits are going in ascending order. Example: 1223469. Digits of "descending" number go in descending order. Example: 9844300. Numbers that are not "ascending" or "descending", are called "jumping". Numbers from 1 to 100 are not "jumping". How many "jumping" numbers are there from 101 to 10^60?
Here is an idea: instead of counting the jumping numbers, count the ascending and descending ones. Then subtract them from all the numbers.
Counting the ascending/descending ones should be easy - you can use a dynamic programming based on the number of digits left to generate, and the digit you have placed in the last position.
I'll describe how to count the ascending numbers, because that's easier. Going from that, you could also count the descending ones and then subtract the combined amount from the total amount of numbers, compensating for duplicates, as indicated by Ivan, or devise a more complex way to only count jumping numbers directly.
A different approach
Think about the numbers sorted by ending digit. We start with numbers that are 1 digit long, this will be our list
1 // Amount of numbers ending with 1
1 // Amount of numbers ending with 2
1 // Amount of numbers ending with 3
1 // Amount of numbers ending with 4
1 // Amount of numbers ending with 5
1 // Amount of numbers ending with 6
1 // Amount of numbers ending with 7
1 // Amount of numbers ending with 8
1 // Amount of numbers ending with 9
To construct numbers with two digits ending with 6, we can use all numbers ending with 6 or less
1 // Amount of numbers ending with 1 with 2 digits
2 // Amount of numbers ending with 2 with 2 digits
3 // Amount of numbers ending with 3 with 2 digits
4 // Amount of numbers ending with 4 with 2 digits
5 // Amount of numbers ending with 5 with 2 digits
6 // Amount of numbers ending with 6 with 2 digits
7 // Amount of numbers ending with 7 with 2 digits
8 // Amount of numbers ending with 8 with 2 digits
9 // Amount of numbers ending with 9 with 2 digits
Writing these side by side, we can see how to calculate the new values very quickly:
y a // y, a, and x have been computed previously
x (a + x)
1 1 1 1
1 2 3 4
1 3 6 10
1 4 10 20
1 5 15 35
1 6 21 56
1 7 28 84
1 8 36 120
1 9 45 165
A simple Python program
Iterating over one such column, we can directly produce all values of the new column if we always remember the last computation. The scan() function abstracts away exactly that behavior: it takes one element at a time and combines it with the last result.
def scan(f, state, it):
    for x in it:
        state = f(state, x)
        yield state
Producing the next column is now as simple as:
import operator
new_column = list(scan(operator.add, 0, column))
To make it simple, we use single digit numbers as starting point:
first_row = [1]*9
Seeing that we always need to feed the new row back into the function, we can use scan again to do just that:
def next_row(row):
    return list(scan(operator.add, 0, row))
def next_row_wrapper(row, _):
    return next_row(row)
>>> [list(x) for x in scan(next_row_wrapper, [1]*9, range(3))] # 3 iterations
[[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 3, 6, 10, 15, 21, 28, 36, 45], [1, 4, 10, 20, 35, 56, 84, 120, 165]]
As you can see, this gives the first three rows after the initial one.
Since we want the total count of all such numbers, we sum over all rows. One iteration gives all ascending numbers below 10^2, so we need 59 iterations for all numbers below 10^60:
>>> sum(sum(x) for x in scan(lambda x, _: next_row(x), [1]*9, range(59))) + 10
56672074888
For the descending numbers, it's quite similar:
>>> sum(sum(x) for x in scan(lambda x, _: next_row(x), [1]*10, range(59))) + 10 - 58
396704524157
Old approach
Think about how the numbers end:
From 10 to 99, we have two digits per number.
There are
1 that ends in 1
2 that end in 2
3 that end in 3
4 that end in 4
5 that end in 5
6 that end in 6
7 that end in 7
8 that end in 8
9 that end in 9
All of these numbers act as prefixes for numbers from 100 to 999.
An example, there are three numbers that end in 3:
13
23
33
For each of these three numbers, we can create seven ascending numbers:
133
134
135
136
137
138
139
It is easy to see, that this adds three numbers for each of the seven possible ending digits.
If we wanted to extend numbers ending on 4, the process would be similar: Currently, there are 4 numbers ending on 4. Thus, for each such number, we can create 6 new ascending numbers. That means, that there will be an additional 4 for all of the six possible ending digits.
If you have understood everything I've written here, it should be easy to generalize that and implement an algorithm to count all those numbers.
Non-jumping numbers:
69 choose 9 (ascending numbers of size ≤ 60)
+ 70 choose 10 - 60 (descending numbers of size ≤ 60)
- 60 * 9 (double count: all digits the same)
- 1 (double count: zero)
= 453376598503
(To get jumping numbers, subtract from total numbers: 10^60)
Simple python program to compute the number:
# I know Python doesn't do tail call elimination, but it's a good habit.
def choose(n, k, num=1, denom=1):
    # Integer division keeps the result an exact int in Python 3 as well.
    return num // denom if k == 0 else choose(n-1, k-1, num*n, denom*k)
def f(digits, base=10):
    return choose(digits+base-1, base-1) + choose(digits+base, base) - digits*base - 1
Ascending numbers: select 9 positions to increment the digit, starting with 0.
Descending numbers: pretend we have a digit 10 which is used to left-pad the number. Then select 10 positions to decrement the digit, starting with 10. Then remove all the choices where the 10 selected positions are consecutive and not at the end, which would correspond to digit sequences with a leading 0.
Since all numbers whose digits are all the same will be produced by both descending and ascending algorithms, we have to subtract them.
Note that all of these algorithms consider the number 0 to be written with no digits at all. Also, all numbers ≤ 100 are either ascending or descending (or both), so there's no need to worry about them.
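As a sanity check on the counting argument, the closed form can be compared against a brute-force count for small digit lengths. This is a sketch assuming Python 3.8+ for math.comb; the function names are illustrative, and the number 0 is included in both counts, as in the formula above.

```python
from math import comb

def non_jumping(digits: int, base: int = 10) -> int:
    """Closed-form count of non-jumping numbers in 0 .. base**digits - 1."""
    ascending = comb(digits + base - 1, base - 1)      # includes 0 as the empty number
    descending = comb(digits + base, base) - digits    # drop sequences with a leading 0
    repdigits = digits * (base - 1)                    # counted in both sets
    return ascending + descending - repdigits - 1      # the final -1 is for zero

def brute_force(digits: int) -> int:
    """Directly test every number below 10**digits."""
    total = 0
    for n in range(10 ** digits):
        s = str(n)
        asc = all(a <= b for a, b in zip(s, s[1:]))
        desc = all(a >= b for a, b in zip(s, s[1:]))
        total += asc or desc
    return total

for d in (1, 2, 3, 4):
    print(d, non_jumping(d), brute_force(d))

# 10**60 itself is descending, so nothing else needs to be subtracted.
jumping = 10 ** 60 - non_jumping(60)
print(jumping)
```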
Do you count 321 as descending or do you count 000000321 as jumping?
Hint for the answer: the number of ascending numbers with 59 digits will be something like (69 choose 10) because you have to choose which points in the number are between differing digits.

How to count each digit in a range of integers?

Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:
1 to 100
51 to 300
1 to 2,000 with zeros to the left
The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.
I wonder if there is a better way to solve this, without having to loop through the entire integers range.
Solutions in any language or pseudocode are welcome.
Edit:
Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem.
Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one million seems to indicate there is a proportion, but the results for other numbers show different proportions.
strainer found some formulas that may be used to count digits for numbers which are a power of ten.
Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas originated from Math Overflow and found a flaw in it (point to Stackoverflow :).
noahlavine developed an algorithm and presented it in pseudocode.
A new solution
After reading all the answers, and doing some experiments, I found that for a range of integers from 1 to 10^n - 1:
For digits 1 to 9, n*10^(n-1) pieces are needed
For digit 0, if not using leading zeros, n*10^(n-1) - ((10^n - 1) / 9) are needed
For digit 0, if using leading zeros, n*10^(n-1) - n are needed
The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers).
For example, if n = 6, range is 1 to 999,999:
For digits 1 to 9 we need 6*10^5 = 600,000 of each one
For digit 0, without leading zeros, we need 6*10^5 - (10^6 - 1)/9 = 600,000 - 111,111 = 488,889
For digit 0, with leading zeros, we need 6*10^5 - 6 = 599,994
These numbers can be checked using High-Performance Mark results.
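The three formulas can also be checked against a direct count for small n. The following Python sketch does that; the function names are made up for the illustration.

```python
def brute_force_counts(n: int, leading_zeros: bool = False) -> list:
    """Tally every digit written when numbering 1 .. 10**n - 1."""
    counts = [0] * 10
    for number in range(1, 10 ** n):
        text = str(number).zfill(n) if leading_zeros else str(number)
        for ch in text:
            counts[int(ch)] += 1
    return counts

def formula_counts(n: int, leading_zeros: bool = False) -> list:
    """The same tallies from the closed-form expressions."""
    each = n * 10 ** (n - 1)                       # digits 1 to 9
    if leading_zeros:
        zeros = each - n
    else:
        zeros = each - (10 ** n - 1) // 9
    return [zeros] + [each] * 9

for n in range(1, 5):
    for padded in (False, True):
        assert brute_force_counts(n, padded) == formula_counts(n, padded)
print(formula_counts(6), formula_counts(6, leading_zeros=True)[0])
```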
Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but, if it finds a number which is a power of ten, it uses the formulas to add to the digits count the quantity for a full range of 1 to 9 or 1 to 99 or 1 to 999 etc. Here's the algorithm in pseudocode:
integer First,Last    //First and last number in the range
integer Number        //Current number in the loop
integer Power         //Power is the n in 10^n in the formulas
integer Nines         //Nines is the result of 10^n - 1, 10^5 - 1 = 99999
integer Prefix        //First digits in a number. For 14,200, prefix is 142
array 0..9 Digits     //Will hold the count for all the digits

FOR Number = First TO Last
  CALL TallyDigitsForOneNumber WITH Number,1  //Tally the count of each digit
                                              //in the number, increment by 1
  //Start of optimization. Comments are for Number = 1,000 and Last = 8,000.
  Power = Zeros at the end of number          //For 1,000, Power = 3
  IF Power > 0                                //The number ends in 0 00 000 etc
    Nines = 10^Power-1                        //Nines = 10^3 - 1 = 1000 - 1 = 999
    IF Number+Nines <= Last                   //If 1,000+999 <= 8,000, add a full set
      Digits[0-9] += Power*10^(Power-1)       //Add 3*10^(3-1) = 300 to digits 0 to 9
      Digits[0] -= Power                      //Adjust digit 0 (leading zeros formula)
      Prefix = First digits of Number         //For 1000, prefix is 1
      CALL TallyDigitsForOneNumber WITH Prefix,Nines  //Tally the count of each
                                              //digit in prefix,
                                              //increment by 999
      Number += Nines                         //Increment the loop counter 999 cycles
    ENDIF
  ENDIF
  //End of optimization
ENDFOR

SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count
  REPEAT
    Digits [ Number % 10 ] += Count
    Number = Number / 10
  UNTIL Number = 0
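For readers who want to run the algorithm, here is a Python sketch of the pseudocode above (the original implementation is in Clarion; the function names here are illustrative). It assumes First >= 1, and it reduces the zero count by Power after adding a full set, as the leading-zeros formula n*10^(n-1) - n requires.

```python
def tally_digits(digits: list, number: int, count: int) -> None:
    """Add `count` to the tally of every digit of `number`."""
    while True:
        digits[number % 10] += count
        number //= 10
        if number == 0:
            break

def trailing_zeros(number: int) -> int:
    zeros = 0
    while number > 0 and number % 10 == 0:
        number //= 10
        zeros += 1
    return zeros

def count_digits(first: int, last: int) -> list:
    """Digit tallies for first..last (no leading zeros), skipping full 10**power blocks."""
    digits = [0] * 10
    number = first
    while number <= last:
        tally_digits(digits, number, 1)
        power = trailing_zeros(number)
        if power > 0:
            nines = 10 ** power - 1
            if number + nines <= last:
                full_set = power * 10 ** (power - 1)
                for d in range(10):
                    digits[d] += full_set          # suffixes 00..0 to 99..9
                digits[0] -= power                 # the all-zero suffix was tallied already
                tally_digits(digits, number // 10 ** power, nines)  # the prefix repeats
                number += nines
        number += 1
    return digits

print(count_digits(786, 3021))
```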
For example, for range 786 to 3,021, the counter will be incremented:
By 1 from 786 to 790 (5 cycles)
By 9 from 790 to 799 (1 cycle)
By 1 from 799 to 800
By 99 from 800 to 899
By 1 from 899 to 900
By 99 from 900 to 999
By 1 from 999 to 1000
By 999 from 1000 to 1999
By 1 from 1999 to 2000
By 999 from 2000 to 2999
By 1 from 2999 to 3000
By 1 from 3000 to 3010 (10 cycles)
By 9 from 3010 to 3019 (1 cycle)
By 1 from 3019 to 3021 (2 cycles)
Total: 28 cycles
Without optimization: 2,235 cycles
Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:
If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then subtract 1,000 - 700 = 300 from the count of digit 1.
Benchmark and Source code
I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results:
Original 104.78 seconds
With %10 83.66
With Powers of Ten 0.07
If you would like to see the full source code or run the benchmark, use these links:
Complete Source code (in Clarion): http://sca.mx/ftp/countdigits.txt
Compilable project and win32 exe: http://sca.mx/ftp/countdigits.zip
Accepted answer
noahlavine's solution may be correct, but I just couldn't follow the pseudocode; I think there are some details missing or not completely explained.
Aaronaught solution seems to be correct, but the code is just too complex for my taste.
I accepted strainer’s answer, because his line of thought guided me to develop this new solution.
There's a clear mathematical solution to a problem like this. Let's assume the value is zero-padded to the maximum number of digits (it's not, but we'll compensate for that later), and reason through it:
From 0-9, each digit occurs once
From 0-99, each digit occurs 20 times (10x in position 1 and 10x in position 2)
From 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)
The obvious pattern for any given digit, if the range is from 0 to a power of 10, is N * 10^(N-1), where N is the power of 10.
What if the range is not a power of 10? Start with the lowest power of 10, then work up. The easiest case to deal with is a maximum like 399. We know that for each multiple of 100, each digit occurs at least 20 times, but we have to compensate for the number of times it appears in the most-significant-digit position, which is going to be exactly 100 for digits 0-3, and exactly zero for all other digits. Specifically, the extra amount to add is 10^N for the relevant digits.
Putting this into a formula, for upper bounds that are 1 less than some multiple of a power of 10 (i.e. 399, 6999, etc.) it becomes: M * N * 10^(N-1) + iif(d <= M, 10^N, 0)
Now you just have to deal with the remainder (which we'll call R). Take 445 as an example. This is whatever the result is for 399, plus the range 400-445. In this range, the MSD occurs R more times, and all digits (including the MSD) also occur at the same frequencies they would from range [0 - R].
Now we just have to compensate for the leading zeros. This pattern is easy - it's just:
10^N + 10^(N-1) + 10^(N-2) + ... + 10^0
Update: This version correctly takes into account "padding zeros", i.e. the zeros in middle positions when dealing with the remainder ([400, 401, 402, ...]). Figuring out the padding zeros is a bit ugly, but the revised code (C-style pseudocode) handles it:
function countdigits(int d, int low, int high) {
    return countdigits(d, low, high, false);
}

function countdigits(int d, int low, int high, bool inner) {
    if (high == 0)
        return (d == 0) ? 1 : 0;
    if (low > 0)
        return countdigits(d, 0, high) - countdigits(d, 0, low);
    int n = floor(log10(high));
    int m = floor((high + 1) / pow(10, n));
    int r = high - m * pow(10, n);
    return
        (max(m, 1) * n * pow(10, n-1)) +                                // (1)
        ((d < m) ? pow(10, n) : 0) +                                    // (2)
        (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) +      // (3)
        (((r >= 0) && (d == m)) ? (r + 1) : 0) +                        // (4)
        (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) -        // (5)
        (((d == 0) && !inner) ? countleadingzeros(n) : 0);              // (6)
}

function countleadingzeros(int n) {
    int tmp = 0;
    do {
        tmp = pow(10, n) + tmp;
        --n;
    } while (n > 0);
    return tmp;
}

function countpaddingzeros(int n, int r) {
    return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1);
}
As you can see, it's gotten a bit uglier but it still runs in O(log n) time, so if you need to handle numbers in the billions, this will still give you instant results. :-) And if you run it on the range [0 - 1000000], you get the exact same distribution as the one posted by High-Performance Mark, so I'm almost positive that it's correct.
FYI, the reason for the inner variable is that the leading-zero function is already recursive, so it can only be counted in the first execution of countdigits.
Update 2: In case the code is hard to read, here's a reference for what each line of the countdigits return statement means (I tried inline comments but they made the code even harder to read):
Frequency of any digit up to highest power of 10 (0-99, etc.)
Frequency of MSD above any multiple of highest power of 10 (100-399)
Frequency of any digits in remainder (400-445, R = 45)
Additional frequency of MSD in remainder
Count zeros in middle position for remainder range (404, 405...)
Subtract leading zeros only once (on outermost loop)
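For comparison, here is a short Python sketch that also runs in O(log n) time. It is not a port of the recursive pseudocode above; it uses a different, position-by-position formulation (how often does digit d appear in the ones place, the tens place, and so on), and the function names are assumptions made for the example. Ranges are inclusive and assume low >= 1.

```python
def count_digit_upto(d: int, n: int) -> int:
    """Occurrences of digit d in the numbers 1..n, written without leading zeros."""
    total = 0
    power = 1
    while power <= n:
        high, rest = divmod(n, power * 10)   # digits to the left of this position
        cur, low = divmod(rest, power)       # digit at this position, digits to its right
        if d > 0:
            total += high * power
            if cur > d:
                total += power
            elif cur == d:
                total += low + 1
        elif high > 0:
            # A zero only counts when some nonzero digit stands to its left.
            total += (high - 1) * power
            total += power if cur > 0 else low + 1
        power *= 10
    return total

def count_digit(d: int, low: int, high: int) -> int:
    """Occurrences of digit d in low..high inclusive."""
    return count_digit_upto(d, high) - count_digit_upto(d, low - 1)

print([count_digit(d, 1, 999999) for d in range(10)])
```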
I'm assuming you want a solution where the numbers are in a range, and you have the starting and ending number. Imagine starting with the start number and counting up until you reach the end number - it would work, but it would be slow. I think the trick to a fast algorithm is to realize that in order to go up one digit in the 10^x place and keep everything else the same, you need to use all of the digits before it 10^x times plus all digits 0-9 10^(x-1) times. (Except that your counting may have involved a carry past the x-th digit - I correct for this below.)
Here's an example. Say you're counting from 523 to 1004.
First, you count from 523 to 524. This uses the digits 5, 2, and 4 once each.
Second, count from 524 to 604. The rightmost digit does 6 cycles through all of the digits, so you need 6 copies of each digit. The second digit goes through digits 2 through 0, 10 times each. The third digit is 6 5 times and 5 100-24 times.
Third, count from 604 to 1004. The rightmost digit does 40 cycles, so add 40 copies of each digit. The second from right digit does 4 cycles, so add 4 copies of each digit. The leftmost digit does 100 each of 7, 8, and 9, plus 5 of 0 and 100 - 5 of 6. The last digit is 1 5 times.
To speed up the last bit, look at the part about the rightmost two places. It uses each digit 10 + 1 times. In general, 1 + 10 + ... + 10^n = (10^(n+1) - 1)/9, which we can use to speed up counting even more.
My algorithm is to count up from the start number to the end number (using base-10 counting), but use the fact above to do it quickly. You iterate through the digits of the starting number from least to most significant, and at each place you count up so that that digit is the same as the one in the ending number. At each point, n is the number of up-counts you need to do before you get to a carry, and m the number you need to do afterwards.
Now let's assume pseudocode counts as a language. Here, then, is what I would do:
convert start and end numbers to digit arrays start[] and end[]
create an array counts[] with 10 elements which stores the number of copies of
    each digit that you need

iterate through start number from right to left. at the i-th digit,
    let d be the number of digits you must count up to get from this digit
        to the i-th digit in the ending number. (i.e. subtract the equivalent
        digits mod 10)
    add d * (10^i - 1)/9 to each entry in counts.
    let m be the numerical value of all the digits to the right of this digit,
        n be 10^i - m.
    for each digit e from the left of the starting number up to and including the
        i-th digit, add n to the count for that digit.
    for j in 1 to d
        increment the i-th digit by one, including doing any carries
        for each digit e from the left of the starting number up to and including
            the i-th digit, add 10^i to the count for that digit
    for each digit e from the left of the starting number up to and including the
        i-th digit, add m to the count for that digit.
    set the i-th digit of the starting number to be the i-th digit of the ending
        number.
Oh, and since the value of i increases by one each time, keep track of your old 10^i and just multiply it by 10 to get the new one, instead of exponentiating each time.
To reel off the digits of a number, we would only ever need a costly string conversion if we couldn't do a mod. Digits can most quickly be pulled off a number like this:
feed = number;
do
{
    digit = feed % 10;
    feed /= 10;
    // use digit... e.g. digitTally[digit]++;
}
while (feed > 0);
That loop should be very fast, and it can simply be placed inside a loop from the start number to the end number for the simplest way to tally the digits.
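For reference, the same mod-and-divide loop wrapped in a loop over the range could be sketched in Python as follows. The name `tally_digits` is hypothetical, and the sketch assumes non-negative integers with an inclusive end.

```python
def tally_digits(start: int, end: int) -> list:
    """Count how often each digit 0-9 appears in start..end inclusive."""
    tally = [0] * 10
    for number in range(start, end + 1):
        feed = number
        while True:  # emulates do/while, so the number 0 still yields one digit
            feed, digit = divmod(feed, 10)
            tally[digit] += 1
            if feed == 0:
                break
    return tally


print(tally_digits(1, 49))
# → [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
```

This is the brute-force baseline. It is linear in the size of the range, which makes it handy for checking any faster method against.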
To go faster for a larger range of numbers, I'm looking for an optimised method of tallying all digits from 0 to number*10^significance
(going from a start to an end boggles me).
Here is a table showing digit tallies for some values with a single significant digit.
These are inclusive of 0, but not of the top value itself. That was an oversight,
but it may be a bit easier to see patterns with the top value's digits absent.
These tallies don't include leading zeros.
digit      1     10    100   1000  10000      2     20     30     40     60     90    200    600   2000   6000
    0      1      1     10    190   2890      1      2      3      4      6      9     30    110    490   1690
    1      0      1     20    300   4000      1     12     13     14     16     19    140    220   1600   2800
    2      0      1     20    300   4000      0      2     13     14     16     19     40    220    600   2800
    3      0      1     20    300   4000      0      2      3     14     16     19     40    220    600   2800
    4      0      1     20    300   4000      0      2      3      4     16     19     40    220    600   2800
    5      0      1     20    300   4000      0      2      3      4     16     19     40    220    600   2800
    6      0      1     20    300   4000      0      2      3      4      6     19     40    120    600   1800
    7      0      1     20    300   4000      0      2      3      4      6     19     40    120    600   1800
    8      0      1     20    300   4000      0      2      3      4      6     19     40    120    600   1800
    9      0      1     20    300   4000      0      2      3      4      6      9     40    120    600   1800
Edit: clearing up my original thoughts:
From the brute-force table showing tallies from 0 (included) to a power of ten (not included), it is visible that a major digit md at ten-power tp:
    increments tally[0 to 9] by md*tp*10^(tp-1)
    increments tally[1 to md-1] by 10^tp
    decrements tally[0] by (10^tp - 10)
        (to remove leading 0s if tp>leadingzeros)
    can increment tally[moresignificantdigits] by self(md*10^tp)
        (to complete an effect)
If these tally adjustments were applied for each significant digit,
the tally should be modified as though counted from 0 to end-1.
The adjustments can be inverted to remove the preceding range (start number).
Thanks Aaronaught for your complete and tested answer.
Here's a very bad answer, I'm ashamed to post it. I asked Mathematica to tally the digits used in all numbers from 1 to 1,000,000, no leading 0s. Here's what I got:
0 488895
1 600001
2 600000
3 600000
4 600000
5 600000
6 600000
7 600000
8 600000
9 600000
Next time you're ordering sticky digits for selling in your hardware store, order in these proportions, you won't be far wrong.
I asked this question on Math Overflow, and got spanked for asking such a simple question. One of the users took pity on me and said if I posted it to The Art of Problem Solving, he would answer it; so I did.
Here is the answer he posted:
http://www.artofproblemsolving.com/Forum/viewtopic.php?p=1741600#1741600
Embarrassingly, my math-fu is inadequate to understand what he posted (the guy is 19 years old...that is so depressing). I really need to take some math classes.
On the bright side, the equation is recursive, so it should be a simple matter to turn it into a recursive function with a few lines of code, by someone who understands the math.
I know this question has an accepted answer but I was tasked with writing this code for a job interview and I think I came up with an alternative solution that is fast, requires no loops and can use or discard leading zeroes as required.
It is in fact quite simple but not easy to explain.
If you list out the first n numbers
1
2
3
.
.
.
9
10
11
It is usual to start counting the digits required from the start room number to the end room number in a left to right fashion, so for the above we have one 1, one 2, one 3 ... one 9, two 1's one zero, four 1's etc. Most solutions I have seen used this approach with some optimisation to speed it up.
What I did was to count vertically in columns, as in hundreds, tens, and units. You know the highest room number so we can calculate how many of each digit there are in the hundreds column via a single division, then recurse and calculate how many in the tens column etc. Then we can subtract the leading zeros if we like.
Easier to visualize if you use Excel to write out the numbers but use a separate column for each digit of the number
A B C
- - -
0 0 1 (assuming room numbers do not start at zero)
0 0 2
0 0 3
.
.
.
3 6 4
3 6 5
.
.
.
6 6 9
6 7 0
6 7 1
^
sum in columns not rows
So if the highest room number is 671, the hundreds column will have 100 zeroes vertically (99 if the list starts at room 1), followed by 100 ones and so on, up to 72 sixes (rooms 600 to 671). Ignore the zeroes if required, as we know these are all leading.
Then recurse down to the tens and perform the same operation, we know there will be 10 zeroes followed by 10 ones etc, repeated six times, then the final time down to 2 sevens. Again can ignore the first 10 zeroes as we know they are leading. Finally of course do the units, ignoring the first zero as required.
So there are no loops; everything is calculated with division. I use recursion for travelling "up" the columns until the max one is reached (in this case hundreds) and then back down, totalling as it goes.
I wrote this in C# and can post code if anyone interested, haven't done any benchmark timings but it is essentially instant for values up to 10^18 rooms.
Could not find this approach mentioned here or elsewhere so thought it might be useful for someone.
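A rough Python sketch of the column-wise idea described above. This is not the author's C# code: it walks the columns with a small loop instead of recursion, and the function names and the `pad_width` parameter are assumptions of mine.

```python
def count_digits_upto(n: int, pad_width: int = 0) -> list:
    """Tally digits 0-9 over rooms 1..n, one column (units, tens, ...) at a time.

    pad_width > 0 pads every number with leading zeros to that many columns;
    pad_width == 0 leaves leading zeros out of the tally.
    """
    tally = [0] * 10
    if n < 1:
        return tally
    columns = max(len(str(n)), pad_width)
    for pos in range(columns):
        power = 10 ** pos
        high, rest = divmod(n, power * 10)  # digits to the left of this column
        cur, low = divmod(rest, power)      # this column's digit, digits to its right
        for d in range(10):
            count = high * power            # complete 0-9 cycles (rows 0..n)
            if cur > d:
                count += power              # one more full block of d
            elif cur == d:
                count += low + 1            # partial block of d
            tally[d] += count
        # Rows 0..n were counted above. Drop row 0 itself, and, when leading
        # zeros are unwanted, every row below 10**pos (their zero here is leading).
        tally[0] -= 1 if pad_width else power
    return tally


def count_digits_in_range(first: int, last: int, leading_zeros: bool = False) -> list:
    """Digits needed for rooms first..last inclusive (first >= 1)."""
    pad = len(str(last)) if leading_zeros else 0
    upper = count_digits_upto(last, pad)
    lower = count_digits_upto(first - 1, pad)
    return [u - l for u, l in zip(upper, lower)]


print(count_digits_upto(1_000_000))
# → [488895, 600001, 600000, 600000, 600000, 600000, 600000, 600000, 600000, 600000]
```

The work grows with the number of columns rather than the number of rooms, which is why very large values are answered essentially at once. The printed tallies agree with the Mathematica figures quoted earlier in the thread.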
Your approach is fine. I'm not sure why you would ever need anything faster than what you've described.
Or, this would give you an instantaneous solution: Before you actually need it, calculate what you would need from 1 to some maximum number. You can store the numbers needed at each step. If you have a range like your second example, it would be what's needed for 1 to 300, minus what's needed for 1 to 50.
Now you have a lookup table that can be called at will. Doing up to 10,000 would only take a few MB and, what, a few minutes to compute, once?
This doesn't answer your exact question, but it's interesting to note the distribution of first digits according to Benford's Law. For example, in many real-world collections of numbers that span several orders of magnitude, about 30% of them will start with "1", which is somewhat counter-intuitive.
I don't know of any distributions describing subsequent digits, but you might be able to determine this empirically and come up with a simple formula for computing an approximate number of digits required for any range of numbers.
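For reference, Benford's Law gives the probability of a leading digit d as log10(1 + 1/d). A small Python sketch (the function name is my own) that prints the expected share for each first digit:

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d (1-9) under Benford's Law."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

The nine probabilities sum to 1, and the value for d = 1 is log10(2), roughly 0.301.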
If "better" means "clearer," then I doubt it. If it means "faster," then yes, but I wouldn't use a faster algorithm in place of a clearer one without a compelling need.
#!/usr/bin/ruby1.8

def digits_for_range(min, max, leading_zeros)
  bins = [0] * 10
  format = [
    '%',
    ('0' if leading_zeros),
    max.to_s.size,
    'd',
  ].compact.join
  (min..max).each do |i|
    s = format % i
    for digit in s.scan(/./)
      bins[digit.to_i] += 1 unless digit == ' '
    end
  end
  bins
end

p digits_for_range(1, 49, false)
# => [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
p digits_for_range(1, 49, true)
# => [13, 15, 15, 15, 15, 5, 5, 5, 5, 5]
p digits_for_range(1, 10000, false)
# => [2893, 4001, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]
Ruby 1.8, a language known to be "dog slow," runs the above code in 0.135 seconds. That includes loading the interpreter. Don't give up an obvious algorithm unless you need more speed.
If you need raw speed over many iterations, try a lookup table:
Build an array with 2 dimensions: 10 x max-house-number
    int nDigits[10000][10]; // Don't try this on the stack, kids!
Fill each row with the count of digits required to get to that number from zero.
Hint: Use the previous row as a start:
    n=0..9999:
        if (n>0) nDigits[n] = nDigits[n-1]
        d=0..9:
            nDigits[n][d] += countOccurrencesOf(n,d)
Number of digits "between" two numbers becomes simple subtraction.
For range=51 to 300, take the counts for 300 and subtract the counts for 50.
    0's = nDigits[300][0] - nDigits[50][0]
    1's = nDigits[300][1] - nDigits[50][1]
    2's = nDigits[300][2] - nDigits[50][2]
    3's = nDigits[300][3] - nDigits[50][3]
    etc.
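A minimal Python sketch of that lookup table, under a couple of assumptions of mine: the rows are cumulative counts for house numbers 1..n rather than 0..n, no leading zeros are counted, and the helper names are made up for illustration.

```python
def build_prefix_table(max_number: int) -> list:
    """table[n][d] = times digit d is used writing 1..n (no leading zeros)."""
    table = [[0] * 10]              # row 0: nothing written yet
    for n in range(1, max_number + 1):
        row = table[-1].copy()      # start from the previous row
        for ch in str(n):
            row[int(ch)] += 1
        table.append(row)
    return table


def digits_between(table: list, first: int, last: int) -> list:
    """Digits needed for first..last inclusive, by subtracting two rows."""
    return [hi - lo for hi, lo in zip(table[last], table[first - 1])]


table = build_prefix_table(9999)
print(digits_between(table, 51, 300))
```

Building the table is a one-off cost. After that, every range query is ten subtractions.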
You can separate each digit (look here for an example), create a histogram with entries from 0..9 (which will count how many times each digit appeared in a number) and multiply by the number of 'numbers' asked.
But if that isn't what you are looking for, can you give a better example?
Edited:
Now I think I got the problem. I think you can reckon this (pseudo C):
int histogram[10];
memset(histogram, 0, sizeof(histogram));
for (i = startNumber; i <= endNumber; ++i)
{
    array = separateDigits(i);
    for (j = 0; j < array.length; ++j)
    {
        histogram[array[j]]++;   // count the digit itself, not its position
    }
}
separateDigits implements the function in the link.
Each position of the histogram will have the amount of each digit. For example
histogram[0] == total of zeros
histogram[1] == total of ones
...
Regards

Resources