I am trying to write a simple script, where the input would be a start date, end date and a total amount of hours (150) and the script would generate a simple report containing random date-time intervals (with ideally weekdays) that would sum the entered amount of hours.
This is what I am trying to achieve:
Start: 2020-01-01
End: 2020-01-31
Total hours: 150
Report:
Jan 1, 2020, 08:02:20 – Jan 1, 2020, 08:55:00: sub time -> 52:40 (52 minutes 40 seconds)
Jan 1, 2020, 09:00:00 – Jan 1, 2020, 09:38:13: sub time -> 38:13 (38 minutes 13 seconds)
...
Jan 3, 2020, 13:15:00 – Jan 3, 2020, 14:45:13: sub time -> 01:30:13 (1 hour 30 minutes 13 seconds)
...
TOTAL TIME: 150 hours (or in minutes)
How do I generate time intervals where the total amount of minutes/hours would be equal to a given number of hours?
I assume the question is loosely-worded in the sense that "random" is not meant in a probability sense; that is, the intent is not to select a set of intervals (that total a given number of hours in length) with a mechanism that ensures all possible sets of such intervals have an equal likelihood of being selected. Rather, I understand that a set of intervals is to be chosen (e.g., for testing purposes) in a way that incorporates elements of randomness.
I have assumed the intervals are to be non-overlapping and the number of intervals is to be specified. I don't understand what "with ideally weekdays" means so I have disregarded that.
The heart of the approach I will propose is the following method.
def rnd_lengths(tot_secs, target_nbr)
  max_secs = 2 * tot_secs/target_nbr - 1
  arr = []
  loop do
    break(arr) if tot_secs.zero?
    l = [(0.5 + max_secs * rand).round, tot_secs].min
    arr << l
    tot_secs -= l
  end
end
The method generates an array of integers (lengths of intervals), measured in seconds, ideally having target_nbr elements. tot_secs is the required combined length of the "random" intervals (e.g., 150*3600).
Each element of the array is drawn randomly from a discrete uniform distribution over the integers from 1 to max_secs (to be computed). This is done sequentially until tot_secs is reached. Should the last random value cause the total to exceed tot_secs, it is reduced to make the total equal tot_secs.
Suppose tot_secs equals 100 and we wish to generate 4 random intervals (target_nbr = 4). That means the average length of the intervals would be 25. As we are using a uniform distribution having an average of (1 + max_secs)/2, we may derive the value of max_secs from the expression
target_nbr * (1 + max_secs)/2 = tot_secs
which is
max_secs = 2 * tot_secs/target_nbr - 1
the first line of the method. For the example I mentioned, this would be
max_secs = 2 * 100/4 - 1
#=> 49
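As a rough check of this derivation, the sketch below draws many values with the same expression used in rnd_lengths and looks at their range and mean. The helper name max_secs_for and the fixed seed are assumptions made only for this illustration.

```ruby
# Hypothetical helper mirroring the first line of rnd_lengths.
def max_secs_for(tot_secs, target_nbr)
  2 * tot_secs / target_nbr - 1
end

max_secs = max_secs_for(100, 4)   # 49, as computed above
rng = Random.new(42)              # fixed seed, only for repeatability
draws = Array.new(10_000) { (0.5 + max_secs * rng.rand).round }
mean = draws.sum.to_f / draws.size
# Every draw lies in 1..max_secs, and the mean should be close to
# (1 + max_secs)/2 = 25, the average interval length we want.
```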
Let's try it.
rnd_lengths(100, 4)
#=> [49, 36, 15]
As you see the array that is returned sums to 100, as required, but it contains only 3 elements. That's why I named the argument target_nbr, as there is no assurance the array returned will have that number of elements. What to do? Try again!
rnd_lengths(100, 4)
#=> [14, 17, 26, 37, 6]
Still not 4 elements, so keep trying:
rnd_lengths(100, 4)
#=> [11, 37, 39, 13]
Success! It may take a few tries to get the correct number of elements, but for parameters likely to be used, and the nature of the probability distribution employed, I wouldn't expect that to be a problem.
Let's put this in a method.
def rdm_intervals(tot_secs, nbr_intervals)
  loop do
    arr = rnd_lengths(tot_secs, nbr_intervals)
    break(arr) if arr.size == nbr_intervals
  end
end
intervals = rdm_intervals(100, 4)
#=> [29, 26, 7, 38]
We can compute random gaps between intervals in the same way. Suppose the intervals fall within a range of 175 seconds (the number of seconds between the start time and end time). Then:
gaps = rdm_intervals(175-100, 5)
#=> [26, 5, 19, 4, 21]
As seen, the gaps sum to 75, as required. We can disregard the last element.
We can now form the intervals. The first interval begins at 26 seconds and ends at 26+29 #=> 55 seconds. The second interval begins at 55+5 #=> 60 seconds and ends at 60+26 #=> 86 seconds, and so on. We therefore find the intervals (each in ranges of seconds from zero) to be:
[26..55, 60..86, 105..112, 116..154]
Note that 175 - 154 = 21, the last element of gaps.
If one is uncomfortable with the fact that the last elements of intervals and gaps are generally constrained in size, one could of course randomly reposition those elements within their respective arrays.
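A minimal sketch of that repositioning, assuming a plain shuffle is acceptable. The helper name reposition is hypothetical and not part of the methods above.

```ruby
# Hypothetical helper: randomly reorder the generated lengths so that the
# size-constrained final element can land anywhere in the array.
def reposition(lengths, rng = Random.new)
  lengths.shuffle(random: rng)
end

intervals = [29, 26, 7, 38]
shuffled = reposition(intervals)
# Shuffling changes only the order, so the lengths still sum to 100.
```

The same call could be applied to gaps, since only the order of the elements changes.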
One might not care if the number of intervals is exactly target_nbr. It would be simpler and faster to just use the first array of interval lengths produced. That's fine, but we still need the above methods to compute the random gaps, as their number must equal the number of intervals plus one:
gaps = rdm_intervals(175-100, intervals.size + 1)
We can now use these two methods to construct a method that will return the desired result. The argument tot_secs of this method equals the total number of seconds spanned by the intervals returned (e.g., 3600 * 150). The method returns an array containing nbr_intervals non-overlapping ranges of Time objects that fall between the given start and end dates.
require 'date'

def construct_intervals(start_date_str, end_date_str, tot_secs, nbr_intervals)
  start_time = Date.strptime(start_date_str, '%Y-%m-%d').to_time
  secs_in_period = Date.strptime(end_date_str, '%Y-%m-%d').to_time - start_time
  intervals = rdm_intervals(tot_secs, nbr_intervals)
  gaps = rdm_intervals(secs_in_period - tot_secs, nbr_intervals+1)
  nbr_intervals.times.with_object([]) do |_,arr|
    start_time += gaps.shift
    end_time = start_time + intervals.shift
    arr << (start_time..end_time)
    start_time = end_time
  end
end
See Date::strptime.
Let's try an example.
start_date_str = '2020-01-01'
end_date_str = '2020-01-31'
tot_secs = 3600*150
#=> 540000
construct_intervals(start_date_str, end_date_str, tot_secs, 4)
#=> [2020-01-06 18:05:04 -0800..2020-01-09 03:48:00 -0800,
# 2020-01-09 06:44:16 -0800..2020-01-11 23:33:44 -0800,
# 2020-01-20 20:30:21 -0800..2020-01-21 17:27:44 -0800,
# 2020-01-27 19:08:38 -0800..2020-01-28 01:38:51 -0800]
construct_intervals(start_date_str, end_date_str, tot_secs, 8)
#=> [2020-01-03 18:43:36 -0800..2020-01-04 10:49:14 -0800,
# 2020-01-08 07:55:44 -0800..2020-01-08 08:17:18 -0800,
# 2020-01-11 00:54:36 -0800..2020-01-11 23:00:53 -0800,
# 2020-01-14 05:20:14 -0800..2020-01-14 22:48:45 -0800,
# 2020-01-16 18:28:28 -0800..2020-01-17 22:50:24 -0800,
# 2020-01-22 02:59:31 -0800..2020-01-22 22:33:08 -0800,
# 2020-01-23 00:36:59 -0800..2020-01-24 12:15:37 -0800,
# 2020-01-29 11:22:21 -0800..2020-01-29 21:46:10 -0800]
START -xxx----xxx--x----xxxxx---xx--xx---xx-xx-x-xxx-- END
We need to fill a timespan with alternating periods of ON and OFF. This can be
denoted by a list of timestamps. Let's say that the period always starts with
an OFF period for simplicity's sake.
From the start/end of the timespan and the total seconds in ON state, we
gather useful facts:
the timespan's total size in seconds total_seconds
the second totals of both the ON (on_total_seconds) and the OFF (off_total_seconds) periods
Once we know these, a workable algorithm looks more or less like this - pardon
the functions without implementation:
# these can be parameters as well
MIN_PERIODS = 10
MAX_PERIODS = 100

# get_total_seconds and add_seconds are the helpers left without implementation;
# all durations are assumed to be whole seconds (Integers).
def fill_periods(start_date, end_date, on_total_seconds = 150*60*60)
  total_seconds = get_total_seconds(start_date, end_date)
  off_total_seconds = total_seconds - on_total_seconds

  # establish two buckets to pull from alternately in populating our array of durations
  on_bucket = on_total_seconds
  off_bucket = off_total_seconds

  result = []
  # populate `result` with durations in seconds. `result` will sum to `total_seconds`
  while on_bucket > 0 || off_bucket > 0 do
    # draw a random slice, capped at whatever is left in the bucket
    off_slice = [rand((off_total_seconds / MAX_PERIODS / 2)..(off_total_seconds / MIN_PERIODS / 2)), off_bucket].min
    off_bucket -= off_slice

    on_slice = [rand((on_total_seconds / MAX_PERIODS / 2)..(on_total_seconds / MIN_PERIODS / 2)), on_bucket].min
    on_bucket -= on_slice

    # randomness being random, we're going to hit 0 in one bucket before the
    # other. when this happens, just add this (off, on) pair to the last one.
    if off_slice == 0 || on_slice == 0
      last_off, last_on = result.pop(2)
      result << last_off + off_slice << last_on + on_slice
    else
      result << off_slice << on_slice
    end
  end

  # build up an array of datetimes by progressively adding seconds to the last timestamp.
  datetimes = result.each_with_object([start_date]) do |period, memo|
    memo << add_seconds(memo.last, period)
  end

  # we want a list of datetime pairs denoting ON periods. since we know our
  # timespan starts with OFF, we start our list of pairs with the second element.
  datetimes.slice(1..-1).each_slice(2).to_a
end
This is a well-known problem, and I am asking about it here.
You are given the life spans of a number of elephants, where a life span runs from the year of birth to the year of death.
You have to calculate the period during which the maximum number of elephants are alive.
Example:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Answer is 1995 - 1999
I tried hard to solve this, but I am unable to do so.
How can I solve this problem?
I have an approach for the case where a user asks for the number of elephants alive in a given year. I solved that using a segment tree: whenever an elephant's life span is given, increase every year of that span by 1. Can this be used to solve the above problem?
For the above question, I only need the high-level approach; I will code it myself.
Split each date range into start date and end date.
Sort the dates. If a start date and an end date are the same, put the end date first (otherwise you could get an empty date range as the best).
Start with a count of 0.
Iterate through the dates using a sweep-line algorithm:
  If you get a start date:
    Increment the count.
    If the current count is higher than the last best count, update the best count, store this start date and set a flag.
  If you get an end date:
    If the flag is set, store the stored start date and this end date with the count as the best interval so far.
    Reset the flag.
    Decrement the count.
Example:
For input:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Split and sorted: (S = start, E = end)
1990 S, 1992 S, 1995 S, 1999 E, 2000 E, 2010 S, 2013 E, 2020 E
Iterating through them:
count = 0
lastStart = N/A
1990: count = 1
      count = 1 > 0, so set flag
        and lastStart = 1990
1992: count = 2
      count = 2 > 1, so set flag
        and lastStart = 1992
1995: count = 3
      count = 3 > 2, so set flag
        and lastStart = 1995
1999: flag is set, so
        record [lastStart (= 1995), 1999] with a count of 3
      reset flag
      count = 2
2000: flag is not set
      reset flag
      count = 1
2010: count = 2
      since count = 2 < 3, don't set flag
2013: flag is not set
      reset flag
      count = 1
2020: flag is not set
      reset flag
      count = 0
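The steps above can be sketched in Ruby as follows. This is only an illustration of the sweep: the method name busiest_period and the [start, end, count] return shape are assumptions, and it reports the first interval that reaches the maximum count.

```ruby
# Sweep-line sketch: returns [start_year, end_year, count] for the first
# period in which the maximum number of spans overlap.
def busiest_period(spans)
  # 0 marks an end and 1 a start, so ends sort before starts in the same year.
  events = spans.flat_map { |birth, death| [[birth, 1], [death, 0]] }.sort
  count = 0
  best_count = 0
  last_start = nil
  flag = false
  best = nil
  events.each do |year, kind|
    if kind == 1
      count += 1
      if count > best_count
        best_count = count
        last_start = year
        flag = true
      end
    else
      best = [last_start, year, best_count] if flag
      flag = false
      count -= 1
    end
  end
  best
end

spans = [[1990, 2013], [1995, 2000], [2010, 2020], [1992, 1999]]
result = busiest_period(spans)   # => [1995, 1999, 3]
```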
How about this?
Say I have all the above data stored in a file. Read it into two arrays separated by the " - ".
Hence, now I have birthYear[] which contains all the birth years and deathYear[] containing all the death years.
so birthYear[] = [1990, 1995, 2010, 1992]
deathYear[] = [2013, 2000, 2020, 1999]
Get the min birth year and the max death year. Create a Hashtable with the Key as a year, and the Value as the count.
Hence,
Hashtable<Integer, Integer> numOfElephantsAlive = new Hashtable<Integer, Integer>();
Now, iterate through the birthYear array and, for each elephant, add every year from its birth year to the corresponding death year to the Hashtable with a count of 1. If the key already exists, add 1 to its count instead. Hence, for the last case:
1992 - 1999
numOfElephantsAlive.put(1992, 1)
numOfElephantsAlive.put(1993, 1)
and so on for every year.
Say, for example, you have a Hashtable that looks like this at the end of it:
Key Value
1995 3
1996 3
1992 2
1993 1
1994 3
1998 1
1997 2
1999 2
Now, you need the range of years in which the number of elephants was at its maximum. Hence, let's iterate and find the year with the max value. This is pretty easy. Iterate over the keySet() and get the year.
Now, you need a contiguous range of years. One way to do this:
Do Collections.sort() over a list built from the keySet() and when you hit the max value, save all contiguous locations.
Hence, on hitting 3 for our example at 1994, we would check all the following years with a 3. This will return your range, which is the min-year, max-year combo.
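A compact sketch of the same per-year counting idea, written in Ruby rather than Java. The method name peak_years is hypothetical, and it returns only the first contiguous run of years that reaches the maximum.

```ruby
# Count the elephants alive in each year, then report the first run of
# consecutive years that reaches the maximum count.
def peak_years(birth_years, death_years)
  alive = Hash.new(0)
  birth_years.zip(death_years).each do |birth, death|
    (birth..death).each { |year| alive[year] += 1 }
  end
  max = alive.values.max
  first = alive.keys.sort.find { |year| alive[year] == max }
  last = first
  # Extend the run while the following year also has the maximum count.
  last += 1 while alive[last + 1] == max
  [first, last]
end

birth_year = [1990, 1995, 2010, 1992]
death_year = [2013, 2000, 2020, 1999]
range = peak_years(birth_year, death_year)   # => [1995, 1999]
```

Note that the work grows with the total number of years covered, unlike an approach that only looks at the start and end dates.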
One approach maybe:
Iterate through the periods. Keep track of a list of periods up to now. Note: At each step, the number of periods increases by 2 (or 1 if there is no overlap with the existing list of periods).
For example
1990 - 2013
Period List contains 1 period { (1990,2013) }
Count List contains 1 entry { 1 }
1995 - 2000
Period List contains 3 periods { (1990,1995), (1995,2000), (2000,2013) }
Count List contains 3 entries { 1, 2, 1 }
2010 - 2020
Period List contains 5 periods { (1990,1995), (1995,2000), (2000,2010), (2010, 2013), (2013, 2020) }
Count List contains 5 entries { 1, 2, 1, 2, 1 }
1992 - 1999
Period List contains 7 periods { (1990,1992), (1992,1995), (1995,1999), (1999,2000), (2000,2010), (2010, 2013), (2013, 2020) }
Count List contains 7 entries { 1, 2, 3, 2, 1, 2, 1 }
1) arrange in ascending order year wise starting from the largest series.
2) count the years for largest series for whole data set
3) then identify the largest count.
4) the largest count is your answer for years... this can be done in Algo.
I have a requirement to obtain the number of days passed since the creation date, excluding weekends. I only have a few functions available: JulianDay, JulianWeek and JulianYear to get Julian date values, Today, which returns today's date, and a time stamp function, which returns the date and time. I have managed to get the difference between today and the creation date by using JulianDay(today)-JulianDay(creation date), but I still can't wrap my head around subtracting the weekends.
I am not completely sure what the functions you cited in your question do; however, you seem to be comfortable with
doing the basic date arithmetic to determine the number of days between two given dates. The hard part seems
to be figuring out how many days to subtract for weekends.
I think you can accomplish this with two functions:
Given two dates, return the number of days between them. Call this DAYS(date-1, date-2)
Given a date, return the day of the week (where 1 = Monday ... 7 = Sunday). Call this DAY-OF-WEEK(date)
Having these functions you can then do the following:
Calculate full weeks in the date range: WEEKS = DAYS(date-1, date-2) div 7 (integer division, rounding down)
Calculate days not parts of full weeks: DAYS-LEFT = DAYS(date-1, date-2) - (WEEKS * 7)
Determine which day of the week the last day falls on: LAST-DAY = DAY-OF-WEEK(date-2)
Adjust the number of DAYS-LEFT from the partial week as follows:
if DAYS-LEFT > 0 then
  case LAST-DAY
    when 6 then /* Saturday */
      DAYS-LEFT = DAYS-LEFT - 1
    when 7 then /* Sunday */
      if DAYS-LEFT = 1 then
        DAYS-LEFT = 0
      else
        DAYS-LEFT = DAYS-LEFT - 2
      end-if
    when other /* Monday through Friday */
      case DAYS-LEFT - LAST-DAY
        when > 1 then
          DAYS-LEFT = DAYS-LEFT - 2
        when = 1 then
          DAYS-LEFT = DAYS-LEFT - 1
        when other
          DAYS-LEFT = DAYS-LEFT /* no adjustment */
      end-case
  end-case
end-if
DAYS-EXCLUDING-WEEKENDS = (WEEKS * 5) + DAYS-LEFT
I assume you have, or can build, a DAYS(date-1, date-2) function. The next bit is to determine what day of the week
a given date falls on. The algorithm to do this is called Zeller's congruence. I won't
repeat the algorithm here since Wikipedia does a fine job of describing it.
Hope this gets you on your way...
Your JulianDay(y,m,d) function returns a serial number for each date; let's say for the sake of discussion that JulianDay(2013,7,4) returns 2456478. The next day will be 2456479, then 2456480, and so on. And let's say that the difference of two days is diff.
The number of full weeks in diff, each containing 5 weekdays, is diff // 7 (that's integer division, so it rounds down). Thus if diff is 25, there will be 25 // 7 = 3 full weeks plus an extra diff % 7 = 4 days. The 3 full weeks contain 15 weekdays; it doesn't matter which day of the week you start from. So you only need to consider the 4 extra days to see how many are weekdays.
The number that the JulianDay function returns can be taken modulo 7 to calculate the day of the week; with my JulianDay function, a remainder of 5 represents Saturday and a remainder of 6 represents Sunday. You can take the 4 extra days to be either the 4 days at the beginning of the period or the 4 days at the end; it doesn't matter because all the other days are part of a period of consecutive full weeks that each have 5 weekdays. Say you pick the first 4 days. Then take the JulianDay of the first day modulo 7, then the JulianDay of the first day plus 1 modulo 7, then the JulianDay of the first day plus 2 modulo 7, then the JulianDay of the first day plus 3 modulo 7, determine how many of them are weekdays, and add that to the number of weekdays in full weeks.
All you need is a JulianDay function.
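The approach can be sketched in Ruby, using Date#jd as a stand-in for the JulianDay function (it returns 2456478 for 2013-07-04, matching the serial number used above). The method name weekdays_between and the choice of a half-open range, which counts the start day but not the end day, are assumptions of this sketch.

```ruby
require 'date'

# Count weekdays in the half-open range [from, to): whole weeks contribute
# five weekdays each; only the leftover days are inspected individually.
def weekdays_between(from, to)
  diff = to.jd - from.jd
  full_weeks, extra = diff.divmod(7)
  # jd % 7 is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
  extra_weekdays = (0...extra).count { |i| (from.jd + i) % 7 < 5 }
  full_weeks * 5 + extra_weekdays
end

from = Date.new(2013, 7, 4)             # a Thursday
to = Date.new(2013, 7, 29)              # 25 days later
weekdays = weekdays_between(from, to)   # 3 full weeks (15) + Thu, Fri = 17
```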
This code should do what you want:
import java.util.Calendar;
import java.util.Date;

Date fromDate = new Date(System.currentTimeMillis()-(30L*24*60*60*1000)); // 30 days ago
Date toDate = new Date(System.currentTimeMillis()); // now
Calendar cal = Calendar.getInstance();
cal.setTime(fromDate);
int countDays = 0;
// step one day at a time and count every day that is not a Saturday or Sunday
while (toDate.compareTo(cal.getTime()) > 0) {
    if (cal.get(Calendar.DAY_OF_WEEK) != Calendar.SATURDAY && cal.get(Calendar.DAY_OF_WEEK) != Calendar.SUNDAY)
        countDays++;
    cal.add(Calendar.DATE, 1);
}
System.out.println(countDays);
A friend of mine who is a teacher has 23 students in a class. They want an algorithm that assigns students to groups of 2 and one group of 3 (to handle the odd number of students) across 14 weeks such that no two pairs repeat across the 14 weeks (a pair is assigned to one week).
A brute force approach would be too inefficient, so I was thinking of other approaches; a matrix representation sounds appealing, as does graph theory. Does anyone have any ideas? The problems that I could find deal only with 1 week, and this answer I couldn't quite figure out.
A round-robin algorithm will do the trick, I think.
Add the remaining student to the second group and you are done.
First run
1 2 3 4 5 6 7 8 9 10 11 12
23 22 21 20 19 18 17 16 15 14 13
Second run
1 23 2 3 4 5 6 7 8 9 10 11
22 21 20 19 18 17 16 15 14 13 12
...
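A sketch of a round-robin schedule using the standard circle method, which may rotate in a different order than the runs shown above. The method name round_robin_weeks and the hash keys are hypothetical. It only guarantees that the two-person pairs never repeat; choosing which group the leftover student joins so that the trio's pairings are also new is not handled here.

```ruby
# Circle-method sketch: one student stays fixed while the others rotate.
# A nil placeholder pads the odd class; whoever faces it is left over that week.
def round_robin_weeks(students, weeks)
  ring = students.dup
  ring << nil if ring.size.odd?
  fixed, *rest = ring
  Array.new(weeks) do
    row = [fixed] + rest
    half = row.size / 2
    # Pair position i with position (size - 1 - i).
    matches = row.first(half).zip(row.last(half).reverse)
    leftover = matches.find { |a, b| a.nil? || b.nil? }&.compact&.first
    pairs = matches.reject { |a, b| a.nil? || b.nil? }
    rest = rest.rotate(-1)
    { pairs: pairs, leftover: leftover }
  end
end

schedule = round_robin_weeks((1..23).to_a, 14)
# schedule[0][:pairs] holds the 11 pairs of week 1,
# schedule[0][:leftover] the student still to be placed.
```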
Another possibility might be graph matching, 14 distinct graph matchings would be needed.
Try to describe the problem in terms of constraints.
Then pass the constraints to a tool like ECLiPSe (not Eclipse), see http://eclipseclp.org/.
In fact, your problem seems similar to that of the Golf example on that site (http://eclipseclp.org/examples/golf.ecl.txt).
Here's an example in Haskell that will produce groups of 14 non-repeating 11-pair-combinations. The value 'pairs' is all combinations of pairs from 1 to 23 (e.g., [1,2], [1,3] etc.). Then the program builds lists where each list is 14 lists of 11 pairs (choosing from the value 'pairs') such that no pair is repeated and no single number is repeated in one list of 11 pairs. It's up to you to simply place the missing last student for each week as you see fit. (It took about three minutes to calculate before it started to output results):
import Data.List
import Control.Monad

pairs = nubBy (\x y -> reverse x == y)
        $ filter (\x -> length (nub x) == length x) $ replicateM 2 [1..23]

solve = solve' [] where
  solve' results =
    if length results == 14
       then return results
       else solveOne [] where
         solveOne result =
           if length result == 11
              then solve' (result:results)
              else do next <- pairs
                      guard (notElem (head next) result'
                            && notElem (last next) result'
                            && notElem next results')
                      solveOne (next:result)
             where result' = concat result
                   results' = concat results
One sample from the output:
[[[12,17],[10,19],[9,18],[8,22],[7,21],[6,23],[5,11],[4,14],[3,13],[2,16],[1,15]],
[[12,18],[11,19],[9,17],[8,21],[7,23],[6,22],[5,10],[4,15],[3,16],[2,13],[1,14]],
[[12,19],[11,18],[10,17],[8,23],[7,22],[6,21],[5,9],[4,16],[3,15],[2,14],[1,13]],
[[15,23],[14,22],[13,17],[8,18],[7,19],[6,20],[5,16],[4,9],[3,10],[2,11],[1,12]],
[[16,23],[14,21],[13,18],[8,17],[7,20],[6,19],[5,15],[4,10],[3,9],[2,12],[1,11]],
[[16,21],[15,22],[13,19],[8,20],[7,17],[6,18],[5,14],[4,11],[3,12],[2,9],[1,10]],
[[16,22],[15,21],[14,20],[8,19],[7,18],[6,17],[5,13],[4,12],[3,11],[2,10],[1,9]],
[[20,21],[19,22],[18,23],[12,13],[11,14],[10,15],[9,16],[4,5],[3,6],[2,7],[1,8]],
[[20,22],[19,21],[17,23],[12,14],[11,13],[10,16],[9,15],[4,6],[3,5],[2,8],[1,7]],
[[20,23],[18,21],[17,22],[12,15],[11,16],[10,13],[9,14],[4,7],[3,8],[2,5],[1,6]],
[[19,23],[18,22],[17,21],[12,16],[11,15],[10,14],[9,13],[4,8],[3,7],[2,6],[1,5]],
[[22,23],[18,19],[17,20],[14,15],[13,16],[10,11],[9,12],[6,7],[5,8],[2,3],[1,4]],
[[21,23],[18,20],[17,19],[14,16],[13,15],[10,12],[9,11],[6,8],[5,7],[2,4],[1,3]],
[[21,22],[19,20],[17,18],[15,16],[13,14],[11,12],[9,10],[7,8],[5,6],[3,4],[1,2]]]
Start off with a set (maybe a bitset mapping to students for less memory consumption) for each student that has all other students in it. Iterate 14 times, each time picking 11 students (for the 11 groups you will form) for whom you will pick partners. For each student, pick a partner they haven't been in a group with yet. For one random student of those 11, pick a second partner, but make sure no student has fewer remaining partners than there are iterations left. For every pick, adjust the sets.
Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:
1 to 100
51 to 300
1 to 2,000 with zeros to the left
The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.
I wonder if there is a better way to solve this, without having to loop through the entire integers range.
Solutions in any language or pseudocode are welcome.
Edit:
Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem.
Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one million seems to indicate there is a proportion, but the results for other numbers show different proportions.
strainer found some formulas that may be used to count digits for numbers which are a power of ten.
Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas originated from Math Overflow and found a flaw in it (point to Stackoverflow :).
noahlavine developed an algorithm and presented it in pseudocode.
A new solution
After reading all the answers, and doing some experiments, I found that for a range of integers from 1 to 10^n - 1:
For digits 1 to 9, n*10^(n-1) pieces are needed
For digit 0, if not using leading zeros, n*10^(n-1) - ((10^n - 1) / 9) are needed
For digit 0, if using leading zeros, n*10^(n-1) - n are needed
The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers).
For example, if n = 6, range is 1 to 999,999:
For digits 1 to 9 we need 6*10^5 = 600,000 of each one
For digit 0, without leading zeros, we need 6*10^5 - (10^6 - 1)/9 = 600,000 - 111,111 = 488,889
For digit 0, with leading zeros, we need 6*10^5 - 6 = 599,994
These numbers can be checked using High-Performance Mark results.
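The three formulas can also be cross-checked against a plain loop for small n. The method names below are hypothetical, and the sketch assumes n >= 1.

```ruby
# Closed-form counts for the range 1..10**n - 1 (n >= 1).
def nonzero_digit_count(n)
  n * 10**(n - 1)
end

def zero_count_unpadded(n)
  n * 10**(n - 1) - (10**n - 1) / 9
end

def zero_count_padded(n)
  n * 10**(n - 1) - n
end

# Brute-force tally, used only to cross-check the formulas for small n.
def brute_counts(n, padded)
  counts = Array.new(10, 0)
  (1..10**n - 1).each do |i|
    s = padded ? i.to_s.rjust(n, '0') : i.to_s
    s.each_char { |c| counts[c.to_i] += 1 }
  end
  counts
end

nonzero_digit_count(6)   # => 600000
zero_count_unpadded(6)   # => 488889
zero_count_padded(6)     # => 599994
```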
Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but, if it finds a number which is a power of ten, it uses the formulas to add to the digits count the quantity for a full range of 1 to 9 or 1 to 99 or 1 to 999 etc. Here's the algorithm in pseudocode:
integer First,Last   //First and last number in the range
integer Number       //Current number in the loop
integer Power        //Power is the n in 10^n in the formulas
integer Nines        //Nines is the result of 10^n - 1, 10^5 - 1 = 99999
integer Prefix       //First digits in a number. For 14,200, prefix is 142
array 0..9 Digits    //Will hold the count for all the digits

FOR Number = First TO Last
  CALL TallyDigitsForOneNumber WITH Number,1  //Tally the count of each digit
                                              //in the number, increment by 1
  //Start of optimization. Comments are for Number = 1,000 and Last = 8,000.
  Power = Zeros at the end of number //For 1,000, Power = 3
  IF Power > 0                       //The number ends in 0 00 000 etc
    Nines = 10^Power-1               //Nines = 10^3 - 1 = 1000 - 1 = 999
    IF Number+Nines <= Last          //If 1,000+999 <= 8,000, add a full set
      Digits[0-9] += Power*10^(Power-1)  //Add 3*10^(3-1) = 300 to digits 0 to 9
      Digits[0]   -= Power               //Adjust digit 0 (leading zeros formula)
      Prefix = First digits of Number    //For 1000, prefix is 1
      CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each
                                                     //digit in prefix,
                                                     //increment by 999
      Number += Nines                    //Increment the loop counter 999 cycles
    ENDIF
  ENDIF
  //End of optimization
ENDFOR

SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count
  REPEAT
    Digits [ Number % 10 ] += Count
    Number = Number / 10
  UNTIL Number = 0
For example, for range 786 to 3,021, the counter will be incremented:
By 1 from 786 to 790 (5 cycles)
By 9 from 790 to 799 (1 cycle)
By 1 from 799 to 800
By 99 from 800 to 899
By 1 from 899 to 900
By 99 from 900 to 999
By 1 from 999 to 1000
By 999 from 1000 to 1999
By 1 from 1999 to 2000
By 999 from 2000 to 2999
By 1 from 2999 to 3000
By 1 from 3000 to 3010 (10 cycles)
By 9 from 3010 to 3019 (1 cycle)
By 1 from 3019 to 3021 (2 cycles)
Total: 28 cycles
Without optimization: 2,235 cycles
Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:
If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then subtract 1,000 - 700 = 300 from the count of digit 1.
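For readers who prefer runnable code, here is a Ruby sketch of the powers-of-ten loop for the case without leading zeros. It is a translation under my own reading of the pseudocode, not the original Clarion source; the method names are hypothetical, and brute_force exists only to compare results.

```ruby
# Add `count` to the tally of every decimal digit of `number`.
def tally_digits(digits, number, count)
  loop do
    digits[number % 10] += count
    number /= 10
    break if number.zero?
  end
end

# Tally digits for first..last (no leading zeros), skipping ahead in blocks
# of 9, 99, 999, ... whenever the current number ends in zeros.
def count_digits(first, last)
  digits = Array.new(10, 0)
  number = first
  while number <= last
    tally_digits(digits, number, 1)
    # power = number of trailing zeros of the current number
    power = 0
    power += 1 while number > 0 && (number % 10**(power + 1)).zero?
    nines = 10**power - 1
    if power > 0 && number + nines <= last
      full_set = power * 10**(power - 1)
      10.times { |d| digits[d] += full_set }
      digits[0] -= power   # the all-zero suffix was already tallied above
      tally_digits(digits, number / 10**power, nines)
      number += nines
    end
    number += 1
  end
  digits
end

# Plain loop over every number, used as a reference.
def brute_force(first, last)
  digits = Array.new(10, 0)
  (first..last).each { |i| tally_digits(digits, i, 1) }
  digits
end

counts = count_digits(786, 3021)   # counts[d] is the number of pieces of digit d
```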
Benchmark and Source code
I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results:
Original 104.78 seconds
With %10 83.66
With Powers of Ten 0.07
If you would like to see the full source code or run the benchmark, use these links:
Complete Source code (in Clarion): http://sca.mx/ftp/countdigits.txt
Compilable project and win32 exe: http://sca.mx/ftp/countdigits.zip
Accepted answer
noahlavine's solution may be correct, but I just couldn't follow the pseudocode; I think there are some details missing or not completely explained.
Aaronaught solution seems to be correct, but the code is just too complex for my taste.
I accepted strainer’s answer, because his line of thought guided me to develop this new solution.
There's a clear mathematical solution to a problem like this. Let's assume the value is zero-padded to the maximum number of digits (it's not, but we'll compensate for that later), and reason through it:
From 0-9, each digit occurs once
From 0-99, each digit occurs 20 times (10x in position 1 and 10x in position 2)
From 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)
The obvious pattern for any given digit, if the range is from 0 to a power of 10, is N * 10^(N-1), where N is the power of 10.
What if the range is not a power of 10? Start with the lowest power of 10, then work up. The easiest case to deal with is a maximum like 399. We know that for each multiple of 100, each digit occurs at least 20 times, but we have to compensate for the number of times it appears in the most-significant-digit position, which is going to be exactly 100 for digits 0-3, and exactly zero for all other digits. Specifically, the extra amount to add is 10^N for the relevant digits.
Putting this into a formula, for upper bounds that are 1 less than some multiple of a power of 10 (i.e. 399, 6999, etc.) it becomes: M * N * 10^(N-1) + iif(d <= M, 10^N, 0)
Now you just have to deal with the remainder (which we'll call R). Take 445 as an example. This is whatever the result is for 399, plus the range 400-445. In this range, the MSD occurs R more times, and all digits (including the MSD) also occur at the same frequencies they would from range [0 - R].
Now we just have to compensate for the leading zeros. This pattern is easy - it's just:
10^N + 10^(N-1) + 10^(N-2) + ... + 10^0
Update: This version correctly takes into account "padding zeros", i.e. the zeros in middle positions when dealing with the remainder ([400, 401, 402, ...]). Figuring out the padding zeros is a bit ugly, but the revised code (C-style pseudocode) handles it:
function countdigits(int d, int low, int high) {
    return countdigits(d, low, high, false);
}

function countdigits(int d, int low, int high, bool inner) {
    if (high == 0)
        return (d == 0) ? 1 : 0;

    if (low > 0)
        return countdigits(d, 0, high) - countdigits(d, 0, low);

    int n = floor(log10(high));
    int m = floor((high + 1) / pow(10, n));
    int r = high - m * pow(10, n);
    return
        (max(m, 1) * n * pow(10, n-1)) +                             // (1)
        ((d < m) ? pow(10, n) : 0) +                                 // (2)
        (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) +   // (3)
        (((r >= 0) && (d == m)) ? (r + 1) : 0) +                     // (4)
        (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) -     // (5)
        (((d == 0) && !inner) ? countleadingzeros(n) : 0);           // (6)
}

function countleadingzeros(int n) {
    int tmp = 0;
    do {
        tmp = pow(10, n) + tmp;
        --n;
    } while (n > 0);
    return tmp;
}

function countpaddingzeros(int n, int r) {
    return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1);
}
As you can see, it's gotten a bit uglier but it still runs in O(log n) time, so if you need to handle numbers in the billions, this will still give you instant results. :-) And if you run it on the range [0 - 1000000], you get the exact same distribution as the one posted by High-Performance Mark, so I'm almost positive that it's correct.
FYI, the reason for the inner variable is that the leading-zero function is already recursive, so it can only be counted in the first execution of countdigits.
Update 2: In case the code is hard to read, here's a reference for what each line of the countdigits return statement means (I tried inline comments but they made the code even harder to read):
(1) Frequency of any digit up to highest power of 10 (0-99, etc.)
(2) Frequency of MSD above any multiple of highest power of 10 (100-399)
(3) Frequency of any digits in remainder (400-445, R = 45)
(4) Additional frequency of MSD in remainder
(5) Count zeros in middle position for remainder range (404, 405...)
(6) Subtract leading zeros only once (on outermost loop)
I'm assuming you want a solution where the numbers are in a range, and you have the starting and ending number. Imagine starting with the start number and counting up until you reach the end number - it would work, but it would be slow. I think the trick to a fast algorithm is to realize that in order to go up one digit in the 10^x place and keep everything else the same, you need to use all of the digits before it 10^x times plus all digits 0-9 10^(x-1) times. (Except that your counting may have involved a carry past the x-th digit - I correct for this below.)
Here's an example. Say you're counting from 523 to 1004.
First, you count from 523 to 524. This uses the digits 5, 2, and 4 once each.
Second, count from 524 to 604. The rightmost digit does 6 cycles through all of the digits, so you need 6 copies of each digit. The second digit goes through digits 2 through 0, 10 times each. The third digit is 6 five times and 5 (100 - 24) times.
Third, count from 604 to 1004. The rightmost digit does 40 cycles, so add 40 copies of each digit. The second-from-right digit does 4 cycles, so add 4 copies of each digit. The leftmost digit does 100 each of 7, 8, and 9, plus 5 of 0 and (100 - 5) of 6. The last digit is 1 five times.
To speed up the last bit, look at the part about the rightmost two places. It uses each digit 10 + 1 times. In general, 1 + 10 + ... + 10^n = (10^(n+1) - 1)/9, which we can use to speed up counting even more.
My algorithm is to count up from the start number to the end number (using base-10 counting), but use the fact above to do it quickly. You iterate through the digits of the starting number from least to most significant, and at each place you count up so that that digit is the same as the one in the ending number. At each point, n is the number of up-counts you need to do before you get to a carry, and m the number you need to do afterwards.
Now let's assume pseudocode counts as a language. Here, then, is what I would do:
convert start and end numbers to digit arrays start[] and end[]
create an array counts[] with 10 elements which stores the number of copies of
    each digit that you need
iterate through start number from right to left. at the i-th digit,
    let d be the number of digits you must count up to get from this digit
        to the i-th digit in the ending number. (i.e. subtract the equivalent
        digits mod 10)
    add d * (10^i - 1)/9 to each entry in counts.
    let m be the numerical value of all the digits to the right of this digit,
        n be 10^i - m.
    for each digit e from the left of the starting number up to and including the
        i-th digit, add n to the count for that digit.
    for j in 1 to d
        increment the i-th digit by one, including doing any carries
        for each digit e from the left of the starting number up to and including
            the i-th digit, add 10^i to the count for that digit
    for each digit e from the left of the starting number up to and including the
        i-th digit, add m to the count for that digit.
    set the i-th digit of the starting number to be the i-th digit of the ending
        number.
Oh, and since the value of i increases by one each time, keep track of your old 10^i and just multiply it by 10 to get the new one, instead of exponentiating each time.
To peel the digits off a number, we'd only ever need a costly string conversion if we couldn't do a mod; the digits can be pulled off a number most quickly like this:
feed = number;
do {
    digit = feed % 10;   // least significant digit
    feed /= 10;          // drop that digit (integer division)
    // use digit, e.g. digitTally[digit]++;
} while (feed > 0);
That loop should be very fast, and it can simply be placed inside a loop over the start to end numbers for the simplest way to tally the digits.
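As a rough illustration of that brute-force approach, here is a minimal Python sketch. The function name `tally_digits` is made up for this example, and it assumes non-negative integers with an inclusive range and no leading zeros.

```python
def tally_digits(start, end):
    """Tally digits 0-9 over start..end inclusive (non-negative ints, no leading zeros)."""
    tally = [0] * 10
    for number in range(start, end + 1):
        feed = number
        while True:
            # divmod peels off the least significant digit, like % and / above
            feed, digit = divmod(feed, 10)
            tally[digit] += 1
            if feed == 0:
                break
    return tally


print(tally_digits(1, 49))
# [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
```

The loop body runs at least once per number, so the number 0 contributes a single zero, just as the do/while version does.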
To go faster for larger ranges of numbers, I'm looking for an optimised method of tallying all digits from 0 to number*10^significance
(going from a start to an end baffles me).
Here is a table showing digit tallies for some single significant digits.
These are inclusive of 0 but not of the top value itself; that was an oversight,
but it's maybe a bit easier to see patterns (with the top value's digits absent here).
These tallies don't include leading zeros.
digit      1     10    100   1000  10000      2     20     30     40     60     90    200    600   2000   6000
    0      1      1     10    190   2890      1      2      3      4      6      9     30    110    490   1690
    1      0      1     20    300   4000      1     12     13     14     16     19    140    220   1600   2800
    2      0      1     20    300   4000      0      2     13     14     16     19     40    220    600   2800
    3      0      1     20    300   4000      0      2      3     14     16     19     40    220    600   2800
    4      0      1     20    300   4000      0      2      3      4     16     19     40    220    600   2800
    5      0      1     20    300   4000      0      2      3      4     16     19     40    220    600   2800
    6      0      1     20    300   4000      0      2      3      4      6     19     40    120    600   1800
    7      0      1     20    300   4000      0      2      3      4      6     19     40    120    600   1800
    8      0      1     20    300   4000      0      2      3      4      6     19     40    120    600   1800
    9      0      1     20    300   4000      0      2      3      4      6      9     40    120    600   1800
Edit: clearing up my original thoughts:
From the brute-force table showing tallies from 0 (included) to a power of ten (not included), it is visible that a major digit md at ten-power tp:
increments tally[0 to 9] by md*tp*10^(tp-1)
increments tally[1 to md-1] by 10^tp
decrements tally[0] by (10^tp - 10)
(to remove leading 0s if tp>leadingzeros)
can increment tally[moresignificantdigits] by self(md*10^tp)
(to complete an effect)
If these tally adjustments were applied for each significant digit,
the tally should be modified as though counted from 0 to end-1.
The adjustments can be inverted to remove the preceding range (start number).
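For the power-of-ten columns of the table only, the pattern can be written as a closed form. The sketch below is my own reading of those columns, not the exact adjustment rules listed above: each digit gets k*10^(k-1) occurrences, and the zero tally is then reduced by 10 + 100 + ... + 10^(k-1) = (10^k - 10)/9 padded leading zeros. The function name is hypothetical.

```python
def tally_below_power_of_ten(k):
    """Digit tallies over 0 .. 10**k - 1 without leading zeros (assumes k >= 1)."""
    # With zero padding to k columns, each digit fills each column 10**(k-1) times.
    per_digit = k * 10 ** (k - 1)
    # Column j (j = 1 .. k-1) holds a padded zero for the 10**j numbers below 10**j.
    leading_zeros = (10 ** k - 10) // 9
    return [per_digit - leading_zeros] + [per_digit] * 9


for k in range(1, 5):
    print(10 ** k, tally_below_power_of_ten(k))
```

For k = 3 and k = 4 this gives 190 and 2890 zeros with 300 and 4000 of every other digit, which agrees with the 1000 and 10000 columns of the table.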
Thanks Aaronaught for your complete and tested answer.
Here's a very bad answer, I'm ashamed to post it. I asked Mathematica to tally the digits used in all numbers from 1 to 1,000,000, no leading 0s. Here's what I got:
0 488895
1 600001
2 600000
3 600000
4 600000
5 600000
6 600000
7 600000
8 600000
9 600000
Next time you're ordering sticky digits for selling in your hardware store, order in these proportions, you won't be far wrong.
I asked this question on Math Overflow, and got spanked for asking such a simple question. One of the users took pity on me and said if I posted it to The Art of Problem Solving, he would answer it; so I did.
Here is the answer he posted:
http://www.artofproblemsolving.com/Forum/viewtopic.php?p=1741600#1741600
Embarrassingly, my math-fu is inadequate to understand what he posted (the guy is 19 years old...that is so depressing). I really need to take some math classes.
On the bright side, the equation is recursive, so it should be a simple matter to turn it into a recursive function with a few lines of code, by someone who understands the math.
I know this question has an accepted answer but I was tasked with writing this code for a job interview and I think I came up with an alternative solution that is fast, requires no loops and can use or discard leading zeroes as required.
It is in fact quite simple but not easy to explain.
If you list out the first n numbers
1
2
3
.
.
.
9
10
11
It is usual to start counting the digits required from the start room number to the end room number in a left to right fashion, so for the above we have one 1, one 2, one 3 ... one 9, two 1's one zero, four 1's etc. Most solutions I have seen used this approach with some optimisation to speed it up.
What I did was to count vertically in columns, as in hundreds, tens, and units. You know the highest room number so we can calculate how many of each digit there are in the hundreds column via a single division, then recurse and calculate how many in the tens column etc. Then we can subtract the leading zeros if we like.
This is easier to visualize if you use Excel to write out the numbers, with a separate column for each digit of the number:
A B C
- - -
0 0 1 (assuming room numbers do not start at zero)
0 0 2
0 0 3
.
.
.
3 6 4
3 6 5
.
.
.
6 6 9
6 7 0
6 7 1
^
sum in columns not rows
So if the highest room number is 671, the hundreds column will have 100 zeroes vertically, followed by 100 ones and so on up to 72 sixes (600 through 671); ignore 100 of the zeroes if required, as we know these are all leading.
Then recurse down to the tens and perform the same operation: we know there will be 10 zeroes followed by 10 ones etc., repeated six times, then the final time down to 2 sevens. Again, we can ignore the first 10 zeroes as we know they are leading. Finally, of course, do the units, ignoring the first zero as required.
So there are no loops; everything is calculated with division. I use recursion for travelling "up" the columns until the max one is reached (in this case hundreds) and then back down, totalling as it goes.
I wrote this in C# and can post the code if anyone is interested. I haven't done any benchmark timings, but it is essentially instant for values up to 10^18 rooms.
I could not find this approach mentioned here or elsewhere, so I thought it might be useful for someone.
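The C# code was never posted, so the following Python sketch is only one guess at how the column idea could look. It walks the columns with a short loop instead of recursion, and the name `count_digits_by_column` and its `leading_zeros` flag are assumptions made for this illustration. Rooms are taken to be numbered 1..n.

```python
def count_digits_by_column(n, leading_zeros=False):
    """Tally digits 0-9 over 1..n, one column (units, tens, ...) at a time."""
    counts = [0] * 10
    width = len(str(n))
    for pos in range(width):
        power = 10 ** pos      # value of this column
        block = power * 10     # the column repeats with this period
        # Work on 0..n (n + 1 values), zero-padded to `width` columns.
        full, rem = divmod(n + 1, block)
        for d in range(10):
            # Each full block holds `power` copies of d in this column;
            # the partial block holds however much of d's run it reaches.
            counts[d] += full * power + min(max(rem - d * power, 0), power)
    counts[0] -= width         # drop the all-zero row for the number 0
    if not leading_zeros:
        for pos in range(1, width):
            # Numbers 1 .. 10**pos - 1 have a padded zero in column `pos`.
            counts[0] -= 10 ** pos - 1
    return counts


print(count_digits_by_column(49))
# [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
```

A range such as rooms 51 to 300 can then be handled by subtracting the result for 50 from the result for 300, element by element, as long as leading zeros are not counted.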
Your approach is fine. I'm not sure why you would ever need anything faster than what you've described.
Or, this would give you an instantaneous solution: Before you actually need it, calculate what you would need from 1 to some maximum number. You can store the numbers needed at each step. If you have a range like your second example, it would be what's needed for 1 to 300, minus what's needed for 1 to 50.
Now you have a lookup table that can be called at will. Doing up to 10,000 would only take a few MB and, what, a few minutes to compute, once?
This doesn't answer your exact question, but it's interesting to note the distribution of first digits according to Benford's Law. For example, in many real-world sets of numbers that span several orders of magnitude, about 30% of the values start with "1", which is somewhat counter-intuitive.
I don't know of any distributions describing subsequent digits, but you might be able to determine this empirically and come up with a simple formula for computing an approximate number of digits required for any range of numbers.
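For reference, Benford's Law gives the share of values whose first digit is d as log10(1 + 1/d). The short Python sketch below just evaluates that formula; the function name is made up here. Note that the law describes data spread over several orders of magnitude, so a block of consecutive room numbers will not generally follow it.

```python
import math


def benford_first_digit_share(d):
    """Expected share of values whose leading digit is d (1..9) under Benford's Law."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, round(benford_first_digit_share(d), 3))
```

The value for d = 1 is about 0.301, which is the 30% figure mentioned above.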
If "better" means "clearer," then I doubt it. If it means "faster," then yes, but I wouldn't use a faster algorithm in place of a clearer one without a compelling need.
#!/usr/bin/ruby1.8
def digits_for_range(min, max, leading_zeros)
bins = [0] * 10
format = [
'%',
('0' if leading_zeros),
max.to_s.size,
'd',
].compact.join
(min..max).each do |i|
s = format % i
for digit in s.scan(/./)
bins[digit.to_i] +=1 unless digit == ' '
end
end
bins
end
p digits_for_range(1, 49, false)
# => [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
p digits_for_range(1, 49, true)
# => [13, 15, 15, 15, 15, 5, 5, 5, 5, 5]
p digits_for_range(1, 10000, false)
# => [2893, 4001, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]
Ruby 1.8, a language known to be "dog slow," runs the above code in 0.135 seconds. That includes loading the interpreter. Don't give up an obvious algorithm unless you need more speed.
If you need raw speed over many iterations, try a lookup table:
Build an array with 2 dimensions: 10 x max-house-number
int nDigits[10000][10] ; // Don't try this on the stack, kids!
Fill each row with the count of digits required to get to that number from zero.
Hint: Use the previous row as a start:
n=0..9999:
    if (n>0) nDigits[n] = nDigits[n-1]
    d=0..9:
        nDigits[n][d] += countOccurrencesOf(n,d)
Number of digits "between" two numbers becomes simple subtraction.
For range=51 to 300, take the counts for 300 and subtract the counts for 50.
0's = nDigits[300][0] - nDigits[50][0]
1's = nDigits[300][1] - nDigits[50][1]
2's = nDigits[300][2] - nDigits[50][2]
3's = nDigits[300][3] - nDigits[50][3]
etc.
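The same lookup table can be sketched in Python as a list of running totals. This is an illustration under a few assumptions: the helper names are made up, numbers start at 1, leading zeros are not counted, and row 0 is left empty so that a range starting at 1 needs no special case.

```python
def build_digit_table(max_number):
    """table[n][d] = how many times digit d is used writing out 1..n."""
    table = [[0] * 10]                 # row 0: nothing written yet
    for n in range(1, max_number + 1):
        row = table[-1][:]             # start from the previous row
        for ch in str(n):
            row[int(ch)] += 1
        table.append(row)
    return table


def digits_in_range(table, start, end):
    """Digits needed for start..end inclusive (assumes 1 <= start <= end)."""
    return [hi - lo for hi, lo in zip(table[end], table[start - 1])]


table = build_digit_table(10000)
print(digits_in_range(table, 51, 300))
```

Building the table is a one-off cost; after that every range query is ten subtractions.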
You can separate each digit (look here for an example), create a histogram with entries from 0..9 (which will count how many digits appeared in a number) and multiply by the number of 'numbers' asked.
But if that isn't what you are looking for, can you give a better example?
Edited:
Now I think I got the problem. I think you can compute it like this (pseudo C):
int histogram[10];
memset(histogram, 0, sizeof(histogram));
for(i = startNumber; i <= endNumber; ++i)
{
    array = separateDigits(i);
    for(j = 0; j < array.length; ++j)
    {
        histogram[array[j]]++;   // tally the digit itself, not its position
    }
}
Separate digits implements the function in the link.
Each position of the histogram will have the amount of each digit. For example
histogram[0] == total of zeros
histogram[1] == total of ones
...
Regards