I came across a question and am unable to find a feasible solution.
Image Quantization
Given a grayscale image in which each pixel's color ranges from 0 to 255, compress the range of values to a given number of quantum values.
The goal is to do that with the minimum sum of costs. The cost of a pixel is defined as the absolute difference between its color and the closest quantum value.
Example
The image has 3 rows and 3 columns, image = [[7,2,8], [8,2,3], [9,8,255]], and quantums = 3 is the number of quantum values. The optimal quantum values are (2,8,255), leading to the minimum sum of costs |7-8| + |2-2| + |8-8| + |8-8| + |2-2| + |3-2| + |9-8| + |8-8| + |255-255| = 1+0+0+0+0+1+1+0+0 = 3
Function description
Complete the solve function provided in the editor. This function takes the following 4 parameters and returns the minimum sum of costs.
n Represents the number of rows in the image
m Represents the number of columns in the image
image Represents the image
quantums Represents the number of quantum values.
Output:
Print a single integer, the minimum sum of costs.
Constraints:
1<=n,m<=100
0<=image[i][j]<=255
1<=quantums<=256
Sample Input 1
3
3
7 2 8
8 2 3
9 8 255
10
Sample output 1
0
Explanation
The optimum quantum values are {0,1,2,3,4,5,7,8,9,255}, leading to the minimum sum of costs |7-7| + |2-2| + |8-8| + |8-8| + |2-2| + |3-3| + |9-9| + |8-8| + |255-255| = 0+0+0+0+0+0+0+0+0 = 0
Can anyone help me reach the solution?
Clearly, if we have at least as many quantums available as distinct pixel values, we can return 0, since each distinct value can be given its own quantum. Otherwise, consider the sorted, grouped list of (value, count) pairs, and start with a single quantum placed at its lowest value.
M = [
[7, 2, 8],
[8, 2, 3],
[9, 8, 255]
]
[(2, 2), (3, 1), (7, 1), (8, 3), (9, 1), (255, 1)]
2
We record the required sum of differences:
0 + 0 + 1 + 5 + 6 + 6 + 6 + 7 + 253 = 284
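Now to update by incrementing the quantum by 1, we observe that every pixel's distance changes by exactly 1, so all we need is the count of affected elements on each side: the 2 pixels at or below the old quantum each cost 1 more, and the 7 pixels above it each cost 1 less.
Increment 2 to 3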
3
1 + 1 + 0 + 4 + 5 + 5 + 5 + 6 + 252 = 279
or
284 + 2 * 1 - 7 * 1
= 284 + 2 - 7
= 279
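The update above can be sketched in a few lines of Ruby (the answer's own code further down is Python; Ruby is used here only for illustration). The helper names `direct_cost` and `step_cost` are made up for this sketch and assume a single quantum.

```ruby
# Pixels of the example image, flattened.
pixels = [7, 2, 8, 8, 2, 3, 9, 8, 255]

# Cost of a single quantum computed directly: sum of absolute differences.
def direct_cost(pixels, quantum)
  pixels.sum { |p| (p - quantum).abs }
end

# Cost after moving the single quantum from q to q + 1, given the cost at q.
# Pixels at or below q end up 1 further away, pixels above q end up 1 closer.
def step_cost(pixels, cost, q)
  at_or_below = pixels.count { |p| p <= q }
  above = pixels.size - at_or_below
  cost + at_or_below - above
end

cost_at_2 = direct_cost(pixels, 2)           # 284
cost_at_3 = step_cost(pixels, cost_at_2, 2)  # 284 + 2 - 7 = 279
puts cost_at_2, cost_at_3
```

Only the two counts are needed for the update, which is why the grouped list (or a frequency table) is enough.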
Consider traversing from the left with a single quantum, calculating only the effect on pixels in the sorted, grouped list that are on the left side of the quantum value.
To only update the left side when adding a quantum, we have:
left[k][q] = min over p < q of (left[k-1][p] + effect(A, p, q))
where effect is the cost of the elements in A (the sorted, grouped list) that lie in the range [p, q], each charged its distance to p or to q, whichever is closer. As we reduce p incrementally for a fixed q, this effect can be updated rather than recomputed. As we increase q for each round of k, we can keep the relevant place in the sorted, grouped pixel list with a pointer that moves incrementally.
If we have a solution for
left[k][q]
where it is the best for pixels on the left side of q when including k quantums with the rightmost quantum set as the number q, then the complete candidate solution would be given by:
left[k][q] + effect(A, q, list_end)
where there is no quantum between q and list_end
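Time complexity would be O(n + k * q * q), where n is the number of elements in the input matrix, k is the number of quantums and q ranges over the 256 colour values. Since quantums <= 256, this is at most O(n + 256^3).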
Python code:
def f(M, quantums):
    pixel_freq = [0] * 256

    for row in M:
        for colour in row:
            pixel_freq[colour] += 1

    # dp[k][q] stores the best solution up
    # to the qth quantum value, with
    # considering the effect left of
    # k quantums with the rightmost as q
    dp = [[0] * 256 for _ in range(quantums + 1)]

    # With one quantum at q, every pixel below q
    # moves one step further away as q increases.
    pixel_count = pixel_freq[0]
    for q in range(1, 256):
        dp[1][q] = dp[1][q-1] + pixel_count
        pixel_count += pixel_freq[q]

    predecessor = [[None] * 256 for _ in range(quantums + 1)]

    # Main iteration, where the full
    # candidate includes both right and
    # left effects while incrementing the
    # number of quantums.
    for k in range(2, quantums + 1):
        for q in range(k - 1, 256):
            # Adding a quantum to the right
            # of the rightmost doesn't change
            # the left cost already calculated
            # for the rightmost.
            best_left = dp[k-1][q-1]
            predecessor[k][q] = q - 1
            q_effect = 0
            p_effect = 0
            p_count = 0

            for p in range(q - 2, k - 3, -1):
                r_idx = p + (q - p) // 2

                # When the distance between p
                # and q is even, we reassign
                # one pixel frequency to q
                if (q - p - 1) % 2 == 0:
                    r_freq = pixel_freq[r_idx + 1]
                    q_effect += (q - r_idx - 1) * r_freq
                    p_count -= r_freq
                    p_effect -= r_freq * (r_idx - p)

                # Either way, we add one pixel frequency
                # to p_count and recalculate
                p_count += pixel_freq[p + 1]
                p_effect += p_count

                effect = dp[k-1][p] + p_effect + q_effect

                if effect < best_left:
                    best_left = effect
                    predecessor[k][q] = p

            dp[k][q] = best_left

    # Records the cost only on the right
    # of the rightmost quantum
    # for candidate solutions.
    right_side_effect = 0
    pixel_count = pixel_freq[255]
    best = dp[quantums][255]
    best_quantum = 255

    # The rightmost of `quantums` quantums can sit
    # as far left as index quantums - 1.
    for q in range(254, quantums - 2, -1):
        right_side_effect += pixel_count
        pixel_count += pixel_freq[q]
        candidate = dp[quantums][q] + right_side_effect

        if candidate < best:
            best = candidate
            best_quantum = q

    # Walk the predecessor table back from the
    # rightmost quantum to recover the full list.
    quantum_list = [best_quantum]
    prev_quantum = best_quantum

    for i in range(quantums, 1, -1):
        prev_quantum = predecessor[i][prev_quantum]
        quantum_list.append(prev_quantum)

    return best, list(reversed(quantum_list))
Output:
M = [
[7, 2, 8],
[8, 2, 3],
[9, 8, 255]
]
k = 3
print(f(M, k)) # (3, [2, 8, 255])
M = [
[7, 2, 8],
[8, 2, 3],
[9, 8, 255]
]
k = 10
print(f(M, k)) # (0, [2, 3, 7, 8, 9, 251, 252, 253, 254, 255])
I would propose the following:
step 0
Input is:
image = 7 2 8
        8 2 3
        9 8 255
quantums = 3
step 1
Then you can calculate histogram from the input image. Since your image is grayscale, it can contain only values from 0-255.
It means that your histogram array has length equal to 256.
hist = int[256]                  // init the histogram array
for each pixel color in image    // iterate over image
    hist[color]++                // and increment histogram values
hist:
value   0  0  2  1  0  0  0  1  3  1  0   . . .  1
----------------------------------------------------
color   0  1  2  3  4  5  6  7  8  9  10  . . .  255
How to read the histogram:
color 3 has 1 occurrence
color 8 has 3 occurrences
With this approach, we have reduced our problem from N (number of pixels) to 256 (histogram size).
Time complexity of this step is O(N)
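As a concrete illustration of step 1, here is a minimal Ruby sketch of the histogram for the example image. Ruby is chosen only for illustration; the variable names follow the pseudocode above.

```ruby
image = [[7, 2, 8], [8, 2, 3], [9, 8, 255]]

# One counter per possible gray level, 0..255.
hist = Array.new(256, 0)
image.each do |row|
  row.each { |color| hist[color] += 1 }
end

puts hist[2]    # 2
puts hist[8]    # 3
puts hist[255]  # 1
puts hist.sum   # 9, the number of pixels
```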
step 2
Once we have the histogram in place, we can find as many local maximums in it as there are quantums. In our case, that is 3 local maximums.
For the sake of simplicity, I will not write the pseudo code; there are numerous examples on the internet. Just google 'find local maximum/extrema in array'.
It is important that you end up with the 3 biggest local maximums. In our case it is:
hist:
value   0  0  2  1  0  0  0  1  3  1  0   . . .  1
----------------------------------------------------
color   0  1  2  3  4  5  6  7  8  9  10  . . .  255
              ^                 ^                ^
These values (2, 8, 255) are your tops of the mountains.
Time complexity of this step is O(quantums)
I could explain why it is not O(1) or O(256), since you can find local maximums in a single pass. If needed I will add a comment.
step 3
Once you have your tops of the mountains, you want to isolate each mountain in a way that it has the maximum possible surface.
So, you will do that by finding the minimum value between two tops
In our case it is:
value   0  0  2  1  0  0  0  1  3  1  0   . . .  1
----------------------------------------------------
color   0  1  2  3  4  5  6  7  8  9  10  . . .  255
(Sketch: the histogram drawn as three mountains, with peaks at colors 2, 8 and 255 and flat valleys of zeros between them.)
So our goal is to find between index values:
from 0 to 2 (not needed, first mountain start from beginning)
from 2 to 8 (to see where first mountain ends, and second one starts)
from 8 to 255 (to see where second one ends, and third starts)
from 255 to end (just noted, also not needed, last mountain always reaches the end)
There are multiple candidates (multiple zeros), and it is not important which one you choose as the minimum. The final surface of the mountain is always the same.
Let's say that our algorithm returns two minimums. We will use them in the next step.
min_1_2 = 6
min_2_3 = 254
Time complexity of this step is O(256). You need just a single pass over the histogram to calculate all minimums (actually you will do multiple smaller iterations, but in total you visit each element only once).
Someone could consider this as O(1)
Step 4
Calculate the median of each mountain.
This can be the tricky one. Why? Because we want to calculate the median using the original values (colors) and not counters (occurrences).
There is also the formula that can give us good estimate, and this one can be performed quite fast (looking only at histogram values) (https://medium.com/analytics-vidhya/descriptive-statistics-iii-c36ecb06a9ae)
If that is not precise enough, then the only option is to "unwrap" the calculated values. Then, we could sort these "raw" pixels and easily find the median.
In our case, those medians are 2, 8, 255
Time complexity of this step is O(n log n) if we have to sort the whole original image. If the approximation works well enough, then the time complexity of this step is almost constant.
step 5
This is final step.
You now know the start and end of the "mountain".
You also know the median that belongs to that "mountain"
Again, you can iterate over each mountain and calculate the DIFF.
diff = 0
median_1 = 2
median_2 = 8
median_3 = 255
// for first mountain  -> START = 0,   END = 6
// for second mountain -> START = 6,   END = 254
// for third mountain  -> START = 254, END = 255
for each hist value (color, count) between START and END
    diff = diff + |color - median_X| * count
Time complexity of this step is again O(256), and it can be considered as constant time O(1)
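Step 5 can be sketched in Ruby for the example image. This is only an illustration of the final summation, taking the medians (2, 8, 255) and the valley positions (6 and 254) from the steps above as given. The ranges are written as non-overlapping (0..6, 7..254, 255..255), which is an assumption made here so that no color is counted twice.

```ruby
image = [[7, 2, 8], [8, 2, 3], [9, 8, 255]]

hist = Array.new(256, 0)
image.flatten.each { |color| hist[color] += 1 }

# Each mountain: [range of colors it covers, its median].
# Boundaries and medians are taken from the earlier steps, not computed here.
mountains = [
  [0..6,     2],
  [7..254,   8],
  [255..255, 255]
]

diff = mountains.sum do |range, median|
  range.sum { |color| (color - median).abs * hist[color] }
end

puts diff  # 3
```

The result matches the expected answer for quantums = 3 on this image; the sketch says nothing about whether the peak-picking steps find the optimum on other inputs.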
I am trying to write a simple script, where the input would be a start date, end date and a total amount of hours (150) and the script would generate a simple report containing random date-time intervals (with ideally weekdays) that would sum the entered amount of hours.
This is what I am trying to achieve:
Start: 2020-01-01
End: 2020-01-31
Total hours: 150
Report:
Jan 1, 2019, 08:02:20 – Jan 1, 2019, 08:55:00: sub time -> 52:40 (52 minutes 40 seconds)
Jan 1, 2019, 09:00:00 – Jan 1, 2019, 09:38:13: sub time -> 38:13 (38 minutes 13 seconds)
...
Jan 3, 2019, 13:15:00 – Jan 3, 2019, 14:45:13: sub time -> 01:30:13 (1 hour 30 minutes 13 seconds)
...
TOTAL TIME: 150 hours (or in minutes)
How do I generate time intervals where the total amount of minutes/hours would be equal to a given number of hours?
I assume the question is loosely-worded in the sense that "random" is not meant in a probability sense; that is, the intent is not to select a set of intervals (that total a given number of hours in length) with a mechanism that ensures all possible sets of such intervals have an equal likelihood of being selected. Rather, I understand that a set of intervals is to be chosen (e.g., for testing purposes) in a way that incorporates elements of randomness.
I have assumed the intervals are to be non-overlapping and the number of intervals is to be specified. I don't understand what "with ideally weekdays" means so I have disregarded that.
The heart of the approach I will propose is the following method.
def rnd_lengths(tot_secs, target_nbr)
  max_secs = 2 * tot_secs/target_nbr - 1
  arr = []
  loop do
    break(arr) if tot_secs.zero?
    l = [(0.5 + max_secs * rand).round, tot_secs].min
    arr << l
    tot_secs -= l
  end
end
The method generates an array of integers (lengths of intervals), measured in seconds, ideally having target_nbr elements. tot_secs is the required combined length of the "random" intervals (e.g., 150*3600).
Each element of the array is drawn randomly from a uniform distribution over the integers from 1 to max_secs (to be computed). This is done sequentially until tot_secs is reached. Should the last random value cause the total to exceed tot_secs, it is reduced to make the total equal tot_secs.
Suppose tot_secs equals 100 and we wish to generate 4 random intervals (target_nbr = 4). That means the average length of the intervals would be 25. As we are using a uniform distribution having an average of (1 + max_secs)/2, we may derive the value of max_secs from the expression
target_nbr * (1 + max_secs)/2 = tot_secs
which is
max_secs = 2 * tot_secs/target_nbr - 1
the first line of the method. For the example I mentioned, this would be
max_secs = 2 * 100/4 - 1
#=> 49
Let's try it.
rnd_lengths(100, 4)
#=> [49, 36, 15]
As you see the array that is returned sums to 100, as required, but it contains only 3 elements. That's why I named the argument target_nbr, as there is no assurance the array returned will have that number of elements. What to do? Try again!
rnd_lengths(100, 4)
#=> [14, 17, 26, 37, 6]
Still not 4 elements, so keep trying:
rnd_lengths(100, 4)
#=> [11, 37, 39, 13]
Success! It may take a few tries to get the correct number of elements, but for parameters likely to be used, and the nature of the probability distribution employed, I wouldn't expect that to be a problem.
Let's put this in a method.
def rdm_intervals(tot_secs, nbr_intervals)
  loop do
    arr = rnd_lengths(tot_secs, nbr_intervals)
    break(arr) if arr.size == nbr_intervals
  end
end
intervals = rdm_intervals(100, 4)
#=> [29, 26, 7, 38]
We can compute random gaps between intervals in the same way. Suppose the intervals fall within a range of 175 seconds (the number of seconds between the start time and end time). Then:
gaps = rdm_intervals(175-100, 5)
#=> [26, 5, 19, 4, 21]
As seen, the gaps sum to 75, as required. We can disregard the last element.
We can now form the intervals. The first interval begins at 26 seconds and ends at 26+29 #=> 55 seconds. The second interval begins at 55+5 #=> 60 seconds and ends at 60+26 #=> 86 seconds, and so on. We therefore find the intervals (each in ranges of seconds from zero) to be:
[26..55, 60..86, 105..112, 116..154]
Note that 175 - 154 = 21, the last element of gaps.
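The bookkeeping in that paragraph can be written out as a short sketch, using the fixed `intervals` and `gaps` arrays from the example rather than fresh random draws. The variable names `cursor` and `ranges` are introduced here for illustration only.

```ruby
lengths = [29, 26, 7, 38]        # interval lengths from the example
gaps    = [26, 5, 19, 4, 21]     # gaps; the last one trails the final interval

cursor = 0
ranges = lengths.each_with_index.map do |len, i|
  start  = cursor + gaps[i]      # skip the gap that precedes this interval
  cursor = start + len           # the interval ends here
  start..cursor
end

p ranges        # [26..55, 60..86, 105..112, 116..154]
p 175 - cursor  # 21, the unused last gap
```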
If one is uncomfortable with the fact that the last elements of intervals and gaps are generally constrained in size, one could of course randomly reposition those elements within their respective arrays.
One might not care if the number of intervals is exactly target_nbr. It would be simpler and faster to just use the first array of interval lengths produced. That's fine, but we still need the above methods to compute the random gaps, as their number must equal the number of intervals plus one:
gaps = rdm_intervals(175-100, intervals.size + 1)
We can now use these two methods to construct a method that will return the desired result. The argument tot_secs of this method equals total number of seconds spanned by the array intervals returned (e.g., 3600 * 150). The method returns an array containing nbr_intervals non-overlapping ranges of Time objects that fall between the given start and end dates.
require 'date'

def construct_intervals(start_date_str, end_date_str, tot_secs, nbr_intervals)
  start_time = Date.strptime(start_date_str, '%Y-%m-%d').to_time
  secs_in_period = Date.strptime(end_date_str, '%Y-%m-%d').to_time - start_time
  intervals = rdm_intervals(tot_secs, nbr_intervals)
  gaps = rdm_intervals(secs_in_period - tot_secs, nbr_intervals+1)
  nbr_intervals.times.with_object([]) do |_,arr|
    start_time += gaps.shift
    end_time = start_time + intervals.shift
    arr << (start_time..end_time)
    start_time = end_time
  end
end
See Date::strptime.
Let's try an example.
start_date_str = '2020-01-01'
end_date_str = '2020-01-31'
tot_secs = 3600*150
#=> 540000
construct_intervals(start_date_str, end_date_str, tot_secs, 4)
#=> [2020-01-06 18:05:04 -0800..2020-01-09 03:48:00 -0800,
# 2020-01-09 06:44:16 -0800..2020-01-11 23:33:44 -0800,
# 2020-01-20 20:30:21 -0800..2020-01-21 17:27:44 -0800,
# 2020-01-27 19:08:38 -0800..2020-01-28 01:38:51 -0800]
construct_intervals(start_date_str, end_date_str, tot_secs, 8)
#=> [2020-01-03 18:43:36 -0800..2020-01-04 10:49:14 -0800,
# 2020-01-08 07:55:44 -0800..2020-01-08 08:17:18 -0800,
# 2020-01-11 00:54:36 -0800..2020-01-11 23:00:53 -0800,
# 2020-01-14 05:20:14 -0800..2020-01-14 22:48:45 -0800,
# 2020-01-16 18:28:28 -0800..2020-01-17 22:50:24 -0800,
# 2020-01-22 02:59:31 -0800..2020-01-22 22:33:08 -0800,
# 2020-01-23 00:36:59 -0800..2020-01-24 12:15:37 -0800,
# 2020-01-29 11:22:21 -0800..2020-01-29 21:46:10 -0800]
START -xxx----xxx--x----xxxxx---xx--xx---xx-xx-x-xxx-- END
We need to fill a timespan with alternating periods of ON and OFF. This can be
denoted by a list of timestamps. Let's say that the period always starts with
an OFF period for simplicity's sake.
From the start/end of the timespan and the total seconds in ON state, we
gather useful facts:
the timespan's total size in seconds (total_seconds)
the second totals of both the ON (on_total_seconds) and the OFF (off_total_seconds) periods
Once we know these, a workable algorithm looks more or less like this - pardon
the functions without implementation:
# these can be parameters as well
MIN_PERIODS = 10
MAX_PERIODS = 100

def fill_periods(start_date, end_date, on_total_seconds = 150*60*60)
  total_seconds = get_total_seconds(start_date, end_date)
  off_total_seconds = total_seconds - on_total_seconds

  # establish two buckets to pull from alternately in populating our array of durations
  on_bucket = on_total_seconds
  off_bucket = off_total_seconds

  result = []
  # populate `result` with durations in seconds. `result` will sum to `total_seconds`
  while on_bucket > 0 || off_bucket > 0 do
    # Kernel#rand takes a single range; a slice never exceeds what is left in its bucket
    off_slice = [rand((off_total_seconds / MAX_PERIODS / 2)..(off_total_seconds / MIN_PERIODS / 2)).to_i, off_bucket].min
    off_bucket -= off_slice

    on_slice = [rand((on_total_seconds / MAX_PERIODS / 2)..(on_total_seconds / MIN_PERIODS / 2)).to_i, on_bucket].min
    on_bucket -= on_slice

    # randomness being random, we're going to hit 0 in one bucket before the
    # other. when this happens, just add this (off, on) pair to the last one.
    if off_slice == 0 || on_slice == 0
      last_off, last_on = result.pop(2)
      result << last_off + off_slice << last_on + on_slice
    else
      result << off_slice << on_slice
    end
  end

  # build up an array of datetimes by progressively adding seconds to the last timestamp.
  datetimes = result.each_with_object([start_date]) do |period, memo|
    memo << add_seconds(memo.last, period)
  end

  # we want a list of datetime pairs denoting ON periods. since we know our
  # timespan starts with OFF, we start our list of pairs with the second element.
  datetimes.slice(1..-1).each_slice(2).to_a
end
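The two helpers left without implementation, get_total_seconds and add_seconds, could look something like the sketch below. This is a guess at their intent, not the author's code, and it assumes the start and end values are Ruby Time objects (so subtraction yields seconds and addition takes seconds).

```ruby
# Hypothetical helper: whole seconds between two Time objects.
def get_total_seconds(start_time, end_time)
  (end_time - start_time).to_i
end

# Hypothetical helper: a new Time that is `seconds` later.
def add_seconds(time, seconds)
  time + seconds
end

start_time = Time.utc(2020, 1, 1)
end_time   = Time.utc(2020, 1, 31)

puts get_total_seconds(start_time, end_time)  # 2592000 (30 days)
puts add_seconds(start_time, 3600)            # 2020-01-01 01:00:00 UTC
```

Returning an Integer from get_total_seconds keeps the divisions inside fill_periods as integer divisions.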
The objective of the code is simple: convert base-256 byte string to base-10
def debase256(string)
  string.reverse.bytes.inject([0, 1]) do |(sum, pow), byte|
    [pow * byte.ord, pow * 256]
  end.first
end
I tried to read it, but I only went as far as 'reverse.bytes'
I can't imagine in my head how the bytes move and change during the process.
An example explaining this is all I need.
The code is wrong. It's not doing the sum. The first array item in the block should be sum + pow * byte.ord. Also, there's no point in having byte.ord as Integer#ord just returns itself.
Thus, the correct code would be:
def debase256(string)
  string.reverse.bytes.inject([0, 1]) do |(sum, pow), byte|
    [sum + pow * byte, pow * 256]
  end.first
end
This code is a bit hard to follow though. Maybe the following code (without the method declaration) helps you in understanding it better:
string.reverse.bytes.map.with_index do |byte, i|
  byte * 256**i
end.sum
Let's look at an example with the string "Test":
string = "Test"
First, we reverse it:
string.reverse # => "tseT"
Then we get the bytes:
string.reverse.bytes # => [116, 115, 101, 84]
Now we want to construct a base 10 number from this base 256 number. We do this by multiplying the byte in each slot i by 256^i, where i starts at 0.
"Test".reverse.bytes.map.with_index { |byte, i| byte * 256**i }
# => [116 * 256^0, 115 * 256^1, 101 * 256^2, 84 * 256^3]
# => [116 * 1, 115 * 256, 101 * 65536, 84 * 16777216]
# => [116, 29440, 6619136, 1409286144]
Finally, we take the sum, which is the base 10 representation of it.
"Test".reverse.bytes.map.with_index { |byte, i| byte * 256**i }.sum
# => 1415934836
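As a sanity check on that total, the same number can be obtained another way: the bytes of "Test" written in hexadecimal are 54 65 73 74, and reading that as one hexadecimal number gives the same value. The sketch below uses String#unpack1 with the "H*" directive; the variable names are just for illustration.

```ruby
string = "Test"

# Sum of byte * 256**i, lowest slot first (the approach walked through above).
by_powers = string.reverse.bytes.each_with_index.sum { |byte, i| byte * 256**i }

# Cross-check: hex dump of the bytes, parsed as a single base-16 number.
hex    = string.unpack1("H*")  # "54657374"
by_hex = hex.to_i(16)

puts by_powers  # 1415934836
puts by_hex     # 1415934836
```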
In order to understand what we are doing, let's try the same thing with a base 10 to base 10 conversion. Let's assume we have a number in base 10, e.g. 1234. We get the digits of this:
1234.digits
# => [4, 3, 2, 1]
Notice how #digits already returns the digits reversed.
Now, in base 10, every slot i needs to be multiplied by 10^i (compared to 256^i above in the base 256 case):
1234.digits.map.with_index { |byte, i| byte * 10**i }
# => [4 * 10^0, 3 * 10^1, 2 * 10^2, 1 * 10^3]
# => [4 * 1, 3 * 10, 2 * 100, 1 * 1000]
# => [4, 30, 200, 1000]
Summing it will give us the base 10 number:
1234.digits.map.with_index { |byte, i| byte * 10**i }.sum
# => 1234
Thus, the only difference is the base, the logic is the same.
A further example that you might have encountered is an RGB color value in hex, e.g. #ac4fbe. For red, green, and blue we have a value ranging from 0 to 255 encoded in hexadecimal. Hexadecimal is a fancy word for base 16. Typically, hexadecimal digits are represented as 0 to 9 and a to f:
0 1 2 3 4 5 6 7 8 9 a  b  c  d  e  f
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Knowing this, let's look at the red value of the color #ac4fbe, which is represented by the first two characters ac.
The logic here is the same as above. Reversing this gives us ca. If we get the base 10 numbers for each character, that's [12, 10]. Let's multiply each slot with 16^i:
[12 * 16^0, 10 * 16^1] == [12 * 1, 10 * 16] == [12, 160]
The sum 12 + 160 is 172, which is the value for the red component in the color.
Again, it's the same logic as in the other examples.
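To make the "same logic" point concrete, here is a small sketch with the base as a parameter. The method name `from_digits` is made up for this illustration; it expects the digits lowest slot first, like Integer#digits returns them.

```ruby
# Combine digits (lowest slot first) of any base into an Integer.
def from_digits(digits_lowest_first, base)
  digits_lowest_first.each_with_index.sum { |digit, i| digit * base**i }
end

puts from_digits("Test".reverse.bytes, 256)  # 1415934836
puts from_digits(1234.digits, 10)            # 1234
puts from_digits([12, 10], 16)               # 172, the red component of #ac4fbe
```

Only the base changes between the three calls.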
I hope these examples help you understand how this works. As an exercise, try converting this binary (i.e. base 2) number to base 10:
101010
Remember, these are the slots:
digits: 1 0 1 0 1 0
slot i: 5 4 3 2 1 0
(Hint: it's the answer to the Ultimate Question of Life, The Universe, and Everything.)
inject performs a loop, where the "accumulating variable" (also known as "memo object") is a two-element array, which is initially set to [0,1]. On each iteration, this object is passed as [sum, pow] to the loop body, together with the next element from the input array, which is stored in byte. The loop body calculates the updated memo object to be used on the next iteration. The result of the inject is the final value of the memo object.
You can follow what is going on, by replacing the loop body by
[pow * byte.ord, pow * 256].tap do
  |new_sum, new_pow|
  puts "Working on byte #{byte.inspect}"
  puts "old sum and pow : #{sum},#{pow}"
  puts "new sum and pow : #{new_sum}, #{new_pow}"
end
Let's say the string is "AB" (for which the ASCII codes are 65 and 66):
string.reverse.bytes gives you [66,65]
[66,65].inject([0,1]) goes through the array [66,65] and brings along the result array [0,1] into every iteration. Your loop needs to return the altered version of the array and it gets passed to the next iteration.
Example 1:
[66,65].inject([0,1]) do |(sum, pow), byte|
  puts "sum: #{sum} pow: #{pow} byte: #{byte}"
  [sum, pow] # this gets passed to the next round
end
This outputs:
sum: 0 pow: 1 byte: 66
sum: 0 pow: 1 byte: 65
With a different kind of "memo" array:
memo = []
[66,65].inject(memo) do |memo, byte|
  memo << "byte is #{byte}"
  memo
end
puts memo.inspect
This outputs:
["byte is 66", "byte is 65"]
So, inject is like each, but the given "memo" object will be passed from each round to the next.
The method uses the memo to hold two values: the sum and the multiplier for the next byte.
Adding debug output to the original method:
def debase256(string)
  string.reverse.bytes.inject([0, 1]) do |(sum, pow), byte|
    puts "sum: #{sum} pow: #{pow} byte: #{byte}"
    [pow * byte.ord, pow * 256]
  end.first
end
Running this with debase256("ABC") outputs:
sum: 0 pow: 1 byte: 67
sum: 67 pow: 256 byte: 66
sum: 16896 pow: 65536 byte: 65
So, we see that the input for the first round's |(sum, pow), byte| is (0, 1), 67.
pow * byte.ord is 1 * 67 = 67
pow * 256 is 1 * 256 = 256.
So, the |(sum, pow), byte| for the second round will be: (67, 256), 66.
pow * byte.ord is 256 * 66 = 16896.
pow * 256 is 256 * 256 = 65536.
So the |(sum, pow), byte| for the last round is: (16896, 65536), 65:
pow * byte.ord is 65536 * 65 = 4259840.
pow * 256 is 65536 * 256 = 16777216.
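Because this was the last round, inject returns the final memo, which is [4259840, 16777216]. The last element is no longer needed, so .first is called to get just the first one. Note that the block never adds the previous sum, so 4259840 is only the last term, 65 * 65536. The intended base-10 value is 67 + 16896 + 4259840 = 4276803, which is what you get when the block returns [sum + pow * byte, pow * 256].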
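The effect of the missing `sum +` can be seen by running both variants side by side on "ABC". The two method names below are made up for this comparison; the first reproduces the block from the question, the second adds the previous sum as the correction earlier in this thread suggests.

```ruby
# Block as posted in the question: the previous sum is dropped each round.
def debase256_as_posted(string)
  string.reverse.bytes.inject([0, 1]) do |(sum, pow), byte|
    [pow * byte, pow * 256]
  end.first
end

# Block with the running sum carried forward.
def debase256_summed(string)
  string.reverse.bytes.inject([0, 1]) do |(sum, pow), byte|
    [sum + pow * byte, pow * 256]
  end.first
end

puts debase256_as_posted("ABC")  # 4259840  (only 65 * 65536)
puts debase256_summed("ABC")     # 4276803  (67 + 16896 + 4259840)
```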
So I'm doing one of those programming challenges on HackerRank to help build my skills. (No, this is NOT for an interview!) The problem I am on is the Prime Digit Sum. (Full description: https://www.hackerrank.com/challenges/prime-digit-sums/problem) Basically, given a value n, I am to find all numbers that are n digits long that meet the following three criteria:
Every 3 consecutive digits sums to a prime number
Every 4 consecutive digits sums to a prime number
Every 5 consecutive digits sums to a prime number
See the link for a detailed breakdown...
I've got a basic function that works, problem is that when n gets big enough it breaks:
#!/bin/ruby
require 'prime'

def isChloePrime?(num)
  num = num.to_s
  num.chars.each_cons(5) do |set|
    return false unless Prime.prime?(set.inject(0) {|sum, i| sum + i.to_i})
  end
  num.chars.each_cons(4) do |set|
    return false unless Prime.prime?(set.inject(0) {|sum, i| sum + i.to_i})
  end
  num.chars.each_cons(3) do |set|
    return false unless Prime.prime?(set.inject(0) {|sum, i| sum + i.to_i})
  end
  return true
end

def primeDigitSums(n)
  total = 0
  (10**(n-1)..(10**n-1)).each do |i|
    total += 1 if isChloePrime?(i)
  end
  return total
end

puts primeDigitSums(6) # prints 95 as expected
puts primeDigitSums(177779) # runtime error
If anyone could point me in the right direction that would be awesome. Not necessarily looking for a "here's the answer". Ideally would love a "try looking into using this function...".
UPDATE here is version 2:
#!/bin/ruby
require 'prime'

@primes = {}

def isChloePrime?(num)
  num = num.to_s
  (0..num.length-5).each do |i|
    return false unless @primes[num[i,5]]
  end
  return true
end

def primeDigitSums(n)
  total = 0
  (10**(n-1)...(10**n)).each do |i|
    total += 1 if isChloePrime?(i)
  end
  return total
end

(0..99999).each do |val|
  @primes[val.to_s.rjust(5, "0")] = true if [3,4,5].all? { |n| val.digits.each_cons(n).all? { |set| Prime.prime? set.sum } }
end
I regard a non-negative integer as valid if the sum of every sequence of 3, 4 and 5 consecutive digits is a prime number.
Construct set of relevant prime numbers
We will need to determine if the sums of digits of 3-, 4- and 5-digit numbers are prime. The largest number will therefore be no larger than 5 * 9. It is convenient to construct a set of those primes (a set rather than an array to speed lookups).
require 'prime'
require 'set'
primes = Prime.each(5*9).to_set
#=> #<Set: {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43}>
Construct transition hash
valid1 is a hash whose keys are all 1-digit numbers (all of which are valid). The value of the key 0 is an array of all 1-digit numbers. For 1-9 the values are arrays of 2-digit numbers (all of which are valid) that are obtained by appending a digit to the key. Collectively, the values include all 2-digit numbers.
valid1 = (0..9).each_with_object({}) { |v1,h|
  h[v1] = 10.times.map { |i| 10 * v1 + i } }
valid2 is a hash that maps 2-digit numbers (all valid) to arrays of valid 3-digit numbers that are obtained by appending a digit to the 2-digit number. Collectively, the values include all valid 3-digit numbers. All values are non-empty arrays.
valid2 = (10..99).each_with_object({}) do |v2,h|
  p = 10 * v2
  b, a = v2.digits
  h[v2] = (0..9).each_with_object([]) { |c,arr|
    arr << (p+c) if primes.include?(a+b+c) }
end
Note that Integer#digits returns an array with the 1's digit first.
valid3 is a hash that maps valid 3-digit numbers to arrays of valid 4-digit numbers that are obtained by appending a digit to the key. Collectively, the values include all valid 4-digit numbers. 152 of the 303 values are empty arrays.
valid3 = valid2.values.flatten.each_with_object({}) do |v3,h|
  p = 10 * v3
  c, b, a = v3.digits
  h[v3] = (0..9).each_with_object([]) do |d,arr|
    t = b+c+d
    arr << (p+d) if primes.include?(t) && primes.include?(t+a)
  end
end
valid4 is a hash that maps valid 4-digit numbers to arrays of numbers that are obtained by appending a digit to the key and dropping the first digit of the key, that is, the last four digits of valid 5-digit numbers. valid4.values.flatten.size #=> 218 is the number of valid 5-digit numbers. 142 of the 280 values are empty arrays.
valid4 = valid3.values.flatten.each_with_object({}) do |v4,h|
  p = 10 * v4
  d, c, b, a = v4.digits
  h[v4] = (0..9).each_with_object([]) do |e,arr|
    t = c+d+e
    arr << ((p+e) % 10_000) if primes.include?(t) &&
      primes.include?(t += b) && primes.include?(t + a)
  end
end
We merge these four hashes to form a single hash @transition. The former hashes are no longer needed. @transition has 294 keys.
@transition = [valid1, valid2, valid3, valid4].reduce(:merge)
#=> {0=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# 1=>[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# ...
# 9=>[90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
# 10=>[101, 102, 104, 106], 11=>[110, 111, 113, 115, 119],
# ...
# 97=>[971, 973, 977], 98=>[980, 982, 986], 99=>[991, 995],
# 101=>[1011], 102=>[1020], 104=>[], 106=>[], 110=>[1101],
# ...
# 902=>[9020], 904=>[], 908=>[], 911=>[9110], 913=>[], 917=>[],
# 1011=>[110], 1020=>[200], 1101=>[], 1110=>[], 1200=>[],
# ...
# 8968=>[], 9020=>[200], 9110=>[], 9200=>[]}
Transition method
This is the method that will be used to update counts each time n, the number of digits, is incremented by one.
def next_counts(counts)
  counts.each_with_object({}) do |(k,v),new_valid|
    @transition[k].each do |new_v|
      (new_valid[new_v] = new_valid[new_v].to_i + v) if @transition.key?(k)
    end
  end
end
prime_digit_sum method
def prime_digit_sum(n)
  case n
  when 1 then 10
  when 2 then 90
  when 3 then @transition.sum { |k,v| (10..99).cover?(k) ? v.size : 0 }
  else
    counts = @transition.select { |k,_| (100..999).cover?(k) }.
      values.flatten.product([1]).to_h
    (n - 4).times { counts = next_counts(counts) }
    counts.values.sum % (10**9 + 7)
  end
end
Note that, for n = 4 the hash counts has keys that are valid 4-digit numbers and values that all equal 1:
counts = @transition.select { |k,_| (100..999).cover?(k) }.
  values.flatten.product([1]).to_h
#=> {1011=>1, 1020=>1, 1101=>1, 1110=>1, 1200=>1, 2003=>1, 2005=>1,
# ...
# 8902=>1, 8920=>1, 8968=>1, 9020=>1, 9110=>1, 9200=>1}
counts.size
#=> 280
As shown, for n >= 5, counts is updated each time n is incremented by one. The sum of the values equals the number of valid n-digit numbers.
The number formed by the last four digits of every valid n-digit number is one of counts' keys, and the value of that key is the number of valid n-digit numbers that end in those four digits. For each key, @transition gives an array of numbers that comprise the last four digits of all valid (n+1)-digit numbers that are produced by appending a digit to the key.
Consider, for example, the value of counts for n = 6, which is found to be the following.
counts
#=> {1101=>1, 2003=>4, 2005=>4, 300=>1, 302=>1, 304=>1, 308=>1, 320=>1,
# 322=>1, 326=>1, 328=>1, 380=>1, 382=>1, 386=>1, 388=>1, 500=>1,
# 502=>1, 506=>1, 508=>1, 560=>1, 562=>1, 566=>1, 568=>1, 1200=>7,
# 3002=>9, 3020=>4, 3200=>6, 5002=>6, 9200=>4, 200=>9, 1020=>3, 20=>3,
# 5200=>4, 201=>2, 203=>2, 205=>2, 209=>2, 5020=>2, 9020=>1}
Consider the key 2005 and note that
@transition[2005]
#=> [50, 56]
We see that there are 4 valid 6-digit numbers whose last four digits are 2005 and that, for each of those 4 numbers, a valid number is produced by appending the digit 0 or the digit 6, resulting in numbers whose last five digits are 20050 and 20056. However, we need only keep the last four digits, 0050 and 0056, which are the numbers 50 and 56. Therefore, when recomputing counts for n = 7 (call it counts7) we add 4 to both counts7[50] and counts7[56]. Other keys k of counts (for n = 6) may be such that @transition[k] has values that include 50 and 56, so they too would contribute to counts7[50] and counts7[56].
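That single update step can be sketched in isolation. The version of next_counts below is an adaptation for illustration: it takes the transition hash as an argument instead of reading a shared variable, and the tiny hashes contain only the one entry quoted above (key 2005 with count 4), not the full tables.

```ruby
# One step from n digits to n + 1 digits: every count flows to each
# successor key listed in the transition table.
def next_counts(counts, transition)
  counts.each_with_object(Hash.new(0)) do |(key, count), new_counts|
    transition.fetch(key, []).each { |new_key| new_counts[new_key] += count }
  end
end

transition = { 2005 => [50, 56] }  # excerpt: the single entry quoted above
counts6    = { 2005 => 4 }         # excerpt of counts for n = 6

counts7 = next_counts(counts6, transition)
p counts7  # {50=>4, 56=>4}
```

With the full tables, several keys can feed the same successor, which is why the counts are accumulated with += rather than assigned.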
Selective results
Let's try it for various values of n
puts "digits nbr valid* seconds"
[1, 2, 3, 4, 5, 6, 20, 50, 100, 1_000, 10_000, 40_000].each do |n|
  print "%6d" % n
  t = Time.now
  print "%11d" % prime_digit_sum(n)
  puts "%10f" % (Time.now-t).round(4)
end
puts "\n* modulo (10^9+7)"
digits nbr valid*   seconds
     1         10  0.000000
     2         90  0.000000
     3        303  0.000200
     4        280  0.002200
     5        218  0.000400
     6         95  0.000400
    20      18044  0.000800
    50  215420656  0.001400
   100  518502061  0.002700
  1000  853799949  0.046100
 10000  590948890  0.474200
 40000  776929051  2.531600
I would approach the problem by pre-calculating a list of all the allowed 5-digit sub-sequences: '00002' fails while '28300' is allowed etc. This could perhaps be set up as a binary array or hash set.
Once you have the list, then you can check any number by moving a 5-digit frame over the number one step at a time.
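A minimal sketch of the sliding-frame check, assuming the list of allowed 5-digit sub-sequences has already been built elsewhere. The method name windows_allowed? is hypothetical, and the set below contains only the single allowed example mentioned above, purely for illustration.

```ruby
require 'set'

# Hypothetical helper: true if every 5-digit frame of digits (a string)
# is a member of the precomputed set allowed.
# Strings shorter than 5 digits have no frames and therefore pass.
def windows_allowed?(digits, allowed)
  digits.each_char.each_cons(5).all? { |frame| allowed.include?(frame.join) }
end

# Illustrative set only; a real one would list every allowed sub-sequence.
allowed = Set['28300']

ok  = windows_allowed?('28300', allowed)
bad = windows_allowed?('00002', allowed)
```

Here ok is true and bad is false. A hash set keeps each frame lookup at constant expected time, so checking a number costs one lookup per digit.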
Problem: given n, find the number of different ways to write n as the sum of 1, 3, 4
Example: for n = 5, the answer is 6
5=1+1+1+1+1
5=1+1+3
5=1+3+1
5=3+1+1
5=1+4
5=4+1
I have tried a permutation method, but its efficiency is very low. Is there a more efficient way to do this?
Using dynamic programming with a lookup table (implemented with a hash, as it makes the code simpler):
nums=[1,3,4]
n=5
table={0=>1}
1.upto(n) { |i|
table[i] = nums.map { |num| table[i-num].to_i }.reduce(:+)
}
table[n]
# => 6
Note: Just checking one of the other answers, mine was instantaneous for n=500.
def add_next sum, a1, a2
residue = a1.inject(sum, :-)
residue.zero? ? [a1] : a2.reject{|x| residue < x}.map{|x| a1 + [x]}
end
a = [[]]
until a == (b = a.flat_map{|a| add_next(5, a, [1, 3, 4])})
a = b
end
a:
[
[1, 1, 1, 1, 1],
[1, 1, 3],
[1, 3, 1],
[1, 4],
[3, 1, 1],
[4, 1]
]
a.length #=> 6
I believe this problem should be addressed in two steps.
Step 1
The first step is to determine the different numbers of 1s, 3s and 4s that sum to the given number. For n = 5, there are only 3, which we could write:
[[5,0,0], [2,1,0], [1,0,1]]
These 3 elements are respectively interpreted as "five 1s, zero 3s and zero 4s", "two 1s, one 3 and zero 4s" and "one 1, zero 3s and one 4".
To compute these combinations efficiently, I first compute the possible combinations, using only 1s, that sum to each number between zero and 5 (which of course is trivial). These values are saved in a hash, whose keys are the sums and whose values are the numbers of 1s needed to sum to the value of the key:
h0 = { 0 => 0, 1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5 }
(If the first number had been 2, rather than 1, this would have been:
h0 = { 0 => 0, 2 => 1, 4 => 2 }
since there is no way to sum only 2s to equal 1 or 3.)
Next we consider using both 1 and 3 to sum to each value between 0 and 5. There are only two choices for the number of 3s used, zero or one. This gives rise to the hash:
h1 = { 0 => [[0,0]], 1 => [[1,0]], 2 => [[2,0]], 3 => [[3,0], [0,1]],
4 => [[4,0], [1,1]], 5 => [[5,0], [2,1]] }
This indicates, for example, that:
there is only 1 way to use 1 and 3 to sum to 1: 1 => [1,0], meaning one 1 and zero 3s.
there are two ways to sum to 4: 4 => [[4,0], [1,1]], meaning four 1s and zero 3s or one 1 and one 3.
Similarly, when 1, 3 and 4 can all be used, we obtain the hash:
h2 = { 5 => [[5,0,0], [2,1,0], [1,0,1]] }
Since this hash corresponds to the use of all three numbers, 1, 3 and 4, we are concerned only with the combinations that sum to 5.
In constructing h2, we can use zero 4s or one 4. If we use zero 4s, we would use only 1s and 3s that sum to 5. We see from h1 that there are two combinations:
5 => [[5,0], [2,1]]
For h2 we write these as:
[[5,0,0], [2,1,0]]
If one 4 is used, 1s and 3s totalling 5 - 1*4 = 1 are used. From h1 we see there is just one combination:
1 => [[1,0]]
which for h2 we write as
[[1,0,1]]
so the value for the key 5 in h2 is:
[[5,0,0], [2,1,0]] + [[1,0,1]] = [[5,0,0], [2,1,0], [1,0,1]]
Aside: because of the form I've chosen to represent hashes h1 and h2, it is actually more convenient to represent h0 as:
h0 = { 0 => [[0]], 1 => [[1]],..., 5 => [[5]] }
It should be evident how this sequential approach could be used for any collection of integers whose combinations are to be summed.
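As a rough sketch of one step of this sequential approach, the following extends a table by a single summand. The name extend_combos is made up for illustration, and h0 is written in the array form from the aside. Hash conversion with a block (to_h) assumes Ruby 2.6 or later.

```ruby
# Hypothetical helper: given prev (sum => list of count arrays for the
# summands considered so far), build the table that also allows val.
def extend_combos(prev, val, n)
  (0..n).each_with_object({}) do |t, h|
    # p is the number of times val is used; the rest comes from prev
    combos = (0..t / val).flat_map do |p|
      (prev[t - p * val] || []).map { |counts| counts + [p] }
    end
    h[t] = combos unless combos.empty?
  end
end

h0 = (0..5).to_h { |t| [t, [[t]]] }   # only 1s
h1 = extend_combos(h0, 3, 5)          # 1s and 3s
h2 = extend_combos(h1, 4, 5)          # 1s, 3s and 4s
```

For n = 5 this reproduces the entries worked out above, for instance h1[4] is [[4, 0], [1, 1]] and h2[5] is [[5, 0, 0], [2, 1, 0], [1, 0, 1]]. Unlike the construction in the text, this sketch fills in h2 for every sum up to n rather than only for n itself.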
Step 2
The number of distinct arrangements of each array [n1, n3, n4] produced in Step 1 equals:
(n1+n3+n4)!/(n1!n3!n4!)
Note that if one of the n's were zero, these would be binomial coefficients. In fact, these are coefficients from the multinomial distribution, which is a generalization of the binomial distribution. The reasoning is simple. The numerator gives the number of permutations of all the numbers. The n1 1s can be permuted n1! ways for each distinct arrangement, so we divide by n1!. The same holds for n3 and n4.
For the example of summing to 5, there are:
5!/5! = 1 distinct arrangement for [5,0,0]
(2+1)!/(2!1!) = 3 distinct arrangements for [2,1,0] and
(1+1)!/(1!1!) = 2 distinct arrangements for [1,0,1], for a total of:
1+3+2 = 6 distinct arrangements for the number 5.
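The arithmetic above can be checked with a short sketch. The names factorial and arrangements are illustrative stand-ins, written as plain methods rather than as the class extension used in the code that follows.

```ruby
# Illustrative helper: k! for a non-negative integer k (0! is 1).
def factorial(k)
  (1..k).reduce(1, :*)
end

# Number of distinct orderings of a multiset described by counts,
# i.e. (n1+n3+n4)! / (n1! n3! n4!).
def arrangements(counts)
  factorial(counts.sum) / counts.map { |c| factorial(c) }.reduce(1, :*)
end

per_combo = [[5, 0, 0], [2, 1, 0], [1, 0, 1]].map { |c| arrangements(c) }
total     = per_combo.sum
```

Here per_combo is [1, 3, 2] and total is 6, matching the count for n = 5.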
Code
def count_combos(arr, n)
a = make_combos(arr,n)
a.reduce(0) { |tot,b| tot + multinomial(b) }
end
def make_combos(arr, n)
arr.size.times.each_with_object([]) do |i,a|
val = arr[i]
if i.zero?
a[0] = (0..n).each_with_object({}) { |t,h|
h[t] = [[t/val]] if (t%val).zero? }
else
first = (i==arr.size-1) ? n : 0
a[i] = (first..n).each_with_object({}) do |t,h|
combos = (0..t/val).each_with_object([]) do |p,b|
prev = a[i-1][t-p*val]
prev.map { |pr| b << (pr +[p]) } if prev
end
h[t] = combos unless combos.empty?
end
end
end.last[n]
end
def multinomial(arr)
(arr.reduce(:+)).factorial/(arr.reduce(1) { |tot,n|
tot * n.factorial })
end
and a helper (reopening Integer, since Fixnum was deprecated in Ruby 2.4 and removed in Ruby 3.2):
class Integer
def factorial
return 1 if self < 2
(1..self).reduce(:*)
end
end
Examples
count_combos([1,3,4], 5) #=> 6
count_combos([1,3,4], 6) #=> 9
count_combos([1,3,4], 9) #=> 40
count_combos([1,3,4], 15) #=> 714
count_combos([1,3,4], 30) #=> 974169
count_combos([1,3,4], 50) #=> 14736260449
count_combos([2,3,4], 50) #=> 72581632
count_combos([2,3,4,6], 30) #=> 82521
count_combos([1,3,4], 500) #=> 1632395546095013745514524935957247\
00017620846265794375806005112440749890967784788181321124006922685358001
(I broke the result of the last example (one long number) into two pieces, for display purposes.)
count_combos([1,3,4], 500) took about 2 seconds to compute; the others were essentially instantaneous.
@sawa's method and mine gave the same results for n between 6 and 9, so I'm confident they are both correct. sawa's solution times increase much more quickly with n than mine do, because he is computing and then counting all the permutations.
Edit: @Karole, who just posted an answer, and I get the same results for all my tests (including the last one!). Which answer do I prefer? Hmmm. Let me think about that.
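I don't know Ruby, so I am writing it in C++.
Take your example, n = 5.
Use dynamic programming, where D[i] is the number of ways to write i:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main() {
    int n;
    cin >> n;  // assumes n >= 0
    // D[i] = number of ordered ways to write i as a sum of 1s, 3s and 4s
    vector<long long> D(max(n, 3) + 1);
    D[0] = 1; D[1] = 1; D[2] = 1; D[3] = 2;
    for (int i = 4; i <= n; i++)
        D[i] = D[i-1] + D[i-3] + D[i-4];
    cout << D[n] << endl;  // prints 6 for n = 5
}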