I would like to know how to convert fractional values (say, -.06) into negadecimal or a negative base. I know -.06 is .14 in negadecimal, because I can do it the other way around, but the regular algorithm used for converting fractions into other bases doesn't work with a negative base. Don't give a code example, just explain the steps required.
The regular algorithm works like this:
You multiply the value by the base you're converting into, record the whole-number part, then keep going with the remaining fractional part until there is no fraction left:
0.337 in binary:
0.337*2 = 0.674 "0"
0.674*2 = 1.348 "1"
0.348*2 = 0.696 "0"
0.696*2 = 1.392 "1"
0.392*2 = 0.784 "0"
0.784*2 = 1.568 "1"
0.568*2 = 1.136 "1"
Approximately .0101011
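For reference, that standard procedure is a few lines in Python (the function name is mine; it only works for positive bases, which is exactly why it breaks down for base -10):

def fraction_to_base(x, base, max_digits=8):
    """Repeatedly multiply by the (positive) base and peel off the integer part."""
    digits = []
    for _ in range(max_digits):
        if x == 0:
            break
        x *= base
        digit = int(x)        # the whole-number part becomes the next digit
        digits.append(digit)
        x -= digit
    return digits

print(fraction_to_base(0.337, 2))   # -> [0, 1, 0, 1, 0, 1, 1], i.e. approximately 0.0101011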
I have a two-step algorithm for doing the conversion. I'm not sure if this is the optimal algorithm, but it works pretty well.
The basic idea is to start off by getting a decimal representation of the number, then converting that decimal representation into a negadecimal representation by handling the even powers and odd powers separately.
Here's an example that motivates the idea behind the algorithm. This is going to go into a lot of detail, but ultimately will arrive at the algorithm and at the same time show where it comes from.
Suppose we want to convert the number 0.523598734 to negadecimal (notice that I'm presupposing you can convert to decimal). Notice that
0.523598734 = 5 * 10^-1
+ 2 * 10^-2
+ 3 * 10^-3
+ 5 * 10^-4
+ 9 * 10^-5
+ 8 * 10^-6
+ 7 * 10^-7
+ 3 * 10^-8
+ 4 * 10^-9
Since 10^-n = (-10)^-n when n is even, we can rewrite this as
0.523598734 = 5 * 10^-1
+ 2 * (-10)^-2
+ 3 * 10^-3
+ 5 * (-10)^-4
+ 9 * 10^-5
+ 8 * (-10)^-6
+ 7 * 10^-7
+ 3 * (-10)^-8
+ 4 * 10^-9
Rearranging and regrouping terms gives us this:
0.523598734 = 2 * (-10)^-2
+ 5 * (-10)^-4
+ 8 * (-10)^-6
+ 3 * (-10)^-8
+ 5 * 10^-1
+ 3 * 10^-3
+ 9 * 10^-5
+ 7 * 10^-7
+ 4 * 10^-9
If we could rewrite those negative terms as powers of -10 rather than powers of 10, we'd be done. Fortunately, we can make a nice observation: if d is a nonzero digit (1, 2, ..., or 9), then
d * 10^-n + (10 - d) * 10^-n
= 10^-n (d + 10 - d)
= 10^-n (10)
= 10^{-n+1}
Restated in a different way:
d * 10^-n + (10 - d) * 10^-n = 10^{-n+1}
Therefore, we get this useful fact:
d * 10^-n = 10^{-n+1} - (10 - d) * 10^-n
If we assume that n is odd, then -10^-n = (-10)^-n and 10^{-n+1} = (-10)^{-n+1}. Therefore, for odd n, we see that
d * 10^-n = 10^{-n+1} - (10 - d) * 10^-n
= (-10)^{-n+1} + (10 - d) * (-10)^-n
Think about what this means in a negadecimal setting. We've turned a power of ten into a sum of two powers of minus ten.
Applying this to our summation gives this:
0.523598734 = 2 * (-10)^-2
+ 5 * (-10)^-4
+ 8 * (-10)^-6
+ 3 * (-10)^-8
+ 5 * 10^-1
+ 3 * 10^-3
+ 9 * 10^-5
+ 7 * 10^-7
+ 4 * 10^-9
= 2 * (-10)^-2
+ 5 * (-10)^-4
+ 8 * (-10)^-6
+ 3 * (-10)^-8
+ (-10)^0 + 5 * (-10)^-1
+ (-10)^-2 + 7 * (-10)^-3
+ (-10)^-4 + 1 * (-10)^-5
+ (-10)^-6 + 3 * (-10)^-7
+ (-10)^-8 + 6 * (-10)^-9
Regrouping gives this:
0.523598734 = (-10)^0
+ 5 * (-10)^-1
+ 2 * (-10)^-2 + (-10)^-2
+ 7 * (-10)^-3
+ 5 * (-10)^-4 + (-10)^-4
+ 1 * (-10)^-5
+ 8 * (-10)^-6 + (-10)^-6
+ 3 * (-10)^-7
+ 3 * (-10)^-8 + (-10)^-8
+ 6 * (-10)^-9
Overall, this gives a negadecimal representation of 1.537619346ND
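As a quick sanity check, a negadecimal string can be evaluated back into an ordinary decimal value by weighting each digit with the matching power of -10. A small Python helper (the name is mine):

from fractions import Fraction

def negadecimal_value(s):
    """Evaluate a negadecimal string such as '1.537619346' exactly, as a Fraction."""
    int_part, _, frac_part = s.partition(".")
    total = Fraction(0)
    for k, d in enumerate(reversed(int_part)):      # integer digits: (-10)^0, (-10)^1, ...
        total += int(d) * Fraction(-10) ** k
    for k, d in enumerate(frac_part, start=1):      # fractional digits: (-10)^-1, (-10)^-2, ...
        total += int(d) * Fraction(1, (-10) ** k)
    return total

print(negadecimal_value("1.537619346"))   # -> 261799367/500000000, i.e. 0.523598734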
Now, let's think about this at a negadigit level. Notice that
Digits in even-numbered positions are mostly preserved.
Digits in odd-numbered positions are flipped: any nonzero, odd-numbered digit is replaced by 10 minus that digit.
Each time an odd-numbered digit is flipped, the preceding digit is incremented.
Let's look at 0.523598734 and apply this algorithm directly. We start by flipping all of the odd-numbered digits to give their 10's complement:
0.523598734 --> 0.527518336
Next, we increment the even-numbered digits preceding all flipped odd-numbered digits:
0.523598734 --> 0.527518336 --> 1.537619346ND
This matches our earlier number, so it looks like we have the makings of an algorithm!
Things get a bit trickier, unfortunately, when we start working with decimal values involving the number 9. For example, let's take the number 0.999. Applying our algorithm, we start by flipping all the odd-numbered digits:
0.999 --> 0.191
Now, we increment all the even-numbered digits preceding a column that had a value flipped:
0.999 --> 0.191 --> 1.1(10)1
Here, the (10) indicates that the column containing a 9 overflowed to a 10. Clearly this isn't allowed, so we have to fix it.
To figure out how to fix this, it's instructive to look at how to count in negadecimal. Here's how to count from 0 to 110:
000
001
002
003
...
008
009
190
191
192
193
194
...
198
199
180
181
...
188
189
170
...
118
119
100
101
102
...
108
109
290
Fortunately, there's a really nice pattern here. The basic mechanism works like normal base-10 incrementing: increment the last digit, and if it overflows, carry a 1 into the next column, continuing to carry until everything stabilizes. The difference here is that the odd-numbered columns work in reverse. If you increment the -10s digit, for example, you actually subtract one rather than adding one, since increasing the value in that column by 10 corresponds to having one fewer -10 included in your sum. If that number underflows at 0, you reset it back to 9 (subtracting 90), then increment the next column (adding 100). In other words, the general algorithm for incrementing a negadecimal number works like this:
Start at the 1's column.
If the current column is at an even-numbered position:
Add one.
If the value reaches 10, set it to zero, then apply this procedure to the preceding column.
If the current column is at an odd-numbered position:
Subtract one.
If the value reaches -1, set it to 9, then apply this procedure to the preceding column.
You can confirm that this math works by generalizing the above reasoning about -10s digits and 100s digits and realizing that overflowing an even-numbered column corresponding to 10^k means that you need to add in 10^(k+1), which means that you need to decrement the previous column by one, and that underflowing an odd-numbered column works by subtracting out 9 · 10^k, then adding in 10^(k+1).
Let's go back to our example at hand. We're trying to convert 0.999 into negadecimal, and we've gotten to
0.999 --> 0.191 --> 1.1(10)1
To fix this, we'll take the 10's column and reset it back to 0, then carry the 1 into the previous column. That's an odd-numbered column, so we decrement it. This gives the final result:
0.999 --> 0.191 --> 1.1(10)1 --> 1.001ND
Overall, for positive numbers, we have the following algorithm for doing the conversion:
Processing digits from left to right:
If you're at an odd-numbered digit that isn't zero:
Replace the digit d with the digit 10 - d.
Using the standard negadecimal addition algorithm, increment the value in the previous column.
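Here is a rough Python sketch of that positive-number procedure (the helper names are mine; it assumes a fraction in [0, 1) given as a list of decimal digits). It applies the flip step and uses the negadecimal increment rules described earlier to handle carries:

def to_negadecimal_positive(frac_digits):
    """Flip-and-increment conversion of a positive fraction 0.d1d2... (digits given as a list).
    Returns the units digit plus the negadecimal fraction digits."""
    r = [0] + list(frac_digits)        # r[0] is the units column, r[1..] the fractional columns

    def increment(j):
        # Standard negadecimal increment of column j, per the counting rules above.
        if j % 2 == 0:                 # even column: worth +10^-j, so add one
            r[j] += 1
            if r[j] == 10:
                r[j] = 0
                increment(j - 1)
        else:                          # odd column: worth -10^-j, so subtract one
            r[j] -= 1
            if r[j] == -1:
                r[j] = 9
                increment(j - 1)

    for i in range(1, len(r)):         # process digits from left to right
        if i % 2 == 1 and r[i] != 0:   # nonzero digit in an odd-numbered position
            r[i] = 10 - r[i]           # flip it to its ten's complement...
            increment(i - 1)           # ...and bump the previous column
    return r[0], r[1:]

units, frac = to_negadecimal_positive([5, 2, 3, 5, 9, 8, 7, 3, 4])
print(str(units) + "." + "".join(map(str, frac)))   # -> 1.537619346
print(to_negadecimal_positive([9, 9, 9]))           # -> (1, [0, 0, 1]), i.e. 0.999 -> 1.001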
Of course, negative numbers are a whole other story. With negative numbers, the odd columns are correct and the even columns need to be flipped, since the parity of the (-10)^k terms in the summation flips. Consequently, for negative numbers, you apply the above algorithm, but preserve the odd columns and flip the even columns, still bumping the digit in the preceding column by one for each flip. That preceding column is now an odd column, so if it overflows past 9 you zero it and decrement the column before it, per the negadecimal carry rules above.
As an example, suppose we want to convert -0.523598734 into negadecimal. Applying the algorithm gives this:
-0.523598734 --> 0.583592774 --> 0.6845(10)2874 --> 0.684402874ND
This is indeed the correct representation.
Hope this helps!
For your question I thought about this object-oriented code, though I'm not sure about it. This class takes two negadecimal numbers and an operator, converts the numbers to decimal, and then evaluates the equation.
public class NegadecimalNumber {

    private int number1;
    private char operator;
    private int number2;

    public NegadecimalNumber(int a, char op, int b) {
        this.number1 = a;
        this.operator = op;
        this.number2 = b;
    }

    // Interpret the base-10 digits of 'a' as negadecimal digits and
    // return the decimal value, e.g. 190 (negadecimal) -> 10.
    public int ConvertNumber1(int a) {
        int nega = 0;
        int place = 1;              // (-10)^k for the current digit position
        while (a != 0) {
            nega += (a % 10) * place;
            a /= 10;
            place *= -10;
        }
        return nega;
    }

    public int ConvertNumber2(int b) {
        int negb = 0;
        int place = 1;
        while (b != 0) {
            negb += (b % 10) * place;
            b /= 10;
            place *= -10;
        }
        return negb;
    }

    public double Equation() {
        // Convert both operands from negadecimal to decimal before applying the operator.
        double a = ConvertNumber1(this.number1);
        double b = ConvertNumber2(this.number2);
        double ans = 0;
        if (this.operator == '+') {
            ans = a + b;
        } else if (this.operator == '-') {
            ans = a - b;
        } else if (this.operator == '*') {
            ans = a * b;
        } else if (this.operator == '/') {
            ans = a / b;
        }
        return ans;
    }
}
Note that https://en.wikipedia.org/wiki/Negative_base#To_Negative_Base tells you how to convert whole numbers to a negative base. So one way to solve the problem is simply to multiply the fraction by a high enough power of 100 to turn it into a whole number, convert, and then divide again: -0.06 = -6 / 100 => 14/100 = 0.14.
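As a rough Python sketch of that approach (the function names are mine), you can scale by a power of 100, reuse the whole-number conversion from the linked article, and shift the point back:

from fractions import Fraction

def int_to_negadecimal(n):
    """Whole-number negative-base conversion (as in the linked Wikipedia section)."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, -10)
        if r < 0:                 # force the remainder into 0..9 and fix up the quotient
            r += 10
            n += 1
        digits.append(str(r))
    return "".join(reversed(digits))

def fraction_to_negadecimal(x, places=2):
    """Scale by 10**places (an even power of ten, i.e. a power of 100), convert, shift back.
    Assumes x is a Fraction that terminates within `places` decimal digits and that places is even."""
    scaled = x * 10 ** places
    assert places % 2 == 0 and scaled.denominator == 1
    s = int_to_negadecimal(int(scaled)).rjust(places + 1, "0")
    return s[:-places] + "." + s[-places:]

print(fraction_to_negadecimal(Fraction("-0.06")))   # -> 0.14
print(fraction_to_negadecimal(Fraction("0.5")))     # -> 1.50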
Another way is to realise that you are trying to create a sum of the form -a/10 + b/100 - c/1000 + d/10000 ... that approximates the target number. You want to reduce the error as much as possible at each stage, but you need to leave an error in a direction that you can correct at the next stage. Note that this also means that a fraction might not start with 0. when converted: 0.5 => 1.5, since 1.5 here means 1 - 5/10.
So, to convert -0.06: the number is negative, and the first digit after the decimal point can represent the values 0.0, -0.1 .. -0.9, so we start with "0." and are left with -0.06 to convert. If the first digit after the decimal point were 0, I would still have -0.06 left, which is in the wrong direction to be corrected by the following digit, so I need to choose a first fractional digit that produces an approximation below my target of -0.06. So I choose 1, which is actually worth -0.1 and leaves me with an error of +0.04, which I can convert exactly, giving the conversion 0.14.
So at each point output the digit which gives you either
1) The exact result, in which case you are finished
2) An approximation which is slightly larger than the target number, if the next digit will be negative.
3) An approximation which is slightly smaller than the target number, if the next digit will be positive.
And if you start off trying to approximate a number in the range (-1.0, 0.0] at each point you can choose a digit which keeps the remaining error small enough and in the right direction, so this always works.
Related
What is an efficient algorithm for finding the digit in the nth position of the following string
112123123412345123456 ... 123456789101112 ...
Storing the entire string in memory is not feasible for very large n, so I am looking for an algorithm that can find the nth digit in the above string which works if n is very large (i.e. an alternative to just generating the first n digits of the string).
There are several levels here: the digit is part of a number x, the number x is part of a sequence 1,2,3...x...y and that sequence is part of a block of sequences that lead up to numbers like y that have z digits. We'll tackle these levels one by one.
There are 9 numbers with 1 digit:
first: 1 (sequence length: 1 * 1)
last: 9 (sequence length: 9 * 1)
average sequence length: (1 + 9) / 2 = 5
1-digit block length: 9 * 5 = 45
There are 90 numbers with 2 digits:
first: 10 (sequence length: 9 * 1 + 1 * 2)
last: 99 (sequence length: 9 * 1 + 90 * 2)
average sequence length: 9 + (2 + 180) / 2 = 100
2-digit block length: 90 * 100 = 9000
There are 900 numbers with 3 digits:
first: 100 (sequence length: 9 * 1 + 90 * 2 + 1 * 3)
last: 999 (sequence length: 9 * 1 + 90 * 2 + 900 * 3)
average sequence length: 9 + 180 + (3 + 2,700) / 2 = 1,540.5
3-digit block length: 900 * 1,540.5 = 1,386,450
If you continue to calculate these values, you'll find which block (of sequences up to how many digits) the digit you're looking for is in, and you'll know the start and end point of this block.
Say you want the millionth digit. You find that it's in the 3-digit block, and that this block is located in the total sequence at:
start of 3-digit block: 45 + 9,000 = 9,045
start of 4-digit block: 45 + 9,000 + 1,386,450 = 1,395,495
So in this block we're looking for digit number:
1,000,000 - 9,045 = 990,955
Now you can use e.g. a binary search to find which sequence the 990,955th digit is in; you start with the 3-digit number halfway in the 3-digit block:
first: 100 (sequence length: 9 + 180 + 1 * 3)
number: 550 (sequence length: 9 + 180 + 550 * 3)
average sequence length: 9 + 180 + (3 + 1650) / 2 = 1,015.5
total sequence length: 550 * 1,015.5 = 558,525
Which is too small; so we try 550 * 3/2 = 825, see if that is too small or large, and go up or down in increasingly smaller steps until we know which sequence the 990,955th digit is in.
Say it's in the sequence for the number n; then we calculate the total length of all 3-digit sequences up to n-1, and this will give us the location of the digit we're looking for in the sequence for the number n. Then we can use the numbers 9*1, 90*2, 900*3 ... to find which number the digit is in, and then what the digit is.
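Here is a rough Python sketch of those three levels (block, then sequence via binary search, then number); the function name and details are mine:

def digit_at(n):
    """n-th character (1-indexed) of 1 12 123 1234 ... built as described above."""
    # 1) Which d-digit block is the digit in?
    d, prefix, total = 0, 0, 0    # prefix = #digits in "12...(10^d - 1)", total = block lengths so far
    while True:
        d += 1
        count = 9 * 10 ** (d - 1)                              # how many d-digit numbers exist
        block = count * prefix + d * count * (count + 1) // 2  # sum of the block's sequence lengths
        if total + block >= n:
            break
        total += block
        prefix += count * d
    n -= total
    # 2) Binary search for the sequence within the block.
    def cum(t):                   # combined length of the block's first t sequences
        return t * prefix + d * t * (t + 1) // 2
    lo, hi = 1, 9 * 10 ** (d - 1)
    while lo < hi:
        mid = (lo + hi) // 2
        if cum(mid) >= n:
            hi = mid
        else:
            lo = mid + 1
    pos = n - cum(lo - 1)         # position inside the sequence "1 2 3 ... m"
    # 3) Walk the 9*1, 90*2, 900*3, ... ranges to find the number and the digit.
    width = 1
    while pos > 9 * 10 ** (width - 1) * width:
        pos -= 9 * 10 ** (width - 1) * width
        width += 1
    num = 10 ** (width - 1) + (pos - 1) // width
    return int(str(num)[(pos - 1) % width])

print(digit_at(100))     # -> 1
print(digit_at(31000))   # -> 2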
We have three types of structures that we would like to be able to search on: (1) the sequence formed by concatenating d-digit numbers, for example, single digit:
123456...
or 3-digit:
100101102103
(2) the rows in a section,
where each section builds on the previous section added to a prefix. For example, section 1:
1
12
123
...
or section 3:
1234...10111213...100
1234...10111213...100102
1234...10111213...100102103
<----- prefix ----->
and (3) the full sections, although the latter we can just enumerate since they grow exponentially and help build our section prefixes. For (1), we can use simple division if we know the digit count; for (2), we can binary search.
Here's Python code that also answers the big ones:
def getGreatest(n, d, prefix):
    rows = 9 * 10**(d - 1)
    triangle = rows * (d + rows * d) // 2
    l = 0
    r = triangle
    while l < r:
        mid = l + ((r - l) >> 1)
        triangle = mid * prefix + mid * (d + mid * d) // 2
        prevTriangle = (mid-1) * prefix + (mid-1) * (d + (mid-1) * d) // 2
        nextTriangle = (mid+1) * prefix + (mid+1) * (d + (mid+1) * d) // 2
        if triangle >= n:
            if prevTriangle < n:
                return prevTriangle
            else:
                r = mid - 1
        else:
            if nextTriangle >= n:
                return triangle
            else:
                l = mid
    return l * prefix + l * (d + l * d) // 2

def solve(n):
    debug = 1
    d = 0
    p = 0.1
    prefixes = [0]
    sections = [0]
    while sections[d] < n:
        d += 1
        p *= 10
        rows = int(9 * p)
        triangle = rows * (d + rows * d) // 2
        section = rows * prefixes[d-1] + triangle
        sections.append(sections[d-1] + section)
        prefixes.append(prefixes[d-1] + rows * d)
    section = sections[d - 1]
    if debug:
        print("section: %s" % section)
    n = n - section
    rows = getGreatest(n, d, prefixes[d - 1])
    if debug:
        print("rows: %s" % rows)
    n = n - rows
    d = 1
    while prefixes[d] < n:
        d += 1
    if prefixes[d] == n:
        return 9
    prefix = prefixes[d - 1]
    if debug:
        print("prefix: %s" % prefix)
    n -= prefix
    if debug:
        print((n, d, prefixes, sections))
    countDDigitNums = n // d
    remainder = n % d
    prev = 10**(d - 1) - 1
    num = prev + countDDigitNums
    if debug:
        print("num: %s" % num)
    if remainder:
        return int(str(num + 1)[remainder - 1])
    else:
        s = str(num)
        return int(s[len(s) - 1])

ns = [
    1,                    # 1
    2,                    # 1
    3,                    # 2
    100,                  # 1
    2100,                 # 2
    31000,                # 2
    999999999999999999,   # 4
    1000000000000000000,  # 1
    999999999999999993,   # 7
]

for n in ns:
    print(n)
    print(solve(n))
    print('')
Well, you have a series of sequences each increasing by a single number.
If you have "x" of them, then the sequences up to that point occupy x * (x + 1) / 2 character positions. Or, another way of saying this is that the "x"s sequence starts at x * (x - 1) / 2 (assuming zero-based indexing). These are called triangular numbers.
So, all you need to do is to find the "x" value where the cumulative amount is closest to a given "n". Here are three ways:
Search for a closed form solution. This exists, but the formula is rather complicated. (Here is one reference for the sum of triangular numbers.)
Pre-calculate a table in memory with values up to, say, 1,000,000. That will get you to 10^10 sizes.
Use a "binary" search and the formula. So, generate the sequence of values for 1, 2, 4, 8, and so on and then do a binary search to find the exact sequence.
Once you know the sequence where the value lies, determining the value is simply a matter of arithmetic.
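Whichever option you pick, the search itself is tiny. Here is a Python sketch (my own helper name) using the triangular-number formula with a small correction step, under this answer's simplification that the x-th sequence occupies x character positions:

import math

def sequence_containing(n):
    """Smallest x with x*(x+1)/2 >= n, i.e. which sequence holds character position n."""
    x = (math.isqrt(8 * n + 1) - 1) // 2      # closed-form guess from the triangular formula
    while x * (x + 1) // 2 < n:               # correct any rounding error
        x += 1
    return x

print(sequence_containing(10))   # -> 4 : sequences 1, 2, 3 fill 6 characters, the 4th covers 7..10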
I'm doing a Ruby kata that asks me to find the sum of the digits of all the numbers from 1 to N (both ends included).
So if I had these inputs, I would get these outputs:
For N = 10 the sum is 1+2+3+4+5+6+7+8+9+(1+0) = 46
For N = 11 the sum is 1+2+3+4+5+6+7+8+9+(1+0)+(1+1) = 48
For N = 12 the sum is 1+2+3+4+5+6+7+8+9+(1+0)+(1+1) +(1+2)= 51
Now I know in my head what needs to be done. Below is the code that I have to solve this problem:
def solution(n)
  if n <= 9
    return n if n == 1
    solution(n-1) + n
  elsif n >= 10
    45 + (10..n) #How can I grab the ones,tenths, and hundreds?
  end
end
Basically everything is fine until I hit over 10.
I'm trying to find some sort of method that could do this. I searched Fixnum and Integer but I haven't found anything that could help me. What I want is to find something like "string"[0], but of course without having to turn the integer back and forth between a string and an integer. I know that there is a mathematical relationship there, but I'm having a hard time trying to decipher it.
Any help would be appreciated.
You can use modulo and integer division to calculate it recursively:
def sum_digits(n)
  return n if n < 10
  (n % 10) + sum_digits(n / 10)
end
sum_digits(123)
# => 6
A beginner would probably do this:
123.to_s.chars.map(&:to_i)
# => [1, 2, 3]
but a more thoughtful person would do this:
n, a = 123, []
until n.zero?
  n, r = n.divmod(10)
  a.unshift(r)
end
a
# => [1, 2, 3]
Rather than computing the sum of the digits for each number in the range, and then summing those subtotals, I have computed the total using combinatorial methods. As such, it is much more efficient than straight enumeration.
Code
SUM_ONES = 10.times.with_object([]) { |i,a| a << i*(i+1)/2 }
S = SUM_ONES[9]

def sum_digits_nbrs_up_to(n)
  pwr = n.to_s.size - 1
  tot = n.to_s.chars.map(&:to_i).reduce(:+)
  sum_leading_digits = 0
  pwr.downto(0).each do |p|
    pwr_term = 10**p
    leading_digit = n/pwr_term
    range_size = leading_digit * pwr_term
    tot += sum_leading_digits * range_size +
           sum_digits_to_pwr(leading_digit, p)
    sum_leading_digits += leading_digit
    n -= range_size
  end
  tot
end

def sum_digits_to_pwr(d, p)
  case
  when d.zero? && p.zero?
    0
  when d.zero?
    10**(p-1) * S * d * p
  when p.zero?
    10**p * SUM_ONES[d-1]
  else
    10**p * SUM_ONES[d-1] + 10**(p-1) * S * d * p
  end
end
Examples
sum_digits_nbrs_up_to(456) #=> 4809
sum_digits_nbrs_up_to(2345) #=> 32109
sum_digits_nbrs_up_to(43021) #=> 835759
sum_digits_nbrs_up_to(65827359463206357924639357824065821)
#=> 10243650329265398180347270847360769369
These calculations were all essentially instantaneous. I verified the totals for the first three examples by straight enumeration, using #sawa's method for calculating the sum of digits for each number in the range.
Explanation
The algorithm can best be explained with an example. Suppose n equals 2345.
We begin by defining the following functions:
t(n) : sum of all digits of all numbers between 1 and n, inclusive (the answer)
sum(d): sum of all digits between 1 and d, inclusive (for d=0..9, sum(d) = 0, 1, 3, 6, 10, 15, 21, 28, 36, 45)
g(m) : sum of digits of the number m
f(i-j): sum of all digits of all integers between i and j, inclusive
h(d,p): sum of all digits of all numbers between 0 and d*(10^p)-1 (derived below).
Then (I explain the following below):
t(2345) = f(0-1999)+f(2000-2299)+f(2300-2339)+f(2340-2344)+g(2345)
f( 0-1999) = h(2,3) = h(2,3)
f(2000-2299) = 2 * (2299-2000+1) + h(3,2) = 600 + h(3,2)
f(2300-2339) = (2+3) * (2339-2300+1) + h(4,1) = 200 + h(4,1)
f(2340-2344) = (2+3+4) * (2344-2340+1) + h(5,0) = 45 + h(5,0)
g(2345) = 2+3+4+5 = 14
so
t(2345) = 859 + h(2,3) + h(3,2) + h(4,1) + h(5,0)
First consider f(2000-2299). The first digit, 2, appears in every number in the range (2000..2299); i.e., 300 times. The remaining three digits contribute (by definition) h(3,2) to the total:
f(2000-2299) = 2 * 300 + h(3,2)
For f(2300-2339) the first two digits, 2 and 3, are present in all 40 numbers in the range (2300..2339) and the remaining two digits contribute h(4,1) to the total:
f(2300-2339) = 5 * 40 + h(4,1)
For f(2340-2344), the first three digits, 2, 3 and 4, are present in all five numbers in the range (2340-2344), and the last digit contributes h(5,0) to the total.
It remains to derive an expression for computing h(d,p). Again, this is best explained with an example.
Consider h(3,2), which is the sum of all the digits of all numbers between 0 and 299.
First consider the sum of digits for the first digit. 0, 1 and 2 are each the first digit for 100 numbers in the range 0-299. Hence, the first digit, summed, contributes
0*100 + 1*100 + 2*100 = sum(2) * 10^2
to the total. We now add the sum of digits for the remaining 2 digits. The 300 numbers each have 2 digits in the last two positions. Each of the digits 0-9 appears in 1/10th of 2 * 300 = 600 digits; i.e., 60 times. Hence, the sum of all digits in the last 2 digit positions, over all 300 numbers, equals:
sum(9) * 2 * 300 / 10 = 45 * 2 * 30 = 2700.
More generally,
h(d,p) = sum(d-1) * 10**p + sum(9) * d * p * 10**(p-1)   if d > 0 and p > 0
       = sum(d-1) * 10**p                                if d > 0 and p == 0
       = sum(9) * d * p * 10**(p-1)                      if d == 0 and p > 0
       = 0                                               if d == 0 and p == 0
Applying this to the above example, we have
h(2,3) = sum(1) * 10**3 + (45 * 2 * 3) * 10**2 = 1 * 1000 + 270 * 100 = 28000
h(3,2) = sum(2) * 10**2 + (45 * 3 * 2) * 10**1 = 3 * 100 + 270 * 10 = 3000
h(4,1) = sum(3) * 10**1 + (45 * 4 * 1) * 10**0 = 6 * 10 + 180 * 1 = 240
h(5,0) = sum(4) * 10**0 = 10 * 1 = 10
Therefore
t(2345) = 859 + 28000 + 3000 + 240 + 10 = 32109
The code above implements this algorithm in a straightforward way.
I confirmed the results for the first three examples above by using #sawa's code to determine the sum of the digits for each number in the range and then summing those totals:
def sum_digits(n)
  a = []
  until n.zero?
    n, r = n.divmod(10)
    a.unshift(r)
  end
  a.reduce(:+)
end

def check_sum_digits_nbrs_up_to(n)
  (1..n).reduce(0) {|t,i| t + sum_digits(i) }
end
check_sum_digits_nbrs_up_to(2345) #=> 32109
I have a math problem that I solve by trial and error (I think this is called brute force), and the program works fine when there are a few options, but as I add more variables/data it takes longer and longer to run.
My problem is that, although the prototype works, to be useful it has to handle thousands of variables and large data sets; so I'm wondering if it is possible to scale brute force algorithms, and how I should approach scaling it.
I was starting to learn and play around with Hadoop (and HBase); although it looks promising, I wanted to verify that what I'm trying to do isn't impossible.
If it helps, I wrote the program in Java (and can use it if possible), but ended up porting it to Python, because I feel more comfortable with it.
Update: To provide more insight, I think I'll add a simplified version of the code to get the idea across. Basically, if I know the sum is 100, I am trying to find all combinations of the variables that could equal it. This version is simple; in my real version I use larger numbers and many more variables. It's a Diophantine equation, and I believe no algorithm exists to solve it without brute force.
int sum = 100;
int a1 = 20;
int a2 = 5;
int a3 = 10;

for (int i = 0; i * a1 <= sum; i++) {
    for (int j = 0; i * a1 + j * a2 <= sum; j++) {
        for (int k = 0; i * a1 + j * a2 + k * a3 <= sum; k++) {
            if (i * a1 + j * a2 + k * a3 == sum) {
                System.out.println(i + "," + j + "," + k);
            }
        }
    }
}
I am new to programming, and I am sorry if I'm not framing this question correctly. This is more of a general question.
Typically, you can quantify how well an algorithm will scale by using big-O notation to analyze its growth rate. When you say that your algorithm works by "brute force," it's unclear to what extent it will scale. If your "brute force" solution works by listing all possible subsets or combinations of a set of data, then it almost certainly will not scale (it will have asymptotic complexity O(2^n) or O(n!), respectively). If your brute force solution works by finding all pairs of elements and checking each, it may scale reasonably well (O(n^2)). Without more information about how your algorithm works, though, it's difficult to say.
You may want to look at this excellent post about big-O as a starting point for how to reason about the long-term scalability of your program. Typically speaking, anything that has growth rate O(n log n), O(n), O(log n), or O(1) scales extremely well, anything with growth rate O(n^2) or O(n^3) will scale up to a point, and anything with growth rate O(2^n) or higher will not scale at all.
Another option would be to look up the problem you're trying to solve to see how well-studied it is. Some problems are known to have great solutions, and if yours is one of them it might be worth seeing what others have come up with. Perhaps there is a very clean, non-brute-force solution that scales really well! Some other problems are conjectured to have no scalable algorithms at all (the so-called NP-hard problems). If that's the case, then you should be pretty confident that there's no way to get a scalable approach.
And finally, you can always ask a new question here at Stack Overflow describing what you're trying to do and asking for input. Maybe the community can help you solve your problem more efficiently than you initially expected!
EDIT: Given the description of the problem that you are trying to solve, right now you are doing one for loop per variable from 0 up to the number you're trying to target. The complexity of this algorithm is O(U^k), where k is the number of variables and U is the sum. This approach will not scale very well at all. Introducing each new variable in the above case will make the algorithm run 100 times slower, which definitely will not scale very well if you want 100 variables!
However, I think that there is a fairly good algorithm whose runtime is O(U^2 k) that uses O(U k) memory to solve the problem. The intuition is as follows: Suppose that we want to sum up 1, 2, and 4 to get 10. There are many ways to do this:
2 * 4 + 1 * 2 + 0 * 1
2 * 4 + 0 * 2 + 2 * 1
1 * 4 + 3 * 2 + 0 * 1
1 * 4 + 2 * 2 + 2 * 1
1 * 4 + 1 * 2 + 4 * 1
1 * 4 + 0 * 2 + 6 * 1
0 * 4 + 5 * 2 + 0 * 1
0 * 4 + 4 * 2 + 2 * 1
0 * 4 + 3 * 2 + 4 * 1
0 * 4 + 2 * 2 + 6 * 1
0 * 4 + 1 * 2 + 8 * 1
0 * 4 + 0 * 2 + 10 * 1
The key observation is that we can write all of these out as sums, but more importantly, as sums where each term in the sum is no greater than the previous term:
2 * 4 + 1 * 2 + 0 * 1 = 4 + 4 + 2
2 * 4 + 0 * 2 + 2 * 1 = 4 + 4 + 1 + 1
1 * 4 + 3 * 2 + 0 * 1 = 4 + 2 + 2 + 2
1 * 4 + 2 * 2 + 2 * 1 = 4 + 2 + 2 + 1 + 1
1 * 4 + 1 * 2 + 4 * 1 = 4 + 2 + 1 + 1 + 1 + 1
1 * 4 + 0 * 2 + 6 * 1 = 4 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 + 5 * 2 + 0 * 1 = 2 + 2 + 2 + 2 + 2
0 * 4 + 4 * 2 + 2 * 1 = 2 + 2 + 2 + 2 + 1 + 1
0 * 4 + 3 * 2 + 4 * 1 = 2 + 2 + 2 + 1 + 1 + 1 + 1
0 * 4 + 2 * 2 + 6 * 1 = 2 + 2 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 + 1 * 2 + 8 * 1 = 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 + 0 * 2 + 10 * 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
So this gives an interesting idea about how to generate all possible ways to sum up to the target. The idea is to fix the first coefficient, then generate all possible ways to make the rest of the sum work out. In other words, we can think about the problem recursively. If we list the variables in order as x1, x2, ..., xn, then we can try fixing some particular coefficient for x1, then solving the problem of summing up sum - c_1 x_1 using just x2, ..., xn.
So far this doesn't seem all that fancy - in fact, it's precisely what you're doing above - but there is one trick we can use. As long as we're going to be thinking about this problem recursively, let's think about the problem in the opposite manner. Rather than starting with sum and trying to break it down, what if instead we started with 0 and tried to build up everything that we could?
Here's the idea. Suppose that we already know in advance all the numbers we can make using just sums of x1. Then for every number k between 0 and sum, inclusive, we can make k out of x2 and x1 whenever there is some coefficient c2 such that k - c2 x2 is something that can be made out of combinations of x1. But since we've precomputed this, we can just iterate up over all possible legal values of c2, compute k - c2 x2, and see if we know how to make it. Assuming we store a giant (U + 1) x (k + 1) table of boolean values such that table entry [x, y] stores "can we sum up some combination of the first y values in a way that adds up to precisely x?", we can fill in the table efficiently. This is called dynamic programming and is a powerful algorithmic tool.
More concretely, here's how this might work. Given k variables, create a (U + 1) x (k + 1) table T of values. Then, set T[0][0] = true and T[x][0] = false for all x > 0. The rationale here is that T[0][0] means "can we get the sum zero using a linear combination of the first zero variables?" and the answer is definitely yes (the empty sum is zero!), but any other sum made as a linear combination of no variables is definitely impossible.
Now, for i = 1 .. k, we'll try to fill in the values of T[x][i]. Remember that T[x][i] means "can we make x as a linear combination of the first i variables?" Well, we know that we can do this if there is some coefficient c such that x - c x_i can be made using a linear combination of x1, x2, ..., x_{i-1}. But for any c, that's just whether T[x - c x_i][i - 1] is true. Thus we can say
for i = 1 to k:
    for z = 0 to sum:
        for c = 0 to z / x_i:
            if T[z - c * x_i][i - 1] is true:
                set T[z][i] to true
Inspecting the loops, we see that the outer loop runs k times, the inner loop runs sum times per iteration, and the innermost loop also runs at most sum times per iteration. Their product is (using our notation from above) O(U^2 k), which is way better than the O(U^k) algorithm that you had originally.
But how do you use this information to list off all of the possible ways to sum up to the target? The trick here is to realize that you can use the table to avoid wasting a huge amount of effort searching over every possible combination when many of them aren't going to work.
Let's see an example. Suppose that we have this table completely computed and want to list off all solutions. One idea is to think about listing all solutions where the coefficient of the last variable is zero, then when the last variable is one, etc. The issue with the approach you had before is that for some coefficients there might not be any solutions at all. But with the table we have constructed above, we can prune out those branches. For example, suppose that we want to see if there are any solutions that start with xk having coefficient 0. This means that we're asking if there are any ways to sum up a linear combination of the first k - 1 variables so that the sum of those values is sum. This is possible if and only if T[sum][k - 1] is true. If it is true, then we can recursively try assigning coefficients to the rest of the values in a way that sums up to sum. If not, then we skip this coefficient and go on to the next.
Recursively, this looks something like this:
function RecursivelyListAllThatWork(k, sum) // Using last k variables, make sum

    /* Base case: If we've assigned all the variables correctly, list this
     * solution.
     */
    if k == 0:
        print what we have so far
        return

    /* Recursive step: Try all coefficients, but only if they work. */
    for c = 0 to sum / x_k:
        if T[sum - c * x_k][k - 1] is true:
            mark the coefficient of x_k to be c
            call RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            unmark the coefficient of x_k
This recursively will list all the solutions that work, using the values in the table we just constructed to skip a huge amount of wasted effort. Once you've built this table, you could divvy this work up by farming out the task to multiple computers, having them each list a subset of the total solutions, and processing them all in parallel.
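Here is a rough, self-contained Python sketch of the two pieces above, the table fill plus the pruned recursive listing (the names and structure are mine, using the 20/5/10 example from the question):

def list_combinations(values, total):
    """T[z][i] says whether z can be written as a non-negative integer combination of the
    first i values; the recursion then only explores coefficients the table says can work."""
    k = len(values)
    T = [[False] * (k + 1) for _ in range(total + 1)]
    T[0][0] = True
    for i in range(1, k + 1):
        x = values[i - 1]
        for z in range(total + 1):
            for c in range(0, z // x + 1):
                if T[z - c * x][i - 1]:
                    T[z][i] = True
                    break
    solutions, coeffs = [], [0] * k
    def recurse(i, remaining):            # uses the first i values to make `remaining`
        if i == 0:
            solutions.append(list(coeffs))
            return
        x = values[i - 1]
        for c in range(remaining // x + 1):
            if T[remaining - c * x][i - 1]:   # prune branches the table rules out
                coeffs[i - 1] = c
                recurse(i - 1, remaining - c * x)
        coeffs[i - 1] = 0
    if T[total][k]:
        recurse(k, total)
    return solutions

print(list_combinations([20, 5, 10], 100))   # coefficient triples in the same order as i, j, k above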
Hope this helps!
By definition, brute force algorithms are stupid. You'd be much better off with a more clever algorithm (if you have one). A better algorithm will reduce the work that has to be done, hopefully to a degree that you can do it without needing to "scale out" to multiple machines.
Regardless of algorithm, there comes a point when the amount of data or computation power required is so big that you will need use something like Hadoop. But usually, we are really talking Big Data here. You can already do a lot with a single PC these days.
The algorithm to solve this issue is close to the process we learn for manual mathematical division, or to converting from decimal to another base like octal or hexadecimal - except that those two examples only look for a single canonical solution.
To be sure the recursion ends, it is important to order the data array. To be efficient and limit the number of recursions, it is also important to start with higher data values.
Concretely, here is a Java recursive implementation for this problem - with a copy of the result vector coeff for each recursion as expected in theory.
import java.util.Arrays;

public class Solver
{
    public static void main(String[] args)
    {
        int target_sum = 100;
        // pre-requisite: sorted values !!
        int[] data = new int[] { 5, 10, 20, 25, 40, 50 };
        // result vector, init to 0
        int[] coeff = new int[data.length];
        Arrays.fill(coeff, 0);
        partialSum(data.length - 1, target_sum, coeff, data);
    }

    private static void printResult(int[] coeff, int[] data) {
        for (int i = coeff.length - 1; i >= 0; i--) {
            if (coeff[i] > 0) {
                System.out.print(data[i] + " * " + coeff[i] + " ");
            }
        }
        System.out.println();
    }

    private static void partialSum(int k, int sum, int[] coeff, int[] data) {
        int x_k = data[k];
        for (int c = sum / x_k; c >= 0; c--) {
            coeff[k] = c;
            if (c * x_k == sum) {
                printResult(coeff, data);
                continue;
            } else if (k > 0) {
                // contextual result in parameters, local to method scope
                int[] newcoeff = Arrays.copyOf(coeff, coeff.length);
                partialSum(k - 1, sum - c * x_k, newcoeff, data);
                // for loop on "c" goes on with previous coeff content
            }
        }
    }
}
But note that this code is in a special case: the last value tested for each coeff is 0, so the copy is not actually necessary.
As a complexity estimation, we can use the maximum depth of recursive calls, data.length * min({ data }). For sure, it will not scale well, and the limiting factor is the stack memory (the -Xss JVM option). The code may fail with a stack overflow error for a large data set.
To avoid this drawback, a "derecursion" process is useful. It consists of replacing the method call stack with a programmatic stack that stores execution contexts to process later. Here is the code for that:
import java.util.Arrays;
import java.util.ArrayDeque;
import java.util.Queue;

public class NonRecursive
{
    // pre-requisite: sorted values !!
    private static final int[] data = new int[] { 5, 10, 20, 25, 40, 50 };

    // Context to store intermediate computation or a solution
    static class Context {
        int k;
        int sum;
        int[] coeff;
        Context(int k, int sum, int[] coeff) {
            this.k = k;
            this.sum = sum;
            this.coeff = coeff;
        }
    }

    private static void printResult(int[] coeff) {
        for (int i = coeff.length - 1; i >= 0; i--) {
            if (coeff[i] > 0) {
                System.out.print(data[i] + " * " + coeff[i] + " ");
            }
        }
        System.out.println();
    }

    public static void main(String[] args)
    {
        int target_sum = 100;
        // result vector, init to 0
        int[] coeff = new int[data.length];
        Arrays.fill(coeff, 0);
        // queue with contexts to process
        Queue<Context> contexts = new ArrayDeque<Context>();
        // initial context
        contexts.add(new Context(data.length - 1, target_sum, coeff));
        while (!contexts.isEmpty()) {
            Context current = contexts.poll();
            int x_k = data[current.k];
            for (int c = current.sum / x_k; c >= 0; c--) {
                current.coeff[current.k] = c;
                int[] newcoeff = Arrays.copyOf(current.coeff, current.coeff.length);
                if (c * x_k == current.sum) {
                    printResult(newcoeff);
                    continue;
                } else if (current.k > 0) {
                    contexts.add(new Context(current.k - 1, current.sum - c * x_k, newcoeff));
                }
            }
        }
    }
}
From my point of view, it is difficult to be more efficient in a single thread execution - the stack mechanism now requires coeff array copies.
Is there an algorithm for figuring out the following things?
If the result of a division is a repeating decimal (in binary).
If it repeats, at what digit (represented as a power of 2) does the repetition start?
What digits repeat?
Some examples:
1/2 = 1/10 = 0.1 // 1 = false, 2 = N/A, 3 = N/A, 4 = N/A
1/3 = 1/11 = 0.010101... // 1 = true, 2 = -2, 3 = 10
2/3 = 10/11 = 0.101010... // 1 = true, 2 = -1, 3 = 10
4/3 = 100/11 = 1.010101... // 1 = true, 2 = 0, 3 = 10
1/5 = 1/101 = 0.001100110011... // 1 = true, 2 = -3, 3 = 1100
Is there a way to do this? Efficiency is a big concern. A description of the algorithm would be preferred over code, but I'll take what answer I can get.
It's also worth noting that the base isn't a big deal; I can convert the algorithm over to binary (or if it's in, say base 256 to use chars for ease, I could just use that). I say this because if you're explaining it might be easier for you to explain in base 10 :).
If the divisor is not a power of 2 (in general, if it contains prime factors not shared with the base of representation), the expansion repeats.
The repeat cycle length will be driven by the largest prime factor of the divisor (but is not connected with the length of the representation of that factor -- see 1/7 in decimal), and the first cycle length may differ from the repeat unit (e.g. 11/28 = 1/4 + 1/7 in decimal).
The actual cycle digits will depend on the numerator.
I can give a hint - repeating decimals in base ten are exactly the fractions whose denominator has at least one prime factor other than two and five. If the denominator contains no prime factors two or five, the fraction can always be rewritten with a denominator of all nines. Then the numerator is the repeating part and the number of nines is the length of the repeating part.
3     _
- = 0.3
9

1   142857     ______
- = ------ = 0.142857
7   999999
If there are prime factors two or five in the denominator, the repeating part does not start at the first position.
17   17         ______
-- = ----- = 0.4857142
35   5 * 7
But I cannot remember how to derive the non-repeating part and its length.
This seems to translate well to base two. Only fractions with a power-of-two denominator are non-repeating. This can be easily checked by asserting that only a single bit in the denominator is set.
1/2 = 1/10 = 0.1
1/4 = 1/100 = 0.01
3/4 = 11/100 = 0.11
5/8 = 101/1000 = 0.101
All fractions with odd denominators should be repeating, and the pattern and its length can be obtained by expressing the fraction with a denominator of the form 2^n-1.
                           __
1/3 = 1/(2^2-1) = 1/11 = 0.01

                            __
2/3 = 2/(2^2-1) = 10/11 = 0.10

                    __
4/3 => 1 + 1/3 => 1.01

                      __
10/3 => 3 + 1/3 => 11.01

                                     ____
1/5 = 3/15 = 3/(2^4-1) = 11/1111 = 0.0011

                                                      ________
11/17 = 165/255 = 165/(2^8-1) = 10100101/11111111 = 0.10100101
As for base ten, I cannot tell how to handle denominators that contain a power of two but are not one themselves - for example 12 = 3 * 2^2.
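Putting the two observations together (factor the power of two out of the denominator, then find the smallest 2^k - 1 the odd part divides), a rough Python sketch - the helper name is mine - looks like this:

from math import gcd

def binary_repetend(num, den):
    """How many leading bits of num/den never repeat, and how long the period is."""
    g = gcd(num, den)
    num, den = num // g, den // g
    n = 0
    while den % 2 == 0:        # factor out 2^n: these leading bits never repeat
        den //= 2
        n += 1
    if den == 1:
        return n, 0            # terminating expansion: no repeating part
    k, r = 1, 2 % den
    while r != 1:              # smallest k with 2^k == 1 (mod odd part),
        r = r * 2 % den        # i.e. the odd part divides 2^k - 1
        k += 1
    return n, k

print(binary_repetend(1, 3))   # (0, 2): repeats from the start with period 2
print(binary_repetend(1, 5))   # (0, 4): the 0011 pattern
print(binary_repetend(1, 12))  # (2, 2): two non-repeating bits, then period 2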
First of all, one of your examples is wrong. The repeating part of 1/5 is 0011 rather than 1100, and it begins at the very beginning of the fractional part.
A repeating decimal is something like:
a/b = c + d * (2^-n + 2^(-n-k) + 2^(-n-2k) + ...)
    = c + 2^-n * d / (1 - 2^-k)
in which n and d are what you want.
For example,
1/10(dec) = 1/1010(bin) = 0.0001100110011... // 1 = true, 2 = -1, 3 = 0011
could be represented by the formula with
a = 1, b = 10(dec), c = 0, d = 0.0011(bin), n = 1, k = 4;
(1 - 2^-k) = 0.1111
Therefore, 1/10 = 0.1 * 0.0011/0.1111. The key part of a repeating representation is generated by dividing by (2^k - 1), or by any power-of-two multiple of it. So you can either find a way to express your denominator in that form (like building constant tables), or do a big-number division (which is relatively slow) and find the loop. There's no quick way to do this.
Check out decimal expansion, and specifically about the period of a fraction.
You can do a long division, noting the remainders. The structure of the remainders will give you the structure of any rational decimal:
the last remainder is zero: it is a decimal without any repeating part
the first and the last remainder are equal: the decimal is repeating right after the dot
the digits between the first remainder and the first remainder that equals the last one are the non-repeating part; the rest is the repeating part
In general the distances will give you the amount of digits for each part.
You can see this algorithm coded in C++ in the method decompose() here.
Try 228142/62265, it has a period of 1776 digits!
To find the repeating pattern, just keep track of the values you use along the line:
1/5 = 1/101:
1 < 101 => 0
(decimal separator here)
10 < 101 => 0
100 < 101 => 0
1000 >= 101 => 1
1000 - 101 = 11
110 >= 101 => 1
110 - 101 = 1
10 -> match
As you reach the same value as you had at the second bit, the process will just repeat from that point producing the same bit pattern over and over. You have the pattern "0011" repeating from the second bit (first after decimal separator).
If you want the pattern to start with a "1", you can just rotate it until it matches that condition:
"0011" from the second bit
"0110" from the third bit
"1100" from the fourth bit
Edit:
Example in C#:
void FindPattern(int n1, int n2) {
    int digit = -1;
    while (n1 >= n2) {
        n2 <<= 1;
        digit++;
    }
    Dictionary<int, int> states = new Dictionary<int, int>();
    bool found = false;
    while (n1 > 0 || digit >= 0) {
        if (digit == -1) Console.Write('.');
        n1 <<= 1;
        if (states.ContainsKey(n1)) {
            Console.WriteLine(digit >= 0 ? new String('0', digit + 1) : String.Empty);
            Console.WriteLine("Repeat from digit {0} length {1}.", states[n1], states[n1] - digit);
            found = true;
            break;
        }
        states.Add(n1, digit);
        if (n1 < n2) {
            Console.Write('0');
        } else {
            Console.Write('1');
            n1 -= n2;
        }
        digit--;
    }
    if (!found) {
        Console.WriteLine();
        Console.WriteLine("No repeat.");
    }
}
Called with your examples it outputs:
.1
No repeat.
.01
Repeat from digit -1 length 2.
.10
Repeat from digit -1 length 2.
1.0
Repeat from digit 0 length 2.
.0011
Repeat from digit -1 length 4.
As others have noted, the answer involves a long division.
Here is a simple python function which does the job:
def longdiv(numerator, denominator):
    digits = []
    remainders = [0]
    n = numerator
    while n not in remainders:            # until repeated remainder or no remainder
        remainders.append(n)              # add remainder to collection
        digits.append(n//denominator)     # add integer division to result
        n = n%denominator * 10            # remainder*10 for next iteration
    # Result
    result = list(map(str,digits))        # convert digits to strings
    result = ''.join(result)              # combine list to string
    if not n:
        result = result[:1]+'.'+result[1:]   # Insert . into string
    else:
        recurring = remainders.index(n)-1    # first recurring digit
        # Insert '.' and then surround recurring part in brackets:
        result = result[:1]+'.'+result[1:recurring]+'['+result[recurring:]+']'
    return result
print(longdiv(31,8)) # 3.875
print(longdiv(2,13)) # 0.[153846]
print(longdiv(13,14)) # 0.9[285714]
It’s heavily commented, so it shouldn’t be too hard to write in other languages, such as JavaScript.
The most important parts, as regards recurring decimals are:
keep a collection of remainders; the first remainder of 0 is added as a convenience for the next step
divide, noting the integer quotient and the remainder
if the new remainder is 0 you have a terminating decimal
if the new remainder is already in the collection, you have a recurring decimal
repeat, adlib and fade etc
The rest of the function is there to format the results.
Is there a fast algorithm, similar to the one for powers of 2, which can be used with 3, i.e. n % 3?
Perhaps something that uses the fact that if the sum of digits is divisible by three, then the number is also divisible.
This leads to the next question. What is a fast way to add the digits of a number? I.e. 37 -> 3 + 7 -> 10
I am looking for something that does not have conditionals, as those tend to inhibit vectorization.
Thanks
4 % 3 == 1, so (4^k * a + b) % 3 == (a + b) % 3. You can use this fact to evaluate x%3 for a 32-bit x:
x = (x >> 16) + (x & 0xffff);
x = (x >> 10) + (x & 0x3ff);
x = (x >> 6) + (x & 0x3f);
x = (x >> 4) + (x & 0xf);
x = (x >> 2) + (x & 0x3);
x = (x >> 2) + (x & 0x3);
x = (x >> 2) + (x & 0x3);
if (x == 3) x = 0;
(Untested - you might need a few more reductions.) Is this faster than your hardware can do x%3? If it is, it probably isn't by much.
This comp.compilers item has a specific recommendation for computing modulo 3.
An alternative, especially if the maximum size of the dividend is modest, is to multiply by the reciprocal of 3 as a fixed-point value, with enough bits of precision to handle the maximum size dividend to compute the quotient, and then subtract 3*quotient from the dividend to get the remainder. All of these multiplies can be implemented with a fixed sequence of shifts-and-adds. The number of instructions will depend on the bit pattern of the reciprocal. This works pretty well when the dividend max is modest in size.
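As a quick illustration of that reciprocal idea (just a sketch, not the comp.compilers recipe itself): for 32-bit unsigned values, one standard fixed-point reciprocal of 3 is 0xAAAAAAAB with a shift of 33, and the remainder falls out as n - 3*q. In Python:

MAGIC = 0xAAAAAAAB      # ceil(2**33 / 3): a fixed-point reciprocal of 3 for 32-bit values

def mod3(n):
    """Remainder mod 3 via multiply-by-reciprocal, shift, and subtract (0 <= n < 2**32)."""
    q = (n * MAGIC) >> 33        # quotient n // 3
    return n - 3 * q             # remainder

assert all(mod3(n) == n % 3 for n in (0, 1, 2, 5, 100, 12345, 2**32 - 1))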
Regarding adding digits in the number... if you want to add the decimal digits, you're going to end up doing what amounts to a number-conversion-to-decimal, which involves divide by 10 somewhere. If you're willing to settle for adding up the digits in base2, you can do this with an easy shift-right and add loop. Various clever tricks can be used to do this in chunks of N bits to speed it up further.
Not sure for your first question, but for your second, you can take advantage of the % operator and integer division:
int num = 12345;
int sum = 0;
while (num) {
sum += num % 10;
num /= 10;
}
This works because 12345 % 10 = 5, 12345 / 10 = 1234 and keep going until num == 0
If you are happy with 1 byte integer division, here's a trick. You could extend it to 2 bytes, 4 bytes, etc.
Division is essentially multiplication by 0.3333. If you want to simulate floating-point arithmetic then you need the closest approximation at the 256 (decimal) boundary. This is 85, because 85 / 256 = 0.332. So if you multiply your value by 85, you should get a value close to the result in the high 8 bits.
Multiplying a value with 85 fast is easy. n * 85 = n * 64 + n * 16 + n * 4 + n. Now all these factors are powers of 2 so you can calculate n * 4 by shifting, then use this value to calculate n * 16, etc. So you have max 5 shifts and 4 additions.
As said, this'll give you approximation. To know how good it is you'll need to check the lower byte of the next value using this rule
n ... is the byte you want to divide (so n*85 is a 16-bit value)
approx = HI(n*85)
if LO(n*85)>LO((n+1)*85)THEN approx++
And that should do the trick.
Example 1:
3 / 3 =?
3 * 85 = 00000000 11111111 (approx=0)
4 * 85 = 00000001 01010100 (LO(3*85)>LO(4*85)=>approx=1)
result approx=1
Example 2:
254 / 3
254 * 85 = 01010100 01010110 (approx=84)
255 * 85 = 01010100 10101011 (LO(254*85)<LO(255*85), don't increase)
result approx=84
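The rule is easy to sanity-check exhaustively in a few lines of Python (the shifts and masks below stand in for taking the high and low bytes; the helper name is mine):

def div3_approx(n):
    """n // 3 for a byte value via the 85/256 approximation plus the low-byte correction."""
    approx = (n * 85) >> 8                               # HI(n * 85)
    if ((n * 85) & 0xFF) > (((n + 1) * 85) & 0xFF):      # LO(n*85) > LO((n+1)*85) => approx++
        approx += 1
    return approx

assert all(div3_approx(n) == n // 3 for n in range(256))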
If you're dealing with big integers, one very fast method relies on the fact that for all
bases 10 +/- a multiple of 3,
i.e.
4, 7, 10, 13, 16, 19, 22 ... etc.,
a number and its digit sum leave the same remainder mod 3. All you have to do is count the digits by their value mod 3 (which amounts to taking the digit sum mod 3), then take % 3. Something like:
** note : x ^ y is power, not bit-wise XOR,
x ** y being the python equivalent
function mod3(__,_) {
#
# can handle bases
# { 4, 7,10,13,16,19,
# 22,25,28,31,34 } w/o conversion
#
# assuming base digits :
#
# 0-9A-X for any base,
# or 0-9a-f for base-16
return \
(length(__)<=+((_+=++_+_)+_^_)\
&& (__~"^[0-9]+$") )\
? (substr(__,_~_,_+_*_+_)+\
substr(__,++_*_--))%+_\
:\
(substr("","",gsub(\
"[_\3-0369-=CFILORUXcf-~]+","",__))\
+ length(__) \
+ gsub("[258BbEeHKNQTW]","",__))%+_
}
This isn't the fastest method possible, but it's one of the more agile methods.