I'm looking for easiest way to divide two floating point numbers using VHDL. I need the code to be synthesizable (I'll be implementing it on Spartan 3 FPGA).
First operand will always be a fixed number (e.g. 600), and second one will be integer, let's say between 0 and 99999. Fixed number is dividend, and the integer one is divisor. So I'll have to calculate something like this: 600/124.
Or any other number instead of 124, of course that is in range between 0 and 99999. Second number (the one that is changing) will always be integer !! (there won't be something like 123.45).
After division, I need to convert the result into integer (round it up or just ignore numbers after decimal point, which ever is faster).
Any ideas ? Thanks !
There are many ways to do this, with the easiest being a ROM. You don't need floating point anywhere since doing an integer divide and compensating for a non-zero remainder can give you the same results. I'd suggest calculating the first 600 results in MATLAB or a spreadsheet so you can see that handling values up to 99999 isn't necessary.
Also, some common nomenclature for range and precision is QI.F where I is the number of integer bits and F is the number of fractional bits. Thus 0..99999 would be Q17.0 and your output would be Q10.0.
There's an FP divide function in this VHDL file from this site.
Related
Let's say that we have a random number generator that can generate random 32 or 64 bit integers (like rand.Rand in the standard library)
Generating a random int64 in a given range [a,b] is fairly easy:
rand.Seed(time.Now().UnixNano())
n := rand.Int63n(b-a) + a
Is it possible to generate random 128 bit decimal (as defined in specification IEEE 754-2008) in a given range from a combination of 32 or 64 bit random integers?
It is possible, but the solution is far from trivial. For a correct solution, there are several things to consider.
For one thing, values with exponent E are 10 times more likely than values with exponent E - 1.
Other issues include subnormal numbers and ranges that straddle zero.
I am aware of the Rademacher Floating-Point Library, which tackled this problem for binary floating-point numbers, but the solution there is complicated and its author has not yet written up how his algorithm works.
EDIT (May 11):
I have now specified an algorithm for generating random "uniform" floating-point numbers—
In any range,
with full coverage, and
regardless of the digit base (such as binary or decimal).
Possible, but by no means easy. Here is a sketch of a solution that might be acceptable — writing and debugging it would probably be at least a day of concerted effort.
Let min and max be primitive.Decimal128 objects from go.mongodb.org/mongo-driver/bson. Let MAXBITS be a multiple of 32; 128 is likely to be adequate.
Get the significand (as big.Int) and exponent (as int) of min and max using the BigInt method.
Align min and max so that they have the same exponent. As far as possible, left-justify the value with the larger exponent by decreasing its exponent and adding a corresponding number of zeroes to the right side of its significand. If this would cause the absolute value of the significand to become >= 2**(MAXBITS-1), then either
(a) Right-shift the value with the smaller exponent by dropping digits from the right side of its significand and increasing its exponent, causing precision loss.
(b) Dynamically increase MAXBITS.
(c) Throw an error.
At this point both exponents will be the same, and both significands will be aligned big integers. Set aside the exponents for now, and let range (a new big.Int) be maxSignificand - minSignificand. It will be between 0 and 2**MAXBITS.
Turn range into MAXBITS/32 uint32s using the Bytes or DivMod methods, whatever is easier.
If the highest word of range is equal to math.MaxUint32 then set a flag limit to false, otherwise true.
For n from 0 to MAXBITS/32:
if limit is true, use rand.Int63n (!, not rand.Int31n or rand.Uint32) to generate a value between 0 and the nth word of range, inclusive, cast it to uint32, and store it as the nth word of the output. If the value generated is equal to the nth word of range (i.e. if we generated the maximum possible random value for this word) then let limit remain true, otherwise set it false.
If limit is false, use rand.Uint32 to generate the nth word of the output. limit remains false regardless of the generated value.
Combine the generated words into a big.Int by building a []byte and using big/Int.SetBytes or multiplication and addition, as convenient.
Add the generated value to minSignificand to obtain the significand of the result.
Use ParseDecimal128FromBigInt with the result significand and the exponent from steps 2-3 to obtain the result.
The heart of the algorithm is step 6, which generates a uniform random unsigned integer of arbitrary length 32 bits at a time. The alignment in step 2 reduces the problem from a floating-point to an integer one, and the subtraction in step 3 reduces it to an unsigned one, so that we only have to think about one bound instead of 2. The limit flag records whether we're still dealing with that bound, or whether we've already narrowed the result down to an interval that doesn't include it.
Caveats:
I haven't written this, let alone tested it. I may have gotten it quite wrong. A sanity check by someone who does more numerical computation work than me would be welcome.
Generating numbers across a large dynamic range (including crossing zero) will lose some precision and omit some possible output values with smaller exponents unless a ludicrously large MAXBITS is used; however, 128 bits should give a result at least as good as a naive algorithm implemented in terms of decimal128.
The performance is probably pretty bad.
Go has a large number package that can do arbitrary length integers: https://golang.org/pkg/math/big/
It has a pseudo random number generator https://golang.org/pkg/math/big/#Int.Rand, and the crypto package also has https://golang.org/pkg/crypto/rand/#Int
You'd want to specify the max using https://golang.org/pkg/math/big/#Int.Exp as 2^128.
Can't speak to performance, though, or whether this is compliant if the IEEE standard, but large random numbers like what you'd use for UUIDs are possible.
It depends how many values you want to generate. If it's enough to have no more 10^34 values in a specified range - it's quite simple.
As I see the problem, a random value in the range min..max can be calculated as random(0..1)*(max-min)+min
Look like we need to generate only decimal128 value in range 0..1. So it's a random value in range 0..10^34-1 with exponent -34. This value can be generated with a golang standard random package.
To multiply, add and substruct float128 values can be used golang math/big package with values normalization.
This is definitely what you are looking for.
Problem:
The numbers from 1 to 10 are given. Put the equal sign(somewhere between
them) and any arithmetic operator {+ - * /} so that a perfect integer
equality is obtained(both the final result and the partial results must be
integer)
Example:
1*2*3*4*5/6+7=8+9+10
1*2*3*4*5/6+7-8=9+10
My first idea to resolve this was using backtracking:
Generate all possibilities of putting operators between the numbers
For one such possibility replace all the operators, one by one, with the equal sign and check if we have two equal results
But this solution takes a lot of time.
So, my question is: Is there a faster solution, maybe something that uses the operator properties or some other cool math trick ?
I'd start with the equals sign. Pick a possible location for that, and split your sequence there. For left and right side independently, find all possible results you could get for each, and store them in a dict. Then match them up later on.
Finding all 226 solutions took my Python program, based on this approach, less than 0.15 seconds. So there certainly is no need to optimize further, is there? Along the way, I computed a total of 20683 subexpressions for a single side of one equation. They are fairly well balenced: 10327 expressions for left hand sides and 10356 expressions for right hand sides.
If you want to be a bit more clever, you can try reduce the places where you even attempt division. In order to allov for division without remainder, the prime factors of the divisor must be contained in those of the dividend. So the dividend must be some product and that product must contain the factors of number by which you divide. 2, 3, 5 and 7 are prime numbers, so they can never be such divisors. 4 will never have two even numbers before it. So the only possible ways are 2*3*4*5/6, 4*5*6*7/8 and 3*4*5*6*7*8/9. But I'd say it's far easier to check whether a given division is possible as you go, without any need for cleverness.
I need to multiply an integer ranging from 0-1023 by 1023 and divide the result by a number ranging from 1-1023 in hardware (verilog/fpga implementation). The multiplication is straight forward since I can probably get away with just shifting 10 bits (and if needed I'll subtract an extra 1023). The division is a little interesting though. Area/power arent't really critical to me (I'm in an FPGA so the resources are already there). Latency (within reason) isn't a big deal so long as I can pipeline the deisgn. There are obviously several choices with different trade offs, but I'm wondering if there's an "obvious" or "no brainer" algorithm for a situation like this. Given the limited range of operands and the abundance of resources that I have (bram etc) I'm wondering if there isn't something obvious to do.
If you can work with fixed point precision rather than integers it may be possible to change :
divide the result by a number ranging from 1-1023
to multiplication by a number ranging from 1 - 1/1023, ie pre-compute the divide and store that as the coefficient for the multiply.
If you can pre-compute everything, and you've got a spare 20x20 multiplier, and some way to store your pre-computed number, then go for Morgan's suggestion. You need to precompute a 20-bit multiplicand (10b quotient, 10b remainder), and multiply by your first 10b number, and take the bottom 30b of the 40b result.
Otherwise, the no-brainer is non-restoring division, since you say that latency isn't important (lots of stuff on the web, most of it incomprehensible). you have a 20-bit numerator (the result of your (1023 x) multiplication), and a 10-bit denominator. This gives a 20b quotient, and a 10b remainder (ie. 20 bits for the integer part of the answer, and 10 bits for the fractional part, giving a 30b answer).
The actual hardware is pretty trivial: an 11b adder/subtractor, a 31b shift register, and a 10b or 11b register to store the divisor. You also need a small FSM to control it (2b). You have to do a compare, add or subtract, and shift in every clock cycle, and you get the answer out in 21 cycles. I think. :)
I have a list in which each item has 2 integer attributes, n and m. I would like to map these two integer attributes to a single new attribute so that when the list is sorted on the new attribute, it is sorted on n first and then ties are broken with m.
I came up with n - 1/m. So the two integers are mapped to a single real number. I think this works. Any better ideas?
That's clever, so I hate to break it to you, but it won't work. Try it (with a computer) using the n=1,000,000,000 and values of m between 999,999,990 and 1,000,000,010. You'll find that n-1/m is the same value for all of those cases.
It would work if floating point numbers had infinite precision, or even if they had twice as much precision as an int (although even there you might run into some issues), but they don't: a double precision floating point number has 53 bits of precision. An integer is (probably) 32 bits, so you'd need at least 64 bits to encode two of them. But then, you could just use a 64-bit (long long) integer, encoding the pair as n*2^32 + m.
A recent homework assignment I have received asks us to take expressions which could create a loss of precision when performed in the computer, and alter them so that this loss is avoided.
Unfortunately, the directions for doing this haven't been made very clear. From watching various examples being performed, I know that there are certain methods of doing this: using Taylor series, using conjugates if square roots are involved, or finding a common denominator when two fractions are being subtracted.
However, I'm having some trouble noticing exactly when loss of precision is going to occur. So far the only thing I know for certain is that when you subtract two numbers that are close to being the same, loss of precision occurs since high order digits are significant, and you lose those from round off.
My question is what are some other common situations I should be looking for, and what are considered 'good' methods of approaching them?
For example, here is one problem:
f(x) = tan(x) − sin(x) when x ~ 0
What is the best and worst algorithm for evaluating this out of these three choices:
(a) (1/ cos(x) − 1) sin(x),
(b) (x^3)/2
(c) tan(x)*(sin(x)^2)/(cos(x) + 1).
I understand that when x is close to zero, tan(x) and sin(x) are nearly the same. I don't understand how or why any of these algorithms are better or worse for solving the problem.
Another rule of thumb usually used is this: When adding a long series of numbers, start adding from numbers closest to zero and end with the biggest numbers.
Explaining why this is good is abit tricky. when you're adding small numbers to a large numbers, there is a chance they will be completely discarded because they are smaller than then lowest digit in the current mantissa of a large number. take for instance this situation:
a = 1,000,000;
do 100,000,000 time:
a += 0.01;
if 0.01 is smaller than the lowest mantissa digit, then the loop does nothing and the end result is a == 1,000,000
but if you do this like this:
a = 0;
do 100,000,000 time:
a += 0.01;
a += 1,000,000;
Than the low number slowly grow and you're more likely to end up with something close to a == 2,000,000 which is the right answer.
This is ofcourse an extreme example but I hope you get the idea.
I had to take a numerics class back when I was an undergrad, and it was thoroughly painful. Anyhow, IEEE 754 is the floating point standard typically implemented by modern CPUs. It's useful to understand the basics of it, as this gives you a lot of intuition about what not to do. The simplified explanation of it is that computers store floating point numbers in something like base-2 scientific notation with a fixed number of digits (bits) for the exponent and for the mantissa. This means that the larger the absolute value of a number, the less precisely it can be represented. For 32-bit floats in IEEE 754, half of the possible bit patterns represent between -1 and 1, even though numbers up to about 10^38 are representable with a 32-bit float. For values larger than 2^24 (approximately 16.7 million) a 32-bit float cannot represent all integers exactly.
What this means for you is that you generally want to avoid the following:
Having intermediate values be large when the final answer is expected to be small.
Adding/subtracting small numbers to/from large numbers. For example, if you wrote something like:
for(float index = 17000000; index < 17000001; index++) {}
This loop would never terminate becuase 17,000,000 + 1 is rounded down to 17,000,000.
If you had something like:
float foo = 10000000 - 10000000.0001
The value for foo would be 0, not -0.0001, due to rounding error.
My question is what are some other
common situations I should be looking
for, and what are considered 'good'
methods of approaching them?
There are several ways you can have severe or even catastrophic loss of precision.
The most important reason is that floating-point numbers have a limited number of digits, e.g..doubles have 53 bits. That means if you have "useless" digits which are not part of the solution but must be stored, you lose precision.
For example (We are using decimal types for demonstration):
2.598765000000000000000000000100 -
2.598765000000000000000000000099
The interesting part is the 100-99 = 1 answer. As 2.598765 is equal in both cases, it
does not change the result, but waste 8 digits. Much worse, because the computer doesn't
know that the digits is useless, it is forced to store it and crams 21 zeroes after it,
wasting at all 29 digits. Unfortunately there is no way to circumvent it for differences,
but there are other cases, e.g. exp(x)-1 which is a function occuring very often in physics.
The exp function near 0 is almost linear, but it enforces a 1 as leading digit. So with 12
significant digits
exp(0.001)-1 = 1.00100050017 - 1 = 1.00050017e-3
If we use instead a function expm1(), use the taylor series:
1 + x +x^2/2 +x^3/6 ... -1 =
x +x^2/2 +x^3/6 =: expm1(x)
expm1(0.001) = 1.00500166667e-3
Much better.
The second problem are functions with a very steep slope like tangent of x near pi/2.
tan(11) has a slope of 50000 which means that any small deviation caused by rounding errors
before will be amplified by the factor 50000 ! Or you have singularities if e.g. the result approaches 0/0, that means it can have any value.
In both cases you create a substitute function, simplying the original function. It is of no use to highlight the different solution approaches because without training you will simply not "see" the problem in the first place.
A very good book to learn and train: Forman S. Acton: Real Computing made real
Another thing to avoid is subtracting numbers that are nearly equal, as this can also lead to increased sensitivity to roundoff error. For values near 0, cos(x) will be close to 1, so 1/cos(x) - 1 is one of those subtractions that you'd like to avoid if possible, so I would say that (a) should be avoided.