Why is gcc evaluating floating point fractional #defines as zero? - gcc

How is (float)_micron_to_meter equal to zero? Given:
//convert microns to meters via multiplication
#define _micron_to_meter (1 / 1000000)
And yet
printf("factor: %f\n", (float)_micron_to_meter);
prints out 0.000000.
Oh.. duh... I can't believe I forgot this. Answer below. I hesitate to post this, but I did pretty extensive searches here and couldn't find any other post about it. And there must be some other programmer who will be as clueless as I was being here. If you find another, feel free to mark this as a duplicate, but check why that post wasn't found?

Note to any future C programmers (and I /knew/ this, I'm not sure how I missed it). If you have a fractional define, made up of integers, e.g.
//convert microns to meters via multiplication
#define _micron_to_meter (1 / 1000000)
then the value is zero. It's already zero before it makes it out of the (). Both 1 and 1000000 are integer constants, so the compiler performs integer division, which discards the fractional part: 1 / 1000000 is 0. Casting the result to float afterwards only converts that 0 to 0.0. If you do this:
#define _micron_to_meter (1.0 / 1000000)
Then it works as expected, because 1.0 is assumed to be floating point.
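A minimal C sketch of the difference, in case it helps to see both forms side by side. The macro and function names are made up for illustration and are not from the original post.

```c
/* Hypothetical names, for illustration only. */
#define MICRON_TO_METER_INT (1 / 1000000)    /* int / int: truncates to 0 */
#define MICRON_TO_METER_DBL (1.0 / 1000000)  /* double / int: about 1e-06 */

/* Convert a length in microns to meters using the floating-point factor. */
double microns_to_meters(double microns)
{
    return microns * MICRON_TO_METER_DBL;
}
```

Writing the divisor as 1000000.0 as well makes the intent explicit, although one floating-point operand is enough to force floating-point division.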


Best way finding the middle point

In many algorithms I have seen people use two different ways to get the middle point.
(LOW+HIGH)/2
LOW+(HIGH-LOW)/2
Mostly I have seen the 2nd method, for example in QuickSort. What is the best way to find the middle between two numbers, and why?
It entirely depends on the context, but I'll make a case for case number 2 and explain why.
Let's first assume you go for case number 1, which is this:
(LOW + HIGH) / 2
This looks entirely reasonable, and mathematically, it is. Let's plug in two numbers and look at the results:
(12345 + 56789) / 2
The result is 34567. Looks OK doesn't it?
Now, the problem is that in computers it's not as simple as that. You also have something called data types to contend with. Usually this is denoted with such things as number of bits. In other words, you might have a 32-bit number, or a 16-bit number, or a 64-bit number, and so on.
All of these have what is known as a legal range of values, ie. "what kind of values will these types hold". An 8-bit number, unsigned (which means it cannot be negative) can hold 2 to the power of 8 different values, or 256. A 16-bit unsigned value can hold 65536 values, or a range from 0 to 65535. If the values are signed they go from -half to +half-1, meaning it will go from -128 to +127 for a 8-bit signed value, and from -32768 to +32767 for a signed 16-bit value.
So now we go back to the original formula. What if the data type we're using for the calculation isn't enough to hold LOW + HIGH ?
For instance, let's say we used 16-bit unsigned values for this, and we still got this expression:
(12345 + 56789) / 2
12345 can be held in a 16-bit value (it is less than 65536), same with 56789, but what about the result? The result of adding 12345 and 56789 is 69134 which is more than 65535 (the highest unsigned 16-bit value).
So what will happen to it? There are two possible outcomes:
It will overflow, which means it will start back at 0 and count upwards, which means it will actually end up with the result of 3598: (12345 + 56789) - 65536
It will throw an exception or similar, crashing the program with an overflow problem.
If we get the first result, then (12345 + 56789)/2 becomes 3598/2 or 1799. Clearly incorrect.
So, then what if we used the other approach:
12345 + (56789-12345)/2
First, let's do the parentheses: 56789-12345 is equal to 44444, a number that can be held in a 16-bit data type.
Dividing 44444 by 2 gives us 22222. (When the difference is odd, integer division typically drops the .5, unless your particular world rounds up.)
Adding 12345 + 22222 gives us 34567, the same as the mathematically correct result above, and no intermediate value ever exceeded 65535.
So the reason the second expression is chosen above the first is that it doesn't have to handle overflow the same way and is more resilient to this kind of problem.
In fact, if you think about it, as long as X is not larger than Y, the largest value this kind of expression can reach is Y itself:
X + (Y-X)
... and since both X and Y are the same data type, Y already fits in that data type. Basically, it will not have to contend with an overflow at all.
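The worked numbers above can be reproduced with a small C sketch using 16-bit unsigned values. The function names are hypothetical, and the safe version assumes low is not larger than high.

```c
#include <stdint.h>

/* Naive midpoint: the sum is truncated back to 16 bits, so it can wrap. */
uint16_t mid_naive(uint16_t low, uint16_t high)
{
    uint16_t sum = (uint16_t)(low + high);  /* wraps modulo 65536 */
    return (uint16_t)(sum / 2);
}

/* Overflow-safe midpoint, assuming low <= high. */
uint16_t mid_safe(uint16_t low, uint16_t high)
{
    return (uint16_t)(low + (high - low) / 2);
}
```

With low = 12345 and high = 56789, mid_naive returns 1799 while mid_safe returns 34567.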
The 2nd method is used to avoid integer overflow during the computation.
Imagine you use just a 1-byte unsigned integer, so overflow happens once a value goes past 255. Say we have low=100 and high=200. Compare the two computations:
1. (lo + hi) / 2 = (100 + 200) / 2 = 300 / 2; // 300 > 255, int overflow
2. lo + (hi - lo) / 2 = 100 + (200 - 100) / 2 = 150; // no overflow
There isn't a best one, but they are clearly different.
(LOW+HIGH)/2 causes trouble if the addition overflows, both in the signed and unsigned case. Doesn't have to be an issue, if you can assume that it never happens it's a perfectly reasonable thing to do.
LOW+(HIGH-LOW)/2 can still cause trouble with overflows if LOW is allowed to be arbitrarily negative. For non-negative input it's fine though. Costs an extra operation.
(LOW+HIGH) >>> 1 as seen in Java. For non-negative but signed input, overflow is non-destructive. The sign bit is used as an extra bit of space that the addition can carry into, the shift then divides the unsigned result by two. Does not work at all if the result is supposed to be negative, because by construction the result is non-negative. For array indexes that isn't really a consideration. Doesn't help if you were working with unsigned indexes though.
MOV MID, LOW \ ADD MID, HIGH \ RCR MID in pseudo x86 asm, explicitly uses an extra bit of space and therefore works for all unsigned input no matter what, but cannot be used in most languages.
They all have their upsides and downsides. There is no winner.
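C has no >>> operator, but the same idea as the Java variant can be sketched by doing the addition in unsigned arithmetic. This is only an illustration with a made-up function name. It assumes both inputs are non-negative and that unsigned int has one more value bit than int, which holds on mainstream platforms.

```c
#include <limits.h>

/* Midpoint of two non-negative ints, assuming 0 <= low <= high. */
int mid_unsigned_shift(int low, int high)
{
    /* The sum of two non-negative ints fits in unsigned int, so it cannot wrap;
       the logical shift then halves it. */
    return (int)(((unsigned)low + (unsigned)high) >> 1);
}
```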
The best way is dependent on what you're trying to accomplish. The first is clearly faster (best for performance), while the second one is used to avoid overflow (best for correctness). So the answer to your question is dependent on your definition of "best".

ruby issue with float point iteration [duplicate]

When I run the code
count = 0
while count < 1
count += 0.1
puts count
end
I would expect
0.1
0.2
0.3
. . .
I have however been getting
0.1
0.2
0.30000000000000004
0.4
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999
1.0999999999999999
can anyone help explain this?
Think of it this way:
Your computer only has 32 or 64 bits to represent a number. That means it can only represent a finite amount of numbers.
Now consider all the decimal values between 0 and 1. There is an infinite amount of them. How can you possibly represent all Real Numbers if your machine can't even represent all the numbers between 0 and 1?
The answer is that your machine needs to approximate decimal numbers. This is what you are seeing.
Of course there are libraries that try to overcome these limitations and make it so that you can still accurately represent decimal numbers. One such library is BigDecimal:
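require 'bigdecimal'
# BigDecimal.new was removed in newer Ruby versions; BigDecimal() is the constructor
count = BigDecimal("0")
while count < 1
  # build the step from a string so no binary Float is involved
  count += BigDecimal("0.1")
  puts count.to_s('F')
end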
The downfall is that these libraries are generally slower at arithmetic, because they are a software layer above the CPU doing these calculations.
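Where a decimal library is too heavy, a common alternative is to count in integers and divide once. The sketch below is in C rather than Ruby, and the function names are invented for illustration. It assumes IEEE 754 double arithmetic, which is also what Ruby's Float uses on typical machines.

```c
/* Add 0.1 ten times, the way the loop in the question does. */
double sum_tenths(void)
{
    double count = 0.0;
    for (int i = 0; i < 10; i++)
        count += 0.1;   /* each addition rounds, and the errors accumulate */
    return count;
}

/* Count in whole tenths and divide once, so only one rounding happens. */
double tenths_from_int(int tenths)
{
    return tenths / 10.0;
}
```

Under that assumption sum_tenths() ends up slightly below 1.0, matching the 0.9999999999999999 in the question, while tenths_from_int(10) is exactly 1.0.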
Floating-point numbers cannot precisely represent all real numbers, and floating-point operations cannot precisely represent true arithmetic operations. This leads to many surprising situations.
I advise reading: https://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
You may want to use BigDecimal to avoid such problems.
This is one of the many consequences of how floating point numbers are represented in memory!
Explaining exactly what is happening would take very long, and other people have already done it better, so the best thing for you would be to read about it elsewhere:
The very good What Every Computer Scientist Should Know About Floating-Point Arithmetic (article from 1991, reprint on oracle)
Wikipedia pages Floating point and IEEE floating point
IEEE 754 References
You can also have a look at those previous questions on SO:
What is a simple example of floating point/rounding error?
Ruby BigDecimal sanity check (floating point newb)
Strange output when using float instead of double

A clever homebrew modulus implementation

I'm programming a PLC with some legacy software (RSLogix 500, don't ask) and it does not natively support a modulus operation, but I need one. I do not have access to: modulus, integer division, local variables, a truncate operation (though I can hack it with rounding). Furthermore, all variables available to me are laid out in tables sorted by data type. Finally, it should work for floating point decimals, for example 12345.678 MOD 10000 = 2345.678.
If we make our equation:
dividend / divisor = integer quotient, remainder
There are two obvious implementations.
Implementation 1:
Perform floating point division: dividend / divisor = decimal quotient. Then hack together a truncation operation so you find the integer quotient. Multiply it by the divisor and find the difference between the dividend and that, which results in the remainder.
I don't like this because it involves a bunch of variables of different types. I can't 'pass' variables to a subroutine, so I just have to allocate some of the global variables located in multiple different variable tables, and it's difficult to follow. Unfortunately, 'difficult to follow' counts, because it needs to be simple enough for a maintenance worker to mess with.
Implementation 2:
Create a loop such that while dividend >= divisor, dividend = dividend - divisor; whatever is left in dividend is the remainder. This is very clean, but it violates one of the big rules of PLC programming, which is to never use loops, since if someone inadvertently modifies an index counter you could get stuck in an infinite loop and machinery would go crazy or irrecoverably fault. Plus loops are hard for maintenance to troubleshoot. Plus, I don't even have looping instructions, I have to use labels and jumps. Eww.
So I'm wondering if anyone has any clever math hacks or smarter implementations of modulus than either of these. I have access to + - * /, exponents, sqrt, trig functions, log, abs value, and AND/OR/NOT/XOR.
How many bits are you dealing with? You could do something like:
if dividend >= 32 * divisor dividend -= 32 * divisor
if dividend >= 16 * divisor dividend -= 16 * divisor
if dividend >= 8 * divisor dividend -= 8 * divisor
if dividend >= 4 * divisor dividend -= 4 * divisor
if dividend >= 2 * divisor dividend -= 2 * divisor
if dividend >= 1 * divisor dividend -= 1 * divisor
remainder = dividend
Just unroll as many times as there are bits in dividend. Make sure to be careful about those multiplies overflowing. This is just like your #2 except it takes log(n) instead of n iterations, so it is feasible to unroll completely.
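As a rough illustration, the unrolled subtraction can be written in C for floating-point values. This is a sketch with a made-up function name, not PLC code. With six steps it only covers non-negative dividends smaller than 64 times the divisor, and the divisor must be positive.

```c
/* Remainder of dividend / divisor without loops, division, or truncation.
   Assumes divisor > 0 and 0 <= dividend < 64 * divisor. */
double mod_unrolled(double dividend, double divisor)
{
    if (dividend >= 32 * divisor) dividend -= 32 * divisor;
    if (dividend >= 16 * divisor) dividend -= 16 * divisor;
    if (dividend >=  8 * divisor) dividend -=  8 * divisor;
    if (dividend >=  4 * divisor) dividend -=  4 * divisor;
    if (dividend >=  2 * divisor) dividend -=  2 * divisor;
    if (dividend >=      divisor) dividend -=      divisor;
    return dividend;   /* what is left is the remainder */
}
```

Each extra line doubles the supported range, so the number of rungs grows with the logarithm of the largest quotient expected.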
If you don't mind overly complicating things and wasting computer time you can calculate modulus with periodic trig functions:
atan(tan(( 12345.678 -5000)*pi/10000))*10000/pi+5000 = 2345.678
Seriously though, subtracting 10000 once or twice (your "implementation 2") is better. The usual algorithms for general floating point modulus require a number of bit-level manipulations that are probably unfeasible for you. See for example http://www.netlib.org/fdlibm/e_fmod.c (The algorithm is simple but the code is complex because of special cases and because it is written for IEEE 754 double precision numbers assuming there is no 64-bit integer type)
This all seems completely overcomplicated. You have an encoder index that rolls over at 10000 and objects rolling along the line whose positions you are tracking at any given point. If you need to forward project stop points or action points along the line, just add however many inches you need and immediately subtract 10000 if your target result is greater than 10000.
Alternatively, or in addition, you always get a new encoder value every PLC scan. In the case where the difference between the current value and last value is negative you can energize a working contact to flag the wrap event and make appropriate corrections for any calculations on that scan. (**or increment a secondary counter as below)
Without knowing more about the actual problem it is hard to suggest a more specific solution but there are certainly better solutions. I don't see a need for MOD here at all. Furthermore, the guys on the floor will thank you for not filling up the machine with obfuscated wizard stuff.
I quote :
Finally, it has to work for floating point decimals, for example
12345.678 MOD 10000 = 2345.678
There is a brilliant function that exists to do this - it's a subtraction. Why does it need to be more complicated than that? If your conveyor line is actually longer than 833 feet then roll a second counter that increments on a primary index roll-over until you've got enough distance to cover the ground you need.
For example, if you need 100000 inches of conveyor memory you can have a secondary counter that rolls over at 10. Primary encoder rollovers can be easily detected as above and you increment the secondary counter each time. Your working encoder position, then, is 10000 times the counter value plus the current encoder value. Work in the extended units only and make the secondary counter roll over at whatever value you require to not lose any parts. The problem, again, then reduces to a simple subtraction (as above).
I use this technique with a planetary geared rotational part holder, for example. I have an encoder that rolls over once per primary rotation while the planetary geared satellite parts (which themselves rotate around a stator gear) require 43 primary rotations to return to an identical starting orientation. With a simple counter that increments (or decrements, depending on direction) at the primary encoder rollover point it gives you a fully absolute measure of where the parts are at. In this case, the secondary counter rolls over at 43.
This would work identically for a linear conveyor with the only difference being that a linear conveyor can go on for an infinite distance. The problem then only needs to be limited by the longest linear path taken by the worst-case part on the line.
With the caveat that I've never used RSLogix, here is the general idea (I've used generic symbols here and my syntax is probably a bit wrong but you should get the idea)
With the above, you end up with a value ENC_EXT which has essentially transformed your encoder from a 10k inch one to a 100k inch one. I don't know if your conveyor can run in reverse, if it can you would need to handle the down count also. If the entire rest of your program only works with the ENC_EXT value then you don't even have to worry about the fact that your encoder only goes to 10k. It now goes to 100k (or whatever you want) and the wraparound can be handled with a subtraction instead of a modulus.
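The extended-encoder idea can be sketched in C for readers who do not work in ladder logic. This is not RSLogix code, and every name in it is hypothetical. It handles forward travel only and assumes the line moves less than one full encoder range between scans.

```c
#define ENC_MAX 10000L  /* primary encoder rolls over at this count */
#define WRAPS   10L     /* secondary counter rolls over here: 100000 counts total */

struct ext_encoder {
    long last_raw;  /* raw encoder value seen on the previous scan */
    long wraps;     /* secondary rollover counter, 0..WRAPS-1 */
};

/* Call once per scan with the raw encoder value; returns the extended position. */
long ext_encoder_update(struct ext_encoder *e, long raw)
{
    if (raw < e->last_raw) {       /* negative difference: the primary rolled over */
        e->wraps++;
        if (e->wraps >= WRAPS)
            e->wraps = 0;          /* the extended range wraps too */
    }
    e->last_raw = raw;
    return e->wraps * ENC_MAX + raw;
}
```

Projecting a target ahead on the extended scale is then an addition followed by a single conditional subtraction of ENC_MAX * WRAPS.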
Afterword :
PLCs are first and foremost state machines. The best solutions for PLC programs are usually those that are in harmony with this idea. If your hardware is not sufficient to fully represent the state of the machine then the PLC program should do its best to fill in the gaps for that missing state information with the information it has. The above solution does this - it takes the insufficient 10000 inches of state information and extends it to suit the requirements of the process.
The benefit of this approach is that you now have preserved absolute state information, not just for the conveyor, but also for any parts on the line. You can track them forward and backward for troubleshooting and debugging and you have a much simpler and clearer coordinate system to work with for future extensions. With a modulus calculation you are throwing away state information and trying to solve individual problems in a functional way - this is often not the best way to work with PLCs. You kind of have to forget what you know from other programming languages and work in a different way. PLCs are a different beast and they work best when treated as such.
You can use a subroutine to do exactly what you are talking about. You can tuck the tricky code away so the maintenance techs will never encounter it. It's almost certainly the easiest for you and your maintenance crew to understand.
It's been a while since I used RSLogix500, so I might get a couple of terms wrong, but you'll get the point.
Define a Data File each for your floating points and integers, and give them symbols something along the lines of MOD_F and MOD_N. If you make these intimidating enough, maintenance techs leave them alone, and all you need them for is passing parameters and workspace during your math.
If you are really worried about them messing up the data tables, there are ways to protect them, but I have forgotten what they are on a SLC/500.
Next, define a subroutine, far away numerically from the ones in use now, if possible. Name it something like MODULUS. Again, maintenance guys almost always stay out of SBRs if they sound like programming names.
In the rungs immediately before your JSR instruction, load the variables you want to process into the MOD_N and MOD_F Data Files. Comment these rungs with instructions that they load data for MODULUS SBR. Make the comments clear to anyone with a programming background.
Call your JSR conditionally, only when you need to. Maintenance techs do not bother troubleshooting non-executing logic, so if your JSR is not active, they will rarely look at it.
Now you have your own little walled garden where you can write your loop without maintenance getting involved with it. Only use those Data Files, and don't assume the state of anything but those files is what you expect. In other words, you cannot trust indirect addressing. Indexed addressing is OK, as long as you define the index within your MODULUS JSR. Do not trust any incoming index. It's pretty easy to write a FOR loop with one word from your MOD_N file, a jump and a label. Your whole Implementation #2 should be less than ten rungs or so. I would consider using an expression instruction or something...the one that lets you just type in an expression. Might need a 504 or 505 for that instruction. Works well for combined float/integer math. Check the results though to make sure the rounding doesn't kill you.
After you are done, validate your code, perfectly if possible. If this code ever causes a math overflow and faults the processor, you will never hear the end of it. Run it on a simulator if you have one, with weird values (in case they somehow mess up the loading of the function inputs), and make sure the PLC does not fault.
If you do all that, no one will ever even realize you used regular programming techniques in the PLC, and you will be fine. AS LONG AS IT WORKS.
This is a loop based on the answer by @Keith Randall, but it also maintains the result of the division by subtraction. I kept the printf's for clarity.
#include <stdio.h>
#include <limits.h>
#define NBIT (CHAR_BIT * sizeof (unsigned int))

/* Shift-and-subtract division: prints the quotient, returns the remainder. */
unsigned modulo(unsigned dividend, unsigned divisor)
{
    unsigned quotient, bit;
    printf("%u / %u:", dividend, divisor);
    for (bit = NBIT, quotient = 0; bit-- && dividend >= divisor; ) {
        /* 1ull keeps (1 << bit) * divisor from overflowing where long is 32 bits */
        if (dividend < (1ull << bit) * divisor) continue;
        dividend -= (1ull << bit) * divisor;
        quotient += (1u << bit);
    }
    printf("%u, %u\n", quotient, dividend);
    return dividend; // the remainder *is* the modulo
}

int main(void)
{
    modulo(13, 5);
    modulo(33, 11);
    return 0;
}

What has a better performance: multiplication or division?

Which version is faster:
x * 0.5
or
x / 2 ?
I've had a course at the university called computer systems some time ago. From back then I remember that multiplying two values can be achieved with comparably "simple" logical gates but division is not a "native" operation and requires a sum register that is in a loop increased by the divisor and compared to the dividend.
Now I have to optimise an algorithm with a lot of divisions. Unfortunately it's not just dividing by two, so binary shifting is not an option. Will it make a difference to change all divisions to multiplications ?
Update:
I have changed my code and didn't notice any difference. You're probably right about compiler optimisations. Since all the answers were great I've upvoted them all. I chose rahul's answer because of the great link.
Usually division is a lot more expensive than multiplication, but a smart compiler will often convert division by a compile-time constant to a multiplication anyway. If your compiler is not smart enough though, or if there are floating point accuracy issues, then you can always do the optimisation explicitly, e.g. change:
float x = y / 2.5f;
to:
const float k = 1.0f / 2.5f;
...
float x = y * k;
Note that this is most likely a case of premature optimisation - you should only do this kind of thing if you have profiled your code and positively identified division as being a performance bottleneck.
Division by a compile-time constant that's a power of 2 is quite fast (comparable to multiplication by a compile-time constant) for both integers and floats (it's basically convertible into a bit shift).
For floats even dynamic division by powers of two is much faster than regular (dynamic or static division) as it basically turns into a subtraction on its exponent.
In all other cases, division appears to be several times slower than multiplication.
For dynamic divisors the slowdown factor on my Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz appears to be about 8, for static ones about 2.
The results are from a little benchmark of mine, which I made because I was somewhat curious about this (notice the aberrations at powers of two):
ulong -- 64 bit unsigned
1 in the label means dynamic argument
0 in the label means statically known argument
The results were generated from the following bash template:
#include <stdio.h>
#include <stdlib.h>
typedef unsigned long ulong;
int main(int argc, char** argv){
$TYPE arg = atoi(argv[1]);
$TYPE i = 0, res = 0;
for (i=0;i< $IT;i++)
res+=i $OP $ARG;
printf($FMT, res);
return 0;
}
with the $-variables assigned and the resulting program compiled with -O3 and run (dynamic values came from the command line as it's obvious from the C code).
Well, if it is a single calculation you will hardly notice any difference, but if you are talking about millions of calculations then division is definitely costlier than multiplication. Otherwise, use whichever is clearest and most readable.
Please refer to this link: Should I use multiplication or division?
That will likely depend on your specific CPU and the types of your arguments. For instance, in your example you're doing a floating-point multiplication but an integer division. (Probably, at least, in most languages I know of that use C syntax.)
If you are doing work in assembler, you can look up the specific instructions you are using and see how long they take.
If you are not doing work in assembler, you probably don't need to care. All modern compilers with optimization will change your operations in this way to the most appropriate instructions.
Your big wins on optimization will not be from twiddling the arithmetic like this. Instead, focus on how well you are using your cache. Consider whether there are algorithm changes that might speed things up.
One note to make, if you are looking for numerical stability:
Don't recycle the divisions for solutions that require multiple components/coordinates, e.g. like implementing an n-D vector normalize() function, i.e. the following will NOT give you a unit-length vector:
V3d v3d(x,y,z);
float l = v3d.length();
float oneOverL = 1.f / l;
v3d.x *= oneOverL;
v3d.y *= oneOverL;
v3d.z *= oneOverL;
assert(1. == v3d.length()); // fails!
.. but this code will..
V3d v3d(x,y,z);
float l = v3d.length();
v3d.x /= l;
v3d.y /= l;
v3d.z /= l;
assert(1. == v3d.length()); // ok!
Guess the problem in the first code excerpt is the additional float normalization (the pre-division will impose a different scale normalization to the floating point number, which is then forced upon the actual result and introducing additional error).
Didn't look into this for too long, so please share your explanation why this happens. Tested it with x,y and z being .1f (and with doubles instead of floats)

Fastest/easiest way to average ARGB color ints?

I have five colors stored in the format #AARRGGBB as unsigned ints, and I need to take the average of all five. Obviously I can't simply divide each int by five and just add them, and the only way I thought of so far is to bitmask them, do each channel separately, and then OR them together again. Is there a clever or concise way of averaging all five of them?
Half way between your (OP) proposed solution and Patrick's solution looks quite neat:
Color colors[5]={ 0xAARRGGBB,...};
unsigned long sum1=0,sum2=0;
for (int i=0;i<5;i++)
{
sum1+= colors[i] &0x00FF00FF; // 0x00RR00BB
sum2+=(colors[i]>>8)&0x00FF00FF; // 0x00AA00GG
}
unsigned long output=0;
output|=(((sum1&0xFFFF)/5)&0xFF);
output|=(((sum2&0xFFFF)/5)&0xFF)<<8;
sum1>>=16;sum2>>=16; // and now the top halves
output|=(((sum1&0xFFFF)/5)&0xFF)<<16;
output|=(((sum2&0xFFFF)/5)&0xFF)<<24;
I don't think you could really divide sum1/sum2 by 5, because the bits from the top half would spill down...
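A self-contained C version of the same two-accumulator approach might look like the following. The function name and the use of uint32_t are assumptions made for the sketch. Five channel values sum to at most 1275, so each 16-bit half of an accumulator has room for its channel sum.

```c
#include <stdint.h>

/* Average five #AARRGGBB colors channel by channel using two accumulators. */
uint32_t average5_argb(const uint32_t colors[5])
{
    uint32_t sum_rb = 0, sum_ag = 0;
    for (int i = 0; i < 5; i++) {
        sum_rb += colors[i] & 0x00FF00FFu;         /* 0x00RR00BB */
        sum_ag += (colors[i] >> 8) & 0x00FF00FFu;  /* 0x00AA00GG */
    }
    /* Split each accumulator into its two channel sums before dividing. */
    uint32_t b = (sum_rb & 0xFFFFu) / 5;
    uint32_t r = (sum_rb >> 16) / 5;
    uint32_t g = (sum_ag & 0xFFFFu) / 5;
    uint32_t a = (sum_ag >> 16) / 5;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```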
If an approximation would be valid, you could try a multiplication by something like, 0.1875 (0.125+0.0625), (this means: multiply by 3 and shift down by 4 places. This you could do with bitmasking and care.)
The problem is, 0.2 has a crappy binary representation, so multiplying by it is an ass.
As ever, accuracy or speed. Your choice.
When using x86 machines with at least SSE, and if you need to approximate only, you could use the assembly instruction PAVGB (Packed Average Byte), which averages bytes. See http://www.tommesani.com/SSEPrimer.html for explanation.
Since you've got 5 values, you would need to be creative in calling PAVGB, since PAVGB will only do two values at a time.
I found a smart solution to your problem; sadly it is only applicable if the number of colors is a power of 2. I'll show it for the case of two colors:
mask = 0x01010101 # lowest bit of each 8-bit channel
pom = ~((a ^ b) & mask) # ^ means xor here, ~ negation
a = a & pom
b = b & pom
avg = (a+b) >> 1
The trick of this method is this: when you compute the average of two numbers, the LSB of the sum has no meaning, as it will be dropped in the division (we're talking integers here, of course). In your problem, the LSB of each channel's partial sum is at the same time the carry bit of the adjacent channel's sum. Provided that the LSB of every channel sum is 0, you can safely add the two integers: the additions won't interfere with each other. The bit shift then divides every channel by two.
This method can be used with 4 colors as well, but you have to implement finding out the carry flag of sum of numbers made of two last bits of every color. It is also possible to omit this part and just zero last two bits of every color — biggest mistake made with this omission is 1 for every component.
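A C sketch of the two-color case follows. The function name is invented, and the sum is done in 64 bits so that the carry out of the alpha channel is not lost, a detail the description above leaves open.

```c
#include <stdint.h>

/* Per-channel average of two #AARRGGBB colors with a single addition. */
uint32_t avg2_argb(uint32_t a, uint32_t b)
{
    const uint32_t lsb = 0x01010101u;   /* lowest bit of each channel */
    uint32_t keep = ~((a ^ b) & lsb);   /* clears channel LSBs where a and b differ */
    /* After masking, every channel sum has LSB 0, so a carry from the channel
       below lands in a free bit and cannot ripple further. */
    uint64_t sum = (uint64_t)(a & keep) + (uint64_t)(b & keep);
    return (uint32_t)(sum >> 1);        /* halve all four channels at once */
}
```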
EDIT I'll leave this attempt for posterity, but please note that it is incorrect and will not work.
One "clever" way you could do it would be to insert zeros between the components, parse into an unsigned long, average the numbers, convert back to a hex string, remove the zeros and finally parse into an unsigned int.
i.e. convert #AARRGGBB to #AA00RR00GG00BB
This method involves parsing and string manipulations, so will undoubtedly be slower than the method you proposed.
If you were to factor your own solution carefully, it might actually look quite clever itself.
