Using bit operation to determine if a int is a power of 2 - bit

I once encountered a method of determining if a number x is a power of 2 by doing the following:
X&(x-1) followed by a 0 test, if the result is 0 then it means there is only one 1 bit in the number and that it is a power of 2. But the problem is it can't be used for signed int? I am just wondering if the only exception for a signed int is that it may be the signed bit that is the only one bit, if this is the case I could simply add another test and be done with it. Or does it have other exceptions that this method may not apply to a signed int. since I really wanna use it in java, I hope I can adopt it in some enhanced way. Thanks a lot.

It doesn't matter; just test for negatives. A power of a positive number is never negative, so you can safely say that any negative number given isn't a power of two.

Related

Best way finding the middle point

In many algorithm i have seen people are using two different way to get middle point.
(LOW+HIGH)/2
LOW+(HIGH-LOW)/2
Mostly i have seen 2nd method for example in QuickSort. What is the best way to find middle between two number and why ?
It entirely depends on the context, but I'll make a case for case number 2 and explain why.
Let's first assume you go for case number 1, which is this:
(LOW + HIGH) / 2
This looks entirely reasonable, and mathematically, it is. Let's plug in two numbers and look at the results:
(12345 + 56789) / 2
The result is 34567. Looks OK doesn't it?
Now, the problem is that in computers it's not as simple as that. You also got something called data types to contend with. Usually this is denoted with such things as number of bits. In other words, you might have a 32-bit number, or a 16-bit number, or a 64-bit number, and so on.
All of these have what is known as a legal range of values, ie. "what kind of values will these types hold". An 8-bit number, unsigned (which means it cannot be negative) can hold 2 to the power of 8 different values, or 256. A 16-bit unsigned value can hold 65536 values, or a range from 0 to 65535. If the values are signed they go from -half to +half-1, meaning it will go from -128 to +127 for a 8-bit signed value, and from -32768 to +32767 for a signed 16-bit value.
So now we go back to the original formula. What if the data type we're using for the calculation isn't enough to hold LOW + HIGH ?
For instance, let's say we used 16-bit signed values for this, and we still got this expression:
(12345 + 56789) / 2
12345 can be held in a 16-bit value (it is less than 65536), same with 56789, but what about the result? The result of adding 12345 and 56789 is 69134 which is more than 65535 (the highest unsigned 16-bit value).
So what will happen to it? There's two outcomes:
It will overflow, which means it will start back at 0 and count upwards, which means it will actually end up with the result of 3598 (123456 + 56789) - 65536
It will throw an exception or similar, crashing the program with an overflow problem.
If we get the first result, then (12345 + 56789)/2 becomes 3598/2 or 1799. Clearly incorrect.
So, then what if we used the other approach:
12345 + (56789-12345)/2
First, let's do the parenthesis: 56789-12345 is equal to 44444, a number that can be held in a 16-bit data type.
Adding 12345 + 44444 gives us 56789, a number that can also be held in a 16-bit data type.
Dividing 56789 by 2 gives us 28934.5. Since we're probably dealing with "integers" here we get 28934 (typically, unless your particular world rounds up).
So the reason the second expression is chosen above the first is that it doesn't have to handle overflow the same way and is more resilient to this kind of problem.
In fact, if you think about it, the maximum second value you can have is the maximum legal value you can have for your data type, so this kind of expression:
X + (Y-X)
... assuming both X and Y are the same data type, can at most be the maximum of that data type. Basically, it will not have to contend with an overflow at all.
2nd method used, for avoid int overflow during computation.
Imagine, you used just 1-byte unsigned integer, so overflow happening if value reached 256. Imagine, we have low=100 and high=200. See computing:
1. (lo + hi) / 2 = (100 + 200) / 2 = 300 / 2; // 300 > 256, int overflow
2. lo + (hi - lo) / 2 = 100 + (200 - 100) / 2 = 150; // no overflow
There isn't a best one, but they are clearly different.
(LOW+HIGH)/2 causes trouble if the addition overflows, both in the signed and unsigned case. Doesn't have to be an issue, if you can assume that it never happens it's a perfectly reasonable thing to do.
LOW+(HIGH-LOW)/2 can still cause trouble with overflows if LOW is allowed to be arbitrarily negative. For non-negative input it's fine though. Costs an extra operation.
(LOW+HIGH) >>> 1 as seen in Java. For non-negative but signed input, overflow is non-destructive. The sign bit is used as an extra bit of space that the addition can carry into, the shift then divides the unsigned result by two. Does not work at all if the result is supposed to negative, because by construction the result is non-negative. For array indexes that isn't really a consideration. Doesn't help if you were working with unsigned indexes though.
MOV MID, LOW \ ADD MID, HIGH \ RCR MID in pseudo x86 asm, explicitly uses an extra bit of space and therefore works for all unsigned input no matter what, but cannot be used in most languages.
They all have their upsides and downsides. There is no winner.
The best way is dependent on what you're trying to accomplish. The first is clearly faster (best for performance), while the second one is used to avoid overflow (best for correctness). So the answer to your question is dependent on your definition of "best".

A clever homebrew modulus implementation

I'm programming a PLC with some legacy software (RSLogix 500, don't ask) and it does not natively support a modulus operation, but I need one. I do not have access to: modulus, integer division, local variables, a truncate operation (though I can hack it with rounding). Furthermore, all variables available to me are laid out in tables sorted by data type. Finally, it should work for floating point decimals, for example 12345.678 MOD 10000 = 2345.678.
If we make our equation:
dividend / divisor = integer quotient, remainder
There are two obvious implementations.
Implementation 1:
Perform floating point division: dividend / divisor = decimal quotient. Then hack together a truncation operation so you find the integer quotient. Multiply it by the divisor and find the difference between the dividend and that, which results in the remainder.
I don't like this because it involves a bunch of variables of different types. I can't 'pass' variables to a subroutine, so I just have to allocate some of the global variables located in multiple different variable tables, and it's difficult to follow. Unfortunately, 'difficult to follow' counts, because it needs to be simple enough for a maintenance worker to mess with.
Implementation 2:
Create a loop such that while dividend > divisor divisor = dividend - divisor. This is very clean, but it violates one of the big rules of PLC programming, which is to never use loops, since if someone inadvertently modifies an index counter you could get stuck in an infinite loop and machinery would go crazy or irrecoverably fault. Plus loops are hard for maintenance to troubleshoot. Plus, I don't even have looping instructions, I have to use labels and jumps. Eww.
So I'm wondering if anyone has any clever math hacks or smarter implementations of modulus than either of these. I have access to + - * /, exponents, sqrt, trig functions, log, abs value, and AND/OR/NOT/XOR.
How many bits are you dealing with? You could do something like:
if dividend > 32 * divisor dividend -= 32 * divisor
if dividend > 16 * divisor dividend -= 16 * divisor
if dividend > 8 * divisor dividend -= 8 * divisor
if dividend > 4 * divisor dividend -= 4 * divisor
if dividend > 2 * divisor dividend -= 2 * divisor
if dividend > 1 * divisor dividend -= 1 * divisor
quotient = dividend
Just unroll as many times as there are bits in dividend. Make sure to be careful about those multiplies overflowing. This is just like your #2 except it takes log(n) instead of n iterations, so it is feasible to unroll completely.
If you don't mind overly complicating things and wasting computer time you can calculate modulus with periodic trig functions:
atan(tan(( 12345.678 -5000)*pi/10000))*10000/pi+5000 = 2345.678
Seriously though, subtracting 10000 once or twice (your "implementation 2") is better. The usual algorithms for general floating point modulus require a number of bit-level manipulations that are probably unfeasible for you. See for example http://www.netlib.org/fdlibm/e_fmod.c (The algorithm is simple but the code is complex because of special cases and because it is written for IEEE 754 double precision numbers assuming there is no 64-bit integer type)
This all seems completely overcomplicated. You have an encoder index that rolls over at 10000 and objects rolling along the line whose positions you are tracking at any given point. If you need to forward project stop points or action points along the line, just add however many inches you need and immediately subtract 10000 if your target result is greater than 10000.
Alternatively, or in addition, you always get a new encoder value every PLC scan. In the case where the difference between the current value and last value is negative you can energize a working contact to flag the wrap event and make appropriate corrections for any calculations on that scan. (**or increment a secondary counter as below)
Without knowing more about the actual problem it is hard to suggest a more specific solution but there are certainly better solutions. I don't see a need for MOD here at all. Furthermore, the guys on the floor will thank you for not filling up the machine with obfuscated wizard stuff.
I quote :
Finally, it has to work for floating point decimals, for example
12345.678 MOD 10000 = 2345.678
There is a brilliant function that exists to do this - it's a subtraction. Why does it need to be more complicated than that? If your conveyor line is actually longer than 833 feet then roll a second counter that increments on a primary index roll-over until you've got enough distance to cover the ground you need.
For example, if you need 100000 inches of conveyor memory you can have a secondary counter that rolls over at 10. Primary encoder rollovers can be easily detected as above and you increment the secondary counter each time. Your working encoder position, then, is 10000 times the counter value plus the current encoder value. Work in the extended units only and make the secondary counter roll over at whatever value you require to not lose any parts. The problem, again, then reduces to a simple subtraction (as above).
I use this technique with a planetary geared rotational part holder, for example. I have an encoder that rolls over once per primary rotation while the planetary geared satellite parts (which themselves rotate around a stator gear) require 43 primary rotations to return to an identical starting orientation. With a simple counter that increments (or decrements, depending on direction) at the primary encoder rollover point it gives you a fully absolute measure of where the parts are at. In this case, the secondary counter rolls over at 43.
This would work identically for a linear conveyor with the only difference being that a linear conveyor can go on for an infinite distance. The problem then only needs to be limited by the longest linear path taken by the worst-case part on the line.
With the caveat that I've never used RSLogix, here is the general idea (I've used generic symbols here and my syntax is probably a bit wrong but you should get the idea)
With the above, you end up with a value ENC_EXT which has essentially transformed your encoder from a 10k inch one to a 100k inch one. I don't know if your conveyor can run in reverse, if it can you would need to handle the down count also. If the entire rest of your program only works with the ENC_EXT value then you don't even have to worry about the fact that your encoder only goes to 10k. It now goes to 100k (or whatever you want) and the wraparound can be handled with a subtraction instead of a modulus.
Afterword :
PLCs are first and foremost state machines. The best solutions for PLC programs are usually those that are in harmony with this idea. If your hardware is not sufficient to fully represent the state of the machine then the PLC program should do its best to fill in the gaps for that missing state information with the information it has. The above solution does this - it takes the insufficient 10000 inches of state information and extends it to suit the requirements of the process.
The benefit of this approach is that you now have preserved absolute state information, not just for the conveyor, but also for any parts on the line. You can track them forward and backward for troubleshooting and debugging and you have a much simpler and clearer coordinate system to work with for future extensions. With a modulus calculation you are throwing away state information and trying to solve individual problems in a functional way - this is often not the best way to work with PLCs. You kind of have to forget what you know from other programming languages and work in a different way. PLCs are a different beast and they work best when treated as such.
You can use a subroutine to do exactly what you are talking about. You can tuck the tricky code away so the maintenance techs will never encounter it. It's almost certainly the easiest for you and your maintenance crew to understand.
It's been a while since I used RSLogix500, so I might get a couple of terms wrong, but you'll get the point.
Define a Data File each for your floating points and integers, and give them symbols something along the lines of MOD_F and MOD_N. If you make these intimidating enough, maintenance techs leave them alone, and all you need them for is passing parameters and workspace during your math.
If you really worried about them messing up the data tables, there are ways to protect them, but I have forgotten what they are on a SLC/500.
Next, defined a subroutine, far away numerically from the ones in use now, if possible. Name it something like MODULUS. Again, maintenance guys almost always stay out of SBRs if they sound like programming names.
In the rungs immediately before your JSR instruction, load the variables you want to process into the MOD_N and MOD_F Data Files. Comment these rungs with instructions that they load data for MODULUS SBR. Make the comments clear to anyone with a programming background.
Call your JSR conditionally, only when you need to. Maintenance techs do not bother troubleshooting non-executing logic, so if your JSR is not active, they will rarely look at it.
Now you have your own little walled garden where you can write your loop without maintenance getting involved with it. Only use those Data Files, and don't assume the state of anything but those files is what you expect. In other words, you cannot trust indirect addressing. Indexed addressing is OK, as long as you define the index within your MODULUS JSR. Do not trust any incoming index. It's pretty easy to write a FOR loop with one word from your MOD_N file, a jump and a label. Your whole Implementation #2 should be less than ten rungs or so. I would consider using an expression instruction or something...the one that lets you just type in an expression. Might need a 504 or 505 for that instruction. Works well for combined float/integer math. Check the results though to make sure the rounding doesn't kill you.
After you are done, validate your code, perfectly if possible. If this code ever causes a math overflow and faults the processor, you will never hear the end of it. Run it on a simulator if you have one, with weird values (in case they somehow mess up the loading of the function inputs), and make sure the PLC does not fault.
If you do all that, no one will ever even realize you used regular programming techniques in the PLC, and you will be fine. AS LONG AS IT WORKS.
This is a loop based on the answer by #Keith Randall, but it also maintains the result of the division by substraction. I kept the printf's for clarity.
#include <stdio.h>
#include <limits.h>
#define NBIT (CHAR_BIT * sizeof (unsigned int))
unsigned modulo(unsigned dividend, unsigned divisor)
{
unsigned quotient, bit;
printf("%u / %u:", dividend, divisor);
for (bit = NBIT, quotient=0; bit-- && dividend >= divisor; ) {
if (dividend < (1ul << bit) * divisor) continue;
dividend -= (1ul << bit) * divisor;
quotient += (1ul << bit);
}
printf("%u, %u\n", quotient, dividend);
return dividend; // the remainder *is* the modulo
}
int main(void)
{
modulo( 13,5);
modulo( 33,11);
return 0;
}

Arbitrary precision arithmetic with Ruby

How the heck does Ruby do this? Does Jörg or anyone else know what's happening behind the scenes?
Unfortunately I don't know C very well so bignum.c is of little help to me. I was just kind of curious it someone could explain (in plain English) the theory behind whatever miracle algorithm its using.
irb(main):001:0> 999**999
368063488259223267894700840060521865838338232037353204655959621437025609300472231530103873614505175218691345257589896391130393189447969771645832382192366076536631132001776175977932178658703660778465765811830827876982014124022948671975678131724958064427949902810498973271030787716781467419524180040734398996952930832508934116945966120176735120823151959779536852290090377452502236990839453416790640456116471139751546750048602189291028640970574762600185950226138244530187489211615864021135312077912018844630780307462205252807737757672094320692373101032517459518497524015120165166724189816766397247824175394802028228160027100623998873667435799073054618906855460488351426611310634023489044291860510352301912426608488807462312126590206830413782664554260411266378866626653755763627796569082931785645600816236891168141774993267488171702172191072731069216881668294625679492696148976999868715671440874206427212056717373099639711168901197440416590226524192782842896415414611688187391232048327738965820265934093108172054875188246591760877131657895633586576611857277011782497943522945011248430439201297015119468730712364007639373910811953430309476832453230123996750235710787086641070310288725389595138936784715274150426495416196669832679980253436807864187160054589045664027158817958549374490512399055448819148487049363674611664609890030088549591992466360050042566270348330911795487647045949301286614658650071299695652245266080672989921799342509291635330827874264789587306974472327718704306352445925996155619153783913237212716010410294999877569745287353422903443387562746452522860420416689019732913798073773281533570910205207767157128174184873357050830752777900041943256738499067821488421053870869022738698816059810579221002560882999884763252161747566893835178558961142349304466506402373556318707175710866983035313122068321102457824112014969387225476259342872866363550383840720010832906695360553556647545295849966279980830561242960013654529514995113584909050813015198928283202189194615501403435553060147713139766323195743324848047347575473228198492343231496580885057330510949058490527738662697480293583612233134502078182014347192522391449087738579081585795613547198599661273567662441490401862839817822686573112998663038868314974259766039340894024308383451039874674061160538242392803580758232755749310843694194787991556647907091849600704712003371103926967137408125713631396699343733288014254084819379380555174777020843568689927348949484201042595271932630685747613835385434424807024615161848223715989797178155169951121052285149157137697718850449708843330475301440373094611119631361702936342263219382793996895988331701890693689862459020775599439506870005130750427949747071390095256759203426671803377068109744629909769176319526837824364926844730545524646494321826241925107158040561607706364484910978348669388142016838792902926158979355432483611517588605967745393958061959024834251565197963477521095821435651996730128376734574843289089682710350244222290017891280419782767803785277960834729869249991658417000499998999
Simple: it does it the same way you do, ever since first grade. Except it doesn't compute in base 10, it computes in base 4 billion (and change).
Think about it: with our number system, we can only represent numbers from 0 to 9. So, how can we compute 6+7 without overflowing? Easy: we do actually overflow! We cannot represent the result of 6+7 as a number between 0 and 9, but we can overflow to the next place and represent it as two numbers between 0 and 9: 3×100 + 1×101. If you want to add two numbers, you add them digit-wise from the right and overflow ("carry") to the left. If you want to multiply two numbers, you have to multiply every digit of one number individually with the other number, then add up the intermediate results.
BigNum arithmetic (this is what this kind of arithmetic where the numbers are bigger than the native machine numbers is usually called) works basically the same way. Except that the base is not 10, and its not 2, either – it's the size of a native machine integer. So, on a 32 bit machine, it would be base 232 or 4 294 967 296.
Specifically, in Ruby Integer is actually an abstract class that is never instianted. Instead, it has two subclasses, Fixnum and Bignum, and numbers automagically migrate between them, depending on their size. In MRI and YARV, Fixnum can hold a 31 or 63 bit signed integer (one bit is used for tagging) depending on the native word size of the machine. In JRuby, a Fixnum can hold a full 64 bit signed integer, even on an 32 bit machine.
The simplest operation is adding two numbers. And if you look at the implementation of + or rather bigadd_core in YARV's bignum.c, it's not too bad to follow. I can't read C either, but you can cleary see how it loops over the individual digits.
You could read the source for bignum.c...
At a very high level, without going into any implementation details, bignums are calculated "by hand" like you used to do in grade school. Now, there are certainly many optimizations that can be applied, but that's the gist of it.
I don't know of the implementation details so I'll cover how a basic Big Number implementation would work.
Basically instead of relying on CPU "integers" it will create it's own using multiple CPU integers. To store arbritrary precision, well lets say you have 2 bits. So the current integer is 11. You want to add one. In normal CPU integers, this would roll over to 00
But, for big number, instead of rolling over and keeping a "fixed" integer width, it would allocate another bit and simulate an addition so that the number becomes the correct 100.
Try looking up how binary math can be done on paper. It's very simple and is trivial to convert to an algorithm.
Beaconaut APICalc 2 just released on Jan.18, 2011, which is an arbitrary-precision integer calculator for bignum arithmetic, cryptography analysis and number theory research......
http://www.beaconaut.com/forums/default.aspx?g=posts&t=13
It uses the Bignum class
irb(main):001:0> (999**999).class
=> Bignum
Rdoc is available of course

Algorithm to find a common multiplier to convert decimal numbers to whole numbers

I have an array of numbers that potentially have up to 8 decimal places and I need to find the smallest common number I can multiply them by so that they are all whole numbers. I need this so all the original numbers can all be multiplied out to the same scale and be processed by a sealed system that will only deal with whole numbers, then I can retrieve the results and divide them by the common multiplier to get my relative results.
Currently we do a few checks on the numbers and multiply by 100 or 1,000,000, but the processing done by the *sealed system can get quite expensive when dealing with large numbers so multiplying everything by a million just for the sake of it isn’t really a great option. As an approximation lets say that the sealed algorithm gets 10 times more expensive every time you multiply by a factor of 10.
What is the most efficient algorithm, that will also give the best possible result, to accomplish what I need and is there a mathematical name and/or formula for what I’m need?
*The sealed system isn’t really sealed. I own/maintain the source code for it but its 100,000 odd lines of proprietary magic and it has been thoroughly bug and performance tested, altering it to deal with floats is not an option for many reasons. It is a system that creates a grid of X by Y cells, then rects that are X by Y are dropped into the grid, “proprietary magic” occurs and results are spat out – obviously this is an extremely simplified version of reality, but it’s a good enough approximation.
So far there are quiet a few good answers and I wondered how I should go about choosing the ‘correct’ one. To begin with I figured the only fair way was to create each solution and performance test it, but I later realised that pure speed wasn’t the only relevant factor – an more accurate solution is also very relevant. I wrote the performance tests anyway, but currently the I’m choosing the correct answer based on speed as well accuracy using a ‘gut feel’ formula.
My performance tests process 1000 different sets of 100 randomly generated numbers.
Each algorithm is tested using the same set of random numbers.
Algorithms are written in .Net 3.5 (although thus far would be 2.0 compatible)
I tried pretty hard to make the tests as fair as possible.
Greg – Multiply by large number
and then divide by GCD – 63
milliseconds
Andy – String Parsing
– 199 milliseconds
Eric – Decimal.GetBits – 160 milliseconds
Eric – Binary search – 32
milliseconds
Ima – sorry I couldn’t
figure out a how to implement your
solution easily in .Net (I didn’t
want to spend too long on it)
Bill – I figure your answer was pretty
close to Greg’s so didn’t implement
it. I’m sure it’d be a smidge faster
but potentially less accurate.
So Greg’s Multiply by large number and then divide by GCD” solution was the second fastest algorithm and it gave the most accurate results so for now I’m calling it correct.
I really wanted the Decimal.GetBits solution to be the fastest, but it was very slow, I’m unsure if this is due to the conversion of a Double to a Decimal or the Bit masking and shifting. There should be a
similar usable solution for a straight Double using the BitConverter.GetBytes and some knowledge contained here: http://blogs.msdn.com/bclteam/archive/2007/05/29/bcl-refresher-floating-point-types-the-good-the-bad-and-the-ugly-inbar-gazit-matthew-greig.aspx but my eyes just kept glazing over every time I read that article and I eventually ran out of time to try to implement a solution.
I’m always open to other solutions if anyone can think of something better.
I'd multiply by something sufficiently large (100,000,000 for 8 decimal places), then divide by the GCD of the resulting numbers. You'll end up with a pile of smallest integers that you can feed to the other algorithm. After getting the result, reverse the process to recover your original range.
Multiple all the numbers by 10
until you have integers.
Divide
by 2,3,5,7 while you still have all
integers.
I think that covers all cases.
2.1 * 10/7 -> 3
0.008 * 10^3/2^3 -> 1
That's assuming your multiplier can be a rational fraction.
If you want to find some integer N so that N*x is also an exact integer for a set of floats x in a given set are all integers, then you have a basically unsolvable problem. Suppose x = the smallest positive float your type can represent, say it's 10^-30. If you multiply all your numbers by 10^30, and then try to represent them in binary (otherwise, why are you even trying so hard to make them ints?), then you'll lose basically all the information of the other numbers due to overflow.
So here are two suggestions:
If you have control over all the related code, find another
approach. For example, if you have some function that takes only
int's, but you have floats, and you want to stuff your floats into
the function, just re-write or overload this function to accept
floats as well.
If you don't have control over the part of your system that requires
int's, then choose a precision to which you care about, accept that
you will simply have to lose some information sometimes (but it will
always be "small" in some sense), and then just multiply all your
float's by that constant, and round to the nearest integer.
By the way, if you're dealing with fractions, rather than float's, then it's a different game. If you have a bunch of fractions a/b, c/d, e/f; and you want a least common multiplier N such that N*(each fraction) = an integer, then N = abc / gcd(a,b,c); and gcd(a,b,c) = gcd(a, gcd(b, c)). You can use Euclid's algorithm to find the gcd of any two numbers.
Greg: Nice solution but won't calculating a GCD that's common in an array of 100+ numbers get a bit expensive? And how would you go about that? Its easy to do GCD for two numbers but for 100 it becomes more complex (I think).
Evil Andy: I'm programing in .Net and the solution you pose is pretty much a match for what we do now. I didn't want to include it in my original question cause I was hoping for some outside the box (or my box anyway) thinking and I didn't want to taint peoples answers with a potential solution. While I don't have any solid performance statistics (because I haven't had any other method to compare it against) I know the string parsing would be relatively expensive and I figured a purely mathematical solution could potentially be more efficient.
To be fair the current string parsing solution is in production and there have been no complaints about its performance yet (its even in production in a separate system in a VB6 format and no complaints there either). It's just that it doesn't feel right, I guess it offends my programing sensibilities - but it may well be the best solution.
That said I'm still open to any other solutions, purely mathematical or otherwise.
What language are you programming in? Something like
myNumber.ToString().Substring(myNumber.ToString().IndexOf(".")+1).Length
would give you the number of decimal places for a double in C#. You could run each number through that and find the largest number of decimal places(x), then multiply each number by 10 to the power of x.
Edit: Out of curiosity, what is this sealed system which you can pass only integers to?
In a loop get mantissa and exponent of each number as integers. You can use frexp for exponent, but I think bit mask will be required for mantissa. Find minimal exponent. Find most significant digits in mantissa (loop through bits looking for last "1") - or simply use predefined number of significant digits.
Your multiple is then something like 2^(numberOfDigits-minMantissa). "Something like" because I don't remember biases/offsets/ranges, but I think idea is clear enough.
So basically you want to determine the number of digits after the decimal point for each number.
This would be rather easier if you had the binary representation of the number. Are the numbers being converted from rationals or scientific notation earlier in your program? If so, you could skip the earlier conversion and have a much easier time. Otherwise you might want to pass each number to a function in an external DLL written in C, where you could work with the floating point representation directly. Or you could cast the numbers to decimal and do some work with Decimal.GetBits.
The fastest approach I can think of in-place and following your conditions would be to find the smallest necessary power-of-ten (or 2, or whatever) as suggested before. But instead of doing it in a loop, save some computation by doing binary search on the possible powers. Assuming a maximum of 8, something like:
int NumDecimals( double d )
{
// make d positive for clarity; it won't change the result
if( d<0 ) d=-d;
// now do binary search on the possible numbers of post-decimal digits to
// determine the actual number as quickly as possible:
if( NeedsMore( d, 10e4 ) )
{
// more than 4 decimals
if( NeedsMore( d, 10e6 ) )
{
// > 6 decimal places
if( NeedsMore( d, 10e7 ) ) return 10e8;
return 10e7;
}
else
{
// <= 6 decimal places
if( NeedsMore( d, 10e5 ) ) return 10e6;
return 10e5;
}
}
else
{
// <= 4 decimal places
// etc...
}
}
bool NeedsMore( double d, double e )
{
// check whether the representation of D has more decimal points than the
// power of 10 represented in e.
return (d*e - Math.Floor( d*e )) > 0;
}
PS: you wouldn't be passing security prices to an option pricing engine would you? It has exactly the flavor...

How to generate a verification code/number?

I'm working on an application where users have to make a call and type a verification number with the keypad of their phone.
I would like to be able to detect if the number they type is correct or not. The phone system does not have access to a list of valid numbers, but instead, it will validate the number against an algorithm (like a credit card number).
Here are some of the requirements :
It must be difficult to type a valid random code
It must be difficult to have a valid code if I make a typo (transposition of digits, wrong digit)
I must have a reasonable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Given these requirements, how would you generate such a number?
EDIT :
#Haaked: The code has to be numerical because the user types it with its phone.
#matt b: On the first step, the code is displayed on a Web page, the second step is to call and type in the code. I don't know the user's phone number.
Followup : I've found several algorithms to check the validity of numbers (See this interesting Google Code project : checkDigits).
After some research, I think I'll go with the ISO 7064 Mod 97,10 formula. It seems pretty solid as it is used to validate IBAN (International Bank Account Number).
The formula is very simple:
Take a number : 123456
Apply the following formula to obtain the 2 digits checksum : mod(98 - mod(number * 100, 97), 97) => 76
Concat number and checksum to obtain the code => 12345676
To validate a code, verify that mod(code, 97) == 1
Test :
mod(12345676, 97) = 1 => GOOD
mod(21345676, 97) = 50 => BAD !
mod(12345678, 97) = 10 => BAD !
Apparently, this algorithm catches most of the errors.
Another interesting option was the Verhoeff algorithm. It has only one verification digit and is more difficult to implement (compared to the simple formula above).
For 1M combinations you'll need 6 digits. To make sure that there aren't any accidentally valid codes, I suggest 9 digits with a 1/1000 chance that a random code works. I'd also suggest using another digit (10 total) to perform an integrity check. As far as distribution patterns, random will suffice and the check digit will ensure that a single error will not result in a correct code.
Edit: Apparently I didn't fully read your request. Using a credit card number, you could perform a hash on it (MD5 or SHA1 or something similar). You then truncate at an appropriate spot (for example 9 characters) and convert to base 10. Then you add the check digit(s) and this should more or less work for your purposes.
You want to segment your code. Part of it should be a 16-bit CRC of the rest of the code.
If all you want is a verification number then just use a sequence number (assuming you have a single point of generation). That way you know you are not getting duplicates.
Then you prefix the sequence with a CRC-16 of that sequence number AND some private key. You can use anything for the private key, as long as you keep it private. Make it something big, at least a GUID, but it could be the text to War and Peace from project Gutenberg. Just needs to be secret and constant. Having a private key prevents people from being able to forge a key, but using a 16 bit CR makes it easier to break.
To validate you just split the number into its two parts, and then take a CRC-16 of the sequence number and the private key.
If you want to obscure the sequential portion more, then split the CRC in two parts. Put 3 digits at the front and 2 at the back of the sequence (zero pad so the length of the CRC is consistent).
This method allows you to start with smaller keys too. The first 10 keys will be 6 digits.
Does it have to be only numbers? You could create a random number between 1 and 1M (I'd suggest even higher though) and then Base32 encode it. The next thing you need to do is Hash that value (using a secret salt value) and base32 encode the hash. Then append the two strings together, perhaps separated by the dash.
That way, you can verify the incoming code algorithmically. You just take the left side of the code, hash it using your secret salt, and compare that value to the right side of the code.
I must have a reasonnable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Well, if you want it to have at least one million combinations, then you need at least six digits. Is that short enough?
When you are creating the verification code, do you have access to the caller's phone number?
If so I would use the caller's phone number and run it through some sort of hashing function so that you can guarantee that the verification code you gave to the caller in step 1 is the same one that they are entering in step 2 (to make sure they aren't using a friend's validation code or they simply made a very lucky guess).
About the hashing, I'm not sure if it's possible to take a 10 digit number and come out with a hash result that would be < 10 digits (I guess you'd have to live with a certain amount of collision) but I think this would help ensure the user is who they say they are.
Of course this won't work if the phone number used in step 1 is different than the one they are calling from in step 2.
Assuming you already know how to detect which key the user hit, this should be doable reasonably easily. In the security world, there is the notion of a "one time" password. This is sometimes referred to as a "disposable password." Normally these are restricted to the (easily typable) ASCII values. So, [a-zA-z0-9] and a bunch of easily typable symbols. like comma, period, semi colon, and parenthesis. In your case, though, you'd probably want to limit the range to [0-9] and possibly include * and #.
I am unable to explain all the technical details of how these one-time codes are generated (or work) adequately. There is some intermediate math behind it, which I'd butcher without first reviewing it myself. Suffice it to say that you use an algorithm to generate a stream of one time passwords. No matter how mnay previous codes you know, the subsequent one should be impossibel to guess! In your case, you'll simply use each password on the list as the user's random code.
Rather than fail at explaining the details of the implementation myself, I'll direct you to a 9 page article where you can read up on it youself: https://www.grc.com/ppp.htm
It sounds like you have the unspoken requirement that it must be quickly determined, via algorithm, that the code is valid. This would rule out you simply handing out a list of one time pad numbers.
There are several ways people have done this in the past.
Make a public key and private key. Encode the numbers 0-999,999 using the private key, and hand out the results. You'll need to throw in some random numbers to make the result come out to the longer version, and you'll have to convert the result from base 64 to base 10. When you get a number entered, convert it back to base64, apply the private key, and see if the intereting numbers are under 1,000,000 (discard the random numbers).
Use a reversible hash function
Use the first million numbers from a PRN seeded at a specific value. The "checking" function can get the seed, and know that the next million values are good. It can either generate them each time and check one by one when a code is received, or on program startup store them all in a table, sorted, and then use binary search (maximum of compares) since one million integers is not a whole lot of space.
There are a bunch of other options, but these are common and easy to implement.
-Adam
You linked to the check digits project, and using the "encode" function seems like a good solution. It says:
encode may throw an exception if 'bad' data (e.g. non-numeric) is passed to it, while verify only returns true or false. The idea here is that encode normally gets it's data from 'trusted' internal sources (a database key for instance), so it should be pretty usual, in fact, exceptional that bad data is being passed in.
So it sounds like you could pass the encode function a database key (5 digits, for instance) and you could get a number out that would meet your requirements.

Resources