Related
This is a follow on from a previously posted question:
How to generate a random number in C?
I wish to be able to generate a random number from within a particular range, such as 1 to 6 to mimic the sides of a die.
How would I go about doing this?
All the answers so far are mathematically wrong. Returning rand() % N does not uniformly give a number in the range [0, N) unless N divides the length of the interval into which rand() returns (i.e. is a power of 2). Furthermore, one has no idea whether the moduli of rand() are independent: it's possible that they go 0, 1, 2, ..., which is uniform but not very random. The only assumption it seems reasonable to make is that rand() puts out a Poisson distribution: any two nonoverlapping subintervals of the same size are equally likely and independent. For a finite set of values, this implies a uniform distribution and also ensures that the values of rand() are nicely scattered.
This means that the only correct way of changing the range of rand() is to divide it into boxes; for example, if RAND_MAX == 11 and you want a range of 1..6, you should assign {0,1} to 1, {2,3} to 2, and so on. These are disjoint, equally-sized intervals and thus are uniformly and independently distributed.
The suggestion to use floating-point division is mathematically plausible but suffers from rounding issues in principle. Perhaps double is high-enough precision to make it work; perhaps not. I don't know and I don't want to have to figure it out; in any case, the answer is system-dependent.
The correct way is to use integer arithmetic. That is, you want something like the following:
#include <stdlib.h> // For random(), RAND_MAX
// Assumes 0 <= max <= RAND_MAX
// Returns in the closed interval [0, max]
long random_at_most(long max) {
unsigned long
// max <= RAND_MAX < ULONG_MAX, so this is okay.
num_bins = (unsigned long) max + 1,
num_rand = (unsigned long) RAND_MAX + 1,
bin_size = num_rand / num_bins,
defect = num_rand % num_bins;
long x;
do {
x = random();
}
// This is carefully written not to overflow
while (num_rand - defect <= (unsigned long)x);
// Truncated division is intentional
return x/bin_size;
}
The loop is necessary to get a perfectly uniform distribution. For example, if you are given random numbers from 0 to 2 and you want only ones from 0 to 1, you just keep pulling until you don't get a 2; it's not hard to check that this gives 0 or 1 with equal probability. This method is also described in the link that nos gave in their answer, though coded differently. I'm using random() rather than rand() as it has a better distribution (as noted by the man page for rand()).
If you want to get random values outside the default range [0, RAND_MAX], then you have to do something tricky. Perhaps the most expedient is to define a function random_extended() that pulls n bits (using random_at_most()) and returns in [0, 2**n), and then apply random_at_most() with random_extended() in place of random() (and 2**n - 1 in place of RAND_MAX) to pull a random value less than 2**n, assuming you have a numerical type that can hold such a value. Finally, of course, you can get values in [min, max] using min + random_at_most(max - min), including negative values.
Following on from #Ryan Reich's answer, I thought I'd offer my cleaned up version. The first bounds check isn't required given the second bounds check, and I've made it iterative rather than recursive. It returns values in the range [min, max], where max >= min and 1+max-min < RAND_MAX.
unsigned int rand_interval(unsigned int min, unsigned int max)
{
int r;
const unsigned int range = 1 + max - min;
const unsigned int buckets = RAND_MAX / range;
const unsigned int limit = buckets * range;
/* Create equal size buckets all in a row, then fire randomly towards
* the buckets until you land in one of them. All buckets are equally
* likely. If you land off the end of the line of buckets, try again. */
do
{
r = rand();
} while (r >= limit);
return min + (r / buckets);
}
Here is a formula if you know the max and min values of a range, and you want to generate numbers inclusive in between the range:
r = (rand() % (max + 1 - min)) + min
unsigned int
randr(unsigned int min, unsigned int max)
{
double scaled = (double)rand()/RAND_MAX;
return (max - min +1)*scaled + min;
}
See here for other options.
Wouldn't you just do:
srand(time(NULL));
int r = ( rand() % 6 ) + 1;
% is the modulus operator. Essentially it will just divide by 6 and return the remainder... from 0 - 5
For those who understand the bias problem but can't stand the unpredictable run-time of rejection-based methods, this series produces a progressively less biased random integer in the [0, n-1] interval:
r = n / 2;
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
...
It does so by synthesising a high-precision fixed-point random number of i * log_2(RAND_MAX + 1) bits (where i is the number of iterations) and performing a long multiplication by n.
When the number of bits is sufficiently large compared to n, the bias becomes immeasurably small.
It does not matter if RAND_MAX + 1 is less than n (as in this question), or if it is not a power of two, but care must be taken to avoid integer overflow if RAND_MAX * n is large.
Here is a slight simpler algorithm than Ryan Reich's solution:
/// Begin and end are *inclusive*; => [begin, end]
uint32_t getRandInterval(uint32_t begin, uint32_t end) {
uint32_t range = (end - begin) + 1;
uint32_t limit = ((uint64_t)RAND_MAX + 1) - (((uint64_t)RAND_MAX + 1) % range);
/* Imagine range-sized buckets all in a row, then fire randomly towards
* the buckets until you land in one of them. All buckets are equally
* likely. If you land off the end of the line of buckets, try again. */
uint32_t randVal = rand();
while (randVal >= limit) randVal = rand();
/// Return the position you hit in the bucket + begin as random number
return (randVal % range) + begin;
}
Example (RAND_MAX := 16, begin := 2, end := 7)
=> range := 6 (1 + end - begin)
=> limit := 12 (RAND_MAX + 1) - ((RAND_MAX + 1) % range)
The limit is always a multiple of the range,
so we can split it into range-sized buckets:
Possible-rand-output: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Buckets: [0, 1, 2, 3, 4, 5][0, 1, 2, 3, 4, 5][X, X, X, X, X]
Buckets + begin: [2, 3, 4, 5, 6, 7][2, 3, 4, 5, 6, 7][X, X, X, X, X]
1st call to rand() => 13
→ 13 is not in the bucket-range anymore (>= limit), while-condition is true
→ retry...
2nd call to rand() => 7
→ 7 is in the bucket-range (< limit), while-condition is false
→ Get the corresponding bucket-value 1 (randVal % range) and add begin
=> 3
In order to avoid the modulo bias (suggested in other answers) you can always use:
arc4random_uniform(MAX-MIN)+MIN
Where "MAX" is the upper bound and "MIN" is lower bound. For example, for numbers between 10 and 20:
arc4random_uniform(20-10)+10
arc4random_uniform(10)+10
Simple solution and better than using "rand() % N".
While Ryan is correct, the solution can be much simpler based on what is known about the source of the randomness. To re-state the problem:
There is a source of randomness, outputting integer numbers in range [0, MAX) with uniform distribution.
The goal is to produce uniformly distributed random integer numbers in range [rmin, rmax] where 0 <= rmin < rmax < MAX.
In my experience, if the number of bins (or "boxes") is significantly smaller than the range of the original numbers, and the original source is cryptographically strong - there is no need to go through all that rigamarole, and simple modulo division would suffice (like output = rnd.next() % (rmax+1), if rmin == 0), and produce random numbers that are distributed uniformly "enough", and without any loss of speed. The key factor is the randomness source (i.e., kids, don't try this at home with rand()).
Here's an example/proof of how it works in practice. I wanted to generate random numbers from 1 to 22, having a cryptographically strong source that produced random bytes (based on Intel RDRAND). The results are:
Rnd distribution test (22 boxes, numbers of entries in each box):
1: 409443 4.55%
2: 408736 4.54%
3: 408557 4.54%
4: 409125 4.55%
5: 408812 4.54%
6: 409418 4.55%
7: 408365 4.54%
8: 407992 4.53%
9: 409262 4.55%
10: 408112 4.53%
11: 409995 4.56%
12: 409810 4.55%
13: 409638 4.55%
14: 408905 4.54%
15: 408484 4.54%
16: 408211 4.54%
17: 409773 4.55%
18: 409597 4.55%
19: 409727 4.55%
20: 409062 4.55%
21: 409634 4.55%
22: 409342 4.55%
total: 100.00%
This is as close to uniform as I need for my purpose (fair dice throw, generating cryptographically strong codebooks for WWII cipher machines such as http://users.telenet.be/d.rijmenants/en/kl-7sim.htm, etc). The output does not show any appreciable bias.
Here's the source of cryptographically strong (true) random number generator:
Intel Digital Random Number Generator
and a sample code that produces 64-bit (unsigned) random numbers.
int rdrand64_step(unsigned long long int *therand)
{
unsigned long long int foo;
int cf_error_status;
asm("rdrand %%rax; \
mov $1,%%edx; \
cmovae %%rax,%%rdx; \
mov %%edx,%1; \
mov %%rax, %0;":"=r"(foo),"=r"(cf_error_status)::"%rax","%rdx");
*therand = foo;
return cf_error_status;
}
I compiled it on Mac OS X with clang-6.0.1 (straight), and with gcc-4.8.3 using "-Wa,q" flag (because GAS does not support these new instructions).
As said before modulo isn't sufficient because it skews the distribution. Heres my code which masks off bits and uses them to ensure the distribution isn't skewed.
static uint32_t randomInRange(uint32_t a,uint32_t b) {
uint32_t v;
uint32_t range;
uint32_t upper;
uint32_t lower;
uint32_t mask;
if(a == b) {
return a;
}
if(a > b) {
upper = a;
lower = b;
} else {
upper = b;
lower = a;
}
range = upper - lower;
mask = 0;
//XXX calculate range with log and mask? nah, too lazy :).
while(1) {
if(mask >= range) {
break;
}
mask = (mask << 1) | 1;
}
while(1) {
v = rand() & mask;
if(v <= range) {
return lower + v;
}
}
}
The following simple code lets you look at the distribution:
int main() {
unsigned long long int i;
unsigned int n = 10;
unsigned int numbers[n];
for (i = 0; i < n; i++) {
numbers[i] = 0;
}
for (i = 0 ; i < 10000000 ; i++){
uint32_t rand = random_in_range(0,n - 1);
if(rand >= n){
printf("bug: rand out of range %u\n",(unsigned int)rand);
return 1;
}
numbers[rand] += 1;
}
for(i = 0; i < n; i++) {
printf("%u: %u\n",i,numbers[i]);
}
}
Will return a floating point number in the range [0,1]:
#define rand01() (((double)random())/((double)(RAND_MAX)))
I want to create a function which will give me fixed size 6 char alpha-numeric IDs with the requirement that the first and last character must be an alpha.
I want them to be generated sequentially. I think using base36 would be the way to go with an alphabet of [0-9A-Z] however I am not too sure how to ensure that they are always 6 char long with an alpha at the start and end.
E.g., if I create the IDs sequentially and start from 0, I would get 0 for output since 0 is the same in both bases.
Does anyone know of an efficient algorithm that could help here?
Thanks
You can use the standard algorithm for converting from an int to a base36 string, extracting one digit at a time by taking the modulo of the base and then dividing the remainder by the base, but add a special case for the first and last digit:
For e.g. in Java:
static String getId(int id)
{
String s = "";
for(int i = 0; i < 6; i++)
{
// compute the digit using modulo arithmetic using base 26
// for first and last character and base 36 for others
int digit;
if((i == 0) || (i == 5))
{
digit = (id % 26) + 10;
id /= 26;
}
else
{
digit = id % 36;
id /= 36;
}
// add the digit to the string:
if(digit < 10)
s = (char)('0' + digit) + s;
else
s = (char)('A' + (digit - 10)) + s;
}
return s;
}
There are 26*36*36*36*36*26 = 1135420416 possibilities, which means you only need a 32-bit integer to store them all.
It might help to start out with a real world example. Say I'm writing a web app that's backed by MongoDB, so my records have a long hex primary key, making my url to view a record look like /widget/55c460d8e2d6e59da89d08d0. That seems excessively long. Urls can use many more characters than that. While there are just under 8 x 10^28 (16^24) possible values in a 24 digit hex number, just limiting yourself to the characters matched by a [a-zA-Z0-9] regex class (a YouTube video id uses more), 62 characters, you can get past 8 x 10^28 in only 17 characters.
I want an algorithm that will convert any string that is limited to a specific alphabet of characters to any other string with another alphabet of characters, where the value of each character c could be thought of as alphabet.indexOf(c).
Something of the form:
convert(value, sourceAlphabet, destinationAlphabet)
Assumptions
all parameters are strings
every character in value exists in sourceAlphabet
every character in sourceAlphabet and destinationAlphabet is unique
Simplest example
var hex = "0123456789abcdef";
var base10 = "0123456789";
var result = convert("12245589", base10, hex); // result is "bada55";
But I also want it to work to convert War & Peace from the Russian alphabet plus some punctuation to the entire unicode charset and back again losslessly.
Is this possible?
The only way I was ever taught to do base conversions in Comp Sci 101 was to first convert to a base ten integer by summing digit * base^position and then doing the reverse to convert to the target base. Such a method is insufficient for the conversion of very long strings, because the integers get too big.
It certainly feels intuitively that a base conversion could be done in place, as you step through the string (probably backwards to maintain standard significant digit order), keeping track of a remainder somehow, but I'm not smart enough to work out how.
That's where you come in, StackOverflow. Are you smart enough?
Perhaps this is a solved problem, done on paper by some 18th century mathematician, implemented in LISP on punch cards in 1970 and the first homework assignment in Cryptography 101, but my searches have borne no fruit.
I'd prefer a solution in javascript with a functional style, but any language or style will do, as long as you're not cheating with some big integer library. Bonus points for efficiency, of course.
Please refrain from criticizing the original example. The general nerd cred of solving the problem is more important than any application of the solution.
Here is a solution in C that is very fast, using bit shift operations. It assumes that you know what the length of the decoded string should be. The strings are vectors of integers in the range 0..maximum for each alphabet. It is up to the user to convert to and from strings with restricted ranges of characters. As for the "in-place" in the question title, the source and destination vectors can overlap, but only if the source alphabet is not larger than the destination alphabet.
/*
recode version 1.0, 22 August 2015
Copyright (C) 2015 Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Mark Adler
madler#alumni.caltech.edu
*/
/* Recode a vector from one alphabet to another using intermediate
variable-length bit codes. */
/* The approach is to use a Huffman code over equiprobable alphabets in two
directions. First to encode the source alphabet to a string of bits, and
second to encode the string of bits to the destination alphabet. This will
be reasonably close to the efficiency of base-encoding with arbitrary
precision arithmetic. */
#include <stddef.h> // size_t
#include <limits.h> // UINT_MAX, ULLONG_MAX
#if UINT_MAX == ULLONG_MAX
# error recode() assumes that long long has more bits than int
#endif
/* Take a list of integers source[0..slen-1], all in the range 0..smax, and
code them into dest[0..*dlen-1], where each value is in the range 0..dmax.
*dlen returns the length of the result, which will not exceed the value of
*dlen when called. If the original *dlen is not large enough to hold the
full result, then recode() will return non-zero to indicate failure.
Otherwise recode() will return 0. recode() will also return non-zero if
either of the smax or dmax parameters are less than one. The non-zero
return codes are 1 if *dlen is not long enough, 2 for invalid parameters,
and 3 if any of the elements of source are greater than smax.
Using this same operation on the result with smax and dmax reversed reverses
the operation, restoring the original vector. However there may be more
symbols returned than the original, so the number of symbols expected needs
to be known for decoding. (An end symbol could be appended to the source
alphabet to include the length in the coding, but then encoding and decoding
would no longer be symmetric, and the coding efficiency would be reduced.
This is left as an exercise for the reader if that is desired.) */
int recode(unsigned *dest, size_t *dlen, unsigned dmax,
const unsigned *source, size_t slen, unsigned smax)
{
// compute sbits and scut, with which we will recode the source with
// sbits-1 bits for symbols < scut, otherwise with sbits bits (adding scut)
if (smax < 1)
return 2;
unsigned sbits = 0;
unsigned scut = 1; // 2**sbits
while (scut && scut <= smax) {
scut <<= 1;
sbits++;
}
scut -= smax + 1;
// same thing for dbits and dcut
if (dmax < 1)
return 2;
unsigned dbits = 0;
unsigned dcut = 1; // 2**dbits
while (dcut && dcut <= dmax) {
dcut <<= 1;
dbits++;
}
dcut -= dmax + 1;
// recode a base smax+1 vector to a base dmax+1 vector using an
// intermediate bit vector (a sliding window of that bit vector is kept in
// a bit buffer)
unsigned long long buf = 0; // bit buffer
unsigned have = 0; // number of bits in bit buffer
size_t i = 0, n = 0; // source and dest indices
unsigned sym; // symbol being encoded
for (;;) {
// encode enough of source into bits to encode that to dest
while (have < dbits && i < slen) {
sym = source[i++];
if (sym > smax) {
*dlen = n;
return 3;
}
if (sym < scut) {
buf = (buf << (sbits - 1)) + sym;
have += sbits - 1;
}
else {
buf = (buf << sbits) + sym + scut;
have += sbits;
}
}
// if not enough bits to assure one symbol, then break out to a special
// case for coding the final symbol
if (have < dbits)
break;
// encode one symbol to dest
if (n == *dlen)
return 1;
sym = buf >> (have - dbits + 1);
if (sym < dcut) {
dest[n++] = sym;
have -= dbits - 1;
}
else {
sym = buf >> (have - dbits);
dest[n++] = sym - dcut;
have -= dbits;
}
buf &= ((unsigned long long)1 << have) - 1;
}
// if any bits are left in the bit buffer, encode one last symbol to dest
if (have) {
if (n == *dlen)
return 1;
sym = buf;
sym <<= dbits - 1 - have;
if (sym >= dcut)
sym = (sym << 1) - dcut;
dest[n++] = sym;
}
// return recoded vector
*dlen = n;
return 0;
}
/* Test recode(). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>
// Return a random vector of len unsigned values in the range 0..max.
static void ranvec(unsigned *vec, size_t len, unsigned max) {
unsigned bits = 0;
unsigned long long mask = 1;
while (mask <= max) {
mask <<= 1;
bits++;
}
mask--;
unsigned long long ran = 0;
unsigned have = 0;
size_t n = 0;
while (n < len) {
while (have < bits) {
ran = (ran << 31) + random();
have += 31;
}
if ((ran & mask) <= max)
vec[n++] = ran & mask;
ran >>= bits;
have -= bits;
}
}
// Get a valid number from str and assign it to var
#define NUM(var, str) \
do { \
char *end; \
unsigned long val = strtoul(str, &end, 0); \
var = val; \
if (*end || var != val) { \
fprintf(stderr, \
"invalid or out of range numeric argument: %s\n", str); \
return 1; \
} \
} while (0)
/* "bet n m len count" generates count test vectors of length len, where each
entry is in the range 0..n. Each vector is recoded to another vector using
only symbols in the range 0..m. That vector is recoded back to a vector
using only symbols in 0..n, and that result is compared with the original
random vector. Report on the average ratio of input and output symbols, as
compared to the optimal ratio for arbitrary precision base encoding. */
int main(int argc, char **argv)
{
// get sizes of alphabets and length of test vector, compute maximum sizes
// of recoded vectors
unsigned smax, dmax, runs;
size_t slen, dsize, bsize;
if (argc != 5) { fputs("need four arguments\n", stderr); return 1; }
NUM(smax, argv[1]);
NUM(dmax, argv[2]);
NUM(slen, argv[3]);
NUM(runs, argv[4]);
dsize = ceil(slen * ceil(log2(smax + 1.)) / floor(log2(dmax + 1.)));
bsize = ceil(dsize * ceil(log2(dmax + 1.)) / floor(log2(smax + 1.)));
// generate random test vectors, encode, decode, and compare
srandomdev();
unsigned source[slen], dest[dsize], back[bsize];
unsigned mis = 0, i;
unsigned long long dtot = 0;
int ret;
for (i = 0; i < runs; i++) {
ranvec(source, slen, smax);
size_t dlen = dsize;
ret = recode(dest, &dlen, dmax, source, slen, smax);
if (ret) {
fprintf(stderr, "encode error %d\n", ret);
break;
}
dtot += dlen;
size_t blen = bsize;
ret = recode(back, &blen, smax, dest, dlen, dmax);
if (ret) {
fprintf(stderr, "decode error %d\n", ret);
break;
}
if (blen < slen || memcmp(source, back, slen)) // blen > slen is ok
mis++;
}
if (mis)
fprintf(stderr, "%u/%u mismatches!\n", mis, i);
if (ret == 0)
printf("mean dest/source symbols = %.4f (optimal = %.4f)\n",
dtot / (i * (double)slen), log(smax + 1.) / log(dmax + 1.));
return 0;
}
As has been pointed out in other StackOverflow answers, try not to think of summing digit * base^position as converting it to base ten; rather, think of it as directing the computer to generate a representation of the quantity represented by the number in its own terms (for most computers probably closer to our concept of base 2). Once the computer has its own representation of the quantity, we can direct it to output the number in any way we like.
By rejecting "big integer" implementations and asking for letter-by-letter conversion you are at the same time arguing that the numerical/alphabetical representation of quantity is not actually what it is, namely that each position represents a quantity of digit * base^position. If the nine-millionth character of War and Peace does represent what you are asking to convert it from, then the computer at some point will need to generate a representation for Д * 33^9000000.
I don't think any solution can work generally because if ne != m for some integer e and some MAX_INT because there's no way to calculate the value of the target base in a certain place p if np > MAX_INT.
You can get away with this for the case where ne == m for some e because the problem is recursively doable (the first e digits of n can be summed and converted into the first digit of M, and then chopped off and repeated.
If you don't have this useful property, then eventually you're going to have to try to take some part of the original base and try to perform modulus in np and np is going to be greater than MAX_INT, which means it's impossible.
What is the best method to find the number of digits of a positive integer?
I have found this 3 basic methods:
conversion to string
String s = new Integer(t).toString();
int len = s.length();
for loop
for(long long int temp = number; temp >= 1;)
{
temp/=10;
decimalPlaces++;
}
logaritmic calculation
digits = floor( log10( number ) ) + 1;
where you can calculate log10(x) = ln(x) / ln(10) in most languages.
First I thought the string method is the dirtiest one but the more I think about it the more I think it's the fastest way. Or is it?
There's always this method:
n = 1;
if ( i >= 100000000 ) { n += 8; i /= 100000000; }
if ( i >= 10000 ) { n += 4; i /= 10000; }
if ( i >= 100 ) { n += 2; i /= 100; }
if ( i >= 10 ) { n += 1; }
Well the correct answer would be to measure it - but you should be able to make a guess about the number of CPU steps involved in converting strings and going through them looking for an end marker
Then think how many FPU operations/s your processor can do and how easy it is to calculate a single log.
edit: wasting some more time on a monday morning :-)
String s = new Integer(t).toString();
int len = s.length();
One of the problems with high level languages is guessing how much work the system is doing behind the scenes of an apparently simple statement. Mandatory Joel link
This statement involves allocating memory for a string, and possibly a couple of temporary copies of a string. It must parse the integer and copy the digits of it into a string, possibly having to reallocate and move the existing memory if the number is large. It might have to check a bunch of locale settings to decide if your country uses "," or ".", it might have to do a bunch of unicode conversions.
Then finding the length has to scan the entire string, again considering unicode and any local specific settings such as - are you in a right->left language?.
Alternatively:
digits = floor( log10( number ) ) + 1;
Just because this would be harder for you to do on paper doesn't mean it's hard for a computer! In fact a good rule in high performance computing seems to have been - if something is hard for a human (fluid dynamics, 3d rendering) it's easy for a computer, and if it's easy for a human (face recognition, detecting a voice in a noisy room) it's hard for a computer!
You can generally assume that the builtin maths functions log/sin/cos etc - have been an important part of computer design for 50years. So even if they don't map directly into a hardware function in the FPU you can bet that the alternative implementation is pretty efficient.
I don't know, and the answer may well be different depending on how your individual language is implemented.
So, stress test it! Implement all three solutions. Run them on 1 through 1,000,000 (or some other huge set of numbers that's representative of the numbers the solution will be running against) and time how long each of them takes.
Pit your solutions against one another and let them fight it out. Like intellectual gladiators. Three algorithms enter! One algorithm leaves!
Test conditions
Decimal numeral system
Positive integers
Up to 10 digits
Language: ActionScript 3
Results
digits: [1,10],
no. of runs: 1,000,000
random sample: 8777509,40442298,477894,329950,513,91751410,313,3159,131309,2
result: 7,8,6,6,3,8,3,4,6,1
CONVERSION TO STRING: 724ms
LOGARITMIC CALCULATION: 349ms
DIV 10 ITERATION: 229ms
MANUAL CONDITIONING: 136ms
Note: Author refrains from making any conclusions for numbers with more than 10 digits.
Script
package {
import flash.display.MovieClip;
import flash.utils.getTimer;
/**
* #author Daniel
*/
public class Digits extends MovieClip {
private const NUMBERS : uint = 1000000;
private const DIGITS : uint = 10;
private var numbers : Array;
private var digits : Array;
public function Digits() {
// ************* NUMBERS *************
numbers = [];
for (var i : int = 0; i < NUMBERS; i++) {
var number : Number = Math.floor(Math.pow(10, Math.random()*DIGITS));
numbers.push(number);
}
trace('Max digits: ' + DIGITS + ', count of numbers: ' + NUMBERS);
trace('sample: ' + numbers.slice(0, 10));
// ************* CONVERSION TO STRING *************
digits = [];
var time : Number = getTimer();
for (var i : int = 0; i < numbers.length; i++) {
digits.push(String(numbers[i]).length);
}
trace('\nCONVERSION TO STRING - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* LOGARITMIC CALCULATION *************
digits = [];
time = getTimer();
for (var i : int = 0; i < numbers.length; i++) {
digits.push(Math.floor( Math.log( numbers[i] ) / Math.log(10) ) + 1);
}
trace('\nLOGARITMIC CALCULATION - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* DIV 10 ITERATION *************
digits = [];
time = getTimer();
var digit : uint = 0;
for (var i : int = 0; i < numbers.length; i++) {
digit = 0;
for(var temp : Number = numbers[i]; temp >= 1;)
{
temp/=10;
digit++;
}
digits.push(digit);
}
trace('\nDIV 10 ITERATION - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* MANUAL CONDITIONING *************
digits = [];
time = getTimer();
var digit : uint;
for (var i : int = 0; i < numbers.length; i++) {
var number : Number = numbers[i];
if (number < 10) digit = 1;
else if (number < 100) digit = 2;
else if (number < 1000) digit = 3;
else if (number < 10000) digit = 4;
else if (number < 100000) digit = 5;
else if (number < 1000000) digit = 6;
else if (number < 10000000) digit = 7;
else if (number < 100000000) digit = 8;
else if (number < 1000000000) digit = 9;
else if (number < 10000000000) digit = 10;
digits.push(digit);
}
trace('\nMANUAL CONDITIONING: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
}
}
}
This algorithm might be good also, assuming that:
Number is integer and binary encoded (<< operation is cheap)
We don't known number boundaries
var num = 123456789L;
var len = 0;
var tmp = 1L;
while(tmp < num)
{
len++;
tmp = (tmp << 3) + (tmp << 1);
}
This algorithm, should have speed comparable to for-loop (2) provided, but a bit faster due to (2 bit-shifts, add and subtract, instead of division).
As for Log10 algorithm, it will give you only approximate answer (that is close to real, but still), since analytic formula for computing Log function have infinite loop and can't be calculated precisely Wiki.
Use the simplest solution in whatever programming language you're using. I can't think of a case where counting digits in an integer would be the bottleneck in any (useful) program.
C, C++:
char buffer[32];
int length = sprintf(buffer, "%ld", (long)123456789);
Haskell:
len = (length . show) 123456789
JavaScript:
length = String(123456789).length;
PHP:
$length = strlen(123456789);
Visual Basic (untested):
length = Len(str(123456789)) - 1
conversion to string: This will have to iterate through each digit, find the character that maps to the current digit, add a character to a collection of characters. Then get the length of the resulting String object. Will run in O(n) for n=#digits.
for-loop: will perform 2 mathematical operation: dividing the number by 10 and incrementing a counter. Will run in O(n) for n=#digits.
logarithmic: Will call log10 and floor, and add 1. Looks like O(1) but I'm not really sure how fast the log10 or floor functions are. My knowledge of this sort of things has atrophied with lack of use so there could be hidden complexity in these functions.
So I guess it comes down to: is looking up digit mappings faster than multiple mathematical operations or whatever is happening in log10? The answer will probably vary. There could be platforms where the character mapping is faster, and others where doing the calculations is faster. Also to keep in mind is that the first method will creats a new String object that only exists for the purpose of getting the length. This will probably use more memory than the other two methods, but it may or may not matter.
You can obviously eliminate the method 1 from the competition, because the atoi/toString algorithm it uses would be similar to method 2.
Method 3's speed depends on whether the code is being compiled for a system whose instruction set includes log base 10.
For very large integers, the log method is much faster. For instance, with a 2491327 digit number (the 11920928th Fibonacci number, if you care), Python takes several minutes to execute the divide-by-10 algorithm, and milliseconds to execute 1+floor(log(n,10)).
import math
def numdigits(n):
return ( int(math.floor(math.log10(n))) + 1 )
Regarding the three methods you propose for "determining the number of digits necessary to represent a given number in a given base", I don't like any of them, actually; I prefer the method I give below instead.
Re your method #1 (strings): Anything involving converting back-and-forth between strings and numbers is usually very slow.
Re your method #2 (temp/=10): This is fatally flawed because it assumes that x/10 always means "x divided by 10". But in many programming languages (eg: C, C++), if "x" is an integer type, then "x/10" means "integer division", which isn't the same thing as floating-point division, and it introduces round-off errors at every iteration, and they accumulate in a recursive formula such as your solution #2 uses.
Re your method #3 (logs): it's buggy for large numbers (at least in C, and probably other languages as well), because floating-point data types tend not to be as precise as 64-bit integers.
Hence I dislike all 3 of those methods: #1 works but is slow, #2 is broken, and #3 is buggy for large numbers. Instead, I prefer this, which works for numbers from 0 up to about 18.44 quintillion:
unsigned NumberOfDigits (uint64_t Number, unsigned Base)
{
unsigned Digits = 1;
uint64_t Power = 1;
while ( Number / Power >= Base )
{
++Digits;
Power *= Base;
}
return Digits;
}
Keep it simple:
long long int a = 223452355415634664;
int x;
for (x = 1; a >= 10; x++)
{
a = a / 10;
}
printf("%d", x);
You can use a recursive solution instead of a loop, but somehow similar:
#tailrec
def digits (i: Long, carry: Int=1) : Int = if (i < 10) carry else digits (i/10, carry+1)
digits (8345012978643L)
With longs, the picture might change - measure small and long numbers independently against different algorithms, and pick the appropriate one, depending on your typical input. :)
Of course nothing beats a switch:
switch (x) {
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7: case 8: case 9: return 1;
case 10: case 11: // ...
case 99: return 2;
case 100: // you get the point :)
default: return 10; // switch only over int
}
except a plain-o-array:
int [] size = {1,1,1,1,1,1,1,1,1,2,2,2,2,2,... };
int x = 234561798;
return size [x];
Some people will tell you to optimize the code-size, but yaknow, premature optimization ...
log(x,n)-mod(log(x,n),1)+1
Where x is a the base and n is the number.
Here is the measurement in Swift 4.
Algorithms code:
extension Int {
var numberOfDigits0: Int {
var currentNumber = self
var n = 1
if (currentNumber >= 100000000) {
n += 8
currentNumber /= 100000000
}
if (currentNumber >= 10000) {
n += 4
currentNumber /= 10000
}
if (currentNumber >= 100) {
n += 2
currentNumber /= 100
}
if (currentNumber >= 10) {
n += 1
}
return n
}
var numberOfDigits1: Int {
return String(self).count
}
var numberOfDigits2: Int {
var n = 1
var currentNumber = self
while currentNumber > 9 {
n += 1
currentNumber /= 10
}
return n
}
}
Measurement code:
var timeInterval0 = Date()
for i in 0...10000 {
i.numberOfDigits0
}
print("timeInterval0: \(Date().timeIntervalSince(timeInterval0))")
var timeInterval1 = Date()
for i in 0...10000 {
i.numberOfDigits1
}
print("timeInterval1: \(Date().timeIntervalSince(timeInterval1))")
var timeInterval2 = Date()
for i in 0...10000 {
i.numberOfDigits2
}
print("timeInterval2: \(Date().timeIntervalSince(timeInterval2))")
Output
timeInterval0: 1.92149806022644
timeInterval1: 0.557608008384705
timeInterval2: 2.83262193202972
On this measurement basis String conversion is the best option for the Swift language.
I was curious after seeing #daniel.sedlacek results so I did some testing using Swift for numbers having more than 10 digits. I ran the following script in the playground.
let base = [Double(100090000000), Double(100050000), Double(100050000), Double(100000200)]
var rar = [Double]()
for i in 1...10 {
for d in base {
let v = d*Double(arc4random_uniform(UInt32(1000000000)))
rar.append(v*Double(arc4random_uniform(UInt32(1000000000))))
rar.append(Double(1)*pow(1,Double(i)))
}
}
print(rar)
var timeInterval = NSDate().timeIntervalSince1970
for d in rar {
floor(log10(d))
}
var newTimeInterval = NSDate().timeIntervalSince1970
print(newTimeInterval-timeInterval)
timeInterval = NSDate().timeIntervalSince1970
for d in rar {
var c = d
while c > 10 {
c = c/10
}
}
newTimeInterval = NSDate().timeIntervalSince1970
print(newTimeInterval-timeInterval)
Results of 80 elements
0.105069875717163 for floor(log10(x))
0.867973804473877 for div 10 iterations
Adding one more approach to many of the already mentioned approaches.
The idea is to use binarySearch on an array containing the range of integers based on the digits of the int data type.
The signature of Java Arrays class binarySearch is :
binarySearch(dataType[] array, dataType key) which returns the index of the search key, if it is contained in the array; otherwise, (-(insertion point) – 1).
The insertion point is defined as the point at which the key would be inserted into the array.
Below is the implementation:
static int [] digits = {9,99,999,9999,99999,999999,9999999,99999999,999999999,Integer.MAX_VALUE};
static int digitsCounter(int N)
{
int digitCount = Arrays.binarySearch(digits , N<0 ? -N:N);
return 1 + (digitCount < 0 ? ~digitCount : digitCount);
}
Please note that the above approach only works for : Integer.MIN_VALUE <= N <= Integer.MAX_VALUE, but can be easily extended for Long data type by adding more values to the digits array.
For example,
I) for N = 555, digitCount = Arrays.binarySearch(digits , 555) returns -3 (-(2)-1) as it's not present in the array but is supposed to be inserted at point 2 between 9 & 99 like [9, 55, 99].
As the index we got is negative we need to take the bitwise compliment of the result.
At last, we need to add 1 to the result to get the actual number of digits in the number N.
In Swift 5.x, you get the number of digit in integer as below :
Convert to string and then count number of character in string
let nums = [1, 7892, 78, 92, 90]
for i in nums {
let ch = String(describing: i)
print(ch.count)
}
Calculating the number of digits in integer using loop
var digitCount = 0
for i in nums {
var tmp = i
while tmp >= 1 {
tmp /= 10
digitCount += 1
}
print(digitCount)
}
let numDigits num =
let num = abs(num)
let rec numDigitsInner num =
match num with
| num when num < 10 -> 1
| _ -> 1 + numDigitsInner (num / 10)
numDigitsInner num
F# Version, without casting to a string.
Is there an algorithm for accurately multiplying two arbitrarily long integers together? The language I am working with is limited to 64-bit unsigned integer length (maximum integer size of 18446744073709551615). Realistically, I would like to be able to do this by breaking up each number, processing them somehow using the unsigned 64-bit integers, and then being able to put them back together in to a string (which would solve the issue of multiplied result storage).
Any ideas?
Most languages have functions or libraries that do this, usually called a Bignum library (GMP is a good one.)
If you want to do it yourself, I would do it the same way that people do long multiplication on paper. To do this you could either work with strings containing the number, or do it in binary using bitwise operations.
Example:
45
x67
---
315
+270
----
585
Or in binary:
101
x101
----
101
000
+101
------
11001
Edit: After doing it in binary I realized that it would be much simpler (and faster of course) to code using bitwise operations instead of strings containing the base-10 numbers. I've edited my binary multiplying example to show a pattern: for each 1-bit in the bottom number, add the top number, bit-shifted left the position of the 1-bit times to a variable. At the end, that variable will contain the product.
To store the product, you'll have to have two 64-bit numbers and imagine one of them being the first 64 bits and the other one the second 64 bits of the product. You'll have to write code that carries the addition from bit 63 of the second number to bit 0 of the first number.
If you can't use an existing bignum library like GMP, check out Wikipedia's article on binary multiplication with computers. There are a number of good, efficient algorithms for this.
The simplest way would be to use the schoolbook mechanism, splitting your arbitrarily sized numbers into chunks of 32-bit each.
Given A B C D * E F G H (each chunk 32-bit, for a total 128 bit)
You need an output array 9 dwords wide.
Set Out[0..8] to 0
You'd start by doing: H * D + out[8] => 64 bit result.
Store the low 32-bits in out[8] and take the high 32-bits as carry
Next: (H * C) + out[7] + carry
Again, store low 32-bit in out[7], use the high 32-bits as carry
after doing H*A + out[4] + carry, you need to continue looping until you have no carry.
Then repeat with G, F, E.
For G, you'd start at out[7] instead of out[8], and so forth.
Finally, walk through and convert the large integer into digits (which will require a "divide large number by a single word" routine)
Yes, you do it using a datatype that is effectively a string of digits (just like a normal 'string' is a string of characters). How you do this is highly language-dependent. For instance, Java uses BigDecimal. What language are you using?
This is often given as a homework assignment. The algorithm you learned in grade school will work. Use a library (several are mentioned in other posts) if you need this for a real application.
Here is my code piece in C. Good old multiply method
char *multiply(char s1[], char s2[]) {
int l1 = strlen(s1);
int l2 = strlen(s2);
int i, j, k = 0, c = 0;
char *r = (char *) malloc (l1+l2+1); // add one byte for the zero terminating string
int temp;
strrev(s1);
strrev(s2);
for (i = 0;i <l1+l2; i++) {
r[i] = 0 + '0';
}
for (i = 0; i <l1; i ++) {
c = 0; k = i;
for (j = 0; j < l2; j++) {
temp = get_int(s1[i]) * get_int(s2[j]);
temp = temp + c + get_int(r[k]);
c = temp /10;
r[k] = temp%10 + '0';
k++;
}
if (c!=0) {
r[k] = c + '0';
k++;
}
}
r[k] = '\0';
strrev(r);
return r;
}
//Here is a JavaScript version of an Karatsuba Algorithm running with less time than the usual multiplication method
function range(start, stop, step) {
if (typeof stop == 'undefined') {
// one param defined
stop = start;
start = 0;
}
if (typeof step == 'undefined') {
step = 1;
}
if ((step > 0 && start >= stop) || (step < 0 && start <= stop)) {
return [];
}
var result = [];
for (var i = start; step > 0 ? i < stop : i > stop; i += step) {
result.push(i);
}
return result;
};
function zeroPad(numberString, zeros, left = true) {
//Return the string with zeros added to the left or right.
for (var i in range(zeros)) {
if (left)
numberString = '0' + numberString
else
numberString = numberString + '0'
}
return numberString
}
function largeMultiplication(x, y) {
x = x.toString();
y = y.toString();
if (x.length == 1 && y.length == 1)
return parseInt(x) * parseInt(y)
if (x.length < y.length)
x = zeroPad(x, y.length - x.length);
else
y = zeroPad(y, x.length - y.length);
n = x.length
j = Math.floor(n/2);
//for odd digit integers
if ( n % 2 != 0)
j += 1
var BZeroPadding = n - j
var AZeroPadding = BZeroPadding * 2
a = parseInt(x.substring(0,j));
b = parseInt(x.substring(j));
c = parseInt(y.substring(0,j));
d = parseInt(y.substring(j));
//recursively calculate
ac = largeMultiplication(a, c)
bd = largeMultiplication(b, d)
k = largeMultiplication(a + b, c + d)
A = parseInt(zeroPad(ac.toString(), AZeroPadding, false))
B = parseInt(zeroPad((k - ac - bd).toString(), BZeroPadding, false))
return A + B + bd
}
//testing the function here
example = largeMultiplication(12, 34)
console.log(example)