Fastest way to modify one digit of an integer - algorithm

Suppose I have an int x = 54897, old digit index (0 based), and the new value for that digit. What's the fastest way to get the new value?
Example
x = 54897
index = 3
value = 2
y = f(x, index, value) // => 54827
Edit: by fastest, I definitely mean faster performance. No string processing.

In simplest case (considering the digits are numbered from LSB to MSB, the first one being 0) AND knowing the old digit, we could do as simple as that:
num += (new_digit - old_digit) * 10**pos;
For the real problem we would need:
1) the MSB-first version of the pos, that could cost you a log() or at most log10(MAX_INT) divisions by ten (could be improved using binary search).
2) the digit from that pos that would need at most 2 divisions (or zero, using results from step 1).
You could also use the special fpu instruction from x86 that is able to save a float in BCD (I have no idea how slow it is).
UPDATE: the first step could be done even faster, without any divisions, with a binary search like this:
int my_log10(unsigned short n){
// short: 0.. 64k -> 1.. 5 digits
if (n < 1000){ // 1..3
if (n < 10) return 1;
if (n < 100) return 2;
return 3;
} else { // 4..5
if (n < 10000) return 4;
return 5;
}
}

If your index started at the least significant digit, you could do something like
p = pow(10,index);
x = (x / (p*10) * (p*10) + value * p + x % p).
But since your index is backwards, a string is probably the way to go. It would also be more readable and maintainable.

Calculate the "mask" M: 10 raised to the power of index, where index is a zero-based index from the right. If you need to index from the left, recalculate index accordingly.
Calculate the "prefix" PRE = x / (M * 10) * (M * 10)
Calculate the "suffix" SUF = x % M
Calculate the new "middle part" MID = value * M
Generate the new number new_x = PRE + MID + POST.
P.S. ruslik's answer does it more elegantly :)

You need to start by figuring out how many digits are in your input. I can think of two ways of doing that, one with a loop and one with logarithms. Here's the loop version. This will fail for negative and zero inputs and when the index is out of bounds, probably other conditions too, but it's a starting point.
def f(x, index, value):
place = 1
residual = x
while residual > 0:
if index < 0:
place *= 10
index -= 1
residual /= 10
digit = (x / place) % 10
return x - (place * digit) + (place * value)
P.S. This is working Python code. The principle of something simple like this is easy to work out, but the details are so tricky that you really need to iterate it a bit. In this case I started with the principle that I wanted to subtract out the old digit and add the new one; from there it was a matter of getting the correct multiplier.

You gotta get specific with your compute platform if you're talking about performance.
I would approach this by converting the number into pairs of decimal digits, 4 bit each.
Then I would find and process the pair that needs modification as a byte.
Then I would put the number back together.
There are assemblers that do this very well.

Related

Generating random number in the range 0-N [duplicate]

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand() in C++.
So rand() is a pseudo-random number generator which chooses a natural number between 0 and RAND_MAX, which is a constant defined in cstdlib (see this article for a general overview on rand()).
Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!
When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11
When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11
When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11
This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue but for a larger range this could skew the distribution, biasing the smaller numbers.
So when does rand()%n return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1. In this case, along with our earlier assumption rand() does return a number between 0 and RAND_MAX with equal probability, the modulo classes of n would also be equally distributed.
So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:
int x;
do {
x = rand();
} while (x >= n);
but that's inefficient for low values of n, since you only have a n/RAND_MAX chance of getting a value in your range, and so you'll need to perform RAND_MAX/n calls to rand() on average.
A more efficient formula approach would be to take some large range with a length divisible by n, like RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in the range, and then take the modulus:
int x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
For small values of n, this will rarely require more than one call to rand().
Works cited and further reading:
CPlusPlus Reference
Eternally Confuzzled
Keep selecting a random is a good way to remove the bias.
Update
We could make the code fast if we search for an x in range divisible by n.
// Assumptions
// rand() in [0, RAND_MAX]
// n in (0, RAND_MAX]
int x;
// Keep searching for an x in a range divisible by n
do {
x = rand();
} while (x >= RAND_MAX - (RAND_MAX % n))
x %= n;
The above loop should be very fast, say 1 iteration on average.
#user1413793 is correct about the problem. I'm not going to discuss that further, except to make one point: yes, for small values of n and large values of RAND_MAX, the modulo bias can be very small. But using a bias-inducing pattern means that you must consider the bias every time you calculate a random number and choose different patterns for different cases. And if you make the wrong choice, the bugs it introduces are subtle and almost impossible to unit test. Compared to just using the proper tool (such as arc4random_uniform), that's extra work, not less work. Doing more work and getting a worse solution is terrible engineering, especially when doing it right every time is easy on most platforms.
Unfortunately, the implementations of the solution are all incorrect or less efficient than they should be. (Each solution has various comments explaining the problems, but none of the solutions have been fixed to address them.) This is likely to confuse the casual answer-seeker, so I'm providing a known-good implementation here.
Again, the best solution is just to use arc4random_uniform on platforms that provide it, or a similar ranged solution for your platform (such as Random.nextInt on Java). It will do the right thing at no code cost to you. This is almost always the correct call to make.
If you don't have arc4random_uniform, then you can use the power of opensource to see exactly how it is implemented on top of a wider-range RNG (ar4random in this case, but a similar approach could also work on top of other RNGs).
Here is the OpenBSD implementation:
/*
* Calculate a uniformly distributed random number less than upper_bound
* avoiding "modulo bias".
*
* Uniformity is achieved by generating new random numbers until the one
* returned is outside the range [0, 2**32 % upper_bound). This
* guarantees the selected random number will be inside
* [2**32 % upper_bound, 2**32) which maps back to [0, upper_bound)
* after reduction modulo upper_bound.
*/
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
u_int32_t r, min;
if (upper_bound < 2)
return 0;
/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return r % upper_bound;
}
It is worth noting the latest commit comment on this code for those who need to implement similar things:
Change arc4random_uniform() to calculate 2**32 % upper_bound as
-upper_bound % upper_bound. Simplifies the code and makes it the
same on both ILP32 and LP64 architectures, and also slightly faster on
LP64 architectures by using a 32-bit remainder instead of a 64-bit
remainder.
Pointed out by Jorden Verwer on tech#
ok deraadt; no objections from djm or otto
The Java implementation is also easily findable (see previous link):
public int nextInt(int n) {
if (n <= 0)
throw new IllegalArgumentException("n must be positive");
if ((n & -n) == n) // i.e., n is a power of 2
return (int)((n * (long)next(31)) >> 31);
int bits, val;
do {
bits = next(31);
val = bits % n;
} while (bits - val + (n-1) < 0);
return val;
}
Definition
Modulo Bias is the inherent bias in using modulo arithmetic to reduce an output set to a subset of the input set. In general, a bias exists whenever the mapping between the input and output set is not equally distributed, as in the case of using modulo arithmetic when the size of the output set is not a divisor of the size of the input set.
This bias is particularly hard to avoid in computing, where numbers are represented as strings of bits: 0s and 1s. Finding truly random sources of randomness is also extremely difficult, but is beyond the scope of this discussion. For the remainder of this answer, assume that there exists an unlimited source of truly random bits.
Problem Example
Let's consider simulating a die roll (0 to 5) using these random bits. There are 6 possibilities, so we need enough bits to represent the number 6, which is 3 bits. Unfortunately, 3 random bits yields 8 possible outcomes:
000 = 0, 001 = 1, 010 = 2, 011 = 3
100 = 4, 101 = 5, 110 = 6, 111 = 7
We can reduce the size of the outcome set to exactly 6 by taking the value modulo 6, however this presents the modulo bias problem: 110 yields a 0, and 111 yields a 1. This die is loaded.
Potential Solutions
Approach 0:
Rather than rely on random bits, in theory one could hire a small army to roll dice all day and record the results in a database, and then use each result only once. This is about as practical as it sounds, and more than likely would not yield truly random results anyway (pun intended).
Approach 1:
Instead of using the modulus, a naive but mathematically correct solution is to discard results that yield 110 and 111 and simply try again with 3 new bits. Unfortunately, this means that there is a 25% chance on each roll that a re-roll will be required, including each of the re-rolls themselves. This is clearly impractical for all but the most trivial of uses.
Approach 2:
Use more bits: instead of 3 bits, use 4. This yield 16 possible outcomes. Of course, re-rolling anytime the result is greater than 5 makes things worse (10/16 = 62.5%) so that alone won't help.
Notice that 2 * 6 = 12 < 16, so we can safely take any outcome less than 12 and reduce that modulo 6 to evenly distribute the outcomes. The other 4 outcomes must be discarded, and then re-rolled as in the previous approach.
Sounds good at first, but let's check the math:
4 discarded results / 16 possibilities = 25%
In this case, 1 extra bit didn't help at all!
That result is unfortunate, but let's try again with 5 bits:
32 % 6 = 2 discarded results; and
2 discarded results / 32 possibilities = 6.25%
A definite improvement, but not good enough in many practical cases. The good news is, adding more bits will never increase the chances of needing to discard and re-roll. This holds not just for dice, but in all cases.
As demonstrated however, adding an 1 extra bit may not change anything. In fact if we increase our roll to 6 bits, the probability remains 6.25%.
This begs 2 additional questions:
If we add enough bits, is there a guarantee that the probability of a discard will diminish?
How many bits are enough in the general case?
General Solution
Thankfully the answer to the first question is yes. The problem with 6 is that 2^x mod 6 flips between 2 and 4 which coincidentally are a multiple of 2 from each other, so that for an even x > 1,
[2^x mod 6] / 2^x == [2^(x+1) mod 6] / 2^(x+1)
Thus 6 is an exception rather than the rule. It is possible to find larger moduli that yield consecutive powers of 2 in the same way, but eventually this must wrap around, and the probability of a discard will be reduced.
Without offering further proof, in general using double the number
of bits required will provide a smaller, usually insignificant,
chance of a discard.
Proof of Concept
Here is an example program that uses OpenSSL's libcrypo to supply random bytes. When compiling, be sure to link to the library with -lcrypto which most everyone should have available.
#include <iostream>
#include <assert.h>
#include <limits>
#include <openssl/rand.h>
volatile uint32_t dummy;
uint64_t discardCount;
uint32_t uniformRandomUint32(uint32_t upperBound)
{
assert(RAND_status() == 1);
uint64_t discard = (std::numeric_limits<uint64_t>::max() - upperBound) % upperBound;
RAND_bytes((uint8_t*)(&randomPool), sizeof(randomPool));
while(randomPool > (std::numeric_limits<uint64_t>::max() - discard)) {
RAND_bytes((uint8_t*)(&randomPool), sizeof(randomPool));
++discardCount;
}
return randomPool % upperBound;
}
int main() {
discardCount = 0;
const uint32_t MODULUS = (1ul << 31)-1;
const uint32_t ROLLS = 10000000;
for(uint32_t i = 0; i < ROLLS; ++i) {
dummy = uniformRandomUint32(MODULUS);
}
std::cout << "Discard count = " << discardCount << std::endl;
}
I encourage playing with the MODULUS and ROLLS values to see how many re-rolls actually happen under most conditions. A sceptical person may also wish to save the computed values to file and verify the distribution appears normal.
Mark's Solution (The accepted solution) is Nearly Perfect.
int x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
edited Mar 25 '16 at 23:16
Mark Amery 39k21170211
However, it has a caveat which discards 1 valid set of outcomes in any scenario where RAND_MAX (RM) is 1 less than a multiple of N (Where N = the Number of possible valid outcomes).
ie, When the 'count of values discarded' (D) is equal to N, then they are actually a valid set (V), not an invalid set (I).
What causes this is at some point Mark loses sight of the difference between N and Rand_Max.
N is a set who's valid members are comprised only of Positive Integers, as it contains a count of responses that would be valid. (eg: Set N = {1, 2, 3, ... n } )
Rand_max However is a set which ( as defined for our purposes ) includes any number of non-negative integers.
In it's most generic form, what is defined here as Rand Max is the Set of all valid outcomes, which could theoretically include negative numbers or non-numeric values.
Therefore Rand_Max is better defined as the set of "Possible Responses".
However N operates against the count of the values within the set of valid responses, so even as defined in our specific case, Rand_Max will be a value one less than the total number it contains.
Using Mark's Solution, Values are Discarded when: X => RM - RM % N
EG:
Ran Max Value (RM) = 255
Valid Outcome (N) = 4
When X => 252, Discarded values for X are: 252, 253, 254, 255
So, if Random Value Selected (X) = {252, 253, 254, 255}
Number of discarded Values (I) = RM % N + 1 == N
IE:
I = RM % N + 1
I = 255 % 4 + 1
I = 3 + 1
I = 4
X => ( RM - RM % N )
255 => (255 - 255 % 4)
255 => (255 - 3)
255 => (252)
Discard Returns $True
As you can see in the example above, when the value of X (the random number we get from the initial function) is 252, 253, 254, or 255 we would discard it even though these four values comprise a valid set of returned values.
IE: When the count of the values Discarded (I) = N (The number of valid outcomes) then a Valid set of return values will be discarded by the original function.
If we describe the difference between the values N and RM as D, ie:
D = (RM - N)
Then as the value of D becomes smaller, the Percentage of unneeded re-rolls due to this method increases at each natural multiplicative. (When RAND_MAX is NOT equal to a Prime Number this is of valid concern)
EG:
RM=255 , N=2 Then: D = 253, Lost percentage = 0.78125%
RM=255 , N=4 Then: D = 251, Lost percentage = 1.5625%
RM=255 , N=8 Then: D = 247, Lost percentage = 3.125%
RM=255 , N=16 Then: D = 239, Lost percentage = 6.25%
RM=255 , N=32 Then: D = 223, Lost percentage = 12.5%
RM=255 , N=64 Then: D = 191, Lost percentage = 25%
RM=255 , N= 128 Then D = 127, Lost percentage = 50%
Since the percentage of Rerolls needed increases the closer N comes to RM, this can be of valid concern at many different values depending on the constraints of the system running he code and the values being looked for.
To negate this we can make a simple amendment As shown here:
int x;
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) );
x %= n;
This provides a more general version of the formula which accounts for the additional peculiarities of using modulus to define your max values.
Examples of using a small value for RAND_MAX which is a multiplicative of N.
Mark'original Version:
RAND_MAX = 3, n = 2, Values in RAND_MAX = 0,1,2,3, Valid Sets = 0,1 and 2,3.
When X >= (RAND_MAX - ( RAND_MAX % n ) )
When X >= 2 the value will be discarded, even though the set is valid.
Generalized Version 1:
RAND_MAX = 3, n = 2, Values in RAND_MAX = 0,1,2,3, Valid Sets = 0,1 and 2,3.
When X > (RAND_MAX - ( ( RAND_MAX % n ) + 1 ) % n )
When X > 3 the value would be discarded, but this is not a vlue in the set RAND_MAX so there will be no discard.
Additionally, in the case where N should be the number of values in RAND_MAX; in this case, you could set N = RAND_MAX +1, unless RAND_MAX = INT_MAX.
Loop-wise you could just use N = 1, and any value of X will be accepted, however, and put an IF statement in for your final multiplier. But perhaps you have code that may have a valid reason to return a 1 when the function is called with n = 1...
So it may be better to use 0, which would normally provide a Div 0 Error, when you wish to have n = RAND_MAX+1
Generalized Version 2:
int x;
if n != 0 {
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) );
x %= n;
} else {
x = rand();
}
Both of these solutions resolve the issue with needlessly discarded valid results which will occur when RM+1 is a product of n.
The second version also covers the edge case scenario when you need n to equal the total possible set of values contained in RAND_MAX.
The modified approach in both is the same and allows for a more general solution to the need of providing valid random numbers and minimizing discarded values.
To reiterate:
The Basic General Solution which extends mark's example:
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x;
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) ) );
x %= n;
The Extended General Solution which Allows one additional scenario of RAND_MAX+1 = n:
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x;
if n != 0 {
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) ) );
x %= n;
} else {
x = rand();
}
In some languages ( particularly interpreted languages ) doing the calculations of the compare-operation outside of the while condition may lead to faster results as this is a one-time calculation no matter how many re-tries are required. YMMV!
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x; // Resulting random number
int y; // One-time calculation of the compare value for x
y = RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n)
if n != 0 {
do {
x = rand();
} while (x > y);
x %= n;
} else {
x = rand();
}
There are two usual complaints with the use of modulo.
one is valid for all generators. It is easier to see in a limit case. If your generator has a RAND_MAX which is 2 (that isn't compliant with the C standard) and you want only 0 or 1 as value, using modulo will generate 0 twice as often (when the generator generates 0 and 2) as it will generate 1 (when the generator generates 1). Note that this is true as soon as you don't drop values, whatever the mapping you are using from the generator values to the wanted one, one will occurs twice as often as the other.
some kind of generator have their less significant bits less random than the other, at least for some of their parameters, but sadly those parameter have other interesting characteristic (such has being able to have RAND_MAX one less than a power of 2). The problem is well known and for a long time library implementation probably avoid the problem (for instance the sample rand() implementation in the C standard use this kind of generator, but drop the 16 less significant bits), but some like to complain about that and you may have bad luck
Using something like
int alea(int n){
assert (0 < n && n <= RAND_MAX);
int partSize =
n == RAND_MAX ? 1 : 1 + (RAND_MAX-n)/(n+1);
int maxUsefull = partSize * n + (partSize-1);
int draw;
do {
draw = rand();
} while (draw > maxUsefull);
return draw/partSize;
}
to generate a random number between 0 and n will avoid both problems (and it avoids overflow with RAND_MAX == INT_MAX)
BTW, C++11 introduced standard ways to the the reduction and other generator than rand().
With a RAND_MAX value of 3 (in reality it should be much higher than that but the bias would still exist) it makes sense from these calculations that there is a bias:
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
random_between(1, 3) % 2 = more likely a 1
In this case, the % 2 is what you shouldn't do when you want a random number between 0 and 1. You could get a random number between 0 and 2 by doing % 3 though, because in this case: RAND_MAX is a multiple of 3.
Another method
There is much simpler but to add to other answers, here is my solution to get a random number between 0 and n - 1, so n different possibilities, without bias.
the number of bits (not bytes) needed to encode the number of possibilities is the number of bits of random data you'll need
encode the number from random bits
if this number is >= n, restart (no modulo).
Really random data is not easy to obtain, so why use more bits than needed.
Below is an example in Smalltalk, using a cache of bits from a pseudo-random number generator. I'm no security expert so use at your own risk.
next: n
| bitSize r from to |
n < 0 ifTrue: [^0 - (self next: 0 - n)].
n = 0 ifTrue: [^nil].
n = 1 ifTrue: [^0].
cache isNil ifTrue: [cache := OrderedCollection new].
cache size < (self randmax highBit) ifTrue: [
Security.DSSRandom default next asByteArray do: [ :byte |
(1 to: 8) do: [ :i | cache add: (byte bitAt: i)]
]
].
r := 0.
bitSize := n highBit.
to := cache size.
from := to - bitSize + 1.
(from to: to) do: [ :i |
r := r bitAt: i - from + 1 put: (cache at: i)
].
cache removeFrom: from to: to.
r >= n ifTrue: [^self next: n].
^r
Modulo reduction is a commonly seen way to make a random integer generator avoid the worst case of running forever.
When the range of possible integers is unknown, however, there is no way in general to "fix" this worst case of running forever without introducing bias. It's not just modulo reduction (rand() % n, discussed in the accepted answer) that will introduce bias this way, but also the "multiply-and-shift" reduction of Daniel Lemire, or if you stop rejecting an outcome after a set number of iterations. (To be clear, this doesn't mean there is no way to fix the bias issues present in pseudorandom generators. For example, even though modulo and other reductions are biased in general, they will have no issues with bias if the range of possible integers is a power of 2 and if the random generator produces unbiased random bits or blocks of them.)
The following answer of mine discusses the relationship between running time and bias in random generators, assuming we have a "true" random generator that can produce unbiased and independent random bits. The answer doesn't even involve the rand() function in C because it has many issues. Perhaps the most serious here is the fact that the C standard does not explicitly specify a particular distribution for the numbers returned by rand(), not even a uniform distribution.
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
As the accepted answer indicates, "modulo bias" has its roots in the low value of RAND_MAX. He uses an extremely small value of RAND_MAX (10) to show that if RAND_MAX were 10, then you tried to generate a number between 0 and 2 using %, the following outcomes would result:
rand() % 3 // if RAND_MAX were only 10, gives
output of rand() | rand()%3
0 | 0
1 | 1
2 | 2
3 | 0
4 | 1
5 | 2
6 | 0
7 | 1
8 | 2
9 | 0
So there are 4 outputs of 0's (4/10 chance) and only 3 outputs of 1 and 2 (3/10 chances each).
So it's biased. The lower numbers have a better chance of coming out.
But that only shows up so obviously when RAND_MAX is small. Or more specifically, when the number your are modding by is large compared to RAND_MAX.
A much better solution than looping (which is insanely inefficient and shouldn't even be suggested) is to use a PRNG with a much larger output range. The Mersenne Twister algorithm has a maximum output of 4,294,967,295. As such doing MersenneTwister::genrand_int32() % 10 for all intents and purposes, will be equally distributed and the modulo bias effect will all but disappear.
I just wrote a code for Von Neumann's Unbiased Coin Flip Method, that should theoretically eliminate any bias in the random number generation process. More info can be found at (http://en.wikipedia.org/wiki/Fair_coin)
int unbiased_random_bit() {
int x1, x2, prev;
prev = 2;
x1 = rand() % 2;
x2 = rand() % 2;
for (;; x1 = rand() % 2, x2 = rand() % 2)
{
if (x1 ^ x2) // 01 -> 1, or 10 -> 0.
{
return x2;
}
else if (x1 & x2)
{
if (!prev) // 0011
return 1;
else
prev = 1; // 1111 -> continue, bias unresolved
}
else
{
if (prev == 1)// 1100
return 0;
else // 0000 -> continue, bias unresolved
prev = 0;
}
}
}

Remove the inferior digits of a number

Given a number n of x digits. How to remove y digits in a way the remaining digits results in the greater possible number?
Examples:
1)x=7 y=3
n=7816295
-8-6-95
=8695
2)x=4 y=2
n=4213
4--3
=43
3)x=3 y=1
n=888
=88
Just to state: x > y > 0.
For each digit to remove: iterate through the digits left to right; if you find a digit that's less than the one to its right, remove it and stop, otherwise remove the last digit.
If the number of digits x is greater than the actual length of the number, it means there are leading zeros. Since those will be the first to go, you can simply reduce the count y by a corresponding amount.
Here's a working version in Python:
def remove_digits(n, x, y):
s = str(n)
if len(s) > x:
raise ValueError
elif len(s) < x:
y -= x - len(s)
if y <= 0:
return n
for r in range(y):
for i in range(len(s)):
if s[i] < s[i+1:i+2]:
break
s = s[:i] + s[i+1:]
return int(s)
>>> remove_digits(7816295, 7, 3)
8695
>>> remove_digits(4213, 4, 2)
43
>>> remove_digits(888, 3, 1)
88
I hesitated to submit this, because it seems too simple. But I wasn't able to think of a case where it wouldn't work.
if x = y we have to remove all the digits.
Otherwise, you need to find maximum digit in first y + 1 digits. Then remove all the y0 elements before this maximum digit. Then you need to add that maximum to the answer and then repeat that task again, but you need now to remove y - y0 elements now.
Straight forward implementation will work in O(x^2) time in the worst case.
But finding maximum in the given range can be done effectively using Segment Tree data structure. Time complexity will be O(x * log(x)) in the worst case.
P. S. I just realized, that it possible to solve in O(x) also, using the fact, that exists only 10 digits (but the algorithm maybe a little bit complicated). We need to find the minimum in the given range [L, R], but the ranges in this task will "change" from left to the right (L and R always increase). And we just need to store 10 pointers to the digits (1 per digit) to the first position in the number such that position >= L. Then to find the minimum, we need to check only 10 pointers. To update the pointers, we will try to move them right.
So the time complexity will be O(10 * x) = O(x)
Here's an O(x) solution. It builds an index that maps (i, d) to j, the smallest number > i such that the j'th digit of n is d. With this index, one can easily find the largest possible next digit in the solution in O(1) time.
def index(digits):
next = [len(digits)+1] * 10
for i in xrange(len(digits), 0, -1):
next[ord(digits[i-1])-ord('0')] = i-1
yield next[::-1]
def minseq(n, y):
n = str(n)
idx = list(index(n))[::-1]
i, r = 0, []
for ry in xrange(len(n)-y):
i = next(j for j in idx[i] if j <= y+ry) + 1
r.append(n[i - 1])
return ''.join(r)
print minseq(7816295, 3)
print minseq(4213, 2)
Pseudocode:
Number.toDigits().filter (sortedSet (Number.toDigits()). take (y))
Imho you don't need to know x.
For efficiency, Number.toDigits () could be precalculated
digits = Number.toDigits()
digits.filter (sortedSet (digits).take (y))
Depending on language and context, you either output the digits and are done or have to convert the result into a number again.
Working Scala-Code for example:
def toDigits (l: Long) : List [Long] = if (l < 10) l :: Nil else (toDigits (l /10)) :+ (l % 10)
val num = 734529L
val dig = toDigits (num)
dig.filter (_ > ((dig.sorted).take(2).last))
A sorted set is a set which is sorted, which means, every element is only contained once and then the resulting collection is sorted by some criteria, for example numerical ascending. => 234579.
We take two of them (23) and from that subset the last (3) and filter the number by the criteria, that the digits have to be greater than that value (3).
Your question does not explicitly say, that each digit is only contained once in the original number, but since you didn't give a criterion, which one to remove in doubt, I took it as an implicit assumption.
Other languages may of course have other expressions (x.sorted, x.toSortedSet, new SortedSet (num), ...) or lack certain classes, functions, which you would have to build on your own.
You might need to write your own filter method, which takes a pedicate P, and a collection C, and returns a new collection of all elements which satisfy P, P being a Method which takes one T and returns a Boolean. Very useful stuff.

Finding the index of a given permutation

I'm reading the numbers 0, 1, ..., (N - 1) one by one in some order. My goal is to find the lexicography index of this given permutation, using only O(1) space.
This question was asked before, but all the algorithms I could find used O(N) space. I'm starting to think that it's not possible. But it would really help me a lot with reducing the number of allocations.
Considering the following data:
chars = [a, b, c, d]
perm = [c, d, a, b]
ids = get_indexes(perm, chars) = [2, 3, 0, 1]
A possible solution for permutation with repetitions goes as follows:
len = length(perm) (len = 4)
num_chars = length(chars) (len = 4)
base = num_chars ^ len (base = 4 ^ 4 = 256)
base = base / len (base = 256 / 4 = 64)
id = base * ids[0] (id = 64 * 2 = 128)
base = base / len (base = 64 / 4 = 16)
id = id + (base * ids[1]) (id = 128 + (16 * 3) = 176)
base = base / len (base = 16 / 4 = 4)
id = id + (base * ids[2]) (id = 176 + (4 * 0) = 176)
base = base / len (base = 4 / 4 = 1)
id = id + (base * ids[3]) (id = 176 + (1 * 1) = 177)
Reverse process:
id = 177
(id / (4 ^ 3)) % 4 = (177 / 64) % 4 = 2 % 4 = 2 -> chars[2] -> c
(id / (4 ^ 2)) % 4 = (177 / 16) % 4 = 11 % 4 = 3 -> chars[3] -> d
(id / (4 ^ 1)) % 4 = (177 / 4) % 4 = 44 % 4 = 0 -> chars[0] -> a
(id / (4 ^ 0)) % 4 = (177 / 1) % 4 = 177 % 4 = 1 -> chars[1] -> b
The number of possible permutations is given by num_chars ^ num_perm_digits, having num_chars as the number of possible characters, and num_perm_digits as the number of digits in a permutation.
This requires O(1) in space, considering the initial list as a constant cost; and it requires O(N) in time, considering N as the number of digits your permutation will have.
Based on the steps above, you can do:
function identify_permutation(perm, chars) {
for (i = 0; i < length(perm); i++) {
ids[i] = get_index(perm[i], chars);
}
len = length(perm);
num_chars = length(chars);
index = 0;
base = num_chars ^ len - 1;
base = base / len;
for (i = 0; i < length(perm); i++) {
index += base * ids[i];
base = base / len;
}
}
It's a pseudocode, but it's also quite easy to convert to any language (:
If you are looking for a way to obtain the lexicographic index or rank of a unique combination instead of a permutation, then your problem falls under the binomial coefficient. The binomial coefficient handles problems of choosing unique combinations in groups of K with a total of N items.
I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the set.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it is also faster than older iterative solutions.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
The following tested code will iterate through each unique combinations:
public void Test10Choose5()
{
String S;
int Loop;
int N = 10; // Total number of elements in the set.
int K = 5; // Total number of elements in each group.
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
// The Kindexes array specifies the indexes for a lexigraphic element.
int[] KIndexes = new int[K];
StringBuilder SB = new StringBuilder();
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination.
BC.GetKIndexes(Combo, KIndexes);
// Verify that the Kindexes returned can be used to retrive the
// rank or lexigraphic order of the KIndexes in the table.
int Val = BC.GetIndex(true, KIndexes);
if (Val != Combo)
{
S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
Console.WriteLine(S);
}
SB.Remove(0, SB.Length);
for (Loop = 0; Loop < K; Loop++)
{
SB.Append(KIndexes[Loop].ToString());
if (Loop < K - 1)
SB.Append(" ");
}
S = "KIndexes = " + SB.ToString();
Console.WriteLine(S);
}
}
You should be able to port this class over fairly easily to the language of your choice. You probably will not have to port over the generic part of the class to accomplish your goals. Depending on the number of combinations you are working with, you might need to use a bigger word size than 4 byte ints.
There is a java solution to this problem on geekviewpoint. It has a good explanation for why it's true and the code is easy to follow. http://www.geekviewpoint.com/java/numbers/permutation_index. It also has a unit test that runs the code with different inputs.
There are N! permutations. To represent index you need at least N bits.
Here is a way to do it if you want to assume that arithmetic operations are constant time:
def permutationIndex(numbers):
n=len(numbers)
result=0
j=0
while j<n:
# Determine factor, which is the number of possible permutations of
# the remaining digits.
i=1
factor=1
while i<n-j:
factor*=i
i+=1
i=0
# Determine index, which is how many previous digits there were at
# the current position.
index=numbers[j]
while i<j:
# Only the digits that weren't used so far are valid choices, so
# the index gets reduced if the number at the current position
# is greater than one of the previous digits.
if numbers[i]<numbers[j]:
index-=1
i+=1
# Update the result.
result+=index*factor
j+=1
return result
I've purposefully written out certain calculations that could be done more simply using some Python built-in operations, but I wanted to make it more obvious that no extra non-constant amount of space was being used.
As maxim1000 noted, the number of bits required to represent the result will grow quickly as n increases, so eventually big integers will be required, which no longer have constant-time arithmetic, but I think this code addresses the spirit of your question.
Nothing really new in the idea but a fully matricial method with no explicit loop or recursion (using Numpy but easy to adapt):
import numpy as np
import math
vfact = np.vectorize(math.factorial, otypes='O')
def perm_index(p):
return np.dot( vfact(range(len(p)-1, -1, -1)),
p-np.sum(np.triu(p>np.vstack(p)), axis=0) )
I just wrote a code using Visual Basic and my program can directly calculate every index or every corresponding permutation to a given index up to 17 elements (this limit is due to the approximation of the scientific notation of numbers over 17! of my compiler).
If you are interested I can I can send the program or publish it somewhere for download.
It works fine and It can be useful for testing and paragon the output of your codes.
I used the method of James D. McCaffrey called factoradic and you can read about it here and something also here (in the discussion at the end of the page).

Puzzled over palindromic product problem

I've been learning Ruby, so I thought I'd try my hand at some of the project Euler puzzles. Embarrassingly, I only made it to problem 4...
Problem 4 goes as follows:
A palindromic number reads the same
both ways. The largest palindrome made
from the product of two 2-digit
numbers is 9009 = 91 × 99.
Find the largest palindrome made from
the product of two 3-digit numbers.
So I figured I would loop down from 999 to 100 in a nested for loop and do a test for the palindrome and then break out of the loops when I found the first one (which should be the largest one):
final=nil
range = 100...1000
for a in range.to_a.reverse do
for b in range.to_a.reverse do
c=a*b
final=c if c.to_s == c.to_s.reverse
break if !final.nil?
end
break if !final.nil?
end
puts final
This does output a palindrome 580085, but apparently this isn't the highest product of two three-digit numbers within the range. Strangely, the same code succeeds to return 9009, like in the example, if I change the range to 10...100.
Can someone tell me where I am going
wrong?
Also, is there a nicer way to
break out of the internal loop?
Thanks
You are testing 999* (999...100), then 998 * (999...100)
Hence you will be testing 999 * 500 before you test 997 * 996.
So, how you we find that right number?
First note the multiplication is reflective, a * b == b * a, so b need not go from 999...0 every time, just a ...0.
When you find a palindrone, add the two factors together and save the sum (save the two factors also)
Inside the loop, if (a+b) is ever less than the saved sum, abandon the inner loop and move to the next a. When a falls below sum/2, no future value you could find would be higher than the one you've already found, so you're done.
The problem is that you might find a palindrome for an a of 999 and a b of 200, but you break too soon, so you never see that there is one for 998*997 (just example numbers).
You need to either look for all palindromes or once you find the first one, set that b as your minimum bound and continue looking through the a loop.
Regarding the second question, my advice is to approach the problem in more functional, than procedural manner. So, rather than looping, you may try to "describe" your problem functionally, and let Ruby does the work:
From all the pairs of 3-digit numbers,
select only those whose product is a palindrome,
and find the one with the largest product
Although this approach may not yield the most efficient of the solutions, it may teach you couple of Ruby idioms.
Consider the digits of P – let them be x, y and z. P must be at least 6 digits long since the palindrome 111111 = 143×777 – the product of two 3-digit integers. Since P is palindromic:
P=100000x + 10000y + 1000z + 100z + 10y + x
P=100001x + 10010y + 1100z
P=11(9091x + 910y + 100z)
Since 11 is prime, at least one of the integers a or b must have a factor of 11. So if a is not divisible by 11 then we know b must be. Using this information we can determine what values of b we check depending on a.
C# Implementation :
using System;
namespace HighestPalindrome
{
class Program
{
static void Main(string[] args)
{
int i, j;
int m = 1;
bool flag = false;
while (true)
{
if (flag) j = m + 1;
else j = m;
for (i = m; i > 0; i--)
{
Console.WriteLine("{0} * {1} = {2}", 1000 - i, 1000 - j, (1000 - i) * (1000 - j));
j++;
//--- Palindrome Check ------------------------------
int number, temp, remainder, sum = 0;
number = temp = (1000 - i) * (1000 - j);
while (number > 0)
{
remainder = number % 10;
number /= 10;
sum = sum * 10 + remainder;
}
if (sum == temp)
{
Console.WriteLine("Highest Palindrome Number is - {0} * {1} = {2}", 1000 - i, 1000 - j, temp);
Console.ReadKey();
return;
}
//---------------------------------------------------
}
if (flag)
m++;
flag = !flag;
}
}
}
}
The mistake is you assume that if you find palindrom with greatest a value it will give the greatest product it isn't true. Solution is to keep max_product value and update it against solution you find.
I can answer your first question: You need to find the highest product, not the product containing the highest factor. In other words a * b could be greater than c * d even if c > a > b.
You're breaking on the first palindrome you come to, not necessarily the biggest.
Say you have A,B,C,D,E. You test E * A before you test D * C.
The main thing is to go through all the possible values. Don't try to break when you find the first answer just start with a best answer of zero then try all combinations and keep updating best. The secondary thing is to try to reduce the set of "all combinations".
One thing you can do is limit your inner loop to values less than or equal to a (since ab == ba). This puts the larger value of your equation always in a and substantially reduces the number of values you have to test.
for a in range.to_a.reverse do
for b in (100..a).to_a.reverse do
The next thing you can do is break out of the inner loop whenever the product is less than the current best value.
c = a*b
next if c < best
Next, if you're going to go through them all anyway there's no benefit to going through them in reverse. By starting at the top of the range it takes a while before you find a palindromic number and as a result it takes a while to reduce your search set. If you start at the bottom you begin to increase the lower bound quickly.
for a in range.to_a do
for b in (100..a).to_a do
My tests show that either way you try some 405K pairs however. So how about thinking of the problem a different way. What is the largest possible product of two 3 digit numbers? 999 * 999 = 998001 and the smallest is 100*100 = 10000. How about we take the idea you had of breaking on the first answer but apply it to a different range, that being 998001 to 10000 (or 999*999 to 100*100).
for c in (10000...998001).to_a.reverse do
We get to a palindrome after only 202 tests... the problem is it isn't a product of two 3-digit numbers. So now we have to check whether the palindrome we've found is a product of 2 3-digit numbers. As soon as we find a value in the range that is a palindrome and a product of two 3-digit numbers we're done. My tests show we find the highest palindrome that meets the requirement after less than 93K tests. But since we have the overhead of checking that all palindromes to that point were products of two 3-digit numbers it may not be more efficient than the previous solution.
So lets go back to the original improvement.
for a in range.to_a.reverse do
for b in (100..a).to_a.reverse do
We're looping rows then columns and trying to be efficient by detecting a point where we can go to the next row because any additional trys on the current row could not possibly be better than our current best. What if, instead of going down the rows, we go across the diagonals?
Since the products get smaller diagonal by diagonal you can stop as soon as you find a palindome number. This is a really efficient solution but with a more complex implementation. It turns out this method finds the highest palindrome after slightly more than 2200 trys.
ar=[]
limit = 100..999
for a in limit.to_a.reverse do
for b in (100..a).to_a.reverse do
c=a*b
if c.to_s == c.to_s.reverse
palndrm=c
ar << palndrm
end
end
end
print ar
print"\n"
puts ar.max
puts ar.min
an implementation:
max = 100.upto(999).inject([-1,0,0]) do |m, a|
a.upto(999) do |b|
prod = a * b
m = [prod, a, b] if prod.to_s == prod.to_s.reverse and prod > m[0]
end
m
end
puts "%d = %d * %d" % max
prints 906609 = 913 * 993
Here's what I came up with in Ruby:
def largest_palindrome_product(digits)
largest, upper, lower = 0, 10**digits - 1, 10**(digits - 1)
for i in upper.downto(lower) do
for j in i.downto(lower) do
product = i * j
largest = product if product > largest && palindrome?(product)
end
end
largest
end
And here's the function to check if the number is a palindrome:
def palindrome?(input)
chars = input.to_s.chars
for i in 0..(chars.size - 1) do
return false if chars[i] != chars[chars.size - i - 1]
end
true
end
I guess there's probably a more efficient solution out there, though.
For this problem, as we are looking for the highest palindrom, i assumed it would start with a 9. Thus ending with a 9 (palindrom).
if you pay attention, to get a number finishing by 9, you can only get it with numbers finishing by 9 and 1, 3 and 3, 7 and 7.
Then it is useless to check the other values (for instance 999*998 as it will not end with a 9).
Starting from 999 and 991, you can then substract 10 to 991, trying 999 and 981 etc...
You do the same with 993 and 993 ... 993 * 983
same with 997 * 997 then 997 * 987 etc
You don't need to go further than 900 or 10^4 - 10^3 as you can be sure the highest will be before.
int PB4_firstTry(int size)
{
int nb1 = (int)pow(10.0,size+1.0) - 1, nb2 = (int)pow(10.0,size+1.0) - 1;
int pal91 = getFirstPalindrome(size,9,1);
int pal33 = getFirstPalindrome(size,3,3);
int pal77 = getFirstPalindrome(size,7,7);
int bigger1 = (pal91 > pal33) ? pal91 : pal33;
return (bigger1 > pal77) ? bigger1 : pal77;
}
int getFirstPalindrome(int size,int ending1,int ending2)
{
int st1 = (int)pow(10.0,size+1.0) - 10 + ending1;
int comp = st1 - pow(10.0,size);
int st2 = (int)pow(10.0,size+1.0) - 10 + ending2;
int answer = -1;
while (st1 > comp)
{
for (int i = st2; i > comp && st1*i > answer; i-=10)
{
if (PB4_isPalindrome(st1*i))
answer = st1*i;
}
st1 -= 10;
}
return answer;
}
bool PB4_isPalindrome(int number)
{
std::string str = intToString(number);
for (int i = 0; i < (int)(str.length() / 2); i++)
{
if (str[i] != str[str.length() - 1 - i])
return false;
}
return true;
}
std::string intToString(int number)
{
std::ostringstream convert;
convert << number;
return convert.str();
}
Of course, this works for 4 size digits factors etc.

Better random "feeling" integer generator for short sequences

I'm trying to figure out a way to create random numbers that "feel" random over short sequences. This is for a quiz game, where there are four possible choices, and the software needs to pick one of the four spots in which to put the correct answer before filling in the other three with distractors.
Obviously, arc4random % 4 will create more than sufficiently random results over a long sequence, but in a short sequence its entirely possible (and a frequent occurrence!) to have five or six of the same number come back in a row. This is what I'm aiming to avoid.
I also don't want to simply say "never pick the same square twice," because that results in only three possible answers for every question but the first. Currently I'm doing something like this:
bool acceptable = NO;
do {
currentAnswer = arc4random() % 4;
if (currentAnswer == lastAnswer) {
if (arc4random() % 4 == 0) {
acceptable = YES;
}
} else {
acceptable = YES;
}
} while (!acceptable);
Is there a better solution to this that I'm overlooking?
If your question was how to compute currentAnswer using your example's probabilities non-iteratively, Guffa has your answer.
If the question is how to avoid random-clustering without violating equiprobability and you know the upper bound of the length of the list, then consider the following algorithm which is kind of like un-sorting:
from random import randrange
# randrange(a, b) yields a <= N < b
def decluster():
for i in range(seq_len):
j = (i + 1) % seq_len
if seq[i] == seq[j]:
i_swap = randrange(i, seq_len) # is best lower bound 0, i, j?
if seq[j] != seq[i_swap]:
print 'swap', j, i_swap, (seq[j], seq[i_swap])
seq[j], seq[i_swap] = seq[i_swap], seq[j]
seq_len = 20
seq = [randrange(1, 5) for _ in range(seq_len)]; print seq
decluster(); print seq
decluster(); print seq
where any relation to actual working Python code is purely coincidental. I'm pretty sure the prior-probabilities are maintained, and it does seem break clusters (and occasionally adds some). But I'm pretty sleepy so this is for amusement purposes only.
You populate an array of outcomes, then shuffle it, then assign them in that order.
So for just 8 questions:
answer_slots = [0,0,1,1,2,2,3,3]
shuffle(answer_slots)
print answer_slots
[1,3,2,1,0,2,3,0]
To reduce the probability for a repeated number by 25%, you can pick a random number between 0 and 3.75, and then rotate it so that the 0.75 ends up at the previous answer.
To avoid using floating point values, you can multiply the factors by four:
Pseudo code (where / is an integer division):
currentAnswer = ((random(0..14) + lastAnswer * 4) % 16) / 4
Set up a weighted array. Lets say the last value was a 2. Make an array like this:
array = [0,0,0,0,1,1,1,1,2,3,3,3,3];
Then pick a number in the array.
newValue = array[arc4random() % 13];
Now switch to using math instead of an array.
newValue = ( ( ( arc4random() % 13 ) / 4 ) + 1 + oldValue ) % 4;
For P possibilities and a weight 0<W<=1 use:
newValue = ( ( ( arc4random() % (P/W-P(1-W)) ) * W ) + 1 + oldValue ) % P;
For P=4 and W=1/4, (P/W-P(1-W)) = 13. This says the last value will be 1/4 as likely as other values.
If you completely eliminate the most recent answer it will be just as noticeable as the most recent answer showing up too often. I do not know what weight will feel right to you, but 1/4 is a good starting point.

Resources