Let us call a number "steady" if sum of digits on odd positions is equal to sum of digits on even positions. For example 132 or 4059. Given a number N, program should output smallest/first "steady" number greater than N. For example if N = 4, answer = 11, if N = 123123, answer = 123134.
But the constraint is that N can be very large. Number of digits in N can be 100. And time limit is 1 second.
My approach was to read N as a string, store each digit in an int array, add 1 using long arithmetic, then test whether the number is steady. If yes, output it; if no, add 1 again and test again. Repeat until you get the answer.
It works on many tests, but when the difference between oddSum and EvenSum is very large like in 9090909090 program exceeds time limit. I could not come up with other algorithm. Intuitively I think there might be some pattern in swapping several last digits with each other and if necessary add or subtract something to them, but I don't know. I prefer a good HINT instead of answer, because I want to do it myself.
Use the algorithm that you would use. It goes like this:
Input: 9090909090
Input: 9090909090 Odd:0 Even:45
Input: 909090909? Odd:0 Even:45
Clearly no digit will work, we can make the odd at most 9
Input: 90909090?? Odd:0 Even:36
Clearly no digit will work, we removed a 9 and there is no larger digit (we have to make the number larger)
Input: 9090909??? Odd:0 Even:36
Clearly no digit will work. Even is bigger than odd, we can only raise odd to 18
Input: 909090???? Odd:0 Even:27
Clearly no digit will work, we removed a 9
Input: 90909????? Odd:0 Even:27
Perhaps a 9 will work.
Input: 909099???? Odd:9 Even:27
Zero is the smallest number that might work
Input: 9090990??? Odd:9 Even:27
We need 18 more and only have two digits, so 9 is the smallest number that can work
Input: 90909909?? Odd:18 Even:27
Zero is the smallest number that can work.
Input: 909099090? Odd:18 Even:27
9 is the only number that can work
Input: 9090990909 Odd:27 Even:27
Success
Do you see the method? Remove digits while a solution is impossible, then add them back until you have the solution. At first, remove digits until a solution is possible. Only a digit larger than the one you removed can be used in its place. Then add digits back, using the smallest one possible at each stage, until you have the solution.
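The walkthrough above can be sketched in Python roughly as follows. This is only one possible reading of the method: the function names (`feasible`, `complete`, `next_steady`) are made up for illustration, positions are counted from the left starting at 0, and the fallback to a number with one more digit is an assumption added to handle inputs such as 4.

```python
def signed(pos, d):
    # Digits at even indices count positively, digits at odd indices negatively.
    return d if pos % 2 == 0 else -d

def feasible(diff, start, length):
    # Can positions start..length-1 still bring the signed sum back to zero?
    plus = (length + 1) // 2 - (start + 1) // 2   # free even indices
    minus = length // 2 - start // 2              # free odd indices
    return -9 * plus <= diff <= 9 * minus

def complete(prefix, length):
    # Extend prefix to `length` digits with the smallest steady suffix, or None.
    diff = sum(signed(i, d) for i, d in enumerate(prefix))
    if not feasible(diff, len(prefix), length):
        return None
    out = list(prefix)
    for pos in range(len(prefix), length):
        for d in range(10):
            if feasible(diff + signed(pos, d), pos + 1, length):
                out.append(d)
                diff += signed(pos, d)
                break
    return out

def next_steady(n_str):
    digits = [int(c) for c in n_str]
    n = len(digits)
    # Remove digits from the right until some larger digit makes a solution possible.
    for i in range(n - 1, -1, -1):
        for d in range(digits[i] + 1, 10):
            result = complete(digits[:i] + [d], n)
            if result is not None:
                return "".join(map(str, result))
    # No steady number with n digits is larger: use the smallest one with n + 1 digits.
    return "".join(map(str, complete([1], n + 1)))

print(next_steady("4"))           # 11
print(next_steady("123123"))      # 123134
print(next_steady("9090909090"))  # 9090990909
```

For 100 digits this tries at most about 900 prefixes, each completed in a single left-to-right pass, so it stays far below the one-second limit.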
You can try Digit DP technique .
Your parameter can be recur(pos,oddsum,evensum,str)
your state transitions will be like this :
bool ans = 0;
for(int i=0;i<10;i++)
{
    // add the digit to oddsum on odd positions, to evensum on even positions
    ans |= recur(pos+1, oddsum+(pos%2?i:0), evensum+(pos%2?0:i), str+char(i+'0'));
    if(ans) return 1;
}
Base case :
if(pos>=n) return oddsum==evensum;
Memoization: You only need to save pos,oddsum,evensum in your DP array. So your DP array will be DP[100][100*10][100*10]. This is 10^8 entries and will cause MLE (memory limit exceeded), so you have to prune some state.
As oddsum+evensum<9*100 , we can have only one parameter SUM and add / subtract when odd/even . So our new recursion will look like this : recur(pos,sum,str)
state transitions will be like this :
bool ans = 0;
for(int i=0;i<10;i++)
{
    // add the digit on odd positions, subtract it on even positions
    ans |= recur(pos+1, SUM+(pos%2?i:-i), str+char(i+'0'));
    if(ans) return 1;
}
Base case :
if(pos>=n) return SUM==0;
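Memoization: now our DP array will be 2D, indexed by [pos][sum]. We can use DP[100][10*100]. Since SUM can be negative (between -450 and 450 for 100 digits), index it with an offset such as SUM+500.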
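A minimal Python sketch of the two-parameter recursion, assuming `functools.lru_cache` stands in for the DP[pos][sum] table (so no offset is needed for negative sums). The function name `smallest_steady_with_prefix` and the idea of passing an already fixed prefix are illustrative additions, not part of the answer above.

```python
from functools import lru_cache

def smallest_steady_with_prefix(prefix, length):
    """Smallest steady digit string of `length` digits that starts with `prefix`, or None."""
    def step(pos, diff, d):
        # Same transition as SUM + (pos % 2 ? i : -i) in the recursion above.
        return diff + (d if pos % 2 else -d)

    start = 0
    for pos, c in enumerate(prefix):
        start = step(pos, start, int(c))

    @lru_cache(maxsize=None)
    def recur(pos, diff):
        # True if positions pos..length-1 can be filled so that the final diff is 0.
        if pos >= length:
            return diff == 0
        return any(recur(pos + 1, step(pos, diff, d)) for d in range(10))

    if not recur(len(prefix), start):
        return None
    out, diff = prefix, start
    for pos in range(len(prefix), length):
        for d in range(10):          # the smallest digit that keeps a solution reachable
            if recur(pos + 1, step(pos, diff, d)):
                out += str(d)
                diff = step(pos, diff, d)
                break
    return out

print(smallest_steady_with_prefix("909099", 10))  # 9090990909
```

The memo only ever holds (pos, diff) pairs, which is the 2D table described above; finding the next steady number greater than N still requires choosing which prefix of N to keep and which digit to raise.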
Find the parity with the smaller sum. Starting from the smallest digit of that parity, increase digits of that parity to the min of 9 and the remaining increase needed.
This gets you a larger steady number, but it may be too big.
E.g., 107 gets us 187, but 110 would do.
Next, repeatedly decrement the value of the nonzero digit in the largest position of each parity in our steady number where doing so doesn't reduce us below our target.
187,176,165,154,143,132,121,110
This last step as written is linear in the number of decrements. That's fast enough since there are at most 9*digits of them, but it can be optimized.
Interview question: you're given a file of roughly one billion unique numbers, each of which is a 32-bit quantity. Find a number not in the file.
When I was approaching this question, I tried a few examples with 3-bit and 4-bit numbers. For the examples I tried, I found that when I XOR'd the set of numbers, I got a correct answer:
import functools

a = [0,1,2] # missing 3
b = [1,2,3] # missing 0
c = [0,1,2,3,4,5,6] # missing 7
d = [0,1,2,3,5,6,7] # missing 4
functools.reduce((lambda x, y: x^y), a) # returns 3
functools.reduce((lambda x, y: x^y), b) # returns 0
functools.reduce((lambda x, y: x^y), c) # returns 7
functools.reduce((lambda x, y: x^y), d) # returns 4
However, when I coded this up and submitted it, it failed the test cases.
My question is: in an interview setting, how can I confirm or rule out with certainty that an approach like this is not a viable solution?
In all your examples, the array is missing exactly one number. That's why XOR worked. Try not to test with the same property.
For the problem itself, you can construct a number by taking the minority of each bit.
EDIT
Why XOR worked on your examples:
When you take the XOR for all the numbers from 0 to 2^n - 1 the result is 0 (there are exactly 2^(n-1) '1' in each bit). So if you take out one number and take XOR of all the rest, the result is the number you took out because taking XOR of that number with the result of all the rest needs to be 0.
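A quick way to check this explanation is to test both the special case and a case where more than one number is missing. The helper name `xor_all` below is only illustrative.

```python
from functools import reduce
from operator import xor

def xor_all(values):
    # XOR of every value; 0 for an empty input.
    return reduce(xor, values, 0)

n = 4
full = list(range(2 ** n))

# The complete range 0 .. 2^n - 1 XORs to zero (for n >= 2) ...
print(xor_all(full))  # 0

# ... so removing exactly one number leaves that number as the XOR of the rest.
print(all(xor_all(v for v in full if v != m) == m for m in full))  # True

# With several numbers missing, the XOR can be a number that IS in the input.
print(xor_all([3, 4, 5, 6]))  # 4
```

Trying inputs that do not share the "exactly one missing" property, as in the last line, is what exposes the flaw.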
Assuming a 64-bit system with more than 4gb free memory, I would read the numbers into an array of 32-bit integers. Then I would loop through the numbers up to 32 times.
Similarly to an inverse "Mastermind" game, I would construct a missing number bit by bit. In every loop, I count all numbers that match the bits I have chosen so far followed by a 0, and all that match them followed by a 1. Then I append the bit that occurs less frequently. Once the count reaches zero, I have a missing number.
Example:
The numbers in decimal/binary are
1 = 01
2 = 10
3 = 11
There is one number with most-significant-bit 0 and two numbers with 1. Therefore, I take 0 as most significant bit.
In the next round, I have to match 00 and 01. This immediately leads to 00 as missing number.
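The bit-by-bit construction could be sketched in Python as below. Two assumptions are mine: the name `find_missing` is hypothetical, and instead of re-counting over the whole array in every round the sketch keeps only the numbers that still match the chosen prefix, which yields the same counts.

```python
def find_missing(numbers, bits=32):
    """Return a `bits`-bit value that is not in `numbers`.

    Assumes fewer than 2**bits values are given.
    """
    candidates = list(numbers)
    result = 0
    for bit in range(bits - 1, -1, -1):      # most significant bit first
        ones = [x for x in candidates if (x >> bit) & 1]
        zeros = [x for x in candidates if not (x >> bit) & 1]
        if len(zeros) <= len(ones):
            candidates = zeros               # fewer numbers continue with a 0 bit
        else:
            result |= 1 << bit               # fewer numbers continue with a 1 bit
            candidates = ones
    return result

print(find_missing([1, 2, 3], bits=2))  # 0
```

Each round at least halves the number of matching values, so after all the bits are fixed nothing in the input can match the constructed number.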
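Another approach would be to use a random number generator. With roughly one billion of the 2^32 possible values present, chances are about 77% that your first guess is a non-existing number; you still have to scan the file once to confirm it.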
Proof by counterexample: 3^4^5^6=4.
I want to run tests with randomized inputs and need to generate 'sensible' random
numbers, that is, numbers that match good enough to pass the tested function's
preconditions, but hopefully wreak havoc deeper inside its code.
math.random() (I'm using Lua) produces uniformly distributed random
numbers. Scaling these up will give far more big numbers than small numbers,
and there will be very few integers.
I would like to skew the random numbers (or generate new ones using the old
function as a randomness source) in a way that strongly favors 'simple' numbers,
but will still cover the whole range, i.e., extending up to positive/negative infinity
(or ±1e309 for double). This means:
numbers up to, say, ten should be most common,
integers should be more common than fractions,
numbers ending in 0.5 should be the most common fractions,
followed by 0.25 and 0.75; then 0.125,
and so on.
A different description: Fix a base probability x such that probabilities
will sum to one and define the probability of a number n as x^k
where k is the generation in which n is constructed as a surreal
number [1]. That assigns x to 0, x^2 to -1 and +1,
x^3 to -2, -1/2, +1/2 and +2, and so on. This
gives a nice description of something close to what I want (it skews a bit too
much), but is near-unusable for computing random numbers. The resulting
distribution is nowhere continuous (it's fractal!), I'm not sure how to
determine the base probability x (I think for infinite precision it would be
zero), and computing numbers based on this by iteration is awfully
slow (spending near-infinite time to construct large numbers).
Does anyone know of a simple approximation that, given a uniformly distributed
randomness source, produces random numbers very roughly distributed as
described above?
I would like to run thousands of randomized tests, quantity/speed is more
important than quality. Still, better numbers mean fewer inputs get rejected.
Lua has a JIT, so performance is usually not much of an issue. However, jumps based
on randomness will break every prediction, and many calls to math.random()
will be slow, too. This means a closed formula will be better than an
iterative or recursive one.
[1] Wikipedia has an article on surreal numbers, with
a nice picture. A surreal number is a pair of two surreal
numbers, i.e. x := {n|m}, and its value is the number in the middle of the
pair, i.e. (for finite numbers) {n|m} = (n+m)/2 (as rational). If one side
of the pair is empty, that's interpreted as increment (or decrement, if right
is empty) by one. If both sides are empty, that's zero. Initially, there are
no numbers, so the only number one can build is 0 := { | }. In generation
two one can build numbers {0| } =: 1 and { |0} =: -1, in three we get
{1| } =: 2, {|1} =: -2, {0|1} =: 1/2 and {-1|0} =: -1/2 (plus some
more complex representations of known numbers, e.g. {-1|1} ≡ 0). Note that
e.g. 1/3 is never generated by finite numbers because it is an infinite
fraction – the same goes for floats, 1/3 is never represented exactly.
How's this for an algorithm?
Generate a random float in (0, 1) with a library function
Generate a random integral roundoff point according to a desired probability density function (e.g. 0 with probability 0.5, 1 with probability 0.25, 2 with probability 0.125, ...).
'Round' the float by that roundoff point (e.g. floor((float_val << roundoff)+0.5))
Generate a random integral exponent according to another PDF (e.g. 0, 1, 2, 3 with probability 0.1 each, and decreasing thereafter)
Multiply the rounded float by 2^exponent.
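A rough Python sketch of these steps follows (the question uses Lua; Python is used here only for illustration). Several details are assumptions of mine: the helper `geometric`, the name `simple_random`, dividing by 2^roundoff after rounding so the result keeps its scale, and a plain geometric distribution for the exponent rather than the exact example PDF. Signs are not handled.

```python
import math
import random

def geometric(rng, p):
    # Returns k with probability p * (1 - p)**k, i.e. 0 most often.
    k = 0
    while rng.random() >= p:
        k += 1
    return k

def simple_random(rng):
    x = rng.random()                                  # step 1: uniform float in [0, 1)
    roundoff = geometric(rng, 0.5)                    # step 2: 0 w.p. 0.5, 1 w.p. 0.25, ...
    scale = 2 ** roundoff
    rounded = math.floor(x * scale + 0.5) / scale     # step 3: keep `roundoff` binary places
    exponent = geometric(rng, 0.1)                    # step 4: assumed exponent distribution
    return rounded * 2 ** exponent                    # step 5: scale up

rng = random.Random(0)
print([simple_random(rng) for _ in range(5)])
```

Because the roundoff is 0 half of the time, at least half of the results are integers, and fractions ending in .5 are more common than those ending in .25 or .75. The loops in `geometric` are exactly the kind of random branching the question wants to avoid, so a closed-form replacement for them would be the next thing to look for.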
For a surreal-like decimal expansion, you need a random binary number.
Even bits tell you whether to stop or continue, odd bits tell you whether to go right or left on the tree:
> 0... => 0.0 [50%] Stop
> 100... => -0.5 [<12.5%] Go, Left, Stop
> 110... => 0.5 [<12.5%] Go, Right, Stop
> 11100... => 0.25 [<3.125%] Go, Right, Go, Left, Stop
> 11110... => 0.75 [<3.125%] Go, Right, Go, Right, Stop
> 1110100... => 0.125
> 1110110... => 0.375
> 1111100... => 0.625
> 1111110... => 0.875
One way to quickly generate a random binary number is by looking at the decimal digits in math.random() and replacing 0-4 with '0' and 5-9 with '1':
0.8430419054348022
becomes
1000001010001000
which becomes -0.5
0.5513009827118367
becomes
1100001101001011
which becomes 0.5
etc
Haven't done much lua programming, but in Javascript you can do:
Math.random().toString().substring(2).split("").map(
function(digit) { return digit >= "5" ? 1 : 0 }
);
or true binary expansion:
Math.random().toString(2).substring(2)
Not sure which is more genuinely "random" -- you'll need to test it.
You could generate surreal numbers in this way, but most of the results will be decimals in the form a/2^b, with relatively few integers. On Day 3, only 2 integers are produced (-3 and 3) vs. 6 decimals, on Day 4 it is 2 vs. 14, and on Day n it is 2 vs (2^n-2).
If you add two uniform random numbers from math.random(), you get a new distribution which has a "triangle" like distribution (linearly decreasing from the center). Adding 3 or more will get a more 'bell curve' like distribution centered around 0:
math.random() + math.random() + math.random() - 1.5
Dividing by a random number will get a truly wild number:
A/(math.random()+1e-300)
This will return a result between A and (theoretically) A*1e+300,
though my tests show that 50% of the time the results are between A and 2*A
and about 75% of the time between A and 4*A.
Putting them together, we get:
round(6*(math.random()+math.random()+math.random() - 1.5)/(math.random()+1e-300))
This has over 70% of the number returned between -9 and 9 with a few big numbers popping up rarely.
Note that the average and sum of this distribution will tend to diverge towards a large negative or positive number, because the more times you run it, the more likely it is for a small number in the denominator to cause the number to "blow up" to a large number such as 147,967 or -194,137.
See gist for sample code.
Josh
You can immediately calculate the nth born surreal number.
Example, the 1000th Surreal number is:
convert to binary:
1000 dec = 1111101000 bin
1's become pluses and 0's minuses:
1111101000
+++++-+---
The first '1' bit is 0 value, the next set of similar numbers is +1 (for 1's) or -1 (for 0's), then the value is 1/2, 1/4, 1/8, etc for each subsequent bit.
1 1 1 1 1 0 1 0 0 0
+ + + + + - + - - -
0 1 1 1 1 h h h h h
+0+1+1+1+1-1/2+1/4-1/8-1/16-1/32
= 3+17/32
= 113/32
= 3.53125
The binary length in bits of this representation is equal to the day on which that number was born.
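The decoding rule above can be written out in Python with exact fractions. The function name `nth_surreal` is illustrative, and the code simply follows the description: the leading 1 bit contributes 0, the first run of equal bits contributes ±1 each, and every later bit contributes ±1/2, ±1/4, and so on.

```python
from fractions import Fraction

def nth_surreal(n):
    bits = bin(n)[2:]            # e.g. 1000 -> '1111101000'
    rest = bits[1:]              # the leading '1' has value 0
    value = Fraction(0)
    if not rest:
        return value
    first = rest[0]
    sign = 1 if first == "1" else -1
    i = 0
    while i < len(rest) and rest[i] == first:
        value += sign            # integer part: one unit per bit of the first run
        i += 1
    step = Fraction(1, 2)
    for b in rest[i:]:
        value += step if b == "1" else -step   # fractional part: halve the step each bit
        step /= 2
    return value

print(nth_surreal(1000))                        # 113/32
print([str(nth_surreal(k)) for k in range(1, 8)])
# ['0', '-1', '1', '-2', '-1/2', '1/2', '2']
```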
Left and right numbers of a surreal number are the binary representation with its tail stripped back to the last 0 or 1 respectively.
Surreal numbers have an even distribution between -1 and 1 where half of the numbers created to a particular day will exist. 1/4 of the numbers exists evenly distributed between -2 to -1 and 1 to 2 and so on. The max range will be negative to positive integers matching the number of days you provide. The numbers go to infinity slowly because each day only adds one to the negative and positive ranges and days contain twice as many numbers as the last.
Edit:
A good name for this bit representation is "sinary"
Negative numbers are transpositions. ex:
100010101001101s -> negative number (always start 10...)
111101010110010s -> positive number (always start 01...)
and we notice that all bits flip except the first one, which is a transposition.
NaN is => 0s (since all other numbers start with 1), which makes it ideal for representation in bit registers in a computer since leading zeros are required (we don't make ternary computers anymore... too bad)
All Conway surreal algebra can be done on these number without needing to convert to binary or decimal.
The sinary format can be seen as a one plus a simple one's counter with a 2's complement decimal representation attached.
Here is an incomplete report on finary (similar to sinary): https://github.com/peawormsworth/tools/blob/master/finary/Fine%20binary.ipynb
I am trying to decipher some assembly code that involves multiple left rotations on an 8-bit binary number.
For reference, the code is:
lab: rol dl,1
rol dl,1
dec ecx
jnz lab
The dec and jnz aren't an issue, but are there to show that the 2 rols are executed several times.
What I am trying to do is figure out a mathematical equivalent of this code, such as a formula. I'm certainly not looking for a complete formula to tell me the whole code, but I would like to know if there is a formula that gives the equivalent (in denary) of a single left rotation.
I've tried figuring this out with a couple of different numbers, but cannot see a link between the two results. For example: if the start number is 115 it comes out as 220, but if the start number is 99 it comes out as 216.
Given your sample results, I assume we are treating the 8-bit quantity as unsigned.
The 7 low-order bits are shifted left, multiplying that part of the number by 2; and the high-order bit is swapped around to the beginning.
Thus, (x % 128) * 2 + (x / 128), using the usual integer div/mod operators.
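As a sanity check, the formula can be applied repeatedly in Python. The function names are illustrative, and the iteration count of 6 (three passes through the loop) is an inference from the sample values in the question, not something the question states.

```python
def rol8(x):
    # One left rotation of an unsigned 8-bit value, using only div and mod.
    return (x % 128) * 2 + x // 128

def rol8_times(x, count):
    for _ in range(count):
        x = rol8(x)
    return x

print(rol8(115))            # 230
print(rol8_times(115, 6))   # 220
print(rol8_times(99, 6))    # 216
```

Six single rotations reproduce both results quoted in the question, and eight rotations return the original value.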
Shifting a byte containing number X by one bit (position) left is equal to multiplying the number X by 2:
x << 1 <==> x = x * 2
You have a biased random number generator that produces a 1 with a probability p and 0 with a probability (1-p). You do not know the value of p. Using this make an unbiased random number generator which produces 1 with a probability 0.5 and 0 with a probability 0.5.
Note: this problem is an exercise problem from Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein.(clrs)
The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.
In code this is done as easily as:
int UnbiasedRandom()
{
    int x, y;
    do
    {
        x = BiasedRandom();
        y = BiasedRandom();
    } while (x == y);
    return x;
}
The procedure to produce an unbiased coin from a biased one was first attributed to Von Neumann (a guy who has done enormous work in math and many related fields). The procedure is super simple:
Toss the coin twice.
If the results match, start over, forgetting both results.
If the results differ, use the first result, forgetting the second.
The reason this algorithm works is because the probability of getting HT is p(1-p), which is the same as getting TH (1-p)p. Thus two events are equally likely.
I am also reading this book and it asks the expected running time. The probability that two tosses are not equal is z = 2*p*(1-p), therefore the expected running time is 1/z.
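The 1/z figure counts pairs of tosses, since each round succeeds independently with probability z. A small Python sketch comparing the formula with a simulation (function names and the seed are arbitrary choices for illustration):

```python
import random

def expected_pairs(p):
    # Each round of two tosses succeeds with probability z = 2p(1-p).
    return 1.0 / (2.0 * p * (1.0 - p))

def simulated_pairs(p, trials, rng):
    # Average number of pairs drawn until the two tosses differ.
    total = 0
    for _ in range(trials):
        while True:
            total += 1
            x = rng.random() < p
            y = rng.random() < p
            if x != y:
                break
    return total / trials

rng = random.Random(1)
for p in (0.5, 0.3, 0.9):
    print(p, expected_pairs(p), simulated_pairs(p, 20000, rng))
```

For a fair coin the expectation is 2 pairs (4 tosses); for p = 0.99 it is about 50.5 pairs, roughly 101 individual tosses.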
The previous example looks encouraging (after all, if you have a biased coin with a bias of p=0.99, you will need approximately 50 pairs of tosses, i.e. about 100 throws, which is not that many). So you might think that this is an optimal algorithm. Sadly it is not.
Here is how it compares with the Shannon's theoretical bound (image is taken from this answer). It shows that the algorithm is good, but far from optimal.
You can come up with an improvement if you consider that HHTT will be discarded by this algorithm, although it has the same probability as TTHH. So you can also stop here and return H. The same goes for HHHHTTTT and so on. Using these cases improves the expected running time, but does not make it theoretically optimal.
And in the end - python code:
import random

def biased(p):
    # create a biased coin: returns 1 with probability p
    return 1 if random.random() < p else 0

def unbiased_from_biased(p):
    # toss pairs until the two results differ, then keep the first
    n1, n2 = biased(p), biased(p)
    while n1 == n2:
        n1, n2 = biased(p), biased(p)
    return n1

p = random.random()
print(p)

tosses = [unbiased_from_biased(p) for i in range(1000)]
n_1 = sum(tosses)
n_2 = len(tosses) - n_1
print(n_1, n_2)
It is pretty self-explanatory, and here is an example result:
0.0973181652114
505 495
As you see, even though we had a bias of 0.097, we got approximately the same number of 1s and 0s.
The trick attributed to von Neumann of getting two bits at a time, having 01 correspond to 0 and 10 to 1, and repeating for 00 or 11 has already come up. The expected number of bits you need to extract to get a single bit using this method is 1/(p(1-p)), which can get quite large if p is especially small or large, so it is worthwhile to ask whether the method can be improved, especially since it is evident that it throws away a lot of information (all 00 and 11 cases).
Googling for "von neumann trick biased" produced this paper that develops a better solution for the problem. The idea is that you still take bits two at a time, but if the first two attempts produce only 00s and 11s, you treat a pair of 0s as a single 0 and a pair of 1s as a single 1, and apply von Neumann's trick to these pairs. And if that doesn't work either, keep combining similarly at this level of pairs, and so on.
Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and giving a sketch that this is optimal in the sense that it produces exactly the number of bits that the original sequence had entropy in it.
You need to draw pairs of values from the RNG until you get a sequence of different values, i.e. zero followed by one or one followed by zero. You then take the first value (or last, doesn't matter) of that sequence. (i.e. Repeat as long as the pair drawn is either two zeros or two ones)
The math behind this is simple: a 0 then 1 sequence has the very same probability as a 1 then zero sequence. By always taking the first (or the last) element of this sequence as the output of your new RNG, we get an even chance to get a zero or a one.
Besides the von Neumann procedure given in other answers, there is a whole family of techniques, called randomness extraction (also known as debiasing, deskewing, or whitening), that serve to produce unbiased random bits from random numbers of unknown bias. They include Peres's (1992) iterated von Neumann procedure, as well as an "extractor tree" by Zhou and Bruck (2012). Both methods (and several others) are asymptotically optimal, that is, their efficiency (in terms of output bits per input) approaches the optimal limit as the number of inputs gets large (Pae 2018).
For example, the Peres extractor takes a list of bits (zeros and ones with the same bias) as input and is described as follows:
Create two empty lists named U and V. Then, while two or more bits remain in the input:
If the next two bits are 0/0, append 0 to U and 0 to V.
Otherwise, if those bits are 0/1, append 1 to U, then write a 0.
Otherwise, if those bits are 1/0, append 1 to U, then write a 1.
Otherwise, if those bits are 1/1, append 0 to U and 1 to V.
Run this algorithm recursively, reading from the bits placed in U.
Run this algorithm recursively, reading from the bits placed in V.
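The steps above translate fairly directly into a short recursive Python function. This is a sketch of the description as written here, not a reference implementation from the cited papers; the name `peres` and the choice to return the output as a list are assumptions for illustration.

```python
def peres(bits):
    """Extract unbiased bits from a list of independent, identically biased 0/1 values."""
    if len(bits) < 2:
        return []
    out, u, v = [], [], []
    for i in range(0, len(bits) - 1, 2):   # a leftover single bit is ignored
        a, b = bits[i], bits[i + 1]
        if a == b:
            u.append(0)      # 0/0 or 1/1: nothing written, record which pair it was in V
            v.append(a)
        else:
            u.append(1)      # 0/1 writes 0, 1/0 writes 1 (the plain von Neumann output)
            out.append(a)
    return out + peres(u) + peres(v)

print(peres([0, 1, 1, 0]))  # [0, 1]
print(peres([0, 0, 1, 1]))  # [0]
```

The second call shows the gain over the plain von Neumann procedure, which would discard both pairs and produce nothing from 0, 0, 1, 1.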
This is not to mention procedures that produce unbiased random bits from biased dice or other biased random numbers (not just biased bits); see, e.g., Camion (1974).
I discuss more on randomness extractors in a note on randomness extraction.
REFERENCES:
Peres, Y., "Iterating von Neumann's procedure for extracting random bits", Annals of Statistics 1992,20,1, p. 590-597.
Zhou, H. and Bruck, J., "Streaming algorithms for optimal generation of random bits", arXiv:1209.0730 [cs.IT], 2012.
S. Pae, "Binarization Trees and Random Number Generation", arXiv:1602.06058v2 [cs.DS].
Camion, Paul, "Unbiased die rolling with a biased die", North Carolina State University. Dept. of Statistics, 1974.
Here's one way, probably not the most efficient. Chew through a bunch of random numbers until you get a sequence of the form [0..., 1, 0..., 1] (where 0... is one or more 0s). Count the number of 0s. If the first sequence is longer, generate a 0, if the second sequence is longer, generate a 1. (If they're the same, try again.)
This is like what HotBits does to generate random numbers from radioactive particle decay:
Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again
HotBits: How It Works
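The run-length comparison described in this answer might look like this in Python. It is a sketch under a few assumptions: `biased` is any zero-argument function returning 0 or 1 with a fixed unknown bias, the names are hypothetical, and runs of length zero are allowed (the answer asks for one or more 0s, but the two run lengths are identically distributed either way, so the comparison stays symmetric).

```python
import random

def zeros_before_one(biased):
    # Count how many 0s are drawn before the next 1 appears.
    count = 0
    while biased() == 0:
        count += 1
    return count

def unbiased_bit(biased):
    while True:
        first = zeros_before_one(biased)
        second = zeros_before_one(biased)
        if first != second:                  # equal runs: discard and try again
            return 0 if first > second else 1

rng = random.Random(5)
coin = lambda: 1 if rng.random() < 0.2 else 0    # a coin that shows 1 only 20% of the time
sample = [unbiased_bit(coin) for _ in range(5000)]
print(sum(sample) / len(sample))
```

The printed mean should land close to 0.5, since either run is equally likely to be the longer one.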
I'm just explaining the already proposed solutions with a running demonstration. This solution will be unbiased no matter how many times we change the probability. In a heads-or-tails toss, the two outcomes "head then tail" and "tail then head" are always equally likely.
import random

def biased_toss(probability):
    # returns 1 with probability (1 - probability), otherwise 0
    if random.random() > probability:
        return 1
    else:
        return 0

def unbiased_toss(probability):
    # toss pairs until the two results differ, then keep the first
    x = biased_toss(probability)
    y = biased_toss(probability)
    while x == y:
        x = biased_toss(probability)
        y = biased_toss(probability)
    return x

# results will contain counts of heads '0' and tails '1'
results = {'0':0, '1':0}
for i in range(1000):
    # on every call we are changing the probability
    p = random.random()
    results[str(unbiased_toss(p))] += 1

# it still returns an unbiased result
print(results)