The usual method to generate a uniform random number in the range 0..n-1 using coin flips is to build an RNG for the smallest power of two that is at least n in the obvious way, then, whenever this algorithm generates a number larger than n-1, throw that number away and try again.
Unfortunately this has an infinite worst-case runtime: there is no bound on how many retries you might need.
Is there any way to solve this problem while guaranteeing termination?
Quote from this answer https://stackoverflow.com/a/137809/261217:
There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5.
Now ask Adam Rosenfield why it is true :) (The same argument applies here in base 2: for n not a power of two, 1/n is not a finite binary fraction, so no fixed number of coin flips can produce that probability exactly.)
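For reference, here is a minimal sketch of the rejection method described above, in Python (coin_flip is a stand-in for whatever fair-bit source you actually have):

    import random

    def coin_flip():
        # stand-in for a fair coin: returns 0 or 1
        return random.randrange(2)

    def uniform_below(n):
        # Uniform integer in 0..n-1 by rejection sampling.
        # k bits suffice since 2**k is the smallest power of two >= n;
        # the expected number of rounds is less than 2, but there is no
        # finite worst case -- which is exactly the problem discussed above.
        k = max(1, (n - 1).bit_length())
        while True:
            r = 0
            for _ in range(k):
                r = (r << 1) | coin_flip()
            if r < n:
                return r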
You are given an array of positive integers of size N. You can choose any positive number x such that x <= max(Array) and subtract it from every element of the array that is greater than or equal to x.
This operation costs A[i] - x for each A[i] >= x, so the total cost of a particular step is
sum(A[i] - x) over those elements. A step is only valid if this total is less than or equal to a given number K.
0 <= i < 10^5 (array indices, so N can be up to 10^5)
0 <= x <= 10^5
0 < K < 10^5
Can anybody help me with an approach? DP will not work due to the high constraints.
Just some general exploratory thoughts.
First, there should be a constraint on N. If N is 3, this is much easier than if it is 100. The naive brute-force approach is going to be O(k^N).
Next, you are right that DP will not work with these constraints.
For a greedy approach, I would want to minimize the number of distinct non-zero values rather than maximize how much I remove. The worst-case approach is to take out the largest value each time, which needs up to N steps. If a step makes two entries equal, that shortens the solution.
The obvious thing to try, if you can, is an A* search. However, that requires a LOWER bound on the remaining steps (not an upper bound). The best naive lower bound that I can see is ceil(log_2(count_distinct_values)), since one step can at best roughly halve the number of distinct non-zero values. Unless you're incredibly lucky and the problem can be solved that quickly, this is unlikely to narrow your search enough to be helpful.
I'm curious what trick makes this problem actually doable.
I do have an idea, but it is going to take some thought to make it work. Naively we want to branch on each choice of x and explore the resulting paths. That is a problem because there are 10^5 choices for x: after 2 choices we already have on the order of 10^10 states, and after 3 we are definitely not going to be able to do it.
BUT instead consider the possible weak orderings of the array elements (ties both possible and encouraged) and the resulting inequalities on the range of choices that could have produced them. Now, instead of having to store 10^5 choices of x, we only need to store the distinct orderings we can reach, together with the inequalities on the ranges of choices that get us there. As long as N < 10, the number of weak orderings is something we can deal with if we're clever.
It would take a bunch of work to flesh out this idea though.
I may be totally wrong, and if so please tell me and I will delete this, but maybe there is an opportunity if we translate the problem into another form?
You are given an array A of positive integers of size N.
Calculate the histogram H of this array. The highest populated slot of this histogram has index m (== max(A)).
Find the shortest sequence of selections of x, as follows (a greedy sketch of this loop appears right after the steps):
1. Select an index x <= m which satisfies sum(H[i] * (i - x)) <= K for i = x+1 .. m (the search for a suitable x starts from m and goes down).
2. Add H[x .. m] to H[0 .. m-x] and clear H[x .. m].
3. Set the new m to the highest remaining populated index. Note this is not always within H[0 .. x-1]: mass folded down from H[x .. m] can land at or above x when m-x >= x, so the new m can be as large as max(m-x, x-1).
4. Repeat until m == 0.
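To make the procedure concrete, here is a greedy Python sketch of the loop above: each step picks the smallest feasible x (the biggest reduction whose cost stays within K). The greedy choice is a heuristic, not provably optimal, and all names are mine:

    def min_steps_greedy(A, K):
        m = max(A)
        H = [0] * (m + 1)
        for a in A:
            H[a] += 1
        steps = 0
        while m > 0:
            # Find the smallest x with cost(x) = sum_{i>x} H[i]*(i-x) <= K.
            # Moving x down by one adds (count of elements >= x) to the cost.
            x, cost, cnt = m, 0, H[m]
            while x > 1 and cost + cnt <= K:
                cost += cnt
                x -= 1
                cnt += H[x]
            # Apply the step: fold H[x .. m] down onto H[0 .. m-x].
            for i in range(x, m + 1):
                if H[i]:
                    H[i - x] += H[i]
                    H[i] = 0
            # New maximum: highest populated slot, at most max(m-x, x-1).
            m = max(m - x, x - 1)
            while m > 0 and H[m] == 0:
                m -= 1
            steps += 1
        return steps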
If only a "good" rather than optimal solution is sought, I could imagine that some kind of spectral analysis of H could hint at favorable x selections, so that maxima in the histogram pile onto other maxima in the reduction step.
As part of my effort to explore algorithms through Project Euler, I'm trying to write a method that accepts an integer 'n' and a number of factors 'k' and factorizes n into k factors. If that's not possible, it throws an error.
For instance, if I enter factorize(13257440,3), the function will return a list of all possible unique sets with 3 elements where the product of the 3 elements is equal to 13257440.
My first thought is to generate the multiset of prime factors of n (with 'm' representing the size of the multiset), then partition the multiset into k partitions. Once the partition sizes are determined, I would treat it as a combinations problem.
I'm having trouble, however, formulating algorithms for the two parts above, and have no idea where to start. Am I overcomplicating a simple problem that has a simple solution? If not, what are some recommended approaches? Thanks!
Prime decomposition
Find all primes that divide n without remainder. Use a sieve of Eratosthenes to speed up the process considerably.
You can use/modify mine (warning: this link is a Project Euler spoiler):
get primes up to n in C++
Now you need to modify the code so the prime list becomes a list of factors with multiplicity. For example, for n = 12 it will find { 2, 3 }, but you need { 2, 2, 3 }: whenever a prime divisor is found, divide n by it and check that same prime again and again until it no longer divides, recording it and lessening n each time.
Add a "used" flag to each found prime to speed up the next step...
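In Python terms, the decomposition part might look like this (trial division is enough as a sketch; a sieve pays off when factoring many numbers):

    def prime_factors(n):
        # multiset of prime factors, e.g. 12 -> [2, 2, 3]
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:      # divide the prime out repeatedly,
                factors.append(d)  # recording it each time,
                n //= d            # lessening n as described above
            d += 1
        if n > 1:                  # the remainder is itself prime
            factors.append(n)
        return factors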
The combination part
I assume the factors can repeat and that 1 is allowed as a factor, so add k ones to the prime list at the start, and create a function that generates all possible numbers up to some x from the found unused primes. Also add a counter m of unused primes: at the start m is the prime-list size and all flags are set to unused.
Now you need to find all possibilities of using 1 .. m-k+1 numbers from the list. Each iteration, mark the picked numbers as used and decrease m, so it is something like:

    for (int o = 1; o <= m - k + 1; o++)

Here, find all combinations of o unused numbers. You can handle it like generating an o-digit number without repeated digits, but note that all o! orderings of the same digits give the same product, so generate combinations rather than permutations to avoid duplicates.
You can use this (warning: this link is a Project Euler spoiler):
permutations in C++
Do not forget to set the flag for each used number and unset it after the iteration is done. Rewrite the function so it is iterative, with calls findfirst(), findnext(), similar to my permutation class.
Now you can nest all this k times (using nested fors as in the permutation link, or via recursion, each time reducing k and n).
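Putting the pieces together, here is a compact recursive sketch in Python. It uses direct trial division over candidate factors rather than the prime-list bookkeeping described above, but it enumerates the same unique k-factorizations (the function name is mine):

    def factorizations(n, k, start=1):
        # all nondecreasing k-tuples of integers >= start with product n
        if k == 1:
            if n >= start:
                yield (n,)
            return
        d = start
        while d ** k <= n:          # d is the smallest factor, so d^k <= n
            if n % d == 0:
                for rest in factorizations(n // d, k - 1, d):
                    yield (d,) + rest
            d += 1

    # e.g. list(factorizations(12, 3)) ->
    # [(1, 1, 12), (1, 2, 6), (1, 3, 4), (2, 2, 3)]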
I want to know if there is an easy way to find the next prime after X.
For example, if X = 2, the next prime will be 3. The algorithm I have would be fine if I only wanted small numbers, but I want to handle values like X = 3 million.
I found an algorithm to calculate primes, but it takes a lot of time, since it calculates all primes from 0 to X... For 1 million, it takes almost 2 minutes.
Question is... how can I find the next prime number? Is there an efficient algorithm? The best solution I found is to check whether X+1 is prime and keep incrementing until a prime is found...
What you need is to test each number beginning at X for primality. You can find such tests implemented in the GMP library, or you can look at the snippet for the Miller-Rabin algorithm on Rosetta Code.
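For example, here is a self-contained Python sketch: a deterministic Miller-Rabin test (the witness set below is known to be exact well beyond 64-bit inputs) plus a next_prime that skips even candidates:

    def is_prime(n):
        if n < 2:
            return False
        witnesses = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
        for p in witnesses:
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:        # write n-1 = d * 2^s with d odd
            d //= 2
            s += 1
        for a in witnesses:
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = x * x % n
                if x == n - 1:
                    break
            else:
                return False     # a is a witness: n is composite
        return True

    def next_prime(x):
        # smallest prime strictly greater than x
        n = x + 1
        if n <= 2:
            return 2
        if n % 2 == 0:
            n += 1
        while not is_prime(n):
            n += 2               # only odd candidates
        return n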
One possible optimization: instead of always increasing the number by one, increment by two if the number is odd; if it is even, increment by one once, and in all future iterations increment by two (even numbers greater than 2 are never prime).
Like in the below code snippet:

    if (num is odd)
        check_prime_of(num + 2);
    else /* num is even and not equal to 2 */
        check_prime_of(num + 1);
I can double the speed of the "check if X+1 is prime" approach: for odd X > 2, X+1 is even and therefore never prime, so test X+2 instead :)
Other than that, no: there is no known shortcut; you test successive candidates, and fast primality tests make that practical. One caution: do not confuse this with factoring. Public-key cryptography (the basis of much of the security on the Internet) relies on the difficulty of recovering the two prime factors of a given large number, not on any difficulty of finding primes; indeed, finding large primes has to be easy, or generating keys would be infeasible in the first place.
I need a (fairly) fast way to get the following for my code.
Background: I have to work with powers of numbers and their product, so I decided to use logs.
Now I need a way to convert the log back to an integer.
I can't just take 2^log_val (I'm working with log base 2) because the answer will be too large. In fact, I need to give the answer mod M for a given M.
I tried this: I wrote log_val as p + q, where p is an integer and q is a float with q < 1.
Now I can calculate 2^p mod M very fast using exponentiation by squaring (O(log p) multiplications) together with the modulo, but I can't do anything with the 2^q part. What I thought of doing is finding the first integer x such that 2^(x+q) is very close to an integer, and then calculating 2^(p-x) times that integer.
This takes too long, because in the worst case it needs O(p) steps.
Is there a better way?
While working with large numbers as logs is usually a good approach, it won't work here. The issue is that working in log space throws away the least significant digits; you have lost information and won't be able to get back exactly. Working in mod space also throws away information (otherwise your number gets too big, as you say), but it throws away the most significant part instead.
For your particular problem (POWERMUL), what I would do is calculate the prime factorizations of the numbers from 1 to N. You have to be careful how you do it, since your N is fairly large.
Now, if your number k has the prime factorization {2: 3, 5: 2} (i.e. k = 2^3 * 5^2), then k^m has the factorization {2: 3m, 5: 2m}. Division similarly turns into subtraction of exponents.
Once you have the prime-factorization representation of f(N) / (f(r) * f(N-r)), you can recreate the integer with a combination of modular multiplication and modular exponentiation. The latter is a cool technique to look up. (In fact, languages like Python have it built in: pow(3, 16, 7) == 4.)
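A sketch of how that reconstruction could look in Python, assuming f is the factorial (so f(N)/(f(r)*f(N-r)) is the binomial coefficient) and using Legendre's formula for the exponent of each prime in N!:

    def binom_mod(N, r, M):
        def legendre(n, p):
            # exponent of prime p in n!: sum of floor(n / p^i)
            e = 0
            while n:
                n //= p
                e += n
            return e

        sieve = [True] * (N + 1)          # primes up to N
        result = 1
        for p in range(2, N + 1):
            if sieve[p]:
                for q in range(p * p, N + 1, p):
                    sieve[q] = False
                e = legendre(N, p) - legendre(r, p) - legendre(N - r, p)
                result = result * pow(p, e, M) % M
        return result

    # binom_mod(10, 3, 1000) == 120, i.e. C(10, 3) mod 1000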
Have fun :)
If you need an answer mod N, you can often do each step of your whole calculation mod N. That way, you never exceed your system's integer size restrictions.
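For instance, a minimal square-and-multiply sketch (this is exactly what Python's built-in three-argument pow does), reducing mod m at every step so intermediates never grow:

    def power_mod(base, exp, m):
        result = 1
        base %= m
        while exp:
            if exp & 1:
                result = result * base % m
            base = base * base % m
            exp >>= 1
        return result

    # power_mod(2, 10**18, 10**9 + 7) stays fast, with small intermediates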
If I have an unsorted large set of n integers (say 2^20 of them) and would like to generate subsets with k elements each (where k is small, say 5) in increasing order of their sums, what is the most efficient way to do so?
The reason I need to generate these subsets in this fashion is that I would like to find the k-element subset with the smallest sum satisfying a certain condition, so I would apply the condition to each k-element subset as it is generated.
Also, what would be the complexity of the algorithm?
There is a similar question here, Algorithm to get every possible subset of a list, in order of their product, without building and sorting the entire list (i.e. Generators), about generating subsets in order of their product, but it doesn't fit my needs because of the extremely large size of the set (n = 2^20).
I intend to implement the algorithm in Mathematica, but could do it in C++ or Python too.
If your desired property of the small subsets (call it P) is fairly common, a probabilistic approach may work well:
1. Sort the n integers (for millions of integers, i.e. tens to hundreds of MB of RAM, this should not be a problem), and sum the k-1 smallest. Call this total offset.
2. Generate a random k-subset (say, by sampling k random numbers, mod n) and check it for P-ness.
3. On a match, note the sum-total of the subset. Subtract offset from this to find an upper bound on the largest element of any k-subset of equivalent sum-total.
4. Restrict your set of n integers to those less than or equal to this bound.
5. Repeat (go to step 2) until no matches are found within some fixed number of iterations.
Note the initial sort is O(n log n). The binary search implicit in step 4 is O(log n).
Obviously, if P is so rare that random pot-shots are unlikely to get a match, this does you no good.
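A Python sketch of this loop, with P as a placeholder for your condition (the names and the retry cap are mine):

    import bisect
    import random

    def smallest_P_subset(nums, k, P, max_misses=100000):
        nums = sorted(nums)                    # step 1
        offset = sum(nums[:k - 1])             # sum of the k-1 smallest
        hi = len(nums)                         # only sample indices < hi
        best, misses = None, 0
        while misses < max_misses and hi >= k:
            idx = random.sample(range(hi), k)  # step 2
            values = [nums[i] for i in idx]
            if P(values):
                total = sum(values)            # step 3
                if best is None or total < sum(best):
                    best = values
                # any k-subset with an equal-or-smaller sum has largest
                # element <= total - offset; shrink the pool (step 4)
                hi = bisect.bisect_right(nums, total - offset, 0, hi)
                misses = 0
            else:
                misses += 1                    # step 5: give up eventually
        return best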
Even if only 1 in 1000 of the k-sized sets meets your condition, that's still far too many combinations to test. The runtime scales with C(n, k) (n choose k), where n is the size of your unsorted list; the answer by Andrew Mao has a link to this value. 10^28 / 1000 is still 10^25. Even at 1000 tests per second, that's still 10^22 seconds, on the order of 10^14 years.
If you are allowed to, I think you need to eliminate duplicate numbers from your large set. Each duplicate you remove will drastically reduce the number of evaluations you need to perform. Sort the list, then kill the dupes.
Also, are you looking for the single best answer here? Who will verify the answer, and how long would that take? I suggest implementing a genetic algorithm and running a bunch of instances overnight (for as long as you have time). This will yield a very good answer in far less time than the age of the universe.
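A stripped-down sketch of that idea (a mutation-only evolutionary search, i.e. a genetic algorithm without crossover; P stands for your condition, and all parameters here are arbitrary):

    import random

    def evolve_min_subset(nums, k, P, pop_size=200, generations=500):
        n = len(nums)
        penalty = sum(sorted(nums)[-k:]) + 1   # worse than any real sum

        def fitness(ind):
            vals = [nums[i] for i in ind]
            return sum(vals) if P(vals) else sum(vals) + penalty

        def mutate(ind):
            s = set(ind)
            s.discard(random.choice(tuple(s)))  # swap one index out...
            while len(s) < k:
                s.add(random.randrange(n))      # ...and a random one in
            return frozenset(s)

        pop = [frozenset(random.sample(range(n), k)) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness)
            keep = pop[:pop_size // 2]          # keep the fitter half
            pop = keep + [mutate(random.choice(keep))
                          for _ in range(pop_size - len(keep))]
        best = min(pop, key=fitness)
        vals = [nums[i] for i in best]
        return vals if P(vals) else None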
Do you mean 20 integers, or 2^20? If it's really 2^20, then you may need to go through a significant fraction of C(2^20, 5) ≈ 10^28 subsets before you find one that satisfies your condition. On a modern 100k-MIPS CPU, even if a single instruction could build a set and evaluate the condition, going through the entire space would take about 10^17 seconds, over 3 billion years. So if you need to go through even a sizable fraction of that, it's not going to finish in your lifetime.
Even if the number of integers is smaller, this seems a rather brute-force way to solve the problem. I conjecture that you may be able to express your condition as a constraint in a mixed-integer program (MIP), in which case solving the following could be much faster than brute-force enumeration. Assuming your integers are w_i, for i from 1 to N:
    minimize     sum(i) w_i * x_i
    subject to   sum(i) x_i = k
                 (some constraints on the w_i * x_i)
                 x_i binary
If it turns out that the linear-programming relaxation of your MIP is tight, then you are in luck and have a very efficient way to solve the problem, even for 2^20 integers (example: the max-flow/min-cut problem). You can also use the approach of column generation, since you may have far too many variables to solve for all at once.
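As a sketch of what that might look like with an off-the-shelf solver (here PuLP, which is an assumption on my part, with the condition left as a placeholder hook):

    # pip install pulp
    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

    def solve_mip(w, k, add_condition):
        prob = LpProblem("min_sum_k_subset", LpMinimize)
        x = [LpVariable("x%d" % i, cat=LpBinary) for i in range(len(w))]
        prob += lpSum(w[i] * x[i] for i in range(len(w)))  # objective
        prob += lpSum(x) == k                              # pick exactly k
        add_condition(prob, x, w)   # placeholder: your constraints here
        prob.solve()
        return [w[i] for i in range(len(w)) if x[i].value() > 0.5]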
If you post a bit more about the constraint you are interested in, I or someone else may be able to propose a more concrete solution for you that doesn't involve brute force enumeration.
Here's an approximate way to do what you're saying.
First, sort the list. Then, consider a length-5 index vector v into the sorted list whose maximum index is some number m, and another index vector v' whose maximum index m' is greater than m. The smallest sum attainable by such vectors v' is always at least the smallest sum attainable by the vectors v.
So here's how you can loop through the elements in approximately increasing order of sum:
    sort arr
    for i = 1 to N:
        break_loop = false
        for v in 5-element subsets of (1, ..., i):
            set = arr{v}
            if condition(set) is satisfied:
                break_loop = true
                compute sum(set); keep set if it is the best so far
        if break_loop: break
Basically, this means you no longer need to check 5-element combinations of (1, ..., n+1) once you find a satisfying assignment within (1, ..., n): the smallest possible sum using max index n+1 is at least the smallest possible sum using max index n, so you can stop after finishing that prefix. However, there is no easy way to loop through the 5-combinations of (1, ..., n) while guaranteeing that the sum is always increasing, but at least you can stop checking after you find a satisfying set at some n.
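Here is a runnable Python version of that pseudocode (to avoid re-checking, each prefix pass only examines subsets that actually use the newest element):

    from itertools import combinations

    def approx_min_sum_subset(arr, k, condition):
        arr = sorted(arr)
        for i in range(k, len(arr) + 1):
            best = None
            # subsets not using arr[i-1] were covered by earlier prefixes
            for rest in combinations(range(i - 1), k - 1):
                cand = [arr[j] for j in rest] + [arr[i - 1]]
                if condition(cand) and (best is None or sum(cand) < sum(best)):
                    best = cand
            if best is not None:
                return best
        return None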
This looks to be a perfect candidate for MapReduce (http://en.wikipedia.org/wiki/MapReduce). If you can partition the candidates smartly, so that passing candidates are equally present in each node, you can probably get great throughput.
A complete sort may not really be needed, as the map stage can take care of it. Each node can then verify the condition against its k-tuples and output results into a file that can be aggregated/reduced later.
If you know the probability of occurrence and don't need all of the results, consider probabilistic algorithms that converge to an answer.