Where in this method are numbers checked for primality? - ruby

Here is a method I found to find the highest prime factor of a number.
Yet there is a dark mystery within, including something I once read was forbidden: changing the condition of a loop from within the loop.
def factorize(orig) # 600851475143
  factors = Hash.new(0)
  n = orig
  i = 2
  sqi = 4
  while sqi <= n do
    if n % i == 0
      n /= i
      factors[i] = true
      puts "Found factor #{i}"
    end
    i += 1
    sqi = i**2
    puts "sqi is #{sqi}"
  end
  if (n != 1) && (n != orig)
    factors[n] = true
  end
  p factors
end
puts factorize(600851475143).keys.max
So I see (sort of) how the factors are found.
But where in these lines is the factor checked to make sure it is prime?
What mathematical insight am I missing?
Thanks

Your method is slightly wrong (just slightly). It should look like this:
def factorize(orig)
  factors = Hash.new(0)
  n, i, sqi = orig, 2, 4
  while sqi <= n do
    if n % i == 0
      n /= i
      factors[i] = true
      puts "Found factor #{i}"
    else
      sqi += 2 * i + 1
      i += 1
    end
    puts "sqi is #{sqi}"
  end
  if (n != 1) && (n != orig)
    factors[n] = true
  end
  p factors
end
The difference here is that now, I only increase i (and sqi) when i is not a factor of n. This is because, like the example of 16 that was highlighted earlier, a number can have multiple instances of any one prime factor, so we should keep checking a number until it is no longer a factor.
Now this method does guarantee primality, because it always finds the smallest factor of the number (put another way, it only advances the candidate i once it is no longer a factor). And of course the smallest factor of a number must be prime:
Proof by contradiction: the smallest factor (greater than 1) of a number is prime.
Suppose the smallest factor f of a number N is not prime.
Then f itself has factors x and y where 1 < x, y < f.
As a result, x and y must also be factors of N, and both are less than f!
This is a contradiction, because we said f was the smallest factor of N.
So our original assumption about f is false, and f must be prime.
I got to this result by inspecting the invariant of the loop, which I will add to this answer in due course.
EDIT: Notes on Invariants
An invariant of a loop is a predicate (a condition) that remains true before, during, and after the running of the loop, and we can use it to prove that a loop is providing us with the answer we want.
In the case of our loop, there is a simple invariant we must keep track of, which is sqi = i**2 which simply states that sqi must always hold the value of the square of i. This invariant exists to save us recalculating the square every time to compare it with n. (Which by the way, is why I've changed it to incrementing by 2 * i + 1 in my method, otherwise you might as well put i*i in the condition of the loop).
The other part of the invariant is that the factors hash (which mathematically I will treat as a set of numbers) is the set of factors of the number k such that n * k = orig.
The final, and most important, part of the invariant is that i <= f, where f is the smallest factor (greater than 1) of n. (This means that n % i == 0 only when i = f, which means that the loop always finds the smallest factor of n, which is a prime factor of n.)
Writing the invariant is only half the battle, we also need to prove that our method always follows it:
The first part of the invariant is simple, because we see whenever we update i we update sqi correctly, and they begin as 2 and 4 respectively.
The second part, similarly is pretty simple, because we only add i as a factor when
n % i == 0 is true, and at the same time, we divide n by i, so as to ensure that the factor added to k is removed from n.
Now let's look at the part of the invariant that's crucial to ensuring the list only contains prime factors. Well, to begin with i = 2 which is the smallest factor of any
number (not including 1 due to its awkwardness when it comes to primality). Then we need to be certain that we increment i as late as possible. I.e. when we are sure that it can no longer be a factor.
Our code only increments i when it is not a factor of n. If the invariant held before, this means that i <= f and i is not a factor, therefore i < f. So the correct behaviour is to increment to get i closer to f.
This logic is enough to suggest that when i is not a factor, we should increment it, but not enough to suggest we shouldn't always increment i, for which we need this next piece of logic: If i is a factor of n, it means i = f, however, it doesn't tell us anything about whether the next smallest factor is strictly greater than f (as we've seen with 16, the next smallest factor could be equal to the previous). So this means we shouldn't increment i if it is a factor, because doing so may make us miss the next smallest factor.
I hope this bit convinces you about the correctness of the program. It is also possible to write factorize with a nested while loop, which I feel might be a little bit simpler to reason about, but they both work basically identically.
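For reference, the nested-while variant just mentioned might look like this (sketched in Python rather than Ruby, so the names are illustrative; it mirrors the original's n != orig check):

```python
def factorize(orig):
    """Collect the prime factors of orig by trial division."""
    factors = {}
    n, i = orig, 2
    while i * i <= n:
        while n % i == 0:   # stay on i until every copy is divided out
            n //= i
            factors[i] = True
        i += 1
    if n != 1 and n != orig:
        factors[n] = True   # whatever remains above the square root is prime
    return factors

print(max(factorize(600851475143)))  # -> 6857
```

The inner while makes the "don't advance i while it still divides n" reasoning explicit, which is why it can be easier to argue about.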

n % i == 0 checks whether n is divisible by i. If it is, then it sets factors[i] = true. If a number has no factors (apart from itself and one), then it is prime.

The factor is not checked to make sure it's prime. That's why it breaks (giving non-prime factor) for orig==32:
Found factor 2
sqi is 9
sqi is 16
Found factor 4
sqi is 25
{2=>true, 4=>true}
4
It could be fixed (while retaining the same logic, i.e. without major rewrites) by replacing if n % i == 0 with while n % i == 0 do (that is, divide n by i while it's possible): then by the time we reach a composite i, all its prime factors would be already "factored out" during prior iterations.
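Sketched in Python rather than Ruby (an illustration of the proposed fix, not the original code), the while version divides each i out completely before moving on:

```python
def factorize_fixed(orig):
    factors, n, i = {}, orig, 2
    while i * i <= n:
        while n % i == 0:   # 'while' instead of 'if': strip every copy of i
            n //= i
            factors[i] = True
        i += 1
    if n != 1 and n != orig:
        factors[n] = True
    return factors

print(sorted(factorize_fixed(32)))  # [2] -- no spurious composite factor 4
```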


How do I find the space and time complexities for this code

fun root(n) =
  if n > 0 then
    let
      val x = root(n div 4);
    in
      if (2*x+1)*(2*x+1) > n then 2*x
      else 2*x+1
    end
  else 0;

fun isPrime(n,c) =
  if c <= root(n) then
    if n mod c = 0 then false
    else isPrime(n,c+1)
  else true;
The time complexity of the root(n) function here is O(log(n)): the number is divided by 4 at every step, and the work per call is O(1). The time complexity of the isPrime function is O(sqrt(n)), as it runs iteratively from c up to sqrt(n). The issue I face now is: what would be the order of both functions together? Would it just be O(sqrt(n)), or would it be O(sqrt(n)*log(n)), or something else altogether?
I'm new to big O notation in general, I have gone through multiple websites and youtube videos trying to understand the concept but I can't seem to calculate it with any confidence... If you guys could point me towards a few resources to help me practice calculating, it would be a great help.
root(n) is O(log₄(n)), yes.
isPrime(n,c) is O((√n - c) · log₄(n)):
You recompute root(n) in every step even though it never changes, causing the "... · log₄(n)".
You iterate c from some value up to root(n); while it is bounded above by root(n), it is not bounded below: c could start at 0, or at an arbitrarily large negative number, or at a positive number less than or equal to √n, or at a number greater than √n. If you assume that c starts at 0, then isPrime(n,c) is O(√n · log₄(n)).
You probably want to prove this using either induction or by reference to the Master Theorem. You may want to simplify isPrime so that it does not take c as an argument in its outer signature, and so that it does not recompute root(n) unnecessarily on every iteration.
For example:
fun isPrime n =
  let
    val sq = root n
    fun check c = c > sq orelse (n mod c <> 0 andalso check (c + 1))
  in
    check 2
  end
This isPrime(n) is O(√n + log₄(n)), or just O(√n) if we omit lower-order terms.
First it computes root n once at O(log₄(n)).
Then it loops from 2 up to root n once at O(√n).
Note that neither of us has proven anything formally at this point.
(Edit: Changed check (n, 0) to check (n, 2), since duh.)
(Edit: Removed n as argument from check since it never varies.)
(Edit: As you point out, Aryan, looping from 2 to root n is indeed O(√n) even though computing root n takes only O(log₄(n))!)

Is this shortcut for modulo by a constant (C) valid? IF (A mod 2^n) > C: { -C}

Looking to do modulo operator, A mod K where...
K is a uint32_t constant, is not a power of two, and I will be using it over and over again.
A is a uint32_t variable, possibly as much as ~2^13 times larger than K.
The ISA does not have single cycle modulo or division instructions. (8-bit micro)
The naive approach seems to coincide with the naive approach to division; repeat subtraction until underflow, then keep the remainder. This would obviously have fairly bad worst case performance, but would work for any A and K.
A known fast approach which works well for a K that is some power of two, is to logical AND with that power of two -1.
From Wikipedia...
A % 2^n == A & (2^n - 1)
My knee jerk reaction is to use these two things together, and I'm wondering if that is valid?
Specifically, I figure I can use the power of two mod trick to narrow the worst case for the above subtraction method. In other words, quickly mod to the nearest power of two above my constant, then subtract my constant if necessary. Here's the code that is in the actual question, fully expanded.
A = A AND (2^n - 1)   # MOD A to the next higher power of two
if A > K:             # See if we are still larger than our constant
    A -= K            # If so, subtract. We now must be lower.
##################
# A = A MOD K ???
##################
On inspection, this should always work, and should always be fast, since the next power of two greater than K should always be such that 2K will be larger. That is, K < 2^n < 2K meaning I should only ever need one extra test, and then possibly one subtraction.
...but this seems too simple. If it worked, I'd expect to have seen it before. But I can't find an example. I can't find a counter example either though. I have checked the usual places. What am I missing?
You can't combine both approaches. First understand why the equation below holds true.
A % p == A & (p - 1), where p = 2^n
p will have exactly 1 set bit in its binary representation; say its position is x.
The part of any number formed by the bits at position x and above is always a multiple of p, so performing AND with p-1 (which keeps only the bits below position x) leaves exactly the remainder, which is the same as performing mod.
But that isn't the case when p is not a power of 2.
If that didn't make sense, then take for example:
A = 18 = 10010,
K = 6 = 110,
A % K = 0
According to your approach, you would perform an AND of A with 7 (= 2^3 - 1), resulting in 2, which is not the value of A mod K.
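To check the counterexample concretely, here is a small Python sketch (`approx_mod` is my name for the proposed shortcut):

```python
def approx_mod(a, k, n):
    # the proposed shortcut: mask down to the next power of two above k,
    # then subtract k at most once
    a &= (1 << n) - 1
    if a > k:
        a -= k
    return a

A, K = 18, 6                 # 2^3 = 8 is the next power of two above 6
print(approx_mod(A, K, 3))   # 2
print(A % K)                 # 0 -- the shortcut disagrees
```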

Given a number N and an array A. Check if N can be expressed as a product of one or more array elements

Given a number N (where N <= 10^18) and an array A (consisting of at most 20 elements), I have to tell if it is possible to form N by multiplying some elements of the array. Note that I can use any element multiple times.
Example: N = 8 and A = {2, 3}. Here, 8 = 2 * 2 * 2. So the answer is YES. But if N = 15, then I can't form 15 as a product of one or more elements using them any number of times. So in this case the answer is NO.
How can I approach this problem?
Simple pseudocode (as runnable Python):
def can_form(num, A):
    # keep only elements that can possibly appear in a product equal to num
    A_divisors = {x for x in A if num % x == 0}
    if num in A_divisors:          # single-element product
        return True
    seen = set(A_divisors)
    candidates = set(A_divisors)
    while candidates:
        new_candidates = set()
        for y in candidates:
            for x in A_divisors:
                p = x * y
                if num % p == 0 and p not in seen:
                    if p == num:
                        return True
                    seen.add(p)
                    new_candidates.add(p)
        candidates = new_candidates
    return False
Complexity: O(|A| * k * log k), where k is the number of divisors of num that get generated. The log k is the cost of adding to and checking membership in the set; with a hash-based set that is O(1) and the factor can be removed. I am also assuming the % and * operations are O(1).
Since you show no code or algorithm, I'll just give one idea. If you want more help, please show more of your own work on the problem.
Note that N can be at most 60 bits long. This is small enough that N could be decomposed into its prime factors pretty quickly. So first work up a good factoring algorithm for numbers of that size.
Your algorithm would factor N and each of the elements in your array A. If there is any prime factor of N that does not divide into any element of A then your answer is NO. This is the case in your example of N = 15.
Now you work with the prime factors and their exponents in N and in the elements of A. You want to find a subset (or, more properly, a sub-multiset) of A where the exponents for each prime add up to those in N. This greatly reduces the sizes of your numbers and thus makes the problem easier.
That last part is not trivial. Work more on this problem and show us some of your work, then we can continue helping you.
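A sketch of the quick NO-check described above, in Python (trial-division factoring, which is adequate only for modest N; a real solution at 10^18 needs a faster factoring method, and the sub-multiset search over exponents is left out):

```python
def prime_factors(n):
    # naive trial division -- fine for small n, too slow near 10^18
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def quick_no_check(N, A):
    # False means a definite NO: some prime of N divides no element of A.
    # True only means N has not been ruled out yet.
    return all(any(a % p == 0 for a in A) for p in prime_factors(N))

print(quick_no_check(8, [2, 3]))   # True  (not ruled out)
print(quick_no_check(15, [2, 3]))  # False (5 divides neither element)
```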
You can follow below approach:
Form 2 queues: Q2 and Q3.
Add 2 in Q2 and 3 in Q3.
Get the minimum of the head of both queues, lets say h. Remove h from the corresponding queue. Check if it is equal to the number N. If yes, return true. If it is greater than N, return false.
If it is less than N, then add 2*h in Q2 and 3*h in Q3. Repeat steps 3 to 4.
Please note that when the minimum h comes from Q3, you need not add 2*h into Q2. That is because you have already added that element to Q3 before. (I will leave it for you to deduce.) Keep on doing this procedure until your h is greater than N.
If you have more such numbers, you can form queues for them as well. I think this is an optimal solution in case you have more numbers to process.
Can you guess the time and space complexity of this?
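For A = {2, 3}, the procedure above can be sketched in Python with two deques (the names are mine, and this handles only the two-queue case):

```python
from collections import deque

def can_reach(N):
    # generate products of 2s and 3s in increasing order until we hit or pass N
    q2, q3 = deque([2]), deque([3])
    while True:
        h = min(q2[0], q3[0])
        if h == N:
            return True
        if h > N:
            return False
        if h == q2[0]:
            q2.popleft()
            q2.append(2 * h)   # h came from Q2: enqueue both successors
            q3.append(3 * h)
        else:
            q3.popleft()
            q3.append(3 * h)   # h came from Q3: 2*h is already covered

print(can_reach(8), can_reach(15))  # True False
```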

Sample number with equal probability which is not part of a set

I have a number n and a set of numbers S ⊆ {1,...,n} with size s (which is substantially smaller than n). I want to sample a number k ∈ {1,...,n} with equal probability, but the number is not allowed to be in the set S.
I am trying to solve the problem in at worst O(log n + s). I am not sure whether it's possible.
A naive approach is creating an array of numbers from 1 to n excluding all numbers in S and then pick one array element. This will run in O(n) and is not an option.
Another approach may be just generating random numbers ∈[1..n] and rejecting them if they are contained in S. This has no theoretical bound as any number could be sampled multiple times even if it is in the set. But on average this might be a practical solution if s is substantially smaller than n.
Say S is sorted. Generate a random number between 1 and n-s, call it k. We've chosen the k'th element of {1,...,n} \ S. Now we need to find it.
Use binary search on S to find the count of the elements of S <= k. This takes O(log s). Add this count to k. In doing so, we may have passed or arrived at additional elements of S. We can adjust for this by incrementing our answer for each such element that we pass, which we find by checking the next larger element of S from the point we found in our binary search.
E.g., n = 100, S = {1,4,5,22}, and our random number is 3. So our approach should return the third element of [2,3,6,7,...,21,23,24,...,100], which is 6. Binary search finds that 1 element is at most 3, so we increment to 4. Now we compare to the next larger element of S, which is 4, so we increment to 5. Repeating this finds 5 in S, so we increment to 6. We check S once more, see that 6 isn't in it, so we stop.
E.g., n = 100, S = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100], which is 7. Binary search finds that 2 elements are at most 4, so we increment to 6. Now we compare to the next larger element of S, which is 5, so we increment to 7. We check S once more, see that the next number is > 7, so we stop.
If we assume that "s is substantially smaller than n" means s <= log(n), then we will increment at most log(n) times, and in any case at most s times.
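A Python sketch of this order-statistic approach, assuming S is a sorted list (the function names are mine):

```python
import bisect
import random

def kth_allowed(k, S):
    # return the k'th smallest element of {1..n} \ S, given S sorted
    i = bisect.bisect_right(S, k)    # count of elements of S <= k
    k += i
    while i < len(S) and S[i] <= k:  # walk past any further excluded values
        k += 1
        i += 1
    return k

def sample_excluding(n, S):
    return kth_allowed(random.randint(1, n - len(S)), S)

print(kth_allowed(3, [1, 4, 5, 22]))  # 6, as in the first example
print(kth_allowed(4, [1, 4, 5, 22]))  # 7, as in the second
```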
If S is not sorted then we can do the following. Create an array of bits of size s. Generate k. Parse S and do two things: 1) count the number of elements < k, call this r; 2) set the i'th bit to 1 if k+i is in S (0-indexed, so if k itself is in S then the first bit is set).
Now, increment k a number of times equal to r, plus once more for each set bit in the array whose index is at most the number of increments made so far.
E.g., n = 100, S = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100], which is 7. We parse S and 1) note that 1 element is below 4 (r=1), and 2) set our array to [1, 1, 0, 0]. We increment once for r=1 and an additional two times for the two set bits, ending up at 7.
This is O(s) time, O(s) space.
This is an O(1) solution with O(s) initial setup that works by mapping each non-allowed number > s to an allowed number <= s.
Let S be the set of non-allowed values, S(i), where i = [1 .. s] and s = |S|.
Here's a two part algorithm. The first part constructs a hash table based only on S in O(s) time, the second part finds the random value k ∈ {1..n}, k ∉ S in O(1) time, assuming we can generate a uniform random number in a contiguous range in constant time. The hash table can be reused for new random values and also for new n (assuming S ⊂ { 1 .. n } still holds of course).
To construct the hash H: first set j = 1. Then iterate over S(i), the elements of S. They do not need to be sorted. If S(i) > s, first advance j past any values in S, then add the key-value pair (S(i), j) to the hash table and increment j.
To find a random value k, first generate a uniform random value in the range s + 1 to n, inclusive. If k is a key in H, then k = H(k). I.e., we do at most one hash lookup to ensure k is not in S.
Python code to generate the hash:
def substitute(S):
    H = dict()
    j = 1
    for s in S:
        if s > len(S):
            while j in S: j += 1
            H[s] = j
            j += 1
    return H
For the actual implementation to be O(s), one might need to convert S into something like a frozenset to insure the test for membership is O(1) and also move the len(S) loop invariant out of the loop. Assuming the j in S test and the insertion into the hash (H[s] = j) are constant time, this should have complexity O(s).
The generation of a random value is simply:
def myrand(n, s, H):
    k = random.randint(s + 1, n)
    return (H[k] if k in H else k)
If one is only interested in a single random value per S, then the algorithm can be optimized to improve the common case, while the worst case remains the same. This still requires S be in a hash table that allows for a constant time "element of" test.
def rand_not_in(n, S):
    k = random.randint(len(S) + 1, n)
    if k not in S: return k
    j = 1
    for s in S:
        if s > len(S):
            while j in S: j += 1
            if s == k: return j
            j += 1
Optimizations are: Only generate the mapping if the random value is in S. Don't save the mapping to a hash table. Short-circuit the mapping generation when the random value is found.
Actually, the rejection method seems like the practical approach.
Generate a number in 1...n and check whether it is forbidden; regenerate until the generated number is not forbidden.
The probability of a single rejection is p = s/n.
Thus the expected number of random number generations is 1 + p + p^2 + p^3 + ... which is 1/(1-p), which in turn is equal to n/(n-s).
Now, if s is much less than n (in fact, for any s up to n/2), this expected number is at most 2.
It would take s almost equal to n to make it infeasible in practice.
Multiply the expected time by log s if you use a tree-set to check whether the number is in the set, or by just 1 (expected value again) if it is a hash-set. So the average time is O(1) or O(log s) depending on the set implementation. There is also O(s) memory for storing the set, but unless the set is given in some special way, implicitly and concisely, I don't see how it can be avoided.
(Edit: As per comments, you do this only once for a given set.
If, additionally, we are out of luck, and the set is given as a plain array or list, not some fancier data structure, we get O(s) expected time with this approach, which still fits into the O(log n + s) requirement.)
If attacks against the unbounded algorithm are a concern (and only if they truly are), the method can include a fall-back algorithm for the cases when a certain fixed number of iterations didn't provide the answer.
Similarly to how IntroSort is QuickSort but falls back to HeapSort if the recursion depth gets too high (which is almost certainly a result of an attack resulting in quadratic QuickSort behavior).
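A minimal sketch of the rejection method in Python (assuming the forbidden set supports O(1) membership tests):

```python
import random

def sample_rejection(n, forbidden):
    # expected number of draws is n / (n - len(forbidden))
    while True:
        k = random.randint(1, n)
        if k not in forbidden:
            return k

# e.g. sample_rejection(100, {1, 4, 5, 22}) returns a uniform value
# from {1..100} outside the set
```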
Find all numbers that are in the forbidden set and less than or equal to n-s. Call it array A.
Find all numbers that are not in the forbidden set and greater than n-s. Call it array B. This may be done in O(s) if the set is sorted.
Note that the lengths of A and B are equal, and create the mapping map[A[i]] = B[i].
Generate a number t up to n-s. If map[t] exists, return it; otherwise return t.
This takes O(s) insertions into the map plus one lookup, which is O(s) on average with a hash map, or O(s log s) with a tree map.
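This mapping idea can be sketched in Python (the names are mine; `forbidden` is the forbidden set and s its size):

```python
import random

def build_map(n, forbidden):
    s = len(forbidden)
    A = [x for x in forbidden if x <= n - s]   # forbidden, inside sampling range
    B = [x for x in range(n - s + 1, n + 1)
         if x not in forbidden]                # allowed, above the range
    return dict(zip(A, B))                     # |A| == |B| always holds

def sample_mapped(n, forbidden, remap):
    t = random.randint(1, n - len(forbidden))
    return remap.get(t, t)

remap = build_map(10, {2, 9})
# t ranges over 1..8; t == 2 is redirected to 10, everything else passes through
```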

Randomized Algorithm

I'm having trouble with a randomized problem. :)
A, a randomized algorithm, determines whether an input x is a prime number.
This algorithm works the following way:
1- If x is prime, then A outputs YES
2- If x is not prime, then A outputs NO with probability 3/4.
How many times, at least, should we run A if we want it to output NO (for a non-prime x) with probability at least 1 - (1/k)?
Note: One NO answer implies that a given input x is not prime.
Any idea?
If a number x is not a prime, the probability to yield 'yes' in n repeats of the algorithm is (1/4)^n = 4^(-n) = 2^(-2n).
So, if you want to achieve 1-(1/k), you are actually looking for False Positive with probability of at most 1/k, and from the above we want:
2^(-2n) <= 1/k //log_2 on both sides:
-2n <= log(1/k) = log(1)-log(k) = 0 - log(k)
2n >= log(k)
n >= log(k)/2
So you want to choose the smallest integer n possible such that n >= log(k)/2, to guarantee a True Negative with probability 1 - 1/k.
(Note: All log() are with base 2).
Example:
If you want to be correct 99% of the time, you actually are looking for 1-1/k=0.99, so 1/k=1/100 and k=100.
Now, according to the above formula, note that log_2(100) ~= 6.64, and thus the smallest n such that n >= log_2(100)/2 is n==4.
Meaning, you need to repeat the algorithm 4 times to achieve 99%.
Let's check this is correct:
First check that the probability is indeed greater than 99%: 1-(1/4)^4 = 1-(1/256) ~= 0.996 >= 0.99, so the probability is fine.
Check that for a smaller integer (n==3), we would have got worse than 99% correct answer: 1-(1/4)^3 = 1-1/64 ~= 0.984 < 0.99 - so we would have failed for n==3.
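The formula can be checked with a few lines of Python:

```python
import math

def repeats_needed(k):
    # smallest n with (1/4)**n <= 1/k, i.e. n >= log2(k) / 2
    return math.ceil(math.log2(k) / 2)

print(repeats_needed(100))          # 4
print((1/4) ** 4 <= 1/100)          # True: error probability within bound
print((1/4) ** 3 <= 1/100)          # False: three runs are not enough
```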
