Constructing dodge ball team - algorithm

A coach is trying to construct a dodge ball team. Each player is assigned a student ID, and if one player's ID divides another player's ID, they fight. So the coach wants to make a team in which no one fights. Given the number N (IDs are assigned 1 to N), find the minimum number K for which the coach is unable to make a team of K players in which no one fights.
input (N): 3
output (K): 2
For example, N = 3,
K = 3, {1,2,3} --> Players 1 and 2 fight.
K = 2, {2,3} --> No one fights.
input (N): 4
output (K): 2
N = 4,
K = 4, {1,2,3,4} --> more than one pair of players, (1,2), (1,3), etc., fight.
K = 3, {1,2,4}, {2,3,4}, {1,3,4} --> players fight in all teams.
K = 2, {2,3} --> No one fights.
So basically, given N, find the minimum K for which a coach can't make any combination of K players in which no one fights. (This is also K'+1, where K' is the maximum size of a team the coach can find in which no one fights.)
A greedy solution my friend and I came up with is to try finding the maximum set for the given N. The optimal set must contain the big numbers, because if we start putting in small numbers, 2, 3, ..., none of the multiples of these numbers can be included. So we can start putting the numbers from N down to N/2 in a set, as long as the new number is not a divisor of some number already included in the set. We are not entirely sure if this solution is correct, so we would love to discuss its correctness and hear other people's ideas.
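One way to gain confidence in the greedy idea is to compare it with an exhaustive search for small N. The sketch below is only a sanity check, not a proof; the function names are made up for illustration. It takes every ID strictly greater than N/2 (no two of them can divide each other, because twice the smaller one already exceeds N) and checks that no larger conflict-free team exists for N up to 12.

```python
from itertools import combinations

def upper_half_team(n):
    """Greedy candidate: every ID strictly greater than n // 2."""
    return list(range(n // 2 + 1, n + 1))

def has_fight(team):
    """True if some ID in the team divides another ID in the team."""
    return any(b % a == 0 for a, b in combinations(sorted(team), 2))

def largest_peaceful_team_bruteforce(n):
    """Exhaustive search over all subsets; only practical for small n."""
    ids = range(1, n + 1)
    for k in range(n, 0, -1):
        if any(not has_fight(c) for c in combinations(ids, k)):
            return k
    return 0

# Compare the greedy team size with the exhaustive answer for small n.
for n in range(1, 13):
    team = upper_half_team(n)
    assert not has_fight(team)
    assert len(team) == largest_peaceful_team_bruteforce(n)
```

For these small cases the largest conflict-free team has N - floor(N/2) members, so the smallest K for which every team fights would be one more than that.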
I was asked to solve this coding problem during an online coding test but couldn't figure out how to solve it.

My answer was to return the number of primes up to n, plus 1.
K is the minimum number that makes it impossible to have a team with no fighting pair, that is, every team of that size contains at least one pair where one number divides the other evenly, yes? Based on that, the "safest" bet is prime numbers (since they can't divide each other). Once you add non-prime numbers, you'll be certain to have a "fighting" pair.
Case 1: all prime numbers in n and the number 1 (trivial).
Case 2: all prime numbers in n and any even number in n (the even number can be divided by 2, which eliminates this option).
Case 3: all prime numbers and an odd number (any composite -non prime - odd number can be divided by an odd prime number).
I'm not 100% sure on case 3 regarding mathematical proof, but it seems to be the case.
Disclaimer: I haven't yet received feedback from the interview, this could be totally wrong.

Related

Find only two numbers in array that evenly divide each other

Find the only two numbers in an array where one evenly divides the other - that is, where the result of the division operation is a whole number
Input array    Output
5 9 2 8        8/2 = 4
9 4 7 3        9/3 = 3
3 8 6 5        6/3 = 2
The brute force approach of having nested loops has time complexity of O(n^2). Is there any better way with less time complexity?
This question is part of advent of code.
Given an array of numbers A, you can identify the denominator by multiplying all the numbers together to give E, then testing each i-th element by dividing E by A_i^2. If this is a whole number, you have found the denominator, as no other factors can be introduced by multiplication.
Once you have the denominator, it's a simple task to do a second, independent loop searching for the paired numerator.
This eliminates the n^2 comparisons.
Why does this work? First, we have an n-2 collection of non-divisors: abcde..
To complete the array, we also have numerator x and denominator y.
However, we know that x and only x has a factor of y, so it can be expressed as yz (z being a whole remainder from the division of x by y)
When we multiply out all the numbers, we end up with xyabcde.., but as x = yz, we can also say y^2 zabcde..
When we loop through dividing by the squared i'th element from the array, for most of the elements we create a fraction, e.g. for a:
y^2 zabcde.. / a^2 = y^2 zbcde.. / a
However, for y and y only:
y^2 zabcde.. / y^2 = zabcde..
Why doesn't this work? The same is true of the other numbers. There's no guarantee that a and b can't produce another common factor when multiplied. Take the example of [9, 8, 6, 4], 9 and 8 multiplied equals 72, but as they both include prime factors 2 and 3, 72 has a factor of 6, also in the array. When we multiply it all out to 1728, those combine with the original 6 so that it can divide soundly by 36.
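The failure is easy to reproduce. The following sketch (the function name is made up for illustration) applies the "square divides the product" test to every element and lists everything it flags. On the sample input it flags only the true denominator, but on [9, 8, 6, 4] it flags three elements even though 4 is the only real divisor.

```python
from math import prod

def product_trick_candidates(values):
    """Return every element whose square divides the product of all elements."""
    total = prod(values)
    return [v for v in values if total % (v * v) == 0]

print(product_trick_candidates([5, 9, 2, 8]))  # [2]
print(product_trick_candidates([9, 8, 6, 4]))  # [8, 6, 4]
```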
How might this be fixed? More accurately, if y is a factor of x, then y's prime factors will uniquely be a subset of x's prime factors, so maybe things can be refined along those lines. Obtaining a prime factorization should not scale according to the size of the array, but comparing subsets would, so it's not clear to me if this is at all useful.
I think that O(n^2) is the best time complexity you can get without any assumptions on the data.
If you can't tell anything about the numbers, knowing that x and y do not divide each other tells you nothing about x and z or y and z for any x, y, z. Therefore, in the worst case you must check all pairs of numbers - equal to n Choose 2 = n*(n-1)/2 = O(n^2).
Clearly, we can get O(n * sqrt(m)), where m is the absolute value range, by listing the pairs of divisors of each element against a hash of unique values in the array. This can be more efficient than O(n^2) depending on the input.
5 9 2 8
list divisor pairs (at most sqrt m iterations per element m)
5 (1,5)
9 (1,9), (3,3)
2 (1,2)
8 (1,8), (2,4) BINGO!
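A minimal sketch of this divisor-pair lookup, assuming the inputs are distinct positive integers (the function name is illustrative):

```python
from math import isqrt

def find_divisible_pair(values):
    """Return (numerator, denominator) where denominator divides numerator,
    or None if no such pair exists.

    Assumes distinct positive integers.
    """
    present = set(values)  # hash of the values in the array
    for x in values:
        # Enumerate divisor pairs (d, x // d) with d <= sqrt(x).
        for d in range(1, isqrt(x) + 1):
            if x % d:
                continue
            for divisor in (d, x // d):
                if divisor != x and divisor in present:
                    return x, divisor
    return None

print(find_divisible_pair([5, 9, 2, 8]))  # (8, 2)
```

Each element costs at most sqrt(m) trial divisions plus constant-time set lookups, which is where the O(n * sqrt(m)) bound comes from.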
If you prime factorise all the numbers in the array progressively into a tree, when we discover a completely factored number leaf while factoring another number, we know we've found the divisor.
However, given we don't know which number is the divisor, we do need to test all primes up to the divisor's largest factor. If m is the largest value in the array, trial division needs primes no larger than sqrt(m), and the number of primes below a value x is approximately x / ln(x). This means we will make at most n * sqrt(m) / ln(sqrt(m)) operations with very basic factorization and no optimization.
To be a little more specific, the algorithm should keep track of four things: a common tree of explored prime factors, the original number from the array, its current partial factorization, and its position in the tree.
For each prime number, we should test all numbers in the array (repeatedly to account for repeated factors). If the number divides evenly, we a) update the partial factorization, b) add/navigate to the corresponding child to the tree, c) if the partial factorization is 1, we have found the last factor and can indicate a leaf by adding the terminating '1' child, and d) if not, we can check for other numbers having left a child '1' to indicate they are completely factored.
When we find a child '1', we can identify the other number by multiplying out the partial factorization (e.g. all the parents up the tree) and exit.
For further optimization, we can cache the factorization (both partial and full) of numbers. We can also stop checking further factors of numbers that have a unique factor, narrowing the field of candidates over time.

Count numbers between L and R which have at least one prime factor between 1 and 50

Given very large numbers L and R (up to 10^18), how do I find the count of numbers between L and R that have at least one prime factor between 1 and N?
Note : N can be at MAX 50
I will just sketch a method, not working it out in detail.
If R-L is very small it is probably best to try it out one by one.
Otherwise use the inclusion exclusion principle: For explanation reasons I just consider the primes 2,3, and 5. Determine how many numbers can be divided by 2, 3, 5 (i.e. one of the primes), 6, 10, 15 (i.e. two of the primes), and 30 (i.e. all three of the primes). For a divisor k this is approximately (R-L)/k, taking the border conditions into account, we can get the exact count. Call the respective count c(k).
Now the total count of numbers divisible by at least one prime is:
c(2)+c(3)+c(5)-c(6)-c(10)-c(15)+c(30)
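The same sum can be written for all primes up to N. The sketch below assumes the range [L, R] is inclusive with L >= 1, and uses floor(R/k) - floor((L-1)/k) as the exact count c(k); the function names are illustrative. With N = 50 there are 15 primes, so at most 2^15 - 1 subsets are visited.

```python
from itertools import combinations
from math import isqrt, prod

def primes_up_to(n):
    """Primes <= n by trial division (n is at most 50 here)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, isqrt(p) + 1))]

def count_multiples(lo, hi, k):
    """Exact count c(k) of multiples of k in the inclusive range [lo, hi]."""
    return hi // k - (lo - 1) // k

def count_with_small_prime_factor(lo, hi, n):
    """Count x in [lo, hi] divisible by at least one prime <= n."""
    primes = primes_up_to(n)
    total = 0
    for size in range(1, len(primes) + 1):
        sign = 1 if size % 2 else -1  # add odd-sized subsets, subtract even
        for subset in combinations(primes, size):
            k = prod(subset)
            if k > hi:
                continue  # no multiples of k in range, contributes 0
            total += sign * count_multiples(lo, hi, k)
    return total

print(count_with_small_prime_factor(1, 100, 5))  # 74
```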

Select K unique random numbers from range with sum equal to S

I have a range
R = {0, ..., N}
and I would like to get K elements which have a sum equal to S, but the elements should be selected randomly.
So an easy brute force method would be to determine all element combinations containing K numbers that sum to S and pick one of the combinations at random.
I am trying to think about a recursive solution where a random number K0 is selected and then the problem reduces to finding (K-1) random numbers with sum equal to (S - K0), but this need not yield a solution.
Is there a better approach?
A sample would be:
R = {0,1,2,3,4,5}, S = 5, K = 2
Solutions: randomly pick one of {{1,4};{2,3};{0,5}}
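For reference, the brute-force baseline described above can be sketched in a few lines. The function name and the optional rng parameter are illustrative. It enumerates every K-element subset of {0, ..., N}, so it is only usable for small inputs, but it does pick uniformly among the valid subsets.

```python
import random
from itertools import combinations

def random_subset_with_sum(n, k, s, rng=random):
    """Pick uniformly among all k-element subsets of {0..n} that sum to s.

    Returns None when no such subset exists.
    """
    matches = [c for c in combinations(range(n + 1), k) if sum(c) == s]
    return rng.choice(matches) if matches else None

# For N = 5, K = 2, S = 5 this returns one of (0, 5), (1, 4), (2, 3).
print(random_subset_with_sum(5, 2, 5))
```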
In general, if K is large (and therefore N too) and S is not too small, the outcome is hard to predict because there are too many combinations.
Brute force: try every combination. You are sure to find a solution if one exists, but if there are more than, say, a billion combinations, it is practically impossible to list them all.
Your algorithm:
To choose at random, your algorithm is fine: take one number at random, then another, and so on.
But it makes an assumption: that a solution exists containing the numbers you have already picked, and you don't know that.
So what? If there are statistically many solutions, you may find one this way, or you may not.
Some ideas:
1 Use S/K (the average value the selected numbers must have).
If every number is < S/K, it is impossible.
If every number is > S/K, it is impossible.
So let's assume that there are numbers < S/K and others > S/K.
2 Keep only numbers <= S, which helps a lot if S is small.
3 If S is large and the numbers are small, chances are that many combinations exist.
Idea for an algorithm:
1 Take one number N1 at random.
2 If N1 < S/K, take another one N2 > S/K.
3 Calculate N1+N2: if it is < 2*S/K, take another one N3 > S/K; if not, take one < S/K.
4 Iterate: at step n, if the sum so far is < n*S/K, take another number > S/K; if not, take one < S/K.
5 You can get better precision by replacing S/K with (S - (N1 + N2 + ...)) / (K - n).
If at some step you cannot find a suitable number, backtrack.
Hope it helps.
I would start with the Dirichlet distribution (https://en.wikipedia.org/wiki/Dirichlet_distribution). Using it, you can sample uniformly distributed random numbers X_i in (0..1) such that Sum_i X_i = 1.
For S <= N, it is easy to see that sampling beyond S is useless and should be rejected outright.
So, combining with acceptance/rejection, something along the lines
1. Divide the interval [0...1] into S (or S+1 if 0 is allowed) equal bins.
2. Sample K numbers from the Dirichlet distribution.
3. Map the sampled numbers to bin indices, so you now have sampled integers which are all below or equal to S and have a sum equal to S.
4. If all integers are distinct, accept the sample; otherwise reject it and go to step 2.

What is the probability that all priorities are unique for Permute-By-Sorting algorithm?

I hope someone can help me answer the following question. Thanks!
Here is a pseudo code of Permute-By-Sorting algorithm:
Permute-By-Sorting (A)
    n = A.length
    let P[1..n] be a new array
    for i = 1 to n
        P[i] = Random (1,n^3)
    sort A, using P as sort keys
In the above algorithm, the array P represents the priorities of the elements in array A. Line 4 (P[i] = Random (1,n^3)) chooses a random number between 1 and n^3.
The question is what is the probability that all priorities in P are unique? and how do I get the probability?
To reconcile the answers already given: for choice i = 0, ..., n - 1, given that no duplicates have been chosen yet, there are n^3 - i non-duplicate choices of n^3 total for the ith value. Thus the probability is the product for i = 0, ..., n - 1 of (1 - i/n^3).
sdcwc is using a union bound to lowerbound this probability by 1 - O(1/n). This estimate turns out to be basically right. The proof sketch is that (1 - i/n^3) is exp(-i/n^3 + O(i^2/n^6)), so the product is exp(-O(n^2)/n^3 + O(n^-3)), which is greater than or equal to 1 - O(n^2)/n^3 + O(n^-3) = 1 - O(1/n). I'm sure the fine folks on math.SE would be happy to do this derivation "properly" for you.
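A small numerical check of both statements, using exact rational arithmetic. This is only a sketch and the function names are illustrative. It evaluates the product of (1 - i/n^3) for i = 0, ..., n - 1 and compares it with the union-bound estimate 1 - C(n,2)/n^3, which is at least 1 - 1/n.

```python
from fractions import Fraction

def prob_all_unique(n):
    """Exact probability that n draws from 1..n^3 are all distinct."""
    m = n ** 3
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(m - i, m)  # the i-th draw must avoid i earlier values
    return p

def union_bound(n):
    """Lower bound: 1 minus (number of pairs) times (collision chance per pair)."""
    pairs = n * (n - 1) // 2
    return 1 - Fraction(pairs, n ** 3)

for n in (2, 5, 10, 100):
    exact = prob_all_unique(n)
    assert union_bound(n) <= exact <= 1
    print(n, float(exact), float(union_bound(n)))
```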
Others have given you the probability calculation, but I think you may be asking the wrong question.
I assume the reason you're asking about the probability of the priorities being unique, and the reason for choosing n^3 in the first place, is because you're hoping they will be unique, and choosing a large range relative to n seems to be a reasonable way of achieving uniqueness.
It is much easier to ensure that the values are unique. Simply populate the array of priorities with the numbers 1 .. n and then shuffle them with the Fisher-Yates algorithm (aka algorithm P from The Art of Computer Programming, volume 2, Seminumerical Algorithms, by Donald Knuth).
The sort would then be carried out with known unique priority values.
(There are also other ways of going about getting a random permutation. It is possible to generate the nth lexicographic permutation of a sequence using factoradic numbers (or, the factorial number system), and so generate the permutation for a randomly chosen value in [1 .. n!].)
You are choosing n numbers from 1...n^3 and asking what is the probability that they are all unique.
There are (n^3) P n = (n^3)!/(n^3-n)! ways to choose the n numbers uniquely, and (n^3)^n ways to choose the n-numbers total.
So the probability of the numbers being unique is just the first equation divided by the second, which gives
        (n^3)!
----------------------
(n^3 - n)! * (n^3)^n
Let A_ij be the event: the i-th and j-th elements collide. Obviously P(A_ij) = 1/n^3.
There are at most n^2 pairs, therefore the probability of at least one collision is at most 1/n.
If you are interested in exact thing, see BlueRaja's answer, but in randomized algorithms it is usually enough to give this type of bound.
So the sort part is irrelevant
Assuming the "Random" is real random, the probability is just
       (n^3)!
--------------------
(n^3 - n)! * n^(3n)

Greatest GCD between some numbers

We've got some nonnegative numbers. We want to find the pair with maximum gcd. Actually, this maximum is more important than the pair!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.
You can use the Euclidean Algorithm to find the GCD of two numbers.
/* Euclidean algorithm: returns gcd(a, b) for nonnegative a and b */
int gcd(int a, int b)
{
    while (b != 0)
    {
        int m = a % b;
        a = b;
        b = m;
    }
    return a;
}
If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, with indexes 1 to the max input. O(1)
For each value, increment the count at every index which is a factor of that value (make sure the counts don't wrap around). O(N).
Starting at the end of the array, scan back until you find a count >= 2. O(1)
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
4 2 1 1 2 0 0 0 0 0 0 0 0 0 1
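The table can be reproduced with a short sketch, assuming at least two positive integers as input (the function names are illustrative). Divisors are found here by plain trial division up to the square root, which is one of several possible ways to factorise.

```python
from math import isqrt

def divisor_counts(values):
    """counts[d] = how many input values are divisible by d (index 0 unused)."""
    counts = [0] * (max(values) + 1)
    for v in values:
        for d in range(1, isqrt(v) + 1):
            if v % d == 0:
                counts[d] += 1
                if d != v // d:  # avoid double counting a square root
                    counts[v // d] += 1
    return counts

def max_gcd_from_counts(values):
    """Largest d that divides at least two of the input values."""
    counts = divisor_counts(values)
    for d in range(len(counts) - 1, 0, -1):
        if counts[d] >= 2:
            return d
    return None

print(divisor_counts([2, 4, 5, 15])[1:])
# [4, 2, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(max_gcd_from_counts([2, 4, 5, 15]))  # 5
```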
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends what data structure you use. Each value has O(f(k)) factors, where k is the max value and I can't remember what the function f is...
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
The optimisations I can think of are:
1) start with the two biggest numbers since they are likely to have most prime factors and thus likely to have the most shared prime factors (and thus the highest GCD).
2) When calculating the GCDs of other pairs you can stop your Euclidean algorithm loop if you get below your current greatest GCD.
Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)
There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:
2^20 3^13 5^9 7^2*11^4 7^4*11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).
With some constraints, e.g. the numbers in the array are within a given range, say 1 to 1e7, it is doable in O(N log N) / O(MAX * log MAX), where MAX is the maximum possible value in A.
This is inspired by the sieve algorithm; I came across it in a Hackerrank challenge, where it is done for two arrays. Check their editorial.
find min(A) and max(A) - O(N)
create a binary mask marking which values of the range appear in A, for O(1) lookup; O(N) to build; O(MAX_RANGE) storage.
for every number a in the range (1, max(A)):
    for aa = a; aa <= max(A); aa += a:
        if aa is in A, increment a counter for a; once the counter reaches 2 (i.e., two elements of A are divisible by a), compare a to the current max_gcd;
store the top two candidates for each GCD candidate.
could also ignore elements which are less than the current max_gcd;
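A minimal sketch of the sieve idea, under the assumption that the input holds at least two positive integers of bounded size. It deviates from the outline in two small ways: it stores occurrence counts rather than a binary mask, so that repeated values are handled, and it scans candidates from the largest down, so it can stop at the first candidate with two multiples in A. The function name is illustrative.

```python
def max_gcd_sieve(values):
    """Largest g such that at least two input values are multiples of g."""
    limit = max(values)
    occurrences = [0] * (limit + 1)
    for v in values:
        occurrences[v] += 1
    # Try candidates from largest to smallest; the first hit is the answer.
    for g in range(limit, 0, -1):
        multiples_seen = 0
        for multiple in range(g, limit + 1, g):
            multiples_seen += occurrences[multiple]
            if multiples_seen >= 2:
                return g
    return None

print(max_gcd_sieve([2, 4, 5, 15]))  # 5
```

The inner loop visits limit/g multiples for each g, so the total work is a harmonic sum, roughly MAX * log MAX.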
Previous answer:
Still O(N^2) -- sort the array; should eliminate some of the unnecessary comparisons;
from math import gcd

def max_pair_gcd(a):
    # assuming you want pairs of distinct elements
    a = sorted(a)
    max_gcd = 1
    for ii in range(len(a) - 1, -1, -1):
        if a[ii] <= max_gcd:
            break  # no remaining pair can beat max_gcd
        for jj in range(ii - 1, -1, -1):
            if a[jj] <= max_gcd:
                break  # gcd(a[ii], a[jj]) <= a[jj]
            max_gcd = max(max_gcd, gcd(a[ii], a[jj]))
    return max_gcd
This should save some unnecessary computation.
There is a solution that would take O(n):
Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a factor p_j repeated twice, and if two other numbers also contain this factor p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine the 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out the gcd(m/a_i, a_i), i!=3, which are less than or equal to 5 (in the example above, this filters out a_0=3 and a_1=5, leaving only a_2=15, whose bound is 15, to check).
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) as g_i, and max(gcd(a_i, a_j),j=1..n, j!=i) as r_i. What I say above is g_i=x_i*r_i, and x_i is an integer. It is obvious that r_i <= g_i, so in n gcd operations, we get an upper bound for r_i for all i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, where j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains all gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results will divide g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
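The candidate filtering described above might be sketched as follows, assuming at least two positive integers. The function name is illustrative, and the variable names g and r follow the text. Each g_i is an upper bound on r_i, so candidates are visited in decreasing order of g_i and the loop stops as soon as the next bound cannot beat the best exact value found so far.

```python
from math import gcd, prod

def max_pair_gcd_filtered(a):
    """Maximum gcd over all pairs, using g_i = gcd(m / a_i, a_i) as a filter."""
    m = prod(a)
    g = [gcd(m // x, x) for x in a]  # g[i] >= gcd(a[i], a[j]) for every j != i
    order = sorted(range(len(a)), key=lambda i: g[i], reverse=True)
    best = 0
    for i in order:
        if g[i] <= best:
            break  # every remaining candidate has an upper bound <= best
        # r_i: the exact best gcd involving a[i], costing n - 1 gcd operations
        r = max(gcd(a[i], a[j]) for j in range(len(a)) if j != i)
        best = max(best, r)
    return best

print(max_pair_gcd_filtered([3, 5, 15, 25]))  # 5
print(max_pair_gcd_filtered([2, 4, 5, 15]))   # 5
```

Note that the product m grows quickly, so this relies on arbitrary-precision integers; in a language with fixed-width integers the product would overflow for all but tiny inputs.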
pseudocode
function getGcdMax(array[])
    arrayUB=upperbound(array)
    if (arrayUB<1)
        error
    pointerA=0
    pointerB=1
    gcdMax=0
    do
        gcdMax=MAX(gcdMax,gcd(array[pointerA],array[pointerB]))
        pointerB++
        if (pointerB>arrayUB)
            pointerA++
            pointerB=pointerA+1
    until (pointerB>arrayUB)
    return gcdMax
