3-approximation algorithm (randomised or deterministic)

Consider the 3 colorful sets problem. We are given m sets of size three over the elements {1, ..., n}; in other words, sets S1, S2, ..., Sm where, for every i, Si = {x, y, z} for some x, y, z ∈ {1, ..., n}. I want to pick a set of elements E ⊆ {1, ..., n} that maximizes the number of sets containing exactly one element of E, namely, to maximize |{i : |Si ∩ E| = 1}|. A solution can be a polynomial-time 3-approximation algorithm.
I am thinking of either a randomized algorithm that guarantees the approximation ratio or a deterministic one. I have some ideas, but I am not sure how to actually implement them. Any help would be appreciated.
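One possible direction, which is not from the original post: if every element is put into E independently with probability 1/3, a set of three distinct elements has exactly one element in E with probability 3 · (1/3) · (2/3)^2 = 4/9, so the expected score is 4m/9, and the optimum is at most m. The sketch below derandomizes this with the method of conditional expectations. The names pick_elements, exactly_one_prob and score are hypothetical, and the code assumes each set holds three distinct elements.

```python
from fractions import Fraction

P = Fraction(1, 3)  # inclusion probability used by the randomized scheme


def exactly_one_prob(chosen, undecided):
    """P(set ends with exactly one element in E), given that `chosen` of its
    decided elements are in E and `undecided` elements are still random."""
    if chosen >= 2:
        return Fraction(0)
    if chosen == 1:
        return (1 - P) ** undecided
    if undecided == 0:
        return Fraction(0)
    return undecided * P * (1 - P) ** (undecided - 1)


def pick_elements(n, sets):
    """Decide elements 1..n one by one, never lowering the conditional expectation."""
    E, decided = set(), set()

    def expected():
        return sum(exactly_one_prob(len(s & E), len(s - decided)) for s in sets)

    for e in range(1, n + 1):
        decided.add(e)
        without_e = expected()
        E.add(e)
        with_e = expected()
        if without_e > with_e:
            E.discard(e)
    return E


def score(sets, E):
    """Number of sets that contain exactly one element of E."""
    return sum(1 for s in sets if len(s & E) == 1)
```

Under these assumptions the returned E scores at least 4m/9, which is within a factor 9/4 (and hence within a factor 3) of the optimum. Each of the n decisions rescans all m sets, so the sketch runs in O(n·m) time.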

Related

How To Find K-th Smallest Element in Multiset-sum?

Need some help designing an algorithm to solve this problem.
Let a and b be integers with a ≤ b, and let [a,b] denote the set {a, a + 1, a + 2, ..., b}. Suppose we are given n such sets, [a1,b1],...[an,bn], their multiset-sum is
S = {a1, a1 + 1,..., b1, a2,a2 + 1,...,b2,...,an,an + 1, ..., bn}
For example, the multiset-sum of [5,25], [3,10], and [8,12], is
{3,4,5,5,6,6,7,7,8,8,8,9,9,9,10,10,10,...,25}
Given the sets [a1, b1], ..., [an, bn] such that 0 ≤ ai, bi ≤ N and an integer k > 0, design an efficient algorithm that outputs the k-th smallest element in S, the multiset-sum of the sets. Determine the running time of the algorithm in terms of n and N.
I've already designed two helper algorithms called FindElementsBefore(x, [a1,b1]...[an,bn]) and FindElementsAfter(x, [a1,b1]...[an,bn]). These both accept an element x and each of the sets and return the number of elements in S less than x and greater than x respectively.
I've been told by my professor that using these two helper methods, I should be able to solve the above problem, but I am absolutely stumped. How do I solve this?
Use a binary search.
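You already know the smallest and largest values in your multiset-sum, so you have a lower and an upper bound for the k-th smallest element. Now binary-search between those bounds: if FindElementsBefore(mid, ...) < k, the k-th smallest element is at least mid, so continue in the upper half; otherwise continue in the lower half. If each helper call takes O(n) time, the whole search takes O(n log N).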
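A minimal sketch of that binary search, assuming 1 ≤ k ≤ |S| and integer endpoints. Here count_before is a stand-in for the asker's FindElementsBefore (its closed form is my assumption, not the asker's code), and kth_smallest is a hypothetical name. The search looks for the largest x with fewer than k elements below it, which is exactly the k-th smallest element.

```python
def count_before(x, intervals):
    """Number of elements of the multiset-sum that are strictly less than x."""
    return sum(max(0, min(b, x - 1) - a + 1) for a, b in intervals)


def kth_smallest(k, intervals):
    """Return the k-th smallest element (1-indexed) of the multiset-sum."""
    lo = min(a for a, _ in intervals)  # count_before(lo) == 0 < k
    hi = max(b for _, b in intervals)
    while lo < hi:
        mid = (lo + hi + 1) // 2       # round up so the loop always makes progress
        if count_before(mid, intervals) < k:
            lo = mid                   # the answer is at least mid
        else:
            hi = mid - 1               # at least k elements lie below mid
    return lo
```

For the example sets [5,25], [3,10] and [8,12], kth_smallest(9, ...) returns 8, matching the listed multiset-sum.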

Regarding subsequences - CLRS

I am reading chapter 15 from CLRS and came across this definition of a subsequence:
A subsequence of a given sequence is just the given sequence with zero
or more elements left out.
Later it is said that:
Each subsequence of X corresponds to a subset of the indices {1, 2,
3...m} of X. Because X has 2^m subsequences...
I don't see how X can have 2^m subsequences. From what I understand, if
X = {A, B}, then the subsequences of X can be {A}, {B} and {A, B} so we have 3 subsequences and not 2^2. Could someone please show me what I am missing here?
You are missing the empty subsequence.
For any set S, the power set of S is the set of all subsets of S, and it has 2^|S| elements, where |S| is the cardinality of the set, i.e. the number of elements in the set.
In your case, each subsequence corresponds to a subset of the indices {1, ..., m}, so the number of subsequences equals the size of the power set of the index set, which is 2^m.
In your example X = {A, B}, the possible subsequences are {empty, A, B, AB}
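To make the counting concrete, here is a small sketch that enumerates subsequences by choosing index subsets. The function name subsequences is hypothetical, and itertools.combinations yields the index tuples in increasing order, which preserves the original order of the elements.

```python
from itertools import combinations


def subsequences(seq):
    """All subsequences of seq, one per subset of its index positions."""
    result = []
    for size in range(len(seq) + 1):
        for idx in combinations(range(len(seq)), size):
            result.append(tuple(seq[i] for i in idx))
    return result
```

For seq = "AB" this gives four tuples: (), ('A',), ('B',) and ('A', 'B'), with the empty tuple being the one that is easy to overlook.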

maximum ratio of a min subset and a max subset of size k in a collection of n value pairs

So, say you have a collection of value pairs on the form {x, y}, say {1, 2}, {1, 3} & {2, 5}.
Then you have to find a subset of k pairs (in this case, say k = 2), such that the sum of all the x in the subset divided by the sum of all the y in the subset is as high as possible.
Could you point me in the direction for relevant theory or algorithms?
It's kind of like maximum subset sum, but since the pairs are "bound" to each other it introduces a restriction that changes it from problems known to me.
Initially I thought that a simple greedy approach could work here, but commenters pointed out some counterexamples.
Instead, I think a bisection approach should work.
Suppose we want to know whether it is possible to achieve a ratio of at least g.
We need to add up a selection of k vectors (x, y) so that the sum ends up on the right side of a line of gradient g through the origin.
If we project each vector perpendicular to this line to get values p1, p2, p3, ..., then the summed vector is on the right side of the line if and only if the sum of the chosen p values is non-negative.
With the projected values, the best choice for a fixed g is to take the k largest, since that maximizes their sum.
We can then use bisection on g to find the highest ratio that is achievable.
Mathematical justification
Suppose we want the ratio to be at least g and the sum of the chosen y values is positive, i.e.
(x1+x2+x3)/(y1+y2+y3) >= g
<=> (x1+x2+x3) >= g(y1+y2+y3)
<=> (x1-g.y1) + (x2-g.y2) + (x3-g.y3) >= 0
<=> p1 + p2 + p3 >= 0
where pi is defined to be xi-g.yi.
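The bisection can be sketched as follows, assuming every y is strictly positive so that the ratio of sums lies between the smallest and largest individual ratio x/y. The name max_ratio and the fixed iteration count are my own choices, and the result is a floating-point approximation rather than an exact fraction.

```python
def max_ratio(pairs, k, iters=100):
    """Approximate the best sum(x)/sum(y) over all subsets of exactly k pairs."""

    def feasible(g):
        # For a fixed g, the k largest values of x - g*y maximize the sum.
        p = sorted((x - g * y for x, y in pairs), reverse=True)
        return sum(p[:k]) >= 0

    lo = min(x / y for x, y in pairs)  # always achievable
    hi = max(x / y for x, y in pairs)  # can never be exceeded
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For the pairs {1, 2}, {1, 3}, {2, 5} with k = 2, the three possible subsets give 2/5, 3/7 and 3/8, and the search converges to 3/7.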

Select k numbers maximizing sum of pairwise xor

Given a range [l, r] (where l < r), and a number k (where k <= r - l), I want to select a set S of k distinct numbers in [l, r] which maximizes the sum of pairwise xor distances, where d(a, b) is the number of set bits in a xor b (the Hamming distance). For example, if [l, r] = [2, 10] and k = 3 and we choose S = {4, 5, 6}, the sum is d(4, 5) + d(4, 6) + d(5, 6) = 1 + 1 + 2 = 4.
Here's my thinking so far: in [l, r], for each bit index i less than or equal to the index of the highest set bit in r, the number of pairs in S whose xor has the ith bit set is equal to j * (k-j), where j is the count of the elements in S with the ith bit set. To optimize this we want to select S such that, for each bit i, S contains about k/2 elements with the ith bit set. This is easy for k = 2, but I'm stuck on generalizing this for k > 2.
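The j * (k-j) counting argument can be checked with a short sketch, assuming d(a, b) in the example denotes the Hamming distance between a and b and that all numbers are non-negative. The function names are hypothetical, and best_subset_bruteforce is exponential in k, so it only serves to test ideas on tiny ranges.

```python
from itertools import combinations


def pairwise_distance_sum(nums):
    """Sum over all pairs of popcount(a ^ b), computed bit by bit."""
    k = len(nums)
    total = 0
    for i in range(max(nums).bit_length()):
        j = sum((v >> i) & 1 for v in nums)  # elements with bit i set
        total += j * (k - j)                 # pairs that differ in bit i
    return total


def pairwise_distance_sum_naive(nums):
    """Reference implementation that looks at every pair directly."""
    return sum(bin(a ^ b).count("1") for a, b in combinations(nums, 2))


def best_subset_bruteforce(l, r, k):
    """Try every k-subset of [l, r]; feasible only for very small inputs."""
    return max(combinations(range(l, r + 1), k), key=pairwise_distance_sum)
```

For S = {4, 5, 6}, bits 0 and 1 each contribute 1 * 2 = 2 and bit 2 contributes 3 * 0 = 0, which gives the total of 4 from the example.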
At first glance it seems that there is no closed-form solution for this problem. It looks like it could be an NP-hard optimization problem, in which case no polynomial-time algorithm would be known, but I have no proof of that.
As is almost always possible, one can brute-force through the feasible space.
Intuitively, I can suggest looking into Locality Sensitive Hashing. In LSH one normally tries to find similarities between two sets. But in your case, you can abuse this algorithm in the following sense.
The domain is subdivided into a few buckets.
You randomly sample points in the space [l, r].
Points that are likely candidates (large Hamming distance) are placed in the buckets.
In the end you brute-force within the most promising bucket.
One can expect that points with large Hamming distances end up in the same neighborhood (hence the name Locality Sensitive Hashing). However, it is just an idea.

Find sum in array equal to zero

Given an array of integers, find a set of at least one integer which sums to 0.
For example, given [-1, 8, 6, 7, 2, 1, -2, -5], the algorithm may output [-1, 6, 2, -2, -5] because this is a subset of the input array, which sums to 0.
The solution must run in polynomial time.
You'll have a hard time doing this in polynomial time, as the problem is known as the Subset sum problem, and is known to be NP-complete.
If you do find a polynomial solution, though, you'll have solved the "P = NP?" problem, which will make you quite rich.
The closest you get to a known polynomial solution is an approximation, such as the one listed on Wikipedia, which will try to get you an answer with a sum close to, but not necessarily equal to, 0.
This is the Subset sum problem. It's NP-complete, but there is a pseudo-polynomial time algorithm for it; see the wiki.
The problem can be solved in polynomial time if the sum of the items in the set is polynomially related to the number of items. From the wiki:
The problem can be solved as follows
using dynamic programming. Suppose the
sequence is
x1, ..., xn
and we wish to determine if there is a
nonempty subset which sums to 0. Let N
be the sum of the negative values and
P the sum of the positive values.
Define the boolean-valued function
Q(i,s) to be the value (true or false)
of
"there is a nonempty subset of x1, ..., xi which sums to s".
Thus, the solution to the problem is
the value of Q(n,0).
Clearly, Q(i,s) = false if s < N or s > P,
so these values do not need to be stored or computed. Create an array to
hold the values Q(i,s) for 1 ≤ i ≤ n
and N ≤ s ≤ P.
The array can now be filled in using a
simple recursion. Initially, for N ≤ s
≤ P, set
Q(1,s) := (x1 = s).
Then, for i = 2, …, n, set
Q(i,s) := Q(i − 1,s) or (xi = s) or Q(i − 1,s − xi) for N ≤ s ≤ P.
For each assignment, the values of Q
on the right side are already known,
either because they were stored in the
table for the previous value of i or
because Q(i − 1,s − xi) = false if s −
xi < N or s − xi > P. Therefore, the
total number of arithmetic operations
is O(n(P − N)). For example, if all
the values are O(n^k) for some k, then
the time required is O(n^(k+2)).
This algorithm is easily modified to
return the subset with sum 0 if there
is one.
This solution does not count as
polynomial time in complexity theory
because P − N is not polynomial in the
size of the problem, which is the
number of bits used to represent it.
This algorithm is polynomial in the
values of N and P, which are
exponential in their numbers of bits.
A more general problem asks for a
subset summing to a specified value
(not necessarily 0). It can be solved
by a simple modification of the
algorithm above. For the case that
each xi is positive and bounded by the
same constant, Pisinger found a linear
time algorithm.[2]
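A minimal sketch of the dynamic program quoted above, written with a dictionary instead of a full Q(i,s) table. The name zero_sum_subset is hypothetical. Storing one witness list per reachable sum is my simplification for returning the subset; it costs extra memory and copying compared with the O(n(P − N)) table, so treat it as an illustration rather than a tuned implementation.

```python
def zero_sum_subset(xs):
    """Return a non-empty sublist of xs that sums to 0, or None if there is none."""
    # reach[s] holds one non-empty selection of the items seen so far with sum s,
    # i.e. s is a key exactly when Q(i, s) is true for the current i.
    reach = {}
    for x in xs:
        new = {x: [x]}                             # the case (xi = s)
        for s, items in reach.items():
            new.setdefault(s + x, items + [x])     # the case Q(i-1, s - xi)
        for s, items in new.items():
            reach.setdefault(s, items)             # Q(i-1, s) entries are kept
        if 0 in reach:
            return reach[0]
    return None
```

The number of keys never exceeds P − N + 1, which is where the pseudo-polynomial bound comes from. On the question's example array the function returns some subset with sum 0, not necessarily the one listed in the question.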
This is the well-known Subset sum problem, which is NP-complete.
If you are interested in algorithms, you are most probably a math enthusiast, so I advise you to look at
Subset Sum problem in mathworld
and here you can find an algorithm for it:
Polynomial time approximation algorithm
initialize a list S to contain one element 0.
for each i from 1 to N do
    let T be a list consisting of xi + y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        // trim the list by eliminating numbers close to one another
        if y < (1 - c/N)z, set y = z and add z to S
if S contains a number between (1 - c)s and s, output yes, otherwise no
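The pseudocode above translates fairly directly into Python. This sketch assumes all xi are positive integers, s is the non-negative target sum and 0 < c < 1. The name approx_subset_sum is hypothetical, and discarding sums larger than s is my addition (the pseudocode as quoted does not say so) to keep the list short.

```python
def approx_subset_sum(xs, s, c):
    """Answer True if the trimmed list holds a subset sum in [(1 - c) * s, s]."""
    n = len(xs)
    S = [0]
    for x in xs:
        # U is the sorted union of S and S shifted by x; sums above s are
        # dropped here, which is an assumption beyond the quoted pseudocode.
        U = sorted(v for v in set(S) | {x + y for y in S} if v <= s)
        y = U[0]
        S = [y]
        for z in U:
            # Keep z only if it is not too close to the last kept value y.
            if y < (1 - c / n) * z:
                y = z
                S.append(z)
    return any((1 - c) * s <= v <= s for v in S)
```

Every value kept in S is a genuine subset sum, so a True answer always comes with a real subset whose sum lies in the stated window; the trimming only affects how many candidates are remembered.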

Resources