Maximally set-covering set of k elements - algorithm

Given a universe of elements U = {e_1, ..., e_n}, I have a collection of subsets of these elements, C = {s_1, ..., s_m}. Now, given a positive integer k, I want to find a set of k elements that covers a maximal number of the subsets (a subset is covered when all of its elements are among the k chosen).
A concrete example: I have a collection of songs, and each song is composed of notes. If I only know how to play k distinct notes, which k notes would allow me to play the maximal number of songs, and what is this maximal number?
What is this problem called?

Brute force approach:
First, find all distinct combinations of size k from the n elements (order does not matter, so combinations rather than permutations).
Then, for every combination, count the number of subsets it covers.
And remember: a subset such as s_1 is covered only if all of its elements are among the k you take; picking only some of a subset's elements covers part of the subset, not the whole thing.
Then pick the combination that gives the maximum answer.
But the brute force approach only works when k is small, say less than 10.
Since the number of combinations grows exponentially and no essentially better solution is known, the problem is NP-hard; it can be shown that the vertex cover problem reduces to yours.
Consider the subsets as trees and the elements as nodes.
Your problem is then to select k elements such that they fully cover the maximum number of trees.
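For small k, the brute force above is straightforward to write down. A minimal sketch in Python (my own illustration; the function name and the song data are invented for the example):

```python
# Brute force: try every k-element combination and count the subsets
# it covers fully. Exponential in k, so only viable for small k.
from itertools import combinations

def best_k_elements(universe, subsets, k):
    best, best_count = None, -1
    for combo in combinations(universe, k):
        chosen = set(combo)
        # A subset counts only if ALL of its elements were chosen.
        covered = sum(1 for s in subsets if s <= chosen)
        if covered > best_count:
            best, best_count = combo, covered
    return best, best_count

# The song example: notes are the elements, songs are the subsets.
songs = [{"C", "E", "G"}, {"C", "E"}, {"D", "F"}]
print(best_k_elements(["C", "D", "E", "F", "G"], songs, 3))
# -> (('C', 'E', 'G'), 2): those three notes cover the first two songs
```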

Related

Ordered Knapsack Problem Correctness/Proof

A thief is given the choice of n objects to steal, but only has one knapsack with a capacity of
taking M weight. Each object i has weight w_i, and profit p_i. Suppose he also knows the following:
the order of these items when sorted by increasing weight is the same as their order when sorted
by decreasing value. Give a greedy algorithm to find an optimal solution to this variant of the
knapsack problem. Prove the correctness and running time.
So the greedy algorithm I came up with was to sort the items by increasing weight, which is also decreasing value. This means that the profit per weight is in decreasing order. So the thief keeps taking the highest-valued remaining item until the next item would push the total weight over M. The running time would be O(n log n), since sorting takes O(n log n) and iterating through the list takes O(n). The part I am stuck on is the proof of correctness. Here is my proof so far:
Suppose there is an instance for which the solution stated above (referred to as GA) is not optimal. Let the optimal solution be referred to as OS, and let the items taken by OS be sorted in increasing value. Since OS is better than GA, the profit earned from GA is less than the profit earned from OS. Since GA takes the item with the highest profit/weight ratio, its first element i must be greater than or equal to the first element of OS. Because OS is better, there must exist an item i in OS that is greater than or equal to some item j in GA. But because GA and OS draw from the same set, and GA always takes the item with the highest profit/weight, there cannot be an i in OS that is greater than a j in GA.
Can anyone help with the proof? Thanks
Your approach to the solution is valid, and the reasoning on the running time is correct. In the sequel, suppose that the input is "de-trivialized" in the sense that every occurring object actually fits into the knapsack and that it is impossible to select the entire input.
The ordering of the items generated by the sorting is both
decreasing in value
increasing in weight
which makes this a special case of the general knapsack problem. The argument for the proof of correctness is as follows. Let i' denote the breaking index, which is the index of the first item in the sorted sequence that is rejected by the greedy algorithm. For clarity, call the corresponding object the breaking object. Note that
w_j > w_i' for each j > i'
holds, which means that the greedy algorithm also rejects every object succeeding the breaking object (as it does not fit into the knapsack, just like the breaking object).
In total, the greedy algorithm selects a prefix of the sorted sequence; we aim at showing that any optimal solution (which we consider fixed in the sequel) is the same prefix.
Note that the optimal solution, as it is optimal, does not leave space for an additional object.
Aiming at a contradiction, let k be the minimal index which occurs in the greedy solution but not in the optimal solution. As object k cannot be added to the optimal solution, there must (via minimality of k) be some item in the optimal solution with an index
k' > k
which permits an exchange of items in the optimal solution. As
w_k < w_k' and p_k > p_k'
hold, object k' can be replaced by object k in the optimal solution, which yields a solution with profit larger than the one of the optimal solution, which is a contradiction to its optimality.
Hence, there is no item in the greedy solution which is missing in the optimal solution, which means that the greedy solution is a subset of the optimal solution. On the other hand, the greedy solution is maximal with respect to inclusion, which means that the optimal solution cannot contain an item which is missing in the greedy solution.
Note that the greedy algorithm is also useful for the general knapsack problem; taking the better of the greedy solution and a single item with maximum profit yields an approximation algorithm with ratio 2.
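For concreteness, here is a short sketch of the greedy algorithm under discussion (my own illustration; it assumes the input has the special property that sorting by increasing weight also sorts by decreasing profit):

```python
def greedy_knapsack(items, M):
    """items: list of (weight, profit) pairs; M: knapsack capacity."""
    # O(n log n): by the problem's assumption, this ordering is
    # simultaneously increasing in weight and decreasing in profit.
    items = sorted(items, key=lambda wp: wp[0])
    taken, weight, profit = [], 0, 0
    for w, p in items:
        if weight + w > M:
            break  # the breaking object; all later objects are heavier and rejected too
        taken.append((w, p))
        weight += w
        profit += p
    return taken, profit

# Weights increase while profits decrease, as the problem guarantees.
print(greedy_knapsack([(1, 10), (2, 8), (3, 5), (7, 2)], 6))
# -> ([(1, 10), (2, 8), (3, 5)], 23)
```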

Finding number of length 3 increasing (or decreasing) subsequences?

Given an array of positive integers, how can I find the number of increasing (or decreasing) subsequences of length 3? E.g. [1,6,3,7,5,2,9,4,8] has 24 of these, such as [3,4,8] and [6,7,9].
I've found solutions for length k, but I believe those solutions can be made more efficient since we're only looking at k = 3.
For example, a naive O(n^3) solution can be made faster by looping over the elements, counting for each element how many elements to its left are smaller and how many to its right are larger, multiplying these two counts, and adding the product to a sum. This is O(n^2), which obviously doesn't translate easily to k > 3.
You can improve on this by looping over the elements: for every element, count how many elements to its left are smaller using a segment tree, which answers each such query in O(log(n)); in the same way, count how many elements to its right are larger. Then multiply these two counts and add the product to the sum. This is O(n*log(n)) overall.
You can learn more about segment tree algorithm over here:
Segment Tree Tutorial
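Here is a minimal sketch of that idea (my own illustration); it uses a Fenwick (binary indexed) tree, which supports the same prefix-count queries in O(log n) as the segment tree mentioned above:

```python
def count_increasing_triplets(a):
    # Compress values to ranks 1..m so they can index the trees.
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(a)))}
    m = len(ranks)

    def update(tree, i, delta):
        while i <= m:
            tree[i] += delta
            i += i & (-i)

    def query(tree, i):  # how many recorded values have rank <= i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & (-i)
        return s

    left, right = [0] * (m + 1), [0] * (m + 1)
    for v in a:
        update(right, ranks[v], +1)  # everything starts "to the right"

    total = 0
    for v in a:
        r = ranks[v]
        update(right, r, -1)  # the current element is no longer to the right
        smaller_left = query(left, r - 1)
        greater_right = query(right, m) - query(right, r)
        total += smaller_left * greater_right
        update(left, r, +1)
    return total

print(count_increasing_triplets([1, 6, 3, 7, 5, 2, 9, 4, 8]))  # -> 24
```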
For each element curr, count how many elements to its left and to its right have smaller and greater values.
This curr element can then be the middle of less[left] * greater[right] + greater[left] * less[right] triplets (the first term counts increasing triplets, the second decreasing ones).
Complexity Considerations
The straightforward approach to counting elements on the left and right yields a quadratic solution. You might be tempted to use a set or something similar to count soldiers in O(log n) time.
You can find a soldier's rating in a set in O(log n); however, counting the elements before and after it will still be linear, unless you implement a BST in which each node tracks the count of its left children.
Check the solution here:
https://leetcode.com/problems/count-number-of-teams/discuss/554795/C%2B%2BJava-O(n-*-n)
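For reference, the quadratic version of this counting as a sketch (my own illustration; the function name follows the LeetCode problem linked above):

```python
def count_teams(ratings):
    n, total = len(ratings), 0
    for j in range(n):  # treat each element as the middle of a triplet
        less_left = sum(ratings[i] < ratings[j] for i in range(j))
        greater_left = sum(ratings[i] > ratings[j] for i in range(j))
        less_right = sum(ratings[k] < ratings[j] for k in range(j + 1, n))
        greater_right = sum(ratings[k] > ratings[j] for k in range(j + 1, n))
        # increasing triplets + decreasing triplets through position j
        total += less_left * greater_right + greater_left * less_right
    return total

print(count_teams([2, 5, 3, 4, 1]))  # -> 3
```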

Fewest subsets with sum less than N

I have a specific sub-problem for which I am having trouble coming up with an optimal solution. This problem is similar to the subset sum group of problems as well as space filling problems, but I have not seen this specific problem posed anywhere. I don't necessarily need the optimal solution (as I am relatively certain it is NP-hard), but an effective and fast approximation would certainly suffice.
Problem: Given a list of positive valued integers find the fewest number of disjoint subsets containing the entire list of integers where each subset sums to less than N. Obviously no integer in the original list can be greater than N.
In my application I have many lists and I can concatenate them into columns of a matrix as long as they fit in the matrix together. For downstream purposes I would like to have as little "wasted" space in the resulting ragged matrix, hence the space filling similarity.
Thus far I am employing a greedy-like approach: processing from the largest integers down, I find the largest integer that fits into the current subset under the limit N. Once even the smallest remaining integer no longer fits into the current subset, I proceed to the next subset in the same way, until all numbers are exhausted. This almost certainly does not find the optimal solution, but it was the best I could come up with quickly.
BONUS: My application actually requires batches, where there is a limit on the number of subsets in each batch (M). Thus the larger problem is to find the fewest batches where each batch contains M subsets and each subset sums to less than N.
Straight from Wikipedia (with some bold amendments):
In the bin packing problem, objects [Integers] of different volumes [values] must be
packed into a finite number of bins [sets] or containers each of volume V [summation of the subset < V] in
a way that minimizes the number of bins [sets] used. In computational
complexity theory, it is a combinatorial NP-hard problem.
https://en.wikipedia.org/wiki/Bin_packing_problem
As far as I can tell, this is exactly what you are looking for.
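A first-fit decreasing sketch, a standard bin packing heuristic close to the greedy described in the question (my own illustration, keeping the question's strict "sums to less than N" constraint):

```python
def pack(values, N):
    """Split positive integers (each < N) into subsets whose sums stay below N."""
    bins = []  # each bin is [current_sum, members]
    for v in sorted(values, reverse=True):
        for b in bins:
            if b[0] + v < N:  # keep each subset's sum strictly below N
                b[0] += v
                b[1].append(v)
                break
        else:
            bins.append([v, [v]])  # nothing fits: open a new subset
    return [b[1] for b in bins]

print(pack([7, 5, 4, 4, 3, 2, 1], 10))
# -> [[7, 2], [5, 4], [4, 3, 1]]
```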

Algorithm to calculate permutations

I'm aware of Heap's algorithm for calculating the permutations of a given sequence, but what if I wanted to calculate the permutations of every k-element subset of a sequence of N elements?
The backtracking solution I'm thinking of would need to generate a new sequence of sub-elements each time, deleting one element and recursively calling the permutation function. This sounds expensive, and I would like to know if there's a better solution.
Use an algorithm to generate combinations of size K from the set of N.
(Pick any from the SO question: Algorithm to return all combinations of k elements from n).
Using the result, apply Heap's Algorithm to create all permutations of this k-element subset (or another Algorithm to generate all possible permutations of a list).
Generate the next subset of size K and repeat (steps 1 and 2) until all subsets of size K have been enumerated.
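In Python this whole pipeline is available directly, since itertools.permutations accepts a length argument and enumerates every ordering of every k-element subset. A quick sketch comparing the two routes:

```python
from itertools import combinations, permutations

seq, k = [1, 2, 3, 4], 2

# Step-by-step version following the answer above:
stepwise = [p for c in combinations(seq, k) for p in permutations(c)]

# Equivalent one-call version:
direct = list(permutations(seq, k))

print(sorted(stepwise) == sorted(direct))  # True
```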

How to enumerate all k-combinations of a set by sum?

Suppose I have a finite set of numeric values of size n.
Question: Is there an efficient algorithm for enumerating the k-combinations of that set so that combination I precedes combination J iff the sum of the elements in I is less than or equal to the sum of the elements in J?
Clearly it's possible to simply enumerate the combinations and sort them according to their sums. If the set is large, however, brute enumeration of all combinations, let alone sorting, will be infeasible. If I'm only interested in obtaining the first m << choose(n,k) combinations ranked by sum, is it possible to obtain them before the heat death of the universe?
There is no polynomial algorithm for enumerating the set this way (unless P=NP).
If there were such an algorithm (call it A), then we could solve the subset sum problem polynomially:
Run A (once for each combination size k = 0, ..., n, since subset sum allows subsets of any size).
Do a binary search over the sum-ordered output to find the subset that sums closest to the desired number.
Note that step 1 runs polynomially (assumption) and step 2 runs in O(log(2^n)) = O(n).
Conclusion: Since the Subset Sum problem is NP-Complete, solving this problem efficiently will prove P=NP - thus there is no known polynomial solution to the problem.
Edit: Even though the problem is NP-hard, getting the "smallest" m subsets can be done in O(n + 2^m): select the smallest m elements, generate all the subsets of these m elements, and choose the minimal m of those. So for fairly small values of m, it might be feasible to calculate them.
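A small sketch of that edit (my own illustration). The key observation: the empty set and the m smallest singletons already give m + 1 subsets whose sums are no larger than any subset containing a bigger element, so only the m smallest elements can appear in the m smallest subsets.

```python
from heapq import nsmallest
from itertools import chain, combinations

def m_smallest_subsets(values, m):
    smallest = nsmallest(m, values)  # O(n log m)
    all_subsets = chain.from_iterable(  # all 2^m subsets of those elements
        combinations(smallest, r) for r in range(len(smallest) + 1)
    )
    return nsmallest(m, all_subsets, key=sum)

print(m_smallest_subsets([5, 1, 9, 3, 7], 4))
# -> [(), (1,), (3,), (1, 3)]
```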
