Output-sensitive fast enumeration of all maximal subsets of non-overlapping intervals - algorithm

Given a set of intervals, finding a subset of pairwise non-overlapping intervals of maximum cardinality can be done in linear time after sorting the intervals by their right endpoints. But what if we want to output ALL subsets achieving that maximum number of non-overlapping intervals? The running time should be output-sensitive, because for n intervals the number of optimal solutions could be exponential, e.g. as high as sqrt(n)^sqrt(n). So if there are S optimal solutions, can they be enumerated in time linear in S (perhaps with an additional polynomial dependence on n)?

Run the standard dynamic programming algorithm for maximum independent set in an interval graph. This tells you the maximum size. It is straightforward to modify this algorithm to also track the number of ways to achieve that size.
For every interval I, compile a list of all of the later intervals that do not overlap I and from which you can still complete an independent set of maximum size.
Now run a straightforward recursive enumeration of all maximum independent sets using the lists compiled in the previous step.
If the size of the maximum independent set is h, this will take O(hS + n^2) time; the n^2 is for the DP and the hS is for the recursion and output.
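A minimal Python sketch of those three steps, assuming intervals are given as (start, end) pairs and that "overlap" means sharing an interior point (touching endpoints is allowed); all names here are mine, not from the answer:

    from collections import defaultdict

    def enumerate_maximum_sets(intervals):
        """Yield every maximum-cardinality set of pairwise non-overlapping intervals."""
        iv = sorted(intervals, key=lambda p: p[0])           # sort by start point
        n = len(iv)

        # f[i] = size of the largest non-overlapping chain that begins with iv[i]
        f = [1] * n
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, n):
                if iv[j][0] >= iv[i][1]:                     # iv[j] starts after iv[i] ends
                    f[i] = max(f[i], 1 + f[j])
        opt = max(f, default=0)

        # succ[i] = later, non-overlapping intervals that continue an optimal chain
        succ = defaultdict(list)
        for i in range(n):
            for j in range(i + 1, n):
                if iv[j][0] >= iv[i][1] and f[j] == f[i] - 1:
                    succ[i].append(j)

        def extend(i, chosen):
            chosen.append(iv[i])
            if f[i] == 1:                                    # the chain is complete
                yield list(chosen)
            else:
                for j in succ[i]:
                    yield from extend(j, chosen)
            chosen.pop()

        for i in range(n):
            if f[i] == opt:                                  # optimal solutions start exactly here
                yield from extend(i, [])

    # Two choices for the first interval times two for the second: four optimal sets of size 2.
    print(list(enumerate_maximum_sets([(0, 1), (0, 2), (2, 3), (2, 4)])))

The two nested loops are the O(n^2) part; the recursion then only visits intervals that lie on some optimal solution, so each emitted set costs O(h) plus its output size.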

Related

Finding the weighted median in an unsorted array in linear time

This is a practice problem from one of Coursera's Algorithms courses; I've been stuck on it for a couple of weeks.
The problem is this:
Given an array of n distinct unsorted elements x1, x2, ..., xn ∈ X with positive weights w1, w2, ..., wn ∈ W, a weighted median is an element xk for which the total weight of all elements with values less than xk is at most (total weight)/2, and the total weight of all elements with values larger than xk is also at most (total weight)/2. Observe that there are at most two weighted medians. Show how to compute all weighted medians in O(n) worst-case time.
The course mostly covered divide and conquer algorithms, so I think the key to get started on this would be to identify which of the algorithms covered can be used for this problem.
One of the algorithms covered was RSelect, in the form RSelect(array X, length n, order statistic i), which for a weighted median could be written as RSelect(array X, weights W, length n, order statistic i). My issue with this approach is that it assumes I know the median value ahead of time, which seems unlikely. There's also the issue that the pivot is chosen uniformly at random, which I don't imagine will work with weights without examining the weight of every entry.
Next is the DSelect algorithm, where a pivot is computed with a median-of-medians approach, without randomization, so we can compute a proper median. This seems like the approach that could work; where I have trouble is that it also assumes I know ahead of time the value I'm looking for.
DSelect(array A, length n, order statistic i) for an unweighted array
DSelect(array A, weights W, length n, order statistic i) for a weighted array
Am I overthinking this? Should I use DSelect, assuming that I know the value of (total weight)/2 ahead of time? I guess even if I have to compute it, that only adds linear time to the running time. But then it would be no different from precomputing a weighted array (combining A and W into Q, where qi = xi*wi) and turning this back into an unweighted-array problem where I can use RSelect (plus some accounting for the case where there are two medians).
I've found https://archive.org/details/lineartimealgori00blei/page/n3 and https://blog.nelsonliu.me/2016/07/05/gsoc-week-6-efficient-calculation-of-weighted-medians/ which describe this problem, but their approach doesn't seem to be something covered in the course (and I'm not familiar with heaps/heapsort)
This problem can be solved with a simple variant of quickselect:
1. Calculate the sum of all weights and divide by 2 to get the target sum.
2. Choose a pivot and partition the array into larger and smaller elements.
3. Sum the weights in the smaller partition, and subtract from the total to get the sum in the other partition.
4. Go back to step 2 to process the appropriate partition with the appropriate target sum.
Just like normal quickselect, this becomes linear in the worst case if you use the (normal, unweighted) median-of-medians approach to choose a pivot.
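A sketch of that recursion in Python, with a uniformly random pivot for brevity (so expected O(n); the names and the partition details are mine, and swapping in a median-of-medians pivot choice would give the worst-case linear bound):

    import random

    def weighted_median(xs, ws):
        """Return one weighted median of distinct values xs with positive weights ws."""
        items = list(zip(xs, ws))
        total = sum(ws)
        below = 0.0   # weight of elements already known to be smaller than all survivors
        above = 0.0   # weight of elements already known to be larger
        while True:
            pivot, pivot_w = random.choice(items)
            smaller = [(x, w) for x, w in items if x < pivot]
            larger = [(x, w) for x, w in items if x > pivot]
            w_smaller = sum(w for _, w in smaller)
            w_larger = sum(w for _, w in larger)
            if below + w_smaller <= total / 2 and above + w_larger <= total / 2:
                return pivot                       # the pivot itself is a weighted median
            if below + w_smaller > total / 2:      # the median lies in the smaller partition
                items = smaller
                above += w_larger + pivot_w
            else:                                  # the median lies in the larger partition
                items = larger
                below += w_smaller + pivot_w

    print(weighted_median([1, 2, 3, 4], [1, 1, 1, 10]))   # -> 4

This returns one weighted median; reporting the possible second one takes a little extra bookkeeping around the final comparison.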
O(n) average performance can be achieved with quickselect.
The random pivot can be chosen, with weighting, using reservoir sampling. You are correct that it takes O(n) to find the first pivot, but the sizes of the sublists you work with follow a geometric series, so the total cost of finding pivots still works out to only O(n).
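For the pivot step, single-item weighted reservoir sampling picks an element with probability proportional to its weight in one pass; a minimal sketch (the function name is mine):

    import random

    def weighted_random_pick(items, weights):
        """One-pass pick of a single item with probability proportional to its weight."""
        pick = None
        running = 0.0
        for item, w in zip(items, weights):
            running += w
            if random.random() < w / running:   # replace the current pick with probability w/running
                pick = item
        return pick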

Given Set of rationals and some value. Minimize elements needed for exceeding the value. (Linear time)

Given a set of n elements x > 0 and a value W, find the minimum number of elements needed to sum up to W or greater.
I am struggling to find a linear time (O(n)) algorithm.
[Edit: a linear-time sorting algorithm cannot be used here since all values are rationals (also distinct rationals).]
Firstly you need to sort the elements. Then you can easily find the minimum number of elements in linear time: go from the largest element down to the smallest, adding elements until the running sum reaches W or more.
The usual sorting algorithms take O(n*log(n)), so you need an algorithm that sorts the numbers in linear time. If you have a set of integers, you can use one of these algorithms: http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap09.htm
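A minimal sketch of that greedy step (using an ordinary comparison sort, so O(n log n) overall; the function name is mine):

    def min_elements_to_reach(values, W):
        """Take the largest elements first until their sum reaches W; return the count."""
        total = 0
        for count, v in enumerate(sorted(values, reverse=True), start=1):
            total += v
            if total >= W:
                return count
        return None   # even the whole set does not reach W

    print(min_elements_to_reach([0.5, 2.25, 1.75, 3.0], 4.5))   # -> 2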

Algorithm to output a subset of set A such that it maximizes the overall pairwise sum

Suppose I have a set A = {a_1, a_2, ..., a_n}. I also have a function f: A x A -> R that assigns a real value to each pair of elements from A. I want to extract a subset S_k of size k from A that maximizes the sum of f over all pairs of elements in S_k.
Is there any known algorithm that would do this in reasonable time, perhaps polynomial or quasi-polynomial?
Edit: Worked Example
Suppose A={a_1,a_2,a_3,a_4} with k=3 and f is defined as:
f(a_1,a_2)=0, f(a_1,a_3)=0, f(a_1,a_4)=0, f(a_2,a_3)=1, f(a_2,a_4)=5, f(a_3,a_4)=10.
Then S_k={a_2,a_3,a_4} since it maximizes the sum f(a_2,a_3)+f(a_2,a_4)+f(a_3,a_4). (i.e. the pairwise sum of all elements in S_k)
Unlikely -- this problem generalizes the problem of finding a k-clique (set the weights to the adjacency matrix of the graph), for which the best known algorithms are exponential (see also the strong exponential time hypothesis).
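For a small instance like the worked example, brute force over all C(n, k) subsets is still feasible (and shows the blow-up the answer refers to); a sketch, with the names and the dictionary encoding of f being mine:

    from itertools import combinations

    def best_pairwise_subset(elements, k, f):
        """Exhaustively score every size-k subset by its pairwise sum and return the best."""
        def score(subset):
            return sum(f(a, b) for a, b in combinations(subset, 2))
        return max(combinations(elements, k), key=score)

    # The worked example, keyed on frozensets so the argument order does not matter.
    pair_value = {frozenset(p): v for p, v in {
        ("a1", "a2"): 0, ("a1", "a3"): 0, ("a1", "a4"): 0,
        ("a2", "a3"): 1, ("a2", "a4"): 5, ("a3", "a4"): 10,
    }.items()}
    f = lambda a, b: pair_value[frozenset((a, b))]
    print(best_pairwise_subset(["a1", "a2", "a3", "a4"], 3, f))   # ('a2', 'a3', 'a4')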

Fewest subsets with sum less than N

I have a specific sub-problem for which I am having trouble coming up with an optimal solution. This problem is similar to the subset sum group of problems as well as space filling problems, but I have not seen this specific problem posed anywhere. I don't necessarily need the optimal solution (as I am relatively certain it is NP-hard), but an effective and fast approximation would certainly suffice.
Problem: Given a list of positive integers, find the fewest disjoint subsets that together contain the entire list, where each subset sums to less than N. Obviously no integer in the original list can be greater than N.
In my application I have many lists and I can concatenate them into columns of a matrix as long as they fit in the matrix together. For downstream purposes I would like to have as little "wasted" space in the resulting ragged matrix, hence the space filling similarity.
Thus far I am employing a greedy-like approach: processing from the largest integers down, I find the largest integer that fits into the current subset under the limit N. Once even the smallest integer no longer fits into the current subset, I proceed to the next subset in the same way, until all numbers are exhausted. This almost certainly does not find the optimal solution, but it was the best I could come up with quickly.
BONUS: My application actually requires batches, where there is a limit on the number of subsets in each batch (M). Thus the larger problem is to find the fewest batches where each batch contains M subsets and each subset sums to less than N.
Straight from Wikipedia (with some bold amendments):
In the bin packing problem, objects [integers] of different volumes [values] must be packed into a finite number of bins [sets] or containers, each of volume V [summation of the subset < V], in a way that minimizes the number of bins [sets] used. In computational complexity theory, it is a combinatorial NP-hard problem.
https://en.wikipedia.org/wiki/Bin_packing_problem
As far as I can tell, this is exactly what you are looking for.
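The greedy described in the question is close to the classic first-fit decreasing heuristic for bin packing; a minimal sketch, keeping the strict "sum < N" constraint from the problem statement (names are mine):

    def first_fit_decreasing(values, N):
        """Place each value (largest first) into the first subset that still has room,
        opening a new subset when none does; every subset's sum stays below N."""
        subsets = []   # each entry is [current_sum, members]
        for v in sorted(values, reverse=True):
            for s in subsets:
                if s[0] + v < N:
                    s[0] += v
                    s[1].append(v)
                    break
            else:
                subsets.append([v, [v]])
        return [members for _, members in subsets]

    print(first_fit_decreasing([7, 5, 4, 3, 2, 2], 10))   # -> [[7, 2], [5, 4], [3, 2]]

First-fit decreasing is a standard baseline heuristic for bin packing, not an optimal algorithm; the batch limit M in the bonus would need an extra layer on top of it.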

What is complexity measured against? (bits, number of elements, ...)

I've read that the naive approach to testing primality has exponential complexity because you judge the algorithm by the size of its input. Mysteriously, people insist that when discussing primality of an integer, the appropriate measure of the size of the input is the number of bits (not n, the integer itself).
However, when discussing an algorithm like Floyd's, the complexity is often stated in terms of the number of nodes without regard to the number of bits required to store those nodes.
I'm not trying to make an argument here. I honestly don't understand the reasoning. Please explain. Thanks.
Traditionally speaking, complexity is measured against the size of the input.
In the case of numbers, the size of the input is the logarithm of the number (because that is the length of its binary representation); in the case of graphs, all edges and vertices must be represented somehow in the input, so the size of the input is linear in |V| and |E|.
For example, a naive primality test that runs in time linear in the number itself is called pseudo-polynomial. It is polynomial in the number, but it is NOT polynomial in the size of the input, which is log(n); in fact it is exponential in the size of the input.
As a side note, it does not matter whether you measure the size of the input in bits, bytes, or any other constant multiple, because the constant is discarded anyway when you move to asymptotic notation.
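A concrete illustration of the pseudo-polynomial point:

    def is_prime_naive(n):
        """Trial division by every candidate below n: about n iterations, which is
        polynomial in the value n but exponential in the input size b = number of
        bits of n (n is roughly 2**b), hence 'pseudo-polynomial'."""
        if n < 2:
            return False
        for d in range(2, n):
            if n % d == 0:
                return False
        return True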
The main difference is that when discussing algorithms we keep in the back of our minds hardware that can perform operations on the data in O(1) time. When being strict, or when the data does not fit into a processor register, taking the number of bits into account becomes important.
Although the size of the input is measured in bits, in many cases we can use a shortcut that lets us divide out a constant number of bits per item. This constant factor is embedded in the representation we choose for our data structure.
When discussing graph algorithms, we assume that each vertex and each edge has a fixed representation cost in bits, which does not depend on the number of vertices and edges. This assumption requires that the weights associated with vertices and edges have fixed size in bits (e.g. machine integers or floats).
With this assumption in place, the adjacency-list representation has fixed size per edge and per vertex, because we need one pointer per edge and one pointer per vertex, in addition to the weights, which we also presume to be of constant size.
The same goes for the adjacency-matrix representation, which needs on the order of W(V^2 + V) bits, where W is the number of bits required to store a weight.
In rare situations, when the weights themselves depend on the number of vertices or edges, the assumption of fixed-size weights no longer holds, so we must go back to counting the number of bits.

Resources