I was reading http://www.cas.mcmaster.ca/~terlaky/4-6TD3/slides/DP/DP.pdf and would like to know whether there is a solution with better time complexity to this partition problem.
From the link:
"Suppose a given arrangement S of non-negative numbers
{s1,...,sn} and an integer k. How to cut S into k or fewer ranges,
so as to minimize the maximum sum over all the ranges?"
e.g.
S = 1,2,3,4,5,6,7,8,9
k=3
By cutting S into these 3 ranges:
1,2,3,4,5|6,7|8,9
the maximum range sum is 17 (for the range 8,9), which is the minimum possible.
The algorithm suggested in the link runs in O(kn^2) and uses O(kn) space. Are there more efficient algorithms?
I found the solution to be binary searching the answer. Sorry, I forgot to mention that one of the constraints is that the sum of all the integers does not exceed 2^64. So let C = cumulative sum of all integers. Then we can binary search for the answer using a
bool isPossible(uint64_t x)
function which returns true if it is possible to divide S into at most k ranges with maximum range sum at most x. isPossible can be done in O(n): add elements from left to right and start a new range whenever the running sum would exceed x. So the total running time is O(n*log(C)).
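For concreteness, here is a rough C++ sketch of this approach (the function names and the explicit parameters are just illustrative; it assumes the total sum fits in a uint64_t):

    #include <cstdint>
    #include <vector>

    // Greedy feasibility check: can S be cut into at most k ranges,
    // each with sum at most x? Runs in O(n).
    bool isPossible(const std::vector<uint64_t>& s, int k, uint64_t x) {
        int ranges = 1;
        uint64_t current = 0;
        for (uint64_t v : s) {
            if (v > x) return false;        // a single element already exceeds x
            if (current + v > x) {          // start a new range
                ++ranges;
                current = v;
            } else {
                current += v;
            }
        }
        return ranges <= k;
    }

    // Binary search for the smallest feasible maximum range sum: O(n log C).
    uint64_t minMaxRangeSum(const std::vector<uint64_t>& s, int k) {
        uint64_t lo = 0, hi = 0;
        for (uint64_t v : s) hi += v;       // C = sum of all elements
        while (lo < hi) {
            uint64_t mid = lo + (hi - lo) / 2;
            if (isPossible(s, k, mid)) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

For the example above, minMaxRangeSum({1,2,3,4,5,6,7,8,9}, 3) returns 17.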
This is from a practice problem in one of Coursera's Algorithms courses; I've been stuck on it for a couple of weeks.
The problem is this:
Given an array of n distinct unsorted elements x1, x2, ..., xn ∈ X with positive weights w1, w2, ..., wn ∈ W, a weighted median is an element xk for which the total weight of all elements with values less than xk is at most (total weight)/2, and the total weight of all elements with values greater than xk is also at most (total weight)/2. Observe that there are at most two weighted medians. Show how to compute all weighted medians in O(n) worst-case time.
The course mostly covered divide and conquer algorithms, so I think the key to get started on this would be to identify which of the algorithms covered can be used for this problem.
One of the algorithms covered was the RSelect algorithm, in the form RSelect(array X, length n, order statistic i), which for a weighted median could be written as RSelect(array X, weights W, length n, order statistic i). My issue with this approach is that it assumes I know the median value ahead of time, which seems unlikely. There's also the issue that the pivot is chosen uniformly at random, which I can't see working with weights without examining the weight of every entry.
Next is the DSelect algorithm, where, using a median-of-medians approach, a pivot can be computed without randomization, so we can compute a proper median. This seems like the approach that could work; where I have trouble is that it also assumes I know ahead of time the value I'm looking for.
DSelect(array A, length n, order statistic i) for an unweighted array
DSelect(array A, weights W, length n, order statistic i) for a weighted array
Am I overthinking this? Should I use DSelect assuming that I know the value of (total weight)/2 ahead of time? Even if I have to compute it, that only adds linear time to the running time. But then it would be no different from precomputing a weighted array (combining A and W into Q, where qi = xi*wi) and transforming this back into an unweighted-array problem where I can use RSelect (plus some accounting for the cases where there are two medians).
I've found https://archive.org/details/lineartimealgori00blei/page/n3 and https://blog.nelsonliu.me/2016/07/05/gsoc-week-6-efficient-calculation-of-weighted-medians/ which describe this problem, but their approach doesn't seem to be something covered in the course (and I'm not familiar with heaps/heapsort)
This problem can be solved with a simple variant of quickselect:
1. Calculate the sum of all weights and divide by 2 to get the target sum.
2. Choose a pivot and partition the array into larger and smaller elements.
3. Sum the weights in the smaller partition, and subtract from the total to get the sum in the other partition.
4. Go back to step 2 to process the appropriate partition with the appropriate target sum (a sketch is given below).
Just like normal quickselect, this becomes linear in the worst case if you use the (normal, unweighted) median-of-medians approach to choose a pivot.
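Here is a rough C++ sketch of those steps (my own illustrative code: the Item struct and the use of std::nth_element for the pivot/partition step are assumptions; swapping in a median-of-medians pivot gives the worst-case linear bound, as noted above). It assumes distinct values and positive weights, as in the problem statement, and returns a (lower) weighted median:

    #include <algorithm>
    #include <vector>

    struct Item { double value; double weight; };

    double weightedMedian(std::vector<Item> a) {
        double target = 0.0;
        for (const Item& it : a) target += it.weight;
        target /= 2.0;                              // the target sum from step 1

        while (true) {
            // Step 2: pick a pivot and partition. Here the pivot is the middle
            // element by value via nth_element; a median-of-medians pivot would
            // make the worst case linear.
            size_t mid = a.size() / 2;
            std::nth_element(a.begin(), a.begin() + mid, a.end(),
                             [](const Item& x, const Item& y) { return x.value < y.value; });
            Item pivot = a[mid];

            std::vector<Item> smaller, larger;
            double wSmaller = 0.0;
            for (const Item& it : a) {
                if (it.value < pivot.value) { smaller.push_back(it); wSmaller += it.weight; }
                else if (it.value > pivot.value) larger.push_back(it);
            }

            // Steps 3-4: recurse into the side that contains the target weight.
            if (target < wSmaller) {
                a = std::move(smaller);             // median lies among the smaller values
            } else if (target <= wSmaller + pivot.weight) {
                return pivot.value;                 // weight(<pivot) and weight(>pivot) are both <= total/2
            } else {
                target -= wSmaller + pivot.weight;  // discard the smaller side and the pivot
                a = std::move(larger);
            }
        }
    }

For example, weightedMedian({{1, 0.3}, {2, 0.4}, {3, 0.3}}) returns 2.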
Average O(n) performance can be achieved with quickselect.
The pivot can be chosen at random, with weighting, using the reservoir sampling algorithm. You are correct that it is O(n) to find the first pivot, but the sizes of the lists that you're working with follow a geometric series, so the total cost of finding pivots still works out to be only O(n).
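A sketch of that weighted pivot choice, as a single-item weighted reservoir sample (my own illustrative code; after one pass, index i has been kept with probability w[i] divided by the total weight):

    #include <random>
    #include <vector>

    // Weighted reservoir sampling of a single index: element i ends up chosen
    // with probability w[i] / (sum of all weights). Assumes positive weights.
    int weightedPivotIndex(const std::vector<double>& w, std::mt19937_64& rng) {
        double cumulative = 0.0;
        int chosen = -1;
        for (int i = 0; i < (int)w.size(); ++i) {
            cumulative += w[i];
            // Replace the current candidate with probability w[i] / cumulative.
            std::bernoulli_distribution replace(w[i] / cumulative);
            if (replace(rng)) chosen = i;
        }
        return chosen;                              // -1 only if w is empty
    }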
Given an array of positive integers, how can I find the number of increasing (or decreasing) subsequences of length 3? E.g. [1,6,3,7,5,2,9,4,8] has 24 of these, such as [3,4,8] and [6,7,9].
I've found solutions for length k, but I believe those solutions can be made more efficient since we're only looking at k = 3.
For example, a naive O(n^3) solution can be made faster by looping over the elements, counting how many elements to the left of each are smaller and how many to its right are larger, multiplying these two counts, and adding the product to a running sum. This is O(n^2), but it obviously doesn't translate easily to k > 3.
One solution is to loop over the elements: for each element, count how many elements to its left are smaller using a segment tree, which answers each such query in O(log(n)); in the same way, count how many elements to its right are larger. Then multiply these two counts and add the product to the sum. This is O(n*log(n)) overall.
You can learn more about segment trees here:
Segment Tree Tutorial
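A sketch of this counting in C++, using a Fenwick (binary indexed) tree in place of a full segment tree, since both support the needed prefix-count updates and queries in O(log n); the values are coordinate-compressed first, and all names here are my own:

    #include <algorithm>
    #include <vector>

    struct Fenwick {
        std::vector<int> t;
        explicit Fenwick(int n) : t(n + 1, 0) {}
        void add(int i) { for (++i; i < (int)t.size(); i += i & -i) ++t[i]; }
        int prefix(int i) const {                   // how many inserted ranks are < i
            int s = 0;
            for (; i > 0; i -= i & -i) s += t[i];
            return s;
        }
    };

    long long countIncreasingTriples(const std::vector<int>& a) {
        int n = (int)a.size();
        std::vector<int> sorted(a);                 // coordinate compression
        std::sort(sorted.begin(), sorted.end());
        auto rank = [&](int v) {
            return (int)(std::lower_bound(sorted.begin(), sorted.end(), v) - sorted.begin());
        };

        std::vector<int> lessLeft(n), greaterRight(n);
        Fenwick left(n), right(n);
        for (int i = 0; i < n; ++i) {               // smaller elements to the left
            lessLeft[i] = left.prefix(rank(a[i]));
            left.add(rank(a[i]));
        }
        for (int i = n - 1; i >= 0; --i) {          // larger elements to the right
            int r = rank(a[i]);
            greaterRight[i] = (n - 1 - i) - right.prefix(r + 1);
            right.add(r);
        }

        long long total = 0;
        for (int i = 0; i < n; ++i) total += 1LL * lessLeft[i] * greaterRight[i];
        return total;
    }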
For each curr element, count how many elements to its left and to its right have smaller and greater values.
This curr element can then form less[left] * greater[right] + greater[left] * less[right] triplets.
Complexity Considerations
The straightforward approach of counting the elements on the left and right yields a quadratic solution. You might be tempted to use a set or something similar to count soldiers in O(log n) time.
You can find a soldier's rating in a set in O(log n); however, counting the elements before and after it is still linear, unless you implement a BST in which each node tracks the count of its left children.
Check the solution here:
https://leetcode.com/problems/count-number-of-teams/discuss/554795/C%2B%2BJava-O(n-*-n)
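For reference, a short sketch of that quadratic approach (it counts both increasing and decreasing triples, matching the linked "count teams" problem; the names are illustrative):

    #include <vector>

    long long countTeams(const std::vector<int>& rating) {
        int n = (int)rating.size();
        long long total = 0;
        for (int cur = 0; cur < n; ++cur) {
            long long lessLeft = 0, greaterLeft = 0, lessRight = 0, greaterRight = 0;
            for (int i = 0; i < cur; ++i) {          // elements to the left of cur
                if (rating[i] < rating[cur]) ++lessLeft;
                else if (rating[i] > rating[cur]) ++greaterLeft;
            }
            for (int i = cur + 1; i < n; ++i) {      // elements to the right of cur
                if (rating[i] < rating[cur]) ++lessRight;
                else if (rating[i] > rating[cur]) ++greaterRight;
            }
            // cur as the middle element of increasing and decreasing triples.
            total += lessLeft * greaterRight + greaterLeft * lessRight;
        }
        return total;
    }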
I have this question that asks to rewrite the subset sum problem in terms of only N.
If you're unaware of it, the problem is: given a set of weights, each with cost 1, how would you find the optimal solution for a given maximum weight to achieve?
O(NW) is the time and space cost, where the space is for the 2D matrix used by the dynamic programming. This problem is a special case of the knapsack problem.
I'm not sure how to approach this. The only thing I could think of was to find the sum of all weights and treat that as a general worst case. Thanks.
If the weight is not bounded, so that the complexity must depend solely on N, there is at least an O(2^N) approach: try all possible subsets of the N elements and compute their sums.
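A sketch of that brute force with a bitmask over the subsets (illustrative names; it assumes N < 64, and the inner loop makes this straightforward version O(N * 2^N), which maintaining the sums incrementally would avoid):

    #include <cstdint>
    #include <vector>

    // Try every subset of w and report whether any of them sums to exactly W.
    bool subsetSumBruteForce(const std::vector<uint64_t>& w, uint64_t W) {
        int n = (int)w.size();                       // assumes n < 64
        for (uint64_t mask = 0; mask < (1ULL << n); ++mask) {
            uint64_t sum = 0;
            for (int i = 0; i < n; ++i)
                if (mask & (1ULL << i)) sum += w[i];
            if (sum == W) return true;
        }
        return false;
    }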
If you are willing to use exponential space rather than polynomial space, you can solve the problem in O(n * 2^(n/2)) time and O(2^(n/2)) space. Split your set of n weights into two sets A and B of roughly equal size and compute the sums of all subsets of each set. Put every subset sum of A into a hash table, then for every subset sum x of B look up W - x. If you get a hit, the corresponding subset of A together with that subset of B sums to W.
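A sketch of that meet-in-the-middle idea (the helper names are my own; enumerating each half costs O(2^(n/2)) subsets):

    #include <cstdint>
    #include <unordered_set>
    #include <vector>

    // All subset sums of w, by bitmask enumeration (assumes w.size() < 64).
    std::vector<uint64_t> allSubsetSums(const std::vector<uint64_t>& w) {
        int n = (int)w.size();
        std::vector<uint64_t> sums;
        for (uint64_t mask = 0; mask < (1ULL << n); ++mask) {
            uint64_t s = 0;
            for (int i = 0; i < n; ++i)
                if (mask & (1ULL << i)) s += w[i];
            sums.push_back(s);
        }
        return sums;
    }

    bool subsetSumMeetInMiddle(const std::vector<uint64_t>& w, uint64_t W) {
        std::vector<uint64_t> A(w.begin(), w.begin() + w.size() / 2);
        std::vector<uint64_t> B(w.begin() + w.size() / 2, w.end());

        std::unordered_set<uint64_t> sumsA;          // hash all subset sums of A
        for (uint64_t s : allSubsetSums(A)) sumsA.insert(s);

        for (uint64_t x : allSubsetSums(B))          // look for W - x among A's sums
            if (x <= W && sumsA.count(W - x))
                return true;
        return false;
    }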
I am solving problems from Column2 of Programming Pearls. I came across this problem:
"Given a set of n real numbers, a real number t, and an integer k, how quickly can you determine whether there exists a k-element subset of the set that sums to at most t?"
My solution is to sort the set of real numbers and then look at the sum of the first (smallest) k elements. If this sum is less than or equal to t, then we know there exists at least one k-element subset that satisfies the condition.
Is the solution correct?
Is there a better or different solution?
Note: Just to make it clear, do not assume the input to be already sorted.
Because you only need the k smallest elements, not a full sort, I suggest the following:
1. Select the k-th smallest element of the array using randomized selection, which runs in O(N) and partitions the array around that element.
2. Take the sum of the first k elements of the array and check whether it is at most t.
Time complexity: O(N + k) = O(N), since k is O(N). (A sketch follows the note below.)
Randomized Selection
Note: when k is very small compared to N, a max-heap of size k can be very efficient, since the storage does not cost much and it solves the problem in O(N log k) worst case.
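A sketch of the selection-based check, using std::nth_element for the O(N)-expected selection step (names are illustrative):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    // True iff some k-element subset of a sums to at most t. The k smallest
    // elements give the minimum possible k-element sum, so checking them suffices.
    bool existsKSubsetAtMost(std::vector<double> a, int k, double t) {
        if (k <= 0 || k > (int)a.size()) return false;
        std::nth_element(a.begin(), a.begin() + (k - 1), a.end());  // k smallest now in front
        double sum = std::accumulate(a.begin(), a.begin() + k, 0.0);
        return sum <= t;
    }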
Is there a data structure representing a large set S of (64-bit) integers, that starts out empty and supports the following two operations:
insert(s) inserts the number s into S;
minmod(m) returns the number s in S such that s mod m is minimal.
An example:
insert(11)
insert(15)
minmod(7) -> the answer is 15 (which mod 7 = 1)
insert(14)
minmod(7) -> the answer is 14 (which mod 7 = 0)
minmod(10) -> the answer is 11 (which mod 10 = 1)
I am interested in minimizing the maximal total time spent on a sequence of n such operations. It is obviously possible to just maintain a list of elements for S and iterate through them for every minmod operation; then insert is O(1) and minmod is O(|S|), which would take O(n^2) time for n operations (e.g., n/2 insert operations followed by n/2 minmod operations would take roughly n^2/4 operations).
So: is it possible to do better than O(n^2) for a sequence of n operations? Maybe O(n sqrt(n)) or O(n log(n))? If this is possible, then I would also be interested to know if there are data structures that additionally admit removing single elements from S, or removing all numbers within an interval.
Another idea, based on a balanced binary search tree, as in Keith's answer.
Suppose all elements inserted so far are stored in a balanced BST, and we need to compute minmod(m). Consider our set S as a union of subsets of numbers lying in the intervals [0, m-1], [m, 2m-1], [2m, 3m-1], etc. The answer will obviously be among the minimal numbers we have in each of those intervals, so we can look up the tree once per interval to find that interval's minimal number. This is easy to do: for example, to find the minimal number in [a, b], we move left if the current value is greater than a and right otherwise, keeping track of the minimal value in [a, b] we've met so far.
Now, supposing that m is uniformly distributed in [1, 2^64], let's calculate the expected number of queries we'll need.
For all m in [2^63, 2^64-1] we'll need 2 queries. The probability of this is 1/2.
For all m in [2^62, 2^63-1] we'll need 4 queries. The probability of this is 1/4.
...
The mathematical expectation will be sum[ 1/(2^k) * 2^k ], for k in [1,64], which is 64 queries.
So, to sum up, the average minmod(m) query complexity will be O(64*log n). In general, if m has an unknown upper bound, this will be O(log m * log n). The BST update is, as is well known, O(log n), so the overall complexity for n operations will be O(n * log m * log n).
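A sketch of this query on top of std::set (a balanced BST), with lower_bound doing the per-interval lookups; the helper name and the overflow handling are my own illustrative additions:

    #include <cstdint>
    #include <limits>
    #include <optional>
    #include <set>

    // First multiple of m strictly greater than x, or 0 if it would not fit in 64 bits.
    uint64_t nextIntervalStart(uint64_t x, uint64_t m) {
        uint64_t k = x / m + 1;
        if (k > std::numeric_limits<uint64_t>::max() / m) return 0;
        return k * m;
    }

    // Returns the element of s whose value mod m is minimal (insert is just s.insert(x)).
    std::optional<uint64_t> minmod(const std::set<uint64_t>& s, uint64_t m) {
        if (s.empty() || m == 0) return std::nullopt;
        uint64_t best = *s.begin();                 // minimum of the interval holding the smallest element
        uint64_t start = nextIntervalStart(best, m);
        while (start != 0 && best % m != 0) {
            auto it = s.lower_bound(start);         // smallest element >= start: the minimum of its interval
            if (it == s.end()) break;
            if (*it % m < best % m) best = *it;
            start = nextIntervalStart(*it, m);      // jump past the interval containing *it
        }
        return best;
    }

With the example from the question ({11, 15}), minmod(s, 7) returns 15; after inserting 14 it returns 14. Each query costs O(log n) per non-empty interval visited.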
Partial answer too big for a comment.
Suppose you implement S as a balanced binary search tree.
When you seek S.minmod(m), naively you walk the whole tree, which costs O(n) per query, i.e., O(n^2) over a sequence of n operations.
However, at a given time during the walk, you have the best (lowest) result so far. You can use this to avoid checking whole sub-trees when:
bestSoFar < leftChild mod m
and
rightChild - leftChild < m - leftChild mod m
This will only help much if the typical spacing between the numbers in the set is smaller than typical values of m.
Update the next morning...
Grigor has articulated my idea better and more fully, and has shown how well it works for "large" m. He also shows that a "random" m is typically "large", so it works well there.
Grigor's algorithm is efficient for large m, so the risk one needs to think about is much smaller m.
So it is clear that you need to think about the distribution of m and optimise for different cases if need be.
For example, it might be worth simply keeping track of the minimal modulus for very small m.
But suppose m ~ 2^32? Then the search algorithm (certainly as given, but likely any variant of it) needs to check 2^32 intervals, which may amount to searching the whole set anyway.