How to find the second smallest element in n + log n - 2 comparisons? [duplicate]

Given n numbers, how do I find the largest and second largest number using at most n+log(n) comparisons?
Note that it's not O(n+log(n)), but really n+log(n) comparisons.

pajton suggested the key idea in a comment; let me elaborate.
As pajton said, this can be done by tournament selection.
Think of this as a knock out singles tennis tournament, where player abilities have a strict order and the outcome of a match is decided solely by that order.
In the first round half the players are eliminated; each later round eliminates half of those remaining (with some byes possible).
Ultimately the winner is decided in the last and final round.
This can be viewed as a tree.
Each node of the tree will be the winner of the match between the children of that node.
The leaves are the players. The winners of the first round are the parents of the leaves, and so on. This forms a complete binary tree with the n players at its leaves.
Now follow the path of the winner up the tree. The winner has played log n matches; consider the losers of those log n matches. The second best must be the best among those, because the only player who beat the second best is the winner.
The winner is decided in n - 1 matches (each match knocks out exactly one player), and the best among the log n losers is decided in log n - 1 matches.
Thus you can find the largest and second largest in n + log n - 2 comparisons.
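Here is a hedged C++ sketch of the tournament (the array contents and helper names are mine, not from the original answer); it counts element comparisons so you can see the n + ceil(log2 n) - 2 total:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<double> a = {5, 1, 9, 3, 7, 2, 8, 6};  // hypothetical input
    int n = a.size();
    int comparisons = 0;                 // element comparisons only
    vector<int> alive(n);                // indices of players still in
    for (int i = 0; i < n; ++i) alive[i] = i;
    vector<vector<int>> beaten(n);       // beaten[w] = players who lost to w
    while (alive.size() > 1) {
        vector<int> winners;
        for (size_t i = 0; i + 1 < alive.size(); i += 2) {
            ++comparisons;               // one comparison per match
            int w = (a[alive[i]] >= a[alive[i+1]]) ? alive[i] : alive[i+1];
            int l = (w == alive[i]) ? alive[i+1] : alive[i];
            beaten[w].push_back(l);
            winners.push_back(w);
        }
        if (alive.size() % 2) winners.push_back(alive.back());  // bye
        alive = winners;
    }
    int winner = alive[0];
    // the runner-up lost only to the winner, so it is the best of the
    // ceil(log2 n) players the winner beat
    int second = beaten[winner][0];
    for (size_t i = 1; i < beaten[winner].size(); ++i) {
        ++comparisons;
        if (a[beaten[winner][i]] > a[second]) second = beaten[winner][i];
    }
    cout << "largest=" << a[winner] << " second=" << a[second]
         << " comparisons=" << comparisons << endl;  // 8 + 3 - 2 = 9 here
}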
In fact, it can be proven that this is optimal: in any comparison scheme, in the worst case the winner has to play log n matches.
To prove this, assume a point system where after a match the winner gets the points of the loser. Initially everyone starts out with 1 point.
At the end the final winner has n points.
Now, given any algorithm, the input could be arranged so that the player with more points always wins. Since a player's points at most double in any match in that scenario, the winner must play at least log n matches in the worst case.
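Spelling the count out: let p_k denote the winner's points after its k-th match. Each match at most doubles them (p_k <= 2*p_{k-1}, starting from p_0 = 1), so p_k <= 2^k. Since the final winner ends with all n points, 2^k >= n, i.e. the winner plays k >= log2(n) matches.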

Is there a problem with this? It's at most 3n comparisons (not counting the i < n comparison). If you count that, it's 4n (or 5n in the second example).
double first = -1e300, second = -1e300;
for (i = 0; i < n; i++) {
    if (a[i] > first) {
        second = first;
        first = a[i];
    }
    else if (a[i] > second && a[i] < first) {
        second = a[i];
    }
}
another way to code it:
for (i = 0; i < n; i++) if (a[i] > first) first = a[i];
for (i = 0; i < n; i++) if (a[i] < first && a[i] > second) second = a[i];


How was the assumption made that half will have i < j?

I have been reading "Cracking the Coding Interview, 6th Edition". In Chapter 0 - Big O, I have a problem understanding an assumption made in Example 3.
void printUnorderedPairs(int[] array) {
    for (int i = 0; i < array.length; i++) {
        for (int j = i + 1; j < array.length; j++) {
            ...
        }
    }
}
Under the "What It Means" section, it is assumed that:
There are N^2 total pairs. Roughly half of those will have i < j and the remaining half will have i > j. This code goes through roughly n^2/2 pairs so it does O(N^2) work.
My question is: how was the assumption that roughly half of those will have i < j and the remaining half will have i > j arrived at? Can someone explain it to me please?
Thanks!
There are several ways you can try to think about this assumption; I quite like the "geometric" suggestion from @IanMercer in the comments. Here is another:
What is an unordered pair
An unordered pair is a pair of integers (i, j) where i and j are in the domain [1, N]. (They can take any value from 1 to N.)
How many pairs are there?
i can be of any value from 1 to N, and j can be of any value from 1 to N. Any combination of i and j forms a valid pair, so there are N*N pairs.
Among all the pairs, how many pairs are there that i < j
Note that for any pair (a,b) where a is smaller than b, there exists a counterpart (b,a) (same values but flipped). So there are exactly as many pairs where i<j as there are pairs where i>j.
So what is this confusing roughly part? It is because among all those N*N pairs there are some where neither i<j nor i>j, and those are precisely the N pairs where i==j.
The N*N pairs are thus divided into three parts: those where i < j, those where i > j, and those where i == j. Since the first two groups are much larger ((N*N - N)/2 each) than the last group, which has only N elements, we can state that roughly half have the property that i<j. For N = 5, for instance, that's 10 pairs with i<j, 10 with i>j, and 5 on the diagonal.
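If you want to convince yourself numerically, here is a tiny brute-force check (N = 100 is an arbitrary choice of mine):

#include <iostream>

int main() {
    int N = 100, less = 0, greater = 0, equal = 0;
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j) {
            if (i < j) ++less;
            else if (i > j) ++greater;
            else ++equal;
        }
    // prints 4950 4950 100: (N*N - N)/2 pairs on each side, N on the diagonal
    std::cout << less << " " << greater << " " << equal << std::endl;
}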

Write a number as a sum of consecutive primes

How can I check whether n can be written as the sum of a sequence of consecutive prime numbers?
For example, 12 is equal to 5+7 which 5 and 7 are consecutive primes, but 20 is equal to 3+17 which 3 and 17 are not consecutive.
Note that, repetition is not allowed.
My idea is to find and list all primes below n, then use 2 loops to sum runs of primes: the first 2 numbers, second 2 numbers, third 2 numbers etc., and then the first 3 numbers, second 3 numbers and so forth. But it takes a lot of time and memory.
Realize that a consecutive list of primes is defined only by two pieces of information, the starting and the ending prime number. You just have to find these two numbers.
I assume that you have all the primes at your disposal, sorted in the array called primes. Keep three variables in memory: sum which initially is 2 (the smallest prime), first_index and last_index which are initially 0 (index of the smallest prime in array primes).
Now you have to "tweak" these two indices, and "travel" the array along the way in the loop:
If sum == n then finish. You have found your sequence of primes.
If sum < n then enlarge the list by adding next available prime. Increment last_index by one, and then increment sum by the value of new prime, which is primes[last_index]. Repeat the loop. But if primes[last_index] is larger than n then there is no solution, and you must finish.
If sum > n then reduce the list by removing the smallest prime from the list. Decrement sum by that value, which is primes[first_index], and then increment first_index by one. Repeat the loop.
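Here is a minimal C++ sketch of that loop, assuming a sorted vector primes that already holds every prime <= n (the function name and signature are mine, not part of the answer):

#include <vector>

// returns true iff n is a sum of one or more consecutive primes
bool consecutivePrimeSum(long long n, const std::vector<int>& primes) {
    std::size_t first = 0, last = 0;           // window primes[first..last]
    long long sum = primes.empty() ? 0 : primes[0];
    while (first < primes.size()) {
        if (sum == n) return true;             // found the sequence
        if (sum < n) {                         // enlarge: add the next prime
            ++last;
            if (last >= primes.size() || primes[last] > n) return false;
            sum += primes[last];
        } else {                               // reduce: drop the smallest prime
            sum -= primes[first++];
        }
    }
    return false;
}

For n = 12 and primes = {2, 3, 5, 7, 11}, it returns true via the window 5 + 7.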
Dialecticus's algorithm is the classic O(m)-time, O(1)-space way to solve this type of problem (here I'll use m to represent the number of prime numbers less than n). It doesn't depend on any mysterious properties of prime numbers. (Interestingly, for the particular case of prime numbers, AlexAlvarez's algorithm is also linear time!) Dialecticus gives a clear and correct description, but seems at a loss to explain why it is correct, so I'll try to do this here. I really think it's valuable to take the time to understand this particular algorithm's proof of correctness: although I had to read a number of explanations before it finally "sank in", it was a real "Aha!" moment when it did! :) (Also, problems that can be efficiently solved in the same manner crop up quite a lot.)
The candidate solutions this algorithm tries can be represented as number ranges (i, j), where i and j are just the indexes of the first and last prime number in a list of prime numbers. The algorithm gets its efficiency by ruling out (that is, not considering) sets of number ranges in two different ways. To prove that it always gives the right answer, we need to show that it never rules out the only range with the right sum. To that end, it suffices to prove that it never rules out the first (leftmost) range with the right sum, which is what we'll do here.
The first rule it applies is that whenever we find a range (i, j) with sum(i, j) > n, we rule out all ranges (i, k) having k > j. It's easy to see why this is justified: the sum can only get bigger as we add more terms, and we have determined that it's already too big.
The second, trickier rule, crucial to the linear time complexity, is that whenever we advance the starting point of a range (i, j) from i to i+1, instead of "starting again" from (i+1, i+1), we start from (i+1, j) -- that is, we avoid considering (i+1, k) for all i+1 <= k < j. Why is it OK to do this? (To put the question the other way: Couldn't it be that doing this causes us to skip over some range with the right sum?)
[EDIT: The original version of the next paragraph glossed over a subtlety: we might have advanced the range end point to j on any previous step.]
To see that it never skips a valid range, we need to think about the range (i, j-1). For the algorithm to advance the starting point of the current range, so that it changes from (i, j) to (i+1, j), it must have been that sum(i, j) > n; and as we will see, to get to a program state in which the range (i, j) is being considered in the first place, it must have been that sum(i, j-1) < n. That second claim is subtle, because there are two different ways to arrive in such a program state: either we just incremented the end point, meaning that the previous range was (i, j-1) and this range was found to be too small (in which case our desired property sum(i, j-1) < n obviously holds); or we just incremented the start point after considering (i-1, j) and finding it to be too large (in which case it's not obvious that the property still holds).
What we do know, however, is that regardless of whether the end point was increased from j-1 to j on the previous step, it was definitely increased at some time before the current step -- so let's call the range that triggered this end point increase (k, j-1). Clearly sum(k, j-1) < n, since this was (by definition) the range that caused us to increase the end point from j-1 to j; and just as clearly k <= i, since we only process ranges in increasing order of their start points. Since i >= k, sum(i, j-1) is just the same as sum(k, j-1) but with zero or more terms removed from the left end, and all of these terms are positive, so it must be that sum(i, j-1) <= sum(k, j-1) < n.
So we have established that whenever we increase i to i+1, we know that sum(i, j-1) < n. To finish the analysis of this rule, what we (again) need to make use of is that dropping terms from either end of this sum can't make it any bigger. Removing the first term leaves us with sum(i+1, j-1) <= sum(i, j-1) < n. Starting from that sum and successively removing terms from the other end leaves us with sum(i+1, j-2), sum(i+1, j-3), ..., sum(i+1, i+1), all of which we know must be less than n -- that is, none of the ranges corresponding to these sums can be valid solutions. Therefore we can safely avoid considering them in the first place, and that's exactly what the algorithm does.
One final potential stumbling block is that it might seem that, since we are advancing two loop indexes, the time complexity should be O(m^2). But notice that every time through the loop body, we advance one of the indexes (i or j) by one, and we never move either of them backwards, so if we are still running after 2m loop iterations we must have i + j = 2m. Since neither index can ever exceed m, the only way for this to hold is if i = j = m, which means that we have reached the end: i.e. we are guaranteed to terminate after at most 2m iterations.
The fact that the primes have to be consecutive allows us to solve this problem quite efficiently in terms of n. Suppose we have previously computed all the primes less than or equal to n. Then we can easily compute sum(i), the sum of the first i primes.
Having this function precomputed, we can loop over the primes less or equal than n and see whether there exists a length such that starting with that prime we can sum up to n. But notice that for a fixed starting prime, the sequence of sums is monotone, so we can binary search over the length.
Thus, let k be the number of primes less than or equal to n. Precomputing the sums has cost O(k) and the loop has cost O(k log k), dominating the total. Using the Prime Number Theorem, we know that k = O(n/log n), and then the whole algorithm has cost O((n/log n) log(n/log n)) = O(n).
Let me put the code in C++ to make it clearer; hope there are no bugs:
#include <iostream>
#include <vector>
using namespace std;

typedef long long ll;

int main() {
    //Get the limit for the numbers
    int MAX_N;
    cin >> MAX_N;
    //Compute the primes less or equal than MAX_N
    vector<bool> is_prime(MAX_N + 1, true);
    for (int i = 2; i*i <= MAX_N; ++i) {
        if (is_prime[i]) {
            for (int j = i*i; j <= MAX_N; j += i) is_prime[j] = false;
        }
    }
    vector<int> prime;
    for (int i = 2; i <= MAX_N; ++i) if (is_prime[i]) prime.push_back(i);
    //Compute the prefixed sums
    vector<ll> sum(prime.size() + 1, 0);
    for (int i = 0; i < (int)prime.size(); ++i) sum[i + 1] = sum[i] + prime[i];
    //Get the number of queries
    int n_queries;
    cin >> n_queries;
    for (int z = 1; z <= n_queries; ++z) {
        int n;
        cin >> n;
        //Solve the query
        bool found = false;
        for (int i = 0; i < (int)prime.size() and prime[i] <= n and not found; ++i) {
            //Do binary search over the length of the sum:
            //For all x < ini, [i, x] sums <= n
            int ini = i, fin = int(prime.size()) - 1;
            while (ini <= fin) {
                int mid = (ini + fin)/2;
                ll value = sum[mid + 1] - sum[i]; // ll, not int: avoids overflow
                if (value <= n) ini = mid + 1;
                else fin = mid - 1;
            }
            //Check the candidate of the binary search
            int candidate = ini - 1;
            if (candidate >= i and sum[candidate + 1] - sum[i] == n) {
                found = true;
                cout << n << " =";
                for (int j = i; j <= candidate; ++j) {
                    cout << " ";
                    if (j > i) cout << "+ ";
                    cout << prime[j];
                }
                cout << endl;
            }
        }
        if (not found) cout << "No solution" << endl;
    }
}
Sample input:
1000
5
12
20
28
17
29
Sample output:
12 = 5 + 7
No solution
28 = 2 + 3 + 5 + 7 + 11
17 = 2 + 3 + 5 + 7
29 = 29
I'd start by noting that for a pair of consecutive primes to sum to the number, one of the primes must be less than N/2, and the other prime must be greater than N/2. For them to be consecutive primes, they must be the primes closest to N/2, one smaller and the other larger.
If you're starting with a table of prime numbers, you basically do a binary search for N/2. Look at the primes immediately larger and smaller than that. Add those numbers together and see if they sum to your target number. If they don't, then it can't be the sum of two consecutive primes.
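As a sketch, under the assumption that primes is a sorted table of all primes below n (the names are mine):

#include <algorithm>
#include <vector>

// true iff n is the sum of two consecutive primes from the table
bool sumOfTwoConsecutivePrimes(long long n, const std::vector<int>& primes) {
    // first prime strictly greater than n/2
    auto hi = std::upper_bound(primes.begin(), primes.end(), n / 2);
    if (hi == primes.begin() || hi == primes.end()) return false;
    auto lo = hi - 1;                   // last prime <= n/2
    return (long long)*lo + *hi == n;   // must be the two primes nearest n/2
}

For n = 12 it finds 5 and 7 and returns true; for n = 20 it finds 7 and 11, which sum to 18, so it returns false (20 = 3 + 17 uses non-consecutive primes).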
If you don't start with a table of primes, it works out pretty much the same way: you still start from N/2 and find the next larger prime (we'll call that prime1). Then you compute N - prime1 to get a candidate for prime2. Check whether that's prime, and if it is, search the range prime2..N/2 to see whether there is another prime in between. If there is, your number is a sum of non-consecutive primes; if there's no other prime in that range, then it is a sum of consecutive primes.
The same basic idea applies for sequences of 3 or more primes, except that (of course) your search starts from N/3 (or whatever number of primes you want to sum to get to the number).
So, for three consecutive primes to sum to N, 2 of the three must be the first prime smaller than N/3 and the first prime larger than N/3. So, we start by finding those, then compute N-(prime1+prime2). That gives us our third candidate. We know these three numbers sum to N. We still need to prove that this third number is a prime. If it is prime, we need to verify that it's consecutive to the other two.
To give a concrete example, for 10 we'd start from 3.333. The next smaller prime is 3 and the next larger is 5. Those add to 8. 10-8 = 2. 2 is prime and consecutive to 3, so we've found the three consecutive primes that add to 10.
There are some other refinements you can make as well. The most obvious would be based on the fact that all primes (other than 2) are odd numbers. Therefore (assuming we can ignore 2), an even number can only be the sum of an even number of primes, and an odd number can only be a sum of an odd number of primes. So, given 123456789, we know immediately that it can't possibly be the sum of 2 (or 4, 6, 8, 10, ...) consecutive primes, so the only candidates to consider are 3, 5, 7, 9, ... primes. Of course, the opposite works as well: given, say, 12345678, the simple fact that it's even lets us immediately rule out the possibility that it could be the sum of 3, 5, 7 or 9 consecutive primes; we only need to consider sequences of 2, 4, 6, 8, ... primes. We violate this basic rule only when we get to a large enough number of primes that we could include 2 as part of the sequence.
I haven't worked through the math to figure out exactly how many that would be for a given number, but I'm pretty sure it should be fairly easy, and it's something we want to know anyway (because it's the upper limit on the number of consecutive primes to look for, for a given number). If we use M for the number of primes, the limit should be approximately M <= sqrt(N), but that's definitely only an approximation.
I know that this question is a little old, but I cannot refrain from replying to the analysis made in the previous answers. Indeed, it has been emphasized that all three proposed algorithms have a run-time that is essentially linear in n. But in fact, it is not difficult to produce an algorithm that runs at a strictly smaller power of n.
To see how, let us choose a parameter K between 1 and n and suppose that the primes we need are already tabulated (if they must be computed from scratch, see below). Then, here is what we are going to do, to search a representation of n as a sum of k consecutive primes:
First we search for k<K using the idea present in the answer of Jerry Coffin; that is, we search k primes located around n/k.
Then to explore the sums of k>=K primes we use the algorithm explained in the answer of Dialecticus; that is, we begin with a sum whose first element is 2, then we advance the first element one step at a time.
The first part, which concerns short sums of big primes, requires O(log n) operations to binary search for one prime close to n/k, and then O(k) operations to find the other k primes (there are a few simple possible implementations). Summed over all k < K, this makes a running time
R_1 = O(K^2) + O(K log n).
The second part, which is about long sums of small primes, requires us to consider sums of consecutive primes p_1 < ... < p_k where the first element is at most n/K.
Thus it requires visiting at most n/K + K primes (one can actually save a log factor via a weak version of the prime number theorem). Since the algorithm visits every prime at most O(1) times, the running time is
R_2 = O(n/K) + O(K).
Now, if log n < K < \sqrt n, the first part runs in O(K^2) operations and the second part in O(n/K). We optimize with the choice K = n^{1/3}, so that the overall running time is
R_1 + R_2 = O(n^{2/3}).
If the primes are not tabulated
If we also have to find the primes, here is how we do it.
First we use Eratosthenes' sieve, which in C_2 = O(T log log T) operations finds all the primes up to T, where T = O(n/K) is the upper bound for the small primes visited in the second part of the algorithm.
In order to perform the first part of the algorithm we need, for every k<K, to find O(k) primes located around n/k. The Riemann hypothesis implies that there are at least k primes in the interval [x,x+y] if y>c log x (k+\sqrt x) for some constant c>0. Therefore a priori we need to find the primes contained in an interval I_k centered at n/k with width |I_k|= O(k log n)+O(\sqrt {n/k} log n).
Using the sieve of Eratosthenes to sieve the interval I_k requires O(|I_k| log log n) + O(\sqrt n) operations. If k < K < \sqrt n we get a time complexity C_1 = O(\sqrt n log n log log n) for every k < K.
Summing up, the time complexity C_1+C_2+R_1+R_2 is maximized when
K = n^{1/4} / (log n \sqrt{log log n}).
With this choice we have the sublinear time complexity
R_1 + R_2 + C_1 + C_2 = O(n^{3/4} \sqrt{log log n}).
If we do not assume the Riemann Hypothesis we will have to search larger intervals, but we still end up with a sublinear time complexity. If instead we assume stronger conjectures on prime gaps, we may only need to search intervals I_k with width |I_k| = k (log n)^A for some A > 0. Then, instead of Eratosthenes, we can use other deterministic primality tests. For example, suppose that you can test a single number for primality in O((log n)^B) operations, for some B > 0.
Then you can search the interval I_k in O(k (log n)^{A+B}) operations. In this case the optimal K is still K \approx n^{1/3}, up to logarithmic factors, and so the total complexity is O(n^{2/3} (log n)^D) for some D > 0.

Find triplets in better than linear time such that A[n-1] >= A[n] <= A[n+1]

A sequence of numbers was given in an interview, such that A[0] >= A[1] and A[N-1] >= A[N-2]. I was asked to find at least one triplet such that A[n-1] >= A[n] <= A[n+1].
I tried to solve it by iterating, but the interviewer expected a better-than-linear-time solution. How should I approach this question?
Example: 9 8 5 4 3 2 6 7
Answer: 3 2 6
We can solve this in O(log n) time using divide & conquer, a.k.a. binary search: better than linear time. We need to find a triplet such that A[n-1] >= A[n] <= A[n+1].
First find the mid of the given array. If the mid element is smaller than (or equal to) both its left and right neighbours, then return: that's your answer. Incidentally this is a base case of the recursion. Also, if len(arr) < 3, return: another base case.
Now the recursion scenarios. If the mid element is greater than the element on its left, that pair is already rising, so recurse on the left half (start to mid). In tangible terms: if at this point we see ...26..., with the 6 at mid, we move left to check whether the element to the left of 2 completes the triplet.
Otherwise, if the mid element is greater than the element on its right, the array is still falling there, so recurse on the right half (mid+1 to end).
More theory: the above should be sufficient to understand the problem, but read on. The problem essentially boils down to finding a local minimum: a number in the array is called a local minimum if it is smaller than both its left and right numbers, which is precisely A[n-1] >= A[n] <= A[n+1].
An array whose first 2 elements are decreasing and whose last 2 elements are increasing HAS to have a local minimum. Why is that? Let's prove it by contradiction. If the first two numbers are decreasing and there is no local minimum, then the 3rd number must be less than the 2nd (otherwise the 2nd would be a local minimum). By the same logic the 4th must be less than the 3rd, and so on, so the whole array would have to be decreasing, which violates the constraint that the last two numbers are increasing. This proves by contradiction that there must be a local minimum.
The above theory suggests an O(n) linear approach, but we can definitely do better. It does, however, give us a useful perspective on the problem.
Code: Here's Python code (fyi, it was typed freehand in the Stack Overflow text editor, so it might misbehave).
def local_minima(arr, start, end):
    # search arr[start:end]; the array starts decreasing and ends increasing
    if end - start < 1:
        return -1
    mid = (start + end) // 2                      # integer division
    if mid - 1 < 0 or mid + 1 >= len(arr):
        return -1                                 # no room for a triplet
    if arr[mid-1] >= arr[mid] <= arr[mid+1]:      # found it!
        return mid
    if arr[mid] > arr[mid-1]:                     # rising here: a minimum lies to the left
        return local_minima(arr, start, mid)
    else:                                         # still falling: a minimum lies to the right
        return local_minima(arr, mid + 1, end)
Note that I just return the index n of the minimum; to print out the triplet, take the elements at the returned index -1 and +1.
It sounds like what you're asking is this:
You have a sequence of numbers. It starts decreasing and continues to decrease until element n, then it starts increasing until the end of the sequence. Find n.
This is a (non-optimal) solution in linear time:
for (i = 1; i < length(A) - 1; i++)
{
    if ((A[i-1] >= A[i]) && (A[i] <= A[i+1]))
        return i;
}
To do better than linear time, you need to use the information that you get from the fact that the series decreases then increases.
Consider the difference between A[i] and A[i+1]. If A[i] > A[i+1], then n > i, since the values are still decreasing. If A[i] <= A[i+1], then n <= i, since the values are now increasing. In this case you need to check the difference between A[i-1] and A[i].
This is a solution in log time:
int boundUpper = length(A) - 1;
int boundLower = 1;
int i = (boundUpper + boundLower) / 2; //initial estimate
while (true)
{
    if (A[i] > A[i+1])
        boundLower = i + 1;
    else if (A[i-1] >= A[i])
        return i;
    else
        boundUpper = i;
    i = (boundLower + boundUpper) / 2;
}
I'll leave it to you to add in the necessary error check in the case that A does not have an element satisfying the criteria.
Linear time you could get by just iterating through the set, comparing as you go.
You could also check the slope of the first two, then do a kind of binary chop/in order traversal comparing pairs until you find one of the opposite slope. That would amortize to a better than n time, I think, though it's not guaranteed.
edit: just realised what your ordering meant. The binary chop method is guaranteed to do this in <n time, as there is guaranteed to be a point of change (assuming that your N-1, N-2 are the last two elements of the list).
This means you just need to find it/one of them, in which case binary chop will do it in order log(n)

Reduce a sequence in most optimal way

We are given a sequence a of n numbers. The reduction of sequence a is defined as replacing the elements a[i] and a[i+1] with max(a[i],a[i+1]).
Each reduction operation has a cost defined as max(a[i],a[i+1]). After n-1 reductions a sequence of length 1 is obtained.
Now our goal is to print the cost of the optimal reduction of the given sequence a, i.e. the minimum total cost of reducing it to length 1.
e.g.:
1
2
3
Output:
5
An O(N^2) solution is trivial. Any ideas?
EDIT1:
People are asking about my idea, so: my idea was to traverse the sequence pairwise, check the cost of reducing each pair, and at the end reduce the pair with the least cost.
1 2 3
2 3 <=== Cost is 2
So reduce above sequence to
2 3
now again traverse through sequence, we get cost as 3
2 3
3 <=== Cost is 3
So total cost is 2+3=5
Above algorithm is of O(N^2). That is why I was asking for some more optimized idea.
O(n) solution:
High-level:
The basic idea is to repeatedly merge any element e smaller than both its neighbours n_s and n_l (n_s the smaller, n_l the larger) with n_s. This produces the minimal cost because both the cost and the result of a merge equal max(a[i], a[i+1]): no merge can make an element smaller than it currently is, so the cheapest possible merge for e is with n_s, and that merge can't increase the cost of any other possible merge.
This can be done with a one pass algorithm by keeping a stack of elements from our array in decreasing order. We compare the current element to both its neighbours (one being the top of the stack) and perform appropriate merges until we're done.
Pseudo-code:
stack = empty
for pos = 0 to length
    // stack.top > arr[pos] is implicitly true because of the previous iteration of the loop
    if stack.top > arr[pos] > arr[pos+1]
        stack.push(arr[pos])
    else if stack.top > arr[pos+1] > arr[pos]
        merge(arr[pos], arr[pos+1])
    else while arr[pos+1] > stack.top > arr[pos]
        merge(arr[pos], stack.pop)
Java code:
Stack<Integer> stack = new Stack<Integer>();
int cost = 0;
int arr[] = {10,1,2,3,4,5};
for (int pos = 0; pos < arr.length; pos++) {
    if (pos < arr.length-1 && (stack.empty() || stack.peek() >= arr[pos+1])) {
        if (arr[pos] > arr[pos+1])
            stack.push(arr[pos]);
        else
            cost += arr[pos+1]; // merge pos and pos+1
    } else {
        int last = Integer.MAX_VALUE; // required otherwise a merge may be missed
        while (!stack.empty() && (pos == arr.length-1 || stack.peek() < arr[pos+1])) {
            last = stack.peek();
            cost += stack.pop(); // merge stack.pop() and pos or the last popped item
        }
        if (last != Integer.MAX_VALUE) {
            int costTemp = Integer.MAX_VALUE;
            if (!stack.empty())
                costTemp = stack.peek();
            if (pos != arr.length-1)
                costTemp = Math.min(arr[pos+1], costTemp);
            if (costTemp != Integer.MAX_VALUE) // guard: at the very end there is no neighbour left
                cost += costTemp;              // merge the merged run with its cheaper neighbour
        }
    }
}
System.out.println(cost);
I am confused whether by "cost" of reduction you mean computational cost, i.e. an operation taking time max(a[i],a[i+1]), or simply a quantity you want to calculate. If it is the latter, then the following algorithm is better than O(n^2):
sort the list, or more precisely, define b[i] s.t. a[b[i]] is the sorted list: O(n) if you can use radix sort, O(n log n) otherwise.
starting from the second-lowest item i in the sorted list: if its left/right neighbour is lower than i, then perform the reduction: O(1) for each item, updating the list from step 2 as you go, O(n) in total.
I have no idea if that is the optimal solution, but it's O(n) for integers and O(n log n) otherwise.
edit: Realized that removing a precomputing step made it much simpler
If you don't consider it cheating to sort the list, then do it in n log n time and then merge the first two entries recursively. The total cost in this case will be the sum of the entries minus the smallest entry. This is optimal since
the cost will be the sum of n-1 entries (with repeats allowed)
the ith smallest entry can appear at most i-1 times in the cost function
The same fundamental idea works even if the list isn't sorted. An optimal solution is to merge the smallest element with its smallest neighbor. To see that this is optimal, note that
the cost will be the sum of n-1 entries (with repeats allowed)
entry a_i can appear at most j-1 times in the cost function, where j is the length of the longest consecutive subsequence containing a_i such that a_i is the maximum element of the subsequence
In the worst case, the sequence is decreasing and the time is O(n^2).
Greedy approach indeed works.
You can always reduce the smallest number with its smaller neighbor.
Proof: we have to reduce the smallest number at some point. Any reduction of a neighbor keeps that neighbor's value the same or makes it bigger, so the operation that reduces the minimal element a[i] will always have cost c >= min(a[i-1], a[i+1]).
Now we need to
quickly find/remove smallest number
find its neigbors
I'd go with 2 RMQs for that, doing operation 2 as a binary search, which gives us O(N * log^2(N)).
EDIT: the first RMQ holds the values; when you remove an element, put some big value there.
The second RMQ holds "presence": 0 or 1 (the value is there / isn't there). To find, for example, the left neighbor of a[i], you need to find the greatest l such that sum[l, i-1] = 1.
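For reference, here is a hedged C++ sketch of that greedy using a plainer structure than two RMQs: an ordered set for find/remove-smallest plus prev/next index arrays for the neighbor lookups, which gives O(N log N). The input array is just an example of mine:

#include <climits>
#include <iostream>
#include <set>
#include <vector>
using namespace std;

int main() {
    vector<long long> a = {1, 2, 3};          // example input; answer is 5
    int n = a.size();
    vector<int> prv(n), nxt(n);               // doubly linked list over indices
    set<pair<long long,int>> live;            // (value, index), smallest first
    for (int i = 0; i < n; ++i) {
        prv[i] = i - 1;
        nxt[i] = i + 1;
        live.insert({a[i], i});
    }
    long long cost = 0;
    while ((int)live.size() > 1) {
        int i = live.begin()->second;         // smallest remaining element
        live.erase(live.begin());
        long long left  = (prv[i] >= 0) ? a[prv[i]] : LLONG_MAX;
        long long right = (nxt[i] <  n) ? a[nxt[i]] : LLONG_MAX;
        cost += min(left, right);             // reduce with the smaller neighbor
        // unlink i; the surviving neighbor keeps its (larger) value
        if (prv[i] >= 0) nxt[prv[i]] = nxt[i];
        if (nxt[i] <  n) prv[nxt[i]] = prv[i];
    }
    cout << cost << endl;                     // prints 5 for {1,2,3}
}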

Variant of rod cutting

You are given a wooden stick of length X with m markings on it at arbitrary (integral) places, and the markings suggest where the cuts are to be made. For chopping an L-length stick
into two pieces, the carpenter charges L dollars (does not matter whether the two pieces are of equal length or not, i.e, the chopping cost is independent of where the chopping point is).
Design a dynamic programming algorithm that calculates the minimum overall cost.
Couldn't figure out the recurrence.
Was asked this in a recent programming interview.
With m marks, you have m+2 interesting points, 0 = left end-point, marks 1, ..., m, right end-point = (m+1).
Store the distance from interesting point 0 to interesting point i in an array to calculate costs.
Edit: (Doh, gratuitously introduced an unnecessary loop, noticed after seeing Per's answer again)
For each 0 <= l < r <= m+1, let cost[l][r] be the lowest cost for completely chopping the piece between points l and r. The solution is cost[0][m+1].
// Setup
int pos[m+2];
pos[0] = 0; pos[m+1] = X;
for(i = 1; i <= m; ++i){
    pos[i] = position of i-th mark;
}
int cost[m+2][m+2] = {0};
for(l = 0; l < m; ++l){
    // for pieces with only one mark, there's no choice, cost is length
    cost[l][l+2] = pos[l+2]-pos[l];
}
// Now the dp
for(d = 3; d <= m+1; ++d){ // for increasing numbers of marks between left and right
    for(l = 0; l <= m+1-d; ++l){ // for all pieces needing d-1 cuts
        // what would it cost if we first chop at the left-most mark?
        best_found = cost[l+1][l+d];
        for(i = l+2; i < l+d; ++i){ // for all choices of first cut, see if it's cheaper
            if (cost[l][i] + cost[i][l+d] < best_found){
                best_found = cost[l][i] + cost[i][l+d];
            }
        }
        // add cost of first chop
        cost[l][l+d] = (pos[l+d] - pos[l]) + best_found;
    }
}
return cost[0][m+1];
Complexity: If you're naively checking all possible ways to chop, that'd make m! ways. Very bad.
Taking into account that after any cut it doesn't matter whether you first completely chop the left part, then the right or interleave the chopping of the two parts, the complexity is (for m >= 2) reduced to 2*3^(m-2). Still very bad.
For our dp:
innermost loop over i: d-1 choices of first cut (l < i < l+d)
loop over l: m+2-d values (0 <= l <= m+1-d), making (m+2-d)*(d-1) steps
outermost loop over d: 3 <= d <= m+1, making roughly m^3/6 steps in total.
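For the record, the exact count is sum_{d=3}^{m+1} (m+2-d)(d-1) = sum_{e=2}^{m} e*(m+1-e) (substituting e = d-1), whose leading terms are m^3/2 - m^3/3 = m^3/6.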
Okay, O(m^3) isn't everybody's dream, but it's the best I could quickly come up with (after some inspiration from Per's post, which made me notice a prior inefficiency).
Compute, for every pair of (marking|endpoint)s, the cheapest way to cut that segment of the rod. For each segment, minimize over the choice of first cuts in that segment.
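A hedged top-down sketch of exactly that recurrence (the example stick, marks, and names are mine, not from the question):

#include <climits>
#include <iostream>
#include <vector>
using namespace std;

vector<int> pos;                  // pos[0]=0, pos[1..m]=marks, pos[m+1]=X
vector<vector<long long>> memo;   // -1 marks "not computed yet"

// cheapest way to completely chop the piece between points l and r
long long cost(int l, int r) {
    if (r - l < 2) return 0;              // no mark strictly inside: no cut
    long long &res = memo[l][r];
    if (res >= 0) return res;
    long long best = LLONG_MAX;
    for (int i = l + 1; i < r; ++i)       // choice of first cut in this piece
        best = min(best, cost(l, i) + cost(i, r));
    return res = (pos[r] - pos[l]) + best;  // pay this piece's length once
}

int main() {
    int X = 10;
    vector<int> marks = {2, 4, 7};        // hypothetical marks
    pos.push_back(0);
    for (int m : marks) pos.push_back(m);
    pos.push_back(X);
    int P = pos.size();
    memo.assign(P, vector<long long>(P, -1));
    cout << cost(0, P - 1) << endl;       // prints 20 for this stick
}

Memoization means each (l, r) pair is solved once with O(m) work inside it, the same O(m^3) as the bottom-up table above.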
I am guessing that you want the carpenter to cut once, then keep cutting until the sticks are all cut, and you are asking for the order in which to make the cuts?
In that case, one way to do this is a depth-first recursive search through the tree of possible cut sequences, counting the cost as you go: record the first complete sequence and its cost, and from then on avoid descending into subtrees whose running cost is already higher, recording any cheaper sequences that you find.
