Difficulty in thinking a divide and conquer approach - algorithm

I am self-learning algorithms. As we know, divide and conquer is one of the algorithm design paradigms. I have studied mergeSort, QuickSort, Karatsuba multiplication, and counting inversions in an array as examples of this design pattern. Although it sounds very simple (divide the problem into subproblems, solve each subproblem recursively, and merge the results), I find it very difficult to work out how to apply that logic to a new problem. To my understanding, each of the canonical examples above relies on a very clever trick to solve the problem. For example, I am trying to solve the following problem:
Given a sequence of n numbers such that the difference between two consecutive numbers is constant, find the missing term in logarithmic time.
Example: [5, 7, 9, 11, 15]
Answer: 13
First, I came up with the idea that it can be solved using the divide and conquer approach, as the naive approach will take O(n) time. From my understanding of divide and conquer, this is how I approached it:
The original problem can be divided into two independent subproblems. I can search for the missing term in the two subproblems recursively. So, I first divide the problem.
leftArray = [5,7,9]
rightArray = [11, 15]
Now it says, I need to solve the subproblems recursively until it becomes trivial to solve. In this case, the subproblem becomes of size 1. If there is only one element, there are 0 missing elements. Now to combine the result. But I am not sure how to do it or how it will solve my original problem.
I am definitely missing something crucial here. My question is how to approach this type of divide and conquer problem. Should I be looking for a trick like the ones in mergeSort or QuickSort? The more solutions to this kind of problem I see, the more it feels like I am memorizing approaches rather than understanding them, since each problem is solved differently. Any help or suggestions regarding the mindset for solving divide and conquer problems would be greatly appreciated. I have been trying for a long time to develop my algorithmic skills, but I have improved very little. Thanks in advance.

You have the right approach. The only missing part is an O(1) way to decide which side you are discarding.
First, note that the numbers in your problem must be ordered, otherwise you can't do better than O(n). There also needs to be at least three numbers, otherwise you wouldn't figure out the "step".
With this understanding in place, you can determine the "step" in O(1) time by examining the first three terms and looking at the differences between consecutive ones. Two outcomes are possible:
1. Both differences are the same, and
2. One difference is twice as big as the other.
Case 2 hands you a solution by luck, so we will consider only the first case from now on. With the step in hand, you can determine whether a range has a gap in it by subtracting its endpoints and comparing the result to the step multiplied by the number of positions between them (last index minus first index). If the two values are equal, the range does not have a missing term and can be discarded. When both halves can be discarded, the gap is between them.
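The steps above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the input is sorted, has at least three numbers, and exactly one interior term is missing. The names has_gap, find_missing and missing_term are made up for illustration. Instead of treating case 2 separately, the sketch simply takes the smaller of the first two differences as the step, which covers both cases.

```python
def has_gap(a, lo, hi, step):
    """True if a[lo..hi] is missing a term (endpoint difference is too large)."""
    return a[hi] - a[lo] != (hi - lo) * step


def find_missing(a, lo, hi, step):
    """Divide a[lo..hi] into two halves and recurse into the half with the gap."""
    mid = (lo + hi) // 2
    if has_gap(a, lo, mid, step):          # gap is inside the left half
        return find_missing(a, lo, mid, step)
    if has_gap(a, mid + 1, hi, step):      # gap is inside the right half
        return find_missing(a, mid + 1, hi, step)
    # Both halves can be discarded, so the gap is between them.
    return a[mid] + step


def missing_term(a):
    """Assumes a is sorted, len(a) >= 3, and exactly one interior term is missing."""
    d1, d2 = a[1] - a[0], a[2] - a[1]
    step = d1 if abs(d1) <= abs(d2) else d2   # the smaller difference is the real step
    return find_missing(a, 0, len(a) - 1, step)


print(missing_term([5, 7, 9, 11, 15]))
```

Each call does O(1) work and recurses into at most one half, so the running time is O(log n).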

As @Sergey Kalinichenko points out, this assumes the incoming set is ordered.
However, if you're certain the input is ordered (which is likely in this case), observe that, as long as nothing is missing before it, the value at position index is start + jumpsize * index; this allows you to bisect to find where the values shift.
Example: [5, 7, 9, 11, 15]
Answer: 13
start = 5
jumpsize = 2
check midpoint (index 2): 5 + 2 * 2 -> 9
this is valid, so the shift must be after the midpoint
recurse
You can find the jumpsize by checking the first 3 values
a, b, c = (language-dependent retrieval)
gap1 = b - a
gap2 = c - b
if gap1 != gap2:
    if (4th value) - c == gap1:
        missing value is b + gap1  # 2nd gap doesn't match
    else:
        missing value is a + gap2  # 1st gap doesn't match
else:
    bisect remaining values
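Put together as runnable Python, the bisection might look like the sketch below. It assumes a sorted list of at least three numbers with exactly one interior term missing; find_missing_by_index is a hypothetical name, and taking the smaller of the first two gaps as the jumpsize is a shortcut I chose in place of the explicit gap1 != gap2 branch.

```python
def find_missing_by_index(a):
    """Bisect on the property a[i] == start + jumpsize * i.

    Assumes a is sorted, len(a) >= 3, and exactly one interior term is missing.
    """
    start = a[0]
    # The smaller of the first two gaps is the true jumpsize.
    jumpsize = min(a[1] - a[0], a[2] - a[1], key=abs)

    lo, hi = 0, len(a) - 1   # a[lo] still fits the formula, a[hi] is shifted
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if a[mid] == start + jumpsize * mid:
            lo = mid         # valid, so the shift must be after the midpoint
        else:
            hi = mid         # already shifted, so the gap is at or before mid
    # Index hi is the first shifted position; the missing value belongs there.
    return start + jumpsize * hi


print(find_missing_by_index([5, 7, 9, 11, 15]))
```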

Related

Find minimum steps to convert all elements to zero

You are given an array of positive integers of size N. You can choose any positive number x such that x<=max(Array) and subtract it from all elements of the array greater than or equal to x.
This operation has a cost A[i]-x for each A[i]>=x. The total cost for a particular step is sum(A[i]-x) over those elements. A step is only valid if sum(A[i]-x) is less than or equal to a given number K.
Using only valid steps, find the minimum number of steps needed to make all elements of the array zero.
0<=i<10^5
0<=x<=10^5
0<k<10^5
Can anybody help me with any approach? DP will not work due to high constraints.
Just some general exploratory thoughts.
First, there should be a constraint on N. If N is 3, this is much easier than if it is 100. The naive brute force approach is going to be O(k^N)
Next, you are right that DP will not work with these constraints.
For a greedy approach, I would want to minimize the number of distinct non-zero values, and not maximize how much I took. Our worst case approach is take out the largest each time, for N steps. If you can get 2 pairs of entries to both match, then that shortened our approach.
The obvious thing to try if you can is an A* search. However that requires a LOWER bound (not upper). The best naive lower bound that I can see is ceil(log_2(count_distinct_values)). Unless you're incredibly lucky and the problem can be solved that quickly, this is unlikely to narrow your search enough to be helpful.
I'm curious what trick makes this problem actually doable.
I do have an idea. But it is going to take some thought to make it work. Naively we want to take each choice for x and explore the paths that way. And this is a problem because there are 10^5 choices for x. After 2 choices we have a problem, and after 3 we are definitely not going to be able to do it.
BUT instead consider the possible orders of the array elements (with ties both possible and encouraged) and the resulting inequalities on the range of choices that could have been made. Now, instead of having to store 10^5 choices of x, we only need to store the distinct orderings we get, and what inequalities there are on the range of choices that get us there. As long as N < 10, the number of weak orderings is something that we can deal with if we're clever.
It would take a bunch of work to flesh out this idea though.
I may be totally wrong, and if so, please tell me and I'm going to delete my thoughts: maybe there is an opportunity if we translate the problem into another form?
You are given an array A of positive integers of size N.
Calculate the histogram H of this array.
The highest populated slot of this histogram has index m ( == max(A)).
Find the shortest sequence of selections of x for:
Select an index x <= m which satisfies sum(H[i]*(i-x)) <= K for i = x+1 .. m (search for suitable x starts from m down)
Add H[x .. m] to H[0 .. m-x]
Set the new m as the highest populated index in H[0 .. x-1] (we ignore everything from H[x] up)
Repeat until m == 0
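To make one reduction step on the histogram concrete, here is a small Python sketch. It only applies a single step for a given x and does not search for the shortest sequence. The names build_histogram and histogram_step are invented for illustration, and the sketch moves every slot from x upward down by x instead of truncating the histogram, since an element larger than 2x would otherwise still sit above x after the step.

```python
from collections import Counter


def build_histogram(a):
    """h[v] = number of elements of a that are equal to v."""
    counts = Counter(a)
    return [counts[v] for v in range(max(a) + 1)]


def histogram_step(h, x, k):
    """Apply one step with value x; return the new histogram, or None if invalid."""
    # Cost of the step: every element with value >= x contributes (value - x).
    cost = sum(count * (value - x) for value, count in enumerate(h) if value >= x)
    if cost > k:
        return None
    new_h = [0] * len(h)
    for value, count in enumerate(h):
        # Elements >= x move down by x, all others stay where they are.
        new_h[value - x if value >= x else value] += count
    return new_h


h = build_histogram([3, 5, 2])
print(h)                         # histogram of the array
print(histogram_step(h, 3, 10))  # histogram after subtracting 3 where possible
```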
If only a "good" but not necessarily optimal solution is sought, I could imagine that some kind of spectral analysis of H could hint towards favorable x selections, so that maxima in the histogram pile upon other maxima in the reduction step.

Big O runtime analysis for 3-way recursion with memoization

I'm doing some practice interview questions and came across this one:
Given a list of integers which represent hedge heights, determine the minimum number of moves to make the hedges pretty - that is, compute the minimum number of changes needed to make the array alternate between increasing and decreasing. For example, [1,6,6,4,4] should return 2 as you need to change the second 6 to something >6 and the last 4 to something <4. Assume the min height is 1 and the max height is 9. You can change to any number that is between 1 and 9, and that counts as 1 move regardless of the diff to the current number.
My solution is here: https://repl.it/#plusfuture/GrowlingOtherEquipment
I'm trying to figure out the big O runtime for this solution, which is memoized recursion. I think it's O(n^3) because for each index, I need to check against 3 possible states for the rest of the array, changeUp, noChange, and changeDown. My friend maintains that it's O(n) since I'm memoizing most of the solutions and exiting branches where the array is not "pretty" immediately.
Can someone help me understand how to analyze the runtime for this solution? Thanks.

Sorting algorithm based on subset inversion

I'm looking for a sorting algorithm based on subset inversion. It's like pancake sort, only instead of taking all the pancakes on top of the spatula, you can just invert any subset you want. Length of the subset doesn't matter.
Like this:
http://www.yourgenome.org/sites/default/files/illustrations/diagram/dna_mutations_inversion_yourgenome.png
So we can't simply swap numbers without inverting everything in between.
We're doing this to determine how one subspecies of fruitfly can mutate into the other. Both have the same genes but in a different order. The second subspecies' genome is 'sorted', i.e. the gene numbers are 1-25. The first subspecies genome is unsorted. Hence, we're looking for a sorting algorithm.
This is the "genome" we're looking at (though we should be able to have this work on all lists of numbers):
[23, 1, 2, 11, 24, 22, 19, 6, 10, 7, 25, 20, 5, 8, 18, 12, 13, 14, 15, 16, 17, 21, 3, 4, 9];
We're looking at two separate problems:
1) To sort a list of 25 numbers with the fewest inversions
2) To sort a list of 25 numbers with the fewest numbers moved
We also want to establish both upper and lower bounds for both.
We've already found a way to sort like this by just going from left to right, searching for the next lowest value and inverting everything in between, but we're absolutely certain we should be able to do this faster. However, we still haven't found any other methods so I'm asking for your help!
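For reference, a minimal Python sketch of that left-to-right method could look like this. It is my reading of the description, not the original poster's code: greedy_reversal_sort is a hypothetical name, and the (left, right) pairs it returns are simply the inclusive index ranges that were inverted.

```python
GENOME = [23, 1, 2, 11, 24, 22, 19, 6, 10, 7, 25, 20, 5,
          8, 18, 12, 13, 14, 15, 16, 17, 21, 3, 4, 9]


def greedy_reversal_sort(perm):
    """Sort by repeatedly inverting the segment that brings the next lowest value into place."""
    seq = list(perm)
    reversals = []
    for left in range(len(seq)):
        # Position of the lowest value that is not yet in its final place.
        right = seq.index(min(seq[left:]), left)
        if right != left:
            seq[left:right + 1] = seq[left:right + 1][::-1]
            reversals.append((left, right))
    return seq, reversals


sorted_genome, reversals = greedy_reversal_sort(GENOME)
print(len(reversals), "inversions used")
```

This never needs more than n - 1 inversions for a list of n numbers, which gives a simple upper bound for the first problem.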
UPDATE: the method we currently use is based on the above method but instead works both ways. It looks at the next elements needed for both ends (e.g. 1 and 25 at the beginning) and then calculates which inversion would be cheapest. All values at the ends can be ignored for the rest of the algorithm because they get put into the correct place immediately. Our first method took 18/19 steps and 148 genes, and this one does it in 17 steps and 101 genes. For both optimisation goals (the two mentioned above), this is a better method. It is, however, not cheaper in terms of code and processing.
Right now, we're working in Python because we have most experience with that, but I'd be happy with any pseudocode ideas on how we can more efficiently tackle this. If you think another language might be better suited, please let me know. Pseudocode, ideas, thoughts and actual code are all welcome!
Thanks in advance!
Regarding the first question: Do you know (and care about) which of the two strands the genes are on?
If so, you're in luck: This is called the inversion distance between signed permutations problem, and there is a linear-time algorithm for it: http://www.ncbi.nlm.nih.gov/pubmed/11694179. I haven't looked at the details.
If not, then unfortunately (as described on p. 2 of that paper) the problem is NP-hard, so it's very unlikely that any algorithm exists that is efficient (polynomial-time) in the worst case.
Regarding the second question: Assuming you mean that you want to find the minimum number of swaps needed to sort a list of numbers, you should be able to find solutions to this by searching here on SO and elsewhere. I think this is a clear and concise explanation. You can also use the optimal solution to this problem to get an upper bound for your first question: Any swap of positions i and j can be simulated using the two interval reversals (i, j) and (i+1, j-1). (This upper bound might be very bad, though, and in particular could be worse than your existing greedy algorithm.)
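The claim that a swap can be simulated by two interval reversals is easy to check with a few lines of Python. The helper names reverse_interval and swap_via_reversals are made up for this illustration; positions are zero-based and intervals are inclusive.

```python
def reverse_interval(seq, i, j):
    """Return a copy of seq with positions i..j (inclusive) reversed."""
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]


def swap_via_reversals(seq, i, j):
    """Swap positions i < j using the reversals (i, j) and (i+1, j-1)."""
    out = reverse_interval(seq, i, j)
    # The inner reversal restores everything strictly between i and j.
    # For adjacent positions the inner interval is empty and nothing changes.
    return reverse_interval(out, i + 1, j - 1)


print(swap_via_reversals([0, 1, 2, 3, 4, 5], 1, 4))
```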
I think what you're looking for for the second question is the minimum number of swaps of adjacent elements to sort a sequence, which is equal to the number of inversions in the sequence (where a[i] > a[j] and i < j).
The first question seems quite a bit more complicated to me. One potential heuristic might be to think of the subset inversion as similar to the adjacent swap of more than one element. For example, if you've managed to get a sequence to this position,
5,6,1,2,3,4,7,8
we can "adjacent swap" indexes [0,1] with [2,3] (so inverting [0,1,2,3]),
2,1,6,5,3,4,7,8
and then [2,3] with [4,5] (inverting [2,3,4,5]),
2,1,4,3,5,6,7,8
and arrive at a sequence that now has significantly fewer element inversions, meaning fewer single adjacent swaps are now needed to complete the sort.
So maybe attempting to quantify inversions (in the sense of a[i] > a[j] and i < j) of sections rather than single elements could help move in the direction of estimating or building a method for the first question.
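As a quick check of the example above, the following Python sketch replays the two subset inversions and counts element inversions (a[i] > a[j] with i < j) after each one. The function names are mine, and the quadratic counting loop is only meant for short lists like these.

```python
def reverse_segment(seq, i, j):
    """Return a copy of seq with positions i..j (inclusive) reversed."""
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]


def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j] (simple O(n^2) count)."""
    return sum(1
               for i in range(len(seq))
               for j in range(i + 1, len(seq))
               if seq[i] > seq[j])


start = [5, 6, 1, 2, 3, 4, 7, 8]
step1 = reverse_segment(start, 0, 3)   # invert [0,1,2,3]
step2 = reverse_segment(step1, 2, 5)   # invert [2,3,4,5]

for seq in (start, step1, step2):
    print(seq, count_inversions(seq))  # inversion counts: 8, then 6, then 2
```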

Dynamic algorithm to multiply elements in a sequence two at a time and find the total

I am trying to find a dynamic approach to multiply each element in a linear sequence by the following element, do the same with the next pair of elements, etc., and find the sum of all of the products. Note that not just any two elements can be multiplied. It must be the first with the second, the third with the fourth, and so on. All I know about the linear sequence is that there is an even number of elements.
I assume I have to store the numbers being multiplied, and their product each time, then check some other "multipliable" pair of elements to see if the product has already been calculated (perhaps they possess opposite signs compared to the current pair).
However, by my understanding of a linear sequence, the values must be increasing or decreasing by the same amount each time. But since there is an even number of elements, I don't believe it is possible to have two "multipliable" pairs be the same (with potentially opposite signs), due to the issue shown in the following example:
Sequence: { -2, -1, 0, 1, 2, 3 }
Pairs: -2*-1, 0*1, 2*3
Clearly, since there are an even amount of pairs, the only case in which the same multiplication may occur more than once is if the elements are increasing/decreasing by 0 each time.
I fail to see how this is a dynamic programming question, and if anyone could clarify, it would be greatly appreciated!
A quick google for define linear sequence gave
A number pattern which increases (or decreases) by the same amount each time is called a linear sequence. The amount it increases or decreases by is known as the common difference.
In your case the common difference is 1. And you are not considering any other case.
The same multiplication may occur in the following sequence
Sequence = {-3, -1, 1, 3}
Pairs = -3 * -1 , 1 * 3
with a common difference of 2.
However, this does not necessarily have to be solved by dynamic programming. You can just iterate over the numbers and store the multiplication of two numbers in a set (as a set contains unique numbers) and then find the sum.
Probably not what you are looking for, but I've found a closed solution for the problem.
Suppose we observe the first two numbers. Denote the first number by a and the difference between consecutive numbers by d. Suppose the whole sequence contains a total of 2n numbers. Then the sum you defined is:
sum = na^2 + n(2n-1)ad + (4n^2 - 3n - 1)nd^2/3
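A quick way to gain confidence in this formula is to compare it against a direct summation. The Python sketch below does that for integer a and d; the function names are illustrative only. The pairs are (a + 2kd) and (a + (2k+1)d) for k = 0 .. n-1, and the integer division by 3 is exact because (4n^2 - 3n - 1)n = n(n-1)(4n+1) is always a multiple of 3.

```python
def pair_product_sum(a, d, n):
    """Directly sum the products of the n consecutive pairs of the sequence."""
    return sum((a + 2 * k * d) * (a + (2 * k + 1) * d) for k in range(n))


def closed_form(a, d, n):
    """sum = n*a^2 + n(2n-1)*a*d + (4n^2 - 3n - 1)*n*d^2 / 3, for integer a and d."""
    return n * a * a + n * (2 * n - 1) * a * d + (4 * n * n - 3 * n - 1) * n * d * d // 3


# Sequence {-2, -1, 0, 1, 2, 3}: a = -2, d = 1, n = 3 pairs
print(pair_product_sum(-2, 1, 3), closed_form(-2, 1, 3))  # both give 8
```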
That aside, I also failed to see how this is a dynamic problem, or at least this seems to be a problem where dynamic programming approach really doesn't do much. It is not likely that the sequence will go from negative to positive at all, and even then the chance that you will see repeated entries decreases the bigger your difference between two numbers is. Furthermore, multiplication is so fast the overhead from fetching them from a data structure might be more expensive. (mul instruction is probably faster than lw).

Tricky programming problem that I'm having trouble getting my head around

First off, let me say that this is not homework (I am an A-Level student, this is nothing close to what we problem solve (this is way harder)), but more of a problem I'm trying to suss out to improve my programming logic.
I thought of a scenario where there is an array of random integers, let's for example say 10 integers. The user will input a number he wants to count to, and the algorithm will try and work out what numbers are needed to make that sum. For example if I wanted to make the sum 44 from this array of integers:
myIntegers = array(1, 5, 9, 3, 7, 12, 36, 22, 19, 63);
The output would be:
36 + 3 + 5 = 44
Or something along those lines. I hope I make myself clear. As an added bonus I would like to make the algorithm pick as few numbers as possible to make the required sum, or give out an error if the sum cannot be made with the numbers supplied.
I thought about using recursion and iterating through the array, adding numbers over and over until the sum is met or gone past. But what I can't get my head around is what to do if the algorithm goes past the sum and needs to be selective about what numbers to pick from the array.
I'm not looking for complete code, or a complete algorithm, I just want your opinions on how I should proceed with this and perhaps share a few tips or something. I'll probably start work on this tonight. :P
As I said, not homework. Just me wanting to do something a bit more advanced.
Thanks for any help you're able to offer. :)
You are looking at the Knapsack Problem
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.
Edit: Your special case is the Subset Sum Problem
Will subset sum do? ;]
This is the classic Knapsack problem that you would see in college level algorithms course (or at least I saw it then). Best to work this out on paper and the solution in code should be relatively easy to work out.
EDIT: One thing to consider is dynamic programming.
Your Problem is related to the subset sum problem.
You have to try all possible combinations in the worst case.
No shortcuts here I'm afraid. In addition to what other people have said, about what specific problem this is etc., here's some practical advice to offer you a starting point:
I would sort the array and given the input sum m, would find the first number in the array less than m, call it n (this is your first possible number for the sum), and start from the highest possible complement (m-n), working your way down.
If you don't find a precise match, pick the highest available, call it o, (that now is your 2nd number) and look for the 3rd one starting from (m-n-o) and work your way down again.
If you don't find a precise match, start with the next number n (index of original n at index-1) and do the same. You can keep doing this until you find a precise match for two numbers. If no match for the sum is found for two numbers, start the process again, but expand it to include a 3rd number. And so on.
That could be done recursively. At least this approach ensures that when you find a match, it will be the one with the least possible numbers in the set forming the total input sum.
Potentially though, worst case, you end up going through the whole lot.
Edit: As Venr correctly points out, my first approach was incorrect. Edited approach to reflect this.
There is a very efficient randomized algorithm for this problem. I know you already accepted an answer, but I'm happy to share anyway, I just hope people will still check this question :).
Let Used = list of numbers that you sum.
Let Unused = list of numbers that you DON'T sum.
Let tmpsum = 0.
Let S = desired sum you want to reach.
for ( each number x you read )
    toss a coin:
    if it's heads and tmpsum < S
        add x to Used and add x to tmpsum
    else
        add x to Unused
while ( tmpsum != S )
    if tmpsum < S
        MOVE one random number from Unused to Used (adding it to tmpsum)
    else
        MOVE one random number from Used to Unused (subtracting it from tmpsum)
print the Used list, containing the numbers you need to add to get S
This will be much faster than the dynamic programming solution, especially for random inputs. The only problems are that you cannot reliably detect when there is no solution (you could let the algorithm run for a few seconds and if it doesn't finish, assume there is no solution) and that you cannot be sure you will get the solution with the minimum number of elements chosen. Again, you could add some logic to make the algorithm keep going and trying to find a solution with fewer elements until certain stop conditions are met, but this will make it slower. However, if you are only interested in a solution that works and you have a LOT of numbers and the desired sum can be VERY big, this is probably better than the DP algorithm.
Another advantage of this approach is that it will also work for negative and rational numbers with no modifications, which is not true for the DP solution, because the DP solution involves using partial sums as array indexes, and indexes can only be natural numbers. You can of course use hashtables for example, but that will make the DP solution even slower.
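A runnable Python version of the pseudocode could look roughly like this. It is a sketch, not a tuned implementation: the function name, the max_steps cap and the seed parameter are my additions, the cap standing in for the "let it run for a few seconds" idea.

```python
import random


def randomized_subset_sum(numbers, target, max_steps=100_000, seed=None):
    """Random walk over subsets; returns a list summing to target, or None if it gives up."""
    rng = random.Random(seed)
    used, unused, total = [], [], 0

    # Initial random split, as in the first loop of the pseudocode.
    for x in numbers:
        if rng.random() < 0.5 and total < target:
            used.append(x)
            total += x
        else:
            unused.append(x)

    for _ in range(max_steps):
        if total == target:
            return used
        if total < target:
            source, dest, sign = unused, used, 1    # too small: move a number in
        else:
            source, dest, sign = used, unused, -1   # too big: move a number out
        if not source:
            return None                             # nothing left to move
        x = source.pop(rng.randrange(len(source)))
        dest.append(x)
        total += sign * x
    return used if total == target else None        # gave up: no answer found in time


print(randomized_subset_sum([1, 5, 9, 3, 7, 12, 36, 22, 19, 63], 44, seed=1))
```

A None result only means that no solution was found within max_steps, not that none exists.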
I don't know exactly what this task is called, but it seems to be a kind of http://en.wikipedia.org/wiki/Knapsack_problem.
Heh, I'll play the "incomplete specification" card (nobody said that numbers couldn't appear more than once!) and reduce this to the "making change" problem. Sort your numbers in decreasing order, find the first one less than your desired sum, then subtract that from your sum (division and remainders could speed this up). Repeat until sum = 0 or no number less than the sum is found.
For completeness, you would need to keep track of the number of addends in each sum, and of course generate the additional sequences by keeping track of the first number you use, skipping that, and repeating the process with the additional numbers. This would solve the (7 + 2 + 1) over (6 + 4) problem.
Repeating the answer of others: it is a Subset Sum problem.
It can be solved efficiently with the dynamic programming technique.
The following has not been mentioned yet: the problem is pseudo-polynomial (or NP-complete in the weak sense).
Existence of an algorithm (based on dynamic programming) polynomial in S (where S is the sum) and n (the number of elements) proves this claim.
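To illustrate such a dynamic programming algorithm, here is a small Python sketch. It assumes positive integers that may each be used at most once, and it also tracks the fewest numbers needed, as the question asked. fewest_numbers is a hypothetical name, and the dictionary of partial sums is one of several possible ways to lay out the table.

```python
def fewest_numbers(numbers, target):
    """Return a shortest list of distinct entries of numbers summing to target, or None.

    Assumes positive integers; the table holds at most target + 1 partial sums,
    so the running time is polynomial in target and len(numbers).
    """
    best = {0: []}   # best[s] = shortest combination found so far that sums to s
    for x in numbers:
        # Iterate over a snapshot so that x is added to each combination at most once.
        for s, combo in list(best.items()):
            t = s + x
            if t <= target and (t not in best or len(combo) + 1 < len(best[t])):
                best[t] = combo + [x]
    return best.get(target)


print(fewest_numbers([1, 5, 9, 3, 7, 12, 36, 22, 19, 63], 44))
```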
Regards.
Ok, I wrote a C++ program to solve the above problem. The algorithm is simple :-)
First of all, arrange whatever array you have in descending order (I have hard-coded the array in descending form, but you may apply any of the sorting algorithms).
Next I took three stacks: n, pos and sum. The first one stores the number for which a possible sum combination is to be found, the second holds the index of the array from where to start the search, and the third stores the elements whose addition will give you the number you enter.
The function looks for the largest number in the array which is smaller than or equal to the number entered. If it is equal, it simply pushes the number onto the sum stack. If not, then it pushes the encountered array element onto the sum stack (temporarily), finds the difference between the number to search for and the number encountered, and then performs recursion.
Let me show an example:-
to find 44 in {63,36,22,19,12,9,7,5,3,1}
first 36 will be pushed in sum(largest number less than 44)
44-36=8 will be pushed in n(next number to search for)
7 will be pushed in sum
8-7=1 will be pushed in n
1 will be pushed in sum
thus 44=36+7+1 :-)
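#include <iostream>
using namespace std;

const int SIZE = 10;   // number of elements in arr
int found = 0;

// n:   stack of targets that still have to be matched
// pos: stack of array indexes from where each level starts its search
// sum: stack of the numbers chosen so far
void func(int n[], int pos[], int sum[], int arr[], int &topN, int &topP, int &topS)
{
    int i = pos[topP], temp;
    while (i < SIZE)
    {
        if (arr[i] <= n[topN])
        {
            // Take arr[i] and look for the remainder from index i+1 onwards.
            pos[topP] = i;
            topS++;
            sum[topS] = arr[i];
            temp = n[topN] - arr[i];
            if (temp == 0)
            {
                found = 1;
                break;
            }
            topN++;
            n[topN] = temp;
            temp = pos[topP] + 1;
            topP++;
            pos[topP] = temp;
            break;
        }
        i++;
    }
    if (i == SIZE)
    {
        // Dead end: drop this level and retry the previous one from its next index.
        topP = topP - 1;
        topN = topN - 1;
        topS = topS - 1;
        if (topP != -1)
        {
            pos[topP] += 1;   // only touch pos[] while the stack is non-empty
            func(n, pos, sum, arr, topN, topP, topS);
        }
    }
    else if (found != 1)
        func(n, pos, sum, arr, topN, topP, topS);
}

int main()
{
    int x, n[100], pos[100], sum[100], arr[SIZE] = {63, 36, 22, 19, 12, 9, 7, 5, 3, 1};
    int topN = -1, topP = -1, topS = -1;
    cout << "Enter a number: ";
    cin >> x;
    topN = topN + 1;
    n[topN] = x;
    topP = topP + 1;
    pos[topP] = 0;
    func(n, pos, sum, arr, topN, topP, topS);
    if (found == 0)
        cout << "Not found any combination";
    else
    {
        cout << "\n" << sum[0];
        for (int i = 1; i <= topS; i++)
            cout << " + " << sum[i];
    }
    cout << "\n";
    return 0;
}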
You can copy the code and paste it in your IDE, works fine :-)
