Most of the mergesort implementations I see are similar to this one, both in my intro to algorithms book and in the online implementations I've searched for. My recursion chops don't go much further than messing with Fibonacci generation (which was simple enough), so maybe it's the multiple recursions blowing my mind, but I can't step through the code and understand what's going on, even before I hit the merge function.
How is it stepping through this? Is there some strategy or reading I should undergo to better understand the process here?
void mergesort(int *a, int*b, int low, int high)
{
    int pivot;
    if(low<high)
    {
        pivot=(low+high)/2;
        mergesort(a,b,low,pivot);
        mergesort(a,b,pivot+1,high);
        merge(a,b,low,pivot,high);
    }
}
And the merge (although frankly I'm mentally stuck before I even get to this part):
void merge(int *a, int *b, int low, int pivot, int high)
{
    int h,i,j,k;
    h=low;
    i=low;
    j=pivot+1;
    while((h<=pivot)&&(j<=high))
    {
        if(a[h]<=a[j])
        {
            b[i]=a[h];
            h++;
        }
        else
        {
            b[i]=a[j];
            j++;
        }
        i++;
    }
    if(h>pivot)
    {
        for(k=j; k<=high; k++)
        {
            b[i]=a[k];
            i++;
        }
    }
    else
    {
        for(k=h; k<=pivot; k++)
        {
            b[i]=a[k];
            i++;
        }
    }
    for(k=low; k<=high; k++) a[k]=b[k];
}
MERGE SORT:
1) Split the array in half
2) Sort the left half
3) Sort the right half
4) Merge the two halves together
I think the "sort" function name in MergeSort is a bit of a misnomer, it should really be called "divide".
Here is a visualization of the algorithm in process.
Each time the function recurses, it's working on a smaller and smaller subdivision of the input array, starting with the left half of it. Each time the function returns from recursion, it will continue on and either start working on the right half, or recurse up again and work on a larger half.
Like this
[************************]mergesort
[************]mergesort(lo,mid)
[******]mergesort(lo,mid)
[***]mergesort(lo,mid)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[***]mergesort(mid+1,hi)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[******]merge
[******]mergesort(mid+1,hi)
[***]mergesort(lo,mid)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[***]mergesort(mid+1,hi)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[******]merge
[************]merge
[************]mergesort(mid+1,hi)
[******]mergesort(lo,mid)
[***]mergesort(lo,mid)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[***]mergesort(mid+1,hi)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[******]merge
[******]mergesort(mid+1,hi)
[***]mergesort(lo,mid)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[***]mergesort(mid+1,hi)
[**]mergesort(lo,mid)
[**]mergesort(mid+1,hi)
[***]merge
[******]merge
[************]merge
[************************]merge
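If you want to produce a call diagram like this for your own inputs, one option is to instrument the recursion so that it logs each call, indented by depth. The following is only a rough sketch: trace_mergesort, the depth parameter and the output buffer are made-up names for illustration, and the function sorts nothing, it just follows the same control flow as mergesort().

```c
#include <stdio.h>
#include <string.h>

/* Appends one line per call to `out` (which must start as an empty string),
   indented two spaces per recursion level. Hypothetical helper: it mirrors
   the control flow of mergesort() but only logs the calls. */
void trace_mergesort(int low, int high, int depth, char *out, size_t cap)
{
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%*smergesort(%d,%d)\n",
                 depth * 2, "", low, high);
    if (low < high) {
        int mid = (low + high) / 2;
        trace_mergesort(low, mid, depth + 1, out, cap);      /* left half first */
        trace_mergesort(mid + 1, high, depth + 1, out, cap); /* then right half */
        used = strlen(out);
        if (used < cap)
            snprintf(out + used, cap - used, "%*smerge(%d,%d,%d)\n",
                     depth * 2, "", low, mid, high);
    }
}
```

Calling it with an empty buffer and the range 0..3 logs mergesort(0,3), then the whole left subtree down to merge(0,0,1), then the right subtree, and only at the very end merge(0,1,3).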
An obvious thing to do would be to try this merge sort on a small array, say size 8 (power of 2 is convenient here), on paper. Pretend you are a computer executing the code, and see if it starts to become a bit clearer.
Your question is a bit ambiguous because you don't explain what you find confusing, but it sounds like you are trying to unroll the recursive calls in your head. Which may or may not be a good thing, but I think it can easily lead to having too much in your head at once. Instead of trying to trace the code from start to end, see if you can understand the concept abstractly. Merge sort:
Splits the array in half
Sorts the left half
Sorts the right half
Merges the two halves together
(1) should be fairly obvious and intuitive to you. For step (2), the key insight is this: the left half of an array... is an array. Assuming your merge sort works, it should be able to sort the left half of the array. Right? The same goes for step (3). Step (4) is actually a pretty intuitive part of the algorithm. An example should make it trivial:
at the start
left: [1, 3, 5], right: [2, 4, 6, 7], out: []
after step 1
left: [3, 5], right: [2, 4, 6, 7], out: [1]
after step 2
left: [3, 5], right: [4, 6, 7], out: [1, 2]
after step 3
left: [5], right: [4, 6, 7], out: [1, 2, 3]
after step 4
left: [5], right: [6, 7], out: [1, 2, 3, 4]
after step 5
left: [], right: [6, 7], out: [1, 2, 3, 4, 5]
after step 6
left: [], right: [7], out: [1, 2, 3, 4, 5, 6]
at the end
left: [], right: [], out: [1, 2, 3, 4, 5, 6, 7]
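The step-by-step example above could be written in C roughly as follows. This is a minimal sketch, assuming the two inputs are already sorted and that the caller provides an output array of the right size; merge_sorted and its parameter names are made up for illustration.

```c
#include <stddef.h>

/* Merges sorted left[0..nl) and right[0..nr) into out,
   which must have room for nl + nr ints. */
void merge_sorted(const int *left, size_t nl,
                  const int *right, size_t nr, int *out)
{
    size_t i = 0, j = 0, k = 0;

    /* Repeatedly move the smaller front element to the output. */
    while (i < nl && j < nr) {
        if (left[i] <= right[j])   /* <= keeps equal elements in order */
            out[k++] = left[i++];
        else
            out[k++] = right[j++];
    }

    /* One side is empty now; copy whatever remains of the other. */
    while (i < nl)
        out[k++] = left[i++];
    while (j < nr)
        out[k++] = right[j++];
}
```

Each pass through the first loop corresponds to one "after step" line in the example; the two trailing loops handle the point where left has run out and the remaining elements of right are copied across.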
So assuming that you understand (1) and (4), another way to think of merge sort would be this. Imagine someone else wrote mergesort() and you're confident that it works. Then you could use that implementation of mergesort() to write:
sort(myArray)
{
    mid = (myArray.Length - 1) / 2;   // subArray bounds are inclusive
    leftHalf = myArray.subArray(0, mid);
    rightHalf = myArray.subArray(mid + 1, myArray.Length - 1);
    sortedLeftHalf = mergesort(leftHalf);
    sortedRightHalf = mergesort(rightHalf);
    sortedArray = merge(sortedLeftHalf, sortedRightHalf);
    return sortedArray;
}
Note that sort doesn't use recursion. It just says "sort both halves and then merge them". If you understood the merge example above then hopefully you see intuitively that this sort function seems to do what it says... sort.
Now, if you look at it more carefully... sort() looks pretty much exactly like mergesort()! That's because it is mergesort() (except it doesn't have base cases because it's not recursive!).
But that's how I like thinking of recursive functions--assume that the function works when you call it. Treat it as a black box that does what you need it to. When you make that assumption, figuring out how to fill in that black box is often easy. For a given input, can you break it down into smaller inputs to feed to your black box? After you solve that, the only thing that's left is handling the base cases at the start of your function (which are the cases where you don't need to make any recursive calls. For example, mergesort([]) just returns an empty array; it doesn't make a recursive call to mergesort()).
Finally, this is a bit abstract, but a good way to understand recursion is actually to write mathematical proofs using induction. The same strategy used to write a proof by induction is used to write a recursive function:
Math proof:
Show the claim is true for the base cases
Assume it is true for inputs smaller than some n
Use that assumption to show that it is still true for an input of size n
Recursive function:
Handle the base cases
Assume that your recursive function works on inputs smaller than some n
Use that assumption to handle an input of size n
Concerning the recursion part of the merge sort, I've found this page to be very very helpful. You can follow the code as it's being executed. It shows you what gets executed first, and what follows next.
Tom
mergesort() simply keeps dividing the array into two halves until the if condition (low < high) fails. Because you call mergesort() twice, once with low to pivot and once with pivot+1 to high, each subarray is divided further in turn.
Let's take an example:
a[] = {9,7,2,5,6,3,4}
pivot = (0+6)/2 (which will be 3)
=> the first mergesort will recurse with indices 0..3, {9,7,2,5} : Left Array
=> the second will be passed indices 4..6, {6,3,4} : Right Array
This repeats until there is 1 element in each left and right array.
In the end you'll have something similar to this:
L : {9} {7} {2} {5} R : {6} {3} {4} (each L and R will have further sub L and R)
=> which on the calls to merge will become
L(L{7,9} R{2,5}) : R(L{3,6} R{4})
As you can see, each subarray gets sorted in the merge function.
=> on the next calls to merge, the next L and R subarrays get put in order
L{2,5,7,9} : R{3,4,6}
Now both the L and R subarrays are sorted internally.
On the last call to merge they'll be merged in order.
Final array would be sorted => {2,3,4,5,6,7,9}
See the merging steps in the answer given by @roliu.
My apologies if this has been answered this way. I acknowledge that this is just a sketch, rather than a deep explanation.
While it is not obvious to see how the actual code maps to the recursion, I was able to understand the recursion in a general sense this way.
Take the example unsorted set {2,9,7,5} as input. The merge_sort algorithm is denoted by "ms" for brevity below. Then we can sketch the operation as:
step 1: ms( ms( ms(2),ms(9) ), ms( ms(7),ms(5) ) )
step 2: ms( ms({2},{9}), ms({7},{5}) )
step 3: ms( {2,9}, {5,7} )
step 4: {2,5,7,9}
It is important to note that merge_sort of a singlet (like {2}) is simply the singlet (ms(2) = {2}), so that at the deepest level of recursion we get our first answer. The remaining answers then tumble like dominoes as the interior recursions finish and are merged together.
Part of the genius of the algorithm is the way it builds the recursive formula of step 1 automatically through its construction. What helped me was the exercise of thinking how to turn step 1 above from a static formula to a general recursion.
Trying to work out each and every step of a recursion is often not an ideal approach, but for beginners, it definitely helps to understand the basic idea behind recursion, and also to get better at writing recursive functions.
Here's a C solution to Merge Sort :-
#include <stdio.h>
#include <stdlib.h>

void merge_sort(int *, unsigned);
void merge(int *, int *, int *, unsigned, unsigned);

int main(void)
{
    unsigned size;
    printf("Enter the no. of integers to be sorted: ");
    scanf("%u", &size);
    int * arr = (int *) malloc(size * sizeof(int));
    if (arr == NULL)
        exit(EXIT_FAILURE);
    printf("Enter %u integers: ", size);
    for (unsigned i = 0; i < size; i++)
        scanf("%d", &arr[i]);
    merge_sort(arr, size);
    printf("\nSorted array: ");
    for (unsigned i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");
    free(arr);
    return EXIT_SUCCESS;
}

void merge_sort(int * arr, unsigned size)
{
    if (size > 1)
    {
        unsigned left_size = size / 2;
        int * left = (int *) malloc(left_size * sizeof(int));
        if (left == NULL)
            exit(EXIT_FAILURE);
        for (unsigned i = 0; i < left_size; i++)
            left[i] = arr[i];
        unsigned right_size = size - left_size;
        int * right = (int *) malloc(right_size * sizeof(int));
        if (right == NULL)
            exit(EXIT_FAILURE);
        for (unsigned i = 0; i < right_size; i++)
            right[i] = arr[i + left_size];
        merge_sort(left, left_size);
        merge_sort(right, right_size);
        merge(arr, left, right, left_size, right_size);
        free(left);
        free(right);
    }
}

/*
This merge() function takes a target array (arr) and two sorted arrays (left and right),
all three of them allocated beforehand in some other function(s).
It then merges the two sorted arrays (left and right) into a single sorted array (arr).
It should be ensured that the size of arr is equal to the size of left plus the size of right.
*/
void merge(int * arr, int * left, int * right, unsigned left_size, unsigned right_size)
{
    unsigned i = 0, j = 0, k = 0;
    while ((i < left_size) && (j < right_size))
    {
        if (left[i] <= right[j])
            arr[k++] = left[i++];
        else
            arr[k++] = right[j++];
    }
    while (i < left_size)
        arr[k++] = left[i++];
    while (j < right_size)
        arr[k++] = right[j++];
}
Here's the step-by-step explanation of the recursion :-
Let arr be [1,4,0,3,7,9,8], having the address 0x0000.
In main(), merge_sort(arr, 7) is called, which is the same as merge_sort(0x0000, 7).
After all of the recursions are completed, arr (0x0000) becomes [0,1,3,4,7,8,9].
| | |
| | |
| | |
| | |
| | |
arr - 0x0000 - [1,4,0,3,7,9,8] | | |
size - 7 | | |
| | |
left = malloc() - 0x1000a (say) - [1,4,0] | | |
left_size - 3 | | |
| | |
right = malloc() - 0x1000b (say) - [3,7,9,8] | | |
right_size - 4 | | |
| | |
merge_sort(left, left_size) -------------------> | arr - 0x1000a - [1,4,0] | |
| size - 3 | |
| | |
| left = malloc() - 0x2000a (say) - [1] | |
| left_size = 1 | |
| | |
| right = malloc() - 0x2000b (say) - [4,0] | |
| right_size = 2 | |
| | |
| merge_sort(left, left_size) -------------------> | arr - 0x2000a - [1] |
| | size - 1 |
| left - 0x2000a - [1] <-------------------------- | (0x2000a has only 1 element) |
| | |
| | |
| merge_sort(right, right_size) -----------------> | arr - 0x2000b - [4,0] |
| | size - 2 |
| | |
| | left = malloc() - 0x3000a (say) - [4] |
| | left_size = 1 |
| | |
| | right = malloc() - 0x3000b (say) - [0] |
| | right_size = 1 |
| | |
| | merge_sort(left, left_size) -------------------> | arr - 0x3000a - [4]
| | | size - 1
| | left - 0x3000a - [4] <-------------------------- | (0x3000a has only 1 element)
| | |
| | |
| | merge_sort(right, right_size) -----------------> | arr - 0x3000b - [0]
| | | size - 1
| | right - 0x3000b - [0] <------------------------- | (0x3000b has only 1 element)
| | |
| | |
| | merge(arr, left, right, left_size, right_size) |
| | i.e. merge(0x2000b, 0x3000a, 0x3000b, 1, 1) |
| right - 0x2000b - [0,4] <----------------------- | (0x2000b is now sorted) |
| | |
| | free(left) (0x3000a is now freed) |
| | free(right) (0x3000b is now freed) |
| | |
| | |
| merge(arr, left, right, left_size, right_size) | |
| i.e. merge(0x1000a, 0x2000a, 0x2000b, 1, 2) | |
left - 0x1000a - [0,1,4] <---------------------- | (0x1000a is now sorted) | |
| | |
| free(left) (0x2000a is now freed) | |
| free(right) (0x2000b is now freed) | |
| | |
| | |
merge_sort(right, right_size) -----------------> | arr - 0x1000b - [3,7,9,8] | |
| size - 4 | |
| | |
| left = malloc() - 0x2000c (say) - [3,7] | |
| left_size = 2 | |
| | |
| right = malloc() - 0x2000d (say) - [9,8] | |
| right_size = 2 | |
| | |
| merge_sort(left, left_size) -------------------> | arr - 0x2000c - [3,7] |
| | size - 2 |
| | |
| | left = malloc() - 0x3000c (say) - [3] |
| | left_size = 1 |
| | |
| | right = malloc() - 0x3000d (say) - [7] |
| | right_size = 1 |
| | |
| | merge_sort(left, left_size) -------------------> | arr - 0x3000c - [3]
| left - [3,7] was already sorted, but | | size - 1
| that doesn't matter to this program. | left - 0x3000c - [3] <-------------------------- | (0x3000c has only 1 element)
| | |
| | |
| | merge_sort(right, right_size) -----------------> | arr - 0x3000d - [7]
| | | size - 1
| | right - 0x3000d - [7] <------------------------- | (0x3000d has only 1 element)
| | |
| | |
| | merge(arr, left, right, left_size, right_size) |
| | i.e. merge(0x2000c, 0x3000c, 0x3000d, 1, 1) |
| left - 0x2000c - [3,7] <------------------------ | (0x2000c is now sorted) |
| | |
| | free(left) (0x3000c is now freed) |
| | free(right) (0x3000d is now freed) |
| | |
| | |
| merge_sort(right, right_size) -----------------> | arr - 0x2000d - [9,8] |
| | size - 2 |
| | |
| | left = malloc() - 0x3000e (say) - [9] |
| | left_size = 1 |
| | |
| | right = malloc() - 0x3000f (say) - [8] |
| | right_size = 1 |
| | |
| | merge_sort(left, left_size) -------------------> | arr - 0x3000e - [9]
| | | size - 1
| | left - 0x3000e - [9] <-------------------------- | (0x3000e has only 1 element)
| | |
| | |
| | merge_sort(right, right_size) -----------------> | arr - 0x3000f - [8]
| | | size - 1
| | right - 0x3000f - [8] <------------------------- | (0x3000f has only 1 element)
| | |
| | |
| | merge(arr, left, right, left_size, right_size) |
| | i.e. merge(0x2000d, 0x3000e, 0x3000f, 1, 1) |
| right - 0x2000d - [8,9] <----------------------- | (0x2000d is now sorted) |
| | |
| | free(left) (0x3000e is now freed) |
| | free(right) (0x3000f is now freed) |
| | |
| | |
| merge(arr, left, right, left_size, right_size) | |
| i.e. merge(0x1000b, 0x2000c, 0x2000d, 2, 2) | |
right - 0x1000b - [3,7,8,9] <------------------- | (0x1000b is now sorted) | |
| | |
| free(left) (0x2000c is now freed) | |
| free(right) (0x2000d is now freed) | |
| | |
| | |
merge(arr, left, right, left_size, right_size) | | |
i.e. merge(0x0000, 0x1000a, 0x1000b, 3, 4) | | |
(0x0000 is now sorted) | | |
| | |
free(left) (0x1000a is now freed) | | |
free(right) (0x1000b is now freed) | | |
| | |
| | |
| | |
I know this is an old question but wanted to throw my thoughts of what helped me understand merge sort.
There are two big parts to merge sort
Splitting of the array into smaller chunks (dividing)
Merging the array together (conquering)
The role of the recursion is simply the dividing portion.
I think what confuses most people is that they think there is a lot of logic in the splitting and in determining what to split, but most of the actual logic of sorting happens in the merge. The recursion is simply there to divide and do the first half; the second half is really just looping and copying things over.
I see some answers that mention pivots but I would recommend not associating the word "pivot" with merge sort because that's an easy way to confuse merge sort with quicksort (which is heavily reliant on choosing a "pivot"). They are both "divide and conquer" algorithms. For merge sort the division always happens in the middle whereas for quicksort you can be clever with the division when choosing an optimal pivot.
This is the process of dividing the problem into subproblems.
The given example will help you understand recursion. int A[] = {the elements to be sorted}; int p = 0; (lower index). int r = A.length - 1; (higher index).
class DivideConqure1 {
    void devide(int A[], int p, int r) {
        if (p < r) {
            int q = (p + r) / 2; // divide problem into sub problems.
            devide(A, p, q); //divide left problem into sub problems
            devide(A, q + 1, r); //divide right problem into sub problems
            merger(A, p, q, r); //merge the sub problems
        }
    }
    void merger(int A[], int p, int q, int r) {
        int L[] = new int[q - p + 1];
        int R[] = new int[r - q];
        int a1 = 0;
        int b1 = 0;
        for (int i = p; i <= q; i++) { //store left sub problem in Left temp
            L[a1] = A[i];
            a1++;
        }
        for (int i = q + 1; i <= r; i++) { //store right sub problem in Right temp
            R[b1] = A[i];
            b1++;
        }
        int a = 0;
        int b = 0;
        int c = 0; //next free position in A once one temp array is used up
        for (int i = p; i < r; i++) {
            if (a < L.length && b < R.length) {
                c = i + 1;
                if (L[a] <= R[b]) { //compare left element<= right element
                    A[i] = L[a];
                    a++;
                } else {
                    A[i] = R[b];
                    b++;
                }
            }
        }
        if (a < L.length)
            for (int i = a; i < L.length; i++) {
                A[c] = L[i]; //store remaining element in Left temp into main problem
                c++;
            }
        if (b < R.length)
            for (int i = b; i < R.length; i++) {
                A[c] = R[i]; //store remaining element in right temp into main problem
                c++;
            }
    }
}
When you call a recursive method, the calling method does not finish first: it is suspended on the call stack while the inner call runs. When the condition (low < high) is not satisfied, the inner call returns immediately and the caller continues with its next line.
Consider that this is your array:
int a[] = {10,12,9,13,8,7,11,5};
So your merge sort method will work like this:
mergeSort(arr a, arr empty, 0, 7);
mergeSort(arr a, arr empty, 0, 3);
mergeSort(arr a, arr empty, 0, 1);
mergeSort(arr a, arr empty, 0, 0);
for this call `low < high` is false (0 < 0), so it comes out of the first call and goes on to the next:
mergeSort(arr a, arr empty, 0+1, 1);
for this call `low < high` is also false (1 < 1), so it comes out of the second call as well and calls:
merger(arr a, arr empty, 0, 0, 1);
mergeSort(arr a, arr empty, 2, 3);
.
.
merger(arr a, arr empty, 2, 2, 3);
merger(arr a, arr empty, 0, 1, 3);
.
.
And so on.
So each merge writes the sorted values into the empty array and copies them back into a.
It might help to understand how recursive functions work.
After spending about 6-8 hours trying to digest Manacher's algorithm, I am ready to throw in the towel. But before I do, here is one last shot in the dark: can anyone explain it? I don't care about the code. I want somebody to explain the ALGORITHM.
Here seems to be a place that others seemed to enjoy in explaining the algorithm:
http://www.leetcode.com/2011/11/longest-palindromic-substring-part-ii.html
I understand why you would want to transform the string, say, 'abba' to #a#b#b#a#.
After that I'm lost. For example, the author of the previously mentioned website says the key part of the algorithm is:
if P[ i' ] ≤ R – i,
then P[ i ] ← P[ i' ]
else P[ i ] ≥ P[ i' ]. (Which we have to expand past
the right edge (R) to find P[ i ])
This seems wrong, because he/she says at one point that P[i] equals 5 when P[i'] = 7 and P[i] is not less or equal to R - i.
If you are not familiar with the algorithm, here are some more links: http://tristan-interview.blogspot.com/2011/11/longest-palindrome-substring-manachers.html (I've tried this one, but the terminology is awful and confusing. First, some things are not defined. Also, too many variables. You need a checklist to recall what variable is referring to what.)
Another is: http://www.akalin.cx/longest-palindrome-linear-time (good luck)
The basic gist of the algorithm is to find the longest palindrome in linear time. It can be done in O(n^2) with a minimum to medium amount of effort. This algorithm is supposed to be quite "clever" to get it down to O(n).
I agree that the logic isn't quite right in the explanation of the link. I give some details below.
Manacher's algorithm fills in a table P[i] which contains how far the palindrome centered at i extends. If P[5]=3, then three characters on either side of position five are part of the palindrome. The algorithm takes advantage of the fact that if you've found a long palindrome, you can fill in values of P on the right side of the palindrome quickly by looking at the values of P on the left side, since they should mostly be the same.
I'll start by explaining the case you were talking about, and then I'll expand this answer as needed.
R indicates the index of the right side of the palindrome centered at C. Here is the state at the place you indicated:
C=11
R=20
i=15
i'=7
P[i']=7
R-i=5
and the logic is like this:
if P[i']<=R-i: // not true
else: // P[i] is at least 5, but may be greater
The pseudo-code in the link indicates that P[i] should be greater than or equal to P[i'] if the test fails, but I believe it should be greater than or equal to R-i, and the explanation backs that up.
Since P[i'] is greater than R-i, the palindrome centered at i' extends past the palindrome centered at C. We know the palindrome centered at i will be at least R-i characters wide, because we still have symmetry up to that point, but we have to search explicitly beyond that.
If P[i'] had been no greater than R-i, then the largest palindrome centered at i' is within the largest palindrome centered at C, so we would have known that P[i] couldn't be any larger than P[i']. If it was, we would have a contradiction. It would mean that we would be able to extend the palindrome centered at i beyond P[i'], but if we could, then we would also be able to extend the palindrome centered at i' due to the symmetry, but it was already supposed to be as large as possible.
This case is illustrated previously:
C=11
R=20
i=13
i'=9
P[i']=1
R-i=7
In this case, P[i']<=R-i. Since we are still 7 characters away from the edge of the palindrome centered at C, we know that at least 7 characters around i are the same as the 7 characters around i'. Since there was only a one character palindrome around i', there is a one character palindrome around i as well.
j_random_hacker noticed that the logic should be more like this:
if P[i']<R-i then
P[i]=P[i']
else if P[i']>R-i then
P[i]=R-i
else P[i]=R-i + expansion
If P[i'] < R-i, then we know that P[i]==P[i'], since we're still inside the palindrome centered at C.
If P[i'] > R-i, then we know that P[i]==R-i, because otherwise the palindrome centered at C would have extended past R.
So the expansion is really only necessary in the special case where P[i']==R-i, where we don't know whether the palindrome centered at i may be longer.
This is handled in the actual code by setting P[i]=min(P[i'],R-i) and then always expanding. This way of doing it doesn't increase the time complexity, because if no expansion is necessary, the time taken to do the expansion is constant.
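To make the min-then-expand step concrete, here is a rough C sketch of the whole loop. It is not the code from any of the linked pages; longest_palindrome and its variable names are made up for illustration. It assumes the '#' transformation from the question, with bounds checks in place of sentinel characters, and P[i] counted as the number of characters matched on each side of i in the transformed string.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the length of the longest palindromic substring of s and stores
   its starting index in *start. Returns 0 for an empty string, and also 0
   if memory allocation fails. */
size_t longest_palindrome(const char *s, size_t *start)
{
    size_t n = strlen(s);
    size_t m = 2 * n + 1;                 /* length of "#a#b#...#" */
    char *t = malloc(m);
    size_t *p = calloc(m, sizeof *p);
    size_t best_len = 0, best_center = 0;
    size_t c = 0, r = 0;  /* center and right edge of the palindrome reaching furthest right */

    *start = 0;
    if (t == NULL || p == NULL) {
        free(t);
        free(p);
        return 0;
    }
    for (size_t i = 0; i < m; i++)
        t[i] = (i % 2 == 0) ? '#' : s[i / 2];

    for (size_t i = 0; i < m; i++) {
        if (i < r) {
            size_t mirror = 2 * c - i;    /* this is i' */
            p[i] = p[mirror] < r - i ? p[mirror] : r - i;
        }
        /* Always try to expand; when no expansion is possible the loop
           exits after a single comparison. */
        while (i >= p[i] + 1 && i + p[i] + 1 < m &&
               t[i - p[i] - 1] == t[i + p[i] + 1])
            p[i]++;
        if (i + p[i] > r) {               /* this palindrome reaches past R */
            c = i;
            r = i + p[i];
        }
        if (p[i] > best_len) {
            best_len = p[i];
            best_center = i;
        }
    }
    *start = (best_center - best_len) / 2;
    free(t);
    free(p);
    return best_len;
}
```

For the string babcbabcbaccba from the first link, this reports a palindrome of length 9 starting at index 1 (abcbabcba). The linear running time comes from the fact that every successful comparison in the expansion loop pushes R further right, and R never moves left.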
I have found one of the best explanations so far at the following link:
http://tarokuriyama.com/projects/palindrome2.php
It also has a visualization for the same string example (babcbabcbaccba) used at the first link mentioned in the question.
Apart from this link, I also found the code at
http://algs4.cs.princeton.edu/53substring/Manacher.java.html
I hope it will be helpful to others trying hard to understand the crux of this algorithm.
The algorithm on this site seems understandable up to a certain point:
http://www.akalin.cx/longest-palindrome-linear-time
To understand this particular approach, the best thing is to try solving the problem on paper and to catch the tricks you can use to avoid checking for a palindrome at each possible center.
First, ask yourself: when you find a palindrome of a given length, let's say 5, can't you as a next step just jump to the end of this palindrome (skipping 4 letters and 4 mid-letters)?
If you try to create a palindrome with length 8 and place another palindrome with length > 8 whose center is in the right side of the first palindrome, you will notice something funny. Try it out:
Palindrome with length 8 - WOWILIKEEKIL - Like + ekiL = 8
Now in most cases you would be able to write down the place between two E's as a center and number 8 as the length and jump after the last L to look for the center of the bigger palindrome.
This approach is not correct, because the center of a bigger palindrome can be inside ekiL and you would miss it if you jumped to after the last L.
After you find LIKE+EKIL you place 8 in the array that these algos use and this looks like:
[0,1,0,3,0,1,0,1,0,3,0,1,0,1,0,1,8]
for
[#,W,#,O,#,W,#,I,#,L,#,I,#,K,#,E,#]
The trick is that you already know that most probably next 7 (8-1) numbers after 8 will be the same as on the left side, so the next step is to automatically copy 7 numbers from left of 8 to right of 8 keeping in mind they are not yet final.
The array would look like this
[0,1,0,3,0,1,0,1,0,3,0,1,0,1,0,1,8,1,0,1,0,1,0,3] (we are at 8)
for
[#,W,#,O,#,W,#,I,#,L,#,I,#,K,#,E,#,E,#,K,#,I,#,L]
Let's construct an example where such a jump would break our current solution and see what we can notice.
WOWILIKEEKIL - let's try to make a bigger palindrome with the center somewhere within EKIL.
But it's not possible - we need to change the word EKIL to something that contains a palindrome.
What? OOOOOh - that's the trick.
The only possibility of having a bigger palindrome with its center in the right side of our current palindrome is that a palindrome is already present in the right (and left) side of the current palindrome.
Let's try to build one based on WOWILIKEEKIL
We would need to change EKIL to for example EKIK with I as a center of the bigger palindrome - remember to change LIKE to KIKE as well.
First letters of our tricky palindrome will be:
WOWIKIKEEKIK
As said before, let the last I be the center of a bigger palindrome than KIKEEKIK:
WOWIKIKEEKIKEEKIKIW
Let's build the array up to our old palindrome and find out how to leverage the additional info.
for
[_ W _ O _ W _ I _ K _ I _ K _ E _ E _ K _ I _ K _ E _ E _ K _ I _ K _ I _ W ]
it will be
[0,1,0,3,0,1,0,1,0,3,0,3,0,1,0,1,8
We know that the next I (the 3rd one) will be the center of the longest palindrome, but let's forget about that for a bit. Let's copy the numbers in the array from the left of 8 to the right (7 numbers):
[0,1,0,3,0,1,0,1,0,3,0,3,0,1,0,1,8,1,0,1,0,3,0,3]
In our loop we are between the E's, at the number 8. What is special about I (the future middle of the biggest palindrome) that means we cannot jump right to K (the last letter of the currently biggest palindrome)?
The special thing is that it exceeds the current size of the array ... how?
If you move 3 spaces to the right of that 3, you are out of the array. It means that it can be the middle of the biggest palindrome, and the furthest you can jump is this letter I.
Sorry for the length of this answer - I wanted to explain the algorithm and can assure you - @OmnipotentEntity was right - I understand it even better after explaining it to you :)
Full Article: http://www.zrzahid.com/longest-palindromic-substring-in-linear-time-manachers-algorithm/
First of all, let's look closely at a palindrome in order to find some interesting properties. For example, S1 = "abaaba" and S2 = "abcba" are both palindromes, but what is the non-trivial (i.e. not length or characters) difference between them? S1 is a palindrome centered around the invisible space between i=2 and i=3 (non-existent space!). On the other hand, S2 is centered around the character at i=2 (i.e. c). In order to gracefully handle the center of a palindrome irrespective of odd/even length, let's transform the palindrome by inserting a special character $ between characters. Then S1="abaaba" and S2="abcba" will be transformed into T1="$a$b$a$a$b$a$" centered at i=6 and T2="$a$b$c$b$a$" centered at i=5. Now we can see that the centers exist and the lengths are consistently 2*n+1, where n is the length of the original string. For example,
i' c i
-----------------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|
-----------------------------------------------------
T1=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ |
-----------------------------------------------------
Next, observe that from the symmetric property of a (transformed) palindrome T around the center c, T[c-k] = T[c+k] for 0<= k<= c. That is positions c-k and c+k are mirror to each other. Let's put it another way, for an index i on the right of center c, the mirror index i' is on the left of c such that c-i'=i-c => i'=2*c-i and vice versa. That is,
For each position i on the right of center c of a palindromic substring, the mirror position of i is, i'=2*c-i, and vice versa.
Let us define an array P[0..2*n] such that P[i] equals the length of the palindrome centered at i. Note that the length is actually measured by the number of characters in the original string (ignoring the special char $). Also let min and max be respectively the leftmost and rightmost boundary of a palindromic substring centered at c. So, min=c-P[c] and max=c+P[c]. For example, for palindrome S="abaaba", the transformed palindrome T, mirror center c=6, length array P[0..12], min=c-P[c]=6-6=0, max=c+P[c]=6+6=12 and two sample mirrored indices i and i' are shown in the following figure.
min i' c i max
-----------------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|
-----------------------------------------------------
T=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ |
-----------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 1 | 6 | 1 | 0 | 3 | 0 | 1 | 0 |
-----------------------------------------------------
With such a length array P, we can find the length of longest palindromic substring by looking into the max element of P. That is,
P[i] is the length of a palindromic substring with center at i in the transformed string T, ie. center at i/2 in the original string S; Hence the longest palindromic substring would be the substring of length P[imax] starting from index (imax-P[imax])/2 such that imax is the index of maximum element in P.
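Before worrying about the linear-time trick, it can help to compute P the slow way and check a table by hand. The sketch below is a quadratic baseline, not Manacher's algorithm; palindrome_lengths and index_of_max are made-up names, and the transformed string T is never built, since T[j] is simply '$' at even j and S[j/2] at odd j.

```c
#include <stddef.h>
#include <string.h>

/* Fills p[0..2n] for the transformed string "$s0$s1$...$" by plain
   center expansion. The caller must provide room for 2n + 1 entries. */
void palindrome_lengths(const char *s, size_t *p)
{
    size_t n = strlen(s);
    size_t m = 2 * n + 1;

    for (size_t i = 0; i < m; i++) {
        size_t k = 0;
        while (i >= k + 1 && i + k + 1 < m) {
            size_t lo = i - k - 1, hi = i + k + 1;
            char a = (lo % 2 == 0) ? '$' : s[lo / 2];  /* T[lo] */
            char b = (hi % 2 == 0) ? '$' : s[hi / 2];  /* T[hi] */
            if (a != b)
                break;
            k++;
        }
        p[i] = k;
    }
}

/* Index of the largest entry in p[0..m), the first one on ties. */
size_t index_of_max(const size_t *p, size_t m)
{
    size_t best = 0;
    for (size_t i = 1; i < m; i++)
        if (p[i] > p[best])
            best = i;
    return best;
}
```

For S="babaabca" the largest entry is P[8]=4, so the longest palindromic substring has length 4 and starts at (8-4)/2 = 2 in S, which is "baab".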
Let us draw a similar figure in the following for our non-palindromic example string S="babaabca".
min c max
|----------------|-----------------|
--------------------------------------------------------------------
idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
---------------------------------------------------------------------
T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
---------------------------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
---------------------------------------------------------------------
Question is how to compute P efficiently. The symmetric property suggests the following cases that we could potentially use to compute P[i] by using previously computed P[i'] at the mirror index i' on the left, hence skipping a lot of computations. Let's suppose that we have a reference palindrome (first palindrome) to start with.
A third palindrome whose center is within the right side of a first palindrome will have exactly the same length as that of a second palindrome anchored at the mirror center on the left side, if the second palindrome is within the bounds of the first palindrome by at least one character.
For example in the following figure with the first palindrome centered at c=8 and bounded by min=4 and max=12, length of the third palindrome centered at i=9 (with mirror index i'= 2*c-i = 7) is, P[i] = P[i'] = 1. This is because the second palindrome centered at i' is within the bounds of first palindrome. Similarly, P[10] = P[6] = 0.
|----3rd----|
|----2nd----|
|-----------1st Palindrome---------|
min i' c i max
|------------|---|---|-------------|
--------------------------------------------------------------------
idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
---------------------------------------------------------------------
T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
---------------------------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | ? | ? | ? | ? | ? | ? | ? | ? |
---------------------------------------------------------------------
Now, how do we check for this case? Note that, due to the symmetric property, the length of segment [min..i'] is equal to the length of segment [i..max]. Also note that the 2nd palindrome is completely within the 1st palindrome iff the left edge of the 2nd palindrome is strictly inside the left boundary, min, of the 1st palindrome. That is,
i'-P[i'] > min
=>P[i']-i' < -min (negate)
=>P[i'] < i'-min
=>P[i'] < max-i [(max-i)=(i'-min) due to symmetric property].
Combining all the facts in case 1,
P[i] = P[i'], iff (max-i) > P[i']
If the second palindrome meets or extends beyond the left bound of the first palindrome, then the third palindrome is guaranteed to have at least the length from its own center to the right outermost character of the first palindrome. This length is the same from the center of the second palindrome to the left outermost character of the first palindrome.
For example, in the following figure, the second palindrome centered at i'=5 extends beyond the left bound of the first palindrome. So in this case we can't say P[i]=P[i']. But the length of the third palindrome centered at i=11, P[i], is at least the length from its center i=11 to the right bound max=12 of the first palindrome centered at c. That is, P[i]>=1. This means the third palindrome can be extended past max if and only if the next character past max matches the mirrored character with respect to i, and we continue this check beyond. In this case T[13]!=T[9], so it can't be extended. So, P[i] = 1.
|-------2nd palindrome------| |----3rd----|---?
|-----------1st Palindrome---------|
min i' c i max
|----|-----------|-----------|-----|
--------------------------------------------------------------------
idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
---------------------------------------------------------------------
T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
---------------------------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | ? | ? | ? | ? | ? | ? |
---------------------------------------------------------------------
So, how do we check for this case? It is simply the failed check for case 1. That is, the second palindrome meets or extends past the left edge of the first palindrome iff
i'-P[i'] <= min
=>P[i']-i' >= -min [negate]
=>P[i'] >= i'-min
=>P[i'] >= max-i [(max-i)=(i'-min) due to symmetric property].
That is, P[i] is at least (max-i) iff (max-i) <= P[i']:
P[i]>=(max-i), iff (max-i) <= P[i']
Now, if the third palindrome does extend beyond max then we need to update the center and the boundary of the new palindrome.
If the palindrome centered at i does expand past max then we have new (extended) palindrome, hence a new center at c=i. Update max to the rightmost boundary of the new palindrome.
Combining all the facts in case 1 and case 2, we can come up with a very beautiful little formula:
Case 1: P[i] = P[i'], iff (max-i) > P[i']
Case 2: P[i]>=(max-i), iff (max-i) <= P[i']
In both cases, P[i] >= min(P[i'], max-i).
That is, P[i]=min(P[i'], max-i) when the third palindrome is not extendable past max. Otherwise, we have a new third palindrome with center at c=i and new max=i+P[i].
Neither the first nor second palindrome provides information to help determine the palindromic length of a fourth palindrome whose center is outside the right side of the first palindrome.
That is, we can't determine preemptively P[i] if i>max. That is,
P[i] = 0, iff max-i < 0
Combining all the cases, we arrive at the formula:
P[i] = max>i ? min(P[i'], max-i) : 0. If we can expand beyond max, we expand by matching characters beyond max against the mirrored characters with respect to the new center at c=i. Finally, when we hit a mismatch, we update max=i+P[i].
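The formula can be turned into a short routine. What follows is a minimal sketch rather than a reference implementation: the function name `manacher_radii` and the variable `right` (standing in for max in the text) are choices made for this example, and the explicit bounds checks replace any sentinel characters.

```python
def manacher_radii(s):
    """Return the array P for the transformed string T = $s0$s1$...$."""
    t = "$" + "".join(ch + "$" for ch in s)
    n = len(t)
    p = [0] * n
    c = 0      # center of the reference (first) palindrome
    right = 0  # its right bound, called max in the text
    for i in range(n):
        if right > i:
            mirror = 2 * c - i  # i' in the text
            p[i] = min(p[mirror], right - i)
        # Try to expand past what is already known.
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        # The palindrome at i reaches further right: it becomes the reference.
        if i + p[i] > right:
            c, right = i, i + p[i]
    return p


print(manacher_radii("babaabca"))
# [0, 1, 0, 3, 0, 3, 0, 1, 4, 1, 0, 1, 0, 1, 0, 1, 0]
```

The printed array matches the P row of the figures for S="babaabca". Each successful comparison in the inner loop pushes the right bound forward, which is where the linear running time comes from.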
Reference: algorithm description in wiki page
This material is of great help for me to understand it:
http://solutionleetcode.blogspot.com/2013/07/leetcode-longest-palindromic-substring.html
Define T as the length of the longest palindromic substrings centered at each of the characters.
The key thing is, when smaller palindromes are completely embedded within the longer palindrome, T[i] should also be symmetric within the longer palindrome.
Otherwise, we will have to compute T[i] from scratch, rather than induce from the symmetric left part.
class Palindrome
{
private int center;
private int radius;
public Palindrome(int center, int radius)
{
if (radius < 0 || radius > center)
throw new Exception("Invalid palindrome.");
this.center = center;
this.radius = radius;
}
public int GetMirror(int index)
{
int i = 2 * center - index;
if (i < 0)
return 0;
return i;
}
public int GetCenter()
{
return center;
}
public int GetLength()
{
return 2 * radius;
}
public int GetRight()
{
return center + radius;
}
public int GetLeft()
{
return center - radius;
}
public void Expand()
{
++radius;
}
public bool LargerThan(Palindrome other)
{
if (other == null)
return false;
return (radius > other.radius);
}
}
private static string GetFormatted(string original)
{
if (original == null)
return null;
else if (original.Length == 0)
return "";
StringBuilder builder = new StringBuilder("#");
foreach (char c in original)
{
builder.Append(c);
builder.Append('#');
}
return builder.ToString();
}
private static string GetUnFormatted(string formatted)
{
if (formatted == null)
return null;
else if (formatted.Length == 0)
return "";
StringBuilder builder = new StringBuilder();
foreach (char c in formatted)
{
if (c != '#')
builder.Append(c);
}
return builder.ToString();
}
public static string FindLargestPalindrome(string str)
{
string formatted = GetFormatted(str);
if (formatted == null || formatted.Length == 0)
return formatted;
int[] radius = new int[formatted.Length];
try
{
Palindrome current = new Palindrome(0, 0);
for (int i = 0; i < formatted.Length; ++i)
{
radius[i] = (current.GetRight() > i) ?
Math.Min(current.GetRight() - i, radius[current.GetMirror(i)]) : 0;
current = new Palindrome(i, radius[i]);
while (current.GetLeft() - 1 >= 0 && current.GetRight() + 1 < formatted.Length &&
formatted[current.GetLeft() - 1] == formatted[current.GetRight() + 1])
{
current.Expand();
++radius[i];
}
}
Palindrome largest = new Palindrome(0, 0);
for (int i = 0; i < radius.Length; ++i)
{
current = new Palindrome(i, radius[i]);
if (current.LargerThan(largest))
largest = current;
}
return GetUnFormatted(formatted.Substring(largest.GetLeft(), largest.GetLength()));
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
return null;
}
Fast Javascript Solution to finding the longest palindrome in a string:
const lpal = str => {
let lpal = ""; // to store longest palindrome encountered
let pal = ""; // to store new palindromes found
let left; // to iterate through left side indices of the character considered to be center of palindrome
let right; // to iterate through right side indices of the character considered to be center of palindrome
let j; // to iterate through all characters and considering each to be center of palindrome
for (let i=0; i<str.length; i++) { // run through all characters considering them center of palindrome
pal = str[i]; // initializing current palindrome
j = i; // setting j as index at the center of palindrome
left = j-1; // taking left index of j
right = j+1; // taking right index of j
while (left >= 0 && right < str.length) { // while left and right indices exist
if(str[left] === str[right]) { // characters match, so the palindrome grows by one on each side
pal = str[left] + pal + str[right];
} else {
break;
}
left--;
right++;
}
if(pal.length > lpal.length) {
lpal = pal;
}
pal = str[i];
j = i;
left = j-1;
right = j+1;
if(str[j] === str[right]) {
pal = pal + str[right];
right++;
while (left >= 0 && right < str.length) {
if(str[left] === str[right]) {
pal = str[left] + pal + str[right];
} else {
break;
}
left--;
right++;
}
if(pal.length > lpal.length) {
lpal = pal;
}
}
}
return lpal;
}
Example Input
console.log(lpal("gerngehgbrgregbeuhgurhuygbhsbjsrhfesasdfffdsajkjsrngkjbsrjgrsbjvhbvhbvhsbrfhrsbfsuhbvsuhbvhvbksbrkvkjb"));
Output
asdfffdsa
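The same expand-around-center idea can also be written with index arithmetic instead of building a new string on every step. This is a sketch under that assumption, not a port of the code above; the function name is hypothetical. It still takes quadratic time in the worst case.

```python
def longest_palindrome(s):
    """Expand around every center; O(n^2) time, O(1) extra space."""
    best_start, best_len = 0, 0
    for center in range(len(s)):
        # Odd-length palindromes are centered on a character,
        # even-length ones between two neighbouring characters.
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            # The loop overshoots by one on each side.
            length = right - left - 1
            if length > best_len:
                best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]


print(longest_palindrome("babaabca"))  # baab
```

Only the start index and length of the best candidate are kept, so the substring is sliced once at the end.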
I went through the same frustration/struggle and I found the solution on this page, https://www.hackerearth.com/practice/algorithms/string-algorithm/manachars-algorithm/tutorial/, to be easiest to understand.
I tried to implement this solution in my own style, and I think I understand the algorithm now. I also tried to put as many explanations in the code as possible to explain the algorithm. Hope this helps!
#Manacher's Algorithm
def longestPalindrome(s):
    s = s.lower()
    #Insert special characters, #, between characters
    #Insert another special character at the front, $, and at the end, #, of the string to avoid bounds checking
    #(this relies on s containing neither '$' nor '#').
    s1 = '$#'
    for c in s:
        s1 += c + '#'
    s1 = s1+'#'
    #print(s, " -modified into- ", s1)
    #Palin[i] = radius of the longest palindrome centered at i
    Palin = [0]*len(s1)
    #THE HARD PART: THE MEAT of the ALGO
    #c and r help keep track of the expanded regions.
    c = r = 0
    for i in range(1,len(s1)-1): #NOTE: this algo always expands around center i
        #if we already expanded past i, we can retrieve partial info
        #about this location i, by looking at the mirror from left side of center.
        if r > i: #---nice, we look into memory of the past---
            #look up mirror from left of center c
            mirror = c - (i-c)
            #mirror's largest palindrome = Palin[mirror]
            #case1: if mirror's largest palindrome expands past r, choose r-i
            #case2: if mirror's largest palindrome is contained within r, choose Palin[mirror]
            Palin[i] = min(r-i, Palin[mirror])
        #keep expanding around center i
        #NOTE: instead of starting to expand from i-1 and i+1, which is repeated work
        #we start expanding from Palin[i],
        ##which is, if applicable, updated in previous step
        while s1[i+1+Palin[i]] == s1[i-1-Palin[i]]:
            Palin[i] += 1
        #if expanded past r, update r and c
        if i+Palin[i] > r:
            c = i
            r = i + Palin[i]
    #the easy part: find the max length, remove special characters, and return
    max_center = max_length = 0
    for i in range(len(s1)):
        if Palin[i] > max_length:
            max_length = Palin[i]
            max_center = i
    output = s1[max_center-max_length : max_center+max_length]
    output = ''.join(output.split('#'))
    return output # the resulting substring
#include <climits>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;
class Palindrome{
public:
Palindrome(string st){
s = st;
longest = 0;
maxDist = 0;
//printable ascii: ' ' (32) to '~' (126) is 126 - 32 + 1 = 95 characters
// for 'a' to 'z': vector<vector<int>> v(26,vector<int>(0));
vector<vector<int>> v(95,vector<int>(0)); //all printable ascii
mDist.clear();
vPos = v;
bDebug = true;
};
string s;
string sPrev; //previous char
int longest; //longest palindrome size
string sLongest; //longest palindrome found so far
int maxDist; //max distance been checked
bool bDebug;
void findLongestPal();
int checkIfAnchor(int iChar, int &i);
void checkDist(int iChar, int i);
//store char positions in s pos[0] : 'a'... pos[25] : 'z'
// 0123456
// i.e. "axzebca" vPos[0][0]=0 (1st. position of 'a'), vPos[0][1]=6 (2nd pos. of 'a'),
// vPos[25][0]=2 (1st. pos. of 'z').
vector<vector<int>> vPos;
//<anchor,distance to check>
// i.e. abccba anchor = 3: position of 2nd 'c', dist = 3
// looking if next char has a dist. of 3 from previous one
// i.e. abcxcba anchor = 4: position of 2nd 'c', dist = 4
map<int,int> mDist;
};
//check if current char can be an anchor, if so return next distance to check (3 or 4)
// i.e. "abcdc" 2nd 'c' is anchor for sub-palindrome "cdc" distance = 4 if next char is 'b'
// "abcdd: 2nd 'd' is anchor for sub-palindrome "dd" distance = 3 if next char is 'c'
int Palindrome::checkIfAnchor(int iChar, int &i){
if (bDebug)
cout<<"checkIfAnchor. i:"<<i<<" iChar:"<<iChar<<endl;
int n = s.size();
int iDist = 3;
int iSize = vPos[iChar].size();
//if empty or distance to closest same char > 2
if ( iSize == 0 || vPos[iChar][iSize - 1] < (i - 2)){
if (bDebug)
cout<<" .This cannot be an anchor! i:"<<i<<" : iChar:"<<iChar<<endl;
//store char position
vPos[iChar].push_back(i);
return -1;
}
//store char position of anchor for case "cdc"
vPos[iChar].push_back(i);
if (vPos[iChar][iSize - 1] == (i - 2))
iDist = 4;
//for case "dd" check if there are more repeated chars
else {
int iRepeated = 0;
while ((i+1) < n && s[i+1] == s[i]){
i++;
iRepeated++;
iDist++;
//store char position
vPos[iChar].push_back(i);
}
}
if (bDebug)
cout<<" .iDist:"<<iDist<<" i:"<<i<<endl;
return iDist;
};
//check distance from previous same char, and update sLongest
void Palindrome::checkDist(int iChar, int i){
if (bDebug)
cout<<"CheckDist. i:"<<i<<" iChar:"<<iChar<<endl;
int iDist;
int iSize = vPos[iChar].size();
bool b1stOrLastCharInString;
bool bDiffDist;
//checkAnchor will add this char position
if ( iSize == 0){
if (bDebug)
cout<<" .1st time we see this char. Assign it INT_MAX Dist"<<endl;
iDist = INT_MAX;
}
else {
iDist = i - vPos[iChar][iSize - 1];
}
//check for distances being check, update them if found or calculate lengths if not.
if (mDist.size() == 0) {
if (bDebug)
cout<<" .no distances to check are registered, yet"<<endl;
return;
}
int i2ndMaxDist = 0;
for(auto it = mDist.begin(); it != mDist.end();){
if (bDebug)
cout<<" .mDist. anchor:"<<it->first<<" . dist:"<<it->second<<endl;
b1stOrLastCharInString = false;
bDiffDist = it->second == iDist; //check here, because it can be updated in 1st. if
if (bDiffDist){
if (bDebug)
cout<<" .Distance checked! :"<<iDist<<endl;
//check if it's the first char in the string
if (vPos[iChar][iSize - 1] == 0 || i == (s.size() - 1))
b1stOrLastCharInString = true;
//we will continue checking for more...
else {
it->second += 2; //update next distance to check
if (it->second > maxDist) {
if (bDebug)
cout<<" .previous MaxDist:"<<maxDist<<endl;
maxDist = it->second;
if (bDebug)
cout<<" .new MaxDist:"<<maxDist<<endl;
}
else if (it->second > i2ndMaxDist) {//check this...hmtest
i2ndMaxDist = it->second;
if (bDebug)
cout<<" .second MaxDist:"<<i2ndMaxDist<<endl;
}
it++;
}
}
if (!bDiffDist || b1stOrLastCharInString) {
if (bDebug && it->second != iDist)
cout<<" .Dist diff. Anchor:"<<it->first<<" dist:"<<it->second<<" iDist:"<<iDist<<endl;
else if (bDebug)
cout<<" .Palindrome found at the beginning or end of the string"<<endl;
//if we find a closest same char.
if (!b1stOrLastCharInString && it->second > iDist){
if (iSize > 1) {
if (bDebug)
cout<<" . < Dist . looking further..."<<endl;
iSize--;
iDist = i - vPos[iChar][iSize - 1];
continue;
}
}
if (iDist == maxDist) {
maxDist = 0;
if (bDebug)
cout<<" .Diff. clearing Max Dist"<<endl;
}
else if (iDist == i2ndMaxDist) {
i2ndMaxDist = 0;
if (bDebug)
cout<<" .clearing 2nd Max Dist"<<endl;
}
int iStart;
int iCurrLength;
//first char in string
if ( b1stOrLastCharInString && vPos[iChar].size() > 0 && vPos[iChar][iSize - 1] == 0){
iStart = 0;
iCurrLength = i+1;
}
//last char in string
else if (b1stOrLastCharInString && i == (s.size() - 1)){
iStart = i - it->second;
iCurrLength = it->second + 1;
}
else {
iStart = i - it->second + 1;
iCurrLength = it->second - 1; //"xabay" anchor:2nd. 'a'. Dist from 'y' to 'x':4. length 'aba':3
}
if (iCurrLength > longest){
if (bDebug)
cout<<" .previous Longest!:"<<sLongest<<" length:"<<longest<<endl;
longest = iCurrLength;
sLongest = s.substr(iStart, iCurrLength);
if (bDebug)
cout<<" .new Longest!:"<<sLongest<<" length:"<<longest<<endl;
}
if (bDebug)
cout<<" .deleting iterator for anchor:"<<it->first<<" dist:"<<it->second<<endl;
mDist.erase(it++);
}
}
//check if we need to get new max distance
if (maxDist == 0 && mDist.size() > 0){
if (bDebug)
cout<<" .new maxDist needed";
if (i2ndMaxDist > 0) {
maxDist = i2ndMaxDist;
if (bDebug)
cout<<" .assigned 2nd. max Dist to max Dist"<<endl;
}
else {
for(auto it = mDist.begin(); it != mDist.end(); it++){
if (it->second > maxDist)
maxDist = it->second;
}
if (bDebug)
cout<<" .new max dist assigned:"<<maxDist<<endl;
}
}
};
void Palindrome::findLongestPal(){
int n = s.length();
if (bDebug){
cout<<"01234567891123456789212345"<<endl<<"abcdefghijklmnopqrstuvwxyz"<<endl<<endl;
for (int i = 0; i < n;i++){
if (i%10 == 0)
cout<<i/10;
else
cout<<i;
}
cout<<endl<<s<<endl;
}
if (n == 0)
return;
//process 1st char
int j = 0;
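//for 'a' to 'z' : while (j < n && (s[j] < 'a' || s[j] > 'z'))
//skip leading characters outside the printable ascii range
while (j < n && (s[j] < ' ' || s[j] > '~'))
j++;
if (j > 0){
s = s.substr(j); //substr returns a new string, so assign it back
n = s.length();
}
if (n == 0)
return;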
// for 'a' to 'z' change size of vector from 94 to 26 : int iChar = s[0] - 'a';
int iChar = s[0] - ' ';
//store char position
vPos[iChar].push_back(0);
for (int i = 1; i < n; i++){
if (bDebug)
cout<<"findLongestPal. i:"<<i<<" "<<s.substr(0,i+1)<<endl;
//if max. possible palindrome would be smaller or equal
// than largest palindrome found then exit
// (n - i) = string length to check
// maxDist: max distance to check from i
int iPossibleLongestSize = maxDist + (2 * (n - i));
if ( iPossibleLongestSize <= longest){
if (bDebug)
cout<<" .PosSize:"<<iPossibleLongestSize<<" longest found:"<<longest<<endl;
return;
}
//for 'a' to 'z' : int iChar = s[i] - 'a';
int iChar = s[i] - ' ';
//for 'a' to 'z': if (iChar < 0 || iChar > 25){
if (iChar < 0 || iChar > 94){
if (bDebug)
cout<<" . i:"<<i<<" iChar:"<<s[i]<<" skipped!"<<endl;
continue;
}
//check distance to previous char, if exist
checkDist(iChar, i);
//check if this can be an anchor
int iDist = checkIfAnchor(iChar,i);
if (iDist == -1)
continue;
//append distance to check for next char.
if (bDebug)
cout<<" . Adding anchor for i:"<<i<<" dist:"<<iDist<<endl;
mDist.insert(make_pair(i,iDist));
//check if this is the only palindrome, at the end
//i.e. "......baa" or "....baca"
if (i == (s.length() - 1) && s.length() > (iDist - 2)){
//if this is the longest palindrome!
if (longest < (iDist - 1)){
sLongest = s.substr((i - iDist + 2),(iDist - 1));
}
}
}
};
int main(){
string s;
cin >> s;
Palindrome p(s);
p.findLongestPal();
cout<<p.sLongest;
return 0;
}
I'm looking for some algorithm that for a given record with n properties with n possible values each (int, string etc) searches a number of existing records and gives back the one that matches the most properties.
Example:
A = 1
B = 1
C = 1
D = f
A | B | C | D
----+-----+-----+----
1 | 1 | 9 | f <
2 | 3 | 1 | g
3 | 4 | 2 | h
2 | 5 | 8 | j
3 | 6 | 5 | h
The first row would be the one I'm looking for, as it has the most matching values. I think it doesn't need to calculate any closeness to the values, because then row 2 might be more matching.
Loop through each row and add one to the row's score for every field that matches (the first row in the example scores 3). When that's done, you have a result set of scores which you can sort.
The basic algorithm could look like (in java pseudo code):
int bestMatchIdx = -1;
int currMatches = 0;
int bestMatches = 0;
for ( int row = 0 ; row < numRows ; row++ ) {
currMatches = 0;
for ( int col = 0 ; col < numCols ; col++ ) {
if ( search[col].equals( rows[ row ][ col ] ))
currMatches++;
}
if ( currMatches > bestMatches ) {
bestMatchIdx = row;
bestMatches = currMatches;
}
}
This assumes that you have an equals function to compare, and the data stored in a 2D array. 'search' is the reference row to compare all other rows against it.
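For comparison, here is the same scoring loop as a runnable sketch, using the example record and rows from the question. The function name `best_match` and the tuple layout are assumptions made for this illustration.

```python
def best_match(search, rows):
    """Return (index, score) of the row sharing the most values with search."""
    best_idx, best_score = -1, 0
    for idx, row in enumerate(rows):
        # Count the columns whose value equals the searched value.
        score = sum(1 for want, got in zip(search, row) if want == got)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


search = (1, 1, 1, "f")
rows = [
    (1, 1, 9, "f"),
    (2, 3, 1, "g"),
    (3, 4, 2, "h"),
    (2, 5, 8, "j"),
    (3, 6, 5, "h"),
]
print(best_match(search, rows))  # (0, 3)
```

Ties are resolved in favour of the earliest row, because a later row only wins with a strictly higher score.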
This is a hard algorithms problem:
Divide a list into two parts whose sums are as close to each other as possible.
The list length is 1 <= n <= 100 and the weights of the numbers are 1 <= w <= 250, as given in the question.
For example : 23 65 134 32 95 123 34
1.sum = 256
2.sum = 250
1.list = 1 2 3 7
2.list = 4 5 6
I have an algorithm, but it doesn't work for all inputs:
1. Initialize the lists: list1 = [], list2 = []
2. Sort the elements of the given list: [23 32 34 65 95 123 134]
3. Pop the last one (the max)
4. Insert it into the list that ends up with the smaller difference
Implementation:
list1 = [], list2 = []
1. select 134, insert into list1. list1 = [134]
2. select 123, insert into list2, because inserting it into list1 would make the difference bigger.
3. select 95 and insert into list2, because sum(list2) + 95 - sum(list1) is smaller.
and so on...
You can reformulate this as the knapsack problem.
You have a list of items with total weight M that should be fitted into a bin that can hold maximum weight M/2. The items packed in the bin should weigh as much as possible, but not more than the bin holds.
For the case where all weights are non-negative, this problem is only weakly NP-complete and has pseudo-polynomial time solutions.
A description of dynamic programming solutions for this problem can be found on Wikipedia.
The problem is NP-complete, but there is a pseudo-polynomial algorithm for it. This is the 2-partition problem; you can follow the pseudo-polynomial time algorithm for the subset sum problem to solve it. If the input values are polynomially bounded in the input size, then this can be done in polynomial time.
In your case (weights <= 250) it's polynomial (because each weight <= 250, so the total sum <= 250 n).
Let Sum = sum of weights. We create a two-dimensional array A and construct it column by column:
A[i,j] = true if (j == weight[i] or j - weight[i] = weight[k] (k is in list)).
Building the array with this algorithm takes O(n^2 * sum/2).
Finally, we find the largest column that contains a true value.
Here is an example:
items:{0,1,2,3}
weights:{4,7,2,8} => sum = 21 sum/2 = 10
items/weights | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---------------------------------------------------------
0             | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0
1             | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
2             | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1
3             | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
So because a[10, 2] == true the partition is 10, 11
This is an algorithm I found here and edited a little bit to solve your problem:
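#include <vector>
using namespace std;

// Returns the largest subset sum that does not exceed half of the total.
int partition( const vector< int > &C ) {
    // compute the total sum
    int n = C.size();
    int N = 0;
    for( int i = 0; i < n; i++ ) N += C[i];
    // initialize the table: T[s] is true when some subset sums to s
    vector< bool > T( N + 1, false );
    T[0] = true;
    // process the numbers one by one
    for( int i = 0; i < n; i++ )
        for( int j = N - C[i]; j >= 0; j-- )
            if( T[j] ) T[j + C[i]] = true;
    // the reachable sum closest to N/2 from below
    for( int i = N / 2; i >= 0; i-- )
        if( T[i] )
            return i;
    return 0;
}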
I just returned first T[i] which is true instead of returning T[N/2] (in max to min order).
Finding the path which gives this value is not hard.
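One way to recover that path is to remember, for every reachable sum, which item first produced it, and then walk backwards from the best sum. The sketch below assumes positive integer weights, as in the question; the name `closest_partition` and the `reached` table are choices made for this example.

```python
def closest_partition(weights):
    """Split weights into two lists whose sums are as close as possible."""
    total = sum(weights)
    half = total // 2
    # reached[s] = index of the item that first produced sum s
    # (None while s is unreachable, -1 for the empty subset).
    reached = [None] * (half + 1)
    reached[0] = -1
    for idx, w in enumerate(weights):
        # Go downwards so that each item is used at most once.
        for s in range(half, w - 1, -1):
            if reached[s] is None and reached[s - w] is not None:
                reached[s] = idx
    best = max(s for s in range(half + 1) if reached[s] is not None)
    # Walk back from the best sum to collect the chosen items.
    chosen = set()
    s = best
    while s > 0:
        idx = reached[s]
        chosen.add(idx)
        s -= weights[idx]
    first = [w for i, w in enumerate(weights) if i in chosen]
    second = [w for i, w in enumerate(weights) if i not in chosen]
    return first, second


a, b = closest_partition([23, 65, 134, 32, 95, 123, 34])
print(sum(a), sum(b))  # 252 254
```

For the list from the question this gives sums of 252 and 254, a difference of 2, which is closer than the 256 / 250 split shown there.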
This problem is at least as hard as the NP-complete problem subset sum. Your algorithm is a greedy algorithm. This type of algorithm is fast and can generate an approximate solution quickly, but it is not guaranteed to find the exact solution to an NP-complete problem.
A brute force approach is probably the simplest way to solve your problem, although it will be too slow if there are too many elements.
Try every possible way of partitioning the elements into two sets and calculate the absolute difference in the sums.
Choose the partition for which the absolute difference is minimal.
Generating all the partitions can be done by considering the binary representation of each integer from 0 to 2^n - 1, where each binary digit determines whether the corresponding element is in the left or right partition.
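A minimal sketch of that enumeration, only practical for small n since it visits 2^n masks. The function name is hypothetical.

```python
def brute_force_partition(weights):
    """Try every subset (as a bitmask) and keep the most balanced split."""
    n = len(weights)
    total = sum(weights)
    best_mask, best_diff = 0, total
    for mask in range(2 ** n):
        # Bit i of mask set means element i goes to the left partition.
        left = sum(w for i, w in enumerate(weights) if (mask >> i) & 1)
        diff = abs(total - 2 * left)
        if diff < best_diff:
            best_mask, best_diff = mask, diff
    left = [w for i, w in enumerate(weights) if (best_mask >> i) & 1]
    right = [w for i, w in enumerate(weights) if not (best_mask >> i) & 1]
    return left, right, best_diff


left, right, diff = brute_force_partition([23, 65, 134, 32, 95, 123, 34])
print(sum(left), sum(right), diff)
```

With n up to 100, as in the question, this enumeration is far out of reach, which is why the dynamic programming formulation matters.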
While trying to solve the same problem I came up with the following idea, which seems too simple to be a real solution but needs only one pass over the sorted list. Could someone provide an example showing that it does not work, or explain why it is not a solution?
arr = [20,10,15,6,1,17,3,9,10,2,19] # a list of numbers
g1 = []
g2 = []
for el in reversed(sorted(arr)):
    if sum(g1) > sum(g2):
        g2.append(el)
    else:
        g1.append(el)
print(f"{sum(g1)}: {g1}")
print(f"{sum(g2)}: {g2}")
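A small input on which this greedy rule misses the optimum is [3, 3, 2, 2, 2]. The sketch below wraps the greedy loop in a function and compares it with an exact answer computed from the set of reachable subset sums; both function names are hypothetical.

```python
def greedy_split(arr):
    """The greedy rule: give the next largest element to the lighter group."""
    g1, g2 = [], []
    for el in sorted(arr, reverse=True):
        if sum(g1) > sum(g2):
            g2.append(el)
        else:
            g1.append(el)
    return g1, g2


def best_difference(arr):
    """Exact minimum difference via the set of reachable subset sums."""
    total = sum(arr)
    sums = {0}
    for el in arr:
        sums |= {s + el for s in sums}
    return min(abs(total - 2 * s) for s in sums)


arr = [3, 3, 2, 2, 2]
g1, g2 = greedy_split(arr)
print(abs(sum(g1) - sum(g2)))  # 2
print(best_difference(arr))    # 0
```

Greedy produces [3, 2, 2] and [3, 2] (difference 2), while [3, 3] and [2, 2, 2] balance exactly.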
Typescript code:
import * as _ from 'lodash'
function partitionArray(numbers: number[]): {
arr1: number[]
arr2: number[]
difference: number
} {
let sortedArr: number[] = _.chain(numbers).without(0).sortBy((x) => x).value().reverse()
let arr1: number[] = []
let arr2: number[] = []
let median = _.sum(sortedArr) / 2
let sum = 0
_.each(sortedArr, (n) => {
let ns = sum + n
if (ns > median) {
arr1.push(n)
} else {
sum += n
arr2.push(n)
}
})
return {
arr1: arr1,
arr2: arr2,
difference: Math.abs(_.sum(arr1) - _.sum(arr2))
}
}
I have two sets of ranges. Each range is a pair of integers (start and end) representing some sub-range of a single larger range. The two sets of ranges are in a structure similar to this (of course the ...s would be replaced with actual numbers).
$a_ranges =
{
a_1 =>
{
start => ...,
end => ...,
},
a_2 =>
{
start => ...,
end => ...,
},
a_3 =>
{
start => ...,
end => ...,
},
# and so on
};
$b_ranges =
{
b_1 =>
{
start => ...,
end => ...,
},
b_2 =>
{
start => ...,
end => ...,
},
b_3 =>
{
start => ...,
end => ...,
},
# and so on
};
I need to determine which ranges from set A overlap with which ranges from set B. Given two ranges, it's easy to determine whether they overlap. I've simply been using a double loop to do this--loop through all elements in set A in the outer loop, loop through all elements of set B in the inner loop, and keep track of which ones overlap.
I'm having two problems with this approach. First is that the overlap space is extremely sparse--even if there are thousands of ranges in each set, I expect each range from set A to overlap with maybe 1 or 2 ranges from set B. My approach enumerates every single possibility, which is overkill. This leads to my second problem--the fact that it scales very poorly. The code finishes very quickly (sub-minute) when there are hundreds of ranges in each set, but takes a very long time (+/- 30 minutes) when there are thousands of ranges in each set.
Is there a better way I can index these ranges so that I'm not doing so many unnecessary checks for overlap?
Update: The output I'm looking for is two hashes (one for each set of ranges) where the keys are range IDs and the values are the IDs of the ranges from the other set that overlap with the given range in this set.
This sounds like the perfect use case for an interval tree, which is a data structure specifically designed to support this operation. If you have two sets of intervals of sizes m and n, then you can build one of them into an interval tree in time O(m lg m) and then do n intersection queries in time O(n lg m + k), where k is the total number of intersections you find. This gives a net runtime of O((m + n) lg m + k). Remember that in the worst case k = O(nm), so this isn't any better than what you have, but for cases where the number of intersections is sparse this can be substantially better than the O(mn) runtime you have right now.
I don't have much experience working with interval trees (and zero experience in Perl, sorry!), but from the description it seems like they shouldn't be that hard to build. I'd be pretty surprised if one didn't exist already.
Hope this helps!
In case you are not exclusively tied to Perl: the IRanges package in R deals with interval arithmetic. It has very powerful primitives, and it would probably be easy to code a solution with them.
A second remark is that the problem could possibly become very easy if the intervals have additional structure; for example, if within each set of ranges there is no overlap (in that case a linear approach sifting through the two ordered sets simultaneously is possible). Even in the absence of such structure, the least you can do is to sort one set of ranges by start point, and the other set by end point, then break out of the inner loop once a match is no longer possible. Of course, existing and general algorithms and data structures such as the interval tree mentioned earlier are the most powerful.
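The sorting idea can be taken one step further with a sweep over the sorted endpoints of both sets. This is a sketch of that technique, not of an interval tree: it assumes closed integer ranges stored as `id -> (start, end)` (a simplification of the hash-of-hashes in the question), and the function name is hypothetical. It runs in O((m + n) log(m + n) + k) for k reported overlaps.

```python
from collections import defaultdict


def find_overlaps(a_ranges, b_ranges):
    """Map each range ID to the sorted IDs of overlapping ranges in the other set."""
    events = []
    for which, ranges in (("a", a_ranges), ("b", b_ranges)):
        for rid, (start, end) in ranges.items():
            events.append((start, 0, which, rid))  # 0 = range opens
            events.append((end, 1, which, rid))    # 1 = range closes
    # At equal positions opens sort before closes, so touching ranges overlap.
    events.sort()

    active = {"a": set(), "b": set()}
    hits = {"a": defaultdict(list), "b": defaultdict(list)}
    for _pos, kind, which, rid in events:
        if kind == 1:
            active[which].discard(rid)
            continue
        other = "b" if which == "a" else "a"
        # Every currently open range of the other set overlaps the new one.
        for oid in active[other]:
            hits[which][rid].append(oid)
            hits[other][oid].append(rid)
        active[which].add(rid)
    a_hits = {rid: sorted(ids) for rid, ids in hits["a"].items()}
    b_hits = {rid: sorted(ids) for rid, ids in hits["b"].items()}
    return a_hits, b_hits


a = {"a_1": (1, 11), "a_2": (13, 44), "a_3": (17, 23), "a_4": (55, 66)}
b = {"b_1": (0, 1), "b_2": (2, 29), "b_3": (88, 133)}
a_hits, b_hits = find_overlaps(a, b)
print(a_hits)  # {'a_1': ['b_1', 'b_2'], 'a_2': ['b_2'], 'a_3': ['b_2']}
print(b_hits)  # {'b_1': ['a_1'], 'b_2': ['a_1', 'a_2', 'a_3']}
```

The two returned dictionaries have the shape asked for in the question's update: one per set, keyed by range ID. Ranges without any overlap are simply absent.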
There are several existing CPAN modules that solve this issue. I have developed two of them: Data::Range::Compare and Data::Range::Compare::Stream.
Data::Range::Compare only works with arrays in memory, but supports generic range types.
Data::Range::Compare::Stream works with streams of data via iterators, but it requires understanding OO Perl to extend it to generic data types. Data::Range::Compare::Stream is recommended if you are processing very large sets of data.
Here is an excerpt from the Examples folder of Data::Range::Compare::Stream.
Given these 3 sets of data:
Numeric Range set: A contained in file: source_a.src
+----------+
| 1 - 11 |
| 13 - 44 |
| 17 - 23 |
| 55 - 66 |
+----------+
Numeric Range set: B contained in file: source_b.src
+----------+
| 0 - 1 |
| 2 - 29 |
| 88 - 133 |
+----------+
Numeric Range set: C contained in file: source_c.src
+-----------+
| 17 - 29 |
| 220 - 240 |
| 241 - 250 |
+-----------+
The expected output would be:
+--------------------------------------------------------------------+
| Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
+--------------------------------------------------------------------+
| 0 - 0 | No Data | 0 - 1 | No Data |
| 1 - 1 | 1 - 11 | 0 - 1 | No Data |
| 2 - 11 | 1 - 11 | 2 - 29 | No Data |
| 12 - 12 | No Data | 2 - 29 | No Data |
| 13 - 16 | 13 - 44 | 2 - 29 | No Data |
| 17 - 29 | 13 - 44 | 2 - 29 | 17 - 29 |
| 30 - 44 | 13 - 44 | No Data | No Data |
| 55 - 66 | 55 - 66 | No Data | No Data |
| 88 - 133 | No Data | 88 - 133 | No Data |
| 220 - 240 | No Data | No Data | 220 - 240 |
| 241 - 250 | No Data | No Data | 241 - 250 |
+--------------------------------------------------------------------+
The Source code can be found here.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use lib qw(./ ../lib);
# custom package from FILE_EXAMPLE.pl
use Data::Range::Compare::Stream::Iterator::File;
use Data::Range::Compare::Stream;
use Data::Range::Compare::Stream::Iterator::Consolidate;
use Data::Range::Compare::Stream::Iterator::Compare::Asc;
my $source_a=Data::Range::Compare::Stream::Iterator::File->new(filename=>'source_a.src');
my $source_b=Data::Range::Compare::Stream::Iterator::File->new(filename=>'source_b.src');
my $source_c=Data::Range::Compare::Stream::Iterator::File->new(filename=>'source_c.src');
my $consolidator_a=new Data::Range::Compare::Stream::Iterator::Consolidate($source_a);
my $consolidator_b=new Data::Range::Compare::Stream::Iterator::Consolidate($source_b);
my $consolidator_c=new Data::Range::Compare::Stream::Iterator::Consolidate($source_c);
my $compare=new Data::Range::Compare::Stream::Iterator::Compare::Asc();
my $src_id_a=$compare->add_consolidator($consolidator_a);
my $src_id_b=$compare->add_consolidator($consolidator_b);
my $src_id_c=$compare->add_consolidator($consolidator_c);
print " +--------------------------------------------------------------------+
| Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
+--------------------------------------------------------------------+\n";
my $format=' | %-12s | %-13s | %-13s | %-13s |'."\n";
while($compare->has_next) {
    my $result=$compare->get_next;
    my $string=$result->to_string;
    my @data=($result->get_common);
    next if $result->is_empty;
    for(0 .. 2) {
        my $column=$result->get_column_by_id($_);
        unless(defined($column)) {
            $column="No Data";
        } else {
            $column=$column->get_common->to_string;
        }
        push @data,$column;
    }
    printf $format,@data;
}
print " +--------------------------------------------------------------------+\n";
Try Tree::RB, though this is for looking up mutually exclusive ranges (no overlaps).
The performance is not bad: I had about 10000 segments and had to find the segment for each discrete number. My input had 300 million records, and I needed to put them into separate buckets, like partitioning the data. Tree::RB worked out great.
$var = [
[0,90],
[91,2930],
[2950,8293]
.
.
.
]
My lookup values were 10, 99, 991 ...
and basically I needed the position of the range for the given number.
With the comparison function below, mine uses something like this:
my $cmp = sub
{
my ($a1, $b1) = @_;
if(ref($b1) && ref($a1))
{
return ($$a1[1]) <=> ($$b1[0]);
}
my $ret = 0;
if(ref($a1) eq 'ARRAY')
{
# $a1 is a range, $b1 is a number: 0 when the number lies inside the range
if($$a1[0] <= $b1 && $b1 <= $$a1[1])
{
$ret = 0;
}
if($$a1[1] < $b1)
{
$ret = -1;
}
if($$a1[0] > $b1)
{
$ret = 1;
}
}
else
{
if($$b1[0] <= $a1 && $a1 <= $$b1[1])
{
$ret = 0;
}
if($$b1[0] > $a1)
{
$ret = -1;
}
if($$b1[1] < $a1)
{
$ret = 1;
}
}
return $ret;
}
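When the ranges are sorted and do not overlap, the same lookup can also be done with a plain binary search over the start points. The sketch below swaps in a sorted list plus `bisect` for the red-black tree; the function name `find_range` and the sample values (including the miss at 2940) are assumptions for illustration.

```python
from bisect import bisect_right


def find_range(ranges, starts, value):
    """Return the index of the range containing value, or None."""
    # Last range whose start is <= value.
    pos = bisect_right(starts, value) - 1
    if pos >= 0 and value <= ranges[pos][1]:
        return pos
    return None


ranges = [(0, 90), (91, 2930), (2950, 8293)]  # sorted, non-overlapping
starts = [lo for lo, _hi in ranges]
print([find_range(ranges, starts, v) for v in (10, 99, 991, 2940)])
# [0, 1, 1, None]
```

Each lookup costs O(log n), and the `starts` list is built once and reused for every record.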
I would need to time it to know whether it's the fastest way, but given the structure of your data you could try this:
use strict;
my $fromA = 12;
my $toA = 15;
my $fromB = 7;
my $toB = 35;
my @common_range = get_common_range($fromA, $toA, $fromB, $toB);
my $common_range = $common_range[0]."-".$common_range[-1];
sub get_common_range {
    my @A = $_[0]..$_[1];
    my %B = map {$_ => 1} $_[2]..$_[3];
    my @common = ();
    foreach my $i (@A) {
        if (defined $B{$i}) {
            push (@common, $i);
        }
    }
    return sort {$a <=> $b} @common;
}
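For closed integer ranges the common part can also be computed directly from the endpoints, without enumerating every value. A minimal sketch of that shortcut, with a hypothetical function name:

```python
def common_range(from_a, to_a, from_b, to_b):
    """Return the overlap of two closed integer ranges, or None if disjoint."""
    start = max(from_a, from_b)
    end = min(to_a, to_b)
    return (start, end) if start <= end else None


print(common_range(12, 15, 7, 35))  # (12, 15)
print(common_range(1, 5, 6, 9))     # None
```

This takes constant time per pair of ranges, whereas building the list and hash grows with the width of the ranges.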