Understanding the Shell sort algorithm

I have a couple of questions I couldn't find answered online regarding Shell sort with Shell's gap sequence.
public static void shell(int[] a) {
    int increment = a.length / 2;
    while (increment > 0) {
        // gapped insertion sort for the current increment
        for (int i = increment; i < a.length; i++) {
            int j = i;
            int temp = a[i];
            while (j >= increment && a[j - increment] > temp) {
                a[j] = a[j - increment];
                j = j - increment;
            }
            a[j] = temp;
        }
        if (increment == 2) {
            increment = 1;
        } else {
            // the compound assignment truncates the result back to int
            increment *= (5.0 / 11);
        }
    }
}
This is the code I found online, but I don't really understand the last else statement. What does 5.0/11 represent?
Also, I need to analyse the complexity of the algorithm, but I'm getting pretty perplexing results:
It seems to be O(n) in both the best and the worst case. Are these results legitimate?

5.0/11 is used to (roughly) halve the increment. Since the result is truncated back to an int, any factor greater than 0.45 and less than 0.5 will produce about half of the value.
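To see this concretely, here is a small demo (my own illustration, not from the original post) that prints the gap sequence produced by the compound assignment. It also shows why the code special-cases increment == 2: multiplying 2 by 5.0/11 would truncate to 0, and the crucial final pass with gap 1 would be skipped.
public class GapDemo {
    public static void main(String[] args) {
        int increment = 50; // pretend a.length / 2 == 50
        while (increment > 0) {
            System.out.println("gap = " + increment);
            if (increment == 2) {
                increment = 1;           // 2 * 5.0/11 would truncate to 0
            } else {
                increment *= (5.0 / 11); // 50 -> 22 -> 10 -> 4 -> 1 -> 0
            }
        }
    }
}
Note that the sequence only roughly halves: 4 becomes 1, not 2, because 4 * 5.0/11 ≈ 1.8 truncates to 1.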

First missing Integer approach's time complexity

I want to understand the time complexity of the algorithm below, which is an accepted answer to the famous first missing positive integer problem:
public int firstMissingPositive(int[] A) {
    int l = A.length;
    int i = 0;
    while (i < l) {
        int j = A[i];
        // follow the chain of values, marking each visited slot
        while (j > 0 && j <= l) {
            int k = A[j - 1];
            A[j - 1] = Integer.MAX_VALUE;
            j = k;
        }
        i++;
    }
    // the first unmarked slot corresponds to the missing positive
    for (i = 0; i < l; i++) {
        if (A[i] != Integer.MAX_VALUE)
            break;
    }
    return i + 1;
}
Observations and findings:
Looking at the loop structure, I thought the complexity should be more than n, since I may visit every element more than twice in some cases. But to my surprise the solution was accepted, and I cannot work out its complexity.
You are probably looking at the nested loops and thinking O(N^2), but it's not that simple.
Every iteration of the inner loop either changes an item of A to Integer.MAX_VALUE (and there are only N items) or reads an already-marked slot and immediately ends the chain (which happens at most once per outer iteration), so there are O(N) inner iterations in total.
The total time is therefore O(N).
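One way to convince yourself is to instrument the code with a counter (my own sketch, not part of the answer above); innerSteps stays bounded by about 2N, because each inner iteration either marks a fresh slot or ends the current chain:
public int firstMissingPositiveCounted(int[] A) {
    int l = A.length;
    int innerSteps = 0; // total inner-loop iterations over the whole run
    int i = 0;
    while (i < l) {
        int j = A[i];
        while (j > 0 && j <= l) {
            innerSteps++; // marks a fresh slot or ends the chain
            int k = A[j - 1];
            A[j - 1] = Integer.MAX_VALUE;
            j = k;
        }
        i++;
    }
    System.out.println("inner iterations: " + innerSteps + ", N = " + l);
    for (i = 0; i < l; i++) {
        if (A[i] != Integer.MAX_VALUE)
            break;
    }
    return i + 1;
}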

Select k values at a time and flip them: find the minimum cost to make all array values the same

Given an array containing only 0s and 1s, and an integer value k.
You must choose k digits at a time and flip all of them. Find the minimum cost of making all values the same. If it is not possible, return -1.
This is a simple greedy problem. I am assuming you can't flip fewer than k digits at a time.
Find minimum cost for making all values same.
To solve this, we first try to make all values 1 and then try to make all values 0; whichever takes fewer steps is our answer.
Here is my pseudo-code. It is meant to be self-explanatory, so I am not adding an explanation. I am giving the code for making all values 1; a sketch of the full driver follows the code.
int cnt = 0;
for (int i = 0; i < arr.length() - k + 1; i++) {
    if (arr[i] == '1') {
        continue; // already 1, nothing to do
    }
    // flip the window of k digits starting at i
    for (int j = 0; j < k; j++) {
        arr[i + j] = (arr[i + j] == '0') ? '1' : '0';
    }
    cnt++;
}
// check whether every digit is now 1
bool flag = true;
for (int i = 0; i < arr.length(); i++) {
    if (arr[i] == '0') {
        flag = false;
        break;
    }
}
if (flag) {
    print(cnt);
} else {
    print("-1");
}
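As promised, a driver can run the same greedy pass twice, once targeting 1 and once targeting 0, and take the smaller cost. A minimal Java sketch of this (the method names minFlips and solve are mine, not from the answer above):
// Greedy pass: make every digit equal to target ('0' or '1') by flipping
// windows of k digits; returns the flip count, or -1 if it is impossible.
static int minFlips(char[] src, int k, char target) {
    char[] arr = src.clone(); // work on a copy so both passes see the input
    int cnt = 0;
    for (int i = 0; i + k <= arr.length; i++) {
        if (arr[i] == target) continue;
        for (int j = 0; j < k; j++) {
            arr[i + j] = (arr[i + j] == '0') ? '1' : '0';
        }
        cnt++;
    }
    for (char c : arr) {
        if (c != target) return -1; // some digit is stuck: impossible
    }
    return cnt;
}

// Take the cheaper of "all ones" and "all zeros"; -1 if neither works.
static int solve(char[] arr, int k) {
    int ones = minFlips(arr, k, '1');
    int zeros = minFlips(arr, k, '0');
    if (ones == -1) return zeros;
    if (zeros == -1) return ones;
    return Math.min(ones, zeros);
}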

Distinct Subsequences DP explanation

From LeetCode
Given a string S and a string T, count the number of distinct
subsequences of T in S.
A subsequence of a string is a new string which is formed from the
original string by deleting some (can be none) of the characters
without disturbing the relative positions of the remaining characters.
(ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).
Here is an example: S = "rabbbit", T = "rabbit"
Return 3.
I have seen a very good DP solution; however, I have a hard time understanding it. Can anybody explain how this DP works?
int numDistinct(string S, string T) {
    vector<int> f(T.size() + 1);
    // set the last element to 1
    f[T.size()] = 1;
    for (int i = S.size() - 1; i >= 0; --i) {
        for (int j = 0; j < T.size(); ++j) {
            f[j] += (S[i] == T[j]) * f[j + 1];
            printf("%d\t", f[j]);
        }
        cout << "\n";
    }
    return f[0];
}
First, try to solve the problem yourself to come up with a naive implementation:
Let's say that S.length = m and T.length = n. Let's write S{i} for the substring of S starting at i. For example, if S = "abcde", S{0} = "abcde", S{4} = "e", and S{5} = "". We use a similar definition for T.
Let N[i][j] be the number of distinct subsequences for S{i} and T{j}. We are interested in N[0][0] (because those are both full strings).
There are two easy cases: N[i][n] for any i, and N[m][j] for j < n. How many subsequences equal to "" are there in some string S? Exactly one, the empty one. How many subsequences equal to a non-empty T are there in ""? None.
Now, given some arbitrary i and j, we need to find a recursive formula. There are two cases.
If S[i] != T[j], we know that N[i][j] = N[i+1][j] (I hope you can verify this for yourself, I aim to explain the cryptic algorithm above in detail, not this naive version).
If S[i] = T[j], we have a choice. We can either 'match' these characters and go on with the next characters of both S and T, or we can ignore the match (as in the case that S[i] != T[j]). Since we have both choices, we need to add the counts there: N[i][j] = N[i+1][j] + N[i+1][j+1].
In order to find N[0][0] using dynamic programming, we need to fill the N table. We first need to set the boundary of the table:
N[m][j] = 0, for 0 <= j < n
N[i][n] = 1, for 0 <= i <= m
Because of the dependencies in the recursive relation, we can fill the rest of the table looping i backwards and j forwards:
for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            N[i][j] = N[i+1][j] + N[i+1][j+1];
        } else {
            N[i][j] = N[i+1][j];
        }
    }
}
We can now use the most important trick of the algorithm: we can replace the table with a 1-dimensional array f, maintaining the invariant that f = N[i+1] at the start of each outer-loop iteration (while the inner loop runs, f[j..n] still holds row i+1, while f[0..j-1] already holds row i). This is possible because of the way the table is filled. If we apply this to my algorithm, this gives:
f[j] = 0, for 0 <= j < n
f[n] = 1
for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            f[j] = f[j] + f[j+1];
        } else {
            f[j] = f[j];
        }
    }
}
We're almost at the algorithm you gave. First of all, we don't need to initialize f[j] = 0. Second, we don't need assignments of the type f[j] = f[j].
Since this is C++ code, we can rewrite the snippet
if (S[i] == T[j]) {
    f[j] += f[j+1];
}
to
f[j] += (S[i] == T[j]) * f[j+1];
and that's all. This yields the algorithm:
f[n] = 1
for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        f[j] += (S[i] == T[j]) * f[j+1];
    }
}
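For reference, a direct Java translation of the final algorithm might look like this (a sketch of the same space-optimized DP, with the debug printing dropped):
static int numDistinct(String S, String T) {
    int m = S.length(), n = T.length();
    // f[j] = number of distinct subsequences of T.substring(j)
    // in the suffix of S processed so far
    int[] f = new int[n + 1];
    f[n] = 1; // the empty suffix of T matches exactly once
    for (int i = m - 1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            if (S.charAt(i) == T.charAt(j)) {
                f[j] += f[j + 1]; // f[j+1] still holds the row-(i+1) value
            }
        }
    }
    return f[0]; // answer for the full strings
}
For S = "rabbbit" and T = "rabbit" this returns 3, as expected.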
I think the answer is wonderful, but one detail deserves emphasis. In the 2D version we iterate backwards over i (j can go in either order). Once we replace the table N by the array f, we must loop j forwards, so that f[j+1] still holds the previous row's value and we do not overwrite results we still need:
for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            N[i][j] = N[i+1][j] + N[i+1][j+1];
        } else {
            N[i][j] = N[i+1][j];
        }
    }
}

Interview - Find magnitude pole in an array

Magnitude pole: an element of an array such that all elements to its left are less than or equal to it and all elements to its right are greater than or equal to it.
example input
3,1,4,5,9,7,6,11
desired output
4,5,11
I was asked this question in an interview. I have to return the index of the element, and only of the first element that meets the condition.
My logic
Take two multisets (so that we can handle duplicates as well): one for the elements on the right-hand side of the current element (the pole) and one for the elements on its left-hand side.
Start with the 0th element and put all the other elements into the "right set".
Base condition: if this 0th element is less than or equal to every element of the "right set", return its index.
Otherwise, put it into the "left set" and move on to the element at index 1.
Traverse the array; at each step pick the maximum value from the "left set" and the minimum value from the "right set" and compare.
At any instant, for the current element, all values to its left are in the "left set" and all values to its right are in the "right set".
Code
int magnitudePole (const vector<int> &A) {
    multiset<int> left, right;
    int left_max, right_min;
    int size = A.size();
    for (int i = 1; i < size; ++i)
        right.insert(A[i]);
    right_min = *(right.begin());
    if (A[0] <= right_min)
        return 0;
    left.insert(A[0]);
    for (int i = 1; i < size; ++i) {
        right.erase(right.find(A[i]));
        left_max = *(--left.end());
        if (right.size() > 0)
            right_min = *(right.begin());
        if (A[i] > left_max && A[i] <= right_min)
            return i;
        else
            left.insert(A[i]);
    }
    return -1;
}
My questions
I was told that my logic is incorrect. I am not able to understand why (I have checked it on several cases and it returns the right index).
Out of curiosity, how can this be done without any set/multiset, in O(n) time?
For an O(n) algorithm (a Java sketch follows below):
Compute the largest element from n[0] to n[k] for every k in [0, length(n)) and save the answers in an array maxOnTheLeft. This costs O(n).
Compute the smallest element from n[k] to n[length(n)-1] for every k in [0, length(n)) and save the answers in an array minOnTheRight. This costs O(n).
Loop through the whole array and find any n[k] with maxOnTheLeft <= n[k] <= minOnTheRight. This costs O(n).
And your code is (at least) wrong here:
if (A[i] > left_max && A[i] <= right_min) // <-- the > should be >=
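A minimal Java sketch of those three passes (my own, not from the answer above), using the corrected comparison and returning the index of the first magnitude pole, as the question asks:
static int firstMagnitudePole(int[] n) {
    int len = n.length;
    if (len == 0) return -1;
    int[] maxOnTheLeft = new int[len];  // max of n[0..k]
    int[] minOnTheRight = new int[len]; // min of n[k..len-1]
    maxOnTheLeft[0] = n[0];
    for (int k = 1; k < len; k++) {
        maxOnTheLeft[k] = Math.max(maxOnTheLeft[k - 1], n[k]);
    }
    minOnTheRight[len - 1] = n[len - 1];
    for (int k = len - 2; k >= 0; k--) {
        minOnTheRight[k] = Math.min(minOnTheRight[k + 1], n[k]);
    }
    for (int k = 0; k < len; k++) {
        // both arrays include n[k] itself, so these checks say: everything
        // left of k is <= n[k] and everything right of k is >= n[k]
        if (maxOnTheLeft[k] <= n[k] && n[k] <= minOnTheRight[k]) {
            return k;
        }
    }
    return -1;
}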
Create two bool[N] arrays called NorthPole and SouthPole (just to be humorous).
Step forward through A[], tracking the maximum element found so far, and set SouthPole[i] true if A[i] > Max(A[0..i-1]).
Step backward through A[] and set NorthPole[i] true if A[i] < Min(A[i+1..N-1]).
Step forward through NorthPole and SouthPole to find the first element with both set true.
Each step above is O(N), since it visits each element once, so it is O(N) overall.
Java implementation:
Collection<Integer> magnitudes(int[] A) {
    int length = A.length;
    // the maximum number from the beginning of the array up to the current position
    int[] maxes = new int[A.length];
    // the minimum number from the current position to the end of the array
    int[] mins = new int[A.length];
    // build mins
    int min = mins[length - 1] = A[length - 1];
    for (int i = length - 2; i >= 0; i--) {
        if (A[i] < min) {
            min = A[i];
        }
        mins[i] = min;
    }
    // build maxes
    int max = maxes[0] = A[0];
    for (int i = 1; i < length; i++) {
        if (A[i] > max) {
            max = A[i];
        }
        maxes[i] = max;
    }
    Collection<Integer> result = new ArrayList<>();
    // use them to find the magnitudes, if any exist
    for (int i = 0; i < length; i++) {
        if (A[i] >= maxes[i] && A[i] <= mins[i]) {
            // return here if only the first one is needed
            result.add(A[i]);
        }
    }
    return result;
}
Your logic seems perfectly correct (I didn't check the implementation, though) and can be implemented as an O(n)-time algorithm. Nice job thinking in terms of sets.
Your right set can be implemented as a stack that supports min, and the left set as a stack that supports max; this gives an O(n)-time algorithm.
A stack that supports max/min is a well-known interview question, and it can be done so that each operation (push/pop/min/max) is O(1); see the sketch after the pseudo-code below.
To use this with your logic, the pseudo-code will look something like this:
foreach elem in a[n-1 .. 0]
    right_set.push(elem)

while (right_set.has_elements()) {
    candidate = right_set.pop();
    if (left_set.has_elements() && left_set.max() <= candidate && candidate <= right_set.min()) {
        break;
    } else if (!left_set.has_elements() && candidate <= right_set.min()) {
        break;
    }
    left_set.push(candidate);
}
return candidate
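Here is one way to write the min-supporting stack in Java (a sketch; the class and method names are my own): each entry remembers the minimum of everything at or below it, so push, pop, and min are all O(1). A max-supporting stack is symmetric.
import java.util.ArrayDeque;
import java.util.Deque;

class MinStack {
    // each entry is {value, min of the stack up to and including it}
    private final Deque<int[]> stack = new ArrayDeque<>();

    void push(int x) {
        int min = stack.isEmpty() ? x : Math.min(x, stack.peek()[1]);
        stack.push(new int[] { x, min });
    }

    int pop() {
        return stack.pop()[0];
    }

    int min() {
        return stack.peek()[1];
    }

    boolean hasElements() {
        return !stack.isEmpty();
    }
}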
I saw this problem on Codility, solved it with Perl:
sub solution {
    my (@A) = @_;
    my ($max, $min) = ($A[0], $A[-1]);
    my %candidates;
    for my $i (0 .. $#A) {
        if ($A[$i] >= $max) {
            $max = $A[$i];
            $candidates{$i}++;
        }
    }
    for my $i (reverse 0 .. $#A) {
        if ($A[$i] <= $min) {
            $min = $A[$i];
            return $i if $candidates{$i};
        }
    }
    return -1;
}
How about the following code? I think its worst-case efficiency is not good, but its expected efficiency should be good.
int getFirstPole(int* a, int n)
{
    int leftPole = a[0];
    for (int i = 1; i < n; i++)
    {
        if (a[i] >= leftPole)
        {
            int j = i;
            for (; j < n; j++)
            {
                if (a[j] < a[i])
                {
                    i = j + 1; // jump the elements between i and j
                    break;
                }
                else if (a[j] > a[i])
                    leftPole = a[j];
            }
            if (j == n) // if nothing is less than a[i], then return i
                return i;
        }
    }
    return 0;
}
Create an array of ints called mags and an int variable called maxMag.
For each element of the source array, check whether the element is greater than or equal to maxMag.
If it is: add the element to the mags array and set maxMag = element.
If it isn't: loop through the mags array and remove every candidate the current element is less than.
Result: an array of magnitude poles (see the sketch below).
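A short Java sketch of this idea (the names mags and maxMag come from the description above; the rest is my own):
import java.util.ArrayList;
import java.util.List;

static List<Integer> magnitudePoles(int[] a) {
    List<Integer> mags = new ArrayList<>(); // candidate pole values
    int maxMag = Integer.MIN_VALUE;
    for (int element : a) {
        if (element >= maxMag) {
            mags.add(element);
            maxMag = element;
        } else {
            // a smaller element to the right invalidates larger candidates
            mags.removeIf(candidate -> candidate > element);
        }
    }
    return mags;
}
For the example input 3,1,4,5,9,7,6,11 this yields 4, 5, 11.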
Interesting question. I have my own solution in C#, given below; read the comments to understand my approach.
public int MagnitudePoleFinder(int[] A)
{
    // create a variable to store the maximum value seen so far, i.e. maxOfUp
    int maxOfUp = A[0];
    // if the list has only one value, return this value
    if (A.Length <= 1) return A[0];
    // create a collection for all candidates for magnitude pole found during the iteration
    var magnitudeCandidates = new List<KeyValuePair<int, int>>();
    // add the first element as the first candidate
    var a = A[0];
    magnitudeCandidates.Add(new KeyValuePair<int, int>(0, a));
    // let's iterate
    for (int i = 1; i < A.Length; i++)
    {
        a = A[i];
        // if this item is greater than or equal to all the items above
        // (maxOfUp holds the max value of all the items above)
        if (a >= maxOfUp)
        {
            // add it to the candidate list
            magnitudeCandidates.Add(new KeyValuePair<int, int>(i, a));
            maxOfUp = a;
        }
        else
        {
            // remove all the candidates having values greater than this item
            magnitudeCandidates = magnitudeCandidates.Except(magnitudeCandidates.Where(c => c.Value > a)).ToList();
        }
    }
    // if there is no candidate, return -1
    if (magnitudeCandidates.Count == 0) return -1;
    else
        // return the index of the first candidate
        return magnitudeCandidates.First().Key;
}

InsertionSort vs. InsertionSort vs. BinaryInsertionSort

I have a couple of questions concerning different implementations of insertion sort.
Implementation 1:
public static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; ++i) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = key;
    }
}
Implementation 2:
public static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; ++i) {
        for (int j = i; j > 0 && a[j - 1] > a[j]; --j) {
            swap(a, j, j - 1);
        }
    }
}

private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}
Here's my first question: one would think that the first version should be a little faster than the second (because of fewer assignments), but it isn't, or at least the difference is negligible. Why?
Second, I noticed that Java's Arrays.sort() method also uses the second approach (maybe for code reuse, because the swap method is used in different places, or maybe because it's easier to understand).
Implementation 3 (binaryInsertionSort):
public static void binaryInsertionSort(int[] a) {
    for (int i = 1; i < a.length; ++i) {
        int pos = Arrays.binarySearch(a, 0, i, a[i]);
        int insertionPoint = (pos >= 0) ? pos : -pos - 1;
        if (insertionPoint < i) {
            int key = a[i];
            // for (int j = i; j > insertionPoint; --j) {
            //     a[j] = a[j - 1];
            // }
            System.arraycopy(a, insertionPoint, a, insertionPoint + 1, i - insertionPoint);
            a[insertionPoint] = key;
        }
    }
}
Is binary insertion sort of any practical use, or is it more of a theoretical thing? On small arrays the other approaches are much faster, and on bigger arrays mergesort/quicksort perform much better.
The number of comparisons in the first two is n(n-1)/2 in the worst case, excluding the outer-loop checks.
None of these programs makes much sense for real work as it stands, because they don't use all the information at their disposal. For instance, it is easy to add a check to the inner loop to detect whether any swaps were made: if not, the array is already in order and you can finish early, perhaps saving most of the work. In practice, such considerations can dominate the average case.
Postscript
I missed the question about Java: as I understand it, Java's sort is a fairly complex algorithm with a lot of special cases, such as specialised sorting for small arrays, and it uses quicksort to do its heavy lifting.
