InsertionSort / Improvements - sorting

I have kind of a theoretical question... I am trying to understand different implementations of InsertionSort, and I came across one specific implementation where the author (before starting the actual sorting) first brought the smallest item to the very first position of the array, with a pass like this:

for (i = r; i > l; i--) compExch(a, i-1, i);

Does anyone out there know why this could be an improvement to InsertionSort?
Greetings - Christian
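For context, a common reason for this trick is the sentinel optimization: once the smallest element sits at a[0], the inner loop of insertion sort no longer needs a bounds check. A minimal Java sketch of the idea (my own illustration, not the code from the implementation in question):

static void insertionSortWithSentinel(int[] a) {
    // bring the minimum to a[0]; this mirrors the compExch pass above
    for (int i = a.length - 1; i > 0; i--) {
        if (a[i] < a[i - 1]) { int t = a[i]; a[i] = a[i - 1]; a[i - 1] = t; }
    }
    // standard insertion sort, but the inner loop needs only one comparison
    // per step: a[0] is a sentinel, so "j > 0" never has to be tested
    for (int i = 2; i < a.length; i++) {
        int v = a[i];
        int j = i;
        while (v < a[j - 1]) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

The first pass also guarantees a[0] <= a[1], which is why the main loop can start at index 2.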


Finding all prime numbers from 1 to N using GCD (An alternate approach to sieve-of-eratosthenes)

To find all prime numbers from 1 to N.
I know we usually approach this problem using the Sieve of Eratosthenes, but I had an alternate approach in mind using gcd that I wanted your views on.
My approach:
Maintain a variable that is the product of all prime numbers processed so far. If gcd(this variable, i) == 1, the two numbers are co-prime, so i must be prime.
For example: gcd(210, 11) == 1, so 11 is prime.
{210 = 2*3*5*7}
Pseudocode:
Init num_list = {numbers 2 to N}   [since 0 and 1 aren't prime]
curr_gcd = 2, gcd_val = 1
for i = 3; i <= N; i++
    gcd_val = __gcd(curr_gcd, i)
    if gcd_val == 1        // prime
        curr_gcd = curr_gcd * i
    else                   // composite, so remove from list
        num_list.remove(i)
Alternatively, we can also keep a separate list and push the prime numbers into it.
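In Java, this pseudocode might look roughly like the following sketch (my illustration, not the poster's code; BigInteger is used because the running product of primes overflows primitive types almost immediately):

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class GcdPrimes {
    // gcd-based idea: keep the product of primes found so far;
    // i is prime exactly when gcd(product, i) == 1
    static List<Integer> primesUpTo(int n) {
        List<Integer> primes = new ArrayList<>();
        if (n < 2) return primes;
        primes.add(2);
        BigInteger product = BigInteger.valueOf(2);   // product of primes so far
        for (int i = 3; i <= n; i++) {
            BigInteger bi = BigInteger.valueOf(i);
            if (product.gcd(bi).equals(BigInteger.ONE)) {   // co-prime, so prime
                primes.add(i);
                product = product.multiply(bi);
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(30));   // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}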
SC = O(N)
TC = O(N log N)   [the TC to calculate one gcd using Euclid's method is O(log(max(a, b)))]
Does this seem right, or am I calculating the TC incorrectly here? Please post your views on this.
TIA!
Looks like the time complexity of my approach is closer to O(log^2(n)) as pointed out by many in the comments.
Also, the curr_gcd var would become quite large as N is increased and would definitely overflow int and long size limits.
Thanks to everyone who responded!
Your method may be theoretically correct, but it is clearly not efficient.
Its efficiency is worse than the Sieve of Eratosthenes, and the numbers it has to handle (the running product) grow far too large. It may look elegant, but it is hard to use in practice.
In my view, "find all prime numbers from 1 to N" is already a well-known problem, which means its solutions have been studied thoroughly.
At first, we might write the straightforward sieve like this:
int primes[N], cnt;   // store all prime numbers
bool st[N];           // st[i]: whether i has been crossed out (composite)
void get_primes(int n) {
    for (int i = 2; i <= n; i++) {
        if (st[i]) continue;
        primes[cnt++] = i;
        for (int j = i + i; j <= n; j += i) {
            st[j] = true;
        }
    }
}
This is the classic Sieve of Eratosthenes, which runs in O(n log log n) time.
But we have an even better algorithm called the "linear sieve", which uses only O(n) time, just as its name suggests. I implemented it in C like this:
int primes[N], cnt;
bool st[N];
// linear sieve: each composite is crossed out exactly once,
// by its smallest prime factor, so the total work is O(n)
void get_primes(int n) {
    for (int i = 2; i <= n; i++) {
        if (!st[i]) primes[cnt++] = i;
        for (int j = 0; primes[j] * i <= n; j++) {
            st[primes[j] * i] = true;
            if (i % primes[j] == 0) break;   // reached i's smallest prime factor
        }
    }
}
This O(n) algorithm is the one I use to solve this kind of problem, which appears in interviews at major IT companies and on many kinds of online judges.

Fast Algorithm to Solve Unique Paths With Backtracking

A robot located at the top-left corner of an XxX grid is trying to reach the bottom-right corner. The robot can move up, down, left, or right, but cannot visit the same spot twice. How many possible unique paths are there to the bottom-right corner?
What is a fast algorithmic solution to this? I've spent a huge amount of time trying to figure one out, but I'm still stuck.
This is basically the unique paths Leetcode problem, except with backtracking.
Unique paths, without backtracking, can be solved with dynamic programming such as:
class Solution {
public:
    int uniquePaths(int m, int n) {
        vector<int> cur(n, 1);
        for (int i = 1; i < m; i++) {
            for (int j = 1; j < n; j++) {
                cur[j] += cur[j - 1];
            }
        }
        return cur[n - 1];
    }
};
What would be a fast algorithmic solution, using dynamic programming, to unique paths, except with backtracking? Something that could quickly find the result 1,568,758,030,464,750,013,214,100 for a 10X10 grid.
Reddit, Wikipedia, and Youtube have resources illustrating the complexity of this problem. But they don't have any answers.
The problem cannot be solved with dynamic programming, because the recurrence does not break the problem into independent sub-problems. Dynamic programming assumes that the state being computed depends only on the sub-states of the recurrence. That is not true here, because the walk is not monotone (it can go up and then back down), so the count for a cell depends on which cells have already been visited.
The general case of this problem, counting the number of simple paths in a graph that may contain cycles, is #P-complete.
This can also be seen as enumerating self-avoiding walks in two dimensions. As per Wikipedia:
Finding the number of such paths is conjectured to be an NP-hard problem[citation needed].
However, if we consider moves in only the positive direction, i.e. right and down, it has a closed-form solution: C(m+n, m). The total number of moves is then always exactly m + n, where m and n are the Cartesian distances to the end point of the diagonal, and we simply have to choose which m of the moves are rights (or which n are downs). The dynamic programming solution above computes essentially the same thing.
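For the right/down-only case, the closed form can be evaluated directly. A small Java sketch of my own (here m and n are move counts, as in the paragraph above, not grid dimensions):

import java.math.BigInteger;

public class LatticePaths {
    // number of monotone (right/down only) lattice paths consisting of
    // m right-moves and n down-moves: C(m + n, m)
    static BigInteger paths(int m, int n) {
        BigInteger result = BigInteger.ONE;
        for (int k = 1; k <= m; k++) {
            // after this step, result == C(n + k, k), which is always an integer
            result = result.multiply(BigInteger.valueOf(n + k))
                           .divide(BigInteger.valueOf(k));
        }
        return result;
    }

    public static void main(String[] args) {
        // a 10x10 grid of cells needs 9 rights and 9 downs: C(18, 9) = 48620
        System.out.println(paths(9, 9));
    }
}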

How to calculate complexity of non-standard day to day algorithms

Hello StackOverflow community!
I have had this question in my mind for many days and have finally decided to get it sorted out. So, given an algorithm, or say a function which implements some non-standard algorithm in your daily coding activity, how do you go about analyzing its run time complexity?
Ok let me be more specific. Suppose you are solving this problem,
Given a NxN matrix consisting of positive integers, find the longest increasing sequence in it. You may only traverse in up, down, left or right directions but not diagonally.
Eg: If the matrix is
[ [9,9,4],
[6,6,8],
[2,1,1] ].
the algorithm must return 4
(The sequence being 1->2->6->9)
So yeah, it looks like I have to use DFS. I get this part. I did my Algorithms course back at uni and can work my way around such questions. So, say I come up with this solution:
class Solution
{
    public int longestIncreasingPathStarting(int[][] matrix, int i, int j)
    {
        int localMax = 1;
        int[][] offsets = {{0,1}, {0,-1}, {1,0}, {-1,0}};
        for (int[] offset : offsets)
        {
            int x = i + offset[0];
            int y = j + offset[1];
            if (x < 0 || x >= matrix.length || y < 0 || y >= matrix[i].length || matrix[x][y] <= matrix[i][j])
                continue;
            localMax = Math.max(localMax, 1 + longestIncreasingPathStarting(matrix, x, y));
        }
        return localMax;
    }

    public int longestIncreasingPath(int[][] matrix)
    {
        if (matrix.length == 0)
            return 0;
        int maxLen = 0;
        for (int i = 0; i < matrix.length; ++i)
        {
            for (int j = 0; j < matrix[i].length; ++j)
            {
                maxLen = Math.max(maxLen, longestIncreasingPathStarting(matrix, i, j));
            }
        }
        return maxLen;
    }
}
Inefficient, I know, but I wrote it this way on purpose! Anyway, my question is: how do you go about analyzing the run time of the longestIncreasingPath(matrix) function?
I can understand the analysis they teach us in an Algorithms course, you know, the standard MergeSort and QuickSort analysis etc., but unfortunately, and I hate to say this, that did not prepare me to apply it in my day-to-day coding job. I want to do that now, and hence would like to start by analyzing such functions.
Can someone help me out here and describe the steps one would take to analyze the runtime of the above function? That would greatly help me. Thanks in advance, cheers!
For day-to-day work, eyeballing things usually works well.
In this case you try to go in every direction recursively. So a really bad example comes to mind, like [[1,2,3], [2,3,4], [3,4,5]], where you have two options from most cells. I happen to know that this takes O((2n)!/(n!*n!)) steps, but another good guess would be O(2^N). Now that you have an example where you know, or can more easily compute, the complexity, the overall complexity has to be at least that.
Usually it doesn't really matter which one it is exactly, since for both O(N!) and O(2^N) the run time grows so fast that the algorithm is only practical up to around N = 10-20, maybe a bit more if you are willing to wait. You would not run this algorithm for N ~= 1000; you would need something polynomial. So a rough estimate that you have an exponential solution is enough to make a decision.
So, in general, to get an idea of the complexity, try to relate your solution to other algorithms whose complexity you already know, or figure out a worst-case scenario for the algorithm where the complexity is easier to judge. Even if you are slightly off, it might still help you make a decision.
If you need to compare algorithms of more similar complexity (e.g. O(N log N) vs O(N^2) for N ~= 100), you should implement both and benchmark, since the constant factor might be the leading contributor to the run time.
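As an aside (my own sketch, not part of the original answer): for this particular problem the exponential recursion can be brought down to polynomial time by memoizing longestIncreasingPathStarting, because the best path starting at a cell does not depend on how you got there (assumes a rectangular matrix):

// Sketch: memoized variant of the code in the question; memo[i][j] caches the
// longest increasing path starting at (i, j), so each cell is solved only once
// and the total work is O(rows * cols) instead of exponential.
public int longestIncreasingPathMemo(int[][] matrix)
{
    if (matrix.length == 0)
        return 0;
    int[][] memo = new int[matrix.length][matrix[0].length];
    int maxLen = 0;
    for (int i = 0; i < matrix.length; ++i)
        for (int j = 0; j < matrix[i].length; ++j)
            maxLen = Math.max(maxLen, startingAt(matrix, i, j, memo));
    return maxLen;
}

private int startingAt(int[][] matrix, int i, int j, int[][] memo)
{
    if (memo[i][j] != 0)
        return memo[i][j];   // already computed for this cell
    int localMax = 1;
    int[][] offsets = {{0,1}, {0,-1}, {1,0}, {-1,0}};
    for (int[] offset : offsets)
    {
        int x = i + offset[0];
        int y = j + offset[1];
        if (x < 0 || x >= matrix.length || y < 0 || y >= matrix[i].length || matrix[x][y] <= matrix[i][j])
            continue;
        localMax = Math.max(localMax, 1 + startingAt(matrix, x, y, memo));
    }
    return memo[i][j] = localMax;
}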

Regarding sorted data in Fast 3 Way Partition / in place quicksort via Sedgewick

I am interested in the 3-way partition in quicksort at http://algs4.cs.princeton.edu/23quicksort/Quick3way.java.html
because it uses that partition to overcome the Dutch National Flag problem (equal keys) in an in-place quicksort.
Since the author is Sedgewick, I would assume there is no error in that code, yet the pivot selection is prone to worst-case n^2 time complexity on sorted data.
According to wikipedia:
In the very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by Sedgewick).[17]
The code for the quick sort:
// quicksort the subarray a[lo .. hi] using 3-way partitioning
private static void sort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;
    int lt = lo, gt = hi;
    Comparable v = a[lo];
    int i = lo;
    while (i <= gt) {
        int cmp = a[i].compareTo(v);
        if      (cmp < 0) exch(a, lt++, i++);
        else if (cmp > 0) exch(a, i, gt--);
        else              i++;
    }

    // a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi].
    sort(a, lo, lt-1);
    sort(a, gt+1, hi);
    assert isSorted(a, lo, hi);
}
Am I correct to use the mid or ninther for the pivot or have I missed something? I realize it is instructional but why not at least use the mid?
EDIT
Is shuffling considered a rigorous way to prevent the worst case, compared to simply choosing a better pivot? Why not just change the pivot? Shuffling a large array with significant randomness would take some overhead, would it not? Since a shuffle algorithm takes extra time, why not choose the pivot instead? Shuffling data that is all equivalent is a complete waste, for instance. Would it not be better to run isSorted on the array as a heuristic, with a modification for equivalent data?
I'm not one to argue with Hoare, but would it not be better to check isSorted, with a modification for equivalent data, and short-circuit rather than run the data through the sort unnecessarily? It would take about the same time as a shuffle.
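For reference, the pivot choice the question proposes (median of first, middle, and last) is a small change. A sketch of mine, reusing the exch helper from the code above; it is not part of the algs4 source:

// choose the median of a[lo], a[mid], a[hi] and move it to a[lo],
// where the partitioning code reads its pivot
private static void medianOfThreeToFront(Comparable[] a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    // order the three sampled elements so that a[mid] holds their median
    if (a[mid].compareTo(a[lo]) < 0) exch(a, lo, mid);
    if (a[hi].compareTo(a[lo]) < 0) exch(a, lo, hi);
    if (a[hi].compareTo(a[mid]) < 0) exch(a, mid, hi);
    exch(a, lo, mid);
}

Calling this at the top of the private sort method, before v = a[lo] is read, avoids the sorted-input worst case without a shuffle, at the cost of three extra comparisons per partition.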
The sort method you quoted is a private helper method. The real public method sort is like this:
public static void sort(Comparable[] a) {
    StdRandom.shuffle(a);
    sort(a, 0, a.length - 1);
    assert isSorted(a);
}
By calling StdRandom.shuffle, the array is randomly shuffled before doing quicksort. This is the way to protect against the worst case.
It's not only used for this 3-way partition quicksort, it's also used in the normal quicksort.
Quoting from the Algorithms book by Sedgewick, §2.3 QUICKSORT
Q. Randomly shuffling the array seems to take a significant fraction of the total time for the sort. Is doing so really worthwhile?
A. Yes. It protects against the worst case and makes the running time predictable. Hoare proposed this approach when he presented the algorithm in 1960—it is a prototypical (and among the first) randomized algorithm.
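On the overhead concern in the question's edit: a uniform shuffle is a single linear pass. A sketch of the standard Fisher-Yates (Knuth) shuffle, which is, as far as I know, what StdRandom.shuffle does:

// one O(n) pass; afterwards every permutation of a is equally likely
static void shuffle(Object[] a) {
    java.util.Random rnd = new java.util.Random();
    for (int i = a.length - 1; i > 0; i--) {
        int j = rnd.nextInt(i + 1);   // uniform index in [0, i]
        Object tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

Its O(n) cost is asymptotically dominated by the sort's expected O(n log n) cost.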

How is this solution an example of dynamic programming?

A lecturer gave this question in class:
[question]
A sequence of n integers is stored in an array A[1..n]. An integer a in A is called the majority if it appears more than n/2 times in A.
An O(n) algorithm can be devised to find the majority based on the following observation: if two different elements in the original sequence are removed, then the majority in the original sequence remains the majority in the new sequence. Using this observation, or otherwise, write programming code to find the majority, if one exists, in O(n) time.
for which this solution was accepted
[solution]
int findCandidate(int[] a)
{
    int maj_index = 0;
    int count = 1;
    for (int i = 1; i < a.length; i++)
    {
        if (a[maj_index] == a[i])
            count++;
        else
            count--;
        if (count == 0)   // first i elements have no majority; restart with a[i]
        {
            maj_index = i;
            count++;
        }
    }
    return a[maj_index];
}

int findMajority(int[] a)
{
    int c = findCandidate(a);
    int count = 0;
    for (int i = 0; i < a.length; i++)
        if (a[i] == c) count++;
    if (count > a.length / 2) return c;   // a majority must appear more than half the time
    return -1; // just a marker - no majority found
}
I can't see how the solution provided is a dynamic programming solution, and I can't see how, based on the wording, he pulled that code out.
The term "dynamic programming" was originally coined to describe a really awesome way of optimizing certain kinds of solutions ("dynamic" was chosen mostly because it sounded punchier). In other words, when you see "dynamic programming", you can translate it into "awesome optimization".
"Dynamic programming" has nothing to do with dynamic allocation of memory or anything like that; it's just an old term. In fact, it has little to do with the modern meaning of "programming" either.
It is a method for solving a specific class of problems: those where an optimal solution to a subproblem is guaranteed to be part of the optimal solution to the bigger problem. For instance, if you want to pay $567 with the smallest number of bills, the solution contains an optimal solution for one of the amounts $1..$566 plus one more bill.
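To make the bills example concrete, here is a small sketch of mine of the corresponding dynamic program (the denominations are hypothetical):

// Sketch: minimum number of bills needed to pay `amount` exactly.
// dp[x] = fewest bills summing to x; dp[x] reuses the optimal sub-solutions
// dp[x - bill], which is exactly the property described above.
static int minBills(int amount, int[] denominations) {
    int[] dp = new int[amount + 1];
    java.util.Arrays.fill(dp, Integer.MAX_VALUE);
    dp[0] = 0;                                   // zero bills pay $0
    for (int x = 1; x <= amount; x++) {
        for (int bill : denominations) {
            if (bill <= x && dp[x - bill] != Integer.MAX_VALUE) {
                dp[x] = Math.min(dp[x], dp[x - bill] + 1);
            }
        }
    }
    return dp[amount];                           // MAX_VALUE means "not payable"
}
// e.g. minBills(567, new int[]{1, 2, 5, 10, 20, 50, 100}) == 9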
The code in the question is just an application of the algorithm described in the problem statement.
This is dynamic programming because the findCandidate function breaks the provided array down into smaller, more manageable parts. In this case, it starts with the first element of the array as a candidate for the majority. By increasing the count when the candidate is encountered and decreasing it when it is not, the function determines whether the candidate can still be the majority. When the count reaches zero, we know that the first i elements do not have a majority, so we restart with a new candidate. By continually tracking this local majority we don't need to iterate through the array more than once in the candidate-identification phase. We then check whether that candidate is actually the majority by going through the array a second time, giving us O(n). It actually runs in 2n time, since we iterate twice, but the constant doesn't matter.
