Partial Insertion Sort - algorithm

Is it possible to sort only the first k elements from an array using insertion sort principles?
As the algorithm runs over the array, it sorts as it goes.
Since it needs to check every element (to find out which is the smallest), it will eventually sort the whole thing.
Example:
Original array: {5, 3, 8, 1, 6, 2, 8, 3, 10}
Expected output for k = 3: {1, 2, 3, 5, 8, 6, 8, 3, 10} (Only the first k elements were sorted, the rest of the elements are not)

Such partial sorting is possible, although the resulting method looks like a hybrid of selection sort (searching for the smallest element in the tail of the array) and insertion sort (shifting elements, but without comparisons). The sort preserves the order of the tail elements (though this was not asked for explicitly).
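void ksort(int a[], int n, int k)
{
    int i, j, t;
    for (i = 0; i < k; i++)
    {
        int min = i;
        /* find the smallest element in a[i..n-1] */
        for (j = i+1; j < n; j++)
            if (a[j] < a[min]) min = j;
        /* shift a[i..min-1] one slot right and put the minimum at a[i] */
        t = a[min];
        for (j = min; j > i; j--)
            a[j] = a[j-1];
        a[i] = t;
    }
}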

Yes, it is possible. This will run in time O(k n) where n is the size of your array.
You are better off using heapsort. It will run in time O(n + k log(n)) instead. The heapify step is O(n), then each element extracted is O(log(n)).
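As a rough illustration of the O(n + k log(n)) idea, here is a minimal Python sketch built on the standard heapq module. The function name k_smallest_first is made up for this example, and unlike the in-place scheme described next it returns a new list rather than rearranging the input.

```python
import heapq

def k_smallest_first(values, k):
    """Return a new list whose first k entries are the k smallest values, sorted."""
    heap = list(values)
    heapq.heapify(heap)                              # O(n)
    k = min(k, len(heap))
    front = [heapq.heappop(heap) for _ in range(k)]  # k pops, O(log n) each
    return front + heap                              # the rest stays in heap order

print(k_smallest_first([5, 3, 8, 1, 6, 2, 8, 3, 10], 3)[:3])
```

Only the first k positions are guaranteed to be sorted; the remaining elements come back in whatever order the heap leaves them.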
A technical note: if you're clever, you'll build the heap backwards from the end of your array. That is, when you think of it as a tree, put the (n-2i)-th and (n-2i-1)-th elements below the (n-i)-th one. So take your array:
{5, 3, 8, 1, 6, 2, 8, 3, 10}
That is a tree like so:
10
    3
        2
            3
            5
        6
    8
        1
        8
When we heapify we get the tree:
1
    2
        3
            3
            5
        6
    8
        10
        8
Which is to say the array:
{5, 3, 8, 10, 6, 3, 8, 2, 1}
And now each element extraction requires swapping the root (the last array element) with the next output position at the front, then letting the displaced element "fall down the tree". Like this:
# swap
{1*, 3, 8, 10, 6, 3, 8, 2, 5*}
# the 5 compares with 8, 2 and swaps with the 2:
{1, 3, 8, 10, 6, 3, 8?, 5*, 2*}
# the 5 compares with 3, 6 and swaps with the 3:
{1, 3, 8, 10, 6?, 5*, 8, 3*, 2}
# The 5 compares with the 3 and swaps, note that 1 is now outside of the tree:
{1, 5*, 8, 10, 6, 3*, 8, 3, 2}
Which, in array-tree representation, is:
{1}
2
    3
        3
            5
        6
    8
        10
        8
Repeat again and we get:
# Swap
{1, 2, 8, 10, 6, 3, 8, 3, 5}
# Fall
{1, 2, 8, 10, 6, 5, 8, 3, 3}
aka:
{1, 2}
3
    3
        5
        6
    8
        10
        8
And again:
# swap
{1, 2, 3, 10, 6, 5, 8, 3, 8}
# fall
{1, 2, 3, 10, 6, 8, 8, 5, 3}
or
{1, 2, 3}
3
    5
        8
        6
    8
        10
And so on.
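The reversed layout walked through above can be sketched in Python as follows. The names sift_down and partial_heap_sort are hypothetical, and the sketch assumes node i (counting from 1) lives at position n-i with its children at positions n-2i and n-2i-1, as described in the technical note.

```python
def sift_down(a, i, lo):
    """Let node i fall down the reversed min-heap stored in a[lo:]."""
    n = len(a)
    while True:
        smallest = i
        for child in (2 * i, 2 * i + 1):
            # a child exists only if its position is still inside the heap
            if n - child >= lo and a[n - child] < a[n - smallest]:
                smallest = child
        if smallest == i:
            return
        a[n - i], a[n - smallest] = a[n - smallest], a[n - i]
        i = smallest

def partial_heap_sort(a, k):
    """Sort the k smallest elements of a into a[:k], in place."""
    n = len(a)
    for i in range(n // 2, 0, -1):   # heapify, root at a[n-1]
        sift_down(a, i, 0)
    for lo in range(min(k, n)):
        a[lo], a[n - 1] = a[n - 1], a[lo]   # minimum goes to its final slot
        sift_down(a, 1, lo + 1)             # heap now occupies a[lo+1:]
    return a

print(partial_heap_sort([5, 3, 8, 1, 6, 2, 8, 3, 10], 3))
```

With k = 3 this reproduces the final array from the walkthrough, {1, 2, 3, 10, 6, 8, 8, 5, 3}.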

Just in case anyone needs this in the future, I came up with a solution that is "pure" in the sense of not being a hybrid between the original Insertion sort and some other sorting algorithm.
void partialInsertionSort(int A[], int n, int k)
{
    int i, j, aux, start;

    if (k <= 0)
        return;
    for (i = 1; i < n; i++) {
        aux = A[i];
        if (i > k-1) {
            /* Tail element: it enters the sorted prefix only if it is
               smaller than the pivot A[k-1], the largest prefix element. */
            if (aux >= A[k-1])
                continue;
            /* The evicted pivot takes the slot that aux vacates. */
            A[i] = A[k-1];
            start = k - 2;
        }
        else start = i - 1;
        for (j = start; j >= 0 && A[j] > aux; j--)
            A[j+1] = A[j];
        A[j+1] = aux;
    }
}
Basically, this algorithm sorts the first k elements. Then the k-th element (A[k-1]) acts like a pivot: only when a remaining array element is smaller than this pivot is it inserted at the correct position among the sorted k elements, just like in the original algorithm, while the old pivot moves to the slot that element came from. The relative order of the remaining elements is therefore not preserved.
Best case scenario: array is already ordered
Considering that comparison is the basic operation, then the number of comparisons is n-1 → Θ(n)
Worst case scenario: array is ordered in reverse
Considering that comparison is the basic operation, then the number of comparisons is (2kn - k² - k)/2 → Θ(kn)
(Both take into account the comparison made against the pivot)

Related

Longest Increasing subsequence length in NlogN.[Understanding the Algo]

Problem Statement: The aim is to find the longest increasing subsequence (not necessarily contiguous) in n log n time.
Algorithm: I understood the algorithm as explained here:
http://www.geeksforgeeks.org/longest-monotonically-increasing-subsequence-size-n-log-n/.
What I did not understand is what gets stored in tail in the following code.
int LongestIncreasingSubsequenceLength(std::vector<int> &v) {
    if (v.size() == 0)
        return 0;
    std::vector<int> tail(v.size(), 0);
    int length = 1; // always points empty slot in tail
    tail[0] = v[0];
    for (size_t i = 1; i < v.size(); i++) {
        if (v[i] < tail[0])
            // new smallest value
            tail[0] = v[i];
        else if (v[i] > tail[length-1])
            // v[i] extends largest subsequence
            tail[length++] = v[i];
        else
            // v[i] will become end candidate of an existing subsequence or
            // Throw away larger elements in all LIS, to make room for upcoming greater elements than v[i]
            // (and also, v[i] would have already appeared in one of LIS, identify the location and replace it)
            tail[CeilIndex(tail, -1, length-1, v[i])] = v[i];
    }
    return length;
}
For example, if the input is {2,5,3,7,11,8,10,13,6},
the code gives the correct length, 6.
But tail will be storing 2,3,6,8,10,13.
So I want to understand what is stored in tail. This will help me understand the correctness of this algorithm.
tail[i] is the minimal end value of the increasing subsequence (IS) of length i+1.
That's why tail[0] is the 'smallest value' and why we can increase the value of LIS (length++) when the current value is bigger than end value of the current longest sequence.
Let's assume that your example is the starting values of the input:
input = 2, 5, 3, 7, 11, 8, 10, 13, 6, ...
After 9 steps of our algorithm tail looks like this:
tail = 2, 3, 6, 8, 10, 13, ...
What does tail[2] mean? It means that the best IS of length 3 ends with tail[2], and we could build an IS of length 4 by extending it with a number that is bigger than tail[2]. In each row below, brackets mark one IS of that length ending in tail[i]:
tail[0] = 2,  IS length = 1: [2], 5, 3, 7, 11, 8, 10, 13, 6
tail[1] = 3,  IS length = 2: [2], 5, [3], 7, 11, 8, 10, 13, 6
tail[2] = 6,  IS length = 3: [2], 5, [3], 7, 11, 8, 10, 13, [6]
tail[3] = 8,  IS length = 4: [2], 5, [3], [7], 11, [8], 10, 13, 6
tail[4] = 10, IS length = 5: [2], 5, [3], [7], 11, [8], [10], 13, 6
tail[5] = 13, IS length = 6: [2], 5, [3], [7], 11, [8], [10], [13], 6
This representation allows you to use binary search (note that the defined part of tail is always sorted) to update tail, and to read off the result at the end of the algorithm.
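To make the role of tail concrete, here is a small Python sketch of the same idea using the standard bisect module in place of CeilIndex. The function name lis_tail is made up for this example, and it assumes a strictly increasing subsequence is wanted.

```python
from bisect import bisect_left

def lis_tail(values):
    """Return the tail array; len(tail) is the LIS length."""
    tail = []
    for x in values:
        pos = bisect_left(tail, x)   # first slot whose end value is >= x
        if pos == len(tail):
            tail.append(x)           # x extends the longest subsequence so far
        else:
            tail[pos] = x            # x is a smaller end value for length pos + 1
    return tail

print(lis_tail([2, 5, 3, 7, 11, 8, 10, 13, 6]))
```

For the input from the question this yields the tail 2, 3, 6, 8, 10, 13, whose length 6 is the answer.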
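tail stores the end values of candidate increasing subsequences, and its filled length equals the length of the Longest Increasing Subsequence (LIS); the values themselves need not form a valid subsequence of the input (in the example, 6 appears after 13 in the input).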
It will update itself following the explanation given in the link you provided and claimed to have understood. Check the example.
You want the minimum value at the first element of the tail, which explains the first if statement.
The second if statement is there to allow the LIS to grow, since we want to maximize its length.

Partitioning an array not coming out correctly

I'm trying to partition an array so that each element in the first half of the array is less than each element in the second half of the array. This is the same partition algorithm that is used in quick sort. For some reason I can get the array A = [2, 8, 7, 1, 3, 5, 6, 4] to work but A = [7, 3, 6, 1, 9, 5, 4, 8] will not work.
def partition(A):
    x = A[len(A)-1]
    i = -1
    for j in range (0, len(A)-2):
        if A[j]<=x:
            i = i + 1
            # exchange A[j] and A[i]
            jValue = A[j]
            A[j] = A[i]
            A[i] = jValue
    # exchange A[len(A)-1] and A[i+1]
    rValue = A[len(A)-1]
    A[len(A)-1] = A[i+1]
    A[i+1] = rValue
    print(A)
The issue is that, to split the array into two halves, the code needs to pick a pivot that is the median of the array. This could be done using quickselect or something similar. Note that quickselect has a worst-case time complexity of O(n^2). Wiki article:
http://en.wikipedia.org/wiki/Quickselect
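Separately from the choice of pivot, note that range(0, len(A)-2) in the question stops one element early, so A[len(A)-2] is never compared with the pivot. Below is a minimal sketch of the same last-element-pivot scheme with the loop covering every element before the pivot. It assumes a non-empty list, and returning the pivot index is an addition made for this example.

```python
def partition(A):
    """Partition A in place around its last element; return the pivot's final index."""
    pivot = A[-1]
    i = -1
    for j in range(len(A) - 1):      # every element except the pivot itself
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[-1] = A[-1], A[i + 1]
    return i + 1

A = [7, 3, 6, 1, 9, 5, 4, 8]
p = partition(A)
print(A, p)
```

Everything left of index p is then no larger than the pivot and everything right of it is no smaller, but the two sides are equal halves only when the pivot happens to be the median.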

Algorithm to generate Diagonal Latin Square matrix

I need, for a given N, to create an N*N matrix which has no repetitions in its rows, columns, or minor and major diagonals, and whose values are 1, 2, 3, ..., N.
For N = 4 one of matrices is the following:
1 2 3 4
3 4 1 2
4 3 2 1
2 1 4 3
Problem overview
The mathematical structure you described is a Diagonal Latin Square. Constructing one is more of a mathematical problem than an algorithmic or programming one.
To understand correctly what it is and how to create one, you should read the following articles:
Latin squares definition
Magic squares definition
Diagonal Latin square construction <-- p.2 is answer to your question with proof and with other interesting properties
Short answer
One possible way to construct a Diagonal Latin Square:
Let N be the order of the required matrix L.
If there exist numbers A and B in the range [0; N-1] which satisfy the properties:
A is relatively prime to N
B is relatively prime to N
(A + B) is relatively prime to N
(A - B) is relatively prime to N
then you can create the required matrix with the following rule:
L[i][j] = (A * i + B * j) mod N
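As a rough sketch of this rule, the Python function below searches for a suitable pair (A, B) and fills in the matrix. The function name and the None return value are choices made for this example; 1 is added to every entry so the values run from 1 to N as the question asks.

```python
from math import gcd

def diagonal_latin_square(n):
    """Return an n x n diagonal Latin square with values 1..n, or None
    if no pair (A, B) satisfies the four coprimality conditions."""
    for a in range(n):
        for b in range(n):
            if all(gcd(x, n) == 1 for x in (a, b, a + b, abs(a - b))):
                return [[(a * i + b * j) % n + 1 for j in range(n)]
                        for i in range(n)]
    return None

for row in diagonal_latin_square(5):
    print(row)
```

For N = 4 the search finds no pair and returns None, even though the question shows a valid 4x4 square, so the rule is a sufficient condition rather than a complete method.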
It would be nice to do this mathematically, but I'll propose the simplest algorithm that I can think of - brute force.
At a high level
we can represent a matrix as an array of arrays
for a given N, construct S, a set of arrays containing every permutation of [1..N]. There will be N! of these.
using a recursive and iterative selection process (e.g. a search tree), search through all orderings of these arrays, abandoning a branch as soon as one of the 'uniqueness' rules is broken
For example, in your N = 4 problem, I'd construct
S = [
    [1,2,3,4], [1,2,4,3],
    [1,3,2,4], [1,3,4,2],
    [1,4,2,3], [1,4,3,2],
    [2,1,3,4], [2,1,4,3],
    [2,3,1,4], [2,3,4,1],
    [2,4,1,3], [2,4,3,1],
    [3,1,2,4], [3,1,4,2],
    // etc
]
R = new int[4][4]
Then the algorithm is something like
1. If R is 'full', you're done.
2. Evaluate whether the next row from S fits into R:
   if yes, insert it into R, reset the iterator on S, and go to 1.
   if no, increment the iterator on S.
3. If there are more rows to check in S, go to 2.
4. Else you've iterated across S and none of the rows fit, so remove the most recent row added to R and go to 1. In other words, explore another branch.
To improve the efficiency of this algorithm, implement a better data structure. Rather than a flat array of all combinations, use a prefix tree / Trie of some sort to both reduce the storage size of the 'options' and reduce the search area within each iteration.
Here's a method which is fast for N <= 9 : (python)
import random

def generate(n):
    a = [[0] * n for _ in range(n)]

    def rec(i, j):
        if i == n - 1 and j == n:
            return True
        if j == n:
            return rec(i + 1, 0)
        candidate = set(range(1, n + 1))
        for k in range(i):
            candidate.discard(a[k][j])
        for k in range(j):
            candidate.discard(a[i][k])
        if i == j:
            for k in range(i):
                candidate.discard(a[k][k])
        if i + j == n - 1:
            for k in range(i):
                candidate.discard(a[k][n - 1 - k])
        candidate_list = list(candidate)
        random.shuffle(candidate_list)
        for e in candidate_list:
            a[i][j] = e
            if rec(i, j + 1):
                return True
            a[i][j] = 0
        return False

    rec(0, 0)
    return a

for row in generate(9):
    print(row)
Output:
[8, 5, 4, 7, 1, 6, 2, 9, 3]
[2, 7, 5, 8, 4, 1, 3, 6, 9]
[9, 1, 2, 3, 6, 4, 8, 7, 5]
[3, 9, 7, 6, 2, 5, 1, 4, 8]
[5, 8, 3, 1, 9, 7, 6, 2, 4]
[4, 6, 9, 2, 8, 3, 5, 1, 7]
[6, 3, 1, 5, 7, 9, 4, 8, 2]
[1, 4, 8, 9, 3, 2, 7, 5, 6]
[7, 2, 6, 4, 5, 8, 9, 3, 1]

Algorithms for bucket sort

How can I bucket sort an array of integers that contains negative numbers?
And, what's the difference between bucket sort and counting sort?
Bucket sort for negative values
Using bucket sort for negative values simply requires mapping each element to a bucket according to its distance from the minimal value to be sorted.
For example, when using one bucket per value (as suggested above), the buckets for the following input would be:
input array: {4, 2, -2, 2, 4, -1, 0}
min = -2
bucket0: {-2}
bucket1: {-1}
bucket2: {0}
bucket3: {}
bucket4: {2, 2}
bucket5: {}
bucket6: {4, 4}
Suggested algorithm
#A: array to be sorted
#count: number of items in A
#max: maximal value in A
#min: minimal value in A
#bucketsCount: number of buckets to use
procedure BucketSort(A, count, max, min)
    #calculate the range of items in each bucket
    bucketRange = (max - min + 1) / bucketsCount
    #distribute the items to the buckets
    for each item in A:
        bucket[floor((item.value - min) / bucketRange)].push(item)
    #sort each bucket and build the sorted array A
    index = 0
    for b in {0...bucketsCount-1}:
        sort(bucket[b])
        for each item in bucket[b]:
            A[index] = item
            index++
C++ implementation
Notice the bucketRange which is proportional to the range between max and min
#include <iostream>
#include <stdio.h>
#include <vector>
#include <algorithm> // std::sort
#include <stdlib.h> // rand
#include <limits> // numeric_limits
using namespace std;
#define MAX_BUCKETS_COUNT (10) // choose this according to your space limitations
void BucketSort(int * arr, int count, int max, int min)
{
    if (count == 0 || max == min)
    {
        return;
    }
    // set the number of buckets to use
    int bucketsCount = std::min(count, MAX_BUCKETS_COUNT);
    vector<int> *buckets = new vector<int>[bucketsCount];
    // using this range we will distribute the items into the buckets
    double bucketRange = (((double)max - min + 1) / (bucketsCount));
    for (int i = 0; i < count; ++i)
    {
        int bucket = (int)((arr[i] - min) / bucketRange);
        buckets[bucket].push_back(arr[i]);
    }
    int index = 0;
    for (int i = 0; i < bucketsCount; ++i)
    {
        // here we sort each bucket: O(k log(k)), k being the number of items in the bucket
        sort(buckets[i].begin(), buckets[i].end());
        for (vector<int>::iterator iter = buckets[i].begin(); iter != buckets[i].end(); ++iter)
        {
            arr[index] = *iter;
            ++index;
        }
    }
    delete[] buckets;
}
Testing the code
int main ()
{
    const int items = 50;
    int data[items];
    int shift = 15; // in order to get some negative values in the array
    int max = std::numeric_limits<int>::min();
    int min = std::numeric_limits<int>::max();
    printf("before sorting: ");
    for (int i = 0; i < items; ++i)
    {
        data[i] = rand() % items - shift;
        if (data[i] < min) min = data[i];
        if (data[i] > max) max = data[i];
        printf("%d ,", data[i]);
    }
    printf("\n");
    BucketSort(data, items, max, min);
    printf("after sorting: ");
    for (int i = 0; i < items; ++i)
    {
        printf("%d ,", data[i]);
    }
    printf("\n");
    return 0;
}
This is basically a link only answer but it gives you the information you need to formulate a good question.
Bucket Sort
Wikipedia's step 1, where you "Set up an array of initially empty buckets", will need to include buckets for negative numbers.
Counting Sort
"Compared to counting sort, bucket sort requires linked lists, dynamic arrays or a large amount of preallocated memory to hold the sets of items within each bucket, whereas counting sort instead stores a single number (the count of items) per bucket."
Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm.
Steps:
Set up an array of initially empty "buckets".
Scatter: Go over the original array, putting each object in its bucket.
Sort each non-empty bucket.
Gather: Visit the buckets in order and put all elements back into the original array.
Bucket sort assumes that the input is drawn from a uniform distribution and has an average-case running time of O(n). The computational complexity estimates involve the number of buckets.
Worst case performance: O(n^2)
Best case performance: Omega(n+k)
Average case performance: Theta(n+k)
Worst case space complexity: O(n.k)
For implementation and pictographic understanding:
http://javaexplorer03.blogspot.in/2015/11/bucket-sort-or-bin-sort.html
Bucket sort needs an ordered dictionary with the unique values as keys and their respective frequencies as values. This is what the first line of the function does, assigning the dictionary to k.
The second line returns a Python list, using a nested comprehension to output each key, in sorted order, 'frequency' times. sum(..., []) flattens the resulting list of lists.
neglist = [-1, 4, 5, 6, 7, 3, 4, 3, 2, 5, 8, -2, 7, 8, 0, -3, 7, 3, 7, 3, 1, 15, 12, 4, 5, 6, 7, 3, 1, 15]
poslist = [4, 2, 7, 9, 12, 3, 7]
def bucket(k):
    k = dict((uni, k.count(uni)) for uni in list(set(k)))
    return sum(([key for i in range(k.get(key))] for key in sorted(k.keys())), [])
print("NegList: ", bucket(neglist))
print("PosList: ", bucket(poslist))
'''
NegList: [-3, -2, -1, 0, 1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8, 12, 15, 15]
PosList: [2, 3, 4, 7, 7, 9, 12]
'''
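For comparison with the bucket-based versions above, here is a minimal counting sort sketch in Python that handles negative numbers by shifting every value by the minimum. It keeps only one counter per possible value instead of a list per bucket. The function name is made up for this example, and the approach assumes the value range (max - min) is small enough to allocate.

```python
def counting_sort(values):
    """Sort integers (negatives allowed) by counting occurrences of each value."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)       # one counter per possible value
    for v in values:
        counts[v - lo] += 1            # shift by lo so the smallest value maps to slot 0
    result = []
    for offset, c in enumerate(counts):
        result.extend([lo + offset] * c)
    return result

print(counting_sort([4, 2, -2, 2, 4, -1, 0]))
```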

How to sort an array in linear time and in place?

question origin
Given an unsorted array of size n containing objects with ids of 0 … n-1, sort the array in place and in linear time. Assume that the objects contain large members such as binary data, so instantiating new copies of the objects is prohibitively expensive.
void linearSort(int* input, const int n) {
    for (int i = 0; i < n; i++) {
        while (input[i] != i) {
            // swap
            int swapPoint = input[i];
            input[i] = input[swapPoint];
            input[swapPoint] = swapPoint;
        }
    }
}
Is this linear? Does this sort work with any kind of array of ints? If so, why do we need quicksort anymore?
Despite the while loop inside the for, this sort is linear, O(n). Every iteration of the while loop moves one value into its final position, so if the while loop runs several times for a given i, it will not run at all for the later i values that those swaps already fixed.
This implementation only works for arrays of ints with no duplicates whose values run sequentially from 0 to n-1, which is why quicksort, at O(n log n), is still relevant: it works with arbitrary values.
This can be easily tested by making the worst case:
input = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
and then using the following code:
int whileCount = 0;
for (int i = 0; i < n; i++)
{
    while (input[i] != i)
    {
        whileCount++;
        // swap
        int swapPoint = input[i];
        input[i] = input[swapPoint];
        input[swapPoint] = swapPoint;
    }
    Console.WriteLine("for: {0}, while: {1}", i, whileCount);
}
The output will be as follows:
for: 0, while: 9
for: 1, while: 9
for: 2, while: 9
for: 3, while: 9
for: 4, while: 9
for: 5, while: 9
for: 6, while: 9
for: 7, while: 9
for: 8, while: 9
for: 9, while: 9
so you see even in the worst case where you have the while loop run n-1 times in the first iteration of the for loop, you still only get n-1 iterations of the while loop for the entire process.
Further examples with random data:
{7, 1, 2, 4, 3, 5, 0, 6, 8, 9} => 2 on i=0, 1 on i=3 and nothing more. (total 3 while loop runs)
{7, 8, 2, 1, 0, 3, 4, 5, 6, 9} => 7 on i=0 and nothing more (total 7 while loop runs)
{9, 8, 7, 4, 3, 1, 0, 2, 5, 6} => 2 on i=0, 2 on i=1, 1 on i=2, 1 on i=3 (total 6 while loop runs)
Each swap puts input[i] at position swapPoint, which is exactly where it needs to go. So in the following steps those elements are already in the right place, and the total number of exchanges won't exceed the size n.
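The swap counts quoted in these answers can be reproduced with a short Python sketch of the same loop. The function name linear_sort and the returned swap counter are additions for this example, and the input is assumed to be a permutation of 0..n-1.

```python
def linear_sort(ids):
    """Sort a permutation of 0..n-1 in place; return the number of swaps performed."""
    swaps = 0
    for i in range(len(ids)):
        while ids[i] != i:
            target = ids[i]                      # final position of the value at i
            ids[i], ids[target] = ids[target], ids[i]
            swaps += 1
    return swaps

print(linear_sort([1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))
```

Because every swap places at least one value in its final slot, the counter can never exceed n-1 for a valid permutation.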

Resources