ARM/x86 : Sort vector efficiently [duplicate]

This question already has answers here:
Sorting 64-bit structs using AVX?
(2 answers)
Fast merge of sorted subsets of 4K floating-point numbers in L1/L2
(6 answers)
Closed 8 months ago.
Do you know a sorting algorithm that makes efficient use of vector intrinsics?
I need to take advantage of loading and storing 4 floats in one operation, as well as other vector operations.
I found this code for quicksort.
Can you help me understand how to implement it with SIMD?
int partition(float *arr, int low, int high)
{
    float pivot;
    int i, j;
    // pivot (Element to be placed at right position)
    pivot = arr[high];
    i = (low - 1); // Index of smaller element and indicates the
                   // right position of pivot found so far
    for (j = low; j <= high - 1; j++) {
        // If current element is smaller than the pivot
        if (arr[j] < pivot) {
            i++; // increment index of smaller element
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

/* low --> Starting index, high --> Ending index */
void quickSort(float *arr, int low, int high)
{
    int pi;
    if (low < high) {
        /* pi is partitioning index, arr[pi] is now at right place */
        pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);  // Before pi
        quickSort(arr, pi + 1, high); // After pi
    }
}

Related

Sorting with low memory size

What is the best way to sort a 1 GB dictionary (up to 255 characters per word) with 2 GB of RAM?
I have already tried quicksort and didn't get an acceptable result.
This is the quicksort code:
#include <iostream>
#include <fstream>
#include <cstring>
#define MAXL 4000000
using namespace std;
void swap(char *&ch1,char *&ch2)
{
char *temp = ch1;
ch1 = ch2;
ch2 = temp;
}
int partition (char **arr, int low, int high)
{
string pivot = arr[high]; // pivot
int i = (low - 1); // Index of smaller element
for (int j = low; j <= high- 1; j++)
{
// If current element is smaller than or
// equal to pivot
if (arr[j] <= pivot)
{
i++; // increment index of smaller element
swap(arr[i], arr[j]);
}
}
swap(arr[i + 1], arr[high]);
return (i + 1);
}
void quickSort(char **arr, int low, int high)
{
if (low < high)
{
int pi = partition(arr, low, high);
// Separately sort elements before
// partition and after partition
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}
int main()
{
fstream file("input.txt",ios::in|ios::out|ios::app);
fstream o("output.txt",ios::out);
char **arr = new char*[MAXL];
for(int i=0;i<MAXL;i++)
arr[i] = new char[255];
long long i=0;
while(file)
{
//words are separated by spaces
file.getline(arr[i],256,' ');
i++;
}
file.close();
quickSort(arr, 0, i-2);
for(long long j=0;j<i-1;j++)
{
o << arr[j] << "\n";
}
}
It takes more than 10 minutes to sort the mentioned list but it shouldn't take more than 20 seconds.
(MAXL is the number of words in the 1G file and input words are stored in a text file)

If you can't fit it all in memory, a file-based merge sort will work well.
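A file-based (external) merge sort can be sketched roughly as follows: sort chunks that fit in memory, write each one to its own run file, then merge the runs with a min-heap. This is a minimal sketch, not a tuned implementation. The function names, the chunk size parameter, and the run_N.tmp file names are assumptions made for illustration, and the input is assumed to be whitespace-separated words as in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Phase 1: read up to chunkWords words at a time, sort each chunk in memory,
// and write it to its own run file. Returns the run file names.
// The "run_N.tmp" naming scheme is an assumption for this sketch.
std::vector<std::string> writeSortedRuns(const std::string& inputPath,
                                         std::size_t chunkWords) {
    assert(chunkWords > 0);
    std::ifstream in(inputPath);
    std::vector<std::string> runNames;
    std::vector<std::string> chunk;
    std::string word;
    auto flush = [&]() {
        if (chunk.empty()) return;
        std::sort(chunk.begin(), chunk.end());
        std::string name = "run_" + std::to_string(runNames.size()) + ".tmp";
        std::ofstream out(name);
        for (const std::string& w : chunk) out << w << '\n';
        runNames.push_back(name);
        chunk.clear();
    };
    while (in >> word) {
        chunk.push_back(word);
        if (chunk.size() == chunkWords) flush();
    }
    flush();  // last, possibly shorter, chunk
    return runNames;
}

// Phase 2: k-way merge of the run files. The min-heap holds at most one
// (word, run index) pair per run, so memory use stays small.
void mergeRuns(const std::vector<std::string>& runNames,
               const std::string& outputPath) {
    std::vector<std::ifstream> runs;
    for (const std::string& name : runNames) runs.emplace_back(name);

    using Entry = std::pair<std::string, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::string word;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (runs[r] >> word) heap.emplace(word, r);

    std::ofstream out(outputPath);
    while (!heap.empty()) {
        Entry smallest = heap.top();
        heap.pop();
        out << smallest.first << '\n';
        // Refill the heap from the run the smallest word came from.
        if (runs[smallest.second] >> word) heap.emplace(word, smallest.second);
    }
    out.close();
    runs.clear();  // close the run files before deleting them
    for (const std::string& name : runNames) std::remove(name.c_str());
}

void externalSort(const std::string& inputPath, const std::string& outputPath,
                  std::size_t chunkWords) {
    mergeRuns(writeSortedRuns(inputPath, chunkWords), outputPath);
}
```

For the file in the question, chunkWords would be picked so that one chunk fits comfortably in RAM; a call might look like externalSort("input.txt", "output.txt", 1000000).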

In-place algorithms are your solution. Find more here:
As another example, many sorting algorithms rearrange arrays into sorted order in-place, including bubble sort, comb sort, selection sort, insertion sort, heapsort, and Shell sort. These algorithms require only a few pointers, so their space complexity is O(log n).

Is it normal for quicksort to be inefficient when sorting a completely descending array? [duplicate]

This question already has answers here:
Quick sort Worst case
(6 answers)
What is the worst case scenario for quicksort?
(6 answers)
Closed 4 years ago.
#include <iostream>
#include<stdio.h>
#include<fstream>
using namespace std;
void swap(int* a, int* b)
{
int t = *a;
*a = *b;
*b = t;
}
int partition (int arr[], int low, int high)
{
int pivot = arr[high];
int i = (low - 1);
for (int j = low; j <= high- 1; j++)
{
if (arr[j] <= pivot)
{
i++;
swap(&arr[i], &arr[j]);
}
}
swap(&arr[i + 1], &arr[high]);
return (i + 1);
}
void quickSort(int arr[], int low, int high)
{
if (low < high)
{
int pi = partition(arr, low, high);
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}
int main()
{
int arr[100000];
int i;
ifstream fin;
int n = 20000;
fin.open("reverse20k.txt");
if(fin.is_open())
{
for(i=0;i<n;i++)
fin>>arr[i];
}
quickSort(arr, 0, n-1);
return 0;
}
It takes this about 1.25 seconds to sort a purely descending array of 20k elements, while merge sort needs only 0.05 seconds. Is quicksort just extremely inefficient when sorting descending arrays, or is there something wrong with the algorithm?
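With the last element as pivot, a strictly descending array makes every partition as unbalanced as possible: the pivot is alternately the smallest and the largest remaining value, so each call shrinks the range by only one element. The number of comparisons is then n(n-1)/2 (about 2×10^8 for n = 20000) instead of growing like n log n. A minimal sketch that counts them is shown here; the counter and the helper names are additions for illustration and are not part of the code above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Lomuto partition with the last element as pivot, as in the question,
// extended with a comparison counter (the counter is an addition).
int partitionCounting(std::vector<int>& arr, int low, int high,
                      long long& comparisons) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j <= high - 1; j++) {
        comparisons++;
        if (arr[j] <= pivot) {
            i++;
            std::swap(arr[i], arr[j]);
        }
    }
    std::swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quickSortCounting(std::vector<int>& arr, int low, int high,
                       long long& comparisons) {
    if (low < high) {
        int pi = partitionCounting(arr, low, high, comparisons);
        quickSortCounting(arr, low, pi - 1, comparisons);
        quickSortCounting(arr, pi + 1, high, comparisons);
    }
}

// Sorts n, n-1, ..., 1 and returns the number of element comparisons made.
long long comparisonsOnDescending(int n) {
    assert(n >= 0);
    std::vector<int> arr(n);
    for (int k = 0; k < n; k++) arr[k] = n - k;
    long long comparisons = 0;
    quickSortCounting(arr, 0, n - 1, comparisons);
    return comparisons;
}
```

Choosing the pivot differently (for example a random index or the median of three elements) avoids this particular input pattern, although the worst case remains quadratic.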

How does recursion break out of the first recursive quick sort call?

What signals the program to say, "Ok the first recursive quickSort call is done; proceed to the second recursive call"?
int partition (int arr[], int low, int high)
{
int pivot = arr[high]; // pivot
int i = (low - 1); // Index of smaller element
for (int j = low; j <= high- 1; j++)
{
if (arr[j] <= pivot)
{
i++; // increment index of smaller element
swap(&arr[i], &arr[j]);
}
}
swap(&arr[i + 1], &arr[high]);
return (i + 1);
}
void quickSort(int arr[], int low, int high)
{
if (low < high)
{
int pi = partition(arr, low, high);
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}

Your question comes down to the recursion stack.
Let's first understand recursion: a function keeps calling itself on increasingly smaller cases, repeating the same non-recursive procedure each time, until it reaches a base case, at which point it stops and returns to its caller.
In the case of QuickSort, the base cases of the recursion are lists of size zero or one, which never need to be sorted. If the list is larger than that, it still needs sorting, which is why we call the QuickSort method again, twice, on arrays of smaller sizes.
We recurse on the side of the array containing all the elements from A[0] to A[i - 2], and on the side containing the elements A[i] to A[A.length - 1] (in the question's code, these are arr[low..pi - 1] and arr[pi + 1..high]).
Why do we leave out A[i - 1] (arr[pi] in the question's code)? Simple: it's already in its correct place.
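Nothing special signals the end of the first call: the second call is simply the next statement, and it runs once the first call has returned. One way to see this is to log every entry and return. The sketch below does that; the log parameter, the depth parameter, and the names partitionLast and quickSortTraced are additions for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Same partition scheme as in the question (last element as pivot).
int partitionLast(std::vector<int>& arr, int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] <= pivot) {
            i++;
            std::swap(arr[i], arr[j]);
        }
    }
    std::swap(arr[i + 1], arr[high]);
    return i + 1;
}

// Same recursion as in the question, plus a log of every entry and return.
void quickSortTraced(std::vector<int>& arr, int low, int high, int depth,
                     std::vector<std::string>& log) {
    assert(depth >= 0);
    std::string indent(2 * depth, ' ');
    std::string range =
        "(" + std::to_string(low) + ", " + std::to_string(high) + ")";
    log.push_back(indent + "enter " + range);
    if (low < high) {
        int pi = partitionLast(arr, low, high);
        // This call, including everything it calls, runs to completion first.
        quickSortTraced(arr, low, pi - 1, depth + 1, log);
        // Execution reaches this line only after the call above has returned.
        quickSortTraced(arr, pi + 1, high, depth + 1, log);
    }
    log.push_back(indent + "return " + range);
}
```

For the input {2, 1, 3}, the log shows "return (0, 1)" immediately before "enter (3, 2)": the whole left subtree of calls finishes before the right call starts.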

Adding sum of frequencies while solving Optimal Binary search tree

I am referring to THIS problem and solution.
First, I did not understand why the sum of frequencies is added in the recursive equation.
Can someone please help me understand that, perhaps with an example?
In the author's words:
We add sum of frequencies from i to j (see first term in the above
formula), this is added because every search will go through root and
one comparison will be done for every search.
In the code, the sum of frequencies (the purpose of which I do not understand) ... corresponds to fsum.
int optCost(int freq[], int i, int j)
{
    // Base cases
    if (j < i) // If there are no elements in this subarray
        return 0;
    if (j == i) // If there is one element in this subarray
        return freq[i];
    // Get sum of freq[i], freq[i+1], ... freq[j]
    int fsum = sum(freq, i, j);
    // Initialize minimum value
    int min = INT_MAX;
    // One by one consider all elements as root and recursively find cost
    // of the BST, compare the cost with min and update min if needed
    for (int r = i; r <= j; ++r)
    {
        int cost = optCost(freq, i, r-1) + optCost(freq, r+1, j);
        if (cost < min)
            min = cost;
    }
    // Return minimum value
    return min + fsum;
}
Secondly, this solution only returns the optimal cost. Any suggestions on how to get the actual BST?

Why we need sum of frequencies
The sum of frequencies is there to calculate the cost of a particular tree correctly. It acts as an accumulator for the tree's weight.
Imagine that on the first level of recursion all keys sit on the first level of the tree (we haven't picked any root yet). Remember the weight function: it sums, over all nodes, the node's weight multiplied by its level. For now the weight of our tree equals the sum of the weights of all keys, because every key ends up on some level (the first or deeper), so each key's weight is counted at least once in the result.
1) Suppose that we found the optimal root key, say key r. We then move all keys except r one level down, because each remaining key can be on the second level at best (the first level is already occupied). Because of that, we add the weight of each remaining key to the sum again: each of them will be counted at least twice. We split the remaining keys into two subarrays around r (those to its left and those to its right).
2) The next step is to select the optimal keys for the second level, one from each of the two subarrays from the first step. After that we again move all remaining keys one level down and add their weights to the sum, because they will be on the third level at best, so each of them is counted at least three times.
3) And so on.
I hope this explanation gives you some understanding of why we need this sum of frequencies.
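To check this argument numerically, the recurrence that adds the sum of frequencies can be compared with a direct "frequency times level" computation. The sketch below does that. The function names are made up for illustration, and both functions use plain exponential recursion, so they are only meant for small inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Recurrence from the question: best split cost plus the sum of freq[i..j].
int optCostWithSum(const std::vector<int>& freq, int i, int j) {
    if (j < i) return 0;
    int fsum = 0;
    for (int k = i; k <= j; ++k) fsum += freq[k];
    int best = INT_MAX;
    for (int r = i; r <= j; ++r)
        best = std::min(best, optCostWithSum(freq, i, r - 1) +
                              optCostWithSum(freq, r + 1, j));
    return best + fsum;
}

// Direct definition: every key costs freq * level, with the root on level 1.
int optCostByLevel(const std::vector<int>& freq, int i, int j, int level) {
    assert(level >= 1);
    if (j < i) return 0;
    int best = INT_MAX;
    for (int r = i; r <= j; ++r)
        best = std::min(best, freq[r] * level +
                              optCostByLevel(freq, i, r - 1, level + 1) +
                              optCostByLevel(freq, r + 1, j, level + 1));
    return best;
}
```

For the frequencies {34, 8, 50}, both functions return 142: the key with frequency 50 is the root (50 × 1), the key with frequency 34 is on level 2 (34 × 2), and the key with frequency 8 is on level 3 (8 × 3).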
Finding optimal bst
As author mentioned at the end of the article
2) In the above solutions, we have computed optimal cost only. The
solutions can be easily modified to store the structure of BSTs also.
We can create another auxiliary array of size n to store the structure
of tree. All we need to do is, store the chosen ‘r’ in the innermost
loop.
We can do just that. Below you will find my implementation.
Some notes about it:
1) I had to replace int[n][n] with a utility class Matrix because I used Visual C++, which does not support arrays whose size is not a compile-time constant.
2) I used the second implementation of the algorithm from the article you provided (with memoization) because it is much easier to extend it to store the optimal BST.
3) The author has a mistake in his code:
The second loop for (int i=0; i<=n-L+1; i++) should have n-L as its upper bound, not n-L+1.
4) The way we store the optimal BST is as follows:
For each pair i, j we store the optimal key index. This works the same way as for the optimal cost, but instead of the cost we store the index of the optimal root key. For example, for 0, n-1 we will have the index of the root key r of the result tree. Next we split the array in two around the root index r and get the optimal key indexes of the two parts. We can do that by accessing matrix elements 0, r-1 and r+1, n-1. And so forth. The utility function 'PrintResultTree' uses this approach and prints the result tree in in-order (left subtree, node, right subtree), so you basically get a sorted list because it is a binary search tree.
5) Please don't flame me for my code - I'm not really a c++ programmer. :)
int optimalSearchTree(int keys[], int freq[], int n, Matrix& optimalKeyIndexes)
{
/* Create an auxiliary 2D matrix to store results of subproblems */
Matrix cost(n,n);
optimalKeyIndexes = Matrix(n, n);
/* cost[i][j] = Optimal cost of binary search tree that can be
formed from keys[i] to keys[j].
cost[0][n-1] will store the resultant cost */
// For a single key, cost is equal to frequency of the key
for (int i = 0; i < n; i++)
cost.SetCell(i, i, freq[i]);
// Now we need to consider chains of length 2, 3, ... .
// L is chain length.
for (int L = 2; L <= n; L++)
{
// i is row number in cost[][]
for (int i = 0; i <= n - L; i++)
{
// Get column number j from row number i and chain length L
int j = i + L - 1;
cost.SetCell(i, j, INT_MAX);
// Try making all keys in interval keys[i..j] as root
for (int r = i; r <= j; r++)
{
// c = cost when keys[r] becomes root of this subtree
int c = ((r > i) ? cost.GetCell(i, r - 1) : 0) +
((r < j) ? cost.GetCell(r + 1, j) : 0) +
sum(freq, i, j);
if (c < cost.GetCell(i, j))
{
cost.SetCell(i, j, c);
optimalKeyIndexes.SetCell(i, j, r);
}
}
}
}
return cost.GetCell(0, n - 1);
}
Below is utility class Matrix:
class Matrix
{
private:
int rowCount;
int columnCount;
std::vector<int> cells;
public:
Matrix()
{
}
Matrix(int rows, int columns)
{
rowCount = rows;
columnCount = columns;
cells = std::vector<int>(rows * columns);
}
int GetCell(int rowNum, int columnNum)
{
return cells[columnNum + rowNum * columnCount];
}
void SetCell(int rowNum, int columnNum, int value)
{
cells[columnNum + rowNum * columnCount] = value;
}
};
And main method with utility function to print result tree in in-order:
//Print result tree in in-order
void PrintResultTree(
Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys)
{
if (startIndex == endIndex)
{
printf("%d\n", keys[startIndex]);
return;
}
else if (startIndex > endIndex)
{
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
PrintResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys);
printf("%d\n", keys[currentOptimalKeyIndex]);
PrintResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
return 0;
}
EDIT:
Below you can find code to create simple tree like structure.
Here is utility TreeNode class
struct TreeNode
{
public:
int Key;
TreeNode* Left;
TreeNode* Right;
};
Updated main function with BuildResultTree function
void BuildResultTree(Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys,
TreeNode*& tree)
{
if (startIndex > endIndex)
{
return;
}
tree = new TreeNode();
tree->Left = NULL;
tree->Right = NULL;
if (startIndex == endIndex)
{
tree->Key = keys[startIndex];
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
tree->Key = keys[currentOptimalKeyIndex];
BuildResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys, tree->Left);
BuildResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys, tree->Right);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
TreeNode* tree = NULL; // BuildResultTree allocates the nodes itself
BuildResultTree(optimalKeyIndexes, 0, n - 1, keys, tree);
return 0;
}

Modifying this Quicksort to always use the last element as the pivot

I have the following Quicksort that always chooses the first element of the subsequence as its pivot:
void qqsort(int array[], int start, int end) {
    int i = start; // index of left-to-right scan
    int k = end; // index of right-to-left scan
    if (end - start >= 1) { // check that there are at least two elements to sort
        int pivot = array[start]; // set the pivot as the first element in the partition
        while (k > i) { // while the scan indices from left and right have not met,
            while (array[i] <= pivot && i <= end && k > i) // from the left, look for the first element greater than the pivot
                i++;
            while (array[k] > pivot && k >= start && k >= i) // from the right, look for the first element not greater than the pivot
                k--;
            if (k > i) // if the left seekindex is still smaller than the right index, swap the corresponding elements
                swap(array, i, k);
        }
        swap(array, start, k); // after the indices have crossed, swap the last element in the left partition with the pivot
        qqsort(array, start, k - 1); // quicksort the left partition
        qqsort(array, k + 1, end); // quicksort the right partition
    } else { // if there is only one element in the partition, do not do any sorting
        return;
    }
}
Now as you can see, this algorithm always takes the first element to be the pivot: int pivot = array[start];
I want to modify this algorithm to make it always use the last element instead of the first element of the subsequence, because I want to analyze the physical running times of both implementations.
I tried changing the line int pivot = array[start]; to int pivot = array[end]; but the algorithm then outputted an unsorted sequence:
//Changes: int pivot = array[end];
unsorted: {5 4 3 2 1}
*sorted*: {1 2 5 4 3}
To test another pivot, I also tried using the center element of the subsequence but the algorithm still failed:
//Changes: int pivot = array[(start + end) / 2];
unsorted: {5 3 4 2 1}
*sorted*: {3 2 4 1 5}
Can someone please help me understand this algorithm correctly and tell me what changes do I need to make to successfully have this implementation always choose the last element of the subsequence as the pivot?

The Cause of the Problem
The problem is that you use int k = end;. It was fine to use int i = start; when the pivot was the first element of the array, because the checks in the loop skim past it (array[i] <= pivot). However, when you use the last element as the pivot, k stops on the end index and swaps the pivot to a position in the left half of the partition. You are already in trouble at that point, because the pivot will most likely end up somewhere inside the left partition rather than at the border.
The Solution
To fix this, you need to set int k = end - 1; when you use the rightmost element as the pivot. You'll also need to change the lines for swapping the pivot to the border between the left and right partitions:
swap(array, i, end);
qqsort(array, start, i - 1);
qqsort(array, i + 1, end);
You have to use i for this because i should end up at the leftmost element of the right partition (which can then be swapped with the pivot in the rightmost position while preserving the order). Lastly, you'll want to change k >= i to k > i in the while loop that decrements k, or else there is a small chance of an array[-1] indexing error. This could not happen before because, with the pivot stored at array[start], the test array[k] > pivot failed at k == start and stopped the scan.
That should do it.
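When applying these edits, two cases deserve a check. With k starting at end - 1, a two-element range has k == i on entry, so the outer while (k > i) loop never runs and an unconditional swap(array, i, end) would reverse an already ordered pair such as {1, 2}. Likewise, if the left scan stops only because it met k, array[i] can still be smaller than the pivot. The sketch below is one last-element-pivot variant that guards both scans with index checks first. It is written as an illustration under these assumptions, not as the exact code described above, and the name qqsortLast is made up.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Two-index quicksort that always uses the last element of the range as pivot.
void qqsortLast(std::vector<int>& array, int start, int end) {
    assert(start >= 0 && end < static_cast<int>(array.size()));
    if (end - start < 1) return;  // zero or one element: nothing to sort

    int pivot = array[end];
    int i = start;    // left-to-right scan
    int k = end - 1;  // right-to-left scan; the pivot slot is excluded

    while (true) {
        // Stop on the first element greater than the pivot, or just past k.
        while (i <= k && array[i] <= pivot) i++;
        // Stop on an element not greater than the pivot, or when meeting i.
        while (k > i && array[k] > pivot) k--;
        if (k <= i) break;
        std::swap(array[i], array[k]);
    }

    // Everything left of i is <= pivot and everything from i to end - 1 is
    // greater, so the pivot belongs at index i.
    std::swap(array[i], array[end]);
    qqsortLast(array, start, i - 1);
    qqsortLast(array, i + 1, end);
}
```

Because the index tests come before the element tests, neither scan reads outside the range start..end.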
Sidenote:
This is a poorly written quicksort which I wouldn't recommend learning from. It has some extraneous, unnecessary comparisons along with some other faults that I won't waste time listing. I would recommend using the quicksorts in this presentation by Sedgewick and Bentley.

I didn't test it, but check it anyway:
this
// after the indices have crossed,
// swap the last element in the left partition with the pivot
swap(array, start, k);
probably should be
swap(array, end, i);
or something similar, if we choose end as pivot.
Edit: That's an interesting partitioning algorithm, but it's not the standard one.
The position of the pivot is fixed by the logic of the partitioning.
The algorithm treats the first element as the Head and the remaining elements as the Body to be partitioned.
After the partitioning is done, as a final step, the head (pivot) is swapped with the last element of the left partition, to keep the ordering.
The only way I found to use a different pivot without changing the algorithm is this:
...
if (end - start >= 1) {
// Swap the 1st element (Head) with the pivot
swap(array, start, pivot_index);
int pivot = array[start];
...

First hint: If the data are random, it does not matter, on average, which element you choose as pivot. The only way to actually improve the "quality" of the pivot is to look at several (e.g. 3) indices and use the one holding the median of those values.
Second hint: If you change the pivot value, you also need to change the pivot index. The index is not named explicitly, but array[start] is swapped into the "middle" of the sorted subsequence at one point. You need to modify this line accordingly. If you take an index which is not at the edge of the subsequence, you need to swap that element to the edge first, before the iteration.
Third hint: The code you provided is excessively commented. You should be able to actually understand this implementation.
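The first two hints can be combined: pick the median of three sampled elements, swap it to the start edge, and then run the partitioning from the question with array[start] as pivot. The sketch below does that. The helper medianIndex and the name qqsortMedian3 are made up for illustration, and the scan conditions are reordered so that the index tests run before the element tests.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Index (one of a, b, c) that holds the median of the three values.
int medianIndex(const std::vector<int>& array, int a, int b, int c) {
    int x = array[a], y = array[b], z = array[c];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return b;
    if ((y <= x && x <= z) || (z <= x && x <= y)) return a;
    return c;
}

void qqsortMedian3(std::vector<int>& array, int start, int end) {
    assert(start >= 0 && end < static_cast<int>(array.size()));
    if (end - start < 1) return;

    // Second hint: move the chosen pivot to the edge the algorithm expects.
    int pivotIndex = medianIndex(array, start, start + (end - start) / 2, end);
    std::swap(array[start], array[pivotIndex]);
    int pivot = array[start];

    int i = start;  // left-to-right scan
    int k = end;    // right-to-left scan
    while (k > i) {
        while (i < k && array[i] <= pivot) i++;
        while (k >= i && array[k] > pivot) k--;
        if (k > i) std::swap(array[i], array[k]);
    }
    std::swap(array[start], array[k]);  // put the pivot between the two parts
    qqsortMedian3(array, start, k - 1);
    qqsortMedian3(array, k + 1, end);
}
```

Passing end as pivotIndex instead of the median gives the "always use the last element" variant asked about in the question, with no other changes to the partitioning.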

Put a single
swap(array, start, end)
before initializing pivot
int pivot = array[start]

#include <time.h>
#include <stdlib.h>
#include<iostream>
#include<fstream>
using namespace std;
int counter=0;
void disp(int *a,int n)
{
for(int i=0;i<n;i++)
cout<<a[i]<<" ";
cout<<endl;
}
void swap(int a[],int p,int q)
{
int temp;
temp=a[p];
a[p]=a[q];
a[q]=temp;
}
int partition(int a[], int p, int start, int end)
{
swap(a,p,start);// to swap the pivot with the first element of the partition
counter+=end-start; // instead of (end-start+1)
int i=start+1;
for(int j=start+1 ; j<=end ; j++)
{
if(a[j]<a[start])
{
swap(a,j,i);
i++;
}
}
swap(a,start,i-1); // not swap(a,p,i-1), because p and start were already swapped (this was my earlier mistake)
return i-1; // return the index of the pivot
}
void quicksort(int a[],int start,int end)
{
if(start>=end)
return;
int p=end; // here we are choosing the last element of the subarray as pivot
// p is the index of the chosen pivot; partition() swaps it to the front, so any index in [start, end] would work
int index=partition(a,p,start,end);
quicksort(a,start,index-1);
quicksort(a,index+1,end);
}
int main()
{
ifstream fin("data.txt");
int count=0;
int array[100000];
while(fin>>array[count])
{
count++;
}
quicksort(array,0,count-1);
/*
int a[]={32,56,34,45,23,54,78};
int n=sizeof(a)/sizeof(int);
disp(a,n);
quicksort(a,0,n-1);
disp(a,n);*/
cout<<endl<<counter;
return 0;
}

If you scan each element from the first element of the array to the one before the last, keeping the last element as the pivot at every recursion, you will get the answer in O(n log n) time on average (the worst case, for example already sorted input, is still O(n^2)).
#include <stdio.h>

void quicksort(int [], int, int);

int main()
{
    int n, i = 0, a[20];
    scanf("%d", &n);
    if (n > 20) // a[] holds at most 20 values
        return 1;
    while (i < n)
        scanf("%d", &a[i++]);
    quicksort(a, 0, n - 1);
    i = 0;
    while (i < n)
        printf("%d ", a[i++]);
}

void quicksort(int a[], int p, int r)
{
    int i, j, x, temp;
    if (p < r)
    {
        i = p;    // next slot for an element <= pivot
        x = a[r]; // pivot: last element of the range
        for (j = p; j < r; j++)
        {
            if (a[j] <= x)
            {
                if (a[j] < a[i])
                {
                    temp = a[j];
                    a[j] = a[i];
                    a[i] = temp;
                }
                i++;
            }
            else
            {
                temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
        }
        if (i != r) // compare indices, not the pivot value with an index
        {
            temp = a[r];
            a[r] = a[i];
            a[i] = temp;
        }
        quicksort(a, p, i - 1);
        quicksort(a, i + 1, r);
    }
}
