Atomic Exchange Sorting Algorithm in MultiGPU

How can an atomic exchange sorting algorithm be implemented on multiple GPUs? Are there any references available?

It would help if you point out an algorithm that may be used, as a guideline to help answer this question.
So, I pulled an algorithm from: http://www.codingunit.com/exchange-sort-algorithm
Here is the basic algorithm:
#include <iostream>
using namespace std;
int main(void)
{
int array[5]; // An array of integers.
int length = 5; // Length of the array.
int i, j;
int temp;
//Some input
for (i = 0; i < 5; i++)
{
cout << "Enter a number: ";
cin >> array[i];
}
//Algorithm
for(i = 0; i < (length -1); i++)
{
for (j=(i + 1); j < length; j++)
{
if (array[i] < array[j])
{
temp = array[i];
array[i] = array[j];
array[j] = temp;
}
}
}
//Some output
for (i = 0; i < 5; i++)
{
cout << array[i] << endl;
}
}
You may want to look at this page for some source code that may help:
http://www.bealto.com/gpu-sorting.html
But, if you use OpenCL and the algorithm above, you may want to do something like this:
Open a connection to each card.
Then take the iterations of the outer loop and send each of them to a card, perhaps in round-robin order.
You will then need to do a final sort on one GPU to finish, but you may want to use a different algorithm, as this algorithm is best suited to a single-threaded CPU.

Related

Finding number of occurrences of elements in array and printing them in ascending order of elements

So, like the question says, I want to print the number of occurrences of the elements in ascending order of those elements. For example, if I input 7 three times and 3 twice, then the output should print 3-2 first and then (on the next line) 7-3. See the for loop with the comment about sorting the array: without that for loop the code works fine, but it doesn't print the elements in ascending order. Let me know what you think about this and why that for loop isn't working.
#include<stdio.h>
int x;
int main()
{ int a[10000],b[10000],i,j,n,x,c=0,p,q ;
scanf("%d",&n);
for(i=0; i<n; i++)
{
scanf("%d",&a[i]);
}
for(i=0; i<n; i++)
{ c=1;
if(a[i]!=-1)
{ for(j=i+1; j<n; j++)
{
if(a[i]==a[j])
{ c++;
a[j]=-1;
}
}
b[i]=c;
}
}
for (i = 0; i < n; ++i) // for loop to sort a[i] elements in ascending order
{ for (j = i + 1; j < n; ++j)
{
if (a[i] > a[j])
{
x = a[i];
a[i] = a[j];
a[j] = x;
}
}
}
for(i=0; i<n; i++)
{
if(a[i]!=-1 && b[i]>1)
{
printf("%d-%d\n",a[i],b[i]);
}
}
return 0;
}

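You can do it in O(n * lg n), e.g. by sorting or with std::map, or count in expected linear time with a hash map such as std::unordered_map (you then still have to sort the distinct values for the output). I'm not sure if there is something like this in C.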
Example impl. w/ sorting:
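#include <iostream>
#include <vector>
#include <algorithm>
int main() {
std::vector<int> v = {3,7,7,7,3,7,7};
std::sort(v.begin(), v.end());
// No ++i in the loop header: i jumps to the start of the next run below.
for (std::size_t i = 0; i < v.size(); ) {
int number = v[i];
std::size_t count = 1;
// Count the run of equal values without reading past the end of the vector.
while (i + count < v.size() && v[i + count] == number) ++count;
i = i + count;
std::cout << number << " " << count << std::endl;
}
}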
If you know that the range of elements in the array is small enough, you can use radix sort and so get it done in linear time.
About your implementation.
You are good with the first loop.
In the second loop, you need to take the -1 entries into account. You also need to swap the b entries along with the a entries, not only a.
Check the case where b[i] equals 1. You can initialize b[i] to 0 before c=1; and drop the b[i] > 1 check.
A few more comments not related to correctness. Do not use the magic number -1; give it a name, and then use that name. Do not declare all variables at the beginning of the function; declare every variable as close as possible to its first use.

Will this Selection Sort Code work in O(n) for best case?

I searched everywhere on the internet for the best-case time complexity of selection sort, and it is given as O(n^2). But I wrote and tested the selection sort code below, which seems to work in O(n) in the best case (that is, when the array is already sorted). Please find the mistake in this program.
This is my code:
#include <bits/stdc++.h>
using namespace std;
/* Function to print an array */
void printArray(int arr[], int size)
{
int i;
for (i = 0; i < size; i++)
cout << arr[i] << " ";
cout << endl;
}
void swap(int *xp, int *yp)
{
int temp = *xp;
*xp = *yp;
*yp = temp;
}
void selectionSort(int arr[], int n)
{
int i, j, max_idx;
// One by one move boundary of unsorted subarray
for (i = 0; i < n - 1; i++)
{
cout << endl;
printArray(arr, n);
// Find the maximum element in the unsorted part of the array
max_idx = 0;
int count = 0;
for (j = 1; j < n - i; j++)
{
if (arr[j] >= arr[max_idx])
{
max_idx = j;
count++;
}
}
if (count != n - i - 1)
{ //swap only if not already sorted
// Swap the found maximum element with the last element of the unsorted part
swap(&arr[max_idx], &arr[n - i - 1]);
}
else //already Sorted so returning
{
return;
}
//cout << "Sorted array: \n";
printArray(arr, n);
}
}
// Driver program to test above functions
int main()
{
int arr[] = {2, 1, 4, 3, 6, 5, 8, 7};
int n = sizeof(arr) / sizeof(arr[0]);
selectionSort(arr, n);
cout << "Sorted array: \n";
printArray(arr, n);
return 0;
}
// This code is contributed by www.bhattiacademy.com

Yes, your algorithm has a best case running time of Θ(n), because if the array is already in ascending order then count will equal n - 1 on the first iteration of the outer loop, so the algorithm will terminate early.
Your algorithm is different to the standard selection sort algorithm, which looks like this:
for(int i = 0; i < n - 1; i++) {
int min_idx = i;
for(int j = i + 1; j < n; j++) {
if(arr[j] < arr[min_idx]) {
min_idx = j;
}
}
swap(&arr[i], &arr[min_idx]);
}
The selection sort algorithm iteratively searches for the minimum remaining element and swaps it into place. This doesn't create an opportunity to detect that the array is already in increasing order, so there's no opportunity to terminate early, and selection sort's best case running time is therefore Θ(n^2).

Selection sort idea: given an array of n items,
1. Find the largest item x in the range [0…n−1].
2. Swap x with the (n−1)th item.
3. Reduce n by 1 and go to Step 1.
You can use this algorithm as a hint when writing the selection sort function.

Sort an array of numbers into ascending order and find the position of a number within the list

I have produced an unsorted array of values that I would like to put into ascending order, and I want to determine the new position of the last number.
This worked on other, smaller inputs, but I got stuck on the last one, where the list consists of 1824300 values and the terminal just wouldn't run the sorting algorithm at all...
#include <stdio.h>
int main(void)
{
signed value = 16239, num = 1824300, i, j;
signed temp;
signed arr[num];
arr[0] = value;
printf("Your initial array is:\n");
printf("%i\n", arr[0]);
for (i = 1; i < num; i++)
{
value = (value*31334)%31337;
arr[i]= value;
printf("%i: ", i);
printf("%i\n", arr[i]);
}
// Insertion sort
for(i = 1; i < num; i++)
{
j = i;
temp = arr[j];
while((j > 0) && (arr[j - 1] > temp))
{
arr[j] = arr[j -1];
arr[j - 1] = temp;
j--;
}
}
// End of insertion sort
printf("Your sorted array is:\n");
for(i = 0; i < num; i++)
{
printf("%i: ", i);
printf("%i\n", arr[i]);
}
return 0;
}
Can someone please help me on it?
P.S. I am completely new to programming so my code might be very inefficient and messy so sorry about that!!
Thanks a lot!!!

So basically, below is what I did in the end. I just inserted a simple counter, and it worked fine... Thank you to everyone who tried to help. Your answers are valuable to me, and I am still learning how to turn algorithms into code, which is a bit difficult for me since I have no previous programming experience :((( The algorithms are not tricky to understand at all though...
signed count;
count = 0;
for (i = 0; i < num; i++)
{
if (arr[i] <= value)
{
count = count + 1;
}
}
printf("This is the index of your output in a sorted list: \n");
printf("%i\n", count);

Finding all possible sub-optimal(not optimal!!!) solutions in optimization

I am writing a CPLEX optimization code to generate a matrix, which takes r and n as the command line arguments, but they may be assumed 2 and 4 for now.
The condition for generating the matrix is that the sum of elements in any row or in any column should equal 10, where the elements are integers between 0 and 10. (i.e. doubly-stochastic matrix)
I turned this condition into the constraint, and generated the matrix, but it only gives a matrix with 10s and 0s.
I think it is because CPLEX always finds the "optimal" solution, but for the problem I want to solve, this is not going to help much.
I want matrices with some 6, 7, 8, 9, 10, and 0~5 for the rest.
I want to generate all possible matrices satisfying such condition (and some more condition to be added later) so that I could test all of them and exhaust the case.
How can I do that?
I am looking into this solution pool thing, and it is not easy..
Also,
cplex.out() << "number of solutions = " << cplex.getSolnPoolNsolns() << endl;
this gives 1... meaning that there is only one solution, while I know there are millions of those matrices.
If you have any ideas how to generate all the 'sub-optimal' matrices, please help me.
Thank you.
I attached my code in IPGenMat.cpp, and aa.sol was the solution it gave me.
I also copied it here below.
(In short, two questions: 1. how can I find 'less optimal' solutions? 2. how can I find all of such solutions?)
#include<ilcplex/ilocplex.h>
#include<vector>
#include<iostream>
#include<sstream>
#include<string>
using namespace std;
int main(int argc, char** argv) {
if (argc < 3) {
cerr << "Error: expected two arguments, r and n" << endl;
return 1;
}
else {
int r, n;
stringstream rValue(argv[1]);
stringstream nValue(argv[2]);
rValue >> r;
nValue >> n;
int N=n*r;
int ds = 10; //10 if doubly-stochastic, smaller if sub-doubly stochastic
IloEnv env;
try {
IloModel model(env);
IloArray<IloNumVarArray> m(env, N);
for (int i=0; i<N; i++) {
m[i] = IloNumVarArray(env, N, 0, 10, ILOINT);
}
IloArray<IloExpr> sumInRow(env, N);
for (int i=0; i<N; i++) {
sumInRow[i] = IloExpr(env);
}
for (int i=0; i<N; i++) {
for (int j=0; j<N; j++) {
sumInRow[i] += m[i][j];
}
}
IloArray<IloRange> rowEq(env, N);
for (int i=0; i<N; i++) {
rowEq[i] = IloRange(env, ds, sumInRow[i], 10); //doubly stochastic
}
IloArray<IloExpr> sumInColumn(env, N);
for (int i=0; i<N; i++) {
sumInColumn[i] = IloExpr(env);
}
for (int i=0; i<N; i++) {
for (int j=0; j<N; j++) {
sumInColumn[i] += m[j][i];
}
}
IloArray<IloRange> columnEq(env, N);
for (int i=0; i<N; i++) {
columnEq[i] = IloRange(env, ds, sumInColumn[i], 10); //doubly stochastic
}
for (int i=0; i<N; i++) {
model.add(rowEq[i]);
model.add(columnEq[i]);
}
IloCplex cplex(env);
cplex.extract(model);
cplex.setParam(IloCplex::SolnPoolAGap,0.0);
cplex.setParam(IloCplex::SolnPoolIntensity,4);
cplex.setParam(IloCplex::PopulateLim, 2100000000);
cplex.populate();//.solve();
cplex.out() << "solution status = " << cplex.getStatus() << endl;
cplex.out() << "number of solutions = " << cplex.getSolnPoolNsolns() << endl;
cplex.out() << endl;
cplex.writeSolutions("aa.sol");
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
cplex.out() << cplex.getValue(m[i][j]) << " | ";
}
cplex.out() << endl;
}
cplex.out() << endl;
}
catch(IloException& e) {
cerr << " ERROR: " << e << endl;
}
catch(...) {
cerr << " ERROR: " << endl;
}
env.end();
return 0;
}
}

You might try using PORTA's vint utility or PPL for this instead. CPLEX is geared for optimisation problems, not enumeration problems.
I'd add that, while your problem is a tiny optimisation problem, it's a really huge enumeration problem. There are likely to be far more solutions than you'd know what to do with. You might try narrowing down what you want and trying to express that using linear inequalities.

SolnPoolAGap: Sets an absolute tolerance on the objective value for the solutions in the solution pool. Solutions that are worse (either greater in the case of a minimization, or less in the case of a maximization) than the objective of the incumbent solution according to this measure are not kept in the solution pool.
So, to obtain sub-optimal solutions, you should set this parameter to a value higher than 0.0.

Let's just assume your solution is some matrix with entries m_i_j. Express your problem in terms of a set of binary decision variables, e.g. m_i_j_v meaning "the matrix at row i and column j has value v". Then, after you solve the problem, you can add another constraint that sums over all the decision variables that are set in that solution and forces the sum to be at most N-1, where N is the number of those variables. This excludes that solution. Rinse and repeat until the problem becomes infeasible.

Breadth first search: Knight cover

I'm trying to follow the USACO training course on algorithms (http://ace.delos.com/usacogate) - and I am currently at a page that describes DFS, BFS etc. I do understand these concepts, but the sample problem they've given for BFS - knight cover - has me puzzled. Here's the problem statement:
Place as few knights as possible on an n x n chess board so that every square is attacked. A knight is not considered to attack the square on which it sits.
This is BFS, the page says, since it tries to see if there's a solution with n knights before trying n+1 knights - which is pretty clear.
However, I don't understand how to formulate the solution from this alone. Can someone help me with the pseudocode for this?
Thanks much in advance!

It is BFS, but you don't search the chessboard; search the space of placements:
Initial state: no knight is placed
Valid move: place a knight on any unoccupied tile
Goal state: all tiles are either occupied or attacked
basic algorithm (BFS of the state space):
push the initial state to the BFS queue.
while there is something in the queue:
    remove one state from the queue.
    for every unoccupied tile:
        create a copy of the current state.
        add one knight to that tile.
        if the new state doesn't exist in the queue:
            if the new state is a goal state, finish.
            else add it to the queue.
Note that I'm assuming that all paths to a state are of the same length. This is true when looking for a set of placements this way, but it is not true in general. In cases where this is not true, you should store the set of all visited nodes to avoid revisiting already explored states.
You may require the knights be added left-to-right, top-to-bottom. Then you don't need to check for duplicates in the queue. Additionally, you may discard a state early if you know that an unattacked tile cannot be attacked without violating the insertion order.
If you don't do this and leave out the duplicate check as well, the algorithm will still produce correct results, but it will do so much more slowly. About 40 000 times more slowly (8! = 40 320 is the number of duplicates of an 8-knight state).
If you want a faster algorithm, look into A*. Here, one possible heuristic is:
count the number of unattacked and unoccupied tiles
divide the count by nine, rounding up (a knight cannot attack more than eight new tiles or occupy more than one)
the distance (number of knights needed to be added) is at least this number, so the estimate never overestimates, which is what A* needs.
A better heuristic would note the fact that all the tiles a knight attacks have the same color, while the tile it occupies has the opposite color. This may improve the previous heuristic only slightly (but can still potentially help a lot).
A better heuristic should be able to exploit the fact that a knight can cover free spots in no more than a 5x5 square. A heuristic should compute fast, but this may help when there are few spots to cover.
Technical details:
You may represent each state as a 64-bit bit-mask. While this requires some bitwise manipulation, it really helps memory, and equality checking of 64-bit numbers is fast. If you can't have a 64-bit number, use two 32-bit numbers - these should be available.
Circular array queue is efficient, and it's not that hard to expand its capacity. If you have to implement your own queue, pick this one.

Here is an implementation in C++.
It just uses basic brute force, so it is only good up to n = 5.
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <queue>
using namespace std;
bool isFinal(vector<vector<bool> >& board, int n)
{
for(int i = 0; i < n; ++i)
{
for(int j = 0; j < n; ++j)
{
if(!board[i][j])
return false;
}
}
return true;
}
void printBoard(vector<pair<int,int> > vec, int n)
{
vector<string> printIt(n);
for(int i = 0; i < n; ++i)
{
string s = "";
for(int j = 0; j < n; ++j)
{
s += ".";
}
printIt[i] = s;
}
int m = vec.size();
for(int i = 0; i < m; ++i)
{
printIt[vec[i].first][vec[i].second] = 'x';
}
for(int i = 0; i < n; ++i)
{
cout << printIt[i] << endl;
}
cout << endl;
}
void updateBoard(vector<vector<bool> >& board, int i, int j, int n)
{
board[i][j] = true;
if(i-2 >= 0 && j+1 < n)
board[i-2][j+1] = true;
if(i-1 >= 0 && j+2 < n)
board[i-1][j+2] = true;
if(i+1 < n && j+2 < n)
board[i+1][j+2] = true;
if(i+2 < n && j+1 < n)
board[i+2][j+1] = true;
if(i-2 >= 0 && j-1 >= 0)
board[i-2][j-1] = true;
if(i-1 >= 0 && j-2 >= 0)
board[i-1][j-2] = true;
if(i+1 < n && j-2 >= 0)
board[i+1][j-2] = true;
if(i+2 < n && j-1 >= 0)
board[i+2][j-1] = true;
}
bool isThere(vector<pair<int,int> >& vec, vector<vector<pair<int,int> > >& setOfBoards, int len)
{
for(int i = 0; i < len; ++i)
{
if(setOfBoards[i] == vec)
return true;
}
return false;
}
int main()
{
int n;
cin >> n;
vector<vector<pair<int,int> > > setOfBoards;
int len = 0;
vector<vector<bool> > startingBoard(n);
for(int i = 0; i < n; ++i)
{
vector<bool> vec(n,0);
startingBoard[i] = vec;
}
vector<pair<int,int> > startingVec;
vector<vector<vector<vector<bool> > > > q1;
vector<vector<vector<pair<int,int> > > > q2;
vector<vector<vector<bool> > > sLayer1;
vector<vector<pair<int,int> > > sLayer2;
sLayer1.push_back(startingBoard);
sLayer2.push_back(startingVec);
q1.push_back(sLayer1);
q2.push_back(sLayer2);
int k = 0;
bool flag = false;
int count = 0;
while(!flag && !q1[k].empty())
{
int m = q1[k].size();
vector<vector<vector<bool> > > layer1;
vector<vector<pair<int,int> > > layer2;
q1.push_back(layer1);
q2.push_back(layer2);
for(int l = 0; l < m; ++l)
{
vector<vector<bool> > board = q1[k][l];
vector<pair<int,int> > vec = q2[k][l];
if(isFinal(board, n))
{
while(l < m)
{
board = q1[k][l];
vec = q2[k][l];
if(isFinal(board, n))
{
printBoard(vec, n);
++count;
}
++l;
}
flag = true;
break;
}
for(int i = 0; i < n; ++i)
{
for(int j = 0; j < n; ++j)
{
if(!board[i][j])
{
pair<int,int> p;
p.first = i;
p.second = j;
vector<vector<bool> > newBoard = board;
vector<pair<int,int> > newVec = vec;
newVec.push_back(p);
updateBoard(newBoard, i, j, n);
sort(newVec.begin(), newVec.end());
if(!isThere(newVec, setOfBoards, len))
{
q1[k+1].push_back(newBoard);
q2[k+1].push_back(newVec);
setOfBoards.push_back(newVec);
++len;
}
}
}
}
}
++k;
}
cout << count << endl;
}
