In C++11, {} is preferred over () for variable initialization. However, I noticed that {} cannot correctly initialize a vector of vectors.
Given the following code, vector<vector<int>> mat2(rows, vector<int>(cols, 2)) and vector<vector<int>> mat4{rows, vector<int>(cols, 4)} work as expected, but vector<vector<int>> mat1{rows, vector<int>{cols, 1}} and vector<vector<int>> mat3(rows, vector<int>{cols, 3}) do not. Can anybody explain why?
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
string parse_matrix(const vector<vector<int>>& mat)
{
stringstream ss;
for (const auto& row : mat) {
for (const auto& num : row)
ss << std::setw(3) << num;
ss << endl;
}
return ss.str();
}
int main()
{
const int rows = 5;
const int cols = 4;
vector<vector<int>> mat1{rows, vector<int>{cols, 1}};
vector<vector<int>> mat2(rows, vector<int>(cols, 2));
vector<vector<int>> mat3(rows, vector<int>{cols, 3});
vector<vector<int>> mat4{rows, vector<int>(cols, 4)};
cout << "mat1:\n" << parse_matrix(mat1);
cout << "mat2:\n" << parse_matrix(mat2);
cout << "mat3:\n" << parse_matrix(mat3);
cout << "mat4:\n" << parse_matrix(mat4);
}
Output:
$ g++ -Wall -std=c++14 -o vector_test2 vector_test2.cc
$ ./vector_test2
mat1:
4 1
4 1
4 1
4 1
4 1
mat2:
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
mat3:
4 3
4 3
4 3
4 3
4 3
mat4:
4 4 4 4
4 4 4 4
4 4 4 4
4 4 4 4
4 4 4 4
For a combination of reasons, uniform initialization is somewhat broken for std::vector<int> (and, more generally, for vectors of arithmetic types). Brace initialization uses the same syntax as list-initialization (constructing from an initializer_list), and when both readings are possible, the constructor taking an initializer_list takes precedence.
Thus, std::vector<int> v(42); means "use std::vector(size_t) constructor to create a vector of 42 zeros"; while std::vector<int> v{42}; means "use std::vector(std::initializer_list) constructor to create a vector with a single element having value 42".
Similarly, std::vector<int> v(5, 10); uses two-parameter constructor to create a vector of 5 elements, all with value 10; while std::vector<int> v{5, 10}; uses initializer_list-taking constructor and creates a vector of two elements, values 5 and 10.
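Applied to the question: in mat1 and mat3 the inner vector<int>{cols, 1} (or {cols, 3}) is list-initialized, so each row becomes the two-element vector {4, 1} (or {4, 3}), and the outer constructor then repeats that row rows times. In mat4 the outer braces happen to be harmless: rows is an int, which cannot be implicitly converted to vector<int> because the size constructor is explicit, so the initializer_list constructor is not viable and overload resolution falls back to the (count, value) constructor. A minimal sketch that checks this reasoning, reusing the question's constants (the Matrix alias is an assumption for brevity):

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // alias for brevity

// Same constants as the question. They are constant expressions, which is
// why the int -> size_type conversions below are not rejected as narrowing.
constexpr int rows = 5;
constexpr int cols = 4;

// Inner braces: vector<int>{cols, 1} picks the initializer_list constructor,
// so every row is the two-element vector {4, 1}.
const Matrix mat1{rows, std::vector<int>{cols, 1}};

// Outer braces: an int cannot implicitly become a vector<int> (that
// constructor is explicit), so the initializer_list constructor is not
// viable and the (count, value) constructor is used: 5 rows of {4, 4, 4, 4}.
const Matrix mat4{rows, std::vector<int>(cols, 4)};

// Parentheses at both levels avoid the question entirely.
const Matrix mat2(rows, std::vector<int>(cols, 2));
```

One more caveat: rows and cols are constant expressions here. With ordinary int variables, the brace forms can additionally fail to compile, because the int to size_type conversion in the outer braces then counts as narrowing.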
I have started studying C++ after some years with C# and other languages. I am working through class topics (constructors, inheritance, copying, etc.), and I was trying to write deliberately bad sample code. Below is a sample class (.h and .cpp):
#ifndef SAMPLE_H
#define SAMPLE_H
#include <iostream>
class Sample
{
public:
Sample();
//Sample(const Sample& s);
virtual ~Sample();
int *s_array;
protected:
private:
};
void print(const Sample *s);
#endif // SAMPLE_H
// Sample.cpp
#include "Sample.h"
Sample::Sample()
{
std::cout<<"create sample\n";
s_array=new int[10];
std::cout<<"alloc memory 10 int array\n";
for(int i=0; i<10; ++i)
{
s_array[i]=i;
}
}
Sample::~Sample()
{
//dtor
std::cout<<"Dealloc memory 10 int array\n";
delete [] s_array;
std::cout<<"destroy sample\n";
}
void print(const Sample *s)
{
std::cout<<s<<" "<<s->s_array<<'\n';
for(int i=0; i<10; ++i)
{
std::cout<<s->s_array[i]<<" ";
}
std::cout<<"\n\n";
}
Then, in main:
#include <iostream>
#include "Sample.h"
using namespace std;
int main()
{
cout<<endl<<"Let's try the Copy const WRONG.... "<<endl;
Sample *s1=new Sample();
print(s1);
Sample s2(*s1);
cout<<endl<<"What is s2 ??? "<<endl;
print(&s2);
delete s1;
cout<<endl<<"What is s2 NOW after s1 delete??? "<<endl;
print(&s2);
return 0;
}
I wanted to test the dangers of NOT writing a copy constructor, and I expected to see a totally 'dirty' array after deleting s1 (i.e., 10 random values, or even a crash).
This is the output I get (Windows 10 Pro, Code::Blocks IDE, GNU GCC compiler):
Let's try the Copy const WRONG....
create sample
alloc memory 10 int array
0x1ba110 0x1b6e48
0 1 2 3 4 5 6 7 8 9
What is s2 ???
0x6efdf0 0x1b6e48
0 1 2 3 4 5 6 7 8 9
Dealloc memory 10 int array
destroy sample
What is s2 NOW after s1 delete???
0x6efdf0 0x1b6e48
1812296 1769664 2 3 4 5 6 7 8 9
Dealloc memory 10 int array
destroy sample
Why are only the first two items of s_array 'dirty' while the remaining 8 are intact? Why doesn't deleting s1 free the whole memory pointed to by s2?
Thanks in advance,
Diego
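The behaviour follows from the missing copy constructor: the compiler-generated one copies the pointer s_array, so s2.s_array and s1->s_array are the same address (0x1b6e48 in both printouts), and after delete s1 that block has been freed. Reading it is undefined behaviour, so there is no guarantee about which values change. The two garbage values, 1812296 and 1769664, are 0x1BA748 and 0x1B00C0 in hex, in the same range as the heap addresses printed above, which is consistent with (though it does not prove) the allocator reusing the start of the freed block for its own bookkeeping. The program also deletes the same array a second time when s2 is destroyed. A minimal sketch of a deep-copying version following the rule of three (the commented-out Sample(const Sample&) declaration is where the copy constructor would go); the class name SafeSample and the constant kSampleSize are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr std::size_t kSampleSize = 10;  // hypothetical: length of s_array

// Hypothetical deep-copying variant of Sample (rule of three).
class SafeSample
{
public:
    SafeSample() : s_array(new int[kSampleSize])
    {
        for (std::size_t i = 0; i < kSampleSize; ++i)
            s_array[i] = static_cast<int>(i);
    }

    // Copy constructor: allocate a new buffer and copy the values,
    // instead of sharing the other object's pointer.
    SafeSample(const SafeSample& other) : s_array(new int[kSampleSize])
    {
        std::copy(other.s_array, other.s_array + kSampleSize, s_array);
    }

    // Copy assignment via copy-and-swap: the parameter is a deep copy,
    // and its destructor frees our old buffer. Safe for self-assignment.
    SafeSample& operator=(SafeSample other)
    {
        std::swap(s_array, other.s_array);
        return *this;
    }

    virtual ~SafeSample() { delete[] s_array; }

    int* s_array;
};
```

With this class, deleting s1 leaves s2's array untouched, and each destructor frees its own buffer exactly once.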
I need to find the minimum values along the columns of a matrix, along with their row indices, using Thrust. I use the following code (copied from Orange Owl Solutions); however, I get errors while compiling. I have posted it as an issue on the corresponding Git page. The error message is huge and I don't know how to debug it. Can anyone help me with it? I am using CUDA 8.0, Thrust version 1.8.
The code:
#include <iterator>
#include <algorithm>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>
using namespace thrust::placeholders;
int main()
{
const int Nrows = 6;
const int Ncols = 8;
/**************************/
/* SETTING UP THE PROBLEM */
/**************************/
// --- Random uniform integer distribution between 0 and 20
thrust::default_random_engine rng;
thrust::uniform_int_distribution<int> dist(0, 20);
// --- Matrix allocation and initialization
thrust::device_vector<double> d_matrix(Nrows * Ncols);
for (size_t i = 0; i < d_matrix.size(); i++) d_matrix[i] = (double)dist(rng);
printf("\n\nMatrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << " [ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix[i * Ncols + j] << " ";
std::cout << "]\n";
}
/*************************************************/
/* FIND COLUMN MINIMA ALONG WITH THEIR LOCATIONS */
/*************************************************/
thrust::device_vector<float> d_minima(Ncols);
thrust::device_vector<int> d_indices(Ncols);
thrust::reduce_by_key(
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), _1 / Nrows),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), _1 / Nrows) + Nrows * Ncols,
thrust::make_zip_iterator(
thrust::make_tuple(thrust::make_permutation_iterator(
d_matrix.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 % Nrows) * Ncols + _1 / Nrows)),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), _1 % Nrows))),
thrust::make_discard_iterator(),
thrust::make_zip_iterator(thrust::make_tuple(d_minima.begin(), d_indices.begin())),
thrust::equal_to<int>(),
thrust::minimum<thrust::tuple<float, int> >()
);
printf("\n\n");
for (int i=0; i<Nrows; i++) std::cout << "Min position = " << d_indices[i] << "; Min value = " << d_minima[i] << "\n";
return 0;
}
Error :
/usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/bulk/algorithm/reduce_by_key.hpp(58): error: ambiguous "?" operation: second operand of type "const thrust::tuple<double, int, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>" can be converted to third operand type "thrust::tuple<float, int, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>", and vice versa
detected during:
instantiation of "thrust::system::cuda::detail::bulk_::detail::reduce_by_key_detail::scan_head_flags_functor<FlagType, ValueType, BinaryFunction>::result_type thrust::system::cuda::detail::bulk_::detail::reduce_by_key_detail::scan_head_flags_functor<FlagType, ValueType, BinaryFunction>::operator()(const thrust::system::cuda::detail::bulk_::detail::reduce_by_key_detail::scan_head_flags_functor<FlagType, ValueType, BinaryFunction>::first_argument_type &, const thrust::system::cuda::detail::bulk_::detail::reduce_by_key_detail::scan_head_flags_functor<FlagType, ValueType, BinaryFunction>::second_argument_type &) [with FlagType=int, ValueType=thrust::tuple<double, int, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, BinaryFunction=thrust::minimum<thrust::tuple<float, int, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>]"
I guess you are using this code.
A curious characteristic of that code is that the matrix is defined using double type, but the captured minima are stored in a float vector.
If you want to use that code as-is, according to my testing, thrust (in CUDA 10, and apparently also CUDA 8) doesn't like this line:
thrust::minimum<thrust::tuple<float, int> >()
That functor compares two items and returns the smaller one. Here it is instantiated for tuple<float, int>, but the values coming out of the zip iterator are tuple<double, int> (the matrix is double). Inside Thrust's reduce_by_key implementation, a conditional (?:) expression ends up with one operand of each tuple type, and because each type can be converted to the other, the compiler rejects it as "ambiguous", which is exactly what the error message says.
We can fix/work around this by passing our own functor to do the job, that is explicit in terms of handling the tuples passed to it:
$ cat t373.cu
#include <iterator>
#include <algorithm>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>
using namespace thrust::placeholders;
struct my_min
{
template <typename T1, typename T2>
__host__ __device__
T2 operator()(T1 t1, T2 t2){
if (thrust::get<0>(t1) < thrust::get<0>(t2)) return t1;
return t2;
}
};
int main()
{
const int Nrows = 6;
const int Ncols = 8;
/**************************/
/* SETTING UP THE PROBLEM */
/**************************/
// --- Random uniform integer distribution between 0 and 20
thrust::default_random_engine rng;
thrust::uniform_int_distribution<int> dist(0, 20);
// --- Matrix allocation and initialization
thrust::device_vector<double> d_matrix(Nrows * Ncols);
for (size_t i = 0; i < d_matrix.size(); i++) d_matrix[i] = (double)dist(rng);
printf("\n\nMatrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << " [ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix[i * Ncols + j] << " ";
std::cout << "]\n";
}
/*************************************************/
/* FIND COLUMN MINIMA ALONG WITH THEIR LOCATIONS */
/*************************************************/
thrust::device_vector<float> d_minima(Ncols);
thrust::device_vector<int> d_indices(Ncols);
thrust::reduce_by_key(
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 / Nrows)),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 / Nrows)) + Nrows * Ncols,
thrust::make_zip_iterator(
thrust::make_tuple(thrust::make_permutation_iterator(
d_matrix.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), ((_1 % Nrows) * Ncols + _1 / Nrows))),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 % Nrows)))),
thrust::make_discard_iterator(),
thrust::make_zip_iterator(thrust::make_tuple(d_minima.begin(), d_indices.begin())),
thrust::equal_to<int>(),
my_min()
// thrust::minimum<thrust::tuple<float, int> >()
);
printf("\n\n");
for (int i=0; i<Nrows; i++) std::cout << "Min position = " << d_indices[i] << "; Min value = " << d_minima[i] << "\n";
return 0;
}
$ nvcc -o t373 t373.cu
$ ./t373
Matrix
[ 0 1 12 18 20 3 10 8 ]
[ 5 15 1 11 12 17 12 10 ]
[ 18 20 15 20 6 8 18 13 ]
[ 18 20 3 18 19 6 19 8 ]
[ 6 10 8 16 14 11 12 1 ]
[ 12 9 12 17 10 16 1 4 ]
Min position = 0; Min value = 0
Min position = 0; Min value = 1
Min position = 1; Min value = 1
Min position = 1; Min value = 11
Min position = 2; Min value = 6
Min position = 0; Min value = 3
$
I think a better fix is to just choose one or the other, either float or double. If we modify all float types to double, for example, then thrust is happy, without any other changes:
$ cat t373a.cu
#include <iterator>
#include <algorithm>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>
using namespace thrust::placeholders;
int main()
{
const int Nrows = 6;
const int Ncols = 8;
/**************************/
/* SETTING UP THE PROBLEM */
/**************************/
// --- Random uniform integer distribution between 0 and 20
thrust::default_random_engine rng;
thrust::uniform_int_distribution<int> dist(0, 20);
// --- Matrix allocation and initialization
thrust::device_vector<double> d_matrix(Nrows * Ncols);
for (size_t i = 0; i < d_matrix.size(); i++) d_matrix[i] = (double)dist(rng);
printf("\n\nMatrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << " [ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix[i * Ncols + j] << " ";
std::cout << "]\n";
}
/*************************************************/
/* FIND COLUMN MINIMA ALONG WITH THEIR LOCATIONS */
/*************************************************/
thrust::device_vector<double> d_minima(Ncols);
thrust::device_vector<int> d_indices(Ncols);
thrust::reduce_by_key(
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 / Nrows)),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 / Nrows)) + Nrows * Ncols,
thrust::make_zip_iterator(
thrust::make_tuple(thrust::make_permutation_iterator(
d_matrix.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), ((_1 % Nrows) * Ncols + _1 / Nrows))),
thrust::make_transform_iterator(thrust::make_counting_iterator((int) 0), (_1 % Nrows)))),
thrust::make_discard_iterator(),
thrust::make_zip_iterator(thrust::make_tuple(d_minima.begin(), d_indices.begin())),
thrust::equal_to<int>(),
thrust::minimum<thrust::tuple<double, int> >()
);
printf("\n\n");
for (int i=0; i<Nrows; i++) std::cout << "Min position = " << d_indices[i] << "; Min value = " << d_minima[i] << "\n";
return 0;
}
$ nvcc -o t373a t373a.cu
$ ./t373a
Matrix
[ 0 1 12 18 20 3 10 8 ]
[ 5 15 1 11 12 17 12 10 ]
[ 18 20 15 20 6 8 18 13 ]
[ 18 20 3 18 19 6 19 8 ]
[ 6 10 8 16 14 11 12 1 ]
[ 12 9 12 17 10 16 1 4 ]
Min position = 0; Min value = 0
Min position = 0; Min value = 1
Min position = 1; Min value = 1
Min position = 1; Min value = 11
Min position = 2; Min value = 6
Min position = 0; Min value = 3
$
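I think this latter solution of using consistent types is the more sensible one. Note also that in all three listings the printing loop runs to Nrows (6), while there are Ncols (8) column minima, so the last two columns are never printed; change the loop bound to Ncols to see all of them.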
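To sanity-check the GPU result, a plain C++ host reference can scan each column of the row-major Nrows x Ncols matrix and keep the first row holding the minimum. This is a sketch, not part of the Thrust code, and the function name column_minima is hypothetical. Using a strict < keeps the smallest row index on ties, which should match comparing (value, index) tuples lexicographically as thrust::minimum does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference: for each column of a row-major matrix,
// return (row index of the minimum, minimum value).
std::vector<std::pair<int, double>> column_minima(const std::vector<double>& m,
                                                  int nrows, int ncols)
{
    std::vector<std::pair<int, double>> result;
    for (int j = 0; j < ncols; ++j) {
        int best_row = 0;
        double best = m[j];  // element (0, j)
        for (int i = 1; i < nrows; ++i) {
            double v = m[static_cast<std::size_t>(i) * ncols + j];
            if (v < best) {  // strict <: the first row wins ties
                best = v;
                best_row = i;
            }
        }
        result.emplace_back(best_row, best);
    }
    return result;
}
```

Run on the matrix printed above, it reproduces the six (position, value) pairs in the output, plus (5, 1) and (4, 1) for the last two columns.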
I need an iterator class like this one:
https://github.com/thrust/thrust/blob/master/examples/strided_range.cu
but one that produces the following sequence:
[k * size_stride, k * size_stride+1, ...,k * size_stride+size_chunk-1,...]
with
k = 0,1,...,N
Example:
size_stride = 8
size_chunk = 3
N = 3
then the sequence is
[0,1,2,8,9,10,16,17,18,24,25,26]
I don't know how to do this efficiently.
The strided range iterator is basically a carefully crafted permutation iterator with a functor that gives the appropriate indices for the permutation.
Here is a modification to the strided range iterator example. The main changes were:
include the chunk size as an iterator parameter
modify the functor that provides the indices for the permutation iterator so that it produces the desired sequence
adjust the definition of the .end() iterator to give the appropriate sequence length.
Worked example:
$ cat t1280.cu
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/functional.h>
#include <thrust/fill.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sequence.h>
#include <iostream>
#include <iterator>
#include <assert.h>
// this example illustrates how to make strided-chunk access to a range of values
// examples:
// strided_chunk_range([0, 1, 2, 3, 4, 5, 6], 1,1) -> [0, 1, 2, 3, 4, 5, 6]
// strided_chunk_range([0, 1, 2, 3, 4, 5, 6], 2,1) -> [0, 2, 4, 6]
// strided_chunk_range([0, 1, 2, 3, 4, 5, 6], 3,2) -> [0, 1, 3, 4, 6]
// ...
template <typename Iterator>
class strided_chunk_range
{
public:
typedef typename thrust::iterator_difference<Iterator>::type difference_type;
struct stride_functor : public thrust::unary_function<difference_type,difference_type>
{
difference_type stride;
int chunk;
stride_functor(difference_type stride, int chunk)
: stride(stride), chunk(chunk) {}
__host__ __device__
difference_type operator()(const difference_type& i) const
{
int pos = i/chunk;
return ((pos * stride) + (i-(pos*chunk)));
}
};
typedef typename thrust::counting_iterator<difference_type> CountingIterator;
typedef typename thrust::transform_iterator<stride_functor, CountingIterator> TransformIterator;
typedef typename thrust::permutation_iterator<Iterator,TransformIterator> PermutationIterator;
// type of the strided_range iterator
typedef PermutationIterator iterator;
// construct strided_range for the range [first,last)
strided_chunk_range(Iterator first, Iterator last, difference_type stride, int chunk)
: first(first), last(last), stride(stride), chunk(chunk) {assert(chunk<=stride);}
iterator begin(void) const
{
return PermutationIterator(first, TransformIterator(CountingIterator(0), stride_functor(stride, chunk)));
}
iterator end(void) const
{
int lmf = last-first;
int nfs = lmf/stride;
int rem = lmf-(nfs*stride);
return begin() + (nfs*chunk) + ((rem<chunk)?rem:chunk);
}
protected:
Iterator first;
Iterator last;
difference_type stride;
int chunk;
};
int main(void)
{
thrust::device_vector<int> data(50);
thrust::sequence(data.begin(), data.end());
typedef thrust::device_vector<int>::iterator Iterator;
// create strided_chunk_range
std::cout << "stride 3, chunk 2, length 7" << std::endl;
strided_chunk_range<Iterator> scr1(data.begin(), data.begin()+7, 3, 2);
thrust::copy(scr1.begin(), scr1.end(), std::ostream_iterator<int>(std::cout, " ")); std::cout << std::endl;
std::cout << "stride 8, chunk 3, length 50" << std::endl;
strided_chunk_range<Iterator> scr(data.begin(), data.end(), 8, 3);
thrust::copy(scr.begin(), scr.end(), std::ostream_iterator<int>(std::cout, " ")); std::cout << std::endl;
return 0;
}
$ nvcc -arch=sm_35 -o t1280 t1280.cu
$ ./t1280
stride 3, chunk 2, length 7
0 1 3 4 6
stride 8, chunk 3, length 50
0 1 2 8 9 10 16 17 18 24 25 26 32 33 34 40 41 42 48 49
$
This is probably not the most efficient implementation, in particular because we are doing division in the permutation functor, but it should get you started.
I assume (and test for) chunk<=stride, because this seemed reasonable to me, and simplified my thought process. I'm sure it could be modified, with an appropriate example of what sequence you would like to see, for the case where chunk>stride.
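The index arithmetic can also be checked without a GPU. Below is a host-only C++ sketch of the same mapping as stride_functor (element i comes from chunk i / chunk, at offset i % chunk, which equals i - pos*chunk) and of the length computed in end(). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same formula as stride_functor: element i of the view comes from
// chunk number i / chunk, at offset i % chunk inside that chunk.
std::size_t chunk_index(std::size_t i, std::size_t stride, std::size_t chunk)
{
    return (i / chunk) * stride + i % chunk;
}

// Same length logic as end(): each full stride contributes `chunk`
// elements, and a trailing partial stride contributes min(rem, chunk).
std::size_t chunk_view_length(std::size_t n, std::size_t stride, std::size_t chunk)
{
    std::size_t full = n / stride;
    std::size_t rem = n % stride;
    return full * chunk + (rem < chunk ? rem : chunk);
}

// Materialize the indices the strided-chunk view visits over [0, n).
std::vector<std::size_t> chunk_indices(std::size_t n, std::size_t stride, std::size_t chunk)
{
    assert(chunk <= stride);  // same assumption as the Thrust version
    std::vector<std::size_t> out;
    const std::size_t len = chunk_view_length(n, stride, chunk);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(chunk_index(i, stride, chunk));
    return out;
}
```

For example, chunk_indices(32, 8, 3) gives the sequence from the question, 0 1 2 8 9 10 16 17 18 24 25 26.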
I am attempting to read a file containing numbers enclosed in parentheses into a vector of integers.
My text file:
(2 3 4 9 10 14 15 16 17 19)
Here's my code:
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;
int main(){
ifstream file;
file.open("moves.txt");
vector<int> V;
char c;
if (file){
while (file.get(c)){
if (c != '(' && c != ')' && c != ' ')
V.push_back(c - '0');
}
}
else{
cout << "Error opening file." << endl;
}
for (int i = 0; i < V.size(); i++)
cout << V[i] << endl;
}
My Output:
2
3
4
9
1
0
1
4
1
5
1
6
1
7
1
9
-38
Desired output:
2
3
4
9
10
14
15
16
17
19
What is causing the separation of two digit numbers and why is there a negative number at the end of my output?
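Don't read characters one by one: read a line, and parse the numbers within it. Reading single characters is why 10 comes out as 1 and 0, and the -38 at the end is the newline at the end of the file, since '\n' - '0' is 10 - 48 = -38.
By using the is_number (C++11) function from this answer: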
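#include <algorithm>
#include <cctype>
#include <string>
bool is_number(const std::string& s)
{
// true if s is non-empty and every character is a decimal digit
return !s.empty() && std::find_if(s.begin(), s.end(),
[](unsigned char c) { return !std::isdigit(c); }) == s.end();
}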
You can read line by line with std::getline and then extract the tokens with a std::stringstream. std::stoi can be used to convert a string to an integer:
// file is the std::ifstream from the question; v is the std::vector<int> result
// (this snippet also needs <sstream>)
std::string line;
while(std::getline(file, line))
{
line.erase(0, 1);            // drop the leading '('
line.erase(line.size() - 1); // drop the trailing ')'
std::string numberStr;
std::stringstream ss(line);
while (ss >> numberStr){
if (is_number(numberStr))
v.push_back(std::stoi(numberStr));
}
}
You'd have to make the replace more robust (by checking the presence of parentheses at these positions)
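A self-contained alternative sketch that avoids fixed-position edits altogether: turn '(' and ')' into spaces and let operator>> parse the multi-digit integers. It reads from any std::istream, so the std::ifstream from the question works; the function name read_moves is hypothetical, and it assumes the file contains only parentheses, whitespace, and integers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-separated integers, ignoring
// parentheses, from any input stream (e.g. std::ifstream("moves.txt")).
std::vector<int> read_moves(std::istream& in)
{
    std::string text;
    std::string line;
    while (std::getline(in, line))
        text += line + ' ';

    // Parentheses become separators, just like spaces.
    for (char& c : text)
        if (c == '(' || c == ')')
            c = ' ';

    std::vector<int> values;
    std::istringstream ss(text);
    int n;
    while (ss >> n)  // stops at end of input or at the first non-integer token
        values.push_back(n);
    return values;
}
```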
Problem
Provided I have two arrays:
const int N = 1000000;
float A[N];
myStruct *B[N];
The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}). How can I partly sort the array A on the GPU, with all positive values first (they don't need to be sorted) followed by the negative values (e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1})? The array B should be rearranged along with A (A holds the keys, B the values).
Since A and B can be very large, I think the sort should be done on the GPU (specifically with CUDA, since that is the platform I use). I know thrust::sort_by_key can do this, but it does much more work than needed, since I don't need A and B to be fully sorted.
Has anyone come across this kind of problem?
Thrust example
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
thrust::greater<float>() );
Thrust's documentation on GitHub is not up to date. As @JaredHoberock said, thrust::partition is the way to go, since it now supports stencils. You may need to get a copy from the GitHub repository:
git clone git://github.com/thrust/thrust.git
Then run scons doc in the Thrust folder to get an updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...). With the new stencil partition, you can do:
#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
thrust::partition(thrust::host, // if you want to test on the host
thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
keyVec.begin(),
is_positive());
This returns:
Before:
keyVec = 0 -1 2 -3 4 -5 6 -7 8 -9
valVec = 0 1 2 3 4 5 6 7 8 9
After:
keyVec = 0 2 4 6 8 -5 -3 -7 -1 -9
valVec = 0 2 4 6 8 5 3 7 1 9
Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition:
stable_partition differs from partition in that stable_partition is
guaranteed to preserve relative order. That is, if x and y are
elements in [first, last), such that pred(x) == pred(y), and if x
precedes y, then it will still be true after stable_partition that x
precedes y.
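On the host, the same guarantee can be seen with std::stable_partition applied to (key, value) pairs. This is a plain C++ sketch, not Thrust code: keeping each key next to its value in a std::pair plays the role of the zip iterator, and the predicate inspects the key directly, so no separate stencil is needed. (Thrust's documentation for the stencil overloads of partition appears to require that the stencil range not overlap [first, last), so a predicate on the zipped tuple may be the safer device-side equivalent; check this against your Thrust version.) The function name partition_by_key_sign is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: move pairs with non-negative keys to the front,
// keeping the original relative order inside each group.
void partition_by_key_sign(std::vector<int>& keys, std::vector<int>& vals)
{
    assert(keys.size() == vals.size());

    std::vector<std::pair<int, int>> kv;
    for (std::size_t i = 0; i < keys.size(); ++i)
        kv.emplace_back(keys[i], vals[i]);

    std::stable_partition(kv.begin(), kv.end(),
                          [](const std::pair<int, int>& p) { return p.first >= 0; });

    for (std::size_t i = 0; i < kv.size(); ++i) {
        keys[i] = kv[i].first;
        vals[i] = kv[i].second;
    }
}
```

With the same input as the Thrust example, the negative keys come out as -1 -3 -5 -7 -9, in their original order, instead of the -5 -3 -7 -1 -9 produced by the unstable partition.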
If you want a complete example, here it is:
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
void print_vec(const thrust::host_vector<int>& v)
{
for(size_t i = 0; i < v.size(); i++)
std::cout << " " << v[i];
std::cout << "\n";
}
int main ()
{
const int N = 10;
thrust::host_vector<int> keyVec(N);
thrust::host_vector<int> valVec(N);
int sign = 1;
for(int i = 0; i < N; ++i)
{
keyVec[i] = sign * i;
valVec[i] = i;
sign *= -1;
}
// Copy host to device
thrust::device_vector<int> d_keyVec = keyVec;
thrust::device_vector<int> d_valVec = valVec;
std::cout << "Before:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
// Partition key-val on device
thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
d_keyVec.begin(),
is_positive());
// Copy result back to host
keyVec = d_keyVec;
valVec = d_valVec;
std::cout << "After:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
}
UPDATE
I made a quick comparison with the thrust::sort_by_key version, and the thrust::partition implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.
How about this?
Count how many positive numbers there are, to determine the inflection point.
Evenly divide each side of the inflection point into groups. The negative-side groups all have the same length, which can differ from the length of the positive-side groups; these groups are the memory chunks for the results.
Use one kernel call (one thread) per chunk pair.
Each kernel swaps any out-of-place elements in its input groups into the desired output groups. You will need to flag any chunks that need more swaps than the maximum, so that you can fix them during subsequent iterations.
Repeat until done.
Memory traffic is swaps only (from the original element position to the sorted position). I don't know whether this algorithm matches anything already defined...
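A single-threaded sketch of the core idea, treating the whole array as one chunk pair (the chunking, kernels, and flagging steps are left out, and the function name partition_by_swaps is hypothetical). Count the non-negative keys to get the inflection point k; every negative key left of k is then paired with a non-negative key right of k, and the two are swapped, with the matching B entries moving along. The two counts of out-of-place elements are always equal, which is why swaps alone suffice.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical single-chunk version of the count-then-swap idea.
// A holds the keys, B the associated values; B[i] always stays with A[i].
template <typename Value>
void partition_by_swaps(std::vector<float>& A, std::vector<Value>& B)
{
    assert(A.size() == B.size());

    // Step 1: the inflection point is the number of non-negative keys.
    std::size_t k = 0;
    for (float x : A)
        if (x >= 0.0f) ++k;

    // Step 2: pair each negative key in [0, k) with a non-negative key
    // in [k, n) and swap both the keys and the values.
    std::size_t left = 0, right = k;
    while (true) {
        while (left < k && A[left] >= 0.0f) ++left;
        while (right < A.size() && A[right] < 0.0f) ++right;
        // By the counting argument, both sides run out at the same time.
        if (left == k || right == A.size()) break;
        std::swap(A[left], A[right]);
        std::swap(B[left], B[right]);
        ++left;
        ++right;
    }
}
```

With A = {3, 2, -1, 0, 5, -2} it performs a single swap and yields {3, 2, 5, 0, -1, -2}, the first layout given in the question.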
You should be able to achieve this in thrust simply with a modification of your comparison operator:
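struct my_compare
{
// x goes before y when x is non-negative and y is negative.
// comp(x, x) must be false, otherwise sort's behaviour is undefined.
__device__ __host__ bool operator()(const float x, const float y) const
{
return (x >= 0.0f) && (y < 0.0f);
}
};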
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
my_compare() );
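Whatever comparator is passed to a sort, it must be a strict weak ordering; in particular, comp(x, x) has to be false. A two-class ordering (non-negative keys before negative keys) satisfies this. Below is a host-side sketch checked with std::stable_sort, used here only so the expected order is deterministic. The name nonneg_first is hypothetical, and for thrust::sort_by_key the operator would also need __host__ __device__.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Strict weak ordering with two equivalence classes:
// a non-negative key is "less than" a negative key, and nothing else is.
struct nonneg_first
{
    bool operator()(float x, float y) const { return x >= 0.0f && y < 0.0f; }
};

// Hypothetical helper: group non-negative keys ahead of negative ones,
// keeping the original order inside each group.
void group_by_sign(std::vector<float>& keys)
{
    std::stable_sort(keys.begin(), keys.end(), nonneg_first());
}
```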