Based on the example code here I wrote this small example (ideone Link):
#include <iostream>
#include <algorithm>
#include <cctype>
#include <string>
int main()
{
std::string s = "foo123bar456wibble";
auto end = std::unique(s.begin(), s.end(), [](char l, char r){
return std::isdigit(l) && std::isdigit(r);
});
// What does s hold?
std::cout << std::string(s.begin(), end) << '\n';
}
My output is:
foo1bar4wibble
Does the standard guarantee this behaviour, or would this also be acceptable?
foo2bar6wibble
The linked cppreference page says:
Removing is done by shifting the elements in the range in such a way that elements to be erased are overwritten.
But is that normative text or just a suggested implementation?
Furthermore, cplusplus.com says:
Removes all but the first element from every consecutive group of equivalent elements in the range [first,last).
But again is that normative?
25.3.9 [alg.unique]/1
Effects: For a nonempty range, eliminates all but the first element from every consecutive group of equivalent elements referred to by the iterator i in the range [first + 1,last) for which the following
conditions hold: *(i - 1) == *i or pred(*(i - 1), *i) != false.
I have a large array that I need to sort on the GPU. The array itself is a concatenation of multiple smaller subarrays that satisfy the condition that, given i < j, the elements of subarray i are smaller than the elements of subarray j. An example of such an array would be {5, 3, 4, 2, 1, 6, 9, 8, 7, 10, 11},
where the elements of the first subarray of 5 elements are smaller than the elements of the second subarray of 6 elements. The array I need is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}. I know the position where each subarray starts in the large array.
I know I can simply use thrust::sort on the whole array, but I was wondering if it's possible to launch multiple concurrent sorts, one for each subarray. I'm hoping to get a performance improvement by doing that. My assumption is that it would be faster to sort multiple smaller arrays than one large array with all the elements.
I'd appreciate if someone could give me a way to do that or correct my assumption in case it's wrong.
A way to do multiple concurrent sorts (a "vectorized" sort) in thrust is via the marking of the sub arrays, and providing a custom functor that is an ordinary thrust sort functor that also orders the sub arrays by their key.
Another possible method is to use back-to-back thrust::stable_sort_by_key as described here.
As you have pointed out, another method in your case is just to do an ordinary sort, since that is ultimately your objective.
However, I think it's unlikely that any of the thrust sort methods will give a significant speed-up over a pure sort, although you can try it. Thrust has a fast-path radix sort which it will use in certain situations, which the pure sort method could probably use in your case. (In other cases, e.g. when you provide a custom functor, thrust will often use a slower merge-sort method.)
If the sizes of the sub arrays are within certain ranges, I think you're likely to get much better results (performance-wise) with block radix sort in cub, one block per sub-array.
Here is an example that uses specific sizes (since you've given no indication of size ranges and other details), comparing a thrust "pure sort" to a thrust segmented sort with functor, to the cub block sort method. For this particular case, the cub sort is fastest:
$ cat t1.cu
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/equal.h>
#include <cstdlib>
#include <iostream>
#include <time.h>
#include <sys/time.h>
#define USECPSEC 1000000ULL
const int num_blocks = 2048;
const int items_per = 4;
const int nTPB = 512;
const int block_size = items_per*nTPB; // must be a whole-number multiple of nTPB;
typedef float mt;
unsigned long long dtime_usec(unsigned long long start){
timeval tv;
gettimeofday(&tv, 0);
return ((tv.tv_sec*USECPSEC)+tv.tv_usec)-start;
}
struct my_sort_functor
{
template <typename T, typename T2>
__host__ __device__
bool operator()(T t1, T2 t2){
if (thrust::get<1>(t1) < thrust::get<1>(t2)) return true;
if (thrust::get<1>(t1) > thrust::get<1>(t2)) return false;
if (thrust::get<0>(t1) > thrust::get<0>(t2)) return false;
return true;}
};
// from: https://nvlabs.github.io/cub/example_block_radix_sort_8cu-example.html#_a0
#define CUB_STDERR
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <cub/block/block_load.cuh>
#include <cub/block/block_store.cuh>
#include <cub/block/block_radix_sort.cuh>
using namespace cub;
//---------------------------------------------------------------------
// Globals, constants and typedefs
//---------------------------------------------------------------------
bool g_verbose = false;
bool g_uniform_keys;
//---------------------------------------------------------------------
// Kernels
//---------------------------------------------------------------------
template <
typename Key,
int BLOCK_THREADS,
int ITEMS_PER_THREAD>
__launch_bounds__ (BLOCK_THREADS)
__global__ void BlockSortKernel(
Key *d_in, // Tile of input
Key *d_out) // Tile of output
{
enum { TILE_SIZE = BLOCK_THREADS * ITEMS_PER_THREAD };
// Specialize BlockLoad type for our thread block (uses warp-striped loads for coalescing, then transposes in shared memory to a blocked arrangement)
typedef BlockLoad<Key, BLOCK_THREADS, ITEMS_PER_THREAD, BLOCK_LOAD_WARP_TRANSPOSE> BlockLoadT;
// Specialize BlockRadixSort type for our thread block
typedef BlockRadixSort<Key, BLOCK_THREADS, ITEMS_PER_THREAD> BlockRadixSortT;
// Shared memory
__shared__ union TempStorage
{
typename BlockLoadT::TempStorage load;
typename BlockRadixSortT::TempStorage sort;
} temp_storage;
// Per-thread tile items
Key items[ITEMS_PER_THREAD];
// Our current block's offset
int block_offset = blockIdx.x * TILE_SIZE;
// Load items into a blocked arrangement
BlockLoadT(temp_storage.load).Load(d_in + block_offset, items);
// Barrier for smem reuse
__syncthreads();
// Sort keys
BlockRadixSortT(temp_storage.sort).SortBlockedToStriped(items);
// Store output in striped fashion
StoreDirectStriped<BLOCK_THREADS>(threadIdx.x, d_out + block_offset, items);
}
int main(){
const int ds = num_blocks*block_size;
thrust::host_vector<mt> data(ds);
thrust::host_vector<int> keys(ds);
for (int i = block_size; i < ds; i+=block_size) keys[i] = 1; // mark beginning of blocks
thrust::device_vector<int> d_keys = keys;
for (int i = 0; i < ds; i++) data[i] = (rand()%block_size) + (i/block_size)*block_size; // populate data
thrust::device_vector<mt> d_data = data;
thrust::inclusive_scan(d_keys.begin(), d_keys.end(), d_keys.begin()); // fill out keys array 000111222...
thrust::device_vector<mt> d1 = d_data; // make a copy of unsorted data
cudaDeviceSynchronize();
unsigned long long os = dtime_usec(0);
thrust::sort(d1.begin(), d1.end()); // ordinary sort
cudaDeviceSynchronize();
os = dtime_usec(os);
thrust::device_vector<mt> d2 = d_data; // make a copy of unsorted data
cudaDeviceSynchronize();
unsigned long long ss = dtime_usec(0);
thrust::sort(thrust::make_zip_iterator(thrust::make_tuple(d2.begin(), d_keys.begin())), thrust::make_zip_iterator(thrust::make_tuple(d2.end(), d_keys.end())), my_sort_functor());
cudaDeviceSynchronize();
ss = dtime_usec(ss);
if (!thrust::equal(d1.begin(), d1.end(), d2.begin())) {std::cout << "oops1" << std::endl; return 0;}
std::cout << "ordinary thrust sort: " << os/(float)USECPSEC << "s " << "segmented sort: " << ss/(float)USECPSEC << "s" << std::endl;
thrust::device_vector<mt> d3(ds);
cudaDeviceSynchronize();
unsigned long long cs = dtime_usec(0);
BlockSortKernel<mt, nTPB, items_per><<<num_blocks, nTPB>>>(thrust::raw_pointer_cast(d_data.data()), thrust::raw_pointer_cast(d3.data()));
cudaDeviceSynchronize();
cs = dtime_usec(cs);
if (!thrust::equal(d1.begin(), d1.end(), d3.begin())) {std::cout << "oops2" << std::endl; return 0;}
std::cout << "cub sort: " << cs/(float)USECPSEC << "s" << std::endl;
}
$ nvcc -o t1 t1.cu
$ ./t1
ordinary thrust sort: 0.001652s segmented sort: 0.00263s
cub sort: 0.000265s
$
(CUDA 10.2.89, Tesla V100, Ubuntu 18.04)
I have no doubt that your sizes and array dimensions don't correspond to mine. The purpose here is to illustrate some possible methods, not a black-box solution that works for your particular case. You probably should do benchmark comparisons of your own. I also acknowledge that the block radix sort method for cub expects equal-sized sub-arrays, which you may not have. It may not be a suitable method for you, or you may wish to explore some kind of padding arrangement. There's no need to ask this question of me; I won't be able to answer it based on the information in your question.
I don't claim correctness for this code or any other code that I post. Anyone using any code I post does so at their own risk. I merely claim that I have attempted to address the questions in the original posting, and provide some explanation thereof. I am not claiming my code is defect-free, or that it is suitable for any particular purpose. Use it (or not) at your own risk.
#include <iostream>
#include <vector>
#include <algorithm>
#include <tuple>
using namespace std;
typedef long long ll;
vector < tuple <ll,ll,ll> > a;
int main()
{
ll t;
cin>>t;
ll id,z,p,l,c,s,newz;
while(t--)
{
cin>>id>>z>>p>>l>>c>>s;
newz=p*50+l*5+c*10+s*20;
a.push_back(make_tuple(z-newz,id,newz));
}
sort(a.begin(),a.end());
for(int i=0;i<5;i++)
{
tie(ignore,id,z)=a[i];
cout<<id<<" "<<z<<endl;
}
return 0;
}
I want the vector to be sorted on the first element of the tuple, and only when there is a tie should the second element decide the order, with the smaller second element coming first.
Please also explain what to change if, on a tie, the order should instead be based on the greater second element of the tuple.
A custom function as a third parameter sorted my way out perfectly.
bool cmp( tuple <ll,ll,ll> const &s, tuple <ll,ll,ll> const &r)
{
if(get<0>(s)==get<0>(r))
{
return (get<1>(s))>(get<1>(r));
}
else
return (get<0>(s))<(get<0>(r));
}
sort(a.begin(),a.end(),cmp);// Call to sort will change like this.
I am using a std::array (C++11). I am choosing to use a std::array because I want the size to be fixed at compile time (as opposed to runtime). Is there any way I can iterate over the first N elements ONLY, i.e. something like:
std::array<int,6> myArray = {0,0,0,0,0,0};
std::find_if(myArray.begin(), myArray.begin() + 4, [](int x){return (x%2==1);});
This is not the best example because find_if returns an iterator marking the FIRST odd number, but you get the idea (I only want to consider the first N, in this case N=4, elements of my std::array).
Note: There are questions similar to this one, but the answer always involves using a different container (vector or valarray), which is not what I want. As I described earlier, I want the size of the container to be fixed at compile time.
Thank you in advance!!
From the way you presented your question, I assume that you say "iterate over", but actually mean "operate on with an algorithm".
The behaviour is not specific to a container, but to the iterator type of the container.
std::array::iterator satisfies RandomAccessIterator, the same as the iterators of std::vector and std::deque.
That means that, given
std::array<int,6> myArray = {0,0,0,0,0,0};
and
auto end = myArray.begin() // ...
you can add a number n to it...
auto end = myArray.begin() + 4;
...resulting in an iterator to one element beyond the nth element in the array. As that is the very definition for an end iterator for the sequence,
std::find_if(myArray.begin(), myArray.begin() + 4, ... )
works just fine. A somewhat more intuitive example:
#include <algorithm>
#include <array>
#include <iostream>
#define N 4
int main()
{
std::array<char, 6> myArray = { 'a', 'b', 'c', 'd', 'e', 'f' };
auto end = myArray.begin() + N;
if ( std::find( myArray.begin(), end, 'd' ) != end )
{
std::cout << "Found.\n";
}
return 0;
}
This finds the 4th element in the array, and prints "Found."
Change #define N 4 to #define N 3, and it prints nothing.
Of course, this is assuming that your array has at least N elements. If you aren't sure, check N <= myArray.size() first and use myArray.end() instead if required.
For completeness:
A BidirectionalIterator (list, set, multiset, map, multimap) only supports ++ and --.
A ForwardIterator (forward_list, unordered_set, unordered_multiset, unordered_map, unordered_multimap) only supports ++.
An InputIterator does not support dereferencing the result of postfix ++.
If you want to iterate over the first N numbers of a std::array, just do something like:
#include <iostream>
#include <array>
int main() {
constexpr const int N = 4;
std::array<int, 6> arr{ 0, 1, 2, 3, 4, 5 };
for (auto it = std::begin(arr); it != std::begin(arr) + N && it != std::end(arr); ++it)
std::cout << *it << std::endl;
}
With C++20, a std::span can be used to create a subset view of a std::array much like std::string_view does for std::string. The span replaces maintaining the variable 'N' for the number of sub-elements.
auto part = std::span(myArray).first(4);
std::find_if(part.begin(), part.end(), [](int x) {return (x % 2 == 1); });
A std::span offers many other benefits. It can be used in range-based for loops. And by using std::span::subspan, a span can view any range of elements, not limited to just the first N. A span can also be used not just with std::array, but also with C arrays, std::vector, and other contiguous containers.
I simply wanna erase the specified element in the range-based loop:
vector<int> vec = { 3, 4, 5, 6, 7, 8 };
for (auto & i:vec)
{
if (i>5)
vec.erase(&i);
}
what's wrong?
You can't erase elements by value from a std::vector; erase expects an iterator. Since a range-based loop exposes the values directly rather than iterators, your code (vec.erase(&i)) doesn't make sense.
The main problem is that a std::vector invalidates its iterators when you erase an element.
So since the range-based loop is basically implemented as
auto begin = vec.begin();
auto end = vec.end();
for (auto it = begin; it != end; ++it) {
// loop body
}
Then erasing a value would invalidate it and break the successive iterations.
If you really want to remove an element while iterating you must take care of updating the iterator correctly:
for (auto it = vec.begin(); it != vec.end(); /* NOTHING */)
{
if ((*it) > 5)
it = vec.erase(it);
else
++it;
}
Removing elements from a vector that you're iterating over is generally a bad idea. In your case you're most likely skipping the 7. A much better way would be using std::remove_if for it:
vec.erase(std::remove_if(vec.begin(), vec.end(),
[](const int& i){ return i > 5; }),
vec.end());
std::remove_if shifts the elements that should be kept to the front of the container and returns an iterator to the new logical end; the elements from that iterator onwards are left in an unspecified state. You then only have to erase everything from that iterator up to the end.
It's quite simple: don't use a range-based loop. These loops are intended as a concise form for sequentially iterating over all the values in a container. If you want something more complicated (such as erasing or generally access to iterators), do it the explicit way:
for (auto it = begin(vec); it != end(vec);) {
if (*it > 5)
it = vec.erase(it);
else
++it;
}
Actually it IS possible, despite what the other answers say.
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
vector<int> ints{1,2,3,4};
for (auto it = ints.begin(); auto& i: ints) { // you can create the iterator here in C++20
if (i == 3)
ints.erase(it--); // Decrement after erasing a single element, and it preserves the iterator
it++;
}
for_each(ints.cbegin(), ints.cend(),
[] (int i) {cout << i << " ";}
);
}
Since C++20 you can also just call std::erase_if(ints, [](const int i){return i==3;});
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
double A,R;
R=100.64;
R=R*R;
A=3.14159*R;
cout<< setprecision(3)<<A<<endl;
return 0;
}
The reasonably precise and accurate(a) value you would get from those calculations (mathematically) is 31,819.31032.
You have asked for a precision of three digits and, with that value and the floating point format currently active (probably std::defaultfloat), it's only giving you three significant digits:
3.18e+04 (3.18 × 10^4 in mathematical form).
If your intent is to instead show three digits after the decimal point, you can do that with the std::fixed manipulator:
#include <iostream>
#include <iomanip>
int main() {
double R = 100.64;
double A = 3.14159 * R * R;
std::cout << std::setprecision(3) << std::fixed << A << '\n';
return 0;
}
This gives 31819.310.
(a) Make sure you never conflate these two, they're different concepts. See for example, the following values of π you may come up with:

Value             Properties
9                 Both im-precise and in-accurate.
3                 Im-precise but accurate.
2.718281828459    Precise but in-accurate.
3.141592653590    Both precise and accurate.
π                 Has maximum precision and accuracy.