I have several meshes in the .off format that together enclose a volume. For instance, take three of the patches that are available with CGAL-4.11 in examples/Mesh_3/data/patches.
Desired output
I would like to get a tetrahedral mesh of this volume and save it in the .mesh format. The difficult part is that I want each line corresponding to a triangle to end with a number 0, 1 or 2 indicating to which of the input patches the triangle corresponds. Currently, I don't care about the tags of the vertices or tetrahedra.
Almost working solution
I tried modifying the CGAL example examples/Mesh_3/mesh_polyhedral_complex.cpp (the modified portion is marked):
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Mesh_triangulation_3.h>
#include <CGAL/Mesh_complex_3_in_triangulation_3.h>
#include <CGAL/Mesh_criteria_3.h>
#include <CGAL/Polyhedral_complex_mesh_domain_3.h>
#include <CGAL/make_mesh_3.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <utility>
#include <vector>
// Domain
typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Mesh_polyhedron_3<K>::type Polyhedron;
typedef CGAL::Polyhedral_complex_mesh_domain_3<K> Mesh_domain;
#ifdef CGAL_CONCURRENT_MESH_3
typedef CGAL::Parallel_tag Concurrency_tag;
#else
typedef CGAL::Sequential_tag Concurrency_tag;
#endif
// Triangulation
typedef CGAL::Mesh_triangulation_3<Mesh_domain,CGAL::Default,Concurrency_tag>::type Tr;
typedef CGAL::Mesh_complex_3_in_triangulation_3<
  Tr,Mesh_domain::Corner_index,Mesh_domain::Curve_segment_index> C3t3;
// Criteria
typedef CGAL::Mesh_criteria_3<Tr> Mesh_criteria;
// To avoid verbose function and named parameters call
using namespace CGAL::parameters;
const char* const filenames[] = {
  // the three .off patch files from examples/Mesh_3/data/patches go here
};
const std::pair<int, int> incident_subdomains[] = {
  std::make_pair(0, 1),
  std::make_pair(1, 0),
  std::make_pair(1, 0),
};
int main()
{
  const std::size_t nb_patches = sizeof(filenames) / sizeof(const char*);
  CGAL_assertion(sizeof(incident_subdomains) ==
                 nb_patches * sizeof(std::pair<int, int>));
  std::vector<Polyhedron> patches(nb_patches);
  for(std::size_t i = 0; i < nb_patches; ++i) {
    std::ifstream input(filenames[i]);
    if(!(input >> patches[i])) {
      std::cerr << "Error reading " << filenames[i] << " as a polyhedron!\n";
      return EXIT_FAILURE;
    }
  }
  // Create domain
  Mesh_domain domain(patches.begin(), patches.end(),
                     incident_subdomains, incident_subdomains+nb_patches);
  domain.detect_features(); //includes detection of borders
  // Mesh criteria
  Mesh_criteria criteria(edge_size = 8,
                         facet_angle = 25, facet_size = 8, facet_distance = 0.2,
                         cell_radius_edge_ratio = 3, cell_size = 10);
  // Mesh generation
  C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria);
  // Output
  std::ofstream medit_file("out.mesh");
  c3t3.output_to_medit(medit_file);
  return EXIT_SUCCESS;
}
This creates a good-looking tetrahedral mesh and saves it to out.mesh. However, all the triangles have tag 1, as shown in the following excerpt (lines 1318 to 1328 of out.mesh).
52.527837077556413 58.272620021324407 30.13290265121827 1
0.06169736357779243 30.258121963438846 69.405198139655852 1
923 898 888 1
923 898 888 1
905 903 890 1
905 903 890 1
354 385 375 1
354 385 375 1
When I display the result in medit, all the triangles have the same colour, whereas (to put the question another way) I would like each input patch to have a different colour.
What do I need to modify in the example above?
Side note
I noticed that out.mesh seems to contain two copies of each triangle. Is this related to the problem? How can I get rid of the copies?
Related questions
There already is a similar question. The difference is that they have a single file and try to convey the patch info through colour, whereas my patches are in separate files.

Thanks for your precise question.
There is a very easy solution to your question, but it involves an undocumented feature of output_to_medit(). Just replace the line:
c3t3.output_to_medit(medit_file);
with:
c3t3.output_to_medit(medit_file, false /* rebind */, true /* show_patches */);
That will tag the facets with their surface patch IDs.
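To check which tags end up in the file (and whether each facet is written twice, as the side note observes), a small standalone reader for the Triangles section of a medit .mesh file can help. This is a hypothetical helper (`triangle_stats` and `TriangleStats` are not part of CGAL), assuming the usual medit layout where the keyword `Triangles` is followed by a count and then one `v1 v2 v3 ref` line per triangle.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Summary of the "Triangles" section of a medit .mesh stream (hypothetical helper, not CGAL API).
struct TriangleStats {
  std::map<int, std::size_t> per_ref;  // number of triangle lines carrying each reference tag
  std::size_t total = 0;               // triangle lines read
  std::size_t distinct = 0;            // distinct triangles, ignoring vertex order and tag
};

TriangleStats triangle_stats(std::istream& in) {
  TriangleStats stats;
  std::set<std::array<long, 3>> seen;
  std::string token;
  while (in >> token) {
    if (token != "Triangles") continue;  // skip the header, the vertices and other sections
    std::size_t n = 0;
    if (!(in >> n)) break;
    for (std::size_t i = 0; i < n; ++i) {
      std::array<long, 3> v;
      int ref = 0;
      if (!(in >> v[0] >> v[1] >> v[2] >> ref)) break;  // truncated section
      ++stats.per_ref[ref];
      ++stats.total;
      std::sort(v.begin(), v.end());  // same triangle regardless of orientation
      seen.insert(v);
    }
    break;  // stop after the first Triangles section
  }
  stats.distinct = seen.size();
  return stats;
}
```

For example, open the output with `std::ifstream f("out.mesh");`, call `triangle_stats(f)`, then inspect `per_ref` for the tags present and compare `total` with `distinct` to see whether facets are duplicated.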


How to correctly use a class from external header files with C++ in Visual Studio 2019?

I included the header file that contains the class and added its location to Additional Include Directories, but the build still reports error LNK2019, and I don't know what I did wrong. I've tried several approaches, but none of them seem to work. Any ideas? The code is below, and the header files are attached. Thanks in advance.
The header files are in the trng folder in this link
#include <cstdlib>
#include <iostream>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp>
int main() {
  const long samples = 1000000l; // total number of points in square
  long in = 0l; // no points in circle
  // distribute workload over all processes and make a global reduction
#pragma omp parallel reduction(+:in)
  {
    trng::yarn2 rx, ry; // random number engines for x- and y-coordinates
    int size = omp_get_num_threads(); // get total number of processes
    int rank = omp_get_thread_num(); // get rank of current process
    // split PRN sequences by leapfrog method
    rx.split(2, 0); // choose sub-stream no. 0 out of 2 streams
    ry.split(2, 1); // choose sub-stream no. 1 out of 2 streams
    rx.split(size, rank); // choose sub-stream no. rank out of size streams
    ry.split(size, rank); // choose sub-stream no. rank out of size streams
    trng::uniform01_dist<> u; // random number distribution
    // throw random points into square
    for (long i = rank; i < samples; i += size) {
      double x = u(rx), y = u(ry); // choose random x- and y-coordinates
      if (x * x + y * y <= 1.0) // is point in circle?
        ++in; // increase thread-local counter
    }
  }
  // print result
  std::cout << "pi = " << 4.0 * in / samples << std::endl;
  return EXIT_SUCCESS;
}

Boost Geometry: segments intersection not yet implemented?

I am trying a simple test: compute the intersection of 2 segments with Boost Geometry. It does not compile. I also tried some variations (int points instead of float points, 2D instead of 3D) with no improvement.
Is it really possible that Boost doesn't implement segment intersection? Or what did I do wrong? Am I missing some header? Am I confusing the algorithms "intersects" and "intersection"?
The code is very basic:
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/segment.hpp>
#include <boost/geometry/algorithms/intersection.hpp>
#include <vector>
typedef boost::geometry::model::point<float, 3, boost::geometry::cs::cartesian> testPoint;
typedef boost::geometry::model::segment<testPoint> testSegment;
int main() {
  testSegment s1(
    testPoint(-1.f, 0.f, 0.f),
    testPoint(1.f, 0.f, 0.f)
  );
  testSegment s2(
    testPoint(0.f, -1.f, 0.f),
    testPoint(0.f, 1.f, 0.f)
  );
  std::vector<testPoint> output;
  bool intersectionExists = boost::geometry::intersects(s1, s2, output); // this line fails to compile
  return intersectionExists ? 0 : 1;
}
But I got the following compile errors from Visual Studio:
- Error C2039 'apply' is not a member of 'boost::geometry::dispatch::disjoint<Geometry1,Geometry2,3,boost::geometry::segment_tag,boost::geometry::segment_tag,false>' CDCadwork C:\Program Files\Boost\boost_1_75_0\boost\geometry\algorithms\detail\disjoint\interface.hpp 54
- Error C2338 This operation is not or not yet implemented. CDCadwork C:\Program Files\Boost\boost_1_75_0\boost\geometry\algorithms\not_implemented.hpp 47
There are indeed two problems:
- You're intersecting 3D geometries. That's not implemented. Instead, you can do the same operation on a projection.
- You're passing an "output" geometry to intersects, which only returns a true/false value (as your chosen name intersectionExists suggests). In the presence of a third parameter, it would be used as a Strategy, a concept that output obviously doesn't satisfy.
Note that intersection always returns true (see the question "What does boost::geometry::intersection return"), although that's not part of the documented interface.
Since your geometries are trivially projected onto the 2D plane Z=0:
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/segment.hpp>
#include <iostream>
#include <vector>
namespace bg = boost::geometry;
namespace bgm = bg::model;
using Point = bgm::point<float, 2, bg::cs::cartesian>;
using Segment = bgm::segment<Point>;
int main() {
  Segment s1{{-1, 0}, {1, 0}};
  Segment s2{{0, -1}, {0, 1}};
  bool exists = bg::intersects(s1, s2);
  std::vector<Point> output;
  /*bool alwaysTrue = */ bg::intersection(s1, s2, output);
  std::cout << bg::wkt(s1) << "\n";
  std::cout << bg::wkt(s2) << "\n";
  for (auto& p : output) {
    std::cout << bg::wkt(p) << "\n";
  }
  return exists? 0:1;
}
LINESTRING(-1 0,1 0)
LINESTRING(0 -1,0 1)
POINT(0 0)
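Projecting onto Z=0 only answers the 2D question: two 3D segments whose XY projections cross can still miss each other in z. The following is a hand-rolled sketch in plain C++ (not Boost.Geometry; `intersect_via_xy`, `P3` and `Seg3` are hypothetical names) that projects, solves the 2D crossing parametrically, and then compares the interpolated z values. It assumes the projections are neither parallel nor degenerate (e.g. vertical segments) and simply reports those cases as "no intersection".

```cpp
#include <cmath>
#include <optional>

struct P3 { double x, y, z; };
struct Seg3 { P3 a, b; };

// Intersect two 3D segments by projecting onto the plane Z=0, solving the 2D problem,
// then checking that both segments have the same z at the crossing point.
// Parallel, collinear or degenerate projections are not handled (returns std::nullopt).
std::optional<P3> intersect_via_xy(const Seg3& s1, const Seg3& s2, double eps = 1e-9) {
  const double d1x = s1.b.x - s1.a.x, d1y = s1.b.y - s1.a.y;
  const double d2x = s2.b.x - s2.a.x, d2y = s2.b.y - s2.a.y;
  const double denom = d1x * d2y - d1y * d2x;  // 2D cross product of the two directions
  if (std::fabs(denom) < eps) return std::nullopt;
  const double ex = s2.a.x - s1.a.x, ey = s2.a.y - s1.a.y;
  const double t = (ex * d2y - ey * d2x) / denom;  // parameter along s1, in [0, 1] if on s1
  const double u = (ex * d1y - ey * d1x) / denom;  // parameter along s2, in [0, 1] if on s2
  if (t < -eps || t > 1 + eps || u < -eps || u > 1 + eps) return std::nullopt;
  const double z1 = s1.a.z + t * (s1.b.z - s1.a.z);
  const double z2 = s2.a.z + u * (s2.b.z - s2.a.z);
  if (std::fabs(z1 - z2) > eps) return std::nullopt;  // projections cross, the segments do not
  return P3{s1.a.x + t * d1x, s1.a.y + t * d1y, z1};
}
```

With the question's two segments this yields the point (0, 0, 0); lifting the second segment to z = 1 makes it return no intersection even though the projections still cross.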

Sorting multiple arrays using CUDA/Thrust

I have a large array that I need to sort on the GPU. The array itself is a concatenation of multiple smaller subarrays that satisfy the condition that, given i < j, the elements of subarray i are smaller than the elements of subarray j. An example of such an array would be {5, 3, 4, 2, 1, 6, 9, 8, 7, 10, 11},
where the elements of the first subarray of 5 elements are smaller than the elements of the second subarray of 6 elements. The array I need is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}. I know the position where each subarray starts in the large array.
I know I can simply use thrust::sort on the whole array, but I was wondering if it's possible to launch multiple concurrent sorts, one for each subarray. I'm hoping to get a performance improvement by doing that. My assumption is that it would be faster to sort multiple smaller arrays than one large array with all the elements.
I'd appreciate if someone could give me a way to do that or correct my assumption in case it's wrong.
One way to do multiple concurrent sorts (a "vectorized" sort) in thrust is to mark each element with the index of the sub-array it belongs to, and to provide a custom functor that orders elements by that sub-array key first and by value second.
Another possible method is to use back-to-back thrust::stable_sort_by_key as described here.
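The back-to-back stable sort idea can be shown on the host with `std::stable_sort` standing in for `thrust::stable_sort_by_key` (a sketch under that substitution; `segmented_sort_two_pass` is a hypothetical name). The first pass orders everything by value, the second stably regroups the elements by segment id, and stability preserves the value order inside each segment, so this works even if the segments' value ranges overlap.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side illustration (std::stable_sort, not Thrust) of the two-pass segmented sort.
std::vector<float> segmented_sort_two_pass(const std::vector<float>& values,
                                           const std::vector<int>& segment_ids) {
  std::vector<std::pair<int, float>> items;  // (segment id, value)
  items.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) items.emplace_back(segment_ids[i], values[i]);

  // Pass 1: order all elements by value (like stable_sort_by_key with the values as keys).
  std::stable_sort(items.begin(), items.end(),
                   [](const std::pair<int, float>& a, const std::pair<int, float>& b) {
                     return a.second < b.second;
                   });
  // Pass 2: stably regroup by segment id (like stable_sort_by_key with the ids as keys).
  std::stable_sort(items.begin(), items.end(),
                   [](const std::pair<int, float>& a, const std::pair<int, float>& b) {
                     return a.first < b.first;
                   });

  std::vector<float> out;
  out.reserve(items.size());
  for (const auto& it : items) out.push_back(it.second);
  return out;
}
```

For example, values {3, 1, 2, 0, 5} with segment ids {0, 0, 0, 1, 1} come out as {1, 2, 3, 0, 5}.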
As you have pointed out, another method in your case is just to do an ordinary sort, since that is ultimately your objective.
However, I think it's unlikely that any of the thrust sort methods will give a significant speed-up over a pure sort, although you can try it. Thrust has a fast-path radix sort which it will use in certain situations, which the pure sort method could probably use in your case. (In other cases, e.g. when you provide a custom functor, thrust will often use a slower merge-sort method.)
If the sizes of the sub arrays are within certain ranges, I think you're likely to get much better results (performance-wise) with block radix sort in cub, one block per sub-array.
Here is an example that uses specific sizes (since you've given no indication of size ranges and other details), comparing a thrust "pure sort" to a thrust segmented sort with functor, to the cub block sort method. For this particular case, the cub sort is fastest:
$ cat
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/equal.h>
#include <cstdlib>
#include <iostream>
#include <time.h>
#include <sys/time.h>
#define USECPSEC 1000000ULL
const int num_blocks = 2048;
const int items_per = 4;
const int nTPB = 512;
const int block_size = items_per*nTPB; // must be a whole-number multiple of nTPB;
typedef float mt;
unsigned long long dtime_usec(unsigned long long start){
  timeval tv;
  gettimeofday(&tv, 0);
  return ((tv.tv_sec*USECPSEC)+tv.tv_usec)-start;
}
struct my_sort_functor
{
  template <typename T, typename T2>
  __host__ __device__
  bool operator()(T t1, T2 t2){
    // order by sub-array key (tuple element 1) first ...
    if (thrust::get<1>(t1) < thrust::get<1>(t2)) return true;
    if (thrust::get<1>(t1) > thrust::get<1>(t2)) return false;
    // ... then by value; must be a strict "less than" so that equal elements compare false
    return thrust::get<0>(t1) < thrust::get<0>(t2);
  }
};
// from:
#define CUB_STDERR
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <cub/block/block_load.cuh>
#include <cub/block/block_store.cuh>
#include <cub/block/block_radix_sort.cuh>
using namespace cub;
// Globals, constants and typedefs
bool g_verbose = false;
bool g_uniform_keys;
// Kernels
template <
  typename Key,
  int BLOCK_THREADS,
  int ITEMS_PER_THREAD>
__launch_bounds__ (BLOCK_THREADS)
__global__ void BlockSortKernel(
  Key *d_in, // Tile of input
  Key *d_out) // Tile of output
{
  enum { TILE_SIZE = BLOCK_THREADS * ITEMS_PER_THREAD };
  // Specialize BlockLoad type for our thread block (uses warp-striped loads for coalescing, then transposes in shared memory to a blocked arrangement)
  typedef BlockLoad<Key, BLOCK_THREADS, ITEMS_PER_THREAD, BLOCK_LOAD_WARP_TRANSPOSE> BlockLoadT;
  // Specialize BlockRadixSort type for our thread block
  typedef BlockRadixSort<Key, BLOCK_THREADS, ITEMS_PER_THREAD> BlockRadixSortT;
  // Shared memory
  __shared__ union TempStorage
  {
    typename BlockLoadT::TempStorage load;
    typename BlockRadixSortT::TempStorage sort;
  } temp_storage;
  // Per-thread tile items
  Key items[ITEMS_PER_THREAD];
  // Our current block's offset
  int block_offset = blockIdx.x * TILE_SIZE;
  // Load items into a blocked arrangement
  BlockLoadT(temp_storage.load).Load(d_in + block_offset, items);
  // Barrier for smem reuse
  __syncthreads();
  // Sort keys
  BlockRadixSortT(temp_storage.sort).SortBlockedToStriped(items);
  // Store output in striped fashion
  StoreDirectStriped<BLOCK_THREADS>(threadIdx.x, d_out + block_offset, items);
}
int main(){
  const int ds = num_blocks*block_size;
  thrust::host_vector<mt> data(ds);
  thrust::host_vector<int> keys(ds);
  for (int i = block_size; i < ds; i+=block_size) keys[i] = 1; // mark beginning of blocks
  thrust::device_vector<int> d_keys = keys;
  for (int i = 0; i < ds; i++) data[i] = (rand()%block_size) + (i/block_size)*block_size; // populate data
  thrust::device_vector<mt> d_data = data;
  thrust::inclusive_scan(d_keys.begin(), d_keys.end(), d_keys.begin()); // fill out keys array 000111222...
  thrust::device_vector<mt> d1 = d_data; // make a copy of unsorted data
  unsigned long long os = dtime_usec(0);
  thrust::sort(d1.begin(), d1.end()); // ordinary sort
  os = dtime_usec(os);
  thrust::device_vector<mt> d2 = d_data; // make a copy of unsorted data
  unsigned long long ss = dtime_usec(0);
  thrust::sort(thrust::make_zip_iterator(thrust::make_tuple(d2.begin(), d_keys.begin())), thrust::make_zip_iterator(thrust::make_tuple(d2.end(), d_keys.end())), my_sort_functor());
  ss = dtime_usec(ss);
  if (!thrust::equal(d1.begin(), d1.end(), d2.begin())) {std::cout << "oops1" << std::endl; return 0;}
  std::cout << "ordinary thrust sort: " << os/(float)USECPSEC << "s " << "segmented sort: " << ss/(float)USECPSEC << "s" << std::endl;
  thrust::device_vector<mt> d3(ds);
  unsigned long long cs = dtime_usec(0);
  BlockSortKernel<mt, nTPB, items_per><<<num_blocks, nTPB>>>(thrust::raw_pointer_cast(d_data.data()), thrust::raw_pointer_cast(d3.data()));
  cudaDeviceSynchronize(); // kernel launches are asynchronous; wait before stopping the timer
  cs = dtime_usec(cs);
  if (!thrust::equal(d1.begin(), d1.end(), d3.begin())) {std::cout << "oops2" << std::endl; return 0;}
  std::cout << "cub sort: " << cs/(float)USECPSEC << "s" << std::endl;
}
$ nvcc -o t1
$ ./t1
ordinary thrust sort: 0.001652s segmented sort: 0.00263s
cub sort: 0.000265s
(CUDA 10.2.89, Tesla V100, Ubuntu 18.04)
I have no doubt that your sizes and array dimensions don't correspond to mine. The purpose here is to illustrate some possible methods, not a black-box solution that works for your particular case. You probably should do benchmark comparisons of your own. I also acknowledge that the block radix sort method for cub expects equal-sized sub-arrays, which you may not have. It may not be a suitable method for you, or you may wish to explore some kind of padding arrangement. There's no need to ask this question of me; I won't be able to answer it based on the information in your question.
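One possible padding arrangement for unequal sub-arrays is to give every sub-array a fixed-size tile and fill the unused slots with +infinity, so that the padding sorts to the end of each tile and can be dropped afterwards. The sketch below only shows that data layout on the host: `padded_segmented_sort` is a hypothetical helper, `std::sort` on each tile stands in for a one-block-per-tile cub::BlockRadixSort, and it assumes every sub-array fits in one tile and that the data contains no NaNs.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Sort each segment of `data` independently via fixed-size, +infinity-padded tiles.
// `offsets` holds the start index of each segment (the first entry is 0).
std::vector<float> padded_segmented_sort(const std::vector<float>& data,
                                         const std::vector<std::size_t>& offsets,
                                         std::size_t tile) {
  const float pad = std::numeric_limits<float>::infinity();  // sorts after every finite key
  const std::size_t nseg = offsets.size();
  auto seg_end = [&](std::size_t s) { return (s + 1 < nseg) ? offsets[s + 1] : data.size(); };

  // Copy each segment to the front of its own tile; the rest of the tile stays padding.
  std::vector<float> tiles(nseg * tile, pad);
  for (std::size_t s = 0; s < nseg; ++s) {
    const std::size_t first = offsets[s], last = seg_end(s);
    if (last - first > tile) throw std::invalid_argument("segment larger than tile");
    std::copy(data.begin() + first, data.begin() + last, tiles.begin() + s * tile);
  }
  // One independent sort per tile (a GPU version would use one thread block per tile).
  for (std::size_t s = 0; s < nseg; ++s)
    std::sort(tiles.begin() + s * tile, tiles.begin() + (s + 1) * tile);

  // Keep only the first len entries of each tile, dropping the padding.
  std::vector<float> out;
  out.reserve(data.size());
  for (std::size_t s = 0; s < nseg; ++s) {
    const std::size_t len = seg_end(s) - offsets[s];
    out.insert(out.end(), tiles.begin() + s * tile, tiles.begin() + s * tile + len);
  }
  return out;
}
```

The cost of this arrangement is the wasted work on padding slots, so whether it pays off depends on how uneven the sub-array sizes are.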
I don't claim correctness for this code or any other code that I post. Anyone using any code I post does so at their own risk. I merely claim that I have attempted to address the questions in the original posting, and provide some explanation thereof. I am not claiming my code is defect-free, or that it is suitable for any particular purpose. Use it (or not) at your own risk.

Insert into host_vector using thrust

I'm trying to insert one value into the third location in a host_vector using thrust.
static thrust::host_vector <int *> bins;
int * p;
bins.insert(3, 1, p);
But I am getting errors:
error: no instance of overloaded function "thrust::host_vector<T, Alloc>::insert [with T=int *, Alloc=std::allocator<int *>]" matches the argument list
argument types are: (int, int, int *)
object type is: thrust::host_vector<int *, std::allocator<int *>>
Has anyone seen this before, and how can I solve this? I want to use a vector to pass information into the GPU. I was originally trying to use a vector of vectors to represent spatial cells that hold different numbers of data, but learned that wasn't possible with thrust. So instead, I'm using a vector bins that holds my data, sorted by the spatial cell (first 3 values might correspond to the first cell, the next 2 to the second cell, the next 0 to the third cell, etc.). The values held are pointers to particles, and represent the numbers of particles in the spatial cell (which is not known before runtime).
As noted in the comments, thrust::host_vector is modelled directly on std::vector, and the insert overload you are trying to use requires an iterator for the position argument, which is why you get a compilation error. You can see this if you consult the relevant documentation.
A complete working example of the code snippet you showed would look like this:
#include <iostream>
#include <thrust/host_vector.h>
int main()
{
  thrust::host_vector <int *> bins(10, reinterpret_cast<int *>(0));
  int * p = reinterpret_cast<int *>(0xdeadbeef);
  bins.insert(bins.begin()+3, 1, p);
  auto it = bins.begin();
  for(int i=0; it != bins.end(); ++it, i++) {
    int* v = *it;
    std::cout << i << " " << v << std::endl;
  }
  return 0;
}
Note that this requires that C++11 language features are enabled in nvcc (so use CUDA 8.0):
~/SO$ nvcc -std=c++11 -arch=sm_52
~/SO$ ./a.out
0 0
1 0
2 0
3 0xdeadbeef
4 0
5 0
6 0
7 0
8 0
9 0
10 0
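The flattened layout described in the question (one bins vector grouped by spatial cell) is commonly paired with an offsets array, so that cell c occupies bins[offsets[c]] up to, but not including, bins[offsets[c + 1]]. Below is a minimal host-side sketch of building it with a count, an exclusive scan and a scatter; `FlatBins`, `build_bins` and the precomputed `cell_of` input are hypothetical, and item indices stand in for the particle pointers. On the device, the scan step has a direct Thrust counterpart in thrust::exclusive_scan.

```cpp
#include <cstddef>
#include <vector>

// Flattened "vector of vectors": cell c owns items[offsets[c] .. offsets[c + 1]).
struct FlatBins {
  std::vector<std::size_t> offsets;  // size num_cells + 1
  std::vector<int> items;            // item indices grouped by cell
};

// cell_of[i] is the cell of item i; assumes 0 <= cell_of[i] < num_cells.
FlatBins build_bins(const std::vector<int>& cell_of, std::size_t num_cells) {
  FlatBins b;
  b.offsets.assign(num_cells + 1, 0);
  for (int c : cell_of) ++b.offsets[c + 1];       // count items per cell (shifted by one)
  for (std::size_t c = 0; c < num_cells; ++c)      // prefix sum -> start index of each cell
    b.offsets[c + 1] += b.offsets[c];
  b.items.resize(cell_of.size());
  std::vector<std::size_t> cursor(b.offsets.begin(), b.offsets.end() - 1);
  for (std::size_t i = 0; i < cell_of.size(); ++i) // scatter each item into its cell's next slot
    b.items[cursor[cell_of[i]]++] = static_cast<int>(i);
  return b;
}
```

For cell_of = {2, 0, 2, 1, 0} and 4 cells, offsets become {0, 2, 3, 5, 5} (cell 3 is empty) and items become {1, 4, 3, 0, 2}.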

How to partly sort arrays on CUDA?

Provided I have two arrays:
const int N = 1000000;
float A[N];
myStruct *B[N];
The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}). How can I make the array A partly sorted on the GPU, i.e. all positive values first (they do not need to be sorted), then the negative values (e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1})? The array B should be reordered according to A (A holds the keys, B the values).
Since A and B can be very large, I think the algorithm should be implemented on the GPU (specifically with CUDA, because that is the platform I use). I know thrust::sort_by_key can do this work, but it does much extra work, since I do not need the arrays A and B to be sorted entirely.
Has anyone come across this kind of problem?
Thrust example
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
thrust::greater<float>() );
Thrust's documentation on GitHub is not up to date. As @JaredHoberock said, thrust::partition is the way to go, since it now supports stencils. You may need to get a copy from the GitHub repository:
git clone git://
Then run scons doc in the Thrust folder to get an updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...). With the new stencil partition, you can do:
#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
  __host__ __device__
  bool operator()(const int &x) const
  {
    return x >= 0;
  }
};
// keyStencil: a separate copy of keyVec; the stencil range must not overlap the partitioned range
thrust::partition(thrust::host, // if you want to test on the host
                  thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
                  thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
                  keyStencil.begin(), // the predicate is evaluated on these keys
                  is_positive());
This returns:
keyVec = 0 -1 2 -3 4 -5 6 -7 8 -9
valVec = 0 1 2 3 4 5 6 7 8 9
keyVec = 0 2 4 6 8 -5 -3 -7 -1 -9
valVec = 0 2 4 6 8 5 3 7 1 9
Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition:
stable_partition differs from partition in that stable_partition is
guaranteed to preserve relative order. That is, if x and y are
elements in [first, last), such that pred(x) == pred(y), and if x
precedes y, then it will still be true after stable_partition that x
precedes y.
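To see the difference stable_partition makes, here is a host-side analogue using std::stable_partition (standard library, not Thrust; `stable_partition_by_key_sign` is a hypothetical name). Keeping each key next to its value in a pair plays the role of the zip iterator.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Non-negative keys first, negative keys last; values follow their keys and the
// relative order inside each group is preserved.
void stable_partition_by_key_sign(std::vector<int>& keys, std::vector<int>& vals) {
  std::vector<std::pair<int, int>> kv;
  kv.reserve(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) kv.emplace_back(keys[i], vals[i]);
  std::stable_partition(kv.begin(), kv.end(),
                        [](const std::pair<int, int>& p) { return p.first >= 0; });
  for (std::size_t i = 0; i < kv.size(); ++i) {
    keys[i] = kv[i].first;
    vals[i] = kv[i].second;
  }
}
```

With keys 0 -1 2 -3 4 -5 6 -7 8 -9 and values 0 to 9, the keys become 0 2 4 6 8 -1 -3 -5 -7 -9 and the values 0 2 4 6 8 1 3 5 7 9.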
If you want a complete example, here it is:
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
  __host__ __device__
  bool operator()(const int &x) const
  {
    return x >= 0;
  }
};
void print_vec(const thrust::host_vector<int>& v)
{
  for(size_t i = 0; i < v.size(); i++)
    std::cout << " " << v[i];
  std::cout << "\n";
}
int main ()
{
  const int N = 10;
  thrust::host_vector<int> keyVec(N);
  thrust::host_vector<int> valVec(N);
  int sign = 1;
  for(int i = 0; i < N; ++i)
  {
    keyVec[i] = sign * i;
    valVec[i] = i;
    sign *= -1;
  }
  // Copy host to device
  thrust::device_vector<int> d_keyVec = keyVec;
  thrust::device_vector<int> d_valVec = valVec;
  // Separate copy of the keys used as the stencil (it must not overlap the partitioned range)
  thrust::device_vector<int> d_stencil = d_keyVec;
  std::cout << "Before:\n keyVec = ";
  print_vec(keyVec);
  std::cout << " valVec = ";
  print_vec(valVec);
  // Partition key-val on device
  thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
                    thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
                    d_stencil.begin(),
                    is_positive());
  // Copy result back to host
  keyVec = d_keyVec;
  valVec = d_valVec;
  std::cout << "After:\n keyVec = ";
  print_vec(keyVec);
  std::cout << " valVec = ";
  print_vec(valVec);
  return 0;
}
I made a quick comparison with the thrust::sort_by_key version, and the thrust::partition implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.
How about this?
1. Count how many positive numbers there are, to determine the inflexion point.
2. Evenly divide each side of the inflexion point into groups (the negative groups all have the same length, which differs from the length of the positive groups; these groups are the memory chunks for the results).
3. Use one kernel call (one thread) per chunk pair.
4. Each kernel swaps any out-of-place elements in the input groups into the desired output groups. You will need to flag any chunks that have more swaps than the maximum so that you can fix them during subsequent iterations.
5. Repeat until done.
Memory traffic is swaps only (from the original element position to the sorted position). I don't know if this algorithm resembles anything already defined...
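The counting-and-swapping idea can be illustrated sequentially (a sketch, not a GPU kernel; `partition_by_swaps` is a hypothetical name). After counting the k non-negative keys, the number of negatives in [0, k) always equals the number of non-negatives in [k, n), so pairing them up in order and swapping moves each misplaced element exactly once. A parallel version would split this pairing across chunks as described above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Move all non-negative keys to the front by swapping misplaced pairs; values move with keys.
template <typename V>
void partition_by_swaps(std::vector<float>& keys, std::vector<V>& vals) {
  const std::size_t n = keys.size();
  std::size_t k = 0;
  for (float x : keys)
    if (x >= 0.0f) ++k;  // split point: the first k slots must hold the non-negative keys
  std::size_t left = 0, right = k;
  while (true) {
    while (left < k && keys[left] >= 0.0f) ++left;    // next negative key in the front part
    while (right < n && keys[right] < 0.0f) ++right;  // next non-negative key in the back part
    if (left >= k || right >= n) break;               // both scans run out at the same time
    std::swap(keys[left], keys[right]);
    std::swap(vals[left], vals[right]);
    ++left;
    ++right;
  }
}
```

On the question's example A = {3, 2, -1, 0, 5, -2}, only -1 and 5 are swapped, giving {3, 2, 5, 0, -1, -2}.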
You should be able to achieve this in thrust simply with a modification of your comparison operator:
struct my_compare
{
  // Strict weak ordering with two classes: every non-negative key sorts before every negative key
  __device__ __host__ bool operator()(const float x, const float y) const
  {
    return (x >= 0.0f) && (y < 0.0f);
  }
};
thrust::sort_by_key(thrust::device_ptr<float> (A),
                    thrust::device_ptr<float> ( A + N ),
                    thrust::device_ptr<myStruct> ( B ),
                    my_compare() );
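Thrust's sort expects the comparator to be a strict weak ordering (in particular comp(x, x) must be false). A two-class comparator satisfies this: all non-negative keys are equivalent to each other, and so are all negative keys. Below is a host-side check with std::stable_sort, which has the same requirement; `sign_first` and `is_sign_partitioned` are hypothetical names. Note that this is still a full comparison sort, so it is unlikely to beat thrust::partition.

```cpp
#include <algorithm>
#include <vector>

// Two-class comparator: non-negative keys come before negative keys, nothing else is ordered.
struct sign_first {
  bool operator()(float x, float y) const { return x >= 0.0f && y < 0.0f; }
};

// True if every non-negative key precedes every negative key.
bool is_sign_partitioned(const std::vector<float>& v) {
  return std::is_partitioned(v.begin(), v.end(), [](float x) { return x >= 0.0f; });
}
```

Because std::stable_sort keeps equivalent elements in their original order, sorting {3, 2, -1, 0, 5, -2} with sign_first gives {3, 2, 0, 5, -1, -2}.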
