std::map operator< pointer address compare vs pointer value compare - c++11

I was investigating how std::map handles custom types and came across some odd behavior.
I created a custom type, ComplexType, with one member: a pointer to an int.
I first compared keys by the value of this int, which gave the expected behavior.
#include <iostream>
#include <map>
struct ComplexType
{
ComplexType(int i): index(new int(i)){
};
ComplexType(const ComplexType& cT): index(new int(*cT.index)){
}
~ComplexType(){
if(index){
delete index;
}
}
bool operator<(const ComplexType cT) const
{
return *index < *cT.index;
}
int* index;
};
int main(){
int pi[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 8};
std::map< ComplexType , int > container;
for(int i = 0; i < 12; ++i){
container[ComplexType(i)] = pi[i];
}
std::cout << "Loop map, size: " << container.size() << std::endl;
for(auto it = container.begin();it != container.end(); it++){
std::cout << "Show index map, size: " << container.size() << std::endl;
std::cout << *it->first.index << std::endl;
}
return 0;
}
With the output:
Loop map, size: 12
Show index map, size: 12
0
Show index map, size: 12
1
Show index map, size: 12
2
Show index map, size: 12
3
Show index map, size: 12
4
Show index map, size: 12
5
Show index map, size: 12
6
Show index map, size: 12
7
Show index map, size: 12
8
Show index map, size: 12
9
Show index map, size: 12
10
Show index map, size: 12
11
I then changed my compare function to compare the pointer addresses instead of the pointed-to values.
#include <iostream>
#include <map>
struct ComplexType
{
ComplexType(int i): index(new int(i)){
};
ComplexType(const ComplexType& cT): index(new int(*cT.index)){
}
~ComplexType(){
if(index){
delete index;
}
}
bool operator<(const ComplexType cT) const
{
return index < cT.index;
}
int* index;
};
int main(){
int pi[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 8};
std::map< ComplexType , int > container;
for(int i = 0; i < 12; ++i){
container[ComplexType(i)] = pi[i];
}
std::cout << "Loop map, size: " << container.size() << std::endl;
for(auto it = container.begin();it != container.end(); it++){
std::cout << "Show index map, size: " << container.size() << std::endl;
std::cout << *it->first.index << std::endl;
}
return 0;
}
I expected this to result in a random order based on what addresses the pointer got on the heap. Instead I got the following:
Loop map, size: 12
Show index map, size: 12
1
Show index map, size: 12
0
I compiled using g++ (GCC) 5.3.0
\randomness map>g++ -std=c++11 -o mapConstructionComplexType mapConstructionComplexType.cpp
\randomness map>g++ -std=c++11 -o mapConstructionComplexTypePointerCmp mapConstructionComplexTypePointerCmp.cpp
Can anyone explain this odd behavior?
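One likely explanation, offered as a sketch rather than a definitive diagnosis: operator< takes its argument by value, so every comparison copy-constructs cT, and the copy constructor allocates a fresh int. The address comparison is then made against a temporary allocation rather than against the stored key, so the result is not consistent between calls. That breaks the strict weak ordering std::map requires, and the behavior is undefined. The variant below takes the argument by const reference and compares with std::less<const int*>, which provides a total order over pointers. The names AddressKey and values_in_map_order are hypothetical, introduced only for illustration.

```cpp
#include <functional>
#include <map>
#include <vector>

// Same shape as ComplexType in the question, but compared by reference.
struct AddressKey
{
    explicit AddressKey(int i) : index(new int(i)) {}
    AddressKey(const AddressKey& other) : index(new int(*other.index)) {}
    AddressKey& operator=(const AddressKey&) = delete; // not needed for this sketch
    ~AddressKey() { delete index; }

    // const reference: no copy is made, so the stored pointer itself is compared.
    // std::less yields a total order even for pointers to unrelated objects.
    bool operator<(const AddressKey& other) const
    {
        return std::less<const int*>()(index, other.index);
    }

    int* index;
};

// Inserts keys 0..count-1 and returns the pointed-to values in map order.
std::vector<int> values_in_map_order(int count)
{
    std::map<AddressKey, int> container;
    for (int i = 0; i < count; ++i) {
        container[AddressKey(i)] = i;
    }
    std::vector<int> values;
    for (const auto& entry : container) {
        values.push_back(*entry.first.index);
    }
    return values;
}
```

Calling values_in_map_order(12) visits all 12 keys. Their order follows the heap addresses, so it need not match the insertion order.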

Related

hash function for a 64-bit OS/compile, for an object that's really just a 4-byte int

I have a class named Foo that is, privately, nothing more than a 4-byte int. If I return its value as an 8-byte size_t, am I going to be screwing up unordered_map<> or anything else? I could fill all bits with something like return foo + foo << 32;. Would that be better, or would it be worse, as all hashes are now multiples of 0x100000001? Or how about return ~foo + foo << 32;, which would use all 64 bits and also not have a common factor?
namespace std {
template<> struct hash<MyNamespace::Foo> {
typedef size_t result_type;
typedef MyNamespace::Foo argument_type;
size_t operator() (const MyNamespace::Foo& f ) const { return (size_t) f.u32InternalValue; }
};
}
An incremental uint32_t key converted to uint64_t works well.
unordered_map grows its hash table incrementally as elements are added.
The least significant bits of the hash determine the bucket position. For example, with 4 buckets, the 2 least significant bits are used.
Elements whose hashes map to the same bucket (hashes that differ by a multiple of the number of buckets) are chained in a linked list. This is where the concept of load factor comes from.
// 4 Buckets example
******** ******** ******** ******** ******** ******** ******** ******XX
bucket 00 would contain keys like {0, 256, 200000 ...}
bucket 01 would contain keys like {1, 513, 4008001 ...}
bucket 10 would contain keys like {2, 130, 10002 ...}
bucket 11 would contain keys like {3, 259, 1027, 20003, ...}
If you try to store an additional value and the load factor goes over the limit, the table is resized (e.g. you try to store a 5th element in a 4-bucket table with a maximum load factor of 1.0).
Consequently:
Having a uint32_t or a uint64_t key will have little impact until the hash table reaches 2^32 elements.
"Would that be better, or would it be worse as all hashes are now multiples of 0x100000001?"
This will have no impact until the hash table grows past the 32-bit range (2^32 entries).
Good key conversion between incremental uint32_t and uint64_t:
key64 = static_cast<uint64_t>(key32);
Bad key conversion between incremental uint32_t and uint64_t:
key64 = static_cast<uint64_t>(key32)<<32;
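To make the difference between the two conversions concrete, here is a small sketch. It assumes the table picks a bucket as hash % bucket_count with a power-of-two bucket count, as in the 4-bucket illustration above. Real implementations may choose buckets differently, and bucket_histogram is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how many of the keys 0..key_count-1 land in each bucket, assuming
// bucket = hash % bucket_count (an assumption, not a library guarantee).
// If shift_high is true, the key is moved into the upper 32 bits first.
std::vector<std::size_t> bucket_histogram(std::uint32_t key_count,
                                          std::uint64_t bucket_count,
                                          bool shift_high)
{
    std::vector<std::size_t> histogram(bucket_count, 0);
    for (std::uint32_t key32 = 0; key32 < key_count; ++key32) {
        std::uint64_t hash = static_cast<std::uint64_t>(key32);
        if (shift_high) {
            hash <<= 32; // the low 32 bits are now all zero
        }
        ++histogram[hash % bucket_count];
    }
    return histogram;
}
```

With 8 keys and 4 buckets, the plain conversion puts 2 keys in each bucket, while the shifted conversion puts all 8 keys in bucket 0, because only the low bits select the bucket under this assumption.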
It is best to keep the hashes as evenly distributed as possible and to avoid hashes that share a common factor. E.g. in the code below, keys that are all multiples of 7 collide in a 7-bucket table until it is resized (to 17 buckets with this implementation).
https://onlinegdb.com/r1N7TNySv
#include <iostream>
#include <unordered_map>
using namespace std;
// Print to std output the internal structure of an unordered_map.
template <typename K, typename T>
void printMapStruct(unordered_map<K, T>& map)
{
cout << "The map has " << map.bucket_count()<<
" buckets and max load factor: " << map.max_load_factor() << endl;
for (size_t i=0; i< map.bucket_count(); ++i)
{
cout << " Bucket " << i << ": ";
for (auto it=map.begin(i); it!=map.end(i); ++it)
{
cout << it->first << " ";
}
cout << endl;
}
cout << endl;
}
// Print the list of bucket sizes by this implementation
void printMapResizes()
{
cout << "Map bucket counts:"<< endl;
unordered_map<size_t, size_t> map;
size_t lastBucketSize=0;
for (size_t i=0; i<1024*1024; ++i)
{
if (lastBucketSize!=map.bucket_count())
{
cout << map.bucket_count() << " ";
lastBucketSize = map.bucket_count();
}
map.emplace(i,i);
}
cout << endl;
}
int main()
{
unordered_map<size_t,size_t> map;
printMapStruct(map);
map.emplace(0,0);
map.emplace(1,1);
printMapStruct(map);
map.emplace(72,72);
map.emplace(17,17);
printMapStruct(map);
map.emplace(7,7);
map.emplace(14,14);
printMapStruct(map);
printMapResizes();
return 0;
}
A note on the bucket count:
In the above example, the bucket counts are as follows:
1 3 7 17 37 79 167 337 709 1493 3209 6427 12983 26267 53201 107897 218971 444487 902483 1832561
This seems to purposely follow a series of prime numbers (minimizing collisions). I am not aware of the function behind it.
See also: std::unordered_map<> bucket_count() after default rehash

How to make a strided chunk iterator with Thrust (CUDA)

I need an iterator class like this one:
https://github.com/thrust/thrust/blob/master/examples/strided_range.cu
but the new iterator should produce the following sequence:
[k * size_stride, k * size_stride+1, ...,k * size_stride+size_chunk-1,...]
with
k = 0,1,...,N
Example:
size_stride = 8
size_chunk = 3
N = 3
then the sequence is
[0,1,2,8,9,10,16,17,18,24,25,26]
I don't know how to do this efficiently...
The strided range iterator is basically a carefully crafted permutation iterator with a functor that gives the appropriate indices for the permutation.
Here is a modification of the strided range iterator example. The main changes were:
- include the chunk size as an iterator parameter
- modify the functor that provides the indices for the permutation iterator to produce the desired sequence
- adjust the definition of the .end() iterator to provide the appropriate sequence length
Worked example:
$ cat t1280.cu
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/functional.h>
#include <thrust/fill.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sequence.h>
#include <iostream>
#include <iterator> // std::ostream_iterator
#include <assert.h>
// this example illustrates how to make strided-chunk access to a range of values
// examples:
// strided_chunk_range([0, 1, 2, 3, 4, 5, 6], 1,1) -> [0, 1, 2, 3, 4, 5, 6]
// strided_chunk_range([0, 1, 2, 3, 4, 5, 6], 2,1) -> [0, 2, 4, 6]
// strided_chunk_range([0, 1, 2, 3, 4, 5, 6], 3,2) -> [0 ,1, 3, 4, 6]
// ...
template <typename Iterator>
class strided_chunk_range
{
public:
typedef typename thrust::iterator_difference<Iterator>::type difference_type;
struct stride_functor : public thrust::unary_function<difference_type,difference_type>
{
difference_type stride;
int chunk;
stride_functor(difference_type stride, int chunk)
: stride(stride), chunk(chunk) {}
__host__ __device__
difference_type operator()(const difference_type& i) const
{
int pos = i/chunk;
return ((pos * stride) + (i-(pos*chunk)));
}
};
typedef typename thrust::counting_iterator<difference_type> CountingIterator;
typedef typename thrust::transform_iterator<stride_functor, CountingIterator> TransformIterator;
typedef typename thrust::permutation_iterator<Iterator,TransformIterator> PermutationIterator;
// type of the strided_range iterator
typedef PermutationIterator iterator;
// construct strided_range for the range [first,last)
strided_chunk_range(Iterator first, Iterator last, difference_type stride, int chunk)
: first(first), last(last), stride(stride), chunk(chunk) {assert(chunk<=stride);}
iterator begin(void) const
{
return PermutationIterator(first, TransformIterator(CountingIterator(0), stride_functor(stride, chunk)));
}
iterator end(void) const
{
int lmf = last-first;
int nfs = lmf/stride;
int rem = lmf-(nfs*stride);
return begin() + (nfs*chunk) + ((rem<chunk)?rem:chunk);
}
protected:
Iterator first;
Iterator last;
difference_type stride;
int chunk;
};
int main(void)
{
thrust::device_vector<int> data(50);
thrust::sequence(data.begin(), data.end());
typedef thrust::device_vector<int>::iterator Iterator;
// create strided_chunk_range
std::cout << "stride 3, chunk 2, length 7" << std::endl;
strided_chunk_range<Iterator> scr1(data.begin(), data.begin()+7, 3, 2);
thrust::copy(scr1.begin(), scr1.end(), std::ostream_iterator<int>(std::cout, " ")); std::cout << std::endl;
std::cout << "stride 8, chunk 3, length 50" << std::endl;
strided_chunk_range<Iterator> scr(data.begin(), data.end(), 8, 3);
thrust::copy(scr.begin(), scr.end(), std::ostream_iterator<int>(std::cout, " ")); std::cout << std::endl;
return 0;
}
$ nvcc -arch=sm_35 -o t1280 t1280.cu
$ ./t1280
stride 3, chunk 2, length 7
0 1 3 4 6
stride 8, chunk 3, length 50
0 1 2 8 9 10 16 17 18 24 25 26 32 33 34 40 41 42 48 49
$
This is probably not an optimal implementation, in particular because the permutation functor performs a division, but it should get you started.
I assume (and test for) chunk<=stride, because this seemed reasonable to me and simplified my thought process. I'm sure it could be modified for the case where chunk>stride, given an appropriate example of the sequence you would like to see.
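The index arithmetic of stride_functor and end() can be checked on the host without Thrust. The following is a minimal sketch under the same chunk<=stride assumption. strided_chunk_indices is a hypothetical helper name, and it returns the source indices rather than an iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only sketch of the index arithmetic used by stride_functor and end().
// Returns the source indices visited for a range of `length` elements.
std::vector<std::size_t> strided_chunk_indices(std::size_t length,
                                               std::size_t stride,
                                               std::size_t chunk)
{
    assert(chunk >= 1 && chunk <= stride);
    // Output length: every full stride contributes `chunk` elements,
    // the trailing partial stride contributes at most `chunk`.
    const std::size_t full = length / stride;
    const std::size_t rem = length % stride;
    const std::size_t count = full * chunk + (rem < chunk ? rem : chunk);

    std::vector<std::size_t> indices;
    indices.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t pos = i / chunk;           // which chunk we are in
        indices.push_back(pos * stride + i % chunk); // offset inside that chunk
    }
    return indices;
}
```

For length 7, stride 3 and chunk 2 this yields 0 1 3 4 6, matching the first line of the program output above.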

Conversion of data type using auto in C++

I have 2 vector containers, each holding a different kind of value of type uint32_t. I want to print both of them together.
This is what I have:
vector<uint32_t> data1;
vector<uint32_t> data2;
I know how to do this for a single vector, like below:
for(auto const& d1: data1)
cout<< d1 << endl;
But I want to print both data together like this,
cout<< d1 << "\t" << d2 << endl;
How can I do this using auto? (where d2 is auto converted value from data2)
You could use a normal for loop over the index (where n is the common size of both vectors):
for (auto i = 0u; i != n; ++i)
std::cout << data1[i] << "\t" << data2[i] << "\n";
Edit: if you want to convert the uint32_t to an int, for example, you could do:
auto d1 = static_cast<int>(data1[i]);
but it is up to you to ensure the conversion is safe, i.e. that the value fits in the target type.
Use the Boost Zip Iterator, which will let you have a range of pairs rather than two ranges of the vectors' data types. Something along the lines of:
#include <boost/iterator/zip_iterator.hpp>
#include <boost/range.hpp>
#include <stdint.h>
#include <vector>
#include <iostream>
template <typename... TContainer>
auto zip(TContainer&... containers) -> boost::iterator_range<boost::zip_iterator<decltype(boost::make_tuple(std::begin(containers)...))>> {
auto zip_begin = boost::make_zip_iterator(boost::make_tuple(std::begin(containers)...));
auto zip_end = boost::make_zip_iterator(boost::make_tuple(std::end(containers)...));
return boost::make_iterator_range(zip_begin, zip_end);
}
int main()
{
std::vector<uint32_t> data1( { 11, 22, 33 } );
std::vector<uint32_t> data2( { 44, 55, 66 } );
for (auto t : zip(data1, data2)) {
std::cout << boost::get<0>(t) << "\t" << boost::get<1>(t) << "\n";
}
}
The zip() function is due to this question and you can put it in a separate header file since it's not specific to your case.
If possible (and plausible for your use case): work with a container of pairs.
If your application is not constrained in terms of memory or CPU time, and you know that you will be working with the values of your two containers as pairs (assuming same-length containers, as in your example), it might be useful to actually work with a container of pairs. This also makes it easy to use range-based for loops (C++11 and later).
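#include <algorithm>
#include <cstdint>  // uint32_t
#include <iostream>
#include <iterator> // std::back_inserter
#include <utility>  // std::pair, std::make_pair
#include <vector>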
int main()
{
std::vector<uint32_t> data1 = {1, 2, 3};
std::vector<uint32_t> data2 = {4, 5, 6};
// construct container of (int, int) pairs
std::vector<std::pair<int, int>> data;
data.reserve(data1.size());
std::transform(data1.begin(), data1.end(), data2.begin(), std::back_inserter(data),
[](uint32_t first, uint32_t second) {
return std::make_pair(static_cast<int>(first), static_cast<int>(second));
}); /* as noted in accepted answer: you're responsible for
ensuring that the conversion here is safe */
// easily use range-based for loops to traverse the
// pairs of your container
for(const auto& pair: data) {
std::cout << pair.first << " " << pair.second << "\n";
} /* 1 4
2 5
3 6 */
return 0;
}

How do I find the maximum non-repeating number in an integer array?

Suppose I have an unsorted integer array {3, -1, 4, 5, -3, 2, 5}, and I want to find the maximum non-repeating number (4 in this case) (5 being invalid as it is repeated). How can I achieve this?
Use an unordered map to count the frequency of each element. Then scan the map to find the largest element with a frequency of exactly 1. (Note that you cannot safely skip elements smaller than the largest one seen so far while counting: in {5, 4, 5} the 5 later turns out to be repeated, and the answer is 4.)
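#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>
template <typename T> // numeric T
std::pair<T, bool> FindMaxNonRepeating(std::vector<T> const& vec) {
    // count how often each element occurs
    std::unordered_map<T, int> elem2freq;
    for (auto const& elem : vec) {
        elem2freq[elem] += 1;
    }
    // lowest() rather than min(): for floating-point T, min() is the smallest positive value
    T largest_non_repetitive = std::numeric_limits<T>::lowest();
    bool found = false;
    for (auto const& item : elem2freq) {
        // !found lets a unique element equal to lowest() still be reported
        if (item.second == 1 && (!found || item.first > largest_non_repetitive)) {
            largest_non_repetitive = item.first;
            found = true;
        }
    }
    // the bool is false when every element is repeated (or the input is empty)
    return {largest_non_repetitive, found};
}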
This runs in O(n) time on average and requires O(n) extra space.
1. Sort the array in descending order.
2. Begin from the top element and store it in a variable, say max.
3. Compare the next element with max. If they are equal, max is repeated: skip all copies of it and continue with the next distinct element as the new max. Otherwise, max is the maximum non-repeated number.
Time complexity: O(n log n)
c++ implementation, based on my Sort (C++):
#include <algorithm>
#include <iostream>
#include <vector>
#include <limits>
#include <cstddef>
using namespace std;
void printVector(vector<int>& v)
{
for(vector<int>::iterator it = v.begin() ; it != v.end() ; it++)
cout << *it << ' ';
cout << endl;
}
bool compar(const int& a, const int& b)
{
return (a > b) ? true : false;
}
int main()
{
vector<int> v = {3, -1, 4, 5, -3, 2, 5};
cout << "Before sorting : " << endl;
printVector(v);
sort(v.begin(), v.end(), compar);
cout << endl << "After sorting : " << endl;
printVector(v);
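int max_non_repeat = numeric_limits<int>::min();
bool found = false;
// after sorting, equal values are adjacent, so a value is unique
// exactly when it differs from both of its neighbours
for(size_t i = 0; i < v.size() && !found; ++i)
{
bool same_as_prev = (i > 0 && v[i] == v[i - 1]);
bool same_as_next = (i + 1 < v.size() && v[i] == v[i + 1]);
if(!same_as_prev && !same_as_next)
{
max_non_repeat = v[i];
found = true;
}
}
cout << "Max non-repeated element: " << max_non_repeat << endl;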
return 0;
}
Output:
C02QT2UBFVH6-lm:~ gsamaras$ g++ -Wall -std=c++0x main.cpp
C02QT2UBFVH6-lm:~ gsamaras$ ./a.out
Before sorting :
3 -1 4 5 -3 2 5
After sorting :
5 5 4 3 2 -1 -3
Max non-repeated element: 4
For maximum pleasure, do base your (a different) approach on How to find max. and min. in array using minimum comparisons? and modify it accordingly.

Is it possible to have several edge weight property maps for one graph?

How would I create a graph that has several edge-weight property maps, each holding different weights? Is it possible to create such property maps?
Like an array of property maps?
I have not seen anyone on the Internet using this. Could I have an example?
Graph g(10); // graph with 10 nodes
cin>>a>>b>>weight1>>weight2>>weight3>>weight4;
and put each weight in a property map.
You can compose a property map in various ways. The simplest approach would seem something like:
Using C++11 lambdas with function_property_map
#include <boost/property_map/function_property_map.hpp>
#include <iostream>
#include <vector>
struct weights_t {
float weight1, weight2, weight3, weight4;
};
using namespace boost;
int main() {
std::vector<weights_t> weight_data { // index is vertex id
{ 1,2,3,4 },
{ 5,6,7,8 },
{ 9,10,11,12 },
{ 13,14,15,16 },
};
auto wmap1 = make_function_property_map<unsigned, float>([&weight_data](unsigned vertex_id) { return weight_data.at(vertex_id).weight1; });
auto wmap2 = make_function_property_map<unsigned, float>([&weight_data](unsigned vertex_id) { return weight_data.at(vertex_id).weight2; });
auto wmap3 = make_function_property_map<unsigned, float>([&weight_data](unsigned vertex_id) { return weight_data.at(vertex_id).weight3; });
auto wmap4 = make_function_property_map<unsigned, float>([&weight_data](unsigned vertex_id) { return weight_data.at(vertex_id).weight4; });
for (unsigned vertex = 0; vertex < weight_data.size(); ++vertex)
std::cout << wmap1[vertex] << "\t" << wmap2[vertex] << "\t" << wmap3[vertex] << "\t"<< wmap4[vertex] << "\n";
}
Using C++03 with transform_value_property_map
This is mainly much more verbose:
#include <boost/property_map/transform_value_property_map.hpp>
#include <iostream>
#include <vector>
struct weights_t {
float weight1, weight2, weight3, weight4;
weights_t(float w1, float w2, float w3, float w4)
: weight1(w1), weight2(w2), weight3(w3), weight4(w4)
{ }
template <int which> struct access {
typedef float result_type;
float operator()(weights_t const& w) const {
BOOST_STATIC_ASSERT(which >= 1 && which <= 4);
switch (which) {
case 1: return w.weight1;
case 2: return w.weight2;
case 3: return w.weight3;
case 4: return w.weight4;
}
}
};
};
using namespace boost;
int main() {
std::vector<weights_t> weight_data; // index is vertex id
weight_data.push_back(weights_t(1,2,3,4));
weight_data.push_back(weights_t(5,6,7,8));
weight_data.push_back(weights_t(9,10,11,12));
weight_data.push_back(weights_t(13,14,15,16));
boost::transform_value_property_map<weights_t::access<1>, weights_t*, float> wmap1 = make_transform_value_property_map(weights_t::access<1>(), &weight_data[0]);
boost::transform_value_property_map<weights_t::access<2>, weights_t*, float> wmap2 = make_transform_value_property_map(weights_t::access<2>(), &weight_data[0]);
boost::transform_value_property_map<weights_t::access<3>, weights_t*, float> wmap3 = make_transform_value_property_map(weights_t::access<3>(), &weight_data[0]);
boost::transform_value_property_map<weights_t::access<4>, weights_t*, float> wmap4 = make_transform_value_property_map(weights_t::access<4>(), &weight_data[0]);
for (unsigned vertex = 0; vertex < weight_data.size(); ++vertex)
std::cout << wmap1[vertex] << "\t" << wmap2[vertex] << "\t" << wmap3[vertex] << "\t"<< wmap4[vertex] << "\n";
}
Output
Both samples output
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
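The idea behind both samples, one shared record per index and several lightweight maps that each select one field, can also be sketched without Boost. The weight_view class below is a hypothetical illustration using a pointer to member. It is not a Boost property map and does not model the property map concepts.

```cpp
#include <cstddef>
#include <vector>

struct weights_t {
    float weight1, weight2, weight3, weight4;
};

// Read-only view that exposes one member of weights_t per index.
class weight_view {
public:
    weight_view(const std::vector<weights_t>& data, float weights_t::* member)
        : data_(&data), member_(member) {}

    // Bounds-checked lookup of the selected member for the given index.
    float operator[](std::size_t index) const
    {
        return data_->at(index).*member_;
    }

private:
    const std::vector<weights_t>* data_; // not owned, must outlive the view
    float weights_t::* member_;          // which weight this view selects
};
```

For example, weight_view wmap1(weight_data, &weights_t::weight1) then reads wmap1[vertex] in the same way as the maps in the samples above.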

Resources