Use of const in c++ [duplicate] - c++11

I have defined a constexpr function as following:
constexpr int foo(int i)
{
return i*2;
}
And this is what in the main function:
int main()
{
int i = 2;
cout << foo(i) << endl;
int arr[foo(i)];
for (int j = 0; j < foo(i); j++)
arr[j] = j;
for (int j = 0; j < foo(i); j++)
cout << arr[j] << " ";
cout << endl;
return 0;
}
The program was compiled under OS X 10.8 with command clang++. I was surprised that the compiler did not produce any error message about foo(i) not being a constant expression, and the compiled program actually worked fine. Why?

The definition of constexpr functions in C++ is such that the function is guaranteed to be able to produce a constant expression when called such that only constant expressions are used in the evaluation. Whether the evaluation happens during compile-time or at run-time if the result isn't use in a constexpr isn't specified, though (see also this answer). When passing non-constant expressions to a constexpr you may not get a constant expression.
Your above code should, however, not compile because i is not a constant expression which is clearly used by foo() to produce a result and it is then used as an array dimension. It seems clang implements C-style variable length arrays as it produces the following warning for me:
warning: variable length arrays are a C99 feature [-Wvla-extension]
A better test to see if something is, indeed, a constant expression is to use it to initialize the value of a constexpr, e.g.:
constexpr int j = foo(i);

I used the code at the top (with "using namespace std;" added in) and had no errors when compiling using "g++ -std=c++11 code.cc" (see below for a references that qualifies this code) Here is the code and output:
#include <iostream>
using namespace std;
constexpr int foo(int i)
{
return i*2;
}
int main()
{
int i = 2;
cout << foo(i) << endl;
int arr[foo(i)];
for (int j = 0; j < foo(i); j++)
arr[j] = j;
for (int j = 0; j < foo(i); j++)
cout << arr[j] << " ";
cout << endl;
return 0;
}
output:
4
0 1 2 3
Now consider reference https://msdn.microsoft.com/en-us/library/dn956974.aspx It states: "...A constexpr function is one whose return value can be computed at compile when consuming code requires it. A constexpr function must accept and return only literal types. When its arguments are constexpr values, and consuming code requires the return value at compile time, for example to initialize a constexpr variable or provide a non-type template argument, it produces a compile-time constant. When called with non-constexpr arguments, or when its value is not required at compile-time, it produces a value at run time like a regular function. (This dual behavior saves you from having to write constexpr and non-constexpr versions of the same function.)"
It gives as valid example:
constexpr float exp(float x, int n)
{
return n == 0 ? 1 :
n % 2 == 0 ? exp(x * x, n / 2) :
exp(x * x, (n - 1) / 2) * x;
}

This is an old question, but it's the first result on a google search for the VS error message "constexpr function return is non-constant". And while it doesn't help my situation, I thought I'd put my two cents in...
While Dietmar gives a good explanation of constexpr, and although the error should be caught straight away (as it is with the -pedantic flag) - this code looks like its suffering from some compiler optimization.
The value i is being set to 2, and for the duration of the program i never changes. The compiler probably noticed this and optimized the variable to be a constant (just replacing all references to variable i to the constant 2... before applying that parameter to the function), thus creating a constexpr call to foo().
I bet if you looked at the disassembly you'd see that calls to foo(i) were replaced with the constant value 4 - since that is the only possible return value for a call to this function during execution of the program.
Using the -pedantic flag forces the compiler to analyze the program from the strictest point of view (probably done before any optimizations) and thus catches the error.

Related

Why doesn't move-assigning a std::vector seem to have any performance benefit over copying in this code?

Since move-assigning a std::vector is is a O(1) time operation and copying a std::vector to another is O(N) (where N is the sum of the sizes of the 2 vectors), I expected to see move-assignment having a significant performance advantage over copying. To test this, I wrote the following code, which move-assigns/copies a std::vector nums2 of size 1000 to nums 100,000 times.
#include <iostream>
#include <vector>
#include <chrono>
using namespace std;
int main()
{
auto start = clock();
vector <int> nums;
for(int i = 0; i < 100000; ++i) {
vector <int> nums2(1000);
for(int i = 0; i < 1000; ++i) {
nums2[i] = rand();
}
nums = nums2; // or nums = move(nums2);
cout << (nums[0] ? 1:0) << "\b \b"; // prevent compiler from optimizing out nums (I think)
}
cout << "Time: " << (clock() - start) / (CLOCKS_PER_SEC / 1000) << '\n';
return 0;
}
The compiler I am using is g++ 7.5.0. When running with g++ -std=c++1z -O3, both the move-assign/copy versions take around 1600ms, which does not match with the hypothesis that move-assignment has any significant performance benefit. I then tested using std::swap(nums, nums2) (as an alternative to move-assignment), but that also took around the same time.
So, my question is, why doesn't move-assigning a std::vector to another seem to have a performance advantage over copy-assignment? Do I have a fundamental mistake in my understanding of C++ move-assignment?

Understanding solveInPlace operation in Eigen

I was trying to explore the option of "solveInPlace()" function while using LLT in Eigen3.3.7 to speed up the matrix inverse computation in my application.
I used the following code to test it.
int main()
{
const int M=3;
Eigen::Matrix<MyType,Eigen::Dynamic,Eigen::Dynamic> R = Eigen::Matrix<MyType,Eigen::Dynamic,Eigen::Dynamic>::Zero(M,M);
// to make sure full rank
for(int i=0; i<M*2; i++)
{
const Eigen::Matrix<MyType, Eigen::Dynamic,1> tmp = Eigen::Matrix<MyType,Eigen::Dynamic,1>::Random(M);
R += tmp*tmp.transpose();
}
std::cout<<"R \n";
std::cout<<R<<std::endl;
decltype (R) R0 = R; // saving for later comparison
Eigen::LLT<Eigen::Ref<Eigen::Matrix<MyType,Eigen::Dynamic,Eigen::Dynamic> > > myllt(R);
const Eigen::Matrix<MyType,Eigen::Dynamic,Eigen::Dynamic> I = Eigen::Matrix<MyType,Eigen::Dynamic,Eigen::Dynamic>::Identity(R.rows(), R.cols());
myllt.solveInPlace(I);
std::cout<<"I: "<<I<<std::endl;
std::cout<<"Prod InPlace: \n"<<R0*I<<std::endl;
return 0;
}
After reading the Eigen documentation, I thought that the input matrix (here "R") will be modified while computing the transform. To my surprise, I found that the results is store in "I". This was not expected as I defined "I" as a constant. Please provide an explanation for this behaviour.
The simple non-compiler answer would be that you're asking for the LLT to solve in-place (i.e. in the passed parameter) so what would you expect the result to be? Apparently, you would expect it to be a compiler error, as the "in-place" means change the parameter, but you're passing a const object.
So, if we search the Eigen docs for solveInPlace, we find the only item that takes a const reference to have the following note:
"in-place" version of TriangularView::solve() where the result is written in other
Warning
The parameter is only marked 'const' to make the C++ compiler accept a temporary expression here. This function will const_cast it, so constness isn't honored here.
The non-in-place option would be:
R = myllt.solve(I);
but that won't really speed up the calculation. In any case, benchmark before you decide that you need the in-place option.
You're question is in place, as what const_cast is meant to do is strip references/pointers of their const-ness iff the underlying variable is not const qualified* (cppref). If you were to write some examples
const int i = 4;
int& iRef = const_cast<int&>(i); // UB, i is actually const
std::cout << i; // Prints "I want coffee", or it can as we like UB
int j = 4;
const int& jRef = j;
const_cast<int&>(jRef)++; // Legal. Underlying variable is not const.
std::cout << j; // Prints 5
The case with i may well work as expected or not, we're dependent on each implementation/compiler. It may work with gcc but not with clang or MSVC. There are no guarantees. As you are indirectly invoking UB in your example, the compiler can choose to do what you expect or something else entirely.
*Technically it's the modification that's UB, not the const_cast itself.

How do the conversions between signed, unsigned and float types work?

The compiler I use is g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4.
I compile my programs with the following command:
g++ -std=c++11 -pedantic -Wall program.cpp
The program no. 1.:
#include <iostream>
using namespace std;
int main() {
unsigned int b;
b = -54;
cout << b << endl;
return 0;
}
The program prints 4294967242 and this is the value I expected, because this is the case when we assign an out-of-range value to a variable of unsigned type, so the result is the remainder of a modulo division.
The program no. 2.:
#include <iostream>
using namespace std;
int main() {
unsigned int b;
b = 54.1234;
cout << b << endl;
return 0;
}
The program prints 54, and this is also OK, because the stored value is the part before the decimal point, and the franctional part is truncated.
The program no. 3.:
#include <iostream>
using namespace std;
int main() {
unsigned int b;
b = -54.1234;
cout << b << endl;
return 0;
}
Here during compilation I get the warning "overflow in implicit constant conversion".
And the program prints 0. Why is it so? I thought that it will do the truncation of the fractional part (as in program 2) and then store the result of the modulo division (as in program 1).
But if I write program no. 4.:
program no. 4.
#include <iostream>
using namespace std;
int main() {
unsigned int b;
float k = -54.1234;
b = k;
cout << b << endl;
return 0;
}
then I get no warning, and I get the result (expected by me) 4294967242, which is the result of the modulo division.
I would be grateful if somebody can explain it to me.
Why doesn't the program no. 3 behave like program no. 4? Why don't I get a warning when compiling program no. 1, but I get one when compiling program no. 3.?
According to the standard (§[conv.fpint]).
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
So, your -54.1234 is truncated to -54. Since that can't be represented in an unsigned, you get undefined behavior.
When converting floating point numbers to integers, C and C++ round floating point numbers towards zero. The rounded result must then be representable in the destination type.
As a result, for 32 bit unsigned int the conversion is guaranteed to give the correct result if -1 < x < 2^32. For smaller numbers there are no guarantees. Since numbers between -1 and 0 must be rounded to zero, and numbers -1 and smaller have no requirements, it wouldn't be surprising if the compiler checks whether x < 0 and gives a result of 0 in that case. (The compiler might check whether x < 1 and give a result of 0; this handles very small positive numbers as well).

Simple random number generator that can generate nth number in series in O(1) time

I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.
it has to be fast.
it must be repeatable, given a particular seed.
Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.
Most importantly, I need to be able to compute the nth term in the series in constant time.
It seems, that I cannot achieve this with rand_r or srand(), since these need are state dependent, and I may need to compute the nth in some unknown order.
I've looked at Linear Feedback Shift registers, but these are state dependent too.
So far I have this:
int rand = (n * prime1 + seed) % prime2
n = used to indicate the index of the term in the sequence. Eg: For
first term, n ==1
prime1 and prime2 are prime numbers where
prime1 > prime2
seed = some number which allows one to use the same function to
produce a different series depending on the seed, but the same series
for a given seed.
I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this can point out the problems with this, or help me improve it..
EDIT - I don't care if it is predictable. I'm just trying to creating some randomness in my computer graphics.
Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.
I stumbled on this a while back, looking for a solution for the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, It may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).
First, the theory behind the solution:
A common type of RNGs are Linear Congruential Generators, basically, they work as follows:
random(n) = (m*random(n-1) + b) mod p
Where m and b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:
random(0) = seed mod p
random(1) = m*seed + b mod p
random(2) = m^2*seed + m*b + b mod p
...
random(n) = m^n*seed + b*Sum_{i = 0 to n - 1} m^i mod p
= m^n*seed + b*(m^n - 1)/(m - 1) mod p
Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n in modulo with p*(m - 1), however, if we take b = 0 (a sub-case of LCGs sometimes called Multiplicative congruential Generators), we have a much simpler solution, and can do our computations in modulo p only.
In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and g = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):
#include <cassert>
#include <stdint.h>
#include <cstdlib>
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
inline void setPosition(size_t index){
state = seed;
uint64_t mPower = m;
for (uint64_t b = 1; index; b <<= 1){
if (index & b){
state *= mPower;
index ^= b;
}
mPower *= mPower;
}
}
};
#include <cstdio>
void example(){
RANF R(1);
// Gets the number through random-access -- O(log(n))
R.setPosition(12345); // Goes to the nth random number
printf("fast nth number = %lu\n", R.getNext());
// Gets the number through standard, sequential access -- O(n)
R.setPosition(0);
for(size_t i = 0; i < 12345; i++) R.getNext();
printf("slow nth number = %lu\n", R.getNext());
}
While I presume the author has moved on by now, hopefully this will be of use to someone else.
If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)
In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
// Lookup table
struct lookup_t{
uint64_t v[3][65536];
constexpr lookup_t() : v() {
uint64_t mi = RANF::m;
for (size_t i = 0; i < 3; i++){
v[i][0] = 1;
uint64_t val = mi;
for (uint16_t j = 0x0001; j; j++){
v[i][j] = val;
val *= mi;
}
mi = val;
}
}
};
friend struct lookup_t;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
// Note: idx.u16 indices need to be adapted for big-endian machines!
inline void setPosition(size_t index){
static constexpr auto lookup = lookup_t();
union { uint16_t u16[4]; uint64_t u64; } idx;
idx.u64 = index;
state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
}
};
Basically, what this does is splits the computation of, for example, m^0xAAAABBBBCCCC mod p, into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^CCCC mod p) mod p, and then precomputes tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.
RNG in a normal sense, have the sequence pattern like f(n) = S(f(n-1))
They also lost precision at some point (like % mod), due to computing convenience, therefore it is not possible to expand the sequence to a function like X(n) = f(n) = trivial function with n only.
This mean at best you have O(n) with that.
To target for O(1) you therefore need to abandon the idea of f(n) = S(f(n-1)), and designate a trivial formula directly so that the N'th number can be calculated directly without knowing (N-1)'th; this also render the seed meaningless.
So, you end up have a simple algebra function and not a sequence. For example:
int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { 3*n*n + 2*n + 7; }
If you want to put more constraint to the generated pattern (like distribution), it become a complex maths problem.
However, for your original goal, if what you want is constant speed to get pseudo-random numbers, I suggest to pre-generate it with traditional RNG and access with lookup table.
EDIT: I noticed you have concern with a table size for a lot of numbers, however you may introduce some hybrid model, like a table of N entries, and do f(k) = g( tbl[k%n], k), which at least provide good distribution across N continue sequence.
This demonstrates an PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but for this, I avoided using cryptographically secure primitives: for speed of execution and because security wasn't a desired feature.
If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...
...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).
HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).
Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.
Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.
It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).
#include <iostream>
#include <iomanip>
#include <ctime>
using namespace std;
class CHashedCounterRng {
static unsigned JenkinsHash(const void *input, unsigned len) {
unsigned hash = 0;
for(unsigned i=0; i<len; ++i) {
hash += static_cast<const unsigned char*>(input)[i];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
unsigned long long m_counter;
void IncrementCounter() { ++m_counter; }
public:
unsigned long long GetSeed() const {
return m_counter;
}
void SetSeed(unsigned long long new_seed) {
m_counter = new_seed;
}
unsigned int operator ()() {
// the next random number is generated here
const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
IncrementCounter();
return r;
}
// the default coontructor uses time()
// to seed the counter
CHashedCounterRng() : m_counter(time(0)) {}
// you can supply a predetermined seed here,
// or after construction with SetSeed(seed)
CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};
int main() {
CHashedCounterRng rng;
// time()'s high bits change very slowly, so look at low digits
// if you want to verify that the seed is different between runs
const auto stored_counter = rng.GetSeed();
cout << "initial seed: " << stored_counter << endl;
for(int i=0; i<20; ++i) {
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl;
}
cout << endl;
cout << "The last line again:" << endl;
rng.SetSeed(stored_counter + 19 * 8);
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl << endl;
return 0;
}

Random numbers, C++11 vs Boost

I want to generate pseudo-random numbers in C++, and the two likely options are the feature of C++11 and the Boost counterpart. They are used in essentially the same way, but the native one in my tests is roughly 4 times slower.
Is that due to design choices in the library, or am I missing some way of disabling debug code somewhere?
Update: Code is here, https://github.com/vbeffara/Simulations/blob/master/tests/test_prng.cpp and looks like this:
cerr << "boost::bernoulli_distribution ... \ttime = ";
s=0; t=time();
boost::bernoulli_distribution<> dist(.5);
boost::mt19937 boostengine;
for (int i=0; i<n; ++i) s += dist(boostengine);
cerr << time()-t << ", \tsum = " << s << endl;
cerr << "C++11 style ... \ttime = ";
s=0; t=time();
std::bernoulli_distribution dist2(.5);
std::mt19937_64 engine;
for (int i=0; i<n; ++i) s += dist2(engine);
cerr << time()-t << ", \tsum = " << s << endl;
(Using std::mt19937 instead of std::mt19937_64 makes it even slower on my system.)
That’s pretty scary.
Let’s have a look:
boost::bernoulli_distribution<>
if(_p == RealType(0))
return false;
else
return RealType(eng()-(eng.min)()) <= _p * RealType((eng.max)()-(eng.min)());
std::bernoulli_distribution
__detail::_Adaptor<_UniformRandomNumberGenerator, double> __aurng(__urng);
if ((__aurng() - __aurng.min()) < __p.p() * (__aurng.max() - __aurng.min()))
return true;
return false;
Both versions invoke the engine and check if the output lies in a portion of the range of values proportional to the given probability.
The big difference is, that the gcc version calls the functions of a helper class _Adaptor.
This class’ min and max functions return 0 and 1 respectively and operator() then calls std::generate_canonical with the given URNG to obtain a value between 0 and 1.
std::generate_canonical is a 20 line function with a loop – which will never iteratate more than once in this case, but it adds complexity.
Apart from that, boost uses the param_type only in the constructor of the distribution, but then saves _p as a double member, whereas gcc has a param_type member and has to “get” the value of it.
This all comes together and the compiler fails in optimizing.
Clang chokes even more on it.
If you hammer hard enough you can even get std::mt19937 and boost::mt19937 en par for gcc.
It would be nice to test libc++ too, maybe i’ll add that later.
tested versions: boost 1.55.0, libstdc++ headers of gcc 4.8.2
line numbers on request^^

Resources