i dont know what what is the error its printing 3.18e+04 - precision

#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
double A,R;
R=100.64;
R=R*R;
A=3.14159*R;
cout<< setprecision(3)<<A<<endl;
return 0;
}

The reasonably precise and accurate(a) value you would get from those calculations (mathematically) is 31,819.31032.
You have asked for a precision of three digits and, with that value and the floating point format currently active (probably std::defaultfloat), it's only giving you three significant digits:
3.18e+04 (3.18x104 in mathematical form).
If your intent is to instead show three digits after the decimal point, you can do that with the std::fixed manipulator:
#include <iostream>
#include <iomanip>
int main() {
double R = 100.64;
double A = 3.14159 * R * R;
std::cout << std::setprecision(3) << std::fixed << A << '\n';
return 0;
}
This gives 31819.310.
(a) Make sure you never conflate these two, they're different concepts. See for example, the following values of π you may come up with:
Value
Properties
9
Both im-precise and in-accurate.
3
Im-precise but accurate.
2.718281828459
Precise but in-accurate.
3.141592653590
Both precise and accurate.
π
Has maximum precision and accuracy.

Related

Recommended way to cast a boost cpp_int to a double?

I have some code were I avoid some costly divisions by converting a boost integer to a double. For the real code I will build an fp type that's big enough to hold the maximal value (exponent). To test I am using a double. So I do this:
#define NTYPE_BITS 512
typedef number<cpp_int_backend<NTYPE_BITS, NTYPE_BITS, unsigned_magnitude, unchecked, void> > NTYPE;
NTYPE a1 = BIG_VALUE;
double a1f = (double)a1;
The code generated for that cast is quite complicated. I see it's basically looping over all the values in a1 (least significant first) scaling them by powers of two.
Now in this case I guess at most the number of elements that could affect the result are the last two (64 bits for each element and the most significant element might have less that 64 bits used).
Is there a better way to do this?
First off, NEVER use C-Style casts. (Why use static_cast<int>(x) instead of (int)x?).
Second, avoid using namespace.
(Third, reserve all-caps names for macros).
That said:
double a1f = a1.convert_to<double>();
Is your ticket.
Live On Coliru
#include <boost/multiprecision/cpp_int.hpp>
#include <iostream>
namespace bmp = boost::multiprecision;
//0xDEADBEEFE1E104B1D00008BADF00D000ABADBABE000D15EA5E
#define BIG_VALUE "0xDEADBEEFE1E104B1D00008BADF00D000ABADBABE000D15EA5E"
#define NTYPE_BITS 512
int main() {
using NTYPE = bmp::number<
bmp::cpp_int_backend<
NTYPE_BITS, NTYPE_BITS,
bmp::unsigned_magnitude, bmp::unchecked, void>>;
NTYPE a1(BIG_VALUE);
std::cout << a1 << "\n";
std::cout << std::hex << a1 << "\n";
std::cout << a1.convert_to<double>() << "\n";
}
Prints
1397776821048146366831161011449418369017198837637750820563550
deadbeefe1e104b1d00008badf00d000abadbabe000d15ea5e
1.39778e+60

Sorting multiple arrays using CUDA/Thrust

I have a large array that I need to sort on the GPU. The array itself is a concatenation of multiple smaller subarrays that satisfy the condition that given i < j, the elements of the subarray i are smaller than the elements of the subarray j. An example of such array would be {5 3 4 2 1 6 9 8 7 10 11},
where the elements of the first subarray of 5 elements are smaller than the elements of the second subarray of 6 elements. The array I need is {1, 2, 3, 4, 5, 6, 7, 10, 11}. I know the position where each subarray starts in the large array.
I know I can simply use thrust::sort on the whole array, but I was wondering if it's possible to launch multiple concurrent sorts, one for each subarray. I'm hoping to get a performance improvement by doing that. My assumption is that it would be faster to sort multiple smaller arrays than one large array with all the elements.
I'd appreciate if someone could give me a way to do that or correct my assumption in case it's wrong.
A way to do multiple concurrent sorts (a "vectorized" sort) in thrust is via the marking of the sub arrays, and providing a custom functor that is an ordinary thrust sort functor that also orders the sub arrays by their key.
Another possible method is to use back-to-back thrust::stable_sort_by_key as described here.
As you have pointed out, another method in your case is just to do an ordinary sort, since that is ultimately your objective.
However I think its unlikely that any of the thrust sort methods will give a signficant speed-up over a pure sort, although you can try it. Thrust has a fast-path radix sort which it will use in certain situations, which the pure sort method could probably use in your case. (In other cases, e.g. when you provide a custom functor, thrust will often use a slower merge-sort method.)
If the sizes of the sub arrays are within certain ranges, I think you're likely to get much better results (performance-wise) with block radix sort in cub, one block per sub-array.
Here is an example that uses specific sizes (since you've given no indication of size ranges and other details), comparing a thrust "pure sort" to a thrust segmented sort with functor, to the cub block sort method. For this particular case, the cub sort is fastest:
$ cat t1.cu
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/equal.h>
#include <cstdlib>
#include <iostream>
#include <time.h>
#include <sys/time.h>
#define USECPSEC 1000000ULL
const int num_blocks = 2048;
const int items_per = 4;
const int nTPB = 512;
const int block_size = items_per*nTPB; // must be a whole-number multiple of nTPB;
typedef float mt;
unsigned long long dtime_usec(unsigned long long start){
timeval tv;
gettimeofday(&tv, 0);
return ((tv.tv_sec*USECPSEC)+tv.tv_usec)-start;
}
struct my_sort_functor
{
template <typename T, typename T2>
__host__ __device__
bool operator()(T t1, T2 t2){
if (thrust::get<1>(t1) < thrust::get<1>(t2)) return true;
if (thrust::get<1>(t1) > thrust::get<1>(t2)) return false;
if (thrust::get<0>(t1) > thrust::get<0>(t2)) return false;
return true;}
};
// from: https://nvlabs.github.io/cub/example_block_radix_sort_8cu-example.html#_a0
#define CUB_STDERR
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <cub/block/block_load.cuh>
#include <cub/block/block_store.cuh>
#include <cub/block/block_radix_sort.cuh>
using namespace cub;
//---------------------------------------------------------------------
// Globals, constants and typedefs
//---------------------------------------------------------------------
bool g_verbose = false;
bool g_uniform_keys;
//---------------------------------------------------------------------
// Kernels
//---------------------------------------------------------------------
template <
typename Key,
int BLOCK_THREADS,
int ITEMS_PER_THREAD>
__launch_bounds__ (BLOCK_THREADS)
__global__ void BlockSortKernel(
Key *d_in, // Tile of input
Key *d_out) // Tile of output
{
enum { TILE_SIZE = BLOCK_THREADS * ITEMS_PER_THREAD };
// Specialize BlockLoad type for our thread block (uses warp-striped loads for coalescing, then transposes in shared memory to a blocked arrangement)
typedef BlockLoad<Key, BLOCK_THREADS, ITEMS_PER_THREAD, BLOCK_LOAD_WARP_TRANSPOSE> BlockLoadT;
// Specialize BlockRadixSort type for our thread block
typedef BlockRadixSort<Key, BLOCK_THREADS, ITEMS_PER_THREAD> BlockRadixSortT;
// Shared memory
__shared__ union TempStorage
{
typename BlockLoadT::TempStorage load;
typename BlockRadixSortT::TempStorage sort;
} temp_storage;
// Per-thread tile items
Key items[ITEMS_PER_THREAD];
// Our current block's offset
int block_offset = blockIdx.x * TILE_SIZE;
// Load items into a blocked arrangement
BlockLoadT(temp_storage.load).Load(d_in + block_offset, items);
// Barrier for smem reuse
__syncthreads();
// Sort keys
BlockRadixSortT(temp_storage.sort).SortBlockedToStriped(items);
// Store output in striped fashion
StoreDirectStriped<BLOCK_THREADS>(threadIdx.x, d_out + block_offset, items);
}
int main(){
const int ds = num_blocks*block_size;
thrust::host_vector<mt> data(ds);
thrust::host_vector<int> keys(ds);
for (int i = block_size; i < ds; i+=block_size) keys[i] = 1; // mark beginning of blocks
thrust::device_vector<int> d_keys = keys;
for (int i = 0; i < ds; i++) data[i] = (rand()%block_size) + (i/block_size)*block_size; // populate data
thrust::device_vector<mt> d_data = data;
thrust::inclusive_scan(d_keys.begin(), d_keys.end(), d_keys.begin()); // fill out keys array 000111222...
thrust::device_vector<mt> d1 = d_data; // make a copy of unsorted data
cudaDeviceSynchronize();
unsigned long long os = dtime_usec(0);
thrust::sort(d1.begin(), d1.end()); // ordinary sort
cudaDeviceSynchronize();
os = dtime_usec(os);
thrust::device_vector<mt> d2 = d_data; // make a copy of unsorted data
cudaDeviceSynchronize();
unsigned long long ss = dtime_usec(0);
thrust::sort(thrust::make_zip_iterator(thrust::make_tuple(d2.begin(), d_keys.begin())), thrust::make_zip_iterator(thrust::make_tuple(d2.end(), d_keys.end())), my_sort_functor());
cudaDeviceSynchronize();
ss = dtime_usec(ss);
if (!thrust::equal(d1.begin(), d1.end(), d2.begin())) {std::cout << "oops1" << std::endl; return 0;}
std::cout << "ordinary thrust sort: " << os/(float)USECPSEC << "s " << "segmented sort: " << ss/(float)USECPSEC << "s" << std::endl;
thrust::device_vector<mt> d3(ds);
cudaDeviceSynchronize();
unsigned long long cs = dtime_usec(0);
BlockSortKernel<mt, nTPB, items_per><<<num_blocks, nTPB>>>(thrust::raw_pointer_cast(d_data.data()), thrust::raw_pointer_cast(d3.data()));
cudaDeviceSynchronize();
cs = dtime_usec(cs);
if (!thrust::equal(d1.begin(), d1.end(), d3.begin())) {std::cout << "oops2" << std::endl; return 0;}
std::cout << "cub sort: " << cs/(float)USECPSEC << "s" << std::endl;
}
$ nvcc -o t1 t1.cu
$ ./t1
ordinary thrust sort: 0.001652s segmented sort: 0.00263s
cub sort: 0.000265s
$
(CUDA 10.2.89, Tesla V100, Ubuntu 18.04)
I have no doubt that your sizes and array dimensions don't correspond to mine. The purpose here is to illustrate some possible methods, not a black-box solution that works for your particular case. You probably should do benchmark comparisons of your own. I also acknowledge that the block radix sort method for cub expects equal-sized sub-arrays, which you may not have. It may not be a suitable method for you, or you may wish to explore some kind of padding arrangement. There's no need to ask this question of me; I won't be able to answer it based on the information in your question.
I don't claim correctness for this code or any other code that I post. Anyone using any code I post does so at their own risk. I merely claim that I have attempted to address the questions in the original posting, and provide some explanation thereof. I am not claiming my code is defect-free, or that it is suitable for any particular purpose. Use it (or not) at your own risk.

C++ and rand/srand behavior not performing as expected

I am using rand and srand from cstdlib and g++ as a compiler. I was playing around trying to generate some pseudo random numbers and I was getting some unexpected biased results. I was curious so I wrote a simple function. The expected behavior would be that a random number between 1 and 10 would be generated and printed out to screen a 100x's. The expected value of the average should be 5. However, when I run this function it will a generate a single random number between 1 and 10 and print it 100x's with the average being equal to the random number that was generated.
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
float bs(){
float random;
srand(time(0));
random = rand() % 10 + 1;
return random;
}
int main(){
float average;
float random;
for (int i = 1; i < 101; ++i)
{
random += bs();
cout << random << endl;
}
average = random/100;
cout << average << endl;
return 0;
}
If the initial return from bs = 7 it will stay 7 for the duration of the loop and each time bs() is called. The output will be 7 added to itself 100x's and the average will be equal to gasp 7. What is going on here?
The seed should only be applied once. Move the
srand(time(0));
to main before the loop.

How do the conversions between signed, unsigned and float types work?

The compiler I use is g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4.
I compile my programs with the following command:
g++ -std=c++11 -pedantic -Wall program.cpp
The program no. 1.:
#include <iostream>
using namespace std;
int main() {
unsigned int b;
b = -54;
cout << b << endl;
return 0;
}
The program prints 4294967242 and this is the value I expected, because this is the case when we assign an out-of-range value to a variable of unsigned type, so the result is the remainder of a modulo division.
The program no. 2.:
#include <iostream>
using namespace std;
int main() {
unsigned int b;
b = 54.1234;
cout << b << endl;
return 0;
}
The program prints 54, and this is also OK, because the stored value is the part before the decimal point, and the franctional part is truncated.
The program no. 3.:
#include <iostream>
using namespace std;
int main() {
unsigned int b;
b = -54.1234;
cout << b << endl;
return 0;
}
Here during compilation I get the warning "overflow in implicit constant conversion".
And the program prints 0. Why is it so? I thought that it will do the truncation of the fractional part (as in program 2) and then store the result of the modulo division (as in program 1).
But if I write program no. 4.:
program no. 4.
#include <iostream>
using namespace std;
int main() {
unsigned int b;
float k = -54.1234;
b = k;
cout << b << endl;
return 0;
}
then I get no warning, and I get the result (expected by me) 4294967242, which is the result of the modulo division.
I would be grateful if somebody can explain it to me.
Why doesn't the program no. 3 behave like program no. 4? Why don't I get a warning when compiling program no. 1, but I get one when compiling program no. 3.?
According to the standard (§[conv.fpint]).
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
So, your -54.1234 is truncated to -54. Since that can't be represented in an unsigned, you get undefined behavior.
When converting floating point numbers to integers, C and C++ round floating point numbers towards zero. The rounded result must then be representable in the destination type.
As a result, for 32 bit unsigned int the conversion is guaranteed to give the correct result if -1 < x < 2^32. For smaller numbers there are no guarantees. Since numbers between -1 and 0 must be rounded to zero, and numbers -1 and smaller have no requirements, it wouldn't be surprising if the compiler checks whether x < 0 and gives a result of 0 in that case. (The compiler might check whether x < 1 and give a result of 0; this handles very small positive numbers as well).

Set coefficients of an Eigen::Matrix according an arbitrary distribution

Eigen::Matrix has a setRandom() method which will set all coefficients of the matrix to random values. However, is there a built in way to set all the matrix coefficients to random values while specifying the distribution to use.
Is there a way to achieve something like the following:
Eigen::Matrix3f myMatrix;
std::tr1::mt19937 gen;
std::tr1::uniform_int<int> dist(0,MT_MAX);
myMatrix.setRandom(dist(gen));
You can do what you want using Boost and unaryExpr. The function you pass to unaryExpr needs to accept a dummy input which you can just ignore.
#include <boost/random.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>
#include <Eigen/Dense>
using namespace std;
using namespace boost;
using namespace Eigen;
double sample(double dummy)
{
static mt19937 rng;
static normal_distribution<> nd(3.0,1.0);
return nd(rng);
}
int main()
{
MatrixXd m =MatrixXd::Zero(2,3).unaryExpr(ptr_fun(sample));
cout << m << endl;
return 0;
}
If anyone is coming across this thread, I'm posting an easier answer that is possible nowadays and does not require boost. I found this in an old Eigen Bugzilla Report. All credits go to the author Gael Guennebaud for proposing the following simple method:
#include <Eigen/Sparse>
#include <iostream>
#include <random>
using namespace Eigen;
int main() {
std::default_random_engine generator;
std::poisson_distribution<int> distribution(4.1);
auto poisson = [&] (int) {return distribution(generator);};
RowVectorXi v = RowVectorXi::NullaryExpr(10, poisson );
std::cout << v << "\n";
}
Note that the signature with an int argument of the lambda function is required of Eigen NullaryExpr, despite not being used here in the example.
I had a problem with a similar problem and tried to solve it by using NullaryExpr. But a problem with NullaryExpr is that it cannot be vectorized explicitly. Thus, the solution with NullaryExpr runs quite slowly.
Because of this, I developed EigenRand, an add-on of random distribution for Eigen. I think it will help ones who want to generate random number fast and easily.
#include <Eigen/Dense>
#include <EigenRand/EigenRand>
#include <iostream>
using namespace Eigen;
int main() {
Rand::Vmt19937_64 generator;
// poisson distribution with rate = 4.1
MatrixXi v = Rand::poisson<MatrixXi>(4, 4, generator, 4.1);
std::cout << v << std::endl;
// normal distribution with mean = 3.0, stdev = 1.0
MatrixXf u = Rand::normal<MatrixXf>(4, 4, generator, 3.0, 1.0);
std::cout << u << std::endl;
return 0;
}
Apart the uniform distribution I am not aware of any other types of distribution that can be used directly on a matrix.
What you could do is to map the uniform distribution provided by Eigen directly to your custom distribution (if the mapping exists).
Suppose that your distribution is a sigmoid.
You can map an uniform distribution to the sigmoid distribution using the function y = a / ( b + c exp(x) ).
By temporary converting your matrix to array you can operate element-wise on all values of your matrix:
Matrix3f uniformM;
uniformM.setRandom();
Matrix3f sigmoidM;
sigmoidM.array() = a * ((0.5*uniformM+0.5).array().exp() * c + b).inv();

Resources