How to calculate comp_ellint_1(0) on a c++11 compiler - c++11

I'm sorry if this is a really stupid question, but I really need this for my master's thesis, and I just can't find a way. I need to calculate the complete elliptic integral of the first kind with Eclipse 3.8 on an Ubuntu laptop. My compiler flags are set to -c -fmessage-length=0 -std=c++11.
As for the Ubuntu version, it's
laptop:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
and for the gcc compiler, it is
laptop:~$ gcc --version
gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I found under mathematical special functions that there is a function double comp_ellint_1( float arg ) that would do the job, but as I understand it, it is only included in C++17, which I don't have and can't find installation instructions for. But apparently there is a way to calculate the function without C++17? Because it says:
As all special functions, comp_ellint_1 is only guaranteed to be available in <cmath> if __STDCPP_MATH_SPEC_FUNCS__ is defined by the implementation to a value at least 201003L and if the user defines __STDCPP_WANT_MATH_SPEC_FUNCS__ before including any standard library headers.
But their example code
#define __STDCPP_WANT_MATH_SPEC_FUNCS__ 1
#include <cmath>
#include <iostream>

int main() {
    double integral = std::comp_ellint_1(0);
    return 0;
}
does not work, the error being 15:22: error: ‘comp_ellint_1’ is not a member of ‘std’. I've also tried
#define __STDCPP_MATH_SPEC_FUNCS__ 201003L
#define __STDCPP_WANT_MATH_SPEC_FUNCS__ 1
#include <cmath>
#include <iostream>

int main() {
    double integral = std::comp_ellint_1(0);
    return 0;
}
which leads to the same error. It does not say whether I need to install certain packages to make it work (and if I do, which ones and how do I install them?). Or am I making a different mistake?
I'd be super thankful for any ideas on how to solve this, so thank you very much in advance!

Your gcc 4.8.5 has this function as std::tr1::comp_ellint_1.
You will need to #include <tr1/cmath>.
This is mentioned on the cppreference page for its C++17 version.
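A minimal sketch of the TR1 variant (the TR1 special functions ship with GCC's libstdc++, so no extra packages should be needed):

#include <tr1/cmath>
#include <iostream>

int main() {
    // K(0) = pi/2, so this should print roughly 1.5708
    double integral = std::tr1::comp_ellint_1(0.0);
    std::cout << integral << std::endl;
    return 0;
}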

If that does not work, or you also want your code to run on older compilers, you can use Boost. To do it in Visual Studio you would include:
#define BOOST_CONFIG_SUPPRESS_OUTDATED_MESSAGE
#include <boost/lambda/lambda.hpp>
#include <boost/math/special_functions/ellint_1.hpp>
#include <boost/math/special_functions/ellint_2.hpp>
#include <boost/math/special_functions/ellint_3.hpp>
Then:
using namespace boost::math;
double Kk = ellint_1(k);
double Ek1 = ellint_2(k) / (q - 4.*al);
To do that, you need a copy of Boost on your hard disk, for example at C:\boost_1_66_0
Then, editing the project properties, add the following settings:
C/C++ Directories->additional include directories: C:\boost_1_66_0
C/C++->Precompiled Headers->Precompiled Header->Not Using Precompiled Headers
Linker->general->Additional Library Directories->C:\boost_1_66_0\libs;
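Putting it together, a minimal self-contained check might look like this (note that k, q, and al in the snippet above are variables from my own code):

#define BOOST_CONFIG_SUPPRESS_OUTDATED_MESSAGE
#include <boost/math/special_functions/ellint_1.hpp>
#include <iostream>

int main() {
    double k = 0.0;                       // modulus
    double Kk = boost::math::ellint_1(k); // one-argument form = complete integral of the first kind
    std::cout << Kk << std::endl;         // K(0) = pi/2, roughly 1.5708
    return 0;
}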
Another way is to include the following function, which calculates both the first- and second-kind complete integrals. I tested it against an online tool and against ellint_1 and ellint_2, and it worked well:
#include <math.h>   /* fabs, sqrt */
#include <float.h>  /* DBL_MAX, DBL_EPSILON */

void Complete_Elliptic_Integrals(double x, double* Fk, double* Ek)
{
    const double PI_2 = 1.5707963267948966192313216916397514; // pi/2
    const double PI_4 = 0.7853981633974483096156608458198757; // pi/4
    double k;      // modulus
    double m;      // the parameter of the elliptic function, m = modulus^2
    double a;      // arithmetic mean
    double g;      // geometric mean
    double a_old;  // previous arithmetic mean
    double g_old;  // previous geometric mean
    double two_n;  // power of 2
    double sum;

    if ( x == 0.0 ) { *Fk = PI_2; *Ek = PI_2; return; }
    k = fabs(x);
    m = k * k;
    if ( m == 1.0 ) { *Fk = DBL_MAX; *Ek = 1.0; return; }

    a = 1.0;
    g = sqrt(1.0 - m);
    two_n = 1.0;
    sum = 2.0 - m;

    // arithmetic-geometric mean iteration
    for (int i = 0; i < 100; i++)
    {
        g_old = g;
        a_old = a;
        a = 0.5 * (g_old + a_old);
        g = g_old * a_old;
        two_n += two_n;
        sum -= two_n * (a * a - g);
        if ( fabs(a_old - g_old) <= (a_old * DBL_EPSILON) ) break;
        g = sqrt(g);
    }

    *Fk = PI_2 / a;          // complete elliptic integral of the first kind, K(k)
    *Ek = (PI_4 / a) * sum;  // complete elliptic integral of the second kind, E(k)
    return;
}
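Usage is a matter of passing the modulus and two output pointers (assuming <stdio.h> for the printf):

double K, E;
Complete_Elliptic_Integrals(0.5, &K, &E);
printf("K = %.15f  E = %.15f\n", K, E);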
Unfortunately it takes about twice as long to run as calling ellint_1 and ellint_2.

Related

How do I include sm_11_atomic_function.h? [duplicate]

I'm having an issue with my kernel.cu file.
When calling nvcc -v kernel.cu -o kernel.o I get this error:
kernel.cu(17): error: identifier "atomicAdd" is undefined
My code:
#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd
__global__ void dot (int *a, int *b, int *c){
__shared__ int temp[THREADS_PER_BLOCK];
int index = threadIdx.x + blockIdx.x * blockDim.x;
temp[threadIdx.x] = a[index] * b[index];
__syncthreads();
if( 0 == threadIdx.x ){
int sum = 0;
for( int i = 0; i<THREADS_PER_BLOCK; i++)
sum += temp[i];
atomicAdd(c, sum);
}
}
Any suggestions?
You need to specify an architecture to nvcc which supports atomic memory operations (the default architecture is 1.0 which does not support atomics). Try:
nvcc -arch=sm_11 -v kernel.cu -o kernel.o
and see what happens.
EDIT in 2015 to note that the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
Today, with the latest CUDA SDK and toolkit, this solution will not work.
People also say that adding:
compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.
You have to specify this for the .cu file itself in its own properties (under the C++/CUDA->Device->Code Generation tab), such as:
compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
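If you build with nvcc directly instead of through Visual Studio, the command-line equivalent would be something along these lines (a sketch; pick the architectures matching your GPU):

nvcc -gencode arch=compute_13,code=sm_13 \
     -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_30,code=sm_30 \
     kernel.cu -o kernel.o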

Compiling GSL odeiv2 with g++

I'm attempting to compile the example code for the ODE solver, gsl/gsl_odeiv2, using g++. The code below is from their website:
http://www.gnu.org/software/gsl/manual/html_node/ODE-Example-programs.html
and compiles fine under gcc, but g++ throws the error
invalid conversion from 'void*' to 'int (*)(double, const double*, double*, double*,
void*)' [-fpermissive]
in the code:
#include <stdio.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_odeiv2.h>

int func(double t, const double y[], double f[], void *params)
{
    double mu = *(double *)params;
    f[0] = y[1];
    f[1] = -y[0] - mu*y[1]*(y[0]*y[0] - 1);
    return GSL_SUCCESS;
}

int * jac;

int main()
{
    double mu = 10;
    gsl_odeiv2_system sys = {func, jac, 2, &mu};
    gsl_odeiv2_driver * d = gsl_odeiv2_driver_alloc_y_new(&sys, gsl_odeiv2_step_rkf45, 1e-6, 1e-6, 0.0);
    int i;
    double t = 0.0, t1 = 100.0;
    double y[2] = { 1.0, 0.0 };
    for (i = 1; i <= 100; i++)
    {
        double ti = i * t1 / 100.0;
        int status = gsl_odeiv2_driver_apply(d, &t, ti, y);
        if (status != GSL_SUCCESS)
        {
            printf("error, return value=%d\n", status);
            break;
        }
        printf("%.5e %.5e %.5e\n", t, y[0], y[1]);
    }
    gsl_odeiv2_driver_free(d);
    return 0;
}
The error is given on the line
gsl_odeiv2_system sys = {func, jac, 2, &mu};
Any help in solving this issue would be fantastic. I'm hoping to include some stdlib elements, hence wanting to compile it as C++. Also, if I can get it to compile with g++-4.7, I could more easily multithread it using C++11's additions to the language. Thank you very much.
It looks like you have some problems with the Jacobian. In your particular case you could just use NULL instead of jac in the definition of your system, i.e.
gsl_odeiv2_system sys = {func, NULL, 2, &mu};
In general, your Jacobian must be a function with a particular signature - see the GSL manual - which is why your compiler is complaining.
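For reference, if you do want a Jacobian rather than NULL, replace the stray int * jac; declaration with a function like the one from the GSL manual's Van der Pol example (it fills dfdy as a 2x2 row-major matrix and dfdt with the explicit time derivatives):

int jac(double t, const double y[], double *dfdy, double dfdt[], void *params)
{
    double mu = *(double *)params;
    gsl_matrix_view dfdy_mat = gsl_matrix_view_array(dfdy, 2, 2);
    gsl_matrix * m = &dfdy_mat.matrix;
    gsl_matrix_set(m, 0, 0, 0.0);
    gsl_matrix_set(m, 0, 1, 1.0);
    gsl_matrix_set(m, 1, 0, -2.0*mu*y[0]*y[1] - 1.0);
    gsl_matrix_set(m, 1, 1, -mu*(y[0]*y[0] - 1.0));
    dfdt[0] = 0.0;
    dfdt[1] = 0.0;
    return GSL_SUCCESS;
}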
Also, you may want to link the gsl library manually:
-L/usr/local/lib -lgsl
if you are on a Linux system.
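A full compile line might then look something like this (the install path is an assumption; adjust it to your system, and note that GSL typically also needs its CBLAS companion library):

g++ -std=c++11 vdp.cpp -L/usr/local/lib -lgsl -lgslcblas -lm -o vdp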

OpenCL in Xcode/OSX - Can't assign zero in kernel loop

I'm developing an accelerated component in OpenCL, using Xcode 4.5.1 and Grand Central Dispatch, guided by this tutorial.
The full kernel kept failing on the GPU, giving signal SIGABRT. I couldn't make much progress interpreting the error beyond that.
But I broke out aspects of the kernel to test, and I found something very peculiar involving assigning certain values to positions in an array within a loop.
Test scenario: give each thread a fixed range of array indices to initialize.
kernel void zero(size_t num_buckets, size_t positions_per_bucket, global int* array) {
    size_t bucket_index = get_global_id(0);
    if (bucket_index >= num_buckets) return;
    for (size_t i = 0; i < positions_per_bucket; i++)
        array[bucket_index * positions_per_bucket + i] = 0;
}
The above kernel fails. However, when I assign 1 instead of 0, the kernel succeeds (and my host code prints out the array of 1's). Based on a handful of tests on various integer values, I've only had problems with 0 and -1.
I've tried to outsmart the compiler with 1-1, (int) 0, etc, with no success. Passing zero in as a kernel argument worked though.
The assignment to zero does work outside of the context of a for loop:
array[bucket_index * positions_per_bucket] = 0;
The findings above were confirmed on two machines with different configurations. (OSX 10.7 + GeForce, OSX 10.8 + Radeon.) Furthermore, the kernel had no trouble when running on CL_DEVICE_TYPE_CPU -- it's just on the GPU.
Clearly, something ridiculous is happening, and it must be on my end, because "zero" can't be broken. Hopefully it's something simple. Thank you for your help.
Host code:
#include <stdio.h>
#include <stdlib.h> // malloc
#include <OpenCL/OpenCL.h>
#include "zero.cl.h"

int main(int argc, const char* argv[]) {
    dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);
    size_t num_buckets = 64;
    size_t positions_per_bucket = 4;
    cl_int* h_array = malloc(sizeof(cl_int) * num_buckets * positions_per_bucket);
    cl_int* d_array = gcl_malloc(sizeof(cl_int) * num_buckets * positions_per_bucket, NULL, CL_MEM_WRITE_ONLY);
    dispatch_sync(queue, ^{
        cl_ndrange range = { 1, { 0 }, { num_buckets }, { 0 } };
        zero_kernel(&range, num_buckets, positions_per_bucket, d_array);
        gcl_memcpy(h_array, d_array, sizeof(cl_int) * num_buckets * positions_per_bucket);
    });
    for (size_t i = 0; i < num_buckets * positions_per_bucket; i++)
        printf("%d ", h_array[i]);
    printf("\n");
}
Refer to the OpenCL standard, section 6, paragraph 8 "Restrictions", bullet point k (emphasis mine):
6.8 k. Arguments to kernel functions in a program cannot be declared with the built-in scalar types bool, half, size_t, ptrdiff_t, intptr_t, and uintptr_t. [...]
The fact that your compiler even let you build the kernel at all indicates it is somewhat broken.
So you might want to fix that... but if that doesn't fix it, then it looks like a compiler bug, plain and simple (of CLC, that is, the OpenCL compiler, not your host code). There is no reason this kernel should work with some constants but fail with 0 or -1. Did you try updating your OpenCL driver? What about trying a different operating system (though I suppose this code is OS X only)?
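As a sketch of the first fix, the size_t kernel arguments can be replaced with a type that is legal for kernel arguments, e.g. uint, with the host passing cl_uint values to match (size_t remains fine for locals like bucket_index; the restriction only applies to arguments):

kernel void zero(uint num_buckets, uint positions_per_bucket, global int* array) {
    size_t bucket_index = get_global_id(0);
    if (bucket_index >= num_buckets) return;
    for (size_t i = 0; i < positions_per_bucket; i++)
        array[bucket_index * positions_per_bucket + i] = 0;
}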

glext visual studio cuda

I am currently taking a parallel computing class using the book CUDA by Example. In Chapter 4 of this book I am using some .h files that contain includes for "GL/glut.h" and "GL/glext.h". I found steps for installing GLUT online and followed those; I think this worked, but I am not sure. I then tried to find directions for glext, but I cannot seem to find as much on this. I did find one .h file and tried to use it by placing it in the GL folder as well. This does not seem to have worked, because when compiling I received errors similar to this:
Error 1 error : calling a host function("cuComplex::cuComplex") from a device/_global_ function("julia") is not allowed C:\Users\Laptop\Documents\Visual Studio 2010\Projects\Lab1\Lab1\lab1.cu 29 1 Lab1
I think this is because I need more for glext.h, like .dll files and things similar to GLUT, but I am not sure. Any help with this would be appreciated. Thank you.
EDIT: this is the code that I am using. I have not changed it from what is in the book, except for the top two include statements, and the .h files are from Google Code. Thank you for any help.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "book.h"
#include "cpu_bitmap.h"
#define DIM 1000
struct cuComplex {
float r;
float i;
cuComplex( float a, float b) : r(a), i(b) {}
__device__ float magnitude2(void) {
return r*r + i*i;
}
__device__ cuComplex operator* (const cuComplex& a) {
return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
}
__device__ cuComplex operator+ (const cuComplex& a) {
return cuComplex(r+a.r, i+a.i);
}
};
__device__ int julia( int x, int y) {
const float scale = 1.5;
float jx = scale * (float)(DIM/2 -x)/(DIM/2);
float jy = scale * (float)(DIM/2 - y)/(DIM/2);
cuComplex c(-0.8, .156);
cuComplex a(jx, jy);
int i = 0;
for(i=0;i<200;i++) {
a = a * a + c;
if(a.magnitude2() > 1000)
return 0;
}
return 1;
}
__global__ void kernel( unsigned char *ptr ) {
    // map from threadIdx/blockIdx to pixel position
    int x = blockIdx.x;
    int y = blockIdx.y;
    int offset = x + y * gridDim.x;
    // now calculate the value at that position
    int juliaValue = julia(x, y);
    ptr[offset*4 + 0] = 255 * juliaValue;
    ptr[offset*4 + 1] = 0;
    ptr[offset*4 + 2] = 0;
    ptr[offset*4 + 3] = 255;
}
int main( void ) {
    CPUBitmap bitmap(DIM, DIM);
    unsigned char *dev_bitmap;
    HANDLE_ERROR(cudaMalloc((void**)&dev_bitmap, bitmap.image_size()));
    dim3 grid(DIM, DIM);
    kernel<<<grid,1>>>(dev_bitmap);
    HANDLE_ERROR(cudaMemcpy(bitmap.get_ptr(), dev_bitmap, bitmap.image_size(), cudaMemcpyDeviceToHost));
    bitmap.display_and_exit();
    HANDLE_ERROR(cudaFree(dev_bitmap));
}
Try adding the following.
Original code:
cuComplex( float a, float b) : r(a), i(b) {}
Modified:
__host__ __device__ cuComplex( float a, float b ) : r(a), i(b) {}
It fixed the issue for me. I also didn't need the two include files you added, but you may depending on your build process.
A CUDA program consists of two types of code: host code and device code. Host code runs on the host CPU and cannot run on the GPU; device code runs on the GPU and cannot run on the CPU. If you don't decorate your program in any way, then it will be all host code. But once you start adding CUDA sections delineated by keywords like __global__ or __device__, your program will contain some device code.
The compiler error you received indicated that a function running on the device was attempting to use code compiled for the CPU. This is a no-no and the compiler will not allow it. This example is unusual, since at some point in time (when the book was written) it presumably did not generate this error, and furthermore the code in the cuComplex struct appears to be decorated with the __device__ keyword. However, at the outermost level of the struct, at the line of code I modified, there is no __device__ keyword. Adding the __device__ __host__ keywords tells the compiler "for this logical section, create both a device-compiled version and a host-compiled version of the code". This explicitly tells the compiler you want to be able to use this section of code on the device. With that addition, we have steered the compiler correctly and it no longer complains.
Apparently something has changed about the level of decoration the compiler needs to generate device code in this case. Presumably, with older compilers, the __device__ keywords inside the struct were enough to let the compiler know that it had to generate device versions of the operators callable by the cuComplex type.

GSL Uniform Random Number Generator

I want to use GSL's uniform random number generator. On their website, they include this sample code:
#include <stdio.h>
#include <gsl/gsl_rng.h>
int main (void)
{
    const gsl_rng_type * T;
    gsl_rng * r;
    int i, n = 10;
    gsl_rng_env_setup();
    T = gsl_rng_default;
    r = gsl_rng_alloc(T);
    for (i = 0; i < n; i++)
    {
        double u = gsl_rng_uniform(r);
        printf("%.5f\n", u);
    }
    gsl_rng_free(r);
    return 0;
}
However, this does not set any seed, so the same random numbers will be produced each time.
They also specify the following:
The generator itself can be changed using the environment variable GSL_RNG_TYPE. Here is the output of the program using a seed value of 123 and the multiple-recursive generator mrg,
$ GSL_RNG_SEED=123 GSL_RNG_TYPE=mrg ./a.out
But I don't understand how to implement this. Any ideas as to what modifications I can make to the above code to incorporate the seed?
The problem is that a new seed is not being generated. If you just want a function that returns a darn random number, and care nothing about the sticky details of how it's generated, try this. Assumes that you have the GSL installed.
#include <iostream>
#include <gsl/gsl_math.h>
#include <gsl/gsl_rng.h>
#include <sys/time.h>

float keithRandom() {
    // Random number function based on the GNU Scientific Library
    // Returns a random float between 0 and 1, exclusive; e.g., (0,1)
    const gsl_rng_type * T;
    gsl_rng * r;
    gsl_rng_env_setup();
    struct timeval tv;  // seed generation based on time
    gettimeofday(&tv, 0);
    unsigned long mySeed = tv.tv_sec + tv.tv_usec;
    T = gsl_rng_default;  // generator setup
    r = gsl_rng_alloc(T);
    gsl_rng_set(r, mySeed);
    double u = gsl_rng_uniform(r);  // generate it!
    gsl_rng_free(r);
    return (float)u;
}
Read 18.6 Random number environment variables to see what that gsl_rng_env_setup() function is doing. It is getting a generator type and seed from environment variables.
Then see 18.3 Random number generator initialization - if you don't want to get the seed from an environment variable, you can use gsl_rng_set() to set the seed.
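For example, to hard-code the seed of 123 from the question instead of passing it through the GSL_RNG_SEED environment variable, the setup portion of the sample becomes:

gsl_rng_env_setup();
T = gsl_rng_default;
r = gsl_rng_alloc(T);
gsl_rng_set(r, 123); /* explicit seed, same effect as GSL_RNG_SEED=123 */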
A complete answer to this question, with sample code, can be seen at this link.
Just for completeness, I am putting a copy of the code for a function to create a seed here. It was written by Robert G. Brown: http://www.phy.duke.edu/~rgb/ .
#include <stdio.h>
#include <sys/time.h>

unsigned long int random_seed()
{
    unsigned int seed;
    struct timeval tv;
    FILE *devrandom;

    if ((devrandom = fopen("/dev/random", "r")) == NULL) {
        gettimeofday(&tv, 0);
        seed = tv.tv_sec + tv.tv_usec;
    } else {
        fread(&seed, sizeof(seed), 1, devrandom);
        fclose(devrandom);
    }
    return seed;
}
But from my own experience with this function, I would say that the /dev/random solution is very time-consuming compared to gettimeofday(); you can check it out yourself. So the gettimeofday() solution might be better for you if its level of accuracy is enough:
#include <stdio.h>
#include <sys/time.h>

unsigned long int random_seed()
{
    struct timeval tv;
    gettimeofday(&tv, 0);
    return (tv.tv_sec + tv.tv_usec);
}
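Either variant plugs straight into the sample code from the question, e.g.:

r = gsl_rng_alloc(T);
gsl_rng_set(r, random_seed()); /* seed from /dev/random or the clock */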
