'_umul128': identifier not found

In VS2010, I tested the following sample code for _umul128 and got this compile error:
"error C3861: '_umul128': identifier not found".
// umul128.c
// processor: IPF, x64
#include <stdio.h>
#include <intrin.h>
#pragma intrinsic(_umul128)
int main()
{
    unsigned __int64 a = 0x0fffffffffffffffI64;
    unsigned __int64 b = 0xf0000000I64;
    unsigned __int64 c, d;
    d = _umul128(a, b, &c);
    printf_s("%#I64x * %#I64x = %#I64x%I64x\n", a, b, c, d);
}
The sample code is from MSDN: http://msdn.microsoft.com/en-US/library/vstudio/3dayytw9(v=vs.100).aspx.
The OS is 64-bit Windows 8. Is the error caused by the build environment, and how can I fix it?
Thanks in advance.
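One thing worth checking: the sample's own comment says "processor: IPF, x64", and _umul128 is documented as an x64-only intrinsic, so a likely cause is that the project is being built for the default Win32 (x86) platform. Switching the solution platform to x64 should make the identifier visible. If the code must also build for other targets, a fallback along the following lines may help. This is only a sketch: mul_64x64 is a hypothetical helper name, not part of any Microsoft header, and the portable branch uses the schoolbook method on 32-bit halves.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#pragma intrinsic(_umul128)
#endif

// Hypothetical helper: multiplies two 64-bit values, returns the low 64 bits
// of the 128-bit product and stores the high 64 bits in *high.
uint64_t mul_64x64(uint64_t a, uint64_t b, uint64_t* high)
{
#if defined(_MSC_VER) && defined(_M_X64)
    // x64 MSVC build: the intrinsic is available.
    return _umul128(a, b, high);
#else
    // Portable fallback: split each operand into 32-bit halves.
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;  // contributes to bits 0..63
    uint64_t p1 = a_lo * b_hi;  // contributes to bits 32..95
    uint64_t p2 = a_hi * b_lo;  // contributes to bits 32..95
    uint64_t p3 = a_hi * b_hi;  // contributes to bits 64..127

    // Sum of the three terms that land in bits 32..63; cannot overflow 64 bits.
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

    *high = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (p0 & 0xffffffffu);
#endif
}
```

With the values from the question (a = 0x0fffffffffffffff, b = 0xf0000000), the high half is 0xeffffff and the low half is 0xffffffff10000000.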

Related

CUDA: incomplete type is not allowed et al.

Code:
#include <cutil.h>
#include <cstdlib>
#include <cstdio>
#include <string.h>
#if defined(__APPLE__) || defined(MACOSX)
#include <GLUT/glut.h>
#else
#include <GL/glut.h>
#endif
#include <cuda_gl_interop.h>
#include "fluid_system_kern.cu"
extern "C"
{
// Compute number of blocks to create
int iDivUp (int a, int b) {
    return (a % b != 0) ? (a / b + 1) : (a / b);
}
void computeNumBlocks (int numPnts, int minThreads, int &numBlocks, int &numThreads)
{
    numThreads = min( minThreads, numPnts );
    numBlocks = iDivUp ( numPnts, numThreads );
}
void Grid_InsertParticlesCUDA ( uchar* data, uint stride, uint numPoints )
{
    int numThreads, numBlocks;
    computeNumBlocks (numPoints, 256, numBlocks, numThreads);
    // transfer point data to device
    char* pntData;
    size = numPoints * stride;
    cudaMalloc( (void**) &pntData, size);
    cudaMemcpy( pntData, data, size, cudaMemcpyHostToDevice);
    // execute the kernel
    insertParticles<<< numBlocks, numThreads >>> ( pntData, stride );
    // transfer data back to host
    cudaMemcpy( data, pntData, cudaMemcpyDeviceToHost);
    // check if kernel invocation generated an error
    CUT_CHECK_ERROR("Kernel execution failed");
    CUDA_SAFE_CALL(cudaGLUnmapBufferObject(vboPos));
}
error:
src/fluid_system.cu(30): error : incomplete type is not allowed (points to line -> "void Grid_InsertParticleCUDA")
src/fluid_system.cu(30): error : identifier "uchar" is undefined (points to line -> "void Grid_InsertParticleCUDA")
src/fluid_system.cu(30): error : identifier "data" is undefined (points to line -> "void Grid_InsertParticleCUDA")
src/fluid_system.cu(30): error : expected a ")" (points to line -> "void Grid_InsertParticleCUDA")
src/fluid_system.cu(31): error : expected a ";" (points to line after line-> "void Grid_InsertParticleCUDA")
I don't understand what the problem is, since I don't see anything strange on that line. I am using CUDA 4.2.

As pointed out already, you have a few syntax errors.
uchar is not defined anywhere in your program. Either add a
definition for it or change it to unsigned char, which is the proper
C/C++ type.
You also have an incorrect arrangement of curly braces here:
extern "C"
{ // this opening curly-brace has no proper corresponding close-brace
// Compute number of blocks to create
int iDivUp (int a, int b) { // this open brace
return (a % b != 0) ? (a / b + 1) : (a / b);
} // ... closes here
// you should insert another closing curly-brace here }
void computeNumBlocks ...
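For illustration, here is a minimal host-only sketch of how the two fixes could look together. It is plain C++ with the CUDA calls left out so that it compiles without the toolkit. The typedefs are an assumption about what uchar and uint were meant to be, and std::min stands in for the min the original code relies on. The sketch closes the extern "C" block after the last function; closing it right after iDivUp, as suggested above, is also fine as long as every brace is matched.

```cpp
#include <algorithm>  // std::min

// Assumed definitions for the types the original code uses but never declares.
typedef unsigned char uchar;
typedef unsigned int  uint;

extern "C"
{

// Compute number of blocks to create (rounds a / b up).
int iDivUp(int a, int b)
{
    return (a % b != 0) ? (a / b + 1) : (a / b);
}

void computeNumBlocks(int numPnts, int minThreads, int &numBlocks, int &numThreads)
{
    numThreads = std::min(minThreads, numPnts);
    numBlocks  = iDivUp(numPnts, numThreads);
}

// With uchar and uint defined, this signature now parses.
// Declaration only: the CUDA body is omitted from this sketch.
void Grid_InsertParticlesCUDA(uchar* data, uint stride, uint numPoints);

}  // closes extern "C"
```

For 1000 points and a minimum of 256 threads, computeNumBlocks yields 256 threads and 4 blocks.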

Cuda error on compiling: identifier "cudamalloc" is undefined

I have some CUDA C code. When I try to compile it, nvcc reports undefined-identifier errors: identifier "cudamalloc" is undefined, identifier "cudamemcpy" is undefined.
I'm running Windows 7 with Visual Studio 10 and CUDA Toolkit 4.0.
I have installed CUDA on drive "C" and Visual Studio on drive "E", but I'm not sure whether that is the problem.
I use this command to compile:
nvcc -o ej1b ej1b.cu
and this is my program:
#include <cuda.h>
#include <cstdio>
#include <cuda_runtime_api.h>
#include <device_functions.h>
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>
const int N = 512;
const int C = 5;
void init_CPU_array(int vec[],const int N){
    unsigned int i;
    for(i = 0; i < N; i++) {
        vec[i] = i;
    }
}
__global__ void kernel(int vec[],const int N, const int C){
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if(id<N)
        vec[id] = vec[id] * C;
}
int main(){
    int vec[N];
    int vecRES[N];
    int *vecGPU;
    unsigned int cantaloc=N*sizeof(int);
    init_CPU_array(vec,N);
    cudamalloc((void**)&vecGPU,cantaloc);
    cudamemcpy(vecGPU,vec,cantaloc,cudaMemcpyHostToDevice);
    dim3 dimBlock(64);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x);
    printf("-> Variable dimBlock.x = %d\n",dimBlock.x);
    kernel<<<dimGrid, dimBlock>>>(vecGPU, N, C);
    cudaThreadSynchronize();
    cudamemcpy(vecRES,vecGPU,cantaloc,cudaMemcpyDeviceToHost);
    cudaFree(vecGPU);
    printf("%s \n","-> Resultados");
    int i;
    for(i=0;i<10;i++){
        printf("%d ",vecRES[i]);
        printf("%d \n",vec[i]);
    }
    return 0;
}
I included all of those headers because I don't know where the problem is.

If you read the documentation, you will find that the API calls are cudaMalloc and cudaMemcpy. C and C++ are case-sensitive languages, and you have the names wrong.

SSE2: _mm_mul_ps fails on OS X in case of GCC 4.2 and O0 optimization

I am trying to calculate the squared Euclidean distance between two 4D float vectors using SSE2. My OS is Mac OS X 10.7 Lion.
When I use the Apple LLVM compiler in Xcode 4.5.2, everything is fine. But when I switch to GCC 4.2 in the project's settings, I get an EXC_BAD_ACCESS error at the _mm_mul_ps operation.
When I compile the code from the command line (g++ main.cpp) without additional arguments, I get "Segmentation fault". But when I enable any optimization level other than O0 (O1, O2, O3, Os), everything works.
I cannot reproduce this issue on my Ubuntu 12.04 machine with GCC 4.6.3.
#include <stdio.h>
#include <emmintrin.h>
typedef float SPPixel[4];
float sp_squared_color_diff(const SPPixel px1, const SPPixel px2) {
    SPPixel d;
    __m128 sse_px1 = _mm_load_ps(px1);
    __m128 sse_px2 = _mm_load_ps(px2);
    sse_px1 = _mm_sub_ps(sse_px1, sse_px2);
    sse_px2 = _mm_mul_ps(sse_px1, sse_px1); // EXC_BAD_ACCESS
    _mm_store_ps(d, sse_px2);
    return d[0] + d[1] + d[2] + d[3];
}
int main(int argc, const char * argv[]) {
    SPPixel a __attribute__ ((aligned (16))) = {1, 2, 3, 4};
    SPPixel b __attribute__ ((aligned (16))) = {2, 4, 6, 8};
    float result = sp_squared_color_diff(a, b);
    printf("result = %f\n", result);
    return 0;
}

The local variable d is misaligned. Fix the alignment in the typedef for SPPixel rather than having to remember it on every definition.
Change:
typedef float SPPixel[4];
to:
typedef float SPPixel[4] __attribute__ ((aligned(16)));
and then you can also remove the __attribute__ ((aligned(16))) qualifiers in main.
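If changing the typedef is not an option, a different approach is to avoid the alignment requirement altogether by using the unaligned load and store intrinsics, _mm_loadu_ps and _mm_storeu_ps. The sketch below is a variation on the question's function, not the fix described above. The name squared_diff_unaligned is made up for illustration, and unaligned accesses may be somewhat slower on older CPUs.

```cpp
#include <emmintrin.h>

// Hypothetical variant of sp_squared_color_diff that makes no assumption
// about the alignment of its inputs or of the local buffer d.
float squared_diff_unaligned(const float* px1, const float* px2)
{
    float d[4];                          // alignment does not matter here
    __m128 v1   = _mm_loadu_ps(px1);     // unaligned load of 4 floats
    __m128 v2   = _mm_loadu_ps(px2);
    __m128 diff = _mm_sub_ps(v1, v2);    // per-component difference
    __m128 sq   = _mm_mul_ps(diff, diff);
    _mm_storeu_ps(d, sq);                // unaligned store
    return d[0] + d[1] + d[2] + d[3];
}
```

For the inputs in the question, {1, 2, 3, 4} and {2, 4, 6, 8}, the squared differences are 1, 4, 9 and 16, so the function returns 30.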

int&b = a; for gcc vs g++

I realised that gcc and g++ handle the following code differently:
#include <stdio.h>
int main(void)
{
    int a = 0;
    int& b = a;
    return 0;
}
gcc reports "parse error before &", while g++ reports no error.
I once had an interview in which it was mentioned that C and C++ compilers handle int& b differently.

That's because & has no meaning in a C type declaration - in C++, it means the variable will be a reference, but those do not exist in C.
In other words, int& b = a; simply isn't valid C code.
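To make the difference concrete, here is a small C++ sketch. The function names are made up for illustration. The reference version only compiles as C++, while the pointer version shows the form a C program would use to get the same effect.

```cpp
// C++ only: 'target' is a reference, i.e. another name for the caller's variable.
void set_via_reference(int& target, int value)
{
    target = value;
}

// Valid in both C and C++: the caller passes the address explicitly.
void set_via_pointer(int* target, int value)
{
    *target = value;
}
```

After int a = 0; int& b = a;, assigning to b changes a, because both names refer to the same object. In C you would write int* b = &a; and assign through *b instead.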

CreateThread() error

#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream.h>
#include <string.h>
void Thread1( LPVOID param)
{
    int a;
    a = *((int *)param);
    for (int i= 0; i <10; i++)
        printf("%d\n", a);
}
int main()
{
    int a =4;
    int ThreadId;
    CreateThread( 0, 0x0100, Thread1, &a, 0, &ThreadId);
    for( int i = 0; i <11; i++)
        Sleep( 1);
    return( 1);
}
This is simple code, but I cannot figure out why Visual Studio gives me this error:
error C2664: 'CreateThread' : cannot convert parameter 3 from 'void (void *)' to 'unsigned long (__stdcall *)(void *)'
None of the functions with this name in scope match the target type
Error executing cl.exe.

Define the thread function as follows:
DWORD WINAPI MyThreadProc(LPVOID lpParameter)
CreateThread() requires a function that uses the __stdcall calling convention (which is what WINAPI expands to) and returns a DWORD.
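Applied to the code in the question, the change could look roughly like this. It is a sketch, not a drop-in replacement: run_thread_demo is a made-up wrapper name, the thread ID is declared as DWORD because CreateThread expects an LPDWORD there, and the Sleep loop is replaced by WaitForSingleObject so that the local variable a is guaranteed to outlive the thread. The #else branch is only a stub so the snippet also compiles on non-Windows systems.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <windows.h>

// Thread entry point with the signature CreateThread expects:
// DWORD return type, WINAPI (__stdcall) convention, one LPVOID parameter.
DWORD WINAPI Thread1(LPVOID param)
{
    int a = *static_cast<int*>(param);
    for (int i = 0; i < 10; i++)
        printf("%d\n", a);
    return 0;
}

// Starts the thread and waits for it; returns 0 on success, 1 on failure.
int run_thread_demo()
{
    int a = 4;
    DWORD threadId = 0;  // CreateThread takes an LPDWORD, not an int*
    HANDLE h = CreateThread(NULL, 0, Thread1, &a, 0, &threadId);
    if (h == NULL)
        return 1;
    WaitForSingleObject(h, INFINITE);  // keep 'a' alive until the thread ends
    CloseHandle(h);
    return 0;
}
#else
// Non-Windows stub so the sketch still compiles elsewhere; it does nothing.
int run_thread_demo()
{
    return 0;
}
#endif
```

Calling run_thread_demo() from main on Windows should print 4 ten times and then return 0.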
