How to program in "true" CUDA (parallel processing)

I'm taking a course in advanced artificial intelligence and I must learn CUDA programming. I have seen CUDA examples on many websites, but I'm confused about something: my professor told me that CUDA is similar to C but has no for loops, for example.
Yet in all the documentation I open from NVIDIA and other websites I only see CUDA C examples. What I want to understand is where to start programming CUDA, not CUDA C or CUDA C++.
Also, is there any reference that helps me start from scratch? For example, how to declare variables, how to print to the screen, how the program is structured, how to write a function, and how to write a loop in CUDA rather than the for loop of the C or C++ language.

If you had looked harder, you would have found that CUDA isn't a programming language (like C or C++). CUDA is NVIDIA's platform for parallel programming on NVIDIA cards. CUDA C or CUDA C++ is how you use that platform from C or C++; there is also PyCUDA, for example, which lets you take advantage of CUDA from Python. The programming language is up to you.
If you just want to learn CUDA through examples, there are plenty on the internet, and yes, you have to search for CUDA C or CUDA C++. But if you want to understand the platform as well as write code, I recommend two books: one with a lot of examples, from NVIDIA, is CUDA by Example; the other, Programming Massively Parallel Processors, is more detailed and well explained.
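To make the "no for loops" remark concrete, here is a minimal CUDA C sketch (the kernel name vecAdd, the sizes, and the use of managed memory are just illustrative choices, not the only way to do it): the per-element for loop you would write on the CPU is replaced by launching one thread per element, while ordinary C constructs (variables, printf, loops) still work in the host code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread handles one element, replacing the body of a CPU for loop.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Managed (unified) memory keeps the sketch short; explicit cudaMalloc/cudaMemcpy also works.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }  // plain C loop on the host

    // Launch n threads instead of looping over n elements.
    int block = 256;
    int grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // ordinary printf still works in host code
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

So you still write C (or C++); CUDA only adds the kernel qualifiers, the thread/block indices, and the launch syntax on top of it.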

Related

How to compare OpenCL with native code performance properly?

Intel provides some advice on comparing OpenCL with native code here.
Are there any additional recommendations? I am particularly interested in whether OpenCL is usually compared with straightforward C/C++ code or whether optimizations of the sequential code are also taken into account. What is the case with intrinsic functions?

Cuda Source to Source translation using Rose compiler

I would like to know about the extent of support for cuda in rose compiler. I am trying to build a source to source translator for cuda. Is it possible using Rose compiler? Which distribution of Rose compiler should I use?
I know this has been discussed earlier (support for cuda in rose compiler), but I cannot understand whether cuda support is there or not. Rose user manual does not have much information either.
Rose has a C++ front end and a Fortran front end that seem reasonably well integrated. The Rose system design, IMHO, is not amenable to easy integration of other front-end parsers (such as you would presumably need to parse CUDA), although with enough effort you could do it. (Rose originally had only C++, and Fortran was grafted on.)
If you don't see explicit mention of CUDA in the Rose manuals, it's quite likely because it simply isn't there.
If you want to process CUDA using source-to-source transformations, you'll need both a CUDA parser and an appropriate set of transformation machinery, something like what Rose has.
I cannot offer you a CUDA parser, but my company does provide industrial-strength source-to-source program transformation systems in the form of the DMS Software Reengineering Toolkit.
DMS has been used to carry out massive transformations on large C++ systems, so I think it reasonable to say it is at least as competent as Rose for that purpose. DMS has also been used to process extremely large C and Fortran systems, and other code in Java, C#, ECMAScript, PHP, and many other languages, so I think it safe to say it is considerably easier to integrate a different front end into DMS.
CUDA, as I understand it, is a C99 derivative. DMS has a C front end with explicit support for building various C dialects; most of C99 is already built using the dialect mechanism. That might be a pretty good starting point.
You could try other tools such as ANTLR as an alternative, but I think it will soon become obvious that ANTLR, Rose, and DMS are in very different leagues in terms of their ability to parse, analyze, and transform complex systems of real code.

GPU Programming?

I'm new to the GPU programming world. I've tried reading Wikipedia and Googling, but I still have several questions:
I downloaded some GPU examples for CUDA. There were some .cu files and some .cpp files, but all the code was normal C/C++ code, apart from some odd functions like cudaMemcpyToSymbol; the rest was pure C code. The question is: is the .cu code compiled with nvcc and then linked with gcc? Or how is it built?
If I write something to run on a GPU, will it run on ALL GPUs, or just CUDA ones? Or is there one way to write for CUDA, another for ATI, and another that works for both?
To answer your second question:
OpenCL is the (only) way to go if you want to write platform-independent GPGPU code.
ATI's website actually has a lot of resources for OpenCL if you search a little, and their example projects are very easy to modify into what you need, or just to understand the code.
The OpenCL spec and reference pages are also a very good source of knowledge:
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/
http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf
There are also a lot of talks I would recommend that explain some of the core concepts and how to write fast code (much of which applies to CUDA too).
To almost answer your first question:
In OpenCL, the code is compiled at runtime for the specific GPU you're using (to guarantee speed).
You probably want to do some background reading on CUDA - it's not something you can just pick up by looking at a few code samples. There are about 3 different CUDA books on Amazon now, and there is a lot of reference material at http://developer.nvidia.com.
To answer your questions:
yes, .cu files are compiled with nvcc to an intermediate form (PTX); this is subsequently converted to GPU-specific code at run time
the generated code will run on a subset of NVIDIA GPUs, with the size of the subset depending on which CUDA capabilities you use in your code
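As a rough illustration of that last point, here is a small sketch (the file name and nvcc flags are illustrative, not prescriptive) that queries each device's compute capability at run time; the nvcc line in the comment shows how one binary can embed both machine code for a specific architecture and PTX that the driver JIT-compiles for newer GPUs.

```cuda
// Compile (illustrative flags): nvcc -gencode arch=compute_50,code=sm_50 \
//                                    -gencode arch=compute_50,code=compute_50 query.cu -o query
// The first -gencode embeds SM 5.0 machine code; the second embeds PTX that the
// driver can JIT-compile at run time for newer GPUs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // prop.major/prop.minor is the compute capability, which determines
        // which CUDA features (and which embedded code) this GPU can run.
        printf("Device %d: %s, compute capability %d.%d\n",
               d, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```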
Completing the answer given by @nulvinge, I'd say that OpenCL is to GPU programming what OpenGL is to GPU rendering. But it's not the only option for multi-architecture development: you could also use DirectCompute, though I wouldn't say it's the best option, only the one to pick if you want your code running on every DirectX11-compatible GPU, which includes some Intel graphics chips too, right?
But even if you are thinking of doing GPU programming with OpenCL, do not forget to study the architecture of the platforms you're using. CPUs, ATI GPUs, and NVIDIA GPUs have big differences, and your code needs to be tuned for each platform you target if you want to get the most out of it.
Fortunately, both NVIDIA and AMD have programming guides to help you :)
In addition to the previous answers: for CUDA you need an NVIDIA card/GPU, unless you have access to a remote one, in which case I would recommend this course from Coursera:
Heterogeneous Parallel Programming
It gives not only an introduction to CUDA and OpenCL, the memory model, tiling, handling boundary conditions, and performance considerations, but also covers directive-based approaches such as OpenACC, a high-level way of expressing parallelism in your code that leaves most of the parallel-programming work to the compiler (good to start with). The course also has an online platform where you can use their GPUs, which is a good way to start GPU programming without worrying about the software/hardware setup.
If you want to write portable code that you can execute on different GPU devices and also on CPUs, you need to use OpenCL.
In either case, to configure and launch your kernel you need to write host code in C. The host code tends to be shorter for CUDA kernels than for the OpenCL equivalent.
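To give a rough idea of that host-side work, here is a minimal CUDA host-code sketch (the scale kernel and the sizes are placeholders, not anything canonical): the host allocates device memory, copies data in, chooses the launch configuration, launches the kernel, and copies the result back. An OpenCL host program performs the same steps, but through a more verbose C API (platform, context, queue, program build, kernel arguments).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// A trivial kernel, just to have something to configure and launch.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side setup: allocate and initialize input.
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Device-side configuration: allocate, copy in, choose launch shape, launch, copy out.
    float *d;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    scale<<<grid, block>>>(d, 2.0f, n);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);  // expect 2.0

    cudaFree(d);
    free(h);
    return 0;
}
```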

GPU Programming

I want to do some GPU programming. What's the way to go here? I want to learn something that is "open", cross-platform, and a "higher-level" language. I don't want to be locked into one GPU vendor, nor into an OS, platform, etc.
What are my choices here? CUDA, OpenCL, OpenMP, something else? What are their pros and cons?
What about G/HLSL and PhysX?
I'm looking at doing "general purpose" programming: some math, number crunching, simulations, etc. Maybe spit out some pretty graphics, but not graphics programming specifically.
The answer marked correct is now outdated and incorrect. In particular, OpenMP 4.0 supports GPU acceleration.
OpenMP is CPU only, but easy to implement; CUDA is basically GPU only. ATI Stream supports both, but only on ATI/AMD GPUs. OpenCL is your only "open" option that supports both.
Nowadays - 2013/2014 - there is C++ Accelerated Massive Parallelism (AMP) from Microsoft. This is a high-level language that compiles to High Level Shader Language (HLSL), so you do not have to write kernel code, etc.
'How to learn' is found here (click!)
An introduction video is found here (click!)
A simple and easy-to-read comparison between OpenCL and C++ AMP was done by the AMD folks and is found here (click!).
GPU support for OpenMP will be available in the near future:
http://openmp.org/sc14/Booth-Sam-IBM.pdf
If you want to get involved with GPUs at a higher level than OpenCL, you might have a look at MATLAB. It is possible to program GPUs via MATLAB, and you do not need to learn lower-level models such as OpenCL and CUDA.
CUDA will be more efficient, since you are probably going to program an NVIDIA card. However, OpenCL is the standard for GPGPU, and the way you code is pretty similar. Although you might not find CUDA or OpenCL very difficult to use, you will find it much harder to optimize them.
I hope this helps.
OpenCL is open, but I've heard that a downside to this is the lack of documentation. ATI might be the better choice between NVIDIA and ATI, since it was reportedly faster in 2009, but I'm not sure whether those stats are still correct.

OCaml and Scheme for game development

This is a question targeted more towards language features than coding.
Could you tell me which would be the better language (OCaml or Scheme?) to use for basic game development?
My knowledge of both Scheme and OCaml is pretty basic; I find both equally challenging to work with and was unable to determine which would be better with respect to scalability and ease of use.
If any of you have extensive development experience with either of the two languages, please give me your input.
Any input is appreciated.
Thank you.
Both OCaml and Racket (PLT Scheme) have OpenGL bindings. It looks like Racket doesn't have SDL bindings, however, which may or may not be important to you.
Racket uses a JIT compiler; OCaml can be compiled to native code or bytecode (and there are a couple of JIT compilers for OCaml).
OCaml is faster than Racket for most of the benchmarks in the Computer Language Benchmarks Game.*
Personally, I would choose OCaml. It can be compiled to native code, executes faster, and has bindings to SDL (which provides input, sound, and buffered 2D graphics, among other things).
Another option to consider is F#, which is another ML dialect. F# can take advantage of the XNA framework. XNA will limit you to Windows, however (from what I understand, F# can only be used in DLLs on the Xbox; there are Mono implementations of XNA, but I'm not sure how complete they are).
* The benchmarks game can only give you a rough idea of the relative efficiency of a language's implementation. A game is much more complex than the tests used by the benchmarks game.
