I'm trying to use CUDA with Qt Creator, Win7, and VS2012 compiler. I have experience with Qt on Windows, but have been unsuccessful setting up to integrate CUDA code into a Qt project. I've tried several posted solutions (such as Compiling Cuda code in Qt Creator on Windows), but have had no success. I finally decided to simplify and base my code on this blog post: https://cudaspace.wordpress.com/2012/07/05/qt-creator-cuda-linux-review/ but am still having issues.
Currently, I get the error "LNK1104: cannot open file 'obj\cuda_code.obj'"
My .pro file is:
QT += core
QT -= gui
TARGET = QtCuda
CONFIG += console
CONFIG -= app_bundle
TEMPLATE = app
SOURCES += main.cpp \
cuda_code.cu
# project build directories
DESTDIR = $$PWD
OBJECTS_DIR = $$DESTDIR/obj
# C++ flags
QMAKE_CXXFLAGS_RELEASE =-O3
# Cuda sources
CUDA_SOURCES += cuda_code.cu
# Path to cuda toolkit install
CUDA_DIR = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v7.0"
# Path to header and libs files
INCLUDEPATH += $$CUDA_DIR/include
QMAKE_LIBDIR += $$CUDA_DIR/lib/x64
# libs used in your code
LIBS += -lcudart -lcuda
# GPU architecture
CUDA_ARCH = sm_50
# Here are some NVCC flags I've always used by default.
NVCCFLAGS = --compiler-options -use_fast_math
# Prepare the extra compiler configuration (taken from the nvidia forum - i'm not an expert in this part)
CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -O3 -arch=$$CUDA_ARCH -c $$NVCCFLAGS \
$$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT} \
2>&1 | sed -r \"s/\\(([0-9]+)\\)/:\\1/g\" 1>&2
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -O3 -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
cuda.input = $$CUDA_SOURCES
cuda.output = $$OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.obj
# Tell Qt that we want add more stuff to the Makefile
QMAKE_EXTRA_COMPILERS += cuda
My main.cpp
#include <QtCore/QCoreApplication>
#include <iostream>
using namespace std;
#include <cuda_runtime.h>
extern "C"
cudaError_t cuda_main();
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
cudaError_t cuerr = cuda_main();
if (cuerr != cudaSuccess) cout << "CUDA Error: " << cudaGetErrorString( cuerr ) << endl;
return a.exec();
}
My cuda file (cuda_code.cu):
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
extern "C"
cudaError_t cuda_main()
{
// generate 16M random numbers on the host
thrust::host_vector<int> h_vec(1 << 24);
thrust::generate(h_vec.begin(), h_vec.end(), rand);
// transfer data to the device
thrust::device_vector<int> d_vec = h_vec;
// sort data on the device (805 Mkeys/sec on GeForce GTX 480)
thrust::sort(d_vec.begin(), d_vec.end());
// transfer data back to host
thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
return cudaGetLastError();
}
The OP was able to compile and link successfully after making the following changes:
1) In the .pro file, added
MSVCRT_LINK_FLAG_DEBUG = "/MDd"
MSVCRT_LINK_FLAG_RELEASE = "/MD"
and appended one of the following to the cuda.commands statement, depending on the build configuration:
-Xcompiler $$MSVCRT_LINK_FLAG_DEBUG -or- -Xcompiler $$MSVCRT_LINK_FLAG_RELEASE
as described in:
Compile cuda file error: "runtime library" mismatch value 'MDd_DynamicDebug' doesn't match value 'MTd_StaticDebug' in vectorAddition_cuda.o
2) The generated makefile also had a very strange detail that I had to fix by hand. I hope that there is a real fix for this, but I haven't been able to figure it out.
At the top of the makefile, there are several definitions, including one for LIBS. After close inspection of this definition, I found that there was an extra set of quotation marks in the specification of library locations. Like this:
LIBS = /LIBPATH:"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64" ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64"\cuda.lib" ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64"\cudart.lib" /LIBPATH:C:\Qt\5.2.1\msvc2012_64_opengl\lib C:\Qt\5.2.1\msvc2012_64_opengl\lib\Qt5Cored.lib
If you look closely, you can see the extra set of quotation marks in the locations for cuda.lib and cudart.lib. I couldn't figure out what might be causing this (probably something in my .pro file), but if I manually removed the extra quotations, the compile/link worked. Here's the corrected line in the makefile:
LIBS = /LIBPATH:"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64\cuda.lib" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64\cudart.lib" /LIBPATH:C:\Qt\5.2.1\msvc2012_64_opengl\lib C:\Qt\5.2.1\msvc2012_64_opengl\lib\Qt5Cored.lib
I would sure like to be able to fix this in my .pro file so that these extra quotations didn't appear. Suggestions would be appreciated.
For reference, here's my latest .pro file.
QT += core
QT -= gui
TARGET = QtCuda
CONFIG += console
CONFIG -= app_bundle
TEMPLATE = app
SOURCES += main.cpp \
cuda_code.cu
# project build directories
DESTDIR = $$PWD
OBJECTS_DIR = $$DESTDIR/obj
# C++ flags
QMAKE_CXXFLAGS_RELEASE =-O3
# Cuda sources
CUDA_SOURCES += cuda_code.cu
# Path to cuda toolkit install
CUDA_DIR = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v7.0"
# Path to header and libs files
INCLUDEPATH += $$CUDA_DIR/include
QMAKE_LIBDIR += $$CUDA_DIR/lib/x64
SYSTEM_TYPE = 64 # '32' or '64', depending on your system
# libs used in your code
LIBS += -lcuda -lcudart
# GPU architecture
CUDA_ARCH = sm_50
# Here are some NVCC flags I've always used by default.
NVCC_OPTIONS = --use_fast_math
# Prepare the extra compiler configuration (taken from the nvidia forum - i'm not an expert in this part)
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
# MSVCRT link option (static or dynamic, it must be the same with your Qt SDK link option)
MSVCRT_LINK_FLAG_DEBUG = "/MDd"
MSVCRT_LINK_FLAG_RELEASE = "/MD"
# Tell Qt that we want add more stuff to the Makefile
QMAKE_EXTRA_COMPILERS += cuda
# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
# Debug mode
cuda_d.input = CUDA_SOURCES
cuda_d.output = $$OBJECTS_DIR/${QMAKE_FILE_BASE}.obj
cuda_d.commands = $$CUDA_DIR/bin/nvcc.exe -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$LIBS --machine $$SYSTEM_TYPE \
-arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME} -Xcompiler $$MSVCRT_LINK_FLAG_DEBUG
cuda_d.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
# Release mode
cuda.input = CUDA_SOURCES
cuda.output = $$OBJECTS_DIR/${QMAKE_FILE_BASE}.obj
cuda.commands = $$CUDA_DIR/bin/nvcc.exe $$NVCC_OPTIONS $$CUDA_INC $$LIBS --machine $$SYSTEM_TYPE \
-arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME} -Xcompiler $$MSVCRT_LINK_FLAG_RELEASE
cuda.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda
}
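The comment on MSVCRT_LINK_FLAG_* above says the runtime option passed through nvcc must match the one Qt was built with. As a rough diagnostic sketch (not part of the OP's project), the function below reports which runtime switch a translation unit was compiled with, based on MSVC's predefined macros _DLL and _DEBUG. The name crtFlavor is hypothetical; the idea is to compile the same snippet once through nvcc and once through the normal C++ compiler and compare the results.

```cpp
#include <cassert>
#include <string>

// Reports the MSVC runtime-library switch implied by the predefined macros:
//   _DLL   is defined for /MD and /MDd (DLL runtime)
//   _DEBUG is defined for /MDd and /MTd (debug runtime)
// On other compilers the distinction does not exist, so a placeholder is returned.
std::string crtFlavor()
{
#if defined(_MSC_VER)
#  if defined(_DLL) && defined(_DEBUG)
    return "/MDd";
#  elif defined(_DLL)
    return "/MD";
#  elif defined(_DEBUG)
    return "/MTd";
#  else
    return "/MT";
#  endif
#else
    return "n/a (not MSVC)";
#endif
}
```

If the .cu side and the .cpp side report different values, that is the mismatch the linker complains about, and the -Xcompiler flag in the nvcc command is the place to fix it.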
[Note: this answer has been created from an edit to the question which included the solution. It has been added as a community wiki entry to get the question off the unanswered list for the CUDA tag]
I am trying to link guile to an Rcpp file. It seems like things compile but there is an error when loading:
sourceCpp("test_2.cpp", rebuild = TRUE, showOutput = TRUE)
/usr/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'test_2.cpp'
g++-10 -I"/usr/share/R/include" -DNDEBUG -I"/home/matias/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include" -I"/home/matias/Documentos/Program/R/guile" -fpic -O3 -march=native -mtune=native -fPIC -pthread -I"/usr/include/guile/3.0" -c test_2.cpp -o test_2.o
g++-10 -shared -L/usr/lib/R/lib -lm -ldl -lgmpxx -lgmp -lmpfr -lmpc -lguile-3.0 -lgc -o sourceCpp_2.so test_2.o -L/usr/lib/R/lib -lR
Error in dyn.load("/tmp/Rtmpm2flY8/sourceCpp-x86_64-pc-linux-gnu-1.0.5/sourcecpp_29e2d33505085/sourceCpp_2.so") :
unable to load shared object '/tmp/Rtmpm2flY8/sourceCpp-x86_64-pc-linux-gnu-1.0.5/sourcecpp_29e2d33505085/sourceCpp_2.so':
/tmp/Rtmpm2flY8/sourceCpp-x86_64-pc-linux-gnu-1.0.5/sourcecpp_29e2d33505085/sourceCpp_2.so: undefined symbol: scm_init_guile
The linking works fine if I remove the Rcpp header and build directly with g++ instead.
My Makevars look like this:
CXX = g++-10
CXXFLAGS = -O3 -march=native -mtune=native -fPIC -pthread -I"/usr/include/guile/3.0"
CXXSTD = -std=c++11
LDFLAGS = -lm -ldl -lgmpxx -lgmp -lmpfr -lmpc -lguile-3.0 -lgc
The .cpp file:
#include <Rcpp.h>
#include <stdio.h>
#include <libguile.h>
using namespace Rcpp;
// [[Rcpp::export]]
int test_guile() {
SCM func, func2;
scm_init_guile();
scm_c_primitive_load("script.scm");
func = scm_variable_ref(scm_c_lookup("simple-func"));
func2 = scm_variable_ref(scm_c_lookup("quick-test"));
scm_call_0(func);
scm_call_0(func2);
return 0;
}
You are so, so close. You essentially solved this. I just took your file, made a small modification of making the script an argument and (as you didn't post script.scm) commented out the content-specific stuff. We still load it though:
#include <Rcpp.h>
#include <stdio.h>
#include <libguile.h>
using namespace Rcpp;
// [[Rcpp::export]]
int test_guile(std::string file) {
SCM func, func2;
scm_init_guile();
scm_c_primitive_load(file.c_str());
//func = scm_variable_ref(scm_c_lookup("simple-func"));
//func2 = scm_variable_ref(scm_c_lookup("quick-test"));
//scm_call_0(func);
//scm_call_0(func2);
return 0;
}
Similarly, I just added a src/Makevars to the package created by Rcpp.package.skeleton(). This is not good enough to ship, as you would need some minimal configure (or similar) logic to get these values from guile-config-3.0 or the like. But it passes the litmus test. Note that PKG_LIBS ends up after the object files on the link line, whereas your LDFLAGS put the libraries before test_2.o, which is likely why the symbol went unresolved. C++11 is already the default under R 4.0.*, and the compiler on my box is recent anyway, so we just have this (after removing a few GNU GMP and related parts we do not need):
PKG_CXXFLAGS = -I"/usr/include/guile/3.0"
PKG_LIBS = -lguile-3.0 -lgc
This now builds, installs, and runs just fine:
> file <- system.file("guile", "script.scm", package="RcppGuile")
> RcppGuile::test_guile(file)
[1] 0
>
For reference, I committed and pushed the entire example package here. If you provide a pointer to script.scm we can add that too.
Edit: A few seconds of googling leads to the script.scm you may have used so now we have a fully working example with a working embedded Guile interpreter:
> library(RcppGuile)
> test_guile(system.file("guile", "script.scm", package="RcppGuile"))
Script called, now I can change this
Adding another function, can modify without recompilation
Called this, without recompiling the C code
[1] 0
>
I'm having problems trying to integrate Qt with CUDA. I am running on a 64-bit Mac with the 64-bit CUDA toolkit installed; however, when I try to build my code, the error ld: file not found: @rpath/CUDA.framework/Versions/A/CUDA for architecture x86_64 is thrown.
I have verified all my paths but the same error is consistently thrown. My .pro configuration code is as follows:
QT += core gui
QT += multimedia
QT += multimediawidgets
QT += concurrent
greaterThan(QT_MAJOR_VERSION, 4): QT += widgets
TARGET = WebcamFilter
TEMPLATE = app
SOURCES += main.cpp\
mainwindow.cpp \
camerafeed.cpp
HEADERS += mainwindow.h \
camerafeed.h
FORMS += mainwindow.ui
# CUDA Resources
CUDA_SOURCES += gaussian.cu
CUDA_DIR = /usr/local/cuda
# Path to header and lib files
INCLUDEPATH += $$CUDA_DIR/include
QMAKE_LIBDIR += $$CUDA_DIR/lib
# Libs used for source code
LIBS += -lcudart -lcuda
# GPU Architecture
CUDA_ARCH = sm_20
# Custom flags for nvcc
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v
# Prepare extra compiler configuration
CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -O3 -arch=$$CUDA_ARCH -c $$NVCCFLAGS \
$$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT} \
2>&1 | sed -r \"s/\\(([0-9]+)\\)/:\\1/g\" 1>&2
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -O3 -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
# Tell Qt that we want add more stuff to the Makefile
QMAKE_EXTRA_COMPILERS += cuda
I came across this problem a few months ago (plus some other issues after this was fixed) so I figured I'd just post a fully working QT/CUDA example now that I have it mostly figured out. I pulled most of the .pro file from a larger project for both Linux and Mac (CUDA stuff is in the gpu folder) but this bit of code has only been tested on OS X.
I'm currently using:
CUDA 7.0 driver V7.0.27
OS X Yosemite 10.10.3
QT 5.3.1
If you haven't updated recently make sure the CUDA deviceQuery and bandwidthTest samples are still working before trying this code.
The .pro file below might be all you need to solve your problems but the C++ code is below as well. The code comments do most of the explaining.
qtcuda.pro
#-------------------------------------------------
#
# Project created by QtCreator 2015-05-02T02:37:39
#
#-------------------------------------------------
QT += core gui
greaterThan(QT_MAJOR_VERSION, 4): QT += widgets
TARGET = qtcuda
TEMPLATE = app
# project build directories (if not using shadow build)
DESTDIR = $$system(pwd)
BUILDDIR = $$DESTDIR/build
MOC_DIR = $$BUILDDIR # moc_... files
UI_DIR = $$BUILDDIR # ui_mainwindow.cpp
OBJECTS_DIR = $$BUILDDIR/bin # .o binary files
SOURCES += main.cpp\
mainwindow.cpp
HEADERS += mainwindow.h
FORMS += mainwindow.ui
# NOTE: C++ flags are needed here for
# the CUDA Thrust library
############### UNIX FLAGS #####################
unix {
QMAKE_CXXFLAGS += -std=c++11
}
############### MAC FLAGS #####################
macx {
# libs that don't get passed to nvcc (we'll remove them from LIBS later)
NON_CUDA_LIBS += -stdlib=libc++
LIBS += $$NON_CUDA_LIBS
QMAKE_CXXFLAGS += -stdlib=libc++ -mmacosx-version-min=10.7
QMAKE_LFLAGS += -mmacosx-version-min=10.7
QMAKE_MACOSX_DEPLOYMENT_TARGET = 10.7
# specific to computers without older sdks
MAC_SDK = /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/
if( exists( $$MAC_SDK) ) {
QMAKE_MAC_SDK = macosx10.9 # lowest sdk on my computer :/
}
# don't show warnings for c++11 extensions
QMAKE_CXXFLAGS += -Wno-c++11-extensions
}
################### CUDA ###################### (similar to your setup)
unix:!macx {
SED_STUFF = 2>&1 | sed -r \"s/\\(([0-9]+)\\)/:\\1/g\" 1>&2
}
macx {
SED_STUFF = 2>&1 | sed -E \"s/\\(([0-9]+)\\)/:\\1/g\" 1>&2
}
CUDA_DIR = /usr/local/cuda
# make sure cuda is available on the computer
if ( exists( $$CUDA_DIR/ ) ) {
message( "Configuring for cuda...");
DEFINES += CUDA_7 # same as putting this in code -> #define CUDA_7
# Cuda sources
CUDA_SOURCES += cuda/wrappers.cu
# show files in working tree
OTHER_FILES += cuda/wrappers.cu \
cuda/wrappers.cuh \
cuda/helper_cuda.h
# Path to cuda install
CUDA_LIB = $$CUDA_DIR/lib
# Path to header and lib files
INCLUDEPATH += $$CUDA_DIR/include \
cuda # my cuda files
QMAKE_LIBDIR += $$CUDA_LIB
# prevents warnings from code we didn't write
QMAKE_CXXFLAGS += -isystem $$CUDA_DIR/include
LIBS += -lcudart # add other cuda libs here (-lcublas -lcurand, etc.)
# SPECIFY THE RPATH FOR NVCC!!!!! (your problem...previously my problem)
QMAKE_LFLAGS += -Wl,-rpath,$$CUDA_LIB
NVCCFLAGS = -Xlinker -rpath,$$CUDA_LIB
# libs used in the code
CUDA_LIBS = $$LIBS
CUDA_LIBS -= $$NON_CUDA_LIBS # remove libs nvcc won't recognize
# GPU architecture (might be a way to detect this somehow instead of hardcoding)
CUDA_ARCH = sm_20 # <- based on specs from your code. This was tested with sm_30
# Some default NVCC flags
NVCCFLAGS += --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v --std=c++11
# Prepare the extra compiler configuration (taken from the nvidia forum)
CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -O3 -arch=$$CUDA_ARCH -c $$NVCCFLAGS \
$$CUDA_INC $$CUDA_LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT} \
$$SED_STUFF
# nvcc error printout format ever so slightly different from gcc
# http://forums.nvidia.com/index.php?showtopic=171651
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -O3 -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
# Tell Qt that we want add more stuff to the Makefile
QMAKE_EXTRA_COMPILERS += cuda
} # endif CUDA
The following two files are composed of extern functions used to execute CUDA code. The .cu file defines functions that contain CUDA code and gets compiled with NVCC (as specified in the .pro file). The .cuh file is used as a header file and simply declares the same functions so they can be referenced by C++ files. Only wrappers.cuh needs to be included in the C++ code.
Note: The referenced helper_cuda.h file can be found here
Note: This project assumes wrappers.cuh, wrappers.cu, and helper_cuda.h are kept in a folder labeled cuda within the project directory.
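To make the declaration/definition split concrete, here is a minimal host-only sketch of the same pattern in a single file. It is not the project's code: sumNumbersHost is a hypothetical name, and std::accumulate stands in for the thrust::reduce call so the snippet compiles without CUDA.

```cpp
#include <cassert>
#include <numeric>

typedef unsigned int uint;

// Part 1: what would live in the header (the role wrappers.cuh plays).
// extern "C" disables C++ name mangling, so the symbol has the same name
// whether the declaration is seen by nvcc or by the host C++ compiler.
extern "C" {
uint sumNumbersHost(const uint *numbers, uint n); // hypothetical name
}

// Part 2: what would live in the implementation file (the role wrappers.cu plays).
// Host-only stand-in: std::accumulate replaces the GPU reduction.
extern "C" uint sumNumbersHost(const uint *numbers, uint n)
{
    assert(numbers != nullptr || n == 0); // a null pointer is only valid for n == 0
    return std::accumulate(numbers, numbers + n, 0u);
}
```

A C++ caller only needs the declaration from part 1; for the array {1, 2, 3, 4} and n = 4 the call returns 10.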
cuda/wrappers.cuh
#ifndef WRAPPERS_CUH
#define WRAPPERS_CUH
typedef unsigned int uint;
extern "C"
{
void cudaInit();
void allocateArray(void **devPtr, int size);
void freeArray(void *devPtr);
void copyArrayToDevice(void *device, const void *host, int offset, int size);
void copyArrayFromDevice(void *host, const void *device, int size);
uint sumNumbers(uint *dNumbers, uint n);
// not used here but useful when calling kernel functions
void computeGridSize(uint n, uint blockSize, uint &numBlocks, uint &numThreads);
}
#endif // WRAPPERS_CUH
cuda/wrappers.cu
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include "helper_cuda.h"
typedef unsigned int uint;
extern "C"
{
void cudaInit()
{
int devID;
// use device with highest Gflops/s
devID = findCudaDevice();
if (devID < 0)
{
printf("No CUDA Capable devices found, exiting...\n");
exit(EXIT_SUCCESS);
}
}
void allocateArray(void **devPtr, int size) // matches the declaration in wrappers.cuh
{
checkCudaErrors(cudaMalloc(devPtr, size));
}
void freeArray(void *devPtr)
{
checkCudaErrors(cudaFree(devPtr));
}
void copyArrayToDevice(void *device, const void *host, int offset, int size)
{
checkCudaErrors(cudaMemcpy((char *) device + offset, host, size, cudaMemcpyHostToDevice));
}
void copyArrayFromDevice(void *host, const void *device, int size)
{
checkCudaErrors(cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost));
}
uint sumNumbers(uint *dNumbers, uint n)
{
// simple reduction from 1 to n
thrust::device_ptr<uint> dp_numbers(dNumbers);
return thrust::reduce(dp_numbers, dp_numbers + n);
}
//Round a / b to nearest higher integer value
uint iDivUp(uint a, uint b)
{
return (a % b != 0) ? (a / b + 1) : (a / b);
}
// compute grid and thread block size for a given number of elements
void computeGridSize(uint n, uint blockSize, uint &numBlocks, uint &numThreads)
{
numThreads = min(blockSize, n);
numBlocks = iDivUp(n, numThreads);
}
}
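Since computeGridSize is not exercised by this project, here is a host-only sketch of the same arithmetic with a couple of worked values. It mirrors the two helpers above but uses std::min so it builds with a plain C++ compiler; the n > 0 precondition is my assumption, made explicit because n == 0 would lead to a division by zero.

```cpp
#include <algorithm>
#include <cassert>

typedef unsigned int uint;

// Ceiling division: a / b rounded up to the next integer (b must be non-zero).
uint iDivUp(uint a, uint b)
{
    assert(b != 0);
    return (a % b != 0) ? (a / b + 1) : (a / b);
}

// Use at most blockSize threads per block, then enough blocks to cover n elements.
// Assumes n > 0 and blockSize > 0.
void computeGridSize(uint n, uint blockSize, uint &numBlocks, uint &numThreads)
{
    numThreads = std::min(blockSize, n);
    numBlocks = iDivUp(n, numThreads);
}
```

For n = 1000 and blockSize = 256 this gives 256 threads and 4 blocks (3 full blocks plus one partial); for n = 100 it gives a single block of 100 threads.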
The next three files create a simple QT window and check for mouse events. Every time the mouse is moved the X and Y pixel positions are added together to create n. Then a CUDA function is used to find 1 + 2 + ... + n (yes this is weird and random; the point was to show CUDA running in a quick and easy way).
So if the mouse is at (23, 45) then:
n = (23 + 45) = 68 and
1 + 2 + ... + n = 2346
This is then displayed at the bottom of the window.
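The arithmetic can be checked on the host without a GPU. The sketch below is only an illustration (sumFirstN and closedForm are hypothetical names): it builds the same {1, 2, ..., 5000} array, sums the first n entries, and compares against the closed form n(n+1)/2 that the assert in mainwindow.cpp relies on.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

typedef unsigned int uint;

// Sum the first n entries of {1, 2, ..., maxNumbers} on the host.
uint sumFirstN(uint n, uint maxNumbers = 5000)
{
    assert(n <= maxNumbers);
    std::vector<uint> numbers(maxNumbers);
    std::iota(numbers.begin(), numbers.end(), 1u); // fills 1, 2, ..., maxNumbers
    return std::accumulate(numbers.begin(), numbers.begin() + n, 0u);
}

// Closed form for 1 + 2 + ... + n.
uint closedForm(uint n)
{
    return n * (n + 1) / 2;
}
```

With n = 68 both functions return 2346, matching the example above.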
main.cpp
#include "mainwindow.h"
#include <QApplication>
int main(int argc, char *argv[])
{
QApplication a(argc, argv);
MainWindow w;
w.show();
return a.exec();
}
mainwindow.h
#ifndef MAINWINDOW_H
#define MAINWINDOW_H
#include <QMainWindow>
namespace Ui {
class MainWindow;
}
class MainWindow : public QMainWindow
{
Q_OBJECT
public:
explicit MainWindow(QWidget *parent = 0);
~MainWindow();
// events are passed here
virtual bool eventFilter(QObject *obj, QEvent *event);
private:
Ui::MainWindow *ui;
uint *m_dNumbers; // device array
};
#endif // MAINWINDOW_H
mainwindow.cpp
#include "mainwindow.h"
#include "ui_mainwindow.h"
#include <QEvent>
#include <QMouseEvent>
#include <algorithm> // std::min
#include <assert.h>
#include "wrappers.cuh"
const uint MAX_NUMBERS = 5000;
MainWindow::MainWindow(QWidget *parent) :
QMainWindow(parent),
ui(new Ui::MainWindow)
{
// basic ui setup and event filter for mouse movements
ui->setupUi(this);
qApp->installEventFilter(this);
// create a host array and initialize it to {1, 2, 3, ..., MAX_NUMBERS}
uint hNumbers[MAX_NUMBERS];
for (uint i = 0; i < MAX_NUMBERS; i++)
{
hNumbers[i] = i + 1;
}
// CUDA FUNCTIONS:
cudaInit(); // initialize the cuda device
allocateArray((void**)&m_dNumbers, MAX_NUMBERS*sizeof(int)); // allocate device array
copyArrayToDevice(m_dNumbers, hNumbers, 0, MAX_NUMBERS*sizeof(int)); // copy host array to device array
}
MainWindow::~MainWindow()
{
// CUDA FUNCTION: free device memory
freeArray(m_dNumbers);
delete ui;
}
// used to detect mouse movement events
bool MainWindow::eventFilter(QObject *, QEvent *event)
{
if (event->type() == QEvent::MouseMove)
{
// find mouseX + mouseY
QMouseEvent *mouseEvent = static_cast<QMouseEvent*>(event);
QPoint p = mouseEvent->pos();
uint n = std::min((uint)(p.x() + p.y()), MAX_NUMBERS);
// CUDA FUNCTION:
// compute the sum of 1 + 2 + 3 + ... + n
uint sum = sumNumbers(m_dNumbers, n);
// check that the sum is correct
assert(sum == ( (n * (n+1) ) / 2 ) );
// show the sum at the bottom of the window
statusBar()->showMessage(QString("Mouse pos: (%1, %2) Sum from 0 to %3 = %4").arg(p.x()).arg(p.y()).arg(n).arg(sum));
}
return false;
}
And last but not least the .ui file if you want to actually build and run the project:
mainwindow.ui
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>MainWindow</class>
<widget class="QMainWindow" name="MainWindow">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>400</width>
<height>300</height>
</rect>
</property>
<property name="windowTitle">
<string>MainWindow</string>
</property>
<widget class="QWidget" name="centralWidget"/>
<widget class="QMenuBar" name="menuBar">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>400</width>
<height>22</height>
</rect>
</property>
</widget>
<widget class="QToolBar" name="mainToolBar">
<attribute name="toolBarArea">
<enum>TopToolBarArea</enum>
</attribute>
<attribute name="toolBarBreak">
<bool>false</bool>
</attribute>
</widget>
<widget class="QStatusBar" name="statusBar"/>
</widget>
<layoutdefault spacing="6" margin="11"/>
<resources/>
<connections/>
</ui>
I know the QT/CUDA process can be annoying and it's been half a year of silence since you asked the question but hopefully this helps.
There are a number of questions already on this subject, but despite that and the help on SourceForge, I cannot generate a .gcno or .gcda file.
sample question
2nd question
My make file compiles and runs my unit tests, but does not generate any output files. Is there something obviously wrong here? Commented out lines are things I have tried before.
CPP_PLATFORM = Gcc
#CPP_PLATFORM = Clang
#CPPUTEST_CPPFLAGS += -DSUPPRESS_PRINTING
#CPPUTEST_CPPFLAGS += -fprofile-arcs
#CPPUTEST_CPPFLAGS += -ftest-coverage
#GCOVFLAGS = -fprofile-arcs -ftest-coverage
#CPPUTEST_LDFLAGS += -lssl
#CPPUTEST_LDFLAGS += -lcrypto
#CPPUTEST_LDFLAGS += -fprofile-arcs
CPPUTEST_CPPFLAGS = -DSUPPRESS_PRINTING
CPPUTEST_CPPFLAGS = -fprofile-arcs:$(CPPUTEST_CPPFLAGS)
CPPUTEST_CPPFLAGS = -ftest-coverage:$(CPPUTEST_CPPFLAGS)
CPPUTEST_LDFLAGS = -lssl
CPPUTEST_LDFLAGS = -lcrypto:$(CPPUTEST_LDFLAGS)
CPPUTEST_LDFLAGS = -fprofile-arcs:$(CPPUTEST_LDFLAGS)
CPPUTEST_CPPFLAGS += -g -O0 --coverage
#CPPUTEST_CPPFLAGS += -fprofile-arcs
#CPPUTEST_CPPFLAGS += -ftest-coverage
CPPUTEST_LDFLAGS += -lprofile_rt
Either -g -O0 --coverage or -fprofile-arcs -ftest-coverage, combined with the link flag, does the trick. The key was the link flag.
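For anyone who wants to confirm the flags outside a larger build, a tiny sketch like the following should be enough to produce both file types. The file name classify.cpp and the function names are hypothetical, and the commands in the comments assume GCC (where --coverage at link time pulls in the gcov runtime; -lprofile_rt above is the Clang-side equivalent on my setup).

```cpp
#include <cassert>

// Small function with more than one branch, so the report has something to show.
int classify(int x)
{
    if (x < 0) {
        return -1; // reached only for negative input
    }
    return x == 0 ? 0 : 1;
}

// Exercises every branch; in a real project the unit tests would do this.
void runChecks()
{
    assert(classify(-5) == -1);
    assert(classify(0) == 0);
    assert(classify(7) == 1);
}

// Assumed GCC steps, with a test runner that calls runChecks():
//   g++ -g -O0 --coverage -c classify.cpp -o classify.o   (compile: emits classify.gcno)
//   g++ --coverage classify.o <test objects> -o tests     (link with the coverage runtime)
//   ./tests                                               (run: emits classify.gcda on exit)
//   gcov classify.cpp                                     (writes classify.cpp.gcov)
```

If the .gcno file appears but the .gcda file does not, the compile flags are fine and the link step is the one missing the coverage flag.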
I'm trying to run the following simple example from Xcode4:
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <iostream>
namespace mpi = boost::mpi;
int main(int argc, char* argv[])
{
mpi::environment env(argc, argv);
mpi::communicator world;
std::cout << "I am process " << world.rank() << " of " << world.size()
<< "." << std::endl;
return 0;
}
I've added libboost_mpi and libboost_serialization to Xcode, and compiling using the default LLVM returns :
/usr/local/include/boost/mpi/communicator.hpp:1329:9: error: call to
implicitly-deleted copy constructor of 'boost::mpi::communicator'
: comm(comm), source(source), tag(tag), ia(comm), value(value)
^ ~~~~
However, I can compile and run using
mpic++ -I/usr/local/include main.cpp -L/usr/local/lib
-lboost_mpi -lboost_serialization
Although mpic++ seems to be calling through to LLVM:
$ mpic++
i686-apple-darwin11-llvm-g++-4.2: no input files
Anyways, I tried adding mpic++ as a compiler option in Xcode 4. I can run
$ sudo opensnoop -n Xcode | grep mpicc.xcspec
and see that the spec file is being loaded by Xcode, but I don't see any MPICC option. My spec file is fairly simple:
/**
Xcode Compiler Specification for MPICC
*/
{ Type = Compiler;
Identifier = com.apple.compilers.mpicc;
BasedOn = com.apple.compilers.gcc.4_2;
Name = "MPICC";
Version = "Default";
Description = "MPI GNU C/C++ Compiler 4.0";
ExecPath = "/usr/local/bin/mpicc";
PrecompStyle = pch;
}
and it's stored in
/Applications/Xcode.app/Contents/PlugIns/Xcode3Core.ideplugin/Contents/SharedSupport/Developer/Library/Xcode/Plug-ins/LLVM GCC 4.2.xcplugin/Contents/Resources/mpicc.xcspec
So this works:
1) Link the binary with:
libmpi_cxx.dylib
libmpi.dylib
libboost_mpi.dylib
libboost_serialization.dylib
2) Change the compiler (under build options) to LLVM GCC 4.2. This was hinted at by running mpic++ directly, which reports that it uses llvm-gcc 4.2 internally.
3) Under Targets, Build Phases, Compile Sources, add the compiler option "-lm" to indicate that you need to link with libm. Credit to @pyCthon for pointing out mpic++ --showme:link, which revealed the final library that allowed it to build successfully from the command line.
I have been trying for days to get a Qt project file running on a 32-bit Windows 7 system, in which I want/need to include Cuda code. This combination of things is either so simple that no one ever bothered to put an example online, or so difficult that nobody ever succeeded, it seems. Either way, the only helpful forum threads I found covered the same issue on Linux or Mac, or with Visual Studio on Windows.
All of these give all sorts of different errors, however, whether due to linking or clashing libraries, or spaces in file names or non-existing folders in the Windows version of the Cuda SDK.
Is there someone who has a clear .pro file to offer that does the trick?
I am aiming to compile a simple programme with ordinary C++ code in Qt style, with Qt 4.8 libraries, which reference several Cuda modules in .cu files. Something of the form:
TestCUDA \
TestCUDA.pro
main.cpp
test.cu
So I finally managed to assemble a .pro file that works on my system and probably on all Windows systems. The following is a small project file plus test programme.
The file system looks as follows:
TestCUDA \
TestCUDA.pro
main.cpp
vectorAddition.cu
The project file reads:
TARGET = TestCUDA
# Define output directories
DESTDIR = release
OBJECTS_DIR = release/obj
CUDA_OBJECTS_DIR = release/cuda
# Source files
SOURCES += main.cpp
# This makes the .cu files appear in your project
OTHER_FILES += vectorAddition.cu
# CUDA settings <-- may change depending on your system
CUDA_SOURCES += vectorAddition.cu
CUDA_SDK = "C:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.2/C" # Path to cuda SDK install
CUDA_DIR = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.2" # Path to cuda toolkit install
SYSTEM_NAME = Win32 # Depending on your system either 'Win32', 'x64', or 'Win64'
SYSTEM_TYPE = 32 # '32' or '64', depending on your system
CUDA_ARCH = sm_11 # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math
# include paths
INCLUDEPATH += $$CUDA_DIR/include \
$$CUDA_SDK/common/inc/ \
$$CUDA_SDK/../shared/inc/
# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/$$SYSTEM_NAME \
$$CUDA_SDK/common/lib/$$SYSTEM_NAME \
$$CUDA_SDK/../shared/lib/$$SYSTEM_NAME
# Add the necessary libraries
LIBS += -lcuda -lcudart
# The following library conflicts with something in Cuda
QMAKE_LFLAGS_RELEASE = /NODEFAULTLIB:msvcrt.lib
QMAKE_LFLAGS_DEBUG = /NODEFAULTLIB:msvcrtd.lib
# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
# Debug mode
cuda_d.input = CUDA_SOURCES
cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda_d.commands = $$CUDA_DIR/bin/nvcc.exe -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda_d.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
# Release mode
cuda.input = CUDA_SOURCES
cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc.exe $$NVCC_OPTIONS $$CUDA_INC $$LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda
}
Note the QMAKE_LFLAGS_RELEASE = /NODEFAULTLIB:msvcrt.lib: it took me a long time to figure out, but this library seems to clash with other things in Cuda, which produces strange linking warnings and errors. If someone has an explanation for this, and potentially a prettier way to get around this, I'd like to hear it.
Also, since Windows file paths often include spaces (and NVIDIA's SDK by default does so too), it is necessary to artificially add quotation marks around the include paths. Again, if someone knows a more elegant way of solving this problem, I'd be interested to know.
The main.cpp file looks like this:
#include <cuda.h>
#include <builtin_types.h>
#include <drvapi_error_string.h>
#include <QtCore/QCoreApplication>
#include <QDebug>
// Forward declare the function in the .cu file
void vectorAddition(const float* a, const float* b, float* c, int n);
void printArray(const float* a, const unsigned int n) {
QString s = "(";
unsigned int ii;
for (ii = 0; ii < n - 1; ++ii)
s.append(QString::number(a[ii])).append(", ");
s.append(QString::number(a[ii])).append(")");
qDebug() << s;
}
int main(int argc, char* argv [])
{
QCoreApplication app(argc, argv);
int deviceCount = 0;
int cudaDevice = 0;
char cudaDeviceName [100];
unsigned int N = 50;
float *a, *b, *c;
cuInit(0);
cuDeviceGetCount(&deviceCount);
cuDeviceGet(&cudaDevice, 0);
cuDeviceGetName(cudaDeviceName, 100, cudaDevice);
qDebug() << "Number of devices: " << deviceCount;
qDebug() << "Device name:" << cudaDeviceName;
a = new float [N]; b = new float [N]; c = new float [N];
for (unsigned int ii = 0; ii < N; ++ii) {
a[ii] = qrand();
b[ii] = qrand();
}
// This is the function call in which the kernel is called
vectorAddition(a, b, c, N);
qDebug() << "input a:"; printArray(a, N);
qDebug() << "input b:"; printArray(b, N);
qDebug() << "output c:"; printArray(c, N);
delete[] a;
delete[] b;
delete[] c;
}
The Cuda file vectorAddition.cu, which describes a simple vector addition, looks like this:
#include <cuda.h>
#include <builtin_types.h>
extern "C"
__global__ void vectorAdditionCUDA(const float* a, const float* b, float* c, int n)
{
int ii = blockDim.x * blockIdx.x + threadIdx.x;
if (ii < n)
c[ii] = a[ii] + b[ii];
}
void vectorAddition(const float* a, const float* b, float* c, int n) {
float *a_cuda, *b_cuda, *c_cuda;
unsigned int nBytes = sizeof(float) * n;
int threadsPerBlock = 256;
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
// allocate and copy memory into the device
cudaMalloc((void **)& a_cuda, nBytes);
cudaMalloc((void **)& b_cuda, nBytes);
cudaMalloc((void **)& c_cuda, nBytes);
cudaMemcpy(a_cuda, a, nBytes, cudaMemcpyHostToDevice);
cudaMemcpy(b_cuda, b, nBytes, cudaMemcpyHostToDevice);
vectorAdditionCUDA<<<blocksPerGrid, threadsPerBlock>>>(a_cuda, b_cuda, c_cuda, n);
// load the answer back into the host
cudaMemcpy(c, c_cuda, nBytes, cudaMemcpyDeviceToHost);
cudaFree(a_cuda);
cudaFree(b_cuda);
cudaFree(c_cuda);
}
If you get this to work, then more complicated examples are self-evident, I think.
Edit (24-1-2013): I added the QMAKE_LFLAGS_DEBUG = /NODEFAULTLIB:msvcrtd.lib and the CONFIG(debug) block with the extra -D_DEBUG flag, so that it also compiles in debug mode.
Using msvc 2010 I found that the linker does not accept the -l parameter, however nvcc needs it. Therefore I made a simple change in the .pro file:
# Add the necessary libraries
CUDA_LIBS = cuda cudart
# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
# LIBRARIES IN FORMAT NEEDED BY NVCC
NVCC_LIBS = $$join(CUDA_LIBS,' -l','-l', '')
# LIBRARIES IN FORMAT NEEDED BY VISUAL C++ LINKER
LIBS += $$join(CUDA_LIBS,'.lib ', '', '.lib')
And the nvcc command (release version):
cuda.commands = $$CUDA_DIR/bin/nvcc.exe $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
$$NVCC_LIBS was inserted instead of $$LIBS.
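To see what the $$join calls above expand to, here is a small C++ emulation. It is only a sketch of my understanding of qmake's join(variable, glue, before, after); qmakeJoin is a hypothetical helper, not a Qt API, and it assumes a non-empty list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Emulates qmake's join(variable, glue, before, after): the list values are
// concatenated with `glue` between them, then wrapped in `before` and `after`.
std::string qmakeJoin(const std::vector<std::string> &values,
                      const std::string &glue,
                      const std::string &before,
                      const std::string &after)
{
    assert(!values.empty()); // this sketch only models the non-empty case
    std::string joined;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) {
            joined += glue;
        }
        joined += values[i];
    }
    return before + joined + after;
}
```

With the list {cuda, cudart}, glue ' -l' and prefix '-l' give -lcuda -lcudart (the form nvcc wants), while glue '.lib ' and suffix '.lib' give cuda.lib cudart.lib (the form the Visual C++ linker wants). The same mechanism with glue '" -I"', prefix '-I"' and suffix '"' is what wraps each include path in quotation marks.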
The whole .pro file, which works for me:
QT += core
QT -= gui
TARGET = TestCUDA
CONFIG += console
CONFIG -= app_bundle
TEMPLATE = app
# Define output directories
DESTDIR = release
OBJECTS_DIR = release/obj
CUDA_OBJECTS_DIR = release/cuda
# Source files
SOURCES += main.cpp
# This makes the .cu files appear in your project
OTHER_FILES += vectorAddition.cu
# CUDA settings <-- may change depending on your system
CUDA_SOURCES += vectorAddition.cu
#CUDA_SDK = "C:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.2/C" # Path to cuda SDK install
CUDA_DIR = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.0" # Path to cuda toolkit install
SYSTEM_NAME = win32 # Depending on your system either 'Win32', 'x64', or 'Win64'
SYSTEM_TYPE = 32 # '32' or '64', depending on your system
CUDA_ARCH = sm_11 # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math
# include paths
INCLUDEPATH += $$CUDA_DIR/include
#$$CUDA_SDK/common/inc/ \
#$$CUDA_SDK/../shared/inc/
# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/$$SYSTEM_NAME
#$$CUDA_SDK/common/lib/$$SYSTEM_NAME \
#$$CUDA_SDK/../shared/lib/$$SYSTEM_NAME
# The following library conflicts with something in Cuda
QMAKE_LFLAGS_RELEASE = /NODEFAULTLIB:msvcrt.lib
QMAKE_LFLAGS_DEBUG = /NODEFAULTLIB:msvcrtd.lib
# Add the necessary libraries
CUDA_LIBS = cuda cudart
# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
NVCC_LIBS = $$join(CUDA_LIBS,' -l','-l', '')
LIBS += $$join(CUDA_LIBS,'.lib ', '', '.lib')
# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
# Debug mode
cuda_d.input = CUDA_SOURCES
cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda_d.commands = $$CUDA_DIR/bin/nvcc.exe -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda_d.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
# Release mode
cuda.input = CUDA_SOURCES
cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc.exe $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda
}
I also added some essential declarations, i.e. QT += core for the app to work, and also removed the SDK part, which I did not find useful in this case.
I tried to get this combination working, but could not because of a number of dependencies in my project.
My final solution was to split the application into two separate parts on Windows:
1) A CUDA application developed in VC, running as a service/DLL in Windows.
2) A GUI developed in Qt that uses the DLL for CUDA-related tasks.
Hope this saves others some time.