Theano fails to compile CUDA, but the Python code runs using the GPU

I am trying to run a simple Theano example on Ubuntu 16.04 with CUDA 8.0 on an NVIDIA GTX 1060 GPU, inside a Python virtual environment created by Anaconda. The following is my .theanorc file:
[global]
floatX = float32
device = cuda
The code I am trying to run is a short sample from theano website:
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
When I run the code, I get a bunch of warnings and the following error:
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/eb/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/theano/sandbox/cuda -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/numpy/core/include -I/home/eb/anaconda2/envs/deep/include/python2.7 -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/theano/gof -L/home/eb/anaconda2/envs/deep/lib -o /home/eb/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray/cuda_ndarray.so mod.cu -lcublas -lpython2.7 -lcudart')
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
/tmp/try_flags_M8OZOh.c:4:19: fatal error: cudnn.h: No such file or directory
compilation terminated.
Mapped name None to device cuda: GeForce GTX 1060 6GB (0000:01:00.0)
Surprisingly, the code RUNS and prints the desired output:
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.365814 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
Am I missing a Theano configuration option? Any idea what's going wrong?
P.S. All libraries are installed in my Python virtual environment, except for the CUDA library, which is installed at the system level.
--Thanks
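A note on what the log is showing: the "Failed to compile cuda_ndarray.cu" error comes from the old theano.sandbox.cuda backend, while the "Mapped name None to device cuda" line comes from the new gpuarray backend that device = cuda selects, which is why the script still runs on the GPU. The cuDNN message just means cudnn.h is not on the compiler's search path; if cuDNN is installed outside the default CUDA directories, a .theanorc section along these lines (the paths below are placeholders for wherever cuDNN actually lives on your system) should let Theano find it:
[dnn]
include_path = /usr/local/cuda/include
library_path = /usr/local/cuda/lib64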

Related

Linker Script Always Interprets Variables as Zero?

There are a lot of logs and other stuff below, so to skip to the punchline: I have a linker script and am setting variables within it, and using these variables to set up memory sections. But it seems like these variables are always being set to 0 no matter what I set them to in the script.
I am writing a linker script, trying to add two new memory sections to my output elf for an embedded systems application. I am trying to chop off the first bit of the preexisting RAM memory section, change the size of RAM accordingly, and place my new sections there.
Previously, using the nRF52840 development board and this library, I could modify the linker script for one of the example applications like so:
SEARCH_DIR(.)
GROUP(-lgcc -lc -lnosys)

__NewSection1Start = 0x20000000;
__TotalLength = 0x1000;
__NewSection2Length = __TotalLength / 2;
__NewSection1Length = __TotalLength / 2;
__NewSection2Start = __NewSection1Start + __NewSection1Length;

MEMORY
{
  FLASH (rx) : ORIGIN = 0x0, LENGTH = 0xff000
  NEWSECTION1 (rwx) : ORIGIN = __NewSection1Start, LENGTH = __NewSection1Length
  NEWSECTION2 (rwx) : ORIGIN = __NewSection2Start, LENGTH = __NewSection2Length
  RAM (rwx) : ORIGIN = 0x20000000 + __TotalLength, LENGTH = 0x40000
}

FLASH_PAGE_SIZE = 4096;
FLASH_DATA_PAGES_USED = 4;

SECTIONS
{
  .newsection1 :
  {
    KEEP (*(.newsection1));
  } > NEWSECTION1
  .newsection2 :
  {
    KEEP (*(.newsection2));
  } > NEWSECTION2
}
...
And this works perfectly. Recently, I have needed to port to the nRF9160 using the Zephyr RTOS, so I began using the NordicPlayground nrf library built on top of Zephyr at this link, modifying one of the sample apps in it. In trying to rewrite the program for that board and environment, using a technique similar to this, I modified the Zephyr linker script as before. The build process takes what I have done and generates the file build/spm/zephyr/linker.cmd in the build output folder:
__TotalLength = 0x1000;
__NewSection1Start = 0x20000000;
__NewSection2Length = __TotalLength / 2;
__NewSection1Length = __TotalLength / 2;
__NewSection2Start = __NewSection1Start + __NewSection1Length;

MEMORY
{
  FLASH (rx) : ORIGIN = 0x0, LENGTH = 0xc000
  SRAM (wx) : ORIGIN = 0x20000000 + __TotalLength, LENGTH = (64 * 1K) - __TotalLength
  NEWSECTION1 (rwx) : ORIGIN = __NewSection1Start, LENGTH = __NewSection1Length
  NEWSECTION2 (rwx) : ORIGIN = __NewSection2Start, LENGTH = __NewSection2Length
  IDT_LIST (wx) : ORIGIN = (0x20000000 + (64 * 1K)), LENGTH = 2K
}

ENTRY("__start")

SECTIONS
{
  .newsection1 :
  {
    KEEP (*(.newsection1));
  } > NEWSECTION1
  .newsection2 :
  {
    KEEP (*(.newsection2));
  } > NEWSECTION2
}
...
However, although I expected these two scripts to behave the same, compiling and linking the nRF9160 version with the Zephyr RTOS throws this error:
riley#riley-Blade:~<code_base>/nrf/samples/nrf9160/example_app$ west build -b nrf9160_pca10090ns
source directory: /home/riley<code_base>/nrf/samples/nrf9160/example_app
build directory: /home/riley<code_base>/nrf/samples/nrf9160/example_app/build
BOARD: nrf9160_pca10090ns (origin: CMakeCache.txt)
[6/17] Linking C executable spm/zephyr/spm_zephyr_prebuilt.elf
Memory region Used Size Region Size %age Used
FLASH: 32 KB 48 KB 66.67%
SRAM: 10000 B 60 KB 16.28%
NEWSECTION1: 0 GB 2 KB 0.00%
NEWSECTION2: 0 GB 2 KB 0.00%
IDT_LIST: 40 B 2 KB 1.95%
[12/17] Linking C executable zephyr/zephyr_prebuilt.elf
FAILED: zephyr/zephyr_prebuilt.elf
: && ccache /home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/arm-none-eabi-gcc zephyr/CMakeFiles/zephyr_prebuilt.dir/misc/empty_file.c.obj -o zephyr/zephyr_prebuilt.elf -Wl,-T zephyr/linker.cmd -Wl,-Map=/home/riley<code_base>/nrf/samples/nrf9160/example_app/build/zephyr/zephyr.map -u_OffsetAbsSyms -u_ConfigAbsSyms -Wl,--whole-archive app/libapp.a zephyr/libzephyr.a zephyr/arch/arm/core/libarch__arm__core.a zephyr/arch/arm/core/cortex_m/libarch__arm__core__cortex_m.a zephyr/arch/arm/core/cortex_m/mpu/libarch__arm__core__cortex_m__mpu.a zephyr/lib/libc/minimal/liblib__libc__minimal.a zephyr/subsys/net/libsubsys__net.a zephyr/subsys/net/ip/libsubsys__net__ip.a zephyr/drivers/gpio/libdrivers__gpio.a zephyr/drivers/serial/libdrivers__serial.a zephyr/modules/nrf/lib/bsdlib/lib..__nrf__lib__bsdlib.a zephyr/modules/nrf/lib/at_host/lib..__nrf__lib__at_host.a zephyr/modules/nrf/drivers/at_cmd/lib..__nrf__drivers__at_cmd.a /home/riley<code_base>/nrfxlib/bsdlib/lib/cortex-m33/hard-float/libbsd_nrf9160_xxaa.a -Wl,--no-whole-archive zephyr/kernel/libkernel.a zephyr/CMakeFiles/offsets.dir/arch/arm/core/offsets/offsets.c.obj -L"/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v8-m.main+fp/hard" -L/home/riley<code_base>/nrf/samples/nrf9160/example_app/build/zephyr -lgcc -Wl,--print-memory-usage /home/riley<code_base>/nrfxlib/crypto/nrf_oberon/lib/cortex-m33/hard-float/liboberon_3.0.0.a -lc -mthumb -mcpu=cortex-m33 -mfpu=fpv5-sp-d16 -Wl,--gc-sections -Wl,--build-id=none -Wl,--sort-common=descending -Wl,--sort-section=alignment -nostdlib -static -no-pie -Wl,-X -Wl,-N -Wl,--orphan-handling=warn -mabi=aapcs -march=armv8-m.main+dsp libspmsecureentries.a && :
Memory region Used Size Region Size %age Used
FLASH: 110428 B 976 KB 11.05%
SRAM: 24608 B 124 KB 19.38%
NEWSECTION1: 264 B 0 GB inf%
NEWSECTION2: 1 B 0 GB inf%
IDT_LIST: 120 B 2 KB 5.86%
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: zephyr/zephyr_prebuilt.elf section `.newsection1' will not fit in region `NEWSECTION1'
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: zephyr/zephyr_prebuilt.elf section `.newsection2' will not fit in region `NEWSECTION2'
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: section .newsection2 LMA [0000000000000000,0000000000000000] overlaps section .newsection1 LMA [0000000000000000,0000000000000107]
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: region `NEWSECTION1' overflowed by 264 bytes
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: region `NEWSECTION2' overflowed by 1 byte
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
ERROR: command exited with status 1: /home/riley/.local/bin/cmake --build /home/riley<code_base>/nrf/samples/nrf9160/example_app/build
This is really strange to me, because this output makes it seem like NEWSECTION1 and NEWSECTION2 have 0 length and both start at address 0x0, which definitely should not be the case given the linker script being used. To make it even more confusing, if I replace __NewSection1Length with 0x500 or something like that, the memory section becomes 0x500 bytes long, so it seems as if variables in this linker script are being ignored or set to 0. Any help would be very much appreciated.
Some of the more complicated features of Zephyr, like bootloaders and the SPM (secure partition manager), actually build using independent board definitions, linker scripts, and project configurations. Even though they all happen from one call to west build or ninja or make or whatever build tool you are configured with, they can sometimes end up mismatched.
You can see this if you go into the build output directory and notice that there will be multiple sub-directories with names like spm and mcuboot and zephyr. Each of these sub-directories will contain their own object files, map files, linker scripts, etc. The zephyr directory is the top level one that contains the build/link work for your final application and will merge in the pre-compiled binaries from the others. What I think is happening is that the prj.conf changes you made to create the custom sections in your application are only affecting one of these sub-builds. You likely need to find the prj.conf files for the other ones and modify them similarly.
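If it is unclear which sub-builds picked up the edits, a small diagnostic sketch like this (it assumes the build/ output layout shown above; the filename linker.cmd comes from the question) lists every generated linker script and whether it defines the custom region:
import os

# Walk the build output tree and report which generated linker
# scripts actually contain the custom memory region.
for root, dirs, files in os.walk('build'):
    for name in files:
        if name == 'linker.cmd':
            path = os.path.join(root, name)
            with open(path) as f:
                text = f.read()
            status = 'defines NEWSECTION1' if 'NEWSECTION1' in text else 'missing NEWSECTION1'
            print(path, '->', status)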

Can't put GPU as Theano device in Windows

My PC has an NVIDIA GTX 1050 Ti GPU and runs Windows 10.
I installed the CUDA drivers (I had to install Visual Studio with the C++ tools) and ran the tests, then created a new conda environment with Theano and pygpu (Python 3.6.3) and ran this script:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

print(config.device)
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print('Looping %d times took' % iters, t1 - t0, 'seconds')
print('Result is', r)
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
Everything worked (with the cpu running).
Then I created a .theanorc.txt file and added to it:
[global]
device = gpu
floatX = float32
I tried to run the script again, and the result was:
about 5000 lines of generated code, followed by
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
['nvcc', '-shared', '-O3', '-Xlinker', '/DEBUG', '-D HAVE_ROUND', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=mc72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,/Zi,/MD', '-I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\lib\\site-packages\\theano\\sandbox\\cuda"', '-I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\lib\\site-packages\\numpy\\core\\include"', '-I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\include"', '-I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\lib\\site-packages\\theano\\gof"', '-L"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\libs"', '-L"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets"', '-o', 'C:\\Users\\JoaquimFerrer\\AppData\\Local\\Theano\\compiledir_Windows-10-10.0.16299-SP0-Intel64_Family_6_Model_158_Stepping_9_GenuineIntel-3.6.3-64\\cuda_ndarray\\cuda_ndarray.pyd', 'mod.cu', '-lcublas', '-lpython36', '-lcudart']
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -Xlinker /DEBUG -D HAVE_ROUND -m64 -Xcompiler -DCUDA_NDARRAY_CUH=mc72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,/Zi,/MD -I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\lib\\site-packages\\theano\\sandbox\\cuda" -I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\lib\\site-packages\\numpy\\core\\include" -I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\include" -I"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\lib\\site-packages\\theano\\gof" -L"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets\\libs" -L"C:\\Users\\JoaquimFerrer\\Anaconda3\\envs\\neural_nets" -o C:\\Users\\JoaquimFerrer\\AppData\\Local\\Theano\\compiledir_Windows-10-10.0.16299-SP0-Intel64_Family_6_Model_158_Stepping_9_GenuineIntel-3.6.3-64\\cuda_ndarray\\cuda_ndarray.pyd mod.cu -lcublas -lpython36 -lcudart')
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: cuda unavailable)
gpu
and then the output of the script running on the CPU.
I added cl.exe to the PATH and can now run it from the console, but the output didn't change.
Can someone help? I'm happy to reinstall everything in a completely new way if necessary.
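One thing worth ruling out: nvcc launches cl.exe itself, so cl.exe has to be on the PATH of the exact process that imports Theano, not just of an interactive console. A minimal sketch to check this from inside Python:
import shutil

# nvcc invokes cl.exe at compile time, so it must be visible
# to this Python process, not only to a separate console.
cl_path = shutil.which('cl')
if cl_path is None:
    print('cl.exe is NOT visible to this process; nvcc will fail as above')
else:
    print('cl.exe found at:', cl_path)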

How to fix wrong stack alignment with the Eigen library under Workbench 3.0 (VxWorks 6.6 IDE)

I am using the Eigen (3.2.0) library under Workbench 3.0 (VxWorks 6.6). The compiler in this distribution is GCC 4.1.2.
Language: C++; operating system: Windows XP.
The problem code is as follows:
Eigen::Matrix3d mOrigin;
/* initialize mOrigin */
...
Eigen::Quaterniond qOrigin(mOrigin);
I debugged the program and found that when it runs:
Eigen::Quaterniond qOrigin(mOrigin);
the assertion fails and prints:
assertion failed: (reinterpret_cast<size_t>(array) & 0xf) == 0 && "this assertion is explained here: " "http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html" " **** READ THIS WEB PAGE !!! ****" in function Eigen::internal::plain_array<T, Size, MatrixOrArrayOptions>::plain_array() [with T = double, int Size = 4, int MatrixOrArrayOptions = 0] at C:/WindRiver-GPPVE-3.6-IA-Eval/vxworks-6.6/target/h/Eigen/src/Core/DenseStorage.h:78
As the linked page says, fixed-size vectorizable Eigen objects must absolutely be created at 16-byte-aligned locations, otherwise the SIMD instructions addressing them will crash; that is why the assertion fires.
I think the problem is the one described at
http://eigen.tuxfamily.org/dox-devel/group__TopicWrongStackAlignment.html
(the compiler making a wrong assumption on stack alignment). That page says it appears this was a GCC bug that was fixed in GCC 4.5, and that if you hit this issue you should upgrade to GCC 4.5.
However, GCC is the compiler integrated into Workbench, and I don't know how to upgrade it.
I have also tried both the local solution and the global solution, but neither works.
Local Solution:
add "__attribute__((force_align_arg_pointer))":
class Interpolation : public InterpMath/*, public Colleague*/
{
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW
    ...
    __attribute__((force_align_arg_pointer)) ErrorID LineInterp(
        const Position_MCS_rad &targetPoint,
        const Position_MCS_rad &originPoint,
        const Position_ACS_rad &originACS,
        double Ts, double maxVel, double maxAcc,
        double maxDecel, double maxJerk,
        N_AxisSeqPtr &nAglSeqPtr, GrpTcpSeq *pGrpTcp = NULL)
    {
        Eigen::Matrix3d mOrigin;
        /* initialize mOrigin */
        ...
        Eigen::Quaterniond qOrigin(mOrigin);
    }
};
warning: 'force_align_arg_pointer' attribute directive ignored
Global Solution:
add the compile option -mstackrealign:
ccpentium -g -mtune=pentium4 -march=pentium4 -ansi -Wall -MD -MP -Xlinker -mstackrealign -IC:/WindRiver-GPPVE-3.6-IA-Eval/vxworks-6.6/target/h -IC:/WindRiver-GPPVE-3.6-IA-Eval/vxworks-6.6/target/h/wrn/coreip -IC:/WindRiver-GPPVE-3.6-IA-Eval/workspace/RobotInterface_2_2/h -DCPU=PENTIUM4 -DTOOL_FAMILY=gnu -DTOOL=gnu -D_WRS_KERNEL -o
So how can I upgrade the integrated compiler to GCC 4.5 or newer, or is there another solution to fix this problem?
Thanks for reading and for any help.
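For what it's worth, the page cited in the assertion message also documents a last-resort workaround when the compiler cannot be upgraded: disabling Eigen's vectorized code paths, which removes the 16-byte alignment requirement at some performance cost. A minimal sketch (the defines must come before any Eigen header, e.g. as -D options in the Workbench build settings):
// Both macros must be defined before any Eigen header is included.
#define EIGEN_DONT_VECTORIZE
#define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT

#include <Eigen/Dense>

int main()
{
    Eigen::Matrix3d mOrigin = Eigen::Matrix3d::Identity();
    Eigen::Quaterniond qOrigin(mOrigin);  // no longer trips the alignment assertion
    return 0;
}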

cudaGetDeviceCount returns 1 instead of 2

I have a GPU cluster node composed of 2 Tesla M2050 cards, and when I execute my code, cudaGetDeviceCount returns only 1. If I try to select device 1 with cudaSetDevice, it gives me this error: invalid device ordinal. In the Windows Device Manager both devices are listed. If needed, this is my source code:
cutilSafeCall(cudaGetDeviceCount(&num_devices));
for (device = 0; device < num_devices; device++) {
    cudaDeviceProp properties;
    cudaGetDeviceProperties(&properties, device);
    printf("Device ID:\t%d\n", device);
    printf("Device Name:\t%s\n", properties.name);
    /* totalGlobalMem and totalConstMem are size_t, so cast for printf */
    printf("Global memory:\t%llu\n", (unsigned long long)properties.totalGlobalMem);
    printf("Constant memory:\t%llu\n", (unsigned long long)properties.totalConstMem);
    printf("Warp size:\t%d\n", properties.warpSize);
}
devs = 0;
ParseArguments(argc, argv);
cutilSafeCall(cudaSetDevice(devs));
Any help would be appreciated.
Edit: output of deviceQuery.exe:
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Tesla M2050"
CUDA Driver Version: 5.50
CUDA Runtime Version: 4.20
CUDA Capability Major/Minor version number: 2.0
...
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.50, CUDA Runtime Version = 4.20, NumDevs = 1, Device = Tesla M2050
PASSED
Press <Enter> to Quit...
If you have two CUDA GPUs in a single node and deviceQuery only reports one, consider the following possibilities:
Check that both GPUs are functioning correctly by running nvidia-smi; if only one is shown, check that the other card is seated correctly.
Check that the environment variable CUDA_VISIBLE_DEVICES is not set.
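It can also help to query the device count without the cutilSafeCall wrapper, so any initialization failure is reported explicitly rather than just aborting. A minimal sketch:
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int num_devices = 0;
    cudaError_t err = cudaGetDeviceCount(&num_devices);
    if (err != cudaSuccess) {
        /* Print the runtime's own explanation for the failure. */
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Detected %d CUDA device(s)\n", num_devices);
    return 0;
}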

/usr/bin/ld: cannot find -latlas

I'm trying to install code on a new computer which I've successfully installed in the past, and am running into problems.
This is on Fedora, using scons. The previous successful installation was on Ubuntu.
When I type scons, it gives the following error:
/usr/bin/ld: cannot find -latlas
I have successfully installed atlas-devel via yum.
If it helps, you'll find the top-level SConstruct file below (--- indicates redacted code):
BUILD_LIB_DIR = '#build/lib'
BUILD_INCLUDE_DIR = '#build/include'
BUILD_BIN_DIR = '#build/bin'

import os

default_env = Environment(ENV = os.environ,  # use the system $PATH variable
                          CCFLAGS = ['-pipe', '-Wall'],
                          CXXFLAGS = ['-std=c++0x'],
                          CPPPATH = [BUILD_INCLUDE_DIR, '#src/'],
                          LIBPATH = [BUILD_LIB_DIR],
                          CPPDEFINES = ['_USE_LCM_'])
default_env.Append(LIBS = [---, 'lapack', 'blas', 'atlas', 'armadillo', 'rt'])
default_env.Alias('install', [BUILD_LIB_DIR, BUILD_INCLUDE_DIR, BUILD_BIN_DIR])

# Create the command-line options along with help text
vars = Variables()
vars.Add(BoolVariable('debug', 'Compile in debug mode with -g and -pg', 0))
vars.Add(EnumVariable('---'))
vars.Add(BoolVariable('log-data', 'Define LOG_DATA in the preprocessor so internal state of modules will be written to log files', 0))
Help(vars.GenerateHelpText(default_env))

debug = ARGUMENTS.get('debug', 0)
log = ARGUMENTS.get('log-data', 0)
if int(debug):
    default_env.Append(CCFLAGS = ['-g', '-pg'])
    default_env.Append(LINKFLAGS = ['-pg'])
else:
    default_env.Append(CCFLAGS = ['-O3'])
if int(log):
    default_env.Append(CPPDEFINES = ['LOG_DATA'])

log_env = default_env.Clone()
log_env.Append(LIBS = ['global', 'readlog', 'z'])

Export(['default_env', 'log_env', 'BUILD_LIB_DIR', 'BUILD_INCLUDE_DIR', 'BUILD_BIN_DIR', ---])

SConscript(['src/SConscript',
            'docs/SConscript'])
It turns out the issue was that atlas puts libatlas.so in /usr/lib(64)/atlas/...
Thanks to @Dave_Bacher for information on how to deal with this, namely:
Add /usr/lib/atlas to the LIBPATH array in your default_env:
LIBPATH = [BUILD_LIB_DIR, '/usr/lib/atlas']
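In the SConstruct above, that amounts to something like the following sketch (on 64-bit Fedora the directory is typically /usr/lib64/atlas, so both variants are listed here):
# Add the distro's ATLAS directories to the linker's library search path.
default_env.Append(LIBPATH = ['/usr/lib/atlas', '/usr/lib64/atlas'])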
