I built Caffe2 with Anaconda following the installation page.
The server has a single Titan X and has cuDNN 7 and CUDA 9, but not NCCL, so I downloaded NCCL 2 from NVIDIA, extracted it to path/to/local/nccl2, and then edited line 42 of ./pytorch/conda/integrated/build.sh to read: "export NCCL_ROOT_DIR=path/to/local/nccl2".
I also need to use Caffe2 with Python 2, so I added "conda_args+=(" --python 2.7")" to ./pytorch/scripts/build_anaconda.sh so that it builds for Python 2.7.
The build succeeded, but when I run python2 test.py, where test.py contains:
from caffe2.python import core
it tells me:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_hip
Segmentation fault (core dumped)
My questions are:
a. Why does the conda build not have GPU support?
b. If I am using a single GPU, is NCCL necessary for building?
c. How do I fix "No module named caffe2_pybind11_state_hip"?
PyTorch or Caffe2: caffe2
How you installed PyTorch (conda, pip, source): conda
Build command you used (if compiling from source):./scripts/build_anaconda.sh --install-locally --cuda 9.0 --cudnn 7
OS:ubuntu16
PyTorch version:
Python version:2.7
CUDA/cuDNN version:9.1/7
GPU models and configuration: single Titan X
GCC version (if compiling from source):5.4.0
CMake version: not installed
Versions of any other relevant libraries:
Thank you very much!
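For question c, a minimal Python sketch (runs on Python 2.7 and 3) that tries each Caffe2 extension module in turn and prints the real error for each one. It assumes, based on how caffe2/python/_import_c_extension.py behaved around this release, that Caffe2 first tries a CUDA module (assumed here to be named caffe2_pybind11_state_gpu), then the AMD HIP module, and only then the CPU-only module. If that assumption holds, the "No module named caffe2_pybind11_state_hip" message is likely just the last fallback failing, and the interesting error is the one for the CUDA module. If the script itself segfaults, the last "trying" line shows which module was being loaded.

```python
import importlib
import sys

# Assumed fallback order (based on caffe2/python/_import_c_extension.py at
# the time): CUDA build, then AMD HIP build, then CPU-only build.
CANDIDATE_EXTENSIONS = [
    "caffe2.python.caffe2_pybind11_state_gpu",  # CUDA build (assumed name)
    "caffe2.python.caffe2_pybind11_state_hip",  # AMD HIP build
    "caffe2.python.caffe2_pybind11_state",      # CPU-only build
]


def probe_extensions(names=CANDIDATE_EXTENSIONS, verbose=False):
    """Try to import each module; map its name to "ok" or the error text."""
    results = {}
    for name in names:
        if verbose:
            # Printed before the import, so a segfault shows the culprit.
            print("trying %s" % name)
            sys.stdout.flush()
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            # The message says *why* it failed (missing .so, undefined
            # symbol, missing CUDA library, ...).
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    report = probe_extensions(verbose=True)
    for name in CANDIDATE_EXTENSIONS:
        print("%s -> %s" % (name, report[name]))
```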
First, download and install CUDA:
sudo apt-get update && sudo apt-get install wget -y --no-install-recommends
wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb"
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
Now install from source (do it inside a virtual environment):
FULL_CAFFE2=1 python setup.py install
You can find more info here: https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile#install-with-gpu-support
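After installing, it can help to confirm that the toolkit is actually visible before building. A small Python 3 sketch, assuming the default /usr/local/cuda/bin location used by NVIDIA's Ubuntu packages (that directory is not necessarily added to PATH by apt-get install cuda); find_cuda_tools is a hypothetical helper name:

```python
import os
import shutil

# Default toolkit location used by NVIDIA's Ubuntu packages (assumption:
# adjust if CUDA was installed elsewhere). It is not always on PATH.
DEFAULT_CUDA_BIN = "/usr/local/cuda/bin"


def find_cuda_tools(tools=("nvcc", "nvidia-smi"), extra_dirs=(DEFAULT_CUDA_BIN,)):
    """Return {tool: path or None}, searching PATH first, then extra_dirs."""
    found = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            for directory in extra_dirs:
                candidate = os.path.join(directory, tool)
                if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                    path = candidate
                    break
        found[tool] = path
    return found


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print("%-10s %s" % (tool, path or "NOT FOUND"))
```

If nvcc is only found under /usr/local/cuda/bin, that directory still has to be added to PATH for the build to pick it up.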
Follow the procedure below; it worked for me:
ubuntu@test:~$ cd $HOME
ubuntu@test:~$ conda create -n caffe2
ubuntu@test:~$ source activate caffe2
(caffe2) ubuntu@test:~$ git clone --recursive https://github.com/pytorch/pytorch.git && cd pytorch
(caffe2) ubuntu@test:~/pytorch$ git submodule update --init
(caffe2) ubuntu@test:~/pytorch$ CONDA_INSTALL_LOCALLY=1 ./scripts/build_anaconda.sh --cuda 8.0 --cudnn 7 -DUSE_CUDA=ON -DUSE_NCCL=ON
Right now, I'm trying to set up a GitHub Action that checks whether the code in a pull request compiles properly (VEX Robotics, for anyone interested). However, when it gets to the make command, I get this error:
Building Project
make: Entering directory '/github/workspace/V5'
Adding timestamp [OK]
Creating cold package with libpros,okapilib [ERRORS]
/usr/lib/gcc/arm-none-eabi/6.3.1/../../../arm-none-eabi/bin/ld: unrecognized option '--gc-keep-exported'
/usr/lib/gcc/arm-none-eabi/6.3.1/../../../arm-none-eabi/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
make: *** [bin/cold.package.elf] Error 1
common.mk:200: recipe for target 'bin/cold.package.elf' failed
I'm extremely confused as to why this is occurring. --gc-keep-exported is a real option, and this code compiles perfectly on my local machine. I've tried changing the Ubuntu version and updating the VEX SDK to see if that helps, but I keep getting the same error. What should I do?
Code:
Dockerfile:
FROM ubuntu:18.04
RUN apt-get update
# Install GCC & Clang
RUN apt-get install build-essential -y
RUN apt-get install clang -y
# Install needed ARM deps
RUN apt-get install gcc-arm-none-eabi -y
RUN apt-get install binutils-arm-none-eabi -y
# Install 7z & cURL
RUN apt-get install p7zip-full -y
RUN apt-get install curl -y
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
entrypoint.sh:
echo "Downloading VEX SDK"
# Get VEX SDK and put it in ~/sdk
curl -L https://content.vexrobotics.com/vexcode/v5code/VEXcodeProV5_2_0_1.dmg -o _vexcode_.dmg
7z x _vexcode_.dmg || :
7z x Payload~ ./VEXcode\ Pro\ V5.app/Contents/Resources/sdk -osdk_temp || :
mkdir ~/sdk
mv sdk_temp/VEXcode\ Pro\ V5.app/Contents/Resources/sdk/* ~/sdk
rm -fR _vex*_ _vex*_.dmg sdk_temp/ Payload~
ls ~/sdk # ls just for testing
echo "Building Project"
# Now make the makefile in the set path
make --directory="$1"
This was happening because I was using an old version of gcc-arm-none-eabi. The version in apt is badly outdated (v6.3.1 vs. v10.2.1).
I was able to use the new version by downloading the tarball available on their site and using the direct paths to it to compile my code.
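Strictly speaking, the log shows it is ld (binutils) that rejects --gc-keep-exported, so what matters is the linker bundled with the toolchain. A Python 3 sketch for checking the installed arm-none-eabi-gcc version in CI before running make; the (10, 2, 1) minimum is simply the version this answer reports working, not a documented cutoff. Versions are compared as integer tuples because plain string comparison would rank "6.3.1" above "10.2.1":

```python
import re
import shutil
import subprocess

# Minimum taken from the version this answer reports working; it is not a
# documented cutoff for --gc-keep-exported.
MINIMUM_VERSION = (10, 2, 1)


def parse_version(text):
    """Turn '6.3.1' or '10.2.1 20201103 (release)' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(compiler="arm-none-eabi-gcc"):
    """Return the compiler's version tuple, or None if it is not on PATH."""
    if shutil.which(compiler) is None:
        return None
    # Some GCC builds print only the major version here; GCC 7+ also
    # accepts -dumpfullversion for the full number.
    output = subprocess.check_output([compiler, "-dumpversion"], universal_newlines=True)
    return parse_version(output)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("arm-none-eabi-gcc not found on PATH")
    elif version < MINIMUM_VERSION:
        print("arm-none-eabi-gcc %s is older than expected" % ".".join(map(str, version)))
    else:
        print("arm-none-eabi-gcc %s looks new enough" % ".".join(map(str, version)))
```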
A newer GCC version is available for newer Ubuntu releases. You could browse for a specific one, or be lazy and enjoy some malpractice with me:
FROM ubuntu:latest
Has anyone figured out how to make XGBoost work on an Apple M1?
I have tried multiple things to fix it, but nothing works.
I have tried reinstalling it (pip, pip3, python -m pip, and conda install), brew install libomp, brew install gcc@8, and downloading the source code and compiling it locally.
It seems XGBoost does not work on Apple M1.
Here is the error; it occurs when I import xgboost in my script:
XGBoostError: XGBoost Library (libxgboost.dylib) could not be loaded.
Likely causes:
* OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
* You are running 32-bit Python on a 64-bit OS
Error message(s): ['dlopen(/opt/anaconda3/envs/msc-env/lib/python3.8/site-packages/xgboost/lib/libxgboost.dylib, 6): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib\n Referenced from: /opt/anaconda3/envs/msc-env/lib/python3.8/site-packages/xgboost/lib/libxgboost.dylib\n Reason: image not found']
I had the same issue on a MacBook Pro (13-inch, M1, 2020) with the Apple M1 chip. Fortunately, after hours of research I found the solution; just follow these instructions:
brew install libomp
conda install -c conda-forge py-xgboost
https://discuss.xgboost.ai/t/xgboost-on-apple-m1/2004/8
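If the error persists, it can help to check which libomp.dylib actually exists. The path in the error message, /usr/local/opt/libomp, is under Homebrew's Intel prefix; on Apple Silicon, Homebrew installs under /opt/homebrew by default. A short Python sketch; the candidate list is an assumption covering only those two default prefixes:

```python
import os
import platform

# Two default Homebrew locations: the path from the error message
# (Intel prefix /usr/local) and the Apple Silicon prefix /opt/homebrew.
LIBOMP_CANDIDATES = [
    "/usr/local/opt/libomp/lib/libomp.dylib",
    "/opt/homebrew/opt/libomp/lib/libomp.dylib",
]


def libomp_report(candidates=LIBOMP_CANDIDATES):
    """Report the interpreter's architecture and which libomp copies exist."""
    return {
        "python_arch": platform.machine(),  # 'arm64' or 'x86_64' on macOS
        "found": [path for path in candidates if os.path.exists(path)],
    }


if __name__ == "__main__":
    print(libomp_report())
```

If the interpreter reports arm64 but only the /opt/homebrew copy exists, the prebuilt library is looking in the wrong place, which matches the error above.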
How to install xgboost in python on MacOS?
A combination of the answers from cherry (first) and Christoffer (second) works for me with the Miniforge interpreter:
Make sure gcc-11 (and g++-11) is installed; if not, install it with:
brew install gcc@11
brew install cmake
Then do the following:
git clone --recursive https://github.com/dmlc/xgboost
mkdir xgboost/my_build
cd xgboost/my_build
CC=gcc-11 CXX=g++-11 cmake ..
make -j4
cd ../python_package
/Users/xx/miniforge3/envs/MLEnv/bin/python setup.py install
Use the path to the python of your own Miniforge environment in the last command.
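The prerequisite check and the CC/CXX override from the steps above, as a Python 3 sketch; the tool list simply mirrors the commands used in this answer, and missing_tools and cmake_env are hypothetical helper names:

```python
import os
import shutil

# Tools used by the build steps above.
REQUIRED_TOOLS = ["git", "gcc-11", "g++-11", "cmake", "make"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def cmake_env(cc="gcc-11", cxx="g++-11", base=None):
    """Copy of the environment with CC/CXX set, like `CC=gcc-11 CXX=g++-11 cmake ..`."""
    env = dict(os.environ if base is None else base)
    env["CC"] = cc
    env["CXX"] = cxx
    return env


if __name__ == "__main__":
    missing = missing_tools()
    print("missing: %s" % (", ".join(missing) or "nothing"))
    # Example use (not run here):
    # subprocess.run(["cmake", ".."], cwd="xgboost/my_build", env=cmake_env(), check=True)
```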
I put Terminal in Rosetta mode before installing Homebrew. This way I'm essentially running the Intel versions of the packages. I provided more details in this gist.
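To confirm which mode a given Python interpreter actually runs in, platform.machine() is a quick check: a native Apple Silicon process reports arm64, while an Intel build (including one running under Rosetta 2) reports x86_64. A minimal sketch; describe_arch is a hypothetical helper:

```python
import platform


def describe_arch(machine=None):
    """Interpret platform.machine() for a Python process on macOS."""
    if machine is None:
        machine = platform.machine()
    if machine == "arm64":
        return "native Apple Silicon (arm64) process"
    if machine == "x86_64":
        return "x86_64 process (Intel Mac, or Apple Silicon under Rosetta 2)"
    return "other architecture: %s" % machine


if __name__ == "__main__":
    print(describe_arch())
```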
I have a conda virtual environment based on the following YAML:
channels:
- conda-forge
dependencies:
- gcc_linux-64
- gxx_linux-64
- gfortran_linux-64
- theano
This is a simplified example; in reality the YAML has many more packages.
In detail, the software is installed in the base environment inside a Docker container; however, I do not believe my problem is related to containers at all. The important part of the Dockerfile is below:
# BASE IMAGE
FROM ubuntu:18.04
# PATH EXPORT
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
# UPDATE THE PACKAGE LIST
RUN apt-get update
# INSTALL WGET
RUN apt-get install -y wget && rm -rf /var/lib/apt/lists/*
# INSTALL MINICONDA WITH PYTHON 3.7
RUN wget --no-verbose \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
# UPDATE CONDA
RUN conda update --name base --channel defaults conda
# COPY THE YAML & INSTALL SOFTWARE WITH CONDA
COPY conda_packages.yaml .
RUN conda env update --name base --file conda_packages.yaml
The container builds properly, and afterwards I can run the new Anaconda compilers inside it with the commands x86_64-conda_cos6-linux-gnu-gcc or x86_64-conda_cos6-linux-gnu-c++. However, when I run a test Python script that does import theano, I get an error:
/root/miniconda3/lib/python3.7/site-packages/theano/configdefaults.py:560: UserWarning:
DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
When I later check the build logs, the installed Theano version is 1.0.4.
The compiler versions are 7.3.0.
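One possible workaround, untested here: the warning says g++ was not detected, and the conda compilers are installed under prefixed names such as x86_64-conda_cos6-linux-gnu-c++ rather than as plain g++. Theano's cxx config flag accepts a compiler executable name or full path and can be set through the THEANO_FLAGS environment variable before theano is imported. A Python 3 sketch that locates the compiler named in the question and adds it to THEANO_FLAGS (theano_flags_with_cxx is a hypothetical helper):

```python
import os
import shutil

# Compiler name taken from the question; assumed to be on PATH in the image.
CONDA_CXX = "x86_64-conda_cos6-linux-gnu-c++"


def theano_flags_with_cxx(cxx_path, existing=""):
    """Return a THEANO_FLAGS string (comma-separated key=value) with cxx set."""
    parts = [p for p in existing.split(",") if p and not p.startswith("cxx=")]
    parts.append("cxx=" + cxx_path)
    return ",".join(parts)


if __name__ == "__main__":
    path = shutil.which(CONDA_CXX)
    if path is None:
        print(CONDA_CXX + " not found on PATH")
    else:
        # Theano reads its flags when it is imported, so set this first.
        os.environ["THEANO_FLAGS"] = theano_flags_with_cxx(path, os.environ.get("THEANO_FLAGS", ""))
        print("THEANO_FLAGS=" + os.environ["THEANO_FLAGS"])
```

In the container, the same value could instead be set with an ENV line in the Dockerfile.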
GPU driver and CUDA are not enabled or accessible to PyTorch.
torch.cuda.is_available() returns false
I am using macOS Mojave 10.14.6
I have installed the CUDA 10.0 version of PyTorch.
I tried the verification step on https://pytorch.org/get-started/locally/, and constructing a randomly initialized tensor works fine.
But when I tried
import torch
torch.cuda.is_available()
it returns false.
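When is_available() returns False, it helps to check first whether the installed binary was built with CUDA at all. A minimal sketch using torch.version.cuda, which is None for CPU-only builds; torch_cuda_report is a hypothetical helper:

```python
def torch_cuda_report():
    """Summarize whether the installed PyTorch can use CUDA."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None means this binary was built without CUDA support at all.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```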
Therefore, I followed the instructions on the PyTorch site and installed Anaconda and CUDA.
Then I tried this:
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
In the terminal, I got:
fatal error: 'string.h' file not found
#include_next <string.h>
I searched Stack Overflow and came up with this: Build Pytorch from source. So I tried:
$ find /Library/Developer/CommandLineTools/usr -type f -name string.h
which returned /Library/Developer/CommandLineTools/usr/include/c++/v1/string.h
Doesn't this mean I already have string.h?
How can I solve this problem?
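The file that find located, c++/v1/string.h, is libc++'s C++ wrapper header; its #include_next <string.h> looks for the C library's own string.h further down the include path, and that is the file the compiler cannot find. On Mojave, the Command Line Tools keep the C headers inside the macOS SDK rather than in /usr/include by default. A Python sketch that only reports (it fixes nothing) whether the C header exists in either place, asking xcrun --show-sdk-path for the SDK location:

```python
import os
import subprocess


def sdk_path():
    """Return the active macOS SDK path from xcrun, or None if unavailable."""
    try:
        output = subprocess.check_output(["xcrun", "--show-sdk-path"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None  # not macOS, or no Command Line Tools / Xcode
    return output.strip() or None


def find_c_string_h():
    """Map each candidate location of the C string.h to whether it exists."""
    candidates = ["/usr/include/string.h"]
    sdk = sdk_path()
    if sdk:
        candidates.append(os.path.join(sdk, "usr", "include", "string.h"))
    return {path: os.path.exists(path) for path in candidates}


if __name__ == "__main__":
    for path, exists in find_c_string_h().items():
        print("%-70s %s" % (path, "found" if exists else "missing"))
```

If only the SDK copy exists, commonly reported workarounds are installing the macOS_SDK_headers_for_macOS_10.14 package from /Library/Developer/CommandLineTools/Packages/ or pointing the build at the SDK via the SDKROOT environment variable; treat both as suggestions to verify rather than a confirmed fix.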
Are you installing from a conda env? According to the GitHub instructions, this should work:
- Create a conda env
- conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing (that installs some requirements)
Then this (which I assume you've already done):
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
And finally set up the conda variable and install:
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
There are issues on GitHub reporting that behavior here, which suggest adding something like:
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ NO_CUDA=1 python setup.py install
Note the NO_CUDA flag. This issue has also been mentioned HERE in the forums, and it seems it could be caused by the OS and driver versions. If that is the case, I recommend using NVIDIA Docker (hopefully it has Mac support) with a PyTorch container from https://ngc.nvidia.com/catalog/landing
If that also fails, your best bet is to install without CUDA support.
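The environment that the final install commands in this answer set up can also be built from Python, which makes it easy to toggle NO_CUDA. A Python 3 sketch; cmake_prefix_path mirrors the shell fallback ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}, and both helper names are hypothetical:

```python
import os
import shutil


def cmake_prefix_path(env=None, which=shutil.which):
    """Python version of ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}."""
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if prefix:  # ':-' also falls back when the variable is set but empty
        return prefix
    conda = which("conda")
    if conda is None:
        raise RuntimeError("CONDA_PREFIX is unset and conda is not on PATH")
    return os.path.dirname(conda) + "/../"


def build_env(no_cuda=False, env=None, which=shutil.which):
    """Environment for `python setup.py install`, as in the commands above."""
    result = dict(os.environ if env is None else env)
    result.update({
        "CMAKE_PREFIX_PATH": cmake_prefix_path(result, which),
        "MACOSX_DEPLOYMENT_TARGET": "10.9",
        "CC": "clang",
        "CXX": "clang++",
    })
    if no_cuda:
        result["NO_CUDA"] = "1"
    return result


# Example use (not run here):
# subprocess.run([sys.executable, "setup.py", "install"], cwd="pytorch",
#                env=build_env(no_cuda=True), check=True)
```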
Environment: Ubuntu 14.04 (64-bit), Python 2.7.11
First, I installed TensorFlow using the Virtualenv installation method:
$ sudo apt-get install python-pip python-dev python-virtualenv
$ virtualenv --system-site-packages ~/tensorflow
$ source ~/tensorflow/bin/activate
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
$ pip install --upgrade $TF_BINARY_URL
Then I tested my installation and ran into an issue, so I know TensorFlow did not install successfully:
import tensorflow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named tensorflow
import tensorflow as tf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named tensorflow
I don't know how to solve the problem. Please help me; it has cost me a whole day. I tried uninstalling TensorFlow and reinstalling it with the pip installation method, but I get the same error.
My protobuf version is 3.1.0.
Are you running python in the same virtual environment you installed tensorflow in?
To access your tensorflow installation, you have to first "activate" the virtualenv in any new terminals, as follows:
source ~/tensorflow/bin/activate
python
import tensorflow as tf
If you run the above in a new terminal, does it solve your problem?
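A quick way to answer that question from inside the interpreter. A minimal sketch that runs on Python 2.7 and 3; the classic virtualenv tool sets sys.real_prefix, while Python 3's venv (and recent virtualenv releases) make sys.prefix differ from sys.base_prefix:

```python
import sys


def in_virtualenv():
    """True if this interpreter runs inside a virtualenv/venv."""
    # Classic virtualenv (typical with Python 2) sets sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv-style environments: sys.prefix differs from sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("executable:    %s" % sys.executable)
    print("prefix:        %s" % sys.prefix)
    print("in virtualenv: %s" % in_virtualenv())
```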
When you did
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
In this step you specified that you are going to use an NVIDIA card.
To run TensorFlow on a GPU (NVIDIA graphics card), you need to satisfy all of NVIDIA's requirements.
NVIDIA requires some special privileges to access its CUDA cores.
You also need to add the CUDA pathnames to the LD_LIBRARY_PATH environment variable; see the NVIDIA documentation. You also need to install profiling support, which is provided by the libcupti-dev library, the NVIDIA CUDA Profiling Tools Interface. This library provides advanced profiling support. To install it, issue the following command:
sudo apt-get install libcupti-dev
But if you want to run TensorFlow in CPU mode only, do not run $ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl. With that line you are overriding the TF_BINARY_URL variable so that it points at the NVIDIA CUDA (GPU) build.
So, to use the CPU, remove $ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl from your steps, keep only $ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl, and reinstall.
I hope this clears up the problem.
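The key point is that the second export replaces the first, so TF_BINARY_URL ends up holding only the GPU wheel. A tiny Python sketch that picks exactly one of the two wheel URLs from the question; tf_binary_url is a hypothetical helper:

```python
# Wheel URLs copied from the question (TensorFlow 0.12.1, Python 2.7, Linux).
TF_WHEELS = {
    "cpu": "https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl",
    "gpu": "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl",
}


def tf_binary_url(use_gpu):
    """Pick exactly one wheel instead of exporting both and letting the last win."""
    return TF_WHEELS["gpu" if use_gpu else "cpu"]


if __name__ == "__main__":
    # Exporting both, as in the question, behaves like this:
    env = {}
    env["TF_BINARY_URL"] = TF_WHEELS["cpu"]
    env["TF_BINARY_URL"] = TF_WHEELS["gpu"]  # overwrites the CPU URL
    assert env["TF_BINARY_URL"] == tf_binary_url(use_gpu=True)
    print("pip install --upgrade " + tf_binary_url(use_gpu=False))
```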
In case your prerequisite Python packages are not installed properly, check a few things:
$ source $HOME/tensorflow/bin/activate
$ which python
$ which pip
Please check that these binaries are in $HOME/tensorflow/bin. If so, try:
$ pip install -I --upgrade $TF_BINARY_URL
where the -I option (--ignore-installed) forces pip to reinstall the packages even if they are already installed.
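The which checks above can also be scripted. A Python 3 sketch, assuming the virtualenv lives at ~/tensorflow as in the question; run it from the same shell after activating. It compares the directory each command resolves to with the virtualenv's bin directory, without resolving the interpreter file itself, since a virtualenv's python is often a symlink to the system interpreter:

```python
import os
import shutil


def check_venv_binaries(venv_dir, names=("python", "pip")):
    """Map each command to (path it resolves to, whether that is venv_dir/bin)."""
    bin_dir = os.path.realpath(os.path.join(os.path.expanduser(venv_dir), "bin"))
    report = {}
    for name in names:
        path = shutil.which(name)
        # Compare directories only: resolving the python symlink itself
        # would point at the system interpreter and give a false negative.
        inside = path is not None and os.path.realpath(os.path.dirname(path)) == bin_dir
        report[name] = (path, inside)
    return report


if __name__ == "__main__":
    for name, (path, inside) in check_venv_binaries("~/tensorflow").items():
        print("%-7s %-45s %s" % (name, path, "in virtualenv" if inside else "NOT in virtualenv"))
```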
INSTALLATION OF TENSORFLOW ON UBUNTU 18.04
Download the Anaconda Python installer.
Install it from the shell using bash:
$bash anaconda*.sh
Edit the .bashrc script (located in your home directory):
$sudo apt-get install python3-pip
$sudo apt-get update
$cd
$nano .bashrc
nano is the text editor.
Insert the following line at the end of the file:
export PATH=~/anaconda3/bin:$PATH
Create a virtual environment using conda:
$conda create -n myenv python=3.5
//SPECIFY THE REQUIRED VERSION; DO NOT USE 3.7, AS THERE IS A COMPATIBILITY ISSUE WITH TENSORFLOW 10
$source activate myenv
$pip install -U tensorflow
$python
>>> import tensorflow as tf
>>> //if you get this prompt without an error, the installation was successful
>>> exit()
source deactivate
Fully tested; if an issue arises, do let me know.
Whenever you install Python packages, I would suggest doing it in a virtual environment.
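A Python version of the import check at the >>> prompt in these steps. A minimal sketch; the 3.7 warning only encodes this answer's compatibility note, which reflects TensorFlow releases at the time and may not apply to newer ones (both helper names are hypothetical):

```python
import sys


def check_python_for_tf(version_info=None):
    """Flag Python 3.7, following the compatibility note in the steps above."""
    major, minor = tuple(version_info or sys.version_info)[:2]
    if (major, minor) == (3, 7):
        return "warning: Python 3.7; the steps above recommend 3.5"
    return "ok: Python %d.%d" % (major, minor)


def try_import_tensorflow():
    """Equivalent of typing `import tensorflow as tf` at the >>> prompt."""
    try:
        import tensorflow as tf
    except ImportError as exc:
        return "import failed: %s" % exc
    return "TensorFlow %s imported OK" % tf.__version__


if __name__ == "__main__":
    print(check_python_for_tf())
    print(try_import_tensorflow())
```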