How to rebuild or clone base conda install?

I have a large conda installation that is used by multiple users, and it seems to be getting fragile. I'd like to rebuild it from scratch. I can run conda list and get a list of packages, but the dependencies come out in essentially random order, mixed in with the packages I actually asked for. If I just run a script to install that list, I get a constant stream of messages about upgrading and downgrading versions, etc.
Is there a way to create a "smart" list of my packages to do an efficient rebuild?
EDIT
Nehal suggested conda list --export. That gives me a list of the form:
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
<package>=<version>
<package>=<version>
...
I was able to make a list like that with just conda list and some awk, but it had some duplicates and packages that caused errors.
Nevertheless, how do I then use this list to rebuild the install, rather than to create a new environment as the file header suggests?
I tried
conda install $(cat packagelist | tr "\n" " ")
But got some inconsistencies. Could be my channel priorities?
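For what it's worth, a variant I have been considering (not verified to behave any better; packagelist.txt here is just the conda list --export output saved to a file) is to hand the spec file to conda install directly and target the base environment by name:
conda list --export > packagelist.txt
conda install --name base --file packagelist.txt
That at least avoids the shell word-splitting of the $(cat ...) approach, though I don't know whether it sidesteps the channel-priority inconsistencies.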

Related

Pip install local package in conda environment

I recently developed a package my_package and am hosting it on GitHub. For easy installation and use, I have the following setup.py:
from setuptools import setup

setup(name='my_package',
      version='1.0',
      description='My super cool package',
      url='https://github.com/my_name/my_package',
      packages=['my_package'],
      python_requieres='3.9',
      install_requires=[
          'some_package==1.0.0'
      ])
Now I am trying to install this package in a conda environment:
conda create --name myenv python=3.9
conda activate myenv
pip install git+'https://github.com/my_name/my_package'
So far so good. If I try to use it in the project folder, everything works perfectly. If I try to use the package outside the project folder (still inside the conda environment), I get the following error:
ModuleNotFoundError: No module named 'my_package'
I am working on Windows, if that matters.
EDIT:
I'm verifying that both python and pip are pointing towards the correct version with:
which python
/c/Anaconda3/envs/my_env/python
which pip
/c/Anaconda3/envs/my_env/Scripts/pip
Also, when I run:
pip show my_package
I get a description of my package. So pip finds it, but as soon as I try to import my_package in the script, I get the described error.
I also verified that the package is installed in my environment. So in /c/Anaconda3/envs/my_env/lib/site-packages there is a folder my_package-1.0.dist-info/
Further, running:
python -c "import sys; print(sys.path)"
shows, among other paths, /c/Anaconda3/envs/my_env/lib/site-packages. So it is on the path.
Check whether you are using an explicit shebang in your script that points to another Python interpreter, e.g. the system default Python:
#!/usr/bin/env python
...
While inside your environment myenv, try to uninstall your package first, to do a clean test:
pip uninstall my_package
Also, you have a typo in your setup.py: python_requieres --> python_requires.
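For example, a corrected line would look like this (note it also needs a comparison operator, since setuptools expects a version specifier here rather than a bare version):
python_requires='>=3.9',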
And I actually tried to install with your setup.py, and also got ModuleNotFoundError - but because it didn't properly install due to install_requires:
ERROR: Could not find a version that satisfies the requirement some_package==1.0.0
So, check also that everything installs without errors and warnings.
Hope that helps.
First thing I would like to point out (not the solution) regards the following statement you made:
If I try to use it in the project folder [...] If I try to use the package outside the project folder [...]
I understand "project folder" means the "my_package" folder (inside the git repository). If that is the case, I would like to point out that you are mixing two situations: that of testing a (remote) package installation, while in your (local) repository. Which is not necessarily wrong, but error-prone.
Whenever testing the setup/install process of a package, make sure to move far from your repository (say, "/tmp/" equivalent in Windows) and, preferably, use a fresh environment. That will eliminate "noise" in your tests.
First thing I would tell you to do -- if not already -- is to create a fresh conda env and install your package from an empty/new folder. Eg,
$ conda create -n test_my_package ipython pip
$ cd /tmp # or an equivalent temporary/new folder on Windows
$ pip install git+https://github.com/my_name/my_package
If that doesn't work (maybe a problem with your pip git+https URL), try another way: create a release for your package (e.g. "v1") and then install the released version by pointing pip at the zip archive URL (which you get from the "my_package" releases page on GitHub):
$ pip install https://github.com/my_name/my_package/archive/v1.zip

Specify a chosen default version of conda package through conda-forge / conda-install

I'd like to distribute multiple versions of a package through conda. Specifically, I'd like to do something like this:
...
package-v1.2-dev
package-v1.2
package-v1.1-dev
package-v1.1
package-v1.0
The trick is that I'd like to have the "latest" or default package be the release versions that do not have -dev. As I understand it, conda install <package> without a version number will install the newest build. In my case, that will always be -dev. Is it possible to make the default a specific version number?
You can achieve this by specifying a custom "label" for your dev packages. Keep using the default main label for your release packages, but use a non-main label (e.g. dev) for the other packages.
First, a quick note about version numbers: conda package versions must not contain the - character, so v1.2-dev is not a valid version. For the following examples, I'll use v1.2.dev.
Here's how to upload your packages:
anaconda upload mypackage-v1.2.tar.bz2
anaconda upload --label dev mypackage-v1.2.dev.tar.bz2
(You can also manipulate the labels for existing packages via your account on the http://anaconda.org website.)
By default, your users will only download your main packages. Users who want the dev packages will have two choices:
They can specify the dev label on the command-line:
conda install -c mychannel/label/dev mypackage
OR
They can add your dev label to their .condarc config
# .condarc
channels:
- mychannel/label/dev # dev label
- mychannel # main label only
- conda-forge
- defaults
And then there's no need to specify the channel on the command-line:
conda install mypackage
PS -- Here's a side note about something you wrote above:
As I understand it, conda install <package> without a version number will install the newest build
Just to clarify, it doesn't install the "newest" in chronological sense, but rather the highest compatible version according to conda's VersionOrder logic. That logic is designed to be largely compatible with relevant Python conventions (e.g. PEP440 and others), but with some affordances for compatibility with other languages' conventions, too.
Please note: As far as conda (and PEP440) is concerned, 1.2.dev comes BEFORE 1.2. (Maybe you already knew that, but I don't consider it obvious.)
$ python
>>> from conda.models.version import VersionOrder
>>> VersionOrder('1.2.dev') < VersionOrder('1.2')
True

Why does anaconda download again packages that I already have when creating a new environment?

I have used anaconda3 for a few projects recently, and every time I create a virtual environment for a project, it seems that anaconda is re-downloading the same packages (pytorch, for instance).
Have I misconfigured something, or is this behavior normal?
For clarification, I am doing the Stanford CS224n course, and for the assignments I use:
conda env create --file env.yml
Where env.yml is of the form:
name: local_nmt
channels:
- pytorch
- defaults
dependencies:
- python=3.5
- numpy
- scipy
- tqdm
- docopt
- pytorch
- nltk
- torchvision
I couldn't find an explanation in the anaconda documentation.
Thanks in advance!
If only the package name or version is specified, then Conda will default to grabbing the latest versions that are consistent with the constraints. Hence, any package with a newer build available will be downloaded again rather than taken from the cache.
Offline Mode
There is an --offline flag to only use what is available in the package cache.
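For example, something along these lines (assuming the same env.yml as in the question) should refuse to touch the network and fail loudly if anything is missing from the local cache:
conda env create --offline --file env.yml
If your conda version doesn't accept the flag on this subcommand, conda config --set offline true achieves the same thing globally.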
Specifying Builds
However, that may not always be feasible (e.g., you've added some non-cached packages to the YAML). In that case, one could additionally specify the build (which sort of serves as a unique identifier) to correspond to the already cached versions.
Not sure of the cleanest way to do that, but one approach would be to first export a YAML from an existing environment where the packages are installed (e.g., conda env export > env.yaml), and then use the specifications in there to fill in the details for the environment YAML you are trying to create.
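As a rough sketch (the env name here is made up), you could pull the build strings out of an existing environment:
conda env export -n existing_env > existing.yaml
Dependencies in that export carry the build as a third field, i.e. name=version=build, and writing a dependency that way in the YAML you are creating should only match that exact (already cached) build.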
Cloning
It is likely also worth mentioning that one can also clone existing environments:
conda create --clone old_env --name new_env

How to copy all Conda packages from one env to the base env?

I have an environment called envname, but I would like its packages to be available in the base environment. How can I do this without reinstalling each of them?
Word of Caution
Be very careful when tinkering with the base env. It's where the conda package lives and so if it breaks, the Conda installation will break. This is a very tedious situation to recover from, so I generally recommend against using the base env for anything other than running conda update -n base conda.
That said, one should only try the following for sharing between two non-base envs.
Copying (Linking) Packages Across Envs
One way would be to export an env, let's call it foo, out as a YAML:
conda env export -n foo > foo.yaml
And then ask the other env, let's call it bar, to attempt to install all the packages:
Warning: Conda will attempt the following command without requesting approval!
conda env update -n bar -f foo.yaml
Note that if the foo env has conflicting packages, they will all supersede whatever was in the bar env (if resolvable). To be cautious, you should probably do a diff first, to see what is going to get overwritten. E.g.,
conda env export -n bar > bar.yaml # this is also useful as backup
diff -u bar.yaml foo.yaml
A major thing to check for is the python version. They should match up to and including the minor version (e.g., 3.6.x and 3.6.y are okay; 3.6 and 3.7 are not).
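A quick (if crude) way to eyeball that before updating:
grep "python=" foo.yaml bar.yaml
which prints the pinned python line from each export so the minor versions can be compared side by side.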
To err on the side of caution, one should probably manually remove any packages from the YAML that would be reversions - however, this could lead to conflicts.
The deletions will not have an effect unless also using the --prune argument (essentially that would completely overwrite bar with foo).
Hopefully all these qualifications and warnings make the point: it could be a mess. It is usually better practice to thoughtfully design a fresh env from the start.
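One shortcut for that redesign, if your conda is recent enough to support it, is to export only the packages that were explicitly requested and let the solver rebuild everything else cleanly (env names here are illustrative):
conda env export -n foo --from-history > foo-explicit.yaml
conda env create -n fresh_env -f foo-explicit.yaml
--from-history drops the transitive dependencies and exact builds from the export, which keeps the new spec small and gives the solver room to resolve dependencies fresh.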

How to document requirements

When I program, I often experiment with third party packages.
Of course, they are installed via pip.
When installed, packages install their dependencies.
At the end I want to have a clean requirements.txt that reflects which packages are really necessary for the project (pip freeze > requirements.txt).
If I find a better alternative to a package, I uninstall the old one via pip.
The problem is that uninstalling a package does not uninstall its dependencies; as far as I know, pip simply doesn't do that. And when I then run pip freeze, I can no longer tell what is going on.
Therefore I decided to document which package pulled in which other packages, and to uninstall those dependencies manually as well.
But this is really troublesome and error-prone:
pip freeze > requirements.txt
asgiref==3.2.3
Django==3.0.2
pkg-resources==0.0.0
pytz==2019.3
sqlparse==0.3.0
Edit requirements.txt:
asgiref==3.2.3 # Django dependency
Django==3.0.2 # Django
pkg-resources==0.0.0 # Default package
pytz==2019.3 # Django dependency
sqlparse==0.3.0 # Django dependency
When I install a new package, I run pip freeze > requirements1.txt, compare the two files to reveal the newly installed package's dependencies, mark them, and copy over the old comments.
In the end it can still turn into a complete mess: I run pip freeze and realize I no longer know which package depends on which, because I forgot to add a comment or something.
Again, this is my goal: when I finish my project in a year, I'd like to know for certain that:
Every installed package is absolutely necessary for the project to run.
No unnecessary packages are installed.
Could you tell me what the best practice is to achieve that?
It seems pip-tools is the exact tool you need to solve the problems you encountered in your development workflow.
pip-tools enables you to specify your high-level dependencies (e.g. Django), while taking care automatically of your low-level dependencies (e.g. pytz). It also enables you to sync your environment to a requirements.txt file, regardless of the messy state you might have.
Installing pip-tools in your project
To use pip-tools, first create a new empty Python virtual environment in your project and activate it, using the tool of your choice:
with venv (built-in): python3 -m venv .venv && source .venv/bin/activate
with pyenv: pyenv virtualenv 3.8.1 project-env && pyenv local project-env
Then, install pip-tools in the Python virtual environment of your project:
pip install pip-tools
Managing dependencies with pip-compile using a requirements.in file
pip-tools includes a pip-compile tool that takes a requirements.in file as input. This requirements.in file is similar to requirements.txt, but it contains only high-level dependencies. For example, if your project is using the latest version of Django, you could write something like this in requirements.in:
django>=3.0,<3.1
You can see that requirements.in does not contain the dependencies of Django, only Django itself.
Once you have set your dependencies in requirements.in, use pip-compile to "compile" your requirements.in into a requirements.txt file:
➜ pip-compile requirements.in
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile requirements.in
#
asgiref==3.2.3 # via django
django==3.0.2
pytz==2019.3 # via django
sqlparse==0.3.0 # via django
The requirements.txt file generated by pip-compile indicates the source of every indirect dependency next to the package name. For example, pytz==2019.3 is a dependency of django. In addition, it pins each dependency to a precise version number, to make sure that the installation of your dependencies is reproducible.
Applying dependencies from generated requirements.txt with pip-sync
Now that you have a requirements.txt, you can apply it on your Python virtual environment using pip-sync:
➜ pip-sync requirements.txt
Collecting asgiref==3.2.3 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 1))
Using cached https://files.pythonhosted.org/packages/a5/cb/5a235b605a9753ebcb2730c75e610fb51c8cab3f01230080a8229fa36adb/asgiref-3.2.3-py2.py3-none-any.whl
Collecting django==3.0.2 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 2))
Using cached https://files.pythonhosted.org/packages/55/d1/8ade70e65fa157e1903fe4078305ca53b6819ab212d9fbbe5755afc8ea2e/Django-3.0.2-py3-none-any.whl
Collecting pytz==2019.3 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 3))
Using cached https://files.pythonhosted.org/packages/e7/f9/f0b53f88060247251bf481fa6ea62cd0d25bf1b11a87888e53ce5b7c8ad2/pytz-2019.3-py2.py3-none-any.whl
Collecting sqlparse==0.3.0 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 4))
Using cached https://files.pythonhosted.org/packages/ef/53/900f7d2a54557c6a37886585a91336520e5539e3ae2423ff1102daf4f3a7/sqlparse-0.3.0-py2.py3-none-any.whl
Installing collected packages: asgiref, pytz, sqlparse, django
Successfully installed asgiref-3.2.3 django-3.0.2 pytz-2019.3 sqlparse-0.3.0
pip-sync will make sure your virtual environment corresponds exactly to what is defined in the requirements.txt file by uninstalling and installing the relevant packages.
After your environment has been synced, make sure your project code works properly. If it does, commit requirements.in and requirements.txt into version control.
Reference
For more details, you can refer to the pip-tools docs, or to the article by Hynek Schlawack: Python Application Dependency Management in 2018.
Dependency management is a bit rougher in Python land than elsewhere; there are tools like poetry and pipenv, but for just poking around I would try the following.
create a virtual environment in your project directory
$ python3 -m venv .venv
Activate your virtual environment
$ source .venv/bin/activate
Install whatever packages you're playing with
$ echo "package1" >> requirements-dev.txt
$ echo "package2" >> requirements-dev.txt
$ pip install --upgrade -r requirements-dev.txt
When you think you have everything you need, deactivate your virtual env, create a new one, make sure everything still works, then create your new requirements.txt:
$ deactivate
$ rm -rf .venv
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install --upgrade -r requirements-dev.txt
# test your code still works
# create a new requirements file to be used for "prod"
$ pip freeze > requirements.txt
I wouldn't call this best practice, but it's a start and should get you by until you decide which side you want to join when it comes to dependency-management tools :)
