I have an environment.yml file which I used to create a Python environment using:
conda env create --file environment.yml.
After the environment is created, I need to perform some operations (such as registering a kernel with JupyterLab):
ipython kernel install --name=to_the_edge
I would like to embed one or more shell commands to run "post install" so that the setup is self-contained within the .yml file. Is there a way to do this? Or is there a different way within conda to get close to what I'm after?
I would also like a way to specify shell commands to be run after conda activate, but that's a secondary hope.
Maybe this isn't possible because conda works cross-platform?
This isn't really possible with standard Conda commands, but there are some options to obtain such functionality.
Jupyter and Conda
The best practice for Jupyter and Conda is to have a single env that has jupyter installed and also has nb_conda_kernels. You always launch jupyter notebook from this env. The nb_conda_kernels package enables Jupyter to automatically detect any other envs that have ipykernel (or other language equivalents, e.g., r-irkernel). Hence, you don't need any additional registration, but simply need to include ipykernel in the YAML. See the docs for nb_conda_kernels.
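For illustration, a minimal pair of YAMLs following that pattern might look like this (the env names, channel, and extra packages here are placeholders, not something prescribed by the docs):

# notebook.yml -- the single env you launch Jupyter from
name: notebook
channels:
  - conda-forge
dependencies:
  - python=3.9
  - jupyterlab
  - nb_conda_kernels

# to_the_edge.yml -- a project env; including ipykernel is enough for it to appear as a kernel
name: to_the_edge
channels:
  - conda-forge
dependencies:
  - python=3.9
  - ipykernel
  - numpy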
Running scripts at install
This cannot be done from a YAML. However, you could build your own custom package that does this at install time and then include that in your YAML. You would have to provide the .sh, .bat, etc. to run the commands. See the documentation on adding pre-link, post-link, and pre-unlink scripts to a package recipe.
Through this route, you can also add activate and deactivate scripts that are run when the env is activated and deactivated, respectively. You can also add such scripts manually, i.e., without a custom package. For example, the docs show how to define environment variables at activation, but you can run arbitrary scripts.
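As a rough sketch of the manual route (the file names and echo commands are placeholders for whatever you actually want to run; on Windows you would use .bat scripts instead):

# run once inside the activated env; $CONDA_PREFIX points at the env's root
mkdir -p "$CONDA_PREFIX/etc/conda/activate.d" "$CONDA_PREFIX/etc/conda/deactivate.d"
echo 'echo "running post-activate hook"' > "$CONDA_PREFIX/etc/conda/activate.d/post_activate.sh"
echo 'echo "running pre-deactivate hook"' > "$CONDA_PREFIX/etc/conda/deactivate.d/pre_deactivate.sh"

Any .sh script placed in those directories is sourced on conda activate and conda deactivate, respectively.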
Related
I have a conda environment with packages installed via conda install. I also have two local development packages that were each installed with pip install -e .. Now, conda env export shows everything, including both local development packages. However, I don't want conda to include them when creating the same environment on other machines - I want to keep doing it via pip install -e ..
So how can I exclude both local packages when creating the environment.yml file? Do I need to manually remove them, or is there a way to do this from the command line?
While there are some alternative flags for conda env export that change output behavior (e.g., --from-history is most notable), there really isn't anything as specific as OP describes. Instead, manually remove the offending lines.
Note that YAMLs do support all pip install commands, so the editable installs can also be included. For example, https://stackoverflow.com/a/59864744/570918.
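For example, a YAML that keeps the editable installs alongside the Conda packages could look like this (package and path names are illustrative, following the same pattern as the linked answer):

name: dev_env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pip
  - pip:
      - -e ./pkg_a    # local development package, editable install
      - -e ./pkg_b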
Consider Prioritizing the YAML Specification
In a software engineering setting, I would expect that users should not even be hitting development environments with conda install or pip install commands. Instead, the team should have a manually-written, version-controlled YAML to begin with, and all installations/changes to the environment are managed by editing the YAML file and using conda env update to propagate changes in the YAML to the environment.
That is, conda env export should not be necessary because the environment already has a well-defined means of creation.
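In practice that workflow is just (env and file names are whatever your project uses):

# edit environment.yml by hand, commit it, then sync the live env to it
conda env update --name my_env --file environment.yml --prune
# --prune also removes packages that have been deleted from the YAML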
conda 4.10.1
airflow 2.2.2
I normally run a script in the following manner
conda activate env
python /path to script/script.py
So I put those two commands into a bash script and used the BashOperator like so:
t1 = BashOperator(
    task_id='testtask',
    depends_on_past=False,
    bash_command='/path to bash/script.bash ',
    retries=0,
)
and got the dreaded "conda is not setup to activate environments" error.
Then I did:
conda init bash
conda activate env
python /path to script/script.py
but of course, the shell has to be restarted, which I don't know how to do in Apache Airflow. There has to be a default arg or something secret in the .bashrc etc. to activate Anaconda environments in non-interactive mode, but I'm a Windows conda transplant and a tutorial is not handy.
There's this other solution which basically does a bunch of tricky things to start python in the environment of your choice,
How to run Airflow PythonOperator in a virtual environment
That secret hack is to just run the python in the environment:
bash_command='~/anaconda3/envs/env_of_choice/bin/python /python_files/python_task1.py',
This guy was able to do it on anaconda 3.9!
How to change working directory and specify conda environment in Apache Airflow
But mysteriously, my environment and my base environment have the same Python. When I type env in both environments, the differences are the following:
conda_shlvl=2 instead of 1
conda_prefix_1 = users/me/opt/anaconda3
path includes /users/me/opt/anaconda3/envs/env_of_choice/bin
conda_prefix=/users/me/opt/anaconda3/envs/env_of_choice
conda_default_env=sfdc
There are a few ways to go. Maybe I didn't set up the environment correctly and it's using the base Python instead of putting a Python in the virtual environment; I used a yml file. It's also really tempting just to set these environment variables in the DAG, but maybe that's not the accepted way? I couldn't find a tutorial. What's the right path? Or maybe my version, 4.10.1, is too advanced and I should downgrade to 3.9. Too many options. Advice?
The way I ended up doing this was to use the conda run command (inspired by this answer). conda run allows you to run a command inside a conda environment programmatically without needing to activate it, and this works within Airflow.
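A minimal sketch of what that looks like in the DAG (env name and paths are placeholders; --no-capture-output, which streams the command's output instead of capturing it, exists in recent conda versions and can be dropped if yours lacks it):

from airflow.operators.bash import BashOperator

t1 = BashOperator(
    task_id='testtask',
    depends_on_past=False,
    # conda run executes the command inside the named env without activating it
    bash_command='conda run -n env_of_choice --no-capture-output python /python_files/python_task1.py',
    retries=0,
)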
I use conda 4.7.11 with auto_activate_base: false in ~/.condarc. I installed htop using conda install -c conda-forge htop. It was installed at ~/miniconda3/bin/htop. When I am in the base environment I am able to use htop because ~/miniconda3/bin is prepended to the PATH variable. But when I am outside all environments, only ~/miniconda3/condabin is prepended to PATH. When I am in any other environment except base, ~/miniconda3/envs/CUSTOM_ENV/bin and ~/miniconda3/condabin are prepended to PATH but not ~/miniconda3/bin, which is why I can use htop only from the base environment. So my question is: how can I use htop installed via conda from all environments, including the case when all environments are deactivated?
Please don't suggest using package managers like apt or yum in my case (CentOS), because I have no root access to use them. Thank you in advance.
Conda environments aren't nested, so what is in base is not inherited by the others. Isolation of environments is the imperative requirement, so it should make sense that the content of the base env is not accessible when base isn't activated.
Option 1: Environment Stacking
However, there is an option to explicitly stack environments, which at this point literally means what you're asking for, namely, keeping the previous environment's bin/ in the PATH variable. So, if you have htop installed only in base, then you can retain access to it in other envs like so:
conda activate base
conda activate --stack my_env
If you decide to go this route, I think it would be prudent to be very minimal about what you install in base. Of course, you could also create a non-base env to stack on, but then it might be a bother to have to always activate this env, whereas in default installs, base auto-activates.
Starting with Conda v4.8 there will be an auto_stack configuration option:
conda config --set auto_stack true
See the documentation on environment stacking for details.
Option 2: Install by Default
If you want to have htop in every env but not outside of Conda envs, then the naive solution is to install it in every env. Conda offers a simple solution to this called Default Packages, which lives in the Conda config under the key create_default_packages. Running the following will tell Conda to always install htop when creating a new env:
conda config --add create_default_packages htop
Unfortunately that won't update any existing envs, so you'd still have to go back and do that (e.g., Install a package into all envs). There's also a --no-default-packages flag for ignoring default packages when creating new envs.
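If you do want to back-fill existing envs, something like the following loop works; note that this is my own rough sketch, not an official command, and it assumes all your envs have names (review the output of conda env list first):

# install htop into every existing env; the awk skips the comment lines of `conda env list`
for env_name in $(conda env list | awk '!/^#/ && NF {print $1}'); do
    conda install --yes --name "$env_name" htop
done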
Option 3: Global Installs
A Word of Caution
The following two options are not official recommendations, so caveat emptor and, if you do ever use them, be sure to report such a non-standard manipulation of $PATH when reporting problems/troubleshooting in the future.
Linking
Another option (although more manual) is to create a folder in your user directory (e.g., ~/.local/bin) that you add to $PATH in your .bashrc and create links in there to the binaries that you wish to "export" globally. I do this with a handful of programs that I wanted to use independently of Conda (e.g., emacs) even though they are installed and managed by Conda.
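For the htop example from the question, that manual linking might look like this (~/.local/bin is just a conventional choice of directory):

mkdir -p ~/.local/bin
ln -s ~/miniconda3/bin/htop ~/.local/bin/htop
# and once, in ~/.bashrc:
export PATH="$HOME/.local/bin:$PATH"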
Dedicated Env
If you plan to do this with a bunch of software, then it might work to dedicate an env to such global software and just add its whole ./bin dir to $PATH. Do not do this with base - Conda wants to strictly manage that itself since Conda v4.4. Furthermore, do not do this with anything Python-related: stick strictly to native (compiled) software (e.g., htop is a good example). If an additional Python of the same version ends up on your $PATH this can create a mess in library loading. I've never attempted this and prefer the manual linking because I know exactly what I'm exporting.
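If you do try the dedicated-env route anyway, a sketch (the env name is arbitrary, and again, keep Python out of it):

conda create --name cli_tools --channel conda-forge htop
# in ~/.bashrc; this env should only ever contain compiled, non-Python tools
export PATH="$HOME/miniconda3/envs/cli_tools/bin:$PATH"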
I create a conda environment without specifying any packages using the following command:
conda create --name test_env
I can then use all the packages in the root environment inside test_env (but they do not appear in the outputs of conda list and conda env export). This is already unexpected to me but the real problems begin when I then install something inside that environment, e.g.:
conda install pywavelets
Afterwards, pywavelets is usable, but all the other packages which are not dependencies of pywavelets disappear inside the environment (e.g. pandas). I don't understand why that happens. Does anybody have an explanation for that?
More importantly, what does this mean for best practices for working with conda environments? Should I always create my environments specifying at least python (conda create --name test_env python)? However, then I have to install everything by hand in that environment, which is quite cumbersome. So, my idea now is to specify anaconda for all environments I create:
conda create --name test_env anaconda
The disadvantage, however, is that the list of dependencies listed by conda list and conda env export gets unnecessarily long (e.g. even listing the Anaconda Navigator). Does anybody have a better solution for this?
The reason you can use all the packages from the root environment when you don't specify a Python version during environment creation is that you're actually using the root environment's Python executable! You can check with which python or python -c "import sys; print(sys.executable)". See also my other answer here.
When you install pywavelets, one of the dependencies is (probably) Python, so a new Python executable is installed into your environment. Therefore, when you run Python, it only picks up the packages that are installed in the test_env.
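A quick way to see this in action (paths assume a default ~/miniconda3 install):

conda create --name test_env            # no Python requested
conda activate test_env
python -c "import sys; print(sys.executable)"   # -> ~/miniconda3/bin/python (the root env's interpreter)
conda install pywavelets                # pulls in Python as a dependency
python -c "import sys; print(sys.executable)"   # -> ~/miniconda3/envs/test_env/bin/python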
If you want all of the packages from another environment, you can create a file that lists all the packages and then use that file to create a new environment, as detailed in the Conda docs: https://conda.io/docs/user-guide/tasks/manage-environments.html#building-identical-conda-environments
To summarize
conda list --explicit > spec-file.txt
conda create --name myenv --file spec-file.txt
or to install into an existing environment
conda install --name myenv --file spec-file.txt
Since that's just a text file, you can edit and remove any packages that you don't want.
I noticed that when a conda environment is created without specifying the python version:
conda create --name snowflakes
instead of:
conda create --name snowflakes python=3.6
the environments are not separated and share the packages of the default Python interpreter.
Thereupon: what is the use of non-separated Anaconda environments?
EDIT - 20170824:
The question has been solved. Non-separated environments do not actually exist. With the first command, no new Python interpreter is installed, so the shell calls the first one it finds on the PATH, which is the default Python interpreter, because there is no other.
I think you are misunderstanding the word "separate" in the docs. There, "separate" means "create a new environment, with a new name, to try some new things"; it does not mean that you are creating a different kind of conda environment. There is only one kind of environment in conda, what you are calling the "separated" environment, and packages installed in one environment are never shared with another.

It so happens that the first command creates an empty environment with no packages. When that environment is activated, the PATH environment variable looks like: ~/miniconda3/envs/snowflakes/bin:~/miniconda3/bin:... Since there is no Python installed into ~/miniconda3/envs/snowflakes/bin (the snowflakes environment is empty), the shell still finds the Python in ~/miniconda3/bin first on the path.

The snowflakes environment does not share anything with the root environment. For instance, if, after creating it, you run conda install -n snowflakes python, it will install a new Python that won't find any of the root environment's packages.
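You can see the fall-through directly (paths assume a default ~/miniconda3 install):

conda create --name snowflakes
conda activate snowflakes
echo $PATH      # ~/miniconda3/envs/snowflakes/bin comes first, but it contains no python
which python    # falls through to ~/miniconda3/bin/python, the root env's interpreter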