Conda environment creation eats up a lot of space - anaconda

I want to create a Docker image on a Google VM.
One of the steps builds a conda environment. This is the command from the Dockerfile (I am omitting the RUN):
conda env create -n cq -f environment.yml
This command installs a lot of packages and I end up running out of disk space.
I have two questions.
I am not sure what the -f environment.yml flag does. I searched for it online, but I could not find any example of it being used together with a .yml file.
Can I remove some of the unnecessary packages before the installation happens?

-f is an alias for --file
What constitutes an unnecessary package? (I assume you have nothing unnecessary in your environment file.) Maybe you're asking about the packages bundled by installing a complete Anaconda distribution? You may find this guide helpful (including some links in the intro).

Try looking at some of Conda Forge's images to get an idea of best practice examples, maybe Miniforge3 4.10.0 to suggest a specific one.
At a minimum, one should clean up after every Conda operation, and do it in the same chained RUN so Docker doesn't save an intermediate layer containing the temporary files. A start would be
RUN conda env create -n cq -f environment.yml && conda clean -afy
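A slightly fuller sketch of that pattern (the base image, paths, and env name here are illustrative assumptions, not a prescription):

```dockerfile
FROM condaforge/miniforge3:latest

COPY environment.yml /tmp/environment.yml

# Create the env and clean caches in the same RUN so the temporary
# package downloads never get baked into an intermediate layer
RUN conda env create -n cq -f /tmp/environment.yml && \
    conda clean --all --force-pkgs-dirs --yes
```

Note that `conda clean -afy` is shorthand for the long-form flags shown above.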

Related

Is there a conda (mamba) equivalent to pip install's `--root` path option?

Background
I have a multi-stage Dockerfile for a JupyterLab image. There are three stages:
server
kernel: mamba create -y -p /tmp/kernel-packages/myenv ...
runner:
FROM ... as runner
...
COPY --from=kernel /tmp/kernel-packages/ /opt/conda/envs
RUN conda config --append envs_dirs /opt/conda/envs/
...
Problem
In the resulting image, the preferred python -m pip works, but pip gives:
bash: /opt/conda/envs/myenv/bin/pip: /tmp/kernel-packages/myenv/bin/python3.9: bad interpreter: No such file or directory
The reason is that pip has #!/tmp/kernel-packages/myenv/bin/python3.9 as its shebang.
Expected
A behaviour like pip install --root /tmp/server-packages ..., which works perfectly fine in a COPY --from=server /tmp/server-packages /.
Additional information
Additionally, some other binaries, like curl or papermill also have wrong paths hardcoded by conda. I've read about Anaconda | Moving Conda Environments, but it seems like overkill to use conda-pack and conda-unpack.
Workaround
Simply create an env by name: mamba create -y -n myenv and then in the runner stage COPY --from=kernel /opt/conda/envs /opt/conda/envs.
Question
Is there a Conda (Mamba) equivalent to pip install's --root option? I'm not looking for a solution to the stated problem, as I have already found a way that works for me. My interest is purely in the functionality of the conda binary.
The --prefix argument is the equivalent - just that some Conda packages use hardcoded paths, hence the issue you encounter.
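For comparison, the name-based and prefix-based forms of env creation look like this (package list is illustrative):

```shell
# Named env: created under <conda root>/envs/myenv
conda create -y -n myenv python=3.9

# Prefix env: created at an arbitrary path, analogous to pip's --root
conda create -y -p /tmp/kernel-packages/myenv python=3.9
```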
conda-prefix-replacement
To properly move a Conda environment to a new prefix via a COPY operation, one would need to run the conda-prefix-replacement tool (a.k.a. cpr) to ensure that all files with hardcoded paths get updated to the new location. Presumably, conda-pack does something similar under the hood.
For your purposes, you might consider pre-running cpr on the environment(s) in the kernel image so that they are ready to work in the deployed location. Though that would mean always COPYing to the same location.
See the cpr repository for details on use.

Why conda doesn't remove packages for removed environment?

I am not an expert in informatics. I deleted an environment that had many packages, one of them psi4, using the command:
conda remove --name myenv --all
However, in the folder:
~/anaconda3/pkgs
there are still some folders like:
psi4-1.3.2+ecbda83-py37h06ff01c_1, psi4-rt-1.3.2-py37h6cf1279_1
The same happened for other packages that I identified manually, so I assume it is the case for the rest of the packages that belonged to this environment. The problem is that these leftover folders take up disk space, and I don't know how many packages are in this situation or which ones they are.
Is there some way to delete all these unused folders in order to free space?
Thanks in advance.
The command you used only removes the environment and its installed packages, not the downloaded package files in the cache. You can clean those up using:
conda clean -a
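conda clean also accepts more granular flags if you want finer control (flag names per conda's CLI; check conda clean --help on your version):

```shell
# Remove index cache, lock files, unused cached packages, and tarballs
conda clean --all

# Or target specific caches:
conda clean --packages   # unused package dirs under pkgs/
conda clean --tarballs   # downloaded .tar.bz2 / .conda archives

# Preview what would be deleted without removing anything
conda clean --all --dry-run
```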

deleting conda environment safely?

I'm new to Anaconda and conda. I have created an identical environment in two different directories. Is it safe to just delete the env folder of the environment that I no longer need, or do I need to do something in the Anaconda prompt to remove the environment thoroughly? I'm not sure whether creating an environment in a local folder leaves a trace in the registry or somewhere else on the computer that needs to be removed too.
conda remove --name myenv --all
Another option is
conda env remove --name myenv
Effectively no difference from the accepted answer, but personally I prefer to use conda env commands when operating on whole envs, and reserve conda remove for managing individual packages.
The difference between these and deleting the folder manually is that Conda provides action hooks for when packages are removed, and so allows packages to execute pre-unlink and post-unlink scripts.

"~/miniconda3/bin" is not prepended to PATH for custom environments

I use conda 4.7.11 with auto_activate_base: false in ~/.condarc. I installed htop using conda install -c conda-forge htop; it was installed at ~/miniconda3/bin/htop. When I am in the base environment I am able to use htop because ~/miniconda3/bin is prepended to the PATH variable. But when I am outside all environments, only ~/miniconda3/condabin is prepended to PATH. In any other environment, ~/miniconda3/envs/CUSTOM_ENV/bin and ~/miniconda3/condabin are prepended to PATH, but not ~/miniconda3/bin, which is why I can use htop only from the base environment. So my question is: how can I use htop installed via conda from all environments, including when all environments are deactivated?
Please, don't suggest using package managers like apt or yum in my case (CentOS), because I have no root access to use this package manager. Thank you in advance.
Conda environments aren't nested, so what is in base is not inherited by the others. Isolation of environments is the imperative requirement, so it should make sense that the content in base env is not accessible when it isn't activated.
Option 1: Environment Stacking
However, there is an option to explicitly stack environments, which at this point literally means what you're asking for, namely, keeping the previous environment's bin/ in the PATH variable. So, if you have htop installed only in base, you can retain access to it in other envs like so:
conda activate base
conda activate --stack my_env
If you decide to go this route, I think it would be prudent to be very minimal about what you install in base. Of course, you could also create a non-base env to stack on, but then it might be a bother to have to always activate this env, whereas in default installs, base auto-activates.
Starting with Conda v4.8 there will be an auto_stack configuration option:
conda config --set auto_stack true
See the documentation on environment stacking for details.
Option 2: Install by Default
If you want to have htop in every env but not outside of Conda envs, then the naive solution is to install it in every env. Conda offers a simple solution to this called Default Packages, configured in the Conda config under the key create_default_packages. Running the following tells Conda to always install htop when creating a new env:
conda config --add create_default_packages htop
Unfortunately that won't update any existing envs, so you'd still have to go back and do that (e.g., Install a package into all envs). There's also a --no-default-packages flag for ignoring default packages when creating new envs.
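Retrofitting existing envs can be done with a rough shell loop like the following (a sketch; it assumes conda env list prints env names in the first column with comment lines starting with #):

```shell
# Install htop into every existing environment
for env in $(conda env list | grep -v '^#' | awk '{print $1}'); do
    conda install -n "$env" -y htop
done
```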
Option 3: Global Installs
A Word of Caution
The following two options are not official recommendations, so caveat emptor and, if you do ever use them, be sure to report such a non-standard manipulation of $PATH when reporting problems/troubleshooting in the future.
Linking
Another option (although more manual) is to create a folder in your user directory (e.g., ~/.local/bin) that you add to $PATH in your .bashrc and create links in there to the binaries that you wish to "export" globally. I do this with a handful of programs that I wanted to use independently of Conda (e.g., emacs) even though they are installed and managed by Conda.
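As a sketch, assuming a base install at ~/miniconda3 and that ~/.local/bin is already on your PATH:

```shell
mkdir -p ~/.local/bin

# Expose a single conda-installed binary outside of any env
ln -s ~/miniconda3/bin/htop ~/.local/bin/htop

# In ~/.bashrc, if not already present:
# export PATH="$HOME/.local/bin:$PATH"
```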
Dedicated Env
If you plan to do this with a bunch of software, then it might work to dedicate an env to such global software and just add its whole ./bin dir to $PATH. Do not do this with base - Conda wants to strictly manage that itself since Conda v4.4. Furthermore, do not do this with anything Python-related: stick strictly to native (compiled) software (e.g., htop is a good example). If an additional Python of the same version ends up on your $PATH this can create a mess in library loading. I've never attempted this and prefer the manual linking because I know exactly what I'm exporting.

Weird behavior of conda when creating empty environments

I create a conda environment without specifying any packages using the following command:
conda create --name test_env
I can then use all the packages in the root environment inside test_env (but they do not appear in the outputs of conda list and conda env export). This is already unexpected to me but the real problems begin when I then install something inside that environment, e.g.:
conda install pywavelets
Afterwards, pywavelets is usable, but all the other packages that are not dependencies of pywavelets disappear inside the environment (e.g. pandas). I don't understand why that happens. Does anybody have an explanation for that?
More importantly, what does this mean for best practices when working with conda environments? Should I always create my environments specifying at least python (conda create --name test_env python)? But then I have to install everything by hand in that environment, which is quite cumbersome. So my idea now is to specify anaconda for all environments I create:
conda create --name test_env anaconda
The disadvantage, however, is that the list of dependencies listed by conda list and conda env export gets unnecessarily long (e.g. even listing the Anaconda Navigator). Does anybody have a better solution for this?
The reason you can use all the packages from the root environment when you don't specify a Python version during environment creation is because you're actually using the root environment's Python executable! You can check with which python or python -c "import sys; print(sys.executable)". See also my other answer here.
When you install pywavelets, one of the dependencies is (probably) Python, so a new Python executable is installed into your environment. Therefore, when you run Python, it only picks up the packages that are installed in the test_env.
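A quick way to confirm which interpreter (and therefore which package set) you are actually running:

```python
import sys

# Path of the interpreter currently executing; inside an activated
# conda env with its own Python, this points into that env's bin/
print(sys.executable)

# Root prefix of the current interpreter; determines which
# site-packages directory is searched for imports
print(sys.prefix)
```

If sys.executable points at the root environment's Python, you are seeing the root env's packages, not your new env's.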
If you want all of the packages from another environment, you can create a file that lists all the packages and then use that file to create a new environment, as detailed in the Conda docs: https://conda.io/docs/user-guide/tasks/manage-environments.html#building-identical-conda-environments
To summarize
conda list --explicit > spec-file.txt
conda create --name myenv --file spec-file.txt
or to install into an existing environment
conda install --name myenv --file spec-file.txt
Since that's just a text file, you can edit and remove any packages that you don't want.
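For reference, an explicit spec file is just a list of package URLs after an @EXPLICIT marker, so removing a package means deleting its line. An illustrative excerpt (URLs and build strings here are made up for illustration):

```
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/python-3.9.7-h12debd9_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.21.2-py39h20f2e39_0.tar.bz2
```

Keep in mind that hand-editing can break dependency consistency, since explicit installs skip the solver.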
