How do I run python in a conda environment with airflow?


conda 4.10.1
airflow 2.2.2
I normally run a script like this:
conda activate env
python /path to script/script.py
So I put those two commands into a bash script and used the BashOperator like so:
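from airflow.operators.bash import BashOperator

t1 = BashOperator(
    task_id='testtask',
    depends_on_past=False,
    # the trailing space stops Airflow from treating the .bash path as a Jinja template file
    bash_command='/path to bash/script.bash ',
    retries=0,
)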
and got the dreaded error saying conda is not set up to activate environments.
Then I did:
conda init bash
conda activate env
python /path to script/script.py
But of course, the shell has to be restarted after that, which I don't know how to do in Apache Airflow. There must be some default args, or something in .bashrc, to activate Anaconda environments in non-interactive mode, but I'm a Windows conda transplant and don't have a tutorial handy.
There's another solution that does a number of tricky things to start Python in the environment of your choice:
How to run Airflow PythonOperator in a virtual environment
The trick is simply to run the Python interpreter that lives inside the environment:
bash_command='~/anaconda3/envs/env_of_choice/bin/python /python_files/python_task1.py',
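The same idea in Python: work out the interpreter path inside the environment and call it directly, with no activation step. This is a minimal sketch assuming the default layout, where named environments live under <conda root>/envs/<name> and the interpreter is bin/python on Linux/macOS or python.exe in the environment folder on Windows. The helper name env_python and the /home/me/anaconda3 root are hypothetical placeholders.

```python
from pathlib import PurePosixPath, PureWindowsPath


def env_python(conda_root: str, env_name: str, windows: bool = False) -> str:
    """Return the interpreter path inside a named conda environment.

    Assumes the default layout <conda_root>/envs/<env_name>: on Windows the
    interpreter is python.exe directly in the env folder, elsewhere bin/python.
    """
    if windows:
        return str(PureWindowsPath(conda_root, "envs", env_name, "python.exe"))
    return str(PurePosixPath(conda_root, "envs", env_name, "bin", "python"))


# Build a bash_command that calls the env's interpreter without activating it.
# "/home/me/anaconda3" is a hypothetical conda root; use your own install path.
python_exe = env_python("/home/me/anaconda3", "env_of_choice")
bash_command = f"{python_exe} /python_files/python_task1.py"
print(bash_command)
```

Keep in mind that calling the interpreter directly skips activation, so anything activation would set up (extra PATH folders, activate.d scripts, CONDA_* variables) is not applied.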
This user was able to do it on Anaconda 3.9!
How to change working directory and specify conda environment in Apache Airflow
But mysteriously, my environment and my base environment use the same Python. When I run env in both environments, the only differences are:
CONDA_SHLVL=2 instead of 1
CONDA_PREFIX_1=/users/me/opt/anaconda3
PATH includes /users/me/opt/anaconda3/envs/env_of_choice/bin
CONDA_PREFIX=/users/me/opt/anaconda3/envs/env_of_choice
CONDA_DEFAULT_ENV=sfdc
There are a few ways to go. Maybe I didn't set up the environment correctly and it's using the base Python instead of installing a Python in the virtual environment. I used a yml file. It's also really tempting to just set these environment variables in the DAG, but maybe that's not the accepted way? I couldn't find a tutorial. What's the right path? Or maybe my version, 4.10.1, is too new and I should downgrade to 3.9. Too many options. Advice?
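One way to narrow this down is to have the script report which interpreter it is actually running under and which conda variables it inherited. A minimal standard-library sketch (describe_interpreter is a hypothetical name): if prefix points at the base install rather than .../envs/env_of_choice, that usually means the environment has no Python of its own and the base one was found on PATH.

```python
import os
import sys


def describe_interpreter() -> dict:
    """Collect the facts that tell you which environment a process runs in."""
    return {
        # Absolute path of the running interpreter binary.
        "executable": sys.executable,
        # Root folder of the environment that interpreter belongs to.
        "prefix": sys.prefix,
        # Variables set by `conda activate`; None when nothing was activated.
        "CONDA_PREFIX": os.environ.get("CONDA_PREFIX"),
        "CONDA_DEFAULT_ENV": os.environ.get("CONDA_DEFAULT_ENV"),
        "CONDA_SHLVL": os.environ.get("CONDA_SHLVL"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```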

The way I ended up doing this was to use the conda run command (inspired by this answer). conda run lets you run a command inside a conda environment programmatically without needing to activate it, and this works within Airflow.
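A minimal sketch of that approach, kept in plain Python so the same command can be used from a BashOperator or a PythonOperator. It assumes conda is on the PATH of the Airflow worker (pass the absolute path of the conda executable as conda_exe if it is not) and that the environment is addressed by name with -n; the helper names are hypothetical, not part of Airflow or conda.

```python
import shlex
import subprocess
from typing import List, Sequence


def conda_run_argv(env_name: str, script: str, args: Sequence[str] = (),
                   conda_exe: str = "conda") -> List[str]:
    """Build argv for `conda run -n ENV python SCRIPT ARGS...`."""
    return [conda_exe, "run", "-n", env_name, "python", script, *args]


def conda_run_bash_command(env_name: str, script: str, args: Sequence[str] = (),
                           conda_exe: str = "conda") -> str:
    """Same command as one shell-quoted string, e.g. for bash_command."""
    return shlex.join(conda_run_argv(env_name, script, args, conda_exe))


def run_in_conda_env(env_name: str, script: str, args: Sequence[str] = (),
                     conda_exe: str = "conda") -> subprocess.CompletedProcess:
    """Run the script in the env; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(conda_run_argv(env_name, script, args, conda_exe), check=True)
```

For example, conda_run_bash_command("env_of_choice", "/path to script/script.py") returns conda run -n env_of_choice python '/path to script/script.py', which can be passed as bash_command; shlex.join quotes the path because it contains spaces. From a PythonOperator you would call run_in_conda_env instead.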

Related

How to import geopandas in PyCharm if installed using conda? [duplicate]

My conda env is activated using source activate env_name.
How can I activate the environment in PyCharm?

Open
PyCharm/Preferences/Project/Project Interpreter
and check the existing interpreters. Conda environments may already be listed there.
If not, you can create a new conda environment with the "Create Conda Env" button.
If you are looking for a specific conda environment, you can use "Add Local". When you click "Add Local", enter the conda environment path + /bin/python.
You can list all conda environments on your system with the following command:
conda info --envs
# conda environments:
#
tensorflow * /Users/username/miniconda3/envs/tensorflow
You can choose the approach that best fits your needs.
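If you want the exact paths to paste into "Add Local" rather than reading them off the listing, a small sketch can ask conda for its environments in JSON and append the interpreter location. It assumes conda is on PATH and that conda env list --json prints an object with an "envs" list of environment prefixes (as recent conda versions do; check yours). bin/python vs python.exe follows the layouts quoted on this page, and the function names are hypothetical.

```python
import json
import subprocess
from pathlib import PurePosixPath, PureWindowsPath
from typing import List


def interpreter_paths(env_list_json: str, windows: bool = False) -> List[str]:
    """Turn `conda env list --json` output into interpreter paths for PyCharm.

    Assumes the JSON object has an "envs" key listing environment prefixes.
    """
    prefixes = json.loads(env_list_json)["envs"]
    if windows:
        return [str(PureWindowsPath(p, "python.exe")) for p in prefixes]
    return [str(PurePosixPath(p, "bin", "python")) for p in prefixes]


def list_interpreters(windows: bool = False) -> List[str]:
    """Ask the conda CLI (which must be on PATH) for its environments."""
    out = subprocess.run(["conda", "env", "list", "--json"],
                         capture_output=True, text=True, check=True).stdout
    return interpreter_paths(out, windows)
```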

The best PyCharm-specific answer is this one by wasabi (below).
In general, though, if you want to use an interpreter from a Conda environment, you can point the interpreter location at the environment you want to use, e.g. /home/username/miniconda/envs/bunnies, as mentioned in this comment.
However, as mentioned in this answer by Mark Turner, it is possible to have a shell script executed when activating an environment. This method will not run that shell script, but you can follow his workaround if you need that shell script run:
open a conda prompt
activate the environment
run PyCharm from the conda prompt

What about environment.yml?
PyCharm can indeed create a new conda environment. Unfortunately, until this issue is fixed, it won't offer environment.yml support, which means it won't install the dependencies declared there.
When working on a project based on such a file, you need to create/update the dedicated env manually on your machine:
conda env create -n <my-project>
Then remember to update it each time environment.yml changes (whether you changed it or upstream did):
conda env update -n <my-project>
Not ideal.
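In the meantime, a small script can at least decide between the two commands for you. This is a minimal sketch, not a PyCharm feature: it assumes conda is on PATH, that conda env list --json reports environment prefixes under "envs", and that an environment is matched by the last component of its prefix. It also adds --prune to the update so packages removed from environment.yml are removed from the env too; drop it if you don't want that. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import PurePath
from typing import List, Sequence


def sync_command(env_name: str, existing_prefixes: Sequence[str],
                 yml: str = "environment.yml") -> List[str]:
    """Return the conda command that brings env_name in line with the YAML file.

    Creates the env if no existing prefix ends in env_name, otherwise updates it.
    """
    exists = any(PurePath(p).name == env_name for p in existing_prefixes)
    if exists:
        return ["conda", "env", "update", "-n", env_name, "-f", yml, "--prune"]
    return ["conda", "env", "create", "-n", env_name, "-f", yml]


def sync_env(env_name: str, yml: str = "environment.yml") -> None:
    """Query the existing envs via the conda CLI, then create or update."""
    listing = subprocess.run(["conda", "env", "list", "--json"],
                             capture_output=True, text=True, check=True).stdout
    prefixes = json.loads(listing)["envs"]
    subprocess.run(sync_command(env_name, prefixes, yml), check=True)
```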

As mentioned in one of the comments above, activating an environment can run scripts that perform other actions such as setting environment variables. I have worked in one environment that did this. What worked in this scenario was to:
open a conda prompt
activate the environment
run PyCharm from the conda prompt
PyCharm then had access to the environment variables that were set by activating the environment.

I had the same problem; I am on Windows 10 Professional 64-bit.
My solution was to start PyCharm as administrator, and it worked.

Go to settings at the top right corner of the PyCharm IDE.
Go to Project:{Your Project Name}->Python Interpreter
Open the settings there and click Add.
In Add Python Interpreter, select Conda Environment.
Select Existing environment and pick your conda environment's path from the dropdown menu, OR add the path of the python.exe file in your conda environment. For reference, this is the path on my Windows 10 system: C:\Users\maria\AppData\Local\Continuum\anaconda3\envs\<mycondaenv>\python.exe. It may differ on your system depending on your installation settings.

It's important to know that setting the project interpreter as described in wasabi's comment does not actually activate the conda environment.
I had an issue running xgboost (which I installed with conda) inside PyCharm, and it turned out that it also needed some folders added to PATH. In the end I had to make do with an ugly workaround:
Find out which additional folders are on PATH for the given environment (with echo %PATH% in cmd, with the environment activated)
At the top of the file I want to run, before anything else, put:
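import os
# my_extra_folders_list: the extra PATH folders found above (placeholder, fill in your own)
my_extra_folders_list = []
os.environ["PATH"] += os.pathsep + os.pathsep.join(my_extra_folders_list)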
I know this is not at all a proper solution, but I was unable to find any other besides what Mark Turner mentioned in his comment.

To use a Conda environment as the PyCharm interpreter:
activate the Conda environment from Anaconda Navigator
open PyCharm from the Navigator tool list
in the Conda section of Add Interpreter, choose the existing Conda environment; it automatically recognises the path of that environment's python.exe file

First, select Interpreter Settings... at the bottom right of PyCharm.
Then choose python.exe from your desired conda environment.
My environment path is : C:\Users\javadsh\anaconda3\envs\tf-gpu\python.exe

Go to PyCharm -> Preferences -> Project Interpreter. At the top left of the packages table there is a plus sign, a minus sign, a green circle and an eye; uncheck the green circle. That will give you access to the packages while using a conda environment.

New conda environment is created without python

The conda documentation says that when you use
conda create --name myenv
the new environment "uses the same version of Python that you are currently using because you did not specify a version."
However, that's not the case for me. I have Windows 10 and Anaconda. I am in the "base" environment created by default.
If I run
conda create --name testenv
Then when I activate the environment
conda activate testenv
There is no Python. If I type
python
in the console, the Microsoft Store opens.
To have a Python interpreter I need to manually specify it
conda create --name testenv2 python=3.8

That specific note in the Conda documentation was a hold-over from before Conda v4.4 and has since been corrected (see here and here).
Background
Prior to Conda v4.4, the base environment's bin/ directory was always on the PATH, which is why a new environment without its own Python interpreter would fall back to the base Python. Conda v4.4 introduced a new strategy for managing environment isolation: the primary interface to Conda is defined as a set of shell functions, and the base bin directory is included on PATH only when the base environment is active. This strategy provides cleaner isolation of environments, which means that only what is in the active environment will be available.
Hence, if you want Python in the environment, it must be explicitly installed.
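A quick way to check whether a given environment actually got its own interpreter is to look inside its folder. A minimal sketch, assuming the usual layouts (python.exe in the env root on Windows, bin/python elsewhere) and that every conda environment, even an empty one, has a conda-meta folder; the function names are hypothetical.

```python
import sys
from pathlib import Path


def is_conda_env(prefix: str) -> bool:
    """Every conda environment has a conda-meta/ folder, even an empty one."""
    return (Path(prefix) / "conda-meta").is_dir()


def env_has_python(prefix: str, windows: bool = sys.platform == "win32") -> bool:
    """True if the environment at `prefix` ships its own interpreter.

    Looks for python.exe in the env root on Windows and bin/python elsewhere.
    """
    root = Path(prefix)
    candidate = root / "python.exe" if windows else root / "bin" / "python"
    return candidate.exists()
```

For an environment made with conda create --name testenv, pass its prefix (the .../envs/testenv path shown by conda info --envs): is_conda_env should be True while env_has_python stays False until Python is installed into it.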

I have added conda and python path to the environment variable, but jupyter notebook is still not getting opened from cmd

I was trying to add conda and python to the PATH environment variable using the SETX command from CMD, but it was failing. I tried setting it using PowerShell and it worked. The path was added successfully, but I still can't open Jupyter Notebook from cmd.

Adding Python to the environment path is bad practice; see the Anaconda FAQ. If you haven't installed Anaconda with its default settings, you first need to:
Initialize your shells
conda init --all
After this you should have ../Anaconda3/condabin only in your path (more information via conda init --help).
But before you can run Jupyter, you also need to activate Anaconda:
C:\> conda activate
(base) C:\> jupyter notebook
The activation will add the following folders of the conda base environment to your PATH:
\Anaconda3;
\Anaconda3\Library\mingw-w64\bin;
\Anaconda3\Library\usr\bin;
\Anaconda3\Library\bin;
\Anaconda3\Scripts;
\Anaconda3\bin;
The python.exe resides in Anaconda3, jupyter.exe in Anaconda3\Scripts, so it's not enough to just add the first folder to your Path. And it's especially important to have the libraries on your Path when you want to run C-based packages like numpy.
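If you really need to replicate this outside an activated shell (as in the PyCharm workaround of appending folders to PATH), here is a minimal sketch that builds the folder list above for a given environment prefix and puts it in front of PATH, in the same order. It only approximates activation: it does not run activate.d scripts or set the CONDA_* variables, so conda activate remains the supported route. The helper names are hypothetical, and ";" is hard-coded as the Windows PATH separator so the example behaves the same on any OS.

```python
from pathlib import PureWindowsPath
from typing import List

# Sub-folders (relative to the env root) that activation adds to PATH on
# Windows, in the order listed above; the env root itself comes first.
ACTIVATION_SUBDIRS = [
    r"Library\mingw-w64\bin",
    r"Library\usr\bin",
    r"Library\bin",
    "Scripts",
    "bin",
]


def activation_path_entries(prefix: str) -> List[str]:
    """Folders to prepend to PATH to approximate `conda activate` on Windows."""
    root = PureWindowsPath(prefix)
    return [str(root)] + [str(root / sub) for sub in ACTIVATION_SUBDIRS]


def prepend_to_path(entries: List[str], path: str) -> str:
    """Return a Windows PATH string with `entries` placed before `path`."""
    return ";".join(entries + [path]) if path else ";".join(entries)
```

On Windows you would assign the result to os.environ["PATH"] before importing the packages that need those DLLs. Prepending rather than appending keeps the environment's folders ahead of everything else on PATH, which is what activation does.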
But the very point behind the conda activate mechanism is that it allows you to configure and run different environments with different versions of Python and third-party packages that would otherwise conflict; see Managing environments.
On top of that you can even install Python from python.org next to your Anaconda distribution, since conda will make sure that they won't interfere.

specify commands to run after conda create from yml file

I have an environment.yml file which I used to create a Python environment using:
conda env create --file environment.yml.
After the environment is created, I need to perform some operations (such as registering a kernel with JupyterLab):
ipython kernel install --name=to_the_edge
I would like to embed one or more shell commands to run "post install" so that the setup is self-contained within the .yml file. Is there a way to do this? Or is there a different way within conda to get close to what I'm after?
I would also like a way to specify shell commands to be run after conda activate, but that's a secondary hope.
Maybe this isn't possible because conda works cross platform?

This isn't really possible with standard Conda commands, but there are some options to obtain such functionality.
Jupyter and Conda
The best practice for Jupyter and Conda is to have a single env that has jupyter installed and also has nb_conda_kernels. You always launch jupyter notebook from this env. The nb_conda_kernels package enables Jupyter to automatically detect any other envs that have ipykernel (or other language equivalents, e.g., r-irkernel). Hence, you don't need any additional registration, but simply need to include ipykernel in the YAML. See the docs for nb_conda_kernels.
Running scripts at install
This cannot be done from a YAML. However, you could build your own custom package that does this at install time and then include that in your YAML. You would have to provide the .sh, .bat, etc. to run the commands. See the documentation on adding pre-link, post-link, and pre-unlink scripts to a package recipe.
Through this route, you can also add activate and deactivate scripts that are run when the env is activated and deactivated, respectively. You can also add such scripts manually, i.e., without a custom package. For example, the docs show how to define environment variables at activation, but you can run arbitrary scripts.
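As a rough illustration of the manual route, the sketch below writes an activation script into an existing environment, following the etc/conda/activate.d layout from the conda docs. You could run it once after conda env create (for example with conda run -n <env> python on a file containing it) to keep setup scripted even though the YAML itself can't do it. The function names and the choice to only export variables are illustrative assumptions; a matching deactivate.d script to unset them is left out.

```python
import shlex
from pathlib import Path
from typing import Dict


def activate_script_text(env_vars: Dict[str, str]) -> str:
    """Shell code that exports each variable, with values safely quoted."""
    lines = [f"export {key}={shlex.quote(value)}" for key, value in env_vars.items()]
    return "\n".join(lines) + "\n"


def write_activate_script(prefix: str, name: str, env_vars: Dict[str, str]) -> Path:
    """Write <prefix>/etc/conda/activate.d/<name>.sh for POSIX shells.

    conda sources the .sh files in this folder on `conda activate`;
    cmd.exe on Windows uses .bat files in the same folder instead.
    """
    target_dir = Path(prefix) / "etc" / "conda" / "activate.d"
    target_dir.mkdir(parents=True, exist_ok=True)
    script = target_dir / f"{name}.sh"
    script.write_text(activate_script_text(env_vars))
    return script
```

Inside the environment, os.environ["CONDA_PREFIX"] gives the prefix to pass in.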

What is the use of non-separated anaconda environments?

I noticed that when a conda environment is created without specifying the python version:
conda create --name snowflakes
instead of:
conda create --name snowflakes python=3.6
the environments are not separated and share the packages with the default Python interpreter.
So, what is the use of non-separated Anaconda environments?
EDIT - 20170824:
The question has been solved. Non-separated environments do not actually exist. The first command installs no new Python interpreter, so the shell runs the first one it finds on the PATH, which is the standard Python interpreter because there is no other.

I think you are misunderstanding the word "separate" in the docs. In the docs, they mean "separate" in the sense of "create a new environment, with a new name to try some new things". They do not mean that you are creating a different kind of conda environment. There is only one kind of environment in conda, what you are calling the "separated" environment. All packages in all environments are always unique. It so happens that the first command creates an empty environment with no packages. Therefore, when the new environment is activated, the PATH environment variable looks like: ~/miniconda3/envs/snowflakes/bin:~/miniconda3/bin:... Now, since there is no Python installed into ~/miniconda3/envs/snowflakes/bin (because the snowflakes environment is empty), the shell still finds Python in ~/miniconda3/bin as first on the path. The snowflakes environment does not share with the root environment. For instance, if, after creating, you type conda install -n snowflakes python it will install a new version of Python that won't find any packages! Therefore, there is only one kind of environment in conda, what you are calling the "separated" environment.
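To see the lookup described here in action, the POSIX-only sketch below recreates the two bin/ folders in a temporary directory and uses shutil.which with an explicit path argument as a stand-in for the shell's PATH search. The folder layout and the placeholder python files are simulated; nothing here calls conda.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_python(bin_dir: Path) -> None:
    """Create a placeholder executable named `python` in bin_dir."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    exe = bin_dir / "python"
    exe.write_text("#!/bin/sh\n")
    exe.chmod(exe.stat().st_mode | stat.S_IXUSR)


with tempfile.TemporaryDirectory() as root:
    base_bin = Path(root, "miniconda3", "bin")
    env_bin = Path(root, "miniconda3", "envs", "snowflakes", "bin")
    make_fake_python(base_bin)    # the base env has a Python
    env_bin.mkdir(parents=True)   # the new env exists but holds no python
    # Same order as the activated PATH: env bin first, then base bin.
    search_path = os.pathsep.join([str(env_bin), str(base_bin)])
    found_before = shutil.which("python", path=search_path)
    make_fake_python(env_bin)     # simulates installing python into snowflakes
    found_after = shutil.which("python", path=search_path)

print(found_before)  # .../miniconda3/bin/python
print(found_after)   # .../miniconda3/envs/snowflakes/bin/python
```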
