When I program, I often experiment with third-party packages, which I install via pip.
When a package is installed, pip also installs its dependencies.
In the end I want a clean requirements.txt that reflects only the packages that are really necessary for the project (pip freeze > requirements.txt).
If I find a better alternative to a package, I uninstall the old one via pip.
The problem is that uninstalling a package does not remove its dependencies.
If I'm not mistaken, pip simply doesn't do that. So when I run pip freeze, I can no longer tell what is going on.
Therefore I decided to document which package pulled in which other packages, and when I uninstall one, I remove its dependencies manually.
But this is really troublesome and error-prone:
pip freeze > requirements.txt
asgiref==3.2.3
Django==3.0.2
pkg-resources==0.0.0
pytz==2019.3
sqlparse==0.3.0
Edit requirements.txt:
asgiref==3.2.3 # Django dependency
Django==3.0.2 # Django
pkg-resources==0.0.0 # Default package
pytz==2019.3 # Django dependency
sqlparse==0.3.0 # Django dependency
When I install a new package, I run pip freeze > requirements1.txt, compare the two files to find the newly installed package's dependencies, mark them, and then copy over the old comments.
In the end it still turns into a complete mess: I run pip freeze and realize I no longer know which package depends on which, because I forgot to add a comment somewhere.
To restate my goal: when I finish my project in a year, I'd like to know exactly this:
Every installed package is absolutely necessary for the project to run.
No unnecessary packages are installed.
Could you tell me what the best practice is to achieve that?
It seems pip-tools is exactly the tool you need to solve the problems you are running into in your development workflow.
pip-tools lets you specify your high-level dependencies (e.g. Django) while automatically taking care of your low-level dependencies (e.g. pytz). It also lets you sync your environment to a requirements.txt file, regardless of how messy your current environment is.
Installing pip-tools in your project
To use pip-tools, first create a new empty Python virtual environment in your project and activate it, using the tool of your choice:
with venv (built-in): python3 -m venv .venv && source .venv/bin/activate
with pyenv: pyenv virtualenv 3.8.1 project-env && pyenv local project-env
Then, install pip-tools in the Python virtual environment of your project:
pip install pip-tools
Managing dependencies with pip-compile using a requirements.in file
pip-tools includes a pip-compile tool that takes a requirements.in file as input. This requirements.in file is similar to requirements.txt, but it contains only high-level dependencies. For example, if your project is using the latest version of Django, you could write something like this in requirements.in:
django>=3.0,<3.1
You can see that requirements.in does not contain the dependencies of Django, only Django itself.
Once you have set your dependencies in requirements.in, use pip-compile to "compile" your requirements.in into a requirements.txt file:
➜ pip-compile requirements.in
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile requirements.in
#
asgiref==3.2.3 # via django
django==3.0.2
pytz==2019.3 # via django
sqlparse==0.3.0 # via django
The requirements.txt file generated by pip-compile indicates the source of every indirect dependency next to the package name. For example, pytz==2019.3 is a dependency of django. In addition, it pins each dependency to a precise version number, to make sure that the installation of your dependencies is reproducible.
Applying dependencies from generated requirements.txt with pip-sync
Now that you have a requirements.txt, you can apply it on your Python virtual environment using pip-sync:
➜ pip-sync requirements.txt
Collecting asgiref==3.2.3 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 1))
Using cached https://files.pythonhosted.org/packages/a5/cb/5a235b605a9753ebcb2730c75e610fb51c8cab3f01230080a8229fa36adb/asgiref-3.2.3-py2.py3-none-any.whl
Collecting django==3.0.2 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 2))
Using cached https://files.pythonhosted.org/packages/55/d1/8ade70e65fa157e1903fe4078305ca53b6819ab212d9fbbe5755afc8ea2e/Django-3.0.2-py3-none-any.whl
Collecting pytz==2019.3 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 3))
Using cached https://files.pythonhosted.org/packages/e7/f9/f0b53f88060247251bf481fa6ea62cd0d25bf1b11a87888e53ce5b7c8ad2/pytz-2019.3-py2.py3-none-any.whl
Collecting sqlparse==0.3.0 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 4))
Using cached https://files.pythonhosted.org/packages/ef/53/900f7d2a54557c6a37886585a91336520e5539e3ae2423ff1102daf4f3a7/sqlparse-0.3.0-py2.py3-none-any.whl
Installing collected packages: asgiref, pytz, sqlparse, django
Successfully installed asgiref-3.2.3 django-3.0.2 pytz-2019.3 sqlparse-0.3.0
pip-sync will make sure your virtual environment corresponds exactly to what is defined in the requirements.txt file by uninstalling and installing the relevant packages.
After your environment has been synced, make sure your project code works properly. If it does, commit requirements.in and requirements.txt into version control.
Reference
For more details, you can refer to the docs of pip-tools, or to the article by Hynek Schlawack: Python Application Dependency Management in 2018.
Dependency management is a bit rougher in Python land than in some other ecosystems. There are tools like poetry and pipenv, but for just poking around I would try the following.
Create a virtual environment in your project directory
$ python3 -m venv .venv
Activate your virtual environment
$ source .venv/bin/activate
Install whatever packages you're playing with
$ echo "package1" >> requirements-dev.txt
$ echo "package2" >> requirements-dev.txt
$ pip install --upgrade -r requirements-dev.txt
When you think you have everything you need, deactivate your virtual env, create and activate a fresh one, reinstall from requirements-dev.txt, make sure everything still works, then create your new requirements.txt
$ deactivate
$ rm -rf .venv
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install --upgrade -r requirements-dev.txt
# test your code still works
# create a new requirements file to be used for "prod"
$ pip freeze > requirements.txt
I wouldn't call this best practice, but it's a start and should get you by until you decide which side you want to join when it comes to dependency management tools :)
Related
I recently developed a package, my_package, and am hosting it on GitHub. For easy installation and use, I have the following setup.py:
from setuptools import setup

setup(name='my_package',
      version='1.0',
      description='My super cool package',
      url='https://github.com/my_name/my_package',
      packages=['my_package'],
      python_requieres='3.9',
      install_requires=[
          'some_package==1.0.0'
      ])
Now I am trying to install this package in a conda environment:
conda create --name myenv python=3.9
conda activate myenv
pip install git+'https://github.com/my_name/my_package'
So far so good. If I try to use it in the project folder, everything works perfectly. If I try to use the package outside the project folder (still inside the conda environment), I get the following error:
ModuleNotFoundError: No module named 'my_package'
I am working on windows, if that matters.
EDIT:
I'm verifying that both python and pip are pointing towards the correct version with:
which pip
which python
/c/Anaconda3/envs/my_env/python
/c/Anaconda3/envs/my_env/Scripts/pip
Also, when I run:
pip show my_package
I get a description of my package. So pip finds it, but as soon as I try to import my_package in the script, I get the described error.
I also verified that the package is installed in my environment. So in /c/Anaconda3/envs/my_env/lib/site-packages there is a folder my_package-1.0.dist-info/
Further: python -c "import sys; print(sys.path)"
shows, among other paths, /c/Anaconda3/envs/my_env/lib/site-packages. So it is in the path.
Check if you are using some explicit shebang in your script pointing to other Python interpreters.
E.g. using the system default Python:
#!/usr/bin/env python
...
While inside your environment myenv, try to uninstall your package first, to do a clean test:
pip uninstall my_package
Also, you have a typo in your setup.py: python_requieres --> python_requires.
And I actually tried to install with your setup.py, and also got ModuleNotFoundError - but because it didn't properly install due to install_requires:
ERROR: Could not find a version that satisfies the requirement some_package==1.0.0
So, check also that everything installs without errors and warnings.
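For reference, a corrected setup.py might look like the sketch below (the names are placeholders taken from your question, and some_package==1.0.0 must actually be installable for pip to succeed):
from setuptools import setup

setup(name='my_package',
      version='1.0',
      description='My super cool package',
      url='https://github.com/my_name/my_package',
      packages=['my_package'],
      python_requires='>=3.9',  # corrected spelling; '>=3.9' is presumably what was intended
      install_requires=[
          'some_package==1.0.0'  # must resolve to a real, installable distribution
      ])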
Hope that helps.
First thing I would like to point out (not the solution) regards the following statement you made:
If I try to use it in the project folder [...] If I try to use the package outside the project folder [...]
I understand "project folder" means the "my_package" folder (inside the git repository). If that is the case, I would like to point out that you are mixing two situations: that of testing a (remote) package installation, while in your (local) repository. Which is not necessarily wrong, but error-prone.
Whenever testing the setup/install process of a package, make sure to move far from your repository (say, "/tmp/" equivalent in Windows) and, preferably, use a fresh environment. That will eliminate "noise" in your tests.
First thing I would tell you to do -- if not already -- is to create a fresh conda env and install your package from an empty/new folder. E.g.,
$ conda create -n test_my_package ipython pip
$ cd /tmp # equivalent temporary or new in your Windows
$ pip install git+https://github.com/my_name/my_package
If that doesn't work (maybe a problem with your pip git+https URL), try another way: create a release for your package (e.g. "v1") and then install the released version by pointing at the zip archive URL (which you get from your "my_package" releases page on GitHub):
$ pip install https://github.com/my_name/my_package/archive/v1.zip
I am trying to generate requirements.txt for someone to replicate my environment. As you may know, the standard way is
pip freeze > requirements.txt
I noticed that this will list all the packages, including the dependencies of installed packages, which makes the list unnecessarily huge. I then browsed around and came across pip-chill, which lists only the top-level installed packages (not their dependencies) in requirements.txt.
Now, from my understanding when someone tries to replicate the environment with pip install -r requirements.txt, this will automatically install the dependencies of installed packages.
If this is true, this means it is safe to use pip-chill instead of pip to generate the requirements.txt. My question is, is there any other risk of omitting dependencies of installed packages using pip-chill that I am missing here?
I believe using pip-compile from pip-tools is a good practice when constructing your requirements.txt. This will make sure that builds are predictable and deterministic.
The pip-compile command lets you compile a requirements.txt file from your dependencies, specified in either setup.py or requirements.in.
Here are my recommended steps for constructing your requirements.txt (if using requirements.in):
Create a virtual env and install pip-tools there
$ source /path/to/venv/bin/activate
(venv)$ python -m pip install pip-tools
Specify your application/project's direct dependencies in your requirements.in file:
# requirements.in
requests
boto3==1.16.51
Use pip-compile to generate requirements.txt
$ pip-compile --output-file=- > requirements.txt
Your requirements.txt file will contain:
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile --output-file=-
#
boto3==1.16.51
# via -r requirements.in
botocore==1.19.51
# via
# boto3
# s3transfer
certifi==2020.12.5
# via requests
chardet==4.0.0
# via requests
idna==2.10
# via requests
jmespath==0.10.0
# via
# boto3
# botocore
python-dateutil==2.8.1
# via botocore
requests==2.25.1
# via -r requirements.in
s3transfer==0.3.3
# via boto3
six==1.15.0
# via python-dateutil
urllib3==1.26.2
# via
# botocore
# requests
Your application should always work with the dependencies installed from this generated requirements.txt. If you have to update a dependency, you just need to update the requirements.in file and redo pip-compile. I believe this is a much better approach than doing pip freeze > requirements.txt, which I see some people do.
I guess the main advantage of using this is that you can keep track of the actual direct dependencies of your project in a separate requirements.in file.
I find this very similar to how node modules/dependencies are being managed in a node app project with the package.json (requirements.in) and package-lock.json (requirements.txt).
From my point of view, requirements.txt files should list all dependencies: direct dependencies as well as their (indirect, transitive) dependencies. If for some reason only direct dependencies are wanted, there are tools that can help with that. From a cursory look, pip-chill seems inadequate, since it doesn't actually look at the code to figure out what packages are directly imported. It may be better to look at projects such as pipreqs or pigar; they seem to be more accurate in figuring out what the actual direct dependencies are (based on the imports in your code).
But at the end of the day you should curate such lists by hand. When writing the code you choose carefully which packages you want to import; with the same care you should curate a list of the projects (and their versions) containing those packages. Tools can help, but the developer knows better.
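To give a sense of how tools like pipreqs find direct dependencies, here is a rough sketch (not the actual pipreqs code) that collects the top-level names imported by a Python file using the standard ast module; a real tool would additionally have to map those import names to PyPI distribution names:
import ast

def imported_names(path):
    """Return the top-level module names imported by a Python source file."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split('.')[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split('.')[0])  # level > 0 means a relative import, so skip it
    return names

print(imported_names('app.py'))  # hypothetical file; might print e.g. {'requests', 'boto3'}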
I'm having issues with pip failing to install editable packages from a local directory. I was able to install the packages manually using commands like pip install -e pkg1. I wanted to use a requirements.txt file to automate future installs, because my coworkers will be working on the same packages. My ideal development workflow is for each developer to check out the source from version control and run pip install -r requirements.txt. The requirements file would designate all the packages as editable, so we can import our code without the need for .pth files and without having to keep updating our environments. And by using namespace packages, we can decouple the import semantics from the file structures.
But it's not working out.
I have a directory with packages like so:
index/
    pkg1/
        src/
            pkg1/
                __init__.py
                pkg1.py
        setup.py
    pkg2/
        src/
    ...etc.
Each setup.py file contains something like:
from setuptools import setup, find_packages

setup(
    name="pkg1",
    version="0.1",
    packages=find_packages('src'),
    package_dir={'': 'src'},
)
I generated my requirements.txt file using pip freeze, which yielded something like this:
# Editable install with no version control (pkg1==0.1)
-e c:\source\pkg1
# Editable install with no version control (pkg2==0.1)
-e c:\source\pkg2
...etc...
I was surprised when pip choked on the requirements file that it created for itself:
(venv) C:\Source>pip install -r requirements.txt
c:sourcepkg1 should either be a path to a local project or a VCS url beginning with svn+, git+, hg+, or bzr+
Also, some of our packages rely on other of our packages and pip has been absolutely useless at identifying these dependencies. I have resorted to manually installing packages in dependency order.
Maybe I'm pushing pip to its limits here. The documentation and help online has not been helpful, so far. Most sources discuss editable installation, installation from requirements files, package dependencies, or namespace packages, but never all these concepts at once. Usually when the online help is scarce, it means that I'm trying to use a tool for something it wasn't intended to do or I've discovered a bug.
Is this development process viable? Do I need to make a private package index or something?
I'm working on a support library for a large Python project which heavily uses relative imports by appending various project directories to sys.path.
Using The Hitchhiker's Guide to Packaging as a template I attempted to create a package structure which will allow me to do a local install, but can easily be changed to a global install later if desired.
One of the dependencies of my package is the pyasn1 package for the encoding and decoding of ASN.1 annotated objects. I have to include the pyasn1 library separately as the version supported by the CentOS 6.3 default repositories is one major version back and has known bugs that will break my custom package.
The top-level of the library structure is as follows:
MyLibrary/
setup.py
setup.cfg
LICENSE.txt
README.txt
MyCustomPackage/
pyasn1-0.1.6/
In my setup configuration file I define the install directory for my library to be a local directory called .lib. This is desirable as it allows me to do absolute imports by running the command import site; site.addsitedir("MyLibrary/.lib") in the project's main application without requiring our engineers to pass command line arguments to the setup script.
setup.cfg
[install]
install-lib=.lib
setup.py
setup(
    name='MyLibrary',
    version='0.1a',
    package_dir={'pyasn1': 'pyasn1-0.1.6/pyasn1'},
    packages=[
        'MyCustomPackage',
        'pyasn1',
        'pyasn1.codec',
        'pyasn1.compat',
        'pyasn1.codec.ber',
        'pyasn1.codec.cer',
        'pyasn1.codec.der',
        'pyasn1.type'
    ],
    license='',
    long_description=open('README.txt').read(),
    data_files=[]
)
The problem I've run into with doing the installation this way is that when my package tries to import pyasn1 it imports the global version and ignores the locally installed version.
As a possible workaround I have tried installing the pyasn1 package under a different name than the global package (eg pyasn1_0_1_6) by doing package_dir = {'pyasn1_0_1_6':'pyasn1-0.1.6/pyasn1'}. However, this fails since the imports used internally to the pyasn1 package do not use the pyasn1_0_1_6 name.
Is there some way to either a) force Python to import a locally installed package over a globally installed one or b) force a package to install under a different name?
Use virtualenv to ensure that your application runs in a fully known configuration which is independent from the OS version of libraries.
EDIT: a quick (unix) solution is setting the PYTHONPATH environment variable, which works just like PATH but for Python modules (a module is loaded from the first path in which it is found, so simply prepend your directory to PYTHONPATH). Anyway, I strongly recommend you proceed with virtualenv, since it was specifically engineered for handling situations like the one you are facing.
Rationale
The process is easily automatable if you write a setuptools script specifying dependencies with install_requires. For a complete example, refer to this one I wrote.
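As a minimal sketch of the idea (the project name is a placeholder, and the pyasn1 pin is only taken from your situation, not from the linked example):
from setuptools import setup, find_packages

setup(
    name='MyLibrary',        # placeholder project name
    version='0.1a',
    packages=find_packages(),
    install_requires=[
        'pyasn1>=0.1.6',     # declare the required version instead of vendoring the source tree
    ],
)
With this in place, installing your project inside the activated virtualenv pulls a recent pyasn1 from PyPI, so the outdated OS-level package is never involved.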
Setup
Note that you can easily insert the steps below in a setup.sh shell script.
First create a virtualenv and enter it:
$ virtualenv $name
$ cd $name
Activate it:
$ source bin/activate
Now cd to your project directory and run the installer script:
$ cd $my_project_dir
$ python ./setup.py install --prefix $path_to_virtualenv
Note the --prefix $path_to_virtualenv, which is used to tell the script to install into the virtualenv instead of system-wide. Call this after activating the virtualenv. Note that all the dependencies are automatically downloaded and installed in the virtualenv.
Then you are done. When you want to leave the virtualenv, issue:
$ deactivate
On subsequent calls, you will only need to activate the virtualenv (step 2), maybe using a runawesomeproject.sh if you really want.
As noted on the virtualenv website, you should use virtualenv >= 1.9, as the previous versions did not download dependencies via HTTPS. If you consider plain HTTP to be sufficient, then any version should do.
You might also try relocatable virtualenvs: set one up and copy the folder to your host. Anyway, note that this feature is still experimental.
I have a flask app where I'm trying to automate deployment to EC2.
Not a big deal, but is there a setting in either Fabric or Distribute that reads the requirements.txt file directly for the setup.py, so I don't have to spell everything out in the setup(install_requires=[]) list, rather than writing a file reader for my requirements.txt? If not, do people have recommendations or suggestions on auto-deployment and with pip?
I'm reviewing from here and here.
Not a big deal, but is there a setting in either Fabric or Distribute
that reads the requirements.txt file directly for the setup.py, so I
don't have to spell everything out in the setup(install_requires=[])
list, rather than writing a file reader for my requirements.txt?
You might still want to check out frb's answer to the duplicate question How can I reference requirements.txt for the install_requires kwarg in setuptools.setup?, which provides a straightforward two-line solution for writing a file reader.
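A sketch of that file-reader approach (assuming requirements.txt sits next to setup.py and contains only plain requirement lines, not editable or VCS entries):
from setuptools import setup

# read requirements.txt, skipping blank lines and comments
with open('requirements.txt') as f:
    requirements = [line.strip() for line in f
                    if line.strip() and not line.startswith('#')]

setup(
    name='yourapplication',      # placeholder, matching the fabfile example below
    version='0.1',
    install_requires=requirements,
)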
If you really want to avoid this, you could alternatively add the common pip install -r requirements.txt to your fabfile.py, e.g.:
# ...
# create a place where we can unzip the tarball, then enter
# that directory and unzip it
run('mkdir /tmp/yourapplication')
with cd('/tmp/yourapplication'):
    run('tar xzf /tmp/yourapplication.tar.gz')
    # now install the requirements with our virtual environment's
    # pip installer
    run('/var/www/yourapplication/env/bin/pip install -r requirements.txt')
    # now setup the package with our virtual environment's
    # python interpreter
    run('/var/www/yourapplication/env/bin/python setup.py install')