pip's requirements.txt best practice - pip

I am trying to generate requirements.txt for someone to replicate my environment. As you may know, the standard way is
pip freeze > requirements.txt
I noticed that this lists all packages, including the dependencies of the packages I installed, which makes the list unnecessarily large. I then browsed around and came across pip-chill, which lists only the packages I installed directly, omitting their dependencies.
Now, from my understanding, when someone replicates the environment with pip install -r requirements.txt, pip will automatically install the dependencies of the listed packages.
If this is true, it should be safe to use pip-chill instead of pip freeze to generate the requirements.txt. My question is: is there any other risk in omitting the dependencies from requirements.txt with pip-chill that I am missing here?
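For context, both tools emit name==version lines that pip install -r understands; they differ only in how much they list. Roughly:
$ pip freeze > requirements.txt      # every installed package, including transitive dependencies
$ pip-chill > requirements.txt       # only packages that no other installed package requires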

I believe using pip-compile from pip-tools is good practice when constructing your requirements.txt. It helps make your builds predictable and deterministic.
The pip-compile command lets you compile a requirements.txt file from your dependencies, specified in either setup.py or requirements.in.
Here are my recommended steps for constructing your requirements.txt (if using requirements.in):
Create a virtual env and install pip-tools there
$ source /path/to/venv/bin/activate
(venv)$ python -m pip install pip-tools
Specify your application/project's direct dependencies in your requirements.in file:
# requirements.in
requests
boto3==1.16.51
Use pip-compile to generate requirements.txt
$ pip-compile --output-file=- > requirements.txt
Your requirements.txt file will contain:
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file=-
#
boto3==1.16.51
    # via -r requirements.in
botocore==1.19.51
    # via
    #   boto3
    #   s3transfer
certifi==2020.12.5
    # via requests
chardet==4.0.0
    # via requests
idna==2.10
    # via requests
jmespath==0.10.0
    # via
    #   boto3
    #   botocore
python-dateutil==2.8.1
    # via botocore
requests==2.25.1
    # via -r requirements.in
s3transfer==0.3.3
    # via boto3
six==1.15.0
    # via python-dateutil
urllib3==1.26.2
    # via
    #   botocore
    #   requests
Your application should always work with the dependencies installed from this generated requirements.txt. If you have to update a dependency, you just need to update the requirements.in file and re-run pip-compile. I believe this is a much better approach than doing pip freeze > requirements.txt, which I see some people do.
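As a sketch of that update loop (assuming a recent pip-tools; these flags may vary between versions), re-using the same output redirection as above:
$ pip-compile --output-file=- > requirements.txt                          # re-resolve after editing requirements.in
$ pip-compile --upgrade-package boto3 --output-file=- > requirements.txt  # bump only one pinned package
$ pip-compile --upgrade --output-file=- > requirements.txt                # refresh all pins to the latest allowed versions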
The main advantage of this approach is that you keep track of your project's actual direct dependencies in a separate requirements.in file.
I find this very similar to how dependencies are managed in a Node project, with package.json (requirements.in) and package-lock.json (requirements.txt).

From my point of view, requirements.txt files should list all dependencies: direct dependencies as well as their own (indirect, transitive) dependencies. If for some reason only direct dependencies are wanted, there are tools that can help with that. From a cursory look, pip-chill seems inadequate, since it doesn't actually look at the code to figure out which packages are directly imported. Better to look at projects such as pipreqs or pigar, which seem more accurate at figuring out the actual direct dependencies (based on the imports in your code).
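For illustration, pipreqs works by scanning the import statements in a source tree and writing the matching distributions to a requirements file; a typical invocation (assuming it is installed) looks like:
$ pip install pipreqs
$ pipreqs /path/to/project      # writes /path/to/project/requirements.txt based on detected imports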
But at the end of the day you should curate such lists by hand. When writing the code you choose carefully which packages to import; with the same care you should curate the list of projects (and their versions) that provide those packages. Tools can help, but the developer knows best.

Related

Mark a pip dependency as explicitly installed

I want to differentiate between packages that I have explicitly installed and packages pulled in as dependencies. You can do that with the --not-required option:
pip3 list --not-required --format freeze
However, if I have a package that requires, for example, the requests package, then requests is pulled in automatically when installing via requirements.txt. Installing requests explicitly with pip install requests does not put it in the list of --not-required packages, and neither does adding it to the requirements.txt file.
It seems that pip will always exclude those sub-dependencies and only print the packages that no other package depends on. Is that true? How could I work around that without adding extra package-management dependencies? It seems there is no such clever built-in option, right?

How to document requirements

When I program, I often experiment with third party packages.
Of course, they are installed via pip.
When installed, packages install their dependencies.
At the end I want to have a clean requirements.txt that reflects which packages are really necessary for the project (pip freeze > requirements.txt).
If I find a better alternative to a package, I uninstall the old one via pip.
The problem is that uninstalling a package doesn't uninstall its dependencies.
If I'm not mistaken, it doesn't. And when I then run pip freeze, I can no longer tell what is going on.
Therefore I decided to document which package installed which other packages.
When I uninstall a package, I uninstall its dependencies manually.
But this is really troublesome and error prone:
pip freeze > requirements.txt
asgiref==3.2.3
Django==3.0.2
pkg-resources==0.0.0
pytz==2019.3
sqlparse==0.3.0
Edit requirements.txt:
asgiref==3.2.3 # Django dependency
Django==3.0.2 # Django
pkg-resources==0.0.0 # Default package
pytz==2019.3 # Django dependency
sqlparse==0.3.0 # Django dependency
When I install a new package, I run pip freeze > requirements1.txt, then compare the two files to reveal the newly installed package's dependencies, mark them, and copy over the old comments.
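The comparison step described here might look like this (an illustrative sketch with a hypothetical snapshot file name):
$ pip freeze > requirements1.txt
$ diff requirements.txt requirements1.txt    # added lines are the new package plus its dependencies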
In the end it can turn into a complete mess: I run pip freeze and realize I no longer know which package depends on which, because I forgot to add my comments somewhere along the way.
Again, this is my goal: when I finish my project in a year, I'd like to know exactly this:
Every installed package is absolutely necessary for the project to run.
No unnecessary packages are installed.
Could you tell me what the best practice is to achieve that?
It seems pip-tools is the exact tool you need to solve the problems you encountered in your development workflow.
pip-tools enables you to specify your high-level dependencies (e.g. Django), while taking care automatically of your low-level dependencies (e.g. pytz). It also enables you to sync your environment to a requirements.txt file, regardless of the messy state you might have.
Installing pip-tools in your project
To use pip-tools, first create a new empty Python virtual environment in your project and activate it, using the tool of your choice:
with venv (built-in): python3 -m venv .venv && source .venv/bin/activate
with pyenv: pyenv virtualenv 3.8.1 project-env && pyenv local project-env
Then, install pip-tools in the Python virtual environment of your project:
pip install pip-tools
Managing dependencies with pip-compile using a requirements.in file
pip-tools includes a pip-compile tool that takes a requirements.in file as input. This requirements.in file is similar to requirements.txt, but it contains only high-level dependencies. For example, if your project is using the latest version of Django, you could write something like this in requirements.in:
django>=3.0,<3.1
You can see that requirements.in does not contain the dependencies of Django, only Django itself.
Once you have set your dependencies in requirements.in, use pip-compile to "compile" your requirements.in into a requirements.txt file:
➜ pip-compile requirements.in
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile requirements.in
#
asgiref==3.2.3 # via django
django==3.0.2
pytz==2019.3 # via django
sqlparse==0.3.0 # via django
The requirements.txt file generated by pip-compile indicates the source of every indirect dependency next to the package name. For example, pytz==2019.3 is a dependency of django. In addition, it pins each dependency to a precise version number, to make sure that the installation of your dependencies is reproducible.
Applying dependencies from generated requirements.txt with pip-sync
Now that you have a requirements.txt, you can apply it on your Python virtual environment using pip-sync:
➜ pip-sync requirements.txt
Collecting asgiref==3.2.3 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 1))
Using cached https://files.pythonhosted.org/packages/a5/cb/5a235b605a9753ebcb2730c75e610fb51c8cab3f01230080a8229fa36adb/asgiref-3.2.3-py2.py3-none-any.whl
Collecting django==3.0.2 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 2))
Using cached https://files.pythonhosted.org/packages/55/d1/8ade70e65fa157e1903fe4078305ca53b6819ab212d9fbbe5755afc8ea2e/Django-3.0.2-py3-none-any.whl
Collecting pytz==2019.3 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 3))
Using cached https://files.pythonhosted.org/packages/e7/f9/f0b53f88060247251bf481fa6ea62cd0d25bf1b11a87888e53ce5b7c8ad2/pytz-2019.3-py2.py3-none-any.whl
Collecting sqlparse==0.3.0 (from -r /var/folders/r1/n_n031s51wz2gjwy7mb9k4rh0000gn/T/tmpvhv549si (line 4))
Using cached https://files.pythonhosted.org/packages/ef/53/900f7d2a54557c6a37886585a91336520e5539e3ae2423ff1102daf4f3a7/sqlparse-0.3.0-py2.py3-none-any.whl
Installing collected packages: asgiref, pytz, sqlparse, django
Successfully installed asgiref-3.2.3 django-3.0.2 pytz-2019.3 sqlparse-0.3.0
pip-sync will make sure your virtual environment corresponds exactly to what is defined in the requirements.txt file by uninstalling and installing the relevant packages.
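From then on, the update loop is simply: edit requirements.in, re-compile, re-sync. A sketch (assuming the same virtual environment is active):
➜ pip-compile requirements.in    # after adding or removing a line in requirements.in
➜ pip-sync requirements.txt      # installs what is missing, uninstalls what is no longer listed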
After your environment has been synced, make sure your project code works properly. If it does, commit requirements.in and requirements.txt into version control.
Reference
For more details, you can refer to the docs of pip-tools, or to the article of Hynek Schlawack: Python Application Dependency Management in 2018.
Dependency management is a bit rougher in Python land than in some other ecosystems. There are tools like poetry and pipenv, but for just poking around I would try the following.
create a virtual environment in your project directory
$ python3 -m venv .venv
Activate your virtual environment
$ source .venv/bin/activate
Install whatever packages you're playing with
$ echo "package1" >> requirements-dev.txt
$ echo "package2" >> requirements-dev.txt
$ pip install --upgrade -r requirements-dev.txt
When you think you have everything you need, deactivate your virtual env, create a fresh one, activate it, make sure everything still works, then create your new requirements.txt:
$ deactivate
$ rm -rf .venv
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install --upgrade -r requirements-dev.txt
# test your code still works
# create a new requirements file to be used for "prod"
$ pip freeze > requirements.txt
I wouldn't call this best practice, but it's a start and should get you by until you decide which side to join when it comes to dependency management tools :)

How to pip install interdependent packages from a local directory in editable mode using a requirements file

I'm having issues with pip failing to install editable packages from a local directory. I was able to install the packages manually using commands like pip install -e pkg1. I wanted to use a requirements.txt file to automate future installs, because my coworkers will be working on the same packages. My ideal development workflow is for each developer to checkout the source from version control and run pip install -r requirements.txt. The requirements file would designate all the packages as editable so we can import our code without the need for .pth files but we wouldn't have to keep updating our environments. And by using namespace packages, we can decouple the import semantics from the file structures.
But it's not working out.
I have a directory with packages like so:
index/
    pkg1/
        src/
            pkg1/
                __init__.py
                pkg1.py
        setup.py
    pkg2/
        src/
    ...etc.
Each setup.py file contains something like:
from setuptools import setup, find_packages

setup(
    name="pkg1",
    version="0.1",
    packages=find_packages('src'),
    package_dir={'': 'src'},
)
I generated my requirements.txt file using pip freeze, which yielded something like this:
# Editable install with no version control (pkg1==0.1)
-e c:\source\pkg1
# Editable install with no version control (pkg2==0.1)
-e c:\source\pkg2
...etc...
I was surprised when pip choked on the requirements file that it created for itself:
(venv) C:\Source>pip install -r requirements.txt
c:sourcepkg1 should either be a path to a local project or a VCS url beginning with svn+, git+, hg+, or bzr+
Also, some of our packages rely on other of our packages and pip has been absolutely useless at identifying these dependencies. I have resorted to manually installing packages in dependency order.
Maybe I'm pushing pip to its limits here. The documentation and online help have not been helpful so far. Most sources discuss editable installation, installation from requirements files, package dependencies, or namespace packages, but never all of these concepts at once. Usually when online help is scarce, it means I'm trying to use a tool for something it wasn't intended to do, or I've discovered a bug.
Is this development process viable? Do I need to make a private package index or something?

import local package over global package

I'm working on a support library for a large Python project which heavily uses relative imports by appending various project directories to sys.path.
Using The Hitchhiker's Guide to Packaging as a template I attempted to create a package structure which will allow me to do a local install, but can easily be changed to a global install later if desired.
One of the dependencies of my package is the pyasn1 package for the encoding and decoding of ASN.1 annotated objects. I have to include the pyasn1 library separately as the version supported by the CentOS 6.3 default repositories is one major version back and has known bugs that will break my custom package.
The top-level of the library structure is as follows:
MyLibrary/
    setup.py
    setup.cfg
    LICENSE.txt
    README.txt
    MyCustomPackage/
    pyasn1-0.1.6/
In my setup configuration file I define the install directory for my library to be a local directory called .lib. This is desirable as it allows me to do absolute imports by running the command import site; site.addsitedir("MyLibrary/.lib") in the project's main application without requiring our engineers to pass command line arguments to the setup script.
setup.cfg
[install]
install-lib=.lib
setup.py
setup(
    name='MyLibrary',
    version='0.1a',
    package_dir={'pyasn1': 'pyasn1-0.1.6/pyasn1'},
    packages=[
        'MyCustomPackage',
        'pyasn1',
        'pyasn1.codec',
        'pyasn1.compat',
        'pyasn1.codec.ber',
        'pyasn1.codec.cer',
        'pyasn1.codec.der',
        'pyasn1.type',
    ],
    license='',
    long_description=open('README.txt').read(),
    data_files=[],
)
The problem I've run into with doing the installation this way is that when my package tries to import pyasn1 it imports the global version and ignores the locally installed version.
As a possible workaround I have tried installing the pyasn1 package under a different name than the global package (e.g. pyasn1_0_1_6) by doing package_dir = {'pyasn1_0_1_6': 'pyasn1-0.1.6/pyasn1'}. However, this fails because the imports used internally by the pyasn1 package do not use the pyasn1_0_1_6 name.
Is there some way to either a) force Python to import a locally installed package over a globally installed one or b) force a package to install under a different name?
Use virtualenv to ensure that your application runs in a fully known configuration which is independent from the OS version of libraries.
EDIT: a quick (unix) solution is setting the PYTHONPATH environment variable, which works just like PATH but for Python modules (a module is loaded from the first path in which it is found, so simply prepend your directory to PYTHONPATH). Anyway, I strongly recommend you proceed with virtualenv, since it was specifically engineered to handle situations like the one you are facing.
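As a sketch of that quick fix (the path is a placeholder for wherever the .lib directory from the question actually lives):
$ export PYTHONPATH="/path/to/MyLibrary/.lib:$PYTHONPATH"
$ python -c "import pyasn1; print(pyasn1.__file__)"    # should now resolve to the local copy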
Rationale
The process is easily automatable if you write a setuptools script specifying dependencies with install_requires. For a complete example, refer to this one I wrote.
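A minimal sketch of such a script, using the package names from the question (this is not the linked example, just an illustration of install_requires):
from setuptools import setup, find_packages

setup(
    name='MyLibrary',
    version='0.1a',
    packages=find_packages(),
    # direct dependencies; pip downloads and installs these automatically
    install_requires=[
        'pyasn1>=0.1.6',
    ],
)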
Setup
Note that you can easily insert the steps below in a setup.sh shell script.
First create a virtualenv and enter it:
$ virtualenv $name
$ cd $name
Activate it:
$ source bin/activate
Now cd to your project directory and run the installer script:
$ cd $my_project_dir
$ python ./setup.py install --prefix $path_to_virtualenv
Note the --prefix $path_to_virtualenv, which tells the script to install into the virtualenv instead of system-wide. Call this after activating the virtualenv. Note that all the dependencies are automatically downloaded and installed into the virtualenv.
Then you are done. When you want to leave the virtualenv, issue:
$ deactivate
On subsequent calls, you will only need to activate the virtualenv (step 2), maybe using a runawesomeproject.sh if you really want.
As noted on the virtualenv website, you should use virtualenv >= 1.9, as the previous versions did not download dependencies via HTTPS. If you consider plain HTTP to be sufficient, then any version should do.
You might also try relocatable virtualenvs: set one up and copy the folder to your host. Anyway, note that this feature is still experimental.

Automatically read requirements.txt in fabric or deploy

I have a flask app where I'm trying to automate deployment to EC2.
Not a big deal, but is there a setting in either Fabric or Distribute that reads the requirements.txt file directly for the setup.py, so I don't have to spell everything out in the setup(install_requires=[]) list, rather than writing a file reader for my requirements.txt? If not, do people have recommendations or suggestions on auto-deployment and with pip?
I'm reviewing from here and here.
Not a big deal, but is there a setting in either Fabric or Distribute
that reads the requirements.txt file directly for the setup.py, so I
don't have to spell everything out in the setup(install_requires=[])
list, rather than writing a file reader for my requirements.txt?
You might still want to check out frb's answer to the duplicate question How can I reference requirements.txt for the install_requires kwarg in setuptools.setup?, which provides a straightforward two-line solution for writing a file reader.
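That file-reader approach is roughly the following (a sketch, not frb's exact wording; the project name is hypothetical):
from setuptools import setup

# reuse the pinned requirements as install_requires
with open('requirements.txt') as f:
    requirements = f.read().splitlines()

setup(
    name='yourapplication',
    install_requires=requirements,
)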
If you really want to avoid this, you could alternatively add the usual pip install -r requirements.txt to your fabfile.py, e.g.:
# ...
# create a place where we can unzip the tarball, then enter
# that directory and unzip it
run('mkdir /tmp/yourapplication')
with cd('/tmp/yourapplication'):
    run('tar xzf /tmp/yourapplication.tar.gz')
    # now install the requirements with our virtual environment's
    # pip installer
    run('/var/www/yourapplication/env/scripts/pip install -r requirements.txt')
    # now setup the package with our virtual environment's
    # python interpreter
    run('/var/www/yourapplication/env/bin/python setup.py install')
