When developing a Lambda function with the SAM CLI, I often need to run sam build, which creates a new image from my Dockerfile.
This process includes fetching all the dependencies:
RUN python3.8 -m pip install -r requirements.txt -t .
This takes a lot of time. I'm looking for a way to avoid this step, but I couldn't find any information about it.
I came up with this solution: run sam build once, then use the generated image as the base image for the next run (and remove the pip install command).
It works, but I was wondering whether this is good practice or whether it can cause problems later.
Are there any other solutions?
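For reference, one variation I am considering is reordering the Dockerfile so that Docker's layer cache covers the pip install, roughly like this sketch (the base image tag, handler, and paths are placeholders, not my actual setup):
# Sketch only: base image tag, handler, and file names are placeholders
FROM public.ecr.aws/lambda/python:3.8

# Copy only requirements.txt first so this layer, and the pip install below,
# can be reused from the Docker build cache when only application code changes
COPY requirements.txt ./
RUN python3.8 -m pip install -r requirements.txt -t .

# Copy the rest of the application code last
COPY app.py ./
CMD ["app.lambda_handler"]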
I am preparing a streamlit app for object detection that can do both training and inference.
I want to prepare a single requirements.txt file that works both when the app is run locally and when it is deployed to Streamlit Cloud.
On Streamlit Cloud I obviously won't be doing any training, since that requires a GPU; in that case the user should clone the GitHub repo and run the app locally.
Now I come to the dependencies and requirements.txt. If I am running locally, I want to include, say, opencv-python, but if I am running on Streamlit Cloud, I want to include opencv-python-headless instead. PS: I have a couple of cases like this, e.g. pytorch+cuXX for local (GPU enabled) and pytorch for Streamlit Cloud (CPU only).
My question is how to reflect this in the requirements.txt file. I came across pip environment markers for conditional installation of pip packages, but the available environment markers cannot tell if the app is local or deployed. I wonder if there is a solution to this.
I came across this response, which could help a lot. As far as I understand it, I can make a setup.py and use subprocess.run to pip install the package; in that setup.py I would first do the work of determining the correct variant to install.
Given this (if it is a correct approach), I would then call this setup.py from requirements.txt, where each line effectively represents something like a pip install <package_name>. I don't yet know how to do such a thing. If the overall approach is not suitable for my case, I would appreciate your help.
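To make the idea concrete, the kind of helper script I have in mind looks roughly like this (the APP_ENV check is only a placeholder, since as far as I know there is no official way to detect whether the app is running on Streamlit Cloud):
# install_deps.py - hypothetical helper, not an official Streamlit or pip mechanism
import os
import subprocess
import sys

# Placeholder check: an environment variable I would set myself for local runs
IS_LOCAL = os.environ.get("APP_ENV", "cloud") == "local"

if IS_LOCAL:
    packages = ["opencv-python"]           # full build for local training/inference
else:
    packages = ["opencv-python-headless"]  # headless build for Streamlit Cloud

# The pytorch+cuXX vs. CPU-only case would follow the same pattern
subprocess.run([sys.executable, "-m", "pip", "install", *packages], check=True)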
I am running an Apache Beam pipeline on Dataflow with 3 PyPI packages defined in a requirements.txt file. When I run my pipeline with the option "--requirements_file=requirements.txt", it submits the command below to download the PyPI packages.
python -m pip download --dest /tmp/requirements-cache -r requirements.txt --exists-action i --no-binary :all:
This command takes a huge amount of time to download the packages. I tried running it manually as well; it runs forever.
Why is Apache Beam using the --no-binary :all: option? This is the root cause of the long duration. Am I making a mistake, or is there another way to decrease the pip download time?
This is because the packages need to be installed on the workers, and it doesn't want to download binaries specific to whatever machine you happen to be launching the pipeline from.
If you have a lot of dependencies, the best solution is to use custom containers.
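A minimal sketch of such a custom container, assuming the Python 3.8 SDK (pin the tag to the Beam version your pipeline actually uses):
FROM apache/beam_python3.8_sdk:2.40.0

# Bake the PyPI dependencies into the worker image so Dataflow does not
# have to download and build them from source at worker startup
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
Push the image to a registry your project can pull from and pass it to the pipeline with --sdk_container_image (older SDKs call this option --worker_harness_container_image); you can then drop --requirements_file entirely.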
I have recently built the h2o4gpu docker image using the Dockerfile-runtime, and managed to run it and log into the Jupyter notebooks.
However, when trying to run
import h2o4gpu
I get an error that there is no h2o4gpu module. Afterwards, I tried installing it by adding the commands below to the Dockerfile.
pip install --extra-index-url https://pypi.anaconda.org/gpuopenanalytics/simple h2o4gpu
pip install h2o4gpu-0.2.0-cp36-cp36m-linux_x86_64.whl
This also failed, so I was wondering if there are other changes I should make, or if I should write the Dockerfile from scratch.
Thank you
To build the project, you can follow this recipe:
git clone https://github.com/h2oai/h2o4gpu.git
cd h2o4gpu
make centos7_cuda9_in_docker
This will work on either an x86_64 or ppc64le host with a modern docker installed.
The python .whl file artifact is written to the dist directory.
Even if the build process is significantly refactored, this style of build API is very likely to remain.
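If you then want the module available inside a runtime image, one option is to copy the wheel from dist and install it in your Dockerfile; a sketch (the wheel filename is a placeholder for whatever the build actually produces):
# Sketch: install the freshly built wheel in the runtime image
COPY dist/h2o4gpu-<version>-cp36-cp36m-linux_x86_64.whl /tmp/
RUN pip install /tmp/h2o4gpu-<version>-cp36-cp36m-linux_x86_64.whl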
I have been learning Docker using the Docker docs, and first encountered a problem in the quickstart guide for Django. The project builds normally up until the second-to-last command. Here is my Dockerfile and the output:
FROM python:3.5.2
ENV PYTHONUNBUFFERED 1
WORKDIR /code
ADD requirements.txt /code
RUN pip3 install -r requirements.txt
ADD . /code
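For context, my docker-compose.yml is essentially the one from the quickstart; roughly (reconstructed here, so details may differ from my exact file):
version: "2"
services:
  db:
    image: postgres
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    depends_on:
      - db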
Then when I run:
docker-compose run web django-admin startproject src .
I get the whole thing built and then it hangs:
Installing collected packages: Django, psycopg2
Running setup.py install for psycopg2: started
Running setup.py install for psycopg2: finished with status 'done'
Successfully installed Django-1.10.5 psycopg2-2.6.2
Since I don't have experience with Compose, I tried the most basic docker build that included a Dockerfile. This one also got hung up.
FROM docker/whalesay:latest
RUN apt-get -y update && apt-get install -y fortunes
CMD /usr/games/fortune -a | cowsay
And this is the last terminal line before the hang.
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
According to the tutorial, this output occurs at the same spot: the second-to-last command, which also happens to be a RUN.
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
---> dfaf993d4a2e
Removing intermediate container 05d4eda04526
[END RUN COMMAND]
So that's why I think it's the RUN command, but I'm not sure why or how to fix it. Does anyone know why this might be occurring? I am using a MacBook Pro 8.1 running OS X El Capitan 10.11.6 and Docker version 1.13.1, build 092cba3.
Using the Serverless Framework v1.0.0, I have a requirements.txt in my service root containing the list of dependent Python packages (e.g. requests).
However, my deployed function fails, as it seems these dependencies are not installed as part of the packaging:
'Unable to import module 'handler': No module named requests'
I assume it is Serverless that does the pip install, but my resulting zip file is small, so clearly it isn't doing it, either by design or because I am missing something. Or is it Lambda that does this? If so, what am I missing?
Is there documentation on what is required to do this and how it works? Is it Serverless that pip installs these packages, or does it happen on the AWS Lambda side?
You need to install serverless-python-requirements and Docker:
$ npm install serverless-python-requirements
Then add the following to your serverless.yml:
plugins:
  - serverless-python-requirements
custom:
  pythonRequirements:
    dockerizePip: non-linux
Make sure you have your Python virtual environment active in the CLI:
$ source venv/bin/activate
Install any dependencies with pip. Note that in the CLI you can tell whether the venv is active by the (venv) prefix to the left of the terminal prompt:
(venv) $ pip install <NAME>
(venv) $ pip freeze > requirements.txt
Make sure you have Docker running, then deploy with Serverless as normal:
$ serverless deploy
What will happen is that serverless-python-requirements will build your Python packages in Docker using a Lambda-like environment, and then zip them up ready to be uploaded with the rest of your code.
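Putting it together, a minimal serverless.yml might look roughly like this (service name, runtime, and handler are placeholders for your own values):
service: my-python-service

provider:
  name: aws
  runtime: python3.8

functions:
  hello:
    handler: handler.hello

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: non-linux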
Full guide here
Now you can use serverless-python-requirements. It works both for pure Python and libraries needing native compilation (using Docker):
A Serverless v1.x plugin to automatically bundle dependencies from requirements.txt and make them available in your PYTHONPATH.
Requires Serverless >= v1.12
The Serverless Framework doesn't handle the pip install. See https://stackoverflow.com/a/39791686/1111215 for the solution