(Skip to the TL;DR if you’re in a hurry)
I currently work for AWS (for neither the Lambda team nor the CodeBuild team). This is a personal project on my personal website, and the motivation for this was completely unrelated to work.
Vicki Boykis wrote an awesome post about someone struggling with Python Lambda function packaging. They ended up rewriting it in Java 😥. So I thought I’d write up the simplest workflow I can think of, and tried to keep it short by just focussing on the dependency downloading step, which is where real-world packaging can differ from the documentation.
Feedback is welcome, especially if you’re a Python novice - this is attempt 2, and hopefully clearer now.
Try the simplest thing first
So why do the docs say this? Well, for many cases it is that easy. To explain why and when it breaks, we need to know a bit about Python packages and packaging.
Often, a Python package is just Python code. This works fine. But the most commonly used Python runtime, CPython, allows C extensions. This is where things go wrong. C needs to be compiled, and that depends on the target system.
Lambda functions run in a Linux environment. So if you are packaging Python Lambdas on a non-compatible system (e.g. macOS or Windows) and one or more of the direct or indirect dependencies includes a C extension, it breaks. There are quite a few packages that use C extensions, such as lxml, cryptography, pillow, numpy, psycopg2, pycrypto, and more.
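To make the platform dependence concrete: a wheel carries a compatibility tag of the form python-abi-platform (this comes from the wheel format, not from anything Lambda-specific). Here is a minimal sketch that splits such a tag and reports whether it is tied to an operating system. The helper names (`parse_wheel_tag`, `is_platform_specific`, `runs_on_linux`) are made up for illustration, and the Linux check is a simple prefix heuristic, not what pip itself does.

```python
from typing import NamedTuple


class WheelTag(NamedTuple):
    python: str    # e.g. "cp36" (CPython 3.6) or "py3" (any Python 3)
    abi: str       # e.g. "cp36m", or "none" for pure-Python wheels
    platform: str  # e.g. "manylinux1_x86_64", or "any"


def parse_wheel_tag(tag: str) -> WheelTag:
    """Split a single 'python-abi-platform' tag into its three parts."""
    python, abi, platform = tag.strip().split("-")
    return WheelTag(python, abi, platform)


def is_platform_specific(tag: str) -> bool:
    """Pure-Python wheels use the platform 'any'; compiled ones do not."""
    return parse_wheel_tag(tag).platform != "any"


def runs_on_linux(tag: str) -> bool:
    """Heuristic: accept pure-Python wheels and linux/manylinux builds."""
    platform = parse_wheel_tag(tag).platform
    return platform == "any" or platform.startswith(("linux", "manylinux"))
```

For example, `runs_on_linux("cp36-cp36m-macosx_10_9_x86_64")` is false, which is exactly the situation that breaks a Lambda packaged on a Mac.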
To be completely fair, this is not a Python issue, it’s a deployment issue.
And then the next simplest thing
Why bother creating a deployment package at all? Can’t the Lambda just run
It’s an interesting thought. But pip is meant for humans on a CLI, and is very slow, especially without a cache. The start-up time would be awful. And Lambdas are a limited environment to run functions. While you can run pip, don’t expect to find a compiler or the headers necessary for building C extensions [1]. (This is also a deal breaker for a packaging Lambda/“packaging-as-a-service”, which I also tried.)
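If you want to see the missing compiler for yourself, here is a rough sketch of a probe you could deploy. Everything here is hypothetical: the `find_compiler` helper, the list of compiler names, and the `handler` entry point are my own choices for illustration. It only looks for an executable on the PATH and does not check for headers.

```python
import shutil


def find_compiler(candidates=("gcc", "cc", "clang"), which=shutil.which):
    """Return the path of the first C compiler found on PATH, else None.

    `which` is injectable so the lookup can be exercised without a
    real compiler being installed.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def handler(event, context):
    # Hypothetical Lambda entry point: reports whether building
    # C extensions from source is even possible in this environment.
    return {"compiler": find_compiler()}
```

A result of `{"compiler": None}` would mean any package without a suitable pre-built wheel cannot be installed there.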
And the next…
While pre-compiled Python packages for Lambda are a nice idea for beginners, I rejected them because of some drawbacks. You no longer control your own dependencies. It’s hard to integrate them into a development workflow or continuous integration pipeline. Last but not least, if you need a package which isn’t provided, it still breaks.
And the next?
I wouldn’t call Docker simple. Despite the hype, it does solve packaging and deployment issues well by providing a system that’s independent from the underlying host system.
I’m assuming Python 3.6, although this method can also work with the legacy Python 2.7 version.
AWS has a service for building artefacts called AWS CodeBuild. It’s instructive to see what Docker images they provide. For example, aws/codebuild/python:3.6.5 seems to track the Python Lambda runtime. It uses Ubuntu 14.04 as the base! Very interesting: either the Lambda runtime is based on Ubuntu, or Ubuntu is close enough to the Lambda runtime for build artefacts.
Even though CodeBuild’s Dockerfiles are open source, they haven’t uploaded the images, so Docker can’t pull them:
    $ docker run --rm aws/codebuild/python:3.6.5 env
    Unable to find image 'aws/codebuild/python:3.6.5' locally
    docker: Error response from daemon: pull access denied for aws/codebuild/python, repository does not exist or may require 'docker login'.
You have to build them yourself (if you’re following along, hold off and read on):
    $ git clone https://github.com/aws/aws-codebuild-docker-images.git
    $ cd aws-codebuild-docker-images/
    $ docker build ubuntu/python/3.6.5/ \
        --tag aws/codebuild/python:3.6.5
    $ docker run --rm aws/codebuild/python:3.6.5 \
        python --version
    Python 3.6.5
For CodeBuild, this image makes perfect sense. And if your use-case is running complex build scripts as part of continuous integration, you may find it appropriate (feel free to skip the next section). I can’t recommend it for this project (long build time [2]).
As with CodeBuild, the point here is not to get you to use a specific tool. Instead, open source is great for learning from other software engineers. The SAM CLI can run Lambda functions locally - how do they do this? With the open source Lambda CI Docker images, “Docker images and test runners that replicate the live AWS Lambda environment”. Bingo! There are even build images for this exact purpose!
    $ docker run --rm lambci/lambda:build-python3.6 python --version
    Unable to find image 'lambci/lambda:build-python3.6' locally
    build-python3.6: Pulling from lambci/lambda
    [...]
    Python 3.6.1
Honestly, this is a great project, and super easy to use [3]. ❤️
(This should work for most cases, because the build image isn’t quite as locked down as the Lambda environment, but if it fails, try the CodeBuild image.)
Building inside a Docker container
First, let’s install the dependencies locally, to have a baseline for how it should look. I’m building on macOS, and you can see I get the macOS-specific package:
    $ echo "lxml" > requirements.txt
    $ mkdir build
    $ pip3 install -r requirements.txt -t build/
    Collecting lxml (from -r requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/e4/e4/75453295abd6dcd8f7b48c1eb092ce2c23c34ae08ca7acc8c42de35a5a78/lxml-4.2.5-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (8.7MB)
    Installing collected packages: lxml
    Successfully installed lxml-4.2.5
    $ grep 'Tag' build/lxml-*.dist-info/WHEEL
    Tag: cp36-cp36m-macosx_10_6_intel
    Tag: cp36-cp36m-macosx_10_9_intel
    Tag: cp36-cp36m-macosx_10_9_x86_64
    Tag: cp36-cp36m-macosx_10_10_intel
    Tag: cp36-cp36m-macosx_10_10_x86_64
    $ rm -rf build/
Now, inside the container. For simplicity, I’m going to mount the current directory inside the container. This doesn’t work for remote Docker hosts (there are workarounds, using S3, NFS shares, etc). Quickly check the mount works:
    $ docker run \
        -v "$PWD":/var/task \
        --rm \
        lambci/lambda:build-python3.6 \
        cat requirements.txt
    lxml
Good, let’s go. The only difference is running pip inside the container:
    $ echo "lxml" > requirements.txt
    $ mkdir build
    $ docker run \
        -v "$PWD":/var/task \
        --rm \
        lambci/lambda:build-python3.6 \
        pip install -r requirements.txt -t build/
    Collecting lxml (from -r requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/03/a4/9eea8035fc7c7670e5eab97f34ff2ef0ddd78a491bf96df5accedb0e63f5/lxml-4.2.5-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
    Installing collected packages: lxml
    Successfully installed lxml-4.2.5
    $ grep 'Tag' build/lxml-*.dist-info/WHEEL
    Tag: cp36-cp36m-manylinux1_x86_64
Excellent. This time, we get the manylinux-flavoured package, which is exactly right.
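The grep check is fine for one package, but with a longer requirements.txt you may want to scan everything in build/ at once. Here is a minimal sketch that reads the Tag lines from every *.dist-info/WHEEL file and lists distributions with no Linux or pure-Python tag. The function names are invented for this example, and the "contains linux" test is a rough heuristic of mine, not an official compatibility check.

```python
from pathlib import Path


def wheel_tags(build_dir):
    """Map each installed distribution to the tags in its WHEEL file."""
    tags = {}
    for wheel_file in sorted(Path(build_dir).glob("*.dist-info/WHEEL")):
        # Directory is named "<name>-<version>.dist-info".
        name = wheel_file.parent.name[: -len(".dist-info")]
        tags[name] = [
            line.split(":", 1)[1].strip()
            for line in wheel_file.read_text().splitlines()
            if line.startswith("Tag:")
        ]
    return tags


def non_linux_packages(build_dir):
    """Return distributions with no pure-Python ('any') or Linux tag."""
    suspicious = []
    for name, tags in wheel_tags(build_dir).items():
        # The platform is the last dash-separated field of each tag.
        platforms = [tag.rsplit("-", 1)[-1] for tag in tags]
        if not any(p == "any" or "linux" in p for p in platforms):
            suspicious.append(name)
    return suspicious
```

Running `non_linux_packages("build")` after the macOS install above would flag lxml. After the Docker install it should come back empty.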
Using the CodeBuild image is similar, except by default the container runs in the root folder:
    $ docker run \
        -v "$PWD":/var/task \
        --rm \
        aws/codebuild/python:3.6.5 \
        pip install -r /var/task/requirements.txt -t /var/task/build/
    Collecting lxml (from -r /var/task/requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/03/a4/9eea8035fc7c7670e5eab97f34ff2ef0ddd78a491bf96df5accedb0e63f5/lxml-4.2.5-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
    Installing collected packages: lxml
    Successfully installed lxml-4.2.5
    $ grep 'Tag' build/lxml-*.dist-info/WHEEL
    Tag: cp36-cp36m-manylinux1_x86_64
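The two invocations differ only in the image name and in whether the paths are relative to /var/task. If you’d rather drive this from a Python build script, here is a sketch that builds the same argument list with absolute container paths, so it should not matter which working directory the image starts in. The function names and the `CONTAINER_DIR` constant are my own, and I’ve only assumed, not verified, that every image you might pass has pip on its PATH.

```python
import subprocess
from pathlib import Path

CONTAINER_DIR = "/var/task"  # mount point used in the commands above


def docker_pip_command(project_dir, image="lambci/lambda:build-python3.6"):
    """Build the `docker run ... pip install` argument list.

    Absolute container paths are used so the command does not depend
    on the image's default working directory.
    """
    host_dir = Path(project_dir).resolve()
    return [
        "docker", "run",
        "-v", f"{host_dir}:{CONTAINER_DIR}",
        "--rm",
        image,
        "pip", "install",
        "-r", f"{CONTAINER_DIR}/requirements.txt",
        "-t", f"{CONTAINER_DIR}/build/",
    ]


def install_dependencies(project_dir, image="lambci/lambda:build-python3.6"):
    """Run the build; raises CalledProcessError if docker or pip fails."""
    subprocess.run(docker_pip_command(project_dir, image), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems if the project path contains spaces.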
Docker containers provide a simple way to download OS-specific dependencies correctly for Python Lambda functions. You can then use these to package Python Lambda functions correctly on your local machine, or integrate this step into your continuous integration/continuous delivery pipeline to ensure you get the right dependencies.
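This post deliberately stops at the dependency download, but for completeness, here is a rough sketch of the packaging step that follows. It zips the contents of build/ together with a single handler module, keeping the dependencies at the root of the archive. The function name and the file names (`handler.py`, `function.zip`) are placeholders of mine. A real project may have more source files and will want to exclude things like __pycache__.

```python
import zipfile
from pathlib import Path


def make_deployment_zip(build_dir, handler_file, zip_path):
    """Zip the installed dependencies plus one handler module.

    Files are stored relative to build_dir, so packages end up at the
    root of the archive next to the handler.
    """
    build_dir = Path(build_dir)
    handler_file = Path(handler_file)
    with zipfile.ZipFile(str(zip_path), "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(build_dir.rglob("*")):
            if path.is_file():
                archive.write(str(path), path.relative_to(build_dir).as_posix())
        archive.write(str(handler_file), handler_file.name)
    return zip_path
```

A call might look like `make_deployment_zip("build", "handler.py", "function.zip")`. Keep the zip itself outside build/ so it doesn’t try to include itself.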
Do let me know if you found this helpful.
    $ echo "lxml" > requirements.txt
    $ mkdir build
    $ docker run \
        -v "$PWD":/var/task \
        --rm \
        lambci/lambda:build-python3.6 \
        pip install -r requirements.txt -t build/
[1] Pedantically speaking, Python also has binary packaging via wheels, which allows shipping pre-compiled C extensions for common platforms. This means it’s quicker to install, and you don’t necessarily need a compiler installed - a huge usability win. If all Python extensions provided Lambda-compatible wheels, conceivably running pip install could work. But it doesn’t; I’ve tried. I think that because the Lambda environment is so locked down, pip gets confused and can’t figure out the Linux variant, or whether it’s manylinux compatible, and so it falls back to building from source.
[2] Even on a beefy Core i7 MacBook Pro, it took just under nine minutes to build this (excluding pulling base images), which is kinda outrageous for local development. The CodeBuild Dockerfile is also large, and installs things like mono. So it’s a bad fit for local development, unless you have a Docker repository which you can pull this image from.
[3] Infrastructure work is rarely sexy, but all the “10x developers” I know are infrastructure/“devops” people, because they enable other teams to be several times more productive. Hi James!