Simple Python Lambda Packaging Guide

4 Nov '18

(Skip to the TL;DR if you’re in a hurry)

This is my personal blog, and this is a personal project. But you should know I currently work for AWS, although for neither the Lambda team nor the CodeBuild team. My specific role doesn’t involve much Lambda work, and most of my experience with Lambda comes from my job before joining AWS. In fact, the motivation for this was completely unrelated to work.

Vicki Boykis wrote an awesome post about someone struggling with Python Lambda function packaging. They ended up rewriting it in Java 😥. So I thought I’d write up the simplest workflow I can think of, and tried to keep it short by just focussing on the dependency downloading step, which is where real-world packaging can differ from the documentation. Please let me know if you could follow along, especially if you’re a Python novice.

Packaging

Packaging a Python Lambda function/ “serverless application” is easy. You just follow the documentation or even an example project, which boils down to pip install && zip. Right?

Haha, jk!

To be fair, in many cases it is that easy. Unfortunately, when it breaks, the reason isn’t always obvious. To explain why, we need to know a bit about Python packaging. Python does binary packaging via wheels, which allow shipping pre-compiled C extensions for common platforms. This means installs are quicker, and you don’t necessarily need a compiler installed - a huge usability win.

However, Lambda functions run in a Linux environment. So in the edge case where you are packaging Python Lambdas on a non-compatible system (usually macOS or Windows) and one or more direct or indirect dependencies include a C extension, it breaks. There are quite a few packages that use C extensions, such as lxml, cryptography, pillow, numpy, psycopg2, pycrypto, and more. To be completely fair, this is not a Python issue; it’s a Lambda deployment issue.
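
If you’re not sure whether a dependency ships a C extension, one rough check (the directory name here is arbitrary) is to ask pip to download the wheel it would install and look at the filename; a platform-specific tag is the giveaway (a pure-Python wheel ends in something like none-any.whl). At the time of writing, on macOS:

$ pip3 download lxml --no-deps -d /tmp/wheel-check
[...]
$ ls /tmp/wheel-check
lxml-4.2.5-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl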

Now that we know the issue, there is an easy approach which should meet your needs. (I rejected pre-compiled Python packages for Lambda because of several drawbacks1.) I’m assuming Python 3.6, although this method can also work with the legacy Python 2.7 runtime. It also uses Docker, because despite the hype, Docker is a great tool for OS-independent packaging and deployment. If this doesn’t appeal to you, I’m working on a follow-up.

AWS CodeBuild

AWS has a service for building artefacts called AWS CodeBuild. It’s instructive to see what Docker images they provide. For example, aws/codebuild/python:3.6.5 seems to track the Python Lambda runtime. It uses Ubuntu 14.04 as the base! Very interesting: either the Lambda runtime is based on Ubuntu, or Ubuntu is close enough to the Lambda runtime for the build artefacts to work.

Even though CodeBuild’s Dockerfiles are open source, AWS hasn’t uploaded the images, so Docker can’t pull them:

$ docker run --rm aws/codebuild/python:3.6.5 env
Unable to find image 'aws/codebuild/python:3.6.5' locally
docker: Error response from daemon: pull access denied for aws/codebuild/python, repository does not exist or may require 'docker login'.

You have to build them yourself (if you’re following along, hold off and read on):

$ git clone https://github.com/aws/aws-codebuild-docker-images.git
$ cd aws-codebuild-docker-images/
$ docker build ubuntu/python/3.6.5/ \
    --tag aws/codebuild/python:3.6.5
$ docker run --rm aws/codebuild/python:3.6.5 \
    python --version
Python 3.6.5
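
If you want to double-check the Ubuntu base mentioned above, peeking at /etc/os-release inside the freshly built image should confirm it (a quick sanity check, not something you need for packaging):

$ docker run --rm aws/codebuild/python:3.6.5 \
    grep -E '^(NAME|VERSION_ID)' /etc/os-release
NAME="Ubuntu"
VERSION_ID="14.04"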

For CodeBuild, this image makes perfect sense. And if your use-case is running complex build scripts as part of continuous integration, you may find it appropriate (feel free to skip the next section). I can’t recommend it for this project (long build time2).

Lambda CI

AWS SAM CLI “is a CLI tool for local development and testing of Serverless applications”. So far I love the local testing functionality, although I’ve used Serverless more.

As with CodeBuild, the point here is not to get you to use a specific tool, but to learn from other software engineers. The SAM CLI can run Lambda functions locally - how do they do this? With the open source Lambda CI Docker images, “Docker images and test runners that replicate the live AWS Lambda environment”. Bingo! There are even build images for this exact purpose!

$ docker run --rm lambci/lambda:build-python3.6 python --version
Unable to find image 'lambci/lambda:build-python3.6' locally
build-python3.6: Pulling from lambci/lambda
[...]
Python 3.6.1

Honestly, this is a great project, and super easy to use3. ❤️
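
For what it’s worth, the plain (non-build) images can also invoke a function directly, which makes for a quick smoke test. A minimal sketch, assuming a handler.py in the current directory with a lambda_handler function (both are placeholder names, and the JSON argument is the test event):

$ docker run --rm \
    -v "$PWD":/var/task \
    lambci/lambda:python3.6 \
    handler.lambda_handler '{"name": "world"}'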

Building inside a Docker container

First, let’s install the dependencies locally, to have a baseline for how it should look. I’m building on macOS, and you can see I get the macOS-specific package:

$ echo "lxml" > requirements.txt
$ mkdir build
$ pip3 install -r requirements.txt -t build/
Collecting lxml (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/e4/e4/75453295abd6dcd8f7b48c1eb092ce2c23c34ae08ca7acc8c42de35a5a78/lxml-4.2.5-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (8.7MB)
Installing collected packages: lxml
Successfully installed lxml-4.2.5
$ grep 'Tag' build/lxml-*.dist-info/WHEEL
Tag: cp36-cp36m-macosx_10_6_intel
Tag: cp36-cp36m-macosx_10_9_intel
Tag: cp36-cp36m-macosx_10_9_x86_64
Tag: cp36-cp36m-macosx_10_10_intel
Tag: cp36-cp36m-macosx_10_10_x86_64
$ rm -rf build/

Now, inside the container. For simplicity, I’m going to mount a directory inside the container. This doesn’t work with remote Docker hosts (there are workarounds, using S3, NFS shares, etc.). Quickly check that the mount works:

$ docker run \
    -v "$PWD":/var/task \
    --rm \
    lambci/lambda:build-python3.6 \
    cat requirements.txt
lxml

Good, let’s go. The only difference is running pip inside the container:

$ echo "lxml" > requirements.txt
$ mkdir build
$ docker run \
    -v "$PWD":/var/task \
    --rm \
    lambci/lambda:build-python3.6 \
    pip install -r requirements.txt -t build/
Collecting lxml (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/03/a4/9eea8035fc7c7670e5eab97f34ff2ef0ddd78a491bf96df5accedb0e63f5/lxml-4.2.5-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
Installing collected packages: lxml
Successfully installed lxml-4.2.5
$ grep 'Tag' build/lxml-*.dist-info/WHEEL
Tag: cp36-cp36m-manylinux1_x86_64

Excellent. This time, we get the manylinux flavoured package, which is exactly right.

Using the CodeBuild image is similar, except the container’s default working directory is the root folder, so we use absolute paths:

$ docker run \
    -v "$PWD":/var/task \
    --rm \
    aws/codebuild/python:3.6.5 \
    pip install -r /var/task/requirements.txt -t /var/task/build/
Collecting lxml (from -r /var/task/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/03/a4/9eea8035fc7c7670e5eab97f34ff2ef0ddd78a491bf96df5accedb0e63f5/lxml-4.2.5-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
Installing collected packages: lxml
Successfully installed lxml-4.2.5
$ grep 'Tag' build/lxml-*.dist-info/WHEEL
Tag: cp36-cp36m-manylinux1_x86_64

Conclusion

Docker containers provide a simple way to download OS-specific dependencies correctly for Python Lambda functions. You can then package Python Lambda functions correctly on your local machine, or integrate this step into your continuous integration/continuous delivery pipeline to ensure you get the right dependencies.
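
To finish the job, you’d then drop your handler code on top of the downloaded dependencies and zip the lot, per the documentation; a rough sketch, with handler.py and function.zip as placeholder names:

$ cp handler.py build/
$ cd build && zip -r ../function.zip . && cd ..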

At the same time, it’s a bit of a letdown for continuous integration if you’re a serverless shop: running Docker seems a bit contrary to the point of not having to worry about infrastructure. Well, as hinted at, I’m currently working on a Docker-free method, and will update this post soon.

Do let me know if you found this helpful.

TL;DR

$ echo "lxml" > requirements.txt
$ mkdir build
$ docker run \
    -v "$PWD":/var/task \
    --rm \
    lambci/lambda:build-python3.6 \
    pip install -r requirements.txt -t build/

  1. While pre-compiled packages are nice for beginners, they have some drawbacks. You no longer control your own dependencies. It’s hard to integrate into a development workflow or continuous integration. And if you need a package which isn’t provided, it still breaks. Better to learn the basics. 

  2. Even on a beefy Core i7 MacBook Pro, it took just under nine minutes to build this image (excluding pulling base images), which is kinda outrageous for local development. The CodeBuild Dockerfile is also large, and installs things like Mono. So it’s a bad fit for local development, unless you have a Docker registry you can pull this image from. 

  3. Infrastructure work is rarely sexy, but all the “10x developers” I know are infrastructure/ “devops” people, because they enable other teams to be several times more productive. Hi James! 

Python, AWS, Lambda, Serverless
