gzip compression for S3 uploads with boto3

2 Jun '17

boto3 doesn’t do compressed uploading, probably because S3 is pretty cheap, and in most cases it’s simply not worth the effort.

But for text files, the compression ratio can reach 10x or more (e.g. 50 MiB uncompressed, 5 MiB compressed). And if you allow downloads from S3 and you use gzip, browsers can decompress the file automatically on download. This is awesome if you have e.g. the sales team download a huge CSV file! (To get this to work, you’ll need to set the content encoding to gzip and the correct content type. Browsers care about those, boto3 doesn’t.)
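To get a feel for the ratio on your own data, a quick check with the standard library is enough. The rows below are synthetic and made up for illustration, so the number it prints says nothing about real files; repetitive text like CSV exports tends to compress well, but the actual ratio depends on the content.

```python
import gzip

# Synthetic, repetitive CSV data (invented for this example).
rows = ["order_id,region,product,amount"]
rows += [f"{i},EMEA,widget,{i % 100}.00" for i in range(100_000)]
raw = "\n".join(rows).encode("utf-8")

# Compress in memory and compare sizes.
compressed = gzip.compress(raw)
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.1f}x")
```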

Sadly, Python’s gzip library is a bit confusing to use. Also, S3 needs to know the final object size for a regular upload, so the compression has to be performed in advance.

For well-compressible files, I compress them in memory, but for truly large files, you can pass in e.g. a TemporaryFile to allow better scaling. See for yourself:
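A minimal sketch of that approach, assuming `bucket` is a boto3 `Bucket` resource. The function name `upload_gzipped` and its parameters are illustrative choices, not part of boto3; the only boto3 call used is `Bucket.upload_fileobj` with its `ExtraArgs` argument.

```python
import gzip
import io
import shutil


def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type="text/plain"):
    """Gzip the bytes read from fp and upload the result to S3 under key.

    bucket        -- assumed to be a boto3 Bucket resource
    fp            -- binary file-like object to read the uncompressed data from
    compressed_fp -- optional seekable binary file to hold the compressed data;
                     if omitted, the compression happens in memory
    """
    if compressed_fp is None:
        compressed_fp = io.BytesIO()

    # GzipFile writes the compressed stream into compressed_fp.
    # Leaving the with-block flushes the gzip trailer but keeps compressed_fp open.
    with gzip.GzipFile(fileobj=compressed_fp, mode="wb") as gz:
        shutil.copyfileobj(fp, gz)

    # Rewind so the upload starts at the first compressed byte.
    compressed_fp.seek(0)

    # ContentEncoding lets browsers decompress on download;
    # ContentType describes the uncompressed payload.
    bucket.upload_fileobj(
        compressed_fp,
        key,
        ExtraArgs={"ContentType": content_type, "ContentEncoding": "gzip"},
    )


# Usage (needs boto3 and AWS credentials; bucket and file names are placeholders):
#   import boto3, tempfile
#   bucket = boto3.resource("s3").Bucket("my-bucket")
#   with open("sales.csv", "rb") as fp, tempfile.TemporaryFile() as tmp:
#       upload_gzipped(bucket, "sales.csv", fp, compressed_fp=tmp, content_type="text/csv")
```

Passing a `TemporaryFile` as `compressed_fp` keeps the compressed data on disk instead of in RAM, which is the trade-off for truly large files.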

