Debs in S3
Lucid Software uses Debian packages (debs) for packaging and installation. Custom scripts download the debs from a private AWS S3 bucket. As much as we loved the quirks of homegrown scripts, we wanted to move to a proper Apt (Advanced Package Tool) repository and toolchain.
That required getting Apt to work with S3.
Existing solutions
People have already written custom Apt transports for S3. There's apt-s3 in C, which is a fork of a fork of a fork of apt-transport-s3. And there's an identically named -- but completely separate -- apt-transport-s3 in Python. However, neither project
- used the standard AWS credential resolution; instead they required ad-hoc credential files
- supported version 4 of AWS signatures (mandatory in eu-central-1)
- supported If-Modified-Since caching
- supported pipelining
Creating a custom transport
Given the drawbacks of current solutions, I decided to author a better Apt transport method. The Apt transport method documentation is sparse, but it gives an adequate overview.
Each protocol -- http, https, ssh, etc. -- is implemented as an executable placed in /usr/lib/apt/methods, in a file named after its URI scheme. apt-get invokes each of these executables, sending messages to the process via its stdin and receiving messages from the process via its stdout. Messages have an HTTP-like text format consisting of an initial status/command line followed by several RFC-822 fields, and terminated by a blank line.
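To make that loop concrete, here is a minimal, hypothetical sketch in Python -- not the actual apt-boto-s3 source -- of a method that announces its capabilities and then reads messages from apt until its stdin closes:

#!/usr/bin/env python
# Illustrative sketch of an apt transport method's message loop.
import sys

def send(message):
    # Each message ends with a blank line; flush so apt isn't left waiting.
    sys.stdout.write(message + '\n\n')
    sys.stdout.flush()

def read_message():
    # Collect one message: lines until a blank line; return None at EOF.
    lines = []
    while True:
        line = sys.stdin.readline()
        if not line:            # EOF: apt closed our stdin
            return lines or None
        if line == '\n':        # blank line terminates the message
            if lines:
                return lines
            continue            # skip stray blank lines between messages
        lines.append(line.rstrip('\n'))

# A method announces its capabilities as soon as it starts.
send('100 Capabilities\nVersion: 1.0\nSend-Config: true')

while True:
    message = read_message()
    if message is None:
        break                   # apt is done with us
    if message[0].startswith('600'):  # 600 URI Acquire
        fields = dict(line.split(': ', 1) for line in message[1:])
        uri, filename = fields['URI'], fields['Filename']
        # ... fetch uri into filename, then send 200 URI Start / 201 URI Done ...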
However, the Apt method documentation has omissions and ambiguities. For example, on the subject of pipelining, the documentation says only:
Methods should set the pipeline bit if their underlying protocol supports pipelining. The only known method that does support pipelining is http.
"Setting the pipleline bit" is rather unclear. So I proxied the "official" http method to observe the inputs and outputs.
mv /usr/lib/apt/methods/http /usr/lib/apt/methods/http-real
echo '#!/bin/sh' > /usr/lib/apt/methods/http
echo 'tee /tmp/in | /usr/lib/apt/methods/http-real "$@" | tee /tmp/out' >> /usr/lib/apt/methods/http
chmod +x /usr/lib/apt/methods/http
Now after apt-get update, incoming messages for the http transport are in /tmp/in, and outgoing messages are in /tmp/out.
100 Capabilities
Version: 1.2
Pipeline: true
Send-Config: true

600 URI Acquire
URI: http://mirrors.xmission.com/ubuntu/dists/trusty/main/i18n/Translat...
Filename: /var/lib/apt/lists/partial/mirrors.xmission.com_ubuntu_dists_...
Fail-Ignore: true
Index-File: true

201 URI Done
URI: http://mirrors.xmission.com/ubuntu/dists/trusty/main/i18n/Translat...
Filename: /var/lib/apt/lists/partial/mirrors.xmission.com_ubuntu_dists_...
Size: 762361
Last-Modified: Tue, 15 Apr 2014 16:42:29 GMT
MD5-Hash: 6d991ed7d035b51aa77883a107896db9
MD5Sum-Hash: 6d991ed7d035b51aa77883a107896db9
SHA1-Hash: 8aa7a170afdf02c587c700b63d090c6edd794a02
SHA256-Hash: ed8741c9fb597579cbbb491f1f2a3bd8851e373aae9e61deddb46913d0...
SHA512-Hash: 2004577b96a20392c6934679cb40c81486967f67927c4ff9dd1dc32da2...
So "Pipeline: true" is needed for pipelining. Some more lessons:
- Although the documentation mentions only MD5-Hash, if the method does not provide hashes for all algorithms in the package index, apt-get fails with "Failed to fetch ... Hash Sum mismatch". Include all standard algorithms: MD5-Hash, SHA1-Hash, SHA256-Hash, SHA512-Hash (see the sketch after this list).
- Sometimes the downloaded lists can become corrupted and cause odd issues. rm -r /var/lib/apt/lists fixes that.
- Don't forget to flush stdout after writing each message! Otherwise, apt will hang while waiting for you.
- If you support pipelining, set "Single-Instance" to "yes". This will start a single process for your method and reuse it.
- Even for a cache hit, include the Filename from the URI Acquire request in the response.
- The standard apt configuration mechanism is /etc/apt/apt.conf.d. Prefer that over requiring ad-hoc files all over the place.
- When possible, use conditional Last-Modified/If-Modified-Since caching. This allows the client to avoid downloading megabytes of package lists on every update.
- What apt means by "pipelining" (queuing requests) is really "multiplexing" (accepting responses in arbitrary order relative to requests).
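To make a few of these lessons concrete, here is a rough sketch -- not the actual apt-boto-s3 code -- of how a method might answer a 600 URI Acquire for an S3 object: a conditional GET through boto3, then a 201 URI Done that echoes the Filename, carries every hash algorithm, and is explicitly flushed. The bucket, key, and paths are made up for illustration.

import hashlib
import sys

import boto3
import botocore.exceptions

def acquire(bucket, key, filename, cached_mtime=None):
    """Fetch s3://bucket/key into filename and return a 201 URI Done message.

    cached_mtime is the Last-Modified datetime of a previously downloaded
    copy, used for If-Modified-Since caching. (Hypothetical helper.)
    """
    s3 = boto3.client('s3')
    uri = 's3://{}/{}'.format(bucket, key)
    try:
        extra = {'IfModifiedSince': cached_mtime} if cached_mtime else {}
        response = s3.get_object(Bucket=bucket, Key=key, **extra)
    except botocore.exceptions.ClientError as error:
        if error.response['ResponseMetadata']['HTTPStatusCode'] == 304:
            # Not modified: report a cache hit, but still echo the Filename.
            return '201 URI Done\nURI: {}\nFilename: {}\nIMS-Hit: true\n\n'.format(uri, filename)
        raise
    data = response['Body'].read()
    with open(filename, 'wb') as f:
        f.write(data)
    fields = [
        ('URI', uri),
        ('Filename', filename),  # always echo the Filename from the request
        ('Size', str(len(data))),
        ('Last-Modified', response['LastModified'].strftime('%a, %d %b %Y %H:%M:%S GMT')),
    ]
    # Include every standard hash, or apt fails with "Hash Sum mismatch".
    for name, algorithm in [('MD5-Hash', hashlib.md5), ('MD5Sum-Hash', hashlib.md5),
                            ('SHA1-Hash', hashlib.sha1), ('SHA256-Hash', hashlib.sha256),
                            ('SHA512-Hash', hashlib.sha512)]:
        fields.append((name, algorithm(data).hexdigest()))
    return '\n'.join(['201 URI Done'] + ['{}: {}'.format(k, v) for k, v in fields]) + '\n\n'

if __name__ == '__main__':
    sys.stdout.write(acquire('my-apt-bucket', 'dists/trusty/Release', '/tmp/Release'))
    sys.stdout.flush()  # flush, or apt hangs waiting for the message

Leaning on an AWS SDK such as boto is also what provides the standard credential resolution and version 4 signatures that the existing transports lacked.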
apt-boto-s3
The project has been released as apt-boto-s3. The implementation is under 250 lines of Python. See it at https://github.com/lucidsoftware/apt-boto-s3/blob/v1.0/s3.py.
See the GitHub repo for more information, including how to install it from our public Bintray apt repo.
About Lucid
Lucid Software is a pioneer and leader in visual collaboration dedicated to helping teams build the future. With its products—Lucidchart, Lucidspark, and Lucidscale—teams are supported from ideation to execution and are empowered to align around a shared vision, clarify complexity, and collaborate visually, no matter where they are. Lucid is proud to serve top businesses around the world, including customers such as Google, GE, and NBC Universal, and 99% of the Fortune 500. Lucid partners with industry leaders, including Google, Atlassian, and Microsoft. Since its founding, Lucid has received numerous awards for its products, business, and workplace culture. For more information, visit lucid.co.