<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAALCAYAAABCm8wlAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4QoPAxIb88htFgAAABl0RVh0Q29tbWVudABDcmVhdGVkIHdpdGggR0lNUFeBDhcAAACxSURBVBjTdY6xasJgGEXP/RvoonvAd8hDyD84+BZBEMSxL9GtQ8Fis7i6BkGI4DP4CA4dnQON3g6WNjb2wLd8nAsHWsR3D7JXt18kALFwz2dGmPVhJt0IcenUDVsgu91eCRZ9IOMfAnBvSCz8I3QYL0yV6zfyL+VUxKWfMJuOEFd+dE3pC1Finwj0HfGBeKGmblcFTIN4U2C4m+hZAaTrASSGox6YV7k+ARAp4gIIOH0BmuY1E5TjCIUAAAAASUVORK5CYII=">
A visualization
The gist of the encoding process is captured in the following interactive visualization. Type in some ASCII characters in the top input and hit the “Encode” button. If you run a few strings through this visualization, you may notice that the encoding process is simply a pair of nested loops. The outer loop iterates over the data in 24-bit increments; the spec refers to these as “input groups.” The inner loop iterates over each input group 6 bits at a time. Each 6-bit value is interpreted as an unsigned integer that is used to index an alphabet of 64 characters. The indexed alphabet value is the output. With the help of ES6 generators, this encoding process can be implemented with just a handful of functions:
/**
 * @param {Uint8Array} bytes
 * @return {string} Base64 encoded string
 */
function base64Encode(bytes) {
  let encoding = '';
  for (let group of groups24Bits(bytes)) {
    for (let value of values6Bits(group)) {
      if (value !== undefined) {
        encoding += ALPHABET[value];
      } else {
        // undefined marks a 6-bit slot made up entirely of padding bits
        encoding += PAD;
      }
    }
  }
  return encoding;
}

const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const PAD = '=';

/**
 * @param {Uint8Array} bytes
 * @return {Uint8Array} The next input group (yielded on each execution)
 */
function* groups24Bits(bytes) {
  for (let i = 0; i < bytes.length; i += 3) {
    yield bytes.slice(i, i + 3); // 3 bytes/3 octets/24 bits
  }
}

/**
 * @param {Uint8Array} group Expected to be array of 1 to 3 bytes
 * @return {number|undefined} The next 6-bit value from the
 * input group (yielded on each execution)
 */
function* values6Bits(group) {
  // Copy the group into a zero-filled 3-byte buffer so short groups are zero padded
  const paddedGroup = Uint8Array.from([0, 0, 0]);
  paddedGroup.set(group);
  // 1 byte holds 2 values, 2 bytes hold 3 values, 3 bytes hold 4 values
  let numValues = Math.ceil((group.length * 8) / 6);
  for (let i = 0; i < numValues; i++) {
    let base64Value;
    if (i === 0) {
      // Top 6 bits of byte 0
      base64Value = (paddedGroup[0] & 0b11111100) >> 2;
    } else if (i === 1) {
      // Bottom 2 bits of byte 0, then top 4 bits of byte 1
      base64Value = (paddedGroup[0] & 0b00000011) << 4;
      base64Value = base64Value | ((paddedGroup[1] & 0b11110000) >> 4);
    } else if (i === 2) {
      // Bottom 4 bits of byte 1, then top 2 bits of byte 2
      base64Value = (paddedGroup[1] & 0b00001111) << 2;
      base64Value = base64Value | ((paddedGroup[2] & 0b11000000) >> 6);
    } else if (i === 3) {
      // Bottom 6 bits of byte 2
      base64Value = paddedGroup[2] & 0b00111111;
    }
    yield base64Value;
  }
  // Fill the rest of the 4-character output group with padding markers
  let numPaddingValues = 4 - numValues;
  for (let j = 0; j < numPaddingValues; j++) {
    yield undefined;
  }
}
If there is an “interesting” part to the encoding process, it is the ending conditions where we must apply padding. Each input group is required to be 24 bits long (or equivalently three 8-bit bytes). (It seems likely the spec writers chose 24-bit input groups since 24 is the least common multiple of 6 and 8.) In the implementation given above, we pad the final group with bytes of zeroes when the final input group is only 1 or 2 bytes long. As we iterate over this final input group, if the 6-bit value consists entirely of padding bits, then = is the output character, the designated padding character. If, however, the 6-bit value straddles “real” bits and padding bits—as can be seen in the input “foob”—then the alphabet is still indexed and the padding bits are taken to be zeroes.
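To make the padding rule concrete, here is a minimal sketch that works on bit strings rather than bit masks, so the zero padding bits are visible. The names `B64_ALPHABET`, `sixBitChunks`, and `encodeGroup` are hypothetical helpers made up for this illustration; they are not part of the implementation above.

```javascript
// Illustrative helpers only; the names are not from the original implementation.
const B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Split a 1- to 3-byte group into 6-bit strings, zero padding the last one.
function sixBitChunks(group) {
  const bits = Array.from(group, (byte) => byte.toString(2).padStart(8, '0')).join('');
  const chunks = [];
  for (let i = 0; i < bits.length; i += 6) {
    // A chunk that straddles real bits and padding bits is filled out with zeroes.
    chunks.push(bits.slice(i, i + 6).padEnd(6, '0'));
  }
  return chunks;
}

// Encode one group to exactly 4 output characters.
function encodeGroup(group) {
  return sixBitChunks(group)
    .map((chunk) => B64_ALPHABET[parseInt(chunk, 2)])
    .join('')
    .padEnd(4, '='); // slots made up entirely of padding bits become '='
}

// The final group of "foob" is the single byte 'b' (0x62 = 01100010).
console.log(sixBitChunks([0x62]).join(' ')); // 011000 100000
console.log(encodeGroup([0x62])); // Yg==
console.log(encodeGroup([0x66, 0x6f])); // Zm8=  (the bytes of "fo")
```

The second chunk for `b` holds only two real bits (`10`); the four zeroes after them are padding bits, yet the chunk still indexes the alphabet and yields `g`. Only the two remaining slots, which contain no real bits at all, become `=`.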
A couple usages
You will not find any mention of “HTML” in the Base64 spec. Instead, the authors simply mention that Base64 encoding is used in environments where, “perhaps for legacy reasons,” the “storage or transfer” of data is limited to ASCII characters. More or less, this idea sums up the browser and its heavy consumption of HTML, JSON, CSS, and JavaScript. Increasingly, this text is encoded using UTF-8, a superset of ASCII. In this text-heavy ecosystem, Base64 encoding finds various niche applications.
Data URLs
The first part of a URL is the scheme. It is the prefix string that goes before the first colon; for example, it is the https in https://example.com or the beginning ftp in ftp://ftp.funet.fi/pub/standards/RFC/rfc4648.txt. The scheme tells the client (a browser or a different network app) how to retrieve the resource and what protocol to follow. The scheme prefix also makes URLs extensible and suitable for future protocols. If a new protocol comes along, we can create a new URL scheme for it and still identify resources by URL. The data scheme is one such extension, which we saw in the image encoded in the introduction. This scheme tells clients, “My resource’s data is located right here in the rest of this URL string.” URLs that use the data scheme follow this format:
data:[<mediatype>][;base64],<data>
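As a rough sketch of the format above, the following builds a Base64 data URL from raw bytes and reads the scheme back off the front. `schemeOf` and `toBase64DataUrl` are hypothetical names chosen for this example, and it assumes an environment where `btoa` and `TextEncoder` are available as globals (browsers and recent versions of Node.js).

```javascript
// Hypothetical helper: the scheme is everything before the first colon.
function schemeOf(url) {
  const colon = url.indexOf(':');
  return colon === -1 ? null : url.slice(0, colon);
}

// Hypothetical helper: builds data:<mediatype>;base64,<data> from bytes.
function toBase64DataUrl(mediaType, bytes) {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${mediaType};base64,${btoa(binary)}`;
}

const exampleUrl = toBase64DataUrl('text/plain', new TextEncoder().encode('foob'));
console.log(exampleUrl); // data:text/plain;base64,Zm9vYg==
console.log(schemeOf(exampleUrl)); // data
console.log(schemeOf('https://example.com')); // https
```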
.triangle-icn {
  background-image: url(data:image/gif;base64,R0lGODlhCAAHAIABAGSV7f///yH+EUNyZWF0ZWQgd2l0aCBHSU1QACH5BAEKAAEALAAAAAAIAAcAAAINjI8BkMq41onRUHljAQA7);
  background-repeat: no-repeat;
  background-position: center;
}
const image = new Image();
image.src = "data:image/jpg;base64,/9j/4AAQSkZJRgABAQEAWQBZAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wgARCAAHAAgDAREAAhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAAB//EABUBAQEAAAAAAAAAAAAAAAAAAAUG/9oADAMBAAIQAxAAAAFXph//xAAWEAADAAAAAAAAAAAAAAAAAAABFRb/2gAIAQEAAQUCoS8//8QAGBEAAgMAAAAAAAAAAAAAAAAAABEUI0H/2gAIAQMBAT8Bk3PD/8QAGxEAAAcBAAAAAAAAAAAAAAAAAAESFBUxQuH/2gAIAQIBAT8Bjyap1fB//8QAGxAAAQQDAAAAAAAAAAAAAAAAEQABEhQzUWH/2gAIAQEABj8CmXrYxza//8QAGxAAAQQDAAAAAAAAAAAAAAAAAQARIUFRcZH/2gAIAQEAAT8hI7j3Vy86hf/aAAwDAQACAAMAAAAQf//EABoRAAEFAQAAAAAAAAAAAAAAAJEAESFRgaH/2gAIAQMBAT8QeLlnkL//xAAaEQACAgMAAAAAAAAAAAAAAAABIRFhAFHB/9oACAECAQE/EARV18QtS8//xAAYEAEBAAMAAAAAAAAAAAAAAAABEQAh8P/aAAgBAQABPxBQyNdodCLh/9k="
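const canvas = document.getElementById("myCanvas");
const ctx = canvas.getContext('2d');
// Setting src starts loading asynchronously, so wait until the image is decoded before drawing it.
image.decode().then(() => ctx.drawImage(image, 0, 0));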
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAHCAYAAAA1WQxeAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAA3XAAAN1wFCKJt4AAAAB3RJTUUH4QoQAiIwiYqPWwAAAHNJREFUCNd1zaENQjEAhOGvgAFbAwnMwhoMUFeBYBEEVXQAGIIFGARXjXkC817SEPjdXfLfBR2ptD0WeNQcGUPPDUPNcTcVs85OWGObSjtNfUilwRxDt/SuOa7SpQmjfcbx6+5eczyEVNoGL79ZznD1n+cHk7cb99sXV8cAAAAASUVORK5CYII=">
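Going the other way, a data URL can be taken apart with plain string operations. The sketch below is a simplified reading of the format shown earlier, not a complete parser; `parseDataUrl` is a hypothetical name, and `atob` is assumed to be available as a global (browsers and recent versions of Node.js). The sample payload is only the first eight characters of the GIF from the CSS example, which is enough to reveal the file signature.

```javascript
// Hypothetical helper: splits data:[<mediatype>][;base64],<data> into its parts.
function parseDataUrl(url) {
  if (!url.startsWith('data:')) {
    throw new Error('Not a data URL');
  }
  const comma = url.indexOf(',');
  if (comma === -1) {
    throw new Error('Data URL is missing the comma before the data');
  }
  const header = url.slice('data:'.length, comma);
  const isBase64 = header.endsWith(';base64');
  const mediaType = isBase64 ? header.slice(0, -';base64'.length) : header;
  return {
    // When the media type is omitted, the data URL spec defaults to plain US-ASCII text.
    mediaType: mediaType || 'text/plain;charset=US-ASCII',
    isBase64,
    data: url.slice(comma + 1),
  };
}

const parsed = parseDataUrl('data:image/gif;base64,R0lGODlh');
console.log(parsed.mediaType); // image/gif
console.log(parsed.isBase64); // true
console.log(atob(parsed.data)); // GIF89a
```

Decoding just the start of the payload is a quick way to check what kind of file is embedded: `R0lGODlh` is the Base64 form of the six ASCII bytes `GIF89a`.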
Source maps
Another common but less visible usage of Base64 encoding is in source maps. Below is a source map generated by Google’s Closure compiler:
{
  "version":3,
  "file":"",
  "lineCount":1,
  "mappings":"AAUIA,OAAAC,IAAA,CAAYC,CAGFC,IATAC,QAAQ,EAAW,CAE7B,IAAAF,EAAA,CAOsBG,cATO,CAMjBH,GAAZ;",
  "sources":["greet.js"],
  "names":["console","log","_greeting","greeter","Greeter","greeting"]
}
The Base64 encoded data is in the mappings field. The comma- and semicolon-delimited snippets are the Base64 encoded binary data of integers encoded as variable-length quantities (VLQ).
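To give a feel for what those snippets contain, here is a minimal sketch of a decoder for a single comma-delimited segment. It assumes the usual source map convention: each Base64 character carries 6 bits, of which the top bit is a continuation flag and the low 5 bits are data, and the lowest bit of each assembled value is its sign. `VLQ_ALPHABET` and `decodeVlqSegment` are hypothetical names for this illustration, not part of any library.

```javascript
// Illustrative sketch; the names are made up for this example.
const VLQ_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one mappings segment (for example "AAUIA") into signed integers.
function decodeVlqSegment(segment) {
  const values = [];
  let value = 0;
  let shift = 0;
  for (const char of segment) {
    const digit = VLQ_ALPHABET.indexOf(char);
    if (digit === -1) {
      throw new Error(`Invalid Base64 character: ${char}`);
    }
    value += (digit & 0b011111) << shift; // low 5 bits carry data
    if (digit & 0b100000) {
      shift += 5; // continuation bit set: more digits follow for this value
      continue;
    }
    const magnitude = value >>> 1; // the lowest bit is the sign flag
    values.push(value & 1 ? -magnitude : magnitude);
    value = 0;
    shift = 0;
  }
  return values;
}

console.log(decodeVlqSegment('AAUIA').join(', ')); // 0, 0, 10, 4, 0
console.log(decodeVlqSegment('OAAAC').join(', ')); // 7, 0, 0, 0, 1
console.log(decodeVlqSegment('CAE7B').join(', ')); // 1, 0, 2, -29
```

In the version 3 source map format, the fields of a segment are the generated column, the source file index, the original line, the original column, and optionally an index into names, each stored as an offset from the previous segment's value. Read that way, the first segment above points at line 10 (zero-based), column 4 of greet.js and the name at index 0, which is console.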
Images and source maps are just a couple places Base64 encoding is used. If you know of others or any novel uses of Base64 encoding, please mention them in the comments below. It also might be worth “inspecting” page sources to find others. For example, in Chrome, if you go to chrome://dino you can find that the offline dinosaur game’s image assets (and, it appears, sound assets) are Base64 encoded. (Examining these assets, which are also embedded on YouTube’s homepage, is how I discovered the dinosaur can duck under the low-flying pterodactyls.)