Abstract


  • Base64 is called Base64 because it encodes binary data using 64 unique characters, each of which represents 6 bits (2^6 = 64). These 64 characters consist of:
    • 26 uppercase English letters (A-Z)
    • 26 lowercase English letters (a-z)
    • 10 digits (0-9)
    • 2 additional characters: the plus sign (+) and the forward slash (/)
  • Base64 encoding works by converting groups of 3 bytes (3 × 8 = 24 bits) into groups of 4 characters (4 × 6 = 24 bits).
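The 3-byte to 4-character conversion can be sketched by hand. This is a minimal illustration, not a replacement for a library encoder: `ALPHABET` and `encode_group` are names invented here, and the function deliberately handles only a full 3-byte group.

```python
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters (4 x 6 bits)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Pack the 3 bytes into a single 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Split it into four 6-bit values, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Each 6-bit value (0-63) selects one character from the alphabet.
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```

Here the bytes of "Man" (77, 97, 110) split into the 6-bit values 19, 22, 5 and 46, which map to T, W, F and u.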

In its standard (padded) form, the length of a base64 encoded string is always a multiple of 4 characters.

If the length of the original data isn’t a multiple of 3 bytes, the final group encodes to fewer than 4 characters, and base64 padding comes to the rescue!

Base64 padding, represented by the = character, is often found at the end of base64 encoded strings. It doesn’t represent any characters from the original data, but is used to ensure the encoded string’s length is a multiple of 4. One = means the final group held 2 bytes of data, and two = means it held 1 byte. This maintains consistency and allows for correct decoding of the original data.
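A short sketch of how padding varies with input length, using Python's standard `base64` module. The helper `padding_needed` is a name made up for this illustration.

```python
import base64

def padding_needed(n_bytes: int) -> int:
    """Number of '=' characters appended for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3

for data in (b"Man", b"Ma", b"M"):
    encoded = base64.b64encode(data).decode("ascii")
    print(data, encoded, padding_needed(len(data)))

# b"Man" encodes to TWFu (no padding)
# b"Ma"  encodes to TWE= (one padding character)
# b"M"   encodes to TQ== (two padding characters)
```

In every case the encoded string is exactly 4 characters long, and the padding count tells the decoder how many bytes the last group really contained.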

Compatibility with 7-bit systems

Some systems could only transmit data in 7-bit chunks due to limitations in character sets and protocols (like older email systems).

If 8-bit data were transmitted over these 7-bit systems, it could be corrupted or lost: the 8th bit of each byte might be stripped off or otherwise altered in transit.

Base64 encoding solves this problem. By converting data into 6-bit groups and representing each group with a printable character, it ensures that data can be transmitted safely over systems that can only handle 7-bit chunks. The receiving system can then decode the base64 data back into its original 8-bit format.
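The effect can be simulated in a few lines. This is only a sketch under a simplifying assumption: `strip_eighth_bit` is a hypothetical helper that models a 7-bit channel by clearing the top bit of every byte, which is just one of the ways such a system could damage data.

```python
import base64

def strip_eighth_bit(data: bytes) -> bytes:
    """Model a 7-bit channel (assumption): clear the top bit of every byte."""
    return bytes(b & 0x7F for b in data)

raw = bytes(range(256))          # every byte value; half have the 8th bit set
encoded = base64.b64encode(raw)  # ASCII-only text representation

# Raw 8-bit data is damaged by the channel.
print(strip_eighth_bit(raw) == raw)          # False

# Base64 output only uses byte values below 128, so it passes through intact.
print(strip_eighth_bit(encoded) == encoded)  # True

# The receiver recovers the original 8-bit data.
print(base64.b64decode(strip_eighth_bit(encoded)) == raw)  # True
```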

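While the 7-bit transmission limitation is less common today, base64 remains useful for encoding binary data (like images or files) into a text format that can be easily transmitted or stored. Note that the standard alphabet is not entirely URL-safe: +, / and = have special meanings in URLs, so data embedded in URLs typically uses the URL-safe variant (base64url), which replaces + with - and / with _.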
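As a brief illustration of the URL aspect, the sketch below compares Python's standard encoder with its URL-safe counterpart. The input bytes are chosen purely so that the characters + and / appear in the standard output, since both have special meanings in URLs.

```python
import base64

data = b"\xfb\xff"  # sample input chosen to produce '+' and '/' in the output

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# Both forms decode back to the same bytes.
print(base64.b64decode(standard) == base64.urlsafe_b64decode(url_safe))  # True
```

The URL-safe form still ends in the = padding character here, which may itself need percent-encoding or removal depending on where in the URL the value is placed.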

Reference