What Is Binary Encoding


by Staff Coder

on September 12, 2020

What Is Binary Encoding?

Binary encoding is a procedure for converting data into a form that can be easily used by different computer operating systems. This is achieved by converting binary data to an ASCII string format, specifically by converting 8-bit data into a 7-bit format that uses a standard set of printable ASCII characters. ASCII, the American Standard Code for Information Interchange, was developed in the early 1960s by a committee of the American Standards Association (now ANSI) and became the most widely used character encoding. Modern character encodings such as UTF-8 are still based on ASCII, although they support many additional characters and languages.
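To make the 8-bit-to-7-bit idea concrete, here is a minimal Python sketch using the standard library's `base64` module (Base64 is one binary-to-text scheme of the kind described above; the variable names are only illustrative). It encodes every possible byte value and checks that the result contains only printable 7-bit ASCII characters.

```python
import base64
import string

# Every possible 8-bit value, 0 through 255, including control bytes.
raw = bytes(range(256))

# Base64 maps the 8-bit input onto a text-safe subset of ASCII.
encoded = base64.b64encode(raw)

# Each output byte is below 128 (7-bit) and is a printable character.
is_7bit = all(b < 128 for b in encoded)
is_printable = all(chr(b) in string.printable for b in encoded)

print(len(raw), len(encoded))  # prints: 256 344
print(is_7bit, is_printable)   # prints: True True

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == raw
```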

The original ASCII character code provides 128 different characters, numbered 0 to 127. ASCII and 7-bit ASCII are synonymous. Since the 8-bit byte is the common storage element, a byte leaves room for 128 additional characters, which are used to represent a host of foreign-language and other symbols (these extended character sets are known as code pages). When none of the additional characters (128-255) is used, the first (most significant) bit of every byte is 0.
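A short Python sketch of the high-bit rule: bytes of pure ASCII text all have their most significant bit set to 0, while a character from an 8-bit extension lands in the 128-255 range. Latin-1 (ISO/IEC 8859-1) is used here only as one example of such an extended code page, and `high_bit_set` is a hypothetical helper name.

```python
def high_bit_set(data: bytes) -> list[bool]:
    """Return, for each byte, whether its most significant bit (0x80) is 1."""
    return [bool(b & 0x80) for b in data]

ascii_bytes = "Hello".encode("ascii")     # values 72, 101, 108, 108, 111
latin1_bytes = "café".encode("latin-1")   # 'é' is 0xE9 = 233 in Latin-1

print([f"{b:08b}" for b in ascii_bytes])  # every bit pattern starts with 0
print(high_bit_set(ascii_bytes))          # [False, False, False, False, False]
print(high_bit_set(latin1_bytes))         # [False, False, False, True]

# 'é' has no 7-bit ASCII code, so strict ASCII encoding fails.
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```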

Some concepts and terms to be familiar with include:

  • Binary encode: a tool that provides commands for encoding values to base64, hexadecimal, and uuencode formats.
  • base64: a method for encoding binary data in an ASCII format.
  • uuencode (Unix-to-Unix encoding): an older format for encoding binary data as printable ASCII text.
  • ASCII: the American Standard Code for Information Interchange, published by ANSI, specifies a set of 128 characters (control characters and graphic characters such as letters, digits, and symbols) with their coded representation. ISO/IEC 646 is an internationalized version of ASCII. ISO/IEC 8859 is a set of 8-bit codes based on ASCII, intended to be combined with a standard set of terminal control sequences.
  • Text to binary: encoding text converts it to bytes. Computers store instructions, text, and characters as binary data. Every Unicode character can be represented in UTF-8 purely as ones and zeros (binary numbers). Other Unicode binary encodings, such as BOCU-1 (Binary Ordered Compression for Unicode), are designed to compress short strings while maintaining code point order.
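The terms above can be tried side by side with Python's standard library: `str.encode` for text to bytes (UTF-8), and the `binascii` and `base64` modules for hexadecimal, uuencode, and Base64. The sample strings are arbitrary, and the final check illustrates the code-point-order property of UTF-8 on a small hand-picked list rather than proving it in general.

```python
import base64
import binascii

# Text to binary: UTF-8 turns each character into one or more bytes.
for ch in ["A", "é", "€"]:
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{ord(ch):04X} {ch!r}: {bits}")

# The same three bytes in three binary-to-text formats.
data = b"Cat"
print(binascii.hexlify(data))   # hexadecimal
print(base64.b64encode(data))   # base64
print(binascii.b2a_uu(data))    # uuencode line: length char, data, newline

# UTF-8 preserves code point order: sorting by the encoded bytes gives
# the same order as sorting the strings by code point.
words = ["zebra", "émigré", "apple", "€uro", "Zulu"]
by_code_point = sorted(words)   # Python compares str by code point
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_code_point == by_utf8_bytes)
```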

Simple Binary Encoding

SBE (Simple Binary Encoding) is a binary-format protocol, standardized by the FIX Trading Community, for encoding and decoding messages. It is designed for low latency and deterministic performance.

The binary message format is specified using native primitive data types (integers, chars), so the data never needs to be translated into strings. SBE is concerned only with data representation; it does not impose business-level specifications on message content. It supports both fixed-length and variable-length fields.
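To illustrate what "native primitive types, no string translation" means, the sketch below packs a hypothetical order message (the field names and layout are invented for this example and are not taken from any real SBE schema) into a fixed-length little-endian block followed by one length-prefixed variable-length field, using Python's `struct` module. Real SBE codecs are normally generated from the XML schema; this only mimics the idea.

```python
import struct

# Hypothetical fixed-length block: order id (uint64), price in hundredths
# (int64), quantity (uint32), side (1 byte). "<" means little-endian with
# no padding, so every field sits at a fixed, known offset.
FIXED_BLOCK = struct.Struct("<QqIc")
# Hypothetical variable-length field: a uint16 length prefix, then the bytes.
VAR_LENGTH = struct.Struct("<H")

def encode_order(order_id: int, price: int, qty: int, side: bytes, note: str) -> bytes:
    note_bytes = note.encode("utf-8")
    return (FIXED_BLOCK.pack(order_id, price, qty, side)
            + VAR_LENGTH.pack(len(note_bytes)) + note_bytes)

def decode_order(buf: bytes) -> tuple:
    # Fixed fields are read straight from their offsets; no text parsing.
    order_id, price, qty, side = FIXED_BLOCK.unpack_from(buf, 0)
    offset = FIXED_BLOCK.size
    (note_len,) = VAR_LENGTH.unpack_from(buf, offset)
    offset += VAR_LENGTH.size
    note = buf[offset:offset + note_len].decode("utf-8")
    return order_id, price, qty, side, note

msg = encode_order(1234567890123, 10125, 500, b"B", "GTC")
print(FIXED_BLOCK.size, len(msg))  # 21 26
print(decode_order(msg))
```

Putting the fixed-length fields first means their offsets never depend on the contents of the variable-length data.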

The message layout is specified in an SBE template (schema) written in XML. The template determines which fields belong to a message and where they are located within it. It also defines valid value ranges and other information, such as constant values, which need not be sent on the wire.
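The template's role can be sketched the same way: given an ordered list of field definitions, a tool can compute each field's offset and leave constant fields off the wire entirely. The `TEMPLATE` list below is a hypothetical Python stand-in for the XML schema; the field names, the use of `struct` codes as types, and the third "constant" column are all assumptions made for illustration.

```python
import struct

# Hypothetical template: (name, struct code, constant value or None).
# Struct codes: Q = uint64, I = uint32, H = uint16, B = uint8, 3s = 3 bytes.
TEMPLATE = [
    ("sequence", "Q", None),
    ("instrument_id", "I", None),
    ("currency", "3s", b"USD"),   # constant: known to both sides, never sent
    ("quantity", "H", None),
    ("flags", "B", None),
]

def layout(template):
    """Return ({field: offset}, block length) for fields sent on the wire."""
    offsets, offset = {}, 0
    for name, code, constant in template:
        if constant is not None:
            continue  # constant fields occupy no bytes in the message
        offsets[name] = offset
        offset += struct.calcsize("<" + code)
    return offsets, offset

def decode(template, buf):
    offsets, _ = layout(template)
    values = {}
    for name, code, constant in template:
        if constant is not None:
            values[name] = constant  # filled in from the template, not the wire
        else:
            (values[name],) = struct.unpack_from("<" + code, buf, offsets[name])
    return values

offsets, block_length = layout(TEMPLATE)
print(offsets)       # {'sequence': 0, 'instrument_id': 8, 'quantity': 12, 'flags': 14}
print(block_length)  # 15

wire = struct.pack("<QIHB", 42, 7, 100, 1)
print(decode(TEMPLATE, wire))
```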

What Is Base64 Encoding?

Base64 provides a safe way to transfer binary data over a computer network using only printable ASCII characters. It is commonly used when binary data must be stored or transferred over media designed to handle text. The data can be transferred without the risk of losing information because bytes are misinterpreted as control characters. Base64 is the most popular of the “base encodings”, a family that also includes Base16 and Base32, and it offers a high level of interoperability among a wide variety of systems.
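The "base encoding" family can be compared directly with Python's `base64` module, which implements Base16, Base32, and Base64 through `b16encode`, `b32encode`, and `b64encode`. The two-byte input is arbitrary; the point is how many output characters each scheme needs.

```python
import base64

data = b"hi"

encoders = {
    "Base16": base64.b16encode,  # 4 bits per character
    "Base32": base64.b32encode,  # 5 bits per character
    "Base64": base64.b64encode,  # 6 bits per character
}

for name, encode in encoders.items():
    out = encode(data)
    print(f"{name}: {out.decode('ascii'):<10} ({len(out)} chars)")

# A larger payload makes the size ratio clearer.
payload = bytes(300)
for name, encode in encoders.items():
    print(f"{name}: {len(encode(payload)) / len(payload):.2f}x")
```

More bits per character means shorter output, which is one reason Base64 is usually preferred when the full 64-character alphabet is safe to use.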

Base64 Alphabet

Base64 uses the following subset of the US-ASCII characters.

[A-Z] – 26 characters
[a-z] – 26 characters
[0-9] – 10 characters
[+] – 1 character
[/] – 1 character
[=] – used for padding, as explained later

Each Base64 character represents 6 bits, which allows up to 64 (2^6) distinct characters. The upper case letters, lower case letters, and digits add up to 62; the ‘+’ and ‘/’ characters complete the set of 64. Base64 characters are formed by taking a block of three octets to form a 24-bit string, which is converted into four Base64 characters.
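The three-octets-to-four-characters step can be written out by hand. This Python sketch encodes a single 24-bit block with bit shifts and masks; the alphabet string uses the standard Base64 ordering (A-Z, a-z, 0-9, ‘+’, ‘/’), and the function name `encode_block` is just for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_block(block: bytes) -> str:
    """Encode exactly three bytes (24 bits) as four Base64 characters."""
    if len(block) != 3:
        raise ValueError("a full Base64 block is exactly three bytes")
    # Join the three octets into one 24-bit integer.
    n = (block[0] << 16) | (block[1] << 8) | block[2]
    # Slice it into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

# 'M' 'a' 'n' = 01001101 01100001 01101110
#            -> 010011 010110 000101 101110 = 19, 22, 5, 46 -> "TWFu"
print(encode_block(b"Man"))
```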


The characters of the Base64 alphabet, in index order (0 to 63), are ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
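One way to double-check the alphabet and its order is to build it from Python's `string` constants and confirm that the standard library's encoder agrees: packing the 6-bit values 0 through 63 into 48 bytes should encode to exactly those 64 characters, in order. This is a self-check sketch, not part of any specification.

```python
import base64
import string

# Index 0-25: A-Z, 26-51: a-z, 52-61: 0-9, 62: '+', 63: '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
print(len(ALPHABET))  # 64

# Write the indexes 0..63 as consecutive 6-bit fields (384 bits = 48 bytes).
bits = "".join(f"{i:06b}" for i in range(64))
data = int(bits, 2).to_bytes(len(bits) // 8, "big")

# 48 bytes is a multiple of 3, so the output has no '=' padding.
print(base64.b64encode(data).decode("ascii") == ALPHABET)  # True
```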

Base64 Encode

Base64 converts a sequence of bytes into a sequence of characters, each representing six bits. Every three unencoded bytes fall neatly into four encoded characters, which represent the same twenty-four bits.

If the input length is not a multiple of three, the final one or two bytes are filled out with zero bits and still encoded as a group of four characters. Equals signs mark the padding positions, which do not decode into any of the original data. Depending on the length of the original data, the encoded output ends in zero, one, or two equals signs: a remainder of one byte produces two equals signs, and a remainder of two bytes produces one.
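The full procedure, including padding, fits in a short function. This Python sketch (the name `b64encode_manual` is hypothetical) processes three bytes at a time, zero-fills a final partial group, replaces the output positions that carry no data with ‘=’, and checks itself against the standard library's `base64.b64encode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Zero-fill the group to 24 bits before slicing it into 6-bit indexes.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0)]
        if missing:
            chars[-missing:] = "=" * missing  # padding positions carry no data
        out.append("".join(chars))
    return "".join(out)

for sample in [b"Man", b"Ma", b"M"]:
    print(sample, b64encode_manual(sample))
    assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
```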

Practical Uses of Base64 Encoding

  • Base64 is one of the main pillars supporting the construction and operation of e-mail messages. It is integral to structuring e-mails that carry attachments such as images, videos, documents, or any other file format.

  • Base64 can encode any binary data, transfer it through dissimilar systems, and then decode it back to the original binary data.

  • XML documents can be used to store binary content. Binary data can be Base64-encoded and included inline within any XML 1.0 document.

  • HTTP Basic authentication encodes credentials using the RFC 2045 (MIME) variant of Base64, except that the output is not limited to 76 characters per line. The username and password are joined with a single colon before encoding.
    This is not done for security reasons (the encoding is trivially reversible) but as a means of escaping special characters.
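Several of these uses can be sketched with Python's standard library. The credentials are the example pair from the HTTP Basic authentication specification (RFC 7617), and the XML element and attribute names are hypothetical. The sketch contrasts `base64.b64encode`, which produces one unbroken line as Basic authentication expects, with `base64.encodebytes`, which wraps its output at 76 characters per line in the MIME style.

```python
import base64
import xml.etree.ElementTree as ET

# HTTP Basic authentication: join user and password with a colon, then
# Base64-encode. Anyone can decode this; it is escaping, not encryption.
user, password = "Aladdin", "open sesame"
token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
print(f"Authorization: Basic {token}")
# prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# MIME-style (e-mail) Base64 breaks long output into 76-character lines;
# the Basic authentication token above stays on a single line.
attachment = bytes(range(256)) * 2
mime_lines = base64.encodebytes(attachment).decode("ascii").splitlines()
print(max(len(line) for line in mime_lines))  # 76

# Binary content stored inline in an XML document (names are hypothetical).
doc = ET.Element("file", name="logo.bin")
doc.text = base64.b64encode(attachment).decode("ascii")
xml_text = ET.tostring(doc, encoding="unicode")
restored = base64.b64decode(ET.fromstring(xml_text).text)
print(restored == attachment)  # True
```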

HTTP Data Transfer, Content Encodings and Transfer Encodings

HTTP had to resolve two particular issues in order to carry a wide range of media types in its messages: encoding the data, and identifying its type and characteristics. HTTP borrows from MIME the notion of media types and the Content-Type header for identifying the type.

It also borrows MIME principles and headers to deal with the problem of encoding, but this is where some of the big differences between HTTP and MIME appear.

Encoding was a major problem for MIME, since MIME was developed for the specific purpose of sending non-text data within the old RFC 822 e-mail message format. RFC 822 imposes several significant limitations on the messages it carries, the most important of which is that data must be encoded as 7-bit ASCII. RFC 822 messages are also limited to lines of no more than 1000 characters, each ending in a ‘CRLF’ sequence.

These limitations mean that arbitrary binary files, which have no line structure and consist of bytes that can each hold any value from 0 to 255, cannot be sent in their native format using RFC 822. To pass these files, MIME must encode them using a method like base64, which transforms every three 8-bit bytes into four 6-bit characters that can be expressed in ASCII.
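The two RFC 822 constraints described above can be expressed as a simple check. The function `fits_rfc822_body` is a hypothetical helper that tests only the two rules named in the text, 7-bit data and lines of at most 1000 characters (simplified here to counting the bytes between CRLFs). Raw binary data fails, while the same bytes Base64-encoded with MIME-style line breaks pass.

```python
import base64

MAX_LINE = 1000  # line-length limit as stated in the text (simplified)

def fits_rfc822_body(data: bytes) -> bool:
    """Hypothetical check: every byte is 7-bit and every line is short enough."""
    if any(b > 127 for b in data):
        return False
    return all(len(line) <= MAX_LINE for line in data.split(b"\r\n"))

binary_file = bytes(range(256)) * 20  # stand-in for an arbitrary binary file

# MIME-style Base64 uses 76-character lines; switch '\n' to CRLF for e-mail.
encoded = base64.encodebytes(binary_file).replace(b"\n", b"\r\n")

print(fits_rfc822_body(binary_file))             # False
print(fits_rfc822_body(encoded))                 # True
print(base64.b64decode(encoded) == binary_file)  # True: line breaks are ignored
```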

When this kind of transformation is performed, the MIME Content-Transfer-Encoding header is included in the message so that the receiver can reverse the encoding and return the data to its original form.
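Python's standard `email` package shows the header in action. In this sketch, `EmailMessage.add_attachment` Base64-encodes a bytes payload by default and records that choice in the part's Content-Transfer-Encoding header; on the receiving side, the parsed part's `get_content()` uses that header to reverse the encoding. The subject, file name, and payload are made up for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

payload = bytes(range(256))  # arbitrary binary data, 0x00 through 0xFF

msg = EmailMessage()
msg["Subject"] = "binary attachment"
msg.add_attachment(payload, maintype="application",
                   subtype="octet-stream", filename="data.bin")

wire = msg.as_bytes()  # what would actually be sent: 7-bit text

# Receiver: parse the message, find the attachment, and undo the encoding.
received = message_from_bytes(wire, policy=policy.default)
part = next(received.iter_attachments())
print(part["Content-Transfer-Encoding"])  # base64
print(part.get_content() == payload)      # True
print(all(b < 128 for b in wire))         # True
```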

Although this technique works, it is less efficient than sending the data directly in binary, since base64 encoding increases the message size by about 33 percent (every three bytes are encoded as four ASCII characters, each of which takes one byte to send). HTTP messages are sent directly over a TCP connection between client and server and do not use the RFC 822 format.
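The 33 percent figure follows from the 3-to-4 ratio: n bytes become 4 * ceil(n / 3) characters. A quick Python check (the 3000-byte payload and the helper name `base64_length` are arbitrary choices) also shows that MIME-style line breaks add a little more on top of that.

```python
import base64
import math

def base64_length(n: int) -> int:
    """Encoded size of n bytes: four characters per (possibly partial) 3-byte group."""
    return 4 * math.ceil(n / 3)

n = 3000
plain = base64.b64encode(bytes(n))
mime = base64.encodebytes(bytes(n))  # adds a newline after every 76 characters

print(base64_length(n), len(plain))  # 4000 4000
print(f"{len(plain) / n - 1:.0%}")   # 33%
print(len(mime))                     # 4053: 4000 characters plus 53 newlines
```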

This allows binary data to be transmitted between HTTP clients and servers without the need for base64 encoding or other transformation techniques. Since sending data unencoded is more efficient, this may be one reason the designers of HTTP chose not to make the protocol strictly MIME-compliant.
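By contrast, an HTTP message body can carry raw bytes. The sketch below assembles a minimal HTTP/1.1 response in memory (no network is involved, and the header values are illustrative) with an application/octet-stream body containing every byte value, then splits it apart again using the Content-Length header. No transfer encoding is applied to the body.

```python
# A minimal HTTP/1.1 response built by hand, with a raw binary body.
body = bytes(range(256))  # includes 0x00 and bytes above 127

head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/octet-stream\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii")
response = head + body

# Receiver: the headers end at the first blank line; the body is read as-is.
header_block, _, rest = response.partition(b"\r\n\r\n")
headers = dict(
    line.split(": ", 1)
    for line in header_block.decode("ascii").split("\r\n")[1:]
)
received_body = rest[: int(headers["Content-Length"])]

print(headers["Content-Type"])
print(received_body == body, len(response) - len(head))  # True 256
```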