Binary Canonical Serialization (BCS)

Binary Canonical Serialization (BCS) is a method that allows for compact and efficient storage. It was invented at Diem for the purposes of a consistent signing mechanism.

Properties

Binary

The serialization method is directly in bytes and is not human-readable. For example, for a String hello, it would be represented by the length of the string as a binary encoded uleb-128, followed by the UTF-8 encoded bytes of hello.
e.g. "hello" = 0x0548656C6C6F This is different than, say, a human-readable format such as JSON which would give "hello"

Canonical

There is only one canonical way to represent the bytes. This ensures that signing and representation are consistent.

Example:

Let's consider this struct in Move:

module 0x42::example {
    struct FunStruct {
        a: u8,
        b: u8
    }
}

In JSON, the struct {"a":1, "b": 2} can also be represented as {"b":2, "a":1}. Both are interchangeable so they are not canonical. In BCS, it would be a pre-defined order, so only one would be the valid representation.

However, in BCS, there is only one valid representation of that, which would be the bytes 0x0102. 0x0201 is not canonical, and it would instead be interpreted as {"a":2, "b":1}.

Non-self describing

The format is not self-describing. This means that deserialization requires knowledge of the shape and how to interpret the bytes. This is in opposition to a type like JSON, which is self-describing.

Example:

Let's consider this struct in Move again:

module 0x42::example {
    struct FunStruct {
        a: u8,
        b: u8
    }
}

The first byte will always be interpreted as a then b. So, 0x0A00 would be {"a":10, "b":0} and 0x0A01 would be {"a":10, "b":1}. If we flip it to 0x000A it would be {"a":0, "b":10}.

Note that this means if I do not know what the shape of the struct is, then I do not know if this is a single u16, the above struct, or something else.

Details about different types