Binary Canonical Serialization (BCS)
Binary Canonical Serialization (BCS) is a binary serialization format with a single canonical encoding for each value, which allows for compact and efficient storage. It was invented at Diem to provide a consistent signing mechanism.
Properties
Binary
Values are serialized directly to bytes, and the result is not human-readable. For example, the string hello is represented by its length in bytes as a ULEB128-encoded integer, followed by the UTF-8 encoded bytes of hello.
e.g. "hello" = 0x0548656C6C6F, where 0x05 is the length and 0x48656C6C6F is the UTF-8 encoding of hello.
This differs from a human-readable format such as JSON, which would give
"hello"
Canonical
Each value has exactly one valid byte representation. This ensures that the bytes being signed are the same no matter who serializes the value.
Example:
Let's consider this struct in Move:
module 0x42::example {
    struct FunStruct {
        a: u8,
        b: u8
    }
}
In JSON, the struct {"a":1, "b": 2} can also be represented as {"b":2, "a":1}. Both are interchangeable, so JSON is not canonical. In BCS, the fields are serialized in a pre-defined order (the order in which they are declared), so there is only one valid representation of this value: the bytes 0x0102. The bytes 0x0201 are not an alternative encoding of the same value; they would instead be interpreted as {"a":2, "b":1}.
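The contrast between JSON and BCS for FunStruct can be illustrated as follows. This is a sketch that assumes each u8 field occupies one byte and fields are written in declaration order; encode_fun_struct and decode_fun_struct are hypothetical helpers written for this example, not part of any BCS library.

```python
import json
from dataclasses import dataclass


@dataclass
class FunStruct:
    a: int  # u8
    b: int  # u8


def encode_fun_struct(value: FunStruct) -> bytes:
    """Write fields in declaration order; bytes() rejects values outside 0..255."""
    return bytes([value.a, value.b])


def decode_fun_struct(data: bytes) -> FunStruct:
    """Read the first byte as a and the second as b."""
    if len(data) != 2:
        raise ValueError("FunStruct is exactly two bytes")
    return FunStruct(a=data[0], b=data[1])


# Two different JSON texts that parse to the same value: JSON is not canonical.
json_1 = '{"a":1, "b": 2}'
json_2 = '{"b":2, "a":1}'
same_value = json.loads(json_1) == json.loads(json_2)  # True
same_text = json_1 == json_2                           # False

# One value, one byte string.
encoded = encode_fun_struct(FunStruct(a=1, b=2))
print(encoded.hex())                            # 0102
print(decode_fun_struct(bytes.fromhex("0201")))  # FunStruct(a=2, b=1)
```

Because the encoding is unique, two parties that serialize the same value always produce the same bytes to sign, which is not guaranteed when signing JSON text.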
Non-self describing
The format is not self-describing. This means that deserialization requires prior knowledge of the shape of the data and how to interpret the bytes. This is in contrast to a format like JSON, which is self-describing.
Example:
Let's consider this struct in Move again:
module 0x42::example {
    struct FunStruct {
        a: u8,
        b: u8
    }
}
The first byte will always be interpreted as a, and the second as b. So, 0x0A00 would be {"a":10, "b":0} and 0x0A01 would be {"a":10, "b":1}. If we flip it to 0x000A, it would be {"a":0, "b":10}.
Note that without knowing the shape of the struct, there is no way to tell whether these bytes are a single u16, the above struct, or something else.
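The ambiguity can be seen by decoding the same two bytes under two different assumed shapes. This is a minimal sketch: decode_as_fun_struct and decode_as_u16 are hypothetical helpers for this example, and the u16 reading relies on BCS encoding integers in little-endian byte order.

```python
def decode_as_fun_struct(data: bytes) -> dict:
    """Interpret two bytes as FunStruct: first byte is a, second byte is b."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    return {"a": data[0], "b": data[1]}


def decode_as_u16(data: bytes) -> int:
    """Interpret the same two bytes as a single little-endian u16."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    return int.from_bytes(data, "little")


for hex_bytes in ("0A00", "0A01", "000A"):
    raw = bytes.fromhex(hex_bytes)
    print(hex_bytes, decode_as_fun_struct(raw), decode_as_u16(raw))
# 0A00 {'a': 10, 'b': 0} 10
# 0A01 {'a': 10, 'b': 1} 266
# 000A {'a': 0, 'b': 10} 2560
```

Both readings are valid; only the type the reader expects decides which one is meant.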