- Irakli Gozalishvili, DAG House
- Joel Thorstensson, 3Box Labs
- Quinn Wilton, Fission
- Brooklyn Zelenka, Fission
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Varsig is a multiformat for describing signatures over IPLD data and raw bytes in a way that preserves information about the payload and how it was canonicalized.
IPLD is a deterministic encoding scheme for data expressed in common types plus content addressed links.
Common formats such as JWT use encoding (e.g. base64) and text separators (e.g. ".") to pass around encoded data and their signatures:
// JWT
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCIsInVjdiI6IjAuOC4xIn0.eyJhdWQiOiJkaWQ6a2V5Ono2TWtyNWFlZmluMUR6akc3TUJKM25zRkNzbnZIS0V2VGIyQzRZQUp3Ynh0MWpGUyIsImF0dCI6W3sid2l0aCI6eyJzY2hlbWUiOiJ3bmZzIiwiaGllclBhcnQiOiIvL2RlbW91c2VyLmZpc3Npb24ubmFtZS9wdWJsaWMvcGhvdG9zLyJ9LCJjYW4iOnsibmFtZXNwYWNlIjoid25mcyIsInNlZ21lbnRzIjpbIk9WRVJXUklURSJdfX1dLCJleHAiOjkyNTY5Mzk1MDUsImlzcyI6ImRpZDprZXk6ejZNa2tXb3E2UzN0cVJXcWtSbnlNZFhmcnM1NDlFZnU2cUN1NHVqRGZNY2pGUEpSIiwicHJmIjpbXX0.SjKaHG_2Ce0pjuNF5OD-b6joN1SIJMpjKjjl4JE61_upOrtvKoDQSxZ7WeYVAIATDl8EmcOKj9OqOSw0Vg8VCA"
Many binary-as-text encodings are inefficient and inconvenient. Others have opted to use canonicalization and a tag. This can be effective, but requires careful handling and signalling of the specific canonicalization method used.
const payload = canonicalize({"hello": "world", "count": 42})
{payload: payload, sig: key.sign(sha256(payload))}
Directly signing over canonicalized data introduces new problems: forced encoding and canonicalization attacks.
Data must first be rendered to binary before it is signed. This means imposing an encoding. There is no standard way to include the encoding that some IPLD was encoded with other than a CID. In IPFS, CIDs imply a link, which can have implications for network access and storage. Further, generating a CID means producing a hash, which is then potentially rehashed by the cryptographic signature library.
To remedy this, varsig includes the encoding information used in production of the signature.
Since IPLD is deterministically encoded, it can be tempting to rely on canonicalization at validation time, rather than rendering the IPLD to inline bytes or a CID and signing that. Since the original payload can be rederived from the output, this seems like a clean option:
// DAG-JSON
{
"role": "user",
"links": [
{"/": "bafkreidb2q3ktgtlm5yio7buj3sypyghjtfh5ernsteqmakf4p2c5bwmyi"},
{"/": "bafkreic75ydg5vkw324oqkcmqltfvc3kivyngqkibjoysdwiilakh4z5fe"},
{"/": "bafkreiffdiz6raf46zrr3b2usufgz5fo44aggmocz4zappr6khhhljcdpy"}
],
"sig": "8ufaS9w3CGN8cbQTUSoL1i7eaKiWLSXsD2LbZVmvM9zF"
}
This opens the potential for canonicalization attacks. Parsers for certain formats, such as JSON, are known to handle duplicate entries differently. IPLD needs to be serialized to a canonical form before checking the signature. Without careful handling, a verifier can fail to notice additional fields that were added to the payload and that the application will go on to parse.
An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.
— RFC8259
{
"role": "user", // Parsed by an IPLD implementation
"role": "admin", // Malicious duplicate field, omitted by the IPLD parser, accepted by the browser
"links": [
{"/": "bafkreidb2q3ktgtlm5yio7buj3sypyghjtfh5ernsteqmakf4p2c5bwmyi"},
{"/": "bafkreic75ydg5vkw324oqkcmqltfvc3kivyngqkibjoysdwiilakh4z5fe"},
{"/": "bafkreiffdiz6raf46zrr3b2usufgz5fo44aggmocz4zappr6khhhljcdpy"}
],
"sig": "8ufaS9w3CGN8cbQTUSoL1i7eaKiWLSXsD2LbZVmvM9zF"
}
In the above example, the canonicalization step MAY lead to the signature validating, but the client parsing the role: "admin" field instead.
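The divergence described above can be sketched without a real JSON parser. The following is a minimal illustration, assuming the object has already been tokenized into a list of key/value pairs; the function names `first_wins`, `last_wins`, and `duplicate_key` are hypothetical and do not come from any IPLD library.

```rust
use std::collections::{HashMap, HashSet};

/// Keep the first value seen for each key, ignoring later duplicates.
fn first_wins<'a>(entries: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    let mut map = HashMap::new();
    for &(key, value) in entries {
        map.entry(key).or_insert(value);
    }
    map
}

/// Keep the last value seen for each key, overwriting earlier ones.
fn last_wins<'a>(entries: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    let mut map = HashMap::new();
    for &(key, value) in entries {
        map.insert(key, value);
    }
    map
}

/// Return the first key that appears more than once, if any.
/// A strict parser would reject the object when this is `Some`.
fn duplicate_key<'a>(entries: &[(&'a str, &'a str)]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    entries
        .iter()
        .map(|&(key, _)| key)
        .find(|key| !seen.insert(*key))
}
```

Given the pairs `("role", "user")` and `("role", "admin")`, the two strategies disagree on the value of `role`, which is exactly the gap an attacker relies on. Rejecting any input for which `duplicate_key` returns a key closes it.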
The above can be quite subtle. Here is a step by step example of one such scenario.
An application receives some block of data, as binary. It checks the claimed CID, which validates.
%x7ba202022726f6c65223a202275736572222ca202022726f6c65223a202261646d696e222ca2020226c696e6b73223a205ba202020207b222f223a20226261666b72656964623271336b7467746c6d3579696f3762756a337379707967686a7466683565726e737465716d616b66347032633562776d7969227d2c20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020207b222f223a20226261666b72656963373579646735766b773332346f716b636d716c74667663336b6976796e67716b69626a6f7973647769696c616b68347a356665227d2ca202020207b222f223a20226261666b726569666664697a3672616634367a727233623275737566677a35666f34346167676d6f637a347a61707072366b6868686c6a63647079227da20205d2ca202022736967223a2022387566615339773343474e386362515455536f4c31693765614b69574c53587344324c625a566d764d397a4622a7d
Decoded to a string, the above reads as follows:
"{\n
"role": "user",\n
"role": "admin",\n
"links": [\n
{"/": "bafkreidb2q3ktgtlm5yio7buj3sypyghjtfh5ernsteqmakf4p2c5bwmyi"},\n
{"/": "bafkreic75ydg5vkw324oqkcmqltfvc3kivyngqkibjoysdwiilakh4z5fe"},\n
{"/": "bafkreiffdiz6raf46zrr3b2usufgz5fo44aggmocz4zappr6khhhljcdpy"}\n
],\n
"sig": "8ufaS9w3CGN8cbQTUSoL1i7eaKiWLSXsD2LbZVmvM9zF"\n
}"
Note that the JSON above contains a duplicate role key and a sig field with a base64 signature.
Next, the application parses the JSON with the browser's native JSON parser.
{
"role": "admin", // Picked the second key
"links": [
{"/": "bafkreidb2q3ktgtlm5yio7buj3sypyghjtfh5ernsteqmakf4p2c5bwmyi"},
{"/": "bafkreic75ydg5vkw324oqkcmqltfvc3kivyngqkibjoysdwiilakh4z5fe"},
{"/": "bafkreiffdiz6raf46zrr3b2usufgz5fo44aggmocz4zappr6khhhljcdpy"}
],
"sig": "8ufaS9w3CGN8cbQTUSoL1i7eaKiWLSXsD2LbZVmvM9zF"
}
The application MUST check the signature of all fields minus the sig field. Under the assumption that the binary input was safe, and that canonicalization allows for the deterministic manipulation of the payload, the object is parsed to an internal IPLD representation using Rust/Wasm.
Ipld::Assoc([
("role", Ipld::String("user")),
(
"links",
Ipld::Array([
Ipld::Cid("bafkreidb2q3ktgtlm5yio7buj3sypyghjtfh5ernsteqmakf4p2c5bwmyi"),
Ipld::Cid("bafkreic75ydg5vkw324oqkcmqltfvc3kivyngqkibjoysdwiilakh4z5fe"),
Ipld::Cid("bafkreiffdiz6raf46zrr3b2usufgz5fo44aggmocz4zappr6khhhljcdpy"),
]),
),
(
"sig",
Ipld::Binary([
%xf2, %xe7, %xda, %x4b, %xdc, %x37, %x08, %x63, %x7c, %x71, %xb4, %x13, %x51, %x2a,
%x0b, %xd6, %x2e, %xde, %x68, %xa8, %x96, %x2d, %x25, %xec, %x0f, %x62, %xdb, %x65,
%x59, %xaf, %x33, %xdc, %xc5,
]),
),
]);
Note that the IPLD parser has dropped the role: "admin" key.
The "sig" field is then removed, and the remaining fields serialized to binary:
Ipld::DagJson::serialize(
Ipld::Assoc([
("role", Ipld::String("user")),
(
"links",
Ipld::Array([
Ipld::Cid("bafkreidb2q3ktgtlm5yio7buj3sypyghjtfh5ernsteqmakf4p2c5bwmyi"),
Ipld::Cid("bafkreic75ydg5vkw324oqkcmqltfvc3kivyngqkibjoysdwiilakh4z5fe"),
Ipld::Cid("bafkreiffdiz6raf46zrr3b2usufgz5fo44aggmocz4zappr6khhhljcdpy"),
]),
)
])
);
The signature is then checked against the above fields, which passes since there's only a role: "user" entry. The application then uses the original JSON with the role: "admin" entry.
Data that has already been parsed to an in-memory IPLD representation can be canonically encoded trivially: it has already been through a parser / validator.
Data purporting to conform to an IPLD encoding (such as DAG-JSON) MUST be validated prior to signature verification. This MAY be as simple as round-trip decoding/encoding the JSON and checking that the hash matches. A validation error MUST be signalled if it does not match.
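The round-trip check can be sketched as follows. This is an illustration under loud assumptions: the `key=value` line format and the `decode`/`encode` functions are a toy stand-in for a real codec such as DAG-JSON, and the sketch compares the re-encoded bytes directly rather than their hashes, which is an equivalent but stricter test.

```rust
use std::collections::BTreeMap;

/// Toy decoder (hypothetical format): one `key=value` pair per line.
/// Like many JSON parsers, it silently keeps the last duplicate key.
fn decode(bytes: &[u8]) -> Option<BTreeMap<String, String>> {
    let text = std::str::from_utf8(bytes).ok()?;
    let mut map = BTreeMap::new();
    for line in text.lines() {
        let (key, value) = line.split_once('=')?;
        map.insert(key.to_string(), value.to_string());
    }
    Some(map)
}

/// Canonical encoder: keys in sorted order, one pair per line.
fn encode(map: &BTreeMap<String, String>) -> Vec<u8> {
    let mut out = String::new();
    for (key, value) in map {
        out.push_str(key);
        out.push('=');
        out.push_str(value);
        out.push('\n');
    }
    out.into_bytes()
}

/// Accept the input only if decoding and re-encoding reproduces it exactly.
fn validate_round_trip(bytes: &[u8]) -> Result<BTreeMap<String, String>, &'static str> {
    let decoded = decode(bytes).ok_or("malformed input")?;
    if encode(&decoded) == bytes {
        Ok(decoded)
    } else {
        Err("input is not in canonical form")
    }
}
```

An input with a duplicated key decodes to a single entry, so its re-encoding is shorter than the original and the check fails. The same holds for input whose keys are not in canonical order.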
[Implementers] may provide an opt-in for systems where round-trip determinism is a desireable [sic] feature and backward compatibility with old, non-strict data is unnecessary.
As it is critical for guarding against various attacks, the assumptions around canonical encoding MUST be enforced.
Rather than validating the inline IPLD, the data MAY instead be replaced with a CID link to the content. Note that while this is very safe (as it is impractical to alter a signed hash), this approach mixes data layout with security, and may have performance, disk, and networking impacts.
Signing CIDs has two additional caching consequences:
- Signing CIDs enables a simple strategy for caching validation by CID.
- Such a strategy also MAY require accounting for revocation of the signing keys themselves. In this case, the cache would need to include additional information about the signing key.
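One way to picture such a cache is sketched below. It is an assumption-laden illustration, not part of the specification: `ValidationCache`, its methods, and the use of plain strings for the CID and key identifier are all hypothetical. Keying on both the CID and the signing key lets entries be dropped when a key is revoked.

```rust
use std::collections::HashMap;

/// Cache of signature checks, keyed by (payload CID, signing key identifier).
struct ValidationCache {
    results: HashMap<(String, String), bool>,
}

impl ValidationCache {
    fn new() -> Self {
        ValidationCache { results: HashMap::new() }
    }

    /// Return the cached result, or run `verify` once and remember its answer.
    fn check(&mut self, cid: &str, key_id: &str, verify: impl FnOnce() -> bool) -> bool {
        *self
            .results
            .entry((cid.to_string(), key_id.to_string()))
            .or_insert_with(verify)
    }

    /// Forget every result produced with a key that has since been revoked.
    fn revoke_key(&mut self, key_id: &str) {
        self.results
            .retain(|(_, cached_key), _| cached_key.as_str() != key_id);
    }
}
```

The `verify` closure stands in for the real signature check, so it only runs on a cache miss.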
Canonicalization is not required if data is encoded as raw bytes (multicodec %x55). The exact bytes are already present, and MUST NOT be changed.
After being decoded from unsigned varints, a varsig includes the following segments:
varsig = multibase-prefix %x34 varsig-header varsig-body
multibase-prefix = ALPHA ; Multibase
varsig-header = unsigned-varint ; Usually the public key code from Multicodec
varsig-body = *OCTET; Zero or more segments required by the kind of varsig (e.g. raw bytes, hash algorithm, etc)
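Since every segment is carried as an unsigned varint, a small encoder and decoder is the first building block of any implementation. The sketch below assumes the multiformats unsigned-varint layout (7 bits per byte, least significant group first) and its 9-byte limit. The function names are illustrative, and the decoder does not reject non-minimal encodings, which a strict implementation should.

```rust
/// Encode `value` as an unsigned varint: 7 bits per byte, least significant
/// group first, with the high bit set on every byte except the last.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Decode one unsigned varint from the front of `bytes`.
/// Returns the value and the remaining bytes, or `None` if the input is
/// truncated or the varint runs past 9 bytes.
fn decode_varint(bytes: &[u8]) -> Option<(u64, &[u8])> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(9) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, &bytes[i + 1..]));
        }
    }
    None
}
```

For instance, the code %x1205 encodes to the two bytes %x85 %x24, and %xED encodes to %xED %x01.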
For example, here is an EdDSA signature for some content encoded as DAG-PB:
%x34ed01ae3784f03f9ee1163382fa6efa73b0c31ecf58c899c836709303ba4621d1e6df20e09aaa568914290b7ea124f5b38e70b9b69c7de0d216880eac885edd41c302
The varsig prefix MUST be %x34.
The prefix of the signature algorithm. This is often the multicodec of the associated public key, but MAY be unique for the signature type. The code MAY live outside the multicodec table. This field MUST act as a discriminant for how many expected fields come in the varsig body, and what each of them means.
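Reading the prefix and header can be sketched as below. The sketch assumes any multibase wrapping has already been removed so that the input starts at %x34, and it only recognizes the header codes that appear in this document. The `Algorithm` enum and `parse_header` are hypothetical names.

```rust
/// Signature algorithms named in this document, keyed by header code.
#[derive(Debug, PartialEq)]
enum Algorithm {
    RsaPkcs1v15, // %x1205
    Ed25519,     // %xED
    P256,        // %x1200
    Secp256k1,   // %xE7
    P521,        // %x1202
}

/// Decode one unsigned varint, returning the value and the remaining bytes.
fn decode_varint(bytes: &[u8]) -> Option<(u64, &[u8])> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(9) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, &bytes[i + 1..]));
        }
    }
    None
}

/// Check the %x34 prefix, decode the header varint, and return the algorithm
/// together with the still-unparsed varsig body.
fn parse_header(bytes: &[u8]) -> Result<(Algorithm, &[u8]), &'static str> {
    let (&prefix, rest) = bytes.split_first().ok_or("empty input")?;
    if prefix != 0x34 {
        return Err("missing varsig prefix %x34");
    }
    let (code, body) = decode_varint(rest).ok_or("truncated header varint")?;
    let algorithm = match code {
        0x1205 => Algorithm::RsaPkcs1v15,
        0xED => Algorithm::Ed25519,
        0x1200 => Algorithm::P256,
        0xE7 => Algorithm::Secp256k1,
        0x1202 => Algorithm::P521,
        _ => return Err("unsupported signature algorithm"),
    };
    Ok((algorithm, body))
}
```

The returned algorithm then tells the caller which segments to expect in the body, which is the discriminant role described above.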
The varsig body MUST consist of one or more segments, and MUST be defined by the signature algorithm.
Some examples include:
- Raw signature bytes only
- CID of DKIM certification transparency record, and raw signature bytes
- Hash algorithm multicodec prefix, data encoding prefix, signature counter, nonce, HMAC, and raw signature bytes
The IPLD data model is encoding agnostic by design. This is convenient in many applications, since data can be freely converted between encodings for transmission or storage. Signatures, however, must be computed over specific bytes, and thus over a specific encoding of the data.
To facilitate this, the type encoding-info MAY be used:
encoding-info = %x5F ; Single verbatim payload (without key)
/ %x70 ; DAG-PB multicodec prefix
/ %x71 ; DAG-CBOR multicodec prefix
/ %x0129 ; DAG-JSON multicodec prefix
/ %x6A77 ; JWT
/ %xE191 encoding-info ; EIP-191
message-byte-length = unsigned-varint
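The grammar above is recursive (EIP-191 wraps another encoding-info), which a small recursive parser makes concrete. This sketch follows the grammar as written and works on values that have already been decoded from unsigned varints. The `EncodingInfo` enum and `parse_encoding_info` are hypothetical names.

```rust
/// The encoding-info alternatives listed in the grammar.
#[derive(Debug, PartialEq)]
enum EncodingInfo {
    Verbatim,                  // %x5F
    DagPb,                     // %x70
    DagCbor,                   // %x71
    DagJson,                   // %x0129
    Jwt,                       // %x6A77
    Eip191(Box<EncodingInfo>), // %xE191 followed by a nested encoding-info
}

/// Parse one encoding-info from already-decoded varint values.
/// Returns the parsed value and the codes that follow it, or `None` if the
/// leading code is unknown or a nested encoding-info is missing.
fn parse_encoding_info(codes: &[u64]) -> Option<(EncodingInfo, &[u64])> {
    let (&code, rest) = codes.split_first()?;
    match code {
        0x5F => Some((EncodingInfo::Verbatim, rest)),
        0x70 => Some((EncodingInfo::DagPb, rest)),
        0x71 => Some((EncodingInfo::DagCbor, rest)),
        0x0129 => Some((EncodingInfo::DagJson, rest)),
        0x6A77 => Some((EncodingInfo::Jwt, rest)),
        0xE191 => {
            // EIP-191 wraps another encoding, so recurse on what follows.
            let (inner, rest) = parse_encoding_info(rest)?;
            Some((EncodingInfo::Eip191(Box::new(inner)), rest))
        }
        _ => None,
    }
}
```

Returning the unconsumed codes lets the caller continue with whatever segment the signature algorithm places after the encoding info.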
To manage this, it is RECOMMENDED that varsig types include a nested encoding multiformat. For example, here's a 2048-bit RS256 signature over some DAG-CBOR:
;     RSA      256-bytes     sig-bytes
;      |           |             |
;      v           v             v
%x34 %x1205 %x12 %x0100 %x71 256(OCTET)
; ^          ^           ^
; |          |           |
;varsig   SHA-256    DAG-CBOR
And another showing data signed with EIP-191:
; secp256k1   EIP-191
;     |          |
;     v          v
%x34 %xE7 %x1B %xE191 64(OCTET)
; ^         ^             ^
; |         |             |
;varsig keccak-256    sig-bytes
Note that in the above examples, additional information MAY be nested inside the encoding info section, depending on the definition of the encoding info.
Below are a few common signature headers and their fields.
RSASSA-PKCS #1 v1.5 signatures MUST include the following segments:
rsa-varsig = rsa-varsig-header rsa-hash-algorithm signature-byte-length encoding-info sig-bytes
rsa-varsig-header = %x1205 ; RSASSA-PKCS #1 v1.5
rsa-hash-algorithm = unsigned-varint
signature-byte-length = unsigned-varint
encoding-info = 1*unsigned-varint ; Number of segments defined by the encoding header
sig-bytes = *OCTET
Segment | Hexadecimal | Unsigned Varint | Comment |
---|---|---|---|
rsa-varsig-header | %x1205 | %x8524 | RSASSA-PKCS #1 v1.5 multicodec prefix |
rsa-hash-algorithm | %x12 | %x12 | SHA2-256 multicodec prefix |
Segment | Hexadecimal | Unsigned Varint | Comment |
---|---|---|---|
rsa-varsig-header | %x1205 | %x8524 | RSASSA-PKCS #1 v1.5 multicodec prefix |
rsa-hash-algorithm | %x13 | %x13 | SHA2-512 multicodec prefix |
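Putting the RSA segments together, an RS256 varsig over DAG-CBOR could be assembled roughly as follows. This is a sketch only: `rs256_dag_cbor_varsig` is a hypothetical helper, it takes signature bytes produced elsewhere, and it omits any multibase wrapping.

```rust
/// Encode `value` as an unsigned varint (7 bits per byte, low group first).
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Assemble the bytes of an RS256 varsig whose payload was encoded as DAG-CBOR.
/// `signature` is the raw RSASSA-PKCS #1 v1.5 signature (256 bytes for a
/// 2048-bit key); it is not checked here.
fn rs256_dag_cbor_varsig(signature: &[u8]) -> Vec<u8> {
    let mut out = vec![0x34]; // varsig prefix
    out.extend(encode_varint(0x1205)); // rsa-varsig-header
    out.extend(encode_varint(0x12)); // rsa-hash-algorithm: SHA2-256
    out.extend(encode_varint(signature.len() as u64)); // signature-byte-length
    out.extend(encode_varint(0x71)); // encoding-info: DAG-CBOR
    out.extend_from_slice(signature); // sig-bytes
    out
}
```

With a 256-byte signature, the result starts with the seven bytes %x34 %x85 %x24 %x12 %x80 %x02 %x71, where %x8002 is the varint form of the length %x0100.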
ed25519-varsig = ed25519-varsig-header encoding-info sig-bytes
ed25519-varsig-header = %xED ; Ed25519 multicodec prefix
encoding-info = 1*unsigned-varint
sig-bytes = 64(OCTET)
Segment | Hexadecimal | Unsigned Varint | Comment |
---|---|---|---|
ed25519-varsig-header | %xED | %xED01 | Ed25519 key multicodec prefix |
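Going the other way, an Ed25519 varsig can be split into its parts. The sketch below assumes the input starts at the %x34 prefix and that the encoding-info is a single varint with no nested segments. `parse_ed25519_varsig` is a hypothetical name.

```rust
use std::convert::TryInto;

/// Decode one unsigned varint, returning the value and the remaining bytes.
fn decode_varint(bytes: &[u8]) -> Option<(u64, &[u8])> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(9) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, &bytes[i + 1..]));
        }
    }
    None
}

/// Split an Ed25519 varsig into its encoding code and 64 signature bytes.
/// Assumes a single-varint encoding-info (no nested segments).
fn parse_ed25519_varsig(bytes: &[u8]) -> Result<(u64, [u8; 64]), &'static str> {
    // %x34 is the varsig prefix; %xED %x01 is the varint form of %xED.
    let rest = bytes
        .strip_prefix(&[0x34u8, 0xED, 0x01])
        .ok_or("not an Ed25519 varsig")?;
    let (encoding, sig) = decode_varint(rest).ok_or("truncated encoding-info")?;
    let sig: [u8; 64] = sig
        .try_into()
        .map_err(|_| "signature must be exactly 64 bytes")?;
    Ok((encoding, sig))
}
```

Because sig-bytes is fixed at 64 octets for Ed25519, any other length is rejected outright instead of being passed on to the verifier.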
ECDSA defines a general mechanism over many elliptic curves.
ecdsa-varsig = ecdsa-varsig-header ecdsa-hash-algorithm encoding-info sig-bytes
ecdsa-varsig-header = unsigned-varint
ecdsa-hash-algorithm = unsigned-varint
encoding-info = 1*unsigned-varint
sig-bytes = *OCTET
Here are a few examples encoded as varsig:
es256-varsig = es256-varsig-header es256-hash-algorithm encoding-info sig-bytes
es256-varsig-header = %x1200 ; P-256 multicodec prefix
es256-hash-algorithm = %x12 ; SHA2-256
encoding-info = 1*unsigned-varint
sig-bytes = 64(OCTET)
Segment | Hexadecimal | Unsigned Varint | Comment |
---|---|---|---|
es256-varsig-header | %x1200 | %x8024 | P-256 multicodec prefix |
es256-hash-algorithm | %x12 | %x12 | SHA2-256 multicodec prefix |
es256k-varsig = es256k-varsig-header es256k-hash-algorithm encoding-info sig-bytes
es256k-varsig-header = %xe7 ; secp256k1 multicodec prefix
es256k-hash-algorithm = %x12 ; SHA2-256
encoding-info = 1*unsigned-varint
sig-bytes = 64(OCTET)
Segment | Hexadecimal | Unsigned Varint | Comment |
---|---|---|---|
es256k-varsig-header | %xE7 | %xE701 | secp256k1 multicodec prefix |
es256k-hash-algorithm | %x12 | %x12 | SHA2-256 multicodec prefix |
es512-varsig = es512-varsig-header es512-hash-algorithm encoding-info sig-bytes
es512-varsig-header = %x1202 ; P-521 multicodec prefix
es512-hash-algorithm = %x13 ; SHA2-512
encoding-info = 1*unsigned-varint
sig-bytes = 132(OCTET)
Segment | Hexadecimal | Unsigned Varint | Comment |
---|---|---|---|
es512-varsig-header | %x1202 | %x8224 | P-521 multicodec prefix |
es512-hash-algorithm | %x13 | %x13 | SHA2-512 multicodec prefix |