How can a branched ligand chain be encoded for a single glycan location in a JSON file? #288

maxh190 · 2025-02-04T18:19:11Z

Hi,

Many thanks for releasing the AF3 source!

The 7BBV example in #54, whose JSON file includes several single-residue glycans, works correctly. My question is how to specify multiple glycan residues at a single glycan location.

For example, how can I encode the following branched ligand chain for a single glycan location to run AF3 on a local server?
NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))): branched ligand chain

I appreciate your guidance on this.

Augustin-Zidek · 2025-02-05T14:38:50Z

Hi!

This is documented in https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md#defining-glycans.

For your example, you will need to do something like this:

"sequences": [
    # Your protein sequence(s) would go here, let's assume it has ID "P".
    {
      "ligand": {
        "id": ["A", "B"], "ccdCodes": ["NAG"]
      }
    },
    {
      "ligand": {
        "id": ["C", "D", "E", "F", "G", "H"],
        "ccdCodes": ["MAN"]
      }
    }
  ],
  "bondedAtomPairs": [
    # The protein - NAG bond.
    [["P", <res num>, <atom name>], ["A", 1, <atom name>]],
    # The NAG - NAG bond.
    [["A", 1, <atom name>], ["B", 1, <atom name>]],
    # The NAG - MAN bond.
    [["B", 1, <atom name>], ["C", 1, <atom name>]],
    # MAN - MAN bonds.
    [["C", 1, <atom name>], ["D", 1, <atom name>]],
    [["D", 1, <atom name>], ["E", 1, <atom name>]],
    [["D", 1, <atom name>], ["F", 1, <atom name>]],
    [["F", 1, <atom name>], ["G", 1, <atom name>]],
    [["F", 1, <atom name>], ["H", 1, <atom name>]],
  ],

maxh190 · 2025-02-05T23:19:38Z

Thanks a lot for your suggestions! However, I didn’t get the expected glycan predictions using the following JSON file:

{
  "name": "WK24_Glycan_monomer",
  "modelSeeds": [1],
  "sequences": [
  {
    "protein": {
      "id": "A",
      "sequence": "WVTVYYGVPVWKEAKTTLFCASDAKAYEKEVHNVWATHACVPTDPNPQEMVLKNVTENFNMWKNDMVDQMHEDVISLWDQSLKPCVKLTPLCVTLNCTNATANATASNSSIIEGMKNCSFNITTELRDKREKKNALFYKLDIVQLDGNSSQYRLINCNTSVITQACPKVSFDPIPIHYCAPAGYAILKCNNKTFTGTGPCNNVSTVQCTHGIKPVVSTQLLLNGSLAEGEIIIRSENITNNGKTIIVQLNESVKIECTRPNNKTRTSIRIGPGQAFYATGQVIGNIREAYCNISESKWNETLQRVSKKLKEYFPHKNITFQPSSGGDLEITTHSFNCGGEFFYCNTSSLFNRTYMANSTDMANSTETNSTRIITIHCRIKQIINMWQEVGRAMYAPPIAGNITCISNITGLLLTRDGGKNNTDTETFRPGGGNMKDNWRSELYKYKVVEVKPLGVAPTNARRRVV"
    }
  },
  {
    "protein": {
      "id": "B",
      "sequence": "RAVGMGAVFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQSNLLKAIEAQQHMLKLTVWGIKQLQARVLALERYLKDQQLLGMWGCSGKLICTTNVYWNSSWSNKTYGDIWDNMTWMQWEREISNYTEIIYELLEESQNQQEKNEQDLLALD"
    }
  },
  {
    "protein": {
      "id": "C",
      "sequence": "QVQLVQSGAEVKKPGASVTVSCQASGYTFTNYYVHWVRQAPGQGLQLMGWIDPSWGRTNYAQNFQGRITMTRDTSTSTVYMEMRSLRSEDTAVYYCARNVATEGSLLHYDYWGQGTLVTVSA"
    }
  },
  {
    "protein": {
      "id": "D",
      "sequence": "EIVLTQSPATLSVSPGERATLSCRASQSVRSNLAWYQQRPGQAPRLLIYGTSTRATGVPARFSGRGSGTEFTLAISSMQSEDFAVYLCLQYNNWWTFGQGTKVEIK"
    }
  },

  {
    "ligand": {
      "id": ["O", "P"], "ccdCodes": ["NAG"]
    }
  },
  {
    "ligand": {
      "id": ["Q", "R", "S", "T", "U", "V"],
      "ccdCodes": ["MAN"]
    }
  }
  ],

  "bondedAtomPairs": [
    [["B", 101, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]],
    [["B", 106, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]],
    [["B", 115, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]],
    [["B", 127, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]]
  ],

  "dialect": "alphafold3",
  "version": 2
}

Would you please review the JSON file above and check for any errors?

Augustin-Zidek · 2025-02-06T10:34:00Z

The problem is that you are reusing the NAG and MAN ligands. You will have to define them separately for each glycan.

I.e. you will need 4 * 2 = 8 NAG ligands, and 4 * 6 = 24 MAN ligands. Then e.g. on position 101 you will use the first 2 NAGs + 6 MANs, on position 106 the next 2 + 6 and so on.

The current definition bonds everything together, creating a single glycan shared by the 4 protein residues.

I will improve the bond checking code though to help catch such cases (e.g. fail on duplicate bond definitions).
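As a rough illustration of "define them separately for each glycan", the sketch below generates one set of ligand entries and bonds per glycosylation site. It is not part of AlphaFold 3: `GLYCAN`, `chain_ids`, and `build_glycans` are hypothetical helper names, each residue is written as its own ligand entry with a single string ID, and the atom names are left as the `<atom name>` placeholder, which has to be replaced with real CCD atom names for every bond.

```python
import itertools
import string

PLACEHOLDER = "<atom name>"  # assumption: fill in real CCD atom names per bond

# NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))) as (ccd_code, parent_index) pairs.
# parent_index None means "bonded to the protein residue".
GLYCAN = [
    ("NAG", None), ("NAG", 0), ("MAN", 1), ("MAN", 2),
    ("MAN", 3), ("MAN", 3), ("MAN", 5), ("MAN", 5),
]


def chain_ids():
    """Yields A..Z, then AA..ZZ, then AAA.., using upper-case letters only."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_uppercase, repeat=length):
            yield "".join(letters)


def build_glycans(protein_id, sites, used_ids):
    """Returns (ligand entries, bonded atom pairs) with fresh IDs per site."""
    free_ids = (i for i in chain_ids() if i not in used_ids)
    ligands, bonds = [], []
    for res_num in sites:
        # Every site gets its own 8 ligand IDs, so no residue is shared.
        site_ids = [next(free_ids) for _ in GLYCAN]
        for lig_id, (ccd, parent) in zip(site_ids, GLYCAN):
            ligands.append({"ligand": {"id": lig_id, "ccdCodes": [ccd]}})
            if parent is None:
                start = [protein_id, res_num, PLACEHOLDER]
            else:
                start = [site_ids[parent], 1, PLACEHOLDER]
            bonds.append([start, [lig_id, 1, PLACEHOLDER]])
    return ligands, bonds


# Four sites on chain B; IDs A-D are already taken by the protein chains.
ligands, bonds = build_glycans(
    "B", [101, 106, 115, 127], used_ids={"A", "B", "C", "D"}
)
```

The returned `ligands` list would be appended to `"sequences"` and `bonds` used as `"bondedAtomPairs"`; for the four sites above this gives 8 NAG and 24 MAN entries.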

Augustin-Zidek · 2025-02-06T14:32:00Z

I've added a check for bond uniqueness in a3cf058.
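For anyone who wants to catch this before running a job, here is a minimal pre-check sketch. It is not the check added in the commit above, whose exact behaviour is not shown in this thread; `duplicate_bonds` is a hypothetical helper, and treating a bond and its reverse as the same bond is an assumption.

```python
from collections import Counter


def duplicate_bonds(bonded_atom_pairs):
    """Returns the bonds that are listed more than once.

    Each bond is [[chain_id, res_num, atom], [chain_id, res_num, atom]].
    Assumption: direction does not matter, so A-B and B-A count as equal.
    """
    counts = Counter(
        frozenset(tuple(atom) for atom in bond) for bond in bonded_atom_pairs
    )
    return [tuple(sorted(bond)) for bond, n in counts.items() if n > 1]


# The bond list from the JSON file above: the same 7 glycan bonds are
# repeated for each of the 4 protein residues.
glycan_edges = [("O", "P"), ("P", "Q"), ("Q", "R"), ("R", "S"),
                ("R", "T"), ("T", "U"), ("T", "V")]
example_bonds = []
for res_num in (101, 106, 115, 127):
    example_bonds.append([["B", res_num, "CA"], ["O", 1, "CA"]])
    example_bonds += [[[a, 1, "CA"], [b, 1, "CA"]] for a, b in glycan_edges]

repeated = duplicate_bonds(example_bonds)
```

Here `repeated` contains the 7 glycan-internal bonds, each of which appears 4 times in the file; the 4 protein-to-NAG bonds are unique because the residue number differs.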

maxh190 · 2025-02-06T14:32:11Z

Many thanks for your guidance!

If I need to define the NAG and MAN ligands for each glycan, there won’t be enough available letters. For example, with 30 glycans in a protein chain, I would need 8 × 30 = 240 letters. AlphaFold Server can handle glycans like NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))) automatically, but I’m unsure how it processes multiple branched ligand chains.

Augustin-Zidek · 2025-02-06T15:11:10Z

> If I need to define the NAG and MAN ligands for each glycan, there won’t be enough available letters.

Use double-letter IDs, e.g. AA, AB, AC, AD, ....

> AlphaFold Server can handle glycans like NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))) automatically, but I’m unsure how it processes multiple branched ligand chains.

Yes, AlphaFold Server uses a different data pipeline that allows this. This feature is alas not available in the standalone AlphaFold 3.

maxh190 · 2025-02-06T19:41:42Z

Thanks again!

Using double-letter IDs for the ligands of multiple glycans is a good idea, but they run out if there are more than 100 glycans in a multi-chain protein. Can I use three-letter or even four-letter IDs if double-letter IDs are insufficient?

Augustin-Zidek · 2025-02-07T11:41:41Z

> Can I use three-letter or even four-letter IDs if double-letter IDs are insufficient?

Yes, you can as long as all are upper-case letters A-Z. Hopefully 26 + 26^2 + 26^3 + 26^4 will be enough for your use-case. :)
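To put numbers on this, a small sketch that counts the available IDs and finds the shortest ID length that covers a given need. The function names are hypothetical, and the figure of 8 IDs per glycan assumes one ligand ID per residue of the 2 NAG + 6 MAN glycan discussed in this thread.

```python
def available_ids(max_len: int) -> int:
    """Number of upper-case A-Z IDs with 1..max_len letters."""
    return sum(26 ** n for n in range(1, max_len + 1))


def min_id_length(n_ids_needed: int) -> int:
    """Shortest maximum ID length that provides at least n_ids_needed IDs."""
    length = 1
    while available_ids(length) < n_ids_needed:
        length += 1
    return length


ids_per_glycan = 8  # 2 NAG + 6 MAN, one ID per residue (assumption)

# 30 glycans need 240 IDs: more than 26, but within the 702 one- and
# two-letter IDs. 100 glycans need 800 IDs, which requires three letters.
length_for_30 = min_id_length(30 * ids_per_glycan)
length_for_100 = min_id_length(100 * ids_per_glycan)
total_up_to_four = available_ids(4)  # 26 + 26^2 + 26^3 + 26^4
```

Protein chain IDs come out of the same pool, so the real requirement is slightly higher than the glycan count alone suggests.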

maxh190 · 2025-02-11T14:17:51Z

Thanks a lot!!

maxh190 changed the title ~~How to add multiple glycan residues?~~ How can a branched ligand chain be encoded for a single glycan location in a JSON file? Feb 4, 2025

Augustin-Zidek added the question Further information is requested label Feb 5, 2025

Augustin-Zidek closed this as completed Feb 10, 2025
