How can a branched ligand chain be encoded for a single glycan location in a JSON file? #288

Closed
maxh190 opened this issue Feb 4, 2025 · 9 comments
Labels
question Further information is requested

Comments

@maxh190

maxh190 commented Feb 4, 2025

Hi,

Many thanks for releasing the AF3 source!

The 7BBV example from #54, whose JSON file includes several single-residue glycans, works correctly. My question is how to specify multiple glycan residues at a single glycan location.

For example, how can I encode the following branched ligand chain for a single glycan location to run AF3 on a local server?
NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))): branched ligand chain

I appreciate your guidance on this.

@maxh190 maxh190 changed the title from "How to add multiple glycan residues?" to "How can a branched ligand chain be encoded for a single glycan location in a JSON file?" Feb 4, 2025
@Augustin-Zidek Augustin-Zidek added the question Further information is requested label Feb 5, 2025
@Augustin-Zidek
Collaborator

Hi!

This is documented in https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md#defining-glycans.

For your example, you will need to do something like this:

"sequences": [
    # Your protein sequence(s) would go here, let's assume it has ID "P".
    {
      "ligand": {
        "id": ["A", "B"], "ccdCodes": ["NAG"]
      }
    },
    {
      "ligand": {
        "id": ["C", "D", "E", "F", "G", "H"],
        "ccdCodes": ["MAN"]
      }
    }
  ],
  "bondedAtomPairs": [
    # The protein - NAG bond.
    [["P", <res num>, <atom name>], ["A", 1, <atom name>]],
    # The NAG - NAG bond.
    [["A", 1, <atom name>], ["B", 1, <atom name>]],
    # The NAG - MAN bond.
    [["B", 1, <atom name>], ["C", 1, <atom name>]],
    # MAN - MAN bonds.
    [["C", 1, <atom name>], ["D", 1, <atom name>]],
    [["D", 1, <atom name>], ["E", 1, <atom name>]],
    [["D", 1, <atom name>], ["F", 1, <atom name>]],
    [["F", 1, <atom name>], ["G", 1, <atom name>]],
    [["F", 1, <atom name>], ["H", 1, <atom name>]],
  ],
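
The nested notation from the question maps directly onto the ID and bond layout above. A minimal sketch, assuming the notation means "each parenthesised group is a child of the residue written just before it"; `parse_glycan` is a hypothetical helper for illustration and is not part of AlphaFold 3:

```python
import re


def parse_glycan(spec):
    """Parse a nested glycan string into (residues, edges).

    residues: CCD codes in order of appearance.
    edges: (parent_index, child_index) pairs, one per glycosidic bond.
    """
    tokens = re.findall(r"[A-Za-z0-9]+|[()]", spec)
    residues, edges, stack = [], [], []
    last = None  # index of the residue that a following "(" attaches to
    for tok in tokens:
        if tok == "(":
            if last is None:
                raise ValueError("'(' must follow a residue code")
            stack.append(last)
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            # After closing a branch, further branches attach to the same parent.
            last = stack.pop()
        else:
            residues.append(tok)
            idx = len(residues) - 1
            if stack:
                edges.append((stack[-1], idx))
            last = idx
    if stack:
        raise ValueError("unbalanced '('")
    return residues, edges


residues, edges = parse_glycan("NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN)))))")
ids = "ABCDEFGH"  # same single-letter IDs as in the template above
for parent, child in edges:
    print(f"{ids[parent]} ({residues[parent]}) - {ids[child]} ({residues[child]})")
```

With IDs A to H assigned in order of appearance, the printed pairs are A-B, B-C, C-D, D-E, D-F, F-G and F-H, which is the same bond list as in the template. The atom names for each bond still have to be filled in by hand.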

@maxh190
Author

maxh190 commented Feb 5, 2025

Thanks a lot for your suggestions! However, I didn’t get the expected glycan predictions using the following JSON file:

{
  "name": "WK24_Glycan_monomer",
  "modelSeeds": [1],
  "sequences": [
  {
    "protein": {
      "id": "A",
      "sequence": "WVTVYYGVPVWKEAKTTLFCASDAKAYEKEVHNVWATHACVPTDPNPQEMVLKNVTENFNMWKNDMVDQMHEDVISLWDQSLKPCVKLTPLCVTLNCTNATANATASNSSIIEGMKNCSFNITTELRDKREKKNALFYKLDIVQLDGNSSQYRLINCNTSVITQACPKVSFDPIPIHYCAPAGYAILKCNNKTFTGTGPCNNVSTVQCTHGIKPVVSTQLLLNGSLAEGEIIIRSENITNNGKTIIVQLNESVKIECTRPNNKTRTSIRIGPGQAFYATGQVIGNIREAYCNISESKWNETLQRVSKKLKEYFPHKNITFQPSSGGDLEITTHSFNCGGEFFYCNTSSLFNRTYMANSTDMANSTETNSTRIITIHCRIKQIINMWQEVGRAMYAPPIAGNITCISNITGLLLTRDGGKNNTDTETFRPGGGNMKDNWRSELYKYKVVEVKPLGVAPTNARRRVV"
    }
  },
  {
    "protein": {
      "id": "B",
      "sequence": "RAVGMGAVFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQSNLLKAIEAQQHMLKLTVWGIKQLQARVLALERYLKDQQLLGMWGCSGKLICTTNVYWNSSWSNKTYGDIWDNMTWMQWEREISNYTEIIYELLEESQNQQEKNEQDLLALD"
    }
  },
  {
    "protein": {
      "id": "C",
      "sequence": "QVQLVQSGAEVKKPGASVTVSCQASGYTFTNYYVHWVRQAPGQGLQLMGWIDPSWGRTNYAQNFQGRITMTRDTSTSTVYMEMRSLRSEDTAVYYCARNVATEGSLLHYDYWGQGTLVTVSA"
    }
  },
  {
    "protein": {
      "id": "D",
      "sequence": "EIVLTQSPATLSVSPGERATLSCRASQSVRSNLAWYQQRPGQAPRLLIYGTSTRATGVPARFSGRGSGTEFTLAISSMQSEDFAVYLCLQYNNWWTFGQGTKVEIK"
    }
  },

  {
    "ligand": {
      "id": ["O", "P"], "ccdCodes": ["NAG"]
    }
  },
  {
    "ligand": {
      "id": ["Q", "R", "S", "T", "U", "V"],
      "ccdCodes": ["MAN"]
    }
  }
  ],

  "bondedAtomPairs": [
    [["B", 101, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]],
    [["B", 106, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]],
    [["B", 115, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]],
    [["B", 127, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["P", 1, "CA"], ["Q", 1, "CA"]],
    [["Q", 1, "CA"], ["R", 1, "CA"]],
    [["R", 1, "CA"], ["S", 1, "CA"]],
    [["R", 1, "CA"], ["T", 1, "CA"]],
    [["T", 1, "CA"], ["U", 1, "CA"]],
    [["T", 1, "CA"], ["V", 1, "CA"]]
  ],

  "dialect": "alphafold3",
  "version": 2
}

Would you please review the JSON file above and check for any errors?

@Augustin-Zidek
Collaborator

The problem is that you are reusing the NAG and MAN ligands. You will have to define them separately for each glycan.

I.e. you will need 4 * 2 = 8 NAG ligands, and 4 * 6 = 24 MAN ligands. Then e.g. on position 101 you will use the first 2 NAGs + 6 MANs, on position 106 the next 2 + 6 and so on.

The current definition bonds everything together, creating a single glycan shared by the 4 protein residues.
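
One way to produce a separate set of ligands per glycosylation site is to generate the entries programmatically. The following is a rough sketch under stated assumptions: `build_glycans` and `chain_ids` are hypothetical helpers, the site list and protein IDs are taken from the JSON posted above, and every atom name is left as the `<atom name>` placeholder because the correct linkage atoms are not given in this thread.

```python
from itertools import count, product
from string import ascii_uppercase

# Topology of NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))) as (parent, child) indices.
RESIDUES = ["NAG", "NAG", "MAN", "MAN", "MAN", "MAN", "MAN", "MAN"]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5), (5, 6), (5, 7)]


def chain_ids():
    """Yield A..Z, then AA, AB, ... (upper-case letters only)."""
    for n in count(1):
        for letters in product(ascii_uppercase, repeat=n):
            yield "".join(letters)


def build_glycans(sites, used_ids, atom="<atom name>"):
    """Return (ligand entries, bondedAtomPairs) with one glycan copy per site."""
    fresh = (i for i in chain_ids() if i not in used_ids)
    ids_by_code = {}
    bonds = []
    for protein_id, res_num in sites:
        copy_ids = [next(fresh) for _ in RESIDUES]  # 8 new IDs for this site
        for code, lig_id in zip(RESIDUES, copy_ids):
            ids_by_code.setdefault(code, []).append(lig_id)
        # Protein residue to the first NAG of this copy.
        bonds.append([[protein_id, res_num, atom], [copy_ids[0], 1, atom]])
        # Bonds inside this copy only; nothing is shared between sites.
        for parent, child in EDGES:
            bonds.append([[copy_ids[parent], 1, atom], [copy_ids[child], 1, atom]])
    ligands = [
        {"ligand": {"id": ids, "ccdCodes": [code]}}
        for code, ids in ids_by_code.items()
    ]
    return ligands, bonds


sites = [("B", 101), ("B", 106), ("B", 115), ("B", 127)]
ligands, bonds = build_glycans(sites, used_ids={"A", "B", "C", "D"})
print(len(ligands[0]["ligand"]["id"]), "NAG IDs,",
      len(ligands[1]["ligand"]["id"]), "MAN IDs,", len(bonds), "bonds")
```

For the four sites this yields 8 NAG IDs, 24 MAN IDs and 32 bonds, matching the counts given above. The `<atom name>` placeholders must be replaced with real atom names before the output can be used.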

I will improve the bond checking code though to help catch such cases (e.g. fail on duplicate bond definitions).
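
Until such a check is available, duplicate bonds can be spotted before running a job. This is a minimal sketch of that idea, not the check implemented in AlphaFold 3; `find_duplicate_bonds` is a hypothetical name, and it assumes a bond is the same regardless of the order in which its two atoms are listed.

```python
def find_duplicate_bonds(bonded_atom_pairs):
    """Return every bond that repeats an earlier one (atom order ignored)."""
    seen = set()
    duplicates = []
    for atom_a, atom_b in bonded_atom_pairs:
        key = frozenset((tuple(atom_a), tuple(atom_b)))
        if key in seen:
            duplicates.append([atom_a, atom_b])
        seen.add(key)
    return duplicates


# First bonds of two sites from the JSON above: the O-P bond is listed twice.
pairs = [
    [["B", 101, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
    [["B", 106, "CA"], ["O", 1, "CA"]],
    [["O", 1, "CA"], ["P", 1, "CA"]],
]
for bond in find_duplicate_bonds(pairs):
    print("duplicate bond:", bond)
```

A duplicate here is a symptom of reusing ligand IDs across sites; the remedy is still to give each glycan its own ligands.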

@Augustin-Zidek
Collaborator

I've added a check for bond uniqueness in a3cf058.

@maxh190
Author

maxh190 commented Feb 6, 2025

Many thanks for your guidance!

If I need to define the NAG and MAN ligands for each glycan, there won’t be enough available letters. For example, with 30 glycans in a protein chain, I would need 8 × 30 = 240 letters. AlphaFold Server can handle glycans like NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))) automatically, but I’m unsure how it processes multiple branched ligand chains.

@Augustin-Zidek
Collaborator

Augustin-Zidek commented Feb 6, 2025

> If I need to define the NAG and MAN ligands for each glycan, there won’t be enough available letters.

Use double-letter IDs, e.g. AA, AB, AC, AD, ...

> AlphaFold Server can handle glycans like NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))) automatically, but I’m unsure how it processes multiple branched ligand chains.

Yes, AlphaFold Server uses a different data pipeline that allows this. This feature is alas not available in the standalone AlphaFold 3.

@maxh190
Author

maxh190 commented Feb 6, 2025

Thanks again!

Using double-letter IDs for the ligands of multiple glycans is a good idea, but it becomes problematic if there are more than 100 glycans in a multi-chain protein. Can I use three-letter or even four-letter IDs if double-letter IDs are insufficient?

@Augustin-Zidek
Collaborator

> Can I use three-letter or even four-letter IDs if double-letter IDs are insufficient?

Yes, you can as long as all are upper-case letters A-Z. Hopefully 26 + 26^2 + 26^3 + 26^4 will be enough for your use-case. :)
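
For a sense of scale, the sum above can be evaluated and compared with the 8 IDs that one glycan of this shape consumes. The sketch below is illustrative only: `id_capacity` and `is_valid_id` are hypothetical helpers, and the validity check encodes just the rule stated in this thread (upper-case A-Z).

```python
import re

RESIDUES_PER_GLYCAN = 8  # 2 NAG + 6 MAN, as in the glycan discussed here


def id_capacity(max_len):
    """Number of distinct IDs with 1..max_len upper-case letters."""
    return sum(26 ** n for n in range(1, max_len + 1))


def is_valid_id(chain_id):
    """True if the ID consists only of upper-case letters A-Z."""
    return re.fullmatch(r"[A-Z]+", chain_id) is not None


for max_len in (1, 2, 4):
    ids = id_capacity(max_len)
    print(f"up to {max_len} letters: {ids} IDs, "
          f"{ids // RESIDUES_PER_GLYCAN} glycans of 8 residues")
```

IDs of up to two letters give 702 IDs, enough for at most 87 such glycans before the protein chain IDs are subtracted, which is why more than 100 glycans needs three-letter IDs. Up to four letters gives 475,254 IDs.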

@maxh190
Author

maxh190 commented Feb 11, 2025

Thanks a lot!!
