Create a config.pl file in the config directory containing the database connection details for pdbe_test (database, user and password).
Create a schema under pdbe_test in the Oracle DBMS.
The main program is mapping_process.pl.
All generated files are written to the MDA_results/CATH_4_1 directory.
- Get PDB entry data from the ENTITY_CATH and ENTITY_SCOP tables (PDBEREAD_PDBE_LIVE schema)
- Import the data into SEGMENT_CATH and SEGMENT_SCOP
- Aggregate data from both tables on auth_asym_id into SEGMENT_CATH_SCOP
- Compare start/end residue numbers
- Calculate percentage and overlap for domains
- Insert the results into DOMAIN_MAPPING
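The domain comparison steps above can be sketched in Python as follows. This is a minimal illustration, not the logic of mapping_process.pl: it assumes each segment row is a hypothetical (domain_id, auth_asym_id, start, end) tuple with integer residue numbers (insertion codes ignored), and it assumes overlap is the number of shared residues as a percentage of each domain's length, since the exact definition is not given here.

```python
from collections import defaultdict


def residues(segments):
    """Return the set of residue numbers covered by a list of (start, end) segments."""
    covered = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return covered


def group_by_chain(rows):
    """Group hypothetical (domain_id, auth_asym_id, start, end) rows by chain, then domain."""
    chains = defaultdict(lambda: defaultdict(list))
    for domain_id, chain, start, end in rows:
        chains[chain][domain_id].append((start, end))
    return chains


def domain_overlap(cath_segments, scop_segments):
    """Shared residues as a percentage of the CATH domain and of the SCOP domain (assumed metric)."""
    cath = residues(cath_segments)
    scop = residues(scop_segments)
    shared = len(cath & scop)
    return 100.0 * shared / len(cath), 100.0 * shared / len(scop)


def map_domains(cath_rows, scop_rows):
    """Compare every CATH domain with every SCOP domain on the same auth_asym_id."""
    cath_by_chain = group_by_chain(cath_rows)
    scop_by_chain = group_by_chain(scop_rows)
    mapping = []
    for chain in sorted(cath_by_chain.keys() & scop_by_chain.keys()):
        for cath_id, cath_segs in cath_by_chain[chain].items():
            for scop_id, scop_segs in scop_by_chain[chain].items():
                pct_cath, pct_scop = domain_overlap(cath_segs, scop_segs)
                if pct_cath > 0:  # keep only pairs that actually share residues
                    mapping.append((chain, cath_id, scop_id, pct_cath, pct_scop))
    return mapping
```

Each returned tuple corresponds to one candidate DOMAIN_MAPPING row; the real table's columns may differ.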
Calculate percentage, overlap and medals for nodes (superfamilies).
If the sequence coverage by both the CATH domain and the SCOP domain is > 25%, insert the data into NODE_MAPPING.
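The NODE_MAPPING filter can be illustrated with a small Python sketch. It assumes coverage is the percentage of a chain's residues covered by a domain's (start, end) segments; the function names and the chain_length argument are hypothetical.

```python
def sequence_coverage(segments, chain_length):
    """Percentage of a chain's residues covered by a domain's (start, end) segments."""
    covered = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / chain_length


def keep_node_pair(cath_segments, scop_segments, chain_length, threshold=25.0):
    """True if both the CATH and the SCOP domain cover strictly more than `threshold` % of the chain."""
    return (sequence_coverage(cath_segments, chain_length) > threshold
            and sequence_coverage(scop_segments, chain_length) > threshold)
```

The comparison is strict, so a domain covering exactly 25% of the chain does not qualify.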
Cluster nodes using DOMAIN_MAPPING and NODE_MAPPING.
If the average domain coverage at superfamily (SF) level is > 25% and the overlap between CATH and SCOP is > 25%, insert the data into CLUSTER.
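One way to picture the clustering step is the Python sketch below. The clustering technique is an assumption: it treats each NODE_MAPPING row as an edge between a CATH and a SCOP superfamily and takes the connected components (via union-find) as clusters, then applies the two 25% thresholds. It is not necessarily how mapping_process.pl groups nodes.

```python
from collections import defaultdict


def cluster_nodes(node_pairs):
    """Group CATH and SCOP superfamilies into connected components.

    node_pairs: iterable of (cath_sf, scop_sf) edges, e.g. rows of NODE_MAPPING.
    Nodes are tagged with their classification so identical strings cannot collide.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for cath_sf, scop_sf in node_pairs:
        root_a, root_b = find(("CATH", cath_sf)), find(("SCOP", scop_sf))
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return list(clusters.values())


def keep_cluster(avg_coverage, overlap, threshold=25.0):
    """Both the average SF-level coverage and the CATH/SCOP overlap must exceed the threshold."""
    return avg_coverage > threshold and overlap > threshold
```

Connected components make clustering transitive: if a CATH superfamily maps to two SCOP superfamilies, all three end up in the same cluster.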
Determine the MDA blocks for each cluster (an MDA block is a sequence of consecutive CATH and SCOP domain superfamilies).
If the overlap between CATH and SCOP is > 50%, insert the data into MDA_BLOCKS together with the begin/end positions of the individual CATH and SCOP domains.
Link each block to its cluster in the CLUSTER_BLOCK table.
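The MDA block step can be sketched in Python, assuming each chain's domains are hypothetical (superfamily, start, end) tuples and that overlap is the percentage of shared residues over the union of residues covered by either classification (an assumed definition). The returned dictionary keys are illustrative, not the MDA_BLOCKS column names.

```python
def covered_residues(domains):
    """Set of residue numbers covered by (superfamily, start, end) domains."""
    covered = set()
    for _sf, start, end in domains:
        covered.update(range(start, end + 1))
    return covered


def ordered(domains):
    """Sort domains along the chain by their start position."""
    return sorted(domains, key=lambda d: d[1])


def chain_overlap(cath_domains, scop_domains):
    """Assumed metric: shared residues as a percentage of the union of both choppings."""
    cath, scop = covered_residues(cath_domains), covered_residues(scop_domains)
    union = cath | scop
    return 100.0 * len(cath & scop) / len(union) if union else 0.0


def build_block(cath_domains, scop_domains, threshold=50.0):
    """Return the ordered superfamily sequences and domain positions, or None below the threshold."""
    if chain_overlap(cath_domains, scop_domains) <= threshold:
        return None
    cath_sorted, scop_sorted = ordered(cath_domains), ordered(scop_domains)
    return {
        "cath_block": tuple(sf for sf, _, _ in cath_sorted),
        "scop_block": tuple(sf for sf, _, _ in scop_sorted),
        "cath_positions": [(start, end) for _, start, end in cath_sorted],
        "scop_positions": [(start, end) for _, start, end in scop_sorted],
    }
```

For example, a chain chopped by CATH as 1.10.8.60 (1-110) followed by 3.40.50.300 (120-250) yields the CATH block ("1.10.8.60", "3.40.50.300"), regardless of the order in which the rows were fetched.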
- For each MDA block, get the chain IDs together with their UniProt ID and the percentage of the UniProt sequence covered by the corresponding chain
- Insert the data into the BLOCK_CHAIN and BLOCK_UNIPROT tables
- Write mda_blocks.list (UniProt entries with the number of chains for each block) and mda_info.list (chain and coverage for each UniProt entry in each block)
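A minimal Python sketch of producing the two list files is shown below. It assumes hypothetical (block_id, chain_id, uniprot_id, coverage_pct) rows and a tab-separated layout; the actual columns and format of mda_blocks.list and mda_info.list may differ.

```python
from collections import defaultdict
from pathlib import Path


def summarise_blocks(rows):
    """Return {block_id: {uniprot_id: [(chain_id, coverage_pct), ...]}} from hypothetical rows."""
    blocks = defaultdict(lambda: defaultdict(list))
    for block_id, chain_id, uniprot_id, coverage in rows:
        blocks[block_id][uniprot_id].append((chain_id, coverage))
    return blocks


def format_lists(blocks):
    """Build the lines of mda_blocks.list and mda_info.list (assumed tab-separated layout)."""
    block_lines, info_lines = [], []
    for block_id in sorted(blocks):
        for uniprot_id in sorted(blocks[block_id]):
            chains = blocks[block_id][uniprot_id]
            # mda_blocks.list: one line per UniProt entry with its number of chains
            block_lines.append(f"{block_id}\t{uniprot_id}\t{len(chains)}")
            # mda_info.list: one line per chain with its UniProt coverage
            for chain_id, coverage in chains:
                info_lines.append(f"{block_id}\t{uniprot_id}\t{chain_id}\t{coverage:.1f}")
    return block_lines, info_lines


def write_lists(blocks, directory):
    """Write both files into `directory`, creating it if needed."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    block_lines, info_lines = format_lists(blocks)
    (out / "mda_blocks.list").write_text("\n".join(block_lines) + "\n")
    (out / "mda_info.list").write_text("\n".join(info_lines) + "\n")
```

Calling write_lists(summarise_blocks(rows), "MDA_results/CATH_4_1") would place both files in the output directory named above.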
Chopping: equivalence split, one instance, class4...
Homology differences
Write the information to files.
The main program is mapping_process_ecod.pl.
All generated files are written to the MDA_results/CATH_ECOD directory.
The mapping process is the same as the CATH/SCOP mapping above, except that SCOP data are replaced by ECOD data and the table names take the suffix _ECOD.
To populate the SEGMENT_ECOD table, the program calls a Python script located in the update_database directory. ECOD data are obtained from the ECOD website (http://prodata.swmed.edu/ecod/distribution/ecod.latest.domains.txt).
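As an illustration only (this is not the update_database script), the Python sketch below turns lines of the ECOD domains file into segment rows. It assumes the file is tab-separated with comment lines starting with "#", that the domain ID and PDB range sit in columns 1 and 6 (zero-based), and that ranges look like "A:3-105,A:120-210". These column positions and the range syntax are assumptions to check against the header of the downloaded file.

```python
import re

# Assumed segment syntax "<chain>:<start>-<end>"; residue numbers may be negative,
# and a trailing insertion-code letter is dropped.
SEGMENT_RE = re.compile(r"^([^:]+):(-?\d+)[A-Za-z]?-(-?\d+)[A-Za-z]?$")


def parse_range(pdb_range):
    """Split a comma-separated range string into (chain, start, end) tuples."""
    segments = []
    for part in pdb_range.split(","):
        match = SEGMENT_RE.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised segment: {part!r}")
        segments.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return segments


def read_ecod_domains(lines, id_col=1, range_col=6):
    """Yield (ecod_domain_id, chain, start, end) rows, one per segment (hypothetical column indices)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        for chain, start, end in parse_range(fields[range_col]):
            yield fields[id_col], chain, start, end
```

The generator accepts any iterable of lines, so it works on an open file object as well as on a list of strings.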