This repository contains the code for the COVID-19 Drug Discovery Paper by O'Donovan et al. published in journal. It includes all the code and supplementary files needed to complete the analysis from start to finish. It also contains the data files that we used to generate particular results from each version, for quicker verification.
This analysis runs best on *nix or macOS operating systems. Run the following commands to clone the repository and run the full analysis, including the data download.
$ git clone https://github.com/AliSajid/Covid19.git
$ cd Covid19
$ make
This analysis requires R (> 4.0.0)
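The R requirement can be checked before running make. The following is a minimal sketch, not part of the repository: the helper names version_gt and check_r_version are hypothetical, and the sketch assumes Rscript is available on the PATH.

```shell
# Hypothetical helpers (not in the repository) for checking the R version.

# version_gt A B: succeeds when dotted version A is strictly greater than B.
version_gt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      # Missing components compare as 0, so "4.1" is treated as "4.1.0".
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 1   # versions are equal, so A is not strictly greater
  }'
}

# check_r_version: succeeds when the installed R is newer than 4.0.0.
check_r_version() {
  installed=$(Rscript -e 'cat(as.character(getRversion()))') || return 1
  version_gt "$installed" "4.0.0"
}
```

With these defined, `check_r_version && make` would start the analysis only when the installed R satisfies the requirement.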
.
├── ... # Boilerplate files and primary scripts
├── data # Contains all the data downloaded and generated from iLINCS
│ ├── A549-10uM-24h #
│ ├── A549-10uM-6h #
│ ├── HA1E-10uM-24h #
│ ├── HT29-10uM-24h # Filtered drug and group signatures and connected perturbagens
│ ├── MCF7-10uM-24h # for all 8 cell line combinations
│ ├── PC3-10uM-24h #
│ ├── VCAP-10uM-24h #
│ ├── VCAP-10uM-6h #
│ ├── disease # Filtered disease signatures and connected perturbagens
│ └── signatures # Downloaded signatures for diseases and drugs and drug groups
├── figures # Contains all the generated figures
│ ├── ... # All primary figures in PNG, JPG and PDF formats
│ ├── densityplots # Density plots for all group signatures signifying normal distribution
│ ├── histograms # Histograms for all group signatures signifying normal distribution
│ └── qqplots # Quantile-Quantile plots for all group signatures signifying normal distribution
├── maps # Drug signature to drug name maps for all 8 cell line combinations
├── raw # Files that are imported from outside and shouldn't be deleted
│ ├── ... # General files
│ ├── annotation # Annotation and gene counts for the GSE56192 dataset
│ └── drug_signature_list # List of signatures for each drug in our list
├── renv # Renv Dependency Management directory
└── results # Final Results folder
This file is used to set up the directory structure and other initialization settings.
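For orientation, the directory layout listed in the tree above could be created by hand roughly as follows. This is a shell sketch under stated assumptions, not the repository's own setup file: the function name setup_dirs is hypothetical, and raw and renv are left out because they are imported or managed separately.

```shell
# Hypothetical helper (not in the repository): create the generated-output
# directories from the tree above under a chosen root (default: current dir).
setup_dirs() {
  root="${1:-.}"
  # One data directory per cell line / concentration / time point combination.
  for cell in A549-10uM-24h A549-10uM-6h HA1E-10uM-24h HT29-10uM-24h \
              MCF7-10uM-24h PC3-10uM-24h VCAP-10uM-24h VCAP-10uM-6h; do
    mkdir -p "$root/data/$cell" || return 1
  done
  # Remaining data, figure, map and result directories.
  mkdir -p "$root/data/disease" "$root/data/signatures" \
           "$root/figures/densityplots" "$root/figures/histograms" \
           "$root/figures/qqplots" \
           "$root/maps" "$root/results"
}
```

Because mkdir -p does not fail on existing directories, calling setup_dirs repeatedly is harmless.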
This script takes in the drug-specific signatures, combines them into a single file, and outputs the groups with maximum representation among the selected drugs.
These two scripts download the drug and group data and their connected perturbagens.
These scripts process the influenza, MERS and SARS datasets and generate their signatures.
This script downloads the perturbagens connected to the disease signatures, including the SARS-CoV-2 signatures.
These scripts take the lists of generated perturbagens and create intersections between them at given thresholds
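The idea of intersecting perturbagen lists at a threshold can be illustrated with a small shell sketch. It is not the repository's implementation. It assumes, hypothetically, that each list is a tab-separated file with one header row, the perturbagen name in column 1 and a numeric score in column 2 where higher means a stronger connection. The function name intersect_at_threshold is also hypothetical.

```shell
# Hypothetical helper (not in the repository).
# intersect_at_threshold THRESHOLD FILE_A FILE_B
# Prints the names that score at or above THRESHOLD in both files.
# Assumed input: TSV with a header row, name in column 1, score in column 2.
intersect_at_threshold() {
  awk -F '\t' -v t="$1" '
    FNR == 1 { file_index++; next }          # skip each header row, count files
    ($2 + 0) < (t + 0) { next }              # drop rows below the threshold
    file_index == 1 { seen[$1] = 1; next }   # remember names from the first file
    file_index == 2 && ($1 in seen) { print $1 }
  ' "$2" "$3" | LC_ALL=C sort -u
}
```

Raising the threshold can only shrink the intersection, so running the function over a range of thresholds shows how quickly the set of shared perturbagens narrows.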
These scripts perform the final steps of the analysis, including outputting summarized datasets and threshold tables.
These scripts generate all the individual figures