Benchmarking Compound Activity Prediction for Real-World Drug Discovery Applications
CARA is a benchmark for compound activity prediction and evaluation in real-world applications.
CARA defines six tasks spanning two task types and three target types. The two task types are virtual screening (VS) and lead optimization (LO). The VS task screens hit compounds for a specific target from a compound library with diverse scaffolds. The LO task optimizes a given compound toward analogs with better activities.
The three target types are All, Kinase, and G-protein coupled receptor (GPCR).
As a result, the six tasks are VS-All, VS-Kinase, VS-GPCR, LO-All, LO-Kinase, and LO-GPCR.
We suggest using VS-All and LO-All for performance evaluation and comparison (see our manuscript for more details).
The train-test split is conducted at the assay level, i.e., into training assays and test assays, and we ensure that there is no data leakage.
For the VS task, we use a new-protein splitting scheme, such that the protein targets in the test assays were not seen during training.
For the LO task, we use a new-assay splitting scheme, such that the congeneric compounds in the test assays were not seen during training.
For the few-shot (FS) scenario, the samples in each test assay are further split into support samples and query samples: the support samples can be used for training or fine-tuning, and the query samples are used for evaluation.
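As a sketch, the per-assay support/query split described above can be implemented as follows (the function name and random strategy are ours; for reproducible comparison, use the splits released with CARA):

```python
import numpy as np

def split_support_query(n_samples, k_shot=50, seed=0):
    """Split one test assay's samples into disjoint support and query sets.

    Returns (support_idx, query_idx): k_shot random support samples for
    training or fine-tuning, with the remaining samples as the query set
    used for evaluation.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return idx[:k_shot], idx[k_shot:]
```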
For the VS task, we care more about the accuracy of the top-ranked compounds, so we mainly use enrichment factors:
- EF@1%: Enrichment factor at the top 1%. Hit compounds are defined as those with the top 1% highest activities.
- EF@5%: Enrichment factor at the top 5%. Hit compounds are defined as those with the top 5% highest activities.
- SR@1%: Success rate at the top 1%. Success: at least one hit compound is ranked in the top 1% of the list by predicted scores.
- SR@5%: Success rate at the top 5%. Success: at least one hit compound is ranked in the top 5% of the list by predicted scores.
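The per-assay EF@k% and success indicator defined above can be computed as in the sketch below (function names are ours; the evaluation code released with CARA is the authoritative reference):

```python
import numpy as np

def enrichment_factor(y_true, y_pred, top=0.01):
    """EF@top for one assay. Hits are the top `top` fraction of compounds by
    measured activity; EF is the hit fraction among the top `top` fraction of
    the predicted ranking, divided by the overall hit fraction."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    n_top = max(1, int(np.ceil(n * top)))
    hits = set(np.argsort(-y_true)[:n_top])      # top compounds by activity
    selected = np.argsort(-y_pred)[:n_top]       # top compounds by prediction
    n_found = sum(1 for i in selected if i in hits)
    return (n_found / n_top) / (len(hits) / n)

def success(y_true, y_pred, top=0.01):
    """SR@top indicator for one assay: 1 if at least one hit compound is
    ranked within the top `top` fraction by predicted scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n_top = max(1, int(np.ceil(len(y_true) * top)))
    hits = set(np.argsort(-y_true)[:n_top])
    return int(any(i in hits for i in np.argsort(-y_pred)[:n_top]))
```

SR@k% for a task is then the average of the per-assay success indicators over the test assays.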
For the LO task, we need the overall ranking of the compounds, so we use correlation coefficients:
- PCC: Pearson's correlation coefficient.
- SCC: Spearman's correlation coefficient.
- SR@0.5: Success rate over test assays. Success: PCC > 0.5.
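A minimal per-assay implementation of these LO metrics, using SciPy (the function name is ours):

```python
from scipy import stats

def lo_metrics(y_true, y_pred):
    """Per-assay LO metrics: Pearson (PCC), Spearman (SCC), and the SR@0.5
    success indicator (1 if PCC > 0.5). SR@0.5 for a task is the mean of
    this indicator over the test assays."""
    pcc = stats.pearsonr(y_true, y_pred)[0]
    scc = stats.spearmanr(y_true, y_pred)[0]
    return pcc, scc, int(pcc > 0.5)
```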
Task | VS-All | VS-Kinase | VS-GPCR | LO-All | LO-Kinase | LO-GPCR |
---|---|---|---|---|---|---|
#Assays | 12,029 | 2,733 | 2,256 | 81,187 | 11,276 | 22,917 |
#Proteins | 2,242 | 434 | 268 | 4,456 | 487 | 579 |
#Compounds | 317,855 | 25,943 | 41,352 | 625,099 | 111,279 | 161,263 |
#Samples | 1,237,256 | 84,605 | 70,179 | 1,187,136 | 200,800 | 321,904 |
#Training assays | 9,408 | 1,459 | 1,584 | 81,033 | 11,220 | 22,872 |
#Test assays | 100 | 58 | 18 | 100 | 54 | 43 |
We provide code for model training and performance evaluation on CARA. To run it, install the following packages; installation may take from a few minutes to several hours. Training may take from several hours to a few days, depending on the task, dataset size, and method.
python=3.7.11
pandas=1.3.5
numpy=1.21.5
scipy=1.7.3
json=2.0.9
scikit-learn=1.0.2
pytorch=1.12.1
torch-geometric=2.2.0
rdkit=2020.09.1.0
gensim=3.8.3
networkx=2.6.3
subword_nmt=0.3.8
codecs
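For example, the environment above might be assembled with conda and pip (a sketch; the environment name `cara`, the channels, and the exact package builds are assumptions, and the PyTorch build should match your CUDA version):

```shell
# create and activate the environment (the name "cara" is arbitrary)
conda create -n cara python=3.7.11
conda activate cara
# core scientific stack
conda install pandas=1.3.5 numpy=1.21.5 scipy=1.7.3 scikit-learn=1.0.2
# deep learning and chemistry packages
conda install pytorch=1.12.1 -c pytorch
conda install rdkit=2020.09.1 -c conda-forge
pip install torch-geometric==2.2.0 gensim==3.8.3 networkx==2.6.3 subword-nmt==0.3.8
# json and codecs ship with the Python standard library and need no install
```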
python -u runTrain.py --model [MODEL] --dataset_params task:[TASK],subset:[SUBSET] --info [INFO] --gpu [GPU]

- MODEL: model name, e.g., DeepConvDTI
- TASK: task name, e.g., VS_All
- SUBSET: subset of data, e.g., train, support
- INFO: mark for training, e.g., fastTrain; an INFO beginning with 'fast' will print less to speed up training
- GPU: GPU number, e.g., 0

For meta-learning models:

python -u runTrain.py --model [MODEL]Meta --dataset_params task:[TASK],subset:[SUBSET],n_way:[N_WAY],k_shot:[K_SHOT] --info [INFO] --gpu [GPU]

- N_WAY: number of assays per batch, e.g., 1
- K_SHOT: number of support samples per assay, e.g., 50

python -u runTest.py --model [MODEL] --model_path [MODEL_PATH] --dataset_params task:[TASK],subset:[SUBSET] --info [INFO] --gpu [GPU]

- MODEL: model name, e.g., DeepConvDTI
- MODEL_PATH: folder containing the trained models
- TASK: task name, e.g., VS_All
- SUBSET: subset of data, e.g., test, query
- INFO: mark for testing, e.g., test
- GPU: GPU number, e.g., 0

For fine-tuning meta-learning models:

python -u runTest.py --model [MODEL]Meta --model_path [MODEL_PATH] --dataset_params task:[TASK],subset:finetune,step:[STEP],shot:[SHOT] --info [INFO] --gpu [GPU]

- STEP: number of steps for fine-tuning, e.g., 10000
- SHOT: number of support samples per assay for fine-tuning, e.g., 50

Method | EF@1% | SR@1% (%) | EF@5% | SR@5% (%) |
---|---|---|---|---|
DeepConvDTI | 9.48 ± 1.22 | 39.40 ± 2.73 | 3.22 ± 0.24 | 81.60 ± 2.87 |
DeepDTA | 8.76 ± 1.56 | 36.00 ± 3.52 | 3.37 ± 0.43 | 83.40 ± 2.87 |
DeepCPI | 7.73 ± 0.34 | 31.80 ± 1.94 | 2.95 ± 0.22 | 78.60 ± 2.65 |
MONN | 7.08 ± 0.64 | 33.00 ± 2.68 | 2.70 ± 0.47 | 76.00 ± 4.15 |
Tsubaki | 6.09 ± 1.30 | 30.60 ± 2.80 | 2.53 ± 0.14 | 79.20 ± 2.86 |
TransformerCPI | 5.60 ± 0.65 | 28.20 ± 3.06 | 2.46 ± 0.29 | 78.00 ± 2.53 |
MolTrans | 5.61 ± 0.90 | 29.60 ± 2.80 | 2.20 ± 0.13 | 74.00 ± 1.79 |
GraphDTA | 4.70 ± 0.88 | 24.40 ± 1.96 | 1.88 ± 0.21 | 70.80 ± 4.07 |
Method | SCC | PCC | SR@0.5 (%) |
---|---|---|---|
DeepConvDTI | 0.30 ± 0.01 | 0.31 ± 0.01 | 26.60 ± 2.15 |
DeepDTA | 0.28 ± 0.01 | 0.30 ± 0.01 | 22.40 ± 1.36 |
DeepCPI | 0.24 ± 0.01 | 0.25 ± 0.01 | 16.00 ± 0.63 |
MONN | 0.25 ± 0.01 | 0.27 ± 0.01 | 15.40 ± 2.24 |
Tsubaki | 0.19 ± 0.02 | 0.19 ± 0.01 | 9.40 ± 1.62 |
TransformerCPI | 0.19 ± 0.01 | 0.19 ± 0.02 | 8.00 ± 2.90 |
MolTrans | 0.20 ± 0.01 | 0.20 ± 0.02 | 12.20 ± 1.47 |
GraphDTA | 0.22 ± 0.01 | 0.24 ± 0.01 | 15.20 ± 2.04 |
Strategy | Method | EF@1% | SR@1% (%) |
---|---|---|---|
Pre-training | DeepCPI | 7.96 ± 0.82 | 29.80 ± 1.47 |
Pre-training | DeepDTA | 9.17 ± 1.53 | 34.20 ± 4.26 |
Pre-training | DeepConvDTI | 9.14 ± 1.28 | 35.40 ± 2.87 |
QSAR | RF | 7.83 ± 0.49 | 28.40 ± 0.80 |
QSAR | GBT | 7.90 ± 0.52 | 29.60 ± 1.02 |
QSAR | SVM | 6.29 ± 0.00 | 28.00 ± 0.00 |
QSAR | DNN | 8.82 ± 0.39 | 27.60 ± 0.80 |
Pre-training and fine-tuning | DeepCPI | 9.92 ± 0.86 | 32.80 ± 1.17 |
Pre-training and fine-tuning | DeepDTA | 13.02 ± 3.26 | 41.20 ± 5.19 |
Pre-training and fine-tuning | DeepConvDTI | 11.17 ± 1.94 | 36.80 ± 2.64 |
Meta-learning | DeepCPI-c | 11.85 ± 2.20 | 36.60 ± 2.73 |
Meta-learning | MTDNN | 12.76 ± 0.37 | 41.60 ± 1.20 |
Meta-learning | DeepConvDTI-c | 15.67 ± 1.66 | 44.60 ± 4.13 |
Multi-task learning | MTDNN | 4.20 ± 1.58 | 25.20 ± 4.75 |
Multi-task learning | DeepConvDTI-c | 15.95 ± 1.77 | 45.20 ± 1.72 |
Re-training | DeepCPI | 13.29 ± 1.34 | 41.20 ± 2.93 |
Re-training | DeepDTA | 11.81 ± 2.08 | 39.00 ± 3.41 |
Re-training | DeepConvDTI | 13.28 ± 0.59 | 42.80 ± 1.72 |
Strategy | Method | PCC | SR@0.5 (%) |
---|---|---|---|
Pre-training | DeepCPI | 0.26 ± 0.01 | 16.60 ± 1.02 |
Pre-training | DeepDTA | 0.30 ± 0.01 | 24.00 ± 1.67 |
Pre-training | DeepConvDTI | 0.32 ± 0.01 | 28.60 ± 1.85 |
QSAR | RF | 0.55 ± 0.00 | 58.20 ± 1.72 |
QSAR | GBT | 0.54 ± 0.00 | 57.80 ± 1.17 |
QSAR | SVM | 0.57 ± 0.00 | 65.00 ± 0.00 |
QSAR | DNN | 0.54 ± 0.00 | 61.20 ± 1.72 |
Pre-training and fine-tuning | DeepCPI | 0.33 ± 0.01 | 18.80 ± 0.40 |
Pre-training and fine-tuning | DeepDTA | 0.49 ± 0.01 | 51.00 ± 1.79 |
Pre-training and fine-tuning | DeepConvDTI | 0.39 ± 0.01 | 32.40 ± 1.36 |
Meta-learning | DeepCPI-c | 0.42 ± 0.02 | 37.40 ± 3.44 |
Meta-learning | MTDNN | 0.46 ± 0.04 | 42.26 ± 6.92 |
Meta-learning | DeepConvDTI-c | 0.55 ± 0.01 | 62.80 ± 4.07 |
Multi-task learning | MTDNN | 0.48 ± 0.00 | 48.20 ± 1.60 |
Multi-task learning | DeepConvDTI-c | 0.58 ± 0.00 | 65.40 ± 2.06 |
Re-training | DeepCPI | 0.44 ± 0.00 | 43.89 ± 2.91 |
Re-training | DeepDTA | 0.46 ± 0.01 | 45.80 ± 1.72 |
Re-training | DeepConvDTI | 0.50 ± 0.01 | 54.40 ± 1.96 |