Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions

In recent years, machine learning has been proposed as a promising strategy to build accurate scoring functions for computational docking finalized to numerically empowered drug discovery. However, the latest studies have suggested that over-optimistic results had been reported due to the correlations present in the experimental databases used for training and testing. Here, we investigate the performance of an artificial neural network in binding affinity predictions, comparing results obtained using both experimental protein-ligand structures as well as larger sets of computer-generated structures created using commercial software. Interestingly, similar performances are obtained on both databases. We find a noticeable performance suppression when moving from random horizontal tests to vertical tests performed on target proteins not included in the training data. The possibility to train the network on relatively easily created computer-generated databases leads us to explore per-target scoring functions, trained and tested ad-hoc on complexes including only one target protein. Encouraging results are obtained, depending on the type of protein being addressed.

Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions

Pellicani, F;Dal Ben, D;Perali, A;Pilati, S

2023-01-01

Abstract

In recent years, machine learning has been proposed as a promising strategy to build accurate scoring functions for computational docking finalized to numerically empowered drug discovery. However, the latest studies have suggested that over-optimistic results had been reported due to the correlations present in the experimental databases used for training and testing. Here, we investigate the performance of an artificial neural network in binding affinity predictions, comparing results obtained using both experimental protein-ligand structures as well as larger sets of computer-generated structures created using commercial software. Interestingly, similar performances are obtained on both databases. We find a noticeable performance suppression when moving from random horizontal tests to vertical tests performed on target proteins not included in the training data. The possibility to train the network on relatively easily created computer-generated databases leads us to explore per-target scoring functions, trained and tested ad-hoc on complexes including only one target protein. Encouraging results are obtained, depending on the type of protein being addressed.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2023
			
	Rivista
	
				MOLECULES
			
	Codice DOI
	
				https://dx.doi.org/10.3390/molecules28041661
			
	ID tipologia loginMiur
	
				262
			
	Appare nelle tipologie:
	
				1.1 Articolo

File in questo prodotto:

File	Dimensione	Formato
molecules-28-01661 (1).pdf accesso aperto Tipologia: Versione Editoriale Licenza: PUBBLICO - Creative Commons Dimensione 491.35 kB Formato Adobe PDF Visualizza/Apri	491.35 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11581/471565

Citazioni

3

9

9

social impact