Ligand-Protein Binding Affinity Prediction Using Machine Learning Scoring Functions

PELLICANI, FRANCESCO
2023-01-20

Abstract

In recent years, artificial intelligence has appeared in a wide range of fields, with promising results and, in some cases, major advances. In chemoinformatics in particular, machine learning techniques have allowed the scientific community to build apparently accurate scoring functions for computational docking. These scoring functions can outperform the classical ones used until now. However, as some studies have highlighted, comparisons between classical and machine learning scoring functions rely on particular tests that can favour the latter. Machine learning scoring functions must, by definition, be trained on data: the model is given the features chosen to describe the complexes together with the corresponding ligand-protein affinities. The scoring power of a machine learning scoring function can then be evaluated on different datasets, and the measured performance depends on the dataset chosen. In particular, test datasets that closely resemble the training set make it easier to reach high scoring power. The objective of the present study is to assess the real effectiveness and performance of the recently introduced machine learning scoring functions. Our aim is to address the scientific community's doubts about whether machine learning scoring functions are the revolutionary path to follow in chemoinformatics and drug discovery. To this end, we conduct many tests and propose a definitive test protocol for exhaustively validating a new machine learning scoring function. We investigate the circumstances in which a machine learning scoring function yields overestimated performance, and why this happens.
As a possible solution, we propose a test protocol to be followed in order to guarantee a realistic description of the performance of machine learning scoring functions. Finally, we propose an effective and innovative solution in the field of machine learning scoring functions: per-target scoring functions, that is, machine learning scoring functions built from complexes of a single protein and able to predict the affinity of complexes involving that target. The data used to build the model are synthetic and therefore easy to generate. Performance on the chosen target is better than that obtained with basic scoring function models and with machine learning scoring functions trained on databases comprising more than one protein.
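The concern about test sets that resemble the training set can be illustrated with a target-wise split, in which no protein appears on both sides of the evaluation. The following is a minimal sketch under stated assumptions, not the protocol proposed in the thesis: the function name `split_by_target`, the record layout `(target_id, features, affinity)` and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_target(complexes, test_fraction=0.25, seed=0):
    """Split records so that no protein target is shared by train and test.

    Each record is assumed to be a tuple (target_id, features, affinity).
    This layout is hypothetical and chosen only for illustration.
    """
    # Group the complexes by the protein target they belong to.
    by_target = defaultdict(list)
    for record in complexes:
        by_target[record[0]].append(record)

    # Shuffle the target identifiers reproducibly, then hold out a fraction.
    targets = sorted(by_target)
    random.Random(seed).shuffle(targets)
    n_test = max(1, round(len(targets) * test_fraction))
    test_targets = set(targets[:n_test])

    train = [r for t in targets if t not in test_targets for r in by_target[t]]
    test = [r for t in targets if t in test_targets for r in by_target[t]]
    return train, test


# Hypothetical toy data: four targets, two complexes each.
complexes = [
    ("P1", [0.1, 0.3], 6.2), ("P1", [0.2, 0.1], 5.8),
    ("P2", [0.7, 0.4], 7.1), ("P2", [0.6, 0.5], 6.9),
    ("P3", [0.3, 0.9], 4.5), ("P3", [0.4, 0.8], 4.9),
    ("P4", [0.9, 0.2], 8.0), ("P4", [0.8, 0.1], 7.7),
]

train, test = split_by_target(complexes)
shared_targets = {r[0] for r in train} & {r[0] for r in test}
```

Comparing the scoring power measured on such a split with that measured on a random split of the same data is one way to see how much of the apparent performance depends on train/test similarity.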
20 January 2023
Science and Technology
Sector FIS/02 - Theoretical Physics, Mathematical Models and Methods
Sector PHYS-02/A - Theoretical physics of fundamental interactions, models, mathematical methods and applications
URN:NBN:IT:UNICAM-158529
PILATI, Sebastiano
Files in this item:
01_20_23 - Pellicani Francesco.pdf
Open Access since 21/07/2023
Description: Doctoral thesis, FRANCESCO PELLICANI
Type: Other attached material
License: DRM not defined
Size: 2.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11581/483525