Pharmacovigilance is crucial for ensuring drug safety. With the advent of big data and advanced computational techniques, the integration of machine learning (ML) and Quantitative Structure-Activity Relationship (QSAR) predictions has emerged as an innovative approach to enhance pharmacovigilance efforts.
This approach is particularly significant in the clinical trial phase, where the effects of molecules on the human body are still under investigation. We propose a mixed approach (statistical metrics and QSAR properties) that evaluates the contribution of these additional variables (enrichment) for post-marketing pharmacovigilance surveillance.
Machine Learning in Pharmacovigilance
Machine learning, a subset of artificial intelligence, is not a new approach in pharmacovigilance. These algorithms can learn from and make predictions based on data. In pharmacovigilance, ML algorithms can analyse vast amounts of data from various sources, such as clinical trials, electronic health records, and social media, to identify patterns and predict adverse drug reactions (ADRs). Data sources can be remarkably diverse, ranging from volatile data like online literature and real-world evidence to more structured and human-annotated legacy pharmacovigilance databases.
Data Enrichment and QSAR Prediction
The more data available, the better the prediction. Data enrichment in pharmacovigilance involves integrating information from various databases, including patients’ genomic profiles, chemical properties, and demographics, to provide a more comprehensive understanding of drug safety profiles. Enriched data sets enable the development of more robust and accurate machine learning models, as they offer a richer context for analysis, provided the enrichment data is reliable and accounts for all variables necessary for predicting the observables.
QSAR models are computational methods that predict the activity of chemical compounds based on their molecular structure. In pharmacovigilance, QSAR predictions can be used to assess the potential toxicity of new drug candidates well before they enter clinical trials.
Case Studies and Applications
The case study involves an in-depth investigation of a customer’s pharmacovigilance database, which has already been enriched with statistical analysis. This analysis provides four statistical metrics for signal detection, both frequentist (disproportionality) and Bayesian, used in a consensus-based manner. Signal Detection is a set of statistical methods designed to flag Drug-Event Combinations (DECs) and prioritize them for investigation of their potential to become real Adverse Drug Reactions (ADRs).
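The article does not name the four metrics used. As an illustration of what a frequentist disproportionality metric looks like, the sketch below computes two widely used ones, the Proportional Reporting Ratio (PRR) and the Reporting Odds Ratio (ROR), from a 2x2 table of report counts. The function names and the counts are hypothetical, and these two measures are an assumption, not necessarily the ones in the customer's database.

```python
def prr(a, b, c, d):
    """Proportional Reporting Ratio from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting Odds Ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Toy counts, invented for illustration only.
a, b, c, d = 20, 80, 100, 9800
print(f"PRR = {prr(a, b, c, d):.2f}")  # PRR = 19.80
print(f"ROR = {ror(a, b, c, d):.2f}")  # ROR = 24.50
```

Values well above 1 indicate that the event is reported disproportionately often with the drug. In practice a threshold and a minimum case count are applied before a DEC is flagged.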
A set of 20 QSAR models was applied to the same number of molecules within a dataset of about 150k DECs.
The ML algorithm applied is a decision-tree method with bagging. The overall set of variables comprised:
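One possible way to attach QSAR outputs to DEC records is a simple join on the molecule. The sketch below is a minimal illustration under stated assumptions: the field names (`molecule`, `event`, `age_group`), the QSAR endpoint names, and all values are hypothetical and do not come from the case study.

```python
def enrich_decs(decs, qsar_by_molecule):
    """Return copies of the DEC records with QSAR predictions attached.

    QSAR keys are prefixed with 'qsar_' so they cannot collide with
    database fields. DECs whose molecule has no QSAR entry are returned
    without any added keys.
    """
    enriched = []
    for dec in decs:
        qsar = qsar_by_molecule.get(dec["molecule"], {})
        row = dict(dec)  # copy, so the input records stay untouched
        row.update({f"qsar_{name}": value for name, value in qsar.items()})
        enriched.append(row)
    return enriched


# Toy records, invented for illustration only.
decs = [
    {"molecule": "drug_a", "event": "nausea", "age_group": "adult"},
    {"molecule": "drug_b", "event": "rash", "age_group": "elderly"},
]
qsar_by_molecule = {
    "drug_a": {"hepatotoxicity": 0.12, "mutagenicity": 0.03},
}

enriched = enrich_decs(decs, qsar_by_molecule)
print(enriched[0]["qsar_hepatotoxicity"])  # 0.12
```

Because QSAR predictions depend only on the molecule, every DEC that shares a molecule receives the same QSAR values.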
- Twenty QSAR models
- Four different statistical metrics for Signal Detection
- Around twenty different variables coming from database fields (Age Group, Gender, Age, Seriousness, and so on)
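The article does not describe the implementation of the bagged decision-tree model. To show the principle, the sketch below bags one-level decision trees (stumps) in plain Python: each stump is trained on a bootstrap sample, and the ensemble predicts by majority vote. The three toy features stand in for a QSAR output, a signal-detection metric, and a database field. All names and values are hypothetical, and a real model would use deeper trees and an established library.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-level decision tree: the (feature, threshold) split
    with the fewest training errors."""
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            # Majority label per side; fall back to the overall majority
            # when a side is empty.
            left_label = Counter(left).most_common(1)[0][0] if left else overall
            right_label = Counter(right).most_common(1)[0][0] if right else overall
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote over the ensemble."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Toy data: [qsar_score, signal_metric, age_group_code], invented.
rows = [
    [0.10, 0.5, 1], [0.20, 0.8, 2], [0.15, 0.6, 3], [0.25, 0.9, 1], [0.30, 0.7, 2],
    [0.70, 2.5, 3], [0.80, 3.0, 1], [0.75, 2.8, 2], [0.90, 3.5, 3], [0.85, 2.6, 1],
]
labels = ["Unlisted"] * 5 + ["Listed"] * 5

models = fit_bagging(rows, labels)
print(predict(models, [0.95, 4.0, 2]))
print(predict(models, [0.05, 0.1, 2]))
```

Bagging reduces the variance of individual trees, which is useful when many heterogeneous variables (QSAR outputs, statistical metrics, database fields) are combined.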
The objective is to develop a prediction model for the observable “Listedness.” Listedness refers to reported adverse events (AEs) that are already documented in the official product labelling or other references.
Listedness can take three values: Listed, Unlisted, and Unknown. However, for ease of implementation and as a more conservative approach, Unknown values have been merged into the Unlisted category.
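The reduction from three values to two can be sketched as a small mapping function. The function name and the exact spelling of the raw values are assumptions. The source states only that Unknown is treated as Unlisted.

```python
def binarize_listedness(value):
    """Map the three Listedness values onto two classes.

    'Unknown' is folded into 'Unlisted', the conservative choice
    described in the text. Any other value raises an error instead of
    being silently guessed.
    """
    normalized = value.strip().lower()
    if normalized == "listed":
        return "Listed"
    if normalized in ("unlisted", "unknown"):
        return "Unlisted"
    raise ValueError(f"Unexpected Listedness value: {value!r}")


print(binarize_listedness("Unknown"))  # Unlisted
```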
Results
The overall prediction of Listedness reaches 80% precision. The key goal is then to assess the weight of the two kinds of misclassification (False Negatives and False Positives), that is, an Unlisted DEC predicted as Listed and a Listed DEC predicted as Unlisted.
The existing data in the database form the training set used to build the model. New incoming cases whose Listedness status is unknown can then be evaluated, giving clinicians hints for prioritizing DECs on top of the prioritization already obtained from statistical Signal Detection. In addition, Unlisted DECs in the database can become Listed over time, so their predicted status is also worth evaluating.
In brief: a prediction of Predicted-Listedness=Listed for an existing DEC whose Database-Listedness is Unlisted or Unknown is of great interest to clinicians and may warrant further review.
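To weigh the two error types, the predictions can be tabulated in a confusion matrix. The sketch below assumes that "Listed" is the positive class, so a False Positive is an Unlisted DEC predicted as Listed and a False Negative is a Listed DEC predicted as Unlisted. The source does not state which class is treated as positive. The labels are toy values and do not reproduce the 80% figure reported above.

```python
def confusion_counts(actual, predicted, positive="Listed"):
    """Count true/false positives and negatives for a binary label."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def precision(counts):
    """Share of positive predictions that are correct."""
    denom = counts["tp"] + counts["fp"]
    return counts["tp"] / denom if denom else 0.0


# Toy labels, invented for illustration only.
actual = ["Listed"] * 6 + ["Unlisted"] * 4
predicted = ["Listed"] * 5 + ["Unlisted", "Listed"] + ["Unlisted"] * 3

counts = confusion_counts(actual, predicted)
print(counts)  # {'tp': 5, 'fp': 1, 'fn': 1, 'tn': 3}
print(f"precision = {precision(counts):.2f}")  # precision = 0.83
```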
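Selecting these discordant DECs amounts to a simple filter over the scored records. The sketch below is illustrative only: the field names `database_listedness` and `predicted_listedness`, and the records themselves, are hypothetical.

```python
def discordant_decs(records):
    """Return DECs predicted as Listed whose database status is
    Unlisted or Unknown."""
    return [
        r for r in records
        if r["predicted_listedness"] == "Listed"
        and r["database_listedness"] in ("Unlisted", "Unknown")
    ]


# Toy records, invented for illustration only.
records = [
    {"dec": "drug_a/nausea", "database_listedness": "Listed", "predicted_listedness": "Listed"},
    {"dec": "drug_a/rash", "database_listedness": "Unlisted", "predicted_listedness": "Listed"},
    {"dec": "drug_b/headache", "database_listedness": "Unknown", "predicted_listedness": "Listed"},
    {"dec": "drug_b/fatigue", "database_listedness": "Unlisted", "predicted_listedness": "Unlisted"},
]

for r in discordant_decs(records):
    print(r["dec"])
# drug_a/rash
# drug_b/headache
```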
Challenges and Future Directions
Despite the promising potential, there are challenges to be addressed. These include the need for high-quality, standardised data, the complexity of integrating diverse data sources, and the interpretability of ML models. Because the approach is novel, it also introduces several significant sources of bias. To name a few: the accuracy of QSAR models versus experimental values (where these can be obtained), exact knowledge of the Listedness status of the already known DECs in the training set, efficient and rational handling of multi-API molecules, and the use of more case properties, such as previous history or concomitant medication.
Conclusion
The integration of machine learning, data enrichment, and QSAR prediction marks a significant advancement in pharmacovigilance. By leveraging these technologies, we can enhance drug safety monitoring, reduce clinicians’ workload, and expedite the complex process of DEC prioritization. This helps professionals analyse DECs in advance and in depth, identifying those that are likely to become real signals and subsequently confirmed adverse drug reactions (ADRs).
About the Author
Filippo Magnaguagno
Senior Consultant, System Integration Services at Arithmos
A theoretical chemist by education (University of Padova), Filippo has spent over 22 years in the Life Sciences sector, specialising in computerized systems, from R&D and analytical systems to complex database systems for Pharma GxP, in both analytical and document-management environments.
He works as a senior consultant on complex projects that require pharmacovigilance software data analysis, including ex-ante analysis of the statistical and technical topics to be covered.
He also consults on the long-term archiving of clinical and omics data for data warehouse implementation, in both the Pharma and Medical Devices areas, following a CDISC approach.
About Arithmos
With deep expertise in the Life Science Industry, Arithmos supports companies in their digital transformation journey to achieve the best value from technology-enabled solutions and excellence in business operations.
The Arithmos team shares extensive knowledge of the full GxP-regulated environment for the delivery of both services and technologies, and our domains range from Clinical R&D, Quality Assurance, and Drug Safety & Pharmacovigilance to Regulatory and Medical Affairs.