Recent decades of biomedical research have seen a boom in the discovery of molecular indicators (also known as biomarkers) that help clinicians diagnose diseases and monitor how they progress and respond to potential treatments. These biomarkers can help doctors and pharmaceutical companies tailor treatments to individual patients, creating more personalised medicine.
While only a dozen of these molecular ‘biomarkers’ have been approved in the European Union and United States, they have become a pillar for pharmaceutical research and clinical trials. In the EU almost 90 are pending approval, and many more are being evaluated for their potential use in translational research and clinical trials.
Over the years, hundreds of biomarkers have been discovered. Their use in clinical studies is recorded in ClinicalTrials.gov, a US-based registry of clinical trials. However, information on biomarkers is stored there as ‘free text’, meaning it is not classified or organised in any way: it consists only of sentences and paragraphs.
This free text information makes it difficult for other researchers to further analyse biomarkers for use in other clinical trials. “It requires an expert reading and extracting this information, which is a very cumbersome and expensive process,” says Dr Laura Furlong of MedBioinformatics Solutions SL, a biomedical informatics company based in Barcelona, Spain.
In a recent paper published in the Computational and Structural Biotechnology Journal, Dr Furlong and colleagues describe a method they have used to identify, extract, and classify the biomarker information. The research is part of the eTRANSAFE project, funded by the Innovative Medicines Initiative (IMI). The project aims to develop a powerful data integration infrastructure as a basis for computer-based tools in drug development.
The researchers developed a ‘natural language processing’ machine-learning algorithm to identify and extract the information on biomarkers from the ClinicalTrials.gov registry and classify them according to the assessment methods used.
One challenge they faced was that the registry has over 50 different sections depending on the type of clinical study. This meant experts had to manually review the records to see which sections held the information they wanted to extract and classify. This manual review also helped to train the machine learning algorithm to classify the biomarkers according to the assessment methods used.
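To make the idea of classifying free text by assessment method concrete, here is a minimal sketch. It is not the researchers' machine-learning model: it uses simple keyword matching as a stand-in, and the category names, keywords and the `classify_method` function are all hypothetical choices made for illustration.

```python
import re

# Hypothetical assessment-method categories and trigger keywords.
# These are illustrative only and are not taken from the published method.
METHOD_KEYWORDS = {
    "immunoassay": ["elisa", "immunoassay"],
    "imaging": ["mri", "pet scan", "ultrasound"],
    "sequencing": ["sequencing", "pcr"],
}


def classify_method(text):
    """Return the sorted categories whose keywords appear as whole words in text."""
    lowered = text.lower()
    hits = []
    for category, keywords in METHOD_KEYWORDS.items():
        # \b ensures that a keyword only matches as a whole word or phrase.
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            hits.append(category)
    return sorted(hits)


if __name__ == "__main__":
    print(classify_method("Serum IL-6 measured by ELISA at week 12"))
    print(classify_method("Tumour volume by MRI and EGFR mutation by PCR"))
```

A keyword list like this breaks down quickly on real registry text, which is one reason a trained model, informed by expert review of the records, is the more plausible route for data on this scale.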
Finally, since abbreviations are widely used when reporting clinical trial results, the researchers used an artificial intelligence (AI) algorithm to recognise each abbreviation and write it out in full, which also helped to classify the information.
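The abbreviation step can be illustrated with a much simpler technique than the AI algorithm the researchers used. The sketch below assumes the long form appears immediately before the abbreviation in brackets, as in "prostate specific antigen (PSA)", and that the initials of those words spell the abbreviation. The function names `find_abbreviations` and `expand` are hypothetical.

```python
import re


def find_abbreviations(text):
    """Map each bracketed abbreviation to the preceding words whose initials spell it."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        abbr = match.group(1)
        # Take as many preceding words as the abbreviation has letters.
        words = text[:match.start()].split()
        candidate = words[-len(abbr):]
        initials = "".join(w[0].upper() for w in candidate)
        if initials == abbr:
            found[abbr] = " ".join(candidate)
    return found


def expand(text, mapping):
    """Replace later bare uses of each abbreviation with its long form."""
    for abbr, long_form in mapping.items():
        # Skip the defining occurrence, which is wrapped in brackets.
        pattern = r"(?<!\()\b" + re.escape(abbr) + r"\b(?!\))"
        text = re.sub(pattern, lambda _m, lf=long_form: lf, text)
    return text


if __name__ == "__main__":
    sample = "Serum prostate specific antigen (PSA) was measured. PSA fell after treatment."
    mapping = find_abbreviations(sample)
    print(mapping)
    print(expand(sample, mapping))
```

This rule misses hyphenated long forms such as "C-reactive protein (CRP)" and abbreviations that are never defined in the record, which suggests why a learned approach is better suited to real trial reports.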
In total, the researchers found over 3 000 biomarkers, including those already approved in the US. Dr Furlong notes that these biomarkers were being used in relation to around 2 600 diseases. “We expected to find a large number of biomarkers being investigated or used in clinical trials,” she said. “What surprises us most is that the number of approved biomarkers is so small.”
With the biomarkers classified, researchers can now easily spot patterns in how biomarkers are being used in different therapeutic areas, which types of biomarker are involved (e.g. certain genes being expressed, cell markers, or proteins in the body) and how specific they are to certain diseases.
This information has now been packaged into the Clinical Biomarker App, which Dr Furlong describes as a ‘proof-of-concept tool’ to display the data generated by their approach. “We decided to make the data available to enable potential users and readers of the publication to explore and use the data,” she said. The app draws on information from ClinicalTrials.gov and from their DISGENET plus platform, which holds biomarker information from scientific literature and other databases. The app is free to use, while the DISGENET plus platform has licensing options for both academic organisations and industry.
For now, Dr Furlong and her colleagues at the eTRANSAFE project want to continue refining their natural language processing machine learning method, and hope to classify biomarkers according to the US Food and Drug Administration’s BEST categories (Diagnostic, Monitoring, Response, Predictive, Prognostic, Safety or Susceptibility).
eTRANSAFE is supported by the Innovative Medicines Initiative, a partnership between the European Union and the European pharmaceutical industry.