Uncovering Data Connections and Insights with Knowledge Graph Tech

Matthew Driver

3 years ago

As the Life Sciences sector becomes increasingly data-driven, knowledge graph technology is helping companies uncover connections and insights from multiple data sources. Neo4j’s Alexander Jarasch explains…

The Life Sciences sector is increasingly focused on deriving better insights from data. The challenge of making connections between data in different locations within organisations and deriving meaningful insights from those connections is driving pharma companies into the arms of knowledge graphs.

Knowledge graphs let companies convert unstructured information into rich insights and actionable intelligence that spans the entire drug development and marketing lifecycle. Graph technology first hit the headlines when it was harnessed to reveal the Panama Papers scandal.

Knowledge graphs are multi-dimensional and work on the basis that every dataset is a connected element. Traditional SQL databases can’t represent that complexity or make those connections. It is in the discovery of complex interconnections of different data sources that knowledge graph technology delivers new insights.

Deeper analysis of interrelationships

Graph databases enable comparisons involving billions of connections. This is a game changer in complex fields such as biological science, where information about a disease is inextricably linked to information about genes, the environment, a person’s diet and behaviour, and so on. Deeper analysis of these interrelationships speeds up the discovery of important correlations.
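To make the idea of linked information concrete, here is a minimal sketch in plain Python rather than in a graph database. It assumes a tiny, hand-made graph in which every node name (`disease_X`, `gene_G1` and so on) is a hypothetical placeholder, not a real biomedical finding. A breadth-first search then finds the shortest chain of relationships connecting a disease to a behaviour.

```python
from collections import deque

# Hypothetical, hand-made relationships for illustration only.
EDGES = [
    ("disease_X", "gene_G1"),
    ("gene_G1", "environment_E1"),
    ("environment_E1", "behaviour_B1"),
    ("disease_X", "diet_D1"),
]


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> neighbours mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting keeps the traversal order deterministic.
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


adjacency = build_adjacency(EDGES)
path = shortest_path(adjacency, "disease_X", "behaviour_B1")
print(" -> ".join(path))
```

A production graph database would run this kind of traversal over its own storage and query language; the sketch only shows why following relationships is a natural operation when the data is modelled as a graph.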

Most major pharma companies are using knowledge graphs, but certainly not to the technology’s full potential. In most cases, the technology accounts for one in a hundred of their databases. Organisations often feel held back by their legacy data, but the reality is that they could be doing a great deal more with their data, and knowledge graphs hold the key.

In the Life Sciences sector, graph databases are largely used for identifying novel drug targets for new therapies, transforming clinical trials, and managing supply chains more dynamically. In future, when companies provide for greater interconnectivity between data from the outset, they’ll be able to approach high-throughput screening and compound registration in smarter and more efficient ways.

Exploring the possibilities

In late 2022, Neo4j hosted an event exploring the possibilities of knowledge graphs in Life Sciences. Emerging use cases ranged from drug discovery management to AstraZeneca’s application of the technology to chemical reactions, where it is used to predict new reactions and molecular synthesis, with the potential to circumvent or attack existing patents.

There has also been progress in classifying diabetes patients by combining patient data with a Graph Data Science library and machine learning. In addition, data innovators at GlaxoSmithKline explored the scope for a single, contextualised knowledge graph to improve clinical reporting workflows and overcome the issues of labour intensity, multiple handoffs and data transformations.

Knowledge graphs aren’t dependent on data sources having been prepared or formatted in a particular way. They can work with the native data structure, and queries can be framed as meaningful questions. For example, a query could identify the best clinician to approach for a clinical trial, based not only on their area of expertise but also on their availability and whether they have access to the right equipment. Queries can be performed at hyper speed, too: typically 3,000 times faster than a SQL database query, and across dense networks of knowledge.
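The clinical trial question above can be sketched as a small graph query. This is an illustration in plain Python, not Neo4j’s API or query language: the `Graph` class, the relationship names (`SPECIALISES_IN`, `AVAILABLE_IN`, `WORKS_AT`, `HAS_EQUIPMENT` and so on) and all the data are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny in-memory graph: edges stored as (source, relation, target) triples."""
    edges: set = field(default_factory=set)

    def add(self, source, relation, target):
        self.edges.add((source, relation, target))

    def targets(self, source, relation):
        """Return every node reachable from `source` over one `relation` edge."""
        return {t for s, r, t in self.edges if s == source and r == relation}


def eligible_investigators(graph, doctors, trial):
    """Doctors whose expertise, availability and site equipment all match the trial."""
    needed_area = graph.targets(trial, "STUDIES")
    needed_equipment = graph.targets(trial, "REQUIRES")
    needed_period = graph.targets(trial, "RUNS_IN")
    matches = []
    for doctor in doctors:
        has_expertise = needed_area <= graph.targets(doctor, "SPECIALISES_IN")
        is_available = needed_period <= graph.targets(doctor, "AVAILABLE_IN")
        # Equipment is two hops away: doctor -> site -> equipment.
        site_equipment = set()
        for site in graph.targets(doctor, "WORKS_AT"):
            site_equipment |= graph.targets(site, "HAS_EQUIPMENT")
        if has_expertise and is_available and needed_equipment <= site_equipment:
            matches.append(doctor)
    return sorted(matches)


# Hypothetical data for illustration only.
g = Graph()
g.add("trial_T1", "STUDIES", "type 2 diabetes")
g.add("trial_T1", "REQUIRES", "MRI scanner")
g.add("trial_T1", "RUNS_IN", "2025-Q3")
for doctor, area, period, site in [
    ("dr_a", "type 2 diabetes", "2025-Q3", "site_1"),
    ("dr_b", "type 2 diabetes", "2025-Q3", "site_2"),
    ("dr_c", "oncology", "2025-Q3", "site_1"),
]:
    g.add(doctor, "SPECIALISES_IN", area)
    g.add(doctor, "AVAILABLE_IN", period)
    g.add(doctor, "WORKS_AT", site)
g.add("site_1", "HAS_EQUIPMENT", "MRI scanner")

result = eligible_investigators(g, ["dr_a", "dr_b", "dr_c"], "trial_T1")
print(result)
```

Only `dr_a` satisfies all three conditions here: `dr_b` works at a site without the required equipment, and `dr_c` has a different specialism. The point of the sketch is that expertise, availability and equipment are answered in one pass over relationships rather than by joining separately prepared tables.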

Clinical trials face the challenge of reaching statistical significance among small populations with rare conditions. As some of the growing body of work in diabetes research shows, knowledge graphs can help with phenotype mapping between humans and animals by extrapolating and connecting data points that are phenotypically equivalent between studies of mice and of humans, where clinical parameters and observations aren’t immediately comparable.
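One way to picture phenotype mapping is as a shared node that species-specific measurements both point to. The sketch below is a simplified illustration in plain Python, not the method used in the diabetes research mentioned above: the parameter names, phenotype labels and subject identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping of species-specific parameters to a shared phenotype node.
PHENOTYPE_OF = {
    ("mouse", "fasting_glucose_mg_dl"): "elevated fasting glucose",
    ("human", "fasting_plasma_glucose_mmol_l"): "elevated fasting glucose",
    ("mouse", "body_weight_g"): "increased body mass",
    ("human", "bmi"): "increased body mass",
}


def link_by_phenotype(observations):
    """Group (species, parameter, subject_id) observations under shared phenotype nodes."""
    linked = defaultdict(lambda: defaultdict(list))
    for species, parameter, subject_id in observations:
        phenotype = PHENOTYPE_OF.get((species, parameter))
        if phenotype is not None:  # parameters with no mapping are skipped
            linked[phenotype][species].append(subject_id)
    return {p: dict(by_species) for p, by_species in linked.items()}


# Hypothetical observations for illustration only.
observations = [
    ("mouse", "fasting_glucose_mg_dl", "m-01"),
    ("human", "fasting_plasma_glucose_mmol_l", "h-17"),
    ("human", "bmi", "h-17"),
    ("mouse", "tail_length_mm", "m-02"),
]

linked = link_by_phenotype(observations)
print(linked["elevated fasting glucose"])
```

Once mouse and human records hang off the same phenotype node, a query can move from one study to the other even though the raw parameters and units differ.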

The future is data-based

The future of Life Sciences is data-driven. Pharma must plan for more sophisticated use of data to determine product roadmaps and navigate regulatory approvals more swiftly. While work must continue in earnest on standardising and fine-tuning the quality of everyday data, the more data that is fed into a knowledge graph, the more accurate the picture it is able to build. AI/machine learning can then be applied to that picture to surface the insights that matter.

A good analogy would be Google Maps, which builds reliable representations of the physical world from huge volumes of diverse data, a picture that won’t alter if the odd rogue data point creeps in.

Traditional relational database systems simply lack the capacity to deliver innovation. Understanding the value of relationships between data is every bit as important as what individual data points reveal. Without the ability to mine those correlations for new insights, companies will fail to make the connections that will lead to better products, more effective delivery and compliance—and ultimately enhanced patient outcomes.

About the author

The author is Dr Alexander Jarasch, Technical Consultant for Pharma and Life Sciences at native graph database leader Neo4j. He was previously Head of Data Management and Knowledge Management at Germany’s National Center for Diabetes Research (DZD). He is a visionary speaker on the future of clinical investigation, as well as AI and data management in the pharma and healthcare space, in particular the potential to advance pharmaceutical analytics and unlock terabytes of hard-to-parse research/trial data by revealing data relationships for better predictive accuracy.