Are You a Drug Company or a Data Company That Produces Drugs?
In this blog
- You are a data company that happens to make drugs
- The economic case: Data at rest is a liability, data in motion is a currency
- Why bury the treasure?
- The data-in-motion challenge is industry-wide, and each segment faces it differently
- Data readiness diagnostic
- The data value chain is longer than most organizations are managing
- Why agentic AI raises the stakes on data architecture
- Return on research (ROR): A more appropriate measure of the value chain
- How leading organizations are approaching this: Assessment, then architecture
- The competitive dynamic is already in motion
- Where does your organization stand?
- Download
Twenty years ago, renting a movie from Blockbuster was as routine as picking up groceries or grabbing takeout. It seemed as though there was a Blockbuster around every corner. Popping in to browse the aisles was simply part of life. Then streaming arrived, not as a distant threat but as a structural shift that made Blockbuster's physical model obsolete. Blockbuster saw it coming and still couldn't move fast enough. By 2010, the company with 9,000 stores had filed for bankruptcy. Today, one store survives, equal parts cultural relic and tourist attraction.
You might say every industry has its Blockbuster moment. The inflection point is where the old model still works, the numbers still look reasonable, and the urgency to change feels optional. Unfortunately, it is rarely recognized in real time. Rather, most see it clearly in retrospect.
Today, the structural shift underway in life sciences, across pharma, biopharma, biotech, medical devices, and academic research, is the move from data at rest to data in motion. From architectures built to store and retrieve information to architectures built to keep it flowing, analyzed, and active, generating a measurable return on research. Organizations that make this transition will find that their AI investments compound, their research cycles accelerate, and their operational decisions improve. Those who continue to optimize their legacy models will do so effectively, right up until the competitive gap becomes too difficult to close.
Understanding why that gap opens requires looking at what life sciences organizations do and what drives the speed and quality of everything they produce.
You are a data company that happens to make drugs
Every stage of the life sciences value chain is, at its core, a data process. Research generates molecular and genomic data. Preclinical work generates safety and efficacy data. Clinical trials generate outcome and adverse event data. Manufacturing generates process and quality data. Post-market surveillance generates real-world effectiveness data. The molecule or device is the end product of that chain, but what determines the speed, quality, and cost of every step is how well data moves through an organization.
Every one of those processes is also explicitly governed by regulatory frameworks such as FDA 21 CFR Part 11, ICH guidelines, and GxP requirements, or implicitly by organizational standards and controls that determine which data is trusted, traceable, and audit-ready.
Those frameworks make the point explicit: the primary output may be a drug, a device, or a diagnostic, but the actual engine of the business is data. It is not so much the process itself as the data, and the flow of data through that process, where organizations either add value or extract it.
That recognition changes what questions get asked at the strategic level. The conversation shifts from how do we develop better drugs to how do we make data move faster, more intelligently, and with greater visibility across every stage of development and operations. Those are different questions, and they lead to different investments.
Those investments, however, carry a non-negotiable constraint: security. Data moving faster and more intelligently across an organization only creates value if it moves safely. A single breach of patient data, proprietary compound data, or trial outcomes carries regulatory, legal, and reputational consequences that no efficiency gain offsets. Speed and security are not in tension in a well-architected data environment. But security cannot be an afterthought in one that is not.
The specifics vary by sector. Pharma and biopharma face this in drug development and manufacturing. Biotech faces it in early discovery and molecular modeling. Medical device companies face it in real-time sensor data and post-market performance tracking. Academic medical centers face it in translational research and clinical data integration. The details differ. The underlying dynamic does not. That dynamic has an economic consequence, and most organizations haven't fully priced it yet.
The economic case: Data at rest is a liability, data in motion is a currency
Think of your organization's data archives the way a CFO thinks about capital tied up in unsold inventory. It has real value — but only if it moves. Data sitting in storage isn't neutral. It is a boat anchor. It costs money to store, secure, and govern. It depreciates as it ages and as the context around it shifts. It creates regulatory exposure when it isn't properly cataloged or when retention policies aren't enforced. And like any idle asset, it generates no return while it sits. The principle that follows from this is not a technology observation — it is an economic one: data at rest is a liability, and data in motion is a currency.
Why bury the treasure?
Life sciences organizations are sitting on some of the most valuable data in any industry. Decades of clinical trial outcomes. Compound libraries. Genomic sequences. Real-world patient data. Manufacturing process records. The organizations that treat these archives as a treasure store — actively mining them to accelerate discovery, improve trial design, and reduce operational waste — are generating a measurable return on that research investment. Those who store it and ignore it are paying to maintain an asset they never use. In pirate terms: they buried the treasure and lost the map.
The scale of this idle capacity is considerable. According to IDC research, life sciences and healthcare organizations spend approximately half their data team capacity on data preparation and movement rather than on analysis. Gartner finds that 63% of organizations either lack the right data management practices to support AI initiatives or are unsure whether they have them. The gap between data generated and data actively used is widest precisely in the domains — manufacturing, clinical operations, post-market surveillance — where that data has the greatest operational value.
The problem is compounding. The instruments generating data are getting faster, more numerous, and higher in fidelity. Genome sequencers that once ran overnight now produce continuous streams. Imaging platforms that once generated periodic snapshots now generate near-continuous visual data. Bioreactor monitoring systems that once produced batch reports now generate real-time telemetry. In plain terms: the data is arriving faster than most organizations can do anything useful with it.
The numbers make the tension visible. Ninety-three percent of life sciences executives plan to increase investment in data, digital, and AI in 2025. Yet Gartner projects that 60% of AI projects lacking AI-ready data will be abandoned by 2026. Organizations are accelerating AI investment on top of infrastructure that isn't ready to support it — spending more to get less, because the underlying data isn't moving.
Data in motion is the architectural response. When data flows continuously through intelligent, governed pipelines — ingested, transformed, analyzed, and acted on in near real time — it generates value at every step rather than accumulating as a storage cost. The competitive advantage goes to the organizations that build this capability first and learn to exploit it systematically.
The organizations getting ahead of this aren't just solving a storage problem. They are beginning to think differently about what data is worth — not as a sunk cost of doing research, but as a compounding asset that drives measurable return on every dollar invested in discovery.
The gap looks different depending on where you sit in the industry — but it is present in every segment.
The data-in-motion challenge is industry-wide, and each segment faces it differently
While pharmaceutical and biopharma organizations are often the primary focus of conversations about data in life sciences, the same structural challenge runs through the broader industry and is arguably most visible in segments where real-time data has the most immediate operational consequence.
- Pharmaceutical and biopharma manufacturing. The stakes are in process efficiency and quality control. A bioreactor operating at slightly sub-optimal conditions loses yield efficiency that, at production scale, translates directly to cost and output. Real-time process data — temperature, pH, dissolved oxygen, agitation — fed into optimization models can identify and correct these conditions continuously, rather than waiting for batch review (a simplified sketch of this pattern follows this list). A 1.5% efficiency improvement in a large-scale drug manufacturing operation is not a marginal gain. It is a material financial outcome.
- Biotech and genomics. The volume and velocity of data from next-generation sequencing, cryo-EM, and high-content imaging are already straining batch-processing architectures. Variant calling, structural analysis, and imaging classification that wait overnight for batch jobs to run are operating at a fraction of the throughput these instruments can support.
- Medical devices. The move toward connected, sensor-equipped devices means manufacturers now have access to real-world performance data at a scale unavailable a decade ago. That data has value for product improvement, regulatory compliance, and patient outcome research — but only if it moves from device to analysis pipeline in a timely and governed way.
- Academic medical centers and research hospitals. Institutions often face coordination challenges across research domains that have historically operated with separate data systems, different naming conventions, and no shared data governance framework. The data is rich; the infrastructure to make it collectively useful is frequently underdeveloped.
- Contract research organizations (CROs). The challenge is the handoff: data generated in trials moves between sponsor organizations, CROs, regulatory bodies, and technology platforms, and every handoff is an opportunity for delay, loss of context, or governance failure. Getting those handoffs right is itself a competitive differentiator — for CROs and for the sponsors who choose them.
- Government and regulatory bodies. Agencies like the FDA run on the confidence they can place in the data submitted to them, and on how reliably that data arrives. A drug approval, a manufacturing inspection, a post-market surveillance requirement — each depends on data that is complete, traceable, and delivered on a timeline regulators can rely on. Organizations that can demonstrate continuous, governed data flows don't just satisfy regulatory requirements. They build the kind of institutional credibility that accelerates review cycles and reduces the friction in subsequent submissions.
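To make the contrast with batch review concrete, here is a minimal sketch of the continuous-monitoring pattern described in the manufacturing example above. The sensor names, setpoint ranges, and the simulated read_telemetry() feed are hypothetical stand-ins; a real deployment would subscribe to an instrument historian or streaming platform and route corrections through validated process control rather than print statements.

```python
from dataclasses import dataclass
import random
import time

# Hypothetical setpoint ranges for illustration; real values depend on the process.
SETPOINTS = {
    "temperature_c": (36.5, 37.5),
    "ph": (6.9, 7.3),
    "dissolved_o2_pct": (30.0, 60.0),
}

@dataclass
class Reading:
    sensor: str
    value: float
    timestamp: float

def read_telemetry() -> list[Reading]:
    """Stand-in for a real instrument, historian, or streaming feed."""
    now = time.time()
    return [
        Reading("temperature_c", random.gauss(37.0, 0.4), now),
        Reading("ph", random.gauss(7.1, 0.15), now),
        Reading("dissolved_o2_pct", random.gauss(45.0, 8.0), now),
    ]

def check(reading: Reading) -> str | None:
    """Compare one reading against its setpoint range and suggest a correction."""
    low, high = SETPOINTS[reading.sensor]
    if reading.value < low:
        return f"{reading.sensor} low ({reading.value:.2f}); nudge up toward {low}"
    if reading.value > high:
        return f"{reading.sensor} high ({reading.value:.2f}); nudge down toward {high}"
    return None

if __name__ == "__main__":
    for _ in range(5):  # a production loop runs continuously, not five iterations
        for r in read_telemetry():
            suggestion = check(r)
            if suggestion:
                print(f"correction suggested: {suggestion}")
        time.sleep(1)
```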
Across all these segments, the underlying dynamic is the same: data is generated faster than it is used, and the gap between generation and utilization is where value is lost.
The diagnostic below is designed to help senior leaders identify where that gap exists in their own organization — and how wide it has become.
Data readiness diagnostic
Mark Yes, Sometimes, or No for each question.
- Does your team spend more time moving and cleaning data than actually analyzing it?
☐ Yes ☐ Sometimes ☐ No
- Do AI initiatives perform well in proof of concept but struggle to reach full production?
☐ Yes ☐ Sometimes ☐ No
- If asked right now where a specific critical dataset lives and who owns it, would anyone struggle to answer with confidence?
☐ Yes ☐ Sometimes ☐ No
- Is security typically addressed after a data initiative is already underway rather than at the start?
☐ Yes ☐ Sometimes ☐ No
- Are your instruments — sequencers, bioreactors, imaging platforms — generating data that isn't being acted on in real time?
☐ Yes ☐ Sometimes ☐ No
- Is your organization still running batch-processing pipelines for workloads that would benefit from continuous data flow?
☐ Yes ☐ Sometimes ☐ No
- Does your current infrastructure make it difficult to scale an AI workload from a successful pilot to full production?
☐ Yes ☐ Sometimes ☐ No
Mostly Yes or Sometimes: The data architecture may be optimized for conditions that are changing faster than it can adapt. The path forward is well-defined, and the infrastructure to get there exists today. See the call to action at the end of this article.
Mostly No: A solid foundation is in place. The question is whether data in motion is being fully leveraged as the strategic and competitive asset it has become.
The same principle applies at the level of the full data value chain — and most organizations are only managing part of it.
The data value chain is longer than most organizations are managing
Most life sciences organizations have built strong data capabilities for the stages of the value chain they understand best: discovery, development, and the regulatory path to market. These are the areas where investment is concentrated and where data governance is most mature. The middle of the chain is well-instrumented. The front and back ends are not.
Think of an integrated oil company. It has sophisticated upstream operations in exploration and extraction, and agile downstream operations in refining and distribution. But if the midstream is missing, if there is no optimized pipeline connecting the two, the efficiency gains at either end are limited by the bottleneck in the middle. Life sciences organizations face the same structural problem, just with data instead of crude. The midstream, the governed and continuous flow of data connecting early research to late-stage development to post-market operations, is the missing middle for most organizations, and it is where the greatest optimization opportunity sits.
Upstream, the data generated in the earliest stages of discovery, including computational chemistry, genomic screening, and early molecular modeling, is often the least well-governed and least connected to the downstream processes it should be informing. The meta-information that would allow researchers to understand the provenance, quality, and relevance of upstream data is frequently absent, making it difficult to build reliable research processes on top of it.
Downstream, most pharmaceutical organizations have limited visibility into what happens after a product reaches distribution. Supply chain integrity, real-world outcomes, adverse event signals, and pharmacovigilance data exist, but they often sit in disconnected systems, are collected reactively rather than continuously, and are rarely integrated back into the development process to improve the next generation of products. Post-market data, properly connected, also has a role that is rarely fully exploited: informing customer retention, lifecycle management, and the commercial strategies that extend a product's value well beyond its launch window.
The organizations that will extract the most value from their data are the ones that extend their architecture to cover this full chain, upstream through downstream, research environment through market environment, with the midstream governed and optimized as deliberately as any other stage. That is not a technology ambition. It is a business one.
Getting the data value chain right matters now more than ever, because what organizations are building on top of it has never been more demanding.
Why agentic AI raises the stakes on data architecture
The numbers behind AI investment in life sciences are no longer speculative. The pharmaceutical industry alone is projected to grow its AI investment from roughly $4 billion today to more than $25 billion by 2030, according to McKinsey. Generative AI could deliver between $60 billion and $110 billion in annual value to the life sciences, with the greatest impact in research, clinical development, and commercial operations. Organizations across the industry are investing accordingly.
The harder question is whether the underlying data infrastructure will support the ambitions being built on top of it.
To understand why that question matters, it helps to understand what the most valuable category of AI actually does. Most people are familiar with AI that responds to questions or generates content. Agentic AI is different. It takes multi-step actions on its own, without a human approving each step. Think of it less like a search engine and more like a capable employee who can be given a goal and trusted to work toward it independently. In life sciences, that means systems that can monitor a manufacturing process and adjust it in real time, flag anomalies in clinical trial data before they become problems, or coordinate a sequence of research tasks without waiting for human handoffs at each stage.
The operational value is significant. So are the data requirements. An agentic system managing a bioreactor in real time does not just need a capable AI model. It needs continuous, high-quality data flowing reliably to that model, with the speed, consistency, and security that autonomous operational decisions demand. If the data is delayed, inconsistent, or ungoverned, the system cannot function as intended. And unlike a human analyst who can recognize and work around a data quality problem, an autonomous system will act on what it receives.
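What that guardrail can look like in practice is easier to see in a few lines of code. The sketch below is illustrative only: the event fields, freshness threshold, and escalation path are hypothetical, but the pattern is the point: check completeness, freshness, and provenance before an autonomous step is allowed to act, and hand off to a human when the check fails.

```python
from dataclasses import dataclass
import time

MAX_AGE_SECONDS = 30  # hypothetical freshness requirement for this illustration
REQUIRED_FIELDS = {"batch_id", "sensor", "value", "units", "timestamp", "source_system"}

@dataclass
class GateResult:
    ok: bool
    reason: str = ""

def gate(event: dict) -> GateResult:
    """Reject incomplete, stale, or unattributed data before an agent acts on it."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        return GateResult(False, f"missing fields: {sorted(missing)}")
    age = time.time() - event["timestamp"]
    if age > MAX_AGE_SECONDS:
        return GateResult(False, f"stale data: {age:.0f}s old")
    return GateResult(True)

def act(event: dict) -> None:
    """Take the autonomous step only when the data passes the gate."""
    result = gate(event)
    if not result.ok:
        print(f"escalating to an operator instead of acting: {result.reason}")
        return
    print(f"agent acting on {event['sensor']}={event['value']} {event['units']}")

if __name__ == "__main__":
    act({
        "batch_id": "B-042",
        "sensor": "ph",
        "value": 7.1,
        "units": "pH",
        "timestamp": time.time(),
        "source_system": "bioreactor-3",
    })
```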
When the data foundation is not in place, the consequences show up quickly. The pattern is consistent across industries: organizations invest in AI, build promising pilots, and then discover that moving from pilot to production requires solving a data problem they had not fully anticipated. The model performs well in testing, where the data is clean and carefully prepared. It struggles in production, where data arrives late, in inconsistent formats, and without the governance controls that reliable decisions require. Gartner projects that through 2026, organizations will abandon 60% of AI projects that lack AI-ready data. That is not a prediction about AI. It is a prediction about data infrastructure.
The sequence matters. The data infrastructure is the foundation. The AI is what runs on top of it. Organizations that try to build the second without the first are not accelerating their AI ambitions. They are setting up the conditions for the abandonment rate Gartner is projecting.
Understanding the cost of poor data infrastructure is one thing. Measuring the value of getting it right requires a different framework entirely.
Return on research (ROR): A more appropriate measure of the value chain
Traditional ROI works well when inputs and outputs are direct and measurable. Spend X, get Y in return, calculate the difference. In life sciences research, that equation breaks down. The connection between a dollar invested in upstream data infrastructure and a drug that eventually reaches patients is long, indirect, and shaped by factors that standard financial metrics were not designed to capture. Applying a traditional ROI lens to research investment doesn't give an incomplete picture. It gives a misleading one.
WWT developed the Return on Research (ROR) framework to address this important issue. ROR is a multi-dimensional value framework that measures the full impact of investments in research-enabling infrastructure, including HPC, AI, and machine learning platforms, advanced data systems, and medical imaging, in environments where traditional ROI falls short. The question it asks is more relevant for life sciences organizations: what actual scientific and operational value is being generated per dollar invested in upstream research and data activities?
The stakes behind that question are significant. Deloitte's 2024 analysis found that the average cost to bring a new drug to market had risen to $2.23 billion, a figure that reflects not just successful development but also the accumulated cost of failed trials and abandoned candidates. The same analysis found that pharmaceutical organizations spent $7.7 billion in 2024 on trials for candidates that were ultimately terminated. That attrition cost is not just a financial loss. It represents research investment that generated data, consumed resources, and produced outcomes that were rarely connected back to the decisions that preceded them in any systematic way.
The practical challenge with ROR is that most organizations have not built the data infrastructure to measure it. Research investment goes in. Drug candidates and data assets come out. The connection between the two is tracked imprecisely, if at all.
Improving ROR requires visibility into what is happening to data across the research pipeline: what information is being generated, how it is used, its quality, and how it connects to downstream outcomes. This is only possible when data has what might be called provenance, the documented record of its origin, processing history, relationships to other data, and relevance to specific research questions.
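What a minimum provenance record captures is easier to show than to describe. The sketch below uses hypothetical field names and dataset identifiers; in practice this information lives in a catalog or lineage tool rather than application code, but the shape of it is what matters: origin, ownership, processing history, upstream relationships, and the research question the data was meant to answer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProcessingStep:
    tool: str            # the pipeline, script, or instrument software that touched the data
    version: str
    run_at: datetime
    description: str

@dataclass
class ProvenanceRecord:
    dataset_id: str
    origin: str                                            # instrument, study, or source system
    created_at: datetime
    owner: str                                             # accountable team or individual
    processing_history: list[ProcessingStep] = field(default_factory=list)
    derived_from: list[str] = field(default_factory=list)  # upstream dataset identifiers
    research_context: str = ""                             # the question the data was generated to answer

# An entirely hypothetical example of a governed dataset entry
record = ProvenanceRecord(
    dataset_id="ngs-run-2025-0117",
    origin="sequencer-07",
    created_at=datetime(2025, 1, 17, tzinfo=timezone.utc),
    owner="genomics-core",
    derived_from=["sample-manifest-2025-01"],
    research_context="variant screening for a hypothetical candidate program",
)
record.processing_history.append(
    ProcessingStep("variant-caller", "2.4.1", datetime.now(timezone.utc),
                   "alignment and variant calling")
)
print(record.dataset_id, "->", [s.tool for s in record.processing_history])
```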
When data has that level of documentation and governance, it becomes intellectual property that can be tracked, valued, and improved over time. When it sits undocumented in a lake or warehouse, it is a storage cost with an uncertain shelf life. The transition from the latter to the former is one of the most consequential investments a life sciences organization can make, and it pays returns not just in the current research cycle but in every one that follows.
How leading organizations are approaching this: Assessment, then architecture
The organizations that make the most progress on data infrastructure do not typically begin with a technology decision. They begin with an honest inventory of where they are. This distinction matters more than it might seem. Choosing a platform before understanding the data is like designing a highway before studying where people need to go.
The questions that matter at the start are not technical. They are organizational. Who owns which data, and who is accountable for its quality and governance? Where does it live, and is that the result of a deliberate decision or historical default? Who needs access to it, and under what security and governance constraints? What does the actual workflow look like, not the idealized version in a process diagram, but the one teams are running every day? And which steps in that workflow exist because of genuine operational need, rather than habits nobody has stopped to question?
WWT's engagement with one of the world's largest academic medical campuses began exactly this way. Before any infrastructure decisions were made, the work started with the fundamentals: building a shared data dictionary. It emerged that different research teams were using the same terminology to describe different datasets, and the same datasets were being identified by different names across departments. That alignment work is foundational, and it is often more time-consuming than anticipated. But it is the necessary precondition for any data architecture to function as intended.
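A shared data dictionary does not need to be elaborate to do its job. The sketch below, with entirely hypothetical dataset names, aliases, and owners, shows the core idea: one canonical definition per dataset, with the alternate names that different teams actually use mapped back to it.

```python
# A minimal, hypothetical shape for a shared data dictionary entry. Real dictionaries
# usually live in a catalog tool; the mapping from team-specific names to one
# canonical definition is the part that matters.
DATA_DICTIONARY = {
    "patient_encounter": {
        "definition": "A single patient visit or admission captured in clinical systems.",
        "owner": "clinical-informatics",
        "known_aliases": ["encounter", "visit", "admission_record"],  # names used by different teams
        "source_systems": ["ehr-core", "registry-db"],
    },
}

def resolve(term: str) -> str | None:
    """Map a team-specific name back to the canonical dataset name."""
    term = term.lower()
    for canonical, entry in DATA_DICTIONARY.items():
        if term == canonical or term in entry["known_aliases"]:
            return canonical
    return None

print(resolve("visit"))             # -> patient_encounter
print(resolve("unknown_dataset"))   # -> None
```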
Once the current state is understood, the architecture can be designed to grow with the organization. HPE Private Cloud AI (PCAI), deployed through WWT's Advanced Technology Center (ATC) and validated in the AI Proving Ground (AIPG), is built for exactly this kind of phased growth. Organizations can validate their approach at proof-of-concept scale in the AIPG, extend it to a managed environment, and scale to full production on the same architectural foundation, without a costly rebuild at each stage.
Four requirements should be treated as structural from the start, not features to be added later: security, attestability, traceability, and observability. Security for data in motion requires a fundamentally different approach than securing data at rest. Attestability means being able to demonstrate, to regulators and internal stakeholders alike, that data meets the standards required for the decisions being made with it. Traceability means knowing where data came from, how it was processed, and what changed along the way. Observability means having real-time visibility into what is happening across pipelines and AI workflows so problems can be identified and corrected before they compound. Organizations that treat any of these as afterthoughts will find themselves retrofitting them into systems that were not designed to support them, at significant cost and with risk that does not fully go away.
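Traceability, observability, and attestability become much cheaper when the pipeline emits them itself rather than reconstructing them after the fact. The sketch below is a minimal illustration of that idea, with a hypothetical step name and a plain logger standing in for a real audit and monitoring stack; security controls such as encryption and access management sit outside what a few lines of code can show.

```python
import hashlib
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline-audit")

def _digest(obj: dict) -> str:
    """Content hash of a payload, used to tie outputs back to exact inputs."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def audited_step(step_name: str, transform, payload: dict) -> dict:
    """Run one pipeline step and emit a structured record of what happened to the data."""
    started = time.time()
    result = transform(payload)
    record = {
        "step": step_name,
        "input_sha256": _digest(payload),                # traceability: exactly what went in
        "output_sha256": _digest(result),                # traceability: exactly what came out
        "duration_s": round(time.time() - started, 3),   # observability: how the step behaved
        "completed_at": time.time(),
    }
    log.info(json.dumps(record))                         # attestability: an auditable trail per step
    return result

# Hypothetical usage: a trivial unit-conversion step
cleaned = audited_step(
    "normalize_units",
    lambda d: {**d, "value_c": round((d["value_f"] - 32) / 1.8, 2)},
    {"sensor": "temp", "value_f": 98.6},
)
print(cleaned)
```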
The organizations building these capabilities now are not just solving a current problem. They are creating distance from the ones that are not.
The competitive dynamic is already in motion
Blockbuster's failure was not a failure of execution. The company ran its stores well. It understood its customers. It had refined its model over decades. What it failed to recognize was that the structural shift underway, from physical inventory to streaming, would make that well-optimized model progressively irrelevant. The lesson is not that Blockbuster made poor operational decisions. It is that optimizing a model that is becoming obsolete is not a viable long-term strategy, regardless of how well the optimization is executed.
Life sciences organizations are not at the same cliff edge. But the structural shift from data at rest to data in motion is real, it is accelerating, and the organizations investing now in cohesive, scalable data infrastructure are building a compounding advantage over those that are not. Every year of deferred investment widens the gap.
The good news is that the path forward is well-defined and the expertise to execute it exists today. The challenge is recognizing that this is a strategic decision, not a technical one, and that it belongs at the level of the organization where strategy is set. Data infrastructure is not an IT conversation. It is a competitive positioning conversation.
Data at rest is a cost. Data in motion is a currency. The organizations that internalize that distinction and build their infrastructure around it will find that their AI investments perform better, their research cycles move faster, their operational decisions improve, and their ability to see and manage the full value chain strengthens. The ones that do not will find the gap harder to close with each passing year.
Where does your organization stand?
If the diagnostic earlier in this article surfaced more Yes or Sometimes answers than expected, the practical starting point is a data readiness assessment: a structured review of where data lives, who governs it, how it flows across the value chain, and what the gap is between the current state and what AI and agentic workflows actually require.
WWT works with pharmaceutical, biopharma, biotech, medical device, and academic research organizations at every stage of this journey. The path typically follows three steps.
First, a data readiness assessment. The structured review described above: where data lives, who governs it, how it flows, and how far the current state sits from what the organization's AI ambitions demand.
Second, proof-of-concept validation. WWT's AI Proving Ground provides a low-risk environment to validate the architecture and test AI workloads against real data before committing to production infrastructure.
Third, architecture and deployment. HPE Private Cloud AI, deployed through WWT's Advanced Technology Center, supports phased growth from departmental scale to supercomputing-class infrastructure without requiring a rebuild at each stage.