From data to infrastructure: why synthetic health data hubs are redefining healthcare innovation

Synthetic data is not just a privacy tool. It is the foundation of a new healthcare data infrastructure—one that makes data usable at scale.

In recent years, healthcare innovation has been built on a simple assumption: more data leads to better outcomes.

And yet, this paradigm is starting to show its limits.

The healthcare sector generates vast amounts of data, but most of it remains unused—not because we lack the technology to process it, but because we lack the ability to make it usable.

The real issue is not data scarcity, but systemic data unusability. Healthcare data is fragmented, heterogeneous, and constrained by regulatory and organizational barriers, making its use—especially in AI and clinical research—slow and difficult.

This creates a paradox: we have more data than ever, but less ability to use it effectively.

The challenge is not accessing data, but making it usable in a secure, scalable, and repeatable way.

Synthetic data is often framed as a technical solution to privacy constraints—artificially generated data that preserves the statistical properties of real data without exposing identifiable information. But this view is too narrow.
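As a rough illustration of what "preserving statistical properties" can mean, the sketch below fits the means, spreads, and correlation of two numeric columns and then samples new records from the fitted distribution. It is a toy under stated assumptions: the "real" records are themselves invented (age and systolic blood pressure), a bivariate Gaussian stands in for the far richer generative models used in practice, and the function names are hypothetical. Matching summary statistics does not by itself say anything about privacy.

```python
import math
import random

def fit_params(rows):
    """Estimate means, standard deviations and correlation of two numeric columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in rows) / (n - 1))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return mean_x, mean_y, sd_x, sd_y, cov / (sd_x * sd_y)

def sample_synthetic(params, n, rng):
    """Draw n new (x, y) pairs from a bivariate Gaussian with the fitted parameters."""
    mean_x, mean_y, sd_x, sd_y, rho = params
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + sd_x * z1
        # Mixing z1 into y reproduces the fitted correlation rho.
        y = mean_y + sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows

rng = random.Random(0)

# Stand-in for real records: (age, systolic blood pressure), invented for illustration.
real = []
for _ in range(2000):
    age = rng.gauss(60, 12)
    real.append((age, 100 + 0.5 * age + rng.gauss(0, 10)))

real_params = fit_params(real)
synthetic = sample_synthetic(real_params, 5000, rng)
synthetic_params = fit_params(synthetic)

print("real     :", [round(v, 2) for v in real_params])
print("synthetic:", [round(v, 2) for v in synthetic_params])
```

No synthetic row corresponds to a specific source row, yet the two printed summaries should be close. That closeness is the narrow, technical sense of the definition above.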

In the broader context of healthcare innovation, synthetic data represents something more: a way to transform data into something operational. It is not just about protecting data, but about making it shareable, scalable, and usable in complex environments.

In other words, synthetic data does not just enable access to data. It enables the use of data.

The key shift: from datasets to infrastructures

This shift becomes particularly clear in the concept of synthetic health data hubs, such as the one proposed in Friuli Venezia Giulia.

In these models, value does not lie in producing new datasets, but in building an infrastructure that allows data to circulate, be used, and generate value across an ecosystem. The hub becomes an intermediary layer: a controlled access point and an environment for experimentation and development. This is aligned with the broader European regulatory evolution, particularly the European Health Data Space, which establishes health data access bodies for secondary use, and the Data Governance Act, which introduces data intermediation services: entities that enable regulated access to and use of data rather than simply holding it.

The focus shifts from “what data do we have?” to “what infrastructure do we have for using it?”

This shift has immediate consequences.

In clinical trials, synthetic control arms can replace or reduce the need for concurrent control groups, lowering recruitment requirements and accelerating timelines. Regulatory acceptance is already emerging: the FDA has approved treatments on the basis of external control data from real-world sources (e.g., Nulibry), and the methodology is being formalized through propensity score matching and explicit causal frameworks that address selection bias and support comparability.
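To make the propensity score matching step concrete, here is a minimal sketch. It is not the method used in any specific regulatory submission. It assumes a single covariate (age), invented age distributions for the treated arm and the external pool, a hand-written logistic regression, and 1:1 nearest-neighbor matching with replacement. All names are hypothetical.

```python
import math
import random

def fit_logistic(xs, ts, lr=0.5, epochs=300):
    """Fit P(treated | x) = sigmoid(a + b * x) by batch gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - t
            grad_b += (p - t) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def nearest_control(target_score, scored_controls):
    """Return the age of the control whose propensity score is closest."""
    return min(scored_controls, key=lambda c: abs(c[0] - target_score))[1]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
# Invented data: trial patients are younger than the external control pool.
treated_ages = [rng.gauss(55, 8) for _ in range(100)]
external_ages = [rng.gauss(65, 10) for _ in range(400)]

# Standardize age so gradient descent behaves well.
ages = treated_ages + external_ages
labels = [1] * len(treated_ages) + [0] * len(external_ages)
mu = mean(ages)
sd = math.sqrt(mean([(v - mu) ** 2 for v in ages]))
a, b = fit_logistic([(v - mu) / sd for v in ages], labels)

def propensity(age):
    """Estimated probability of being in the treated arm, given age."""
    return 1.0 / (1.0 + math.exp(-(a + b * (age - mu) / sd)))

scored_controls = [(propensity(v), v) for v in external_ages]
matched_ages = [nearest_control(propensity(v), scored_controls) for v in treated_ages]

gap_before = mean(external_ages) - mean(treated_ages)
gap_after = mean(matched_ages) - mean(treated_ages)
print(f"mean age gap before matching: {gap_before:.2f} years")
print(f"mean age gap after matching:  {gap_after:.2f} years")
```

With one covariate the score is just a monotone function of age, so this only shows the mechanics. In practice the score summarizes many covariates, and balance is checked on each of them after matching.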

In real-world evidence (RWE) studies, synthetic data can be used to simulate patient populations, allowing researchers to test hypotheses and validate models without repeating complex approval processes each time.

In healthcare operations, hospitals can use synthetic data to simulate patient flows, optimize resource allocation, or test new organizational strategies before implementing them in real settings. Early proposals, such as the synthetic health data hub for Friuli Venezia Giulia, project significant capacity gains, including an increase of up to 17% in diagnostic throughput (CT/MRI scans) and measurable reductions in waiting times, which suggests that infrastructure-based approaches can deliver tangible operational value.
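A patient-flow simulation of this kind can be sketched in a few lines. The example below is an assumption-laden toy, not the model behind the figures cited above. It simulates a first-come, first-served queue for imaging scanners with exponentially distributed arrival gaps and scan durations, and compares average waiting time for two and three scanners. The parameter values are invented.

```python
import heapq
import random

def simulate_waits(n_patients, n_scanners, mean_interarrival, mean_scan, seed):
    """Average wait (minutes) in a first-come, first-served multi-scanner queue."""
    rng = random.Random(seed)
    # Min-heap holding the time at which each scanner next becomes free.
    free_at = [0.0] * n_scanners
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(n_patients):
        # Next patient arrives after a random gap.
        clock += rng.expovariate(1.0 / mean_interarrival)
        # The patient takes whichever scanner frees up first.
        earliest_free = heapq.heappop(free_at)
        start = max(clock, earliest_free)
        total_wait += start - clock
        heapq.heappush(free_at, start + rng.expovariate(1.0 / mean_scan))
    return total_wait / n_patients

# Hypothetical workload: one arrival every 10 minutes on average, 18-minute scans.
wait_two = simulate_waits(20000, 2, 10.0, 18.0, seed=42)
wait_three = simulate_waits(20000, 3, 10.0, 18.0, seed=42)
print(f"average wait with 2 scanners: {wait_two:.1f} min")
print(f"average wait with 3 scanners: {wait_three:.1f} min")
```

Using the same seed for both runs gives both configurations the identical sequence of arrivals and scan durations, so the difference in waiting time comes only from the extra scanner. A hub could feed such a model with synthetic arrival patterns instead of the invented distributions used here.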

In all these cases, the value lies not in the data itself, but in its operational usability.

This infrastructure-based approach is gaining institutional recognition beyond data governance. Regulatory bodies are beginning to formalize frameworks for evaluating AI-generated evidence in clinical research—including the EMA’s ongoing reflection paper on the use of external controls for evidence generation and the recently finalized ICH M15 guideline on model-informed drug development. These developments signal a shift: synthetic data is no longer treated as a workaround, but as a methodological tool that, when properly validated, can support regulatory decision-making.

Europe: from data protection to data activation

This transition is particularly relevant in the European context.

The European Union has built one of the most advanced frameworks for data protection. However, the same framework has also slowed the use of data.

In this setting, synthetic data represents one of the few viable paths to reconcile fundamental rights protection, technological innovation, and economic development.

A critical question in this transition is regulatory clarity around synthetic data itself: when does it qualify as fully anonymized under GDPR, and when does it remain personal data subject to the same constraints? While guidance is still evolving, recent certifications (such as Europrivacy for healthcare applications) suggest that high-quality synthetic data—generated through validated models with demonstrably low re-identification risk—can be treated as anonymous. This distinction is not merely technical: it determines whether synthetic data hubs can function as true infrastructure, or remain constrained within the same governance bottlenecks they aim to overcome.
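One simple signal that a re-identification assessment might look at is how often a synthetic record reproduces a real person's combination of quasi-identifiers. The sketch below computes that exact-match rate on a handful of invented records. The field names and the function are hypothetical, and the metric is a crude proxy: it is neither a legal test of anonymity under GDPR nor a description of how any certification scheme evaluates risk.

```python
def exact_match_rate(real_rows, synthetic_rows, quasi_identifiers):
    """Share of synthetic rows whose quasi-identifier combination appears in the real data."""
    real_keys = {tuple(row[q] for q in quasi_identifiers) for row in real_rows}
    hits = sum(
        tuple(row[q] for q in quasi_identifiers) in real_keys
        for row in synthetic_rows
    )
    return hits / len(synthetic_rows)

# Invented records for illustration only.
real_rows = [
    {"birth_year": 1950, "sex": "F", "postcode": "33100", "diagnosis": "I10"},
    {"birth_year": 1972, "sex": "M", "postcode": "34121", "diagnosis": "E11"},
    {"birth_year": 1988, "sex": "F", "postcode": "33170", "diagnosis": "J45"},
]
synthetic_rows = [
    {"birth_year": 1950, "sex": "F", "postcode": "33100", "diagnosis": "E11"},
    {"birth_year": 1961, "sex": "M", "postcode": "34121", "diagnosis": "I10"},
    {"birth_year": 1990, "sex": "F", "postcode": "33170", "diagnosis": "J45"},
    {"birth_year": 1975, "sex": "M", "postcode": "33100", "diagnosis": "E11"},
]

rate = exact_match_rate(real_rows, synthetic_rows, ["birth_year", "sex", "postcode"])
print(f"synthetic rows matching a real quasi-identifier combination: {rate:.0%}")
```

Here one of the four synthetic rows shares birth year, sex, and postcode with a real record, so the rate is 25%. A realistic assessment would combine several such measures and consider what an attacker could plausibly know.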

Synthetic health data hubs can therefore be seen as an attempt to build a new data infrastructure—one that overcomes the limitations of the current model.

If value shifts from datasets to infrastructure, the role of key players also changes. The focus is no longer on producing or analyzing data alone, but on enabling systems where data can be generated, used, and continuously leveraged. In this context, synthetic data technologies take on a new role: not as standalone tools, but as core components of this emerging infrastructure layer.

The challenge today is not to have more data. It is to make data usable at scale.

This requires a shift: from access to generation, from datasets to infrastructure, from availability to operability.

Synthetic health data hubs represent one of the first concrete attempts to build this new architecture. And most likely, they are not an exception. They are the beginning of a new phase.
