AI in Spatial Biology: Why Data Quality Matters
Jay H. Lee, CEO & Co-founder · Apr 15, 2025
Over the past year, artificial intelligence (AI) has made astounding technological leaps. In the context of spatial biology, where we aim to resolve and quantify molecular targets within intact tissues, AI offers something potentially transformative: panel design, image segmentation, tissue phenotyping, and exploratory biomarker discovery could, in theory, all happen faster, cheaper, and at scale.
But here’s the catch: AI is only as good as the substrate it's built on.
Having worked in spatial biology since its inception, and having co-developed some of the earliest in situ sequencing technologies as well as multiple spatial genomics and proteomics platforms in academia and industry, I believe we are at a crossroads. Many in the field are racing to layer AI on top of incomplete, noisy, or ill-defined datasets. This is not new: we saw it play out in genomics and transcriptomics, and now it is unfolding in proteomics.
Sophisticated models will hallucinate if they’re trained on sparse or biased input. We’ve all seen this - AI output that fills in the blanks with superficially plausible, but fundamentally incorrect inferences. This is not a failure of AI; it’s a failure of data quality and representation.
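To make that failure mode concrete, here is a toy sketch in Python. Everything in it is invented for illustration, not drawn from any spatial platform: an over-flexible model is fit to a handful of noisy samples from a narrow range, interpolates plausibly, and then extrapolates with confident nonsense.

```python
# Toy illustration (invented data): a flexible model fit to sparse samples
# from a narrow range "fills in the blanks" badly outside that range.
import numpy as np

rng = np.random.default_rng(1)

# Sparse, noisy observations of a smooth signal, sampled only on [0, 3].
x_train = np.sort(rng.uniform(0.0, 3.0, 8))
y_train = np.sin(x_train) + rng.normal(0.0, 0.05, x_train.size)

# An over-flexible model for this little data: a degree-6 polynomial.
coeffs = np.polyfit(x_train, y_train, deg=6)

# Inside the sampled range the fit looks plausible; outside it, the model
# typically returns values that have nothing to do with the true signal.
for x in [1.5, 4.0, 6.0]:
    print(f"x={x}: model={np.polyval(coeffs, x):+9.2f}, truth={np.sin(x):+5.2f}")
```

The model is not broken; its training data simply never covered the region it is being asked about. Sparse or biased tissue data does the same thing to spatial models, just less visibly.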
And the problem in spatial biology is that the data itself (which molecules are detected, in which tissues, at what sensitivity, and with what precision) is still deeply uneven across platforms. Resolution, quantitation, reproducibility, and target coverage all vary, and are often poorly standardized. That makes it nearly impossible to draw clear, actionable conclusions without multiple layers of confirmation.
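As one small example of what such a confirmation layer could look like, here is a hypothetical QC sketch: compute the coefficient of variation (CV) for each target across replicate sections and flag anything too variable before it reaches downstream analysis. The marker names, counts, and 20% cutoff are all assumptions for demonstration, not a field standard.

```python
# Hypothetical reproducibility gate: flag targets whose signal varies too
# much across replicate tissue sections. All values here are illustrative.
import pandas as pd

# Rows = replicate sections, columns = protein targets (invented counts).
counts = pd.DataFrame(
    {"CD3": [110, 105, 98], "CD8": [45, 44, 47], "PD-L1": [12, 38, 5]},
    index=["rep1", "rep2", "rep3"],
)

cv = counts.std() / counts.mean()         # per-target coefficient of variation

reliable = cv[cv <= 0.20].index.tolist()  # -> ['CD3', 'CD8']
flagged = cv[cv > 0.20].index.tolist()    # -> ['PD-L1']

print(f"reliable: {reliable}, flagged for review: {flagged}")
```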
This reminds me of the early days of next-gen sequencing. Back then, genome drafts were a breakthrough, but filled with errors, gaps, and ambiguous variant calls. They were exploratory tools, not diagnostic-grade data. It wasn’t until deep, targeted sequencing became routine that we started to get reliable, interpretable, and clinically useful results.
So the question is: what is the spatial biology equivalent of deep sequencing?
What my co-founders and I are building now is our attempt to answer that question. We’re not chasing multiplexing for multiplexing’s sake. We’re focused on technology development that gives us signal quality, reproducibility, and the ability to derive biologically grounded, clinically relevant insights. That includes feeding AI models only what they can digest meaningfully - high-density, high-confidence molecular maps, not experimental noise masked by false precision.
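A minimal sketch of that gating idea follows; the column names, metrics, and thresholds are invented for illustration, not a description of our actual pipeline.

```python
# Illustrative confidence gate: keep only detections that clear explicit
# quality thresholds before they ever reach a training set. Column names
# and thresholds are assumptions, not a real platform's schema.
import pandas as pd

detections = pd.DataFrame({
    "target":     ["CD3", "CD3", "HER2", "HER2", "Ki67"],
    "x_um":       [12.1, 48.9, 103.2, 77.5, 5.4],
    "y_um":       [30.0, 15.2, 88.8, 61.1, 42.7],
    "snr":        [9.8, 1.4, 7.2, 0.9, 6.5],       # signal-to-noise ratio
    "confidence": [0.98, 0.55, 0.91, 0.40, 0.88],  # per-call confidence
})

MIN_SNR, MIN_CONF = 3.0, 0.8
training_input = detections[
    (detections["snr"] >= MIN_SNR) & (detections["confidence"] >= MIN_CONF)
]

print(f"kept {len(training_input)} of {len(detections)} detections")  # 3 of 5
```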
In research, mistakes are tolerable. In translational and clinical contexts, they are expensive and potentially dangerous if applied prematurely. Despite the excitement over spatial biology, I submit that today’s technologies are not yet adequate for those settings.
Spatial biology has the potential to redefine diagnostics and drug development, but only if the foundation is solid. That means focusing less on flashy front ends, and more on rigorous assay chemistry, calibration, and validation. Only then can AI add true value by accelerating discovery, not amplifying errors.
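To ground "calibration" in something tangible: at its simplest, it means fitting a response curve to controls of known abundance and refusing to interpret signal outside the calibrated range. The sketch below is hypothetical, with invented numbers; a real assay needs validated standards and ongoing QC.

```python
# Hedged calibration sketch: fit a linear response from controls of known
# abundance, then convert raw intensity to estimated abundance. The control
# values are invented for illustration only.
import numpy as np

known_abundance = np.array([1.0, 5.0, 10.0, 50.0, 100.0])       # controls (a.u.)
measured_signal = np.array([210., 980., 2050., 9800., 19500.])  # raw intensity

slope, intercept = np.polyfit(known_abundance, measured_signal, deg=1)

def estimate_abundance(signal: float) -> float:
    """Invert the fitted response; only valid inside the calibrated range."""
    return (signal - intercept) / slope

print(f"{estimate_abundance(5000.0):.1f}")  # raw signal -> abundance (a.u.)
```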
This is where the next leap will come from: not from another algorithm or another public dataset, but from solving the hard problem of data fidelity at the tissue level, across diverse clinical settings, and feeding that into AI systems that can learn from truth, not speculation.
Although I could be proven wrong, my prediction is that quantity of features cannot make up for lack of quality, even in the AI era. In spatial biology, as in any experimental science, clarity beats complexity every time.