Google's Veo-3: A Surgical Video Imposter?
Can AI really understand medicine? That is the question researchers are asking after putting Google's Veo-3 to the test. Despite its impressive visuals, Veo-3 falls short on medical logic, raising concerns about its readiness for healthcare applications.
Veo-3 was tasked with predicting how surgical procedures would unfold from a single frame of operating-room footage. An international team built a benchmark, SurgVeo, from real surgical videos to evaluate this capability. The results? Visually stunning, but medically inaccurate.
In abdominal surgery tests, Veo-3 initially scored high on visual plausibility, but its scores dropped sharply on instrument handling and tissue response. The same pattern emerged in brain surgery, with even larger gaps.
Over 93% of Veo-3's errors stemmed from medical logic rather than image quality: the model invented tools, imagined implausible tissue reactions, and performed actions that made no clinical sense.
Providing the model with additional context did not help; it struggled to incorporate the extra information into its predictions.
The SurgVeo study highlights the limitations of current video AI in medical understanding. While future systems may assist in training and planning, today's models lack the knowledge for safe decision-making.
So, is AI ready for the operating room?
The researchers are releasing the SurgVeo benchmark on GitHub, inviting others to test and improve their models against it. They also caution against using synthetic AI videos for medical training, warning that incorrect procedures could be taught to surgical robots or human trainees.
While video models that act as genuine 'world models' remain a distant prospect, text-based AI is already making strides in medicine. Microsoft's MAI Diagnostic Orchestrator, for example, has shown impressive diagnostic accuracy.
So, where do we go from here? Should we embrace the potential of AI in healthcare, or proceed with caution? The debate is open...