At the 2025 North American Skull Base Society (NASBS) Annual Meeting, Dr. Jonathan Chainey from Lariboisière Hospital in Paris, France, presented his research using Surgical Data Science Collective’s (SDSC) Phase Recognition Machine Learning (ML) model. This tool automatically analyzes and segments videos into the distinct chapters, or phases, that make up a surgical procedure from start to finish, so surgeons can rapidly access relevant operative checkpoints for teaching, research, or review.
Dr. Chainey’s study specifically focused on dividing pituitary tumor removal surgeries into four distinct operative phases using Surgical Video Platform (SVP). This work builds on previous efforts in surgical workflow analysis, while addressing the need for more generalized artificial intelligence (AI) models in real-world, heterogeneous environments.
Enhancing Surgical Video Phase Recognition With Advanced AI Models For Endoscopic Pituitary Tumor Surgery
Jonathan Chainey MD MSc FRCSC1, Jack Cook2, Ruth Lau MD1, Margaux Masson-Forsythe2, Daniel Donoho MD2,3, Dhiraj Pangal MD4
1 Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
2 Surgical Data Science Collective (SDSC)
3 Department of Neurosurgery, George Washington University School of Medicine & Health Sciences, Washington, DC
4 Department of Neurosurgery, Stanford University, Stanford, California
Surgery for pituitary neuroendocrine tumor removal requires precise and delicate navigation of the nasal cavity to reach and remove tumors from the pituitary gland. Each surgical phase involves distinct anatomy, techniques, and challenges – so clear differentiation is essential for analysis and training.
The four clinically validated phases, established through an expert Delphi consensus process, are as follows:
1. Nasal – Navigating the nasal anatomy up to the sphenoid sinus.
2. Sphenoid – Entering and working within the sphenoid sinus to open the skull base.
3. Sellar – Accessing the intracranial space and resecting the tumor.
4. Closure – Achieving hemostasis and repairing the skull base.
Previous segmentation efforts, such as those from the PitVis 2023 Challenge, relied on tightly controlled, homogenous video datasets – clean, well-lit, uniformly formatted footage often sourced from a single institution. While valuable, these conditions don’t capture the variability of day-to-day surgical environments and recording systems.
To address this, our ML team trained the model on a heterogeneous dataset from three contributing surgical centers, incorporating varying endoscope holders, lighting conditions, surgeons, and camera angles. This approach better reflects the diversity of clinical practice and yields a more generalizable, robust AI model.
The process began by pre-training a convolutional neural network (CNN) on 221 procedures using self-supervised learning (SSL). This allowed the model to develop a foundational understanding of what surgical scenes “look like” without relying on labels. The model was then fine-tuned on 80 labeled procedures with clear phase annotations defined by a global panel of skull base surgeons.
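To make this two-stage recipe concrete, here is a minimal PyTorch sketch. The SimCLR-style contrastive objective, the ResNet-50 backbone, and every hyperparameter are illustrative assumptions; the post does not specify SDSC’s actual pre-training objective or architecture.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

encoder = resnet50(weights=None)
encoder.fc = torch.nn.Identity()      # expose the 2048-d embedding
proj = torch.nn.Linear(2048, 128)     # projection head used only during SSL

def nt_xent(z1, z2, tau=0.1):
    """Contrastive loss: two augmented views of the same frame attract."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))          # a view is not its own negative
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Stage 1: self-supervised pre-training on unlabeled frames (221 procedures).
# view1 and view2 stand in for two random augmentations of the same batch.
opt = torch.optim.AdamW(list(encoder.parameters()) + list(proj.parameters()))
view1, view2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
nt_xent(proj(encoder(view1)), proj(encoder(view2))).backward()
opt.step()

# Stage 2: supervised fine-tuning on the 80 labeled procedures with a
# 4-way phase head (Nasal / Sphenoid / Sellar / Closure).
phase_head = torch.nn.Linear(2048, 4)
frames, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))
F.cross_entropy(phase_head(encoder(frames)), labels).backward()
```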
Two deep-learning model pipelines were developed to segment the phases of pituitary tumor surgery:
1. The first model employs a state-of-the-art CNN to directly predict surgical phases from video input.
a. A video is a sequence of still images, or frames, and a CNN can classify the surgical phase of each frame on its own. In real-world practice, however, even a surgeon can’t look at an individual frame and tell you what phase it is; they would scrub the video back and forth to work out what is happening at that point and which maneuver is being performed. So the model extracts embeddings: lower-dimensional numerical vectors that represent the model’s understanding of each frame. These vectors are then stacked in the same chronological order as the video and fed through a second model.
2. The second model generates frame-by-frame embeddings, which are processed by a Multi-Stage Temporal Convolutional Network (MS-TCN++, where “++” denotes the second iteration of the architecture) to predict the phases.
a. MS-TCN++ draws on the context of nearby frames at both narrow and wide temporal scales, accumulating this information to decide which phase each frame belongs to (see the sketch after this list).
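The temporal stage can be pictured as stacks of dilated 1-D convolutions running over the chronological embedding sequence, with later stages refining the phase predictions of earlier ones, in the spirit of MS-TCN++. The sketch below is a loose PyTorch approximation under assumed layer counts and dimensions, not SDSC’s exact network.

```python
import torch
import torch.nn as nn

class DilatedStage(nn.Module):
    """One stage: dilations 1, 2, 4, ... widen the temporal receptive field."""
    def __init__(self, in_dim, hidden, n_classes, layers=6):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, 1)
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, hidden, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(layers)
        )
        self.out = nn.Conv1d(hidden, n_classes, 1)

    def forward(self, x):
        x = self.inp(x)
        for conv in self.convs:
            x = x + torch.relu(conv(x))   # residual dilated conv block
        return self.out(x)

class TemporalSegmenter(nn.Module):
    """A prediction stage followed by stages that refine the phase logits."""
    def __init__(self, in_dim=2048, hidden=64, n_classes=4, n_refine=3):
        super().__init__()
        self.first = DilatedStage(in_dim, hidden, n_classes)
        self.refine = nn.ModuleList(
            DilatedStage(n_classes, hidden, n_classes) for _ in range(n_refine)
        )

    def forward(self, emb):               # emb: (batch, in_dim, n_frames)
        logits = self.first(emb)
        for stage in self.refine:
            logits = stage(logits.softmax(dim=1))
        return logits                     # (batch, n_classes, n_frames)

# One surgical video: 3000 frame embeddings stacked in chronological order.
emb = torch.randn(1, 2048, 3000)
phases = TemporalSegmenter()(emb).argmax(dim=1)   # per-frame phase ids
```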
As an additional post-processing stage, both pipelines apply an accumulator to improve the accuracy and consistency of the model’s predictions. Because surgical procedures usually follow a logical, linear phase progression, predictions should follow the same order of events. The accumulator corrects illogical phase transitions; for example, a prediction cannot jump from the sellar phase back to the nasal phase. By enforcing these constraints, the accumulator refines predictions and improves consistency.
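As an illustration, a minimal accumulator might advance the predicted phase only after a run of consecutive frames agree, and never let the phase index move backwards. The `patience` rule and threshold here are assumptions made for this sketch; the post does not detail the actual implementation.

```python
import numpy as np

PHASES = ["Nasal", "Sphenoid", "Sellar", "Closure"]

def accumulate(frame_preds, patience=2):
    """Smooth per-frame phase ids: only advance after `patience` consecutive
    frames vote for a later phase, and never move backwards. A real system
    would likely use a much larger patience window than this demo value."""
    out = np.empty_like(frame_preds)
    current, streak = 0, 0
    for i, p in enumerate(frame_preds):
        if p > current:
            streak += 1
            if streak >= patience:       # enough agreement: advance the phase
                current, streak = p, 0
        else:
            streak = 0                   # backwards jumps are ignored entirely
        out[i] = current
    return out

raw = np.array([0, 0, 1, 1, 0, 1, 2, 2, 2, 3, 3])   # noisy frame predictions
print(accumulate(raw))   # [0 0 0 1 1 1 1 2 2 2 3] -- monotone phase order
```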
The results of this study demonstrate the power of this dual-model approach. The CNN + MS-TCN++ pipeline achieved accuracy, precision, and recall of 90%, and an F1 score of 0.89. The closer the F1 score is to 1, the greater the overlap between predicted phases and ground-truth annotations. These metrics indicate excellent agreement, showcasing the system’s ability to reliably segment pituitary tumor removal surgery videos.
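As a point of reference, frame-level metrics of this kind can be computed with scikit-learn; the toy label arrays below are stand-ins for illustration, not data from the study.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 2, 2, 3, 3]   # ground-truth phase id per frame
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]   # model prediction per frame

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```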
In addition to the quantitative metrics, visual segmentation timelines were generated to illustrate the model’s predictions over time. These visualizations make it easier to identify phase transitions, analyze discrepancies, and better understand how the model interprets surgical workflows.
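A timeline of this kind can be drawn in a few lines of matplotlib: each row is a video, colored by annotated or predicted phase. The data and styling below are invented purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

PHASES = ["Nasal", "Sphenoid", "Sellar", "Closure"]
COLORS = ["#4c72b0", "#dd8452", "#55a868", "#c44e52"]

def segments(seq):
    """Collapse a per-frame label sequence into (start, length, label) runs."""
    runs, start = [], 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or seq[i] != seq[start]:
            runs.append((start, i - start, int(seq[start])))
            start = i
    return runs

truth = np.repeat([0, 1, 2, 3], [120, 90, 200, 60])   # toy per-frame labels
pred = np.repeat([0, 1, 2, 3], [110, 105, 195, 60])

fig, ax = plt.subplots(figsize=(8, 2))
for row, seq in enumerate([truth, pred]):
    for start, length, label in segments(seq):
        ax.broken_barh([(start, length)], (row, 0.8),
                       facecolors=COLORS[label])
ax.set_yticks([0.4, 1.4])
ax.set_yticklabels(["Ground truth", "Prediction"])
ax.set_xlabel("Frame index")
plt.show()
```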

This work is a push toward creating more robust, real-time tools for surgical education and analysis. By successfully segmenting pituitary tumor surgeries using a heterogeneous, real-world dataset, SDSC has demonstrated not only the feasibility but also the clinical relevance of AI-driven workflow segmentation. This approach is already inspiring expansion into other types of surgery, and it represents a collaborative effort between ML engineers, surgeons, and global data contributors. As AI continues to transform the surgical landscape, we’re excited to collaborate with researchers and clinicians eager to push these boundaries further. Get in touch with us – we’d love to partner with you to build the ML models that help you achieve your research and clinical practice goals.