Entrepreneurship In IVF: The Promise And Challenges Of Artificial Intelligence

The use of artificial intelligence (AI) to re-engineer in vitro fertilization (IVF) was one of the implicit goals of the IVF venture project when it was launched two years ago, aiming to make assisted reproduction more efficient, digital and scalable. Indeed, some of the best ideas and most interesting business plans are applications of machine learning to isolate and optimize specific parts of the IVF procedure.

One area that has received a lot of attention has been the use of a computer algorithm to score each embryo within an IVF cycle cohort, ranking them for probability of pregnancy. Entrepreneurs from five continents have sent us business plans for moving this idea into clinical practice.

The idea is sound and the prototypes thoughtfully and painstakingly built. Each of these groups has done great work, but much remains to be done. Questions remain about how to adequately quantify the opportunity as a stand-alone business and as a platform, and about AI’s role in general in creating better and more accessible IVF.

The most fundamental question: how much better is AI at embryo selection than traditional approaches? And is the improvement worth it?

Let’s start by assessing product development. As I understand it (and I am far from a data scientist ― comments and abuse welcomed), the algorithm is trained on a retrospective dataset, viewing tens of thousands of embryos that were chosen for transfer, and then assessing the outcomes versus the patterns observed and analyzed by a powerful and sophisticated imaging method. The underlying assumption is that the computer sees things that we don’t, either because we lack the visual acuity to detect these patterns, because we just haven’t figured out where to look, or because the complexity of the interrelationships between the numerous observations is beyond our processing abilities.

My first concern is that the initial training set is based on already-transferred embryos, and those embryos were chosen using the very criteria that we are attempting to improve upon. We are therefore introducing a bias into the process at its earliest stage: we are asking the computer to find new decision-influencing inputs, but only from a pool of embryos that already incorporates the old decision-making parameters.

I have been assured that subsequent cycles of machine learning, as the algorithm prospectively examines its own assumptions and reprioritizes its decision-making criteria, gradually diminish the effects of the initial bias. Can this slow start to algorithm generation be accelerated by feeding a greater number of cycles into the dataset, or would it benefit from combining different embryo selection methods from different clinics?

Beyond the mechanics of deriving the algorithm, the challenge of assessing AI for embryo selection highlights the need to validate some of our long-held assumptions, as well as to backfill some of the voids in our industry-wide operating model. The most basic assumption is that embryo selection by a protocol is necessarily better than random embryo selection at the blastocyst stage.

The transition from the cell stage to the blast stage is itself a stress test for embryo quality, assuming adequate culturing conditions. Numerous articles populate the medical literature comparing different culture media and comparing day 3 versus day 5 transfers, but I am unaware of any study comparing random selection of a single blastocyst versus selection by a protocol from the set of embryos that have reached that stage. (If I have missed a seminal paper or abstract that answers this question please correct me.) 

An ideal prospective study would include three arms, one each for a clinic’s standard of care, random selection and algorithm-based selection. What is the outcome spread between these three? If we are solving for dollars per baby and time to baby, these percent improvement figures are essential for quantifying the cost savings per individual patient, and the number of additional patients that the industry could successfully treat in aggregate if the new technique were adopted.
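To make the "outcome spread" concrete, here is a minimal Python sketch of how the three arms might be compared. It is an illustration only: the arm names, the use of births as the outcome, and the flat cost per transfer are my own hypothetical simplifications, not figures or methods from any study.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One study arm: single-embryo transfers performed and babies born."""
    name: str
    transfers: int
    births: int

    @property
    def success_rate(self) -> float:
        return self.births / self.transfers


def cost_per_baby(arm: Arm, cost_per_transfer: float) -> float:
    """Total spend in the arm divided by babies born (a deliberately crude model)."""
    return arm.transfers * cost_per_transfer / arm.births


def spread(arms: list[Arm], baseline: str) -> dict[str, float]:
    """Absolute difference in success rate of each arm versus the baseline arm."""
    base = next(a for a in arms if a.name == baseline)
    return {a.name: a.success_rate - base.success_rate
            for a in arms if a.name != baseline}
```

With made-up arms of 200 transfers each yielding 80 (random), 90 (standard of care) and 100 (algorithm) births, `spread(arms, "random")` reports gains of about 5 and 10 percentage points respectively. Those two differences are exactly the numbers a business case would need.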

Broadly speaking, assessing the value proposition for AI use in embryo selection highlights a broader IVF data problem. The under-engineering of IVF, forty years after the first successful cycle, is in large part related to the paucity of measurable endpoints that we can use to isolate and assess specific steps in what is a long sequence of contributions to a single outcome: pregnant or not. Consider the steps involved: the decision to initiate follicular stimulation, daily adjustments of medication dose and timing of triggering of maturation, the mechanical steps during egg retrieval and oocyte preparation for fertilization, the mechanics of sperm preparation, the variations in technique for sperm injection (ICSI), culturing, embryo manipulation and biopsy, embryo selection, and the degree of difficulty and mechanics of embryo transfer. Across all of these, the only hard intermediate endpoints available for process analysis are percent fertilization and blastocyst development, and of those only fertilization, in some cases, has a linear, cause-and-effect relationship to a single variable, the fertilization method.

This scarcity of hard intermediate data makes it difficult to isolate individual decisions and interventions without a very large number of observations, a number that may be unrealistic given the volume of IVF done in a given clinic or even a given country. The heterogeneity of IVF patients, and the often inconsistent decision-making criteria (whether or not to use ICSI, for example) from one IVF program to another or between different clinicians within the same program, magnify this problem.
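A rough sense of what "a very large number of observations" means can be had from the textbook normal-approximation formula for comparing two proportions. The sketch below is an assumption-laden back-of-envelope tool, not a trial design: the function name, the default significance level and power, and the example rates are all mine, and it ignores dropout, multiple embryos per patient, and confounding.

```python
from math import ceil
from statistics import NormalDist


def transfers_per_arm(p_control: float, p_treatment: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate transfers needed per arm to detect p_control vs p_treatment.

    Standard two-sided, equal-allocation formula:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance
                / (p_control - p_treatment) ** 2)
```

Under these assumptions, detecting an improvement from a hypothetical 45% success rate to 50% works out to roughly 1,500 transfers per arm, and the requirement grows quickly as the expected improvement shrinks. That is before accounting for the patient and protocol heterogeneity described above.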

That said, the step-wise incorporation of mechanized procedures and data-driven, rational decision-making is integral to creating a scalable IVF industry with sufficient capacity to effectively meet the needs of the massive, as yet untreated patient population. And thankfully, the entrepreneurs behind these early-stage AI projects remain undaunted by the prospect of this uphill climb to validation.

There are ways to address these challenges. The incorporation of preimplantation genetic testing, for example, adds a hard data point (euploid vs aneuploid) to the “trust the algorithm” acceptance of the complex criteria that machine learning imposes when it translates tens of thousands of “blastocyst image analysis crossed with pregnancy outcome” correlations per embryo into individual treatment recommendations.

Another potential source of validation is cross-referencing information derived from time-lapse imaging of embryo development. The ongoing debate regarding the relative benefits of single, static image analysis versus dynamic morpho-kinetic data may resolve to show that the two methods are complementary, and their benefits additive. If so, AI incorporating both may be more effective in predicting outcome and, by correlating static image characteristics with identifiable growth abnormalities (the dominance of a single cell, for instance), may offer physiologic correlates to the empiric conclusions of the algorithm.

Also reassuring: the multivariable complexity of the data set is a solvable math problem, addressed by increasing the size of the data set, assuming the underlying variability of each step’s contribution to the outcome can be reasonably estimated. This suggests that the pace of digitization and optimization of IVF depends in large part on the ability to work with very large and very well characterized volumes of cycle data. Once adequate numbers are analyzed and a preliminary algorithm put into place, the AI system would learn from itself, initiating a continuous process of quality improvement.

For now, however, reaching critical mass is the limiting step. The cycle numbers being used to train the prototype algorithms are large relative to the amount of IVF data that can be collected from even the largest IVF programs, but are still relatively small for “big data” purposes, which implies a need to aggregate the patients of several or many clinics.

This is problematic in two ways.

First, IVF laboratories do not speak a common language. Cycle data is inconsistently recorded from place to place and embryo assessment lacks a common descriptive vocabulary, much less a reproducible and consistent digital language. 

Second, since AI, even in its “low hanging fruit” first application, embryo assessment, collects and analyzes data at the cellular level, each embryo occupies its own line in the database, requiring a unique identifier, and currently there is no system in place to combine different clinics’ samples into a uniformly accessible and analyzable form that eliminates the risk of overlap and duplication of identification. Adoption of a standardized specimen-level system of data collection is a logical and necessary step, then, towards using AI effectively. 
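As a sketch of what a specimen-level identifier could look like, the Python below derives a deterministic ID for each embryo from a clinic code, a cycle number and an embryo index, then pools records from several clinics with one row per embryo. Everything here is hypothetical: no such registry exists to my knowledge, and the field names, the namespace string and the choice of name-based UUIDs are assumptions made purely for illustration.

```python
import uuid

# Hypothetical namespace for a shared registry; any fixed UUID agreed on by all
# participating clinics would serve the same purpose.
REGISTRY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "embryo-registry.example")


def embryo_id(clinic_code: str, cycle_number: str, embryo_index: int) -> uuid.UUID:
    """Derive a deterministic ID for one embryo.

    The same (clinic, cycle, embryo) triple always maps to the same ID, so a
    record submitted twice is recognized as a duplicate, not counted twice.
    Assumes clinic codes and cycle numbers never contain the '|' separator.
    """
    key = f"{clinic_code.strip().upper()}|{cycle_number.strip()}|{embryo_index}"
    return uuid.uuid5(REGISTRY_NAMESPACE, key)


def merge_records(*clinic_batches: list[dict]) -> dict[uuid.UUID, dict]:
    """Pool per-embryo records from several clinics, one row per embryo ID."""
    pooled: dict[uuid.UUID, dict] = {}
    for batch in clinic_batches:
        for record in batch:
            eid = embryo_id(record["clinic"], record["cycle"], record["embryo"])
            pooled.setdefault(eid, record)  # first submission wins; repeats ignored
    return pooled
```

The design choice worth noting is that the ID is computed from the record's own attributes and is not handed out by a central counter, so clinics can generate identifiers independently and still never collide with one another. This only works if clinic codes themselves are centrally assigned and unique.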

Artificial intelligence will be an indispensable part of the transformation of in vitro fertilization from an inconsistently performed, resource-inefficient, labor-intensive procedure to a process-optimized, engineered industry, one that can adequately meet the needs of a very large, unaddressed population of patients.