Pediatric and Adult Congenital AI-ECG: From Model Development to Clinical Implementation (non-ACE)
Video Transcription
I'm going to keep this talk geared toward clinicians, because I'm a clinician, so although this is a talk about AI and AI algorithms, you're not going to find a whole lot of technical detail. But if anyone in the audience has questions along those lines, we can perhaps address them during the group discussion. I was asked to talk about what we can learn from our experience developing AI-ECG algorithms in the adult world, and I'm going to do so by showing you the journey of a single AI-ECG model that we developed at Mayo Clinic; Dr. Attia really should take credit for all of this, as he spearheaded much of the work I'm going to show you. I'll then lay out a roadmap for how we could translate this experience to pediatric and ACHD care, and touch on the unique challenges we would face in this population.

The first step is certainly the development of the algorithm, and the one I'm going to show you is the algorithm we developed for prediction of low ejection fraction in adults. This algorithm used 12-lead ECGs and trained a convolutional neural network on about 45,000 patients at Mayo Clinic who had ECGs and echocardiograms to detect low ejection fraction, defined as less than 35%. The same algorithm was then validated in an independent cohort of about 52,000 unique patients and was found to be quite accurate: the area under the curve was 0.93, which is really excellent, with sensitivity, specificity, and accuracy of about 86%.

Now that we have an algorithm and we feel it performs well, the next step is prospective validation. At Mayo, this was done in a prospectively collected dataset of about 16,000 patients who had ECGs and echos, with a similar level of performance in this prospective validation study. But an algorithm that performs well in Minnesota may not perform well across the globe, so there was a concerted effort to validate this algorithm at a number of centers within the country and across the globe. I'm showing you here a couple of examples, one from a center in Uganda where the same algorithm was validated on 12-lead ECGs, and another from Russia, a couple of centers in Russia, in fact.

So now you have the first step down, and I think it would not be remiss to say that much of the work we are doing in pediatrics and ACHD is currently at this stage of algorithm development and validation in internal and external cohorts. But now that we had this algorithm in the adult world, the next step was to actually put it into practice. For this, we created a platform called the AI-ECG dashboard that is integrated into Epic. For any patient who gets an ECG at Mayo Clinic, you can go into the Epic chart, click on the AI-ECG dashboard, and see the output of a slate of different AI-ECG algorithms. The one I'm showing you here is from a patient of mine who has tetralogy of Fallot: the algorithm we are talking about thought this patient had a high probability of low ejection fraction, and indeed she did have low ejection fraction. The next step, which I think is really critical in building some trust in AI, is to actually put it into practice in the real world and see what the impact would be.
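For readers who want to see how validation numbers like an AUC of 0.93 and roughly 86% sensitivity, specificity, and accuracy are derived from a model's outputs, here is a minimal, hypothetical sketch in Python using scikit-learn. It is not the Mayo pipeline; `y_true` and `y_prob` are placeholder arrays standing in for held-out labels and predicted probabilities.

```python
# Illustrative only: computing the kind of validation metrics quoted above
# (AUC, sensitivity, specificity, accuracy) for a binary low-EF classifier.
# y_true and y_prob are hypothetical placeholders, not real patient data.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def summarize_performance(y_true, y_prob, threshold=0.5):
    auc = roc_auc_score(y_true, y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"auc": auc, "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy}

# Example with random placeholder data purely to show the mechanics.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=1000), 0, 1)
print(summarize_performance(y_true, y_prob))
```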
This was done using a pragmatic clinical trial built on the platform I just showed you, in which we randomized primary care practices and primary care providers within the Mayo network into two groups. One group was the intervention group: they received the results of the AI-ECG algorithm. The other group did not receive these results and simply had standard care, and we compared the outcomes of patients in the two groups. What was remarkable about the study is that it enrolled about 32,000 patients within a span of eight to ten months using this platform approach; I think that's something to keep in mind as we think about future studies within the small world of pediatrics and ACHD. The primary finding was that the intervention, the AI-ECG, increased the rate of diagnosis of previously undiagnosed low ejection fraction by about 30%, while overall utilization of echocardiography was similar between the two groups. More importantly, patients who had an echocardiogram in the AI-ECG group had a higher chance of a positive result. So perhaps this points us toward the fact that, in real life, these algorithms might lead to better utilization of healthcare resources like echocardiography.

All of the data collected in this way were then used to validate the algorithm. Next comes real-world implementation, which I think is the biggest challenge in this area: there is so much excitement, but where is the real-world implementation? The algorithm we are talking about received FDA clearance as software as a medical device based on these data. But Dr. Attia and our group did not stop there; we wanted to take this to the next level, to take this algorithm to the clinician's office and to the patient's home. So we collected single-lead ECGs from different types of wearable and portable devices, one being a digital stethoscope and the other the Apple Watch ECG, retrained the algorithm I showed you to run on a single-lead ECG, and showed that it can be nearly as accurate as the 12-lead version, providing something portable that you can take into the office or the home.

One thing to keep in mind is that even once we get through the challenge of implementing an AI algorithm, whether ECG-based or otherwise, we should not think of AI algorithms as static tools. They are really living and learning tools. So the way to implement them is to make sure there is constant monitoring of the output and the long-term efficacy, and also constant monitoring of biases that might creep into the system, whether racial, age-related, or otherwise. This is one such effort: for the algorithm I showed you, a study looked at how the algorithm performs prospectively, and this is monthly data on the area under the curve over the course of a year, showing that the algorithm's performance is stable. This is something to build into the system when implementing an AI algorithm. To wrap that up and provide a roadmap for this type of discriminative AI algorithm development: it starts with development and validation, but it's a cyclical process. You go through real-world testing and implementation, and then monitoring and retraining of the algorithm should be built into its implementation.
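As a rough illustration of the "living and learning tool" point, the sketch below shows one way the monthly AUC monitoring described above could be computed from a prediction log. It is a hypothetical example; the column names and the 0.85 alert floor are invented for illustration and do not describe the actual Mayo monitoring system.

```python
# Illustrative sketch of tracking a deployed model's AUC month by month to
# watch for drift. Column names ("ecg_date", "label", "score") are assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

def monthly_auc(df: pd.DataFrame) -> pd.Series:
    """Return AUC per calendar month for a scored, labeled prediction log."""
    df = df.copy()
    df["month"] = pd.to_datetime(df["ecg_date"]).dt.to_period("M")
    return df.groupby("month").apply(
        lambda g: roc_auc_score(g["label"], g["score"])
        if g["label"].nunique() == 2 else float("nan")
    )

# A monitoring job could flag any month whose AUC drops below a preset floor,
# prompting review or retraining, e.g.:
# alerts = monthly_auc(log)[lambda s: s < 0.85]
```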
Now that we have seen what happened in the adult world, how does it translate to children and patients with ACHD? I'm going to show you some parallel studies we did at Mayo focused on children and ACHD, and the next speaker, Josh, is going to show much of the excellent work he has done along these lines. Since we are talking about a low-EF algorithm, we wanted to create something for children, and we thought there should be a novel algorithm for this, because fundamentally there are unique features in the pediatric ECG that are not present in adult ECGs. And not only that, the pediatric ECG is really dynamic; there are a lot of changes in the pediatric ECG with the growth and development of the child. So this was a novel algorithm we developed looking at two things, two algorithms: one for RV dysfunction and the other for LV dysfunction. The RV dysfunction model had a really good area under the curve and good accuracy; the area under the curve was 0.9. For the LV dysfunction model, the curve to watch is the one in red, which looks at LVEF less than 35% in children, and that model's area under the curve was 0.93.

We were obviously curious how the adult algorithm would perform if we applied it to children without any retraining, and that's the curve you're seeing here in blue: the area under the curve was lower, about 0.87, but really not too bad, right? It does underscore, though, the importance of developing these algorithms in the population where you intend to use them, and of making sure the population you develop the algorithm in matches your actual use case. We also performed external validation of these algorithms in an external cohort from Texas Children's: again, the RV dysfunction model in green had an area under the curve of 0.87, and the LV dysfunction model an area under the curve of 0.82, without any retraining, so it holds up in an external cohort as well.

Now how about ACHD patients? These are adults, so can we translate what we know of adult algorithms to the ACHD population? This was a pretty straightforward study. We included about 8,200 ACHD patients with echos and ECGs and applied that same adult algorithm, and the area under the curve was about 0.86. Not surprisingly, the adult algorithm performs better in mild ACHD than in more complex ACHD pathology, such as univentricular hearts.

To wrap it up, I think the future is bright. As we think about the future of healthcare being more preemptive, personalized, and accessible everywhere, I do think that AI will play a major role in making this happen for our community and for the community of patients. But a lot of questions remain. One question that comes up is that CNNs often feel like black boxes. There have been many efforts to try to explain what these networks do, but they fundamentally remain black boxes. As a clinician-investigator, I feel that if we can crack that and understand what it is the CNN is looking at that we as expert clinicians don't understand, there will be new pathophysiologic insights, new insights into the phenotypes of these patients, that will benefit us in the long run. But beyond that, I think there are lots of procedure- and process-related issues that need to be resolved.
For example, how do you get regulatory approval for this? What is the financial structure of implementing something like this? And in that context, how do you make sure that access to these technologies is equitable? More importantly, what is the regulatory meaning of a system that is perpetually learning and relearning by itself? It's not like approving a drug and knowing that this is how it's going to work in the real world. And then there are obviously unique challenges of using AI in pediatrics. It goes without saying that we are looking at rare conditions in pediatrics and ACHD, with a great deal of anatomical heterogeneity. There is a lack of large datasets with quality data, and that's where I think there is a real need for multicenter collaborations to make this happen for our children. What is the real-world impact in children? We have to think a little differently when it comes to children about unnecessary testing and anxiety. What are the ethical considerations? How do we make sure that existing biases, whether racial, gender, or geographical, don't get carried into and amplified by these AI models, but rather addressed by them? I think these are all questions that we should be talking about in a collaborative way, and I look forward to the rest of the presentations. Thank you.

Thank you very much. As is the practice in most of the core curricula, we will hold questions until the end of the presentations. Our next presenter is Dr. Joshua Mayorian from my institution, Boston Children's Hospital. The title of Dr. Mayorian's talk is Applications of AI-ECG in the Pediatric and Adult Congenital Population.

So, while it's loading, hi, everybody, I'm Josh. Thank you, Zaki, and thank you, John, for the invitation. Great talk, Malini, and I'm excited to hear the rest of the presentations. Today I'm excited to talk about current and future applications of AI-ECG in the pediatric and adult congenital population. Globally, there's a maldistribution of pediatric cardiology expertise worldwide, and as shown in this figure, low- and middle-income countries have disproportionately higher rates of CHD mortality, shown in red. Even in the US, our field is becoming more regionalized. This really underscores the need for accessible and inexpensive technology to support high-quality care. Today I hope to convince you that the ubiquitous and inexpensive nature of ECGs makes AI-enabled ECG, or AI-ECG, a promising avenue to fill this technological gap.

First, some background on AI-ECG. I know Malini has gone through this already, and as a disclaimer, I apologize in advance that I'm going to leave out a lot of landmark papers, but I think it's fair to say that Zaki helped pave the way for AI-ECG in adult cardiology in 2019, and since then this field has rapidly evolved. There was an AI-ECG clinical trial in 2021, use of AI-enabled wearables in 2022, and even today, in 2025, Evan and his group at Yale are putting out high-impact AI-ECG and AI-enabled smartwatch work, really demonstrating the abundance of use cases for AI-ECG. In comparison, AI-ECG for pediatric and adult congenital cardiology is nascent and, humbly, really started last year with our group. So we're lagging by about five years, and I think, importantly, there are unique considerations that have been pointed out for pediatric and CHD ECGs that require tailored models.
Today I'll provide an overview of the AI-ECG models we've developed since 2024, with a real focus on screening and risk stratification applications, and then I'll go through some of our ongoing work. First, I'll go through one of our tools that could aid in pediatric screening efforts. In this first model, we aimed to assess whether AI-ECG can provide expert-level, automated interpretation of the ECG. To develop this model, we took about 600,000 ECGs from 200,000 patients at Boston Children's, each with interpretations from pediatric cardiology experts; 56% had any ECG abnormality, 1% had WPW, and 5% had prolonged QTc. We then trained the model to take the 12-lead ECG digital waveforms as inputs and predict the expert interpretations. Here I show the model's ability to detect WPW, but in reality the model is able to provide a comprehensive list of all ECG diagnoses.

To look at model performance, we use something called the precision-recall curve, which is commonly used for rare outcomes such as WPW. In this case, the x-axis is the sensitivity (recall), the y-axis is the positive predictive value (precision), and the higher up and to the right you are, the better the model is performing. You can see that the AI-ECG model, in blue, clearly outperforms the commercial MUSE software for interpretation of WPW, and that adding age and sex to the model, in orange, provided minimal benefit. We then asked four blinded experts to re-adjudicate ECGs where there was a discrepancy between the AI-ECG interpretation and the original reader of the ECG, and in general the experts agreed more with the AI-ECG than with the original reader. We found similar findings for prolonged QTc and for detection of any ECG abnormality. We envision this as a tool that could be particularly helpful for screening programs for WPW and prolonged QTc, especially in areas with limited access to care.

In the next model, we aimed to see whether AI-ECG can detect critical CHD in infants. The motivation here is that fetal echoes are not universally available, and while pulse-oximetry screening is effective, it has limited sensitivity for certain lesions, such as coarctation. We trained a model on about 60,000 infants one year of age or younger; internally, about 21% of that cohort had some form of critical CHD. We then externally validated the model using publicly available data from China, where about 26% had critical CHD, and as you'll see in these AUC curves, we achieve an AUC above 0.9 both internally and externally. We found that not only does it do well at detecting the composite of critical CHD, we can actually use it to identify each individual form of critical CHD, including, for example, coarctation. We envision this tool could help complement pulse oximetry to enhance CCHD screening, and possibly even help detect individual CCHD lesions in low-resource settings.

To jump to tools that may aid in risk stratification and resource utilization: we trained an AI-ECG model to predict LV ejection fraction less than or equal to 40% across the congenital heart disease lifespan, using nearly 70,000 patients at Boston Children's and 40,000 patients at CHOP with the help of IVOR.
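Before the results of that LV dysfunction model, here is a minimal, hypothetical sketch of the precision-recall analysis described above for a rare label such as WPW, in Python with scikit-learn. The simulated labels and scores are invented purely to illustrate the mechanics and do not reproduce the study's results.

```python
# Illustrative sketch of a precision-recall analysis for a rare ECG label.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(1)
n = 10_000
y_true = (rng.random(n) < 0.01).astype(int)                       # ~1% prevalence, as for WPW
y_score = np.clip(0.7 * y_true + rng.normal(0.2, 0.15, n), 0, 1)  # fake model scores

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)  # area under the PR curve
print(f"average precision: {ap:.3f}")

# For a rare outcome, the PR curve is more informative than ROC because the
# baseline precision equals the prevalence (~0.01 here), so gains over that
# baseline are easy to see.
```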
The model achieves an AUC of about 0.95, both internally and externally at CHOP, whether using a patient's first ECG or their last ECG, meaning it works across the lifespan, and even lesion by lesion the model works quite well. Interestingly, the model is predictive not only of current LV systolic dysfunction but also of future onset of LV systolic dysfunction. For patients deemed low-risk by AI-ECG, we think this tool could help reduce unnecessary echoes, whereas for patients deemed high-risk, it could lead to closer follow-up and/or earlier initiation of GDMT.

Lastly, we trained a model to detect five-year risk of mortality in about 80,000 patients who presented to our cardiology clinic. This was performed across a wide range of ages and CHD lesions, and as you can see in this survival analysis, both internally at Boston Children's and externally in Toronto, AI-ECG can effectively stratify patients with repaired tetralogy of Fallot: patients deemed low-risk, in green, separate clearly from those deemed high-risk, in red, with good discrimination both at BCH and externally in Toronto. What we also found is that the same model, which predicts all-cause mortality, can even predict future risk of sudden cardiac death. We envision this tool could help reduce expensive risk stratification testing, such as MRI in tetralogy of Fallot, help prioritize patients for closer follow-up, or be integrated with other modalities to improve diagnostic precision using multimodal methods.

So the question is, where do we go from here? I'll give an overview of ongoing work and what we're actively thinking about in the background. As you may have picked up, the inputs to these models are digital 12-lead ECG waveforms, and practically, these digital waveforms are not accessible in low-income regions where this technology is probably needed most. Evan and his group at Yale came up with a creative workaround for this issue by creating AI-ECG models in adults that simply require photos of ECG printouts as inputs rather than the digital waveforms, and our group has been working on similar technology. The idea is that we take the digitally stored ECG waveforms and generate synthetic ECG photos that look like printed 12-lead ECGs; here's an example of what such a photo looks like. We then incorporate real-life augmentation using different types of noise encountered in practice, with rotation, zooming, blurring, contrast changes, and different layouts, such as whether it's a 12-lead or a 15-lead ECG, and we feed these augmented ECG images into the AI model. Here you can see an example of the synthetic ECG image, and here many different iterations of augmentations, such as rotation, blurring, grayscaling, and so on. What we found is that, looking at the AUC curve, the photo-based model, in blue, does nearly as well as the digital waveform-based model, in orange, at detecting any ECG rhythm abnormality, outperforming MUSE. We're now working with several centers across the country to externally validate this model.
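To make the augmentation idea concrete, here is a minimal sketch of how rotations, zooms, blurs, contrast shifts, and grayscaling could be applied to synthetic ECG printout images using common Python tooling (torchvision). The choice of transforms and their ranges is an assumption for illustration, not the group's published pipeline.

```python
# Illustrative "real-life" photo augmentation for synthetic ECG printout images.
# Transform choices and parameter ranges are assumptions, not the actual pipeline.
from PIL import Image
import torchvision.transforms as T

ecg_photo_augment = T.Compose([
    T.RandomRotation(degrees=10),                             # tilted photos of printouts
    T.RandomResizedCrop(size=(512, 512), scale=(0.8, 1.0)),   # zoom / partial framing
    T.ColorJitter(brightness=0.3, contrast=0.3),              # lighting and contrast shifts
    T.RandomGrayscale(p=0.2),                                 # grayscale copies / faxes
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),          # out-of-focus phone photos
    T.ToTensor(),
])

# Usage: apply to a synthetic printout image before feeding the CNN.
# img = Image.open("synthetic_ecg_printout.png").convert("RGB")  # hypothetical file
# x = ecg_photo_augment(img).unsqueeze(0)  # shape (1, 3, 512, 512)
```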
We're also working on making our own AI-enabled smartwatch models to predict LVEF, which would help facilitate remote patient monitoring in our most vulnerable populations; think of cardiomyopathy and single ventricles. To train this model, we take single-lead ECGs and add in the real-life noise that is commonly seen with wearable devices. We found that this model works quite well across multiple centers, including Boston Children's, CHOP, and Toronto. And this is really just a quick plug to say that I'm going to be presenting this work in a poster session right after this, so feel free to stop by to learn more.

To end this talk, I want to take a more practical and humbling approach, and that's to say that our group is in what I'd call catch-up mode right now. We're trying to catch up to our adult colleagues and all the wonderful work they're doing. And while we can create all these AI-ECG models, they're useless if we can't use them effectively and safely, and I think ChatGPT also agrees with me. The crucial step here, as Malini pointed out, is clinical implementation of these tools. To do so, we need rigorous prospective studies and pragmatic clinical trials, much like Zachi has already done for adult cardiology. This is necessary to make sure we're defining the correct clinical use cases, building physician trust, and using this technology safely and seamlessly within a hospital system across diverse cohorts. Thank you.

Our next speaker will be Evan Oikonomou, talking about AI multicenter collaborations.

Hello, everyone. I'm excited to be here with you today and present some of our ongoing work in the Cardiovascular Data Science Lab at Yale. These were amazing talks, and I think Josh just covered some of the key points that I'm going to highlight in my talk; actually, I think you've caught up with us, and you're doing amazing, phenomenal work. My disclosure is that I'm not an electrophysiologist and I'm not a congenital heart disease expert. I'm a graduating general adult cardiology fellow, but also an investigator in the medical AI space and a member of the Cardiovascular Data Science Lab at Yale, and I hope that some of the experience we have in our lab and some of the efforts we've made so far will be useful to you as you navigate the next steps in this space. These are my disclosures.

Before we dive deeper, I just want to take a moment to appreciate how far we've come in medical artificial intelligence in the last few years. We've quickly moved from models that could simply encode human knowledge and automate tasks that humans could do, to deep learning-based models that can learn abstract representations from input ECGs and echocardiograms and detect or screen for hidden labels; a key example is Zachi's work with AI-ECG screening for left ventricular systolic dysfunction and how that redefined this whole area of research. More recently, we started to talk about foundation models, models that are task-agnostic, multimodal, and can generalize to tasks they've never seen before. And in the last year, we now talk about AI agents, systems that are end-to-end, can reason, plan, autonomously make decisions, and learn, and that practically work like an AI data scientist. With all that rapid evolution, I will say it often feels humbling to be an investigator in this space.
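Both the smartwatch models described at the start of this segment and the one-lead models discussed later in this talk rely on the same idea: injecting realistic single-lead noise during training. Below is a minimal sketch of that idea; the noise types and amplitudes are assumptions chosen for illustration only, not any group's actual augmentation.

```python
# Minimal sketch of injecting wearable-style artifacts into a clean single-lead
# ECG during training. Noise types and magnitudes are illustrative assumptions.
import numpy as np

def add_wearable_noise(ecg, fs=250.0, rng=None):
    """Return a copy of a 1-D single-lead ECG with simulated real-world noise."""
    rng = rng or np.random.default_rng()
    t = np.arange(ecg.size) / fs
    noisy = ecg.copy()
    # Baseline wander from motion/respiration (slow sinusoid, random phase).
    noisy += 0.1 * np.sin(2 * np.pi * rng.uniform(0.15, 0.5) * t + rng.uniform(0, 2 * np.pi))
    # Mains interference (50 or 60 Hz hum).
    noisy += 0.03 * np.sin(2 * np.pi * rng.choice([50.0, 60.0]) * t)
    # Broadband muscle/contact noise.
    noisy += rng.normal(0.0, 0.02, size=ecg.size)
    # Brief signal dropout, as seen with poor electrode contact.
    if rng.random() < 0.3:
        start = rng.integers(0, max(1, ecg.size - int(0.5 * fs)))
        noisy[start:start + int(0.5 * fs)] = 0.0
    return noisy
```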
So it's always important to redefine our mission, and our mission in the Cardiovascular Data Science Lab at Yale is to make sure that, as clinicians and data scientists, we work to ensure those tools are low-cost, scalable, and globally accessible. A lot of the feedback we get is that there's been a lot of work done in this space, but the end user often does not have an easy way to interact with those models, and that matters for building trust in those models on behalf of the medical community. If we track the evolution of the hardware and the devices we use to acquire the data for training and deploying our models, we see a similar evolution: we're moving from 12-lead ECG devices and bulky transthoracic echocardiography machines to handheld, portable devices such as smartwatches and smartphone-adapted POCUS probes, and there's a lot of exciting innovation in this space. We're moving toward smart clothing and wearable ultrasound patches that can enable ambulatory, 24-7 data acquisition in the community.

So one of the questions is, who are we building our models for? There is definitely a need for AI models that can automate, as we said, what humans can already do. Take the example of a patient with hypertrophic cardiomyopathy and a very good scan acquired by a certified echocardiography tech and interpreted by a board-certified echocardiographer; certainly there's value in building AI systems that can automate this process. But perhaps AI can really help us most with low-quality acquisitions, such as the ones we acquire from point-of-care ultrasound. These are examples of real-world scans acquired across our system and the Mount Sinai Health System, by ED providers scanning for other abnormalities. We want to see whether we can use this kind of information, so off-axis views, views acquired within a few seconds, not standard transthoracic echocardiography protocols, to screen for undiagnosed disease. We've now trained such models, and every time we train an echo model in our system, we make sure we introduce real-world noise from point-of-care ultrasound and adjust how we train those models so that they perform as well as they can with low-quality acquisitions. We validated this concept across more than 90,000 POCUS studies done in our system and at Mount Sinai, and we see that we're able to maintain robust performance: we can still screen for hypertrophic cardiomyopathy, amyloid cardiomyopathy, and aortic stenosis from low-quality, single-view acquisitions, a parasternal long axis that was perhaps off-axis and perhaps acquired for a different reason, where the operator was probably not screening for amyloid or hypertrophic cardiomyopathy at that time.

Moving on to 12-lead ECGs: as Josh mentioned, there's something to be said for the scalability of working directly with ECG images. That's the feedback we got from a lot of clinicians, not just internationally but also here nationally, that they want a way to interact with those systems. And even though many of them are not yet approved for clinical use, we're getting there; some of them already are.
But we want to make sure that anyone who's interested, any researcher or clinician who wants to check a model and see whether they can be confident in its predictions, has a way to do that. I want to highlight work done by our group and led by Dr. Dhingra, one of the postdocs in our group, who has developed a way to train models against all possible layouts of ECG images: all sorts of ECG layouts, image backgrounds, line thicknesses, all the variations you can think of, including screenshots and actual ECG printouts, to make sure the model is as robust as it can be. Again and again we see that those models perform very well in screening for different forms of structural heart disease, essentially at the same level as the signal-based models. And so that clinicians can easily access those models and run sample cases to see whether they believe the predictions are valid, whenever we build a model now we make it available on our lab's website. You can go there, upload a screenshot or an image of an ECG, and get a prediction right away. Hypertrophic cardiomyopathy, amyloid cardiomyopathy, simple ECG interpretation, you name it: if we have a model, it's publicly accessible there.

The same applies to transitioning to one-lead ECGs, the ECGs we can acquire from handheld, portable, and wearable devices. We know we can mimic real-world acquisition noise and train models that are robust enough even when the input is just a single-lead ECG, and in many cases we see only minimal degradation in performance, so we can still effectively screen for various forms of structural heart disease from that minimal input. Again, that allows us to build models we can share with the world for further research use. For those wearable models, we now embed them into the CardsPlus application; there's a QR code that leads you to our lab's website where you can download it. This is a scientific, academic effort, so any model we've trained for a wearable device is available there. People can download the app, link it to a wearable device like an Apple Watch or a Kardia, and run inference on the signal they acquire over 30 seconds for different hidden labels. Obviously this is only meant for research use right now, but we've had a lot of good feedback from having the models out there; people use them and give us very timely feedback on how to optimize deployment, and that really allows us to scale those applications, as I'll describe later.

Another thing I want to highlight is the importance of multimodal inputs. A lot of what we talk about is AI-ECG, but the truth of the matter is that many of the patients we see, especially in the cardiomyopathy space we work on on the adult side, but also in the congenital heart disease space, will also have a lot of other information, and clinically, when we evaluate a patient, we integrate all of that information to form our clinical judgment. To take the simplest example, we normally have an AI-ECG, but we also often have an echo available.
So why not integrate all the information we have to make a more reliable risk stratification judgment? Again, much of what I'm presenting here comes from an older population; we do a lot of work in the amyloid cardiomyopathy space, a condition that affects an older population but is very underdiagnosed, and that's a wonderful example of how those modalities can offer incremental value. We consistently see that when we integrate information from different modalities, when we take the positives from the AI-ECG and the positives from the AI echo, the overlap is consistently a much more reliable predictor of who truly has the condition of interest. That's because AI-ECG has its own biases: the false positives on an AI-ECG have a specific phenotype, and the false positives on the AI echo have a specific phenotype, so when you take the overlap across different modalities, you are much more likely to define a population that truly has the condition of interest. That concept generally applies across many different modalities, and I want to put it out there because when we deploy these technologies at scale, we really want to minimize the number of false positives, so that those screening systems are sustainable.

With that in mind, we've been able to scale those systems across many international collaborations. A lot of those are led by our group at Yale in the Cardiovascular Data Science Lab, but there are also many led by other groups that are borrowing or using this technology and deploying it at scale using ECG images and point-of-care ultrasound devices. One effort I would like to highlight in particular is the TRACE-AI network. This is not in the adult congenital heart disease space; it's meant to screen for undiagnosed amyloid cardiomyopathy, but I want to present it as a blueprint of what I think an AI collaboration looks like in 2025. The effort here is to identify patients with undiagnosed amyloid cardiomyopathy across many health systems in the U.S.; there are already more than ten systems that are part of this process. We've built algorithms for the EHR, for point-of-care ultrasound, for transthoracic echocardiography, and for ECG, and we've deployed those models across every system, but we do that in a way that protects patient privacy: we send the models in an application to any interested party, and they can run it locally in their system, so we don't have to get access to their data. We see that a lot more people are happy to contribute when that happens. And the fact that we can use ECG images, since some centers may not have ready access to ECG signals, really allows us to scale to smaller centers and to centers that would not have contributed data otherwise.

Some final thoughts about what this means for the congenital heart disease space, and again, this is an outsider's view. There are concerns, some of which were already mentioned in the earlier talks, that there might be data drifts or data shifts that are more pronounced in this space than in an unselected adult population.
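Returning for a moment to the multimodal point above, before the discussion of data shift continues: the "overlap of positives" idea can be expressed very simply. The sketch below is hypothetical; the column names and thresholds are invented for illustration and do not describe the TRACE-AI implementation.

```python
# Hypothetical sketch of the multimodal "overlap of positives" rule: flag a
# patient only when both the AI-ECG and the AI-echo models are positive, which
# trades a little sensitivity for a large gain in positive predictive value.
import pandas as pd

def combined_screen(df: pd.DataFrame,
                    ecg_threshold: float = 0.5,
                    echo_threshold: float = 0.5) -> pd.Series:
    """Return True only for patients positive on both modalities."""
    ecg_pos = df["ai_ecg_score"] >= ecg_threshold
    echo_pos = df["ai_echo_score"] >= echo_threshold
    return ecg_pos & echo_pos

# Example comparison (hypothetical labels): PPV of each single modality vs. the overlap rule.
# ppv = lambda flag: (df["has_amyloid"] & flag).sum() / flag.sum()
```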
Mostly because we work in referral centers, and the distribution or phenotypic composition of the population may differ more than usual among different sites. Another question worth asking is what the truly low-risk phenotype is in this space; that really depends on the composition of the population we train our models on. Is the risk modifiable? Again, I think this is also a wonderful example of where multimodality imaging and multimodality phenotyping can truly be a game changer. With that, I would like to thank all of you for your attention and our lab for all their amazing work. I'm happy to take any questions later, or stay afterwards and answer any questions you might have once the session is over. Thank you.

Great, thank you. Our final speaker in this session, before we open it up for questions and discussion, is Dr. Katia Bravo-Jaimes, also from Mayo Clinic. The title of Dr. Bravo-Jaimes' talk is Clinical Trials and AI in Limited Resource Settings.

Thank you, Dr. Treitman and Dr. Attia, for the invitation. I'm happy to be here with all of you, and we're going to get a little bit real today. In this space, I'm going to talk about a personal experience. I am originally from Peru, and Peru is such a beautiful country; hopefully many of you will visit soon. It has 34 million people and is the third largest country in Latin America. It's very multi-ethnic, but it also has high inequality, with a Gini coefficient of 40.3. We see real walls that are built in Lima; this is in my hometown. And we also see that congenital heart disease is a public health problem in my country. Analyzing the publicly available data, we saw a 20-fold increase in congenital heart disease diagnoses after the congenital heart centers were opened in Lima. However, mortality has not really changed in the last two decades. We saw that age-adjusted congenital heart disease mortality was more than four times the global average, similar to countries like Sudan or Afghanistan. We also analyzed the disparities across different regions in Peru and saw that many of the problems, with patients ultimately all coming to the capital, Lima, were related to really late detection, and this was highly prevalent in regions like the Andes. In the Andes, as you know, hypobaric hypoxia makes it very, very hard to detect neonatal critical congenital heart disease by pulse oximetry. We took this as a personal problem and led the Andes CHD study, where we were able to describe how neonatal oximetry varied between babies with normal hearts and those with critical congenital heart disease. With these data, we created three different algorithms adjusted for high altitude. However, when we looked at the performance of these algorithms, we saw that there were still 27% false positives in high-altitude regions. This is very important, because high altitude constitutes a very unique physiology: babies who are born there have slower remodeling of the RV, and their ECGs are going to look different from those of babies born at sea level. We also know that skin tone plays a big role in how pulse oximetry detects hypoxemia, and in adult data, 20% of cases of severe hypoxemia get missed in those with dark skin tones. We also demonstrated that early detection confined to the capital still led to significant mortality.
This is a study led by mentees in Lima, where we saw that 65% of the kids with critical congenital heart disease were dying even when they were born 30 minutes away from the congenital heart disease center. This led to a huge movement of pediatric cardiologists, surgeons, and patients. The most important support we've had throughout has not really been from the Ministry of Health; it has been from the patient association. Amigos de Corazon and the whole coalition of physicians led to the law that was passed last year for critical congenital heart disease screening in Peru. This also led to the question: what next? What happens once we have established that neonatal oximetry has limited utility? Then came the possibility of applying AI in this context. We are performing a multicenter study with four different countries enrolled, using a point-of-care device called the Eco500. So far, what we have are different kinds of lessons, which I'm going to share today.

First of all, what is the ideal AI algorithm for low- and middle-income countries? First, it has to be age-specific. We have reviewed the data in pediatrics, but there are many, many differences among newborns, fetuses, school-age children, and adults. The algorithm's performance has to be robust, and that's why we need multicenter collaboration. Perhaps we also need to think beyond congenital heart disease and think about structural heart disease, as the colleagues from Yale have shown us, and in this space it might be worth partnering with the rheumatic heart disease efforts that are ongoing. We are going to need point-of-care detection, and this might be very hard with 12-lead ECGs in newborns; that's why we thought of the digital stethoscope as a potential alternative. We also have to be very, very aware of the context: it's not the same to do an ECG for mortality prediction in Boston as in the Andes, right? And ultimately, the access-to-care question: what if we are creating a problem much more than a potential solution?

These are some pictures of the places where we're conducting the study. This is Huancayo, in the heart of Peru. Their NICU is actually very well equipped, contrary to what many people might think; they have a lot of support from the government, and these are some of our baby enrollees getting their digital three-lead ECG, and we can see the clear signals when the babies are able to cooperate. We also see significant limitations: this is the newborn nursery that was literally collapsing in front of our eyes while we were doing the study. Thankfully, no baby was harmed; they were all evacuated. It also left us with a sense of resilience, because this is not something they haven't seen before; this happens every single day. You saw the accident that happened in the Dominican Republic; similarly in Peru, the ceilings are falling, but people are also very, very resilient, and the phrase in front of the NICU reminds us that God gives his best battles to his best warriors. We have to really make sure that all of our efforts in global health and in AI implementation in low- and middle-income countries do not represent a threat or an architectural problem, as you can see in these actual pictures, again from Peru.
And we also have to realize that at this very, very preliminary stage of model development and validation, we still have to follow up with effectiveness studies compared with oximetry and with handheld echo, and also see from the beginning what the barriers to implementation are. In that sense, we have many very concerned parents thinking that we are giving the disease to the children. Therefore, an effort to educate the population is really, really needed: empowering our patients who carry a diagnosis, building robust networks, and creating national surveillance systems are super important. We can also use AI to educate the population, and one example is the Sunco social media platform, where we are producing content for congenital heart disease education in Spanish, using AI, at a second-to-fourth-grade reading level. Sunco means heart in Quechua, the native Inka language, and I hope you can all share the QR codes with your Spanish-speaking patients, because this is meant to grow, like the Inka empire. I want to close with this phrase by Paul Farmer: the idea that some lives matter less is the root of all that's wrong with the world. Thank you for your attention.

Thank you. Well, on behalf of the entire audience and of us as moderators, I want to thank all four presenters for spectacular presentations that really covered a broad range of perspectives on our topic for today. I have a couple of questions in the moderated Q&A through the app, and feel free to submit more. I will say the questions I have there are technical, and I'm not sure we can interpret them properly. So anyone who has asked a question, or has a question that they would like to ask now, please feel free to come to the mic or submit it through the app. And I'm going to start things off. Do you want to ask first? Go ahead, you take the mic first and I'll save my question for after you.

Great talks, everybody. I had a question for Malini and Zaki. For the algorithms that Anumana took to FDA approval, did the FDA give you any guidance on what performance needs to be in order to achieve approval? Or do they just sort of hold their cards, you show them how good it is, and they just say yes or no?

I think I'm going to let Zaki answer that; I'm not too familiar with that example.

Yeah, so we actually just published the study about low EF for FDA approval. They were asking that we report specific sensitivity and specificity, meaning that we would say we would have a sensitivity of 80% and a specificity of 85%. They don't care about AUC; they don't want to deal with thresholds and so on. They want a binary test that says positive or negative. But it was a process, and actually the people we worked with at the FDA were extremely diligent, very professional, and really understood the problem and the science. I have to say I was pleasantly surprised working with them, really, because it's a new field and it moved very fast. But they did a great job, and it was a joint process with them. We had to say these are the specifications that we will have, and then we did an external study. They did require some ethnicity diversification and so on, so they said what the minimum is in each group, for example, and then we had to show that we passed the bar that we set. Awesome, thank you.
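To illustrate the "binary test, not an AUC" point in that answer, here is a minimal, hypothetical sketch of how an operating threshold could be chosen to meet a pre-specified sensitivity floor and then reported as fixed sensitivity and specificity. The numbers and function names are illustrative and do not describe the actual Anumana submission.

```python
# Hypothetical sketch: freeze a model at one operating threshold and report it
# as a binary test with pre-specified sensitivity/specificity, rather than an AUC.
import numpy as np
from sklearn.metrics import roc_curve

def pick_operating_point(y_true, y_prob, min_sensitivity=0.80):
    """Choose the threshold giving the best specificity subject to a sensitivity floor."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    ok = tpr >= min_sensitivity                      # candidate points meeting the floor
    idx = np.argmin(fpr[ok])                         # best (lowest) false-positive rate
    chosen = thresholds[ok][idx]
    return chosen, tpr[ok][idx], 1 - fpr[ok][idx]    # threshold, sensitivity, specificity

# The frozen threshold would then be fixed in the device software, and its
# sensitivity/specificity confirmed in an independent external study.
```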
All right, and while I wait for more questions to arise from our highly engaged audience here, I want to toss something out that I would hope all of the panelists will chime in on. You've all alluded in one way or another to trust and explainability, and to the obstacles that will exist to getting AI algorithmic diagnoses accepted by clinicians as part of their clinical practice. What do you see, in each of your laboratories, or in the case of deploying them in a low-resource environment, Dr. Jaimes, as the next important challenge you need to address in order to take a step in that direction?

Well, I think there are always two perspectives. One is from the clinicians, who obviously want to learn where this is originating from, and I think the prior studies that Josh has shown are looking into the saliency maps and many of the explainability methods for these models. However, when you go to the patients, they don't really care; they just want something that helps them, right? And I think they are much more willing to engage in research, much more than we, the clinicians, think.

I think the question of explainability and trust is actually really important, and it's going to be important both for patients and for physicians moving forward that we establish trust. But trust in this space is, I think, defined a little bit in a soft way, and at least in the pediatric and ACHD world, one way to build that trust is to show the data, show multicenter data and large data, but also prospective data: what really happens if you put this out in the real world in a systematic way, in a way where we compare what this algorithm does to the practice, the workflows, and the outcomes of our patients compared with the current standard of care. I think that's where our community should go next: to actually provide that systematic data in some pragmatic clinical trial fashion.

I'm just going to echo points that were already brought up. I think the root of it is building trust, and I'll say there are two ends of the spectrum. On one end, you can build trust by asking the AI, what are you looking at to diagnose WPW? And then with the saliency mapping, hopefully it's looking within the delta waves to tell you it thinks it's WPW. If a physician were exposed to a bunch of examples with these saliency maps and the AI were pointing in the right direction, hopefully that would build trust. That's one side. On the other side, putting explainability aside, in some sense the proof is in the pudding: if you had a model that was 99 or 100% accurate, at some point people would accept it even without the explainability. But what that requires is having it in physicians' hands across the country and doing the multicenter external validation and deployment with prospective studies. So I think both areas need to be explored, and as we've pointed out, in our space we're really still at the model-development stage. To build that trust, we have to explore both of these options: explainability and doing these trials.

I agree with everything that was just mentioned. I want to say that in terms of explainability, we often present those saliency maps showing that a given area lights up on an ECG or an echocardiogram, and if that agrees with our preconceived notion about what the AI should be looking for, then we feel okay; we feel like the model is working. But if not, then we might not feel as comfortable using that model.
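Since saliency maps come up repeatedly in these answers, here is a minimal, hypothetical sketch of one common way to compute them for a 1-D ECG model: a plain input-gradient map in PyTorch. The model and tensor shapes are assumptions for illustration, not the specific method used in the studies discussed.

```python
# Hypothetical input-gradient saliency for an ECG model: samples the network
# relies on most (e.g., around the delta wave for WPW) get large gradient
# magnitude. `model` is any trained PyTorch network taking (batch, leads, samples).
import torch

def ecg_saliency(model: torch.nn.Module, ecg: torch.Tensor) -> torch.Tensor:
    """Return |d(output)/d(input)| for a single ECG, shape (leads, samples)."""
    model.eval()
    x = ecg.unsqueeze(0).clone().requires_grad_(True)  # (1, leads, samples)
    score = model(x).squeeze()                          # assume a single logit output
    score.backward()
    return x.grad.abs().squeeze(0)

# Usage (hypothetical): overlay sal = ecg_saliency(wpw_model, ecg12) on the traced
# ECG and check whether high-saliency regions coincide with the delta waves.
```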
But actually, I think we should be spending more time validating those models as broadly as possible, across many different populations. Because those models are not smart; they often take shortcuts when they make their predictions, whether these are demographic shortcuts that bake in biases, or simply learned associations that are very specific to the population where they were trained. There's no substitute for validating those systems at scale across many different cohorts, populations, and age groups. That's the goal; I think that's the key, and that's what is going to build trust. We also need to make sure that people use them, and that they can at least get their hands on the model, and that it's not just a black box they can only access, if they're allowed to, five years after it's been published. I feel like there's a lot we've learned by having people just play with the model; they come up with new things we haven't thought about, and then we implement them, and there's a faster cycle of building a better model.

Great. Again, we have an open mic, but no one's standing at it. Oh, I guess we do.

Okay, as sort of one of the older people here, since I just turned 60 last week, I feel I can make a comment. When we went to college, there was no computer. I sat there and wrote a paper on a typewriter with white-out and started it all over when the page was done, and it took a little while to get used to the change. I think this is fantastic, but it needs to start in medical school. A 60-year-old physician who's used to looking at echoes and diagnosing hypertrophic cardiomyopathy finds this foreign, and it's a major leap for them. It's good we're doing this, but we need to figure out how early we're educating medical students with AI and how they integrate it into their clinical work, and then fellows and residents as well. It needs to be part of the core curriculum. So for the training guidelines that come out from the ACC and AHA, there needs to be something in there so that we're starting at a younger educational age, because I don't think you're going to change the tail end of that. I think that's one way to do it, and those of us who are involved in medical schools and education have to be really proactive in getting that done.

Yeah, I would almost argue that perhaps the challenge of trust may be a generational one. Good point. I have one last question, and that is: we've talked a little bit, importantly, about validation, and Evan brought up the really fascinating topic of federated learning models. Could you expand on that? At what point will it be deemed insufficient merely to have generated the model and validated it at one other center, and how can federated learning begin to help us achieve the goal of truly diverse and representative multicenter model development?

So, I mentioned the word federated, and I wish it were federated learning; what we've done is more like deployment at scale. But ideally, if we think about how we would train this model: we've trained this model on the data we had access to, our local data in our EHR, and then we deploy it across every single center. Ideally, we would have used data from every single center around the country to build a model that would be the most representative model possible.
A model that would learn all the different demographics and all the different referral patterns, a model that is by definition going to generalize better. I think we're eventually going to reach that point, because there are many teams across many health systems working toward this end, and we're increasingly seeing people joining efforts and working together. But I think there's still a lot of hesitance in terms of who gets to publish what, and once we overcome that and realize that the best model will be the model we all work on together, then we're going to eventually reach that point. But are we there yet? I don't know.

Any other comments on multicenter collaboration?

I'd just like to point out that the concept of federated learning sounds wonderful, working together to get the best representation in a model, but as has been pointed out, it's really difficult to execute. Several of us are in the ACPC AI group, and we have work groups monthly, and at every monthly meeting we talk about how we can work together to create a model that aggregates all of our data. We're over a year out, and I feel like we haven't made much progress, if any. I think one of the bottlenecks, at least, is the red tape involved with aggregating data across institutions: where does the data sit, who gets the intellectual property, if any, what DUAs are necessary. There are so many pieces that need to be sorted out, which helps explain why, on the adult side, they've been developing these models for five years and there are little to no federated learning applications; I can think of a handful of papers. So it's just really hard to execute, and I'd love to find a way to make it happen, because I agree it's necessary.

I think one last comment. I completely agree with you. The benefit in the adult world is that we have so much data that federated learning is not as needed, but especially in congenital heart disease and pediatrics, the amount of data each institution has is usually not enough, and definitely not diverse enough, so we have to have a way to solve this. Looking at the adults and saying they were able to do it without it might not fit very well here. So I completely agree that we should work together and find ways to get these models to patients, but also make sure they work for every patient, because when we train the model, it needs enough representation.

Thank you. Well, we've hit the top of the hour, and on behalf of Zachi Attia and myself, I would like to thank the panelists for their excellent presentations and interaction, and the audience as well. Thank you very much.
Video Summary
The discussion focused on the application of AI algorithms in healthcare, particularly in the field of cardiology. The conversation highlighted the development and implementation of AI models by various institutions such as Mayo Clinic, Yale, and Boston Children's Hospital. Each speaker presented the progress and challenges facing the integration of AI into medical practice. Key topics included the development of an AI model for low ejection fraction detection using ECG data, the need for validation and real-world testing, and the importance of multi-center collaborations to optimize AI models for diverse populations. The speakers also addressed the challenges of implementing AI in low-resource settings, emphasizing the importance of creating age-specific and culturally sensitive algorithms. They acknowledged the necessity of building trust in AI systems among clinicians and patients, suggesting that this might require both explainability of AI decisions and robust validation studies. The discussion concluded with reflections on the importance of educating the next generation of healthcare professionals on AI technologies and the potential of federated learning to develop more representative and generalizable AI models.
Keywords
AI algorithms
healthcare
cardiology
Mayo Clinic
AI model
ECG data
multi-center collaborations
low-resource settings
federated learning