AI and EP: Here to Stay!
Video Transcription
AI in EP: here to stay. Thank you very much, and thank you to the board for inviting this talk. Let's get started; we're going to have a lot of fun today. OK, so what can we do, and where are we going? Here are my disclosures. This is a really large topic, so I'm going to focus on what we can use these technologies for in EP right now. The foundational example of what we can actively do is reducing false positive alerts, essentially using these technologies for rhythm discrimination. This happens in a lot of different forms, but just to give a discrete example, these are examples of convolutional neural networks applied to waveform data in implantable loop recorders to classify atrial fibrillation and reduce false positive alerts. On the left of the screen, you can see an example of a 66% reduction in false positive alerts, and on the right, an example of an overall reduction in transmissions when you utilize some of these technologies. Now, if we take the 12-lead electrocardiogram as an example, this is a very well-known paper in which the authors developed a convolutional neural network on the 12-lead electrocardiogram for the detection of atrial fibrillation from a sinus rhythm ECG. It was developed in over 180,000 patients. It was retrospective, and the authors used atrial fibrillation noted on a single, physician-verified electrocardiogram as the definition and ground truth for atrial fibrillation. You can see here that the model performed quite well in terms of the area under the receiver operating characteristic curve for discriminating which patients would actually go on to develop atrial fibrillation. Just as a general point, I may breeze past some of the areas under the curve a little bit, because in all of the published papers you're going to see, the models perform pretty well; if a model doesn't work, most often people are not publishing or submitting it. But what's really interesting is that now we can apply some of these technologies in procedures. This is the TAILORED-AF trial. It was presented at HRS last year and recently published in Nature Medicine. In this study, the authors randomized 374 patients with persistent atrial fibrillation one-to-one to either tailored ablation of AI-detected spatiotemporal dispersion plus pulmonary vein isolation, or pulmonary vein isolation with up to two additional lines of ablation if it was a redo procedure. Patients had a year of follow-up with Holter monitors and weekly Kardia recordings. Now, I want to highlight that this is really a PVI-plus kind of trial; it's not really an artificial intelligence trial. The reason I say that is the authors utilized supervised learning, essentially having experts annotate regions of spatiotemporal dispersion, of which you can see an example here, and training the model on what spatiotemporal dispersion looks like. The model can then automatically annotate those regions during the course of ablation with the use of the mapping catheter. So it's really an ablation strategy trial. But nonetheless, the results were quite impressive: both the intention-to-treat population and the per-protocol population clearly favored tailored ablation over anatomical ablation, and this also held true for patients with persistent atrial fibrillation of at least six months in duration. Now, let's move beyond atrial fibrillation. At HRS, we had to start with atrial fibrillation.
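For readers who want to see the general shape of the rhythm-discrimination models described above, the following is a minimal, illustrative sketch of a one-dimensional convolutional neural network classifier in Python (PyTorch). It is not the architecture of any of the published ILR or 12-lead models mentioned in the talk; the layer sizes, sampling rate, and strip length are assumptions chosen only for the example.

# Minimal 1D-CNN rhythm classifier sketch (illustrative only; not a published model).
import torch
import torch.nn as nn

class RhythmCNN(nn.Module):
    def __init__(self, n_leads=1, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 16, kernel_size=7, padding=3), nn.BatchNorm1d(16), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.BatchNorm1d(32), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.BatchNorm1d(64), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # global average pooling over time
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                       # x: (batch, leads, samples)
        z = self.features(x).squeeze(-1)        # (batch, 64)
        return self.classifier(z)               # logits for sinus rhythm vs AF

# Example: a 30-second single-lead strip sampled at 128 Hz (hypothetical values).
model = RhythmCNN(n_leads=1, n_classes=2)
strip = torch.randn(8, 1, 30 * 128)             # batch of 8 synthetic strips
probs = torch.softmax(model(strip), dim=1)      # per-strip AF probability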
So let's take the next step and talk about PVCs. PVCs are obviously very frequent, commonly encountered, and often benign, but some patients go on to develop a cardiomyopathy. So we developed an algorithm to help us with this. This was a convolutional neural network using just the 12-lead electrocardiogram. We took patients with a baseline normal echo and trained the algorithm to predict which patients would go on to develop a cardiomyopathy. Now, just a note about PVC burden, which is the cornerstone of a lot of decision-making in terms of how aggressive to be in treating patients with PVCs: if you look at the C-statistic, which is comparable to the area under the curve, you can see that PVC burden is only a little bit better than a coin flip, a little bit better than chance, at discriminating who is going to go on to develop a cardiomyopathy. And there is actually a recent paper in JACC EP showing that a cutoff of a 20% PVC burden was not an independent predictor of developing a cardiomyopathy. So if we look at the algorithm, it performed, again, pretty well at discriminating which patients would go on to develop a cardiomyopathy. And when we take the binary output of the model, high risk or low risk, and put it into a multivariable Cox regression analysis, you can see that it was a very strong independent predictor of cardiomyopathy development. In our study, the PVC burden was still an independent predictor. Now let's talk about something different. We mentioned atrial fibrillation and cardiomyopathy development, but what about genetics? Long QT syndrome and sudden death is obviously something worth addressing. Long QT syndrome is a heritable channelopathy, of course, that accounts for 5% to 10% of sudden cardiac death cases referred for autopsy, and the mutations underlying long QT syndrome subtypes 1 through 3 have the strongest level of evidence for causal pathogenicity. So we developed a fusion model to predict which patients would have a pathogenic mutation on genetic testing. A fusion model takes different models that perform different tasks and combines them to help with your prediction. We took the electronic health record data, the tabular data, and used something called a multilayer perceptron, and fused that with a convolutional neural network for the electrocardiogram image to detect which patients are going to have a pathogenic mutation on genetic testing. We used the UK Biobank and then fine-tuned the model on the BioMe biobank, which is a Mount Sinai local biorepository. And you can see here that the model performed pretty well at discriminating which patients are going to actually have a pathogenic mutation. So we've talked a lot about trying to predict future events, and sometimes these are events that are quite far off in the future. What about near-term events? This was a great paper that was recently published in which the authors took 14-day ambulatory ECG recordings in almost 250,000 patients across six countries and used the first 24 hours of the monitoring to predict sustained VT in the remainder of the monitoring. The authors also developed a fusion model, this time using a neural network for the heart rate density plot and the electrocardiogram tracing, as well as patient demographics, to flag patients as high or low risk for sustained VT.
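The following is a minimal sketch of the "fusion model" idea described above: a multilayer perceptron for tabular EHR features combined with a convolutional branch for an ECG input, with the two feature vectors concatenated before a shared prediction head. It is illustrative only; the feature counts, layer sizes, and the use of a waveform input (the talk describes an ECG image) are assumptions, not the published model.

# Fusion model sketch: MLP (tabular EHR data) + CNN (ECG), concatenated features.
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, n_tabular=20, ecg_leads=12, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(               # multilayer perceptron for tabular data
            nn.Linear(n_tabular, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.cnn = nn.Sequential(               # CNN branch for the ECG signal
            nn.Conv1d(ecg_leads, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32 + 64, n_classes)   # fused prediction head

    def forward(self, tabular, ecg):
        t = self.mlp(tabular)                   # (batch, 32)
        e = self.cnn(ecg).squeeze(-1)           # (batch, 64)
        return self.head(torch.cat([t, e], dim=1))  # e.g. pathogenic variant yes/no

model = FusionModel()
tabular = torch.randn(4, 20)                    # 4 patients, 20 hypothetical EHR features
ecg = torch.randn(4, 12, 5000)                  # synthetic 10-second 12-lead ECG at 500 Hz
logits = model(tabular, ecg)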
And not surprisingly, the model performed extremely well, almost too well, at discriminating which patients would have sustained VT during the two-week monitoring period. So we've discussed a lot of things we can do right now. But where are we going in the next five to ten years? A lot of the challenge now is in the implementation science: how do we operationalize these algorithms? This is a paper that actually just came out today in NEJM AI, in which we used a convolutional neural network on the 12-lead electrocardiogram to detect hypertrophic cardiomyopathy. We screened around 71,000 patients and had 1,522 positive alerts. Now, there's a lot to get into, but for the sake of this talk I want to focus on two main points. One is that once we start deploying these algorithms, we're going to have work lists of patients who are all flagged as high enough risk by the algorithm that some sort of action needs to be taken. If you review those patients in chronological order, all of your false positives are mixed in, and triaging whose care you should actually expedite can be challenging. But you can take the continuous probabilistic output of the model and sort by that instead, and we demonstrated that by doing so, the highest-risk patients come to the top of your work list. So this is a very pragmatic workflow consideration for clinical teams. But then, what do you tell patients? When models make a classification, there's the binary output, but there's also the continuous output that I just mentioned. However, the probability that the model spits out doesn't actually mean anything on its own; models are happy to cluster their positive and negative predictions near 100% and 0%. And you can see here that the flags were all clustered around the upper end of the spectrum. But if you calibrate the model, and there are multiple ways of doing that (we used something called Platt scaling, which involves logistic regression), you can take these probabilities and spread them out over the spectrum, such that the model's output actually reflects a probability for the patient. So you can tell the patient, you have an approximately 60% likelihood of having hypertrophic cardiomyopathy. This improves granularity and interpretability for clinical teams and also for patient counseling. Another challenge is, what are we going to do with the false positives I just mentioned? One of the things we're noticing is that patients who have artificial intelligence false positives, particularly in the ECG work we've been seeing, have a higher cumulative incidence of developing the disease over years of follow-up. This is an example for aortic stenosis, in which patients who were initially classified as false positives had a higher cumulative incidence over 15 years of follow-up. We replicated this in our study, both for aortic stenosis and mitral regurgitation, in which patients had a higher cumulative incidence of developing their respective valvular disease over five years of follow-up. And beyond valvular disease, this has also been demonstrated for low ejection fraction. Again, this is something we will need to come together on to inform how we should follow up with patients based on these false positive alerts.
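The two workflow ideas above, sorting a work list by the model's continuous output and calibrating that output with Platt scaling, can be sketched in a few lines. The scores and labels below are synthetic placeholders, not data from the paper, and in practice the calibrator would be fit on a held-out set.

# Sketch: (1) Platt scaling, a logistic regression fit on raw model scores so the
# output behaves like a probability, and (2) triaging a work list by that output.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
raw_scores = rng.uniform(0.9, 1.0, size=200)     # uncalibrated outputs clustered near 1.0
true_labels = rng.binomial(1, 0.3, size=200)      # held-out ground truth (synthetic)

# Platt scaling: fit P(disease | score) with a one-feature logistic regression.
calibrator = LogisticRegression()
calibrator.fit(raw_scores.reshape(-1, 1), true_labels)
calibrated = calibrator.predict_proba(raw_scores.reshape(-1, 1))[:, 1]

# Work-list triage: review the highest calibrated probabilities first,
# instead of reviewing flagged patients in chronological order.
worklist_order = np.argsort(-calibrated)
print(worklist_order[:10], calibrated[worklist_order[:10]])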
Now, another thing, thinking about where we're going: we have to have viability to be able to incorporate these new technologies, and so reimbursement is really important. It's vital for technology uptake, and it provides the financial viability for infrastructure to be developed to address all of the things we were just mentioning. At least here in the United States, CMS recently approved two CPT codes for AI-ECG algorithms. It's still very early days, but we're going to have to see over the next few years how we interact with payers and health systems to responsibly deploy these types of solutions. And we have to really focus on responsible implementation. There are lots of pitfalls, which we're going to get into more in the discussions, but we need to be able to trace issues that arise during the course of treatment. If we're using language models and AI agents, we need to know where in the process something could have gone wrong: was a language model trying to pull an echo report that was available, but it failed to do so? We need to monitor these models. As we start to treat patients based on these model predictions, we're going to be modifying the substrate of the patient populations from which these models were developed, and so we can see degradation in model performance. We, of course, need to focus on ethical deployment and mitigating bias. But we also need to consider redesigning our systems and our approach to clinical decision making. The reason I say that is that all of the examples I've shown in this entire talk make the assumption that we are taking these novel approaches and using them for a historical task, trying to predict a singular disease process, detect a cardiomyopathy, detect atrial fibrillation. But that ignores the vast amount of data that's present in the electronic health record. So really, effectively, what we're doing right now is shoving a square peg into a round hole and ignoring all of these potential possibilities. One of the things we need to consider is to take 20,000 steps back and think a little bit bigger about how we're going to redesign systems that are viable, that are safe for patients, and that can be deployed at scale. And on that note, I will just highlight that everyone can join us at 4 p.m., where I'll be discussing graph neural network automation of anticoagulation decision making. And with that, thank you very much. Thank you for your attention. Thank you.

Good afternoon, everyone, and thank you for joining us. I'm grateful for the opportunity to present my talk, titled Using AI in EP: What Are Its Advantages, and Can We Trust It? These are my disclosures. Today I'll underscore the advantages and value of AI applications in EP, address trustworthiness as a keystone of adoption, and outline frameworks for building trust. The AI tools available in EP can be bucketed into two broad categories, discriminative AI and generative AI, although over time we're seeing that generative AI can perform discriminative tasks, so a significant overlap is now developing there. Dr.
Lampert did a wonderful job of setting me up for my talk, so without belaboring the point: within discriminative AI, a lot of the models currently available focus on detection of arrhythmias; personalized risk prediction, not only when the patient is in the arrhythmia but also from the sinus EKG, predicting risk of atrial fibrillation, ventricular arrhythmias, sudden cardiac death, and cardiomyopathies; and predicting prognosis with or without treatment for atrial and ventricular arrhythmias, and we saw a beautiful example for PVCs in Dr. Lampert's work. There are also, on the coattails of studies like TAILORED-AF and VMAP, a lot of tools being developed for on-table ablation decisions for AFib, VT, and PVC ablations, using both invasive and non-invasive mapping and guidance during ablative procedures. We're aware of Dr. Trayanova's work at Hopkins, looking at digital twin simulations and also using raw MRI data to predict the risk of sudden cardiac death in HCM. There are also tools available to guide and predict the outcomes of resynchronization therapies in EP patients. On the generative AI side, a lot of the tools are looking at real-time clinical decision support, medical education of providers using augmented and extended reality, and health literacy tools for our patients, including shared decision making, consent processes, and lifestyle modification, for example for atrial fibrillation. In fact, one of the largest studies we're undertaking here at HRS is focusing on studying agentic AI and its role in AFib education and guideline management for AFib patients. Ambient documentation has swept across most health systems in the country and is bound to affect, in a hopefully positive way, our workflow in EP clinics. Care pathways built on generative AI models also hope to close care gaps by providing arrhythmia care earlier, including early ablative strategies for AFib, lifestyle modification, and VT therapy, and by finding patients who are appropriate candidates for devices and getting them down the right pipelines with early referrals to EPs. The ultimate North Star is to utilize clinical findings, biomarkers, raw imaging data, signals from remote monitoring devices, and patient biometrics to perform precise diagnostics, draw prognostic implications, and guide specific treatment patterns, both medical and procedural, for each of our patients individually. So as you can tell, AI is present along the entire continuum of arrhythmia care, from risk prediction to arrhythmia detection, to referral to EP, to treatment planning, execution of that plan on the table during ablative procedures, and then long-term surveillance of patients who are at risk of arrhythmias. The ultimate dream of personalized care will be realized once we take these population-level algorithms and really fine-tune them to the individual level. But as we know as EP providers, presence does not always mean participation. So we do not know, at each of these stages of arrhythmia care, to what extent AI will participate in our patients' care in the long run, and a lot of that will be determined by the value these technologies bring to the table. For your typical drug or device, the product cycle pathway from discovery to randomized trials, to regulatory oversight, to CMS reimbursement is fairly well crystallized. For AI algorithms in clinical medicine, that same pipeline has not been crystallized as well, although for image-based and supervised learning-based algorithms we are getting there, as Dr.
Lampert highlighted with some of the CMS codes. So we find a lot of health systems individually investing in technologies they find valuable. That is an interplay between quality and cost: the costs of testing, validating, and deploying these models, the cost per inference, the staffing required to maintain such technologies, the workflow implications of using AI models in our clinical workflow, and the resulting action we take based on the outputs of these models, whether it's an intervention or a service performed for the patient, and how that translates into either revenue generation or savings from improving quality outcomes and decreasing hospitalizations and poor clinical outcomes down the line for our patients. And now the next, very loaded question: can we trust AI in EP? My answer is: not really; maybe; it's classified. So why the ambiguity? Why, on the one hand, this immense promise of AI in EP, but at the same time this reticence to accept it 100% into our clinical workflows? Well, that's because the definition of trust in AI is the willingness of people to accept AI and believe in the suggestions and decisions made by the system, share tasks with it, contribute information to it, and provide support to such technology. So does that mean that trust is a very soft metric and we can never measure, predict, or control it? Not really, because trust in AI is not just a non-technical ethical consideration. It also includes various domains, including AI performance, transparency, explainability, and compliance with legal and technical regulations. And AI is different from other automated systems in the sense that it can learn, and it can behave proactively, unexpectedly, and incomprehensibly for humans. So to build the trust we need in order to be comfortable accepting this technology into our clinical workflows, we first need to understand the current paradigm of trust in AI. And we really need a shift in perspective, from asking the question, do I trust AI and this technology, to asking, what makes an algorithm trustworthy? What are the metrics against which an algorithm needs to be measured in order to be considered trustworthy? As far as the current paradigm of trust in AI is concerned, you may have heard of the study in which six clinical vignettes were provided to over 50 physicians, and the diagnostic reasoning outputs were compared between an LLM, an LLM plus physician, and conventional diagnostic tools plus physician. The LLM outperformed both of the other arms, and that led to a lot of news coverage and headlines like this. However, the same group at Stanford performed a subsequent study that focused on clinical management reasoning, the nuanced clinical decisions we make for our very complex patients. And there, as one would intuitively think, physician plus LLM outperformed both the LLM alone and physician plus conventional tools. So very useful information, but again, some conflicting data that spans the spectrum of trust. Why this is pertinent to us is that the next step here is comparing LLM outputs with expert specialist recommendations. We know that there is bias within EP. We know that there's bias in referral patterns for AFib care based on sex and race. We know that anticoagulation prescription patterns vary by race.
And we also know that there are, unfortunately, a number of our patients from underrepresented ethnic groups who fall through the cracks when it comes to referrals for device and resynchronization therapy. So looking at the sociodemographic biases that currently exist in LLMs certainly gives us appropriate pause when it comes to accepting AI in EP. And there remain ongoing concerns about inappropriate responses, hallucinations, and bias. Now, a recent state-of-the-art statement by EHRA, which was actually co-authored by Dr. Ghanbari here and which developed a checklist based on an expert panel's input, found that fewer than 50% of the checklist items deemed important by the author committee as a litmus test were reported at an appropriate level (more than 85%) across the published literature and some algorithms already in use, looking at AFib risk prediction models, sudden cardiac death prediction models, and AI in the EP lab. So we can see that there's a lot of room for improvement when it comes to reporting exactly how these algorithms are developed, where the data comes from, whether there's open-source availability of the data, and how reproducible these models are. Definitely room for improvement there. And when it comes to generative AI studies, most of the studies look at the accuracy of question answering for medical examinations, while leaving out some very important dimensions that receive limited attention, including real patient care data, fairness, bias, toxicity, and deployment considerations. Because when we think about generative AI guiding our care pathways and direct-to-patient tools for lifestyle modification or behavioral interventions, we really ought to be thinking about these dimensions that are getting very little attention, as opposed to asking whether the algorithm can pass a medical exam. And why does trust matter? It matters because the road from technology to adoption is paved by trust, and more importantly, the road from technology to sustainable adoption is paved by trust. You might have heard the old adage, culture eats strategy for breakfast; I owe this next quote to Nigam Shah: workflow eats algorithms for breakfast. And I want to talk about a couple of terms that come up frequently when we think about the trustworthiness of AI technologies, the first of which is interpretability. It's important to understand that the definition of interpretability differs across stakeholders. An engineer's interpretability focuses on the internal mechanics and the code of an algorithm; even if the output has a diametrically opposite impact from what we might have anticipated or desired, an engineer could explain to you that the model is performing beautifully based on the data we fed it, which is really a reflection of how we are treating our patients. A clinician's interpretability asks whether the output makes sense in terms of the pathophysiology of the disease and the clinical logic, and whether it's applicable and will actually impact clinical outcomes. And then there is the ethicist's interpretability, which asks whether the algorithm is fair, whether it is transparent, and whether it causes harm or delivers injustice to certain members of our society. And you can't have a conversation about the clinician's and the ethicist's interpretability without having a conversation about bias. Bias is reported across discriminative and generative AI.
But when we think about the gold standard behind our current clinical guidelines, randomized controlled trials, we know that bias exists there too. We know that we don't have diverse populations, we know that some of those randomized trials are not generalizable, and we know that there is human provider bias in the recommendations given to patients coming in with chest pain based on their race. So it's important to think about narrowing the use case: rather than trying to make one fair algorithm that performs poorly across the board, define narrower use cases with must-have and must-not-have criteria for each algorithm, as distinct from the nice-to-have features. When I think about achieving trustworthiness, I think about three main categories. Ethical appraisal, which includes frameworks developed by expert committees; academic-industry partnerships for assurance labs and performance testing; trial registration; external validation; open-source data and open-source code; maintaining ethical rigor through fairness and bias testing; and using Nancy Kass's mid-level ethical principles for a learning health system, which include engagement, transparency, and accountability. Perhaps the most important component of trustworthiness is collaboration between clinicians, researchers, ethicists, and patients, a real team-science approach to developing these technologies responsibly and effectively. And of course, we want regulatory oversight to ensure that each of these components is being done safely and effectively. So I'll leave you with these essential takeaways. An algorithm is only as valuable as the end user's willingness to adopt it; adoption by the end user is therefore essential, and that end user could be a clinician, a patient, or a health system. Technical appraisal and proof of utility build trust, and ongoing demonstration of value is extremely important. We can do this through large RCTs akin to drug and device trials, measurement of outcomes, federated data sets, open-source training models, and, importantly, a cost and quality analysis before deploying these algorithms at scale. Trust is not a soft metric. It is synonymous with transparency and performance, and it is a keystone of adoption. With that, I thank you for your attention.

What's still needed for AI to improve arrhythmia care. Thank you so much.

Perfect, thank you so much for inviting me here today. My task today is to talk about what is still needed for AI to improve arrhythmia care. I'm going to try to center my talk on the practical experience from the implementation projects that we've done at the University of Michigan. These are my disclosures. As my esteemed colleagues alluded to before, there is great promise with AI to detect arrhythmias, personalize diagnosis, predict risk, and optimize workflow and procedures. But there is a lot of hype, and we don't see a lot of value, and the question is always why. If I have to leave you with one thing, it is this: you have to have an AI strategy. What I see oftentimes is that people just chase the first shiny thing they see, without any coherent strategy for how it is going to serve the mission and the vision of the institution they're at. We obviously have core strategies in research, culture, education, growth, quality, and brand, and every institution will have different core strategies.
You have to have an AI strategy that advances those existing strategies to serve the mission and the vision of your institution. Once you establish that, the next thing you have to do is establish an AI portfolio. What's the purpose of an AI portfolio? The AI portfolio's job is to understand which use cases are actually valuable to your constituents. You have to build literacy through those use cases. You have to understand when you should buy and when you should build. You have to understand what it takes to adopt, what should take priority, and what actual value your constituents, the clinicians and patients, want. Now, that's not enough. You can't just have an AI portfolio; you have to be able to actually put it into practice, so you have to have an AI operating model that manages the data, has governance, has change management, and manages the technology and engineering. You can't have an AI portfolio without an AI operating model, and you can't have an AI operating model without an AI portfolio. Now, when I think about building an AI portfolio, which is mostly what my office does, I think about this two-by-two all the time. There are things that are everyday AI, and then there is game-changing AI, like Josh was talking about. There are external-facing uses, things that deal with patients, and there are internal operational ones. When you're thinking about building an AI portfolio, the best strategy moving forward is to really focus on that lower-left quadrant, where the low-hanging fruit is. When you execute projects in that space, it allows you to understand what capabilities you need to build, and over time, as you build expertise, you can move to the upper-right quadrant, which is the really exciting, game-changing work. Now, in EP, there are certain core pain points that every clinician complains about, and I usually bucket them into time pressure (I write too many notes, too many pre-auths), signal complexity (the signals are too complicated, I can't go through so many of them), and data overload (I just have too much stuff). If you think about those three core things, the key point is that AI is probably going to save time before it saves lives, and if you take that as the core idea, don't try to save lives, and just ask when you can use AI to save time, that's probably the lowest-hanging fruit you can go after. Now let's look at some of the projects we've implemented. The best one I can talk about is ambient clinical note generation, which has been tremendous for our institution, because clinicians have been happier and patient-clinician interactions have been better. The way it works is that an automatic speech recognition system takes a raw audio file and turns it into a transcript, which then becomes an input that can generate other clinical documentation. This is an example of clinical documentation generated as a SOAP note: you get the conversation, that's the transcript, the output of the ASR; you extract the relevant information, cluster the similar items together, and then generate a one-line summary for the subjective, objective, assessment, and plan sections. Simple, low-hanging fruit that makes everyone's lives better and moves the mission of the institution forward. We're using it right now for clinical note generation, where we have a few projects, as well as device report creation and monitoring summaries for quality, and you can see how beneficial that could be to our group.
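The ambient-documentation pipeline described above can be sketched as a simple chain: audio, automatic speech recognition, transcript, and a drafted SOAP note for clinician review. The two helper functions below are hypothetical placeholders standing in for an ASR engine and an LLM summarizer; they are not a specific vendor's API, and the strings are invented examples.

# Pipeline-shaped sketch of ambient note generation (placeholders only).
from dataclasses import dataclass

@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str

def run_asr(audio_path: str) -> str:
    """Placeholder: an automatic speech recognition engine would run here."""
    return "Doctor: How are the palpitations? Patient: Better since the ablation..."

def draft_soap(transcript: str) -> SoapNote:
    """Placeholder: an LLM would extract and cluster relevant statements,
    then write a one-line summary per SOAP section for clinician review."""
    return SoapNote(
        subjective="Palpitations improved since ablation.",
        objective="(vitals and exam pulled from the visit and the EHR)",
        assessment="AF, post-ablation, clinically improved.",
        plan="Continue anticoagulation; repeat monitor in 3 months.",
    )

transcript = run_asr("visit_audio.wav")   # hypothetical file name
note = draft_soap(transcript)             # the draft is reviewed and signed by the clinician
print(note.assessment)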
Another pain point for our group is pre-authorizations for procedures and drugs, and this is another very low-hanging fruit for AI. We've deployed this across the university and have had a lot of success, because the AI systems are excellent at looking at the EHR data, populating the forms, and sending them to someone to review before submission. We've been able to reduce our denial rate, approval rates have gone up significantly, and the whole operation has improved significantly. Another example of where you can really advance the mission and value of your organization. Another example is patient portal messages. We have a patient portal message group that builds models for dealing with this problem, which, as all clinicians in this room can attest, is a major problem for us. At night you go in and there are all these messages; you don't know which is what, and you're always trying to send them to your nurse or to your anticoagulation clinic. This is an example of a natural language processing algorithm that assigns a simple label to each incoming message and allocates it to the right pool: is it urgent, is it for a clinician, is it a refill, is it scheduling? It then routes it to the appropriate person. Just this simple algorithm can reduce the time to respond significantly. We're also building LLM models to generate a draft response, which we think can advance this even further, so we're very excited about it. Now, the signal complexity stuff. EP was first to it, and I think Josh showed so many great examples of how AI can be useful, and the good news is that we're using it all the time; maybe you're just not aware of it. So here are some examples of AI that is in practice right now. Many years ago, we built an algorithm that could detect arrhythmias: you chunk up ECGs, put them through a DNN, and it outputs a rhythm. It can be really, really good for detection and labeling of arrhythmias, and this is in pretty much every single wearable device you're using that monitors heart rhythm. Another area where this is deployed and working very well is taking a complex signal like a PPG. We did a study that looked at 15-minute PPG intervals and then decided whether each interval was AF or not, and you can see that it has tremendous performance, with sensitivity and specificity over 96% and 98%. There are lots of issues at hand, but these kinds of algorithms are available in the wearable devices we're using right now. Another example is the commonly used algorithms that denoise ECGs: there are algorithms running on wearable ECG devices that take the noise out and make analyzable ECG segments much more available, and that's really helpful because there are more intervals you can analyze for detection of arrhythmias. And I think Josh alluded to this: in implantable devices, there are so many AI algorithms running in the background for detection of arrhythmias. This is one example that Josh showed as well, where you take a whole ECG, combine it with a P-wave sensing input, put it into a DNN, and decide whether it's AF or not AF, and it can significantly reduce the number of false positives you have. All right, the data overload problem. Now, that's a problem everyone talks about.
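The portal-message triage described above is essentially a text classifier that assigns each message a routing label. The sketch below uses TF-IDF features and logistic regression as a minimal stand-in; the example messages, labels, and pool names are invented, and the institution's actual model is not shown here.

# Minimal message-triage sketch: route each portal message to a work pool.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "I am having chest pain and my ICD shocked me twice",
    "Can you refill my metoprolol prescription",
    "I need to reschedule my device check next week",
    "Question for the doctor about my ablation recovery",
]
labels = ["urgent", "refill", "scheduling", "clinician"]

triage = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
triage.fit(messages, labels)          # in practice: thousands of labeled messages

new_message = ["please send a new prescription for eliquis to my pharmacy"]
print(triage.predict(new_message))    # predicted routing label for the new message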
We have so many ECGs, we have EGMs, we have wearables, and to date we haven't really been able to deal with this problem very well. The first part of the problem is that data in healthcare is very complex. One patient generates multiple forms of data: there's text data, there are CPT codes, there are labs, there are images, and they live in different systems that don't talk to each other. There is waveform data. And at every time point, different kinds of data are available. So over the past few years, we have spent a tremendous amount of time, money, and effort trying to organize the data in a way that is AI ready. The data can't be organized the way we had previously organized it; it has to be organized in a way that AI, and especially large language models, can interact with optimally. Then there is another set of data that we don't often talk about: specialized data that's not in the EHR but is very specific to EP. Things like guidelines that are specific to your institution, specific trial data that perhaps only you have, device references and manuals, and EGM pattern libraries that we carry in our heads but don't have in a cohesive form. And there's the knowledge base, like what Dr. Ghanbari prefers for anticoagulation, or for stopping it before colonoscopy, things like that. There are so many documents floating around. You have to have a way of combining all of this data, because LLMs, as most of you know, are trained and built on static knowledge. Their knowledge is a static snapshot of the data they've been exposed to, so they don't have up-to-date information, and they don't have your proprietary information. So you have to have a way of combining the two. This is the architecture that we've been using, and it really allows you to combine the superpower of the LLM, which is its generative power, with real-time retrieval of relevant information from an external source. In a RAG model, you have a document ingestion module, so you have your proprietary documents, and then a retriever module that takes the question you have and asks the documents for the answer. Once the relevant material is retrieved, it interacts with the generator module, which comes up with an answer that is much more up to date, is integrated with private and sensitive data, and is traceable and transparent. We're very excited about this architecture, and we're deploying it across multiple pain points. One of the pain points where this has been very effective for us is clinical trial matching. Right now, I have a clinical coordinator who goes through the EHR every morning, and there are some modules within Epic that do this, to try to find matches for our clinical trials. With this system, you can use a RAG that goes back to the clinical trial protocol and then uses an LLM to digest the EHR and find matches for your trials. And if you compare it with study staff, it's just as good. So that's a low-hanging fruit that could be deployed. But this could be deployed all across EP, and we have many, many different projects running. You could develop AI-driven alerts that use your proprietary data. You could build predictive analytics for arrhythmia recurrence using a RAG model. EGM interpretation is really interesting, because you have complex EGM libraries that the RAG system can interact with and give you answers.
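The retrieval-augmented generation architecture described above (document ingestion, retriever, generator) can be sketched very compactly. The example below uses TF-IDF retrieval and a placeholder generate() function as illustrative stand-ins for an embedding index and an LLM; the documents, policy text, and query are invented for the example and are not any institution's actual content.

# Minimal RAG sketch: ingest local documents, retrieve the best match, generate with context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Document ingestion module: institution-specific knowledge the base LLM lacks.
documents = [
    "Institutional policy: hold apixaban 48 hours before colonoscopy in standard-risk patients.",
    "Device manual excerpt: atrial sensitivity programming ranges and defaults.",
    "Trial protocol: inclusion criteria for the persistent-AF ablation study.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 1) -> list:
    """Retriever module: rank documents by similarity to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def generate(query: str, context: list) -> str:
    """Generator module placeholder: an LLM would answer using only the
    retrieved context, which keeps the answer traceable to a source."""
    return f"QUESTION: {query}\nCONTEXT USED: {context[0]}"

print(generate("How long do we hold anticoagulation before colonoscopy?",
               retrieve("anticoagulation before colonoscopy")))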
And this one is probably the lowest-hanging fruit, device optimization. Patient comes in with a specific device, programming or an event, and you want to be able to make recommendations of what to do with that. So these systems are very good for dealing with these kind of problems. Now, obviously, there's lots of things to think about and worry about. You have to worry about clinical validation. You have to do prospective clinical trials. You have to make sure that it's explainable and you could trust it. You have to integrate it into the workflow and you have to make sure that you address the biases that are inherent to some of these problems. But I want to tell you that there's so much low-hanging fruit that AI can and should be integrated already in so many of the processes that we deal with. So thank you. Hi, thank you. I'm Alex Kushner, an EP at NYU. Is there any precedent on the legal implications of two scenarios? One, I, as a physician, rely on a language model to prescribe an anticoagulant. There's obviously no backup clause that you can put in a language model that says, use clinical decision-making, because the whole idea of the language model is that its convolutional neural network is smarter than me, able to see things that I can't see. So if I rely on that model and I prescribe, or don't prescribe, the patient has an outcome that's adverse, and who's responsible? Is it me as a physician for using the language model, or the creator of the language model? Question one. Question two, if you use a language model, like Josh had published a paper using neural networks to predict which patients with PVCs are going to develop a myopathy. As a physician, I decide not to use that. My patient goes ahead, has a PVC-induced myopathy, ends up in a hospital, ends up upset, and comes after me. Can they claim that there are models out there that I should be using? So two separate extremes of the same problem. One, using AI and being responsible for it, and then the second is ignoring AI and being responsible for it. But in all seriousness, you know, right now we were talking about, you know, If something bad happens while we're something that is now reflected in guidelines. Going to have events are not good. They've been grandfathered in to clinical care. And it's not our fault. This is the best that we had at the time. And in addition to that, these tools are providing patient-level. Right now, physicians are likely going to be held accountable, but once we establish That's when you're most liable, at least that's the most recent data. I don't know of any legal precedent cases, but I'm almost like... Not in use, not in use. And there's FTA, yeah. then the liability and the You're just saying here's what you should do. Then you turn it into an FDA pathway that is technology as a device essentially and that pathway is a lot more rigorous. Now I don't know exactly how rigorous we're going to get. The regulation should be not stifling but also not callous but as we're discussing that. Yeah, quick question about, I think a lot of us, we see a lot of inspiring examples of AI. How do we, how do you go back to your day-to-day and convince your team, the administrators, the people who have to pay for it, like, what do you recommend as the next step? Because I hear a lot of people want to use the solutions, but they're like, I can make a full-time job just convincing everybody to also implement it. So just curious about your learnings and recommendations of other institutions who might not be as far. 
That would be great. And, I go back to it again, you have to think about what the organization's mission is. For example, for us it's clear: we have to advance health in Michigan and the world. So any project that I do has to serve that. And then we have strategic priorities, for example in research, in patient satisfaction and quality, and in brand. The projects I just told you about serve those strategic priorities. They help our patients become happier, they improve our clinicians' lives, they improve our access, and they are revenue positive. The revenue-positive part is really important because it makes the work sustainable long term. It's great to have these great projects, but a lot of them create value without capturing it, and you have to capture some part of the value you're creating to make it sustainable. That's how you move things, I think. That's the big part, the big challenge. And you have to think through these things. There are a lot of people involved in implementing these projects. You have to have the right folks in engineering and data, you have to have leadership buy-in, and you have to have mechanisms for people on the ground to come up with projects. So, for example, we have grants of up to $100,000 for people who have projects; we mentor them, and then we set criteria for success. You have to improve these metrics that are important to our institution, and if they hit them, then we present it to our clinical operations folks, who then decide whether or not to adopt them. So there's a lot involved. Every institution will have a different approach to it, but I think you have to think about this. Otherwise, you just follow the shiny object, right? Hi, this question is for Dr. Dande. In your talk, you spoke quite a bit about trustworthiness in AI. Any comments about AI hallucinations? That's a real thing, and we all observe it if we're using any chatbots, even for day-to-day stuff. So do you think that's going to be a problem, is anything proactively being done, or is it going to be part of the reason you said humans plus AI rather than either one alone? Guardrails with specific vetted resources, right, to not go outside of those bounds. Now, will that reduce risk to zero? No, but whenever I do. I think there's a lot more low-hanging fruit to be harvested, as opposed to asking, will this cause harm to our patients? At least that's the way I think about it. For the impact that that technology might have. Hi, I have two questions. One is more technical; the other is more about the ethos of moral responsibilities. As we discussed, physicians have the moral and fiduciary responsibility for all actions taken by AI at this point. So where do we go with disclosure? When you're sending these messages, which are probably drafted by AI, are you saying that part of this message may have been drafted by AI, and how do patients react to that? I think there's a paper, I think in JAMA, which was published recently about that as well. And the second one is more technical: how do we address model drift? Because I think we are probably in the early stage. I think Dr. Lampert talked about that. There was this paper by Dr. Faith and Annals which said that if we are trying to implement this, then there will come a time where your patients have already been modified by the action that you're trying to have here.
So you're gonna run out of ground truths essentially. So it's probably very late, not right now, but how do we address that? So, two questions. So the, it's a great question to touch on your last point that the paper that you're mentioning is referred to as model-lead model. And essentially what it refers to is it's a simulation study when you take two models and you deploy one model and continue to retrain it after you start acting on it. Or if you concurrently deploy two models or sequentially deploy the models, the model performance gets worse. And this kind of goes back to the point as I was mentioning where we're changing the substrate in the patient population from which the models were developed. So this is going to be an area of investigation that we're going to have to see. It may not happen all the time. There are some very discrete tests that probably won't be impacted as much, but there are going to be other things like predicting 10-year atherosclerotic risk for major events that will likely be changed once we start throwing all these medications and now everyone's on a GLP-1. So we're gonna have to see how that impacts some of these things. So that's certainly something that we're going to have to just spend a little bit of time investigating. And quite frankly, we're still early days in being able to understand this. Isn't it like, Josh, if a model is working, it should drift, right? Right, the drift is expected because it's doing the job, so it's changing. And so I don't know if the drift is as much of a problem as we are worried about it. It's more about like monitoring the model performance over time. Isn't that what we're talking about here? I think you make a good point, but I think, at least my concern is, we are talking about. Hey, my name is Huy Nguyen from Geisinger Health Systems. In regards to your comment about objectives for your organizations, I would say the majority of organizations is increase profits and make more money and decrease your expenditures. So my question is on an economic basis. We are currently using ambient dictation. Amazing. There's probably not much of a need for paying scribes at $20 an hour. So as AI gets better and with these technologies, how is this going to affect physician economic value? And I'm not saying us losing our job, but I'm saying the perception of our value gets less because now we can do more in a shortened period of time. Or maybe there's one less EP that we need to hire. Or maybe because we know where the PPC is coming with PFA, it takes us a much shorter period of time. So as we get better in, say, 5, 10 years with the technologies that we're talking about, how do you guys see the perception of an EP's economic value going forward? You ask very easy questions. I think if you think of an economic value as a unit output per unit of time, because I think that's what you're referring to, then you, as models take away probably things that have lower value, you'll probably start doing more things that have more output per unit of time. So in a way, I think there's a potential for the unit economics to get better for clinicians. I don't think that will be true for all clinicians. I think for procedure-based clinicians, have a unique advantage in that field. I don't think that same will be true for primary care. Now, eventually, probably some task within EP also will change. But in the foreseeable future, I see that unit economic value for EP would increase with time. I don't know if you guys have a different. 
I agree with everything you said. If I had a dollar for every time I read, AI is not going to replace a physician, but a physician who uses AI will replace a physician who doesn't. And I believe in that. I think using these technologies to our advantage will increase our value, but that's only if we engage. And if there's one hill I'm willing to die on, it's that physicians really ought to engage. This is no longer an era where we simply find this technology interesting or intriguing. We want to change the future of AI and medicine; therefore, we're involved. It's a categorical imperative. We need to be involved in the development, pre-deployment, deployment, and validation testing of these models if we want them to work for us. And there are going to be more things that you can't even think about yet. Right now, we're seeing an explosion in candidate drugs from LLMs and the drug discovery platforms; there are so many candidate drugs that weren't there before, so the need for clinical trials, for example, is going to explode in the next few years. LLMs have also been very helpful and useful for generating new medical devices, and the need for clinical trials for those will also explode. There may be new clinical trials and drugs that we're not thinking about yet. So I think things are so dynamic right now that it's hard to project that far out. But I'm positive, mostly. Maybe that's just me. Also, particularly for a procedural specialty, I think it's going to be a long time, even if the technology were available, before patients would accept having procedures performed completely autonomously without a human in the loop. I think we're a very, very long way, if ever, from really achieving that. And that information is not explicit; no one's writing it down somewhere for you so that a machine can ingest it. So it's really difficult to train models for those kinds of specialties, I think. And frankly, even for non-procedural specialties, at least from the preliminary data from some of my work at UPMC, even when it comes to education and basic information about AFib, for example, that you're delivering to patients, they still want to see their physician, their provider, their APP. And a primary care physician, at this point, gets 6,000 inbox messages with certain health systems and the inbox methods those systems deploy. Expecting a human being to look through 6,000 messages a day is extremely brutal, and so AI could triage them and clear them out. What can you learn in terms of mechanisms from that? So, for example, it's valuable that it's saying this patient is more likely to develop a cardiomyopathy, but can you also look under the hood and see what it's seeing that suggests the patient will develop a cardiomyopathy? Yes, so actually, great question. I actually cut that slide out because I thought I had two fewer minutes than I actually had. But in one part of that paper, we used something called gradient-weighted class activation mapping, or Grad-CAM, which essentially creates a heat map of feature importance that helps you identify what the model thinks is important in making its classification. Now, in the paper, we showed multiple examples of what was important, and it looks like all the QRS complexes show up in red. And that isn't particularly helpful; of course, that makes sense. But what we did note is, one, the PVCs were not highlighted. And that's important because, if the PVCs aren't highlighted, it was the sinus rhythm QRS complexes, not the PVC, that carried the signal.
So that means there's some signal: either we're identifying vulnerable myocardium for which the PVC is a stressor, after which the myocardium decompensates, or there's an impending cardiomyopathy for another reason, of which the PVC is a sign. Now, as part of our study, we looked at patients who happened to have undergone PVC ablation, and the majority of them improved their ejection fraction post-ablation. And of those patients, almost all completely normalized their ejection fraction. So we clearly did identify true PVC-induced cardiomyopathy. One of the examples we highlighted in the paper is a patient with a left bundle branch block in whom only the latter portion of the QRS complex was highlighted in red. So this makes physiologic sense for things like... sorry, I'm mixing up two distinct points. For the PVC work, the Grad-CAM showed that the sinus rhythm QRS complex was important, so that raises the chicken-or-the-egg consideration: vulnerable myocardium or impending cardiomyopathy. But we have other work where we can also identify mechanisms, such as the hypertrophic cardiomyopathy paper, in which we have a patient with a left bundle branch block where only the latter portion of the QRS complex is highlighted in red. And the reason that makes obvious physiologic sense is that in a disease like hypertrophic cardiomyopathy, the left ventricle is preferentially affected. With a left bundle branch block, you of course conduct down the His-Purkinje system, but the left bundle is blocked, so you conduct down the right bundle and then transseptally activate the left ventricle later. So that latter portion of the QRS is the left ventricular depolarization. So we see physiologic signals that make sense. For PVCs, I don't think we're ever going to get a discrete, singular mechanism for why a given PVC causes a given patient to develop a cardiomyopathy. But we do know that there's some physiologic signal, and we know that it can also be dynamic. There was prior work showing electromechanical abnormalities in the sinus beats preceding the PVC. So there has already been additional, non machine learning based work that is consistent with this: there's something inherent about the myocardium, whether, again, it's vulnerable or there is an impending cardiomyopathy, that is responsible. Thanks, everybody
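For readers unfamiliar with Grad-CAM as discussed in the exchange above, the following is a minimal, illustrative sketch on a toy one-dimensional ECG network: capture the last convolutional feature maps and their gradients with respect to the class score, weight each map by its average gradient, and sum into a per-timepoint importance trace. The network, input shapes, and signal are synthetic assumptions, not the model from the paper.

# Grad-CAM sketch for a toy 1D ECG CNN (illustrative only).
import torch
import torch.nn as nn

conv = nn.Sequential(
    nn.Conv1d(12, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),   # "last conv layer"
)
head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 2))

ecg = torch.randn(1, 12, 5000)             # one synthetic 12-lead, 10 s at 500 Hz

feature_maps = conv(ecg)                   # (1, 32, 5000), kept in the autograd graph
feature_maps.retain_grad()                 # we need d(score)/d(feature_maps)
logits = head(feature_maps)
class_score = logits[0, 1]                 # score for the class of interest
class_score.backward()

weights = feature_maps.grad.mean(dim=2, keepdim=True)           # (1, 32, 1) channel weights
cam = torch.relu((weights * feature_maps).sum(dim=1)).detach()  # (1, 5000) importance over time
cam = cam / (cam.max() + 1e-8)             # normalize to [0, 1] for display as a heat map
print(cam.shape, cam.max())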
Video Summary
The video transcript revolves around a discussion on the applications of artificial intelligence (AI) in electrophysiology (EP), specifically focusing on arrhythmias. The speakers explain how AI, particularly convolutional neural networks, can aid in reducing false positive alerts in cardiac monitoring, improve disease prediction, and assist with tailored ablation strategies for conditions like atrial fibrillation. AI is also being leveraged to predict patient cardiomyopathy development from PVCs using advanced algorithms. The transcript discusses challenges related to AI’s trustworthy integration into EP, focusing on interpretability, bias, and the need for trust. The conversation underscores the importance of an AI strategy aligned with institutional goals and emphasizes building an AI portfolio to ensure responsible deployment. Several practical applications of AI in EP, such as ambient clinical note generation and pre-authorization process improvements, are highlighted. Complexities like data overload, the necessity of proper data management, and model drift are brought up. Legal implications regarding AI's role in decision-making and healthcare practice are also debated, suggesting the need for clear guidelines. Speakers call for physician engagement in AI development to ensure the technology augments medical practices beneficially. Despite potential challenges, there is optimism about AI’s capability to enhance clinical efficiency and patient care while expressing caution over the ethical and legal aspects of AI deployment in EP.
Keywords
artificial intelligence
electrophysiology
arrhythmias
convolutional neural networks
cardiac monitoring
atrial fibrillation
cardiomyopathy
interpretability
AI strategy
healthcare ethics