Heart Rhythm O2
Presents: Innovations in Digital Health
Video Transcription
Thank you all for coming to this session. This session is Heart Rhythm O2 Presents: Innovations in Digital Health. Each year the Heart Rhythm O2 journal presents a best-of session. This year we decided to focus on digital health, and these were the papers that were voted on by the associate and section editors as being notable. So we're really happy to have these speakers with us today. Along with me, next to me, is Suneet Mittal from Valley Health Services, is it called Services or Valley Health what? System. System, Valley Health System in New Jersey, and Hamid Ghanbari from the University of Michigan, who are all part of our journal. Hamid and Suneet are specifically individuals focused on digital health and digital technology and innovation, so we're really excited to be able to have a great conversation with our speakers today. Okay, so our first speaker is Bert Vandenberk from Leuven, Belgium, how was that? And the title of the paper is Concerns on Digital Health from a Cardiac Implantable Electronic Device Remote Monitoring Clinic Perspective: Results from an International Survey. Thank you very much. It's a pleasure being here; thank you for the invitation. So to start with, I think everybody who runs around here in this area and at this conference knows that digital health applications are everywhere nowadays, and they include a lot of things: medication adherence, diagnostics, but we also use them for clinical research and for lifestyle modification. And the digital health market revenue is increasing, and it's projected to grow much more in the next decade. And the remote monitoring that we do in a device clinic is part of mobile health and digital health. And kind of intuitively, new mobile health applications automatically flow towards the remote monitoring device clinics. But there are no standardized workflows or reimbursement models for the work that is done with regard to those new applications. And so the aim was to assess the additional workload associated with digital health, how it is perceived by the ones operating in these device clinics, how they feel about it, and what their experience is. And so at the end of 2022, we performed an international survey with the help of the Heart Rhythm Society Digital Health Committee. The primary focus was on remote monitoring and the organization of the device clinic. And the target population included all device clinic staff, so both physicians and allied professionals. The survey included 40 questions: first, a lot on remote monitoring, the device clinic organization, local reimbursement, and local funding, and then a few questions on the qualitative aspects of digital health in the device clinic, which is what we will be talking about today. So with regard to responses, we had a total of 548 responses, of which 204 were incomplete. So when looking at those 344 remaining, we had three from third-party services and a few from industry; those were excluded as well. And then 63 didn't actually respond to any of the digital health questions, so those were excluded from this part of the survey analysis. Which results in 276 respondents from 249 unique centers. And when we look at the respondent characteristics, first of all, and I think that's very important for the message, there was a good distribution between physicians and allied professionals.
So 56% of respondents were physicians, both adult and pediatric electrophysiologists, and the remaining 44% were allied professionals with different backgrounds. The majority of respondents were North American, about 60%. 63% of the device clinics were located in a hospital. As we know, some clinics in the world still don't have funding for the device clinic, for the remote monitoring; in Belgium, for example, we do not have any funding for remote monitoring. 17% actually use a third-party remote monitoring service. And overall, 80% of patients with a device were on remote monitoring. So the first question that respondents actually received was a ranking question. There were nine predefined responses, and respondents were asked to rank these from the biggest concern to the least concern that they have about digital health in their clinical practice. And this violin plot shows the responses. We ordered them from most to least important, based on the median rank and the frequency that a certain item was ranked first or second most important. The green bar is the median; the red bars are quartile one and quartile three. And so the biggest concern that people in the device clinic actually have is the data deluge, followed by the lack of billing workflows and the analysis of wearable tracings. Then there is the organizational part: uploading to the electronic medical record, the frequency of patient data transmissions, and then the difficulty of working in different platforms with the lack of centralized data platforms. Interestingly, the fact that those digital health applications can give incorrect diagnoses, or that we have to train and educate our patients, was actually ranked least important, which is, I think, a very interesting finding. When we asked them about the clinical impact of digital health, we summarized it in this figure, and I'm gonna talk you through it. So overall, 50.2% of respondents said that digital health applications improve patient care, which is a good thing. But it's only 50%, only half. When we look at the workload, 73% actually say that there is an increased workload with those digital health applications. And then there are two quadrants that are very interesting, where we see that patient care has improved but the workload remains equal or has decreased; that's only 5%. So most of the patient care improvement is actually also associated with an increased workload. However, 22%, one in five respondents, actually stated that patient care remained equal but the workload has increased with the digital health applications. So when we look at this, I think we can say that the device clinic in its entire system perceives major concerns with the increasing use of digital health applications, and we can identify a few structural and organizational barriers that need to be addressed. And when we go over them: first of all, there is a lack of reimbursement. And we know that digital health is prone to digital health inequity because of socioeconomic disparities; not everybody can afford these digital health applications. We need to establish the clinical benefit of a digital health application, and we have to, as a society, look at which applications merit reimbursement, for example. We shouldn't do that by brand, because that's gonna be a battle for the companies.
But I think we have to do it by field: where can we gain the most clinical impact with our digital health applications? And then, if we have reimbursement, we can acquire adequate staffing. The second major thing is the data workflow and interoperability. We need to be able to integrate this data into electronic medical records, and for that, it would be useful to have universal data platforms where different companies can deliver their data to us in a standardized way for easy integration into our medical records. And the third thing is digital health literacy, which is training of medical and allied health professionals, which should start earlier on in medical training and nursing training, and then there's the education of the general public. And so, to conclude, I think we can all say that the potential of digital health seems limitless, but the current workload versus clinical benefit balance is rather poor. And successful integration of digital health requires some improvements with regard to reimbursement and staffing. And then, we also should collaborate with various medical disciplines, because it's not only we cardiologists who feel that we can improve our patients' care; a respirologist might have the same feeling, and they also use digital health applications. And so, everyone can benefit from a good structure with standardized workflows and data platforms, and then digital health literacy. Thank you. So, we are going to have questions now that can be submitted, or you can pop up to the microphone. So, you can just stay there and field questions. I just wanna point out, we have a couple of our other associate editors in the room: Nazem Akoum in the back, who's also committed to digital health, and Mary Gleba sitting close by. So, we have a lot of support here today from the journal. And for the rest of you, thank you again so much for being here. But regarding your wonderful presentation, I just get so overwhelmed every time I think about remote clinic and remote interrogations. We all know that there's great data showing the benefits to patients and early diagnosis of ICD and pacemaker problems. But on the end of the user, us, it's really a burden when we think about, every single week, the tasks that we have, even though it's been screened by our nurses or remote systems of some sort, you know, the workload involved. And so, I think that this is one of the most important aspects affecting a lot of EP clinicians, including our allied professionals, who often pick up the bulk of the work. But I'm just wondering what some of my colleagues here sitting with me might have to say, or those of you in the audience. I completely agree, and I wanna congratulate you. I love seeing these qualitative studies, because they really get to the heart of the question a lot of times and inform a lot of questions and answers for us when we try to implement these in clinical practice. My question for you was, can you speak a little bit about how you developed those questions? I mean, we all have preconceived notions of what the problems are, but I wanted to know how you developed those question domains, and whether you did any qualitative analysis, qualitative interviews, to explore whether there are things that we're not thinking about. So, yeah, excellent question, and I must admit that the questions are based on an iterative process between the people who initiated the survey.
And so, we didn't really do qualitative interviews upfront, for example. And for the ranking question, you could also identify a certain bias in there, because most of the items could be interpreted negatively, which may result in a certain pre-selection for a question, of course. So, it is true that for that part of the survey, we could maybe have made a better effort. But in the end, we cannot forget that the practical issue was also that the first part of the survey was really focused on how the remote monitoring clinics operate, and we wanted to touch on digital health from the angle of how they do it at this moment. And so, that was not the primary objective of the survey either. But I'm happy that we asked those questions, because it was very, very relevant actually, yeah. Go ahead at the microphone; please identify yourself and where you're from. Go ahead, go ahead. Well, thank you, excellent presentation. My name is Alexis. I'm from the Center for Heart Rhythm Disorders in LA. One of the things I was surprised about, and maybe also a little bit concerned about, is that the accuracy of the data that we're getting was not really a concern for the respondents. While I think this is a very important topic, we are relying on these tools more and more often, so I think we should receive accurate data. That's one point. And the second point is that if you have inaccurate data, this will raise more alerts, which will also add to the data deluge. There's definitely a link there. So, have you any idea why the concern about data accuracy was so low? Well, I think we shouldn't over-interpret, because it's a ranking question. So, it doesn't mean that it's not a concern at all, but it was the least concern among all the points that we identified or gave to them. And I think that, for example, we also didn't include any questions about certain specific mobile apps, right? Because if you look at the EKGs from the Apple Watch, they're quite good quality, and we make diagnoses on them. So, are we then concerned about the quality and the accuracy, or is it with the new apps, depending on what they measure, that we are concerned? And so, I think the accuracy also depends on the signals that we receive the most, and at least in our clinic, those are still EKG and PPG signals. And yeah, a lot of patients now come in with a bunch of their recordings, but when you ask them whether they measured their blood pressure, they say no, right? And so, that is one of the bigger issues I feel is coming up: patients are focusing on their mobile apps, but they're forgetting what other aspects are important. Yeah. Yeah, I actually share your concern as well, because to me, my personal sense is that what people are interpreting that question to mean is: you know what, I can get an ECG signal, and when I look at that ECG signal, it looks like an ECG, and from that, I can determine what the rhythm is. But it doesn't get to the fact that 30% are read as unclassified by the system itself, or are misclassified, and we've somehow accepted that work. I was actually disappointed to see acceptance of that, because I think that's what really does need to improve, so that we actually have really good accuracy in what the algorithms are reporting, and the work on our end is actually reduced in some way. So, I actually share your concerns.
But I think if we're gonna reimburse certain applications, that's a certain threshold we can put in there: they need a certain level of accuracy, and we need some harder data about the quality they're giving us, because there are a lot of applications that give us rubbish that we are not able to work with. That's the simple truth, of course. And then if we're gonna select certain areas where we can improve, we can put in some quality criteria. But I definitely agree with that, yeah. I actually want to follow up on that. So, when you talk about digital health, are you talking about all the apps and all the tools that are out there, or only the ones that are actually approved by, for example, the FDA? The survey was overall, right? So, we didn't specify anything about a certain app or a certain area. We didn't even specify cardiovascular disease in the survey. It was more like a general interpretation of digital health. But we know, of course, that when the remote monitoring clinics receive those signals, that's usually in our own area of expertise, yeah. This really also gets to the matter of trust, right? So, at what point is there gonna be a high enough level of trust that we would be able to move towards truly alert-based remote monitoring? It seems like we have a bit of work to do to get to that point. Any other questions from anyone? Were there any on the pad there? No, none? Okay, great. Thank you for that really wonderful presentation. I really appreciate it. Thank you. Okay, the next one is Jake Bergquist, and Jake is from the University of Utah. The title of his paper is Performance of Off-the-Shelf Machine Learning Architectures and Biases in Low Left Ventricular Ejection Fraction Detection. Thank you, Jake. Yeah, there it goes. Okay, thanks for coming, everybody. So, like I said before, for the disclosures, I needed to update the ones on HRS; I have no relevant disclosures. So, thank you again for inviting me to talk about this paper that we put together on the performance of off-the-shelf machine learning architectures. So, just as a start, these are our major conclusions from the paper. I'm not gonna read them off immediately; I'm gonna focus on the first two, because I think they're the ones I can make an illustrative case for, to show you why this is an interesting study. So, here's a 12-lead ECG. Well, here are some of the leads from the 12-lead ECG, particularly the first two limb leads and then the precordial leads. These are actually the leads that are typically recorded in any of your electronic health systems, because the other leads can be mathematically derived from these eight. And so, what I'm gonna do here is put up two sets of leads, and all the clinicians in the room have immediately already started to make diagnoses and assessments of these. I'm gonna ask you: can you tell which of these two sets of leads, because they came from two different patients, was from a patient with an ejection fraction over 40, and which from a patient with an ejection fraction under 40? Okay, everybody look at it for a couple of minutes. Have your guess. Okay, here we go. Here's the result, right? So, the one on, let's see, your right is the one from the patient with an ejection fraction under 40, and the one on the left was over 40.
Now, some of you may have guessed correctly, some of you may have guessed incorrectly, but this is kind of the classic example that emerged a little while ago now in the field, of a case where we could take these ECGs, feed them to a machine learning algorithm, and it can actually successfully make this diagnosis. And that was a little bit surprising, because it's not always clear in traditional ECG analysis what a patient's mechanical function is from this purely electrical signal. Now, in the field of machine learning, there are a lot of different architectures, a lot of different structures, and a lot of these were developed, before they were applied to ECGs, for things like image classification, right? So, you'd have two pictures here, a cat and a dog. You'd have the question: if I'm looking at any of these particular pictures, can you classify it as a cat or as a dog? And you'd plug it into one of these variously developed machine learning algorithms. And our question was: you know, we could go and develop our own machine learning algorithms, our own specific architectures for dealing with ECG signals, but there's already a lot of work and development done in this image processing space, so can we just jam ECG signals in there and see what happens? And in doing so, we need to consider what we're actually doing, what data we're actually feeding in. So, when I say an ECG, I don't actually mean a paper tracing, right? There have been several talks throughout this conference that have shown people developing machine learning algorithms for dealing with images of ECGs, but for this application, I mean we're going to take the ECG signal, and we'll zoom in on one of them, and we're actually going to extract the raw signal tracing out of the electronic data record, and we want to represent that in a way that's amenable to these image-based models, because then we can use all of these various off-the-shelf pre-made architectures. So, what do we do in that case? Well, if I was to look at this one individual lead here, we can essentially encode the amplitude as something like color, right? So, high amplitude is one color, and low amplitude is another color. By then doing this across the different leads that we have recorded, we can essentially construct an ECG image. This image can now be fed into these standard image-based machine learning networks, and we can ask our questions. So our workflow looks something like this: we have the 12-lead ECG signals, which we pare down to the eight mathematically unique ones recorded from our electronic data repository. We can then take those ECGs and convert them into an image, plug them into the machine learning model, and in our case try to detect left ventricular ejection fraction. So, for the data set that we worked with, we had a pair of 12-lead ECGs and ejection fraction readings from echocardiograms for 24,000 patients from our University of Utah health system. And with that data we split it into a 90% training and a 10% testing set. And the idea was to use a number of different pre-made image-based machine learning architectures. So we went into the field of image-based classification and we grabbed the most popular architectures that were open source and could readily be downloaded and immediately run. There was a little bit of pre-processing that we had to do, since those networks expect to get an image, which is essentially a red-green-blue, width-by-height array, and our ECG is not that.
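To make the encode-and-classify idea concrete, here is a minimal sketch of the kind of transformation the speaker describes. This is not the authors' code: the shapes, per-lead normalization, sampling rate, and the choice of PyTorch/torchvision are assumptions for illustration only.

```python
# Hedged sketch: turn an 8-lead raw ECG into a 3-channel pseudo-image and
# attach a binary head to a stock torchvision ResNet-18. Not the study's code.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def ecg_to_image(ecg: torch.Tensor) -> torch.Tensor:
    """Map an (8, n_samples) raw ECG to a (3, 8, n_samples) pseudo-image.

    Amplitude plays the role of pixel intensity: each lead becomes one row,
    normalized per lead, then replicated across the three color channels
    that image networks expect.
    """
    x = (ecg - ecg.mean(dim=1, keepdim=True)) / (ecg.std(dim=1, keepdim=True) + 1e-8)
    return x.unsqueeze(0).repeat(3, 1, 1)

model = resnet18(weights=None)                 # off-the-shelf architecture, untrained
model.fc = nn.Linear(model.fc.in_features, 1)  # swap the 1000-class head for 1 logit

ecg = torch.randn(8, 5000)                     # placeholder: 10 s trace at 500 Hz
logit = model(ecg_to_image(ecg).unsqueeze(0))  # add a batch dimension
p_low_ef = torch.sigmoid(logit)                # P(EF < 40%) for binary classification
```

The final-layer swap here is one plausible version of the "little bit of post-processing" the speaker mentions next, collapsing the network's image-classification head down to a single binary output.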
And they output a set of features, so we had to do a little bit of pre-processing to get the ECG into the right shape, and a little bit of post-processing to get it down to a single output on which we could do binary classification. And then we went ahead and ran that, and we got to our first conclusion, which is that these off-the-shelf machine learning architectures that were designed for image analysis could be readily applied to ECG analysis with favorable results. So across the bottom here you can see the different machine learning network architectures that we tried: ResNet 18 and ResNet 50, AlexNet, DenseNet, SqueezeNet, VGG; these are all the creative names that the computer scientists who designed them came up with. And for each of these I'm going to show you the box plot of area-under-the-curve performance. So if we look at the output, which is zoomed in to the 0.85 to 0.95 range, these networks all perform pretty similarly to each other. And in each case I'm going to show you the results of doing five replicates, where we retrained the network five separate times to account for variability. So ResNet 18 gave us an area under the curve between 0.91 and 0.92, and the thing to note here is that at the time when we were publishing this, the most state-of-the-art network architectures in machine learning for this same task, coming out of the Mayo group, were also getting around this 0.91, 0.92 range. So already, using this off-the-shelf network, we're meeting that expectation. And I've arranged these in a specific pattern, so performance decreases as we go across, with ResNet 18 showing the highest performance and VGG showing the lowest. But already these are within a pretty tight band of each other. Now, these networks under the hood look very different from each other. Obviously the two residual networks, the ResNets, are somewhat similar, but all of these other off-the-shelf networks are approaching this image classification problem very differently, and yet still getting pretty robust results in this ECG classification task. Now, area under the curve is just one of many metrics, so we of course looked at several others, including F1 score, sensitivity, and specificity. Across all of those, the ResNet 18 was the highest performer, with a specificity, so much more specific than sensitive, of 0.95, a sensitivity of 0.68 or 0.63, and an F1 score of 0.58. But that was just the primary result. Once we had done this and shown that these machine learning networks could replicate the performance of more specific and tailored ECG-specific networks, we then wanted to ask what biases might be present in these machine learning networks. So if I was to look at the output possibilities, right, we could have a positive case or a negative case, because it's binary classification, and we could have a positive guess or a negative guess. And so within that we have true positive, false positive, false negative, and true negative. If we split that up into the correct-guess group and the incorrect-guess group, we can then think about the next analysis, where we want to look at a suite of different patient characteristics and understand if the architecture performed better in the correct group or the incorrect group. So what I'm going to show you is a table of p-values, the statistical analysis.
And if the p-value is significant, so less than 0.05, I will color it either red, which means that the value of the patient characteristic was higher in the incorrect group, or blue, where the value was higher in the correct group. So this is a subset of the many different patient characteristics and comorbidities that we chose to analyze, and I just picked out some of the ones that I thought were interesting. So for example, across all of these different networks that we tried, age was always higher in the groups that were incorrectly classified by our network, right? So that means a failure mode of our network was that it tended to do worse on older patients. On the other hand, when the patients were female, it tended to do better, across all of the networks homogeneously. Now it starts to get a little bit less homogeneous when we start looking at things like comorbidities, hypertension, and diabetes. There were some networks where there wasn't a significant difference in performance between patients with or without these comorbidities, but in a lot of them there was. If we then zoom out to a few more diseases, chronic kidney disease, liver disease, COPD, we're seeing a lot of red: the networks either show no significant difference or significantly worse performance when patients have the comorbidity. And so if we then just flash out to the rest of the chart, what I want you to take away from this is that patients who have these comorbidities tend to get poorer performance from our machine learning prediction of low ejection fraction, across all of the networks for the most part. Interestingly, body mass index was not associated with any bias across any of the networks. And so with that, hopefully I've encouraged you: when you're doing these machine learning networks, this kind of analysis, looking beyond just the initial metrics to see where the failure modes are and where the biases may lie in the performance of these networks, is, I think, an important next step for the field. And with that, I'll thank all the co-authors and people who have supported this work, and I'd be happy to take any questions. That was really great. Thank you. So again, I encourage people to pop up to the microphones. I see Dr. Akoum is going to come on up and has a question for you. This was great. Thank you for dumbing it down to a level where I actually can ask you questions. This is a dangerous thing. So I loved how you changed the EKG amplitude into a color signal, but you were showing us strips, so to me that looks more like a cartoon rather than a picture. It's like a movie, not a picture, because you would have to take individual beats to call it a picture. So I'm just curious how it would perform if we just took one beat, and also broke it down from the eight to see which of the eight leads are actually carrying the weight, and this can kind of carry over to wearables and other forms of EKG data. And then, how easy or feasible is it to kind of go backwards? You mentioned clinical risk factors, but how about the actual model itself that you used? Where can you tweak it to improve performance?
So on the first point, of thinking about what we input into the model: I agree, we're showing more than just one heartbeat, and there are actually more contemporary machine learning approaches that look both at individual heartbeats and at sequences of heartbeats, leveraging the fact that this is time series data, so we could think about it more, like you say, like a movie of multiple heartbeats. And they've been able to exploit that to get nominal increases in performance and address various other conditions. So I agree that that's definitely an interesting step to take in pursuing these. I do think that these networks, because we are giving them multiple beats of data, are in theory extracting information across multiple beats and using inter-beat-interval and RR-interval types of metrics, although pinning down exactly what a machine learning network is doing is a little bit risky. The other thing I'll say is we have actually done studies on subsetting this data, of saying, okay, what if we only gave it one lead and did each lead at a time? And what we found in that study, where we did the exact same thing with a slightly different network based on our findings here, was that V1 and V2 had the best performance; they were almost as good as using the full eight of the 12. And we presented those results at a Computing in Cardiology conference, so there is a paper describing that. And that definitely brought up conversations about wearables and what the ideal lead choice for a wearable would be for this particular task. As far as your second question, of flipping it the other way, what knobs we can tweak: it's difficult, but I think that in the context of looking at these biases it has, of, oh, it's performing poorly in this group, maybe let's tell the machine learning algorithm to use multimodal fusion, to say we're going to input the ECG and patient characteristics to try and help it correct for the fact that it suffers from biases in those patient characteristics. Go ahead. What role does the number of samples have in the result that you get and the accuracy? So in other words, what if you pumped three million ECGs into that? Would you expect over time that the accuracy would increase, or does that just show that maybe something needs to be tweaked, like you just talked about, like maybe comorbidities needed to be added? So, you know, in the world of machine learning, more data is better, but there are likely to be diminishing returns at some point. So in our electronic health record database, we do have something like two million ECGs. They don't all have ejection fraction as a label, but when we use more data, we tend to get better results in the downstream task, up to a point. Eventually it gets to a point where you're not seeing enough uniqueness in the data, depending on what task you're addressing, and you need to try other things, like either augmenting the data or adding more information in besides just the ECG. And eventually, theoretically, at least my theory is that at some point we will reach the ceiling of performance, where there's enough noise in the system that you're not going to get better even if you had, you know, infinite data, because beyond that ceiling it's kind of random chance.
I don't think we're there yet, but I do think that's something to keep in mind as we're always chasing, you know, the next decimal place of improved performance. Can I just ask you a quick question with a quick answer? What's the impact of rhythm, AFib versus sinus, on the performance of the algorithm if the endpoint is low EF? I know AF was one of the comorbidities that we addressed. As far as its prediction accuracy, I mean, for most comorbidities we saw poorer performance, and I think one of the main discussion points we had in the paper is that if the patient was more sick, then the algorithm tended to perform more poorly. We would have to do a little bit more specific digging. The representation of the number who had AFib wasn't necessarily, you know, very high in this particular data set, but they certainly were there, so. Two questions, actually following up on the questions that you got from the crowd. The first one is on the architecture reshaping in the beginning. As you explained, you take, you know, time series data, and because you're using models that were built for photos, you kind of have to cram it into them. And some of the models that you used, like DenseNet or VGG, require reshaping right before you go into them, so you lose some of that temporal coherence. So do you think some of the performance decrease is because you lost that temporal coherence that's inherent to the ECG signal? Certainly. I certainly think that's the case. You know, and that was kind of the tradeoff we were making here: do we want to go in and tweak DenseNet or VGG a little bit to make it more amenable to this, or do we want to just use it as is? I think that's part of the discussion point: none of these networks were designed with ECGs in mind, and therefore, for some of their lack of performance, you can't blame them, right? They weren't designed to deal with this type of signal. We were just interested to see, okay, barring that, if we mangle the data a little bit getting it in there, it's still pretty impressive that it's able to do this well, even with some of that loss of coherence. And it's notable that none of the ones we had to do that mangling for were in the top three. And then the second question that was raised there, on the number of patients: I was very impressed that with 22,000 patients you were able to get performance similar to, you know, the 45,000-to-70,000-patient studies that were done. So did you do any learning curve analysis, where you take small samples and retrain for each one, to see where you plateau and what the curves look like? Because what I'm asking is, maybe you could do it with 2,000 or 5,000, right? You may not need to go to 20,000. That would be super valuable for a small hospital that wants to, for example, you know, do this. So we have done a little bit of that analysis; we've been using this LVEF problem as our whipping post to benchmark all of our other machine learning activities, and that has included subsetting the data in various ways. And I think what we've found is that the LVEF problem is not actually very hard for the machine learning algorithms, and so they can get away with less. I don't want to speak robustly to how much less, but, you know, with less than what we showed here, it can still get pretty far along the way.
The caveat that I would add to that is that when you give it less data, and if that data becomes more homogeneous, right, you have a small hospital system and you might see a more restricted population base, the network is not going to be able to generalize well when, say, you get a patient who's not well represented by that space. And so I think when we're doing this subsetting going forward, it's important to be smart about how we're subsetting: instead of just picking randomly, picking patients that at least span the distribution, so that when we train the network, it still has the ability to generalize as widely as possible while doing so on more limited data. Great. Thanks so much. That was really awesome. Great. And before I mention our next speaker, I just want to ask if our last speaker, Dr. Ivan Zelchevic, has — oh, do you want to take a seat at the end here? I think there's a chair for you. That way we'll know that you're here. Great. I'm glad you came. Okay. Let me introduce our third speaker then, Dr. Meghan Turchioe, and she will be speaking on associations between atrial fibrillation symptom clusters and major adverse cardiovascular events following catheter ablation. She is — go ahead. Great. Thank you very much, and thank you for the invitation to speak today. So I'm Meghan Turchioe. I'm an assistant professor at Columbia School of Nursing, and the work I'm going to be presenting today has really been the culmination of several different papers and projects, and we're kind of excited to bring you through all of that, culminating in the paper that was in Heart Rhythm O2. Here are my funding and conflicts. So this project is really focused on atrial fibrillation symptoms. We know that atrial fibrillation symptoms are extremely variable. There's wide variability in what patients may experience, from very cardiac-specific symptoms like palpitations to more general symptoms like malaise and fatigue, and as many as 15 to 30 percent of patients will be totally asymptomatic. And this symptom paradox, in which symptoms do not correspond well with cardiac rhythm, is part of the challenge. So we know that there can be patients who are in normal sinus rhythm and still telling us that they're having a lot of symptoms, and there can be patients who are having a heavy burden of AFib and are totally asymptomatic. And so for this reason, we tend to use other, more reliable measures when we're trying to prognosticate outcomes; we tend to use things that are a little bit more concrete and quantifiable. But it's important that we don't overlook symptoms, because there has been some work showing that symptoms may be predictive of outcomes. Here are just two examples. There was an analysis of the ORBIT-AF registry that showed that patients who had more burdensome AFib symptoms actually had a significantly higher risk of hospitalization and a borderline higher risk of major bleeding, although not death. And similarly, there was a study within the RACE-2 study showing that patients who had more burdensome AFib symptoms had significantly higher odds of a composite endpoint of cardiovascular morbidity and mortality and cardiovascular hospitalizations. And I also just want to introduce this concept of symptom clusters for a moment. So symptom clusters look beyond an isolated single symptom, because we recognize that symptoms often actually co-occur within a patient, right?
And there actually can be patterns, when you look across a population, in what types of symptoms tend to co-occur with each other. There's actually been a lot of work done in the cancer space on symptom clusters; for example, there has been some work showing that specific symptom clusters are associated with odds of hospitalization and death. So in order to bring that back into the AFib space, we wanted to look at what symptoms are co-occurring within AFib, to identify these co-occurring symptoms or clusters. And particularly, we were looking at patients who were going to be undergoing a catheter ablation for atrial fibrillation; so this was at the time of presenting for the ablation. And then we wanted to determine whether there was any prognostic value in looking at the associations between symptom clusters and post-ablation AF-related major adverse cardiovascular events, or MACE. So to do this, we actually used electronic health records. Patient-reported outcomes of their symptoms would be the gold standard, without a question, but it can be very challenging to get patient-reported symptoms at scale, particularly when you try to collect this data longitudinally; there's a lot of missingness over time. And in order to be able to do some machine learning work, it's helpful to have a large data set, as we just heard from my colleague. So we looked at electronic health records. At our institution, we had about 33,000 EHRs for AFib patients. And then we captured them in a common data model; this particular one is known as OMOP. And most of these OMOP tables that you see here, like demographics, diagnoses, medications, are in a structured format already, so with a little bit of cleaning and aggregation, you're basically ready to go with your analysis. However, we knew that symptoms are primarily documented in notes, and so we had to actually apply natural language processing to get those symptoms out of the notes. So we used a very simple rule-based NLP technique; we validated that it was functioning accurately, with an F-score of 0.81, and then we used it to extract 10 symptoms from the notes of patients who were going for ablation. We really wanted to look at a broad range of symptoms, rather than just the two or three most commonly evaluated symptoms, because we recognize that there can be a very wide range; so we wanted to look at things like anxiety and dizziness and dyspnea, as well as things like palpitations. So we first applied an unsupervised machine learning technique, hierarchical clustering, in order to identify clusters within the data. And then we created our composite MACE endpoint: AFib-specific ED visits, AFib-specific hospitalizations, stroke, and/or death within a year post-ablation, all captured from electronic health records. And then we did some simple logistic regression to look at associations between these symptom clusters and the MACE endpoint, both in unadjusted models and adjusting for age, sex, and race. So our final cohort was almost 1,300 patients. They had a mean age of 65, about one third were female, about 60% were white, and about half of the sample actually had comorbid heart failure. When we did the hierarchical clustering, you can see this dendrogram showing the relationship between the clusters, and there were kind of two main branches that emerged.
But as you move down, you can see that there are further branchings that show closer clustering of some symptoms. And after we applied a number of different model metrics to evaluate model performance, we identified that six was the optimal number of clusters. And so this visualization shows those six clusters. I'm just going to take a moment to walk through this, because it is a very busy visualization. Essentially what you're seeing is the 10 symptoms that we were interested in and the prevalence of those symptoms within each of the clusters, and each line represents a different cluster. So for example, there's a red line here, and you can see that there's a pretty high prevalence of many symptoms in the patients who are in this cluster, and so we ended up labeling this cluster as generally burdensome symptoms; these are patients who are reporting a lot of different symptoms. Similarly, there was a group whose hallmark was a higher prevalence of dyspnea and edema compared to some of the others. There was a group, shown in the dotted green line, in which 100% of the patients had chest pain. Similarly, in the dotted blue line, 100% of the patients had anxiety, and so on. So this helped us characterize what the hallmarks of these different clusters were, why these clusters were hanging together. And then our natural next question was, okay, so who are these patients? Who are the patients who have generally burdensome symptoms, compared to this hallmark of anxiety as the most prevalent symptom? We looked at some basic demographic characteristics: age, sex, race, and heart failure status. And we did find some differences. So for example, there was a higher proportion of female patients in both the broadly symptomatic group and the anxiety group compared to the asymptomatic group. There was a higher prevalence of patients in a racial category other than white in basically all of the symptom clusters compared to the asymptomatic group. And there was actually a lower prevalence of patients with comorbid heart failure in certain symptom clusters compared to the asymptomatic group; you can see that the highest prevalence of heart failure was in the asymptomatic cohort. And I think this makes sense when we step back and think about the fact that this was a cohort of patients who were undergoing ablation, and so it's possible that they were undergoing the ablation for a reason other than symptom management, such as for a mortality benefit. And then finally, we of course wanted to look at major adverse cardiovascular events, or MACE. One third of the patients had an AF-related major adverse cardiovascular event within a year of ablation. When we dug a little deeper and looked at what exactly was driving this, it was primarily re-hospitalizations within a year. It was very rare that patients had death or stroke within a year, and even ED visits without a subsequent hospitalization were relatively rare compared to hospitalizations. So even though this is a composite endpoint, we can really think about it as being primarily driven by the hospitalizations. And we did see, again, some differences when we look at who was having these MACE events compared to who wasn't: the AF-related MACE group was slightly older, a slightly higher proportion of them were Black or African American, and a slightly higher proportion were female.
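As a rough illustration of the pipeline just described, the sketch below clusters patients on binary symptom indicators, cuts the tree at six clusters, and relates cluster membership to a one-year MACE indicator with adjusted logistic regression. The data are synthetic, and the linkage method and distance choice are assumptions (the talk does not specify them); this is not the study's code.

```python
# Hedged sketch on synthetic data: hierarchical clustering of NLP-extracted
# symptom flags, then logistic regression of 1-year MACE on cluster membership.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
symptoms = ["palpitations", "fatigue", "dyspnea", "edema", "chest_pain",
            "anxiety", "dizziness", "syncope", "malaise", "weakness"]
# ~1,300 patients, 10 binary symptom indicators (all synthetic).
df = pd.DataFrame(rng.integers(0, 2, size=(1300, 10)), columns=symptoms)

# Agglomerative clustering of the patient-by-symptom matrix (Ward linkage is
# an assumption), cut into six clusters to mirror the talk's optimum.
Z = linkage(df.values, method="ward")
df["cluster"] = fcluster(Z, t=6, criterion="maxclust")

# Synthetic covariates and outcome, purely to make the model runnable.
df["age"] = rng.normal(65, 10, len(df))
df["female"] = rng.integers(0, 2, len(df))
df["mace"] = rng.integers(0, 2, len(df))

# Adjusted model: odds of MACE by cluster (cluster 1 as reference).
fit = smf.logit("mace ~ C(cluster) + age + female", data=df).fit(disp=0)
print(np.exp(fit.params))  # exponentiated coefficients = odds ratios
```

On real data, a cluster coefficient of roughly ln(0.6) would correspond to the "about 40% lower odds" described next for the anxiety and fatigue-palpitations clusters.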
And then lastly, we wanted to look at associations with the symptom clusters that we had identified. So looking at the six symptom clusters, with generally symptomatic as our reference group, we found that the anxiety cluster was actually associated with lower odds of MACE within a year, about 40% lower odds. And similarly, the fatigue-palpitations cluster was significantly associated with roughly 40% lower odds of AF-related MACE within a year. So in conclusion, AFib patients' pre-ablation symptoms vary widely; we saw that borne out in the different clusters, and there was some association with post-ablation major adverse cardiovascular events. But I think we really need to be doing a little bit more digging to disentangle what's going on here and why. What could be driving these differences? One of our working hypotheses is that, because this MACE endpoint was really representing hospitalizations, could it be that there are differences in the way patients are seeking healthcare after ablation that are driven by their symptom experiences? So it could be that patients who have one or two non-emergent symptoms were actually getting managed better, proactively, by the EP clinic and were less likely to have to go back to the ED or get re-hospitalized, compared to those patients who are a little bit more complex and broadly symptomatic. It's also possible that illness perception is tied up in this a little bit, illness perception being the perception you have of yourself as somebody who's living with a chronic illness and how symptoms play into that. And then there's the actual ability of somebody to perceive symptoms; we know that there have been some interesting differences in how well patients perceive their own bodily sensations and symptoms and whether they actually act upon those. So all of this could be wrapped up in this phenomenon, this connection that we're seeing. But certainly we think future work is going to be needed to further understand these clinical symptom phenotypes and potentially be able to use them to further guide and refine clinical decision-making. Thank you very much. That was great. Thank you so much. Okay, open for questions, both from the audience as well as up here. Go ahead. So there was a really interesting talk yesterday from Dr. Sears about ambiguity, and I think that plays a part in this. And I think educating our patients prior to ablation about what to expect after ablation could possibly help with some of these things. I think some of the patients that go in for ablation see it as this, I don't know what the term would be, panacea: that I'm gonna have this ablation and it's gonna fix my heart failure, obesity, smoking habit, whatever, depression. They feed all of these things into this one procedure that's handling one piece of the puzzle, whereas their atrial fibrillation, as we know now, seems to be as much a sign of a chronic inflammatory process. So just feeding back, I just think we need to have more frank, honest conversations with patients about what to expect after the procedure. Thank you. I would completely agree with that, because we've done some work with patients. We've actually developed a decision aid in another project that I'm leading, to guide patients through this question of ablation versus antiarrhythmic medication alone and what to expect with an ablation.
And to do that, we did some preliminary qualitative work with patients, and one of the things that came up over and over again, from not just the patients but also the clinicians whom we interviewed, was that they felt patients weren't fully prepared for the potential for recurrence or symptoms after the ablation; that caught them off guard and was leading to undesirable downstream effects. So a lot to unpack there. I have a question for you, but I'm gonna save it till afterwards, because unfortunately we're running a little bit late; the lights are gonna go on. But thank you so much. That was really fantastic. So we're gonna round out this session with our last speaker, Dr. Ivan Zelchevic from the University Hospital Dubrava in Croatia. And I don't know if I said any of that correctly, but anyway, the title of your paper is The Role of AI in Atrial Fibrillation Informing: Evaluating ChatGPT-4's Correctness in Patient-Focused Informing and Awareness for Atrial Fibrillation. Thank you. Thank you so much for the kind introduction. You said everything closely right. So the topic of my talk is the role of AI in atrial fibrillation informing, mostly focusing on evaluating ChatGPT-4 in answering questions from patients who have atrial fibrillation. So our main thinking behind this topic and this study was that large language models are evolving really rapidly, and their application in healthcare is really, really huge and is expanding on an everyday basis. On the other hand, you have atrial fibrillation, which is the most common arrhythmia worldwide, and most of the guidelines are saying that AFib management should be patient-centered; this is included within the first pages of the new European guidelines on AFib management. So we come to the aim of the study, which was to assess ChatGPT-4's ability to inform patients about AFib across a wide range of questions, from understanding the disease to optimizing care. We used ChatGPT-4 because at that point, the beginning of last year, it was the most advanced and sophisticated model available. We used it because we had tested this model in our previous studies, and it showed the best results when compared to ChatGPT-3.5 or Google Bard, now known as Gemini, especially in conveying complex medical information to patients. As for the questions, we had 108 questions within 10 categories, and most of them were sourced from real-life patient interactions, mostly within the outpatient clinic and follow-up of atrial fibrillation patients. These are the 10 categories, from basics of atrial fibrillation to signs and symptoms, treatment modalities, PVIs, comorbidities related to atrial fibrillation, then lifestyle management, daily life, stroke prevention, and different concerns. Regarding the evaluation of the responses from ChatGPT, we had five categories: accuracy, comprehensiveness, clarity and stability, relevance to clinical practice, and, last but not least, patient safety. We also did a repeatability protocol, so we chose two questions out of each category and gave them to ChatGPT again two weeks afterwards, to see whether the two responses resembled each other enough. Regarding the review and scoring of the responses, we had three senior cardiologists, EP specialists with more than 10 years of clinical work, and they reviewed the responses collectively, so we didn't have inter-observer variability. These are the results.
So the best score, the highest overall score, a perfect 10, ChatGPT earned in the category of lifestyle adjustments. Close behind were daily life and management, and comorbidities and complications, and the worst, though not bad at all at 8.3, was signs and symptoms. So as you can see in this table, the lowest was 8.3, and most of the categories were around 9.0 and higher. Regarding the readability assessment, because these are responses for patients, the range was between 34 and 58, and the mean score was just above 40. The highest mean was for lifestyle adjustments and daily life and management, which was rather as expected. Just to make it more clear: with the Flesch Reading Ease score, newspapers are written with scores between 70 and 80, and high school and university books are written with scores between 30 and 40. So the readability of the responses was maybe quite low if you think about the patients, but lifestyle adjustments, with the highest mean of 52, is pretty good. And as expected, for comorbidities and complications, you probably cannot use such easy language to explain them. And then we come to the conclusion. This was one of the most comprehensive evaluations of GPT-4's ability to inform. Covering a broad spectrum of queries, its performance tends to vary, but it can provide very informative and very relevant answers across different categories. The categories which involved straightforward advice and guidelines had higher scores; the ones that need a deeper understanding or integration of complex medical data had lower scores, but still higher than eight. And future research should focus on refining the models, whether it's GPT-4 or any other, to handle complex medical inquiries. I would like to thank all my co-authors, and especially Professor Andrei Novak, who is the mastermind of the studies with artificial intelligence that we do in my center. Thank you so much. Thank you so much for that really nice paper. Again, it's open to any of you or to the audience. I can ask a question. So you used, from what I understand, a temperature setting of one. Yes. Did you change that hyperparameter to see if your category scores change, and did you do any sensitivity analysis around the different temperature settings? Yes, but only for a couple of questions within categories, and then the performance varies a lot. If you push towards two, then it goes higher, but then the Flesch score goes really, really low. That was just your final data that you showed. Honestly, that's not so bad, for ChatGPT-4 to be able to give sort of lifestyle modifications and such, as that's often the information that we want to supply, and we're often, you know, asking our allied professionals, whom I've referred to a number of times today already, for everything that they can do to help, but often to really have an impact on patients with that. And if we can decrease the workload a little bit, giving materials to pre-read before they come to clinic and being able to rely upon that, then it may actually turn into better clinical communication and care. Yeah, definitely with the o3 thinking models, yeah. Go ahead, Jake. So I was wondering, is the use case for this in the context of having the patient be the one to interact and ask the questions, or is it just for preparing material for the patient to engage with ahead of time, right?
Because if it's the patient, you know, if the intention is to say, patient, here's either ChatGPT or something curated like it, right? Then I wonder how much, you know — your questions, I imagine, were a curated set, and patients are not likely to ask very well-posed questions. Okay, so every query started with "I was diagnosed with atrial fibrillation," but it was written by either an allied professional or a doctor. So it wasn't written by the patient, but for a month we were collecting the patient questions in the outpatient clinic. So we tried to resemble the direct questions from the patients as closely as we could, but it still could be that this bias from the doctors who were interacting with GPT influenced the results. Because I just wonder, I worry that, you know, we all know ChatGPT can hallucinate and say crazy things very confidently, and, you know, if you ask the question the right way, you can get it to point in a very different direction with its answer. And I wonder about, you know, an assessment of how to maybe antagonistically interact with it, and how robust it is to those sorts of things. If you can answer that quickly, you can, or you can have this conversation after we end the session, because I think we're gonna have to end. So hold your thoughts there. If the authors of these four wonderful papers would be willing to hang out for a little bit, for anybody that wants to have further conversations and come on up and ask questions, that would be wonderful. But I want to thank all of you for your wonderful presentations. They were really just first class, and we're really glad that we chose these four to highlight. And thank you to all of you who stayed late into the afternoon. I think that you're gonna be seeing many more papers of this nature coming out of Heart Rhythm O2. Thank you for all of your support, and have a great rest of Heart Rhythm Society 2025. Thank you.
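For readers curious about the evaluation loop discussed in this last talk and its Q&A (patient-style prompts at temperature 1, a repeatability re-ask, and Flesch Reading Ease readability scoring), here is a hedged sketch. It is not the study's actual protocol: the model name, prompt framing, example question, and the use of the textstat package are illustrative assumptions.

```python
# Hedged sketch of a GPT-4 patient-information evaluation loop; not the
# study's code. Assumes the openai and textstat Python packages.
from openai import OpenAI
import textstat

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, temperature: float = 1.0) -> str:
    """Pose one patient-style AF question and return the model's answer."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,  # the hyperparameter raised in the Q&A
        messages=[{"role": "user",
                   "content": "I was diagnosed with atrial fibrillation. " + question}],
    )
    return resp.choices[0].message.content

question = "Can I still drink coffee?"  # hypothetical example question
answer_1 = ask(question)
answer_2 = ask(question)  # repeatability check: re-ask and compare the answers

# Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words);
# higher means easier to read. The talk reported response means just above 40.
print(textstat.flesch_reading_ease(answer_1))
```

In the study itself, the repeated questions were separated by two weeks and the answers were scored collectively by three senior electrophysiologists; the re-ask above only mimics the shape of that repeatability check.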
Video Summary
The Heart Rhythm O2 Presents session focused on innovations in digital health and highlighted notable papers chosen by the journal's editors. The session covered diverse topics related to digital health applications, focusing on the benefits, challenges, and potential improvements within cardiac care.

Firstly, Bert Vandenberk discussed concerns about digital health from the perspective of a cardiac implantable electronic device remote monitoring clinic. The presentation highlighted the increased workload associated with digital health, emphasizing the need for standardized workflows, reimbursement models, and data interoperability.

Jake Bergquist's presentation focused on the evaluation of off-the-shelf machine learning architectures in detecting low left ventricular ejection fraction from ECGs. The study found that existing image-based machine learning models performed favorably in ECG analysis, although there were biases based on patient characteristics like age and comorbidities.

Meghan Turchioe explored associations between atrial fibrillation (AF) symptom clusters and major adverse cardiovascular events after catheter ablation. Her research noted that specific symptom clusters, such as anxiety and fatigue-palpitations, were associated with lower odds of adverse events, suggesting the need for further research into clinical symptom phenotypes.

Finally, Ivan Zelchevic evaluated ChatGPT-4's effectiveness in informing AF patients. The study revealed varying performance across different categories, with the AI showing particularly strong results in providing straightforward lifestyle and daily management advice.

Overall, these presentations underline the promising future of digital health technologies in improving clinical care while also recognizing the existing challenges that need to be addressed.
Keywords
digital health
cardiac care
machine learning
ECG analysis
atrial fibrillation
ChatGPT-4
remote monitoring
data interoperability
symptom clusters
cardiovascular events