Development of a natural language processing pipeline for assessment of cardiovascular risk in myeloproliferative neoplasms

August 8, 2024

Andrea Duminuco, Joshua Au Yeung, Raj Vaghela, Sukhraj Virdee, Claire Woodley, Susan Asirvatham, Natalia Curto-Garcia, Priya Sriskandarajah, Jennifer O’Sullivan, Hugues de Lavallade, et al.

A central feature of myeloproliferative neoplasms (MPN) is an increased risk of cardiovascular thrombotic complications, which is the primary determinant for starting cytoreductive therapy.1 In the landmark ECLAP study of patients with polycythemia vera (PV), cardiovascular deaths accounted for 45% of all deaths, with a thrombosis incidence rate of 1.7 per 100 person-years and a cumulative incidence of 4.5% over a median follow-up of 2.8 years.2

Natural language processing (NLP) is a branch of machine learning concerned with the computational interpretation and analysis of human language. CogStack (https://github.com/CogStack) is an open-source software ecosystem that retrieves structured and unstructured components of electronic health records (EHRs). The Medical Concept Annotation Toolkit (MedCAT), the NLP component of CogStack, structures clinical free text by disambiguating terms, capturing synonyms and acronyms, detecting contextual details such as negation, subject, and grammatical tense, and mapping the text to Systematized Nomenclature of Medicine–Clinical Terms (SNOMED-CT) concepts. This technique is known as “named entity recognition and linkage” (NER+L). MedCAT has been used and validated in many previous studies to structure EHR data across a range of medical specialties for auditing, observational studies, de-identification of patient records, operational insights, disease modeling, and prediction.3–8
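To make the idea of NER+L with context detection concrete, the sketch below shows the kind of output such a step produces: text spans linked to SNOMED-CT concept IDs, each flagged as negated or not. It is only a toy, assuming a small hand-built synonym dictionary and a NegEx-style negation window; it is not MedCAT's method or API (MedCAT uses trained models over the full SNOMED-CT vocabulary and also handles subject and tense, which this sketch ignores). The synonym table, trigger lists, `annotate` function, and window size are all hypothetical, and the concept IDs are illustrative and should be checked against a current SNOMED-CT release.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym dictionary: surface form (incl. acronyms) -> SNOMED-CT concept ID.
# IDs are illustrative; verify against a current SNOMED-CT release.
SYNONYMS = {
    "hypertension": "38341003",
    "htn": "38341003",
    "high blood pressure": "38341003",
    "diabetes mellitus": "73211009",
    "diabetes": "73211009",
    "dm": "73211009",
    "myocardial infarction": "22298006",
    "mi": "22298006",
}

# Hypothetical NegEx-style triggers that negate a concept appearing shortly after them.
NEGATION_TRIGGERS = ("no", "denies", "no history of", "negative for", "without")
# Words that end a negation's scope, e.g. "denies X but has Y" should not negate Y.
SCOPE_TERMINATORS = {"but", "however", "although"}


@dataclass
class Annotation:
    text: str
    snomed_id: str
    negated: bool


def annotate(sentence: str, window: int = 5) -> list[Annotation]:
    """Link dictionary terms in one sentence to concept IDs and flag negation."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    max_len = max(len(s.split()) for s in SYNONYMS)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so "diabetes mellitus" wins over "diabetes".
        for n in range(max_len, 0, -1):
            span = tokens[i:i + n]
            phrase = " ".join(span)
            if len(span) == n and phrase in SYNONYMS:
                preceding = tokens[max(0, i - window):i]
                # Cut the look-back window at the last scope terminator, if any.
                for j in range(len(preceding) - 1, -1, -1):
                    if preceding[j] in SCOPE_TERMINATORS:
                        preceding = preceding[j + 1:]
                        break
                before = " ".join(preceding)
                negated = any(
                    re.search(rf"\b{re.escape(t)}\b", before) for t in NEGATION_TRIGGERS
                )
                annotations.append(Annotation(phrase, SYNONYMS[phrase], negated))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return annotations


for a in annotate("Patient denies diabetes but has HTN"):
    print(a)
```

For the example sentence, "diabetes" is linked to 73211009 and flagged as negated, while the acronym "HTN" is linked to 38341003 and not negated, because the negation scope stops at "but". Real clinical text needs far more than this: ambiguous acronyms such as "MI" or "DM" are exactly the cases that MedCAT's learned disambiguation is meant to handle.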

We applied our NLP pipeline, comprising CogStack and MedCAT, to determine the prevalence of cardiovascular risk factors and their impact on thrombotic events during follow-up. CogStack was used to retrieve outpatient hematology clinic letters and hematology discharge letters. MedCAT was then used for NER+L of relevant clinical free text to the corresponding SNOMED-CT codes, which were selected by two hematology specialists. The base MedCAT model was trained unsupervised on >18 million EHR documents and then fine-tuned on 600 clinician-annotated MPN-specific documents using an 80:20 train:test split. In this annotation process, hematology specialists read through clinical documents and manually highlighted the correct words or phrases detected by MedCAT that correspond to the SNOMED concept of interest. Total SNOMED-CT code counts were then aggregated per patient, and a unique threshold count was applied to “infer” the presence of the respective SNOMED concept.
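The splitting and per-patient thresholding steps can be sketched as follows. The source does not give the threshold values, the splitting procedure, or how mentions are represented, so this is a hedged illustration only. It assumes that extracted mentions are already available as (patient, SNOMED code) pairs with negated mentions removed, and that the threshold may differ between codes. The function names, the default threshold of 2, the per-code threshold in the example, and the random seed are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_test_split(docs, train_frac=0.8, seed=0):
    """Shuffle documents reproducibly and split them train:test (80:20 by default)."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # hypothetical seed for reproducibility
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def infer_codes(mentions, thresholds, default_threshold=2):
    """Infer which SNOMED codes each patient has from mention counts.

    mentions: iterable of (patient_id, snomed_id) pairs, one per non-negated mention.
    thresholds: hypothetical per-code minimum counts; other codes use default_threshold.
    Returns {patient_id: set of codes whose count meets the threshold}.
    """
    counts = defaultdict(Counter)
    for patient_id, code in mentions:
        counts[patient_id][code] += 1  # aggregate counts per patient and code

    return {
        patient_id: {
            code
            for code, n in code_counts.items()
            if n >= thresholds.get(code, default_threshold)
        }
        for patient_id, code_counts in counts.items()
    }


# 600 annotated documents -> 480 for fine-tuning, 120 held out for testing.
train, test = train_test_split(range(600))
print(len(train), len(test))

mentions = [
    ("p1", "38341003"), ("p1", "38341003"), ("p1", "73211009"),
    ("p2", "73211009"), ("p2", "73211009"), ("p2", "73211009"),
]
print(infer_codes(mentions, thresholds={"73211009": 3}))
```

With 600 documents, an 80:20 split yields 480 training and 120 test documents. In the toy mentions, patient p1 is inferred to have code 38341003 (two mentions, meeting the default threshold of 2) but not 73211009 (one mention, below its threshold of 3), while p2 meets the threshold for 73211009. Requiring repeated mentions is one way to trade a little sensitivity for fewer false positives from a single stray or mis-linked mention.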


Cardiovascular Risk in Philadelphia-Negative Myeloproliferative Neoplasms: Mechanisms and Implications—A Narrative Review

by Samuel Bogdan Todor, Cristian Ichim, Adrian Boicean, and Romeo Gabriel Mihaila

Abstract

Myeloproliferative neoplasms (MPNs), encompassing disorders like polycythemia vera (PV), essential thrombocythemia (ET), and primary myelofibrosis (PMF), are characterized by clonal hematopoiesis without the Philadelphia chromosome. The JAK2 V617F mutation is prevalent in PV, ET, and PMF, while mutations in MPL and CALR also play significant roles. These conditions predispose patients to thrombotic events, with PMF exhibiting the lowest survival among MPNs. Chronic inflammation, driven by cytokine release from aberrant leukocytes and platelets, amplifies cardiovascular risk through various mechanisms, including atherosclerosis and vascular remodeling. Additionally, MPN-related complications like pulmonary hypertension and cardiac fibrosis contribute to cardiovascular morbidity and mortality. This review consolidates recent research on MPNs’ cardiovascular implications, emphasizing thrombotic risk, chronic inflammation, and vascular stiffness. Understanding these associations is crucial for developing targeted therapies and improving outcomes in MPN patients.