Characterization of long COVID temporal sub-phenotypes by distributed representation learning from electronic health record data: a cohort study

Abstract
Background
Characterizing Post-Acute Sequelae of SARS-CoV-2 Infection (PASC) has been challenging due to the multitude of sub-phenotypes, temporal attributes, and definitions. Scalable characterization of PASC sub-phenotypes can enhance screening capacities, disease management, and treatment planning.
Methods
We conducted a retrospective multi-centre observational cohort study leveraging longitudinal electronic health record (EHR) data of 30,422 patients from three healthcare systems in the Consortium for the Clinical Characterization of COVID-19 by EHR (4CE). From the total cohort, we applied a deductive approach to 12,424 individuals with follow-up data and developed a distributed representation learning process to provide augmented definitions for PASC sub-phenotypes.
Findings
Our framework characterized seven PASC sub-phenotypes. We estimated that, on average, 15.7% of hospitalized COVID-19 patients were likely to suffer from at least one PASC symptom and almost 5.98% had multiple symptoms. Joint pain and dyspnea had the highest prevalence, averaging 5.45% and 4.53%, respectively.
Interpretation
We provided a scalable framework to every participating healthcare system for estimating the prevalence and temporal attributes of PASC sub-phenotypes, thus developing a unified model that characterizes augmented sub-phenotypes across the different systems.
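The headline figures in the Findings are cohort proportions: the share of patients with at least one PASC symptom, with multiple symptoms, or with a given symptom such as joint pain. As a minimal sketch, assuming hypothetical patient-level symptom flags per healthcare system and reading "on average" as an unweighted mean of per-site estimates (an assumption; the abstract does not state how estimates were pooled), the arithmetic could look like this. The `Patient` record, the function names, and the toy cohort are illustrative only and are not the authors' distributed representation learning pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Patient:
    site: str            # hypothetical healthcare-system identifier
    symptoms: frozenset  # hypothetical set of PASC symptoms flagged at follow-up


def site_prevalence(patients: Iterable[Patient],
                    has: Callable[[Patient], bool]) -> dict:
    """Share of patients at each site for which the predicate `has` is true."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for p in patients:
        totals[p.site] += 1
        hits[p.site] += has(p)  # bool counts as 0 or 1
    return {site: hits[site] / totals[site] for site in totals}


def mean_over_sites(prevalence: dict) -> float:
    """Unweighted mean of per-site prevalences (assumed reading of 'on average')."""
    return sum(prevalence.values()) / len(prevalence)


def at_least_one(p: Patient) -> bool:
    return len(p.symptoms) >= 1


def multiple(p: Patient) -> bool:
    return len(p.symptoms) >= 2


def symptom(name: str) -> Callable[[Patient], bool]:
    """Predicate factory: does the patient have the named symptom?"""
    return lambda p: name in p.symptoms


# Toy cohort with made-up data, for illustration only.
cohort = [
    Patient("A", frozenset({"joint pain"})),
    Patient("A", frozenset()),
    Patient("A", frozenset({"dyspnea", "joint pain"})),
    Patient("A", frozenset()),
    Patient("B", frozenset({"dyspnea"})),
    Patient("B", frozenset()),
]

print(mean_over_sites(site_prevalence(cohort, at_least_one)))          # 0.5
print(mean_over_sites(site_prevalence(cohort, multiple)))              # 0.125
print(mean_over_sites(site_prevalence(cohort, symptom("joint pain")))) # 0.25
```

An unweighted mean gives each healthcare system equal weight regardless of size; pooling all patients first would instead weight larger systems more heavily, so the choice matters when site sizes differ.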
Citation
A Dagliati, ZH Strasser, ZS Hossein Abad, JG Klann, KB Wagholikar, R Mesa, S Visweswaran, M Morris, Y Luo, DW Henderson, MJ Samayamuthu, BWQ Tan, G Verdy, GS Omenn, Z Xia, R Bellazzi, JR Aaron, G Agapito, A Albayrak, G Albi, M Alessiani, A Alloni, DF Amendola, F Angoulvant, LLLJ Anthony, BJ Aronow, F Ashraf, A Atz, P Avillach, PS Azevedo, J Balshi, BK Beaulieu-Jones, DS Bell, A Bellasi, R Bellazzi, V Benoit, M Beraghi, JL Bernal-Sobrino, M Bernaux, R Bey, S Bhatnagar, A Blanco-Martínez, CL Bonzel, J Booth, S Bosari, FT Bourgeois, RL Bradford, GA Brat, S Bréant, NW Brown, R Bruno, WA Bryant, M Bucalo, E Bucholz, A Burgun, T Cai, M Cannataro, A Carmona, C Caucheteux, J Champ, J Chen, KY Chen, L Chiovato, L Chiudinelli, K Cho, JJ Cimino, TK Colicchio, S Cormont, S Cossin, JB Craig, JL Cruz-Bermúdez, J Cruz-Rojo, A Dagliati, M Daniar, C Daniel, P Das, B Devkota, A Dionne, R Duan, J Dubiel, SL DuVall, L Esteve, H Estiri, S Fan, RW Follett, T Ganslandt, NG Barrio, LX Garmire, N Gehlenborg, EJ Getzen, A Geva, T Gradinger, A Gramfort, R Griffier, N Griffon, O Grisel, A Gutiérrez-Sacristán, L Han, DA Hanauer, C Haverkamp, DY Hazard, B He, DW Henderson, M Hilka, YL Ho, JH Holmes, C Hong, KM Huling, MR Hutch, RW Issitt, AS Jannot, V Jouhet, R Kavuluru, MS Keller, CJ Kennedy, DA Key, K Kirchoff, JG Klann, IS Kohane, ID Krantz, D Kraska, AK Krishnamurthy, S L’Yi, TT Le, J Leblanc, G Lemaitre, L Lenert, D Leprovost, M Liu, NH Will Loh, Q Long, S Lozano-Zahonero, Y Luo, KE Lynch, S Mahmood, SE Maidlow, A Makoudjou, A Malovini, KD Mandl, C Mao, A Maram, P Martel, MR Martins, JS Marwaha, AJ Masino, M Mazzitelli, A Mensch, M Milano, MF Minicucci, B Moal, TM Ahooyi, JH Moore, C Moraleda, JS Morris, M Morris, KL Moshal, S Mousavi, DL Mowery, DA Murad, SN Murphy, TP Naughton, CT Breda Neto, A Neuraz, J Newburger, KY Ngiam, WFM Njoroge, JB Norman, J Obeid, MP Okoshi, KL Olson, GS Omenn, N Orlova, BD Ostasiewski, NP Palmer, N Paris, LP Patel, M Pedrera-Jiménez, ER Pfaff, AC Pfaff, D Pillion, S Pizzimenti, HU Prokosch, RA Prudente, A Prunotto, V Quirós-González, RB Ramoni, M Raskin, S Rieg, G Roig-Domínguez, P Rojo, P Rubio-Mayo, P Sacchi, C Sáez, E Salamanca, MJ Samayamuthu, LN Sanchez-Pinto, A Sandrin, N Santhanam, JCC Santos, FJ Sanz Vidorreta, M Savino, ER Schriver, P Schubert, J Schuettler, L Scudeller, NJ Sebire, P Serrano-Balazote, P Serre, A Serret-Larmande, M Shah, ZS Hossein Abad, D Silvio, P Sliz, J Son, C Sonday, AM South, A Spiridou, ZH Strasser, ALM Tan, BWQ Tan, BWL Tan, SE Tanni, DM Taylor, AI Terriza-Torres, V Tibollo, P Tippmann, EMS Toh, C Torti, EM Trecarichi, YJ Tseng, AK Vallejos, G Varoquaux, ME Vella, G Verdy, JJ Vie, S Visweswaran, M Vitacca, KB Wagholikar, LR Waitman, X Wang, D Wassermann, GM Weber, M Wolkewitz, S Wong, Z Xia, X Xiong, Y Ye, N Yehya, W Yuan, A Zambelli, HG Zhang, D Zöller, V Zuccaro, C Zucco, SN Murphy, JH Holmes, H Estiri. “Characterization of long COVID temporal sub-phenotypes by distributed representation learning from electronic health record data: a cohort study”, eClinicalMedicine 64:102210 (2023). doi:10.1016/j.eclinm.2023.102210