RESOLVE-PCC: Heterogeneity, risk factors and causal drivers of long-/post-COVID in large German population-based cohorts - towards personalized care

Funded by: Federal Ministry of Education and Research
Duration: 2025 - 2026 (24 months)

Aims of the project

Millions of people report persistent symptoms after SARS-CoV-2 infection, referred to as the long-/post-COVID condition (PCC). PCC is difficult to define because of its clinical heterogeneity and the limited understanding of its etiology. In RESOLVE-PCC, we aim to answer three questions: 1) Which symptoms are causally related to the infection? 2) Which risk factors determine severity and persistence? 3) Are there distinct subgroups with different underlying disease mechanisms and treatment requirements? We will analyze two large and complementary population-based cohorts, the German National Cohort (NAKO) and DigiHero, comprising about 300,000 individuals, of whom 23,000 have reported PCC so far. NAKO provides detailed pre-COVID characterizations, and DigiHero provides details on the severity, persistence and trajectories of PCC symptoms. Both cohorts include infected participants without symptoms and non-infected participants as control groups.

Six institutions contribute their complementary expertise to (1) enrich the databases with new variables derived from genetics, magnetic resonance imaging, health insurance data, and mental health data, (2) harmonize PCC data across studies, (3) apply and compare machine learning and artificial intelligence (AI) methods, such as transformer-based methods or multitask learning, to model relationships between symptoms and influencing factors, (4) extract and compare relevant factors from these models using explainable AI, (5) analyze causal relationships between these factors and symptoms using approaches such as Mendelian randomization, time series analysis, and counterfactual learning, and (6) transfer the models to university hospital patient care data via the Medical Informatics Initiative. Two workshops will ensure a common database, agreed-upon results, and integrated advice from clinicians and patient representatives. RESOLVE-PCC will improve the understanding of possible drivers of the PCC symptom spectrum, allowing more tailored treatment and prevention concepts to be identified.

Contribution of the Department of Medical Bioinformatics

Our department coordinates subproject 4, in which we aim to advance the understanding of PCC by developing and implementing a counterfactual machine learning (ML) approach. We will train an ML model that predicts the occurrence of potential PCC symptoms from a wide range of baseline and follow-up characteristics. The follow-up characteristics include whether the person had a confirmed SARS-CoV-2 infection, the severity of this infection, and psychological personality and mental health traits (such as anxiety or depression). Notably, the data set also includes individuals who did not report a SARS-CoV-2 infection. We will then apply counterfactual reasoning and learning approaches to analyze the causal influence of SARS-CoV-2 infection, as well as of personality and mental health traits, on the model's predictions (WP1).
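The prediction step can be illustrated with a minimal sketch. It assumes plain logistic regression as a stand-in for the ML models (the transformer-based and multitask methods named above are not shown), a hypothetical feature layout consisting of an infection indicator and a scaled anxiety score, and entirely synthetic data. The helper names `simulate` and `fit_logistic`, the symptom names and all effect sizes are invented for illustration. One independent model is fitted per symptom.

```python
import math
import random

# Hypothetical feature layout: [intercept, infected (0/1), anxiety score scaled to 0..1].
FEATURES = ["intercept", "infected", "anxiety"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int, rng: random.Random):
    """Synthetic participants and two binary symptom labels (illustration only)."""
    X = []
    Y = {"smell_loss": [], "fatigue": []}
    for _ in range(n):
        infected = 1.0 if rng.random() < 0.5 else 0.0
        anxiety = rng.random()
        X.append([1.0, infected, anxiety])
        # Assumed data-generating process: smell loss driven by infection,
        # fatigue driven mainly by anxiety.
        Y["smell_loss"].append(1 if rng.random() < sigmoid(-2 + 3 * infected) else 0)
        Y["fatigue"].append(1 if rng.random() < sigmoid(-2 + 0.5 * infected + 2.5 * anxiety) else 0)
    return X, Y


def fit_logistic(X, y, lr=2.0, epochs=500):
    """Full-batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


rng = random.Random(42)
X, Y = simulate(1000, rng)
# Binary relevance: one independent model per symptom (a multitask model would share parameters).
weights = {symptom: fit_logistic(X, y) for symptom, y in Y.items()}
for symptom, w in weights.items():
    print(symptom, {name: round(v, 2) for name, v in zip(FEATURES, w)})
```

In this synthetic setup the fitted infection weight should be large for `smell_loss` and small for `fatigue`, which only mirrors the assumed data-generating process, not any real finding.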

Counterfactual reasoning involves assessing what might have happened if different actions or decisions had been taken in the past. In machine learning, this entails estimating, from observed data, the potential outcomes of scenarios that did not actually occur. In our context, this means predicting the probability that an individual with a confirmed SARS-CoV-2 infection would have developed a given symptom had they not been infected, or vice versa. Likewise, we will analyze how the model's predictions differ under different personality traits. We thus analyze the occurrence and severity of SARS-CoV-2 infection, along with baseline characteristics such as demographic factors, comorbidities, medications, mood and personality traits, as potential factors influencing the onset of symptoms associated with PCC. Artificial intelligence (AI) interpretability methods will be employed to identify the features with the greatest impact on predicting the occurrence of PCC symptoms and to quantify their contribution (WP2).
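The individual-level counterfactual contrast described above can be sketched as follows, assuming an already-fitted logistic model whose coefficients (`WEIGHTS`) are hypothetical. The infection indicator is flipped while all other characteristics are held fixed, a pattern often called an S-learner; the helper names `predict`, `counterfactual` and `infection_effect` are illustrative. The resulting difference has a causal reading only under strong assumptions, such as no unmeasured confounding and a correctly specified model.

```python
import math

# Hypothetical coefficients of an already-fitted logistic model for one symptom.
WEIGHTS = {"intercept": -2.0, "infected": 1.2, "anxiety": 2.0}


def predict(person: dict) -> float:
    """Predicted probability of the symptom for one participant."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["infected"] * person["infected"]
         + WEIGHTS["anxiety"] * person["anxiety"])
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual(person: dict, feature: str = "infected") -> dict:
    """Copy of `person` with a binary feature flipped (infected <-> not infected)."""
    return dict(person, **{feature: 1 - person[feature]})


def infection_effect(person: dict) -> float:
    """Predicted risk if infected minus predicted risk if not infected,
    with all other characteristics held fixed."""
    return predict(dict(person, infected=1)) - predict(dict(person, infected=0))


cohort = [{"infected": 1, "anxiety": 0.2}, {"infected": 0, "anxiety": 0.9}]
for person in cohort:
    observed, alternative = predict(person), predict(counterfactual(person))
    print(f"observed {observed:.3f}  counterfactual {alternative:.3f}  "
          f"effect {infection_effect(person):+.3f}")
```

For the second participant, who was not infected, the counterfactual column is the predicted risk had they been infected; the effect column is the same contrast for both directions.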

Our objective is to refine the definition of long-/post-COVID by identifying which symptoms are primarily influenced by the SARS-CoV-2 infection itself and which are instead influenced by other factors unrelated to the infection. During our second workshop, we will compare our results with those of the other subprojects to validate and augment our findings (WP3).
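One way to operationalize this distinction is to average the counterfactual infection effect over a cohort for each symptom and flag symptoms whose average effect is large. The sketch below assumes hypothetical per-symptom logistic coefficients, hypothetical anxiety scores and a hypothetical cut-off (`THRESHOLD`); none of these are project results, and the helper names are illustrative.

```python
import math

# Hypothetical per-symptom logistic coefficients: (intercept, infected, anxiety).
MODELS = {
    "smell_loss": (-2.0, 3.0, 0.1),
    "fatigue": (-2.0, 0.3, 2.5),
    "headache": (-1.5, 0.2, 1.0),
}
THRESHOLD = 0.20  # hypothetical cut-off on the average risk difference


def risk(coefs, infected, anxiety):
    b0, b_inf, b_anx = coefs
    return 1.0 / (1.0 + math.exp(-(b0 + b_inf * infected + b_anx * anxiety)))


def mean_infection_effect(coefs, anxieties):
    """Average of P(symptom | infected) - P(symptom | not infected) over the cohort."""
    diffs = [risk(coefs, 1, a) - risk(coefs, 0, a) for a in anxieties]
    return sum(diffs) / len(diffs)


def classify(effects, threshold=THRESHOLD):
    """Label each symptom by whether its average infection effect reaches the cut-off."""
    return {s: "infection-related" if e >= threshold else "weak infection effect"
            for s, e in effects.items()}


cohort_anxiety = [i / 10 for i in range(11)]  # hypothetical anxiety scores 0.0 ... 1.0
effects = {s: mean_infection_effect(c, cohort_anxiety) for s, c in MODELS.items()}
labels = classify(effects)
for symptom in sorted(effects, key=effects.get, reverse=True):
    print(f"{symptom:12s} {effects[symptom]:+.3f}  {labels[symptom]}")
```

A fixed threshold only shows the mechanics; a real analysis would also need uncertainty estimates for each average effect.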

Work Package 1: Counterfactual Machine Learning Implementation

Work Package 2: Extracting features relevant for long-/post-COVID development by AI interpretability methods
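As a hedged illustration of this work package, the sketch below computes permutation feature importance, one common model-agnostic interpretability method; the project does not specify which methods will be used, so this choice is an assumption. The model coefficients, feature names (including a deliberately irrelevant `noise` column) and data are all hypothetical. Each feature column is shuffled in turn and the resulting increase in log loss is reported: the larger the increase, the more the model relies on that feature.

```python
import math
import random

FEATURES = ["infected", "anxiety", "noise"]
# Hypothetical fitted model: the "noise" feature has zero weight by construction.
WEIGHTS = {"infected": 3.0, "anxiety": 2.0, "noise": 0.0}
INTERCEPT = -2.5


def predict(row):
    z = INTERCEPT + sum(WEIGHTS[f] * row[f] for f in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(rows, labels):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict(row), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def permutation_importance(rows, labels, rng, n_repeats=5):
    """Mean increase in log loss when one feature column is shuffled."""
    baseline = log_loss(rows, labels)
    scores = {}
    for f in FEATURES:
        increases = []
        for _ in range(n_repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)
            permuted = [dict(row, **{f: v}) for row, v in zip(rows, column)]
            increases.append(log_loss(permuted, labels) - baseline)
        scores[f] = sum(increases) / n_repeats
    return scores


rng = random.Random(7)
rows = [{"infected": float(rng.random() < 0.5), "anxiety": rng.random(), "noise": rng.random()}
        for _ in range(2000)]
# Synthetic labels drawn from the same hypothetical model.
labels = [1 if rng.random() < predict(r) else 0 for r in rows]
importance = permutation_importance(rows, labels, rng)
for f, s in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{f:10s} {s:.4f}")
```

Because the `noise` feature has zero weight, shuffling it leaves every prediction unchanged and its importance is exactly zero, which is a useful sanity check for the procedure.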

Work Package 3: Comparative Analysis

Coordination and Project Partners

Project coordinator:
Prof. Dr. rer. nat. Markus Scholz, University of Leipzig,
Institute for Medical Informatics, Statistics and Epidemiology

Partners:

Prof. Dr. Tim Beißbarth
Department of Medical Bioinformatics,
University Medical Center Göttingen

Prof. Dr. med. Rafael Mikolajczyk, Martin Luther University Halle-Wittenberg,
Institute for Medical Epidemiology, Biometrics, and Informatics

Nicole Rübsamen, PhD, University of Münster,
Institute of Epidemiology and Social Medicine

Prof. Dr. rer. hum. biol. Marvin N. Wright, Leibniz Institute for Prevention Research and Epidemiology – BIPS,
Department of Biometry and Data Management

Prof. Dr. rer. nat. Cord Spreckelsen, Jena University Hospital,
Institute of Medical Statistics, Computer and Data Sciences

Prof. Dr. med. Nils Opel, Jena University Hospital,
Department of Psychiatry