Loughborough University
Project details
In this project, we aim to use biostatistical and AI tools to identify the key factors that protect certain individuals from developing cancer despite being at high risk. Such a group is typically hard to identify, let alone to observe longitudinally. The ACCURATE project aims to identify cancer avoiders from population-level cancer screening data containing health, social status, lifestyle, and genetic information. Our approach will address why some people in high-risk groups remain cancer-free and how they age with a lower incidence of cancer. By constructing longitudinal observations, we can identify, monitor, and study a cancer-free cohort, ultimately helping to devise cancer prevention strategies for the general population.
Completing the project involves three main steps. First, collect and clean the datasets. Potential candidates include the Surveillance, Epidemiology, and End Results (SEER) data in the US, which contains information on cancer patients from 1973 onwards, and the National Family Health Survey data from India, which starts in 1992 and also includes blood analysis information. Other suitable datasets may also be explored. Second, merge the datasets using a range of approaches, including matching algorithms, deep learning, and probabilistic linkage. Third, analyse the merged dataset using various biostatistical and AI techniques. Convenience and snowball sampling methods may be needed, as may subsampling. Throughout, novel approaches will be developed to handle complex data issues (e.g., incomplete observations, competing risks, lack of internal consistency across countries and participants, and issues arising from fuzzy matching in the multimodal data). The project is expected to lead to novel multidisciplinary techniques and high-quality publications.
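For applicants unfamiliar with probabilistic linkage, the second step can be illustrated with a minimal sketch of Fellegi-Sunter style match weights. This is only an illustration, not the project's method: the field names, the m/u probabilities, and the threshold are hypothetical values chosen for the example, and in practice they would be estimated from the data.

```python
import math

# Hypothetical per-field probabilities (assumed for illustration only):
# m = P(field agrees | records refer to the same person)
# u = P(field agrees | records refer to different people)
FIELD_PARAMS = {
    "birth_year": (0.95, 0.02),
    "sex": (0.98, 0.50),
    "region": (0.90, 0.01),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # Incomplete observation: the field contributes no evidence.
            continue
        if a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def link(records_a, records_b, threshold=5.0):
    """Return (index_a, index_b, weight) for pairs at or above the threshold."""
    links = []
    for i, a in enumerate(records_a):
        for j, b in enumerate(records_b):
            w = match_weight(a, b)
            if w >= threshold:
                links.append((i, j, w))
    return links


if __name__ == "__main__":
    source_a = [{"birth_year": 1960, "sex": "F", "region": "North"}]
    source_b = [
        {"birth_year": 1960, "sex": "F", "region": "North"},
        {"birth_year": 1985, "sex": "M", "region": "South"},
    ]
    for i, j, w in link(source_a, source_b):
        print(f"record {i} in A links to record {j} in B (weight {w:.2f})")
```

The sketch compares every pair of records, which is only feasible for small inputs; population-scale data would normally need a blocking step first to limit the candidate pairs.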
To apply for this job please visit www.lboro.ac.uk.