Wanlin Li

Université de Sherbrooke
M.Sc. candidate

Supervisor: Nadia Tahiri
Start: 2021-09-01
End: 2023-08-31

Project

New algorithm to assess the environmental influence of Coronavirus through phylogeographic analysis
This thesis explores phylogeographic methods designed to unravel the interplay between divergence patterns in coronaviruses and environmental attributes, integrating genetic and climatic factors to discern the relationships underlying viral evolution and distribution. The research begins with the development of a Python-based phylogeographic analysis pipeline for investigating the relationship between genetic diversity and geographic distribution. The pipeline uses a sliding-window approach to identify regions of viral genetic sequences whose variation corresponds to regional climatic conditions. This unified pipeline orchestrates a range of analytical operations and runs on multiple operating systems.

Building on this foundation, an application is developed to improve the reproducibility and accessibility of the analysis. It uses Neo4j and Snakemake to support researchers in data preprocessing, parameter tuning, result storage, and data visualization. Real-world data, including genomic sequences, lineage information, population statistics, and climate data, are curated and integrated into a Neo4j graph database.

To extend the scope to other coronaviruses, the study incorporates host-virus cophylogeny analysis, horizontal gene transfer analysis, and other strategies. The study also addresses scalability and efficiency, which are crucial for accommodating growing datasets and evolving research requirements: the enhanced workflow executes tasks in parallel, improving performance. The results highlight key genomic fragments that correlate with specific environmental factors, supporting the platform's utility for deciphering complex evolutionary dynamics.
This research thus contributes to the field of phylogeography by providing researchers with a toolkit for investigating species distribution patterns and environmental influences. The insights derived from this study may help reveal the principles governing the interplay between genetic variation and geographic attributes across species.
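
The sliding-window idea in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The scoring rule is an assumed stand-in: for each window it computes the Pearson correlation between pairwise p-distances of the aligned fragments and pairwise absolute differences in a single climate variable (for example, mean temperature at each sampling site). The function names (`scan`, `p_distance`, `pearson`, `sliding_windows`) and the toy data are hypothetical, and the actual pipeline's scoring may differ.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned fragments, skipping gap positions."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sxy / (sx * sy)


def sliding_windows(length, size, step):
    """Yield (start, end) half-open coordinates of each full window."""
    for start in range(0, length - size + 1, step):
        yield start, start + size


def scan(alignment, climate, size, step):
    """Hypothetical scoring: correlate per-window genetic distances with climatic distances.

    alignment: {sequence_id: aligned sequence}, all of the same length
    climate:   {sequence_id: one climate value for the sampling location}
    """
    ids = sorted(alignment)  # fixed order so both distance vectors line up pair by pair
    pairs = list(combinations(ids, 2))
    climate_d = [abs(climate[i] - climate[j]) for i, j in pairs]
    length = len(alignment[ids[0]])
    results = []
    for start, end in sliding_windows(length, size, step):
        genetic_d = [
            p_distance(alignment[i][start:end], alignment[j][start:end])
            for i, j in pairs
        ]
        results.append((start, end, pearson(genetic_d, climate_d)))
    return results


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACTTTC"}
    mean_temp = {"A": 5.0, "B": 10.0, "C": 25.0}
    for start, end, r in scan(alignment, mean_temp, size=4, step=2):
        print(f"window {start}-{end}: r = {r:.2f}")
```

Windows with the highest correlation would be candidate regions whose variation tracks the climate variable. A real analysis would also need a significance test, such as the permutation procedure of a Mantel test, since a raw correlation over few sequence pairs is easily inflated.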