Department of Mathematics
Development of Statistical and Machine Learning Methods for Large Scale Genomic Data Analysis
This is an ITF project with the industry partner WeGene. Through the WeGene platform, genotype data have been collected from tens of thousands of individuals, together with phenotypic information such as adult height and 2D facial images. This study aims to develop statistical and machine learning methods for large-scale human genomic data analysis. Examples include identifying risk variants for different phenotypes in East Asian populations, predicting adult height, and reconstructing 2D facial shape from genomic data.
Integrating the WeGene IT architecture with newly developed machine learning methods, such as dimension reduction, regression models, and deep neural networks, provides a high-performance computational environment for handling genomic big data and for revealing links between human genomic data and complex human phenotypes. The techniques developed in this project not only facilitate health management (risk prediction for complex traits and diseases) but also provide new tools for connecting genomic data with imaging data.
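As a rough illustration of how dimension reduction and a regression model can be chained for height prediction, here is a minimal sketch on simulated genotypes coded as allele counts (0/1/2). It assumes PCA (via SVD of the standardized genotype matrix) followed by ridge regression on the principal component scores. The function names `simulate_genotypes`, `pca_ridge_fit` and `pca_ridge_predict`, the number of components and the ridge penalty are all hypothetical choices for illustration. This is not the project's actual method or the WeGene pipeline.

```python
import numpy as np


def simulate_genotypes(n_samples, n_snps, seed=0):
    """Hypothetical helper: simulate 0/1/2 genotypes under Hardy-Weinberg equilibrium."""
    rng = np.random.default_rng(seed)
    # Allele frequencies drawn uniformly in [0.05, 0.5] for each SNP.
    freq = rng.uniform(0.05, 0.5, size=n_snps)
    # Each genotype is the number of alleles carried: Binomial(2, freq).
    return rng.binomial(2, freq, size=(n_samples, n_snps)).astype(float)


def standardize(X):
    """Center and scale each SNP column; monomorphic SNPs keep a scale of 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - mean) / std, mean, std


def pca_ridge_fit(X, y, n_components=10, lam=1.0):
    """Dimension reduction (PCA via SVD) followed by ridge regression on PC scores."""
    Z, mean, std = standardize(X)
    # Rows of Vt are the principal axes in SNP space.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T            # shape (n_snps, k)
    P = Z @ V                          # PC scores, shape (n_samples, k)
    y_mean = y.mean()
    k = P.shape[1]
    # Closed-form ridge solution on the centered phenotype.
    beta = np.linalg.solve(P.T @ P + lam * np.eye(k), P.T @ (y - y_mean))
    return {"mean": mean, "std": std, "V": V, "beta": beta, "y_mean": y_mean}


def pca_ridge_predict(model, X):
    """Project new genotypes with the training standardization and principal axes."""
    Z = (X - model["mean"]) / model["std"]
    return model["y_mean"] + Z @ model["V"] @ model["beta"]
```

In use, one would fit on a training cohort, for example `model = pca_ridge_fit(X_train, height_train)`, and then call `pca_ridge_predict(model, X_test)`. Reusing the training mean, scale and principal axes keeps the test samples in the same coordinate system. Top principal components of genotype data often reflect population structure rather than the causal variants behind height, so real height predictors typically need far richer models than this sketch.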
Related research publications:
- X. Luo, C. Yang, Y. Wei. Detection of cell-type-specific risk-CpG sites in epigenome-wide association studies. Nature Communications, 10, 3113. 2019.
- Z. Yuan, H. Zhu, P. Zeng, S. Yang, S. Sun, C. Yang, J. Liu, X. Zhou. Testing and controlling for horizontal pleiotropy with the probabilistic Mendelian randomization in transcriptome-wide association studies. 11, 3861. 2020.
- Associate Professor, Department of Mathematics