
Yifeng Tao

Quantitative Researcher
Tel: 412-320-1617


Hello, I am a Quantitative Researcher at Citadel Securities.

I earned my Ph.D. in Computer Science from Carnegie Mellon University, where my research focused on machine learning in computational healthcare. I was fortunate to be advised by Prof. Russell Schwartz. In my first two years at CMU, I collaborated with Dr. William W. Cohen and Prof. Xinghua Lu. As an undergraduate, I worked with Prof. Jianyang Zeng. I hold a master's degree in machine learning from CMU and a bachelor's degree in automation (with a double major in economics) from Tsinghua University.


Research: Machine Learning in Cancer Genomics

Cancer arises from the accumulation of genomic alterations and develops into heterogeneous cell populations through an evolutionary process. The prognoses of cancer patients, such as survival profile, metastasis, and drug response, are therefore reflected in large volumes of genomic data. Our research focuses on personalized cancer medicine using machine learning and phylogenetic models (Thesis):
  • Reliable phenotype inference of cancer through well-designed, interpretable machine learning models. By leveraging large-scale genomic data and external biomedical knowledge bases, we have developed deep learning models for the accurate inference of cancer phenotypes, including transcriptome expression levels (Genomic Impact Transformer; GIT), transcription factor activities (Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism; CITRUS), and drug resistance (Contextual Attention-based Drug REsponse: CADRE; Deep Learning-based Graph Regularized Matrix Factorization: DeepGRMF). We addressed model interpretability through techniques such as attention mechanisms, which help identify driver mutations and critical biomarkers.
  • Revealing intra- and inter-tumor heterogeneity and the mechanisms of tumor progression via robust deconvolution and phylogenetic algorithms. We formulated the deconvolution of bulk tumor molecular data as a biologically inspired matrix factorization problem, and proposed first a neural network (Neural Network Deconvolution; NND) and then an improved hybrid optimizer (Robust and Accurate Deconvolution: RAD; RAD with single cells: RADs) to solve it robustly and accurately. We developed and applied a Minimum Elastic Potential (MEP) algorithm to reconstruct evolutionary trajectories from the unmixed clones. We also used mixed integer linear programming to solve deconvolution problems from bulk genomic data (FISH-Deconv; Joint-Clustering; TUSV-ext).
  • Improving prognostic prediction of cancer by incorporating machine learning and evolutionary methods. Clinicians have traditionally relied on pathological features and driver-level genomic profiles to guide treatment. However, critical clones, rather than the bulk tumor as a whole, may affect prognosis. We explored this question by integrating evolutionary mutational features, driver-level features, and clinical features to improve prognostic prediction. We developed an L0-regularized Cox regression model (Phylo-Risk) and found that evolutionary features account for roughly 1/3 of all the available features, depending on cancer type and sequencing technique.
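As a rough illustration of the matrix factorization view of deconvolution mentioned above, the Python sketch below factors a bulk expression matrix B (genes × samples) into nonnegative clone profiles C (genes × clones) and per-sample clone fractions F (clones × samples), with each column of F constrained to the probability simplex, so that B ≈ CF. It uses plain alternating projected gradient descent, a generic technique swapped in for illustration; it is not NND, RAD, or RADs, and the function names (project_simplex, deconvolve) and the synthetic data are hypothetical.

```python
import numpy as np


def project_simplex(V):
    """Project each column of V onto the probability simplex {f : f >= 0, sum(f) = 1}."""
    k, n = V.shape
    U = np.sort(V, axis=0)[::-1]                 # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    j = np.arange(1, k + 1)[:, None]
    cond = U - css / j > 0                       # true on a prefix of each column
    rho = k - 1 - np.argmax(cond[::-1], axis=0)  # last index where cond holds
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(V - theta, 0.0)


def loss(B, C, F):
    """Half squared Frobenius norm of the reconstruction error."""
    return 0.5 * np.linalg.norm(B - C @ F) ** 2


def deconvolve(B, k, n_iter=500, seed=0):
    """Hypothetical sketch: B ~= C @ F with C >= 0 and simplex-constrained columns of F."""
    rng = np.random.default_rng(seed)
    m, n = B.shape
    C = rng.random((m, k)) * B.mean()
    F = project_simplex(rng.random((k, n)))
    history = [loss(B, C, F)]
    for _ in range(n_iter):
        # Projected gradient step on F, step size 1 / Lipschitz constant of its gradient.
        L_F = np.linalg.norm(C.T @ C, 2) + 1e-12
        F = project_simplex(F - C.T @ (C @ F - B) / L_F)
        # Projected gradient step on C (nonnegativity constraint only).
        L_C = np.linalg.norm(F @ F.T, 2) + 1e-12
        C = np.maximum(C - (C @ F - B) @ F.T / L_C, 0.0)
        history.append(loss(B, C, F))
    return C, F, history


# Synthetic example: 50 genes, 3 hypothetical clones, 8 bulk samples.
rng = np.random.default_rng(1)
C_true = rng.random((50, 3))
F_true = project_simplex(rng.random((3, 8)))
B = C_true @ F_true
C, F, history = deconvolve(B, k=3)
```

Each block update takes a step of one over that block's Lipschitz constant, so the reconstruction error recorded in history should not increase; as with other nonconvex factorizations, the recovered clones depend on the random initialization and are identifiable only up to permutation.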


Note: * indicates equal contribution, indicates co-corresponding author.

Genome-Driven Personalized Medicine of Cancer via Machine Learning and Phylogenetic Models
Carnegie Mellon University Ph.D. Thesis. 2021.
Interpretable Deep Learning for Chromatin-Informed Inference of Transcriptional Programs Driven by Somatic Alterations Across Cancers
Nucleic Acids Research. 2022. Impact Factor=19.2
Improved Deconvolution of Combined Bulk and Single-Cell RNA-Sequencing Data
Cancer Research 82(12_Supplement):5031-5031. 2022. Impact Factor=12.7
Semi-Deconvolution of Bulk and Single-Cell RNA-Seq Data with Application to Metastatic Progression in Breast Cancer
Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2022.
Bioinformatics 38:i386-i394. 2022. Impact Factor=6.9
Reconstructing Tumor Clonal Lineage Trees Incorporating Single-Nucleotide Variants, Copy Number Alterations and Structural Variations
Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2022.
Bioinformatics 38:i125-i133. 2022. Impact Factor=6.9
De novo Prediction of Cell-Drug Sensitivities Using Deep Learning-based Graph Regularized Matrix Factorization
Proceedings of the Pacific Symposium on Biocomputing 27:278-289 (PSB). 2022. Oral
Joint Clustering of Single Cell Sequencing and Fluorescence in situ Hybridization Data for Reconstructing Clonal Heterogeneity in Cancers
Tumor Heterogeneity Assessed by Sequencing and Fluorescence in situ Hybridization (FISH) Data
Bioinformatics 37(24):4704-4711. 2021. Impact Factor=6.9
Assessing the Contribution of Tumor Mutational Phenotypes to Cancer Progression Risk
PLOS Computational Biology 17(3):e1008777. 2021. Impact Factor=4.4
Neural Network Deconvolution Method for Resolving Pathway-Level Progression of Tumor Clonal Expression Programs with Application to Breast Cancer Brain Metastases
Frontiers in Physiology 11:1055. 2020. Impact Factor=4.1
Predicting Drug Sensitivity of Cancer Cell Lines via Collaborative Filtering with Contextual Attention
Proceedings of the Machine Learning for Healthcare Conference (MLHC). 2020.
Proceedings of Machine Learning Research 126:660-684 (PMLR). 2020.
Robust and Accurate Deconvolution of Tumor Populations Uncovers Evolutionary Mechanisms of Breast Cancer Metastasis
Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2020. Oral
Bioinformatics 36:i407-i416. 2020. Impact Factor=6.9
From Genome to Phenome: Predicting Multiple Cancer Phenotypes based on Somatic Genomic Alterations via the Genomic Impact Transformer
Proceedings of the Pacific Symposium on Biocomputing 25:79-90 (PSB). 2020. Oral
Improving Personalized Prediction of Cancer Prognoses with Clonal Evolution Models
bioRxiv 761510. 2019.
Phylogenies Derived from Matched Transcriptome Reveal the Evolution of Cell Populations and Temporal Order of Perturbed Pathways in Breast Cancer Brain Metastases
Proceedings of the International Symposium on Mathematical and Computational Oncology 3-28 (ISMCO). 2019. Oral
Effective Feature Representation for Clinical Text Concept Extraction
Proceedings of the Clinical Natural Language Processing Workshop 1-14 (NAACL-ClinicalNLP). 2019. Oral
Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning
Proceedings of the Pacific Symposium on Biocomputing 24:112-123 (PSB). 2019.


Teaching Assistant