Yifeng Tao

Research: Machine Learning in Cancer Genomics

Cancer arises from the accumulation of genomic alterations and develops into heterogeneous cell populations through an evolutionary process. As a result, the prognoses of cancer patients, including survival, metastasis, and drug response, are encoded in large-volume genomic data. Our research focuses on personalized cancer medicine using machine learning and phylogenetic models (Thesis; Book):

Reliable phenotype inference of cancer through well-designed interpretable machine learning models. By leveraging large-scale genomic data and external biomedical knowledge bases, we have developed deep learning models for the accurate inference of cancer phenotypes, including transcriptome expression levels (Genomic Impact Transformer: GIT), transcription factor activities (Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism: CITRUS), and drug resistance (Contextual Attention-based Drug REsponse: CADRE; Deep Learning-based Graph Regularized Matrix Factorization: DeepGRMF). We addressed model interpretability through techniques such as attention mechanisms, which help identify driver mutations and critical biomarkers.
Revealing intra-/inter-tumor heterogeneity and the mechanisms of tumor progression via robust deconvolution and phylogenetic algorithms. We formulated the deconvolution of bulk tumor molecular data mathematically as a biologically inspired matrix factorization problem, and proposed a neural network (Neural Network Deconvolution: NND) and then an improved hybrid optimizer (Robust and Accurate Deconvolution: RAD; RAD with single-cells: RADs) to solve the problem robustly and accurately. We developed and applied a Minimum Elastic Potential (MEP) algorithm to reconstruct the evolutionary trajectory from the unmixed clones. We also used mixed integer linear programming to deconvolve bulk genomic data (FISH-Deconv; Joint-Clustering; TUSV-ext).
Improving prognostic prediction of cancer by incorporating machine learning and evolutionary methods. Clinicians have traditionally relied on pathological features and driver-level genomic profiles to guide treatment. However, it is possible that critical clones, rather than the bulk tumor as a whole, affect the prognoses. We explored this question by integrating evolutionary mutational features, driver-level features, and clinical features to improve the prognostic prediction of cancer. We developed an L0-regularized Cox regression model (Phylo-Risk), and found that the evolutionary features account for roughly 1/3 of all the available features, depending on cancer types and sequencing techniques.
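The matrix factorization view of bulk deconvolution mentioned above can be illustrated with a minimal sketch. It is not the NND, RAD, or RADs algorithm: it assumes a nonnegative bulk matrix B (genes x samples) that is approximated as C @ F, where C holds one expression profile per cell population and each column of F holds the population fractions of one sample. The names B, C, F, make_toy_bulk, and deconvolve are hypothetical and chosen only for this illustration, and the solver is a generic multiplicative-update scheme with a heuristic renormalization step.

```python
import numpy as np


def make_toy_bulk(n_genes=50, n_samples=20, k=3, seed=0):
    """Simulate toy bulk data B = C @ F from k hypothetical cell populations."""
    rng = np.random.default_rng(seed)
    C = rng.random((n_genes, k))                     # one profile per population
    F = rng.dirichlet(np.ones(k), size=n_samples).T  # k x samples, columns sum to 1
    return C @ F, C, F


def deconvolve(B, k, n_iter=500, seed=1, eps=1e-9):
    """Approximate nonnegative B (genes x samples) as C @ F.

    Generic sketch only: multiplicative updates keep C and F nonnegative,
    and F is renormalized so that each sample's fractions sum to 1.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = B.shape
    C = rng.random((n_genes, k)) + eps
    F = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update mixture fractions with the profiles held fixed.
        F *= (C.T @ B) / (C.T @ C @ F + eps)
        # Heuristic projection: fractions within each sample sum to 1.
        F /= F.sum(axis=0, keepdims=True)
        # Update profiles with the normalized fractions held fixed.
        C *= (B @ F.T) / (C @ F @ F.T + eps)
    return C, F
```

Calling deconvolve(B, k=3) on the matrix returned by make_toy_bulk() yields nonnegative factors whose product can be compared against B, for example through the relative Frobenius error. The recovered populations are identifiable only up to a permutation, so any comparison with the simulated C and F has to match components first.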

Publications

Note: * indicates equal contribution, † indicates co-corresponding author.

Genome-Driven Personalized Medicine of Cancer via Machine Learning and Phylogenetic Models

Yifeng Tao

Carnegie Mellon University Doctoral Dissertation. 2021.

Paper PDF Slides bibtex

Deep Learning-Based Root Cause Analysis of Process Cycle Images

Kimberly Jean Gietzen, Jingtao Liu, Yifeng Tao

Assignee: Illumina, Inc.

U.S. Patent. No. US12272050B2. 2025.

Patent family: AU2022214943A1, CA3187106A1, CN115812223B, EP4285276A1, IL299597A, JP2024505317A, KR20230136872A, US12272050B2, WO2022165278A1

Paper PDF bibtex

Chapter 3 - Machine Learning Applications in Cancer Genomics

Omar El-Charif, Russell Schwartz, Ye Yuan, Yifeng Tao

Editor(s): John Kang, Tim Rattay, Barry S. Rosenstein

Machine Learning and Artificial Intelligence in Radiation Oncology. 41-72. 2024.

Paper PDF bibtex

Interpretable Deep Learning for Chromatin-Informed Inference of Transcriptional Programs Driven by Somatic Alterations Across Cancers

Yifeng Tao*, Xiaojun Ma*, Drake Palmer, Russell Schwartz, Xinghua Lu, Hatice Ulku Osmanbeyoglu

Nucleic Acids Research. 50(19):10869-10881. 2022. Impact Factor=19.2

Paper PDF Preprint Code Data Slides Video bibtex

Improved Deconvolution of Combined Bulk and Single-Cell RNA-Sequencing Data

Haoyun Lei, Xiaoyan A. Guo, Yifeng Tao, Kai Ding, Xuecong Fu, Steffi Oesterreich, Adrian V. Lee, Russell Schwartz

Cancer Research 82(12_Supplement):5031-5031. 2022. Impact Factor=12.7

Paper bibtex

Semi-Deconvolution of Bulk and Single-Cell RNA-Seq Data with Application to Metastatic Progression in Breast Cancer

Haoyun Lei*, Xiaoyan A. Guo*, Yifeng Tao, Kai Ding, Xuecong Fu, Steffi Oesterreich, Adrian V. Lee, Russell Schwartz

Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2022.

Bioinformatics 38:i386-i394. 2022. Impact Factor=6.9

Paper PDF Code Video bibtex

Reconstructing Tumor Clonal Lineage Trees Incorporating Single-Nucleotide Variants, Copy Number Alterations and Structural Variations

Xuecong Fu, Haoyun Lei, Yifeng Tao, Russell Schwartz

Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2022.

Bioinformatics 38:i125-i133. 2022. Impact Factor=6.9

Paper PDF Code bibtex

De novo Prediction of Cell-Drug Sensitivities Using Deep Learning-based Graph Regularized Matrix Factorization

Shuangxia Ren*, Yifeng Tao*, Ke Yu, Yifan Xue, Russell Schwartz†, Xinghua Lu†

Proceedings of the Pacific Symposium on Biocomputing 27:278-289 (PSB). 2022. Oral

Paper PDF Preprint Code Poster bibtex

Joint Clustering of Single Cell Sequencing and Fluorescence in situ Hybridization Data for Reconstructing Clonal Heterogeneity in Cancers

Xuecong Fu, Haoyun Lei, Yifeng Tao, Kerstin Heselmeyer-Haddad, Irianna Torres, Michael Dean, Thomas Ried, Russell Schwartz

Journal of Computational Biology 28(11):1035-1051. 2021.

Paper PDF Code bibtex

Tumor Heterogeneity Assessed by Sequencing and Fluorescence in situ Hybridization (FISH) Data

Haoyun Lei*, E. Michael Gertz*, Alejandro A. Schäffer, Xuecong Fu, Yifeng Tao, Kerstin Heselmeyer-Haddad, Irianna Torres, Xulian Shi, Kui Wu, Guibo Li, Liqin Xu, Yong Hou, Michael Dean, Thomas Ried, Russell Schwartz

Bioinformatics 37(24):4704-4711. 2021. Impact Factor=6.9

Paper PDF Preprint Code Video bibtex

Assessing the Contribution of Tumor Mutational Phenotypes to Cancer Progression Risk

Yifeng Tao, Ashok Rajaraman, Xiaoyue Cui, Ziyi Cui, Haoran Chen, Yuanqi Zhao, Jesse Eaton, Hannah Kim, Jian Ma†, Russell Schwartz†

PLOS Computational Biology 17(3):e1008777. 2021. Impact Factor=4.4

Paper PDF Code bibtex

Neural Network Deconvolution Method for Resolving Pathway-Level Progression of Tumor Clonal Expression Programs with Application to Breast Cancer Brain Metastases

Yifeng Tao, Haoyun Lei, Adrian V. Lee, Jian Ma, Russell Schwartz

Frontiers in Physiology 11:1055. 2020. Impact Factor=4.1

Paper PDF Code bibtex

Predicting Drug Sensitivity of Cancer Cell Lines via Collaborative Filtering with Contextual Attention

Yifeng Tao*, Shuangxia Ren*, Michael Q. Ding, Russell Schwartz†, Xinghua Lu†

Proceedings of the Machine Learning for Healthcare Conference (MLHC). 2020.

Proceedings of Machine Learning Research 126:660-684 (PMLR). 2020.

Paper PDF Code Slides Video Poster bibtex

Robust and Accurate Deconvolution of Tumor Populations Uncovers Evolutionary Mechanisms of Breast Cancer Metastasis

Yifeng Tao, Haoyun Lei, Xuecong Fu, Adrian V. Lee, Jian Ma, Russell Schwartz

Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2020. Oral

Bioinformatics 36:i407-i416. 2020. Impact Factor=6.9

Paper PDF Code Slides Video bibtex

From Genome to Phenome: Predicting Multiple Cancer Phenotypes based on Somatic Genomic Alterations via the Genomic Impact Transformer

Yifeng Tao, Chunhui Cai, William W. Cohen†, Xinghua Lu†

Proceedings of the Pacific Symposium on Biocomputing 25:79-90 (PSB). 2020. Oral

Paper PDF Preprint Code Data Slides bibtex

Improving Personalized Prediction of Cancer Prognoses with Clonal Evolution Models

Yifeng Tao, Ashok Rajaraman, Xiaoyue Cui, Ziyi Cui, Jesse Eaton, Hannah Kim, Jian Ma†, Russell Schwartz†

bioRxiv 761510. 2019.

PDF Preprint Code bibtex

Phylogenies Derived from Matched Transcriptome Reveal the Evolution of Cell Populations and Temporal Order of Perturbed Pathways in Breast Cancer Brain Metastases

Yifeng Tao, Haoyun Lei, Adrian V. Lee, Jian Ma, Russell Schwartz

Proceedings of the International Symposium on Mathematical and Computational Oncology 3-28 (ISMCO). 2019. Oral

Paper PDF Code Slides bibtex

Effective Feature Representation for Clinical Text Concept Extraction

Yifeng Tao, Bruno Godefroy, Guillaume Genthial, Christopher Potts

Proceedings of the Clinical Natural Language Processing Workshop 1-14 (NAACL-ClinicalNLP). 2019. Oral

Paper PDF Preprint Code Data Slides Media bibtex

Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning

Haohan Wang*, Xiang Liu*, Yifeng Tao, Wenting Ye, Qiao Jin, William W. Cohen, Eric P. Xing

Proceedings of the Pacific Symposium on Biocomputing 24:112-123 (PSB). 2019.

Paper PDF Preprint Code bibtex
