Research Directions

Cell lineage and differentiation

schematic of cell lineage tree and cell differentiation map 

A multicellular organism is the result of a complex process involving division and differentiation of cells. A fundamental goal of developmental biology is to understand (1) the ancestral relationship between the cells, i.e. the cell lineage tree, and (2) all of the intermediate cell types and cell type transitions, i.e. the cell differentiation map, that occur during development. Recent advances in high-throughput lineage tracing technologies hold the key to uncovering the lineage tree and differentiation map for complex organisms. Lineage tracing technologies employ genome editing tools, such as CRISPR-Cas9 editing, in conjunction with single-cell sequencing methods, such as scRNA-seq, to simultaneously measure the lineage and state of individual cells in complex organisms such as zebrafish and mouse. However, inference of the cell lineage tree and the cell differentiation map from lineage tracing data remains challenging because (i) genome editing tools often introduce the same mutations in cells with different lineages (homoplasy) making it hard to accurately trace the ancestry of cells, (ii) lineage tracers do not measure the cell type of the ancestral cells, many of which may have progenitor cell types not observed in the extant cells of the organism and (iii) lineage tracing data has errors and missing information (dropout). We develop specialized algorithms for accurate lineage tree inferene and cell differentiation mapping from lineage tracing data.

Cancer genome evolution

schematic of cancer evolution 

Cancer is an evolutionary process in which somatic mutations across all genomic scales accumulate in a population of cells. This process results in a heterogeneous tumor with subpopulations of cells, called clones, with distinct genomes. Characterization of this intra-tumor heterogeneity and reconstruction of the evolutionary history of the tumor, known as the tumor phylogeny, is crucial for understanding cancer progression and developing effective therapies for treatment. Single-cell DNA sequencing (scDNA-seq) has emerged as a powerful tool for achieving this, enabling parallel genomic sequencing of thousands of individual cells from a tumor. However, current scDNA-seq technologies are error-prone and produce data with missing information. As such, there is a need for robust and scalable computational methods to characterize the heterogeneity within cancer tumors from scDNA-seq data. We develop comprehensive models of cancer evolution that incorporate prevoiusly neglected biological processes and introduce novel problem statements to analyze single-cell sequencing data in innovative ways.

Infectious disease evolution and transmission

alt text 

During an outbreak, pathogens undergo evolutionary changes as they are transmitted from one host to another. Pathogens, such as viruses or bacteria, accumulate genetic mutations or undergo genetic recombination during replication within a host's body. Consequently, the host harbors multiple variants of the pathogen, which is known as within-host diversity. Identifying and characterizing the diversity of pathogen variants is crucial for timely outbreak management and predicting the effectiveness of treatment for infected patients. We work on problems such as quantifying the prevalence of variants of interest (VoIs) of SARS-CoV-2 in wastewater; deciphering the diversity of SARS-CoV-2 within and across patients; assembling coronavirus transcripts; and de novo identification of genes in coronavirus genomes.