Pinar Demetci

(she/her/hers)

Brown University

machine learning, computational biology, computational genomics, multi-modal learning

I am a fifth-year Ph.D. student in computer science and computational biology at Brown University, advised by Dr. Ritambhara Singh. My primary research interests are to (1) develop machine learning algorithms tailored to the unique challenges of biomolecular and biomedical data and (2) apply these algorithms to dissect the genomic regulatory mechanisms behind heterogeneous diseases. My doctoral dissertation is on probabilistic and statistical algorithms for the integrated analysis of multi-modal data taken at single-cell resolution. Before joining Brown, I received my bachelor's degree in engineering with a concentration in bioengineering from Olin College of Engineering and spent a year as a research associate at the Massachusetts Institute of Technology.

Probabilistic machine learning algorithms for integrated analysis of single-cell multi-modal data

With advances in sequencing technologies, genomics has increasingly become a data-driven field. In particular, recently developed single-cell sequencing assays have increased the data output by yielding an individual profile for every cell in a given sample. This increased resolution helps to reveal fine-grained biological heterogeneity in tissues. Moreover, because these assays can profile different aspects of the genome, single-cell datasets can help scientists study how different genomic features interact to regulate the cell and how these regulatory mechanisms differ between cells. Thus, single-cell measurements are increasingly used in precision medicine research to understand heterogeneous diseases and tailor treatment to individuals. However, computational pattern recognition challenges must be addressed before single-cell datasets can reach their potential in precision medicine. First, although some experimental protocols can combine different measurements on the same single cell, most combinations cannot be measured jointly because sequencing experiments are destructive, so the datasets must be integrated computationally. Second, even when multiple aspects are profiled, the high dimensionality of the feature space and the complexity of the interactions mean that machine learning algorithms are needed to infer genomic regulatory relationships.

To address these challenges, the goal of my research is to (1) computationally integrate multi-modal single-cell datasets when they cannot be measured together, (2) learn cross-modal relationships from multi-modal datasets to infer genomic regulatory mechanisms, and (3) study how these mechanisms change across cell states or conditions, with potential therapeutic applications. Towards the first goal, I have developed an optimal transport-based algorithm, SCOT, to integrate single-cell datasets from different modalities. Unlike existing single-cell integration algorithms, SCOT performs cell-level alignment in a fully unsupervised manner and can heuristically self-tune its hyperparameters when no validation data is available. It can additionally align more than two datasets at once and account for disparities across datasets, such as mismatched cell-type proportions. An extension of this method, SCOOTR, uses an alternating optimization procedure to solve simultaneously for a cell-level correspondence matrix and a feature-level correspondence matrix, thus also revealing genomic regulatory relationships. This task is more challenging in an unsupervised setting, but weak supervision on one level (e.g., cell alignments) significantly improves performance on the other level (e.g., feature alignments). Lastly, to infer the regulatory changes associated with different cellular states and to propose interventions for altering cell states, I am developing a graph neural network-based algorithm that models gene regulatory networks with probabilistically learned edge sparsities and counterfactual explanations.
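
As a rough illustration of the kind of optimal transport-based cell alignment described above, the following Python sketch couples two datasets that share no features by comparing their within-dataset distance structure (an entropic Gromov-Wasserstein formulation). It is not the SCOT implementation: the choice of Gromov-Wasserstein, the function names, the random toy data, and the values of `eps` and the iteration counts are all assumptions made for illustration, and the hyperparameter self-tuning mentioned above is not shown.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distances between rows of X, scaled to a maximum of 1."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    D = np.sqrt(d2)
    return D / D.max()

def sinkhorn(p, q, M, eps, n_iter=200):
    """Entropic OT coupling for cost M with marginals p (rows), q (columns)."""
    # Shifting M by a constant does not change the solution but avoids underflow.
    K = np.exp(-(M - M.min()) / eps)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u = p / (K @ v)
        v = q / (K.T @ u)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, p, q, eps=0.05, n_outer=50):
    """Entropic Gromov-Wasserstein with squared loss (illustrative sketch).

    Each outer step linearizes the objective around the current coupling T
    and solves the resulting entropic OT problem with Sinkhorn iterations.
    """
    T = np.outer(p, q)  # start from the independent coupling
    const = (C1 ** 2 @ p)[:, None] + (C2 ** 2 @ q)[None, :]
    for _ in range(n_outer):
        cost = const - 2.0 * C1 @ T @ C2.T
        T = sinkhorn(p, q, cost, eps)
    return T

# Toy stand-ins for two modalities measured on different cells (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))   # hypothetical modality A: 30 cells, 5 features
Y = rng.normal(size=(40, 8))   # hypothetical modality B: 40 cells, 8 features

p = np.full(30, 1.0 / 30)      # uniform weight on each cell
q = np.full(40, 1.0 / 40)
T = entropic_gw(pairwise_dist(X), pairwise_dist(Y), p, q)

# Barycentric projection: place each A cell at the coupling-weighted
# average of B cells, so both datasets live in B's feature space.
X_in_Y = (T / T.sum(axis=1, keepdims=True)) @ Y
```

The coupling matrix `T` is a soft cell-to-cell correspondence, and the barycentric projection on the last line is one common way to turn such a coupling into aligned coordinates. Because the objective is non-convex, the result depends on the initialization and on `eps`, so in practice the output would need to be checked against whatever validation signal is available.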