Fatemehsadat Mireshghallah

(she/her/hers)

UC San Diego

Privacy, Fairness, Natural Language Processing

Fatemehsadat Mireshghallah is a Ph.D. candidate in the CSE department at UC San Diego. Her research interests are Trustworthy Machine Learning and Natural Language Processing. She is a recipient of the 2020 National Center for Women & IT (NCWIT) Collegiate Award for her work on privacy-preserving inference, a finalist of the 2021 Qualcomm Innovation Fellowship, and a winner of the 2022 Rising Star Award in Adversarial Machine Learning.

How much can we trust large language models?

Large language models (LLMs, e.g., GPT-3, TNLG, T5) have shown remarkably high performance on standard benchmarks, owing to their high parameter counts, extremely large training datasets, and significant compute. Although the high parameter count makes these models more expressive, it can also lead to higher memorization, which, coupled with large, unvetted, web-scraped datasets, can cause multiple negative societal and ethical impacts: leakage of private, sensitive information (i.e., LLMs are ‘leaky’), generation of biased text (i.e., LLMs are ‘sneaky’), and generation of hateful or stereotypical text (i.e., LLMs are ‘creepy’). In this talk, I will go over how these issues affect the trustworthiness of LLMs, and zoom in on how we can measure the leakage and memorization of these models. Finally, I will discuss what it would actually mean for LLMs to be privacy preserving, and what future research directions there are for making large models trustworthy.