research interests

I'm interested in exploring methodologies that help us understand and bridge the socio-technical gap in AI systems, particularly in how these systems are evaluated, deployed, and governed.


Currently thinking about:

  • Improving evaluations: e.g., how can we better assess evaluations and datasets to ensure construct and claim validity?
  • Context-specificity: e.g., how do we operationalize evaluations in multilingual and code-switching environments?
  • Performance robustness: e.g., to what extent are model behaviors stable, and hence evaluation results robust, across perturbations, shifts, or rephrasings? (A toy sketch of this check follows below.)
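
As a toy illustration of the robustness bullet, the sketch below checks whether a model gives the same answer to several rephrasings of one question. Everything here is a stand-in: `toy_model` is a dummy classifier, the rephrasings are made up, and the consistency score is just majority agreement, not a prescribed protocol.

```python
# Minimal sketch: how stable are a model's answers across rephrasings of the
# same question? `toy_model` is a placeholder for whatever system is under test.
from collections import Counter


def toy_model(prompt: str) -> str:
    """Dummy model: answers 'no' if the prompt contains 'not', else 'yes'."""
    return "no" if "not" in prompt.lower() else "yes"


def consistency(prompts: list[str], model=toy_model) -> float:
    """Fraction of rephrasings that agree with the majority answer.

    1.0 means every rephrasing gets the same answer; lower values flag
    behavior that is sensitive to surface form rather than content.
    """
    answers = [model(p) for p in prompts]
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)


if __name__ == "__main__":
    # Three rephrasings of one underlying question.
    rephrasings = [
        "Is the sky blue on a clear day?",
        "On a clear day, would you say the sky is blue?",
        "The sky is not any color other than blue on a clear day, right?",
    ]
    print(f"consistency = {consistency(rephrasings):.2f}")  # 0.67 here
```

The same scaffolding extends to other perturbations (typos, dialectal or code-switched variants), which is where the robustness and context-specificity questions meet.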
If you're also thinking about the above, let's chat.

selected projects & work