About
I work at the intersection of reinforcement learning and large models, focusing on scalable RL, preference training, and post-training techniques for deployed models. My work spans model training, efficiency at scale, and practical post-training recipes for large language and multimodal models.
Experience
Senior Research Scientist — Google DeepMind
June 2025 – present
Scalable RL & post-training; core post-training work for released Gemini models.
Senior Member of Technical Staff — Cohere
Sept 2024 – June 2025
Research on RL, preference training, and model merging. Technical lead on multiple model releases.
Researcher / Intern roles — Vector Institute, University of Toronto, Cerebras
Awards & Scholarships
- NSERC Research Award (2020)
- Google Summer of Code (2021)
- Multiple Academic In-course Scholarships (2020–2022)
Select Publications
- Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs — ACL 2024
- Intriguing Properties of Quantization at Scale — NeurIPS 2023
- Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier — (multi-author)
- Self-Improving Robust Preference Optimization — ICLR 2025
- Extremely Parameter-Efficient MoE for Instruction Tuning — ICLR 2024
- RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs — EMNLP 2024 (Oral)