Yujia Bao

Machine Learning Researcher

I am a machine learning researcher and a lifelong engineer. I love building things and pushing the frontier of AI to make it more useful, safe, and available to everyone. Currently, I am an Associate Director at Accenture, where I lead the research and engineering development of AI Refinery, an agentic AI platform designed to help Fortune 500 companies build and govern complex agentic workflows.

My recent research spans scalable agent architectures, context management, LLM post-training, and evaluation. I am most excited about research that has direct product impact.

I received my Ph.D. in Computer Science from MIT CSAIL, advised by Regina Barzilay.

Experience & Education

Professional Experience

2023 - Current

Associate Director

Accenture

Leading a team of 80+ researchers and engineers. Developing AI Refinery, an agentic AI platform for enterprise.

2022 - 2023

Lead Machine Learning Scientist

Insitro

Machine learning research for drug discovery and development.

2017 - 2022

Researcher

MIT CSAIL

Machine learning research on interpretability, transfer learning, and fairness.

Education

2017 - 2022

Ph.D. in Computer Science

MIT CSAIL

Advisor: Regina Barzilay

2016 - 2017

M.A. in Mathematics

University of Wisconsin–Madison

2012 - 2016

B.S. in Mathematics

Shanghai Jiao Tong University

Recent Work

AI Refinery: Enterprise Agentic Platform

Leading engineering and research for AI Refinery, enabling developers to build and govern complex agentic workflows.

Enterprise LLM Customization

Leading the development of customized LLMs for high-stakes business and media domains.

Machine Learning Foundations

Foundational research on efficient post-training, inference, hierarchical memory, cross-user collaboration, agent evaluation, and more. See the publications below for details.

Publications

See Google Scholar for the full list of publications.

2026

MCP-Bench: Benchmarking tool-using LLM agents with complex real-world tasks via MCP servers

Zhenting Wang, Qi Chang, Hemani Patel, Shashank Biju, Cheng-En Wu, Quan Liu, Aolin Ding, Alireza Rezazadeh, Ankit Shah, Yujia Bao, Eugene Siow
International Conference on Learning Representations (ICLR)

DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning

Yaxuan Wang, Chris Yuhao Liu, Quan Liu, Jinlong Pang, Wei Wei, Yujia Bao, Yang Liu
International Conference on Learning Representations (ICLR)

2025

PromptBridge: Cross-Model Prompt Transfer for Large Language Models

Yaxuan Wang, Quan Liu, Zhenting Wang, Zichao Li, Wei Wei, Yang Liu, Yujia Bao
arXiv:2512.01420

WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao
arXiv:2510.06587

SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models

Gyuhak Kim, Sumiran Singh Thakur, Su Min Park, Wei Wei, Yujia Bao
arXiv:2506.15021

Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control

Alireza Rezazadeh, Zichao Li, Ange Lou, Yuying Zhao, Wei Wei, Yujia Bao
arXiv:2505.18279

Advertising in AI systems: Society must be vigilant

Menghua Wu, Yujia Bao
arXiv:2505.18425

Enhancing Retrieval Systems with Inference-Time Logical Reasoning

Felix Faltings, Wei Wei, Yujia Bao
Annual Meeting of the Association for Computational Linguistics (ACL)

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang
Conference on Neural Information Processing Systems (NeurIPS)

H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models

Martin Kuo, Jianyi Zhang, Aolin Ding, Qinsi Wang, Louis DiValentin, Yujia Bao, Wei Wei, Hai Li, Yiran Chen
arXiv:2502.12893

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Jinlong Pang, Jiaheng Wei, Ankit Parag Shah, Zhaowei Zhu, Yaxuan Wang, Chen Qian, Yang Liu, Yujia Bao, Wei Wei
International Conference on Learning Representations (ICLR)

LLM Unlearning via Loss Adjustment with Only Forget Data

Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Parag Shah, Yujia Bao, Yang Liu, Wei Wei
International Conference on Learning Representations (ICLR)

From isolated conversations to hierarchical schemas: Dynamic tree memory representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei, Yujia Bao
International Conference on Learning Representations (ICLR)

Sample, estimate, aggregate: A recipe for causal discovery foundation models

Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola
Transactions on Machine Learning Research (TMLR)

2024

Harnessing business and media insights with large language models

Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N Barrere, Shelley Evenson, Rahul Basole, Connie Miao, et al.
arXiv:2406.06559

Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words

Yujia Bao, Srinivasan Sivanandan, Theofanis Karaletsos
International Conference on Learning Representations (ICLR)

2023

Contextual Vision Transformers for Robust Representation Learning

Yujia Bao, Theofanis Karaletsos
arXiv:2305.19402

2022

Learning to Split for Automatic Bias Detection

Yujia Bao, Regina Barzilay
arXiv:2204.13749

Learning Stable Classifiers by Transferring Unstable Features

Yujia Bao, Shiyu Chang, Regina Barzilay
International Conference on Machine Learning (ICML)

2021

Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers

Yujia Bao, Shiyu Chang, Regina Barzilay
International Conference on Machine Learning (ICML)

2020

Few-shot Text Classification with Distributional Signatures

Yujia Bao*, Menghua Wu*, Shiyu Chang, Regina Barzilay
International Conference on Learning Representations (ICLR)

2018

Deriving Machine Attention from Human Rationales

Yujia Bao, Shiyu Chang, Mo Yu, Regina Barzilay
Conference on Empirical Methods in Natural Language Processing (EMNLP)