Dang Nguyen

I am a third-year Ph.D. student in Computer Science at the University of Maryland, College Park, where I am advised by Professor Tianyi Zhou. Previously, I was an AI Research Resident at VinAI Research (since acquired by Qualcomm AI Research), working under the mentorship of Professor Luu Anh Tuan. I received my B.E. in Computer Engineering from the Ho Chi Minh City University of Technology.

My research focuses on Large Foundation Models (LFMs) as agents that act via programmatic actions, often referred to as code agents (e.g., DynaSaur 🦖 and smolagents). I am particularly interested in raising agent performance on real-world tasks by strengthening the capabilities of the underlying LFMs, rather than relying on complex agent scaffolding.

Toward this goal, my work centers on three recurring directions:

Characterizing failure modes of code agents across different base LFMs (e.g., small vs. large, open- vs. closed-source, reasoning vs. non-reasoning),
Understanding why these failures occur through the lens of language model interpretability and programmatic analysis,
Developing new learning methods to mitigate these failure modes.

news

Jan 26, 2026	One paper accepted to ICLR 2026. FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Sep 18, 2025	One paper accepted to NeurIPS 2025. See you in San Diego! 🏖️ ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
Jul 07, 2025	One paper accepted to COLM 2025. See you in Montreal! 🇨🇦 DynaSaur 🦖: Large Language Agents Beyond Predefined Actions
May 15, 2025	Our survey on GUI Agents was accepted to ACL 2025 (Findings).
Apr 10, 2025	I will join Amazon (NYC) as an Applied Scientist Intern this summer! 🍎🏙🚕🗽🐀

selected publications

COLM 2025

DynaSaur 🦖: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan A. Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, and Tianyi Zhou

In Proceedings of the Conference on Language Modeling, 2025

Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly-scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or when existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard.
ACL 2025

GUI Agents: A Survey

Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Md. Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Jihyung Kil, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, and Franck Dernoncourt

In Findings of the Association for Computational Linguistics, 2025

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.
EMNLP 2022

Textual Manifold-based Defense Against Natural Language Adversarial Examples

Dang Nguyen and Anh Tuan Luu

In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2022

Despite the recent success of large pretrained language models in NLP, they are susceptible to adversarial examples. Concurrently, several studies on adversarial images have observed an intriguing property: the adversarial images tend to leave the low-dimensional natural data manifold. In this study, we find that a similar phenomenon occurs in the contextualized embedding space of natural sentences induced by pretrained language models, in which the embeddings of textual adversarial examples tend to diverge from the manifold of natural sentence embeddings. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that learns the embedding space manifold of the underlying language model and projects novel inputs back to the approximated structure before classification. Through extensive experiments, we find that our method consistently and significantly outperforms previous defenses under various attack settings while leaving clean accuracy unaffected. To the best of our knowledge, this is the first manifold-based defense adapted to the NLP domain.