Som Sagar
Hi! I am a second-year computer science PhD student at Arizona State University advised by Prof. Ransalu Senanayake and affiliated with the Laboratory for Learning Evaluation of autoNomous Systems (LENS).
My research focuses on developing robust and adaptable machine learning models, with an emphasis on reinforcement learning and uncertainty estimation. I aim to build systems that handle distribution shifts and new information effectively in dynamic environments. By combining foundation models, reinforcement learning, and real-world applicability, I seek to enhance model interpretability, improve generalization, and optimize decision-making in AI systems.
Previously, I received a B.Tech. with Honors in Computer Science from the Indian Institute of Information Technology (IIIT), Kottayam.
I am originally from Kerala, India, and outside of research, I enjoy spending time outdoors, especially playing soccer, hiking, and swimming.
Please feel free to reach out about research or any advice I can help with!
[Email]
[CV]
[Google Scholar]
[LinkedIn]
[GitHub]
[Blog]
|
|
Selected Research
Please see my CV or Google Scholar for a full list of work.
|
|
Failures are fated, but can be faded: Characterizing and mitigating unwanted behaviors in large-scale vision and language models
Som Sagar, Aditya Taparia, Ransalu Senanayake
International Conference on Machine Learning (ICML), 2024 (Spotlight, top 3%)
[PDF]
[Video]
[Code]
We introduce a framework that maps the failure landscape of large vision and language models and mitigates failures by realigning model behavior with human preferences, whether stylistic or ethical.
|
|
Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making
Som Sagar*, Aditya Taparia*, Harsh Mankodiya, Pranav Bidare, Yifan Zhou, Ransalu Senanayake
NeurIPS Workshop on Safe & Trustworthy Agents, 2024
[PDF]
We introduce BaTCAV, a Bayesian TCAV framework with uncertainty estimates that enhances the interpretability of robotic actions across both simulation platforms and real-world robotic systems.
|
|
LLM-Assisted Red Teaming of Diffusion Models through "Failures Are Fated, But Can Be Faded"
Som Sagar, Aditya Taparia, Ransalu Senanayake
NeurIPS Workshop on Red Teaming GenAI: What Can We Learn from Adversaries?, 2024
[PDF]
This extension of the "Failures Are Fated" work demonstrates how LLM-assisted methods can generate rewards and states for red teaming diffusion models. Additionally, we incorporate various RL search strategies to optimize the discovery process.
|
|
Explainable Concept Generation through Vision-Language Preference Learning
Aditya Taparia, Som Sagar, Ransalu Senanayake
NeurIPS Workshop on Interpretable AI: Past, Present and Future, 2024
[PDF]
We propose Reinforcement Learning-based Preference Optimizing exploration (RLPO), a method designed to generate explainable states within classification models, enabling the discovery of interpretable states that may be difficult or impossible for humans to identify.
|
|
ExpressivityArena: Can LLMs Express Information Implicitly?
Joshua Tint, Som Sagar, Aditya Taparia, Caleb Liu, Kelly Raines, Bimsara Pathiraja, Ransalu Senanayake
NeurIPS Workshop on Behavioral Machine Learning, 2024
[PDF]
We introduce ExpressivityArena, a framework designed to evaluate the expressiveness of large language models (LLMs), enabling systematic assessment of their ability to convey nuanced and implicit information across various contexts.
|
Website template from here.
|
|