pulkitverma.txt
Recent advances have brought AI systems closer to laypeople, raising the question of how such users can assess whether an AI system will be safe in a given situation. This is challenging because end users do not design most AI systems, and a system's internal software may be unavailable or may change as it learns. To address these issues, in this talk I will present a paradigm for third-party autonomous assessment of black-box AI systems, built around four key desiderata: interpretability (describing the AI's functionality in a form users can understand), correctness (ensuring the description is accurate), generalizability (working across types of AI systems), and minimal requirements (not placing complex demands on AI manufacturers to support assessment). I will present algorithms for user-aligned autonomous assessment that generate a description of the AI system's capabilities. Through theoretical results and empirical evaluations, I will show that (i) a query-response interface enables efficiently deriving accurate, user-interpretable capability models, and (ii) such descriptions are easier for users to understand and reason with than primitive actions.
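To make the query-response idea concrete, here is a minimal, purely illustrative sketch: an assessor poses yes/no "can you execute this capability in this state?" queries to a simulated black-box agent and infers which conditions the capability requires. The agent, the fluent names, the conjunctive-precondition assumption, and the query form are all hypothetical simplifications for illustration, not the actual system described in the talk.

```python
FLUENTS = ["door_open", "has_key", "battery_charged"]

class BlackBoxAgent:
    """Simulated black-box AI with a hidden precondition for one capability.

    Hypothetical stand-in: the assessor never reads this rule directly,
    it only observes the agent's yes/no answers to queries.
    """
    _hidden_precondition = {"door_open", "battery_charged"}

    def can_execute(self, capability, state):
        # Query interface: returns True iff the capability is executable
        # in the given state (a set of fluents that currently hold).
        return self._hidden_precondition <= state

def infer_preconditions(agent, capability, fluents):
    """Infer required fluents via queries, assuming a conjunctive precondition."""
    required = set()
    full = set(fluents)
    for f in fluents:
        # Ask: does the capability still work when only fluent f is false?
        if not agent.can_execute(capability, full - {f}):
            required.add(f)  # removing f breaks execution, so f is required
    return required

agent = BlackBoxAgent()
model = infer_preconditions(agent, "open_door", FLUENTS)
print(sorted(model))  # ['battery_charged', 'door_open']
```

Under the conjunctive-precondition assumption, this recovers the hidden requirement with only one query per fluent, and the output is a user-readable condition list rather than opaque internal state: a toy version of the interpretable capability models the abstract refers to.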