Projects
The Doctor Will (Still) See You Now: On the Structural Limits of Agentic AI in Healthcare
A qualitative interview study with 20 stakeholders examining how agentic AI is defined, evaluated, and constrained in healthcare, identifying three mutually reinforcing tensions: conceptual fragmentation, an autonomy contradiction, and an evaluation blind spot.
Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing
A mixed-methods study examining inter-rater reliability among three psychiatrists evaluating 360 LLM-generated mental health responses, revealing systematic expert disagreement driven by incompatible clinical frameworks rather than measurement error.
Responsible AI in the Global Context
A global survey-based study exploring responsible AI practices across 1,000 organizations spanning 20 industries and 19 regions, defining a conceptual RAI maturity model.
The Measurement Imbalance in Agentic AI Evaluation Undermines Industry Productivity Claims
A systematic review of 84 papers (2023–2025) exposing an evaluation imbalance in agentic AI, where technical metrics dominate (83%) while human-centered, safety, and economic dimensions remain peripheral, with a proposed four-axis evaluation framework.
More than Marketing? On the Information Value of AI Benchmarks for Practitioners
A qualitative interview study with 19 practitioners in academia, product, and policy examining how AI benchmarks are used to inform decision-making, finding that benchmarks serve as relative performance indicators but often lack the real-world relevance needed for substantive deployment decisions.
LeRAAT: LLM-Enabled Real-Time Aviation Advisory Tool
A real-time advisory system leveraging large language models to assist aviation professionals with decision-making during complex operational scenarios.