William Agnew

Postdoctoral Fellow, Carnegie Mellon University

About William

Agnew’s research uses audits, human subjects research, and critical analysis to predict and understand the impacts of AI on the world. To mitigate the harms and broaden the benefits that this work surfaces, Agnew conducts research in partnership with policymakers, civil society, and impacted communities to identify and close the knowledge gaps that block progress. Agnew also develops technical tools that broad audiences can use to mitigate AI harms. Agnew’s work has been supported by grants from the Ford Foundation, MacArthur Foundation, UK AI Security Institute, and Luminate Foundation, and by CBI and NDSEG fellowships.

Contributions

In the News

Research discussed by Laura Reiley, in "What My Daughter Told ChatGPT before She Took Her Life," The New York Times, August 18, 2025.
Research discussed by Kashmir Hill & Dylan Freedman, in "Chatbots Can Go Into a Delusional Spiral. Here’s How It Happens.," The New York Times, August 8, 2025.
Quoted by Eileen Guo in "A Major AI Training Data Set Contains Millions of Examples of Personal Data," MIT Technology Review, July 18, 2025.
Research discussed by Elizabeth Gibney, in "Wake Up Call for AI: Computer-Vision Research Increasingly Used for Surveillance," Nature, June 25, 2025.
Quoted by Reece Rogers & Victoria Turk in "OpenAI’s Sora Is Plagued by Sexist, Racist, and Ableist Biases," Wired, March 23, 2025.
Quoted by Melissa Heikkilä in "How This Grassroots Effort Could Make AI Voices More Diverse," MIT Technology Review, November 15, 2024.
Quoted by Reece Rogers in "Here's How Generative AI Depicts Queer People," Wired, April 2, 2024.
Research discussed by Emanuel Maiberg, in "How the “Surveillance AI Pipeline” Literally Objectifies Human Beings," 404 Media, September 28, 2023.
Research discussed by Rachel Crowell, in "Why AI’s Diversity Crisis Matters, and How to Tackle It," Nature, May 19, 2023.
Quoted by Khari Johnson in "How to Stop Robots from Becoming Racist," Wired, August 18, 2022.
Research discussed by Pranshu Verma, in "These Robots Were Trained on AI. They Became Racist and Sexist.," The Washington Post, July 16, 2022.
Quoted by Karen Hao in "Inside the Fight to Reclaim AI From Big Tech’s Control," MIT Technology Review, June 14, 2021.
Quoted by Khari Johnson in "Black and Queer AI Groups Say They'll Spurn Google Funding," Wired, May 10, 2021.
Quoted by Tom Simonite in "AI and the List of Dirty, Naughty, Obscene, and Otherwise Bad Words," Wired, February 4, 2021.

Publications

"Computer-Vision Research Powers Surveillance Technology" (with Pratyusha Ria Kalluri, Myra Cheng, Kentrell Owens, Luca Soldaini, and Abeba Birhane). Nature 643 (2025): 73-79.

Investigates the link between computer vision research and mass surveillance, revealing that the field is deeply intertwined with technologies that enable the monitoring and targeting of human bodies.

"Expressing Stigma and Inappropriate Responses Prevents LLMs From Safely Replacing Mental Health Providers" (with Jared Moore, Declan Grabb, Kevin Klyman, Stevie Chancellor, Desmond C. Ong, and Nick Haber). Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (2025): 599-627.

Investigates the use of large language models to replace mental health providers. Reviews established therapy guidelines and tests current LLMs like GPT-4o and finds that LLMs fail to meet key aspects of effective therapy. Concludes that LLMs should not replace therapists, and discusses alternative roles for LLMs in clinical therapy.

"A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset" (with Rachel Hong, Jevan Hutson, Imaad Huda, Tadayoshi Kohno, and Jamie Morgenstern), arXiv, June 2025.

Investigates the contents of web-scraped data for training AI systems, at sizes where human dataset curators and compilers no longer manually annotate every sample. Provides concrete evidence to support the concern that any large-scale web-scraped dataset may contain personal data.

"The Values Encoded in Machine Learning Research" (with Abeba Birhane, Pratyusha Kalluri, Dallas Card, Ravit Dotan, and Michelle Bao). Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (2022): 173-184.

Examines the values embedded in machine learning research by analyzing 100 highly cited papers from ICML and NeurIPS. Findings show that few papers connect their research to social needs or discuss potential harms. 

"Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" (with Jesse Dodge, Maarten Sap, Ana Marasović, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, and Matt Gardner). Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2021): 1286–1305.

Provides some of the first documentation for the Colossal Clean Crawled Corpus (C4), a dataset created by applying a set of filters to a single snapshot of Common Crawl. Examines where the data comes from and assesses the effects of C4’s filtering process.