Liberatory AI evaluation facilitates the creation of more just, effective, and community-centered technology by shifting the power of assessment into the hands of those most impacted by these systems.
In a liberatory AI framework, evaluation is not an afterthought, a marketing tool, or a pretense of accountability – it is the fundamental process through which communities collaboratively define what “good AI” looks like, who it serves, and how it aligns with their collective values. Instead of being shaped by opaque corporate interests, evaluation is driven by transparent, participatory methods that ensure AI systems remain responsive to the communities they impact.
For example, a community might evaluate an AI system based on how well it preserves cultural knowledge, promotes social cohesion, or addresses environmental concerns, all of which are often intentionally or unintentionally neglected in traditional technical evaluations. This approach acknowledges that communities have crucial lived experience and domain expertise about their own contexts that others lack, making them uniquely positioned to determine whether an AI system is truly serving their interests.
Liberatory AI evaluation is also an ongoing process. It does not treat evaluation as a one-time step before deploying a model; instead, it is an iterative, participatory process in which communities continuously provide feedback, refine success criteria, and hold AI developers accountable, ensuring that systems remain grounded in real-world conditions and community needs.
Shifting the power of assessment in this way ensures that AI is held accountable to real human needs rather than to corporate benchmarks that prioritize efficiency and profitability. When communities define what success looks like, AI systems are evaluated not by abstract performance metrics but by their tangible impact on people’s lives. This prevents the common scenario in which technologies score highly in controlled testing settings (e.g. benchmarks) yet fail miserably in real-world contexts, exacerbating harm rather than solving meaningful problems.
Unlike traditional AI evaluation, which applies one-size-fits-all benchmarks, liberatory AI evaluation recognizes that different communities have different values, histories, and aspirations. Success in AI should not be dictated by a single metric, but by criteria shaped by each community that interacts with the AI system. For example, a predictive policing model might appear to reduce crime according to a benchmark dataset built on its creators’ own subjective choices, yet fail to meet community needs when people experience it as invasive, discriminatory, and reinforcing of existing power imbalances.
A liberatory approach also plays a crucial role in uncovering vulnerabilities and reducing harm. As discussed previously, many AI failures stem from evaluation practices that ignore the very communities on which a system fails. For example, commercial facial recognition systems continue to be deployed despite well-documented failures in recognizing darker-skinned individuals. In a liberatory approach, an AI system is not deemed “successful” unless it works well for the people it serves – not just those who resemble the data it was trained on.
Beyond AI outcomes, community-driven evaluation also facilitates capacity building. The process of collaboratively defining evaluation metrics gives communities opportunities to gain technological literacy and agency, empowering them to engage with, critique, and ultimately modify and own these systems. Rather than being passive recipients of technology imposed upon them, community members become active participants in shaping AI’s role in their lives.
To make liberatory AI evaluation a reality, we need infrastructure that empowers communities to define, oversee, and enforce evaluation criteria – ensuring that AI systems are accountable to the people they impact, not just to corporate interests. This requires spaces for collective decision-making, accessible tools for auditing AI systems, and transparent documentation and accountability mechanisms.
Liberatory AI evaluation begins with community participation in shaping how AI systems are evaluated. This includes building collaborative spaces (e.g. workshops, focus groups, co-design sessions) that allow diverse community members to share their values, concerns, and priorities, followed by collaborative processes that translate those values into concrete measures. One real-world example is Smith et al.’s study on fairness in recommendation systems, in which researchers co-designed fairness metrics with content creators and dating app users. Participants shared how existing AI systems had failed them, leading to new, community-driven evaluation criteria that better reflected their lived experiences. This research demonstrates that AI evaluation is more effective and just when the people affected by these systems actively shape the metrics that define success.
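To make this concrete, here is a minimal sketch of how one community-chosen criterion could be turned into a checkable measure. It is a hypothetical illustration, not a metric from Smith et al.’s study: assume content creators agree that every provider group should receive at least a minimum share of recommendation exposure, with the floor itself decided during co-design sessions.

```python
from collections import defaultdict

def provider_exposure(recommendation_lists, item_to_provider):
    """Share of total recommendation slots going to each provider group."""
    counts = defaultdict(int)
    total = 0
    for rec_list in recommendation_lists:
        for item in rec_list:
            counts[item_to_provider[item]] += 1
            total += 1
    return {provider: n / total for provider, n in counts.items()}

def meets_minimum_exposure(exposure, providers, floor=0.05):
    """Hypothetical community criterion: every provider group receives at least
    `floor` of total exposure. The floor is a value agreed on in co-design, not a default."""
    return all(exposure.get(p, 0.0) >= floor for p in providers)

# Toy usage with invented items and groups
recs = [["a1", "b1", "a2"], ["a3", "b2", "a1"]]
item_to_provider = {"a1": "group_A", "a2": "group_A", "a3": "group_A",
                    "b1": "group_B", "b2": "group_B"}
exposure = provider_exposure(recs, item_to_provider)
print(exposure, meets_minimum_exposure(exposure, {"group_A", "group_B"}))
```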
To support these tracking, auditing, and evaluation processes, accessible tools that do not require specialized expertise should be in place. An important example is Wikibench, a community-driven AI evaluation framework that lets Wikipedia contributors collaboratively curate, refine, and validate evaluation datasets, enabling collective governance over evaluation criteria. Similar models could be applied in other domains – imagine workers co-creating fairness metrics for hiring AI, or moderators evaluating the content moderation algorithms they work alongside.
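The sketch below illustrates one way such collaborative curation might be modeled. It is an assumption-laden toy, not Wikibench’s actual data model or API: community members attach labels to candidate evaluation examples, and disagreement above a community-set threshold routes the example back to discussion rather than being averaged away.

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class CuratedExample:
    """One candidate evaluation example, labeled by multiple community members.
    Illustrative schema only; field names and thresholds are invented."""
    content: str
    labels: dict = field(default_factory=dict)  # contributor name -> label

    def add_label(self, contributor: str, label: str) -> None:
        self.labels[contributor] = label

    def consensus(self, threshold: float = 0.75):
        """Return the majority label only if agreement passes a community-set threshold;
        otherwise flag the example for further discussion instead of silently resolving it."""
        if not self.labels:
            return None, "unlabeled"
        top_label, top_count = Counter(self.labels.values()).most_common(1)[0]
        agreement = top_count / len(self.labels)
        return (top_label, "accepted") if agreement >= threshold else (top_label, "needs_discussion")

# Example: contributors disagree, so the example is routed back to discussion
ex = CuratedExample("Candidate edit to article X, flagged as possibly damaging")
for contributor, label in [("alice", "damaging"), ("bob", "good_faith"), ("carol", "damaging")]:
    ex.add_label(contributor, label)
print(ex.consensus())  # ('damaging', 'needs_discussion') at a 0.75 agreement threshold
```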
Beyond community-driven evaluation and auditing tools, we also need transparent documentation systems that record AI models, datasets, evaluation suites, and the deliberative processes through which they were developed. Making this information public provides a basis for community audits that check whether systems have actually met community-defined metrics, and when failures occur, there should be mechanisms through which communities can challenge these systems.
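As a rough illustration of what such documentation might contain, the following sketch defines a hypothetical record schema (the field names, metric names, and thresholds are invented) that pairs a model’s measured results with the community-agreed criteria and the deliberation behind them, so an audit can flag any criterion that was not met.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EvaluationRecord:
    """Illustrative public documentation record: enough information for a community
    audit to check whether agreed deployment criteria were actually satisfied."""
    model_name: str
    model_version: str
    datasets: list            # datasets used for training and evaluation
    evaluation_suite: list    # names of community-defined metrics
    results: dict             # metric name -> measured value
    community_criteria: dict  # metric name -> threshold agreed in deliberation
    deliberation_notes: str   # summary of (or link to) the co-design and review sessions

    def unmet_criteria(self):
        """Metrics whose measured values fall short of the community-agreed thresholds.
        For simplicity, every metric here is treated as higher-is-better."""
        return [m for m, threshold in self.community_criteria.items()
                if self.results.get(m, float("-inf")) < threshold]

record = EvaluationRecord(
    model_name="moderation-assist",
    model_version="0.3",
    datasets=["community-curated-eval-v2"],
    evaluation_suite=["minimum_exposure", "label_agreement_rate"],
    results={"minimum_exposure": 0.04, "label_agreement_rate": 0.82},
    community_criteria={"minimum_exposure": 0.05, "label_agreement_rate": 0.75},
    deliberation_notes="Summary of co-design workshop decisions, with a link to public minutes",
)
print(json.dumps(asdict(record), indent=2))
print("Unmet criteria:", record.unmet_criteria())  # flags 'minimum_exposure'
```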
Further Reading
- Smith, J.J., Satwani, A., Burke, R., & Fiesler, C. (2024). Recommend Me? Designing Fairness Metrics with Providers. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency.