This paper operationalizes a committee-based performance diagnostic framework that combines inter-model agreement, consensus entropy, and borderline rate to support interpretable monitoring of AI coding of student text when labeled data are unavailable. In a pilot application to nursing simulation reflections, these complementary metrics revealed distinct ensemble patterns, including stable consensus and divergence between agreement and decisiveness. The results illustrate how committee diagnostics can enable ongoing oversight of AI coding as systems encounter new learners, contexts, and language use at scale.
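
As a concrete reading of the three diagnostics, the sketch below computes them for a committee of binary coders. It is a minimal illustration under assumed conventions: the function name `committee_diagnostics`, the 0/1 vote encoding, and the `borderline_band` threshold are hypothetical choices, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def committee_diagnostics(votes: np.ndarray, borderline_band: float = 0.1):
    """Compute committee diagnostics for an ensemble of binary coders.

    votes: array of shape (n_models, n_items) with 0/1 codes per model.
    borderline_band: assumed half-width around an even split that
        marks an item as borderline.
    Returns mean pairwise agreement, mean consensus entropy (bits),
    and the borderline rate.
    """
    n_models, _ = votes.shape

    # Inter-model agreement: fraction of items on which each pair of
    # models assigns the same code, averaged over all pairs.
    pair_agreements = [
        np.mean(votes[i] == votes[j])
        for i, j in combinations(range(n_models), 2)
    ]
    agreement = float(np.mean(pair_agreements))

    # Consensus entropy: Shannon entropy of the committee's per-item
    # vote share, averaged over items (0 = unanimous, 1 = evenly split).
    p = votes.mean(axis=0)            # share of models voting 1 per item
    q = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    entropy = float(np.mean(-q * np.log2(q) - (1 - q) * np.log2(1 - q)))

    # Borderline rate: fraction of items whose vote share falls within
    # the band around 0.5, i.e. where the committee is nearly split.
    borderline = float(np.mean(np.abs(p - 0.5) <= borderline_band))

    return agreement, entropy, borderline

# Example: five models coding twelve reflections.
rng = np.random.default_rng(0)
votes = rng.integers(0, 2, size=(5, 12))
print(committee_diagnostics(votes))
```

Because agreement averages over model pairs while entropy and borderline rate summarize the per-item vote distribution, the three quantities can move independently, which is what allows agreement and decisiveness to diverge in the pilot results.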