Methodology

Evidence Base for the Five-Dimension Rubric

A reference document for beta reviewers · First Light Assessment Tool

Overview

First Light evaluates course assignments for susceptibility to completion by LLMs. Its rubric is organized around five dimensions grounded in peer-reviewed research. The central question is not “can an LLM do this?” but whether the assignment creates conditions in which LLM output is indistinguishable from evidence of student learning (Lodge et al., 2023).

A note on method: several studies in this field measure how well an LLM can grade student work rather than how susceptible an assignment is to being completed by AI. We treat those as useful background on LLM capability, not as a scoring basis, and have kept them out of the dimension evidence below.
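As a rough illustration of how the five dimensions discussed in the sections below could be held in one place, here is a minimal Python sketch. The dimension names and guiding questions are taken from this document; everything else is an assumption made for illustration, including the names `Dimension`, `DIMENSIONS`, and `validate_scores` and the 0 to 3 scale. This document does not specify a scoring scale or an aggregation rule, so the sketch only checks that a set of scores is complete and in range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One rubric dimension: its name and the guiding question it asks."""
    name: str
    guiding_question: str


# Names and questions are copied from the dimension sections of this document.
DIMENSIONS = (
    Dimension(
        "Context Specificity",
        "Does the assignment require knowledge only a specific student "
        "in a specific course could plausibly have?",
    ),
    Dimension(
        "Task Openness",
        "Is the prompt broad and genre-predictable, or constrained and novel?",
    ),
    Dimension(
        "Process Visibility",
        "Are there mechanisms that make the student's learning process visible, "
        "such as drafts, reflections, oral defenses, or iteration?",
    ),
    Dimension(
        "Output Type",
        "What deliverable does the assignment require?",
    ),
    Dimension(
        "Verification Surface",
        "Can the instructor plausibly distinguish between a student who "
        "learned and one who prompted?",
    ),
)


def validate_scores(scores: dict[str, int], max_score: int = 3) -> dict[str, int]:
    """Check that every dimension has exactly one in-range score.

    The 0..max_score scale is hypothetical; the source does not define one.
    """
    expected = {d.name for d in DIMENSIONS}
    given = set(scores)
    if given != expected:
        missing = sorted(expected - given)
        unknown = sorted(given - expected)
        raise ValueError(f"missing: {missing}; unknown: {unknown}")
    for name, value in scores.items():
        if not 0 <= value <= max_score:
            raise ValueError(f"{name}: score {value} is outside 0..{max_score}")
    return dict(scores)
```

A reviewer's scores could then be passed as a plain dictionary keyed by dimension name, for example `validate_scores({d.name: 2 for d in DIMENSIONS})`. The sketch deliberately stops short of combining the scores into a single number, since no weighting is stated here.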

Context Specificity

Does the assignment require knowledge only a specific student in a specific course could plausibly have?

Source	Finding
Paustian & Slinger (2024)	On a specialized, course-specific prompt, generic LLM output scored well below genuine student work (~55% vs ~80%), which is direct evidence that course-bound grounding measurably degrades AI completion quality.
Bernabei et al. (2023)	LLMs produced fluent generic description but failed on distinctions taught in specific lectures; assignments bound to course-provided material resist generic AI completion.

Task Openness

Is the prompt broad and genre-predictable, or constrained and novel?

Source	Finding
Ding (2025)	Fully open-ended projects let an LLM gravitate to the familiar, easily generated instances it handles best; constrained, interconnected, multi-step tasks are markedly more AI-resilient.
Akbar (2025)	Across 50 real assignments, broadly framed conceptual and definitional tasks were highly AI-solvable (>70%), while context-rich, higher-order problems scored lowest.
Bernabei et al. (2023)	Assignments framed as standard academic genres (compare/contrast, summary, discussion post) were consistently easier for LLMs than constrained or novel tasks.

Process Visibility

Are there mechanisms that make the student's learning process visible, such as drafts, reflections, oral defenses, or iteration?

Source	Finding
Saltan (2025)	In a large software-engineering course, short video assignments in which students explain their own work curbed AI-assisted misconduct and increased engagement.
Birks & Clare (2023)	Viva-style oral defenses of unsupervised work raise both the effort required for AI-facilitated misuse and its perceived risk, functioning as a direct deterrent.
Ncube et al. (2025)	A systematic review concludes that process-based, multi-stage, and oral assessments are central to maintaining integrity in AI-infused learning environments.

Output Type

What deliverable does the assignment require? Some output types are high-frequency in LLM training data; others introduce constraints LLMs struggle with.

Source	Finding
Pudasaini et al. (2024)	Text-based deliverables (essays, reports, homework) are the output types most consistently associated with AI-generated plagiarism.
Shepherd (2025)	Unsupervised text deliverables (essays, reports, projects) align directly with current LLM strengths and carry the highest structural exposure; live, oral, and practical formats carry the lowest.
Bernabei et al. (2023)	LLMs produce fluent descriptive prose but markedly weaker original analysis; formats dominated by description are more susceptible than those demanding sustained reasoning.

Verification Surface

Can the instructor plausibly distinguish between a student who learned and one who prompted?

Source	Finding
Paustian & Slinger (2024)	Verification depends on structural features of the assignment, not detection tools alone; assignments with no secondary verification mechanism leave instructors without a reliable basis for judgment.
Weber-Wulff et al. (2023)	A large comparative test of AI-text detectors found them neither consistently accurate nor reliable, and systematically biased toward labeling AI-generated text as human.
Perkins et al. (2024)	Simple paraphrasing and adversarial edits reliably bypass detectors, which also carry high false-positive rates for non-native writers, making detector-only verification structurally insufficient.
Lodge et al. (2023)	The most durable verification comes from assessment design that makes the student's reasoning process observable, since detection alone is increasingly unreliable.

References

01 Akbar, M. S. (2025). Beyond detection: Designing AI-resilient assessments with automated feedback to foster critical thinking. arXiv:2503.23622.
02 Bernabei, M., Colabianchi, S., Falegnami, A., & Costantino, F. (2023). Students' use of large language models in engineering education. Computers and Education: Artificial Intelligence, 5, 100172.
03 Birks, D., & Clare, J. (2023). Linking artificial intelligence facilitated academic misconduct to existing prevention frameworks. International Journal for Educational Integrity, 19, 20.
04 Ding, K. (2025). Designing AI-resilient assessments using interconnected problems: A theoretically grounded and empirically validated framework. arXiv:2512.10758.
05 Lodge, J. M., Thompson, K., & Corrin, L. (2023). Mapping the implications of generative artificial intelligence for academic integrity. Australasian Journal of Educational Technology.
06 Ncube, P. D. N., Dzvapatsva, G. P., Matobobo, C., & Ranga, M. M. (2025). Redefining student assessment in AI-infused learning environments: A systematic review of challenges and strategies for academic integrity. AI and Ethics.
07 Paustian, T., & Slinger, B. (2024). Students are using large language models and AI detectors can often detect their use. Frontiers in Education.
08 Perkins, M., Roe, J., Vu, B., Postma, D., Hickerson, D., & McGaughran, J. (2024). Simple techniques to bypass GenAI text detectors: Implications for inclusive education. International Journal of Educational Technology in Higher Education, 21.
09 Pudasaini, S., Miralles-Pechuán, L., Lillis, D., & Salvador, M. (2024). Survey on AI-generated plagiarism detection. Journal of Academic Ethics, 23, 1137–1170.
10 Saltan, A. (2025). Enhancing learning and mitigating AI-assisted misconduct: A case of using video assignments in a high-enrollment software engineering course. Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering.
11 Shepherd, C. (2025). Generative AI misuse potential in cyber security education. arXiv:2501.12883.
12 Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O., Šigut, P., & Waddington, L. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19, 26.