Development and Validation of a Rating Scale for Academic Writing Assessment: Collaborative Agency
Date
2025
Publisher
University of Melbourne
Abstract
The aim of formative assessment is multi-faceted: it informs both language teaching and learning in the L2 academic writing classroom. Research on formative classroom-based assessment, which emphasizes the active involvement of learners in their language acquisition and learning processes and nurtures student agency, has received far less attention than summative assessment and large-scale testing. More specifically, the study of second language learners’ agency in language education is relatively new, and only a few studies have explored student agency in the development of rating scales for classroom-based writing assessment in L2 contexts.
To contribute to existing knowledge of student agency in rating scale development, this dissertation documents the development and validation of a rating scale for the assessment of L2 academic writing in a formative assessment context. The research employed an exploratory sequential mixed-methods design to collect procedural, quantitative, and qualitative evidence, which was analyzed and evaluated to support the inferences in the validity argument for the rating scale.
The study was conducted in two phases: a scale development phase and a scale validation phase. The development phase aimed to produce a rating scale for academic writing assessment in the L2 classroom by adopting a collaborative agency model, through which writing teachers and college-level undergraduate students worked together to construct the scale. Four scale development focus groups, comprising both teachers and students, were held to elicit qualitative data on the construct, properties, and level descriptors of the rating scale, following a multiple-source approach to scale development. The qualitative data were analyzed using qualitative content analysis (QCA) to inform scale development.

The validation phase sought to validate the interpretations and uses of the scale-based scores, drawing on the argument-based validity framework. In this phase, the scale was first operationally implemented by the participating teachers to rate essays of different types, following an incomplete but connected rating design. Many-facet Rasch measurement (MFRM) and classical test theory (CTT) analyses were employed to analyze the rating data. In addition to this measurement-based evidence, post-implementation focus groups with teachers and students were conducted to explore their perceptions of the scale’s functionality as well as its potential washback effect on teaching and learning. The appraisal of the scale’s validity argument involved triangulating the evidence gathered at different stages of the study to establish that the operationalized scale served its intended purposes from both psychometric and scale-user perspectives.
The findings indicate that the validity evidence collected throughout the study provided adequate support for the overarching validity argument, and that the scale has the potential to support teaching and learning positively in the current educational context. Additionally, the findings show that collaborative agency had a positive washback effect on the teaching and learning of academic writing, and that both teachers and students reflected positively on the experience overall. The collaboration between writing teachers and students revealed some differences in assessment preferences between the two groups, which can illuminate teaching and learning in L2 writing classrooms. The study offers implications for collaborative agency in scale development, for scale development in L2 writing classroom contexts, and for validation research in classroom settings. Future research can adapt the current model of collaborative agency in scale development for formative assessment to explore further differences between teachers’ and students’ assessment preferences and to determine whether these findings hold in other classroom contexts.
Keywords
language testing, language assessment