Use case: grading 30 language exams in one afternoon
A realistic afternoon of AI-assisted grading: from the stack on the desk to the marks in the gradebook, step by step, no shortcuts.
Marta is a language teacher at a state secondary school. Four classes, a department head role that weighs on her schedule, a six-year-old daughter. She's been teaching for fifteen years and she's seen every version of grading: red pen on paper, stapled sheets with margin comments, gradebook spreadsheets, official platforms.
This is the story of one specific afternoon, a Wednesday in February. We tell it because it's the kind of day many teachers will recognize, with its small obstacles and its small reliefs. No product names until the end, because what matters is the dynamic.
The starting point
She gets home at 14:45. She ate in the car. She has just under an hour and a half before picking up her daughter from school. On the living-room table, thirty exams from the medieval-literature unit of 9th grade, collected that morning. The deadline for marks in the gradebook is Monday. She has an evaluation meeting on Tuesday.
She does the math in her head: 30 exams, eight questions each, half a page of written comments per exam, criterion-by-criterion records on a five-category rubric. At her usual pace (eleven or twelve minutes per exam to do it well), that's about six hours. Four evenings with a child in the house, or a full Saturday. She knows the math; she's done it a hundred times.
This time, though, she's going to try something different.
14:50 — Preparation: the rubric as anchor
The first thing she does isn't to start grading. It's to open her notebook and review the rubric she's going to apply. She has it from last year, but she changed three exam questions, so she adjusts two descriptors. It takes her ten minutes. It's not lost time: it's the highest-return investment she'll make all afternoon.
The final rubric has five criteria:
- Comprehension of the medieval text (identifying context, author, genre).
- Analysis of literary devices (simile, metaphor, parallelism, anaphora).
- Production of critical text (structure, argumentation, citation).
- Written expression (spelling correctness, lexical richness).
- Connection to prior knowledge (transversality, intertextuality).
Each with five levels and observable descriptors. The whole rubric fits on one A4 page. Marta knows from experience that one well-written page will save her twenty micro-decisions per exam.
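For readers who want to see what "a rubric a tool can apply" might look like, here is a minimal sketch in Python. It is not how Marta's tool stores rubrics (the story doesn't say); the `Criterion` class, the numeric 1-5 levels, and the `validate_scores` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Five levels per criterion, as in the story. Numbering them 1-5 is an assumption.
LEVELS = range(1, 6)


@dataclass(frozen=True)
class Criterion:
    name: str
    aspects: tuple[str, ...]  # what the teacher looks for under this criterion


# The five criteria from Marta's rubric.
RUBRIC = (
    Criterion("Comprehension of the medieval text", ("context", "author", "genre")),
    Criterion("Analysis of literary devices", ("simile", "metaphor", "parallelism", "anaphora")),
    Criterion("Production of critical text", ("structure", "argumentation", "citation")),
    Criterion("Written expression", ("spelling correctness", "lexical richness")),
    Criterion("Connection to prior knowledge", ("transversality", "intertextuality")),
)


def validate_scores(scores: dict[str, int]) -> None:
    """Raise ValueError unless every criterion has exactly one level in 1-5."""
    expected = {c.name for c in RUBRIC}
    if set(scores) != expected:
        raise ValueError(f"criteria mismatch: {sorted(set(scores) ^ expected)}")
    for name, level in scores.items():
        if level not in LEVELS:
            raise ValueError(f"{name}: level {level} is outside 1-5")
```

A check like this is the mechanical version of what the one-page rubric does for Marta: every exam gets exactly five decisions, each on the same scale.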
15:00 — Capture: bulk scanning
Now she photographs the thirty exams. She doesn't use the school scanner because she doesn't have time. She uses her phone, on the living-room table, with good natural light. She separates them into piles of ten so she doesn't lose track. In ten minutes she has thirty exams digitized, ordered by student name.
The photos aren't perfect. A few have a finger in the corner, two are slightly tilted, one has a lamp shadow. It doesn't matter: current technology handles that without trouble. Last year, Marta would have wasted twenty minutes with the scanner and another twenty with OCR. Today, she just takes pictures.
15:15 — Automated grading: the first reading
She uploads the thirty exams to the grading tool she's testing. She gives it the rubric she just updated. She presses "start".
While the AI processes, she goes to the kitchen and pours herself a coffee. When she comes back, eight minutes later, the thirty exams are graded. Each has a proposed level for each criterion, with the specific sentence from the exam quoted as evidence.
Marta doesn't accept anything yet. What she has in front of her is a draft, as if a fast and very literal intern had done a first pass. Now comes the part that she has to do herself.
15:25 — Quick review: pass through everyone
She opens the first exam. She looks at the proposed score for each criterion and the evidence backing it. For the first criterion (comprehension of the medieval text), the AI correctly cited a fragment where the student recognizes the genre: level "Good". Marta agrees and confirms with one click.
For the second criterion (literary devices), the AI marked "Satisfactory" because the student identified the simile but mistook the metaphor for a personification. Marta reads the student's response, sees that yes, there's a slip, but the analysis of the simile is excellent. She raises the level to "Good" because she considers the partial knowledge weighs more than the specific error. The tool registers the change.
And so on. Her pace settles at 40-60 seconds per exam for the cases where she agrees with the proposal and 2-3 minutes for the ones requiring adjustment. Of the thirty exams, she confirms or lightly adjusts 24 and flags 6 for deep review.
By 16:10 she's finished the initial pass. Forty-five minutes, thirty exams, all with marks assigned and evidence reviewed.
16:10 — Pause: the daughter
She closes the laptop. She goes to pick up her daughter. Snack, the park, homework. Her head pings a couple of times but she doesn't engage: she knows she'll come back to grading at 19:30 when her daughter is in bed, and that instead of having four hours of pending work she has one well-defined hour for the six special cases.
This is the difference she already notices. Other years, that afternoon with her daughter would have been contaminated by the weight of the exams waiting. Today, no. The grading is 80% done and she knows it.
19:30 — Focused attention: the cases that matter
Back to the living room. The six exams flagged for deep review are the ones that genuinely require her judgment. Three are cases where the proposed mark seemed low for what she knows about the student; one is a very original exam the AI scored conservatively; two are exams with unusual responses that deserve a complete read.
For those six, Marta does what she has always done: read the whole exam, decide the mark, write personalized feedback. She doesn't use the AI's proposal as a reference: she ignores it and goes back to the original exam. Each takes between 8 and 12 minutes. In total, about an hour for the six.
The longest one is a student who wrote a brilliant critical analysis but with serious spelling errors. The rubric gives her "Satisfactory" overall because of spelling. Marta decides the whole deserves "Very Good" and writes in the personalized feedback why: "Your analysis is one of the best I've read this term. The overall mark is brought down by spelling errors you need to fix before the next exam, but I want you to know your critical thinking is at the level of upper grades. Keep going."
That kind of comment is what only Marta can do. It's what justifies the profession existing and not being automatable. And now she has time and energy to do it.
20:30 — Transfer: from gradebook to platform
The final marks — 30 exams, 5 criteria each — are in the grading tool. Marta exports them to her digital gradebook with one click. The gradebook automatically calculates the final numerical mark for each student according to the weights she'd already configured.
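The calculation the gradebook performs here is a weighted average. As a rough sketch only: the weights below are invented (the story says only that Marta had configured them beforehand), and so are the short criterion keys, the 1-5 numeric levels, and the 0-10 final scale.

```python
# Hypothetical weights per criterion; Marta's real weights are not given in the story.
WEIGHTS = {
    "comprehension": 0.25,
    "devices": 0.25,
    "critical_text": 0.20,
    "expression": 0.15,
    "prior_knowledge": 0.15,
}
MAX_LEVEL = 5  # five rubric levels, numbered 1-5 (an assumption)


def final_mark(levels: dict[str, int], weights: dict[str, float] = WEIGHTS,
               scale: float = 10.0) -> float:
    """Weighted average of rubric levels, rescaled to 0-`scale`."""
    if set(levels) != set(weights):
        raise ValueError("levels and weights must cover the same criteria")
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * levels[name] for name in weights)
    return round(scale * weighted / (total_weight * MAX_LEVEL), 2)
```

With these made-up weights, a student at levels 4, 3, 4, 2 and 3 across the five criteria would land at 6.6 out of 10. Dividing by the sum of the weights means the result stays correct even if the weights don't add up to exactly 1.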
She reviews the complete listing on screen. Everything coherent, no surprises. She selects "export to school platform", confirms the term, downloads the CSV with the right encoding. She uploads the file to the platform. The official marks are recorded.
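"The right encoding" is the detail that most often breaks this kind of upload, because accented student names get mangled when the file and the platform disagree. A minimal sketch of such an export, using only Python's standard library: the column names, the semicolon delimiter, and the `utf-8-sig` encoding are assumptions, not the real platform's format, and the student in the example is invented.

```python
import csv
import io


def marks_to_csv_bytes(rows, encoding="utf-8-sig", delimiter=";"):
    """Serialize (student, term, mark) rows to CSV bytes in the given encoding.

    "utf-8-sig" prepends a byte-order mark, which some spreadsheet software
    uses to detect UTF-8; the real platform may expect something else.
    """
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerow(["student", "term", "mark"])  # hypothetical column names
    for student, term, mark in rows:
        writer.writerow([student, term, f"{mark:.1f}"])
    return buffer.getvalue().encode(encoding)


# Invented example row, with accents to exercise the encoding.
data = marks_to_csv_bytes([("Núria Solé", "T2", 6.6)])
```

Writing the bytes to a file opened in binary mode keeps the encoding decision in one place rather than leaving it to the operating system's default.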
At 20:40 she closes the laptop. She's done.
The balance
About two and a half hours of actual work, not counting the break with her daughter: thirty exams graded with judgment, personalized comments to the students who needed them, marks in the gradebook, file uploaded to the official platform. A Wednesday afternoon. With time for dinner with her partner and to read a bit before sleep.
Last year, this same work would have been the whole Saturday, with the family waiting in the living room for her to finish, the feeling of having lost the day, and the accumulated tiredness for Monday morning.
Marta doesn't think AI replaced her work. She thinks it replaced the mechanical part of her work (the transcription, the counting, the first reading) and gave her back the hours to do the part that matters. The conversation with her daughter, the hour of deep review with personalized feedback, the email on Friday to the brilliant student's tutor suggesting she be included in the advanced group.
That redistribution is what changes. It's not that she works less: it's that she works on what only the teacher can do.
Note: the tool Marta is testing is called Magistral. We left it for the end because the story is what matters, not the product. If you want to try it when access opens, join the waitlist.