Class Companion is the hot new gizmo that will likely eat up some of your PD time during your start-of-the-year teacher institutes. Basically, Class Companion claims to accurately score your students’ written work, thus freeing up more time for the teacher. It can act iteratively, meaning it can provide suggestions if the student is off-track. Its prophets also claim that the increase in AP scores this year (2024) is partly a result of more teachers leveraging the platform at the end of the school year – a Holy Grail for AP History teachers who cannot score DBQs fast enough.

For those of us just keeping our heads above water, these tempting fandangles conjure visions of spending more time on other things. I’ve seen this kind of thing before. And when any new fandangle has more teacher testimonials than tutorials, that is red flag #1 for me.

For some old dogs who have lived through four tech revolutions in their classrooms, this new whizbang is not likely to impress.

I have much to say about the generative AI revolution, but I will spare you the commentary and cut to the chase. I tested Class Companion to see if it is viable for scoring APHG FRQs. Spoiler alert: I was unimpressed.

It is also understood that within a few months of posting this, more competitors will emerge in the AI student-grading space. My critique of Class Companion has as much to do with finding its holes and necessary workarounds as it does with raising caveats about evaluating student work with generative AI. If you want more reasons why AI should NOT be used for student evaluation, you can read very good arguments HERE.

If you want to see how this thing works, keep scrolling; I will show you the platform, my set-up procedure, and what I used to test its effectiveness. If you only want my takeaways, scroll to the end and save yourself the pain. 🙂 If you have already made up your mind that AI should not be used to evaluate a student’s written work, exit out.

MY TEST FRQ

For my test, I used the 2024 APHG Green Revolution question, an FRQ I recently scored at the in-person reading. For this exercise, I purposely drafted incorrect responses for all seven parts – the kinds of common student answers that earn no score. My goal was to foil the machine.

Here is what I wrote and copy/pasted into the bot.

A. Define the concept of carrying capacity.

  • The amount of people that an area of land can hold.

B. Describe ONE way that humans have altered the environmental sustainability of agricultural lands.

  • Humans have practiced deforestation in order to increase the amount of agricultural land. In turn, this contributes to the destruction of animal and natural plant habitats. In addition, overcropping and overgrazing lead to soil exhaustion that can contribute to desertification where soil loses its nutrients and too brittle to host plants.

C. Explain how transportation technology has increased economies of scale in the agricultural sector of less developed countries.

  • Shipping containers can move large and/or larger quantities of crops.

D. Explain a likely negative economic outcome of Green Revolution agricultural practices on rural communities.

  • The high cost of genetically modified organisms make it difficult for local farmers to purchase. The result is an inability to compete with larger farm operations.

E. Explain ONE weakness of Malthusian theory in predicting the relationship between food production and population growth in contemporary society.

  • Malthus could not have predicted that later technologies could produce more food.

F. Explain how surplus food production has changed the global market for local agricultural products.

  • Surplus food drives global prices down, thus impacting the global market for local agricultural products.

G. Explain the degree to which Green Revolution agricultural practices were effective in reducing hunger in less developed countries. (Response must indicate the degree [low, moderate, high] and provide an explanation.)

  • To a high degree, Green Revolution agricultural practices were effective in reducing hunger in less developed countries because genetically modified organisms permitted LDCs to grow more food than ever before and not allow them to go hungry.

ACCOUNTS

If your state and local district permit you to adopt an app such as Class Companion (make sure to check), it currently allows students and teachers to create free accounts (which will likely change to a paid service). Class Companion’s current competitors include OwlerAI, MagicSchool.ai, and Turnitin. I easily created two accounts through Google (a professional teacher account and a personal email to play a student).

CREATING AN ASSIGNMENT

Once logged in as a teacher, I eagerly moused around to find out where to upload my 7-part FRQ on the 2024 APHG Green Revolution. 

The first step from the teacher’s account is to create a class and make an assignment.

When adding an assignment, you are given three options. I used the Import tab to upload the questions and a scoring guideline, thinking this was the easiest route.

Class Companion generated a “Short-Answer” quiz where it successfully inserted the question text for all seven parts. So far, so good.

You can also create an assignment from scratch, where “Essay” is also an option. I think the only difference is that short answer permits multiple questions?

SCORING AN ASSIGNMENT

While still on the “Questions” screen, I toggled over to “Scoring” and noticed that no scoring parameters were listed. This made sense, since I hadn’t yet uploaded or typed a rubric.

I still don’t understand why Class Companion permits the user to insert the scoring guidelines for a short answer but not when creating an essay question within the Questions tab. I am sure you are expected to do that under the Rubrics function, but it would be helpful if everything related to assigning scores lived under one function. Figuring this out on my own took some time.

And yes, I watched the YouTube tutorials.

Knowing that I needed to assign a point value and scoring criteria for my questions, I headed over to Rubrics, where I was reminded that my rubric was missing.

Once you click “Add Rubric”, you are presented with three options.

If you are feeling sure of yourself and want to throw caution to the wind, you can let the machine make one for you – a recommendation by Class Companion. Here is what it looks like:

When I attempted this, Class Companion could not generate a rubric for each question of the 7-part FRQ; it only generated a generic skill-based rubric. And despite my asking for a 7-point rubric, it generated one out of 18.

I quickly lost hope that AI could generate the rubric I wanted, and since I had the actual scoring guideline, I chose to Import. If you use some sort of skill- or standards-based rubric, the bot has them preloaded, and you can find them in Class Companion’s Content Library (e.g., AP Histories).

So, I dragged my Scoring Guidelines PDF and hit upload. 

After pressing Import and Review, I am hit with my first error message.

I really wanted this to work, but, like most promising doo-dads, this one let me down. Without tips or hints from Class Companion, I still have no explanation for why this happened. The doc was a clean, typed PDF.

After multiple attempts, I abandoned ship and tried importing through the Text feature, which led to the same error message.

When I finally gave up on uploading a rubric, I started clicking buttons to guess where the problem was. I found myself back at the initial question page, where I could see that the scoring guideline had been magically generated under each of the 7 corresponding questions. 

A small victory that I stumbled on. You can see what happened below. 

At this point, I thought I was finished, since the bot had both my scoring guidelines and my questions, so I went to Submissions in hopes of testing it out as a student. But I was still roadblocked by the Rubric section.

So while the bot took my scoring parameters in the question section, I hadn’t yet assigned any point values. Understandable, so I kept working. This time, I decided to try making a rubric from scratch, knowing that Class Companion didn’t like my uploads.

I am presented with two options.

Since I have a seven-part question that carries a value of 7 points (one point for each part), the checklist feature is not appropriate. And even if I added every possible bullet point from the scoring guideline, I would not want the machine to treat them as equally viable answers for all 7 parts. You can see what this looks like below.

Additionally, for some unknown reason, Class Companion does not permit you to change the maximum score to 7.  

So, I abandoned a Checklist rubric and tried the Table option. Here, I had to make the rubric myself, something I initially hoped to avoid.

I added seven criteria and gave the parameters for each part of the FRQ. You can see what I copy/pasted for part A of the 7-part question.

If you leave the text blank under the “0” column, your rubric will not be accepted, and you will be hit with an “Invalid” error. Without clear direction from Class Companion on why this was happening, I sank a good 20 minutes into guessing what the problem was. And though it is obvious in a binary rubric what scores a zero, I typed “An Incorrect Response” to qualify it as a zero.

From here, there are no error messages, and my point value is now set at 7. 

Or so it says (foreshadowing).

Next, there is a slew of customizable settings, which will be familiar if you’ve ever used an LMS quiz or other online testing program.

You can see the options below.


STUDENT PERSPECTIVE

I could preview my questions from my teacher account, but I wanted the full student experience, so I switched over to my student account. I began filling in my incorrect responses and hitting the green send icon for feedback. Sometimes, submitting a question for AI feedback took a long time, and Class Companion generated more error messages than iterative suggestions.

So, I gave up on receiving feedback for individual questions and submitted the entire assignment.

From here, immediate feedback from Class Companion is generated for the student to see. While I had prevented students from seeing a final score, based upon its feedback, Class Companion scored my responses a 6/7.

CLASS COMPANION’S RESULTS

Sure enough, when I logged back into my teacher account, Class Companion scored my 7 incorrect responses as a 6/7.

At the AP reading, this scores a ZERO.

For the life of me, I still can’t figure out why it was calculated out of 49 (my best guess: the 7-point rubric applied to each of the seven parts, 7 × 7 = 49) – so if anyone can enlighten me without charging a speaking fee, let me know!

From here, I duplicated the assignment to test whether the outcome would change when using Class Companion’s vague APHG rubric. 😬 You can appreciate their rubric below.

After doing this, Class Companion scored it a 4/7. Mind you, I accidentally left the additional scoring criteria alongside the initial questions, unintentionally helping the machine.

So, I tested my 0/7 FRQ one last time, deleting the additional scoring criteria and relying only on the vague “rubric” that Class Companion offers. This time, Class Companion scored it a 5/7. It actually earned another point!

There you have it. Class Companion gave the same submission three different scores (6, 4, and 5) when the actual score is ZERO.

TAKEAWAYS

Nuanced language is important, and bots lack the human dialogue that helps evaluate qualifying responses. If my district approves its use, I am open to testing other ways to use Class Companion. But for now, I place greater value on students learning to evaluate their own work against the scoring guideline than on generated text telling them what is right.

In addition, when enough of the internet relays incorrect information propagated as “common knowledge” (e.g., GMOs being a central innovation of the Green Revolution) – unbeknownst to the edutainers and the Quizlet sets generated from their condensed 5-minute videos – the bots end up trained on material that is weak or flat-out wrong. That is exactly what triggered “correct” scores for my incorrect responses. Die-hards for this technology will ignore or downplay these issues.

So, when you walk into this year’s AI PD session, don’t be hypnotized by its generative abilities; arrive with critical questions.

I would love to tell you how the handwriting feature works, but it did not let me submit an image. I hand-wrote the 7 parts verbatim and took a clear iPhone picture. A Class Companion tutorial says they will transcribe student work within two days and get back to you. 

Perhaps I could hand-grade their work by then?

H.I.
