đŸŽ” Can AI Learn Music Theory Like a Human? Teaching ChatGPT, Claude, and Gemini with Step-by-Step Prompts

Artificial intelligence is becoming the ultimate multitasker—writing essays, coding apps, even generating music. But what about learning music theory the way a student would? Can Large Language Models (LLMs) like ChatGPT really understand intervals, scales, or cadences—not just generate them, but actually break them down, analyze them, and pass a real music theory exam?

Researchers Liam Pond and Ichiro Fujinaga from McGill University's Schulich School of Music set out to explore this very question. Their study dives deep into how we can “teach” LLMs complex topics like music theory using smart prompts—specifically, in-context learning and chain-of-thought reasoning. Spoiler alert: with the right strategy, these models can learn more than you might think.

In this blog, we’ll unpack how they turned robots into students, evaluated their progress using real exam questions from Canada’s Royal Conservatory of Music (RCM), and what it could all mean for the future of AI education tools.


🎓 Teaching Music Theory to AI: Why It Matters

Music theory isn’t just dry abstraction—it’s the grammar of music. It allows musicians to understand, analyze, and create music with intention. So naturally, if AI is going to meaningfully engage with or assist in music education, it needs to do more than spit out generic, auto-generated Mozart wannabe sonatas.

Until now, most AI in music has focused on generation—writing songs or recommending playlists—not understanding the foundations that human musicians spend years mastering.

This study takes a different path. It asks: What if we teach AIs the way we teach students? Could they eventually help us learn better, faster, and more affordably?

By using questions from a certified Level 6 RCM theory exam (which thousands of students take across North America), the research gives us real data on the capabilities—and limits—of top AI models in learning core music theory skills.


🧠 How Do You Teach a Robot? Two Key Strategies

The study used two core teaching techniques repurposed for AIs:

1. In-Context Learning (ICL)

Think of this as giving the model a mini class before asking a question. You don’t change or retrain the model itself—you just feed it a prompt that includes instructions and examples. The goal: teach it the rules by showing patterns.

Imagine asking, “What’s the interval between C and G?” But before that, you include:

“An interval is the distance between two notes. For example, C to E is a major third
”

This method lets the model learn on the fly, using just the information in the prompt. And thanks to newer models with massive context windows (up to 2 million tokens in Gemini 1.5 Pro!), you can include a ton of info while still keeping it “local” to the prompt.

But there’s a tradeoff: long prompts risk overwhelming the model. More details mean more decisions—so writing effective prompts is its own art.
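The ICL recipe can be sketched in a few lines of Python. This is a hypothetical prompt-building snippet, not the authors' actual prompts; the lesson text and examples are invented for illustration.

```python
# A minimal sketch of in-context learning: prepend a short lesson and a few
# worked Q/A examples to the question, so the model learns "on the fly".
lesson = (
    "An interval is the distance between two notes. "
    "For example, C to E is a major third, and C to F is a perfect fourth."
)
examples = [
    ("What is the interval between C and E?", "A major third."),
    ("What is the interval between C and F?", "A perfect fourth."),
]
question = "What is the interval between C and G?"

prompt = lesson + "\n\n"
for q, a in examples:
    prompt += f"Q: {q}\nA: {a}\n\n"
prompt += f"Q: {question}\nA:"

print(prompt)
```

The whole "lesson" lives inside the prompt; nothing about the model itself changes.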

2. Chain-of-Thought Prompting (CoT)

Here, instead of just asking a question and expecting an answer, you encourage the model to “think out loud”—showing it how with step-by-step worked examples.

For example:

“To identify this cadence, I first notice the chord progression ends on V-I. That’s an authentic cadence...”

Showing worked examples, even just a few, helps the model mimic human-style reasoning—breaking problems into baby steps. This is especially helpful in complex multi-rule systems, like music theory.

Combined, these two techniques allow you to turn language models into learners capable of tackling unfamiliar, nuanced subjects.
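As a sketch, a chain-of-thought prompt simply adds the reasoning steps to the worked examples. The wording below is invented for illustration, not taken from the study:

```python
# A minimal chain-of-thought sketch: the worked example spells out its
# reasoning step by step before giving the answer, nudging the model
# to do the same for the new question.
worked_example = (
    "Q: What cadence ends this phrase?\n"
    "Reasoning: First, identify the final two chords. "
    "The progression ends on V moving to I. "
    "A V-I ending is an authentic cadence.\n"
    "Answer: Authentic cadence.\n\n"
)
new_question = "Q: What cadence ends the attached excerpt?\nReasoning:"

cot_prompt = worked_example + new_question
print(cot_prompt)
```

Ending the prompt at "Reasoning:" invites the model to continue with its own step-by-step analysis before committing to an answer.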


đŸŽŒ Not Just Notes: The Role of Music Encoding Formats

Machines don’t read sheet music. They need everything encoded as data. The researchers tested four common music encoding formats:

  • ABC: Simple, lightweight, originally for folk music.
  • Humdrum: Great for detailed analytical work, used in computational musicology.
  • MEI (Music Encoding Initiative): Flexible and academic-focused, good for early/non-Western music.
  • MusicXML: The most widely adopted format, but geared toward Western notation.

Interestingly, while all can encode standard sheet music, some work better than others when fed into LLMs—especially based on how much exposure the models might have had to that data during training.

MEI turned out to be a winner—more on that below.


đŸŽč The Test: A Real RCM Level 6 Exam

Students studying music in Canada and the U.S. often take Royal Conservatory of Music exams as part of their musical journey. Level 6 includes topics like:

  • Key signatures
  • Intervals and scales
  • Chords and cadences
  • Transpositions
  • Music terms and history

These aren’t basic trivia questions—they require analysis, reading music, and applying theory.

The researchers asked ChatGPT, Claude, and Gemini these questions—both with and without context (examples and guides)—to see how much the models could figure out on their own vs. how much they could learn from the prompts.

All models were evaluated in each of the four encoding formats. Then their answers were reviewed for accuracy, just like a teacher would grade a student’s test.


📈 What Did the Results Show?

Here’s where it gets juicy.

Without any teaching (no context prompts), all models performed poorly—the best score was 52% (ChatGPT using MEI), far from the 60% passing mark for RCM exams.

But after being given chain-of-thought and in-context guidance?

🎉 Claude scored a whopping 75% using MEI—beating out both ChatGPT and Gemini and well above the human minimum pass grade.

Let’s break it down:

Model     Format   No Context   With Context
ChatGPT   MEI      52%          60% ✔
Claude    MEI      44%          75% 🌟
Gemini    MEI      30%          52% ❌

Claude also scored 74% on both Humdrum and MusicXML when given full context. That’s the equivalent of receiving Honors on the exam.

Another key finding: Contextual prompts helped across the board, especially for topics like:

  • Intervals
  • Scales
  • Transposition
  • Cadences

These areas responded especially well to examples and worked solutions.
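Transposition in particular is mechanical enough to sketch directly. Here is a hypothetical helper (not from the paper) that shifts pitch classes by semitones:

```python
# A minimal transposition sketch over the 12 pitch classes (sharps only;
# real exam answers would also need correct enharmonic spelling).
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(note: str, semitones: int) -> str:
    """Shift a pitch class up by the given number of semitones."""
    return NOTES[(NOTES.index(note) + semitones) % 12]

# A C major triad up a major second (2 semitones) becomes D major:
print([transpose(n, 2) for n in ["C", "E", "G"]])  # ['D', 'F#', 'A']
```

A worked example like this, written out as prose reasoning in the prompt, is exactly the kind of step-by-step scaffolding the study found effective.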

On the flip side, rhythm, rests, and chords remained tricky. Models had a hard time grasping the complex rules of note grouping and time signatures, even with examples. A likely reason? Rhythm involves a lot of hidden nuance, and possibly noisy or incorrect data from their training sources (think amateur sheet music floating on the internet).

A bit of a relief: all models—with or without context—nailed the music history questions. So at least they can memorize composers.


đŸ› ïž Teaching AI Like a Music Student

What’s fascinating is that many techniques used to improve LLM responses mirror how we teach humans:

  • Give clear, focused explanations
  • Show worked examples
  • Start simple, then increase complexity
  • Encourage step-by-step reasoning

It turns out that the gap between human and machine learning might not be as wide as we thought—at least in structure, if not in cognition. When we treat the AI like a student, it acts more like one.

Even strategies like asking the model to “summarize what it understands so far before answering”—something teachers do with kids—helped these machines stay on track.


🌍 Real-World Impact: Why This Matters

Still wondering why you should care if ChatGPT knows when to use a G# instead of an A♭?

Here’s why this research is exciting:

For Students

Imagine having a 24/7 AI music tutor that walks you through theory problems, explains tricky concepts, and gives immediate, personalized feedback. Better than late-night YouTube spirals, right?

For Teachers

Educators could generate custom quizzes, interactive problem sets, and adaptive theory guides tailored to each student's level—all while saving tons of time on grading.

For Developers

Creating smarter music theory apps just got a boost. With refined prompting, developers can build tools that don't just quiz, but actually teach.

For AI Research

This is a case study in transferable learning. If we can teach AI music theory using human-style pedagogy, the same techniques might be usable across other difficult subject areas—math, logic, even ethics.


💡 Key Takeaways

  • LLMs like GPT-4, Claude, and Gemini can learn music theory—to a degree—through well-crafted prompts.
  • Context is king: prompting the model with examples and explanations (in-context learning + chain-of-thought) massively improves performance—by up to 31 percentage points.
  • Claude outperformed the others, especially when using MEI and MusicXML encoding formats.
  • Not all tasks are equally teachable: while models grasp intervals and cadences well, they still struggle with rhythm and complex chord analysis.
  • Prompt design matters: effective educational AI isn’t just about the model—it’s about the way you teach it.
  • Music education is heading toward an AI-augmented future, with digital tutors and assistants that can scale high-quality instruction across time zones and socioeconomic boundaries.

Want to level up your own prompting game? Try designing CoT prompts for the topics you’re helping an LLM understand. Start simple, show step-by-step reasoning, and don’t be afraid to “teach” like a human tutor.

And if you’ve ever struggled through a theory class—well, now you can say you’ve technically got something in common with ChatGPT. đŸŽ¶


Let us know your thoughts: Could an AI music tutor replace (or at least assist) real ones? What topics do you want to see LLMs tackle next?

Stephen, Founder of The Prompt Index

About the Author

Stephen is the founder of The Prompt Index, the #1 AI resource platform. With a background in sales, data analysis, and artificial intelligence, Stephen has successfully leveraged AI to build a free platform that helps others integrate artificial intelligence into their lives.