By Andrew Carlins, 9 min read

Can AI Transcribe Multiple Instruments at Once?

It is two questions in one. Reading many notes at once from a single instrument is something AI does well, especially on piano. Pulling several different instruments apart from one mix is the hard frontier. Here is what is really going on, and how to get readable notation today.

Part of our guide to transcribing a full band.

It is one of the first things people ask about AI transcription: can it take a full song, with drums and bass and guitar and vocals all going at once, and write out every part? The honest answer is more interesting than a yes or no, because the question is actually two questions wearing one coat, and they have very different answers.

It also comes down to a word people use loosely. Reading many notes at once from a single instrument is polyphony, and AI does it well, especially on piano. Pulling several different instruments apart from one mixed recording is multi-instrument transcription, and that is the hard frontier. People often say "multi-instrument" when they really mean polyphonic, so it is worth keeping the two apart. Short version: polyphony is largely solved, and Songscription models are generally robust enough to isolate the instrument you want from the mix, so you transcribe one instrument at a time without needing to split stems first. Here is what is really going on, and how to get readable notation from a multi-instrument recording right now.

Two Different Questions in One

When people say "multiple instruments at once," they usually mean one of two things, and the difference is everything.

  • Many notes at the same time from one instrument, like a thick piano chord. This is polyphonic transcription, and it is a problem AI has gotten very good at.
  • Many different instruments at the same time in one mixed recording, like a full band. Writing out every part means un-mixing the recording into separate sources as well as transcribing them, which is hard.

Conflating the two is what makes the topic confusing. A tool can be excellent at the first and still find the second difficult, because they are not the same skill. Our explainer on monophonic versus polyphonic transcription covers the first axis in detail.
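To make the distinction concrete, here is a minimal sketch that treats a transcription as a list of note events and measures the two axes separately: how many notes sound at once, and how many instruments are present. The `Note` class, its fields, and the instrument labels are illustrative assumptions, not part of any real transcription tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    instrument: str  # hypothetical label, e.g. "piano"
    pitch: int       # MIDI note number (60 = middle C)
    start: float     # onset time in seconds
    end: float       # release time in seconds


def max_polyphony(notes):
    """Largest number of notes sounding at the same instant."""
    events = []
    for n in notes:
        events.append((n.start, 1))   # note-on
        events.append((n.end, -1))    # note-off
    # At equal times, note-offs (-1) sort before note-ons (+1),
    # so back-to-back notes are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def instrument_count(notes):
    """Number of distinct instruments in the note list."""
    return len({n.instrument for n in notes})


# One instrument, five simultaneous notes: a polyphony problem.
piano_chord = [Note("piano", p, 0.0, 2.0) for p in (48, 55, 60, 64, 67)]

# Three instruments, one note each: a multi-instrument problem.
band = [
    Note("bass", 40, 0.0, 1.0),
    Note("guitar", 64, 0.0, 1.0),
    Note("vocals", 71, 0.0, 1.0),
]

print(max_polyphony(piano_chord), instrument_count(piano_chord))
print(max_polyphony(band), instrument_count(band))
```

The piano chord scores high on polyphony with a single source, while the band example has modest polyphony but three sources to tell apart. A tool can do well on one measure and still struggle on the other.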

Many Notes at Once: Polyphony

Reading a stack of notes that sound together is polyphonic transcription, and it is where modern AI shines. A piano playing a five-note chord with an inner line moving underneath is exactly the kind of dense, simultaneous music that used to take a trained ear ages to pick apart, and a good model reads it cleanly. So if your "multiple" means the rich, many-voiced playing of a single piano, the answer is a confident yes. Our deep dive on polyphonic piano transcription covers how that works and why piano is the strongest case for it.

Many Instruments at Once: The Harder Problem

A full-band recording mixes several instruments into one stream of sound, and their frequencies overlap and mask each other. To write each instrument on its own staff, a system first has to separate the mix back into its parts, then transcribe each one, then decide how to lay them out. Each of those steps adds error, and the masking means some detail is lost in the mix. This is why no tool turns an arbitrary full song into a flawless score with every part cleanly separated at the press of a button. It is an active research frontier, not a solved problem, and any honest tool will tell you so. What you can reliably get is covered next.
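A small calculation shows why overlapping frequencies make un-mixing hard. The sketch below assumes idealized harmonic tones (partials at exact integer multiples of the fundamental), equal temperament with A4 at 440 Hz, and an arbitrary 1 Hz matching tolerance. Real instruments are messier, so read it as an illustration of the overlap rather than a model of masking.

```python
def midi_to_hz(midi_note, a4=440.0):
    """Equal-temperament frequency of a MIDI note (69 = A4)."""
    return a4 * 2 ** ((midi_note - 69) / 12)


def harmonics(fundamental_hz, count):
    """First `count` partials of an idealized harmonic tone."""
    return [fundamental_hz * k for k in range(1, count + 1)]


def shared_partials(a, b, tolerance_hz=1.0):
    """Partials in `b` that land within `tolerance_hz` of some partial in `a`."""
    return [fb for fb in b if any(abs(fa - fb) <= tolerance_hz for fa in a)]


# A bass playing E2 (MIDI 40) under a guitar playing E3 (MIDI 52), one octave up.
bass_e2 = harmonics(midi_to_hz(40), 8)
guitar_e3 = harmonics(midi_to_hz(52), 4)

overlap = shared_partials(bass_e2, guitar_e3)
print(f"{len(overlap)} of {len(guitar_e3)} guitar partials coincide with bass partials")
```

Because the two notes are an octave apart, every one of the guitar's partials in this example sits on top of a bass partial. Energy at those frequencies could belong to either instrument, which is the ambiguity a separation step has to resolve.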

The Practical Workflow

There are two dependable ways to get readable notation from a multi-instrument recording today, and you pick by what you actually want.

  • Want separate parts? Transcribe each instrument on its own with the model that matches it. The models are generally robust enough to isolate the instrument you want from the full mix, so you do not need to split the song into stems first. On an especially dense recording, optional stem separation can make a part cleaner still.
  • Want one playable score? Transcribe the recording to a single condensed part, usually a piano reduction that gathers the melody, harmony, and bass onto a grand staff. This is faster, and it is often all a player needs.

Our guide to transcribing multi-track audio to sheet music walks through the part-by-part route, and transcribing a full band recording covers the condensed-score route. Either way, you start by running any recording through audio to sheet music.
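For readers curious what "gathering parts onto a grand staff" can look like in data terms, here is a deliberately naive sketch. It merges already-transcribed parts, drops doubled notes, and splits the result at middle C. This is not Songscription's method. The `Note` type, the part names, and the fixed split point are all assumptions made for illustration, and a real reduction would also consider playability and voice leading.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int       # MIDI note number
    start: float     # onset in seconds
    duration: float  # length in seconds


MIDDLE_C = 60  # assumed split point between the two staves


def piano_reduction(parts, split_pitch=MIDDLE_C):
    """Merge per-instrument note lists into treble and bass staves."""
    merged = {}
    for notes in parts.values():
        for note in notes:
            key = (note.pitch, note.start)
            # If two parts double the same pitch at the same moment,
            # keep only the longer of the two notes.
            if key not in merged or note.duration > merged[key].duration:
                merged[key] = note
    ordered = sorted(merged.values(), key=lambda n: (n.start, n.pitch))
    return {
        "treble": [n for n in ordered if n.pitch >= split_pitch],
        "bass": [n for n in ordered if n.pitch < split_pitch],
    }


# Hypothetical per-instrument transcriptions of the same bar.
parts = {
    "vocals": [Note(72, 0.0, 1.0), Note(74, 1.0, 1.0)],
    "guitar": [Note(60, 0.0, 2.0), Note(64, 0.0, 2.0), Note(72, 0.0, 0.5)],
    "bass": [Note(36, 0.0, 2.0)],
}

score = piano_reduction(parts)
for staff, notes in score.items():
    print(staff, [(n.pitch, n.start) for n in notes])
```

The guitar's short C5 is absorbed by the vocal's longer C5, so the reduction carries one note where the band played two. That kind of condensing is what makes a single part playable, and also why it cannot stand in for a full set of separate parts.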

Which Instruments Songscription Reads

Songscription transcribes with per-instrument models. Piano is the most mature and handles dense polyphony best. Newer models cover guitar, bass, and drums, along with melodic lines like trumpet, saxophone, and violin, while vocals remain experimental. The throughline is focus: the models are generally robust enough to isolate the instrument you want from a full mix, so transcribing one part at a time reads more accurately than asking one model for the whole band at once. You do not need to split the song into stems first, though optional stem separation can sharpen a part on an especially dense recording. For a single part, our guides to transcribing a bass line and transcribing drums show the per-instrument approach in action.

Frequently Asked Questions

Can AI transcribe multiple instruments at once?

It depends what you mean. AI handles many notes at once from one instrument very well, which is what polyphonic transcription does on a piano. Pulling several different instruments apart from one mixed recording into separate written parts is much harder, and it is the frontier of the field. The practical answer today is to transcribe one instrument at a time. Songscription models are generally robust enough to isolate the instrument you want from a full mix, so you do not need to split the song into stems first. You can also transcribe the whole thing to a single condensed part like a piano reduction, which is faster when you just want something playable.

What is the difference between polyphonic transcription and separating instruments?

Polyphonic transcription is reading several notes sounding at the same time from one source, like a piano chord, and writing them all down. Source separation is splitting a mixed recording into its individual instruments, the vocal, the bass, the guitar, before any transcription happens. They are different problems: one is about hearing a stack of notes, the other is about un-mixing a recording. Multi-instrument transcription usually needs both, which is why it is harder than transcribing a single clean instrument.

How do I transcribe a song with several instruments?

Two practical routes. If you want separate parts, transcribe each instrument on its own. Songscription models are generally robust enough to isolate the part you want from the full mix, so you do not need to split the song into stems first, though optional stem separation can help on an especially dense recording. If you want one playable score, transcribe the recording to a condensed arrangement, typically a piano reduction that gathers the melody, harmony, and bass onto a grand staff. Choose by your goal: distinct parts for a score, or a single part to play.

Which instruments can Songscription transcribe?

Piano is the most mature model and handles dense, polyphonic playing best. Newer models cover guitar, bass, and drums, along with melodic lines such as trumpet, saxophone, and violin, with vocals still experimental. These models are generally robust enough to isolate the instrument you want from a full mix, so you can transcribe one part at a time without splitting the song into stems first. On an especially dense recording, optional stem separation can make a part cleaner still.

About the author

Andrew Carlins

Co-Founder & CEO, Songscription

Andrew co-founded Songscription at Stanford with a few fellow musicians who were tired of not finding the notes to the songs they wanted to play. He grew up playing piano and baritone saxophone and performing in musical theater, and though he hasn't performed in years, he likes to think he's still pretty sharp. He writes about getting a song off the recording and onto the page.
