Walrus, Carpenter, Robots (no Oysters)

March 24, 2021 10:07 PM

Experimenting with vocal synthesis.

Made using the Alter/Ego vocal instrument plug-in by Plogue. Sadly, Plogue has abandoned development and support for Alter/Ego, but it is still available on their website as a free download.

For lyrics I used (some of) the text from Lewis Carroll's "The Walrus and the Carpenter" -- all of the parts related to oysters got left out.

I'm pretty happy with the way it turned out -- the vocals wouldn't fool anyone into thinking that they were actually sung by humans, but I was able to get closer than I had expected when I began playing around with the program. It took a bunch of time tweaking settings and playing around with MIDI control codes and X-SAMPA (an encoding of the International Phonetic Alphabet that uses only ASCII characters) to get there, though.
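For readers who have not met X-SAMPA, here is a minimal sketch of the idea in Python. The symbol pairs are standard X-SAMPA, but the helper name `to_xsampa` is made up for illustration, the table covers only a handful of sounds, and nothing here is meant to show the exact lyric syntax that Alter/Ego expects.

```python
# A few IPA symbols and their standard X-SAMPA (ASCII) equivalents.
# This is a tiny illustrative subset, not the full X-SAMPA table.
IPA_TO_XSAMPA = {
    "ʃ": "S",   # "sh" as in "shoes"
    "ŋ": "N",   # "ng" as in "sealing"
    "θ": "T",   # "th" as in "thing"
    "ð": "D",   # "th" as in "the"
    "ɪ": "I",   # short "i" as in "ships"
    "æ": "{",   # "a" as in "wax"
    "ː": ":",   # length mark
}


def to_xsampa(ipa: str) -> str:
    """Convert an IPA string to X-SAMPA; unmapped characters pass through."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


print(to_xsampa("ʃuːz"))    # Su:z    ("shoes")
print(to_xsampa("ʃɪps"))    # SIps    ("ships")
print(to_xsampa("siːlɪŋ"))  # si:lIN  ("sealing")
print(to_xsampa("wæks"))    # w{ks    ("wax")
```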

I used this as the soundtrack of the video SILICON in my ongoing Patchouli Project. You can access all of the project videos at my Vimeo showcase page or through my personal homepage (previously posted to MetaFilter Projects).

posted by TwoToneRow (6 comments total) 2 users marked this as a favorite

Nice! I agree, this is a really cool plug-in, especially for the price.
posted by The Vice Admiral of the Narrow Seas at 9:07 AM on March 27


This sounds lovely. I may be gullible, but I would have been fooled. Christmas vibes. Apparently there is an oyster bar in my city called The Walrus and The Carpenter.
posted by Corduroy at 10:21 AM on March 28


Thanks for the compliments, y'all. Nice to know people are listening. Please check out some of my other stuff on Vimeo.

In case anyone here wants to play around with the Alter/Ego plugin: one little trick I found that helps make its voices more realistic is to track each voice twice -- once with the "humanize" parameter set to zero, then a second time with it turned up enough to introduce some randomness in the track's pitch, timing, and vibrato. When the two (slightly different) tracks are panned hard left and right in the mix, you get a nice doubling effect that smooths out some of the robotic character. Reverb helps a lot too. On this particular tune, I think it also helps that there are both male and female voice parts singing harmony parts because each "singer" has their own timing and dynamics, like real people do. There are also some synth "doo" voices (a Roland JV-1080 patch) way in the background that help with blending, kind of like smearing Vaseline on a camera lens for a glamour photo.
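The doubling trick can be roughed out in code. The Python sketch below is an assumption-laden illustration, not how Alter/Ego's "humanize" parameter works internally: it only jitters note timing (the plugin also varies pitch and vibrato), and the names `Note` and `humanized_copy` and the 20 ms jitter range are invented for the example.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    start: float     # seconds from the start of the track
    duration: float  # seconds
    pitch: int       # MIDI note number


def humanized_copy(notes, max_shift=0.02, seed=0):
    """Return a second take with each note's start and length nudged randomly.

    max_shift is the largest nudge in seconds (an arbitrary illustrative value).
    """
    rng = random.Random(seed)  # seeded so the "performance" is repeatable
    take = []
    for n in notes:
        start = max(0.0, n.start + rng.uniform(-max_shift, max_shift))
        duration = max(0.01, n.duration + rng.uniform(-max_shift, max_shift))
        take.append(replace(n, start=start, duration=duration))
    return take


# First take: "humanize" at zero, i.e. exactly as written.
straight = [Note(0.0, 0.5, 60), Note(0.5, 0.5, 62), Note(1.0, 1.0, 64)]
# Second take: same pitches, slightly different timing.
doubled = humanized_copy(straight)

# Pan values: -1.0 is hard left, +1.0 is hard right.
mix = [(-1.0, straight), (1.0, doubled)]
```

Because the two takes share pitches but differ by a few milliseconds per note, panning them apart gives the ear two slightly disagreeing "singers" rather than one mechanical one.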

It's interesting that you hear a Christmas vibe, Corduroy. I was trying for a sort of "early music" feel with the alternating of baroque recorder sections with the vocal parts.

A compositional weakness that I have is a tendency to write melodies using my fingers at the keyboard or guitar instead of actually singing them, because I am a truly horrible singer. For this tune, though, I made a real effort to find a melody for the text by singing the words mentally (with no instrument in reach) until I had a defined tune in my head for the first verse and then working it out on the keyboard. I really hope that I didn't end up plagiarizing an existing tune, but would not be surprised to find that I did. Verses after the first came from applying some contrapuntal trickery to the start of the tune.
posted by TwoToneRow at 1:19 AM on March 29


This is so fascinating. I actually love the sounds of the synthetic voices way more than I thought I would.

Two listens down, I think that some of the xmas feel comes from the way the vocals stick so carefully to the beat? That's a real staple of carols (and hymns?) - they're always sung really straight. In folk/pop etc, the vocalists tend to play around with rhythms and phrasing a lot more.
posted by greenish at 6:06 AM on March 29


Main point, I love this as it is! Lesser point and statin' the bleedin' obvious, there's a lot of heavily-processed human vox out there in The Culture so this is less obviously artifice than it might be. Convergence!

(admittedly I have a tin ear too, but please don't let that detract from my point above)
posted by I'm always feeling, Blue at 1:23 PM on March 29


Your theory makes sense, greenish, and speaks more to my limited vocal abilities than to any deficiencies with the software tool itself.

To create a vocal part with Alter/Ego (and I assume other Vocaloid-type engines as well), you create a MIDI track with pitch and timing data, and a separate text file containing the lyrics. It turns out, however, that the note-on and note-off timing in the MIDI track ends up being only a loose approximation of the actual audio that gets generated by the Alter/Ego plugin during recording. One reason for this is that the plugin works in real time during recording, so processing overhead tends to make the vocal parts drag behind the beat. What's more, the amount of drag is influenced by the phonetic content of the text as well. Text like "shoes and ships and sealing wax" is tricky because the software positions the beginning of each word's sound based on the MIDI note position, so "shoes" and "ships" seem way behind the beat because of the long "sh..." sound at their start (that our ears kind of tune out) compared to the much shorter "s.." at the start of "sealing wax". Also, the pitch of the note being sung seems to affect the sound and duration of some of the phonemes as well. For example, the female voice synth seemed determined to pronounce "shoes" as "juice" at the lowest pitch at which the word occurred in the song, but pronounced it correctly everywhere else. Similar issues come up with other phonemes, like "th", "wh", "ng", etc.
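One way to picture the "sh" versus "s" problem is as a per-consonant head start: move each note earlier by roughly the length of its opening consonant so the vowel lands on the beat. The Python sketch below shows that idea only. The millisecond values in `ONSET_MS` are invented placeholders, not measurements of Alter/Ego, the names `SungNote` and `compensate` are hypothetical, and it does not model the processing lag or the pitch-dependent effects described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SungNote:
    word: str
    start_ms: int     # note-on position in the MIDI track
    duration_ms: int  # note length


# Hypothetical onset lengths per leading consonant, in milliseconds.
# Real values would have to be found by ear for each voice.
ONSET_MS = {"sh": 120, "th": 80, "wh": 60, "s": 50}


def onset_length(word):
    """Guess how long the opening consonant of a word takes to sound."""
    w = word.lower()
    # Check longer prefixes first so "sh" wins over "s".
    for prefix in sorted(ONSET_MS, key=len, reverse=True):
        if w.startswith(prefix):
            return ONSET_MS[prefix]
    return 0


def compensate(notes):
    """Start each note early by its onset length, keeping the note-off fixed."""
    out = []
    for n in notes:
        shift = min(onset_length(n.word), n.start_ms)  # never move before time 0
        out.append(replace(n,
                           start_ms=n.start_ms - shift,
                           duration_ms=n.duration_ms + shift))
    return out


phrase = [
    SungNote("shoes", 1000, 400), SungNote("and", 1500, 200),
    SungNote("ships", 2000, 400), SungNote("and", 2500, 200),
    SungNote("sealing", 3000, 400), SungNote("wax", 3500, 500),
]
fixed = compensate(phrase)
print([n.start_ms for n in fixed])  # [880, 1500, 1880, 2500, 2950, 3500]
```

A lookup table like this could only ever be a starting point; the sketch also ignores the case where an early start would overlap the previous note.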

The end result of all this is that I ended up having to go note-by-note through the vocal parts in my DAW's MIDI editor (I use Reaper, by the way), nudging each note's start time and duration one way or the other until I was satisfied with the result. That's the rub -- the only technique that I have to make these decisions is to sing the parts to myself while trying to line up the audio with what I'm hearing in my head. The problem, of course, is that the synthesized voice ends up sounding like what I would have sung myself (if I could actually sing on pitch, which I generally can't, at least not out loud).

Bottom line, though, is that I had fun experimenting with this, and learned a few new things along the way, which is the point after all, right?
posted by TwoToneRow at 8:30 PM on March 29

