The future?

Soon, maybe, one can put all those DIY amplifiers and speakers to good use :)

My question to ChatGPT:
Would it be possible to process music recordings with AI in order to separate for example vocals, drums and other instruments into separate channels?

ChatGPT:
Yes, it is indeed possible to use artificial intelligence (AI) techniques to process music recordings and separate different elements such as vocals, drums, and other instruments into separate channels. This process is known as source separation or audio source separation.
There are several AI-based algorithms and techniques that can be used for this purpose, including:
  1. Deep learning-based methods: Deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can be trained on large datasets of mixed music recordings and their corresponding isolated tracks to learn how to separate different sources.
  2. Non-negative matrix factorization (NMF): NMF is a technique commonly used for audio source separation. It factorizes a mixed spectrogram into the product of two non-negative matrices, each representing the spectrogram of a separate source.
  3. Independent component analysis (ICA): ICA is another popular method for audio source separation. It aims to find a linear transformation of the observed audio signals such that the resulting signals are statistically independent.
  4. End-to-end deep learning models: Recently, end-to-end deep learning models have been developed that can directly separate mixed audio signals into their constituent sources without explicitly decomposing the signals into spectrograms or other intermediate representations.
These techniques have been successfully applied to various audio source separation tasks, including separating vocals from music accompaniment, isolating individual instruments from ensemble recordings, and separating drums from other instruments in mixed tracks.
However, it's worth noting that audio source separation is a challenging task, and the quality of the separation results can vary depending on factors such as the complexity of the audio signal, the quality of the recordings, and the specific algorithm or model used for separation.
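Of the techniques listed above, NMF is simple enough to sketch directly. Below is a minimal, illustrative Python/NumPy example: a toy non-negative "spectrogram" built from two synthetic sources is factorized with the classic Lee-Seung multiplicative updates. The matrix sizes, iteration count and toy data are arbitrary choices for the demo, not from any real separation system.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Factorize a non-negative matrix V ~= W @ H using the classic
    Lee-Seung multiplicative updates (minimizing Frobenius error)."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + 0.1   # spectral templates (freq x rank)
    H = rng.random((rank, n_time)) + 0.1   # activations (rank x time)
    eps = 1e-9                             # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two sources with different spectra, active at
# different times, summed into one mixture.
tone_a = np.outer([1.0, 0.0, 0.5, 0.0], [1, 1, 0, 0])  # source A, early
tone_b = np.outer([0.0, 1.0, 0.0, 0.5], [0, 0, 1, 1])  # source B, late
V = tone_a + tone_b
W, H = nmf(V, rank=2)
# Each rank-1 term W[:, [k]] @ H[[k], :] is one estimated source's spectrogram.
```

On real audio you would run this on a magnitude spectrogram and resynthesize each source with the mixture's phase; the deep-learning methods above exist precisely because this linear model struggles once sources overlap in frequency.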
 
Interesting, already in use.

I have been thinking that, for example, vocals and bass drums would benefit from separate channels and dedicated speakers.
But it would be very cumbersome to set up such a system and to adjust levels etc. Better to leave it as a thought only :)


Chris
 
I think someone already makes such gear... I'll refer you to the channel of Rick Beato, a longtime music producer & music wiz. The video is "What makes this song great? Ep. 71 BOSTON (#2)".
Rick dissects the song "More Than a Feeling" track by track & moment by moment, revealing the immense complexities within the opening minute of the song.

 
I think someone already makes such gear... I'll refer you to the channel of Rick Beato, a longtime music producer & music wiz. The video is "What makes this song great? Ep. 71 BOSTON (#2)".
Interesting video and excellent music. Nice to hear real handclaps instead of digital. At least some things were better before :)

Maybe no reason to complicate more than an amp with 2 speakers. When you dig out the right music it sounds magic on the humblest of equipment.
 
The future is technology being all-pervasive, with costs going to the bottom on everything. Amplifiers that easily best what was commonly available in the 60s and 70s are now a dime a dozen. Speakers, same. Bandwidth, both processing and interconnect, same. Storage, a dime a dozen per gig.

It will be put to use, sooner rather than later, for those who really want something like the bass drum having its own cabinet, or the bass guitar sounding like it's way over on the left because it is coming out of an individual speaker placed where the player once stood. Fanatics will want this; us regular folk, maybe not so much. But in the future, if you do: "yeah, we can set that up". Real-time de-mixing into as many individual channels, and task-optimized speakers, as you want to spend your money on.

Or, such an idea could fizzle out like the old quad formats of the 70s. Nobody even wants speakers, too impractical; just earbuds. If you even have speakers, they're in little kitty-cat ears you wear on your head, so you can annoy everyone around you. Instead of a PA at a dance, everyone wears those, all playing off the same BT source.

I for one once dreamed of hard location modulation in a quad sound field, note by note, particularly to the music of the likes of Zappa, Corea. Where a guitar player could easily drop a dive bomb into an echo canyon in just the right front; then do it again with a different chord into the left rear. Do a shimmering L-R panned tremolo starting directly behind you, go over your head and into the mix in front. They've given us some; the sound can track what's showing on the screen. But not as an integrated part of the music composition itself. Ok, delightful to hear when I was 19, at this point in life I'd probably find it annoying, as I do loud SPLs.

Invention takes an imagination and a big pile of junk. We have the big pile - biggest pile yet - of tech. Someone will imagine a use for it.
 
Yes, of course; if it becomes cheap, and provided that quality is good, what's not to like? However, I still have gear from the 60s that is pleasant to listen to. Even though noise and distortion levels are terrible by modern standards, old stuff sometimes seems able to offer some kind of presence or liveliness that's missing in new mass-produced gear.

For entertainment purposes I am sure it would be interesting to experiment with separate channels and dedicated speakers. But in the long run it might be a burden to move speakers around depending on the setup. For the masses, I believe visual appearance is still a higher priority than how it sounds. Apparently stereo sound is not always of interest anymore, since many own a single bluetooth speaker.

But development is not stoppable, so if quality sound could be made to come out of discreet equipment, it's a win-win. Until then I don't mind a hot, heavy amp and ugly boxes :)
 
The only way audio processing software can be effective is when it's transparent, not just "good enough"... that's how we ended up with MP3s for music.

It used to be that you could always count on anything used in professional studios to be useful and purpose intended with minimum performance compromises.

The one audio software plug-in which ended up becoming an overused novelty was Antares Autotune. This overpriced piece of software ruined pop music forever. I'd rather listen to those cheesy stuttering samples on late-80s electronic dance music than to someone's voice sounding like a keyboard. And no, Autotune is not to be confused with a vocoder, which is by far a more musical effect. Yes, AT is that bad, especially when the settings are pegged to make voices sound like sampled keyboard notes.

The gentleman who designed AT was just interested in flexing his engineering muscle. He wasn't really an audio guy by trade wanting to improve our rotting and decaying music world (hence my point in saying all this). I saw an interview where the inventor explains it, and judging by his attitude, it's pretty clear he cares more about the financial gains from his product than about how its misuse has potentially cheapened the mindset and skill of seasoned and accomplished vocal talents.

The creation of AT mainly set out to prove that pitch correction could be done on the fly with relatively low CPU load using simple algorithms and software code. It's essentially an audio plug-in written by someone who wasn't interested at all in music production. The concept was sparked by a passing comment from his wife, who said she wished she had a way to fix the pitch of her singing voice. He also goes on to brag about how relatively simple his software was to write and how much money it has made him. That's really all he was concerned about, not so much the SQ implications or concerns of its use, including the negative consequences it has had on younger, more impressionable musicians looking for the next hot trend in music.

So over time, AT turned into an overused, awful-sounding novelty. It's most useful in live settings, where it was originally intended to be used. No one ever uses AT in a subtle, conservative way, especially most amateur home-studio guys. They will usually look up someone else's settings on YouTube and try to copy them, or just crank hard on all the basic settings.

So just like the effect Autotune and MP3s had on the quality and sound of modern music production, other software can end up doing the same, even if the inventor didn't intend for his creation to be used as a cheap novelty effect. It's out of their hands how much their product gets overused, misused and abused. Just like a gun, it can be used for good or evil.

I have always been fascinated by sound isolation and restoration techniques. It's amazing what is possible when restoring old or damaged recordings. The only way to consider it studio quality is to implement a puristic, low-compromise approach in its development. Just calling it "good enough" is what potentially turns it into a novelty or gimmick, especially when the person who develops it doesn't really care much about music.
 
@profiguy Part of the point of MP3 was storage space. When MP3 was devised, storage was orders of magnitude more expensive than today, and bandwidth for transmission was orders of magnitude slower.

Crushing music into smaller files made perfect sense, even with a loss of quality.

I still use OGG/Vorbis on my phone and can get all of my several thousand LPs and CDs onto it to play at my leisure. When, eventually, I can get a phone I like with ~2 TB of storage, I'll say goodbye to OGG :)
 
I find that the best bang for my buck is a mono sound system. I like the sensation of being in a live venue when listening to my system, and for me, a speaker built from pro-sound drivers (15" woofer and horn loaded compression driver) with a good amplifier and source does the trick. Speaker placement is not critical and no need for two channels of amplification. Or you can bi-amp with two amps. I have such a system for testing DIY amplification and it is very enjoyable.

A single pro-sound based speaker has good dynamics, is a single point source, and creates the atmosphere of live music. It's relatively low tech, but then I feel that the original mono records played with a proper mono cartridge with one speaker is pretty hard to beat.
 
would be cool for practising any instrument if it could be erased from the mix.
I wouldn't mind that being available either...

AutoTune - the culinary equivalent of refined sugar. Wish it wasn't in everything.

Wish I could code like that, easily. I have zero idea even of how to put some fixed filter inside a "plug-in" that I could load into foobar2000, or elsewhere, and have it actually execute. Forget doing a UI.

I bought a cheapo pitch-shifter effect. Singing with a 50-50 mix, I can't tell which one is me, so I'm all over the place trying to hit notes. I'd probably freak out if I sang one thing and heard myself spot on when I wasn't; can't imagine using such a thing live, or how it would work.

I thought there was a popular video of some kid just making random sounds with his mouth through autotune, and they all come out as true concert-pitch notes. Can't even imagine having such a thing that could do that in real time, when I was...
 
I think there are already android apps that can remove vocals. I searched for it some time ago.
I find that the best bang for my buck is a mono sound system.
I needed mono material for something and found a violin recording from '56. I was very surprised how awesome it sounds in my bluetooth earphones. It's like I can place the violinist and "see" the whole room he was in.
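The classic pre-AI trick behind many of those vocal-remover apps is simple channel subtraction: anything mixed dead centre (usually the lead vocal) is identical in both channels, so left minus right cancels it. A minimal sketch with synthetic signals; the tones and pan positions are made up for the demo:

```python
import numpy as np

rate = 8000
t = np.arange(rate) / rate                     # one second of samples
vocal = 0.5 * np.sin(2 * np.pi * 220 * t)      # mixed dead centre
guitar = 0.3 * np.sin(2 * np.pi * 330 * t)     # panned hard left
left = vocal + guitar
right = vocal.copy()                           # centre content only
karaoke = left - right                         # the vocal cancels exactly
```

This only works when the vocal really is identical in both channels; any stereo reverb or chorus on the voice survives the subtraction, which is why the AI separators give much cleaner results.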
Cheers!
 
Most of the tech to play back instruments separately is already there. Just build a DTS or AC3 mix using the audio editors that come with good video packages. You get 6 channels to mix to, and you just place your 6 speakers where you want the band members to go. The .1 (6th) channel is perfect for the kick and bass lows.

I could do this around the year 2000 with, I think, an Adobe suite.
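Laying out such a 6-channel mix is easy to try even without an editing suite. A minimal sketch using only Python's standard library: one second of plain 6-channel PCM WAV, with a 440 Hz tone on channel 1 (say, front left for the guitar) and a 60 Hz tone on channel 6 (the .1/LFE slot for kick and bass lows). The filename and channel assignments are arbitrary for the demo, and a DTS/AC3 encoder would still be needed for consumer surround gear:

```python
import math
import struct
import wave

rate, channels = 44100, 6
n_frames = rate                                   # one second of audio
frames = bytearray()
for i in range(n_frames):
    t = i / rate
    sample = [0] * channels                       # silence on unused channels
    sample[0] = int(12000 * math.sin(2 * math.pi * 440 * t))  # ch 1: guitar
    sample[5] = int(12000 * math.sin(2 * math.pi * 60 * t))   # ch 6: kick/bass lows
    frames += struct.pack("<6h", *sample)         # 16-bit little-endian PCM

with wave.open("band_mix.wav", "wb") as w:
    w.setnchannels(channels)
    w.setsampwidth(2)                             # 16-bit
    w.setframerate(rate)
    w.writeframes(bytes(frames))
```

Any DAW or multichannel-capable player can then route each channel of the file to its own speaker.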
 
I highly doubt The Dead would use any form of digital signal processing if they were given the choice to do what's necessary with analog gear. Despite the lowest available latency these days, it's still noticeable and likely the biggest drawback of live sound in the digital domain. Even the smallest amounts can cause groove and timing issues, especially with mic'ed acoustic instruments.

For example, Allen & Heath dLive series consoles create about 2 ms of latency from input to output through typical channel-strip DSP functions, i.e. EQ and dynamics. Then add any USB interfaces and wireless systems for monitoring and instrument input, and you get roughly a minimum of 7-10 ms (verified through measurements) by the time the audio gets from the instrument, vocal, etc. out to the FOH, IEMs and wedges. Several ms of latency is very noticeable, especially to a musician counting on proper timing from analog sources.

Yes, whatever goes directly into and out of the console itself is time-aligned at the output, but any extra A/D and D/A conversion from exterior sources skews it unless you further delay the other channel strips to obtain synchronized audio. This is a potentially big issue for musicians who are used to the immediate output response of their acoustic instruments, vocals and analog sources. The people mainly affected are rhythm players, bassists, keyboardists and pianists. I'm a bassist, drummer and keys player and can confidently say it messes with your timing and groove, especially when other players are affected by it at the same time. You can get somewhat used to this phenomenon as a skilled musician, but it's kind of a downer and not very musically inspiring having to fight timing and latency, even in small amounts, especially when the FOH and monitor engineers don't know how to approach the whole issue the right way and make the best judgment calls on delaying the individual sources.
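The arithmetic behind those numbers is just buffer sizes divided by sample rate. A small sketch; the 128-frame USB buffer and the one-buffer-each-way assumption are hypothetical round numbers for illustration, not measurements of any specific interface:

```python
def buffer_latency_ms(frames, rate):
    """Latency of one audio buffer, in milliseconds."""
    return 1000.0 * frames / rate

console_dsp = 2.0                           # ms, the dLive channel-strip figure
usb_in = buffer_latency_ms(128, 48000)      # hypothetical interface input buffer
usb_out = buffer_latency_ms(128, 48000)     # and its output buffer
total = console_dsp + usb_in + usb_out      # ~7.3 ms, before any wireless links
```

Each extra A/D-D/A hop or wireless link in the chain adds its own term to that sum, which is how a "2 ms console" ends up at 7-10 ms in practice.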

As for The Dead's wall-of-sound approach, it's highly impractical to use the crude noise-canceling techniques they did back in the day. The rate of complexity and equipment failure was high. All those big, heavy McIntosh, Phase Linear and various custom-made pieces of gear they hauled around always kept the sound techs on their toes. John Curl can tell you all sorts of stories about this, as he designed a lot of the electronics in their console input/output stages and other critical FOH gear they used.

So basically speaking, digital audio production is a blessing and a curse, depending on how you weigh its pros and cons. It's just harder to get a decent, engaging live sound with digital gear from an organic, hands-on perspective.

I used to be a setup/install engineer for Allen & Heath, which was a real eye-opener when it came to dealing with and teaching other techs. It was sad to find out how undereducated and inexperienced most newer engineers are in terms of raw sound-reinforcement practices and being able to throw together a top-notch-sounding mix. Most don't have the ears for it to start with, and barely know how to implement virtual plug-ins and rack-gear simulations from a hands-on approach, never having worked with older analog rack gear. Even basic compression, dynamics and EQ were a struggle. If it wasn't stored as a preset in the console library and covered in a YouTube tutorial, they couldn't set it up themselves. If a so-called engineer doesn't even know how to sidechain compressors or trigger gates in the practical, hands-on analog domain, you'll be lost as a modern sound tech, even with YouTube to cover your butt.
 
@MrKlinky I have known quite a few, but of course not all do, and some don't care because of their instrument or style, so they don't listen to other musicians for the sake of precision.

Go ahead and try listening to a mono recording with more complex rhythm parts and delay one channel by a few ms. There's no way that would be tolerable to work with on a professional level if there were a click track or MIDI-sequenced parts.
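That experiment is easy to try digitally: convert the delay to samples, shift one copy, and listen to the pair (or sum them, where a few ms of offset turns into an audible comb filter). A minimal NumPy sketch with noise standing in for the rhythm track; the 3 ms figure is just an example:

```python
import numpy as np

rate = 48000
delay_ms = 3
delay = int(rate * delay_ms / 1000)          # 3 ms -> 144 samples at 48 kHz

rng = np.random.default_rng(1)
mono = rng.standard_normal(rate)             # noise standing in for the track
late = np.concatenate([np.zeros(delay), mono[:-delay]])

# Summed back to one channel, the pair comb-filters: notches appear at
# (2k + 1) / (2 * 0.003 s), i.e. ~167 Hz, 500 Hz, 833 Hz, ...
summed = mono + late
```

Played as left/right instead of summed, the same offset smears the stereo image and the transient timing, which is the effect being described here.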

When Sony put out their 3324 PCM multitrack recorders, some individual tracks would go out of sync by several ms. It drove most engineers who cared enough about doing clean work bonkers. You can hear it on some older CBS/Columbia 80s recordings. Some people refused to work with the Sony machines and went with the 3M recorders, which didn't have the problem and also sounded better.

Elliot Scheiner preferred the 3M PCM tape machines because of the interchannel latency and sync problems, as well as other little gremlins. He did most of the Steely Dan material and Donald Fagen's Nightfly, which was tracked, mixed and mastered on the 3M machine. I would tend to believe he knows what he's doing. I learned quite a few things working with him a few times back east.

Roger Waters' Amused to Death was mixed and mastered on two (loosely) synced 3324s. They constantly went out of sync despite all efforts, i.e. using multiple methods of sync such as SMPTE blind tracks. Even though a lot of amazing talent played on that album and it had already taken two years of work, they came close to giving up because of the PCM machines doing weird things. They generated checksum errors and dropped frames randomly, which made it all wander out of sync by several ms over the course of the mastering process, along with constant clicks, pops, noise and junk appearing more than 80 dB down.

Barry Diament did some of the PCM mixdown and transfer work on the RW album. He explained to me how even small channel-sync errors become audible over a few ms. That's why I know about it. Feel free to contact him yourself and ask personally. Just look up Soundkeepers and the credits for some of the older Atlantic Records digital transfers from analog tape. I used to do similar work myself, so I understand.

A skilled ear can hear the small timing discrepancies I mentioned, but most aren't skilled enough or just don't care... I don't know why I sometimes bother taking the time to share this stuff, since I usually get heckled when others insist their hearing can't pick up on it, so it looks like I'm just talking trash.
 