Audio multi-channel routing and mixing using alsalib

Recently, one of our customers designing an embedded Linux system with specific audio needs had a sound card with more than one audio channel and needed to separate individual channels so that different applications could use them. This is a fairly common use case, so in this blog post we share how we achieved it for both input and output audio channels.

The most common use case is splitting a 4- or 8-channel sound card into multiple stereo PCM devices. For this, alsa-lib, the userspace interface to the ALSA drivers, provides PCM plugins. These plugins are set up through configuration files, usually /etc/asound.conf or $HOME/.asoundrc. However, through the configuration in /usr/share/alsa/alsa.conf, it is also possible, and in fact recommended, to use a card-specific configuration file named /usr/share/alsa/cards/<card_name>.conf.

The syntax of this configuration is documented in the alsa-lib configuration documentation, and the most interesting part of the documentation for our purpose is the pcm plugin documentation.

Audio inputs

For example, let’s say we have a 4-channel input sound card, which we want to split into 2 mono inputs and one stereo input, as follows:

[Figure: audio input example]

In the ALSA configuration file, we start by defining the input pcm:

pcm_slave.ins {
	pcm "hw:0,1"
	rate 44100
	channels 4
}

pcm "hw:0,1" refers to device 1 of card 0, that is, the second PCM device of the first sound card present in the system. In our case, this is the capture device. rate and channels specify the parameters of the stream we want to set up for the device. They are not strictly necessary, but setting them makes it possible to enable automatic sample rate or size conversion if desired.

Then we can split the inputs:

pcm.mic0 {
	type dsnoop
	ipc_key 12342
	slave ins
	bindings.0 0
}

pcm.mic1 {
	type plug
	slave.pcm {
		type dsnoop
		ipc_key 12342
		slave ins
		bindings.0 1
	}
}

pcm.mic2 {
	type dsnoop
	ipc_key 12342
	slave ins
	bindings.0 2
	bindings.1 3
}

mic0 is of type dsnoop, the plugin that splits capture PCMs. The ipc_key is an integer used internally to share buffers. It has to be unique for each slave PCM, which is why the three PCMs that share the same slave use the same key here. slave indicates the underlying PCM that will be split: it refers to the PCM device we defined before, with the name ins. Finally, bindings is an array mapping the channels of the PCM to the channels of its slave. This is why mic0 and mic1, which are mono inputs, both only use bindings.0, while mic2, being stereo, has both bindings.0 and bindings.1. Overall, mic0 will have channel 0 of our input PCM, mic1 will have channel 1 of our input PCM, and mic2 will have channels 2 and 3 of our input PCM.
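To make the channel mapping concrete, here is a minimal Python sketch of what the bindings select from each captured frame. It is only an illustration, not alsa-lib code: the apply_bindings helper and the sample values are invented for this example.

```python
# Illustration only: mimics what the dsnoop "bindings" select.
# apply_bindings and the sample values are hypothetical, not part of alsa-lib.

def apply_bindings(frames, bindings):
    """Keep only the slave channels named in `bindings`.

    frames:   list of tuples, one tuple of samples per slave frame
    bindings: dict mapping PCM channel index -> slave channel index
    """
    order = [bindings[ch] for ch in sorted(bindings)]
    return [tuple(frame[slave_ch] for slave_ch in order) for frame in frames]


# Two frames captured from the 4-channel slave "ins" (made-up samples).
ins_frames = [(10, 20, 30, 40), (11, 21, 31, 41)]

mic0 = apply_bindings(ins_frames, {0: 0})        # bindings.0 0
mic1 = apply_bindings(ins_frames, {0: 1})        # bindings.0 1
mic2 = apply_bindings(ins_frames, {0: 2, 1: 3})  # bindings.0 2, bindings.1 3

print(mic0)  # [(10,), (11,)]
print(mic1)  # [(20,), (21,)]
print(mic2)  # [(30, 40), (31, 41)]
```

Each PCM only ever sees the channels it is bound to, which is why no extra copy of the full 4-channel stream is needed per application.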

The final interesting thing in this example is the difference between mic0 and mic1. While mic0 and mic2 do not convert their stream and pass it as is between the application and the slave PCM, mic1 uses the automatic conversion plugin, plug. Whatever stream format the application requests, the data provided by the sound card is converted to the requested format and rate. This conversion is done in software and so runs on the CPU, which is usually something to avoid on an embedded system.

Also, note that the channel splitting happens at the dsnoop level. Doing it at an upper level would mean that the 4 channels would be copied before being split. For example the following configuration would be a mistake:

pcm.dsnoop {
    type dsnoop
    ipc_key 512
    slave {
        pcm "hw:0,0"
        rate 44100
    }
}

pcm.mic0 {
    type plug
    slave dsnoop
    ttable.0.0 1
}

pcm.mic1 {
    type plug
    slave dsnoop
    ttable.0.1 1
}

Audio outputs

For this example, let’s say we have a 6-channel output that we want to split into 2 mono outputs and 2 stereo outputs:

[Figure: audio output example]

As before, let’s define the slave PCM for convenience:

pcm_slave.outs {
	pcm "hw:0,0"
	rate 44100
	channels 6
}

Now, for the split:

pcm.out0 {
	type dshare
	ipc_key 4242
	slave outs
	bindings.0 0
}

pcm.out1 {
	type plug
	slave.pcm {
		type dshare
		ipc_key 4242
		slave outs
		bindings.0 1
	}
}

pcm.out2 {
	type dshare
	ipc_key 4242
	slave outs
	bindings.0 2
	bindings.1 3
}

pcm.out3 {
	type dmix
	ipc_key 4242
	slave outs
	bindings.0 4
	bindings.1 5
}

out0 is of type dshare. While dmix is usually presented as the reverse of dsnoop, dshare is more efficient, as it simply gives exclusive access to channels instead of potentially mixing multiple streams into one in software. Again, the difference can be significant in terms of CPU utilization in the embedded space. The rest follows the same pattern as the audio input example:

  • out1 is allowing sample format and rate conversion
  • out2 is stereo
  • out3 is stereo and allows multiple concurrent users that will be mixed together as it is of type dmix

A common mistake here would be to use the route plugin on top of dmix to split the streams: this would first transform each mono or stereo stream into a 6-channel stream and then mix them all together. All these operations would be costly in CPU utilization, while dshare is basically free.
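The difference between the two plugins can be sketched with a toy Python model of one 6-channel output frame. This is not how alsa-lib is implemented: dshare_write, dmix_write, the sample values and the 16-bit clipping are simplifying assumptions made for this illustration.

```python
# Illustration only: a toy model of dshare and dmix, not alsa-lib code.
# The helper names, sample values and 16-bit clipping are assumptions.

def dshare_write(slave_frame, client_frame, bindings):
    """Copy each client channel into the slave channel it owns exclusively."""
    for client_ch, slave_ch in bindings.items():
        slave_frame[slave_ch] = client_frame[client_ch]


def dmix_write(slave_frame, client_frame, bindings, lo=-32768, hi=32767):
    """Add each client channel to what is already there, clipped to 16 bits."""
    for client_ch, slave_ch in bindings.items():
        mixed = slave_frame[slave_ch] + client_frame[client_ch]
        slave_frame[slave_ch] = max(lo, min(hi, mixed))


frame = [0] * 6  # one frame of the 6-channel slave "outs"

dshare_write(frame, [100], {0: 0})             # out0: mono on channel 0
dshare_write(frame, [200, 300], {0: 2, 1: 3})  # out2: stereo on channels 2 and 3
dmix_write(frame, [1000, 2000], {0: 4, 1: 5})  # out3, first application
dmix_write(frame, [500, 32000], {0: 4, 1: 5})  # out3, second application

print(frame)  # [100, 0, 200, 300, 1500, 32767]
```

The dshare path is a plain copy into channels nobody else touches, while the dmix path has to read, add and clip every sample, which is where the extra CPU cost comes from.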

Duplicating streams

Another common use case is trying to copy the same PCM stream to multiple outputs. For example, we have a mono stream, which we want to duplicate into a stereo stream, and then feed this stereo stream to specific channels of a hardware device. This can be achieved using the following configuration snippet:

pcm.out4 {
	type route;
	slave.pcm {
		type dshare
		ipc_key 4242
		slave outs
		bindings.0 0
		bindings.1 5
	}
	ttable.0.0 1;
	ttable.0.1 1;
}

The route plugin duplicates the mono stream into a stereo stream, using the ttable property. Then, the dshare plugin takes the first channel of this stereo stream and sends it to the first hardware channel (bindings.0 0), while sending the second channel of the stereo stream to the sixth hardware channel (bindings.1 5).
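The two stages can be traced by hand with a short Python sketch. It is only a model of the data flow, assuming floating-point samples: the route helper and the sample values are invented for this illustration and are not part of alsa-lib.

```python
# Illustration only: models the route + dshare data flow of out4.
# The route helper and the sample values are hypothetical.

def route(frames, ttable, out_channels):
    """Apply a transfer table: out[o] is the sum of in[i] * ttable[(i, o)]."""
    routed = []
    for frame in frames:
        out = [0.0] * out_channels
        for (in_ch, out_ch), gain in ttable.items():
            out[out_ch] += frame[in_ch] * gain
        routed.append(out)
    return routed


mono = [(0.25,), (-0.5,)]            # two frames of a mono stream
ttable = {(0, 0): 1.0, (0, 1): 1.0}  # ttable.0.0 1 and ttable.0.1 1
stereo = route(mono, ttable, 2)

bindings = {0: 0, 1: 5}              # bindings.0 0 and bindings.1 5
hw_frames = []
for stereo_frame in stereo:
    hw = [0.0] * 6                   # one frame of the 6-channel slave "outs"
    for client_ch, slave_ch in bindings.items():
        hw[slave_ch] = stereo_frame[client_ch]
    hw_frames.append(hw)

print(stereo)     # [[0.25, 0.25], [-0.5, -0.5]]
print(hw_frames)  # [[0.25, 0.0, 0.0, 0.0, 0.0, 0.25], [-0.5, 0.0, 0.0, 0.0, 0.0, -0.5]]
```

The same mono sample ends up on hardware channels 0 and 5, and the four channels in between are left untouched for other PCMs.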

Conclusion

When properly used, the dsnoop, dshare and dmix plugins can be very efficient. In our case, simply rewriting the alsalib configuration on an i.MX6 based system with a 16-channel sound card dropped the CPU utilization from 97% to 1-3%, leaving plenty of CPU time to run further audio processing and other applications.

Author: Alexandre Belloni

Alexandre is Bootlin's co-owner and COO. Alexandre joined as a kernel and embedded Linux engineer in 2013, and became co-owner and COO in 2021.

16 thoughts on “Audio multi-channel routing and mixing using alsalib”

  1. Good Day,

    Thank you for your post it really helped me understand how to configure an 8 channel Input USB Audio Interface to expose each input channel as a separate virtual “device”.
    I did not however fully understand the statement “channel splitting happens at the dsnoop level” and the example given.

    In my usecase the device only allows access to the hw once. This would mean if I used “hw:0,0” notation I would only be able to run one “arecord” command. The second command would fail with an access error.

    For this reason the bindings method did not work but the ttable method worked.

    Is this a big concern or would I be okay?

  2. Hi!

    I am a composer. I exclusively write orchestral. This needs special care. In a nutshell, 2 channel stereo for orchestral rendition is a huge letdown. Recordings with live orchestras are done on 24-36 independent channels/tracks (analog tape) and sometimes much more due to digital technology (on tapeless digital recorders). Nevertheless, no matter how many recorded channels in the file, all of the tracks are condensed into 2 stereo channels during mixing sessions. On consumer equipment the sound usually is atrocious in high dynamic situations, ex. violins harsh in forte, thread-like in piano, full mixes’ chords lack transparency (one hears chords but not the component sounds), etc. Long story short, I am looking at linux software packages that have multichannel outputs for 5.1 and up, and audio equipment (AV receivers with better DA conversion, multichannel amplifiers, etc.) There already are multichannel DAWs (Digital Audio Workstations with large amounts of MIDI tracks) but I don’t know which of the current Linux operating systems have the best multichannel outputs, that is, at least 8 INDEPENDENT channels that will separately transmit orchestral compartments, i.e. high strings, low strings, high wood winds, low woodwinds and horns, trumpets, low brass, percussion, a total of eight tracks out to 8 channels and 8 (pre)amplifier inputs. The OS would be complete, that is, for a composer on deadline there simply is no time for programming and fishing for plugins, mods, technical artifices. I’d build the system and go ahead and write the music. This is what I am looking for. By far no offense, but no time for geeking here. Multichannel is already 11.2 for the cinema, 5.1 is ancient, 7.1 routine, but software doesn’t seem to keep pace, not by a long shot.

    Thank you for reading, John.

    1. Hello,

      Any recent Linux distribution is able to handle an arbitrary number of input and output channels, provided the hardware exposes them properly. We have been routinely working on 8- to 16-channel sound configurations for our customers. Regarding software, all the common tools support many channels. Downmixing should not happen on its own.

  3. Hi,

    Great article! I too am using imx (6,7d,8) platforms for headless audio streamers but there’s always been one unresolved headache that is still causing problems.. It’s regarding a term called ‘convenience switching’ where multiple applications like MPD, Roon, Bluetooth, Airplay etc all need direct raw access to the same PCM but only one at a time. A bit like the rotary source select switch on an old audio amplifier. In use all applications are seen on the network but as the user connects and uses one ( i.e. Roon ) the other embedded music services must yield the PCM if they own it. Dmix and other audio mixers cannot be used as very high audio rates are expected hence using hw:x,x not plughw or anything else.

    Any suggestions how the PCM output could be simply switched? I’ve tried Jack and various other mechanisms but that just added more complexity when dynamically switching music sources.
    The only other way I considered is a custom version of the snd usb driver ( it’s only ever USB PCMs ) which creates 5 duplicate sound cards from the one physical PCM and, using an ioctl or something, switches the internals (buffers in/out ) in a rotary switch fashion depending on source request.

    Cheers..

    1. There is indeed no support for dynamically changing PCM routes in alsalib so I’d say the solution would be to either ensure only one of the app is opening the PCM at any time or installing a sound server. The popular options are indeed jack and pulseaudio but pipewire looks very very promising and I plan to use that exclusively from now on.
      The last option would be to write your own ALSA plugin to do that; this would certainly be cleaner than implementing the functionality in the kernel.

  4. Hi there
    Very interesting information. I am currently working on a 48-Channel recorder, based on an AES50 > USB Interface to Raspberry Pi 4B to 480GB SSD Kingston. Programming is ready, using SoX. In Python this looks like (snippet of the code which takes care of starting the recorder):
    my_env = os.environ.copy()
    my_env['AUDIODRIVER'] = 'alsa'
    my_env['AUDIODEV'] = 'hw:KTUSB,0'
    ledButton["text"] = "Turn Recorder off"
    filepath = "/media/pi/RECORDER1/Aufnahmen/"
    filedt = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    filename = filepath + filedt + "_recording.caf"
    subprocess.Popen(['rec', '-q', '--endian', 'little', '--buffer', '4096000', '-c', '48', '-b', '24', filename], env=my_env, shell=False, stdout=subprocess.PIPE)

    We are recording in .caf files as we had trouble to use .wav for reason of the 4GB limit (over 4GB the header becomes corrupt, the duration is not reported correctly).

    I am fighting with one issue and that is buffer-overrun. Seems to me that writing to disc sometimes suffers of other cpu-activity on the Pi. In a 90 minutes recording of 48 Channels we got around 10 dropouts. Filesize approx. 36 GB. I am looking for assistance here.

    A second thought is to equip the recorder with a 7″ touch and have on this screen not only start/stop recorder, but also VU Meters for all channels. Here might come in your further developments of ALSA, where we might be able to route the incoming audio not only to the SoX rec-function, but also to the VU-Meter. We are in contact with peppymeter https://github.com/project-owner/PeppyMeter.doc/wiki to see if we can work together to build a 48 channel VU Meter. The current peppy-meter works only during play and is a 2-channel solution.

    Would you be willing to help – conditions willing to speak about, as this might become a commercial product.
    Regards, Rudolf

    1. Hi Rudolf.

      Concerning writing to disc, the following may help.

      Consider increasing the RAM size substantially and creating a RAM disk. This is done with one simple command.

      RAM disc acts like a HD, but exists in RAM only. It is volatile storage, very quick, and I expect writing to it places less load on CPU and other parts of the system.

      Once you get the job done, simply copy the file from RAM disc to non-volatile storage like HDD or SSD.

  5. Great article.
    Splitting outputs was exactly what I was looking for!
    There is a small error in your config though:

    pcm_slave.outs {
    pcm "hw:0,0"
    rate 44800
    channels 6
    }

    Rate 44800 does not exist, and this made things not work at first.
    Changing it to 44100 (or 48000) solved the problem.
    Thanks again.

  6. Hello!

    Thanks for the article. It really helped me to understand things about configuring ALSA devices (which seems a topic without introductory or easy-to-find documentation).

    I’m working on a embedded system with a sound card of 8 channels that are filled by different chips, so this method will be useful for us. I would like to ask if you could explain how these new virtual devices can be exposed through a sound server like pulseaudio or pipewire. I’ve been trying to use “load-module module-alsa-source device=device_name” but that doesn’t work. Could you point me to the right direction?

    Thanks again for the audio articles!

  7. Hi,

    hope this is the right place to ask this. I have an issue on my embedded linux device where sound on the headphone that is only supposed to be played on the left channel is also played on the right channel (quieter) and vice versa. Could that be fixed by using dsnoop, dshare or dmix?

    1. This seems to me that you have some crosstalk. If the stream you play has left and right well separated, this can be caused by a mismatch in the audio format between your CPU DAI and your CODEC DAI (for example left justified versus i2s). Else, the hardware is causing that.

  8. Hello and thanks for the article. Hope you can help with my question.
    I do not understand something.
    I have simple USB audio dongle and “aplay -l” and “arecord -l” give the same output:

    pi@box1:/home $ arecord -l
    **** List of CAPTURE Hardware Devices ****
    card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
    Subdevices: 1/1
    Subdevice #0: subdevice #0
    pi@box1:/home $ aplay -l
    **** List of PLAYBACK Hardware Devices ****
    card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
    Subdevices: 1/1
    Subdevice #0: subdevice #0
    pi@box1:/home $

    It has mono capture and stereo playback !
    My Linux software expects stereo mic. how would I convert this mic to stereo ?

    Also when I say hw:1,0 – does it refer to playback device or capture device ?
    How would I change channel bindings for capture device and leave them intact for playback device if they both are the same “hw:1,0” ?

    P.S. I use only ALSA without X-server on Raspbian Lite OS.

  9. Hey,
    this is one of the best articles describing the configuration of ALSA! Thanks for your great work. However, I have some trouble understanding the binding option for the audio output.
    It says
    bindings.0 2
    bindings.0 3
    for out2. Could it be possible, that there is a typo in the config and it actually shall rather be
    bindings.0 2
    bindings.1 3
    Same would apply for out3

    Cheers
