After the Mix: Encoding and Delivering Dolby Atmos Music

You have completed your Dolby Atmos music mix, bounced it to ADM – either in the Dolby Atmos® Renderer or from Pro Tools – and delivered it to AvidPlay. But what happens next? How is your mix delivered to the consumer? How does the delivery method impact the way your mix sounds to the end user? Knowing the answers to these questions will help you to make informed decisions during the mixing process. This is what I want to address in this blog.

Delivery

There are two ways of delivering a Dolby Atmos mix to your ears: using speakers (whether that's a discrete array or a soundbar) or headphones. Knowing more about how each of these works will help us to understand the difference between encoding processes and, in turn, help our approach to mixing music for Dolby Atmos.

Speakers can reproduce sounds discretely; there is no need to emulate anything for the immersive experience. However, delivery on headphones is quite a different case. In order to deliver an immersive mix on headphones, it will need to be rendered to binaural.

Here is a very comprehensive description of the fundamentals of Binaural Audio.

There are two main codecs, each designed for one of these two playback formats: EC-3 for speaker delivery and AC-4 IMS for headphone delivery. However, it is not as simple as that, as we will see later. We want to be able to check our mix and hear it as close as possible to the way that the consumer hears it; we'll come back to this later in the article.

Before going any further, it would be a good idea to understand why we need to compress data in the first place.

Why reduce data size?

It's important to reduce the size of the data being delivered to the consumer, because the raw data stream would be too large to stream. If we consider the Dolby Atmos mix of a four-and-a-half-minute song, the file size will end up being between 1.8 and 2.5 GB, depending on the number of objects used. How much bandwidth would be required to transmit a full Dolby Atmos data stream? Multiplying the sample rate (48 kHz) by the bit depth (24 bits) and the maximum number of channels (128), and then dividing by 1024 × 1024 [ 48000*24*128/(1024*1024) ], you end up with a data rate of 140.625 Mbps. (Strictly speaking, dividing by 1024 × 1024 gives binary megabits; in decimal units the figure is about 147.5 Mbps.) It would be difficult to deal with this bandwidth for audio streaming, which is why we need to reduce the amount of data that we transmit. There are two steps to achieving this: clustering and encoding.
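To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The function and variable names are my own, chosen for illustration; the numbers are the ones quoted above.

```python
# Uncompressed PCM data rate for a full Dolby Atmos stream, using the
# figures quoted in the text: 48 kHz, 24-bit, 128 channels.
SAMPLE_RATE_HZ = 48_000
BIT_DEPTH = 24
CHANNELS = 128


def raw_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


bits_per_second = raw_bits_per_second(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)

# Dividing by 1024 * 1024 (as the equation in the text does) gives Mibit/s.
rate_mibit = bits_per_second / (1024 * 1024)
# Dividing by 1,000,000 gives decimal Mbit/s.
rate_mbit = bits_per_second / 1_000_000

print(f"{rate_mibit:.3f} Mibit/s")  # 140.625 Mibit/s
print(f"{rate_mbit:.3f} Mbit/s")    # 147.456 Mbit/s
```

Either way you count the megabits, the raw stream is far beyond what a music streaming service can realistically push to a phone.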

Clustering

The first step in reducing the data stream is clustering. Clustering is used in the encoding process to reduce the amount of data used by objects and beds. However, you are also able to monitor your mix with clustering applied by enabling Spatial Coding Emulation in the Dolby Atmos Renderer.

The principle behind clustering is to intelligently combine objects that occupy similar spatial positions into groups, called spatial object groups. Each spatial object group is a composite of the original audio objects it contains. It’s possible to do this without having a detrimental effect on the overall sound of the mix because the typical consumer Dolby Atmos setup has far fewer speakers than a cinema. The number of elements to be monitored is set in the preferences of the Dolby Atmos Renderer when Spatial Coding Emulation is enabled. There are three possible values: 12, 14 and 16. The choice between them usually depends on the bit rate at which the mix is transmitted. If you’re not sure, then use 16, as that is what most streaming platforms use.

The diagrams below should help you to understand how clustering is implemented. The image on the left shows the objects in blue and the bed positions in red. There are 10 objects, nine bed channels, and an LFE channel. If we assume a cluster number of 12, you can see in the image on the right how they are aggregated. Some of the objects are grouped and some are shared between clusters. Thus, we can reduce the total from 20 elements to 12. On a technical note, the LFE channel is left untouched and does not get a positional cluster, so the usable clusters are 11 plus the LFE.

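[Figure: objects (blue) and bed positions (red) before clustering on the left, and aggregated into 12 clusters on the right]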
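If it helps to think about clustering in code, here is a rough Python sketch of the general idea. To be clear, this is not Dolby's spatial coding algorithm: it simply merges the two nearest groups until 11 positional groups remain, and it ignores loudness, content and the sharing of objects between clusters. The coordinates, the 7.1.2-style bed layout and all of the names are assumptions made up for the illustration.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]  # (x, y, z) in a hypothetical unit room


def centroid(points: list[Point]) -> Point:
    """Average position of a group of points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def cluster_positions(positions: list[Point], target: int) -> list[list[int]]:
    """Merge the two nearest groups repeatedly until `target` groups remain.

    Returns each group as a list of indices into `positions`.
    """
    groups = [[i] for i in range(len(positions))]
    while len(groups) > target:
        centres = [centroid([positions[i] for i in g]) for g in groups]
        # combinations() yields pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda pair: math.dist(centres[pair[0]], centres[pair[1]]),
        )
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


# Nine bed channels, laid out like a 7.1.2 bed without the LFE (assumed).
beds = [(-1, 1, 0), (1, 1, 0), (0, 1, 0), (-1, 0, 0), (1, 0, 0),
        (-1, -1, 0), (1, -1, 0), (-0.5, 0, 1), (0.5, 0, 1)]
# Ten objects at made-up positions.
objects = [(-0.9, 0.9, 0.1), (0.8, 0.9, 0.0), (0.1, 0.8, 0.2), (-0.2, 0.3, 0.9),
           (0.4, -0.1, 1.0), (-0.7, -0.8, 0.3), (0.9, -0.7, 0.1), (0.0, 0.0, 0.5),
           (-0.3, -0.9, 0.0), (0.6, 0.4, 0.6)]

groups = cluster_positions(beds + objects, target=11)
total_clusters = len(groups) + 1  # the LFE is carried separately
print(total_clusters)  # 12
```

The point of the sketch is only the bookkeeping: 19 positional elements collapse into 11 groups, and the LFE rides along as the twelfth.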

Monitoring your mix with Spatial Coding Emulation enabled is important. It will let you understand how object size, position, number of elements, and so on, will affect the sound of the encoded mix. One example of this is that, if you increase the size of an object beyond about 20, the same object could appear in more than one cluster or there could be decorrelation artefacts, which in turn will skew the sound of your mix. You must also ensure that you do not enable the emulation until all the mix elements are present, as the clustering is based on object content, position, loudness and so on. Enabling the option without all the mix elements present will not give a true picture of how the mix will sound. Finally, it’s important to bear in mind that Spatial Coding Emulation is for monitoring only. The clustering is not exported to your ADM or Dolby Atmos master file.
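As a simple illustration of the size guideline, here is a small Python sketch that flags objects whose size is above 20. The object list and the MixObject structure are hypothetical; in practice you would read these values off your panners rather than from code.

```python
from dataclasses import dataclass

# From the guideline above: sizes beyond about 20 can cause issues.
SIZE_WARNING_THRESHOLD = 20


@dataclass
class MixObject:
    name: str    # hypothetical track label
    size: float  # object size as set in the panner


def oversized(objects: list[MixObject],
              threshold: float = SIZE_WARNING_THRESHOLD) -> list[str]:
    """Names of objects whose size exceeds the threshold."""
    return [o.name for o in objects if o.size > threshold]


mix = [
    MixObject("Lead Vocal", 0),
    MixObject("Pad", 35),
    MixObject("Guitar", 20),
    MixObject("Reverb Return", 50),
]

flagged = oversized(mix)
print(flagged)  # ['Pad', 'Reverb Return']
```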

Encoding

Encoding is where we take the clustered signals and compress them to reduce the file size. Let’s look at the encoded formats to understand why it is important for us as engineers to know about them. We will focus on the two most used codecs: AC-4 and EC-3.

AC-4 IMS – Delivery Format for Headphones

AC-4 is an audio codec for traditional channel-based content, immersive channel-based content, object-based immersive content and for audio supporting personalization. It supports object-based content as discrete objects or as Spatial Object Groups as we discussed above. AC-4 handles objects using a method called Advanced Joint Object Coding (A-JOC) where the mix is first downmixed to a 7.1 version and the object details are added as metadata. This is then decoded at the playback stage.

AC-4 is the codec that is used to deliver your Dolby Atmos music to Android devices over streaming platforms. AC-4 can also carry the binaural metadata that we create during a Dolby Atmos Music mix. This means that when your mix is played over headphones, the binaural properties set during the mix process will be heard by the listener.

EC-3 (or Enhanced AC-3) – Delivery Format for Speakers

EC-3 handles objects using a slightly different method called DD+JOC (Dolby Digital Plus with Joint Object Coding). EC-3 is used to deliver your Dolby Atmos mix to Apple devices.

On an Apple TV 4K, the audio is transmitted over HDMI to an Atmos-enabled soundbar or AV receiver. However, EC-3 is also used for headphone delivery on the Apple iPhone, even though it is a format that is designed for speaker delivery. An iPhone doesn’t use the Dolby Atmos binaural settings that are baked into the ADM file that we created. Instead, it creates a binaural version of the mix by first downmixing the Dolby Atmos file into a 5.1.4 mix and then virtualizing that 5.1.4 mix into a binaural mix. In fact, this processing is done in the AirPods themselves.

This is really important information to keep in mind when auditioning Atmos mixes for Apple Music. If you want to hear how your mix will sound on an iPhone, you will need to follow the steps below:

Export an MP4 from the Dolby Atmos Renderer: after recording the master in the Renderer, go to File > Export Audio > MP4, choose the setting for Music, then click OK to export.
Transfer this MP4 to your Apple device and save it in the Files app.
Play back the MP4 on the device from the Files app and monitor using AirPods Pro or AirPods Max headphones. Make sure that you disable head tracking.

Conclusions

In summary, these are the points that you should bear in mind during the mixing process:

The Near, Mid and Far parameters in a binaural mix are only utilized by the AC-4 codec that is used for delivery on Android devices.
EC-3 is a speaker-based format utilized by Apple devices and the binaural parameters encoded into the ADM will not be used during playback. This can be tested by exporting an MP4 from the renderer and following the steps above.
Increasing the size value of an Atmos Object beyond 20 should be avoided because it can cause issues with the spatial coding process.
Turn on Spatial Coding Emulation only when you have all the elements of the final mix.
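The delivery paths described in this article can also be summarised as a small lookup table. This is only a sketch of the points above; the platform keys and field names are my own labels for illustration, not anything defined by Dolby or Apple.

```python
from typing import NamedTuple


class DeliveryPath(NamedTuple):
    codec: str
    object_coding: str
    uses_binaural_metadata: bool
    note: str


# Keys are hypothetical labels for the playback scenarios in the article.
DELIVERY = {
    "android_headphones": DeliveryPath(
        "AC-4 IMS", "A-JOC", True,
        "binaural settings from the mix are heard by the listener",
    ),
    "apple_tv_4k_speakers": DeliveryPath(
        "EC-3", "DD+JOC", False,
        "sent over HDMI to an Atmos-enabled soundbar or AV receiver",
    ),
    "iphone_headphones": DeliveryPath(
        "EC-3", "DD+JOC", False,
        "downmixed to 5.1.4, then virtualized to binaural",
    ),
}


def paths_ignoring_binaural_metadata() -> list[str]:
    """Scenarios where the binaural settings in the ADM are not used."""
    return sorted(k for k, v in DELIVERY.items() if not v.uses_binaural_metadata)


print(paths_ignoring_binaural_metadata())
# ['apple_tv_4k_speakers', 'iphone_headphones']
```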

How does all of this influence your mixing process? Obviously you want to hear your mix as the consumer hears it, as far as possible. You can emulate the effect of the clustering process within the Dolby Atmos Renderer itself. However, whilst you will hear your mix through the EC-3 codec using the MP4 method described above, you cannot easily emulate the sound of the AC-4 codec. Experience will help in this regard and as you mix more Atmos content, you will better understand how well your mixes translate to different platforms. To give you an idea of how I work, I like to start the mix on headphones as I know that is how most people will hear it. I regularly send the mix to my phone to check it. I also use the calibration EQ on the MTRX Studio to help to emulate the sound of different types of headphones.

Thank you for taking the time to read this article and I hope that it is useful for you as you create your Atmos mixes. I’d also like to thank my colleague Dave Tyler for his contributions to this blog. Should you want to learn even more about mixing in Atmos, Avid even offers a Dolby Atmos Training and Certification Program.

Dolby Atmos is a registered trademark of Dolby Laboratories.

Sreejesh Nair

I am a Pro Audio Solution Specialist with Avid and an award-winning re-recording mixer. I have worked on more than 200 films in various languages in my career, from mono to Dolby Atmos. More than 1/3 of my life has been cinema and I have great joy in sharing my techniques with everyone.