Sound Guide for UHD and 4K 709 MediaRoom, part 2

The next section is dedicated to ‘Immersive sound, 3D sound or object-based sound’, which has burst into the world of cinema. The best known example is ‘Dolby Atmos’, but there are also other systems competing in this emerging market.

Immersive sound, 3D sound or object-based sound

The new paradigm in audio is immersive sound, also called 3D sound or object-based sound, which has applications in theatrical exhibition, in virtual reality and in home viewing of audiovisual content. The new model is no longer based on the number of channels: an immersive mix places each sound source, called an object, in 3D space.

 

Schematic of immersive sound reception. Source: Newsbytes

 

The sound field perceived by humans is like a sphere: we sit inside it, listening to different sound sources or objects. With channel-based sound systems, creators have to think about the configuration the end customer has (2.0, 5.1 or 7.1) for spatial effects to work. Object-based audio systems offer more creative freedom because a spatial description of each object’s location is included in the metadata. From this metadata, the audio processor adjusts the mix to the number of loudspeakers available and their position in each room, automatically determining how to create the most immersive sound field possible.
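As a rough illustration of how a processor might turn positional metadata into loudspeaker feeds, the Python sketch below derives one gain per speaker from the distance between an object and each speaker. It is not the rendering algorithm of Dolby, DTS or Auro; the function `render_gains`, the `rolloff` parameter and both speaker layouts are hypothetical and chosen only to show the same object being adapted to two different rooms.

```python
import math

def render_gains(obj_pos, speakers, rolloff=2.0):
    """Return one gain per speaker for an object at obj_pos = (x, y, z).

    Assumption: simple distance-based panning. Closer speakers get more
    level, and gains are normalized so total power stays constant.
    """
    weights = [1.0 / (math.dist(obj_pos, spk) ** rolloff + 1e-9)
               for spk in speakers]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

# Hypothetical layouts: coordinates are (x, y, z) with z as height.
STEREO = [(-1.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
WITH_HEIGHT = STEREO + [
    (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0),  # rear left, rear right
    (-1.0, 0.0, 1.0), (1.0, 0.0, 1.0),    # top left, top right
]

# One object, described once in the metadata, rendered for two rooms.
obj = (-0.5, 0.5, 0.8)
for name, layout in (("stereo", STEREO), ("with height", WITH_HEIGHT)):
    gains = render_gains(obj, layout)
    print(name, [round(g, 2) for g in gains])
```

The point of the sketch is that the mix stores the position of the object, not a speaker assignment, so the same metadata yields different gains for each installation.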

 

Dolby user interface for placement of sound objects in 3D space. Source: Dolby

 

Placement of side and top speakers in a Dolby Atmos installation. Source: Dolby

 

This flexibility, facilitated by object-based sound metadata, is accompanied by other innovations, including:

  • Speakers on vertical axis
  • Loudspeakers able to reproduce more of the audible spectrum (20 Hz to 20 kHz)
  • Subwoofers not only behind the screen but also in the rear and side areas of the room
  • Specific dynamic range management for each playback environment
  • Increased accuracy of sound source panning
  • Increased accuracy and complexity of reverberations and delays

 

Object-based sound design makes an immersive audio experience possible with any speaker configuration although, obviously, the more speakers, the better the immersive experience, whether in a theater or at home (Home Cinema).

 

Home Cinema speaker configuration. Source: Auro 3D

 

A three-number code designates the different loudspeaker configurations. For example, in 9.2.4, the first number (9) indicates the number of traditional loudspeakers, the second (2) the number of subwoofers and the third (4) the maximum number of height or vertical-axis speakers.
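The naming scheme is simple enough to parse mechanically. The following Python sketch splits a designation such as 9.2.4 into its three parts; the names `SpeakerLayout` and `parse_layout` are illustrative, and treating a two-part code such as 5.1 as having zero height speakers is an assumption made for convenience.

```python
from typing import NamedTuple

class SpeakerLayout(NamedTuple):
    main: int        # traditional (ear-level) loudspeakers
    subwoofers: int  # subwoofer channels
    height: int      # height or vertical-axis speakers

    @property
    def total(self) -> int:
        return self.main + self.subwoofers + self.height

def parse_layout(code: str) -> SpeakerLayout:
    """Parse a designation such as '9.2.4' or '5.1'."""
    parts = [int(p) for p in code.split(".")]
    if len(parts) == 2:
        parts.append(0)  # assumption: no third number means no height speakers
    if len(parts) != 3:
        raise ValueError(f"unexpected layout code: {code!r}")
    return SpeakerLayout(*parts)

layout = parse_layout("9.2.4")
print(layout, layout.total)
```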

 

 

Understanding the numbers: on the right with direct ceiling speakers and on the left using reflected sound. Source: Onkyo

 

In cinema there are now three competing immersive sound systems:

  • In 2012, Dolby launched ‘Dolby Atmos’ with the film Brave (2012, Pixar, Mark Andrews & Brenda Chapman).
  • Three years later, in 2015, DTS entered the scene by introducing its object-based sound mixing system, ‘DTS:X’, for the time being more focused on consumer electronics, such as 4K Blu-ray.
  • The third contender is ‘Auro 11.1’, from the company Barco, which specializes in digital projection for cinema. The first film to use Auro was Red Tails (2012, Lucasfilm, Anthony Hemingway).

 

The biggest obstacle to implementing these immersive systems is interoperability: each exhibitor has to bet on one system and install it in its theaters. Fortunately, in the domestic environment, most components (UHD TVs, AV receivers, multimedia players…) support codecs from several manufacturers at the same time.

Due to the diversity of sound installations in theaters and homes, the same film must be able to be heard in stereo, 5.1, 7.1 and immersive 3D sound. These conversions are based on the downmix concept.

Downmixing is the procedure by which a mix based on a multi-channel/speaker configuration is reduced to a smaller configuration, e.g. going from a 5.1 mix to a stereo mix. It is not only a question of repositioning the sound sources, but also of delays, volume reduction coefficients, dynamics and equalization. The downmix process has to guarantee that a movie will be heard faithfully in any home installation.
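A minimal Python sketch of the 5.1-to-stereo case follows. It assumes the commonly used -3 dB (about 0.707) attenuation for the center and surround channels and discards the LFE channel, which is a frequent but not universal choice; real downmixes also deal with delays, dynamics and equalization, which are omitted here. The function name and the channel keys are illustrative.

```python
import math

ATT = 1 / math.sqrt(2)  # -3 dB, a commonly used downmix coefficient

def downmix_51_to_stereo(ch):
    """Fold a 5.1 mix into stereo.

    ch maps "L", "R", "C", "Ls", "Rs" to equal-length lists of samples
    in the range -1.0..1.0. An "LFE" entry, if present, is ignored.
    """
    left, right = [], []
    for l, r, c, ls, rs in zip(ch["L"], ch["R"], ch["C"], ch["Ls"], ch["Rs"]):
        left.append(l + ATT * c + ATT * ls)
        right.append(r + ATT * c + ATT * rs)
    # Volume reduction: scale everything down if the sum would clip.
    peak = max([1.0] + [abs(s) for s in left + right])
    return [s / peak for s in left], [s / peak for s in right]

# Dialogue only in the center channel ends up equally in both outputs.
mix = {"L": [0.0], "R": [0.0], "C": [1.0], "Ls": [0.0], "Rs": [0.0]}
print(downmix_51_to_stereo(mix))
```

Scaling the whole signal by its peak is the crudest possible protection against saturation; a production downmix would normally use a limiter or a fixed headroom instead.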

For the adaptation of the object-based sound to the different domestic reception scenarios, the metadata associated with the MXF sound file is used.

“The playback equipment, when it reads the MXF with the metadata, has to detect the processor and the distribution mode, and then it makes the spatiality and dynamics adjustments. For example, in downscaling from 64 sound channels to 16, there is a brutal dynamics conflict. If it is not done well, certain speakers can end up saturated. That spatiality adjustment and that second dynamics adjustment are among the most interesting things that ‘Dolby Atmos’, ‘DTS:X’ or ‘Auro 11.1’ bring.” (Sergio Márquez)

 

Metadata of a Dolby-prepared MXF file

 

Metadata from a DTS-ready MXF file

 

Sound studios with Dolby’s ‘Dolby Atmos Premier’ certification can have up to 128 mapped channels for 64 simultaneous outputs. Thanks to the hardware they are equipped with, they can check in real time how the result sounds in smaller configurations such as 16 speakers, a 5.1 environment or even stereo.

In Spain there are still few theaters that have opted for immersive sound, but fortunately, the number is growing. The large chains, such as Kinepolis, Odeon or Warner, have opted for the Dolby proposal. ‘Auro 11.1’ has very low levels of implementation for two reasons: the first is that it is the least immersive of the contending formats; the second and main one is that it is practically only installed when the exhibitor has previously purchased a DCI projector of the same brand, Barco.

 

Sound for movie theaters

The DCI standard for DCP servers is the technical reference used for digital cinema exhibition in theaters.

These are the basic recommendations made by Fernando Alfonsin in relation to audio for the creation of a DCP:

  • A DCP can include up to 32 linear PCM audio tracks.
  • The most common configurations are 5.1, 7.1 and stereo 2.0.
  • To create a DCP, each channel must be delivered as a separate mono .WAV file with a resolution of 24 bits and a sample rate of 48 kHz or 96 kHz.
  • Once all the channels have been processed, an MXF file is created with them.
  • The multichannel sound, being PCM, does not require Dolby encoding and therefore saves on the cost of this license.
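The delivery requirements in the list above (mono, 24 bits, 48 kHz or 96 kHz) can be checked before building the DCP. The following Python sketch uses the standard-library `wave` module; the function name `check_dcp_wav` and the wording of the messages are illustrative, and `wave` may reject some WAV variants that dedicated mastering tools accept.

```python
import wave

ALLOWED_RATES = (48_000, 96_000)  # sample rates named in the recommendations

def check_dcp_wav(path):
    """Return a list of problems found in one channel file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 3:  # 3 bytes per sample = 24 bits
            problems.append(f"expected 24-bit, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() not in ALLOWED_RATES:
            problems.append(f"unsupported sample rate {wav.getframerate()} Hz")
    return problems
```

A typical use would be to run the check over every channel file of a mix (for example the six files of a 5.1) and refuse to continue if any list comes back non-empty.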

When mastering a film for DCP, the following multichannel audio configurations are possible:

 

Configuration    Channels and format
Stereo           2 PCM: left, right
LCR              3 PCM: left, center, right
5.1              6 PCM: front L, C, front R, surround L, surround R + subwoofer
7.1              8 PCM: as 5.1, with rear L and rear R channels added
Immersive        MXF: object sound, 3D sound

 

Stereo is not used in cinema. The minimum configuration includes a third, central channel: LCR (left, center, right). Stereo is only found in some advertising pieces or in adaptations from other media that are also shown in theaters (unfortunately without their mix having been adapted).

The center channel was already used decades ago, so in DCP for theatrical exhibition it is highly recommended to mix including this third channel. Even for short films with very tight budgets, the recommendation is to at least make an LCR.

In fiction or documentary feature films, the usual configuration is 5.1. In Spain, almost nothing is being done in 7.1. Almost everything is 5.1.

Immersive / object / 3D sound is not yet widespread. Dolby Atmos has recently begun to be adopted, although not many productions make this type of mix yet.

 

Dolby Atmos sound mixing room at Best Digital. Source: Best Digital

 

Sound codecs for UHD TV and Blu-ray

Broadcasting:

In sound post-production for television, the ideal is to make at least two mixes in addition to the original theatrical one: a new 5.1 and a stereo, adjusting dynamics, equalization, etc.

In broadcasting, it is not possible to broadcast both the original and the dubbed version in 5.1 and stereo at the same time. It would take up too much bandwidth. The usual practice is to broadcast with Dolby Digital and use the automatic downmix system to output the stereo version in the home receiver. Unfortunately there is a loss of fidelity with the original mix since the result is dependent on the receiver configuration that decodes at home. The stereo mix produced in post-production will always be better than the stereo that is achieved from a Dolby Digital 5.1 downmix.

For television and multichannel digital audio broadcasting, Dolby created ‘Dolby Digital Plus’, a codec with more capability than the traditional ‘Dolby Digital (AC-3)’.

 

The system architecture works with a ‘Dolby Digital’ core and extensions (substreams) that add more capabilities. ‘Dolby Digital Plus’ can reach up to 6 Mb/s, and the ‘Atmos’ extension already allows immersive sound to be received in the home.

The ‘Dolby Digital’ core is always needed, because the ‘Plus’ and ‘Atmos’ extensions are not playable in isolation. They always need the previous package, that’s why they are called extensions.

‘Dolby Atmos’ for broadcast is a scaled-down version, but it retains the spirit of the Atmos mix that was made for theaters. It is a huge improvement over the previous package of ‘Digital Plus’ and core ‘Dolby Digital’.

For broadcasting there is no DTS solution, and DTS does not even appear as a recommendation in the current versions of the BT.2020 standard, as ‘Dolby Digital Plus’ does.

 

Speaker configuration for 5.1 home cinema: A) Front speaker (left) B) Front speaker (right) C) Center speaker D) Surround speaker (left) E) Surround speaker (right) F) Subwoofer. Source: Sony

Blu-ray:

For Blu-ray, the same scheme is followed as for broadcast, but the codecs are different because the same restrictions on broadcast bandwidth do not apply. With Blu-ray discs the difficulty is that multiple languages are usually incorporated to facilitate international marketing and there is not enough storage space on the disc to put surround sound on all versions.

Therefore, for Blu-ray we find different codecs than those used for television or the Internet. For multichannel sound, there are three possible cases:

  1. 5.1 PCM sound. Uncompressed, patent-free sound.
  2. ‘DTS-HD Master Audio’. Losslessly compressed sound; up to 24.5 Mbit/s transfer rate.
  3. ‘Dolby TrueHD’. Up to 18 Mbit/s transfer rate.

 

 

The first Blu-ray discs came with uncompressed 5.1 PCM sound, which is not subject to any patent. But with this model, not many language versions fit on a 50 GB disc, and not all equipment is compatible with it (paradoxical as it may seem).
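A quick calculation shows why uncompressed tracks fill a disc so fast. The Python sketch below assumes 48 kHz, 24-bit PCM and a 120-minute film; these figures are illustrative assumptions rather than values stated for any particular release, and the function names are hypothetical.

```python
def pcm_bitrate(channels, sample_rate=48_000, bit_depth=24):
    """Bit rate of uncompressed PCM in bits per second."""
    return channels * sample_rate * bit_depth

def pcm_size_gb(channels, minutes, sample_rate=48_000, bit_depth=24):
    """Storage needed for one uncompressed track, in gigabytes (10^9 bytes)."""
    bits = pcm_bitrate(channels, sample_rate, bit_depth) * minutes * 60
    return bits / 8 / 1e9

one_track = pcm_size_gb(channels=6, minutes=120)  # one 5.1 language version
print(f"5.1 PCM bit rate: {pcm_bitrate(6) / 1e6:.3f} Mbit/s")
print(f"One 120-minute track: {one_track:.2f} GB")
print(f"Five language versions: {5 * one_track:.2f} GB of a 50 GB disc")
```

Under these assumptions a single 5.1 track needs a little over 6 GB, so five language versions would take more than half of a 50 GB disc before any video is stored.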

Gradually the industry has moved to ‘DTS-HD Master Audio’, which packs the files so that they take up less space but with no loss of information (in the same way as a ZIP file). ‘DTS-HD Master Audio’ is also used as a carrier for an extension that can carry DTS:X immersive sound. Similarly, ‘Dolby TrueHD’ allows the Atmos extension to be integrated.

With all this, the usual scenario for 4K Blu-ray movie releases is this: a 7.1 with immersive DTS:X for the original version; a 7.1 in Spanish in DTS-HD Master Audio; and the other languages in 5.1 or stereo in Dolby Digital.

This image shows the back cover of a 4K Blu-ray, listing the different audio codecs included in this commercial release.

 

Audio system logos on the back cover of a 4K Blu-ray disc

 

Audio codecs for internet broadcasting

Advanced Audio Coding (AAC) is the most widely used audio format for delivering audiovisual content over the Internet, although it is difficult to generalize, since there are many different technological scenarios: VOD through the telephony provider, OTT/HbbTV internet television, video on streaming or download web pages, etc.

On the Internet, the determining factor is limited bandwidth. In this sense, the scenario is similar to that of broadcasting, where very low bit rates prevail.

In pay-per-view video-on-demand environments, 5.1 is pretty much standard. But the sound is being delivered at less than 320 kbit/s, which is too low for six channels to remain true to the original mix.

Recently, with the rise of new players such as Netflix, Amazon Prime and HBO, transfer rates are being raised and new codecs adopted, such as ‘Dolby Digital Plus’ at between 320 and 640 kbit/s, occasionally even with the Atmos extension (maintaining the object sound, but in a smaller version than the original).
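To put the streaming figures quoted above in perspective, the Python sketch below compares them with an uncompressed 5.1 reference. The reference of 48 kHz and 24 bits is an assumption, and splitting the bit rate evenly across six channels is a simplification (the LFE channel needs far less than the others); the function names are hypothetical.

```python
PCM_51_BPS = 6 * 48_000 * 24  # assumed uncompressed 5.1 reference, bits/s

def per_channel_kbps(total_kbps, channels=6):
    """Average bit rate available to each channel, in kbit/s."""
    return total_kbps / channels

def compression_ratio(total_kbps, pcm_bps=PCM_51_BPS):
    """How many times smaller the stream is than the PCM reference."""
    return pcm_bps / (total_kbps * 1000)

for rate in (320, 640):
    print(f"{rate} kbit/s: {per_channel_kbps(rate):.1f} kbit/s per channel, "
          f"about {compression_ratio(rate):.1f}:1 versus PCM")
```

Under these assumptions, 320 kbit/s leaves roughly 53 kbit/s per channel, a reduction of about 21.6:1, while 640 kbit/s halves that ratio.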

Despite this, there are surprising cases, such as YouTube still playing only in stereo, without any 5.1 option!

 

In-home speaker configuration with sound reflected off the ceiling. Source: Dolby

 

Pioneer S-BS73A speaker for projecting sound to the ceiling. Source: Pioneer

 

Conclusions

Improving the user experience of ultra high-definition must necessarily involve higher quality sound. However, technological developments to improve audio quality, which have been in place for years, have not been widely implemented.

Manuel Sánchez Cid, professor at the Universidad Rey Juan Carlos (URJC) and expert in surround sound, has repeatedly expressed this idea:

“Ultra high definition has not ended up meaning the definitive implementation of the highest sound quality parameters that technological development has achieved to date. However, the arrival of Ultra HD makes it possible to reopen concepts of sound immersion that, without being novel, are oriented towards the implementation of a second, vertical plane; it also seems to allow a firmer commitment to spatial planning that is more connected with multiple perspectives and with breaking the visual anchor.”

The challenge posed by Sánchez Cid is, therefore, an artistic one: it concerns direction and how the available technical resources are used. With surround sound systems, the representation of space (sound planes) and the listening point in which the viewer is placed (perspective) are particularly relevant.

The levels of quality that sound technology currently offers are enormous and what is being used in practice is very little compared to what could be done.

Sound in theaters. In existing multiplexes, the trend is to upgrade or adapt to immersive or 3D sound, in a basic configuration, only in the screen with the largest audience capacity. In new cinemas, the trend is to build the largest-capacity screens directly with immersive sound in an advanced configuration, and either to leave the rest without immersive sound or to equip them with a small configuration.

 

Surround sound loudspeaker layout for a cinema exhibition room.

 

Broadcasting. The bottleneck in terms of sound quality is here. For sound capture and post-production there are very capable standards and tools, but their expansion is held back by bandwidth management, the economic cost of renewing equipment and a failure to treat innovation as an investment in corporate image. Most DTT broadcasts are mono or stereo. In satellite/cable TV, Dolby 5.1 coexists with stereo productions.

Video on demand. It is the current engine of development and technical innovations in the delivery of materials with higher image and sound quality, with higher bit rates, latest generation codecs, immersive sound, etc.

High-quality audiovisual consumption at home, or Home Cinema. Home Cinema equipment provides the necessary conditions to enjoy sound with practically the same quality as in the production studio (excluding, of course, acoustic conditioning, etc.). If online on-demand signals or Blu-ray discs are delivered to this equipment, we have HDR / 4K / immersive sound / lossless audio etc. at the highest level, when a few years ago this was unthinkable in a home.

 

Rear panel of the Denon AVR-X4200W AV receiver for surround sound. Source: Denon

 

Report prepared by Luis Ochoa, Sergio Márquez and Francisco Utray.
