Audio quality in networked systems

The subject of this white paper is audio quality in networked systems. One might think that everybody in the audio industry knows what the words ‘audio’, ‘quality’ and ‘system’ mean. The length of this first chapter proves otherwise; the words can be - and very often are - perceived in different ways by different individuals, often causing discussions about a system’s audio quality and sound quality to end up in endless repetitions of the words ‘is’ and ‘is not’.

1.1 Audio

In the field of neuroscience, the human hearing system is called the 'human auditory system'. It is a bio-mechanical system that converts acoustic audio waves, which reach the human ears through the air and through the skull's bone structure, into coded neural firing patterns. Auditory nerve fibres transfer these patterns to the central auditory nervous system in the human brain, which interprets them to produce a hearing sensation. The hearing sensation is invoked by the heard audio signal, but it is influenced by all other sensations in the brain - from memory, but also from other real-time senses such as vision, smell and touch.

In this white paper, all phenomena, processes, systems and characteristics pertaining to generating, processing and transporting signals in the audible range of the human auditory system are referred to by the adjective 'audio'.

Audio

(adjective) designates objects (eg. signals, processes, devices, systems) or characteristics (eg. frequency, level, time) to pertain to signals in the audible range of the human auditory system.

table 101: audio designation examples

term | meaning
audio signal | the portion of any time-variant signal that falls in the audible range of the human auditory system, capable of invoking a hearing sensation. The signal can be acoustic, electronic or digital.
audio process | generation, transport, change and/or storage of an audio signal.
audio system | a system that processes audio signals.
audio characteristic | a (physical) feature of an audio signal (eg. level, frequency, time).
audio system characteristic | a (physical) feature of an audio system (eg. dynamic range, frequency range, time range).
sound source | a human voice, musical instrument or any phenomenon that generates an audio signal.

1.2 Sound

Sound is an adjective, a noun or a verb, used to identify or describe the perceptional characteristics of an audio signal - most often by describing the hearing sensation it invokes. The four other main sensory inputs - vision, touch, smell and taste - influence the hearing sensation in real time. But the most powerful influence on sound is memory. Starting as early as the embryonic phase, the human brain 'learns' how to listen to audio signals, developing preferences for timbre, rhythm, patterns and sound colour, and learning to recognise words. As the brain actively controls the bio-mechanical processes in the middle and inner ear, we also literally train our auditory system to be as effective as possible. This means that the way we hear is greatly influenced by our hearing experience - including the development of preferences for musical styles. Because no two individuals are the same, the same audio signal will invoke different hearing sensations ('sound') in different individuals.

Where audio characteristics describe physical characteristics of an audio signal, the adjective, noun and verb 'sound' most often describe perceptional features such as 'warmth', 'transparency', 'definition'. However, sometimes also physical characteristics are used in conjunction with the word 'sound' - eg. 'speed of sound'.

In this white paper we will use 'audio' as an adjective pertaining to physical characteristics, and 'sound' as an adjective, noun and verb pertaining to perceptional characteristics. Where exceptions to this use of the words 'audio' and 'sound' occur, the context will be clarified in the accompanying text.

Sound

(adjective, noun, verb) describes the subjective hearing sensation produced by stimulation of the human auditory system of an individual listener by an audio signal, transmitted through the air or other medium.

Sound source

(noun) designates the origin of an audio signal.

1.3 Audio Processes: limitation, unintended change and intended change

An audio system changes the characteristics of an audio signal by applying its audio process. The audio process is divided into three sub-processes: limitation, unintended change and intended change.

Limitation

a system's limits in representing signals in level, frequency, and time.

Unintended change

the change of an audio signal caused by unintended processes in an audio system.

Intended change

the change of an audio signal caused by intended processes in an audio system.

An audio system's limitation poses physical limits to level, frequency and time. For example, the high-end limitation in an audio system's level range is any inability to reach 120dB(SPL) at the listener's position, while the low-end level limitation often presents itself as a constant error signal, such as a noise floor, with a level higher than 0dB(SPL) at the listener's position. Frequency limitation includes any low and high frequency bandwidth limits within the 20Hz-20kHz frequency range, while timing limitation includes any response time or time coherence error of more than 6 microseconds (eg. network latency), or any time coherence problem generating audible level errors (eg. jitter). Chapter 4 presents details on the limits of the human auditory system.
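The limits quoted above can be collected into a small check. The following is a minimal sketch and not part of the original paper: the `SystemSpec` fields and the `limitations` helper are hypothetical names, and the thresholds are simply the figures from the paragraph above (0 to 120 dB(SPL), 20 Hz to 20 kHz, 6 microseconds).

```python
from dataclasses import dataclass

# Limits of the human auditory system as quoted in the text above.
MAX_LEVEL_DB_SPL = 120.0
MIN_LEVEL_DB_SPL = 0.0
LOW_FREQ_HZ = 20.0
HIGH_FREQ_HZ = 20_000.0
TIME_LIMIT_US = 6.0


@dataclass
class SystemSpec:
    """Hypothetical summary of a system, measured at the listener's position."""
    max_level_db_spl: float    # highest level the system can reach
    noise_floor_db_spl: float  # level of the constant error signal
    low_cutoff_hz: float       # lower bandwidth limit
    high_cutoff_hz: float      # upper bandwidth limit
    time_error_us: float       # worst time coherence error


def limitations(spec: SystemSpec) -> list[str]:
    """Return the dimensions in which the system limits the audio signal."""
    found = []
    if spec.max_level_db_spl < MAX_LEVEL_DB_SPL:
        found.append("level (high end)")
    if spec.noise_floor_db_spl > MIN_LEVEL_DB_SPL:
        found.append("level (low end)")
    if spec.low_cutoff_hz > LOW_FREQ_HZ:
        found.append("frequency (low end)")
    if spec.high_cutoff_hz < HIGH_FREQ_HZ:
        found.append("frequency (high end)")
    if spec.time_error_us > TIME_LIMIT_US:
        found.append("time")
    return found


# Illustrative values only, not measurements of a real system.
example = SystemSpec(110.0, 15.0, 40.0, 22_000.0, 2.0)
print(limitations(example))
```

With these illustrative values the system is limited in level at both ends and in low-frequency bandwidth, but not in high-frequency bandwidth or timing.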

An audio system's unintended change poses changes to the audio signal such as equalising, distortion, compression. These changes are not intended by the designers or operators of the audio system - they are included in the audio process because they could not be avoided due to technological, financial, time and/or expertise constraints of the designer and/or operator. Unintended changes can be represented as error signals that are (partly) linear with the audio signal, sometimes summarized by a percentage (eg. %THD) or level ratio (eg. dB gain of a filter). Most commonly, unintended changes are regarded as having a negative impact on sound quality. But in some cases, if a system's initially unintended change is perceived to have a positive effect on sound quality, the product manufacturer or system designer can actively decide to not take countermeasures - thus turning the unintended change into an intended change.
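The percentage and level-ratio summaries mentioned above are two notations for the same kind of quantity. As a minimal sketch (the function names are illustrative and not taken from the paper), the standard amplitude-ratio-to-decibel conversion relates them:

```python
import math


def ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio to a level ratio in dB."""
    return 20.0 * math.log10(ratio)


def thd_percent_to_db(thd_percent: float) -> float:
    """Express a %THD figure as the error level relative to the signal level."""
    return ratio_to_db(thd_percent / 100.0)


print(thd_percent_to_db(1.0))  # about -40 dB
print(thd_percent_to_db(0.1))  # about -60 dB
print(ratio_to_db(2.0))        # about +6 dB, eg. a filter doubling the amplitude
```

Because the error is linear with the signal, a single ratio of this kind describes it at any signal level.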

An audio system's intended change poses changes to the audio signal by intention of the designer and/or operator of the system - most commonly to improve sound quality in the opinion or expectation of the designer/operator (on behalf of an audience), or to change the sound to match an external context (eg. video postproduction). An intended change can be designed into products and systems by manufacturers and system designers as a fixed process, or offered to system operators (sound engineers) to apply as a variable parameter. More on fixed and variable intended change ('coloured sound' vs 'natural sound') can be found in chapter 3: Performance & Response.

1.4 Quality

Quality is conformance to requirements

This definition of quality comes from the renowned quality management guru Philip B. Crosby (*1A). His idea is that quality management should focus on setting well-defined and realistic requirements - and then on designing clever management processes to make sure that an organisation’s output meets these requirements.

Crosby's definition states that quality is always related to requirements set for the output of the process. To enable the organisation to achieve the desired output quality, process requirements are set. And this area is where quality discussions in many of the debates in the audio industry go wrong: two individuals seldom agree on the requirements of both the process and the output - not even on the definition of the parameters that represent the requirements.

In this white paper, the term 'audio quality' refers to the physical characteristics of an audio signal, and the term 'sound quality' refers to the perceptional characteristics of the invoked hearing sensation.

Using Crosby's definition of quality, stating (system) audio quality means stating to what degree the audio signal (system) conforms to set requirements. Audio quality requirements can be stated as physical characteristics, for example in the form of electrical system specifications based on international standards (ISO, AES, IEC). If not otherwise specified, 100% accurate representation can be assumed as the audio (system) quality requirement.

Stating sound quality means stating to what degree a hearing sensation conforms to an individual listener's requirements - which can be either a preferred hearing sensation, or an expected hearing sensation if the individual is assessing the hearing sensation on behalf of an audience, or for use in an external context. Sound characteristics are often discussed using terminology such as 'warmth', 'transparency', 'definition' - which are not always standardized terms. Even if a group of persons agrees on the definition of these terms, the degree of conformance will still differ from person to person, depending on individual hearing abilities and preferences.

1.5 Audio quality

In this white paper we propose the following requirement for audio signals:

Requirement for an Audio signal

An examined audio signal should represent the originally generated audio signal accurately, disregarding the intended changes of an audio system.

If there is no audio system between generating and examining (eg. hearing) an audio signal, the examined signal is exactly the generated signal. The closest we can get is listening to an acoustic signal at very close distance - think millimetres - without any audio disturbances eg. other signals, wind, movement.

In real life there is always a system between the generation and the hearing - even a short distance already constitutes a system as the turbulence in the air between audio source and listener changes the audio signal. Nearby objects or walls, and of course a networked audio system, add further changes.

In this white paper, we propose the following definition for audio quality:

Audio quality

The degree of representation accuracy of an examined audio signal, disregarding the intended changes of an audio system.

‘Audio quality’ describes how accurately an examined audio signal (at the output of a system) resembles the original audio signal generated by the sound source, disregarding the changes applied intentionally by product manufacturers, sound system designers and engineers.

The audio quality of a system between the input audio signal and an examined output signal is called the ‘system audio quality’. It can be described using the same requirement as for audio quality: an audio system should accurately transport and process the audio signal - without limitations or unintended changes. In common speech, ‘audio quality’ is often used to describe a system’s audio quality.

System audio quality

The degree of representation accuracy of an examined audio system, disregarding the intended changes of the audio system.

1.6 Sound quality

In this white paper, the term ‘sound quality’ refers to the perceptional characteristics of the hearing sensation invoked by an audio signal. Using Crosby’s definition of quality, stating sound quality means stating to what degree the hearing sensation conforms to what we specified as requirements. And here things start to become tricky: every individual has different requirements.

In this white paper we propose the following requirement for sound:

Requirement for sound

An audio signal should satisfy either the expected or the preferred hearing sensation of an individual listener.

For the definition of a system’s sound quality, the sound quality of the original signal has to be considered as well. We will name the sound quality of the original signal ‘source quality’. The requirements for the sound source then read as follows:

Requirement for a sound source

An audio signal generated by a sound source should satisfy either the expected or the preferred hearing sensation of an individual listener without limitation or change by an audio system.

The satisfaction of listening to a sound source - without a system in between ears and source - depends on individual hearing abilities and preferences, and also on the sound characteristics - or the ‘sound’ - of the source. For example, when listening to a solo violin performance, the hearing sensation is influenced by the composition played, the proficiency and virtuosity of the player, the characteristics of the violin. All these parameters together constitute the sound characteristics of the source. Although a statistical average appreciation can be found, for example by assessing the popularity of the solo violin performance by counting the number of persons who bought a concert ticket, every individual will assess source quality in a different way.

Knowing the requirements for the sound source, the sound requirements for the audio system can be defined:

Requirement for an audio system’s sound

The intended change of an audio signal by an audio system should satisfy either the expected or the preferred change in the hearing sensation of an individual listener with a given source sound.

In the case of the solo violin performance, the acoustics of the concert hall constitutes an audio system. If the performance needs amplification, then the PA system constitutes an audio system. In both cases, the audio system intentionally changes the audio signal produced by the sound source, contributing positively to the hearing experiences of the audience: the concert hall adds reverberation, the PA system adds loudness.

Sound requirements for sources and systems use perceptional characteristics such as ‘warmth’, ‘transparency’ and ‘definition’. Note that in real life, multiple sound sources as well as multiple audio systems are involved. In this white paper we propose the following definition for sound quality:

Sound quality

The degree of satisfaction of the expected or the preferred hearing sensation of an individual listener as a result of hearing an audio signal.

Source sound quality

The degree of satisfaction of the expected or the preferred hearing sensation of an individual listener as a result of hearing an audio signal from a sound source disregarding the limitation or change by an audio system.

System sound quality

The degree of satisfaction of the expected or preferred hearing sensation of an individual listener as a result of the intended change of an audio signal by an audio system with a given source sound.

In words: if sound quality and audio quality are each summarized as a percentage ‘Q’ - with 0% as ‘minimum quality’ and 100% as ‘maximum quality’ - the sound quality of an audio signal experienced by a listener is the product of the source’s sound quality, the system’s audio quality and the system’s sound quality.
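The product relation can be illustrated with a few lines of arithmetic. This is a minimal sketch: the function and parameter names are assumptions made here, the percentages are expressed as fractions between 0 and 1, and the three example values are invented for illustration only.

```python
def listener_sound_quality(source_sound_q: float,
                           system_audio_q: float,
                           system_sound_q: float) -> float:
    """Product of the three quality figures, each a fraction between 0 and 1."""
    for q in (source_sound_q, system_audio_q, system_sound_q):
        if not 0.0 <= q <= 1.0:
            raise ValueError("quality must lie between 0 (0%) and 1 (100%)")
    return source_sound_q * system_audio_q * system_sound_q


# Invented example: 90% source sound quality, 80% system audio quality,
# 95% system sound quality.
q = listener_sound_quality(0.90, 0.80, 0.95)
print(f"{q:.1%}")  # prints 68.4%
```

Because the figures are multiplied, the result can never exceed the lowest of the three: a system with poor audio quality caps the experienced sound quality no matter how good the source is.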

1.7 Discussing audio quality

The objective of using an audio system to process one or more sound sources is to achieve a better sound quality compared to not using an audio system. The main issue in the minds of product manufacturers, system designers and sound engineers is therefore sound. Assuming the sound source as a fixed parameter, the main tools to achieve a better sound quality are the audio system's intended changes - either built into the audio system as fixed characteristics, or available to the sound engineer as variable parameters.

However, a system's sound quality is significantly influenced by the system's audio quality. By definition, the more limitations and unintended changes the system imposes on the processed audio signals, the lower the sound quality will be. This white paper aims to provide insight into audio quality issues in networked systems to allow system designers and sound engineers to achieve the highest possible audio quality, allowing them to appreciate the system's intended changes as a basis for investments or rentals, and to apply the available variable intended changes at will to achieve the best possible sound quality.

Audio quality discussions can be conducted based on physical measurements of the audio signal and audio systems. Basing discussions on listening sessions, however, brings up the issue of disregarding the intended changes of the audio process - to leave only the limitations and unintended changes to discuss. Disregarding intended changes is easy if an audio system is built according to the 'natural sound' philosophy - such a system passes audio signals as transparently (naturally) as possible, and offers the system's intended changes to the sound engineer as variable components (colouring options, eg. equalizers, compressors), including the possibility to switch them off to allow audio quality assessment. Systems designed with a 'coloured sound' philosophy have fixed intended changes, making it more difficult to assess audio quality issues because the intended changes can never be switched off.

As every system includes some amount of fixed intended changes, most prominently in the loudspeakers, listening session scripts can be used to focus on single parameters when comparing systems - the equivalent of the ceteris paribus approach in the economic sciences. If the compared systems all possess the same fixed intended changes, the listener can decide to concentrate on comparing a selected single audio quality parameter. More detailed information on this topic can be found in chapter 9: Quality assessment methods.

To facilitate system audio quality discussions, system audio quality characteristics can be represented by the characteristics of the difference between input and output of a system - the error signal. This error signal can be constant, linear with the level of a signal's frequency components, partially linear or nonlinear:
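The error signal described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not a measurement method from this paper: it assumes the input and output are already time-aligned and gain-matched sample lists of equal length, and the helper names and the test signal are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def error_signal(system_input, system_output):
    """Sample-by-sample difference between a system's output and its input."""
    if len(system_input) != len(system_output):
        raise ValueError("signals must have equal length")
    return [out - inp for inp, out in zip(system_input, system_output)]


def error_level_db(system_input, system_output):
    """Level of the error signal relative to the input level, in dB."""
    err = rms(error_signal(system_input, system_output))
    if err == 0.0:
        return -math.inf  # output identical to input: no error signal
    return 20.0 * math.log10(err / rms(system_input))


# A 1 kHz sine sampled at 48 kHz, passed through a hypothetical system
# that adds a 3 kHz disturbance at 1/1000 of the signal amplitude.
fs = 48_000
x = [math.sin(2 * math.pi * 1_000 * n / fs) for n in range(480)]
y = [s + 0.001 * math.sin(2 * math.pi * 3_000 * n / fs)
     for n, s in enumerate(x)]
print(round(error_level_db(x, y), 1))  # about -60 dB
```

In practice the alignment step matters: any uncorrected latency or gain offset between input and output would itself show up in the difference.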

table 103: error signal types

difference | type | examples
constant | limitation | audio: HA noise, A/D quantization noise (unintended). sound: masking noise (intended).
audio signal | change linear with signal level | audio: jitter noise, equalising (unintended). sound: equalising (intended).
non linear | change partially or not linear with signal level | audio: amplifier clipping, zero-crossing distortion, compression (unintended). sound: guitar amp distortion, compression (intended).
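One way to tell the three error types apart is to measure the error signal at two input levels and see how its level follows. The sketch below is a simplified illustration, not a procedure from this paper: `classify_error`, the tolerance, the test tone and the three example systems are all hypothetical, and it assumes the error is non-zero at both levels.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples):
    return 20.0 * math.log10(rms(samples))


def sine(amplitude, n=480, cycles=10):
    """Test tone with a whole number of cycles in n samples."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


def classify_error(process, low=0.5, high=1.0, tol_db=0.5):
    """Classify a system's error signal by how it follows the input level."""
    error_levels = []
    for amplitude in (low, high):
        x = sine(amplitude)
        y = process(x)
        error_levels.append(level_db([b - a for a, b in zip(x, y)]))
    input_step = 20.0 * math.log10(high / low)     # change in signal level
    error_step = error_levels[1] - error_levels[0]  # change in error level
    if abs(error_step) <= tol_db:
        return "constant"
    if abs(error_step - input_step) <= tol_db:
        return "linear"
    return "non linear"


# Three hypothetical systems, one per error type.
hum = sine(0.001, cycles=3)


def add_hum(x):      # fixed noise floor, independent of the signal
    return [s + h for s, h in zip(x, hum)]


def gain_error(x):   # level error proportional to the signal
    return [0.9 * s for s in x]


def cubic(x):        # distortion growing faster than the signal
    return [s - 0.1 * s ** 3 for s in x]


for system in (add_hum, gain_error, cubic):
    print(system.__name__, classify_error(system))
```

A real measurement would use more than two levels and more than one frequency; two points are only enough to show the principle.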

Figures 108A and 108B below present a listing of audio quality and sound quality issues in a networked system. In figure 108A (audio quality issues in a networked audio system), a selection of limitation and unintended change error signals and their causes is presented as grey bars:

・Constant error signals (eg. limitations) are shown with one arrow pointing to the average error level in dB(FS).

・Linear error signals (eg. unintended changes) are shown with two arrows connected with a dotted line - one arrow pointing to the audio signal level at 0 dB(FS), the other to the error level to indicate that the error level depends on the signal level.

In figure 108B (sound quality issues in a networked audio system), a selection of available intended change processes is presented.

Chapter 3 - Performance & Response

Presents the Performance / Response concept - identifying system process parameters and requirements to help assess the quality of audio systems. Two design philosophies are presented: ‘natural sound’ - where the focus lies on preserving the artistic quality of the audio event and offering Response tools to the sound engineer as variable parameters - and ‘coloured sound’, where a fixed sound-changing Response is designed into products and systems.

3.1 Unintended and Intended changes

3.2 Performance & Response

3.3 Natural sound and coloured sound

Chapter 4 - The human auditory system

Briefly presents a description of the human auditory system, including the mechanics of the outer and middle ear, the bio-mechanical coding to the frequency domain by the inner ear, and the transport of the coded firing patterns to the brain through auditory nerves. Using this description, a ‘human audio universe’ is defined to possess three dimensions: level, frequency and time. Also some auditory functions such as localization and masking are presented.

4.1 Ear anatomy

4.2 The audio universe

4.3 Auditory functions

Chapter 5 - Sampling issues

Presents the audio digitalization (sampling) concept in relation to level, frequency and timing. Dynamic range and frequency range are more or less common concepts, developed to a mature state by the manufacturers of digital audio equipment in the past 25 years. Compared with the 1985 digital (16-bit) technologies, modern 24-bit A/D, D/A and distribution technologies and 32-bit or higher DSP architectures have caused noise floors and distortion levels to move close to the boundaries of the audio universe. On timing, however, the use of networked audio systems poses new challenges to system designers and sound engineers. This chapter presents the digitalization concept in relation to timing, including latency, jitter and clock phase.

5.1 Digital Audio

5.2 Dynamic range

5.3 Frequency range

5.4 Timing issues

5.5 Absolute latency

5.6 Relative latency

5.7 Word clock

5.8 Clock phase

5.9 Temporal resolution

5.10 Jitter
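To put the 16-bit versus 24-bit comparison in the chapter 5 summary into numbers, the sketch below uses the common textbook formula for the signal-to-quantization-noise ratio of an ideal converter driven by a full-scale sine. The formula and the function name are assumptions brought in here for illustration; they are not taken from this paper, and real converters perform below these ideal figures.

```python
import math


def ideal_dynamic_range_db(bits: int) -> float:
    """Signal-to-quantization-noise ratio of an ideal N-bit quantizer
    for a full-scale sine wave: 20*log10(2**N) + 10*log10(1.5) dB."""
    return 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)


for bits in (16, 24):
    print(bits, round(ideal_dynamic_range_db(bits), 1))
# 16 bits: about 98.1 dB
# 24 bits: about 146.3 dB
```

Set against the 120 dB level range quoted in chapter 1, the ideal 16-bit figure falls short of it while the ideal 24-bit figure exceeds it.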

Chapter 6 - Distribution & DSP issues

Presents a description of the transport and DSP infrastructure in a digital audio system. Transport and DSP architecture - eg. bit depth, fixed/floating point processing - are described as affecting a system’s audio quality, with only the algorithm (plug-in) design affecting the system’s sound quality.

6.1 I/O distribution

6.2 Interconnected DSP distribution

6.3 Constant gain A/D converters

6.4 DSP architecture

6.5 Fixed point vs. Floating point

6.6 DSP user interfaces

Chapter 7 - Signal chain level issues

Focuses on audio levels in a system, proposing a ‘0dBFS’ level standard as the optimal design paradigm that allows easy identification of quality problems in a signal chain. Several practical quality issues in system design are presented, such as head amps, gain compensation, clip level mismatch and double pass signal chains. Also, audio compression in the speaker processing stage (unbalanced output modes) is discussed, placing the responsibility in the Response (sound quality) domain rather than the Performance (audio quality) domain.

7.1 0dBFS

7.2 Head amps

7.3 Gain compensation

7.4 Clip level mismatch

7.5 Double A/D-D/A pass signal paths

7.6 Unbalanced output mode

Chapter 8 - Operational quality

Presents operational quality issues in a networked audio environment, including topology and protocol and their effect on logistics, reliability and redundancy. The use of Ethernet - either as protocol or as embedded service - is argued to be essential for complying with operational quality requirements on design freedom and user interfacing.

8.1 Network implications

8.2 Ethernet compliance

8.3 Redundancy

8.4 Switches and cables

Chapter 9 - Quality assessment methods

Presents methods for subjective and objective quality assessments of audio systems. Conditions for controlled listening tests are proposed for audio quality assessment. Full control over the experiments with careful adjustment of test equipment and environment, and proper statistical analysis are crucial to obtain meaningful results that justify statements on product and system audio quality and Response characteristics.

9.1 Quality assessment through electronic measurements

9.2 Quality assessment through listening tests

9.3 Conducting listening tests