VIRTUAL ACOUSTICS AND AUDIO ENGINEERING
Fluid Dynamics and Acoustics Group

Human Localisation

The mechanisms involved in localising real acoustical events clearly differ from those involved in localising virtual sounds presented over headphones. Shaw (1982) proposed that headphone studies of spatial imagery be referred to as 'space perception', since the perceptual data they collect go beyond localisation: a single acoustical event may give rise to more than one auditory event (segregation of sound), and it may be ambiguous whether the sound is externalised at all (in which case 'localisation' becomes 'lateralisation'). Most psychoacoustical studies are carried out with headphones, and assume that sound localisation and space perception can be regarded as equivalent.

Binaural cues

Lord Rayleigh's duplex theory (Rayleigh, 1907) was the first to explain how we localise sound. Localisation is based on the fact that the path lengths to the two ears differ, which gives rise to the Interaural Time Difference (ITD), and that the head casts an acoustic shadow at higher frequencies, which produces the Interaural Level Difference (ILD). For sources on the median plane or on a 'cone of confusion', where these cues are ambiguous, additional cues are required: the ambiguity can be resolved either by head movement (Wallach, 1940) or by the filtering of the pinna.
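The path-length argument can be illustrated with a small sketch. It uses the classical rigid-sphere (Woodworth) approximation, which is not taken from the text above; the head radius and speed of sound are assumed typical values, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed typical head radius (not from the text)


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is measured from straight ahead (0) towards the side (90).
    The rigid-sphere approximation used here is intended for 0..90 degrees.
    """
    theta = math.radians(azimuth_deg)
    # Extra path to the far ear: a straight segment (r * sin(theta))
    # plus an arc around the sphere (r * theta).
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(az, "deg:", round(woodworth_itd(az) * 1e6), "microseconds")
```

Under these assumptions the ITD is zero for a source straight ahead and grows to roughly 0.66 ms for a source directly to one side. The model ignores the frequency dependence of the ITD and all pinna effects.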
By neglecting the transmission paths within the auditory nervous system, we can assume that binaural cues can be derived from the ratio of the ipsilateral and contralateral HRTFs in the frequency domain. The magnitude of this ratio gives the ILD and its phase gives the ITD mentioned above. Wightman and Kistler (1997) found that the frequency dependence of the ITD, which is larger at low frequencies than at high frequencies (Kuhn, 1977), is perceptually irrelevant. Whenever HRTFs are implemented using a minimum phase model (see below), a single value is assigned to the ITD. However, although ITD values are roughly similar among subjects, the auditory nervous system is very sensitive to changes in interaural phase or timing: the just noticeable difference can be as low as 6 µs (see Carlile, 1996, Section 2.2.1).
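As a minimal sketch of how interaural cues follow from the ratio of the two HRTFs, the function below takes the complex ipsilateral and contralateral transfer-function values at a single frequency and returns a level difference and a phase-derived time difference. The function name, the sign convention and the example values are assumptions made for illustration. The phase is not unwrapped, so the time difference is only meaningful while the interaural phase stays within half a cycle (low frequencies).

```python
import cmath
import math


def interaural_cues(h_ipsi, h_contra, freq_hz):
    """Return (ILD in dB, ITD in seconds) at one frequency.

    h_ipsi and h_contra are complex HRTF values at freq_hz.
    A positive ITD means the contralateral ear lags the ipsilateral ear.
    """
    ratio = h_ipsi / h_contra                    # interaural transfer function
    ild_db = 20.0 * math.log10(abs(ratio))       # level difference from magnitude
    itd_s = cmath.phase(ratio) / (2.0 * math.pi * freq_hz)  # phase delay
    return ild_db, itd_s


if __name__ == "__main__":
    # Illustrative values only: the far ear is 6 dB quieter and 0.5 ms later.
    f = 500.0
    h_near = 1.0 + 0.0j
    h_far = 0.5 * cmath.exp(-2j * math.pi * f * 0.0005)
    print(interaural_cues(h_near, h_far, f))
```

Applying the same function across a set of frequencies would show the frequency dependence of both cues; a single ITD value, as used with minimum phase models, would be a separate simplification.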
The dominance of the ITD cue at low frequencies (below 1.5 kHz) was demonstrated when it was placed in conflict with other localisation cues in a subjective experiment (Wightman and Kistler, 1992). It has also been claimed that the auditory system is sensitive to ITD in the envelopes of high frequency carriers, but this is a less dominant cue.
ILD patterns are very complex: at high frequencies they depend strongly on the source angle, and the response varies rapidly between peaks and notches. Visualising them in different planes reveals some systematic variations, but the variation among individuals is large, especially above approximately 8 kHz. ILDs presented in different frequency bands were shown to have similar patterns (Wightman and Kistler, 1997). Similar patterns are also observed at a given frequency as the elevation angle changes (Duda, 1997). In the horizontal plane, Middlebrooks and Green (1991) observed that localisation is based mainly on ITD and ILD, without pinna cues. However, Musicant and Butler (1984) found that pinna cues did help to resolve front-back confusion, and increased accuracy when localising sounds within the same quadrant of the horizontal plane.
Another possible binaural cue, 'binaural pinna disparity', was suggested by Searle et al (1975). These authors proposed that the asymmetry between the two pinnae, in both geometry and acoustical response, aids localisation in the median plane. This cue is still regarded as being of second-order significance.

Monaural cues: time domain interpretation

Many psychoacoustical studies have demonstrated that it is possible to localise reasonably well with one ear plugged, in both azimuth and elevation (see Blauert, 1997, Section 4.4, and Carlile, 1996, Section 2.2). However, localisation accuracy depends on the spectral content and frequency bandwidth of the stimuli, and on other factors related to practice and context effects.
As the external ear is a linear system, time domain and frequency domain behaviours are related through the Fourier transform. It is assumed that the processing of directional information takes place in one of the two domains. Does the external ear encode the source direction through modulation of time delays or modulation of spectral shape?
Batteau (1962, 1967) was a pioneer in relating localisation in elevation to the physical cues provided by the external ear. He hypothesised that a simple time domain model, comprising the original signal and two echoes, can give rise to the necessary spectral cues. One echo, with a latency of 0-80 µs, varies with the azimuthal position of the source, and a second echo, with a latency of 100-300 µs, varies with the elevation. Some agreement was found by Watkins (1978) and Wright et al (1974) when the model was compared with measurements in the lateral vertical plane. Hebrank and Wright (1974) showed that the notches appearing in the frequency domain match the interference produced by a variable path-length reflection from the posterior wall of the concha. Hiranaka and Yamasaki (1983) confirmed that major reflections occur within 350 µs of the first arriving sound, and that the delay increases as the source is lowered. In an extensive search for physical cues made with KEMAR, Han (1991) showed that if localisation at high frequencies were based on time delays, it would work only in a very limited region. Since Batteau's model, although based on the physical geometry of the external ear, is said to be too simplistic, a further development was formulated by Chen et al (1992) with the addition of reflected paths. The pinna was modelled using a beam-forming approach. Although this model does not rely on physical principles, it is based on the general geometrical properties of the pinna.
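The link between a direct-plus-echo model in the time domain and notches in the frequency domain can be sketched as follows. This is an idealised illustration rather than Batteau's own formulation: each echo is treated as a pure delay with a fixed gain, and the function names, gains and delay values are assumptions chosen for the example.

```python
import cmath
import math


def echo_model_response(freq_hz, delays_s, gains):
    """Complex frequency response of a direct sound plus delayed echoes.

    H(f) = 1 + sum_k gains[k] * exp(-j * 2 * pi * f * delays_s[k])
    """
    h = 1.0 + 0.0j  # direct sound
    for delay, gain in zip(delays_s, gains):
        h += gain * cmath.exp(-2j * math.pi * freq_hz * delay)
    return h


def first_notch_hz(delay_s):
    """First cancellation frequency for a single echo: half a period equals the delay."""
    return 1.0 / (2.0 * delay_s)


if __name__ == "__main__":
    delay = 100e-6  # assumed illustrative echo delay, in seconds
    for f in (0.0, 2500.0, 5000.0, 7500.0, 10000.0):
        mag = abs(echo_model_response(f, [delay], [1.0]))
        print(f, "Hz -> magnitude", round(mag, 3))
    print("first notch near", first_notch_hz(delay), "Hz")
```

With a single unit-gain echo the response cancels wherever the echo arrives half a cycle late, so a longer delay moves the first notch down in frequency. Passing two delays and two gains gives the two-echo case described above.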
Wightman and Kistler (1997) argued that monaural temporal cues are not likely to be relevant for human sound localisation, for two reasons. Firstly, the HRTF impulse responses are too short to be processed in the auditory system (they are of the order of 2 msec). Secondly, their previous results (Kistler and Wightman, 1992) suggest that changes in the temporal fine structure of the HRIR do not produce corresponding changes in the apparent positions of sound sources.

Monaural cues: frequency domain interpretation

Although it is now accepted that the pinna acts as a 'frequency domain filter', it is not clear which cues are relevant and necessary for the perception of elevation. Two approaches exist to analysing the spectra of HRTFs: elevation is perceived either through the peaks (the resonances of the pinna) or through the notches (the anti-resonances).
It has been claimed that, for narrow band stimuli, the apparent location is directly related to spectral peaks in the subject's HRTFs (Blauert, 1997, Butler, 1997 and Musicant, 1995). However, with regard to vertical localisation, Hebrank and Wright (1974), Butler and Belendiuk (1977), Watkins (1978) and Bloom (1977) have provided strong evidence that, with narrow band stimuli, spectral notches are responsible for the sensation of source elevation. Shaw (1982) found that in eight out of ten subjects the spectral minima moved systematically along the frequency axis as source elevation varied from high to low. For the other two subjects the minimum varied in level but not in frequency.
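Tracking how a spectral minimum moves along the frequency axis requires some way of picking notches out of a magnitude spectrum. The sketch below shows one simple possibility: a local minimum counts as a notch if it lies far enough below the spectrum's maximum. The function name, the depth threshold and the two spectra are hypothetical; the spectra are synthetic values made up for illustration and are not measured HRTFs.

```python
def find_notch_frequencies(freqs_hz, mags_db, min_depth_db=10.0):
    """Return frequencies of local minima at least min_depth_db below the peak level."""
    peak = max(mags_db)
    notches = []
    for i in range(1, len(mags_db) - 1):
        is_local_min = mags_db[i] < mags_db[i - 1] and mags_db[i] < mags_db[i + 1]
        if is_local_min and peak - mags_db[i] >= min_depth_db:
            notches.append(freqs_hz[i])
    return notches


if __name__ == "__main__":
    freqs = [6000, 7000, 8000, 9000, 10000, 11000]
    # Synthetic magnitude spectra (dB) for two hypothetical source elevations.
    spectrum_a = [0.0, -20.0, -2.0, 0.0, -1.0, 0.0]
    spectrum_b = [0.0, -1.0, 0.0, -2.0, -18.0, 0.0]
    print("elevation A notches:", find_notch_frequencies(freqs, spectrum_a))
    print("elevation B notches:", find_notch_frequencies(freqs, spectrum_b))
```

The shallow dips in each spectrum are ignored because they do not reach the depth threshold. A real analysis would need a finer frequency grid and some smoothing, and the choice of threshold is itself an assumption.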
In their work, Hebrank and Wright (1974) used band-pass filters to investigate the frequency range in which the pinna affects localisation. It was concluded that elevation cues are embedded in a frequency range of 4 kHz to 16 kHz. This conclusion might explain why a large number of reversals occur whenever non-individualised HRTFs are used (Wenzel et al, 1993, Møller et al, 1996).
The significance of the cues provided by the pinna was demonstrated clearly when its shape was disrupted. Gardner and Gardner (1973) showed that when the pinna is altered by filling its cavities with putty, localisation performance in the median plane is reduced. Subsequently, further localisation studies and searches for physical cues were carried out with pinna occlusions (e.g. Oldfield and Parker, 1984, Han, 1991); these supported the significance of the contribution of all parts of the pinna to sound localisation in elevation and also, to some extent, in the horizontal plane.

References