The localisation of real acoustical events and the localisation of virtual
sounds presented over headphones clearly involve different mechanisms.
Shaw (1982) proposed that headphone studies of spatial
imagery be referred to as 'space perception', since they involve the collection
of perceptual data such as segregation of sound, where a single acoustical
event may give rise to more than one auditory event, or ambiguity as to
whether the sound is externalised (in which case 'localisation' becomes
'lateralisation'). Most psychoacoustical studies are carried out with
headphones, and assume that sound localisation and space perception can
be regarded as equivalent.
Lord Rayleigh's duplex theory (Rayleigh,
1907) was the first to explain how we localise sound: the path lengths to
the two ears differ, which gives rise to the Interaural Time Difference
(ITD), and the head casts an acoustic shadow at higher frequencies, which
produces the Interaural Level Difference (ILD). At angles on the median
plane or on the 'cone of confusion' these cues are ambiguous and additional,
more complex cues are required: the ambiguity can be resolved either by
head movement (Wallach, 1940) or by the filtering of the pinna.
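The path-length argument can be illustrated with a short Python sketch. It uses the rigid spherical-head approximation usually attributed to Woodworth, which is an assumption made here and is not part of the studies cited above; the head radius, the speed of sound and the function name are likewise illustrative choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed average head radius

def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds for a distant source (0 deg = front, 90 deg = to the side)."""
    theta = math.radians(azimuth_deg)
    if not 0.0 <= theta <= math.pi / 2:
        raise ValueError("azimuth must lie between 0 and 90 degrees")
    # Extra path to the far ear: an arc of length radius*theta around the head
    # plus a straight segment of length radius*sin(theta).
    return radius * (theta + math.sin(theta)) / c

if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"azimuth {az:2d} deg: ITD = {woodworth_itd(az) * 1e6:.0f} microseconds")
```

Under these assumptions the ITD grows monotonically from zero at the front to roughly 0.65 msec at the side. The model is symmetric about the interaural axis, so it cannot distinguish front from back, which is the ambiguity described above.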
By neglecting the transmission
paths within the auditory nervous system, we can assume that the binaural
cues can be derived from the ratio of the ipsilateral and the contralateral
HRTFs in the frequency domain. This ratio yields the ITD and ILD mentioned
above. Wightman and Kistler (1997) found that the frequency dependence
of the ITD, which is larger at low frequencies than at high frequencies
(Kuhn, 1977), is perceptually irrelevant. Whenever HRTFs are implemented using
a minimum phase model (see below), a single value is assigned to the ITD.
However, although ITD values are roughly similar among subjects, the auditory
nervous system is very sensitive to changes of the interaural phase or
timing. The minimum noticeable difference can be as low as 6 µsec (see
Carlile, 1996, section 2.2.1).
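As a minimal sketch of how both cues follow from the ratio of the two HRTFs, the Python fragment below evaluates the ratio at a single frequency, reading the ILD from its magnitude and the ITD from its phase. The toy impulse responses (a pure delay and a gain for each ear), the sampling rate and all function names are assumptions made for illustration, not measured data; the phase-based ITD is only meaningful at low frequencies, where the interaural phase does not wrap.

```python
import cmath
import math

def dft_bin(h, freq_hz, fs):
    """Fourier transform of the impulse response h evaluated at one frequency."""
    w = 2.0 * math.pi * freq_hz / fs
    return sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(h))

def interaural_cues(hrir_ipsi, hrir_contra, freq_hz, fs):
    """Return (ILD in dB, ITD in seconds) from the HRTF ratio at one frequency."""
    ratio = dft_bin(hrir_ipsi, freq_hz, fs) / dft_bin(hrir_contra, freq_hz, fs)
    ild_db = 20.0 * math.log10(abs(ratio))
    # The phase of the ratio equals 2*pi*f*ITD as long as it stays within (-pi, pi].
    itd_s = cmath.phase(ratio) / (2.0 * math.pi * freq_hz)
    return ild_db, itd_s

def toy_hrir(delay_samples, gain, length=64):
    """Hypothetical HRIR: a single scaled impulse (pure delay plus attenuation)."""
    h = [0.0] * length
    h[delay_samples] = gain
    return h

if __name__ == "__main__":
    fs = 44100
    ipsi = toy_hrir(10, 1.0)    # near ear: earlier and louder
    contra = toy_hrir(32, 0.5)  # far ear: 22 samples later, half the amplitude
    ild, itd = interaural_cues(ipsi, contra, 500.0, fs)
    print(f"ILD = {ild:.2f} dB, ITD = {itd * 1e6:.1f} microseconds")
```

With real HRTFs the ratio would be evaluated over the whole frequency axis, and both cues would then appear as functions of frequency rather than as single numbers.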
The dominance of the ITD cue at
low frequencies (below 1.5 kHz) was demonstrated in a subjective experiment
in which it was placed in conflict with other localisation cues (Wightman and
Kistler, 1992). It was also claimed that the auditory system is sensitive
to ITD in the envelopes of high frequency carriers, but this is a less
ILD presentations are very complex,
since at high frequencies the ILD depends strongly on the source angle,
and the response varies rapidly between peaks and notches. Their
visualisation in different planes reveals some systematic variations, but
variations among individuals are large, especially above approximately 8
kHz. It was shown that when ILDs are presented in different frequency bands
they have similar patterns (Wightman and Kistler, 1997). In addition, similar
patterns are observed at a specific frequency as the elevation angle
changes (Duda, 1997). In the horizontal plane, Middlebrooks and Green (1991)
observed that localisation is mainly based on ITD and ILD without pinna
cues. However, Musicant and Butler (1984) found that pinna cues did
help in resolving front-back confusion, and increased the localisation
accuracy when localising sounds within the same quadrant of the horizontal
Another possible binaural cue,
'binaural pinna disparity', was suggested by Searle et al (1975). These
authors proposed that the asymmetry between the two pinnae, in geometry
and in acoustical response, aids localisation performance in the median
plane. This cue is still regarded as being of second-order significance.
Monaural cues: time domain interpretation
Many psychoacoustical studies have demonstrated
that it is possible to localise reasonably well with one ear plugged, in
both horizontal and elevation angles (see Blauert, 1997, Section 4.4; Carlile,
1996, Section 2.2). However, localisation accuracy depends on the
spectral content and frequency bandwidth of the stimuli, and on other factors
related to practice and context effects.
As the external ear is a linear
system, time domain and frequency domain behaviours are related through
the Fourier transform. It is assumed that the processing of directional
information takes place in one of the two domains. Does the external ear
encode the source direction through modulation of time delays or modulation
of spectral shape?
Batteau (1962, 1967) was a pioneer
in relating localisation in elevation to the physical cues provided by
the external ear. He hypothesised that a simple time domain model, which
includes the original signal and two echoes, can give rise to the necessary
spectral cues. One echo, with a latency of 0-80 µsec, varies with the azimuthal
position of the source, and a second echo, with a latency of 100-300
µsec, varies with the elevation. Some agreement was found by Watkins (1978)
and Wright et al (1974) when the model was compared with measurements
in the lateral vertical plane. Hebrank and Wright (1974) showed that the
notches appearing in the frequency domain match the interference
produced by a reflection of variable path length off the posterior wall
of the concha. Hiranaka and Yamasaki (1983) confirmed that major reflections
occur within 350 µs after the first arriving sound, and that the delay
increases as the source is lowered. In an extensive search for physical
cues made with KEMAR, it was shown by Han (1991) that if localisation at
high frequencies were based on time delays, it would work only in a very
limited region. Because Batteau's model, although based on the physical
geometry of the external ear, is considered too simplistic, a further development
was formulated by Chen et al (1992) with the addition of reflected paths.
The pinna was modelled using a beam-forming approach. Although this model
does not rely on physical principles, it is based on the general geometrical
properties of the pinna.
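The link between a time domain echo and a spectral notch can be sketched in a few lines of Python. This is not Batteau's own formulation: the echo gain of 0.5, the example delays of 100 and 300 microseconds and the function names are assumptions chosen for illustration. A single echo of delay tau cancels against the direct sound at odd multiples of 1/(2 tau), so a longer delay moves the first notch down in frequency.

```python
import cmath
import math

def echo_model_response(freq_hz, echoes):
    """Complex response of a direct path plus delayed echoes.

    echoes is a list of (gain, delay_in_seconds) pairs; the values are illustrative.
    """
    response = 1.0 + 0.0j  # direct sound
    for gain, delay_s in echoes:
        response += gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return response

def magnitude_db(freq_hz, echoes):
    """Magnitude of the echo model in dB at one frequency."""
    return 20.0 * math.log10(abs(echo_model_response(freq_hz, echoes)))

def first_notch_hz(delay_s):
    """Lowest frequency at which a single echo cancels against the direct sound."""
    return 1.0 / (2.0 * delay_s)

if __name__ == "__main__":
    for delay_us in (100, 300):  # assumed echo delays in microseconds
        delay_s = delay_us * 1e-6
        f_notch = first_notch_hz(delay_s)
        level = magnitude_db(f_notch, [(0.5, delay_s)])
        print(f"delay {delay_us} us: first notch near {f_notch:.0f} Hz ({level:.1f} dB)")
```

Passing two (gain, delay) pairs to `echo_model_response` gives the two-echo case, in which the two comb patterns are superimposed.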
Wightman and Kistler (1997) argued
that monaural temporal cues are not likely to be relevant for human sound
localisation: firstly, because the HRTF impulse responses are too short
to be processed in the auditory system (they are of the order of
2 msec), and secondly, because their previous results (Kistler and Wightman,
1992) suggest that changes in the temporal fine structure of the HRIR do not
produce corresponding changes in the apparent positions of sound sources.
Monaural cues: frequency domain
Although it is accepted now that
the pinna acts as a 'frequency domain filter', it is not clear which cues
are relevant and necessary for the perception of elevation. Two approaches
exist when the spectra of HRTFs are analysed: elevation is perceived through
the peaks (the resonance of the pinna), or alternatively, as the notches
It was claimed that for narrow-band
stimuli, the apparent location is directly related to spectral peaks
in the subject's HRTFs (Blauert, 1997, Butler, 1997 and Musicant, 1995).
However, with regard to vertical localisation, Hebrank and Wright (1974),
Butler and Belendiuk (1977), Watkins (1978) and Bloom (1977) have provided
strong evidence that with narrow-band stimuli, spectral notches are responsible
for the sensation of source elevation. Shaw (1982) found that in eight
out of ten subjects the spectral minima systematically moved along the
frequency axis as source elevation varied from high to low. For two subjects
the minimum varied in level but not in frequency.
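One way of tracking such spectral minima from a sampled magnitude response is sketched below in Python. The depth criterion, its 6 dB default and the function name are assumptions made here for illustration and are not taken from Shaw (1982); the spectrum in the usage example is invented test data, not a measurement.

```python
def find_notches(freqs_hz, mag_db, min_depth_db=6.0):
    """Return (frequency, level) pairs for local minima of a magnitude response.

    A minimum is kept only if it lies at least min_depth_db below the highest
    level found on each side of it (an assumed, simple depth criterion).
    """
    notches = []
    for i in range(1, len(mag_db) - 1):
        is_local_min = mag_db[i] < mag_db[i - 1] and mag_db[i] <= mag_db[i + 1]
        if not is_local_min:
            continue
        depth = min(max(mag_db[:i]), max(mag_db[i + 1:])) - mag_db[i]
        if depth >= min_depth_db:
            notches.append((freqs_hz[i], mag_db[i]))
    return notches

if __name__ == "__main__":
    # Invented example spectrum (Hz, dB) with one deep and one shallow dip.
    freqs = [4000, 5000, 6000, 7000, 8000, 9000, 10000]
    levels = [0.0, -2.0, -15.0, -1.0, 0.0, -3.0, 0.0]
    print(find_notches(freqs, levels))
```

Applying such a function to HRTFs measured at successive elevations would show whether a notch moves in frequency or, as reported for two of the subjects, changes only in level.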
In their work, Hebrank and Wright
(1974) used band-pass filters to investigate the frequency range in which
the pinna affects localisation. It was concluded that elevation cues are
embedded in a frequency range of 4 kHz to 16 kHz. This conclusion might
explain why a large number of reversals occur whenever non-individualised
HRTFs are used (Wenzel et al, 1993, Møller et al, 1996).
The significance of the cues provided
by the pinna was demonstrated clearly when its shape was disrupted. Gardner
and Gardner (1973) showed that when the pinna is altered by filling its
cavities with putty, localisation performance in the median plane is reduced.
Subsequently, further localisation studies and searches for physical cues
were carried out with pinna occlusions (e.g. Oldfield and Parker, 1984,
Han, 1991), supporting the significance of the contribution of all parts
of the pinna to sound localisation in elevation and also, to some extent,
in the horizontal plane.