Main

Whether it is the song of a whale, the rustling of a mouse or the quiet sneaking of a cat, when sound emanates from a source, it travels through the surrounding medium as an oscillation of motion and pressure. The ability to detect its direction turns hearing into a spatial sense that is pivotal for survival.

Terrestrial vertebrates sense direction by sampling pressure at the ears (that is, two positions far apart; Fig. 1a). In the late 1940s, Lloyd Jeffress speculated that a combination of delay lines and coincidence detector neurons could measure the interaural time difference (ITD)1. An implementation of the Jeffress model was later found by Carr and Konishi in their seminal work on the barn owl brainstem2, making it one of the most well-known canonical circuits in vertebrates and a textbook example for the convergence of theory, behaviour, biophysics and physiology3,4. In addition to the ITD, the difference in the pressure amplitude across the ears, called the interaural level difference (ILD), also carries information about the sound direction. This quantity is amplified by the head, which reflects airborne sound and casts an acoustic shadow onto the far ear5.

Fig. 1: Sounds elicit a directional startle reflex in Danionella cerebrum.

a, Schematic of a pressure wave, arriving at the auditory organs with a detectable ITD in humans. b, ITDs are heavily diminished underwater (value approximated for D. cerebrum). c, Behavioural setup (Methods). d, Playback paradigm. Before the experiment, sound pressure and particle acceleration are calibrated at multiple points inside the inner tank (top left, orange crosses; see also Extended Data Fig. 1c–e). Playback is triggered if three conditions are met: the fish swims into the trigger zone (top right, dotted green rectangle), the fish is oriented ≤45° to the y axis, and ≥5 s have passed since the last playback (Methods). e, Startles are detected by a speed threshold after sound playback (Methods; see Extended Data Fig. 4 for details on startle dynamics). Top: centred trajectories after playback for startles (n = 1,415) and non-startles (n = 2,383) across all fish (n = 65). Bottom: average fish speed for startles and non-startles, aligned to sound trigger at t = 0. f, Centred startle trajectories in two sound configurations show a directional escape away from the left (81% of n = 125 startle trials across 58 fish; two-sided binomial test: P = 2 × 10⁻¹²) or right (79% of n = 115 startle trials across 56 fish; two-sided binomial test: P = 2 × 10⁻¹⁰) speaker. g, Pooled centred trajectories from f with flipped trajectories for the right speaker stimulus summarize the directional escape away from the single speaker (80% of n = 240 startle trials across 63 fish; two-sided binomial test: P = 1 × 10⁻²¹). f,g, The heat maps are normalized and smoothed two-dimensional histograms over endpoint positions of the trajectories (grey for single-stimulus data; blue for pooled data). Scale bars, 10 cm (d) and 5 mm (f,g).


The vertebrate sense of hearing evolved from a common fish ancestor, and yet these well-established models for directional hearing falter when applied to underwater environments. As sound travels approximately five times faster in water than in air, ITDs are reduced to very small levels (Fig. 1b). As biological tissues have acoustic impedances similar to water, ILDs are substantially smaller, too. This presents a conundrum: according to prevailing models, fish should not be able to localize sound. Yet, behavioural evidence shows that fishes such as the Atlantic cod7,8,11,12,13,14, the plainfin midshipman15,16,17,18,19, herring20,21,22,23 and goldfish24,25,26,27,28,29,30 can determine the direction of sound sources.

What cues are available to fish that might enable directional hearing? Fish have two distinct peripheral auditory pathways31,32. First, otolithic end organs of the inner ear, in addition to their vestibular function, also act as particle motion sensors for nanometre to micrometre displacements. Owing to the morphology of hair cells, this direct hearing pathway is inherently spatial, but it allows animals to tell only the axis of sound propagation, not the direction—a limitation termed the 180° ambiguity problem of directional hearing6. The second, so-called indirect, hearing pathway relies on the swim bladder, which is filled with compressible gas that oscillates in a pressure field. In Otophysi, a large superorder containing about 66% of freshwater fish species33, a series of bones (Weberian ossicles) transmits this motion to the inner ear34,35,36,37,38,39.

In 1975, Arie Schuijf proposed a model7,8 suggesting that fish could theoretically deduce the direction of sound if they were able to separately sense and compare its motion and pressure components. Alternative hypotheses include the possibility that fish evolved an extreme sensitivity to minuscule ITDs and ILDs or use their mechanosensory lateral line organ for directional inference through an unknown mechanism. Despite almost a century of careful work on this topic (Supplementary Table 1), starting with Karl von Frisch’s studies on minnows in 193540, the mechanism of directional hearing in fish is debated to this date.

Empirical tests of the directional hearing mechanism have been fraught with several difficulties. First, reliably controlling sound stimuli underwater is challenging owing to echoes and sound reverberations, as well as near-field effects in small tanks41,42,43,44,45. Second, although fish are sensitive to particle motion, many studies control only for pressure, so several authors have urged experimenters to control both quantities41,46,47. Third, the sense of hearing, mediated by the inner ear, must be distinguished from the lateral line sense, which is sensitive to low-frequency water flow (<200 Hz)48,49,50,51. Fourth, behavioural paradigms should exclude the possibility of klinotaxis (that is, a sequential gradient sampling strategy) instead of true directional hearing. Finally, the study of sound transduction to the fish’s inner ear36,38,39 is complicated by tissue opacity.

Here we address these challenges and systematically test hypotheses of directional hearing mechanisms using the transparent fish Danionella cerebrum, one of the smallest known vertebrates9,10,52,53,54,55,56,57,58,59. Its small size makes D. cerebrum well suited for high-throughput experiments under controlled laboratory conditions. Moreover, D. cerebrum communicate acoustically, underlining the importance of hearing for their behaviour. With an inner ear separation of less than 1 mm, D. cerebrum put interaural comparison mechanisms to their ultimate test (Fig. 1b).

We find that D. cerebrum perform directional startles away from a sound source and that this ability is independent of the lateral line. We then present an extensive set of controlled particle motion and pressure stimuli that lead to differential predictions for directional responses, depending on seven alternative hypotheses for the mechanism of directional hearing. Finally, we carry out laser-scanning vibrometry of auditory structures across the transparent body to determine the physical cues available to D. cerebrum. Together, the findings of our experiments lead us to reject all but one of the proposed hypotheses for directional hearing, and they provide strong support for Schuijf’s model that fish compare the phase between pressure and motion to tell the direction of a sound source.

Directional startle responses

To test whether D. cerebrum can hear sound direction, we tracked their motion in an aquarium surrounded by underwater speakers (Fig. 1c, Methods and Extended Data Fig. 1a,b). We played back transient sounds (about 12 ms duration, about 0.7 ms rise time, 780 Hz centre frequency; Methods and Extended Data Figs. 2 and 3) and quantified the direction of the fish’s startle reflex. The experiment was carried out with one fish at a time, and playback depended on the fish’s orientation and position, to test left–right directional hearing and cancel position-dependent echoes (Fig. 1d and Methods).

Shortly after sound onset (within 17 ms or 2 video frames), D. cerebrum performed a characteristic startle reflex involving fast sideward displacement (see Fig. 1d,e, Methods and Extended Data Figs. 4 and 5 for startle dynamics, probabilities and habituation). We used the relative displacement along the left–right speaker axis 50 ms after startle onset as a readout of directional response and found that D. cerebrum startle away from the speaker, irrespective of the speaker location (left or right; Fig. 1f). Consequently, left and right responses were pooled in a metric for directional escape away from the speaker (Fig. 1g). These single-speaker sound playbacks resulted in directional escapes for 80% of all startles (191 of n = 240 startles across 63 fish with at least one startle, two-sided binomial test: P = 1 × 10⁻²¹), an effect present in both sexes (Extended Data Fig. 6a). A replication of the experiment in lateral line-ablated fish showed equivalent results of directional hearing (Extended Data Figs. 7b and 9b). Thus, female and male D. cerebrum exhibit directional hearing independent of lateral line function.
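The two-sided binomial tests reported for escape direction compare the number of startles away from the speaker against a chance level of 50%. The following is a minimal sketch of such an exact test, not the analysis code used in this study; the function name `two_sided_binomial_p` is hypothetical, and the shortcut of doubling one tail is valid only because the chance level is assumed to be exactly 0.5.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial P value against a chance level of 0.5.

    Hypothetical helper for illustration. With p = 0.5 the null
    distribution is symmetric, so the two-sided P value equals twice
    the tail that is at least as extreme as the observed count.
    """
    extreme = max(successes, trials - successes)
    # Count outcomes at least as extreme as the observation (one tail).
    tail = sum(comb(trials, k) for k in range(extreme, trials + 1))
    # Python integers are exact, so only the final division rounds.
    return min(1.0, 2 * tail / 2 ** trials)


# Counts taken from the text: 191 escapes in 240 single-speaker startles.
p_single_speaker = two_sided_binomial_p(191, 240)
print(f"P = {p_single_speaker:.1e}")
```

Because the counts in the text are rounded percentages in most places, a value computed this way may differ slightly from the reported P values.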

Physical cues and hypotheses

We then inquired about possible algorithms that may underlie this directional hearing behaviour. As directionality has to be inferred from physical acoustic cues, we gathered those cues potentially available to D. cerebrum to list compatible hypotheses for a directional hearing algorithm. Adults have an inner ear separation (about 0.6 mm; Fig. 2a) that is orders of magnitude smaller than the wavelength of the sounds recorded in their natural habitat (≥150 mm for sounds up to 10 kHz). In addition, the mismatch between the characteristic acoustic impedance of water (1.5 MPa s m⁻¹) and biological tissue (about 1.6–1.7 MPa s m⁻¹)60 is small, unlike the approximately 4,000-fold mismatch between biological tissue and air (0.0004 MPa s m⁻¹). Hence, D. cerebrum can hardly break left–right symmetry by casting an acoustic shadow. However, close to a monopole sound source, ILDs could occur owing to the steep decay of the sound field with distance (Extended Data Fig. 8a). In D. cerebrum, the level drop between both inner ears at 3 cm distance from a monopole sound source can be as large as about 2% for pressure, irrespective of frequency, and about 4% for particle velocity at frequencies below 4.2 kHz. Hence, sound direction could be inferred through two pressure sensors (P-ILD; hypothesis 1) or two particle motion sensors (M-ILD; hypothesis 2). These two strategies would work best for nearby sounds coming from a direction along the axis of the sensor pair.
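The size of these level differences can be checked with the textbook expressions for an ideal monopole, in which pressure amplitude falls as 1/r and radial particle velocity amplitude as (1/r)·sqrt(1 + 1/(kr)²). The sketch below assumes a round sound speed of 1,500 m/s in water and a source on the interaural axis; the function names are hypothetical and the calculation is an illustration rather than the estimate used for Extended Data Fig. 8a.

```python
import math

SOUND_SPEED_WATER = 1500.0  # m/s, assumed round value
EAR_SEPARATION = 0.6e-3     # m, inner ear separation from the text
SOURCE_DISTANCE = 0.03      # m, distance of the near ear from the source


def pressure_level_drop(near: float, separation: float) -> float:
    """Fractional pressure drop between two points on a monopole radius."""
    far = near + separation
    return 1.0 - near / far  # pressure amplitude scales as 1/r


def velocity_level_drop(near: float, separation: float, freq_hz: float) -> float:
    """Fractional particle velocity drop, including the near-field term."""
    k = 2.0 * math.pi * freq_hz / SOUND_SPEED_WATER  # wavenumber

    def amplitude(r: float) -> float:
        return (1.0 / r) * math.sqrt(1.0 + 1.0 / (k * r) ** 2)

    return 1.0 - amplitude(near + separation) / amplitude(near)


p_drop = pressure_level_drop(SOURCE_DISTANCE, EAR_SEPARATION)
v_drop = velocity_level_drop(SOURCE_DISTANCE, EAR_SEPARATION, 780.0)
print(f"pressure drop: {100 * p_drop:.1f}%, velocity drop: {100 * v_drop:.1f}%")
```

At the 780 Hz centre frequency of the stimuli, kr is much smaller than 1 at 3 cm, so the velocity amplitude decays almost as 1/r² and its drop is roughly twice the pressure drop.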

Fig. 2: The relative phase between pressure and particle motion predicts startle direction.

a, Schematic of D. cerebrum hearing apparatus anatomy. b, Schuijf’s model: a plane wave from the left differs from a plane wave from the right in terms of the phase relationship between pressure and directed particle velocity. c, Illustration of stimulus configurations. Plus, minus and arrows inside speaker symbols refer to speaker signal polarity. L, left speaker; R, right speaker; P, only pressure; M, only motion. The schematics of resulting traces illustrate the pressure and particle velocity relationship (see Extended Data Figs. 2 and 3 for actual traces). Speaker schematics show a simplified configuration. Active echo cancellation typically involved three speakers (Supplementary Table 2). d–f, Centred startle trajectories after sound playback. All statistical tests are two-sided binomial tests. d, Directional escapes away from a single speaker for a positive polarity signal (same data as Fig. 1g) and negative polarity signal (80% escapes, P = 7 × 10⁻²³, n = 258 startle trials to this sound in 61 fish). e, Absence of significant directional bias in the positive polarity condition with only pressure (44% of n = 90 to right, not significant; P > 0.05) and only particle motion (46% of n = 192 to right, not significant; P > 0.05). f, Single-speaker playbacks pooled over both polarities evoke a directional escape (80% escapes, P = 4 × 10⁻⁴³, n = 498 startle trials in 64 fish). Selective inversion of pressure polarity by an additional pair of speakers along the orthogonal axis inverts the relative polarity between pressure and particle velocity, which tricks the fish into performing startles approaching the active speaker (67% approach, P = 1 × 10⁻⁹, n = 331 startle trials in 61 fish). Scale bars, 500 µm (a) and 5 mm (d–f).


The other binaural cue may be time differences in pressure (P-ITD, hypothesis 3) or particle motion (M-ITD, hypothesis 4). The maximal ITD between D. cerebrum’s inner ears is 0.4 µs, orders of magnitude smaller than in terrestrial vertebrates4. An ITD mechanism would thus point to an extreme sensitivity to minute time differences.
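The maximal ITD follows from dividing the sensor separation by the speed of sound. The short calculation below is an illustrative sketch that assumes a sound speed of 1,500 m/s in water; the human values (0.2 m ear separation, 343 m/s in air) are round numbers chosen only for comparison and do not come from this study.

```python
SOUND_SPEED_WATER = 1500.0  # m/s, assumed round value
SOUND_SPEED_AIR = 343.0     # m/s, assumed round value


def max_itd_seconds(separation_m: float, sound_speed: float) -> float:
    """Largest possible arrival time difference for a source on the sensor axis."""
    return separation_m / sound_speed


itd_fish = max_itd_seconds(0.6e-3, SOUND_SPEED_WATER)  # inner ears of D. cerebrum
itd_human = max_itd_seconds(0.2, SOUND_SPEED_AIR)      # illustrative human head
print(f"fish: {itd_fish * 1e6:.1f} µs, human: {itd_human * 1e6:.0f} µs")
```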

Each inner ear hair cell deflects along a preferred axis. Continuous sinusoidal sounds from opposing directions would stimulate the same hair cells, leading to the 180° ambiguity problem. If, however, all ecologically relevant sounds started with compression, the animal could interpret initial particle motion as motion directed away from the source. This sense would require just a single particle motion sensor for each axis. We call this possibility M-polarity hearing for either positive or negative polarity (hypotheses 5 and 6).

Finally, Schuijf’s model for directional hearing resolves the 180° ambiguity problem by using pressure as a reference quantity (hypothesis 7). The idea is most easily illustrated for a plane wave (Fig. 2b), but its applicability is not limited to it (Extended Data Fig. 8b): at the phase of high compression, particles move away from the source, no matter if it started with push or pull. Thus, the pressure signal can act as a reference signal to resolve the 180° ambiguity of particle motion.
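For an idealised plane wave, this logic can be sketched in a few lines. The example below uses the sign of the time-averaged product of pressure and particle velocity as one possible readout of their phase relationship; this readout, the coordinate convention (x points from left to right) and all function names are assumptions made for illustration, not a description of how the fish or the analysis in this study computes direction.

```python
import math

RHO_C = 1000.0 * 1500.0  # characteristic impedance of water in Pa s/m (assumed round values)


def plane_wave(source_side: str, polarity: int = 1, n: int = 200):
    """One period of pressure and x-velocity for an idealised plane wave."""
    # A wave from the left travels towards +x, a wave from the right towards -x.
    direction = 1 if source_side == "left" else -1
    pressure = [polarity * math.sin(2.0 * math.pi * i / n) for i in range(n)]
    # In a plane wave, velocity along the propagation direction is in phase with pressure.
    velocity = [direction * p / RHO_C for p in pressure]
    return pressure, velocity


def inferred_source_side(pressure, velocity) -> str:
    """Infer the source side from the sign of the mean pressure-velocity product."""
    mean_product = sum(p * v for p, v in zip(pressure, velocity)) / len(pressure)
    if mean_product > 0:
        return "left"
    if mean_product < 0:
        return "right"
    return "undetermined"


p, v = plane_wave("left", polarity=-1)
print(inferred_source_side(p, v))  # polarity does not change the inferred side
```

In this sketch, inverting the waveform polarity flips pressure and velocity together and leaves the inferred side unchanged, whereas inverting pressure alone flips it. Without a pressure signal the product is zero and the side stays undetermined, which mirrors the 180° ambiguity of particle motion on its own.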

To distinguish between these hypotheses, we took advantage of D. cerebrum’s small size and precise control over stimuli under laboratory conditions. This allowed us to create stimuli that would normally not occur in nature (for example, sounds with pure pressure and no particle motion component) and to behaviourally dissect the algorithm used by D. cerebrum to tell sound direction.

The directional hearing algorithm

The observed directional startle responses (Fig. 1f,g) are a reaction to naturalistic two-component sound consisting of pressure and particle motion signals (Fig. 2c(i)). To investigate how the isolated components of this sound field affect D. cerebrum, we created a pure pressure stimulus and a pure particle motion stimulus. The pure pressure stimulus was realized by driving opposing speakers in phase (Fig. 2c(iv)), and the pure motion stimulus was realized by driving them out of phase (Fig. 2c(v)), creating standing waves with nodes at the animal’s location. Our approach additionally took into account echo cancellation and the spatially mapped frequency response of our recording tanks to present controlled and reproducible stimuli (Methods and Extended Data Figs. 2 and 3). We found that either component alone (pure pressure or pure motion) can elicit startles (Extended Data Fig. 7). However, neither component by itself elicited directionally biased responses, suggesting that both are necessary to tell sound direction (Fig. 2e and Extended Data Fig. 7).
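The principle behind the two isolated stimuli can be illustrated by superposing two counter-propagating plane waves. The sketch below is a free-field idealisation with assumed round values for the density and sound speed of water and a hypothetical function name; it ignores the echo cancellation and tank calibration used in the actual experiments.

```python
import math

RHO = 1000.0  # kg/m^3, assumed density of water
C = 1500.0    # m/s, assumed sound speed in water


def field_at(x: float, t: float, freq_hz: float, amp_left: float, amp_right: float):
    """Pressure and x-velocity from two opposing plane waves.

    The left speaker's wave travels towards +x and the right speaker's
    wave towards -x; the fish sits midway between them at x = 0.
    """
    omega = 2.0 * math.pi * freq_hz
    k = omega / C
    p_left = amp_left * math.cos(omega * t - k * x)
    p_right = amp_right * math.cos(omega * t + k * x)
    pressure = p_left + p_right
    # Velocity points along each wave's own propagation direction.
    velocity = (p_left - p_right) / (RHO * C)
    return pressure, velocity


# In-phase drive: pressure adds and velocity cancels at the midpoint.
print(field_at(0.0, 0.0, 780.0, 1.0, 1.0))
# Anti-phase drive: pressure cancels and velocity adds at the midpoint.
print(field_at(0.0, 0.0, 780.0, 1.0, -1.0))
```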

The unbiased responses to the pure particle motion stimulus were the first evidence against a directional hearing algorithm based on initial particle motion polarity (M-polarity). To further test this hypothesis, we presented amplitude-inverted waveforms, for which the M-polarity algorithm would predict a reversed startle response. However, D. cerebrum performed startles away from the speaker for both polarities (Fig. 2d; pooled single-speaker data in Fig. 2f: 80% escape of n = 498 startles across 64 fish, two-sided binomial test: P = 4 × 10⁻⁴³). Hence, we ruled out particle motion polarity (M-polarity) as the sole cue for directional hearing.

To test whether it may be the relationship between pressure and motion that determines the startle direction, we selectively inverted the pressure signal while leaving particle motion unchanged (similar to refs. 8,14). As we have seen, activating the left speaker (for example) evokes startles towards the right, away from the speaker, for both positive and negative waveform polarity (configurations shown in Fig. 2c(i,ii)). By introducing an additional pressure source through in-phase activation of two speakers orthogonal to the left–right axis (Fig. 2c(vi)), we were able to selectively invert pressure, creating stimuli with pressure and particle motion cues akin to a sound originating from the right (Fig. 2c(iii)), despite the right speaker being inactive (Methods). In this ‘trick condition’, Schuijf’s model predicts reversed startle behaviour (that is, ‘escape’ towards a speaker).

Indeed, D. cerebrum could be tricked: following pressure inversion, D. cerebrum performed startles towards the active speaker rather than away (Fig. 2f, 67% approach, two-sided binomial test: P = 1 × 10⁻⁹, n = 331 startle trials in 61 fish). This result held true for both sexes, within individual fish, for both pressure polarities and in lateral line-ablated fish (Extended Data Figs. 6b–d, 7 and 9d).

To check whether any binaural mechanism explains startles towards the speaker in the trick condition, we estimated the amplitudes and signs of P-ITD, M-ITD, P-ILD and M-ILD on the basis of theory and pressure measurements on both sides of the fish (see the section of the Methods entitled Estimation of binaural cues). We found that the signs of M-ITD and M-ILD remain unchanged when creating the trick condition and cannot explain the reversal of D. cerebrum’s escape direction. This suggests that neither interaural time nor level comparisons are sufficient to infer sound direction from particle motion.

In the single-speaker experiments, the distance-dependent decay of absolute pressure is 4.4 Pa over 600 µm, which is 2% of the pressure amplitude and potentially large enough to be detected. When we selectively invert pressure to realize the trick condition, we also effectively invert the sign of the level gradient along the horizontal x axis, as well as the sign of the phase delay (see Extended Data Fig. 8d for a geometrical explanation). Therefore, P-ILD and P-ITD are the other mechanisms that are in agreement with the behavioural response of D. cerebrum.

In summary, D. cerebrum can be tricked into startling towards the speaker rather than away (see also Supplementary Video 1). Among all seven hypotheses considered, this behaviour is consistent with only three (Fig. 4): Schuijf’s model, which relies on the phase comparison between pressure and particle motion; and the P-ILD and P-ITD mechanisms, which both rely on sensing pressure at two positions in space.

To determine which of these remaining three hypotheses is compatible with D. cerebrum’s sensory anatomy, we next asked whether D. cerebrum possess sensory organs that can detect pressure and particle motion, as required by Schuijf’s model, or sensory organs that can detect a difference in pressure amplitude along the azimuth, as required by the P-ILD and P-ITD mechanisms.

The hearing apparatus

On the basis of micro-computed tomography (micro-CT) and optical vibrometry, we characterized D. cerebrum’s auditory organs to narrow down the candidate mechanisms for directional hearing. We started by visualizing the anatomy of D. cerebrum’s hearing apparatus with micro-CT imaging of a phosphomolybdic acid-stained sample (Methods). We segmented the main components of the hearing apparatus such as the swim bladder, the labyrinths, the otoliths (lapillus, asteriscus and sagitta) and otolithic end organs (utricle, lagena and saccule), the ossicles of the Weberian apparatus (including tripus and scaphium), and the lymphatic chambers that engulf the lagena and the saccule (Fig. 3a and Supplementary Video 2).

Fig. 3: The D. cerebrum hearing apparatus can detect pressure and particle motion separately.

a, Segmentation of the hearing apparatus based on micro-CT. The fields of view for vibrometry phase maps in c–e are indicated. b, Illustration of the vibrometry method. A laser-scanning confocal reflectance microscope is used to image the motion of auditory structures in a sound field. Each pixel is sampled at four sound phases to record sound-induced motion in the xy plane. See Methods and Extended Data Fig. 10 for details. c–e, Particle velocity phase (φ) and amplitude (A) maps for motion along the left–right speaker axis; phase colour wheel shown in b, normalized to the amplitude Amax indicated above the maps. c, A pressure stimulus causes anti-phase motion of the tips of the tripus and scaphium along the medial–lateral axis (indicated with red and cyan arrows). d, Along the dorso-ventral axis, a pressure stimulus creates phase-lagged motion of the scaphium and sagitta with the surrounding tissue. e, A particle motion stimulus results in relative motion between the lapillus and surrounding tissue along the medial–lateral axis. Pooling over pixels within the region of interest, the displacement amplitude of the surrounding tissue is 1.24 µm ± 0.14 µm, and that of the lapillus is 0.95 µm ± 0.05 µm. At a relative phase of 0.14 π ± 0.05 π, the relative displacement can be as large as 0.56 µm (that is, 45% of the surrounding tissue motion). f, Indirect pathway for pressure sensing: in a pressure field, the swim bladder oscillates, which moves the tripus and the scaphium inwards and outwards. Lymphatic spaces probably couple the motion of the scaphium to the sagitta73,74. g, Direct pathway for particle motion sensing: the tissue of the fish couples to the particle motion of water, but the denser lapillus lags in phase and moves with lower amplitude, leading to a relative displacement that could be sensed by utricular hair cells underneath the lapillus. Observations shown in c–e were repeated once, four times and once in other fish, respectively, with similar results. Arrows in f,g indicate direction of motion. Scale bars, 250 µm (a) and 100 µm (c–e).

To study the motion of these auditory structures in response to the sound stimuli, we built an imaging vibrometer based on laser-scanning confocal reflectance microscopy: by time-gated acquisition of each pixel with respect to a continuous sinusoidal sound playback, it was possible to infer the relative phase of the motion of auditory structures in two dimensions (Fig. 3b, Methods and Extended Data Fig. 10).
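How four time-gated samples can yield a phase and an amplitude is illustrated below with a generic four-step estimator. The sketch assumes that the displacement of a structure is already available at four sound phases spaced a quarter period apart and that the motion is sinusoidal; it is a common textbook scheme with a hypothetical function name, not necessarily the reconstruction used for Fig. 3.

```python
import math


def amplitude_and_phase(samples):
    """Estimate amplitude and phase of a sinusoid from four gated samples.

    Assumes samples[k] = offset + A * cos(phase + k * pi / 2) for k = 0..3.
    A constant offset cancels in both differences.
    """
    s0, s1, s2, s3 = samples
    in_phase = (s0 - s2) / 2.0    # equals A * cos(phase)
    quadrature = (s3 - s1) / 2.0  # equals A * sin(phase)
    amplitude = math.hypot(in_phase, quadrature)
    phase = math.atan2(quadrature, in_phase)
    return amplitude, phase


# Synthetic check with an arbitrary offset, amplitude and phase.
true_amplitude, true_phase, offset = 1.0, 0.5, 3.0
gated = [offset + true_amplitude * math.cos(true_phase + k * math.pi / 2.0) for k in range(4)]
print(amplitude_and_phase(gated))
```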

First, we investigated whether the hearing apparatus of D. cerebrum is capable of detecting pressure. To this end, we drove two opposite speakers in phase, thereby creating a ‘pressure-only’ stimulus. In this condition, we observed anti-phase motion of the left and right tripus and scaphium along the medial–lateral axis, periodically compressing lymphatic space (Fig. 3c,d). This motion stems from swim bladder compression and expansion in a pressure field. In addition, we observed rotational motion of the sagitta, part of the saccular end organ that lies embedded in an adjacent second lymphatic space (Fig. 3d). By contrast, we did not observe relative motion between the surrounding tissue and the lapillus or the asteriscus (Extended Data Fig. 11). We concluded that an indirect pressure sensing pathway exists in D. cerebrum and that the saccule may be its main end organ, in agreement with findings in other fishes38,39,61,62.

Second, we studied particle motion sensing by driving two opposite speakers in anti-phase. In this condition, with large particle motion but a low pressure signal, the lapillus, the otolith of the utricular end organ, moved with a phase lag (about 0.14 π ± 0.05 π) and at a lower amplitude (about 76% ± 10%) than the surrounding tissue (Fig. 3e). This creates a relative motion that is expected to stimulate the underlying hair cell epithelium (direct pathway). We did not detect such relative motion for the tripus, the sagitta and the asteriscus. Note that we used stimuli with particle motion along the mediolateral axis relevant for the left–right startle behaviour. To support directional hearing in three dimensions, D. cerebrum may have further direct motion sensing pathways along additional axes, in line with particle motion tuning in saccular63,64 or lagenar63,65 afferents in other species. In summary, we did not find two pressure sensors that could detect a pressure level difference (P-ILD) or time difference (P-ITD) along the binaural axis, but rather a single pressure sensor (Fig. 3f) and a set of particle motion sensors (Fig. 3g).
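The relative displacement between the lapillus and the surrounding tissue follows from subtracting the two motions as phasors. The sketch below uses the mean amplitudes and phase lag given in the Fig. 3e legend (1.24 µm, 0.95 µm and 0.14 π) and ignores their uncertainties; the function name is hypothetical.

```python
import cmath
import math


def relative_displacement(tissue_amp: float, otolith_amp: float, phase_lag: float) -> float:
    """Amplitude of the difference between two sinusoids of equal frequency."""
    tissue = complex(tissue_amp, 0.0)
    # The otolith lags the tissue, so its phasor is rotated by -phase_lag.
    otolith = cmath.rect(otolith_amp, -phase_lag)
    return abs(tissue - otolith)


tissue_um = 1.24        # µm, surrounding tissue
lapillus_um = 0.95      # µm, lapillus
lag = 0.14 * math.pi    # rad, phase lag of the lapillus

relative_um = relative_displacement(tissue_um, lapillus_um, lag)
print(f"relative displacement: {relative_um:.2f} µm "
      f"({100 * relative_um / tissue_um:.0f}% of tissue motion)")
```

Both the amplitude difference and the phase lag contribute: with zero lag the relative displacement would shrink to the plain amplitude difference of 0.29 µm.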

We considered whether D. cerebrum could have another pressure sensor that the vibrometry measurements did not detect. As in other cyprinids, their swim bladder is divided into an anterior and a posterior part. To rule out D. cerebrum using a pressure difference between these divisions, we repeated our behavioural analysis for only those startles in which the anterior–posterior axis of the fish was near-orthogonal to the axis of sound presentation, giving equivalent results (Extended Data Fig. 12a). Theoretically, D. cerebrum might possess other, unknown pressure sensors to implement P-ILD or P-ITD. However, all known sound pressure sensors in fish are based on compressible gas-filled organs. As gas-filled structures have a high micro-CT contrast, hypothetical pressure sensing organs would have to be either microscopic to evade detection, or based on an unknown principle of sound pressure transduction without compressible gas. Neither of these options is supported by our current knowledge of fish biology and physics of sound.

We therefore reject P-ILD and P-ITD as plausible mechanisms. Instead, the D. cerebrum anatomy is well suited for implementing Schuijf’s model for directional hearing.
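The chain of eliminations can be written down as a small lookup of predictions against observations. The encoding below is a simplified reading of the arguments in the text, with hypothetical field names; each hypothesis is reduced to three yes or no properties, which glosses over the quantitative estimates of binaural cues described in the Methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    flips_with_waveform_polarity: bool  # predicted reversal for amplitude-inverted sounds
    flips_in_trick_condition: bool      # predicted reversal when only pressure is inverted
    needs_two_pressure_sensors: bool    # anatomical requirement


HYPOTHESES = [
    Hypothesis("P-ILD", False, True, True),
    Hypothesis("M-ILD", False, False, False),
    Hypothesis("P-ITD", False, True, True),
    Hypothesis("M-ITD", False, False, False),
    Hypothesis("M-polarity (+)", True, False, False),
    Hypothesis("M-polarity (-)", True, False, False),
    Hypothesis("Schuijf", False, True, False),
]

# Observations as summarized in the text.
OBSERVED_FLIP_WITH_POLARITY = False  # escapes for both waveform polarities
OBSERVED_FLIP_IN_TRICK = True        # approach after pressure inversion
TWO_PRESSURE_SENSORS_FOUND = False   # vibrometry found a single pressure sensor


def surviving(hypotheses, use_anatomy: bool = True):
    """Names of hypotheses consistent with behaviour and, optionally, anatomy."""
    names = []
    for h in hypotheses:
        behaviour_ok = (
            h.flips_with_waveform_polarity == OBSERVED_FLIP_WITH_POLARITY
            and h.flips_in_trick_condition == OBSERVED_FLIP_IN_TRICK
        )
        anatomy_ok = TWO_PRESSURE_SENSORS_FOUND or not h.needs_two_pressure_sensors
        if behaviour_ok and (anatomy_ok or not use_anatomy):
            names.append(h.name)
    return names


print(surviving(HYPOTHESES, use_anatomy=False))
print(surviving(HYPOTHESES))
```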

Discussion

In this study, we report direct experimental evidence for directional hearing in fish and identify its underlying biophysical mechanism. Directional startles were acoustically elicited in male and female D. cerebrum, irrespective of lateral line ablation. A selective inversion of the pressure component tricked D. cerebrum into performing a startle towards the active speaker rather than away, consistent with the hypothesis that fish compare pressure and particle motion signals to infer direction (Schuijf’s model). On the basis of anatomical and behavioural data, we rejected all known alternative models, including binaural models for directional hearing. Using optical vibrometry, we confirmed the existence of a direct pathway for particle motion sensing and an indirect pathway for pressure sensing in D. cerebrum. Hence, the D. cerebrum hearing apparatus supports a dual sense of hearing that allows for a comparison between pressure and particle motion. Together, these findings suggest that Schuijf’s model is actually implemented in nature (Fig. 4).

Fig. 4: Evidence for Schuijf’s model of phase comparison.

We consider seven models for directional hearing that depend on different sensory structures and predict different behaviours (an eighth lateral line (LL)-based mechanism can be ruled out as we observe directional behaviour despite lateral line ablation). Interaural time difference (M-ITD) and level difference (M-ILD) mechanisms based on particle motion can be rejected on the basis of the behavioural data in the trick configuration. A strategy that is based on escaping positive (M-polarity (+)) or negative (M-polarity (−)) initial motion can be rejected as inverted polarity waveforms fail to invert the startle direction. Finally, a mechanism based on sensing pressure level or time differences (P-ILD, P-ITD), although consistent with behavioural data, can be rejected as D. cerebrum possess only a single pressure sensor. This leaves Schuijf’s model as the one that correctly predicts an inversion of startle direction in the trick configuration and that is based on sensory cues that D. cerebrum are able to sense.

Our work builds on a large body of pioneering work on fish hearing, summarized in Supplementary Table 1. To provide an overview of past evidence for directional hearing in fish, we categorized publications into five study types. We indicated to what degree authors controlled acoustic variables and whether they could judge their importance to directional hearing. These previous studies faced a trade-off: they were carried out either in open water, in which reverberations are negligible, but experiments are challenging so that only a few fish were tested66, or in the laboratory, in which more fish could be tested but pressure and particle motion were not fully controlled, and the lateral line function could not be ruled out as a near-field sense that aids directional hearing. Here we addressed these limitations by measuring the impulse response of our speakers, actively cancelling reverberations and precisely controlling natural and unnatural stimuli.

Hearing in humans refers to the perception of pressure oscillations. We have shown that D. cerebrum instead have a dual sense of hearing comprising pressure and particle motion sensing, which is used for directional hearing. It could be that the acoustic world is much richer for fish than for humans, with acoustic events carrying stereotypic dual pressure-motion signatures67,68 that may even reveal their distance69. As D. cerebrum is a vocal species, directional hearing may also have a social function.

Schuijf’s model has recently been extended through a proposal that fish compute the time-averaged product of pressure and motion (the acoustic intensity vector)6. This theory can account for phonotaxis of plainfin midshipmen towards monopole and dipole sources17,18. Future neurophysiological work may test whether D. cerebrum implements Schuijf’s model this way.

D. cerebrum shares the Weberian apparatus with other otophysans, a superorder that includes 66% of living freshwater fish species and 15% of all living vertebrate species33,70. Even in fishes lacking the Weberian apparatus, otolithic end organs may still inherit pressure-induced swim bladder motion38,62,71,72. Hence, Schuijf’s model may have widespread applicability.

Methods

Animals

All animal experiments conformed to Berlin state, German federal and European Union animal welfare regulations and were approved by the LAGeSo, the Berlin authority for animal experiments. D. cerebrum were kept in commercial zebrafish aquaria (Tecniplast) with the following water parameters: pH 7.3, conductivity 350 µS cm⁻¹, temperature 27 °C. We used male and female adult fish between 4 and 11 months of age.

Behavioural setup and protocol

The experimental setup comprised an inner 10 cm × 10 cm (length × width) tank with <200 µm thin optically opaque but acoustically transparent polypropylene sheet walls (cut out of plastic folders), surrounded by an outer tank with submerged speakers (4 × 3 Ekulit LSF-27M/SC 8 Ω in custom waterproof enclosures). Thus, the speakers were visually shielded from the fish inside the inner tank, and the fish were confined between the speakers (Fig. 1c provides a schematic; further details are provided in Extended Data Fig. 1a,b). The height of the water was 10 cm, and the transparent bottom of the inner tank was at 6.3 cm, leaving 3.7 cm to the water surface as a water column for the fish to swim in. The speakers were level with this water column, and all sounds were targeted for this water column. Infrared light-emitting diodes illuminated the fish from below. The inner tank was filmed with an overhead camera at 120 fps at 336 × 336 pixel resolution, and live tracking of the fish was carried out on a subset of frames at 15 fps. White light-emitting diodes lit the setup indirectly via reflections from the room walls. The room and water temperature was kept at 27 °C.

Each fish was tested once, and one fish was tested at a time. In the first minutes of the recording, a 10 cm × 10 cm acrylic plate with centimetre markings was placed in the inner tank to match the sound calibration grid to the video frame. Three minutes after placing the fish in the inner tank, playbacks were triggered for 45 min.

To probe left–right directional hearing, playbacks from the front or back of the fish had to be avoided. To increase experimental throughput, we prompted D. cerebrum to orient orthogonally to the left–right speaker axis (x axis): we had previously observed that D. cerebrum swim closer to white than to black walls. By using two black plastic films as walls across the x axis and two white films across the y axis, we encouraged D. cerebrum to oscillate between the white walls, along the y axis (Extended Data Fig. 1c). Consequently, the ratio of distance covered along the y axis to the distance covered along the x axis was 1.6. Sound playback occurred only when fish were oriented within 45° of the y axis (that is, orthogonal to the speaker axis) and were within a 1.5 cm × 3 cm trigger zone at the centre, leaving at least 3.5 cm distance to the nearest wall (the typical startle displacement is mean ± s.d. = 1 cm ± 0.4 cm after the first 50 ms). The delay between playbacks was at least 5 s: a fixed minimum delay of 5 s plus a random delay, drawn for each trial from an exponential distribution with a mean of 5 s. This paradigm averaged to about 1.3 playbacks per minute in untreated fish and to about 0.6 playbacks per minute in lateral line-ablated fish.
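The trigger logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function names are hypothetical, positions are taken relative to the tank centre, the heading is measured in degrees from the y axis, and the assignment of the 1.5 cm and 3 cm sides of the trigger zone to the x and y axes is assumed.

```python
import random

MIN_DELAY_S = 5.0          # fixed minimum delay between playbacks
MEAN_RANDOM_DELAY_S = 5.0  # mean of the additional exponential delay


def next_delay(rng):
    """Fixed minimum delay plus an exponentially distributed random delay."""
    return MIN_DELAY_S + rng.expovariate(1.0 / MEAN_RANDOM_DELAY_S)


def heading_offset_deg(heading_deg):
    """Smallest angle (0 to 90 degrees) between a heading and the y axis."""
    a = abs(heading_deg) % 180.0
    return min(a, 180.0 - a)


def should_trigger(x_cm, y_cm, heading_deg, elapsed_s, required_delay_s,
                   zone_x_cm=1.5, zone_y_cm=3.0, max_angle_deg=45.0):
    """Return True if all three playback conditions are met.

    zone_x_cm and zone_y_cm are assumed side lengths of the trigger zone.
    """
    in_zone = abs(x_cm) <= zone_x_cm / 2 and abs(y_cm) <= zone_y_cm / 2
    aligned = heading_offset_deg(heading_deg) <= max_angle_deg
    waited = elapsed_s >= required_delay_s
    return in_zone and aligned and waited


rng = random.Random(0)
delay = next_delay(rng)  # always at least 5 s
print(should_trigger(0.2, -1.0, 170.0, elapsed_s=delay + 1.0,
                     required_delay_s=delay))
```

Drawing a new delay after every playback keeps the timing of the next stimulus unpredictable while still enforcing the 5 s minimum.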

Twelve target sounds were generated from a recorded pressure waveform (see the section of the Methods entitled Sound stimulation waveforms), targeted to the fish’s current position to cancel reverberations (see the section of the Methods entitled Calibration and reverberation cancellation), and presented to the fish in random order following trigger events using custom-written code in Python 3.

The data in Figs. 1–4 and Extended Data Figs. 4–7 and 12a stem from 65 untreated fish (3,798 playbacks, 1,415 startles, about 37% startles). For each stimulus, we indicated the number of fish that responded with at least one startle. The same experiment, also comprising 12 sound configurations, was repeated with 74 lateral line-ablated fish (Extended Data Figs. 7 and 9; 2,013 playbacks, 910 startles, about 45% startles). A third sound playback experiment was carried out in the dark in 43 untreated fish, testing a subset of 4 sound configurations (Extended Data Fig. 12b).

Behavioural analysis

Tracking

Pose tracking of D. cerebrum’s swimming behaviour was carried out with SLEAP75. In total, 140 frames across nine random recordings of male and female fish were hand-labelled with a skeleton consisting of 7 equidistant nodes along the fish’s body segments and 2 additional nodes, 1 for each eye. The ‘single-animal’ model was used for training. The model parameters and the trained model are available at the G-Node repository (see Data availability).

Startle detection

Plotting the fish’s velocity against time around playback revealed a sharp increase in velocity after playback, clearly visible across all playbacks (Fig. 1e and Extended Data Fig. 4a). We defined a 25-ms time window around the time of peak velocity at which the speed distribution is bimodal and computed the average velocity in this time window for each trial to classify all playback trials with an average velocity above 17 cm s−1 as startle trials (Extended Data Fig. 4b,c). The remaining ones were classified as non-startle trials. The decision criterion based on speed also resulted in a clear separation in terms of body bend (Extended Data Fig. 4d).
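The speed-threshold classification can be illustrated with a short sketch. The function names and the toy speed trace are hypothetical; only the 25-ms window and the 17 cm s−1 threshold are taken from the text, and the window is assumed to be centred on the given time.

```python
def mean_speed_in_window(speeds_cm_s, times_s, t_center_s, width_s=0.025):
    """Average speed over samples within width_s centred on t_center_s."""
    half = width_s / 2.0
    vals = [v for v, t in zip(speeds_cm_s, times_s)
            if abs(t - t_center_s) <= half]
    if not vals:
        raise ValueError("no samples inside the analysis window")
    return sum(vals) / len(vals)


def is_startle(speeds_cm_s, times_s, t_center_s, threshold_cm_s=17.0):
    """Classify one playback trial by its mean speed in the window."""
    return mean_speed_in_window(speeds_cm_s, times_s, t_center_s) > threshold_cm_s


# Toy trial sampled at 120 fps with a brief speed peak around frame 5.
times = [i / 120.0 for i in range(12)]
speeds = [1.0] * 4 + [30.0, 40.0, 30.0] + [1.0] * 5
print(is_startle(speeds, times, t_center_s=times[5]))
```

At 120 fps a 25-ms window covers about three frames, so the average is robust to a single noisy frame but still short enough to isolate the speed peak.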

Directional bias

To classify startles into left or right, we measured the fish’s x displacement during the first 50 ms after startle initiation (Fig. 1e). This duration was chosen because displacement heat maps at varying delays revealed that the initial, lateral displacement phase of the startle response peaks after 50 ms (Extended Data Fig. 4f). Wherever centred trajectories are shown, these initial 50 ms are depicted. The directional bias of the startle response is the fraction of startles to one indicated direction (left or right, away or towards speaker). This bias can be computed in two ways.

Directional bias across trials and fish. For each stimulus or set of stimuli, startle trials were pooled across all fish, and the fraction of startles in one direction was calculated. Using the two-sided binomial test, we calculated how likely a measured directional bias (approach or escape) would have been observed if the response was unbiased.
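As an illustration of the pooled test, the following sketch computes an exact two-sided binomial P value against an unbiased response (probability 0.5 for each direction) using only the Python standard library. The function name is hypothetical, and the sketch covers only the unbiased null hypothesis; it is not the analysis code used for the paper.

```python
from math import comb


def two_sided_binomial_p_unbiased(k, n):
    """Exact two-sided binomial P value for k successes in n trials, p = 0.5.

    Sums the probabilities of all outcomes that are at most as likely as
    the observed one. For p = 0.5 these are the two symmetric tails.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    lo, hi = min(k, n - k), max(k, n - k)
    if lo == hi:  # observed count is exactly n/2, so every outcome counts
        return 1.0
    tail = sum(comb(n, i) for i in range(0, lo + 1))
    tail += sum(comb(n, i) for i in range(hi, n + 1))
    return tail / 2 ** n  # exact integer arithmetic until the final division


# Example: 192 of 240 pooled startles (80%) directed away from the speaker.
print(two_sided_binomial_p_unbiased(192, 240))
```

Working with integer binomial coefficients avoids floating-point underflow in the far tails, which matters for the very small P values that arise with a few hundred trials.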

Directional bias per fish. In the analysis of bias across trials and fish, theoretically, all trials could stem from one performing animal (not of concern here; Extended Data Fig. 5a,b). To complement this measure, we also quantified the directional bias per fish. We had 12 sound configurations in each experiment and startles averaged to a total of about 22 startles per fish in an experiment; hence, a meaningful per-fish bias could be computed only on pooled sound configurations and for fish with many startles. To estimate the directional bias of individual fish, we filtered for fish that had ≥10 startles in both the single-speaker condition (pooled over 4 stimuli) and the trick condition (pooled over 4 stimuli; Extended Data Fig. 6c,d). Although the value reflects directional behaviour in the population and estimates fish-to-fish variability, it selects for fish that trigger many playbacks and startle often.

Micro-CT

A 12-month-old male wild-type D. cerebrum was euthanized by ice shock and fixed with 4% paraformaldehyde in phosphate-buffered saline (PBS) at 4 °C overnight. The next day, the fish was washed for 15 min in PBS before being stained with 5% phosphomolybdic acid (Sigma Aldrich) solution in PBS at 4 °C overnight. After staining, the fish was washed in PBS for 15 min before embedding in 1% PBS-buffered agarose inside a cryo tube. The micro-CT scan was carried out at the ANATOMIX beamline at SOLEIL synchrotron by XPLORAYTION. The sample was placed into a 40-keV polychromatic (white) X-ray beam. A scan consisted of 3,200 projections collected at about ×10 optical magnification by a digital camera (Orca Flash 4.0 V2) with a sensor pixel size of 6.5 µm at 150 ms exposure time, yielding an effective pixel size of 0.6485 µm. The registered data were binned to 1.2970 µm voxel size. Key structures of the hearing apparatus were manually segmented. To this end, planes were hand-labelled using 3D Slicer76 (v5.6, https://slicer.org) and then interpolated using Biomedisa (v23)77. FIJI ImageJ (v1.5)78 was used to convert between different file types. The segments were turned into mesh grids and loaded into Blender for cleaning and rendering.

Lateral line ablation and DASPEI staining

To rule out that the lateral line organ senses sound directionality in our experiments, we ablated the lateral line using neomycin79. To ablate the neuromasts, fish were placed in a 200 µM neomycin solution for about 30 min. Afterwards, they were transferred to a beaker with tank water. Behavioural experiments started after ≥30 min. To confirm the reliability of the lateral line ablation protocol, we stained 30 neomycin-treated fish. After the behaviour experiment, they were transferred to a 100 µM DASPEI (2-[4-(dimethylamino)styryl]-1-ethylpyridinium iodide) solution and then to a beaker with tank water to wash out unbound DASPEI. Afterwards, the fish were euthanized with an ice shock and imaged with an epifluorescence microscope. Neuromasts were reliably stained in control fish but not in neomycin-treated fish, indicating reliable ablation (see Extended Data Fig. 9e,f for example images). As functional metrics we report an increase in number of wall contacts after startles (Extended Data Fig. 9g) and a decrease in foraging strikes in the dark (Extended Data Fig. 9h) in neomycin-treated fish.

Vibrometry

Confocal microscope

The confocal reflectance microscope was based on a custom-built laser-scanning two-photon microscope (Extended Data Fig. 10a). The illumination source was a Ti:sapphire laser (MaiTai DeepSee; SpectraPhysics) operated at 810 nm (with or without mode-locking). Before entering a laser-scanning two-photon microscope, the beam passed through a 90:10 beam splitter (90% reflection, 10% transmission). The light back-scattered by the fish inner structures was descanned, reflected by the 90:10 beam splitter, and then focused by a lens (f = 50 mm) into a single-mode fibre (core diameter: 25 µm, numerical aperture: 0.1) acting as a confocal pinhole. The microscope was controlled by custom-written software (https://github.com/danionella/lsmaq).

Acoustic stimulation

Fish were anaesthetized in 120 mg l−1 fish water-buffered MS-222. They were subsequently placed on a preformed agarose mould, which allowed the gill covers to move freely, and immobilized with 2% low-melting-point agarose (melting point 25 °C). A flow of aerated aquarium water (with anaesthetic) was delivered to their mouth through a glass capillary.

The fish was acoustically stimulated using two facing speakers sealed in custom-made waterproof enclosures. The diaphragms were exposed to water. The speakers were each placed about 1.3 cm away from the fish. They were driven using a DAQ card (National Instruments USB-6211), connected through audio amplifiers (Kemo M031N, 3.5 W). Pressures of up to about 176 dB (referenced to 1 µPa) were thus generated at the fish position in the pressure-only configuration and particle motion of up to about 8 mm s−1 in the particle-motion-only configuration, consistent with the expected amplitude relationship between pressure and particle motion in the sound monopole near field.

Motion phase maps

The principle of the laser-scanning vibrometric measurement is illustrated in Fig. 3b and Extended Data Fig. 10b. The sample (Extended Data Fig. 10b(i)) was stimulated with an acoustic sinusoidal wave at frequency \({f}_{{\rm{stim}}}\), and imaged with a laser-scanning microscope with a line rate \({f}_{{\rm{scan}}}\) (Extended Data Fig. 10b(ii)).

To reconstruct amplitudes and relative phases of sinusoidal object motion, we needed to measure each pixel under more than two different phases according to the Shannon–Nyquist sampling theorem. As noise can influence this measurement, we used four phase steps here, ensuring proper phase reconstruction while keeping acquisition sessions reasonably short.

To reconstruct the displacement of the moving structures inside the fish, each line of the image was repeatedly scanned \({\rm{nStep}}=4\) times, with a phase offset of π/2 between each line (Extended Data Fig. 10b(iii)). To this end, the stimulation frequency and the line rate must follow the relationship:

$${f}_{{\rm{stim}}}=(N+1/{\rm{nStep}})\,{f}_{{\rm{scan}}}$$

with N being an integer. To maximize the line rate, we took \(N={\rm{floor}}(\,{f}_{{\rm{stim}}}/{f}_{{\rm{scan}}}\,).\)

This in turn set additional constraints on the various scanning parameters. We used \({f}_{{\rm{scan}}}=800\,{\rm{Hz}}\) and \({f}_{{\rm{stim}}}=\mathrm{1,000}\,{\rm{Hz}}\) for the data presented in Fig. 3 and Extended Data Fig. 11.
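The relationship between stimulation frequency and line rate can be checked numerically. The sketch below uses hypothetical function names and simply evaluates the formula above; it reproduces the reported combination of an 800 Hz line rate with a 1,000 Hz stimulus for four phase steps.

```python
import math


def stroboscopic_stim_frequency(f_scan_hz, f_stim_target_hz, n_step=4):
    """Return (N, f_stim) with f_stim = (N + 1/n_step) * f_scan.

    N = floor(f_stim_target / f_scan), as described in the text.
    """
    n = math.floor(f_stim_target_hz / f_scan_hz)
    return n, (n + 1.0 / n_step) * f_scan_hz


def phase_advance_per_line(f_stim_hz, f_scan_hz):
    """Stimulus phase advance (radians, modulo 2*pi) between line scans."""
    cycles = f_stim_hz / f_scan_hz
    return 2.0 * math.pi * (cycles - math.floor(cycles))


n, f_stim = stroboscopic_stim_frequency(800.0, 1000.0, n_step=4)
print(n, f_stim)                              # N = 1, f_stim = 1000.0 Hz
print(phase_advance_per_line(f_stim, 800.0))  # pi/2 for four phase steps
```

With these values each repeated line scan samples the stimulus a quarter cycle later than the previous one, so four repeats cover one full stimulus period.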

To ensure repeatable measurements, the acoustic stimulation and the galvanometric scanning mirrors were synchronized so that each pixel was recorded at a known sound phase. This was achieved by triggering the sound generation on each single frame scan trigger.

Doing so, each pixel was stroboscopically probed at \({\rm{nStep}}=\,4\) different phases of the acoustic stimulation cycle. As sound propagates while scanning two consecutive pixels, the probed acoustic phase is shifted by \(2{\rm{\pi }}\times {\rm{pixelPeriod}}\), which was taken into account in the motion reconstruction of the imaged structures (Extended Data Fig. 10c). These images were then reshaped to yield an (Nx,Ny,nStep) dataset (Extended Data Fig. 10b(iv)).

To analyse the motion of the inner structures of the fish, we used Matlab 2019b and a particle image velocimetry toolbox PIVlab80, originally developed to characterize the motion of flowing particles for fluid mechanics. Essentially, the particle displacement is assessed by cross-correlating subregions with decreasing sizes of consecutive images (Extended Data Fig. 10b(v)). The contrast of the reflectance images was enhanced before the displacement analysis, and the results were curated in post-processing by removing outliers and interpolating detection gaps.

The motion detection yielded x- and y-displacement maps at each of the four phases in the acoustic stimulation period. The first Fourier component was computed for each pixel to extract the amplitude and phase of the local displacement (Extended Data Fig. 10b(vi)). The phase was finally corrected for the accumulating phase offset along the horizontal x direction due to the line scanning procedure (Extended Data Fig. 10c). Owing to the synchronization of the acoustic stimulation with the line scanning process, we could carry out this measurement in several planes and obtain a consistent volumetric complex map characterizing the motion response of the various inner structures to the acoustic stimulation. Maximum-amplitude projections across planes delivered the shown two-dimensional phase maps, one for motion along the speaker–speaker axis (x) and one for motion orthogonal to the speaker–speaker axis (y).
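For a single pixel, extracting the first Fourier component from the four phase-stepped displacement values can be sketched as follows. The function name and the synthetic displacement values are hypothetical, and the samples are assumed to be taken at equally spaced stimulus phases 0, π/2, π and 3π/2.

```python
import cmath
import math


def first_fourier_component(samples):
    """Complex amplitude of the fundamental from equally spaced phase steps.

    For samples d_j = offset + A*cos(2*pi*j/n + theta) with n >= 3, the
    result is A*exp(i*theta); a constant offset does not contribute.
    """
    n = len(samples)
    return (2.0 / n) * sum(d * cmath.exp(-2j * math.pi * j / n)
                           for j, d in enumerate(samples))


# Synthetic pixel: 2.5 (arbitrary units) amplitude, 0.7 rad phase, offset 1.0.
samples = [1.0 + 2.5 * math.cos(2.0 * math.pi * j / 4 + 0.7) for j in range(4)]
c = first_fourier_component(samples)
print(abs(c), cmath.phase(c))
```

Applying the same operation to every pixel of the x- and y-displacement maps would give amplitude and phase maps, before the correction for the phase offset that accumulates along the scan direction.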

Sound stimulation waveforms

We reasoned that D. cerebrum sense pressure and particle motion. Hence, our sound stimuli were defined in terms of three quantities: pressure, x acceleration and y acceleration, which were delivered to the fish’s current position by utilizing the frequency responses of speakers to cancel position-dependent reverberations (see the section of the Methods entitled Calibration and reverberation cancellation). y acceleration was always kept at zero, and only pressure and x acceleration were varied. In summary, 12 sounds were generated from a recorded pressure waveform and presented to the fish in a random sequence upon trigger events. The 12 sounds consisted of four single-speaker sounds (left or right × positive polarity or negative polarity), two sounds with only a pressure component (positive polarity or negative polarity), two sounds with only horizontal x-motion components (positive polarity or negative polarity) and four trick conditions, which exactly matched the four single-speaker target waveforms, but differed by the speakers that were active to realize these.

We observed that D. cerebrum startle when we drop a cylindrical piece of rubber into the water. We recorded the pressure waveform of this sound, high-pass filtered it at 100 Hz, and extracted a 12-ms snippet to serve as our pressure waveform template (note that conditioned sounds, that is, the actual speaker signals, were band-pass-filtered between 200 Hz and 1,200 Hz; see the following section). The target pressure amplitude was set to a peak sound pressure level of 167 dB (referenced to 1 µPa) by rescaling this pressure waveform accordingly. This amplitude was loud enough to elicit startles reliably and still supported by our small 2.7-cm-diameter speakers. The first peak’s rise time (10% to 90% absolute amplitude) was 0.664 ms and the centre frequency of the pulse was about 780 Hz. The target horizontal particle acceleration waveform was computed from the pressure waveform using monopole theory for each Fourier component, as follows.

The pressure signal decays as \(1/r\) with radial distance \(r\) away from a sound monopole with amplitude \({p}_{0}\) at distance \({r}_{0}\)

$$\widehat{p}(r,t)={\widehat{p}}_{0}\frac{{r}_{0}{{\rm{e}}}^{ikr}}{r}{{\rm{e}}}^{-i\omega t}$$

and with frequency f, \(\omega =2{\rm{\pi }}f\), wavenumber \(k=2{\rm{\pi }}/\lambda \), wavelength \(\lambda \) and speed of sound \({c}=\,\lambda f\).

In a medium of density \(\rho \), the radial particle velocity decays quadratically with distance in the near field (\({kr}\ll 1\), limit dependent on frequency):

$${\hat{v}}_{r}(r,t)=\left[\frac{1}{\rho c}\left(1+\frac{i}{{kr}}\right)\hat{p}(r,t)\right]$$

By contrast, particle acceleration—the temporal derivative of particle velocity—decays quadratically with distance for nearby sounds (\(r\ll 1\), limit independent of frequency):

$${\hat{a}}_{r}(r,t)=-\,i\omega {\hat{v}}_{r}(r,t)=\frac{1}{\rho }\left(\frac{1}{r}-{ik}\right)\hat{p}(r,t)$$

To compute the particle acceleration \({a}_{r}(r,t)\) at a distance \(r\) to a sound monopole with pressure \(p(r,t)\) for discrete signals of arbitrary waveform, we applied this equation separately for each Fourier component. Given a pressure waveform \(\{{{\bf{p}}}_{n}\}\,:={p}_{0}{,p}_{1},\cdots ,{p}_{N-1}\) with \(N\) samples \({p}_{n}\), spaced at \(T=1/{sr}\) with sample rate \({sr}\), the particle acceleration \(\{{{\bf{a}}}_{n}\}\,:={a}_{0}{,a}_{1},\cdots ,{a}_{N-1}\) that would be observed at a distance \(r={r}_{0}\) from a sound monopole was calculated by carrying out the discrete Fourier transform \(\{{{\bf{P}}}_{l}\}\,:={P}_{0}{,P}_{1},\cdots ,{P}_{N-1}\)

$${P}_{l}=\mathop{\sum }\limits_{n=0}^{N-1}{p}_{n}\,{{\rm{e}}}^{-i\frac{2{\rm{\pi }}}{N}ln}$$

and deriving particle acceleration for each Fourier component \(\{{{\bf{A}}}_{l}\}\,:=\,{A}_{0}{,A}_{1},\cdots ,{A}_{N-1}\) independently. With corresponding frequencies \({f}_{l}\approx l/({NT})\), such that \({k}_{l}\approx 2\pi l/({NTc})\), and using the relationship between pressure and particle acceleration given above, \({A}_{l}\) is calculated as

$${A}_{l}=\frac{1}{\rho }\left(\frac{1}{{r}_{0}}-i{k}_{l}\right){P}_{l}$$

which defines the radial particle acceleration through the inverse Fourier transform:

$${a}_{n}=\frac{1}{N}\mathop{\sum }\limits_{l=0}^{N-1}{A}_{l}\,{{\rm{e}}}^{i\frac{2{\rm{\pi }}}{N}ln}$$

In the experiments, \({r}_{0}\) was set to 3 cm, thus simulating a monopole sound source at 3 cm, irrespective of D. cerebrum’s relative position to the speakers. This resulted in a peak particle acceleration of 7.59 m s−2. Other parameters used were \(c=\,\mathrm{1,500}\,{\rm{m}}\,{{\rm{s}}}^{-1}\), \(\rho \,=\,\mathrm{1,000}\,{\rm{kg}}\,{{\rm{m}}}^{-3}\) and \({\rm{sr}}=\,\mathrm{51,200}\,{\rm{Hz}}\). In terms of pressure, x acceleration and y acceleration (p, ax and ay), there were eight different target configurations, with ‘+’ indicating polarity of the template waveform and ‘−’ indicating opposite polarity: four monopole configurations (+,+,0), (−,−,0), (+,−,0) and (−,+,0); two pressure configurations (+,0,0) and (−,0,0); and two motion configurations (0,+,0) and (0,−,0). Despite a total of eight target configurations, there were 12 sound configurations as the four monopole configurations can be realized in two ways, either with a single speaker or with three speakers (trick configuration; see the next section).
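The per-component calculation of particle acceleration from a pressure waveform can be sketched in plain Python. This is an illustration under stated assumptions rather than the original implementation: the function names are hypothetical, a naive discrete Fourier transform is used for self-containment, and bins above N/2 are assigned negative frequencies (an assumption of this sketch) so that a real pressure waveform yields a real acceleration waveform.

```python
import cmath
import math


def dft(x):
    """Naive DFT: X_l = sum_n x_n * exp(-i*2*pi*l*n/N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * l * m / n) for m in range(n))
            for l in range(n)]


def idft(spectrum):
    """Naive inverse DFT with 1/N normalization."""
    n = len(spectrum)
    return [sum(spectrum[l] * cmath.exp(2j * math.pi * l * m / n)
                for l in range(n)) / n for m in range(n)]


def monopole_acceleration(p, sr=51200.0, r0=0.03, c=1500.0, rho=1000.0):
    """Radial acceleration at distance r0 from a monopole, A_l = (1/r0 - i*k_l) * P_l / rho."""
    n = len(p)
    accel_spectrum = []
    for l, p_l in enumerate(dft(p)):
        # Assumption: bins above N/2 represent negative frequencies.
        l_signed = l if l <= n // 2 else l - n
        k_l = 2.0 * math.pi * l_signed * sr / (n * c)  # k = 2*pi*f/c, f = l*sr/N
        accel_spectrum.append((1.0 / r0 - 1j * k_l) * p_l / rho)
    # The real part discards any residual imaginary component (Nyquist bin).
    return [a.real for a in idft(accel_spectrum)]


def peak_pressure_pa(level_db_re_1upa):
    """Convert a peak level in dB (referenced to 1 uPa) to pascals."""
    return 10.0 ** (level_db_re_1upa / 20.0) * 1e-6


p_peak = peak_pressure_pa(167.0)    # about 224 Pa
print(p_peak, p_peak / (1000.0 * 0.03))  # near-field term alone: about 7.5 m/s^2
```

The near-field term p/(ρr0) alone gives roughly 7.5 m s−2 for a 167 dB peak at r0 = 3 cm, which is close to the reported peak of 7.59 m s−2; the remaining difference is plausibly due to the frequency-dependent term and the waveform shape.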

Calibration and reverberation cancellation

Conducting experiments in small tanks presents challenges as both tank geometry and the receiver’s position affect the sound amplitude and waveform sensed by the receiver (Extended Data Fig. 2c). By recording the speakers’ impulse responses inside the inner tank in terms of pressure and particle acceleration (Extended Data Fig. 2b), speakers could be activated to precisely control pressure and particle acceleration components at the fish’s location (Extended Data Fig. 2d).

Pressure and acceleration measurements

Pressure. Pressure was measured with a hydrophone (Aquarian Scientific AS-1, preamplifier: Aquarian Scientific PA-4, acquisition: NI-9231 sound and vibration module, National Instruments; Extended Data Fig. 1d). During repeated playback of the same sound, a single hydrophone was automatically moved across a 5 × 5 grid inside the inner tank, sampling with a spacing of 1.5 cm (Extended Data Fig. 1c,e). Hence, a 25-point pressure field was obtained for each sound configuration, spanning a 6 cm × 6 cm square at the centre of the inner tank between the speakers.

Acceleration. Particle acceleration was measured in two ways.

In the first method, particle acceleration was measured indirectly through the pressure gradient. Newton’s second law of motion (pressure gradient force)

$${\bf{a}}=-\frac{1}{\rho }{\boldsymbol{\nabla }}P$$

links the spatial pressure gradient to particle acceleration. In water, with density \(\rho =\mathrm{1,000}\,{\rm{kg}}\,{{\rm{m}}}^{-3}\) and speed of sound \(c=\mathrm{1,500}\,{\rm{m}}\,{{\rm{s}}}^{-1}\), the following approximation holds for pressure signal frequencies \(f\ll \,100\,{\rm{kHz}}\), if the pressure gradient is sampled with step size \({x}_{2}-{x}_{1}=\,1.5\,{\rm{cm}}\):

$${a}_{x}=-\frac{1}{\rho }\frac{p({x}_{2})-p({x}_{1})}{{x}_{2}-{x}_{1}}$$

The approximation holds for all frequencies used in this experiment. For measuring gradients, moving a single hydrophone is preferred over a hydrophone array, as the gradient could be biased by small differences in hydrophone sensitivity and perturbations of the sound field by the presence of other hydrophones. We calculated x and y acceleration on the basis of the 25-point pressure field recorded with a single hydrophone. The pressure field included points outside the trigger zone to compute pressure gradients (that is, acceleration) across the trigger zone boundary.
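The finite-difference estimate of particle acceleration from two pressure recordings can be written in a few lines. The function names and the example values are hypothetical; the step size, density and speed of sound follow the text.

```python
def acceleration_from_gradient(p_x1, p_x2, dx_m=0.015, rho=1000.0):
    """Estimate a_x = -(p(x2) - p(x1)) / (rho * (x2 - x1)) sample by sample.

    p_x1 and p_x2 are pressure waveforms (Pa) recorded at two grid points
    separated by dx_m along x; the result is in m/s^2.
    """
    return [-(b - a) / (rho * dx_m) for a, b in zip(p_x1, p_x2)]


def step_to_wavelength_ratio(f_hz, dx_m=0.015, c=1500.0):
    """Ratio of the sampling step to the acoustic wavelength c/f."""
    return dx_m * f_hz / c


# A 15 Pa pressure increase across 1.5 cm corresponds to -1 m/s^2.
print(acceleration_from_gradient([0.0], [15.0]))
# At the upper band edge of 1,200 Hz the step is about 1% of a wavelength.
print(step_to_wavelength_ratio(1200.0))
```

Because the step is a small fraction of the wavelength across the stimulus band, the difference quotient is a reasonable stand-in for the spatial derivative.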

In the second method, particle acceleration was additionally directly measured along all three axes with an acceleration sensor (Triaxial ICP - Model 356A45, PCB Piezotronics, acquired with NI-9231 sound and vibration module, National Instruments; Extended Data Fig. 1d). Like the hydrophone, the acceleration sensor was moved across all 5 × 5 grid positions during repeated playback of the same sound, giving measurements for x, y and z acceleration.

Whereas hydrophones are manufactured and calibrated for underwater use, the particle acceleration sensor is not made to measure particle acceleration underwater and is meant to be glued onto the vibrating object. Owing to an acoustic impedance mismatch between metal and water, we expected the PCB sensor to underestimate particle acceleration.

We compared x and y acceleration waveforms for both measurement methods and found that the acceleration waveforms acquired through the direct method match the waveforms acquired through the indirect method after multiplication by a factor of about 2.4. The close match in rescaled waveforms confirms the validity of the gradient approximation in the indirect method.

Hence, in all experiments, x and y acceleration were measured through the indirect method, on the basis of spatial pressure gradients. The particle acceleration sensor still proved useful in measuring the vertical z acceleration in our setup.

Impulse response-based sound targeting

To create the same sounds at any position inside the inner tank, impulse responses for all 4 speakers were measured across 25 positions on a 5 × 5 grid with 1.5-cm spacing. In the following, the sound targeting method is described for one position.

Let \({k}_{i,p}\) be the pressure impulse response kernel, \({k}_{i,{a}_{x}}\) be the x acceleration impulse response kernel, and \({k}_{i,{a}_{y}}\) be the y acceleration impulse response kernel for the ith speaker. Using \(M\) speakers with signal \({s}_{i}\), pressure and acceleration can be predicted through convolution (\(* \)):

$$\begin{array}{l}\,p=\mathop{\sum }\limits_{i=0}^{M-1}{k}_{i,p}\ast {s}_{i}\\ {a}_{x}=\mathop{\sum }\limits_{i=0}^{M-1}{k}_{i,{a}_{x}}\ast {s}_{i}\\ {a}_{y}=\mathop{\sum }\limits_{i=0}^{M-1}{k}_{i,{a}_{y}}\ast {s}_{i}\end{array}$$

In the Fourier domain, utilizing the convolution theorem, these become a system of equations for each Fourier component \(l\).

$$\begin{array}{l}\,{P}_{l}=\mathop{\sum }\limits_{i=0}^{M-1}{K}_{i,p,l}\,{S}_{i,l}\\ {A}_{x,l}=\mathop{\sum }\limits_{i=0}^{M-1}{K}_{i,{a}_{x},l}{S}_{i,l}\\ {A}_{y,l}=\mathop{\sum }\limits_{i=0}^{M-1}{K}_{i,{a}_{y},l}{S}_{i,l}\end{array}$$

On the basis of the Fourier components of the target waveforms (see the section of the Methods entitled Sound stimulation waveforms), \({P}_{l}\), \({A}_{x,l}\) and \({A}_{y,l}\), and the Fourier components of the impulse response kernel \({K}_{i,p,l}\),\({K}_{i,{a}_{x},l}\) and \({K}_{i,{a}_{y},l}\), the system of equations can be solved for the Fourier components of the speaker signals \({S}_{i,l}\) as long as \({M}\ge 3\) and the kernel components are non-zero and non-identical. The time-domain signal for the ith speaker is then given by the inverse Fourier transform using components \({S}_{i,l}\).

To increase robustness of the solutions (for example, to avoid speakers cancelling themselves unnecessarily and to limit speaker amplitude), speaker signal waveforms were forced to become similar to the target waveform. This was implemented by solving the system of equations with a least-squares solver (scipy.optimize.lsq_linear) with bounds \(-{B}_{i,l} < {S}_{i,l} < {B}_{i,l}\). The bound \({B}_{i,l}\) was computed as a rescaling of the absolute Fourier components of the target pressure waveform \({P}_{l}\)

$${B}_{i,l}={\alpha }_{i}\,\gamma \,| {P}_{l}| $$

in which \(\gamma \) is fixed and scales pressure to voltage and \({\alpha }_{i}\) is a rescaling parameter set independently for each speaker to give additional control over active speakers. We list our values for αi used in different sound configurations in Supplementary Table 2.
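To make the per-component system concrete, the sketch below solves it for a single Fourier component in the exactly determined case of three speakers. It is a simplified illustration only: the experiments used a bounded least-squares solver (scipy.optimize.lsq_linear), whereas this sketch uses unbounded Gaussian elimination from the standard library, and the kernel values and function name are made up.

```python
def solve_complex_system(kernel, target):
    """Solve kernel @ s = target for a square complex system.

    Gaussian elimination with partial pivoting. Rows of kernel correspond
    to (pressure, x acceleration, y acceleration) and columns to speakers.
    """
    n = len(target)
    aug = [[complex(v) for v in kernel[r]] + [complex(target[r])]
           for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("kernel matrix is singular for this component")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = aug[r][n] - sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = acc / aug[r][r]
    return solution


# Made-up kernel components K[quantity][speaker] for one frequency bin.
K = [[1 + 1j, 2, 0.5j],
     [0.3, -1j, 1],
     [2, 1, 1 - 1j]]
targets = [1.0, 0.5j, 0.0]  # desired P_l, A_x,l and A_y,l (A_y,l kept at zero)
print(solve_complex_system(K, targets))
```

With four speakers the system is underdetermined, which is one reason a bounded least-squares formulation is a natural choice; the bounds then select solutions whose speaker signals stay close in magnitude to the target waveform.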

After conditioning, all computed speaker signals were band-pass-filtered between 200 Hz and 1,200 Hz to avoid activating the lateral line.

To ensure that the trick configuration differed from the single-speaker configuration only by selective pressure inversion, a two-step sound conditioning was carried out. First, the speaker signals for the single-speaker configuration were calculated. Then, these signals were effectively fixed to closely resemble the single-speaker signal and only activations of the two speakers along the orthogonal axis were conditioned.

The above calculation was carried out for the 25 grid positions. The computed speaker signals accurately delivered the target waveforms to the target position (Extended Data Figs. 2d and 3 and Supplementary Table 2). To ensure consistency over experiments, the water level was kept at precisely 10 cm, and the pressure and acceleration fields inside the inner tank were measured several times (this includes before the first recording and after the last recording).

During the experiment, the fish’s xy position was detected at 15 Hz, and the loading for the speakers was linearly interpolated on the basis of targeted sounds at neighbouring grid positions.
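One way to realize such an interpolation is bilinear weighting of the speaker signals precomputed at the four surrounding grid points. The sketch below assumes this scheme, measures coordinates from a corner of the 5 × 5 calibration grid, and uses hypothetical function names and placeholder signals; the text specifies only that the interpolation was linear between neighbouring grid positions.

```python
import math


def bilinear_weights(x_cm, y_cm, spacing_cm=1.5, n_points=5):
    """Weights of the four grid points surrounding (x_cm, y_cm).

    Coordinates are measured from the grid corner (assumption) and are
    clamped to the grid. Keys are (column, row) grid indices.
    """
    gx = min(max(x_cm / spacing_cm, 0.0), n_points - 1)
    gy = min(max(y_cm / spacing_cm, 0.0), n_points - 1)
    ix = min(int(math.floor(gx)), n_points - 2)
    iy = min(int(math.floor(gy)), n_points - 2)
    fx, fy = gx - ix, gy - iy
    return {
        (ix, iy): (1 - fx) * (1 - fy),
        (ix + 1, iy): fx * (1 - fy),
        (ix, iy + 1): (1 - fx) * fy,
        (ix + 1, iy + 1): fx * fy,
    }


def interpolate_signal(signals, x_cm, y_cm):
    """Weighted sum of per-grid-point speaker signals (lists of samples)."""
    weights = bilinear_weights(x_cm, y_cm)
    length = len(next(iter(signals.values())))
    return [sum(w * signals[idx][t] for idx, w in weights.items())
            for t in range(length)]


# Placeholder signals: each grid point stores its own indices as samples.
grid_signals = {(i, j): [float(i), float(j)] for i in range(5) for j in range(5)}
print(interpolate_signal(grid_signals, 0.75, 3.0))
```

The weights always sum to one, so the interpolated signal reduces to the calibrated signal whenever the fish sits exactly on a grid point.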

In the section entitled Sound stimulation waveforms, we describe how we defined the pressure and particle motion target waveforms that were conditioned this way.

Estimation of binaural cues (P-ILD, M-ILD, P-ITD, M-ITD)

ILDs

To estimate binaural cues in our behavioural experiment, we analysed the pressure and particle motion at sound calibration grid points 3 cm apart, (\({x}_{0}\)) 1.5 cm to the left and (\({x}_{1}\)) 1.5 cm to the right of the centre grid point. To estimate sign and peak amplitude of level differences, we calculated P-ILD (pressure ILD) as \({\max }_{t}({\rm{abs}}(p({x}_{0},t)))\,-{\max }_{t}({\rm{abs}}(p({x}_{1},t)))\) and M-ILD (particle motion ILD) as \({\max }_{t}({\rm{abs}}({a}_{x}({x}_{0},t)))\,-{\max }_{t}({\rm{abs}}({a}_{x}({x}_{1},t)))\). The level differences between these two points were divided by a factor of 50 to estimate the level difference across the left-to-right inner ear axis of the fish (about 0.6 mm). Comparing the single-speaker configuration with the trick configuration, these data show that the sign of M-ILD remains the same (+0.11 m s−2 versus +0.30 m s−2), but the sign of P-ILD is inverted (+4.4 Pa versus −4.6 Pa). For a geometrical illustration of the inversion of P-ILD, see Extended Data Fig. 8d.
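The level-difference estimate can be expressed as a short function. The function names and the example waveforms are hypothetical; the 3 cm grid separation and the approximately 0.6 mm inter-ear distance, which together give the scaling factor of 50, are taken from the text.

```python
def peak_abs(waveform):
    """Peak absolute amplitude of a waveform."""
    return max(abs(v) for v in waveform)


def interaural_level_difference(w_left, w_right,
                                grid_separation_m=0.03,
                                ear_separation_m=0.0006):
    """Peak level difference scaled from the grid points to the ears.

    Works for pressure waveforms (P-ILD, in Pa) and for x-acceleration
    waveforms (M-ILD, in m/s^2); a positive value means a larger peak
    at the left measurement point.
    """
    scale = grid_separation_m / ear_separation_m  # 3 cm / 0.6 mm = 50
    return (peak_abs(w_left) - peak_abs(w_right)) / scale


# Made-up pressure waveforms (Pa) at the left and right grid points.
left = [0.0, 320.0, -150.0]
right = [0.0, -100.0, 60.0]
print(interaural_level_difference(left, right))  # about 4.4 Pa
```

The linear rescaling assumes that the level changes approximately linearly between the two grid points, so it should be read as an order-of-magnitude estimate of the cue available at the ears.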

ITDs

ITDs were estimated by calculating the phase propagation in different sound configurations under a monopole approximation (Extended Data Fig. 8c–f).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.