Praying mantises use 3-D perception for hunting

June 28, 2019

In stunning images captured under the microscope for the first time, neurons that support 3-D vision have been found in praying mantises.

The work is published in Nature Communications today.

In a specially-designed insect cinema, the mantises were fitted with 3-D glasses and shown 3-D movies of simulated bugs while their brain activity was monitored.

When the image of the bug came into striking range for a predatory attack, scientist Dr. Ronny Rosner was able to record the activity of individual neurons.

Dr. Rosner, Research Associate in the Institute of Neuroscience at Newcastle University, is lead author of the paper.

He said: “This helps us answer how insects achieve surprisingly complex behaviour with such tiny brains and understanding this can help us develop simpler algorithms to develop better robot and machine vision.”

The “3-D neurons”

Praying mantises use 3-D perception, scientifically known as stereopsis, for hunting.

By using the disparity between the two retinas they are able to compute distances and trigger a strike of their forelegs when prey is within reach.

The recorded neurons were stained to reveal their shape, which allowed the team to identify four classes of neuron likely to be involved in mantis stereopsis.

The images captured using a powerful microscope show the dendritic tree of a nerve cell – where the nerve cell receives inputs from the rest of the brain – believed to enable this behaviour.

Going the distance: Brain cells for 3D vision discovered — Praying mantis wearing tiny 3D glasses. Credit: Newcastle University, UK

Dr. Rosner explains: “Despite their tiny size, mantis brains contain a surprising number of neurons which seem specialised for 3-D vision.

This suggests that mantis depth perception is more complex than we thought.

And while these neurons compute distance, we still don’t know how exactly.

“Even so, as theirs are so much smaller than our own brains, we hope mantises can help us develop simpler algorithms for machine vision.”

The wider research programme, which is funded by the Leverhulme Trust, is led by Professor Jenny Read, Professor of Vision Science at Newcastle University.

She says: “In some ways, the properties in the mantises are similar to what we see in the visual cortex of primates.

When we see two very different species have independently evolved similar solutions like this, we know this must be a really good way of solving 3-D vision.

“But we’ve also found some feedback loops within the 3-D vision circuit which haven’t previously been reported in vertebrates.

Our 3-D vision may well include similar feedback loops, but they are much easier to identify in a less complex insect brain and this provides us with new avenues to explore.”

It’s the first time that anyone has identified specific neuron types in the brain of an invertebrate which are tuned to locations in 3-D space.

The Newcastle team intend to further develop their research to better understand the computation of the relatively simple brain of the praying mantis with the aim of developing simpler algorithms for machine and robot vision.

Stereoscopy and the Human Visual System

GETTING THE GEOMETRY RIGHT

What are we trying to do when we present stereo displays?

Are we trying to recreate the scene as a physically present viewer would have seen it, or simply give a good depth percept?

How should we capture and display the images to achieve each of these?

Vision science does not yet have answers regarding what makes a good depth percept, but in this section we aim to cover the geometrical constraints and lay out what is currently known about how the brain responds to violations of those constraints.

Puppet Theater

Figure 1 depicts a three-dimensional (3D) display reproducing a visual scene as a miniature model—a puppet theater, if you will—in front of the viewer. Conventional stereoscopic displays cannot recreate the optical wavefronts of a real visual scene.

For example, the images are all presented on the same physical screen; therefore, they cannot reproduce the varying accommodative demand of real objects at different distances.

In addition, they cannot provide the appropriate motion parallax as the viewer’s head moves left and right. However, they can, in principle, reproduce the exact binocular disparities of a real scene.

In many instances, this is an impossible or inappropriate goal.

However, we argue that it is important to understand the underlying geometrical constraints of the “puppet theater” to understand what we are doing when we violate the constraints.

Thus, it is a helpful exercise to consider what we need to reproduce the disparities that would be created by a set of physical objects seen by the viewer.

Figure 1
A visual scene as a miniature model in front of the viewer.

Epipolar Geometry and Vertical Disparity

The images we need to create depend on how they will be displayed.

In the real world, a point in space and the projection centers of the two eyes define a plane; this is the so-called epipolar plane.

To recreate this situation on a stereo 3D (S3D) display, we have to recreate such an epipolar plane.

Assume the images will be displayed on a screen frontoparallel to the viewer so that horizontal lines on the screen are parallel to the line joining the two eyes (Figs. 1 and 2).

This is approximately the case for home viewing of S3D television (TV).

The first constraint in this situation is that to simulate real objects in the puppet theater, there must be no vertical parallax on the screen; otherwise, the points on the display screen seen by the left and right eyes will not lie on an epipolar plane.

(We use the convention that parallax refers to separation on the screen and disparity refers to separation on the retina.) Figure 2 illustrates why.

Irrespective of where the eyes are looking (provided that the viewer’s head does not tilt to the side), the rays joining each eye to a single object in space intersect the screen at points that are displaced horizontally on the screen; that is, epipolar-plane geometry is preserved.

Thus, to simulate objects physically present in front of the viewer, the left and right images must be presented on the display screen with no vertical separation.

Figure 2
An object in space and the centers of projection of the eyes define an epipolar plane. If the screen displaying the S3D content contains a vector parallel to the interocular axis, then the intersection of this plane with the screen is also parallel to the interocular axis. In the usual case, where the interocular axis is horizontal, this means that to reproduce the disparity of the real object, its two images must have zero vertical parallax. Their horizontal parallax depends on how far the simulated object is in front of or behind the screen.
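
The constraint can be illustrated with a small ray-intersection sketch. It is not taken from the article: the interocular distance (6.5 cm), the viewing distance (2 m) and the helper names `screen_point` and `parallax` are illustrative assumptions. Each eye's line of sight to an object is intersected with a frontoparallel screen, and the two intersection points are compared.

```python
import numpy as np

IOD = 0.065      # assumed interocular distance in metres
SCREEN_Z = 2.0   # assumed viewing distance in metres

def screen_point(eye, obj, screen_z=SCREEN_Z):
    """Intersect the ray from an eye to an object with the plane z = screen_z."""
    eye, obj = np.asarray(eye, float), np.asarray(obj, float)
    t = (screen_z - eye[2]) / (obj[2] - eye[2])
    return eye + t * (obj - eye)

def parallax(obj):
    """Return (horizontal, vertical) on-screen parallax, right minus left."""
    left = screen_point((-IOD / 2, 0.0, 0.0), obj)
    right = screen_point((IOD / 2, 0.0, 0.0), obj)
    return right[0] - left[0], right[1] - left[1]

# Objects in front of, on, and behind the screen plane.
for depth in (1.5, 2.0, 4.0):
    h, v = parallax((0.3, 0.2, depth))
    print(f"depth {depth} m: horizontal {h * 100:+.2f} cm, vertical {v * 100:.2f} cm")
```

Under these assumptions the vertical parallax is zero at every depth, while the horizontal parallax changes sign as the object moves from in front of the screen to behind it.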

What Happens When We Get the Geometry Wrong?

If the stereo images on the display do contain vertical parallax, they are not consistent with a physically present object. Vertical parallax can be introduced by obvious problems, such as misalignments of the cameras during filming or misalignments of the images during presentation, but it can also be introduced by more subtle issues, such as filming with converged cameras (“toe-in”).¹

These two sources of vertical parallax cause a change in the vertical disparities at the viewer’s retinas and are likely to affect the 3D percept.

Vertical Disparities Arising from Misalignments

The eyes move partly to minimize retinal disparities. For example, vergence eye movements work to ensure that the lines of sight of the two eyes intersect at a desired point in space.

Horizontal vergence (convergence or divergence) is triggered by horizontal disparities.

If the eyes are vertically misaligned, the lines of sight do not intersect in space. Instead, there is a constant vertical offset between the two eyes’ images (Fig. 3 (a)).

The human visual system contains self-correcting mechanisms designed to detect such a vertical offset and correct for it by moving the eyes back into alignment.²–⁴

The eye movement that accomplishes this is a vertical vergence.

Figure 3

Different eye postures cause characteristic patterns of vertical disparity on the retina, largely independent of the scene viewed.

Here, the eyes view an array of points on a grid in space, directly in front of the viewer.

The eyes are not converged, so the points have a large horizontal disparity on the retina. (a) The eyes have a vertical vergence misalignment.

This introduces a constant vertical disparity across the retina. (b) The eyes are slightly cyclodiverged (rotated in opposite directions about the lines of sight). This introduces a shearlike pattern of vertical disparity across the retina.

In stereo displays, small vertical misalignments of the images activate vertical vergence.

This could occur, for example, if the cameras are misaligned because one camera is rotated about the axis joining the centers of the two cameras or, in a cinema, if the projectors are offset vertically. In these instances, the viewers automatically diverge their eyes vertically so as to remove the offset between the retinal images.

This happens automatically, so inexperienced viewers are usually not consciously aware of it. It is likely to cause fatigue and eyestrain if it persists.

Passive stereo displays in which the left and right images are presented on alternate pixel rows could introduce a constant vertical disparity corresponding to 1 pixel—if the left and right images were captured from vertically aligned cameras and then presented with an offset (Fig. 4 (a)).

If instead the images are captured at twice the vertical resolution of each eye’s image, with the left and right images pulled from the odd and even pixel rows, respectively, there is no overall vertical disparity but just slightly different sampling (Fig. 4 (b)).

In any case, the vertical disparity corresponding to 1 pixel viewed at 3 picture heights is only about a minute of arc, which is probably too small to cause eyestrain.
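
The size of a one-pixel offset can be checked with a short calculation. The text does not say how many pixel rows the display has, so the 1080 rows used here are an assumption; the viewing distance of 3 picture heights is from the text.

```python
import math

ROWS = 1080              # assumed number of pixel rows (not stated in the text)
VIEWING_DISTANCE = 3.0   # in picture heights, as in the text

pixel_height = 1.0 / ROWS   # height of one pixel row, in picture heights
angle_arcmin = math.degrees(math.atan(pixel_height / VIEWING_DISTANCE)) * 60
print(f"one pixel row subtends about {angle_arcmin:.2f} arcmin")
```

With that row count the result is close to one minute of arc, in line with the estimate above. A display with more rows would give a proportionally smaller angle.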

Figure 4

Passive stereo in which left and right images are displayed on different pixel rows (a) can introduce vertical parallax but (b) need not do so if created appropriately.

A similar situation occurs if the images are misaligned by being rotated about an axis perpendicular to the screen.

Again, the brain automatically seeks to null out rotations of up to a few degrees by rotating the eyes about the lines of sight, an eye movement known as cyclovergence (Fig. 3 (b)).⁵ This also produces discomfort, fatigue, and eyestrain.

Vertical Disparities Arising from Viewing Geometry

The human visual system (and human stereographers) works hard to avoid misalignments. But even if both eyes (or both cameras) are perfectly aligned, vertical disparities between the retinal (or filmed) images can still occur. Figure 5 shows two cameras converged on a square structure in front of them.

Because each camera views the square obliquely, keystoning turns its image on the film into a trapezoid.

The corners of the square are thus in different vertical positions on the two films, and this violates epipolar-plane geometry (Fig. 2).

When such images are displayed, the viewer is presented with a pattern of vertical disparities that is inconsistent with the original scene.

Figure 5

Vertical parallax introduced by camera convergence.
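
The keystoning effect can be reproduced with two idealised pinhole cameras. This is a sketch under stated assumptions, not the authors' calculation: the camera separation (0.3 m, exaggerated for visibility), the 1 m square at 2 m and the function names are all illustrative. Each corner is projected through a left and a right camera, once with the cameras converged on the centre of the square and once with parallel axes.

```python
import math

def project(point, cam_x, yaw, focal=1.0):
    """Pinhole projection for a camera at (cam_x, 0, 0), rotated by `yaw`
    radians about the vertical axis (positive yaw turns it towards +x)."""
    dx, dy, dz = point[0] - cam_x, point[1], point[2]
    x_cam = dx * math.cos(yaw) - dz * math.sin(yaw)
    z_cam = dx * math.sin(yaw) + dz * math.cos(yaw)
    return focal * x_cam / z_cam, focal * dy / z_cam

def vertical_parallax(point, baseline, distance, converged):
    """Vertical image offset (right minus left) for one scene point."""
    toe_in = math.atan((baseline / 2) / distance) if converged else 0.0
    _, y_left = project(point, -baseline / 2, +toe_in)
    _, y_right = project(point, +baseline / 2, -toe_in)
    return y_right - y_left

BASELINE, DISTANCE = 0.3, 2.0   # illustrative values in metres
corners = [(-0.5, 0.5, DISTANCE), (0.5, 0.5, DISTANCE),
           (-0.5, -0.5, DISTANCE), (0.5, -0.5, DISTANCE)]
for corner in corners:
    toed = vertical_parallax(corner, BASELINE, DISTANCE, converged=True)
    parallel = vertical_parallax(corner, BASELINE, DISTANCE, converged=False)
    print(corner, f"toe-in {toed:+.4f}", f"parallel {parallel:+.4f}")
```

With converged cameras the vertical offset has opposite signs at diagonally adjacent corners, which is the equal and opposite keystoning described in the text. With parallel axes it vanishes.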

One can deduce the relative alignment of the cameras from the pattern of vertical disparities.⁶

The exact position of objects in space can then be estimated by backprojecting from the retinal images.

The visual system uses the pattern of vertical disparities across the retina to interpret and scale the information available from horizontal disparities.⁷–⁹

For this reason, vertical disparities in stereo displays may not just degrade the 3D experience but also produce systematic distortions in depth perception.

One well-known example is the induced effect.¹⁰ In this illusion, a vertical magnification of one eye’s image relative to the other causes a perception that the whole screen is slightly slanted, that is, rotated about a vertical axis.

This is thought to be because similar vertical magnification occurs naturally when we view a surface obliquely.

Estimating Convergence from Vertical Disparity

For the purpose of S3D displays, a pertinent example concerns vertical disparities associated with convergence.

For the brain to interpret 3D information correctly, it must estimate the current convergence angle with which it is viewing the world.

This is because, as Fig. 6 shows, a given retinal disparity can specify vastly different depth estimates depending on whether the eyes are converging on a point close to or far from the viewer.

Figure 6

Mapping from disparity to depth depends on the convergence angle. In both panels, the eyes are fixating on the purple sphere. The retinal disparity between the two spheres is the same in both panels. (a) The sphere is close, so the eyes are more strongly converged. (b) The sphere is farther away, so the convergence angle is smaller and the same disparity maps onto a much larger physical distance.
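
A rough numerical illustration of this dependence follows. It uses the standard small-angle relation between disparity and distance and assumes an interocular distance of 6.5 cm. Neither the disparity value nor the two fixation distances come from the figure; they are chosen only for illustration.

```python
import math

IOD = 0.065   # assumed interocular distance in metres

def depth_behind_fixation(disparity_arcmin, fixation_distance):
    """Distance (m) behind the fixation point implied by an uncrossed
    disparity, using the small-angle approximation
    disparity = IOD * (1/fixation - 1/far)."""
    disparity = math.radians(disparity_arcmin / 60)
    far = 1.0 / (1.0 / fixation_distance - disparity / IOD)
    return far - fixation_distance

for fixation in (0.5, 2.0):
    depth_cm = depth_behind_fixation(10.0, fixation) * 100
    print(f"fixating at {fixation} m: 10 arcmin corresponds to {depth_cm:.1f} cm")
```

The same 10 arcmin of disparity corresponds to roughly a centimetre of depth when fixating at 0.5 m, but to well over ten times that at 2 m.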

The brain has several sources of information about convergence. Some of these are independent of the visual content, for example, sensory information from the eye muscles. However, the pattern of vertical disparities also provides a purely retinal source of information. Consider the example in Fig. 5.

The equal and opposite keystoning in the two images instantly tells us that these images must have been acquired by converged cameras.

The larger the keystoning, the more converged the cameras.

An extensive vision science literature examines humans’ ability to use these cues.

This shows that humans use both retinal and extraretinal information about eye position.¹¹–¹³

As we expect from a well-engineered system, more weight is placed on whichever cue is most reliable. Generally, the visual system places more weight on the retinal information, relying on the physical convergence angle only when the retinal images are less informative.¹¹,¹⁴

For example, because the vertical disparity introduced by convergence is larger at the edge of the visual field, less weight is given to the retinal information when it is available only in the center of the visual field.¹³

These vertical disparities can have substantial effects on the experience of depth. In one experiment, the same horizontal disparity (10 arcmin) resulted in a perceived depth difference of 5 cm when the vertical disparity pattern indicated viewing at infinity but only 3 cm when the vertical disparity pattern indicated viewing at 28 cm—although in both cases, the physical viewing distance was 57 cm.⁹

Effects of Filming with Converged Cameras

Epipolar-plane geometry is relevant to the vexing issue of whether stereo content should be shot with camera axes parallel or converged (toe-in).

Some stereographers have argued that cameras should converge on the subject of interest in filming because the eyes converge in natural viewing.

While there are good reasons for filming toe-in, this particular justification is not correct. It depends on the fallacy that cameras during filming are equivalent to eyes during viewing.

This would be the case only if the images recorded during filming were presented directly to the audience’s retinas, without distortion. Instead, the images recorded during filming are presented on a screen that is usually roughly frontoparallel to the interocular axis (Fig. 2).

The images displayed on the screen are thus viewed obliquely by each eye, introducing keystoning at the retinas.

As described in Fig. 5, the retinal images therefore contain vertical disparities even if there is no vertical parallax on the screen.

If the images displayed on the screen have vertical parallax because they were captured with converged cameras, this adds to the vertical disparity introduced by the viewer’s own convergence.

The resulting vertical disparity indicates that the viewer’s eyes are more converged than they really are. As we have seen, this could potentially reduce the amount of perceived depth for a given horizontal disparity.

To correctly simulate physical objects, one should film with the camera axes parallel, as shown in Fig. 7.

To display the resulting images, one should shift them horizontally so that objects meant to have the same simulated distance as the screen distance have zero horizontal parallax on the screen.

Provided that the viewer keeps the interocular axis horizontal and parallel to the screen, this ensures that all objects have correct horizontal and vertical disparity on the retina, independent of the viewer’s convergence angle.

Figure 7

Filming with parallel camera axes.
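
The recipe of parallel axes plus a horizontal shift can be sketched as follows. The assumptions go beyond the text: idealised pinhole cameras, an interaxial distance equal to the viewer's interocular distance, a display magnification chosen so that focal length times magnification equals the screen distance, and a shift of half the interaxial distance per image. The function name and the test point are illustrative.

```python
def displayed_points(point, interaxial, screen_distance):
    """On-screen (x, y) of a scene point in the left and right images.

    Cameras sit at x = -interaxial/2 and x = +interaxial/2 with parallel
    axes along z. Assumption: focal length times display magnification
    equals the screen distance, and each image is shifted horizontally
    by half the interaxial distance.
    """
    x, y, z = point
    scale = screen_distance / z
    left = (scale * (x + interaxial / 2) - interaxial / 2, scale * y)
    right = (scale * (x - interaxial / 2) + interaxial / 2, scale * y)
    return left, right

for depth in (1.0, 2.0, 8.0):
    left, right = displayed_points((0.3, 0.2, depth), 0.065, 2.0)
    print(f"depth {depth} m: horizontal parallax {(right[0] - left[0]) * 100:+.2f} cm, "
          f"vertical parallax {(right[1] - left[1]) * 100:.2f} cm")
```

Under these assumptions a point at the screen distance has zero horizontal parallax, and vertical parallax is zero everywhere, as the constraint of Fig. 2 requires.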

Back-of-the-Envelope Calculations

To get a feel for how serious these effects might be, consider some back-of-the-envelope calculations.

For convergence on the midline (i.e., looking straight ahead, not to the left or right), vertical disparity is independent of scene structure and simply scales with convergence angle.

To close approximation, the retinal vertical disparity at different points in the visual field is given by the following equation⁸:

[retinal vertical disparity] = [convergence angle] × 0.5 × sin(2 × elevation) × tan(azimuth),

where azimuth and elevation refer to location in the visual field. This equation is for the vertical disparity in natural viewing.

That is, even if an object is displayed with zero screen parallax, it still has a vertical disparity of 7 arcmin when viewed with 1° convergence at 20° elevation and 20° azimuth. The same equation can be used to compute the on-screen vertical disparity resulting from filming toed-in.

For example, what degree of toe-in is necessary to cause a 1-pixel vertical disparity? For 36 mm film with a 50 mm focal length, the corners of the image are at an azimuth equal to 20° and elevation equal to 13°. If the 36 mm is represented by 2048 pixels, a vertical disparity of 1 pixel is 1.2 arcmin. This can be caused by a toe-in of just 14 arcmin.
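
The approximation and the two worked examples above can be checked numerically. In this sketch the function name is hypothetical and the 24 mm frame height is an assumption (the text gives only the 36 mm width). The corner angles are computed from the film geometry rather than taken as the rounded 20° and 13°.

```python
import math

ARCMIN_PER_RAD = 60 * 180 / math.pi

def vertical_disparity(convergence, elevation_deg, azimuth_deg):
    """Approximate vertical disparity, in the same units as `convergence`."""
    el, az = math.radians(elevation_deg), math.radians(azimuth_deg)
    return convergence * 0.5 * math.sin(2 * el) * math.tan(az)

# Natural viewing: 1 degree (60 arcmin) of convergence, 20 deg elevation and azimuth.
natural = vertical_disparity(60.0, 20.0, 20.0)

# Film example: 36 mm wide frame (24 mm height assumed), 50 mm focal length.
focal_mm, half_width_mm, half_height_mm = 50.0, 18.0, 12.0
azimuth = math.degrees(math.atan(half_width_mm / focal_mm))
elevation = math.degrees(math.atan(half_height_mm / focal_mm))
pixel_arcmin = (36.0 / 2048) / focal_mm * ARCMIN_PER_RAD
toe_in_arcmin = pixel_arcmin / vertical_disparity(1.0, elevation, azimuth)

print(f"natural vertical disparity: {natural:.1f} arcmin")
print(f"corner at azimuth {azimuth:.1f} deg, elevation {elevation:.1f} deg")
print(f"1 pixel = {pixel_arcmin:.2f} arcmin, toe-in needed = {toe_in_arcmin:.1f} arcmin")
```

The first two results reproduce the 7 arcmin and 1.2 arcmin quoted in the text. The toe-in comes out at roughly 15 arcmin, the same order as the 14 arcmin quoted; the small difference presumably reflects rounding of the intermediate values.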

Is this enough to alter perception? Suppose that the images on the screen have a pattern of on-screen vertical parallax resulting from having been filmed toed-in:

[on-screen vertical parallax] = [some scale factor K] × 0.5 × sin(2 × elevation) × tan(azimuth).

This combines with the natural vertical disparity, indicating the wrong convergence angle. The scale factor K, which has angular units, is the additional, artifactual component of the convergence estimate that would be added if the visual system worked solely on the retinal information.

Suppose the viewer is in an IMAX cinema, screen size 22 × 16 m, viewing it at a distance of one screen height: 16 m. The true convergence angle is therefore 14 arcmin. At the corner of the screen, elevation equals 27° and azimuth equals 35°.
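
The geometry of this example can be reproduced in a few lines. The interocular distance of 6.5 cm is an assumption, since the text does not state the value it uses. The screen size and viewing distance are from the text.

```python
import math

IOD = 0.065                              # assumed interocular distance in metres
WIDTH, HEIGHT, DISTANCE = 22.0, 16.0, 16.0   # screen size and viewing distance in metres

# Convergence angle when fixating the centre of the screen.
convergence_arcmin = math.degrees(2 * math.atan(IOD / 2 / DISTANCE)) * 60

# Direction of a screen corner as seen from the viewer.
elevation_deg = math.degrees(math.atan(HEIGHT / 2 / DISTANCE))
azimuth_deg = math.degrees(math.atan(WIDTH / 2 / DISTANCE))

print(f"convergence: {convergence_arcmin:.0f} arcmin")
print(f"corner: elevation {elevation_deg:.0f} deg, azimuth {azimuth_deg:.0f} deg")
```

With that interocular distance the values round to the 14 arcmin, 27° and 35° given in the text.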

Physical objects at the corners of the screen produce a retinal vertical disparity of 2.0 arcmin just because of the geometry.

Suppose the toed-in vertical parallax is such that it is just 1 cm even at the corners of the screen (clearly, it is smaller everywhere else). This means that the toe-in contributes an additional 2.1 arcmin of vertical disparity at an elevation of 27° and an azimuth of 35°. That is, the barely noticeable on-screen parallax more than doubles the vertical disparity at the retina; hence, the retinal cue to convergence is 29 arcmin instead of the physical value of 14 arcmin.

Roughly speaking, the convergence overestimate in degrees equals (180/π) × [viewing distance] × [on-screen vertical separation at (x, y)] / (x × y), where x and y are the horizontal and vertical distances of the point from the center of the screen.

In the preceding calculation, the viewing distance was 16 m and the vertical separation was 1 cm at x = 11 m and y = 8 m, implying a convergence angle that is too large by about 0.1°.
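
The rule of thumb can be written as a one-line function. The function name is hypothetical; the inputs are the values from the preceding paragraph, all in metres.

```python
import math

def convergence_overestimate_deg(viewing_distance, vertical_separation, x, y):
    """Rule-of-thumb convergence overestimate in degrees.

    All lengths share one unit; x and y locate the point on the screen
    relative to the screen centre.
    """
    return math.degrees(viewing_distance * vertical_separation / (x * y))

overestimate = convergence_overestimate_deg(16.0, 0.01, 11.0, 8.0)
print(f"convergence overestimate: {overestimate:.2f} deg")
```

For these inputs the function returns a value of about 0.1°.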

What implications might this convergence error have for perceived shape? Suppose the images accurately simulate a transparent sphere, with a 1 m radius, at the center of the screen. The sphere has an angular radius of 3.6°, and its front and back surfaces have a horizontal disparity of −0.93 arcmin and 0.82 arcmin, respectively.

If these disparities were interpreted with the actual viewing distance of 16 m and convergence of 14 arcmin, the viewer should correctly perceive a spherical object, with a 1 m radius, 16 m away.

But if the images are interpreted assuming a convergence of 29 arcmin and viewing distance of 8 m, then the on-screen parallax implies a spheroid with an aspect ratio of 2: that is, a radius of 0.5 m in the screen plane and just 0.25 m perpendicular to the plane of the screen. Thus, for the same horizontal parallax and the same viewing position, a supposedly spherical object could be perceived as flattened by a factor of 2 simply because of toed-in vertical parallax, even when this is just 1 cm at the corners of the screen.
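
The sphere example can be followed step by step in code. The sketch assumes an interocular distance of 6.5 cm and the small-angle relation disparity = IOD × (1/fixation − 1/distance). Both are assumptions, although they reproduce the disparities quoted above. The function names are illustrative.

```python
import math

IOD = 0.065   # assumed interocular distance in metres
ARCMIN_PER_RAD = 60 * 180 / math.pi

def disparity_arcmin(distance, fixation):
    """Small-angle horizontal disparity relative to fixation (negative = nearer)."""
    return IOD * (1 / fixation - 1 / distance) * ARCMIN_PER_RAD

def distance_from_disparity(disp_arcmin, fixation):
    """Invert disparity_arcmin for a given assumed fixation distance."""
    return 1 / (1 / fixation - disp_arcmin / ARCMIN_PER_RAD / IOD)

# Sphere of 1 m radius centred on the screen, 16 m away.
angular_radius = math.asin(1.0 / 16.0)
front = disparity_arcmin(15.0, 16.0)
back = disparity_arcmin(17.0, 16.0)
print(f"angular radius {math.degrees(angular_radius):.1f} deg, "
      f"disparities {front:.2f} and {back:.2f} arcmin")

# Same retinal information, interpreted as if the screen were 8 m away.
in_plane_radius = 8.0 * math.tan(angular_radius)
depth_radius = (distance_from_disparity(back, 8.0)
                - distance_from_disparity(front, 8.0)) / 2
print(f"perceived radius: {in_plane_radius:.2f} m in the screen plane, "
      f"{depth_radius:.2f} m in depth")
```

Interpreted at 8 m, the sphere shrinks to about 0.5 m across the screen plane but only about 0.25 m in depth, which is the factor-of-2 flattening described above.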

In practice, the distortion may not be so obvious. For example, other powerful perspective and shading cues may indicate that the object is spherical. Nevertheless, these calculations suggest that small vertical parallax can potentially have a significant effect on perception.

As yet, little work has been done to investigate depth distortions caused by toed-in filming. From the vision science literature to date, we predict different effects for S3D cinema versus TV. In a cinema, the display typically occupies much of the visual field.

Thus, we expect convergence estimates to be dominated by the retinal information, rather than the physical value.

In this situation, the same horizontal disparities could produce measurably different depth percepts if acquired with converged camera axes versus parallel.

In home viewing of 3DTV, the visual periphery is generally stimulated by objects in the room.

These necessarily produce vertical disparities consistent with the viewer’s physical convergence angle, while vertical disparities within the relatively small TV screen are likely to have less effect. This means that horizontal parallax on the TV screen is likely to be converted into depth estimates using the viewer’s physical convergence angle. Thus, we expect the angle between the camera axes to have less effect on the depth perceived in this situation.

Interaxial Distance

The separation of the cameras during filming is another important topic.

To exactly recreate the puppet theater, one should film with the cameras one interocular distance apart.

However, stereographers regularly play with interaxial distance (i.e., the separation between the optical axes of the cameras).

For example, they might start with a large interaxial distance to produce measurable parallax in a shot of distant mountains and then reduce the interaxial distance as the scene changes to a close-up of a dragon on the mountain.

A remarkable recent experiment demonstrated that most observers are insensitive to changes in interaxial distance within a scene. Although we could detect the resulting changes in disparity if they occurred in isolation, when they occur within a given scene we do not perceive them, because we assume the objects stay the same size. In the words of the authors, “Humans ignore motion and stereo cues [to absolute size] in favor of a fictional stable world.”¹⁵

Why We Don’t Need to Get It Right

Ultimately, the central mystery for vision science may be why S3D TV and cinema work as well as they do.

By providing an additional, highly potent depth cue, S3D content risks alerting the visual system to errors it might have forgiven in two-dimensional (2D) content.

As an example, an actor’s head on a cinema screen may be 10 ft high, but we do not perceive it as gigantic. We could argue that this is because a 10 ft head viewed from a distance of 30 ft subtends the same angle on the retina as a 1 ft head viewed from 3 ft. Stereo displays, however, potentially provide depth information confirming that the actor is indeed gigantic.

In addition, stereo displays often depict disparities that are quite unnatural, that is, disparities that are physically impossible for any real scene to produce given the viewer’s eye position or disparities that conflict with highly reliable real-world statistics (mountains are hundreds of feet high, people are around 6 ft high, etc.).

This is reminiscent of the “uncanny valley” in robotics, where improving the realism of a simulated human can produce revulsion.¹⁶

Presumably, such conflicts are the reason a minority of people find S3D content disturbing or nauseating.

However, most of us find S3D content highly compelling despite these violations of the natural order.

An analogy can be drawn with the way we perceive most photographs as veridical depictions of the world.

We do not usually perceive objects in photographs as distorted or straight lines as curved, even though the image on our retina is substantially different from that produced by the real scene—unless we are viewing the photograph from the exact spot the camera was located to take it.¹⁷

It is not yet known to what extent this is a learned ability, raising the possibility that as stereo displays become more commonplace, our visual systems will become even better at interpreting them without adverse effects.

More information: A neuronal correlate of insect stereopsis. Ronny Rosner, Joss von Hadeln, Ghaith Tarawneh, Jenny C.A. Read. Nature Communications. DOI: 10.1038/s41467-019-10721-z

Journal information: Nature Communications
Provided by Newcastle University