The researchers point to the challenge of online shopping, where customers sometimes misjudge the size of a product based on their expectations, discovering, for example, that a sweater purchased online is indeed lovely but sized for a doll, not an adult.
This happens in part because the physical cues to size that are present when seeing an item in a store are typically eliminated when viewing photos online. Without seeing the physical object, customers base their expectations of familiar size on prior experience. Since most sweaters are sized for people, not dolls, the visual system assumes that an unfamiliar sweater is, too.
A research team, led by Canada Research Chair in Immersive Neuroscience Jody Culham, presented study participants with a variety of familiar objects like dice and sports balls in virtual reality and asked them to estimate the object sizes. The trick? Objects were presented not only at their typical ‘familiar’ sizes, but also at unusual sizes (e.g., die-sized Rubik’s cubes).
The researchers found that participants consistently perceived the virtual objects at the size they expected, rather than the actual presented size. This effect was much stronger in virtual reality than for real objects.
“It is promising to see advances in virtual reality and its applications, but there is still a lot we don’t understand about how we process information in virtual environments. If we need to rely heavily on past experiences to judge the size of objects in virtual reality, this suggests other visual cues to size may be less reliable than in the real world.”
Yet, the results of this study also have some promising implications.
“If we know that familiar objects can serve as strong size cues in virtual reality, we can use this information to our advantage,” said Anna Rzepka, a former student in the Culham Lab and co-first author on the study.
“Think about viewing an item in a scene where accurate size perception is crucial, such as when removing a tumor using image-guided surgery. Adding other familiar objects to the virtual scene could improve perception of the tumor’s size and location, leading to better outcomes.”
Depth perception in virtual reality
Depth perception can be defined as the ability to perceive the volume of objects as well as their relative position in three-dimensional space [6]. Egocentric depth perception refers to the space between an observer and a reference, whereas exocentric distances concern the space between two external objects. In virtual environments, egocentric distances consistently tend to be underestimated [4, 7, 8], whereas [2] reported an overestimation of exocentric distances. Underestimation particularly affects egocentric distances larger than 1.0 m [8], and a relatively constant degree of underestimation between 2.0 and 7.0 m indicates a categorical rather than a continuous increase in distance compression [7].
Regarding the multisensory integration of depth cues, most of the literature focuses on visual perception and its interplay with the proprioceptive and vestibular feedback resulting from active motion [9, 10]. Auditory [11] and haptic cues [12], in contrast, are likely to influence depth perception to some extent but are not necessarily applicable to all virtual environments. In the following, we thus focus on visual, proprioceptive, and vestibular information.
Visual depth perception
Visual depth perception is based on structural, pictorial, and motion-induced cues [6] (cf. S1 Fig). Structural depth cues refer to physical adjustments and anatomical relations between the two human eyes, including stereopsis, accommodation, and vergence [6]. Pictorial depth cues arise from features of a two-dimensional scene, such as occlusion, shadows, relative size and height in the visual field, linear and aerial perspective, texture gradient, and the arrangement of edges [4, 6]. Motion-induced visual cues, such as looming, optic flow, and motion parallax [4, 6, 13], further facilitate distance perception if either the observer or objects in the visual scene move.
Visual cues in VR may differ from those in physical environments. In stereoscopic displays, a dissociation of accommodation and vergence arises from presenting different images to the two eyes, while the curvature of the lenses accommodates to the fixed distance of the display [4, 14]. A lack of detail may further limit the availability of pictorial depth cues. However, even in a photorealistic virtual environment displayed via a head-mounted camera, distances were underestimated by 23% (compared to only 4% in the real world [7]). Similarly, visualizing a reference of known length did not result in more accurate judgments [8]. Hence, distance compression cannot primarily be attributed to a lack of visual detail in simplified virtual surroundings or to a cognitive misrepresentation of physical units.
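For clarity, such underestimation percentages can be read as a simple ratio of judged to actual distance. The minimal sketch below, using hypothetical numbers rather than data from [7], makes that arithmetic explicit.

```python
# Minimal sketch (hypothetical numbers, not data from the cited studies):
# percent distance compression computed from a judged vs. an actual distance.

def percent_underestimation(judged_m, actual_m):
    """Relative underestimation in percent; positive values indicate compression."""
    return 100.0 * (actual_m - judged_m) / actual_m

# A 5.0 m target judged to lie at about 3.85 m corresponds to roughly 23%.
print(percent_underestimation(judged_m=3.85, actual_m=5.0))  # ≈ 23
```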
Effects of locomotion
Active locomotion in the form of walking interaction seems to counteract distance compression in virtual environments [15–17]. It appears more effective than other measures, such as presenting participants with a real-world reference [17]. Walking experience in a virtual environment was also shown to affect subsequent distance estimates in the physical world [16]: prior to the walking interaction, estimates in the real world were almost veridical, whereas post-interaction estimates increased by approximately 10%. In [18], however, accuracy increased only for distances equal to or smaller than those the participants had previously walked, and the calibration of depth perception seemed most effective for larger distances.
In the case of locomotion, not only visual but also proprioceptive and vestibular feedback provides information on the distance covered. Investigating the effects of optic flow in the absence of non-visual motion cues, [19] noted a persistent underestimation of the simulated distances, with larger deviations occurring at shorter durations of the simulated movement. Although humans were thus able to interpret optic flow in terms of distance traveled, their estimates were biased. Comparing depth perception in virtual and physical environments, [20] found a less pronounced effect of locomotion in VR. While virtual motion was again inferred only from optic flow, actual walking provided vestibular and proprioceptive feedback in the real world, possibly resulting in a higher gain from locomotion [20]. In the absence of vestibular feedback, [21] reported that their subjects relied primarily on visual information when assessing the distance traveled in comparison to a reference. Interestingly, however, they found that proprioceptive feedback from cycling movements enhanced estimates, even when incongruent with the distance indicated by vision.
To distinguish the relative impact of different sensory modalities, the ratio between the visually perceived and the physically traveled distance can be adjusted. For ratios of 0.5, 1.0, and 2.0, [10] observed physical motion to have a stronger impact on distance estimates than visual perception. The authors thus assumed sensitivity to visual cues to decrease in the presence of physical motion and interpreted their results as an example of sensory capture, with interoceptive cues overriding visual perception in the case of conflicting information. For ratios of 0.7, 1.0, and 1.4, in contrast, [22] found estimates for multisensory conditions to range between the unisensory conditions, implying that all available information influenced depth perception. They did, however, note a dominance of cues arising from physical movement during active locomotion, whereas visual cues seemed to prevail during passive locomotion. Elaborating on the differences between active and passive movements, they assumed vestibular cues to be more influential than proprioception and suggested a linear weighted function to account for the integration of vestibular, proprioceptive, and visual cues.
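Read as a rough sketch rather than the cited authors' model, a linear weighted combination of this kind could look as follows; the weight values and example distances are hypothetical placeholders chosen only to reflect the reported ordering of cue influence, not parameters fitted in [22].

```python
# Illustrative sketch (hypothetical weights, not fitted values from the cited
# studies): a linear weighted combination of unisensory distance estimates.

def integrate_distance_estimates(d_visual, d_vestibular, d_proprioceptive,
                                 w_visual=0.3, w_vestibular=0.45,
                                 w_proprioceptive=0.25):
    """Combine unisensory distance estimates (in metres) into one percept."""
    weights = (w_visual, w_vestibular, w_proprioceptive)
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return (w_visual * d_visual
            + w_vestibular * d_vestibular
            + w_proprioceptive * d_proprioceptive)

# Example: vision signals 7.0 m travelled, body-based cues signal 10.0 m;
# the combined estimate is pulled towards the more strongly weighted cues.
print(integrate_distance_estimates(7.0, 10.0, 10.0))  # ≈ 9.1
```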
While the previous results suggest that locomotion influences perceived distances via both visual and non-visual cues, [16] found optic flow to be not only insufficient to counteract distance compression in a blind walking task, but also irrelevant when proprioceptive and vestibular feedback were available. Such discrepancies may be related to the modalities used for presenting and reproducing distances. [10], for example, observed that participants strongly underestimated the distance of a visual target when walking towards it blindfolded, whereas estimates were relatively accurate when the distance was presented not visually but by passive motion. If distances were represented only visually, in contrast, they were matched relatively closely when optic flow was simulated without actual movement. Visual and non-visual cues thus seem to yield specific and possibly even incongruent information, and performance in estimation tasks depends on the agreement between the sensory modalities used for encoding and reproducing distances. Hence, although active locomotion has been demonstrated to counteract the distance compression common to virtual environments, its effectiveness may vary across different types of distance estimates.
Methodologies for measuring depth perception
Because perceptual processes cannot be observed directly, distance estimates require subjects to express a previously formed mental state. Empirical data suggest that the mode of expression affects experimental results. [23], for example, instructed participants either to indicate when they felt the location of a reference had been reached or to adjust the location of this reference to match a distance traveled previously. The distances under consideration ranged from 2 to 64 m. For distances beyond 12 m, the authors reported an underestimation of the traveled distance when participants placed the external object, whereas distances were overestimated when participants judged the moment they reached a given location. This effect was confirmed by a similar study conducted in a non-virtual environment for distances between 8 and 32 m [20].
Reviewing empirical user studies on egocentric distance perception in VR, [4] stressed the importance of acknowledging differences between measurement methodologies. Summarizing applicable methods, they differentiated between verbal estimates, perceptual matching, and visually directed actions. [24] furthermore distinguished visually guided from visually imagined actions, depending on whether the action is performed blindfolded or merely imagined.
Verbal estimates
Verbal estimates require participants to indicate the perceived distance in a familiar or visible reference unit [4]. The target can either remain visible during the judgment or participants can be blindfolded [4]. While this method does not require any translational motion and is fast and convenient to use, cognitive processing, a misrepresentation of physical measurement units, and prior knowledge might confound the results [4, 7, 24]. Estimates seem to be relatively precise for short distances [4, 8], whereas underestimation is exacerbated at larger distances.
Perceptual matching
In perceptual matching, the size or distance of objects is compared to a given visual reference. In VR, this reference is either presented virtually or must be memorized [4]. The corresponding response consists of either adjusting the size or distance of the virtual object or indicating the result of a mental comparison with the reference [24]. In the case of perceptual bisection, the midpoint of a distance is indicated, providing information on relative depth perception [24].
Visually guided actions
Visually guided movements include throwing, walking, and reaching, as well as triangulated pointing. Common to all of these measures is that the target is not visible while the distance is being indicated. [4] reported visually directed actions to be the most frequent measure of distance perception, with blind walking being particularly common. Although fairly accurate over a broad range of distances, these measures can be biased by cognitive strategies such as counting steps when participants are asked to indicate a distance they previously walked. To prevent such effects, triangulation tasks require participants to walk to a designated position and subsequently indicate the assumed location of the object by pointing or stepping in the corresponding direction [4].
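As an illustration of the geometry behind triangulated pointing, the sketch below derives the indicated target distance from a walked offset and a pointing direction; the coordinate convention and all numbers are hypothetical and not taken from the cited studies.

```python
# Illustrative sketch (hypothetical coordinates, not from the cited studies):
# recovering the distance indicated by triangulated pointing.
# Convention assumed here: the participant first views the target from the
# origin along the positive y-axis, walks blindfolded to a known offset,
# and then points toward the remembered target; intersecting the pointing
# ray with the original viewing line yields the indicated target distance.

def triangulated_distance(offset_xy, pointing_dir_xy):
    """Distance (in metres) of the indicated target from the origin,
    assuming the target lies on the positive y-axis."""
    px, py = offset_xy          # position reached after the blind walk
    dx, dy = pointing_dir_xy    # direction of the pointing response
    t = -px / dx                # ray parameter where the x-component vanishes
    return py + t * dy

# Example: after walking 2 m to the right, pointing back and to the left
# along (-2, 5) places the target 5 m from the original viewing position.
print(triangulated_distance(offset_xy=(2.0, 0.0),
                            pointing_dir_xy=(-2.0, 5.0)))  # 5.0
```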
Visually imagined actions
Visually imagined actions, with timed imagined walking being the most common variant, no longer require participants to actually perform a movement, but rather to indicate the time they expect to need for it [4]. Again, estimates can be given while the target is visible or after subjects have been blindfolded [24]. Like verbal estimates, visually imagined actions are independent of spatial restrictions. However, the timed responses have to be converted using the individual's walking speed, usually measured prior to the actual experiment. Further variance is introduced by differences in the ability to imagine the walking process [4] and by uncertainty as to whether participants mentally include phases of acceleration and deceleration.
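To make this conversion explicit, the sketch below turns a timed imagined walking response into a distance using a pre-measured walking speed; the speed and time are hypothetical, and the simple product deliberately ignores the acceleration and deceleration phases mentioned above.

```python
# Illustrative sketch (hypothetical numbers): converting a timed imagined
# walking response into a distance estimate via the participant's walking
# speed, measured before the experiment. Acceleration and deceleration
# phases are ignored in this simple conversion.

def imagined_walking_distance(imagined_time_s, walking_speed_m_per_s):
    """Distance (in metres) implied by the imagined walking time."""
    return imagined_time_s * walking_speed_m_per_s

# Example: a participant who walks at 1.3 m/s and imagines needing 6 s
# implicitly reports a distance of about 7.8 m.
print(imagined_walking_distance(6.0, 1.3))  # ≈ 7.8
```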
Reference link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0224651#sec001
Original Research: Open access.
“Familiar size affects perception differently in virtual reality and the real world” by Anna M. Rzepka et al., Philosophical Transactions of the Royal Society B: Biological Sciences.