Seeing the World through Your Eyes

University of Maryland, College Park

CVPR 2024 (Oral)


From a handful of portrait photos of a person, we compute a 3D reconstruction of what they are observing using the eye reflections!

By placing realistic eye models in synthetic scenes, we perform full scene reconstruction using only the eye reflections.

Abstract

The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like. By imaging the eyes of a moving person, we capture multiple views of a scene outside the camera's direct line of sight through the reflections in the eyes. In this paper, we reconstruct a radiance field beyond the camera's line of sight using portrait images containing eye reflections.
This task is challenging due to 1) the difficulty of accurately estimating eye poses and 2) the entangled appearance of the iris textures and the scene reflections. To address these, our method jointly optimizes the cornea poses, the radiance field depicting the scene, and the observer's eye iris texture. We further present a regularization prior on the iris texture to improve scene reconstruction quality. Through various experiments on synthetic and real-world captures featuring people with varied eye colors under varied lighting conditions, we demonstrate the feasibility of our approach to recover the radiance field using cornea reflections.
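To make the entanglement concrete, here is a toy 1D sketch of separating an iris texture from a scene reflection. It is an illustration under strong assumptions, not the paper's formulation: it assumes the observed cornea color is simply the sum of an iris texture that stays fixed in eye coordinates and a scene reflection that slides by one sample per frame as the eye moves, and it fits both by alternating least squares as a simple stand-in for the paper's joint optimization. The names `decompose` and `centered` are hypothetical.

```python
import random


def decompose(obs, num_frames, width, iters=2000):
    """Alternating least squares for the toy model obs[k][u] = iris[u] + scene[u + k].

    The iris texture is fixed in eye coordinates, while the reflected scene slides
    by one sample per frame as the eye moves (a simplifying assumption).
    """
    scene_len = width + num_frames - 1
    iris, scene = [0.0] * width, [0.0] * scene_len
    for _ in range(iters):
        # Iris step: average each eye pixel over frames after removing the scene estimate.
        iris = [sum(obs[k][u] - scene[u + k] for k in range(num_frames)) / num_frames
                for u in range(width)]
        # Scene step: average every observation of each scene sample after removing the iris.
        sums, counts = [0.0] * scene_len, [0] * scene_len
        for k in range(num_frames):
            for u in range(width):
                sums[u + k] += obs[k][u] - iris[u]
                counts[u + k] += 1
        scene = [s / c for s, c in zip(sums, counts)]
    return iris, scene


def centered(xs):
    """Remove the mean; the additive split is only defined up to a constant offset."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


# Synthetic data: random iris texture and scene, 4 frames of a 12-pixel eye strip.
random.seed(0)
WIDTH, FRAMES = 12, 4
true_iris = [random.random() for _ in range(WIDTH)]
true_scene = [random.random() for _ in range(WIDTH + FRAMES - 1)]
obs = [[true_iris[u] + true_scene[u + k] for u in range(WIDTH)] for k in range(FRAMES)]

est_iris, est_scene = decompose(obs, FRAMES, WIDTH)
max_err = max(abs(a - b) for a, b in zip(centered(est_iris), centered(true_iris)))
print(f"max iris error after removing the mean offset: {max_err:.2e}")
```

Because the iris stays put while the reflection moves across frames, the two signals become separable. In this additive toy they are recovered only up to a shared constant offset, which is why the demo compares mean-subtracted values.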

How did we do it?

The cornea geometry is approximately the same across all healthy adults. Because of this, measuring the size of a person's cornea in pixels lets us estimate where their eyes are in 3D relative to the camera (given the camera's focal length). Using this insight, we train the radiance field on the eye reflections by shooting rays from the camera and reflecting them off the approximated eye geometry. To keep the iris from showing up in the reconstruction, we perform texture decomposition by simultaneously training a 2D texture map that learns the iris texture.
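The geometric step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a pinhole camera at the origin looking along +z with a known focal length in pixels, a spherical cornea with commonly cited average dimensions (limbus radius 5.5 mm, radius of curvature 7.8 mm), and an eye gazing straight at the camera. The helper names (`eye_depth_mm`, `cornea_center`, `reflect_camera_ray`) are hypothetical.

```python
import math

# Assumed average adult eye dimensions (illustrative, not taken from this page).
LIMBUS_RADIUS_MM = 5.5   # radius of the limbus (iris/sclera boundary)
CORNEA_RADIUS_MM = 7.8   # radius of curvature of the spherical cornea model


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def eye_depth_mm(focal_px, limbus_radius_px):
    # Pinhole projection: a circle of radius R at depth Z images with radius f * R / Z.
    return focal_px * LIMBUS_RADIUS_MM / limbus_radius_px


def cornea_center(focal_px, limbus_center_px, limbus_radius_px):
    # Back-project the limbus center (pixel offsets from the principal point) to depth Z.
    z = eye_depth_mm(focal_px, limbus_radius_px)
    u, v = limbus_center_px
    limbus_center = (u * z / focal_px, v * z / focal_px, z)
    # Assumption: the eye looks straight at the camera, so the sphere center lies
    # behind the limbus plane along the viewing ray.
    offset = math.sqrt(CORNEA_RADIUS_MM ** 2 - LIMBUS_RADIUS_MM ** 2)
    return add(limbus_center, scale(normalize(limbus_center), offset))


def reflect_camera_ray(direction, center, radius=CORNEA_RADIUS_MM):
    """Intersect a ray from the camera origin with the cornea sphere and mirror it."""
    d = normalize(direction)
    b = dot(d, center)
    disc = b * b - (dot(center, center) - radius * radius)
    if disc < 0:
        return None  # the ray misses the cornea
    t = b - math.sqrt(disc)                      # nearest intersection distance
    hit = scale(d, t)
    n = scale(sub(hit, center), 1.0 / radius)    # outward surface normal
    r = sub(d, scale(n, 2.0 * dot(d, n)))        # mirror reflection d - 2(d.n)n
    return hit, r


# Example: focal length 1000 px, limbus imaged with an 11 px radius,
# centered 40 px right of and 20 px above the principal point (y points down).
center = cornea_center(1000.0, (40.0, -20.0), 11.0)
hit, reflected = reflect_camera_ray((40.0, -20.0, 1000.0), center)
print("eye depth (mm):", eye_depth_mm(1000.0, 11.0))
print("reflected direction:", reflected)
```

For the ray through the limbus center, the reflected direction is the reverse of the viewing ray, because under the straight-gaze assumption that ray hits the corneal apex head-on; rays through other cornea pixels reflect toward the surrounding scene. In real captures the gaze direction is not known in advance and has to be estimated as well.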
However, eye poses estimated from the image alone are noisy. To address this, we also optimize the eye poses during training, which is critical for performance, as the ablation below shows.
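One way to see why pose refinement matters: the cornea is a small, strongly curved mirror, so a small error in the estimated eye center tilts the reflected rays considerably. The sketch below assumes a spherical cornea with a 7.8 mm radius and a camera ray aimed at the corneal apex, and measures the tilt for a few lateral offsets of the eye center. The helper names are hypothetical, and the code only illustrates this sensitivity; it does not implement the paper's pose optimization.

```python
import math

CORNEA_RADIUS_MM = 7.8  # assumed average radius of curvature of the cornea


def reflect_off_sphere(d, center, radius):
    """Mirror a unit-length ray from the camera origin off the near side of a sphere."""
    b = sum(di * ci for di, ci in zip(d, center))
    disc = b * b - (sum(ci * ci for ci in center) - radius * radius)
    if disc < 0:
        raise ValueError("ray misses the cornea")
    t = b - math.sqrt(disc)                                     # nearest hit distance
    n = [(t * di - ci) / radius for di, ci in zip(d, center)]   # outward normal
    k = 2.0 * sum(di * ni for di, ni in zip(d, n))
    return [di - k * ni for di, ni in zip(d, n)]


def angle_deg(a, b):
    """Angle between two vectors, computed with atan2 for numerical robustness."""
    cross = [a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0]]
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))


def reflection_error_deg(shift_mm, depth_mm=500.0):
    """Tilt of the reflected ray when the eye center is shifted sideways by shift_mm."""
    d = [0.0, 0.0, 1.0]  # camera ray aimed at the corneal apex of the true eye
    true_r = reflect_off_sphere(d, [0.0, 0.0, depth_mm], CORNEA_RADIUS_MM)
    noisy_r = reflect_off_sphere(d, [shift_mm, 0.0, depth_mm], CORNEA_RADIUS_MM)
    return angle_deg(true_r, noisy_r)


for shift in (0.1, 0.5, 1.0):
    print(f"{shift:.1f} mm eye-center error -> {reflection_error_deg(shift):.1f} deg of ray tilt")
```

In this setup a 1 mm offset turns the reflected ray by twice asin(1/7.8), about 14.7 degrees, which would displace reconstructed content by roughly a quarter of a meter for a scene one meter away.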

Eye pose optimization ablation

Without pose optimization

With pose optimization

Texture decomposition ablation

Without texture decomposition

With texture decomposition

Failure cases

When only a small number of images is used (4 or fewer) or the motion is too small, the method can fail to reconstruct the geometry, or the reconstruction can contain holes and missing regions.