GazeOnce360: Fisheye-Based 360° Multi-Person Gaze Estimation with Global-Local Feature Fusion

Zhuojiang Cai, Zhenghui Sun, Feng Lu*
State Key Laboratory of VR Technology and Systems, School of CSE, Beihang University
*Corresponding Author. {caizhuojiang, sunzhenghui, lufeng}@buaa.edu.cn
GazeOnce360 Teaser Image

GazeOnce360 presents an efficient, end-to-end solution for multi-person 3D gaze estimation from a fisheye perspective, significantly improving robustness over existing multi-step methods.

Abstract

We present GazeOnce360, a novel end-to-end model for multi-person gaze estimation from a single tabletop-mounted upward-facing fisheye camera. Unlike conventional approaches that rely on forward-facing cameras in constrained viewpoints, we address the underexplored setting of estimating the 3D gaze direction of multiple people distributed across a 360° scene from an upward fisheye perspective. To support research in this setting, we introduce MPSGaze360, a large-scale synthetic dataset rendered using Unreal Engine, featuring diverse multi-person configurations with accurate 3D gaze and eye landmark annotations. Our model tackles the severe distortion and perspective variation inherent in fisheye imagery by incorporating rotational convolutions and eye landmark supervision. To better capture fine-grained eye features crucial for gaze estimation, we propose a dual-resolution architecture that fuses global low-resolution context with high-resolution local eye regions. Experimental results demonstrate the effectiveness of each component in our model. This work highlights the feasibility and potential of fisheye-based 360° gaze estimation in practical multi-person scenarios.

MPSGaze360 Dataset

Our MPSGaze360 dataset is generated using Unreal Engine 5 and MetaHuman. The example samples illustrate the realism and diversity of the data, as well as the accuracy of the 3D gaze and eye landmark annotations.

Model Architecture

GazeOnce360 Model Architecture

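Our pipeline features a dual-resolution architecture that extracts global context via a low-resolution stream and fine-grained eye details via a high-resolution stream. Fusing the two streams lets the model process the full fisheye frame at low resolution while reserving high-resolution processing for the local eye regions that matter most for gaze estimation.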
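
The global-local idea can be illustrated with a minimal NumPy sketch. It is not the GazeOnce360 implementation: the real model fuses learned features, whereas this toy version works on raw pixels. The function names (`downsample`, `crop_patch`, `fuse`), the 1024×1024 frame size, the downsampling factor, the patch size, and the eye location are all assumptions chosen for illustration.

```python
import numpy as np


def downsample(image, factor):
    """Average-pool an (H, W, C) image by an integer factor (global low-res view)."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    trimmed = image[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def crop_patch(image, center_xy, size):
    """Cut a size x size full-resolution patch around (x, y), clamped to the image."""
    h, w, _ = image.shape
    half = size // 2
    x = int(np.clip(round(center_xy[0]) - half, 0, w - size))
    y = int(np.clip(round(center_xy[1]) - half, 0, h - size))
    return image[y:y + size, x:x + size]


def fuse(global_view, local_view):
    """Toy fusion (assumption): concatenate per-channel means of both views."""
    return np.concatenate([global_view.mean(axis=(0, 1)),
                           local_view.mean(axis=(0, 1))])


# Hypothetical fisheye frame and a hypothetical eye location near the border.
frame = np.random.default_rng(0).random((1024, 1024, 3))
low_res = downsample(frame, 4)                      # global context stream input
eye_patch = crop_patch(frame, (1010.0, 20.0), 64)   # local high-res stream input
fused = fuse(low_res, eye_patch)                    # one descriptor per person
```

The point of the design is that the expensive full-resolution pixels are only read inside small eye patches, while the rest of the 360° scene is handled at a fraction of the resolution. In a learned model, `fuse` would be replaced by a trainable fusion module operating on feature maps rather than pixel means.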

Qualitative Results

GazeOnce360 Qualitative Results

Qualitative results on real fisheye images. Despite being trained purely on synthetic data, our method shows promising generalization to real-world fisheye captures, estimating 3D gaze across diverse people and head poses.

BibTeX

@misc{cai2026gazeonce360fisheyebased360degmultiperson,
      title={GazeOnce360: Fisheye-Based 360$^{\circ}$ Multi-Person Gaze Estimation with Global-Local Feature Fusion},
      author={Zhuojiang Cai and Zhenghui Sun and Feng Lu},
      year={2026},
      eprint={2603.17161},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.17161}, 
}