An Audio-Visual Dataset for People Re-Identification and
Verification in Human-Robot Interaction


AveRobot is an audio-visual dataset of people vocalizing short sentences under simulated robot assistance scenarios.






Participants span different nationalities (e.g., Chinese, Indian, Spanish), ages (avg. 27 years; s.d. 11), and heights (avg. 1.74 m; s.d. 0.10 m).


Participants are recorded at three locations (stairs, floor, lift) on each of the three floors of a university building.

Devices per User

Two floors host a video camera, a smartphone camera, and a compact camera; the remaining floor hosts only a smartphone camera and a compact camera.

Sentences per User

Each person vocalizes 9 different sentences selected from a pre-defined list of 34 sentences (e.g., "Where is the lift?").

Sample recorded on the first-floor lift

Sample recorded on the second-floor stairs

Sample recorded on the third-floor corridor


Some relevant specifications and statistics of the dataset.

#   Model                Type                Resolution    FPS   Format   Height   Floor
1   Casio Exilim EXFH20  Compact Camera      1280 x 720    30    AVI      130 cm   Floor 0
2   Huawei P10 Lite      Smartphone Camera   1920 x 1080   30    MP4      130 cm   Floor 0
3   Sony HDR-XR520VE     Video Camera        1920 x 1080   30    MTS      120 cm   Floor 1
4   Samsung NX1000       Compact Camera      1920 x 1080   30    MP4      120 cm   Floor 1
5   iPhone 6S            Smartphone Camera   1280 x 720    30    MOV      120 cm   Floor 1
6   Sony DCR-SR90        Video Camera        702 x 576     25    MPG      150 cm   Floor 2
7   Olympus VR310        Compact Camera      1280 x 720    30    AVI      150 cm   Floor 2
8   Samsung Galaxy A5    Smartphone Camera   1280 x 720    30    MP4      150 cm   Floor 2

The specifications of the recording devices used for the dataset construction
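
For scripted experiments it can help to have the device table in machine-readable form. The following is a minimal sketch that transcribes the table above into Python; the `Device` type, its field names, and the `devices_by_floor` helper are illustrative assumptions and are not part of the dataset distribution.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Device(NamedTuple):
    """One row of the device table (field names are hypothetical)."""
    device_id: int
    model: str
    kind: str
    resolution: Tuple[int, int]  # (width, height) in pixels
    fps: int
    file_format: str
    height_cm: int  # "Height" column of the table
    floor: int      # floor index as listed in the table (0, 1, 2)


# Values copied from the device table above.
DEVICES = [
    Device(1, "Casio Exilim EXFH20", "Compact Camera", (1280, 720), 30, "AVI", 130, 0),
    Device(2, "Huawei P10 Lite", "Smartphone Camera", (1920, 1080), 30, "MP4", 130, 0),
    Device(3, "Sony HDR-XR520VE", "Video Camera", (1920, 1080), 30, "MTS", 120, 1),
    Device(4, "Samsung NX1000", "Compact Camera", (1920, 1080), 30, "MP4", 120, 1),
    Device(5, "iPhone 6S", "Smartphone Camera", (1280, 720), 30, "MOV", 120, 1),
    Device(6, "Sony DCR-SR90", "Video Camera", (702, 576), 25, "MPG", 150, 2),
    Device(7, "Olympus VR310", "Compact Camera", (1280, 720), 30, "AVI", 150, 2),
    Device(8, "Samsung Galaxy A5", "Smartphone Camera", (1280, 720), 30, "MP4", 150, 2),
]


def devices_by_floor(devices):
    """Group devices by the floor on which they were placed."""
    grouped = defaultdict(list)
    for device in devices:
        grouped[device.floor].append(device)
    return dict(grouped)


by_floor = devices_by_floor(DEVICES)
for floor in sorted(by_floor):
    print(floor, [d.model for d in by_floor[floor]])
```

Grouping by floor makes the asymmetry in the setup explicit: floor 0 has two devices, while floors 1 and 2 have three each.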


Gender per age distribution


User height distribution


Pronounced sentence distribution


The AveRobot videos are annotated with the participant's identity, the floor and the location on that floor where the video was recorded, the pronounced sentence, and the device ID. The gender, age, and height of each participant are also provided. Hence, AveRobot is suited to testing several applications within the Human-Robot Interaction (HRI) scenario, including:
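
The annotation fields listed above can be modelled as simple records. The sketch below is one possible representation, not the dataset's actual file layout: the class names, field names, and the `select` helper are assumptions, and the sample records are invented purely to show usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Per-participant attributes (hypothetical field names)."""
    participant_id: str
    gender: str
    age: int
    height_m: float


@dataclass(frozen=True)
class Recording:
    """Per-video annotations (hypothetical field names)."""
    participant_id: str
    floor: int
    location: str     # "stairs", "floor", or "lift"
    sentence_id: int  # index into the list of 34 sentences
    device_id: int    # 1..8, as in the device table


def select(recordings, **criteria):
    """Return the recordings whose fields match all given criteria."""
    return [
        r for r in recordings
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]


# Invented sample records, for illustration only.
recordings = [
    Recording("p001", 0, "lift", 3, 1),
    Recording("p001", 1, "stairs", 7, 3),
    Recording("p002", 1, "stairs", 7, 4),
]

print(select(recordings, floor=1, location="stairs"))
print(select(recordings, participant_id="p001"))
```

Filtering on `floor`, `location`, or `device_id` is a convenient way to build cross-environment or cross-device evaluation splits.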


Audio-Visual Re-Identification


Audio-Visual Verification
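
Verification is commonly evaluated on pairs of samples labelled as genuine (same person) or impostor (different people). The sketch below builds such trial pairs from (participant, device) annotations; restricting the trials to cross-device pairs is a design choice made here for illustration, and the function name and sample identifiers are assumptions rather than an official AveRobot protocol.

```python
from itertools import combinations


def make_trials(samples, cross_device_only=True):
    """Build verification trials from (participant_id, device_id) tuples.

    Returns a list of (index_a, index_b, is_genuine) triples, where
    is_genuine is True when both samples belong to the same participant.
    """
    trials = []
    for (i, (pid_a, dev_a)), (j, (pid_b, dev_b)) in combinations(enumerate(samples), 2):
        if cross_device_only and dev_a == dev_b:
            continue  # skip pairs recorded with the same device
        trials.append((i, j, pid_a == pid_b))
    return trials


# Invented identifiers, for illustration only.
samples = [("p001", 1), ("p001", 2), ("p002", 1), ("p002", 3)]
for trial in make_trials(samples):
    print(trial)
```

Passing `cross_device_only=False` keeps same-device pairs as well, which yields every unordered pair of samples.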


Audio-Visual Speech Recognition


If you use AveRobot in your research, please cite the following publication:

AveRobot: An Audio-visual Dataset for People Re-identification and Verification in Human-Robot Interaction

Marras M., Marin-Reyes P. A., Lorenzo-Navarro J., Castrillon-Santana M., Fenu G.

{mirko.marras, fenu}@unica.it, {javier.lorenzo, modesto.castrillon}@ulpgc.es, pedro.marin102@alu.ulpgc.es

8th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2019)



To get access to AveRobot, please complete the following steps:

  1. Download, fill in, and sign the End User License Agreement (EULA).
  2. Send an email to the authors with subject "AveRobot Download Request" and the EULA as attachment.
  3. Follow the instructions that you will receive in our e-mail response.