Camera-Based Human Tracking and Following Algorithm for Mobile Robots

Autonomous ground vehicle (AGV) following system using CNNs and Range Imaging

Overview

Developed a camera-based human tracking pipeline to enable the safe and efficient relocation of airport robots. The pipeline detects and tracks people in real time using a CNN and a 3D depth camera, and uses the tracked person's position to create a dynamic path plan.

Technical Implementation

⚠️ Note on Source Code
Due to employer confidentiality, source code cannot be shared. This web page contains a sanitized technical overview and demo materials.

  • ROS2 node: Developed a ROS2 node that subscribes to an Intel RealSense D455's RGB and depth streams, along with the camera info message that carries the camera intrinsics
  • Tracking model: Used Ultralytics' YOLO11 tracking model to detect and track people, publishing an annotated image and an ENU (East-North-Up) pose
  • Simulation: Created a Python script to generate custom Gazebo SDF worlds for simulation
  • Deployment: Built and optimized CUDA-enabled Dockerfiles to enable GPU acceleration on an NVIDIA Jetson Orin NX
  • Operator UI: Built an app using HTML and JavaScript to adjust parameters, toggle tracking, and visualize the person currently being tracked in real time
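
As a rough illustration of how camera intrinsics and a depth reading can be combined into a 3D position, the sketch below applies the standard pinhole model to one pixel. It is not the project's code (which cannot be shared). The function names, the choice to ignore lens distortion, and the assumption that depth is already in meters are all illustrative. The row-major 3x3 layout of k matches the K matrix of a ROS CameraInfo message.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def deproject_pixel(u: float, v: float, depth_m: float, k: Sequence[float]) -> Point3:
    """Map pixel (u, v) with a depth reading to a point in the camera optical frame.

    k is a row-major 3x3 intrinsic matrix: [fx, 0, cx, 0, fy, cy, 0, 0, 1].
    Assumption: lens distortion is ignored and depth is given in meters.
    """
    if depth_m <= 0.0:
        raise ValueError("depth must be positive; zero usually means no reading")
    fx, cx = k[0], k[2]
    fy, cy = k[4], k[5]
    x = (u - cx) * depth_m / fx  # right of the optical axis
    y = (v - cy) * depth_m / fy  # below the optical axis
    return (x, y, depth_m)       # z points forward along the optical axis


def optical_to_body(p: Point3) -> Point3:
    """Convert optical axes (x right, y down, z forward) to
    body-style axes (x forward, y left, z up)."""
    x, y, z = p
    return (z, -x, -y)


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 image.
    k = [600.0, 0.0, 320.0, 0.0, 600.0, 240.0, 0.0, 0.0, 1.0]
    point = deproject_pixel(620, 240, 2.0, k)
    print(point, optical_to_body(point))
```

In practice the pixel would typically be taken from the tracked bounding box (for example its center), and a median over a small depth patch is often more robust than a single reading. Both choices are assumptions here, not details from the project.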

Key Features

  • Multi-person tracking: Distinguishes between multiple people in a scene
  • Real-time inference: Detections are published at 10 Hz with GPU acceleration enabled
  • Following behavior: Maintains a fixed distance between the robot and the tracked person, similar to adaptive cruise control
  • Navigation integration: Integrates with the ROS navigation stack
  • Validation: Tested in simulation and on the physical robot
  • Tunable parameters: Tracking parameters can be adjusted before tracking is toggled
  • Live visualization: The annotated camera feed and the currently tracked person are shown in the operator UI in real time
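
The fixed-distance following behavior listed above can be approximated with a simple proportional controller. The sketch below is one possible formulation, not the project's implementation: the source does not say which control law is used, and the real system may instead send goals to the navigation stack. All names, gains, and limits are hypothetical, and the person's position is assumed to be expressed in the robot frame (x forward, y left).

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FollowParams:
    # All values are illustrative defaults, not tuned numbers from the project.
    target_distance_m: float = 1.5  # gap to keep between robot and person
    k_linear: float = 0.8           # gain on the distance error
    k_angular: float = 1.5          # gain on the bearing error
    max_linear: float = 1.0         # m/s speed limit
    max_angular: float = 1.0        # rad/s turn-rate limit
    deadband_m: float = 0.1         # ignore small distance errors to avoid jitter


def clamp(value: float, limit: float) -> float:
    """Restrict value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def follow_command(person_x: float, person_y: float,
                   params: FollowParams) -> Tuple[float, float]:
    """Return (linear, angular) velocity for a person at (x, y) in the robot frame."""
    distance = math.hypot(person_x, person_y)
    bearing = math.atan2(person_y, person_x)  # positive when the person is to the left
    error = distance - params.target_distance_m
    if abs(error) < params.deadband_m:
        linear = 0.0
    else:
        # Positive error: person too far, drive forward. Negative: back away.
        linear = clamp(params.k_linear * error, params.max_linear)
    angular = clamp(params.k_angular * bearing, params.max_angular)
    return linear, angular


if __name__ == "__main__":
    print(follow_command(3.0, 0.5, FollowParams()))
```

The deadband and velocity limits are the kind of parameters an operator UI could expose for tuning; whether the original system exposes these particular ones is not stated.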

Results & Future Work

The system worked both in simulation and on the real robot. Planned enhancements include gesture-based control, which would remove the dependence on a carried mobile device and enable more intuitive human-robot interaction. Facial recognition is also planned so that the system can identify authorized personnel and follow only approved individuals, improving overall security.

Stack

ROS2, Python, HTML, JS, OpenCV, Gazebo, Computer Vision, Deep Learning, YOLO, HRI, CUDA