The hardware is the easy part. MANUS gloves track fingers. Xsens suits track bodies. VR headsets provide visual feedback. Every component in isolation is well-understood and well-documented.
What fails in most teleoperation projects is the layer between them: getting synchronized, calibrated, retargeted motion data from a human operator's body into a robot's joint controllers reliably, at low latency, without breaking. This is the middleware problem — and it is where the majority of XR-to-robot deployments stall.
This article maps the complete teleoperation stack layer by layer: what each component does, how they connect, where things break, and what a minimum viable working configuration looks like.
Why the Middleware Layer Is Where Stacks Fail
The naive assumption is that "gloves + suit = robot control." This is wrong in three specific ways:
- Human and robot kinematics don't match. A human index finger has different joint angles, segment lengths, and range of motion than any robot finger. Raw hand joint data cannot be sent directly to a robot without translation. This translation is retargeting, and it is non-trivial.
- Robots have hard constraints that humans don't: joint velocity limits, torque limits, singularities, and mechanical stops. If your retargeting layer ignores these, the robot controller will either clip the commands silently (causing unexpected behavior) or fault (causing the session to stop).
- No feedback loop means operators are flying blind. Teleoperation without haptic or visual feedback produces poor-quality demonstrations — operators over-grip, under-grip, and cannot feel when contact has been made. The feedback layer is not optional for quality data collection.
All three failures are solvable. This article explains how.
The 5-Layer Teleoperation Architecture
A complete teleoperation stack has five distinct layers. Each can be independently configured, debugged, and replaced. Understanding the separation matters: when something goes wrong, the layer boundaries tell you where to look.
- Layer 1 (Human Input): MANUS Metagloves (hands), Xsens Link suit (full body), VR headset (head pose). Raw sensor measurements: electromagnetic field readings, IMU data, camera-based tracking. Nothing has been processed or interpreted yet.
- Layer 2 (Motion Processing): MANUS Core and Xsens MVN software. Converts raw sensor data into a calibrated, anatomical skeleton: full-body pose including 25 DoF per hand. Handles sensor fusion, drift correction, and synchronization between gloves and suit. Outputs a clean skeleton stream.
- Layer 3 (Retargeting): Maps human skeleton joint angles to robot joint commands. Handles scale normalization (human vs robot geometry), kinematic structure mapping, IK solving, and robot constraint enforcement. This is the layer most teams underestimate. In ROS 2, this is typically a custom node or a set of configured nodes.
- Layer 4 (Robot Control): Robot-specific controllers that receive joint commands and execute them safely. Includes joint velocity and torque limiting, collision avoidance, and safety monitoring. Each robot platform has its own controller interface. ROS 2 controllers abstract most of this, but per-platform configuration is still required.
- Layer 5 (Feedback): Returns information from the robot to the operator. Visual feedback: camera stream displayed in a VR headset. Haptic feedback: MANUS Pro Haptic gloves receive vibrotactile signals when the robot hand makes contact with an object. Without this layer, teleoperation quality degrades significantly.
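The layer boundaries can be made concrete in code. Below is a minimal, hypothetical Python model of one control tick. Each layer is a plain function, and the pipeline only passes one layer's output to the next.

The message types (RawSample, Skeleton, JointCommand) and the toy lambdas are illustrative assumptions. They are not MANUS, Xsens or ROS 2 types; in a real stack each function would be a ROS 2 node or an SDK callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RawSample:
    """Layer 1: unprocessed sensor readings (hypothetical type)."""
    t: float
    values: Dict[str, float]


@dataclass
class Skeleton:
    """Layer 2 output: calibrated human joint angles in radians (hypothetical type)."""
    t: float
    joints: Dict[str, float]


@dataclass
class JointCommand:
    """Layer 3 output: robot joint targets in radians (hypothetical type)."""
    t: float
    positions: Dict[str, float]


@dataclass
class Pipeline:
    process: Callable[[RawSample], Skeleton]              # Layer 2
    retarget: Callable[[Skeleton], JointCommand]          # Layer 3
    execute: Callable[[JointCommand], Dict[str, float]]   # Layer 4: returns robot state
    feedback: Callable[[Dict[str, float]], List[str]]     # Layer 5: returns operator cues

    def tick(self, sample: RawSample) -> List[str]:
        # Each stage sees only the previous stage's output, so any stage can be
        # replaced, mocked, or debugged on its own.
        skeleton = self.process(sample)
        command = self.retarget(skeleton)
        state = self.execute(command)
        return self.feedback(state)


# Toy stand-ins for each layer, for illustration only.
pipe = Pipeline(
    process=lambda s: Skeleton(s.t, {"index_mcp": s.values["index_raw"] * 0.01}),
    retarget=lambda sk: JointCommand(sk.t, {"finger_1": 0.8 * sk.joints["index_mcp"]}),
    execute=lambda cmd: {"finger_1_contact": 1.0 if cmd.positions["finger_1"] > 0.5 else 0.0},
    feedback=lambda st: ["vibrate index"] if st["finger_1_contact"] > 0 else [],
)
print(pipe.tick(RawSample(0.0, {"index_raw": 100.0})))  # ['vibrate index']
```

Because each layer is only a function with a typed input and output, a faulty stage can be isolated by swapping in a known-good stub for its neighbors.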
Layer 1: Human Input
The input layer determines the maximum data quality ceiling for the entire stack. Higher precision here does not automatically improve downstream robot control — retargeting and robot constraints set their own limits — but lower precision here propagates as errors through every subsequent layer.
MANUS Metagloves (hand layer)
Electromagnetic field tracking, 25 DoF per hand, continuous and occlusion-free. MANUS Core software publishes three simultaneous output streams: raw skeleton, retargeted skeleton (mapped to a configured target — human or robot), and raw sensor data for custom pipelines. The C++ SDK makes all three streams available on both Windows and Linux.
Xsens Link suit (body layer)
17 IMU sensors distributed across the body, up to 400Hz, with magnetic disturbance compensation for metal-rich lab environments. When paired with MANUS gloves, Xsens MVN replaces its estimated hand pose with the measured MANUS finger data — the integration happens at the software level in MANUS Core's MVN mode. The result is a single synchronized skeleton covering the full body plus 25 DoF hands.
See Article 2 for the full integration detail. For arm-only teleoperation without full-body humanoid control, the Xsens suit is optional — MANUS gloves alone provide sufficient input for hand and wrist teleoperation.
VR headset (head layer)
Quest 3 or Pico 4 Ultra Enterprise provide 6DoF head pose as an additional input stream. In humanoid teleoperation, head pose maps to the robot's neck or camera gimbal — the operator looks where they want the robot to look. The headset display simultaneously serves as the visual feedback channel (Layer 5).
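Mapping head pose to a gimbal usually means two steps:

1. Extract yaw and pitch from the headset's orientation quaternion.
2. Clamp them to the gimbal's mechanical range.

A minimal sketch follows. It assumes a Z-up frame, a unit quaternion in (w, x, y, z) order, and the ZYX (yaw-pitch-roll) convention. OpenXR reports poses in a Y-up frame, so a real integration would convert frames first. The ±60° yaw and ±30° pitch limits are hypothetical.

```python
import math


def quat_to_yaw_pitch(w: float, x: float, y: float, z: float) -> tuple:
    """Yaw and pitch in radians from a unit quaternion (ZYX convention, Z up)."""
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # Clamp the asin argument: rounding can push it slightly outside [-1, 1].
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    return yaw, pitch


def clamp(value: float, limit: float) -> float:
    """Symmetric clamp to [-limit, limit]."""
    return max(-limit, min(limit, value))


def head_to_gimbal(w, x, y, z,
                   yaw_limit=math.radians(60.0),     # hypothetical gimbal range
                   pitch_limit=math.radians(30.0)):  # hypothetical gimbal range
    """Convert headset orientation to gimbal yaw/pitch targets within mechanical limits."""
    yaw, pitch = quat_to_yaw_pitch(w, x, y, z)
    return clamp(yaw, yaw_limit), clamp(pitch, pitch_limit)


h = math.sqrt(0.5)
print(head_to_gimbal(h, 0.0, 0.0, h))  # 90 deg of head yaw -> gimbal yaw clamped to 60 deg
```

Clamping keeps the gimbal inside its mechanical range when the operator looks further than the camera can follow. The operator simply sees the view stop at the limit.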
Layer 2: Motion Processing
Raw sensor data is not directly usable. MANUS Core and Xsens MVN perform several critical functions before data reaches the retargeting layer:
- Sensor fusion — combining accelerometer, gyroscope, and magnetometer data from each Xsens IMU into a stable orientation estimate using Kalman filtering
- Hand solver — converting MANUS electromagnetic measurements into anatomically correct joint angles using the biomechanical hand model
- Temporal synchronization — aligning the MANUS glove stream (its own clock) with the Xsens body stream (its own clock) into a single timestamped skeleton. This synchronization is handled automatically when MANUS Core is running in MVN integration mode
- Calibration application — applying the per-operator hand scale calibration and body dimensions to produce measurements that are accurate for this specific operator, not a default human model
The output of Layer 2 is a single, clean, synchronized, calibrated skeleton stream ready for retargeting.
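MANUS Core performs the temporal alignment for you in MVN mode. For custom pipelines that consume the raw streams, the idea is to express both streams on one clock, then resample one stream onto the other's timestamps.

The sketch below makes three simplifying assumptions:
- The clock offset between the two devices is already known, for example from a shared sync event.
- Each sample carries a single scalar, for brevity.
- Interpolation is linear; orientation data would use quaternion slerp instead.

```python
from bisect import bisect_left
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def interpolate_at(stream: List[Sample], t: float) -> float:
    """Linearly interpolate a time-sorted stream at time t (holds end values outside it)."""
    times = [ts for ts, _ in stream]
    i = bisect_left(times, t)
    if i == 0:
        return stream[0][1]
    if i == len(stream):
        return stream[-1][1]
    (t0, v0), (t1, v1) = stream[i - 1], stream[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(glove: List[Sample], body: List[Sample], body_clock_offset: float):
    """Resample the body stream onto the glove timestamps.

    body_clock_offset is body-clock time minus glove-clock time, assumed known
    (e.g. measured from a shared sync event).
    Returns (glove_time, glove_value, body_value) triples.
    """
    body_on_glove_clock = [(t - body_clock_offset, v) for t, v in body]
    return [(t, g, interpolate_at(body_on_glove_clock, t)) for t, g in glove]
```

For example, align([(0.0, 1.0), (0.01, 2.0)], [(5.0, 10.0), (5.02, 30.0)], 5.0) pairs the second glove sample with an interpolated body value of about 20.0.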
Layer 3: Retargeting — The Critical Layer
This is where most teleoperation stacks break. Retargeting solves the mapping from human skeleton to robot joint commands, and it is not a simple joint-angle copy. Three specific problems need to be addressed:
Scale normalization
Human fingers and robot fingers have different segment lengths, even on anthropomorphic robot hands. A 45-degree flex of a human middle finger proximal joint does not produce the same fingertip position as a 45-degree flex of the corresponding robot joint. The retargeter must account for the geometric differences between the two kinematics and find the robot configuration that best reproduces the intended human motion.
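To see why copying angles fails, consider a planar finger with two segments. The sketch below uses one simple technique, fingertip-position retargeting:

1. Compute the human fingertip position with forward kinematics.
2. Scale it by the ratio of total finger lengths.
3. Solve analytic two-link IK for the robot finger.

Segment lengths are illustrative. Production retargeters typically solve for all fingers at once in 3D.

```python
import math


def fk(l1: float, l2: float, q1: float, q2: float):
    """Planar fingertip position for a two-segment finger (flexion angles in radians)."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def ik(l1: float, l2: float, x: float, y: float):
    """Analytic two-link IK, flexion-positive solution.

    Unreachable targets are handled by clamping cos(q2), which pins the finger
    at full extension or full flexion.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    q2 = math.acos(max(-1.0, min(1.0, c2)))
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def retarget_finger(human, robot, q1: float, q2: float):
    """Robot joint angles that reproduce the human fingertip, scaled by finger length."""
    hx, hy = fk(human[0], human[1], q1, q2)
    s = (robot[0] + robot[1]) / (human[0] + human[1])
    return ik(robot[0], robot[1], s * hx, s * hy)


HUMAN = (0.045, 0.025)  # proximal and middle segment lengths in metres (illustrative)
ROBOT = (0.060, 0.050)  # hypothetical robot finger
rq1, rq2 = retarget_finger(HUMAN, ROBOT, 0.3, 0.6)
```

The human segment ratio (45:25) differs from the robot's (60:50). As a result, the robot's joint angles differ from the human's even though its fingertip lands exactly on the scaled target.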
Kinematic structure differences
Some robot hands have fewer degrees of freedom than a human hand. Some have underactuated fingers where multiple joints are coupled. The retargeter needs to know the robot's specific kinematic structure and map human joint angles to whatever configuration space the robot actually has, not an idealized one.
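For a hand with fewer or coupled actuators, one simple approach is a linear mapping: each robot actuator takes a weighted sum of human joint angles. This is a hypothetical sketch. The joint names, the weights and the choice to average coupled joints are illustrative, and the right mapping depends on the specific hand's transmission.

```python
from typing import Dict, List, Tuple

# Hypothetical actuator map for a robot hand with fewer, partly coupled actuators.
# Each robot actuator is driven by a weighted sum of human joint angles (radians).
ACTUATOR_MAP: Dict[str, List[Tuple[str, float]]] = {
    # Underactuated index finger: one actuator flexes PIP and DIP together,
    # so the command is the average of the two human joints.
    "index_flex": [("index_pip", 0.5), ("index_dip", 0.5)],
    "index_mcp": [("index_mcp", 1.0)],
    # Ring and pinky share a single actuator on this hypothetical hand.
    "ring_pinky_flex": [("ring_mcp", 0.5), ("pinky_mcp", 0.5)],
}


def map_to_actuators(human: Dict[str, float]) -> Dict[str, float]:
    """Project human joint angles onto the robot's actual actuator space.

    Human joints that no actuator uses are ignored; a missing required joint
    raises KeyError rather than silently commanding zero.
    """
    return {
        actuator: sum(weight * human[joint] for joint, weight in terms)
        for actuator, terms in ACTUATOR_MAP.items()
    }
```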
Robot constraint enforcement
Every joint has velocity limits, acceleration limits, and range-of-motion limits. If the human operator moves faster than the robot joint can follow, the retargeter must decide what to do: clip the velocity, scale it, or predict intent. Unconstrained commands sent directly to a robot controller will either be saturated silently or trigger a safety stop.
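The simplest of those strategies, clipping, can be sketched per control cycle:

1. Clamp each target to the joint's range.
2. Limit how far the joint may move in one cycle to vel_limit * dt.

Joint names and limit values are hypothetical. A real implementation would also enforce acceleration limits, which this sketch does not.

```python
from typing import Dict, Tuple


def enforce_limits(target: Dict[str, float], current: Dict[str, float],
                   pos_limits: Dict[str, Tuple[float, float]],
                   vel_limits: Dict[str, float], dt: float) -> Dict[str, float]:
    """Clip a retargeted command to range-of-motion and velocity limits for one cycle."""
    cmd = {}
    for joint, goal in target.items():
        lo, hi = pos_limits[joint]
        goal = min(hi, max(lo, goal))               # stay inside the joint's range
        max_step = vel_limits[joint] * dt           # furthest the joint may move this cycle
        step = min(max_step, max(-max_step, goal - current[joint]))
        cmd[joint] = current[joint] + step
    return cmd


limits = {"j1": (-1.0, 1.0)}  # rad, hypothetical
vel = {"j1": 2.0}             # rad/s, hypothetical
print(enforce_limits({"j1": 5.0}, {"j1": 0.0}, limits, vel, 0.01))
```

Consider a loop at 100 Hz (dt = 0.01 s) with a 2 rad/s limit that receives a sudden 5 rad request. The joint moves 0.02 rad per cycle until it reaches its 1 rad range limit. The controller never receives a command it would saturate or reject.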
Three approaches to retargeting in ROS 2
Option A — Custom ROS 2 node (most common): Write a C++ or Python ROS 2 node that subscribes to the MANUS skeleton topic and publishes robot joint commands. Full control over the mapping, but requires implementation effort. The MANUS open-source retargeter provides a starting point for dexterous hand retargeting.
Option B — Isaac Lab retargeter: NVIDIA Isaac Lab 2.3 includes a dexterous retargeting framework with MANUS native support. The retargeter handles the mapping from MANUS 25 DoF to robot hand joint positions inside Isaac Lab, with UI feedback for IK errors. Recommended for teams working within the NVIDIA Isaac ecosystem. See Article 3 for integration detail.
Option C — Unity or Unreal bridge: For teams running visualization or simulation in game engines, the MANUS Unity/Unreal plugins handle the skeleton output and can publish to a ROS bridge. Useful for debugging retargeting quality visually before connecting real hardware.
Layer 4: Robot Control
The robot control layer receives joint commands from the retargeter and executes them safely on the physical or simulated hardware. The specific interface depends on the robot platform, but ROS 2 controllers abstract most of the differences.
Common robot platforms
- Universal Robots UR5: The most common research arm for MANUS teleoperation. Native ROS 2 driver (ur_robot_driver), well-documented, and collaborative (designed to operate near humans). The UR5 is the standard starting point for most research stacks.
- Franka Emika: High-precision research arm with torque sensing on all joints. The libfranka library and franka_ros2 package provide ROS 2 integration. Commonly used with OPEN TEACH and similar systems for dexterous manipulation research.
- Humanoid platforms: Whole-body teleoperation support in NVIDIA Isaac Lab. The Pink IK controller in Isaac Lab 2.3 handles bimanual control and waist DOF. Used with MANUS gloves + Xsens for full humanoid loco-manipulation demonstrations.
- KUKA LBR iiwa: Industrial-grade arm with 7 DOF and integrated torque sensing. iiwa_ros2 provides the ROS 2 interface. Common in manufacturing automation research where payload and repeatability matter alongside teleoperation capability.
- Dexterous robot hands: Standalone robot hands for manipulation research. They connect to the retargeting layer directly: the hand receives MANUS finger joint commands mapped to its own DOF. Available through Knoxlabs; see the robotics hub for compatible models.
- Other research platforms: Robots with custom ROS 2 interfaces. The MANUS SDK and Xsens ROS 2 driver output standard message types, so any robot with a ROS 2 joint command interface can receive retargeted skeleton data with appropriate configuration.
Layer 5: Feedback
Feedback closes the control loop. Without it, teleoperation is open-loop from the operator's perspective: commands go out, but there is no sensory confirmation that the robot has done what was intended. This produces two specific problems in data collection:
- Force estimation errors: operators cannot feel whether a grasp is secure. They routinely over-grip (excessive force, potential object damage or robot fault) or under-grip (insufficient force, object drops) without haptic confirmation that contact has been made
- Spatial awareness gaps: without visual feedback from the robot's perspective, operators make positioning errors that they cannot correct — the demonstration data contains approach trajectories that end in poor final configurations
Visual feedback
A camera mounted on or near the robot streams video to the operator's VR headset display. For most teleoperation setups, this is a standard RGB camera routed through a VR streaming solution — Meta's Air Link, SteamVR, or NVIDIA CloudXR. The operator's head pose (Layer 1) can drive a camera gimbal so the view follows where the operator looks.
Haptic feedback
The MANUS Metagloves Pro Haptic include one vibrotactile actuator per finger with 256 modulation channels per motor. When the robot hand's contact sensors detect an object, the signal routes back to the corresponding MANUS glove finger. The operator feels contact, adjusts grip force accordingly, and the demonstration data reflects natural human grasping behavior rather than open-loop guesswork.
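This routing needs a mapping from robot contact readings to per-finger vibration strength. A minimal sketch follows, assuming the robot hand reports one contact force per finger in newtons. The threshold and saturation values are hypothetical. The MANUS SDK call that actually drives the glove actuators is not shown.

```python
from typing import Dict

FINGERS = ("thumb", "index", "middle", "ring", "pinky")


def contact_to_vibration(forces_n: Dict[str, float],
                         threshold_n: float = 0.2,   # hypothetical noise floor
                         saturation_n: float = 5.0   # hypothetical full-scale force
                         ) -> Dict[str, float]:
    """Map per-finger contact force (newtons) to vibration intensity in [0, 1]."""
    out = {}
    for finger in FINGERS:
        force = forces_n.get(finger, 0.0)  # no reading is treated as no contact
        if force < threshold_n:
            out[finger] = 0.0  # below the noise floor: stay silent
        else:
            # Linear ramp from threshold to saturation, capped at full intensity.
            out[finger] = min(1.0, (force - threshold_n) / (saturation_n - threshold_n))
    return out
```

The dead band keeps sensor noise from buzzing the operator's fingers, and the linear ramp lets the operator feel the difference between light contact and a firm grasp.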
Latency Budget
Total end-to-end latency in a teleoperation stack is the sum of the stages on the critical path, from the operator's movement to the operator seeing the robot respond. Hand and body tracking run in parallel, so only the slower of the two counts. For teleoperation to feel natural, total latency should be below approximately 80–100ms. Above this threshold, operators perceive the lag and compensate with slower, more deliberate movements, which degrades the naturalness of the demonstration data.
| Stage | Typical latency | Notes |
|---|---|---|
| MANUS EMF tracking | <7ms | From physical hand motion to skeleton joint angles output by MANUS Core |
| Xsens body tracking | ~10ms | IMU sensor fusion; additional latency if running full MVN body solve |
| ROS 2 topic publish / receive | 1–5ms | On local network; significantly higher over WAN or VPN |
| Retargeting computation | 1–10ms | Depends on IK solver complexity; simple joint mapping is near-zero |
| Robot controller execution | 5–20ms | Varies by platform; UR5 ~5ms, custom humanoids 10–20ms |
| Visual feedback (local) | 15–40ms | Camera capture, encode, stream, decode, display; varies significantly by setup |
| Typical total (co-located) | 30–80ms | Within the natural teleoperation threshold for most operators |
| Remote (WAN) teleoperation | 100–300ms+ | Network latency dominates; requires prediction / shared autonomy to compensate |
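A budget like this is easy to recompute from your own measurements. The sketch below uses a simplified model:
- MANUS and Xsens tracking run in parallel, so only the slower one is on the critical path.
- Every later stage runs in series.

It then compares the total with the 80 ms lower end of the threshold above. The stage names and example values are hypothetical measurements, not benchmarks, and the model ignores pipelining and jitter.

```python
from typing import Dict

SERIAL_STAGES = ("ros2_transport", "retargeting", "controller", "video_feedback")


def loop_latency_ms(stages: Dict[str, float]) -> float:
    """Motion-to-display latency for a co-located stack (simplified model).

    Glove and suit tracking run in parallel, so only the slower one is on the
    critical path; the remaining stages are assumed to run one after another.
    """
    tracking = max(stages["manus"], stages["xsens"])
    return tracking + sum(stages[name] for name in SERIAL_STAGES)


measured = {  # example values in ms; replace with your own measurements
    "manus": 6, "xsens": 10, "ros2_transport": 3,
    "retargeting": 5, "controller": 10, "video_feedback": 30,
}
total = loop_latency_ms(measured)
print(f"{total} ms:", "within budget" if total <= 80 else "over budget")  # 58 ms: within budget
```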
Common Mistakes
The four failure modes seen repeatedly in teleoperation deployments:
- Skipping retargeting and sending raw joint angles to the robot. Human and robot kinematics are not equivalent. Direct angle copy produces unstable, uncontrolled robot motion. The retargeting layer is not optional.
- No robot constraint enforcement in the retargeter. Joint velocity limits, torque limits, and range-of-motion stops exist for mechanical and safety reasons. Unconstrained commands either clip silently (behavior becomes unpredictable) or trigger safety stops (session ends). Map the human motion into the robot's safe operating envelope before sending commands.
- No feedback loop. Open-loop teleoperation produces open-loop demonstrations. Policies trained on open-loop data learn to approximate the operator's commands, not the task. Haptic and visual feedback are the difference between demonstrations that contain useful contact information and demonstrations that don't.
- Putting MANUS and other USB peripherals on the same USB controller. MANUS gloves require dedicated USB bandwidth. When they share a USB controller with cameras, headsets, or other sensors, bandwidth contention causes intermittent tracking dropouts that corrupt demonstration sessions. Use separate USB buses for MANUS and other high-bandwidth peripherals (check the bus layout with lsusb -t on Linux).
Minimum Working Setup
For teams wanting to validate a teleoperation pipeline before committing to full-scale hardware, this is the practical minimum configuration that actually works end-to-end:
Hardware: MANUS Metagloves Pro (one pair) for hand tracking. One robot arm: UR5, Franka Emika, or similar with a ROS 2 driver. Optional: Xsens Link suit for full-body control. Optional: Meta Quest 3 for visual feedback.
Software: MANUS Core (Windows or Linux). ROS 2 (Humble or later). Robot-specific ROS 2 driver (ur_robot_driver, franka_ros2, etc.). A custom retargeting node, or the Isaac Lab retargeter if using the NVIDIA stack.
Step 1: Verify MANUS tracking. Run the hardware status console to confirm all glove sensors are active and the EMF transmitter is calibrated. Run data flow verification to confirm skeleton output is streaming. Calibrate hand scale for the operator.
Step 2: Connect ROS 2. Launch the MANUS ROS 2 bridge (publishes hand skeleton as standard ROS topics). Launch the robot ROS 2 driver and verify joint state feedback is publishing. Confirm both are visible on the ROS 2 topic graph before writing any retargeting logic.
Step 3: Build retargeting incrementally. Start with a simple joint-angle mapping for one finger. Verify the robot finger follows the MANUS finger as expected. Add scale normalization. Add joint limit enforcement. Expand to the full hand. Test with slow, deliberate operator movements before running at full speed.
Step 4: Add feedback and validate. Add a robot camera stream to the operator's display (even a simple monitor is better than nothing). If using MANUS Pro Haptic gloves, configure the haptic feedback routing from robot contact sensors to glove actuators. Run a short recording session and review the demonstration data quality before committing to large collection runs.
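A quick way to review a recording before a large collection run is to scan the glove stream's timestamps for gaps. Intermittent dropouts, such as those caused by USB contention, show up in the data this way.

The sketch assumes timestamps in seconds and a known nominal sample rate. The 100 Hz rate in the example and the 2× tolerance are hypothetical, and loading timestamps from your recording format is not shown.

```python
from typing import List, Tuple


def find_dropouts(timestamps: List[float], nominal_hz: float,
                  tolerance: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive samples are further apart than
    tolerance x the nominal sample period, i.e. likely tracking dropouts."""
    max_gap = tolerance / nominal_hz
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > max_gap]


print(find_dropouts([0.00, 0.01, 0.02, 0.10, 0.11], nominal_hz=100.0))  # [(0.02, 0.1)]
```

A session with any reported gaps is worth re-recording, or at least trimming, before it goes into a training set.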
Hardware for the Full Stack
All hardware in the teleoperation stack is available through Knoxlabs on a single order. For teams building from scratch, Knoxlabs can advise on configuration for your specific robot platform before procurement.
MANUS Metagloves Pro Haptic: 25 DoF + per-finger vibrotactile feedback. The complete teleoperation glove, closing both the motion and haptic feedback loops.
Xsens Link: 17-sensor IMU suit, 400Hz, ROS 1/2 drivers. Pairs with MANUS gloves for complete humanoid teleoperation input.
VR headset: 6DoF head tracking for camera control. Display for robot visual feedback. Also supports hand tracking for mass data collection in parallel stations.
Building a teleoperation stack?
Tell us your robot platform and use case. We'll specify the right input hardware and quote the full stack as one order.
The full component catalog — MANUS, Xsens, VR headsets, and compatible dexterous robot hands — is on the Knoxlabs Robotics & Teleoperation hub. For the complete deployment process, see From Vision to Reality: White-Glove XR Deployment That Actually Works and the robotics-specific guide in Knoxlabs Robotics Deployment.