Humanoid Motion Embedding Transformer

Overview

Humanoid Motion Embedding Transformer (HMET) is a lightweight representation model designed to encode humanoid motion sequences into structured latent embeddings.

The model transforms joint-angle trajectories and sensor streams into compact motion representations that can be reused for planning, imitation learning, and behavior-similarity analysis.

Architecture

  • Transformer encoder backbone
  • Positional encoding for temporal motion steps
  • Motion-token embedding layer
  • 128-d latent motion representation
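The data flow implied by the components above can be sketched as follows. This is a minimal numpy sketch, not the released implementation: the projection weights are random stand-ins for trained parameters, the transformer encoder itself is stubbed out with mean pooling, and `d_model=64` is an assumed internal width (only the 128-d latent size comes from the card).

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding over temporal motion steps."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def encode_motion(frames, d_model=64, latent_dim=128, seed=0):
    """Sketch of the HMET pipeline: motion-token embedding ->
    positional encoding -> (encoder, stubbed here) -> 128-d latent.
    Weights are random placeholders, not trained parameters."""
    rng = np.random.default_rng(seed)
    seq_len, feat_dim = frames.shape
    # Motion-token embedding layer (assumed to be a linear projection)
    W_embed = rng.standard_normal((feat_dim, d_model)) / np.sqrt(feat_dim)
    tokens = frames @ W_embed + positional_encoding(seq_len, d_model)
    # A real transformer-encoder backbone would operate on `tokens` here;
    # this sketch mean-pools directly to keep the example self-contained.
    pooled = tokens.mean(axis=0)
    W_out = rng.standard_normal((d_model, latent_dim)) / np.sqrt(d_model)
    return pooled @ W_out  # fixed-length 128-d motion embedding

emb = encode_motion(np.random.randn(120, 30))  # 120 frames, 30 features each
print(emb.shape)  # (128,)
```

Regardless of sequence length, the output is a single fixed-length vector, which is what makes the embedding reusable across downstream tasks.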

Intended Use

  • Motion similarity search
  • Behavior clustering
  • Pre-training for planning systems
  • Robotics representation research

Input Format

Time-series motion frames:

  • joint_positions
  • joint_velocities
  • imu_data
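One way to assemble these fields into the per-frame feature vectors the encoder consumes is a simple concatenation. The field names match the list above; the dimensions (12 joints, 6-d IMU) are illustrative assumptions, not part of the card.

```python
import numpy as np

def build_frame(joint_positions, joint_velocities, imu_data):
    """Concatenate one timestep's fields into a flat feature vector.
    Field names follow the input format; dimensions are assumptions."""
    return np.concatenate([joint_positions, joint_velocities, imu_data])

# Hypothetical dimensions: 12 joints, 6-d IMU (3 accel + 3 gyro).
T = 50  # number of motion frames in the sequence
frames = np.stack([
    build_frame(np.zeros(12), np.zeros(12), np.zeros(6))
    for _ in range(T)
])
print(frames.shape)  # (50, 30)
```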

Output

A fixed-length 128-dimensional motion embedding vector, one per input sequence.
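Because the output is a fixed-length vector, downstream tasks such as motion similarity search reduce to vector comparisons. A minimal sketch, assuming cosine similarity as the distance measure (the card does not specify one) and random vectors standing in for model outputs:

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two fixed-length motion embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Hypothetical embeddings (128-d, per the architecture section).
rng = np.random.default_rng(0)
query = rng.standard_normal(128)          # embedding of a query motion
library = rng.standard_normal((5, 128))   # small motion library
scores = [cosine_similarity(query, m) for m in library]
best = int(np.argmax(scores))             # index of the most similar motion
```

The same comparison underpins behavior clustering: cluster the embedding vectors with any standard algorithm (e.g. k-means) instead of ranking them.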

Research Contribution

Provides a standardized embedding layer for cross-robot motion representation alignment.

License

MIT
