Humanoid Motion Embedding Transformer
Overview
Humanoid Motion Embedding Transformer (HMET) is a lightweight representation model that encodes humanoid motion sequences into structured latent embeddings.
It maps joint-angle trajectories and onboard sensor streams to compact, fixed-length motion representations that can be reused for planning, imitation learning, and behavior similarity analysis.
Architecture
- Transformer encoder backbone
- Positional encoding for temporal motion steps
- Motion-token embedding layer
- 128-d latent motion representation
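The positional-encoding component above can be sketched concretely. The card does not specify which encoding variant is used, so the standard Transformer sin/cos scheme is assumed here, producing one 128-d position vector per temporal motion step:

```python
import numpy as np

def sinusoidal_positional_encoding(num_steps: int, dim: int) -> np.ndarray:
    """Sinusoidal positional encoding for temporal motion steps.

    Hypothetical sketch: the model card does not state the encoding
    variant, so the Transformer-style sin/cos scheme is assumed.
    """
    positions = np.arange(num_steps)[:, None]                       # (T, 1)
    div = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))   # (dim/2,)
    pe = np.zeros((num_steps, dim))
    pe[:, 0::2] = np.sin(positions * div)  # even channels: sine
    pe[:, 1::2] = np.cos(positions * div)  # odd channels: cosine
    return pe

# Encode 50 motion steps into the model's 128-d token space.
pe = sinusoidal_positional_encoding(50, 128)
print(pe.shape)  # (50, 128)
```

Each motion token is summed with its position vector before entering the Transformer encoder backbone, which is what gives the model a notion of temporal order.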
Intended Use
- Motion similarity search
- Behavior clustering
- Pre-training for planning systems
- Robotics representation research
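The first use case, motion similarity search, reduces to nearest-neighbor lookup over embedding vectors. A minimal sketch, assuming a small library of 128-d embeddings already produced by HMET (the embeddings and dimensions here are illustrative, not shipped with the model):

```python
import numpy as np

def nearest_motion(query: np.ndarray, library: np.ndarray) -> int:
    """Return the index of the library embedding most similar to the
    query, using cosine similarity."""
    sims = library @ query / (
        np.linalg.norm(library, axis=1) * np.linalg.norm(query)
    )
    return int(np.argmax(sims))

# Toy library of three 128-d embeddings (stand-ins for HMET outputs).
rng = np.random.default_rng(0)
library = rng.normal(size=(3, 128))

# A query embedding that is a slightly perturbed copy of entry 1.
query = library[1] + 0.01 * rng.normal(size=128)
print(nearest_motion(query, library))  # 1
```

Because the embeddings are fixed-length, the same lookup generalizes to behavior clustering (e.g. k-means over the embedding matrix) without any per-sequence alignment.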
Input Format
Time-series motion frames:
- joint_positions
- joint_velocities
- imu_data
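The three streams listed above can be packed into one flat feature vector per time step and stacked into the (T, feature_dim) sequence a Transformer encoder consumes. The joint count and IMU dimensionality below are assumptions for illustration; the card does not fix them:

```python
import numpy as np

# Hypothetical sizes: a humanoid with 23 actuated joints and a 6-axis
# IMU (3-axis accelerometer + 3-axis gyroscope). Not specified by HMET.
N_JOINTS = 23
IMU_DIM = 6

def pack_frame(joint_positions: np.ndarray,
               joint_velocities: np.ndarray,
               imu_data: np.ndarray) -> np.ndarray:
    """Concatenate one time step's streams into a flat feature vector."""
    frame = np.concatenate([joint_positions, joint_velocities, imu_data])
    assert frame.shape == (2 * N_JOINTS + IMU_DIM,)
    return frame

# Stack T frames into the (T, feature_dim) input sequence.
T = 100
sequence = np.stack([
    pack_frame(np.zeros(N_JOINTS), np.zeros(N_JOINTS), np.zeros(IMU_DIM))
    for _ in range(T)
])
print(sequence.shape)  # (100, 52)
```

Per-stream normalization (e.g. standardizing joint velocities) would typically be applied before packing, but is omitted here for brevity.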
Output
A fixed-length 128-dimensional motion embedding vector, independent of the input sequence length.
Research Contribution
HMET provides a standardized embedding space for motion, so that trajectories from robots with different kinematics can be aligned, compared, and clustered in a common representation.
License
MIT