Humanoid Motion Embedding Transformer

Overview

Humanoid Motion Embedding Transformer (HMET) is a lightweight representation model designed to encode humanoid motion sequences into structured latent embeddings.

The model transforms joint-angle trajectories and sensor streams into compact motion representations that can be reused for planning, imitation learning, and behavior-similarity analysis.

Architecture

  • Transformer encoder backbone
  • Positional encoding for temporal motion steps
  • Motion-token embedding layer
  • 128-d latent motion representation
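The data flow implied by the components above can be sketched as follows. This is a minimal numpy sketch, not the released implementation: the projection weights are random stand-ins for trained parameters, the transformer encoder itself is stubbed out with mean pooling, and `d_model=64` is an assumed internal width (only the 128-d latent size comes from the card).

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding over temporal motion steps."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def encode_motion(frames, d_model=64, latent_dim=128, seed=0):
    """Sketch of the HMET pipeline: motion-token embedding ->
    positional encoding -> (encoder, stubbed here) -> 128-d latent.
    Weights are random placeholders, not trained parameters."""
    rng = np.random.default_rng(seed)
    seq_len, feat_dim = frames.shape
    # Motion-token embedding layer (assumed to be a linear projection)
    W_embed = rng.standard_normal((feat_dim, d_model)) / np.sqrt(feat_dim)
    tokens = frames @ W_embed + positional_encoding(seq_len, d_model)
    # A real transformer-encoder backbone would operate on `tokens` here;
    # this sketch mean-pools directly to keep the example self-contained.
    pooled = tokens.mean(axis=0)
    W_out = rng.standard_normal((d_model, latent_dim)) / np.sqrt(d_model)
    return pooled @ W_out  # fixed-length 128-d motion embedding

emb = encode_motion(np.random.randn(120, 30))  # 120 frames, 30 features each
print(emb.shape)  # (128,)
```

Regardless of sequence length, the output is a single fixed-length vector, which is what makes the embedding reusable across downstream tasks.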

Intended Use

  • Motion similarity search
  • Behavior clustering
  • Pre-training for planning systems
  • Robotics representation research

Input Format

Time-series motion frames:

  • joint_positions
  • joint_velocities
  • imu_data
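One way to assemble these fields into the per-frame feature vectors the encoder consumes is a simple concatenation. The field names match the list above; the dimensions (12 joints, 6-d IMU) are illustrative assumptions, not part of the card.

```python
import numpy as np

def build_frame(joint_positions, joint_velocities, imu_data):
    """Concatenate one timestep's fields into a flat feature vector.
    Field names follow the input format; dimensions are assumptions."""
    return np.concatenate([joint_positions, joint_velocities, imu_data])

# Hypothetical dimensions: 12 joints, 6-d IMU (3 accel + 3 gyro).
T = 50  # number of motion frames in the sequence
frames = np.stack([
    build_frame(np.zeros(12), np.zeros(12), np.zeros(6))
    for _ in range(T)
])
print(frames.shape)  # (50, 30)
```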

Output

A fixed-length 128-dimensional motion embedding vector, one per input sequence.
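Because the output is a fixed-length vector, downstream tasks such as motion similarity search reduce to vector comparisons. A minimal sketch, assuming cosine similarity as the distance measure (the card does not specify one) and random vectors standing in for model outputs:

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two fixed-length motion embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Hypothetical embeddings (128-d, per the architecture section).
rng = np.random.default_rng(0)
query = rng.standard_normal(128)          # embedding of a query motion
library = rng.standard_normal((5, 128))   # small motion library
scores = [cosine_similarity(query, m) for m in library]
best = int(np.argmax(scores))             # index of the most similar motion
```

The same comparison underpins behavior clustering: cluster the embedding vectors with any standard algorithm (e.g. k-means) instead of ranking them.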

Research Contribution

Provides a standardized embedding layer for cross-robot motion representation alignment.

License

MIT
