Mobile VLA: Vision-Language-Action System for Omniwheel Robot Navigation

Model Description

This model is a Vision-Language-Action (VLA) system adapted from the RoboVLMs framework for omniwheel robot navigation. It demonstrates the framework's cross-domain robustness: a system originally built for robot-manipulator tasks transfers successfully to mobile-robot navigation.

Performance

  • MAE: 0.222 (a 72.5% improvement over the baseline)
  • Task: Omniwheel Mobile Robot Navigation
  • Framework: RoboVLMs adapted for mobile robots
  • Performance Level: Practical
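
Here MAE is the mean absolute error between predicted and ground-truth 2D actions on the evaluation set. A minimal sketch of the computation; the tensors below are hypothetical stand-ins, since the evaluation data is not bundled with the model:

import torch

# Hypothetical (N, 2) tensors of predicted and ground-truth (linear_x, linear_y) actions
pred = torch.tensor([[0.80, 0.10], [0.50, -0.20]])
target = torch.tensor([[1.00, 0.00], [0.40, -0.30]])

mae = torch.mean(torch.abs(pred - target))  # reported value on the real evaluation set: 0.222

Assuming "improvement" means relative MAE reduction, the quoted 72.5% implies a baseline MAE of roughly 0.222 / (1 - 0.725) ≈ 0.807.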

Key Features

  • Task Adaptation: Successfully adapted from robot-manipulator tasks to mobile-robot navigation
  • Framework Robustness: Demonstrates cross-domain applicability of RoboVLMs
  • Omniwheel Optimization: Omnidirectional (holonomic) control for the mobile base
  • Real-world Applicability: Navigation performance at a practical level

Model Architecture

  • Vision Encoder: Kosmos-2-based image processing
  • Language Encoder: Korean text-command understanding
  • Action Predictor: 2D action prediction (linear_x, linear_y), sketched below
  • Output: Continuous action values for robot control
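
The checkpoint name (best_simple_lstm_model.pth) suggests an LSTM action head on top of the Kosmos-2 features. A minimal structural sketch under that reading; the feature and hidden dimensions below are illustrative assumptions, not the exact training configuration:

import torch
import torch.nn as nn

class MobileVLAActionHead(nn.Module):
    # Illustrative: LSTM over fused vision-language features -> 2D continuous action
    def __init__(self, feature_dim=2048, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)  # (linear_x, linear_y)

    def forward(self, features):
        # features: (batch, seq_len, feature_dim) produced by the Kosmos-2 backbone
        out, _ = self.lstm(features)
        return self.fc(out[:, -1])  # action predicted from the last timestep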

Usage

import torch
from PIL import Image

# Load the trained checkpoint (saved as a full pickled module, so torch.load
# returns the model object itself)
model = torch.load("best_simple_lstm_model.pth", map_location="cpu")
model.eval()

# Example usage: predict a 2D action (linear_x, linear_y) from an image and a command
image = Image.open("robot_environment.jpg")
text_command = "Move forward to the target"

with torch.no_grad():
    action = model.predict_action(image, text_command)
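
The returned pair maps to the omniwheel base's translational velocities (linear_x forward, linear_y lateral), within the ranges listed under Training Data below.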

Training Data

  • Dataset: Mobile VLA Dataset
  • Total Frames: 1,296
  • Action Range: linear_x [0.0, 1.15], linear_y [-1.15, 1.15]
  • Action Distribution: Forward (56.1%), Left turn (10.0%), Right turn (7.2%)
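
Because the network outputs continuous values, deployments typically clamp predictions to this training range before sending them to the base. A minimal sketch; the clip_action helper is hypothetical, not part of the released code:

import numpy as np

# Hypothetical post-processing: clamp a predicted action to the training range
def clip_action(action):
    linear_x = float(np.clip(action[0], 0.0, 1.15))    # forward-only speed
    linear_y = float(np.clip(action[1], -1.15, 1.15))  # lateral speed, both directions
    return linear_x, linear_y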

Research Contribution

This work demonstrates the robustness of VLA frameworks by adapting RoboVLMs from robot-manipulator tasks to mobile-robot navigation, reaching practical performance (MAE 0.222).

Citation

@article{mobile_vla_2024,
  title={Mobile VLA: Vision-Language-Action System for Omniwheel Robot Navigation},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}

License

MIT License


Model Performance: MAE 0.222 | Task: Omniwheel Robot Navigation | Framework: RoboVLMs Adapted
