How to use Tomas0413/so100_screw_lid_smolvla with LeRobot:
# Install LeRobot with SmolVLA support.
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details.
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
# Launch fine-tuning on your dataset
python lerobot/scripts/train.py \
  --policy.path=Tomas0413/so100_screw_lid_smolvla \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true
# Run the policy using the record function. Before running, replace:
#   --robot.port and --robot.id with your own port and robot id
#   --robot.cameras with your camera setup
#   --dataset.single_task with the same task description you used in your dataset recording
#   --dataset.repo_id with the dataset name to create on the HF Hub
# (Comments are kept out of the command itself: text after a trailing backslash breaks the line continuation.)
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=Tomas0413/so100_screw_lid_smolvla
SmolVLA SO100 Screw-Lid Model
A Vision-Language-Action (VLA) model fine-tuned on the SO100 Screw-Lid Dataset for robotic manipulation tasks.
Model Description
This model is a SmolVLA variant trained specifically on the SO100 screw-lid manipulation task. It learns to perform the complete sequence: picking up a jar, placing it on a silicone puck, seating the lid with a half-turn, and transporting the assembled jar to a goal location.
- Developed by: Tomas0413
- Model type: Vision-Language-Action (VLA)
- Base architecture: SmolVLA
- Training data: SO100 Screw-Lid Dataset (v0)
- Task domain: Robotic manipulation (screw-lid assembly)
Training Details
Training Data
The model was trained on 51 teleoperated demonstrations from the SO100 Screw-Lid Dataset, featuring:
- Dual camera views (wrist + overhead) at 1280×720 @ 30 FPS
- 6-DOF joint positions, velocities, and gripper states
- Synchronized action sequences for pick-place-assemble-transport tasks
- Total of ~45k training frames
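As a rough sanity check on the figures above, the average episode length follows from the frame count, the episode count, and the frame rate. This is back-of-the-envelope arithmetic, not a dataset statistic: the ~45k total is approximate, and the sketch assumes it counts timesteps rather than per-camera images.

```python
# Approximate figures quoted in this model card.
TOTAL_FRAMES = 45_000  # "~45k training frames" (approximate)
NUM_EPISODES = 51
FPS = 30


def average_episode_stats(total_frames, num_episodes, fps):
    """Return (frames per episode, seconds per episode) as averages."""
    frames_per_episode = total_frames / num_episodes
    seconds_per_episode = frames_per_episode / fps
    return frames_per_episode, seconds_per_episode


frames, seconds = average_episode_stats(TOTAL_FRAMES, NUM_EPISODES, FPS)
total_minutes = TOTAL_FRAMES / FPS / 60
print(f"~{frames:.0f} frames per episode, ~{seconds:.0f} s each, ~{total_minutes:.0f} min total")
```

Under these assumptions an average demonstration runs a little under half a minute, and the whole dataset amounts to roughly 25 minutes of teleoperation.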
Training Procedure
Training regime: Fine-tuned from the SmolVLA base model (lerobot/smolvla_base) on the SO100 screw-lid demonstrations
Intended Uses
Direct Use
- Robotic manipulation: Deploy on SO100 or similar 6-DOF robotic arms for screw-lid assembly tasks
- Research: Study vision-language-action learning for fine manipulation
- Benchmarking: Evaluate VLA performance on multi-step manipulation sequences
Downstream Use
- Transfer learning to related assembly tasks
- Few-shot adaptation to different jar/lid combinations
- Integration into larger robotic task planning systems
Limitations and Bias
- Domain-specific: Trained only on screw-lid assembly with specific objects
- Robot morphology: Optimized for SO100 arm kinematics and gripper
- Environmental constraints: Single lighting condition, fixed camera positions
- Limited generalization: May not transfer well to significantly different manipulation tasks
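Because the model saw only one camera arrangement, it can help to compare a deployment setup against the training setup before running the policy. The sketch below is illustrative only: `CameraSpec`, `find_mismatches`, and the camera names `wrist` and `overhead` are hypothetical and not taken from the dataset or from LeRobot. Only the resolution and frame rate come from this model card. A mismatch is not necessarily fatal, but it is worth knowing about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraSpec:
    width: int
    height: int
    fps: int


# Training setup as described in this model card; the camera names are placeholders.
TRAINING_CAMERAS = {
    "wrist": CameraSpec(1280, 720, 30),
    "overhead": CameraSpec(1280, 720, 30),
}


def find_mismatches(deployed, expected=None):
    """Return human-readable differences between deployed and training cameras."""
    expected = TRAINING_CAMERAS if expected is None else expected
    problems = []
    for name, spec in expected.items():
        if name not in deployed:
            problems.append(f"missing camera: {name}")
        elif deployed[name] != spec:
            problems.append(f"{name}: got {deployed[name]}, trained with {spec}")
    for name in deployed:
        if name not in expected:
            problems.append(f"unexpected camera: {name}")
    return problems


# Example: a single low-resolution wrist camera differs from training in two ways.
issues = find_mismatches({"wrist": CameraSpec(640, 480, 30)})
for issue in issues:
    print(issue)
```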
Usage
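# Example usage with LeRobot
# The import path depends on the installed LeRobot version: releases from mid-2025 use
# lerobot.common.policies.smolvla, later ones use lerobot.policies.smolvla.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy
# Load the trained model from the Hugging Face Hub
policy = SmolVLAPolicy.from_pretrained("Tomas0413/so100_screw_lid_smolvla")
# Run inference on robot observations.
# `observation` is a dict of batched tensors (camera images, joint state) plus the task
# string, with keys matching the features the policy was trained on.
action = policy.select_action(observation)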
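On a real robot, `policy.select_action` is called repeatedly, once per control step. The sketch below shows one way to pace such a loop. It is not LeRobot's implementation (the `lerobot.record` script handles this for you), and `run_control_loop`, `get_observation`, and `send_action` are hypothetical names. The 30 Hz default assumes the control rate matches the dataset's 30 FPS recording rate, and the six-element action assumes one value per joint. Stand-in functions replace the policy and the robot so the sketch runs on its own.

```python
import time


def run_control_loop(select_action, get_observation, send_action, fps=30, max_steps=100):
    """Call the policy once per control tick and forward each action to the robot."""
    period = 1.0 / fps
    for _ in range(max_steps):
        tick_start = time.perf_counter()
        action = select_action(get_observation())
        send_action(action)
        # Sleep off whatever is left of this tick to hold the target rate.
        remaining = period - (time.perf_counter() - tick_start)
        if remaining > 0:
            time.sleep(remaining)


# Stand-ins so the sketch runs without a robot or a model.
sent = []
run_control_loop(
    select_action=lambda obs: [0.0] * 6,  # placeholder for policy.select_action
    get_observation=lambda: {},           # placeholder for reading cameras and joints
    send_action=sent.append,              # placeholder for commanding the arm
    fps=200,
    max_steps=5,
)
print(f"sent {len(sent)} actions")
```

If inference takes longer than one tick, this loop simply runs late rather than skipping steps. Whether that is acceptable depends on the task.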
Training Dataset
This model was trained on the SO100 Screw-Lid Dataset (v0), which contains 51 teleoperated episodes of the complete screw-lid manipulation sequence recorded during the LeRobot Worldwide Hackathon (June 15-16, 2025).
Model Card Contact
Tomas0413
Model tree for Tomas0413/so100_screw_lid_smolvla
Base model
lerobot/smolvla_base