Pre-training dataset format?
#7
by asif00 - opened
Has anyone tried to pre-train it for a new language?
I'm a bit confused about what structure the pre-training datasets should have:
text_QA_dataset: [I'm assuming this is for training the LLM]
TTS_dataset: [This is for training the TTS]
I'm just unsure what their format should be. An example (sample) or dataset link for both types would be awesome! Thanks in advance!
See the GitHub repo for more info. There is also a similar issue there where I go over the format. Happy to go into more detail if anything is unclear (an issue posted on GitHub will probably be looked at sooner): https://github.com/canopyai/Orpheus-TTS
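The actual format is described in the GitHub issue, not in this thread. Until you have checked it, a quick sanity check on your own rows can catch obvious problems. The sketch below is only an illustration: the column names `input_ids`, `attention_mask`, and `labels` are an assumption borrowed from common causal-LM training setups, not something confirmed here, and `check_row` is a hypothetical helper, not part of Orpheus.

```python
# Hypothetical column names: an assumption based on common causal-LM
# training setups, NOT confirmed by this thread or the Orpheus repo.
ASSUMED_COLUMNS = ("input_ids", "attention_mask", "labels")


def check_row(row, columns=ASSUMED_COLUMNS):
    """Return a list of problems found in one pre-tokenized row (empty if none)."""
    problems = []
    for name in columns:
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not all(isinstance(t, int) for t in row[name]):
            problems.append(f"non-integer values in: {name}")
    # All columns that are present should have the same sequence length.
    present = [len(row[name]) for name in columns if name in row]
    if len(set(present)) > 1:
        problems.append("columns have different lengths")
    return problems


# Toy rows with made-up token IDs, for illustration only.
good_row = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "labels": [1, 2, 3]}
bad_row = {"input_ids": [1, 2, 3], "attention_mask": [1, 1]}

print(check_row(good_row))
print(check_row(bad_row))
```

If the format in the GitHub issue uses different column names, pass them through the `columns` argument instead of the assumed defaults.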
amuvarma changed discussion status to closed