---
library_name: transformers
model-index:
- name: Explore_Llama-3.2-1B-Inst
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 57.68
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 8.31
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 4.53
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 1.57
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 1.09
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 8.31
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst
      name: Open LLM Leaderboard
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
---
# Model Card for Explore_Llama-3.2-1B-Inst
<!-- Provide a quick summary of what the model is/does. -->
## Overview
**DeepAutoAI/Explore_Llama-3.2-1B-Inst** is developed by **deepAuto.ai** by learning the distribution of the weights of llama-3.2-1B-instruct.
Our approach leverages the base model's pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
training a latent diffusion model on the pretrained weights. Specifically, this model is based on learning the distribution of transformer layers 16 to 31.
Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
We then sample multiple sets of weights and use the **model-soup averaging technique** to identify the best-performing weights for both datasets.
These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.2-1B-Inst**.
This approach has led to improved performance on previously unseen leaderboard tasks, without any additional task-specific training.
This work is currently in progress.
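The averaging and interpolation steps described in the overview can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a uniform model soup (element-wise mean over sampled weight sets) followed by linear interpolation between the two task-specific soups, and it uses plain Python lists as a stand-in for tensors. The parameter name `norm.weight`, the sample values, and `alpha=0.5` are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a state dict: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def uniform_soup(candidates: List[Weights]) -> Weights:
    """Average each parameter element-wise across all candidate weight sets."""
    if not candidates:
        raise ValueError("need at least one candidate")
    n = len(candidates)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in candidates))]
        for name in candidates[0]
    }


def lerp(a: Weights, b: Weights, alpha: float) -> Weights:
    """Linear interpolation per parameter: (1 - alpha) * a + alpha * b."""
    return {
        name: [(1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
        for name in a
    }


# Hypothetical sampled weights for a single normalization parameter.
winogrande_samples = [{"norm.weight": [1.0, 2.0]}, {"norm.weight": [3.0, 4.0]}]
arc_samples = [{"norm.weight": [0.0, 0.0]}, {"norm.weight": [2.0, 2.0]}]

winogrande_soup = uniform_soup(winogrande_samples)
arc_soup = uniform_soup(arc_samples)

# alpha is an assumed mixing coefficient; the card does not state the value used.
merged = lerp(winogrande_soup, arc_soup, alpha=0.5)
print(merged)
```

In practice the interpolation coefficient would be chosen by evaluating the merged weights on held-out data for both tasks.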
## Model Details
<!-- Provide a longer summary of what this model is. -->
We trained a diffusion model to learn the distribution of a subset of Llama's weights, which lets us generate weights that improve performance.
We generate task-specific weights on winogrande and arc_challenge, then transfer the best model for leaderboard benchmarking.
- **Developed by:** DeepAuto.ai
- **Funded by [optional]:** DeepAuto.ai
- **Shared by [optional]:** DeepAuto.ai
- **Model type:** llama-3.2-1B
- **Language(s) (NLP):** English
- **License:** Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
- **Finetuned from model [optional]:** No fine-tuning
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** Under construction
- **Paper [optional]:** To be announced
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
The direct use case of our work is to improve the performance of existing models and to generate task-specific weights without training.
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
Improving the performance of existing large models with limited compute.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
Fine-tuning and generalization across architectures are out of scope.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
will still fall within the range of what the base model is inherently capable of producing.
## How to Get Started with the Model
This section is a work in progress.
## Training Details
We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
Remarkably, even within the constraints of one-shot learning, our approach consistently produces a wide range of weight variations, each offering
distinct performance characteristics. These generated weights not only open opportunities for weight averaging and model merging but also have the
potential to significantly enhance model performance. Moreover, they enable the creation of task-specific weights, tailored to optimize performance
for specialized applications.
### Training Data
The training data used to produce the current model is the base model's pretrained weights.
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- We selected a set of layers and combined their pretrained weights, then trained a Variational Autoencoder (VAE) to encode these weights into the layer dimension.
- We conditionally trained a diffusion model on this set of weights, allowing individual sampling of layer-specific weights.
- All selected layers were encoded into a 1024-dimensional space. In this model, only the layer normalization weights were sampled.
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
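As a rough illustration of the first step, the sketch below gathers per-layer normalization weights from a checkpoint-like mapping and pairs each vector with its layer index, which could then serve as the conditioning label for layer-specific sampling. This is an assumption about how the data might be prepared, not the authors' pipeline. The parameter names follow the Hugging Face Llama naming scheme (`model.layers.<i>.input_layernorm.weight` and `model.layers.<i>.post_attention_layernorm.weight`); the function name and the toy values are hypothetical, and plain lists stand in for tensors.

```python
import re
from typing import Dict, List, Tuple

# Per-layer RMSNorm weight names as used in Hugging Face Llama checkpoints.
NORM_PATTERN = re.compile(
    r"^model\.layers\.(\d+)\.(input_layernorm|post_attention_layernorm)\.weight$"
)


def collect_norm_weights(
    state_dict: Dict[str, List[float]],
) -> List[Tuple[int, List[float]]]:
    """Return (layer_index, concatenated norm weights) pairs, sorted by layer."""
    per_layer: Dict[int, Dict[str, List[float]]] = {}
    for name, values in state_dict.items():
        match = NORM_PATTERN.match(name)
        if match:
            layer, kind = int(match.group(1)), match.group(2)
            per_layer.setdefault(layer, {})[kind] = values

    samples = []
    for layer in sorted(per_layer):
        parts = per_layer[layer]
        # Fixed ordering so every layer vector has the same layout.
        vector = parts.get("input_layernorm", []) + parts.get(
            "post_attention_layernorm", []
        )
        samples.append((layer, vector))
    return samples


# Toy checkpoint: two layers plus one non-normalization parameter that is skipped.
toy_state_dict = {
    "model.layers.1.post_attention_layernorm.weight": [4.0],
    "model.layers.0.input_layernorm.weight": [1.0],
    "model.layers.0.post_attention_layernorm.weight": [2.0],
    "model.layers.1.input_layernorm.weight": [3.0],
    "model.layers.0.self_attn.q_proj.weight": [9.0],
}

samples = collect_norm_weights(toy_state_dict)
print(samples)
```

Each `(layer_index, vector)` pair would be one training example for the VAE, with the layer index available as the condition for the diffusion model.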
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
<!-- This should link to a Dataset Card if possible. -->
We test our method on Winogrande, ARC-Challenge, and HellaSwag.
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA A100 40GB
- **Hours used:** 4 hours to train the VAE and 4 hours for the diffusion process
- **Compute Region:** South Korea
- **Carbon Emitted:** 0.96 kg
## Technical Specifications [optional]
### Model Architecture and Objective
We used latent diffusion for weight generation, with Llama-3.2-1B as the target architecture.
The primary objective of this weight generation process was to demonstrate that by learning only the distribution
of a few layers' weights (normalization layers in this case) in a 1-billion-parameter model, it is possible to significantly enhance the
model's capabilities. Notably, this is achieved using a fraction of the computational resources and without the
need for fine-tuning, showcasing the efficiency and potential of this approach.
### Compute Infrastructure
NVIDIA A100 cluster
#### Hardware
A single NVIDIA A100
#### Software
The model was tested using lm-evaluation-harness version 0.4.3.
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.2-1B-Inst)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |13.58|
|IFEval (0-Shot)    |57.68|
|BBH (3-Shot)       | 8.31|
|MATH Lvl 5 (4-Shot)| 4.53|
|GPQA (0-shot)      | 1.57|
|MuSR (0-shot)      | 1.09|
|MMLU-PRO (5-shot)  | 8.31|
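
The reported average appears to be the unweighted mean of the six benchmark scores in the table. The short check below reproduces it under that assumption; the dictionary simply restates the table values.

```python
# Benchmark scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 57.68,
    "BBH (3-Shot)": 8.31,
    "MATH Lvl 5 (4-Shot)": 4.53,
    "GPQA (0-shot)": 1.57,
    "MuSR (0-shot)": 1.09,
    "MMLU-PRO (5-shot)": 8.31,
}

# Unweighted mean, rounded to two decimals as in the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```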