LLaMA 2 7B Chat Fine-Tuned

This is a fine-tuned version of the LLaMA 2 7B chat model, adapted to produce better conversational responses in chat-style text generation.

Model Details

  • Model Name: LLaMA 2 7B Chat Fine-Tuned
  • Base Model: NousResearch/Llama-2-7b-chat-hf
  • Architecture: LlamaForCausalLM
  • Special Token IDs:
    • pad_token_id: 0
    • bos_token_id: 1
    • eos_token_id: 2
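As a sketch of how these special-token IDs come into play, the snippet below (plain Python, no model download; the example token IDs other than the special ones are made up for illustration) builds the attention mask the model expects, masking out padding positions:

```python
# Special-token IDs from the tokenizer configuration above.
PAD_TOKEN_ID = 0
BOS_TOKEN_ID = 1
EOS_TOKEN_ID = 2

def attention_mask(token_ids):
    """Return 1 for real tokens and 0 for padding positions."""
    return [0 if t == PAD_TOKEN_ID else 1 for t in token_ids]

# A BOS-prefixed sequence, right-padded to length 6.
ids = [BOS_TOKEN_ID, 306, 626, 263, EOS_TOKEN_ID, PAD_TOKEN_ID]
print(attention_mask(ids))  # -> [1, 1, 1, 1, 1, 0]
```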

Supported Tasks

This model supports the following task:

  • Text Generation

Configuration

  • Model Configuration (config.json):

    • Hidden Size: 4096
    • Number of Layers: 32
    • Number of Attention Heads: 32
    • Vocab Size: 32000
    • Token Type IDs: Not used
  • Generation Configuration (generation_config.json):

    • Sampling Temperature: 0.9
    • Top-p (nucleus sampling): 0.6
    • Pad Token ID: 32000
    • Bos Token ID: 1
    • Eos Token ID: 2
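A quick sanity check on the architecture numbers above: with 32 attention heads over a hidden size of 4096, each head operates on a 128-dimensional slice, and the input embedding table is vocab_size × hidden_size. A minimal sketch in plain Python (the per-head split is the standard LlamaForCausalLM layout):

```python
# Values from config.json above.
hidden_size = 4096
num_layers = 32
num_heads = 32
vocab_size = 32000

# The hidden state is split evenly across attention heads.
assert hidden_size % num_heads == 0
head_dim = hidden_size // num_heads
print(head_dim)  # -> 128

# Input embedding parameter count: one hidden_size vector per vocab entry.
print(vocab_size * hidden_size)  # -> 131072000
```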

Usage

To use this model for text generation via the Hugging Face Inference API, use the following Python snippet:

```python
import requests

api_url = "https://api-inference.huggingface.co/models/rahul77/llama-2-7b-chat-finetune"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",  # Replace with your Hugging Face API token
    "Content-Type": "application/json",
}

data = {
    "inputs": "What is a large language model?",
    "parameters": {
        "max_length": 50  # Upper bound on the generated sequence length
    },
}

response = requests.post(api_url, headers=headers, json=data)

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}")
    print(response.json())
```
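On success, the Inference API's text-generation endpoint typically returns a JSON list of objects carrying a `generated_text` field. A small sketch of pulling the text out of such a payload (the sample below is illustrative, not a real API response):

```python
# Illustrative payload shaped like a typical text-generation response.
sample_response = [
    {"generated_text": "What is a large language model? A large language model is ..."}
]

def extract_text(payload):
    """Return the generated text from a text-generation API payload."""
    if isinstance(payload, list) and payload and "generated_text" in payload[0]:
        return payload[0]["generated_text"]
    raise ValueError(f"Unexpected response shape: {payload!r}")

print(extract_text(sample_response))
```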


License

  • apache-2.0