Instructions for using maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 with libraries and local apps.

Libraries

How to use maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="maum-ai/Llama-3-MAAL-8B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("maum-ai/Llama-3-MAAL-8B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("maum-ai/Llama-3-MAAL-8B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1"
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/maum-ai/Llama-3-MAAL-8B-Instruct-v0.1

SGLang

How to use maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1" \
        --host 0.0.0.0 \
        --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 with Docker Model Runner:
```
docker model run hf.co/maum-ai/Llama-3-MAAL-8B-Instruct-v0.1
```

I've been getting good use out of the MAAL-8B model. I'd like to try training it; how should I structure the data format?

by Kim3 - opened Jun 20, 2024

Discussion

Kim3

Jun 20, 2024

I'd like to train the model further. How should I structure the data format?
Would you be able to share the training data template?

lastdefiance20

maum-ai org Jun 20, 2024

Hello, the training data template uses the same format as meta-llama/Meta-Llama-3-8B-Instruct.

https://huggingface.co/blog/llama3#how-to-prompt-llama-3
As shown at the link above, the format is:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|>

Fill in the system_prompt, user_msg_1, and model_answer_1 parts with the data you want to train on. (If there is no system prompt, remove the system block.)

For example, assuming no system prompt:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is 1 plus 1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

1+1 = 2.<|eot_id|>
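
The template above can be produced programmatically. The following is a minimal sketch, not the maum-ai training code: the function name `format_llama3_example` and the sample messages are hypothetical, and stripping surrounding whitespace from each message is an assumption about how the data should be cleaned.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"
START, END, EOT = "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"


def format_llama3_example(messages: List[Dict[str, str]]) -> str:
    """Render chat messages into the Llama 3 prompt layout quoted above."""
    parts = [BOS]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        content = message["content"].strip()  # assumption: trim stray whitespace
        parts.append(f"{START}{message['role']}{END}\n\n{content}{EOT}")
    return "".join(parts)


# Hypothetical single-turn example with no system prompt.
example = format_llama3_example([
    {"role": "user", "content": "What is 1 plus 1?"},
    {"role": "assistant", "content": "1 + 1 = 2."},
])
print(example)
```

To include a system prompt, put a `{"role": "system", ...}` message first in the list. Many tokenizers prepend `<|begin_of_text|>` on their own, so when tokenizing a string built this way it is worth checking that the token does not appear twice.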

Kim3

Jun 24, 2024

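Thank you for the reply. I have a few follow-up questions.

To start, I'm planning to use LoRA to teach the model knowledge in one specific domain.

If I build the training data in Q&A form like the template above, will the model actually learn the knowledge?
How large a dataset should I typically build?
What are reasonable values for epochs and batch_size?

I'm learning as I go, so any tips would be appreciated.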

lastdefiance20

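maum-ai org Jun 25, 2024

It depends somewhat on how you define "learning knowledge," but basically, if you use the template above to build training data in Q&A form, instruction form, and so on, the knowledge is learned.
As a rule, the more data the better. It varies by domain, but aim for 10,000 examples or more, and no fewer than 1,000 at the very least. With less than that, I recommend retrieving the knowledge with an approach such as RAG rather than training.
Epochs and batch_size are a matter of hyperparameter tuning, so it is hard to define what counts as "appropriate." As an example, though, for fine-tuning you could run about 3 epochs (with a batch size between 32 and 256, reached through gradient accumulation), compare the performance of the models after epochs 1, 2, and 3, and keep adjusting toward the setup that performs best.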
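
The batch-size arithmetic behind gradient accumulation can be worked out as follows. This is a rough sketch under stated assumptions: the function names, the 10,000-example dataset, the micro-batch size of 4, and the single GPU are all hypothetical, and it counts a partial final batch as one step, which some training loops handle differently.

```python
import math


def accumulation_steps(target_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Micro-batches to accumulate so one optimizer step covers about target_batch examples."""
    return max(1, math.ceil(target_batch / (per_device_batch * num_devices)))


def optimizer_steps(num_examples: int, effective_batch: int, epochs: int = 3) -> int:
    """Total optimizer steps, counting a partial final batch in each epoch."""
    return math.ceil(num_examples / effective_batch) * epochs


# Hypothetical setup: 10,000 examples, 4 examples per micro-batch, one GPU,
# and a target batch size of 64 (inside the 32 to 256 range mentioned above).
steps = accumulation_steps(target_batch=64, per_device_batch=4)
effective = 4 * steps
print(steps, effective, optimizer_steps(10_000, effective))  # 16 64 471
```

Saving a checkpoint at the end of each epoch (every 157 optimizer steps in this example) gives the three models to compare.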

Kim3

Jul 8, 2024

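Thank you for the kind reply.

I'm trying to pass Markdown tables to MAAL-8B in a RAG setup so it can pull information from them, but it doesn't seem to understand Markdown or Markdown tables. Is there a data format that MAAL-8B understands well (e.g. JSON, YAML, ...)?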
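
The thread does not establish which format MAAL-8B handles best. One way to experiment is to convert the same table into JSON records and compare the answers. The sketch below assumes a simple pipe-delimited table with a header row, a separator row, and no escaped pipes inside cells; the function name and the sample table are hypothetical.

```python
import json
from typing import Dict, List


def markdown_table_to_records(table: str) -> List[Dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts."""
    rows = []
    for line in table.strip().splitlines():
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(cells)
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator row
    return [dict(zip(header, row)) for row in body]


# Hypothetical table standing in for a retrieved RAG chunk.
table = """
| product | price |
|---------|-------|
| apple   | 1200  |
| pear    | 2500  |
"""
records = markdown_table_to_records(table)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

All cell values stay as strings. The JSON text can then be placed in the prompt where the Markdown table was.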
