Instructions to use zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial")
model = AutoModelForImageTextToText.from_pretrained("zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial

SGLang

How to use zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial with Docker Model Runner:
```
docker model run hf.co/zesquirrelnator/idefics2-8b-docvqa-finetuned-tutorial
```

idefics2-8b-docvqa-finetuned-tutorial / handler.py

zesquirrelnator

Update handler.py

5d906c8 verified about 2 years ago

raw

history blame contribute delete

1.77 kB

	from transformers import AutoModelForCausalLM, AutoTokenizer
	from PIL import Image
	import torch
	from io import BytesIO
	import base64

	# Initialize the model and tokenizer
	model_id = "HuggingFaceM4/idefics2-8b"
	model = AutoModelForCausalLM.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# Check if CUDA (GPU support) is available and then set the device to GPU or CPU
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)

	def preprocess_image(encoded_image):
	"""Decode and preprocess the input image."""
	decoded_image = base64.b64decode(encoded_image)
	img = Image.open(BytesIO(decoded_image)).convert("RGB")
	return img

	def handler(event, context):
	"""Handle the incoming request."""
	try:
	# Extract the base64-encoded image and question from the event
	input_image = event['body']['image']
	question = event['body'].get('question', "What is this image about?")

	# Preprocess the image
	img = preprocess_image(input_image)

	# Perform inference
	enc_image = model.encode_image(img).to(device)
	answer = model.answer_question(enc_image, question, tokenizer)

	# If the output is a tensor, move it back to CPU and convert to list
	if isinstance(answer, torch.Tensor):
	answer = answer.cpu().numpy().tolist()

	# Create the response
	response = {
	"statusCode": 200,
	"body": {
	"answer": answer
	}
	}
	return response
	except Exception as e:
	# Handle any errors
	response = {
	"statusCode": 500,
	"body": {
	"error": str(e)
	}
	}
	return response