
---
library_name: transformers
datasets:
  - allenai/qasc
base_model:
  - microsoft/deberta-v3-base
pipeline_tag: text-classification
---

DRM-DeBERTa-v3-Base-qasc

This model is a fine-tuned version of microsoft/deberta-v3-base trained on the QASC dataset.

This model is part of the artifact release for the research paper Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking.

Paper: https://arxiv.org/abs/2505.23117
Repository: https://github.com/yophis/decom-renorm-merge

Uses

The model can be loaded as follows:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "yophis/DRM-DeBERTa-v3-Base-qasc"

# Load the tokenizer and, as in the authors' snippet, pad with the EOS token
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Load the model; device_map="auto" requires the accelerate package
model = AutoModelForSequenceClassification.from_pretrained(model_id, device_map="auto")
model.config.pad_token_id = model.config.eos_token_id

# Input template: the placeholders are QASC dataset fields of the same name
input_text = "Question: {formatted_question} Context: {combinedfact}"
```
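The card gives only the template string. Below is a minimal sketch of filling it from a QASC record and running batched inference, assuming `model` and `tokenizer` are loaded as in the snippet above. The helper names `format_input` and `predict_label_ids`, the `max_length=512` default, and the example record are hypothetical. The card does not document how class ids map to answer choices, so check `model.config.id2label` before interpreting predictions.

```python
from typing import Mapping, Sequence

# Template copied from the card; the placeholders are QASC column names.
INPUT_TEMPLATE = "Question: {formatted_question} Context: {combinedfact}"


def format_input(example: Mapping[str, str]) -> str:
    """Fill the card's input template from a single QASC record."""
    return INPUT_TEMPLATE.format(
        formatted_question=example["formatted_question"],
        combinedfact=example["combinedfact"],
    )


def predict_label_ids(texts: Sequence[str], model, tokenizer, max_length: int = 512) -> list:
    """Return the argmax class id for each text (label meaning is not documented)."""
    import torch  # imported lazily so format_input works without torch installed

    enc = tokenizer(
        list(texts),
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    # Move every input tensor to the device holding the model's first parameters
    enc = {name: tensor.to(model.device) for name, tensor in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits
    return logits.argmax(dim=-1).tolist()


if __name__ == "__main__":
    # Made-up record for illustration only; real records come from allenai/qasc
    example = {
        "formatted_question": "What do plants need for photosynthesis? (A) sunlight (B) sand",
        "combinedfact": "Plants need sunlight for photosynthesis.",
    }
    print(format_input(example))
```

For a batch of records, `predict_label_ids([format_input(ex) for ex in records], model, tokenizer)` returns one class id per record.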

Training Details

Training Data

We fine-tune the model on the QASC dataset (allenai/qasc).

Training Hyperparameters

Learning Rate: 1e-4
Weight Decay: 0.0
Training Steps: 50000
Batch Size: 1024
Precision: bf16 mixed precision
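The card does not say which training framework was used. Assuming the Hugging Face `Trainer`, the listed values could map onto `TrainingArguments` keyword arguments as sketched below. The helper `training_kwargs`, the reading of 1024 as the global (effective) batch size, and its split across devices and gradient-accumulation steps are all hypothetical. Settings the card omits, such as the optimizer, learning-rate schedule, and warmup, are left at library defaults.

```python
# Global batch size from the card; treating it as the effective batch is an assumption.
GLOBAL_BATCH_SIZE = 1024


def training_kwargs(num_devices: int = 1, grad_accum_steps: int = 1) -> dict:
    """Build hypothetical keyword arguments for transformers.TrainingArguments."""
    examples_per_device_step = num_devices * grad_accum_steps
    if GLOBAL_BATCH_SIZE % examples_per_device_step != 0:
        raise ValueError("num_devices * grad_accum_steps must divide the global batch size")
    return {
        # Values copied from the card's Training Hyperparameters list
        "learning_rate": 1e-4,
        "weight_decay": 0.0,
        "max_steps": 50_000,
        "bf16": True,  # bf16 mixed precision
        # Split of the global batch; not specified in the card
        "per_device_train_batch_size": GLOBAL_BATCH_SIZE // examples_per_device_step,
        "gradient_accumulation_steps": grad_accum_steps,
    }
```

For example, `TrainingArguments(output_dir="out", **training_kwargs(num_devices=4, grad_accum_steps=2))` would use 128 examples per device per forward pass, which still adds up to 1024 examples per optimizer step.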

Citation

If you find this model useful, please consider citing our paper:

@article{chaichana2025decom,
  title={Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking},
  author={Chaichana, Yuatyong and Trachu, Thanapat and Limkonchotiwat, Peerat and Preechakul, Konpat and Khandhawit, Tirasan and Chuangsuwanich, Ekapol},
  journal={arXiv preprint arXiv:2505.23117},
  year={2025}
}

Please also cite QASC and the original DeBERTa model.