Argonne-2.5-ctx13568

Argonne-2.5-ctx13568 is a long-context continuation of PursuitOfDataScience/Argonne2.5-base, extending the trained context window to 13,568 tokens.

Model architecture

| Component         | Specification                |
|-------------------|------------------------------|
| Parameters        | 1,273,807,360                |
| Layers            | 28 transformer blocks        |
| Hidden size       | 1,792                        |
| Attention heads   | 14 query / 7 key-value (GQA) |
| Context length    | 13,568 tokens                |
| Vocabulary size   | 151,669                      |
| Position encoding | RoPE (θ = 10,000)            |
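
These values can be sanity-checked against the published config. A minimal sketch, assuming the custom config exposes the usual Hugging Face attribute names (the model ships its own code, so the names could differ):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "PursuitOfDataScience/Argonne-2.5-ctx13568", trust_remote_code=True
)
# Attribute names below are assumptions based on common HF conventions.
print(config.num_hidden_layers)        # expected: 28
print(config.hidden_size)              # expected: 1792
print(config.num_attention_heads)      # expected: 14 query heads
print(config.num_key_value_heads)      # expected: 7 KV heads (GQA)
print(config.max_position_embeddings)  # expected: 13568
print(config.rope_theta)               # expected: 10000.0
```

With 14 query heads over a 1,792-wide hidden state, each head is 128-dimensional; the 7 KV heads mean every pair of query heads shares one key-value head.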

Training details

| Item                        | Value                                |
|-----------------------------|--------------------------------------|
| Start checkpoint            | PursuitOfDataScience/Argonne2.5-base |
| Long-context tokens trained | 16.0B                                |
| Final cumulative tokens     | 92,050,960,384                       |
| Batch size per GPU          | 4                                    |
| Gradient accumulation       | 1                                    |
| Effective batch             | 108,544 tokens                       |
| Precision                   | bf16 autocast                        |
| Checkpoint dtype            | bfloat16                             |
| Weight format               | 3 sharded safetensors                |
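
The effective batch is consistent with per-GPU batch × gradient accumulation × GPU count × context length. The GPU count is not stated in the table, so the value of 2 below is back-solved from the arithmetic rather than taken from the source:

```python
# Effective batch size in tokens per optimizer step.
batch_per_gpu = 4
grad_accum = 1
context_len = 13_568
num_gpus = 2  # assumption: 108,544 / (4 * 1 * 13,568) = 2

effective_batch_tokens = batch_per_gpu * grad_accum * num_gpus * context_len
print(effective_batch_tokens)  # 108544
```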

Long-context data

  • Dataset source: allenai/dolma3_longmino_pool
  • Selected subset: 16k-32k LongMino pool (a length-filtering sketch follows this list)
  • Kept documents: 4,595,978
  • Kept tokens (Qwen tokenizer): 105,223,923,033
  • This release covers the long-context stage only; no additional short-context stage was run.
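
For orientation, membership in a 16k-32k pool comes down to counting tokens per document. The sketch below is illustrative only, not the release's actual preprocessing: the `text` field name, the power-of-two thresholds, and the split name are all assumptions.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "PursuitOfDataScience/Argonne-2.5-ctx13568", trust_remote_code=True
)
stream = load_dataset(
    "allenai/dolma3_longmino_pool", split="train", streaming=True
)

def in_16k_32k_pool(doc, lo=16_384, hi=32_768):
    # "text" field and the exact bounds are assumptions for illustration.
    n_tokens = len(tokenizer(doc["text"]).input_ids)
    return lo <= n_tokens < hi

kept_docs = (doc for doc in stream if in_16k_32k_pool(doc))
print(next(kept_docs)["text"][:200])
```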

Tokenizer

This model uses the Qwen3 tokenizer family via the Qwen2Tokenizer compatibility class.
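
In practice that means AutoTokenizer resolves everything; a quick check:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "PursuitOfDataScience/Argonne-2.5-ctx13568", trust_remote_code=True
)
print(type(tokenizer).__name__)  # a Qwen2-compatible tokenizer class
print(len(tokenizer))            # full vocab incl. special tokens; should line up with 151,669
```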

Source code

The release was built from the main branch of the GitHub repository: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main

Key scripts:

Loss curve

[Figure: midtraining loss curve]

Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/Argonne-2.5-ctx13568"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,  # matches the bf16 checkpoint
)

prompt = "Write a short paragraph about scientific computing at Argonne National Laboratory."
# Tokenize and move both input_ids and attention_mask to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # clearer than computing max_length from the prompt length
    temperature=0.8,
    top_p=0.9,
    top_k=50,
    do_sample=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Usage notes

  • Load with trust_remote_code=True.
  • max_position_embeddings is 13,568; truncate longer inputs (see the sketch below).
  • Weights are published as three bf16 safetensors shards.
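
Since positions beyond 13,568 are outside the trained range, it is safest to truncate long inputs at tokenization time. A minimal guard (keeping the start of the document is just one reasonable truncation policy):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "PursuitOfDataScience/Argonne-2.5-ctx13568", trust_remote_code=True
)

long_text = "lorem ipsum " * 20_000  # stand-in for a very long document
inputs = tokenizer(
    long_text,
    return_tensors="pt",
    truncation=True,
    max_length=13_568,  # keep within max_position_embeddings
)
print(inputs["input_ids"].shape)  # sequence dimension capped at 13,568
```

Leave headroom below 13,568 if you also need room for generated tokens.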

Citation

```bibtex
@misc{argonne25ctx13568,
  author    = {PursuitOfDataScience},
  title     = {Argonne-2.5-ctx13568},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/PursuitOfDataScience/Argonne-2.5-ctx13568}
}
```