---
license: other
license_name: 8f-ai-license-v1.0
license_link: https://huggingface.co/8Fai/license
tags:
  - time-series-forecasting
  - pytorch
  - gqa
  - rope
  - swiglu
  - revin
  - patch-transformer
language:
  - en
library_name: torch
---

Chorous1

Chorous1 is a suite of three patch-based transformer models for multivariate time-series forecasting, designed to deliver state-of-the-art accuracy on real-world benchmark data. Each model combines reversible instance normalization (RevIN), masked-autoencoder (MAE)-style patch masking during training, and a Flatten Head that projects the encoded patches onto the forecast horizon.




Model Variants

| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
|---|---|---|---|---|
| chorous1-100m | ~100M | 768 | 12 | 12 / 4 |
| chorous1-50m | ~50M | 512 | 16 | 8 / 2 |
| chorous1-27m | ~27M | 384 | 16 | 6 / 2 |
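The head counts above give every variant a 64-dimensional attention head (768/12 = 512/8 = 384/6 = 64). Each key/value head is shared by 3, 4 and 3 query heads, respectively. The following minimal PyTorch sketch shows grouped-query self-attention with rotary position embeddings (RoPE, θ = 500,000 as listed under Architecture), instantiated with each variant's sizes. It relies on several assumptions: bias-free projections, the "rotate-half" RoPE convention, and bidirectional (non-causal) attention over patch tokens. The class and function names are hypothetical and are not taken from the Chorous1 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rope_angles(seq_len: int, head_dim: int, theta: float = 500_000.0, device=None):
    """cos/sin tables of shape [seq_len, head_dim // 2] for rotary embeddings."""
    inv_freq = 1.0 / theta ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim)
    angles = torch.outer(torch.arange(seq_len, device=device).float(), inv_freq)
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate-half RoPE on x of shape [B, H, N, D]; each rotation preserves vector norms."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)


class GroupedQueryAttention(nn.Module):
    """Hypothetical GQA block: n_q query heads share n_kv key/value heads."""

    def __init__(self, hidden: int, n_q: int, n_kv: int, theta: float = 500_000.0):
        super().__init__()
        assert hidden % n_q == 0 and n_q % n_kv == 0
        self.n_q, self.n_kv, self.theta = n_q, n_kv, theta
        self.head_dim = hidden // n_q
        self.q_proj = nn.Linear(hidden, n_q * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden, n_kv * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden, n_kv * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_q * self.head_dim, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape  # x: [batch, patches, hidden]
        q = self.q_proj(x).view(b, n, self.n_q, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, n, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, n, self.n_kv, self.head_dim).transpose(1, 2)
        cos, sin = rope_angles(n, self.head_dim, self.theta, x.device)
        cos, sin = cos.to(x.dtype), sin.to(x.dtype)
        q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        # Each K/V head serves n_q // n_kv consecutive query heads
        k = k.repeat_interleave(self.n_q // self.n_kv, dim=1)
        v = v.repeat_interleave(self.n_q // self.n_kv, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)  # no causal mask (assumption)
        return self.o_proj(out.transpose(1, 2).reshape(b, n, -1))


# (hidden size, query heads, KV heads) from the Model Variants table
VARIANTS = {"chorous1-100m": (768, 12, 4), "chorous1-50m": (512, 8, 2), "chorous1-27m": (384, 6, 2)}

for name, (hidden, n_q, n_kv) in VARIANTS.items():
    attn = GroupedQueryAttention(hidden, n_q, n_kv)
    y = attn(torch.randn(2, 32, hidden))  # 32 patch tokens
    print(name, attn.head_dim, n_q // n_kv, tuple(y.shape))
```

Sharing K/V heads shrinks the key and value projections by the group factor (4x for chorous1-50m) while keeping the full set of query heads.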

Architecture

| Component | Specification |
|---|---|
| Context Length | 512 steps |
| Forecast Horizon | 96 steps |
| Patch Size | 16 (non-overlapping) |
| Number of Patches | 32 |
| FFN Multiplier | 2.667× |
| Activation | SwiGLU |
| Positional Encoding | RoPE (θ = 500,000) |
| Normalization | RMSNorm |
| Masking Ratio | 25% (training only) |
| Loss Function | Huber loss + mean absolute error (MAE) |
| Precision | bfloat16 |
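The patching, masking and loss rows above can be illustrated with the following sketch. It makes three assumptions that the card does not confirm. First, masked patches are zero-filled; MAE proper drops masked tokens from the encoder, and other models use a learnable mask token. Second, the Huber δ is 1.0. Third, the Huber and MAE terms are summed with equal weight. All names are hypothetical.

```python
import torch
import torch.nn.functional as F

PATCH_SIZE = 16    # from the Architecture table
MASK_RATIO = 0.25  # applied during training only


def patchify(x: torch.Tensor, patch_size: int = PATCH_SIZE) -> torch.Tensor:
    """Split [B, C, T] into non-overlapping patches: [B, C, T // patch_size, patch_size]."""
    if x.shape[-1] % patch_size != 0:
        raise ValueError(f"context length {x.shape[-1]} is not a multiple of {patch_size}")
    return x.unfold(-1, patch_size, patch_size)  # step == size, so windows do not overlap


def random_patch_mask(batch: int, channels: int, num_patches: int,
                      ratio: float = MASK_RATIO) -> torch.Tensor:
    """Boolean [B, C, N] mask with exactly round(ratio * N) masked patches per series."""
    num_masked = int(round(ratio * num_patches))
    # Ranking uniform noise yields an independent random permutation per series
    ranks = torch.rand(batch, channels, num_patches).argsort(dim=-1).argsort(dim=-1)
    return ranks < num_masked


def forecast_loss(pred: torch.Tensor, target: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    """Assumed combination: Huber loss plus mean absolute error, equally weighted."""
    return F.huber_loss(pred, target, delta=delta) + F.l1_loss(pred, target)


x = torch.randn(4, 7, 512)
patches = patchify(x)                             # [4, 7, 32, 16]
mask = random_patch_mask(4, 7, patches.shape[2])  # 8 of 32 patches per series
masked = patches.masked_fill(mask.unsqueeze(-1), 0.0)
```

With a 512-step context and 16-step patches, the 25% ratio masks 8 of the 32 patches in every series.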

How It Works

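Stage 1: RevIN Normalization. Before encoding, a reversible instance normalization (RevIN) layer removes each input series' mean and scales it by its standard deviation, both computed over the context window. After the forecast is produced, the same statistics restore the original level and scale. This mitigates the distribution shift between training data and the series seen in real-world deployments.

Stage 2: Neural Encoding. The normalized series is split into 16-step patches, and the transformer encoder processes the resulting patch tokens using RoPE and grouped-query attention (GQA) to capture long-range temporal dependencies and periodic structure. The Flatten Head then maps the encoded patches to the 96-step forecast, which RevIN de-normalizes.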
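The RevIN step can be sketched in PyTorch as follows. This sketch assumes per-series statistics computed over the time axis of a [Batch, Channels, Time] input, plus a learnable per-channel affine transform. The class and method names are hypothetical and are not Chorous1's actual module.

```python
import torch
import torch.nn as nn


class RevIN(nn.Module):
    """Reversible instance normalization for [Batch, Channels, Time] tensors (sketch)."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learnable per-channel affine transform applied after standardization
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        # Per-sample, per-channel statistics over the context window, kept as constants
        self.mean = x.mean(dim=-1, keepdim=True).detach()
        self.std = torch.sqrt(x.var(dim=-1, keepdim=True, unbiased=False) + self.eps).detach()
        x = (x - self.mean) / self.std
        return x * self.weight[:, None] + self.bias[:, None]

    def denormalize(self, y: torch.Tensor) -> torch.Tensor:
        # Undo the affine transform, then restore the stored level and scale
        y = (y - self.bias[:, None]) / (self.weight[:, None] + self.eps ** 2)
        return y * self.std + self.mean


revin = RevIN(num_channels=7)
x = torch.randn(2, 7, 512) * 10 + 50
z = revin.normalize(x)           # roughly zero mean, unit variance per series
restored = revin.denormalize(z)  # back on the input's level and scale
```

The stored statistics have shape [Batch, Channels, 1]. As a result, `denormalize` applies equally to a 96-step forecast and to the 512-step input. The per-channel `weight` and `bias` fix the number of channels the layer accepts.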


Quickstart

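```python
import torch
from safetensors.torch import load_file

# `model` must already be an instance of the Chorous1 architecture for the
# chosen variant; its class definition is not included in this model card.
# Replace "100m" with "50m" or "27m" as needed (the model must match).
weights = load_file("./chorous_checkpoint/100m/model.safetensors")
model.load_state_dict(weights)
model.eval()

# Input shape: [Batch, Channels, Time]; Time is the 512-step context (32 patches of 16)
x = torch.randn(1, 7, 512)

with torch.no_grad():
    forecast = model(x)  # Output shape: [1, 7, 96]
```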

Performance

| Metric | chorous1-100m | chorous1-50m | chorous1-27m |
|---|---|---|---|
| Weights Size | ~200 MB | ~110 MB | ~65 MB |
| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |

Limitations

  • Fixed Forecast Horizon: Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
  • Channel Count Constraint: The RevIN layer's per-channel parameters are sized to the maximum channel count in the training suite, so inputs with more channels than that are not supported out of the box.
  • Patch Alignment Requirement: The input context length must be an exact multiple of the patch size (16). The released checkpoints are configured for a 512-step context (32 patches).
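When the available history is not exactly 512 steps, one option is to trim or pad it before calling the model. The hypothetical helper below keeps the most recent 512 steps of a longer series. For a shorter series, it left-pads by repeating the earliest observed value. This padding strategy is an assumption; the card does not prescribe one.

```python
import torch
import torch.nn.functional as F

CONTEXT_LEN = 512  # context length from the Architecture table
PATCH_SIZE = 16


def prepare_context(series: torch.Tensor, context_len: int = CONTEXT_LEN) -> torch.Tensor:
    """Trim or left-pad a [Batch, Channels, Time] float tensor to exactly context_len steps."""
    if context_len % PATCH_SIZE != 0:
        raise ValueError("context_len must be a multiple of the patch size")
    t = series.shape[-1]
    if t >= context_len:
        return series[..., -context_len:]  # keep the most recent observations
    # Assumption: replicate the earliest value to fill the missing history
    return F.pad(series, (context_len - t, 0), mode="replicate")


long_x = prepare_context(torch.randn(1, 7, 700))   # trimmed to [1, 7, 512]
short_x = prepare_context(torch.randn(1, 7, 300))  # padded to [1, 7, 512]
```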

License

Chorous1 is released under the 8f-ai-license-v1.0. Please review the full terms before use in production or commercial applications.