---
license: other
license_name: 8f-ai-license-v1.0
license_link: https://huggingface.co/8Fai/license
tags:
- time-series-forecasting
- pytorch
- gqa
- rope
- swiglu
- revin
- patch-transformer
language:
- en
library_name: torch
---

# Chorous1

> **Chorous1** is a suite of three high-performance, patch-based transformer models for multivariate time-series forecasting. Combining RevIN, masked-autoencoder (MAE)-style patch masking, and a Flatten Head architecture, Chorous1 delivers state-of-the-art accuracy on real-world benchmark data.

---

## Table of Contents

- [Model Variants](#model-variants)
- [Architecture](#architecture)
- [Quickstart](#quickstart)
- [Performance](#performance)
- [Limitations](#limitations)
- [License](#license)

---

## Model Variants

| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
|---|---|---|---|---|
| `chorous1-100m` | ~100M | 768 | 12 | 12 / 4 |
| `chorous1-50m` | ~50M | 512 | 16 | 8 / 2 |
| `chorous1-27m` | ~27M | 384 | 16 | 6 / 2 |

---

## Architecture

| Component | Specification |
|---|---|
| Context Length | 512 steps |
| Forecast Horizon | 96 steps |
| Patch Size | 16 (non-overlapping) |
| Number of Patches | 32 (512 ÷ 16) |
| FFN Multiplier | 2.667× (≈ 8/3) |
| Activation | SwiGLU |
| Positional Encoding | RoPE (θ = 500,000) |
| Normalization | RMSNorm |
| Masking Ratio | 25% (training only) |
| Loss Function | Huber loss + mean absolute error (MAE) |
| Precision | bfloat16 |

### How It Works

**Stage 1: RevIN Normalization.** A reversible instance normalization (RevIN) layer subtracts each channel's mean and divides by its standard deviation, both computed over the input window, before the data reaches the encoder. The saved statistics are used to restore the original scale on the output, which mitigates the distribution shift between training data and real-world inputs.

**Stage 2: Neural Encoding.** The normalized series is split into 32 non-overlapping patches of 16 steps. The transformer encoder processes these patches using RoPE and grouped-query attention (GQA) to capture long-range temporal dependencies and periodic structure. A Flatten Head then maps the encoded patches to the 96-step forecast, which RevIN de-normalizes back to the input's original scale.
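The normalize, patch, forecast, and de-normalize flow described above can be sketched in plain Python for a single channel. This is a minimal illustration under stated assumptions, not the Chorous1 implementation: the RevIN affine parameters `gamma`/`beta`, the `eps` value, and all function names are hypothetical, and `last_patch_mean` is a placeholder that stands in for the transformer encoder and Flatten Head.

```python
import math
from typing import Callable, List, Sequence, Tuple

CONTEXT_LENGTH = 512  # from the Architecture table
PATCH_SIZE = 16       # non-overlapping patches: 512 / 16 = 32 patches
HORIZON = 96          # forecast horizon


def revin_normalize(x: Sequence[float], gamma: float = 1.0, beta: float = 0.0,
                    eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Normalize one channel with its own mean/std, then apply a (hypothetical) affine transform."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x) + eps)
    return [(v - mean) / std * gamma + beta for v in x], mean, std


def revin_denormalize(y: Sequence[float], mean: float, std: float,
                      gamma: float = 1.0, beta: float = 0.0) -> List[float]:
    """Invert revin_normalize on model outputs using the statistics saved from the input."""
    return [(v - beta) / gamma * std + mean for v in y]


def patchify(x: Sequence[float], patch_size: int = PATCH_SIZE) -> List[List[float]]:
    """Split a channel into non-overlapping patches; the length must divide evenly."""
    if len(x) % patch_size != 0:
        raise ValueError(f"length {len(x)} is not a multiple of patch size {patch_size}")
    return [list(x[i:i + patch_size]) for i in range(0, len(x), patch_size)]


def last_patch_mean(patches: List[List[float]], horizon: int) -> List[float]:
    """Hypothetical stand-in for encoder + Flatten Head: repeat the last patch's mean."""
    return [sum(patches[-1]) / len(patches[-1])] * horizon


def forecast_channel(x: Sequence[float], horizon: int = HORIZON,
                     predictor: Callable[[List[List[float]], int], List[float]] = last_patch_mean,
                     ) -> List[float]:
    """Normalize, patch, predict in normalized space, then restore the original scale."""
    normed, mean, std = revin_normalize(x)
    patches = patchify(normed)
    y_norm = predictor(patches, horizon)
    return revin_denormalize(y_norm, mean, std)


series = [10.0 + (t % 24) for t in range(CONTEXT_LENGTH)]
print(len(patchify(series)))          # 32
print(len(forecast_channel(series)))  # 96
```

Swapping `predictor` for a wrapper around a trained model leaves the RevIN and patching steps unchanged; the model only ever sees and produces normalized values, and the saved per-window statistics carry the scale information around it.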
---

## Quickstart

```python
import torch
from safetensors.torch import load_file

# `model` must already be instantiated with the Chorous1 architecture that matches
# the checkpoint variant; the model definition is not included in this snippet.
# Replace "100m" with "50m" or "27m" as needed.
weights = load_file("./chorous_checkpoint/100m/model.safetensors")
model.load_state_dict(weights)
model.eval()

# Input shape: [Batch, Channels, Time]; Time must equal the 512-step context length
x = torch.randn(1, 7, 512)

with torch.no_grad():
    forecast = model(x)  # Output shape: [1, 7, 96]
```

---

## Performance

| Metric | `chorous1-100m` | `chorous1-50m` | `chorous1-27m` |
|---|---|---|---|
| Weights Size | ~200 MB | ~110 MB | ~65 MB |
| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |

---

## Limitations

- **Fixed Forecast Horizon:** Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
- **Channel Count Constraint:** The RevIN layer is initialized with the maximum channel count used in the training suite. Inputs with more channels than this are not supported out of the box.
- **Patch Alignment Requirement:** The input context length must be an exact multiple of the patch size (16).

---

## License

Chorous1 is released under the [8f-ai-license-v1.0](https://huggingface.co/8Fai/license). Please review the full terms before use in production or commercial applications.