KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning
Paper • 2602.14293
HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
SageBwd: A Trainable Low-bit Attention