Yes, the KV cache is kept separately for each transformer block in an LM, because every block has its own attention heads with their own key and value projections. If you cached the keys and values from a single block and reused them across all blocks, you would not reproduce the representations the model computes without caching.
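To make the per-block point concrete, here is a minimal sketch of a per-layer KV cache. It is illustrative only, not the article's or Hugging Face's actual cache API; the shapes, the `kv_cache` structure, and the `append_to_cache` helper are assumptions for the example.

```python
# Minimal sketch: one KV cache entry per transformer block.
# Each block has its own key/value projections, so the keys and values
# it produces differ from every other block's and cannot be shared.
import torch

n_layers, n_heads, head_dim = 4, 8, 64

# One (keys, values) pair per transformer block, starting empty.
kv_cache = [
    {"k": torch.empty(1, n_heads, 0, head_dim),
     "v": torch.empty(1, n_heads, 0, head_dim)}
    for _ in range(n_layers)
]

def append_to_cache(layer_idx, new_k, new_v):
    """Append this decoding step's keys/values to one specific block's cache."""
    entry = kv_cache[layer_idx]
    entry["k"] = torch.cat([entry["k"], new_k], dim=2)  # concat along the sequence dim
    entry["v"] = torch.cat([entry["v"], new_v], dim=2)
    return entry["k"], entry["v"]
```

During generation, each block only ever reads and appends to its own entry (`kv_cache[layer_idx]`), which is why the cache grows per layer rather than once for the whole model.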