Boris Orekhov

nevmenandr

https://nevmenandr.github.io/portfolio/

AI & ML interests

Natural Language Processing, Poetry Generation, Linguistics, Low-resource languages

Recent Activity

posted an update about 23 hours ago

🔥 New Russian Stylometry Dataset!

posted an update 3 days ago

📜 RNN vs. Transformers: How an Old Architecture Better Perceives Poetic Style

updated a dataset 8 months ago

nevmenandr/russian-20th-century-bigrams

Posts 6

Post

🔥 New Russian Stylometry Dataset!

Russian Stylometric Dataset (RSD) — 322 texts from the 19th – early 20th centuries (16 million words), prepared for analysis in stylo (R) and machine learning (Python).

📚 What's inside?

- Fiction, journalism, scientific texts, drama, poetry
- Grouped by author, gender, age, genre, literary movements (Romanticism/Realism)
- Character speech (Tolstoy, Gogol, Ostrovsky)
- Generated texts (LSTM, GPT)

📊 Use cases: authorship attribution, clustering, classification, benchmarking methods.

🔓 Public domain + GPL-3.0 license.

👉 Learn more: https://github.com/nevmenandr/RSD

DOI: 10.5281/zenodo.20701309
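
To make the authorship attribution use case concrete, here is a minimal sketch of Burrows's Delta, a classic stylometric distance, in plain Python. It is an illustration only: the function names, the whitespace tokenizer, and the toy English texts are assumptions made for this example and are not taken from the RSD repository, whose file layout is not described in the post.

```python
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace (an assumption).
    return text.lower().split()


def relative_freqs(text, vocab):
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocab]


def burrows_delta(references, unknown, n_words=100):
    """Return {label: delta} for one unknown text; lower means closer."""
    # 1. Pick the most frequent words across all reference texts.
    pooled = Counter()
    for text in references.values():
        pooled.update(tokenize(text))
    vocab = [word for word, _ in pooled.most_common(n_words)]

    # 2. Relative frequency of each chosen word in each reference text.
    ref_freqs = {label: relative_freqs(text, vocab)
                 for label, text in references.items()}

    # 3. Per-word mean and standard deviation over the reference texts.
    columns = list(zip(*ref_freqs.values()))
    stats = [(mean(col), pstdev(col)) for col in columns]
    # Words with zero variance cannot be z-scored, so they are skipped.
    keep = [i for i, (_, sigma) in enumerate(stats) if sigma > 0]
    if not keep:
        raise ValueError("no word varies across the reference texts")

    def z_scores(freqs):
        return [(freqs[i] - stats[i][0]) / stats[i][1] for i in keep]

    # 4. Delta is the mean absolute difference between z-score profiles.
    unknown_z = z_scores(relative_freqs(unknown, vocab))
    return {
        label: mean(abs(u - r) for u, r in zip(unknown_z, z_scores(freqs)))
        for label, freqs in ref_freqs.items()
    }


# Placeholder texts standing in for two candidate authors.
references = {
    "A": "the cat and the dog and the bird and the fish",
    "B": "a cat or a dog or a bird or a fish",
}
scores = burrows_delta(references, "the horse and the cow and the sheep")
print(min(scores, key=scores.get))
```

With real texts, the reference dictionary would hold one long sample per candidate author, and the candidate with the lowest Delta is the most likely match. Real work would also need a proper tokenizer for Russian.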

Post

https://huggingface.co/nevmenandr/char-based-lstm-russian-poetry-
https://huggingface.co/nevmenandr/char-based-lstm-russian-poetry-mandelshtam
https://huggingface.co/nevmenandr/char-based-lstm-russian-poetry-hexameter

Identifying the style by a qualified reader on a short fragment of generated poetry (https://huggingface.co/papers/2306.02771)

📜 RNN vs. Transformers: How an Old Architecture Better Perceives Poetic Style

In the era of Transformer dominance, we often forget that old RNNs (especially character-level LSTMs) remain irreplaceable for tasks where *individual style*, rhythm, and micro-patterns matter. These three models are clear proof of that.

🎯 Why does this matter today?

- **Stylistic analysis**: RNNs better capture meter, repetitions, and unexpected tonal shifts.
- **Teaching poetics**: generating "almost correct" lines that still contain hallucinations helps explore the boundaries of style.
- **Nostalgia and replication**: a reminder that not everything is measured by BLEU and perplexity.

🖼️ Visualization

Attached is an infographic comparing the three models (architecture, style, generation sample).

> RNNs aren't dead. They're just writing poetry in silence.
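
For readers new to the character-level setup mentioned in the post, the sketch below shows one common way to turn raw text into (context, next character) training pairs for such a model. It is a hedged illustration, not the preprocessing used for the three models: the function names, the window length, and the sample line are all assumptions.

```python
def build_vocab(text):
    """Map every distinct character to an integer id, and back."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def make_windows(text, char_to_id, seq_len=10, step=1):
    """Slide a fixed-length window over the text.

    Each pair holds seq_len character ids as context and the id of the
    character that follows them as the prediction target.
    """
    ids = [char_to_id[ch] for ch in text]
    pairs = []
    for start in range(0, len(ids) - seq_len, step):
        context = ids[start:start + seq_len]
        target = ids[start + seq_len]
        pairs.append((context, target))
    return pairs


# Illustrative sample line; any training text could be substituted.
sample = "Бессонница. Гомер. Тугие паруса."
char_to_id, id_to_char = build_vocab(sample)
pairs = make_windows(sample, char_to_id, seq_len=10)

context, target = pairs[0]
print("".join(id_to_char[i] for i in context), "->", id_to_char[target])
```

Because the model sees individual characters rather than word or subword tokens, spelling, punctuation, and line breaks all remain visible to it, which is one plausible reason such models pick up on rhythm and other micro-patterns.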