nevmenandr
posted an update 1 day ago
🔥 New Russian Stylometry Dataset!

Russian Stylometric Dataset (RSD): 322 texts from the 19th to early 20th centuries (16 million words), prepared for analysis in stylo (R) and machine learning (Python).

📚 What's inside?

Fiction, journalism, scientific texts, drama, poetry

Grouped by author, gender, age, genre, literary movements (Romanticism/Realism)

Character speech (Tolstoy, Gogol, Ostrovsky)

Generated texts (LSTM, GPT)

📊 Use cases: authorship attribution, clustering, classification, benchmarking methods.
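
For a feel of the authorship attribution use case, here is a minimal sketch of a Burrows's Delta style comparison in plain Python. It is only an illustration, not the stylo implementation and not code from the RSD repository: the function names (`tokenize`, `relative_freqs`, `burrows_delta`) and the toy strings are hypothetical, and real use would load full texts from the dataset instead.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Lowercased word tokens; \w matches Cyrillic letters in Python 3.
    return re.findall(r"\w+", text.lower())


def relative_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word in one text.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(known, unknown, n_mfw=100):
    """known: dict of label -> text, unknown: text.

    Returns a dict of label -> Delta distance (lower means more similar).
    """
    tokenized = {label: tokenize(text) for label, text in known.items()}

    # Vocabulary: the n_mfw most frequent words across the known texts.
    corpus_counts = Counter()
    for tokens in tokenized.values():
        corpus_counts.update(tokens)
    vocab = [word for word, _ in corpus_counts.most_common(n_mfw)]

    profiles = {label: relative_freqs(tokens, vocab)
                for label, tokens in tokenized.items()}

    # Per-word mean and standard deviation across the known texts.
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero

    def z_scores(profile):
        return [(f - m) / s for f, m, s in zip(profile, means, stds)]

    target = z_scores(relative_freqs(tokenize(unknown), vocab))

    # Delta: mean absolute difference of z-scores over the vocabulary.
    return {label: mean(abs(a - b) for a, b in zip(z_scores(p), target))
            for label, p in profiles.items()}


if __name__ == "__main__":
    # Toy placeholder strings, NOT taken from the dataset.
    known = {
        "A": "и и и и в в он сказал что",
        "B": "но но но не не она знала как",
    }
    scores = burrows_delta(known, "и и в в он сказал")
    print(min(scores, key=scores.get))
```

The candidate with the lowest score is the closest stylistic match. With real data one would typically use long texts and a few hundred most frequent words.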

🔓 Public domain + GPL-3.0 license.

👉 Learn more: https://github.com/nevmenandr/RSD

DOI: 10.5281/zenodo.20701309