Leon
Leon-Leee
AI & ML interests
LLMs, code generation, chatbot, workflows
Recent Activity
Updated a model about 4 hours ago: Leon-Leee/codescout-alg-variants
Published a model about 4 hours ago: Leon-Leee/codescout-alg-variants
Updated a model 2 days ago: Leon-Leee/q3-4b-2507-8x8-custom_tool-mtreward4-300
Useful Pretrain-Datasets
Pretraining datasets of (possibly) good quality.
High Quality Instruct Collections
GPT-4 generated datasets
Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs.
Code, Math, and Reasoning related Instruct datasets
Instruct datasets related to code, math, and reasoning.
ise-uiuc/Magicoder-Evol-Instruct-110K
111k rows • 2.76k downloads • 172 likes
ise-uiuc/Magicoder-OSS-Instruct-75K
75.2k rows • 2.94k downloads • 160 likes
Leon-Leee/OSS_Instruct_Python_zh_GPT35
73.5k rows • 39 downloads
Leon-Leee/Wizardlm_Evol_Instruct_v2_196K_backuped
143k rows • 15 downloads • 1 like
Code Benchmarks
awesome-zh-corpus
Some high-quality Chinese corpora you can find.