hikariming/alpaca_chinese_dataset - stats on ReviewGithub

Languages: Jupyter Notebook, Python. Topics: dataset, alpaca, chatglm.

These are the star and fork statistics for the hikariming/alpaca_chinese_dataset repository. As of 02 May 2024, the repository has 970 stars and 78 forks.

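alpaca_chinese_dataset: Lu Xun once said, "There is only as much intelligence as there is human labor" (a pun on 人工智能, "artificial intelligence").

Most current chat and dialogue model datasets are in English, yet there is a pressing need to build Chinese chat models. We therefore translated Stanford's Alpaca dataset into Chinese, created some additional dialogue data, and are releasing it as open source. Our goal is that, by combining our data with your own domain data and an appropriate fine-tuning strategy, a model can gain knowledge of a specific domain while retaining as much of its original capability as possible. We have not reached this goal yet, but the data can at least mitigate the problem of models becoming overly specialized after fine-tuning. The translation is not raw ChatGPT machine translation: it is checked by hand, and English-specific expressions are rephrased into more natural Chinese, so the amount translated each day is limited.

Update (03-27): We found that too many expressions in the Alpaca dataset read as overly English, so after manually translating these six parts we stopped translating and switched to building our own dataset.

Currently, most chatbot datasets are in English, but there is an urgent need to train Chinese chatbot models. Therefore, we have translated the Alpaca...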


repo	languages	stars	stars (weekly change)	forks	forks (weekly change)
openai/chatgpt-retrieval-plugin	Python, Other	19.8k	0	3.6k	0
cisagov/untitledgoosetool	Python, PowerShell	839	0	69	0
gururise/AlpacaDataCleaned	Python, HTML, JavaScript	1.3k	0	133	0
binary-husky/chatgpt_academic	Python, CSS, Other	42.9k	+437	5.6k	+41
sahil280114/codealpaca	Python	1.3k	0	96	0
feizc/MLE-LLaMA	Python	292	0	19	0
liusj5257/azurlane_anti_name	Shell, Python	428	0	93	0
bazelbuild/bazel-central-registry	Starlark, Python, Shell	174	0	136	0
furrtek/VGChips	Verilog, Python	131	+1	12	0
coolwanglu/pdf2htmlEX	HTML, C++, JavaScript	10k	+31	1.8k	+9