tatsu-lab/alpaca_farm

A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.

Languages: Python, Shell
Topics: natural-language-processing, deep-learning, instruction-following, large-language-models, reinforcement-learning-from-human-feedback
Star and fork statistics for the tatsu-lab/alpaca_farm repository: as of 3 May 2024, the repository has 557 stars and 42 forks.

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Research and development on learning from human feedback is difficult because methods like RLHF are complex and costly to run. AlpacaFarm is a simulator that enables research and development on learning from feedback at a fraction of the usual cost, promoting accessible research on instruction following and alignment. Please read our paper and blog post for details on our research findings. This repo contains code for simulating...