openai/evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Languages: Python, Jupyter Notebook, JavaScript
Stars and forks stats for the openai/evals repository. As of 27 Apr 2024, this repository has 12,088 stars and 2,278 forks.

OpenAI Evals

Evals is a framework for evaluating LLMs (large language models) or systems built using LLMs as components. It also includes an open-source registry of challenging evals. We now support evaluating the behavior of any system, including prompt chains or tool-using agents, via the Completion Function Protocol. With Evals, we aim to make it as simple as possible to build an eval while writing as little code as possible. An "eval" is a task used to evaluate the quality of a system's behavior. To...
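The Completion Function Protocol is what allows an eval to target any system, not just a raw model: the eval only needs a callable that accepts a prompt and returns an object exposing its text completions. Below is a minimal sketch of that interface in plain Python, assuming the `CompletionFn`/`CompletionResult` shape described in the repo's documentation; the `EchoCompletionFn` and `EchoResult` classes are hypothetical stand-ins for illustration, not part of the library.

```python
from typing import Any, Protocol


class CompletionResult(Protocol):
    """Wraps the output for one prompt; exposes the produced completions."""

    def get_completions(self) -> list[str]:
        ...


class CompletionFn(Protocol):
    """The Completion Function Protocol: any callable with this shape can be
    evaluated, whether it wraps a model, a prompt chain, or a tool-using agent."""

    def __call__(self, prompt: Any, **kwargs: Any) -> CompletionResult:
        ...


class EchoResult:
    """Hypothetical result type: holds a single completion string."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_completions(self) -> list[str]:
        return [self._text]


class EchoCompletionFn:
    """Hypothetical 'system under test' that simply echoes the prompt back."""

    def __call__(self, prompt: Any, **kwargs: Any) -> EchoResult:
        return EchoResult(str(prompt))


if __name__ == "__main__":
    fn: CompletionFn = EchoCompletionFn()
    result = fn("2 + 2 = ?")
    print(result.get_completions())  # ['2 + 2 = ?']
```

Because the protocol is structural, a prompt chain or tool-using agent can implement these same two methods and be dropped into an existing eval in place of a model call.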
Stats for other repositories listed on the same stats page:

| Repo | Languages | Stars | Stars (weekly) | Forks | Forks (weekly) |
| --- | --- | --- | --- | --- | --- |
| THUDM/GLM | Python, Shell, Dockerfile | 2.7k | +6 | 271 | 0 |
| awslabs/mountpoint-s3 | Rust, Shell, Python | 3.4k | 0 | 90 | 0 |
| apache/incubator-opendal | Rust, Java, MDX | 1.9k | +17 | 275 | +2 |
| facebook/buck2 | Rust, Starlark, Python | 2.9k | 0 | 151 | 0 |
| creativetimofficial/corporate-ui-dashboard | SCSS, CSS, HTML | 19 | 0 | 68 | 0 |
| giuspek/FormalMethods2023 | SMT, Python | 3 | 0 | 8 | 0 |
| QingyangKong/ChainlinkLearningPath | Solidity, JavaScript | 40 | 0 | 38 | 0 |
| facebook/buck2-prelude | Starlark, Python, Erlang | 31 | 0 | 20 | 0 |
| sveltia/sveltia-cms | JavaScript, Svelte | 279 | 0 | 15 | 0 |
| vda-lab/datavis-technologies-handson | Svelte, JavaScript, Other | 1 | 0 | 125 | 0 |