
HumanEval

Evaluation · Advanced

Definition

HumanEval is a code generation benchmark of 164 hand-written Python programming problems, each paired with unit tests. Models are scored with the pass@k metric: the probability that at least one of k generated solutions to a problem passes all of that problem's tests. It is a standard benchmark for comparing the coding capabilities of LLMs.
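The pass@k definition above can be made concrete with a short sketch. It assumes the commonly used unbiased estimator, in which n samples are generated per problem, c of them pass all tests, and pass@k is estimated as 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `benchmark_pass_at_k` and the example counts are hypothetical, chosen for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of solutions sampled for the problem
    c: number of those solutions that passed every unit test
    k: number of attempts allowed by the metric (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1. comb() returns 0 when n - c < k, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up (n, c) counts for three problems, for illustration only.
    example_results = [(10, 3), (10, 0), (10, 10)]
    print(benchmark_pass_at_k(example_results, k=1))
    print(benchmark_pass_at_k(example_results, k=5))
```

Sampling more than k solutions per problem and applying this estimator generally gives a lower-variance score than generating exactly k solutions and checking whether any of them passes.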

Tags

#code #benchmark #Python #programming #testing
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners