An end-to-end neural ad-hoc ranking pipeline.

This project is maintained by Georgetown-IR-Lab



OpenNIR supports a variety of standard information retrieval metrics by interfacing with tools such as trec_eval and

Naming of metrics

Because the naming conventions of metrics in these tools are inconsistent, OpenNIR uses its own convention for specifying metric names and parameters: metric_par1-val1_par2-val2@cutoff. Not every parameter applies to every metric, and some parameters are optional. For instance, nDCG optionally supports a ranking cutoff, but does not support the relevance level parameter (rel).
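To make the convention concrete, here is a minimal sketch of how such a name could be split into its parts. The function parse_metric_name is hypothetical and is not part of OpenNIR; it only assumes the metric_par1-val1_par2-val2@cutoff layout described above.

```python
def parse_metric_name(name):
    """Split 'metric_par1-val1_par2-val2@cutoff' into (metric, params, cutoff).

    Hypothetical helper for illustration only; not an OpenNIR function.
    """
    # Separate the optional '@cutoff' suffix first.
    base, sep, cutoff = name.partition('@')
    cutoff = int(cutoff) if sep else None
    # The first '_'-separated token is the metric name;
    # the remaining tokens are 'par-val' pairs.
    metric, *pairs = base.split('_')
    params = {}
    for pair in pairs:
        par, _, val = pair.partition('-')
        params[par] = val
    return metric, params, cutoff


print(parse_metric_name('map_rel-4@100'))  # ('map', {'rel': '4'}, 100)
print(parse_metric_name('ndcg'))           # ('ndcg', {}, None)
```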

Usage in the standard pipeline

The metrics to calculate for each validation/test epoch are specified with valid_pred.measures=metric1,metric2,... and test_pred.measures=.... To choose the primary metric for validation, use pipeline.val_metric=metric.


Metrics can be calculated by using the onir.metrics.calc(qrels, run, metrics) function.



The mean values across a collection of queries can be calculated using onir.metrics.mean(vals).

Command line tool

scripts/eval qrels_file run_file metric1 [metric2, ...] [-v] [-j] [-n] [-q]

Supported metrics

Precision @ k

p[_rel-r]@k, e.g., p@10, p_rel-4@1
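As a rough illustration of what p[_rel-r]@k measures, the sketch below computes it for a single query in plain Python. It is not the OpenNIR or trec_eval implementation. The input layout (dictionaries mapping document IDs to relevance labels and to scores), the function name, and the reading of rel as a minimum relevance level are all assumptions, and tie-breaking between equal scores may differ from trec_eval.

```python
def precision_at_k(qrels, run, k, rel=1):
    """Fraction of the top-k ranked documents with relevance >= rel.

    qrels: {doc_id: relevance label} for one query (assumed layout)
    run:   {doc_id: retrieval score} for one query (assumed layout)
    """
    # Rank documents by descending score and keep the top k.
    ranked = sorted(run, key=run.get, reverse=True)[:k]
    # Unjudged documents are treated as non-relevant.
    hits = sum(1 for doc in ranked if qrels.get(doc, 0) >= rel)
    return hits / k


qrels = {'d1': 4, 'd2': 0, 'd3': 1}
run = {'d1': 3.2, 'd2': 2.9, 'd3': 1.5, 'd4': 0.7}
print(precision_at_k(qrels, run, k=2))         # p@2       -> 0.5
print(precision_at_k(qrels, run, k=1, rel=4))  # p_rel-4@1 -> 1.0
```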

Mean Average Precision

map[_rel-r][@k], e.g., map, map@100, map_rel-4@100
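The per-query quantity behind map is average precision. The sketch below shows one common formulation in plain Python; it is an illustration under assumed input layouts, not the code used by the metric providers. In particular, dividing by the total number of relevant documents when a cutoff is given is an assumption about how map@k is normalized.

```python
def average_precision(qrels, run, k=None, rel=1):
    """Average precision for one query (illustrative sketch).

    qrels: {doc_id: relevance label}, run: {doc_id: score} (assumed layouts)
    """
    ranked = sorted(run, key=run.get, reverse=True)
    if k is not None:
        ranked = ranked[:k]
    # Total number of documents judged relevant at level rel or higher.
    num_rel = sum(1 for label in qrels.values() if label >= rel)
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if qrels.get(doc, 0) >= rel:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / num_rel


qrels = {'d1': 1, 'd2': 0, 'd3': 1}
run = {'d1': 3.0, 'd2': 2.0, 'd3': 1.0}
print(average_precision(qrels, run))       # (1/1 + 2/3) / 2
print(average_precision(qrels, run, k=2))  # (1/1) / 2
```

Averaging these per-query values over all queries gives the mean average precision.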


R-Precision

rprec[_rel-r], e.g., rprec, rprec_rel-4

Mean Reciprocal Rank

mrr[_rel-r], e.g., mrr, mrr_rel-4
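For a single query, the value behind mrr is the reciprocal of the rank of the first relevant document. A minimal sketch follows, with the same assumed input layout as elsewhere (document ID to label, document ID to score) and a hypothetical function name; it is not the provider implementation.

```python
def reciprocal_rank(qrels, run, rel=1):
    """1 / rank of the first document with relevance >= rel, or 0.0 if none."""
    ranked = sorted(run, key=run.get, reverse=True)
    for rank, doc in enumerate(ranked, start=1):
        if qrels.get(doc, 0) >= rel:
            return 1.0 / rank
    return 0.0


qrels = {'d1': 1}
run = {'d2': 3.0, 'd1': 2.0, 'd3': 1.0}
print(reciprocal_rank(qrels, run))         # first relevant at rank 2 -> 0.5
print(reciprocal_rank(qrels, run, rel=4))  # nothing at level 4 -> 0.0
```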

Expected Reciprocal Rank

err@k, e.g., err@20

Normalized Discounted Cumulative Gain

ndcg[@k], e.g., ndcg, ndcg@20
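Several nDCG formulations exist, and different tools may use different gain and discount functions. The sketch below uses one common variant (gain equal to the relevance label, discount of log2(rank + 1)) purely for illustration; it should not be assumed to reproduce the numbers from any particular metric provider. Function names and input layouts are assumptions.

```python
import math


def dcg(labels):
    """Discounted cumulative gain with linear gain and log2(rank + 1) discount."""
    return sum(label / math.log2(rank + 1)
               for rank, label in enumerate(labels, start=1))


def ndcg(qrels, run, k=None):
    """nDCG for one query: DCG of the ranking divided by the ideal DCG."""
    ranked = sorted(run, key=run.get, reverse=True)
    gains = [qrels.get(doc, 0) for doc in ranked]
    # The ideal ranking orders all judged documents by descending label.
    ideal = sorted(qrels.values(), reverse=True)
    if k is not None:
        gains, ideal = gains[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = {'d1': 2, 'd2': 0, 'd3': 1}
print(ndcg(qrels, {'d1': 3.0, 'd3': 2.0, 'd2': 1.0}))  # ideal order -> 1.0
print(ndcg(qrels, {'d2': 3.0, 'd1': 2.0, 'd3': 1.0}))  # non-relevant doc first, so below 1.0
```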

Judged @ k

judged@k, e.g., judged@20

Proportion of the top k ranked documents that are judged (at any relevance level).
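A minimal sketch of the judged@k idea, assuming a document counts as judged when it appears in the qrels for the query with any label (including 0). The function name and input layouts are assumptions for illustration.

```python
def judged_at_k(qrels, run, k):
    """Fraction of the top-k ranked documents that have any relevance judgment."""
    ranked = sorted(run, key=run.get, reverse=True)[:k]
    # A label of 0 still counts as judged; only absent documents are unjudged.
    judged = sum(1 for doc in ranked if doc in qrels)
    return judged / k


qrels = {'d1': 0, 'd3': 1}
run = {'d1': 4.0, 'd2': 3.0, 'd3': 2.0, 'd4': 1.0}
print(judged_at_k(qrels, run, k=2))  # d1 judged, d2 unjudged -> 0.5
```

A low judged@k value suggests that other metrics at the same cutoff may be underestimated, since unjudged documents are typically treated as non-relevant.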

Metric Providers


PyTrecEvalMetrics

Uses the pytrec-eval Python package. It interfaces directly with the trec_eval code, making it more efficient than spawning a trec_eval subprocess and allowing better support across platforms.

Paper introducing pytrec_eval: Van Gysel, Christophe, and Maarten de Rijke. “Pytrec_eval: An Extremely Fast Python Interface to trec_eval.” SIGIR 2018.

Supported metrics: map, rprec, mrr, p@5,10,15,20,30,100,200,500,1000, ndcg, ndcg@5,10,15,20,30,100,200,500,1000, map@5,10,15,20,30,100,200,500,1000. Note: does not support custom cutoff thresholds or relevance levels (_rel-r).


Starts trec_eval as a sub-process. Not supported on all platforms, but supports more metrics and features than PyTrecEvalMetrics.

Supported metrics: mrr[_rel-r], rprec[_rel-r], map[_rel-r][@k], ndcg[@k], p[_rel-r]@k


Starts an external Perl script as a sub-process. Requires perl in the system PATH.

Supported metrics: ndcg@k, err@k. Note: does not support relevance levels.


Calculates the percentage of documents that are judged in the top k ranked documents per query (in Python).

Only supports judged@k.