OpenNIR is an end-to-end neural ad-hoc ranking pipeline maintained by Georgetown-IR-Lab.
OpenNIR supports a variety of standard information retrieval metrics by interfacing with tools such
as trec_eval and gdeval.pl.
Because the naming conventions for metrics in these tools are inconsistent, we use a single
convention for specifying metric names and parameters: `metric_par1-val1_par2-val2@cutoff`.
Note that not all parameters apply to every metric, and some are optional. For instance, nDCG
optionally supports a ranking cutoff, but does not support the minimum relevance level parameter.
The metrics to calculate for each validation/test epoch are specified with
`valid_pred.measures=metric1,metric2,...` and `test_pred.measures=...`. To choose the primary
metric used for validation, use `pipeline.val_metric=metric`.
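For example, adding `valid_pred.measures=ndcg@20,map,mrr pipeline.val_metric=ndcg@20` to the pipeline configuration reports nDCG@20, MAP, and MRR at each validation epoch and uses nDCG@20 as the primary validation metric (the particular metrics chosen here are only illustrative).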
Metrics can be calculated using the `onir.metrics.calc(qrels, run, metrics)` function.
Inputs:
- `qrels`: Path to TREC-style qrels file, or dictionary in the form `{qid: {did: rel_score}}`
- `run`: Path to TREC-style run file, or dictionary in the form `{qid: {did: rank_score}}`
- `metrics`: Iterable of metric names to calculate

Outputs:
- `{metric: {qid: score}}`

The mean values across a collection of queries can be calculated using `onir.metrics.mean(vals)`.
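A minimal usage sketch, assuming `onir.metrics` is importable as shown and that `mean` accepts the dictionary returned by `calc`; the qrels/run values and metric names are illustrative:

```python
import onir.metrics

# Toy relevance judgments and system scores (illustrative values only)
qrels = {'q1': {'d1': 1, 'd2': 0, 'd3': 2},
         'q2': {'d1': 0, 'd4': 1}}
run = {'q1': {'d1': 2.3, 'd2': 1.7, 'd3': 0.4},
       'q2': {'d4': 1.9, 'd1': 0.2}}

# Per-query values, keyed as {metric: {qid: score}}
vals = onir.metrics.calc(qrels, run, ['map', 'ndcg@20', 'p@10'])

# Mean of each metric across the queries
means = onir.metrics.mean(vals)
print(means)
```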
`scripts/eval qrels_file run_file metric1 [metric2, ...] [-v] [-j] [-n] [-q]`

- `qrels_file`: TREC-formatted query relevance file
- `run_file`: TREC-formatted run file
- `metric1 metric2 ...`: metric names to run
- `-v`: verbose output
- `-j`: JSON-formatted output
- `-n`: no summary output
- `-q`: output by query
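For example, `scripts/eval my_qrels.txt my_run.txt map ndcg@20 p@10 -q` reports MAP, nDCG@20, and P@10 broken down by query (the file names here are placeholders).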
`p[_rel-r]@k`, e.g., `p@10`, `p_rel-4@1`
- `r`: (optional) minimum relevance level (default 1)
- `k`: (required) ranking cutoff threshold

`map[_rel-r][@k]`, e.g., `map`, `map@100`, `map_rel-4@100`
- `r`: (optional) minimum relevance level (default 1)
- `k`: (optional) ranking cutoff threshold

`rprec[_rel-r]`, e.g., `rprec`, `rprec_rel-4`
- `r`: (optional) minimum relevance level (default 1)

`mrr[_rel-r]`, e.g., `mrr`, `mrr_rel-4`
- `r`: (optional) minimum relevance level (default 1)

`err@k`, e.g., `err@20`
- `k`: (required) ranking cutoff threshold

`ndcg[@k]`, e.g., `ndcg`, `ndcg@20`
- `k`: (optional) ranking cutoff threshold

`judged@k`, e.g., `judged@20`
- Number of judged documents (any relevance level) at cutoff k
- `k`: (required) ranking cutoff threshold

### PyTrecEvalMetrics

From the pytrec_eval Python package (link). Interfaces directly with the trec_eval code, making it
more efficient than spawning a trec_eval subprocess, and allowing better support across platforms.
Introduction paper for pytrec_eval: Van Gysel, Christophe, and Maarten de Rijke. “Pytrec_eval:
An Extremely Fast Python Interface to trec_eval.” SIGIR 2018.
Supported metrics: map, rprec, mrr, p@5,10,15,20,30,100,200,500,1000,
ndcg, ndcg@5,10,15,20,30,100,200,500,1000, map@5,10,15,20,30,100,200,500,1000.
Note: does not support custom cutoff thresholds or relevance levels (`_rel-r`).
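For reference, a minimal sketch of using pytrec_eval directly (outside of OpenNIR's wrapper). Note that pytrec_eval expects trec_eval's native measure names (e.g., `recip_rank`, `ndcg_cut`) rather than the convention described above; the qrels/run values here are illustrative:

```python
import pytrec_eval

# Toy judgments and scores (illustrative values only)
qrels = {'q1': {'d1': 1, 'd2': 0}}
run = {'q1': {'d1': 1.5, 'd2': 0.3, 'd3': 0.1}}

# Measures use trec_eval's own names, not the OpenNIR convention above
evaluator = pytrec_eval.RelevanceEvaluator(qrels, {'map', 'recip_rank', 'ndcg_cut'})
results = evaluator.evaluate(run)  # {qid: {measure: value}}
print(results['q1']['recip_rank'])
```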
### TrecEvalMetrics

Starts trec_eval ([link](https://trec.nist.gov/trec_eval/)) as a sub-process. Not supported by all
platforms, but does support more metrics and features than PyTrecEvalMetrics.
Supported metrics: mrr[_rel-r], rprec[_rel-r], map[_rel-r][@k], ndcg[@k], p[_rel-r]@k
### GdevalMetrics

Starts gdeval.pl ([link](https://trec.nist.gov/data/web/12/eval-README.txt)) as a sub-process.
Requires perl in system PATH.
Supported metrics: ndcg@k, err@k. Note: does not support relevance levels.
### JudgedMetrics

Calculates the percentage of documents that are judged in the top k ranked documents per query
(in Python). Only supports judged@k.
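A rough sketch of what this metric computes (a hypothetical helper, not OpenNIR's actual code), using the same `{qid: {did: score}}` dictionaries as above:

```python
def judged_at_k(qrels, run, k):
    """Fraction of the top-k ranked documents that have any relevance judgment.

    qrels: {qid: {did: rel_score}}; run: {qid: {did: rank_score}}.
    Returns {qid: score}. Sketch only, not OpenNIR's implementation.
    """
    result = {}
    for qid, doc_scores in run.items():
        # Rank documents by descending score and keep the top k
        top_k = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        judged = sum(1 for did in top_k if did in qrels.get(qid, {}))
        result[qid] = judged / k
    return result
```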