GUANinE overview

Genome Understanding And aNnotation in silico Evaluation, or GUANinE, is a benchmark for sequence-to-function models in genomics, focused on human (and other eukaryotic) reference genomes.

As a benchmark, GUANinE offers modelers a chance to evaluate and develop competitive models on controlled, high-quality data designed for generalizability. A distinguishing feature of GUANinE is its large scale (~1M test-set datapoints, not including the train or dev splits), which allows for deeper profiling of experimental models and more thorough statistical testing.
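To make the link between test-set size and statistical testing concrete, here is a minimal sketch of how the uncertainty of a correlation-style score shrinks as the number of test points grows. It is illustrative only: the Fisher z approximation, the example correlation of 0.5, and the helper name `correlation_ci_halfwidth` are assumptions made for this example, not part of GUANinE or its evaluation code.

```python
import math


def correlation_ci_halfwidth(n: int, r: float = 0.5, z_crit: float = 1.96) -> float:
    """Approximate half-width of a 95% confidence interval for a
    correlation r estimated from n independent test points.

    Uses the Fisher z-transform, where the standard error in z-space
    is 1 / sqrt(n - 3). The value r = 0.5 is an arbitrary illustration.
    """
    z = math.atanh(r)                 # map r into z-space
    se = 1.0 / math.sqrt(n - 3)       # standard error in z-space
    lo = math.tanh(z - z_crit * se)   # map interval bounds back to r-space
    hi = math.tanh(z + z_crit * se)
    return (hi - lo) / 2


# Compare a small, a medium, and a ~1M-point test set.
for n in (1_000, 10_000, 1_000_000):
    print(f"n={n:>9,}: +/- {correlation_ci_halfwidth(n):.4f}")
```

Under these assumptions, the interval at one million points is on the order of 0.001 wide on each side, so small differences between two models' scores are far easier to distinguish than on a test set of a few thousand points.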

Check out the getting started page for tips on downloading and accessing the data, or inspect the current leaderboard.

GUANinE Tasks, at a high level

| task name   | task type           | task target      | domain            |
|-------------|---------------------|------------------|-------------------|
| dnase_prop  | Accessibility       | Sequence region  | Human (hg38)      |
| ccre_prop   | Functional elements | Sequence region  | Human (hg38)      |
| cons30      | Seq. Conservation   | Sequence region  | Human-Mammal      |
| cons100     | Seq. Conservation   | Sequence region  | Human-Vertebrate  |
| gpra-c      | Promoter expression | Short sequence   | Yeast (synthetic) |
| gpra-d      | Promoter expression | Short sequence   | Yeast (synthetic) |
| cadd-snv    | Deleteriousness     | Sequence variant | Human (simulated) |
| cadd-indel  | Deleteriousness     | Indel variant    | Human (simulated) |
| clinvar-snv | Pathogenicity       | Sequence variant | Human (clinical)  |

See also

For a detailed comparison of tasks, consult the task comparison page.


GUANinE is developed and maintained by eyes robson, a PhD candidate advised by Nilah Ioannidis. To cite GUANinE, use the following BibTeX entry.

v1.0 .bibtex
@InProceedings{pmlr-v240-robson24a,
title =      {GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models},
author =       {robson, eyes s. and Ioannidis, Nilah},
booktitle =          {Proceedings of the 18th Machine Learning in Computational Biology meeting},
pages =      {250--266},
year =       {2024},
editor =     {Knowles, David A. and Mostafavi, Sara},
volume =     {240},
series =     {Proceedings of Machine Learning Research},
month =      {30 Nov--01 Dec},
publisher =    {PMLR},
pdf =        {https://proceedings.mlr.press/v240/robson24a/robson24a.pdf},
url =        {https://proceedings.mlr.press/v240/robson24a.html}
}