API & programmatic access

There is no single "unified API": the platforms are heterogeneous. Instead, we document each platform's existing interface as it actually is and provide a client-side cookbook (oryza19k) that orchestrates calls across them.

Platform | Programmatic method | Auth | Status
--- | --- | --- | ---
GrameneOryza | Ensembl REST (rest.ensembl.org) · BioMart | none | Stable
Gramene FTP | Remote tabix of bgzipped VCFs | none | Standard
SNP-Seek v3 | REST (genotype / variety / SNP) | none | Confirm paths (IRRI)
Oryza CLIMtools | Downloadable tables (no REST API) | none | Tables
Code & Models | Git · Docker · oryza19k.predict_trait() | none | Open
Precomputed summaries | Zenodo (DOI) · oryza19k.summary_table() | none | At publication

The oryza19k cookbook

oryza19k is a single-file Python helper (notebooks/oryza19k.py) — an access cookbook, not a server or a new API. Each function wraps an interface that already exists, so a workflow is a few clean calls. Use it as import oryza19k as o19, then o19.<function>(...).

Function | Wraps | Network?
--- | --- | ---
lookup_gene · region_features · vep_effects | Ensembl REST | yes
region_genotypes(source="tabix") | stream a bgzipped VCF | yes (public VCF host)
climate_for_gene · accession_climate | CLIMtools tables | no
predict_trait | the repo's pre-trained model | no (local model files)
summary_table | precomputed tables | no (local) / yes (Zenodo)

What it needs

To use… | Libraries | Data
--- | --- | ---
Import + Ensembl + local tables | pandas, numpy, pyarrow (requests optional) | none / the precomputed .tsv files
predict_trait | joblib, scikit-learn, xgboost, lightgbm | a clone of the GitHub repo (model + imputer + features)
region_genotypes("tabix") | pysam or the tabix CLI | a public bgzipped + indexed VCF URL

Environments: a minimal pip install pandas numpy pyarrow covers the Ensembl functions and the precomputed tables (requests is optional); region_genotypes additionally needs pysam or the tabix CLI. For predict_trait, Colab or the project Docker image (which bundles the ML stack) is recommended. The full list is in notebooks/requirements.txt.
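Whether an environment covers each row of the requirements table can be checked before running anything. The sketch below is not part of oryza19k: the feature labels are made up for illustration, and it only tests that the listed modules (or the tabix executable) are installed, not their versions or whether the data is reachable.

```python
import importlib.util
import shutil


def has_module(name: str) -> bool:
    """True if a top-level module is importable (checked without importing it)."""
    return importlib.util.find_spec(name) is not None


def has_cli(name: str) -> bool:
    """True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


# Each feature lists its requirements; a requirement is met when ANY of its
# alternatives is present. Labels are illustrative, not oryza19k identifiers.
# Note: scikit-learn is imported under the module name "sklearn".
REQUIREMENTS = {
    "import_ensembl_tables": [[("module", m)] for m in ("pandas", "numpy", "pyarrow")],
    "predict_trait": [[("module", m)] for m in ("joblib", "sklearn", "xgboost", "lightgbm")],
    "region_genotypes_tabix": [[("module", "pysam"), ("cli", "tabix")]],
}

_CHECKS = {"module": has_module, "cli": has_cli}


def requirement_met(alternatives) -> bool:
    """True if at least one (kind, name) alternative is available."""
    return any(_CHECKS[kind](name) for kind, name in alternatives)


def feature_report(requirements=REQUIREMENTS) -> dict:
    """Map each feature label to True if all of its requirements are met."""
    return {
        feature: all(requirement_met(alts) for alts in reqs)
        for feature, reqs in requirements.items()
    }


if __name__ == "__main__":
    for feature, ok in feature_report().items():
        print(f"{feature:24s} {'ok' if ok else 'missing dependencies'}")
```

Run as a script, it prints one line per feature; in a notebook, feature_report() returns the same information as a dict.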

import oryza19k as o19
o19.lookup_gene("Os03g0752800")                       # Ensembl REST: no extra deps
o19.summary_table("allele_freq_by_group")             # precomputed table: pandas only
# genotypes: a genotype table you have already loaded, in the format the repo's model expects
o19.predict_trait(genotypes, "hdg_80head",            # the repo's pre-trained model
                  models_dir="…/AI-drive Predictive Phenotype Modeling")

predict_trait wraps the project's GitHub demo: it loads the team's own .pkl model and imputer and calls model.predict() on the top 1,000 SNPs selected by SHAP. It does not retrain or substitute a model. See Code & Models.
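The wrapped steps (align to the saved feature list, impute with the fitted imputer, predict) can be sketched as below. This is not the repo's code: the function names are hypothetical, the feature list is assumed to be an ordered list of the selected SNP column names, and the model and imputer are assumed to be scikit-learn-style objects exposing transform() and predict().

```python
import math
from typing import Mapping, Sequence


def align_features(samples: Sequence[Mapping[str, float]],
                   features: Sequence[str]) -> list:
    """Return a samples x features matrix (list of rows) in the exact column
    order the model was trained on. SNPs missing from a sample become NaN so
    the saved imputer, not this code, decides how to fill them. SNPs outside
    the feature list are ignored."""
    return [[float(s.get(f, math.nan)) for f in features] for s in samples]


def predict_with_saved_model(model, imputer, features, samples):
    """Impute with the already-fitted imputer, then predict; nothing is refit."""
    X = align_features(samples, features)
    return model.predict(imputer.transform(X))


def load_artifacts(model_path: str, imputer_path: str):
    """Load a pickled model and imputer (e.g. .pkl files saved with joblib)."""
    import joblib  # lazy import: only needed when loading real artifacts
    return joblib.load(model_path), joblib.load(imputer_path)
```

The alignment step matters because the model and imputer were fitted on columns in one fixed order; a plain matrix carries no column names, so nothing downstream can catch a reordering.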

Ensembl REST (GrameneOryza) — stable

Documented, login-free access for Oryza sativa gene, region, variant-effect, and sequence queries. Verified live.

curl -s "https://rest.ensembl.org/lookup/id/Os03g0752800?content-type=application/json"
curl -s "https://rest.ensembl.org/overlap/region/oryza_sativa/3:31031753-31041563?feature=gene;feature=variation;content-type=application/json"
curl -s "https://rest.ensembl.org/vep/oryza_sativa/region/3:31037240-31037240/A?content-type=application/json"
curl -s "https://rest.ensembl.org/sequence/id/Os03t0752800?type=genomic;content-type=application/json"
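The same endpoints can be called from Python with the standard library alone, which is consistent with requests being optional. A minimal sketch, assuming JSON output is wanted; the helper names are hypothetical and this is not necessarily how oryza19k.lookup_gene is implemented.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Optional

ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_url(path: str, params: Optional[dict] = None) -> str:
    """Build an Ensembl REST URL that requests JSON.
    List values become repeated keys, e.g. feature=gene&feature=variation."""
    query = {"content-type": "application/json", **(params or {})}
    return f"{ENSEMBL_REST}/{path.lstrip('/')}?{urllib.parse.urlencode(query, doseq=True)}"


def get_json(path: str, params: Optional[dict] = None, timeout: float = 30):
    """GET one endpoint and decode the JSON body (stdlib only, no requests)."""
    req = urllib.request.Request(ensembl_url(path, params),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def count_by_feature_type(features: list) -> Counter:
    """Tally /overlap results, whose items carry a 'feature_type' field."""
    return Counter(f.get("feature_type") for f in features)
```

For example, get_json("lookup/id/Os03g0752800") mirrors the first curl call, and count_by_feature_type(get_json("overlap/region/oryza_sativa/3:31031753-31041563", {"feature": ["gene", "variation"]})) summarizes the second. Ensembl rate-limits clients, so pace large batches of calls.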

SNP-Seek REST

Genotype-by-region, variety, and SNP queries follow the SNP-Seek II design (Mansueto et al. 2017).

Exact endpoint paths/params are being confirmed with the IRRI team. Until then, prefer remote tabix (below) or the precomputed allele-frequency table for an equivalent, credential-free query.

Remote tabix — query 19,035 genomes without downloading them

The most efficient access technique: the per-reference VCFs are bgzip-compressed and tabix-indexed, so a client can stream any locus over HTTPS, fetching only the index and the compressed blocks that overlap the requested region.

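# VCF header plus the records overlapping the locus
tabix -h https://<public-host>/19K-RGP/IRGSP-1.0/19K-RGP.IRGSP-1.0.snps.vcf.gz 3:31031753-31041563
# same locus as a flat table: CHROM, POS, REF, ALT, then one GT column per sample
bcftools view -r 3:31031753-31041563 https://<public-host>/.../19K-RGP.IRGSP-1.0.snps.vcf.gz \
  | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n'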

This requires the VCFs, with their .tbi index files alongside, to be served with HTTP range-request support; the one-time bgzip/tabix preparation is coordinated with the Gramene team.
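Once the VCF is reachable, the streamed records can be reduced to per-SNP genotype summaries in Python. A minimal sketch, assuming pysam is installed and genotypes are coded as diploid GT strings; the function names are hypothetical (this is not the oryza19k region_genotypes implementation) and the host stays a placeholder.

```python
from typing import Iterator, List, Optional, Tuple


def gt_to_alt_dosage(gt: str) -> Optional[int]:
    """Count non-reference alleles in a GT string: '0/0'->0, '0|1'->1, '1/1'->2.
    Missing calls ('.', './.') return None. Multi-allelic ALTs are pooled."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def parse_vcf_line(line: str) -> Tuple[str, int, str, str, List[Optional[int]]]:
    """Split a VCF data line into CHROM, POS, REF, ALT and per-sample dosages.
    GT is taken from the first FORMAT subfield, where the VCF spec places it."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _vid, ref, alt = cols[:5]
    dosages = [gt_to_alt_dosage(sample.split(":", 1)[0]) for sample in cols[9:]]
    return chrom, int(pos), ref, alt, dosages


def alt_allele_freq(dosages: List[Optional[int]], ploidy: int = 2) -> Optional[float]:
    """ALT allele frequency among called samples (None if nothing is called)."""
    called = [d for d in dosages if d is not None]
    return sum(called) / (ploidy * len(called)) if called else None


def fetch_region(vcf_url: str, chrom: str, start: int, end: int) -> Iterator[str]:
    """Stream data lines (no header) for a 1-based inclusive region,
    written like the tabix CLI region 3:31031753-31041563."""
    import pysam  # lazy import: only needed for the network call
    tbx = pysam.TabixFile(vcf_url)
    try:
        # pysam's fetch takes 0-based, half-open coordinates
        yield from tbx.fetch(chrom, start - 1, end)
    finally:
        tbx.close()
```

For example, iterating fetch_region(url, "3", 31031753, 31041563), with url set to the VCF path used above, and passing each line through parse_vcf_line yields the same records as the tabix command; alt_allele_freq then summarizes each SNP. The start - 1 shift is the one easy mistake here: the CLI region is 1-based inclusive, pysam is 0-based half-open.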

Oryza CLIMtools — tables (no REST API)

CLIMtools is an R/Shiny resource; the supported programmatic path is its downloadable result tables, read directly in pandas/R. See the CLIMtools examples.
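Because the CLIMtools route is just files, reading a table needs nothing beyond the standard library (pandas.read_csv(path, sep="\t") is the one-line equivalent). A sketch, assuming a tab-separated download; the column names gene_id and p_value are hypothetical and should be replaced by the real header.

```python
import csv
from typing import Iterable, List


def read_table(lines: Iterable[str], delimiter: str = "\t") -> List[dict]:
    """Parse a downloaded result table (any iterable of text lines) into dict rows."""
    return list(csv.DictReader(lines, delimiter=delimiter))


def load_table(path: str, delimiter: str = "\t") -> List[dict]:
    """Read a table from disk; newline='' is what the csv module expects."""
    with open(path, newline="", encoding="utf-8") as fh:
        return read_table(fh, delimiter)


def rows_for_gene(rows: List[dict], gene_id: str, gene_col: str = "gene_id") -> List[dict]:
    """Keep rows for one gene ('gene_id' is a hypothetical column name)."""
    return [r for r in rows if r.get(gene_col) == gene_id]


def top_hits(rows: List[dict], p_col: str = "p_value", n: int = 10) -> List[dict]:
    """Smallest p-values first ('p_value' is a hypothetical column name)."""
    return sorted(rows, key=lambda r: float(r[p_col]))[:n]
```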

Precomputed summaries

Precomputed tables are deposited on Zenodo with a citable DOI (mirrored on the Gramene FTP and the KAUST repository), so the most common questions can be answered without a large query.

Table | Contents | Format
--- | --- | ---
Allele frequencies (core SNPs) | 165,640 SNPs × global + per-group frequency | TSV
Accession passport | accession, varietal group, #phenotypes scored | TSV
Phenotypes | accession × 24 traits | TSV
Genomic-prediction benchmark | 23 models × 5 traits (Spearman, R², time) | TSV
Per-gene variant summary · HEV · GEA hits · SHAP→gene · haplotypes | exported by the platform teams | TSV

import oryza19k as o19
af = o19.summary_table("allele_freq_by_group")     # per-group allele frequencies
bm = o19.summary_table("benchmark")                # model x trait benchmark
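A typical use of the per-group table is to flag SNPs whose allele frequency differs strongly between two varietal groups. A minimal sketch, assuming one row per SNP with one frequency column per group; the column names and the 0.5 cutoff are hypothetical. Because both columns refer to the same allele, the absolute difference is the same whichever allele the table reports.

```python
from typing import Iterable, List, Mapping, Tuple


def differentiated_snps(rows: Iterable[Mapping[str, object]], group_a: str, group_b: str,
                        min_delta: float = 0.5,
                        id_col: str = "snp_id") -> List[Tuple[str, float]]:
    """Return (snp_id, |freq_a - freq_b|) for SNPs whose frequency differs by at
    least min_delta between two group columns, largest difference first.
    Rows with an empty frequency in either group are skipped (NaN values
    fail the >= comparison and are skipped too)."""
    hits = []
    for row in rows:
        a, b = row.get(group_a), row.get(group_b)
        if a in (None, "") or b in (None, ""):
            continue
        delta = abs(float(a) - float(b))
        if delta >= min_delta:
            hits.append((str(row[id_col]), delta))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

If summary_table returns a pandas DataFrame (an assumption here), pass af.to_dict("records") as rows, with the two group column names taken from the table's header.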