API & programmatic access
There is no single "unified API" — the platforms are heterogeneous. Instead we document each one's existing interface honestly, and provide a client-side cookbook (oryza19k) that orchestrates across them.
| Platform | Programmatic method | Auth | Status |
|---|---|---|---|
| GrameneOryza | Ensembl REST (rest.ensembl.org) · BioMart | none | Stable |
| Gramene FTP | Remote tabix of bgzipped VCFs | none | Standard |
| SNP-Seek v3 | REST (genotype / variety / SNP) | none | Confirm paths (IRRI) |
| Oryza CLIMtools | Downloadable tables (no REST API) | none | Tables |
| Code & Models | Git · Docker · oryza19k.predict_trait() | none | Open |
| Precomputed summaries | Zenodo (DOI) · oryza19k.summary_table() | none | At publication |
The oryza19k cookbook
oryza19k is a single-file Python helper (notebooks/oryza19k.py) — an access cookbook, not a server or a new API. Each function wraps an interface that already exists, so a workflow is a few clean calls. Use it as import oryza19k as o19, then o19.<function>(...).
| Function | Wraps | Network? |
|---|---|---|
| lookup_gene · region_features · vep_effects | Ensembl REST | yes |
| region_genotypes(source="tabix") | stream a bgzipped VCF | yes (public VCF host) |
| climate_for_gene · accession_climate | CLIMtools tables | no |
| predict_trait | the repo's pre-trained model | no (local model files) |
| summary_table | precomputed tables | no (local) / yes (Zenodo) |
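To make the "thin wrapper" idea concrete, here is a minimal sketch of what a helper in the spirit of lookup_gene could look like. It is not the oryza19k source: the names lookup_url and lookup_gene_sketch are hypothetical, and only the endpoint (rest.ensembl.org/lookup/id) comes from this page. It uses the standard library only, which is an assumption made to keep the sketch dependency-free.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # host named in the platform table


def lookup_url(gene_id: str) -> str:
    """Build the Ensembl REST lookup URL for a stable gene ID."""
    quoted = urllib.parse.quote(gene_id, safe="")
    return f"{ENSEMBL_REST}/lookup/id/{quoted}?content-type=application/json"


def lookup_gene_sketch(gene_id: str, timeout: float = 30.0) -> dict:
    """Hypothetical stand-in for a lookup wrapper: fetch and decode one record."""
    with urllib.request.urlopen(lookup_url(gene_id), timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs network access, so it is left commented out here):
# gene = lookup_gene_sketch("Os03g0752800")
# print(gene.get("seq_region_name"), gene.get("start"), gene.get("end"))
```

Keeping URL construction separate from the network call means the wrapper can be checked offline, and the same pattern would extend to the region and VEP endpoints.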
What it needs
| To use… | Libraries | Data |
|---|---|---|
| Import + Ensembl + local tables | pandas, numpy, pyarrow (requests optional) | none / the precomputed .tsv files |
| predict_trait | joblib, scikit-learn, xgboost, lightgbm | a clone of the GitHub repo (model + imputer + features) |
| region_genotypes("tabix") | pysam or the tabix CLI | a public bgzipped + indexed VCF URL |
Environments: a minimal pip install pandas numpy pyarrow covers Ensembl and the precomputed tables; tabix queries additionally need pysam or the tabix CLI, as listed in the table above. Colab or the project Docker image (which bundles the ML stack) is recommended for predict_trait. The full list is in notebooks/requirements.txt.
import oryza19k as o19
o19.lookup_gene("Os03g0752800") # Ensembl REST — no extra deps
o19.summary_table("allele_freq_by_group") # precomputed table — pandas only
o19.predict_trait(genotypes, "hdg_80head", # the repo's pre-trained model
                  models_dir="…/AI-drive Predictive Phenotype Modeling")
predict_trait is the GitHub demo, wrapped: it loads the team's own .pkl model + imputer and calls model.predict() on the SHAP-selected top-1,000 SNPs; it does not retrain or substitute a model. See Code & Models.
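Before calling the heavier functions, it can help to check which of the libraries from the "What it needs" table are importable in the current environment. The following is a sketch under assumptions: the group labels and the helper names (missing_modules, report, has_tabix_backend) are invented for illustration and are not part of oryza19k. The import name sklearn is used for scikit-learn.

```python
import importlib.util
import shutil

# Library groups copied from the "What it needs" table; the group labels
# themselves are hypothetical names chosen for this sketch.
REQUIREMENTS = {
    "core (Ensembl + local tables)": ["pandas", "numpy", "pyarrow"],
    "predict_trait": ["joblib", "sklearn", "xgboost", "lightgbm"],
}


def missing_modules(modules):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report(requirements=REQUIREMENTS):
    """Map each feature group to the list of its missing modules."""
    return {group: missing_modules(mods) for group, mods in requirements.items()}


def has_tabix_backend():
    """True if either pysam or a tabix executable on PATH is available."""
    return (importlib.util.find_spec("pysam") is not None
            or shutil.which("tabix") is not None)


for group, missing in report().items():
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{group}: {status}")
print("tabix backend available:", has_tabix_backend())
```

The check only looks for installed modules and does not import them, so it stays fast even when the ML stack is present.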
Ensembl REST (GrameneOryza) — stable
Documented, login-free access for Oryza sativa gene, region, variant-effect, and sequence queries. Verified live.
curl -s "https://rest.ensembl.org/lookup/id/Os03g0752800?content-type=application/json"
curl -s "https://rest.ensembl.org/overlap/region/oryza_sativa/3:31031753-31041563?feature=gene;feature=variation;content-type=application/json"
curl -s "https://rest.ensembl.org/vep/oryza_sativa/region/3:31037240-31037240/A?content-type=application/json"
curl -s "https://rest.ensembl.org/sequence/id/Os03t0752800?type=genomic;content-type=application/json"
SNP-Seek REST
Genotype-by-region, variety, and SNP queries follow the SNP-Seek II design (Mansueto et al. 2017).
Exact endpoint paths/params are being confirmed with the IRRI team. Until then, prefer remote tabix (below) or the precomputed allele-frequency table for an equivalent, credential-free query.
Remote tabix — query 19,035 genomes without downloading them
The single most efficient access technique: the per-reference VCFs are bgzip-compressed and tabix-indexed, so any locus can be streamed over HTTPS.
tabix -h https://<public-host>/19K-RGP/IRGSP-1.0/19K-RGP.IRGSP-1.0.snps.vcf.gz 3:31031753-31041563
bcftools view -r 3:31031753-31041563 https://<host>/.../19K-RGP.IRGSP-1.0.snps.vcf.gz \
  | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n'
import pysam
vcf = pysam.VariantFile("https://<host>/.../19K-RGP.IRGSP-1.0.snps.vcf.gz")
# region= takes a 1-based, inclusive string, matching the tabix call above;
# positional start/stop arguments to fetch() are 0-based and half-open.
for rec in vcf.fetch(region="3:31031753-31041563"):
    print(rec.chrom, rec.pos, rec.ref, rec.alts)
This requires the VCFs to be served with HTTP range support; the one-time bgzip/tabix preparation is coordinated with the Gramene team.
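Two details tend to trip up remote-tabix code: tabix and bcftools regions are 1-based and inclusive, while the positional start/stop arguments of pysam's fetch() are 0-based and half-open; and genotypes arrive as tuples of allele indices with None for missing calls. The sketch below shows one way to handle both. The helper names (region_to_pysam, alt_allele_frequency) are hypothetical and are not oryza19k functions; treating every non-reference allele as "alt" is a simplifying assumption for multi-allelic sites.

```python
import re

_REGION = re.compile(r"^([^:]+):([\d,]+)-([\d,]+)$")


def region_to_pysam(region: str):
    """Convert 'contig:start-end' (1-based, inclusive) to (contig, start, stop)
    in the 0-based, half-open convention of pysam's fetch(contig, start, stop)."""
    m = _REGION.match(region)
    if m is None:
        raise ValueError(f"not a contig:start-end region: {region!r}")
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    return m.group(1), start - 1, end


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference (index > 0).

    `genotypes` is an iterable of GT tuples such as (0, 1) or (None, None).
    Returns None when no allele was called.
    """
    called = [a for gt in genotypes if gt is not None
              for a in gt if a is not None]
    if not called:
        return None
    return sum(1 for a in called if a > 0) / len(called)


# Usage with an open pysam.VariantFile named `vcf` (needs network access):
# contig, start, stop = region_to_pysam("3:31031753-31041563")
# for rec in vcf.fetch(contig, start, stop):
#     af = alt_allele_frequency(s["GT"] for s in rec.samples.values())
```

For common questions, the precomputed allele-frequency table avoids this per-record work entirely; the sketch is mainly useful for loci or sample subsets the table does not cover.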
Oryza CLIMtools — tables (no REST API)
CLIMtools is an R/Shiny resource; the supported programmatic path is its downloadable result tables, read directly in pandas/R. See the CLIMtools examples.
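As an illustration of the "read the tables directly" path, the sketch below filters a tab-separated table for one gene. Everything about the table layout is an assumption: the column names (gene_id, climate_variable, score) and the toy rows are placeholders, not the real CLIMtools schema, so adjust them to the header of the file actually downloaded. The standard library is used so the example runs anywhere; with pandas, reading the file via pandas.read_csv(path, sep="\t") and filtering on the gene column would be the usual route.

```python
import csv
import io


def rows_for_gene(handle, gene_id, gene_column="gene_id"):
    """Return all rows of a tab-separated table whose gene column equals gene_id."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or gene_column not in reader.fieldnames:
        raise KeyError(f"column {gene_column!r} not found in table header")
    return [row for row in reader if row[gene_column] == gene_id]


# Toy stand-in for a downloaded table; column names and values are placeholders.
toy_table = io.StringIO(
    "gene_id\tclimate_variable\tscore\n"
    "Os03g0752800\tvariable_a\t0.1\n"
    "PLACEHOLDER_GENE\tvariable_a\t0.2\n"
    "Os03g0752800\tvariable_b\t0.3\n"
)
hits = rows_for_gene(toy_table, "Os03g0752800")
print(len(hits), [row["climate_variable"] for row in hits])

# With a real file:
# with open(path, newline="", encoding="utf-8") as fh:
#     hits = rows_for_gene(fh, "Os03g0752800")
```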
Precomputed summaries
The precomputed summaries are published as a DOI-citable Zenodo deposit (mirrored on the Gramene FTP and the KAUST repository), so the most common questions need no large query.
| Table | Contents | Format |
|---|---|---|
| Allele frequencies (core SNPs) | 165,640 SNPs × global + per-group frequency | TSV |
| Accession passport | accession, varietal group, #phenotypes scored | TSV |
| Phenotypes | accession × 24 traits | TSV |
| Genomic-prediction benchmark | 23 models × 5 traits (Spearman, R², time) | TSV |
| Per-gene variant summary · HEV · GEA hits · SHAP→gene · haplotypes | exported by the platform teams | TSV |
import oryza19k as o19
af = o19.summary_table("allele_freq_by_group") # per-group allele frequencies
bm = o19.summary_table("benchmark") # model x trait benchmark
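Once a summary table is loaded, typical follow-up questions are simple reductions. As a sketch, the function below picks the best-scoring model per trait from benchmark-style rows. The column names (trait, model, spearman) are assumptions about the table layout, and the toy rows are placeholders rather than real benchmark results. If summary_table returns a pandas DataFrame (also an assumption), its rows could be passed in via bm.to_dict("records").

```python
def best_model_per_trait(rows, trait_key="trait", model_key="model",
                         score_key="spearman"):
    """For each trait, return (model, score) of the row with the highest score."""
    best = {}
    for row in rows:
        trait = row[trait_key]
        score = float(row[score_key])
        if trait not in best or score > best[trait][1]:
            best[trait] = (row[model_key], score)
    return best


# Placeholder rows; names and numbers are illustrative only.
toy_rows = [
    {"trait": "trait_1", "model": "model_a", "spearman": 0.1},
    {"trait": "trait_1", "model": "model_b", "spearman": 0.2},
    {"trait": "trait_2", "model": "model_a", "spearman": 0.4},
    {"trait": "trait_2", "model": "model_b", "spearman": 0.3},
]
best = best_model_per_trait(toy_rows)
for trait, (model, score) in sorted(best.items()):
    print(trait, model, score)
```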