Example workflows

Three runnable, cross-platform recipes anchored to results in the paper. Each ships as a notebook (Colab / Docker) with committed outputs and falls back to precomputed tables, so it still runs when a remote service is unavailable.

Notebooks on GitHub ↗

A — From the MADS50/MADS14 locus to a heading-date hypothesis

Question: does natural variation at MADS14 in Xian/Indica track local temperature, and can I turn that into a testable heading-date prediction?

1. Resolve the gene (GrameneOryza / Ensembl REST) → MADS14, chr3:31,031,753–31,041,563.
2. Variants & effects in the locus (overlap + VEP).
3. Genotypes across XI via remote tabix (or SNP-Seek).
4. Climate association (Oryza CLIMtools) → top variable BIO6 (min temp, coldest month).
5. Join climate + phenotype, reproduce the haplotype–climate signal.
6. Predict heading date genome-wide and rank candidates (pre-trained model).
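
The gene lookup and the variant overlap can also be approached directly through the public Ensembl REST API. The sketch below only builds the request URLs and leaves the network call in a separate helper. It assumes that the rice annotation is served from rest.ensembl.org under the species name oryza_sativa and that the gene ID resolves there; parse_region, lookup_url, overlap_url and fetch_json are illustrative helper names, not part of oryza19k.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server hosts the rice annotation used here.
ENSEMBL_REST = "https://rest.ensembl.org"


def parse_region(region):
    """Split 'chrom:start-end' (1-based, inclusive) into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    if start > end:
        raise ValueError(f"start after end in region {region!r}")
    return chrom, start, end


def lookup_url(gene_id):
    """URL for the Ensembl 'lookup/id' endpoint."""
    return f"{ENSEMBL_REST}/lookup/id/{gene_id}"


def overlap_url(region, feature="variation", species="oryza_sativa"):
    """URL for the Ensembl 'overlap/region' endpoint, filtered by feature type."""
    chrom, start, end = parse_region(region)
    query = urlencode({"feature": feature})
    return f"{ENSEMBL_REST}/overlap/region/{species}/{chrom}:{start}-{end}?{query}"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body (needs network access)."""
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


chrom, start, end = parse_region("3:31031753-31041563")
locus_length = end - start + 1  # inclusive coordinates
print(lookup_url("Os03g0752800"))
print(overlap_url("3:31031753-31041563"))
print(locus_length)
```

Calling fetch_json(overlap_url("3:31031753-31041563")) would return the variants overlapping the interval as a list of records, provided the service is reachable. The interval itself is 31,041,563 - 31,031,753 + 1 = 9,811 bp long.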

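import oryza19k as o19

gene     = o19.lookup_gene("Os03g0752800")                  # resolve the gene: MADS14
variants = o19.region_features("3:31031753-31041563", feature="variation")  # variants in the locus
clim     = o19.climate_for_gene("MADS14", group="XI")       # climate association -> BIO6
# `genotypes` is the XI genotype matrix from the tabix / SNP-Seek step (loading call not shown here)
preds    = o19.predict_trait(genotypes, "hdg_80head")       # genome-wide heading-date prediction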
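
The join-and-compare step can be sketched without any project-specific API. The example below joins a haplotype table and a BIO6 table on accession ID, splits BIO6 by haplotype, and runs a simple permutation test on the difference in means. All accession IDs, haplotype labels and temperatures are made-up toy values, and the permutation test is a stand-in chosen for illustration; the paper's own association method may differ.

```python
import random
from statistics import fmean


def join_by_accession(haplotype, bio6):
    """Inner join: keep only accessions present in both tables."""
    return {acc: (haplotype[acc], bio6[acc]) for acc in haplotype if acc in bio6}


def split_by_haplotype(joined):
    """Group the climate values by haplotype label."""
    groups = {}
    for hap, value in joined.values():
        groups.setdefault(hap, []).append(value)
    return groups


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): haplotype call and BIO6 in degrees Celsius per accession.
haplotype = {f"acc{i}": ("H1" if i <= 6 else "H2") for i in range(1, 14)}
bio6 = {
    "acc1": 18.0, "acc2": 19.5, "acc3": 20.1, "acc4": 21.0, "acc5": 18.7, "acc6": 22.0,
    "acc7": 5.2, "acc8": 6.8, "acc9": 7.5, "acc10": 8.1, "acc11": 9.0, "acc12": 5.9,
}  # acc13 has no climate record and drops out of the join

joined = join_by_accession(haplotype, bio6)
groups = split_by_haplotype(joined)
p_value = permutation_pvalue(groups["H1"], groups["H2"])
print(len(joined), {hap: round(fmean(vals), 2) for hap, vals in groups.items()}, p_value)
```

A permutation test makes no distributional assumption, but it does not correct for population structure. On real data the signal should be checked against subpopulation and geography before it is read as a haplotype effect.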

B — A rare TB1/FC1 high-effect variant and tiller number

Question: find carriers of the rare TB1/FC1 promoter variant that strengthens an OsBZR1 motif, and check the tiller-number association.

1. Locate TB1/FC1 (Ensembl) — FC1, chr3:28,428,504–28,430,438.
2. Genotype the site across the panel; get allele frequency (≈38 carriers, 0.95%).
3. Confirm the regulatory consequence (VEP + the high-effect-variant table).
4. Pull tiller number + group/geography for carriers.
5. Effect size with a 95% CI, plus a relatedness/geography confounding check.
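
One way to obtain the effect size and its 95% CI is a percentile bootstrap on the difference in mean tiller number between carriers and non-carriers. The sketch below uses simulated tiller counts, not data from the paper, and bootstrap_diff_ci is an illustrative helper name; the notebook itself may use a different estimator.

```python
import random
from statistics import fmean


def bootstrap_diff_ci(carriers, others, n_boot=2000, alpha=0.05, seed=0):
    """Difference in means with a percentile bootstrap confidence interval."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group sizes fixed.
        c = rng.choices(carriers, k=len(carriers))
        o = rng.choices(others, k=len(others))
        diffs.append(fmean(c) - fmean(o))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return fmean(carriers) - fmean(others), lower, upper


# Simulated data (assumption): 38 carriers and 400 non-carriers.
sim = random.Random(42)
carriers = [sim.gauss(14.0, 3.0) for _ in range(38)]
others = [sim.gauss(11.0, 3.0) for _ in range(400)]

diff, lower, upper = bootstrap_diff_ci(carriers, others)
print(f"difference = {diff:.2f} tillers, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With only 38 observations in the carrier group, the interval width is driven almost entirely by that group, which is why the point estimate alone says little.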

With ~38 carriers, always report the confidence interval and confirm the carriers are not a single clade or locale before interpreting the effect.
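
A quick first pass at the clade/locale check is to ask how concentrated the carriers are in a single group or a single place. The sketch below is a crude proxy, assuming each carrier has a subpopulation label and a locale label; the labels, the 0.8 threshold and the function names are all hypothetical. A kinship matrix or principal components would be the more rigorous check.

```python
from collections import Counter


def top_share(labels):
    """Return the most common label and the fraction of items carrying it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def flag_concentration(groups, locales, threshold=0.8):
    """Flag carrier sets dominated by one group or one locale.

    The threshold is an arbitrary illustrative cut-off, not a published value.
    """
    report = {}
    for name, labels in (("group", groups), ("locale", locales)):
        label, share = top_share(labels)
        report[name] = {"top": label, "share": share, "flag": share >= threshold}
    return report


# Hypothetical labels for 38 carriers.
carrier_groups = ["groupA"] * 20 + ["groupB"] * 10 + ["groupC"] * 8
carrier_locales = ["locale1"] * 34 + ["locale2"] * 4

report = flag_concentration(carrier_groups, carrier_locales)
for name, row in report.items():
    print(name, row["top"], round(row["share"], 3), row["flag"])
```

In this toy case the carriers are spread across groups but 34 of 38 come from one locale, so an apparent tiller-number effect could equally reflect local environment or local ancestry.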

C — Whole-collection trait imputation & benchmarking

Question: I have genotypes for accessions with no measured grain weight. Can I predict it, and how does the method compare with GBLUP?

1. Get the core-SNP matrix (precomputed table or stream regions).
2. Load a pre-trained model and predict on the missing set.
3. Benchmark against rrBLUP / GBLUP / BayesA–C and tabular deep models.
4. Interpret via the SHAP-top-SNP → gene table.

import oryza19k as o19
bench = o19.summary_table("benchmark")                   # 23 models x 5 traits
bench[bench["trait"] == "Heading date"].sort_values("spearman", ascending=False).head()
#  CatBoost 0.838 · XGBoost 0.837 · RandomForest 0.834 · LightGBM 0.833 · GBLUP 0.830
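
The spearman column ranks models by rank correlation between predicted and observed trait values. For readers who want to score their own predictions the same way, here is a minimal sketch of Spearman's coefficient computed as the Pearson correlation of average ranks. The observed and predicted values are invented, and how the benchmark table itself was computed (for example the cross-validation scheme) is not specified here.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation, valid with ties."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented heading dates in days: observed versus model prediction.
observed = [80, 95, 102, 88, 110]
predicted = [82, 90, 105, 91, 100]
rho = spearman(observed, predicted)
print(round(rho, 3))  # 0.8
```

Because the score depends only on ranks, a model can do well here while being miscalibrated in absolute days, so an error metric such as RMSE is a useful companion.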