19K-RGP — Data Access & Usability Package
Resource: Unity in Diversity: A Global Atlas of 19,035 Rice Genomes (19K Rice Genome Project, 19K-RGP)

Purpose: a unified, reproducible guide to accessing, querying, and analyzing the 19K-RGP across its data platforms, covering APIs, precomputed summaries, and runnable example workflows.
The resource at a glance
| Rice genomes | 19,035 (9,309 newly sequenced + 9,726 public) |
| Reference assemblies | 5 platinum genomes — Nipponbare IRGSP‑1.0 (GJ), IR64RS2 (XI, new gap‑free), MH63RS3 (XI), ARC 10497 (cB), N22 (cA) |
| Variants | ~57 million across references; 2.02M high‑effect regulatory variants |
| Analyses | population genomics · rare variants · regulatory (HEV) modeling · genome × environment · small RNA · AlphaFold3 protein models · AI trait prediction |
The platforms (one documentation page each)
| Platform | What it's for | Programmatic access |
|---|---|---|
| SNP‑Seek v3 (snp-seek.irri.org) | Interactive genotypes, haplotypes, allele frequencies, phenotypes | REST API (endpoints to confirm with IRRI) |
| GrameneOryza (oryza.gramene.org) | Genome browser, search, FTP of extended variants + predicted effects | Ensembl REST · BioMart · remote tabix |
| Oryza CLIMtools (gramene.org/CLIMtools/oryza_19K-RGP) | Climate ↔ genome (G×E) associations | Downloadable tables (no REST API) |
| Code & Models (github.com/YongZhou2019/19K-RGP) | Pipelines + pre‑trained AI trait‑prediction models | Git, Docker, Colab |
| Archives & Bulk (NCBI · EVA · KAUST) | Raw reads, full variant archives, citable DOIs | Accession download |
What this package contains
Reviewer_webpage/
├── README.md ← you are here
├── supplementary_note_10/ Expanded "Data Access and Visualization" note (§§10.1–10.6)
├── notebooks/ Runnable example workflows + the `oryza19k` access-cookbook helper
├── precomputed_tables/ DOI-citable summary tables (+ Zenodo deposit manifest)
├── response/ Point-by-point reply to the reviewer + rebuttal paragraph
└── site/ Static documentation website (vanilla HTML/CSS/JS)
    ├── index.html start.html workflows.html api.html about.html
    ├── platforms/ one page per platform (README · Tutorial · Workflow · Examples)
    ├── partials/ shared header/footer/assistant
    └── assets/ css · js · data · fonts · img · icons
How it maps to the three asks
- APIs → `site/api.html` (Access matrix + endpoint reference) and §§10.1–10.5 of the note; honest per‑platform tiers, copy‑paste curl/Python/R.
- Precomputed summaries → `precomputed_tables/` (allele frequencies, per‑gene variant summaries, the core‑SNP matrix, trait tables, benchmarks), published to Zenodo with a DOI.
- Example workflows → `notebooks/` + `site/workflows.html`: three end‑to‑end recipes anchored to results in the paper, runnable in Colab or Docker, with committed outputs and offline fallback.
Efficient data access — the headline technique
You do not need to download 19,035 genomes to query a locus. The package documents remote tabix streaming of the bgzipped VCFs (slice any region across all accessions over HTTPS), plus precomputed summary tables for the most common questions.
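Remote tabix streaming can be sketched as follows. This is a minimal illustration, not the package's `oryza19k` helper: the VCF URL is a placeholder, the function names are hypothetical, and it assumes the `tabix` command-line tool from htslib (built with HTTPS support) is on the `PATH` and that a tabix index file sits next to the bgzipped VCF on the server.

```python
import subprocess

# Placeholder URL (an assumption): substitute the real bgzipped VCF location
# given in the platform documentation.
VCF_URL = "https://example.org/19k-rgp/chr01.vcf.gz"


def region_string(chrom, start, end):
    """Format a 1-based, inclusive tabix region such as 'chr01:1000-2000'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}:{start}-{end}"


def tabix_command(url, chrom, start, end, with_header=False):
    """Build the argument list for a remote tabix query (nothing is executed)."""
    cmd = ["tabix"]
    if with_header:
        cmd.append("-h")  # also print the VCF header lines
    cmd += [url, region_string(chrom, start, end)]
    return cmd


def fetch_region(url, chrom, start, end):
    """Run tabix and return the matching VCF records as a list of lines.

    tabix reads the index plus only the compressed blocks that overlap the
    region, so the full file is never downloaded.
    """
    result = subprocess.run(
        tabix_command(url, chrom, start, end),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

A call such as `fetch_region(VCF_URL, "chr01", 1_000_000, 1_010_000)` would return one line per variant in that 10 kb window, with all accessions as genotype columns. The chromosome name must match the naming used inside the VCF, which may differ between reference assemblies.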
Conventions & compliance (please read before contributing)
- No credentials in any tracked file. Review‑only passwords live solely in the editor/cover‑letter channel. The website ships credential‑free; any review build toggle lives in an untracked `site/assets/js/config.local.js`.
- No endpoint is documented unless it returns HTTP 200 with no login. Endpoints are tagged by access tier; anything unverified is served as a precomputed table instead.
- Notebooks always run for a reader — every networked cell ships with committed output and falls back to a precomputed table if a live service is unavailable.
- Cite DOIs, not bare hostnames, for anything durable.
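The notebook fallback rule in the list above can be sketched as a small loader that tries the live service first and reads a committed table if the request fails. This is an illustrative pattern only, not the notebooks' actual code: the function names, the tab-separated format, and any URL or file path passed in are assumptions.

```python
import csv
import io
import urllib.request
from pathlib import Path


def fetch_live(url, timeout=10):
    """Return the response body as text; raises on network errors or a non-200 status."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if resp.status != 200:
            raise OSError(f"unexpected HTTP status {resp.status}")
        return resp.read().decode("utf-8")


def load_table(url, fallback_path, fetch=fetch_live):
    """Try the live service first, then fall back to a committed TSV file.

    Returns (rows, source), where source is 'live' or 'precomputed'.
    """
    try:
        text = fetch(url)
        source = "live"
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSError subclasses
        text = Path(fallback_path).read_text(encoding="utf-8")
        source = "precomputed"
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return rows, source
```

Returning the source alongside the rows lets a notebook cell state whether its output came from the live service or from the precomputed table, so a reader can tell the two apart.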
Status
Local‑first build (this folder). Public deployment (org GitHub Pages) and a Zenodo DOI are prepared; the site will be deployed and the DOI minted only after co‑author sign‑off. See the approved plan for the full design and open items.