19K-RGP — Data Access & Usability Package

Resource: Unity in Diversity: A Global Atlas of 19,035 Rice Genomes (19K Rice Genome Project, 19K-RGP)
Purpose: a unified, reproducible guide to accessing, querying, and analyzing the 19K-RGP across its data platforms, covering APIs, precomputed summaries, and runnable example workflows.

This package was assembled to make a large, multi-platform genomics resource easy to use: clear documentation, copy-paste API recipes, end-to-end example notebooks, and precomputed summary tables, tied together by a small website and an in-page assistant.

The resource at a glance


Rice genomes	19,035 (9,309 newly sequenced + 9,726 public)
Reference assemblies	5 platinum genomes — Nipponbare IRGSP‑1.0 (GJ), IR64RS2 (XI, new gap‑free), MH63RS3 (XI), ARC 10497 (cB), N22 (cA)
Variants	~57 million across references; 2.02M high‑effect regulatory variants
Analyses	population genomics · rare variants · regulatory (HEV) modeling · genome × environment · small RNA · AlphaFold3 protein models · AI trait prediction

The platforms (one documentation page each)

Platform	What it's for	Programmatic access
SNP‑Seek v3 (`snp-seek.irri.org`)	Interactive genotypes, haplotypes, allele frequencies, phenotypes	REST API (endpoints to confirm with IRRI)
GrameneOryza (`oryza.gramene.org`)	Genome browser, search, FTP of extended variants + predicted effects	Ensembl REST · BioMart · remote `tabix`
Oryza CLIMtools (`gramene.org/CLIMtools/oryza_19K-RGP`)	Climate ↔ genome (G×E) associations	Downloadable tables (no REST API)
Code & Models (`github.com/YongZhou2019/19K-RGP`)	Pipelines + pre‑trained AI trait‑prediction models	Git, Docker, Colab
Archives & Bulk (NCBI · EVA · KAUST)	Raw reads, full variant archives, citable DOIs	Accession download

What this package contains

Reviewer_webpage/
├── README.md                  ← you are here
├── supplementary_note_10/     Expanded "Data Access and Visualization" note (§§10.1–10.6)
├── notebooks/                 Runnable example workflows + the `oryza19k` access-cookbook helper
├── precomputed_tables/        DOI-citable summary tables (+ Zenodo deposit manifest)
├── response/                  Point-by-point reply to the reviewer + rebuttal paragraph
└── site/                      Static documentation website (vanilla HTML/CSS/JS)
    ├── index.html  start.html  workflows.html  api.html  about.html
    ├── platforms/             one page per platform (README · Tutorial · Workflow · Examples)
    ├── partials/              shared header/footer/assistant
    └── assets/                css · js · data · fonts · img · icons

How it maps to the three asks

APIs → site/api.html (Access matrix + endpoint reference) and §§10.1–10.5 of the note; each platform is labeled with the access tier it actually supports, with copy‑paste curl/Python/R examples.
Precomputed summaries → precomputed_tables/ (allele frequencies, per‑gene variant summaries, the core‑SNP matrix, trait tables, benchmarks), prepared for deposit on Zenodo under a DOI.
Example workflows → notebooks/ + site/workflows.html: three end‑to‑end recipes anchored to results in the paper, runnable in Colab or Docker, with committed outputs and offline fallback.
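
As a rough illustration of how a precomputed summary can answer a common question without touching the VCFs, the sketch below filters an allele-frequency table by region and frequency. The column names (`chrom`, `pos`, `ref`, `alt`, `alt_freq`) and the inline rows are invented for the example; the actual schema of the files in precomputed_tables/ may differ.

```python
import csv
import io

# Tiny inline stand-in for a precomputed allele-frequency table.
# Column names and values are invented for illustration only.
SAMPLE_TSV = """chrom\tpos\tref\talt\talt_freq
chr1\t1050\tA\tG\t0.012
chr1\t2300\tC\tT\t0.340
chr2\t1100\tG\tA\t0.004
"""


def variants_in_region(handle, chrom, start, end, max_alt_freq=1.0):
    """Return rows inside chrom:start-end whose alt_freq is <= max_alt_freq."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        pos = int(row["pos"])
        freq = float(row["alt_freq"])
        if row["chrom"] == chrom and start <= pos <= end and freq <= max_alt_freq:
            # Keep the original fields, with numeric columns converted.
            rows.append({**row, "pos": pos, "alt_freq": freq})
    return rows


# Rare variants (alt allele frequency <= 5%) in a hypothetical window on chr1.
rare = variants_in_region(io.StringIO(SAMPLE_TSV), "chr1", 1000, 3000, max_alt_freq=0.05)
print(rare)
```

For a real table, the `io.StringIO` handle would be replaced by an open file handle; the filtering logic stays the same.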

Efficient data access — the headline technique

You do not need to download 19,035 genomes to query a locus. The package documents remote tabix streaming of the bgzipped VCFs (slice any region across all accessions over HTTPS), plus precomputed summary tables for the most common questions.
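
A minimal sketch of the remote-slicing idea above, in Python: it assembles and runs a `tabix` region query against a bgzipped VCF served over HTTPS. The URL, chromosome name, and coordinates are placeholders rather than real 19K-RGP paths, and the sketch assumes the `tabix` binary from htslib is on `PATH` and that the server hosts the `.tbi` index next to the VCF.

```python
import shutil
import subprocess

# Placeholder only: substitute the real VCF URL from the platform documentation.
EXAMPLE_VCF_URL = "https://example.org/path/to/variants.vcf.gz"


def format_region(chrom, start, end):
    """Return a 1-based, inclusive tabix region string such as 'chr1:100-200'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}:{start}-{end}"


def build_tabix_command(vcf_url, chrom, start, end, with_header=True):
    """Assemble the argument list for a remote tabix region query."""
    cmd = ["tabix"]
    if with_header:
        cmd.append("-h")  # also print the VCF header lines
    cmd += [vcf_url, format_region(chrom, start, end)]
    return cmd


def stream_region(vcf_url, chrom, start, end):
    """Yield VCF lines for one region, read from tabix's standard output."""
    if shutil.which("tabix") is None:
        raise RuntimeError("tabix (htslib) was not found on PATH")
    cmd = build_tabix_command(vcf_url, chrom, start, end)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for tabix, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"tabix exited with status {proc.returncode}")
```

With a real URL in place of the placeholder, iterating over `stream_region(EXAMPLE_VCF_URL, "chr1", 1000000, 1010000)` would yield the header plus the records in that window. tabix fetches the index file and then only the compressed blocks covering the region, so the full VCF is never downloaded.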

Conventions & compliance (please read before contributing)

No credentials in any tracked file. Review‑only passwords live solely in the editor/cover‑letter channel. The website ships credential‑free; any review build toggle lives in an untracked site/assets/js/config.local.js.
No endpoint is documented unless it returns HTTP 200 with no login. Endpoints are tagged by access tier; anything unverified is served as a precomputed table instead.
Notebooks always run for a reader — every networked cell ships with committed output and falls back to a precomputed table if a live service is unavailable.
Cite DOIs, not bare hostnames, for anything durable.
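
The "notebooks always run" rule above can be sketched as a small loader that tries a live service first and falls back to a local precomputed copy. This is an illustrative pattern, not the `oryza19k` helper's actual interface: the function names, the tab-separated format, and the `(rows, source)` return shape are all assumptions.

```python
import csv
import urllib.request


def fetch_live(url, timeout=10):
    """Download a tab-separated table and return it as a list of dict rows."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(text.splitlines(), delimiter="\t"))


def read_precomputed(path):
    """Read the same table from a local precomputed copy."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_table(url, fallback_path, fetch=fetch_live):
    """Try the live service first; fall back to the precomputed table on failure.

    Returns (rows, source), where source is "live" or "precomputed".
    """
    try:
        return fetch(url), "live"
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        print(f"Live service unavailable ({exc}); using {fallback_path}")
        return read_precomputed(fallback_path), "precomputed"
```

Passing the fetcher as a parameter keeps the fallback path testable offline, and returning the source label lets a notebook cell state which copy of the data the reader is looking at.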

Status

Local‑first build (this folder). Public deployment (org GitHub Pages) and the Zenodo deposit are prepared; the site goes live and the DOI is minted only after co‑author sign‑off. See the approved plan for the full design and open items.