# 19K-RGP — Data Access & Usability Package

**Resource:** *Unity in Diversity: A Global Atlas of 19,035 Rice Genomes* (19K Rice Genome Project, 19K-RGP)
**Purpose:** a unified, reproducible guide to **how to access, query, and analyze** the 19K-RGP across its data platforms — APIs, precomputed summaries, and runnable example workflows.

> This package was assembled to make a large, multi-platform genomics resource easy to use: clear documentation, copy-paste API recipes, end-to-end example notebooks, and precomputed summary tables, tied together by a small website and an in-page assistant.

---

## The resource at a glance

| Item | Details |
|---|---|
| Rice genomes | **19,035** (9,309 newly sequenced + 9,726 public) |
| Reference assemblies | **5 platinum genomes** — Nipponbare IRGSP‑1.0 (GJ), IR64RS2 (XI, new gap‑free), MH63RS3 (XI), ARC 10497 (cB), N22 (cA) |
| Variants | **~57 million** across the five references; 2.02 million high‑effect regulatory variants (HEVs) |
| Analyses | population genomics · rare variants · regulatory (HEV) modeling · genome × environment · small RNA · AlphaFold3 protein models · AI trait prediction |

## The platforms (one documentation page each)

| Platform | What it's for | Programmatic access |
|---|---|---|
| **SNP‑Seek v3** (`snp-seek.irri.org`) | Interactive genotypes, haplotypes, allele frequencies, phenotypes | REST API *(endpoints to confirm with IRRI)* |
| **GrameneOryza** (`oryza.gramene.org`) | Genome browser, search, FTP of extended variants + predicted effects | Ensembl REST · BioMart · remote `tabix` |
| **Oryza CLIMtools** (`gramene.org/CLIMtools/oryza_19K-RGP`) | Climate ↔ genome (G×E) associations | Downloadable tables (no REST API) |
| **Code & Models** (`github.com/YongZhou2019/19K-RGP`) | Pipelines + pre‑trained AI trait‑prediction models | Git, Docker, Colab |
| **Archives & Bulk** (NCBI · EVA · KAUST) | Raw reads, full variant archives, citable DOIs | Accession download |
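
GrameneOryza's programmatic tier is listed as Ensembl REST. The minimal standard-library sketch below shows what a region query could look like, assuming the instance exposes the standard Ensembl REST `overlap/region` endpoint. `BASE_URL` is a placeholder because this README does not give the host, and the species name `oryza_sativa` and the chromosome naming (`"1"` vs `"chr01"`) are assumptions to check against the GrameneOryza platform page.

```python
import json
import urllib.parse
import urllib.request

# Placeholder (assumption): replace with the Ensembl REST base URL published
# on the GrameneOryza platform page; it is not stated in this README.
BASE_URL = "https://rest.example.org"


def overlap_region_url(species: str, chrom: str, start: int, end: int,
                       feature: str = "gene") -> str:
    """Build an Ensembl REST overlap/region URL (1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    region = f"{chrom}:{start}-{end}"
    # Ensembl REST selects the response format via the content-type parameter.
    query = urllib.parse.urlencode({"feature": feature,
                                    "content-type": "application/json"})
    return f"{BASE_URL}/overlap/region/{species}/{region}?{query}"


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode its JSON body (raises on network or HTTP errors)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage, once `BASE_URL` points at a real service: `genes = fetch_json(overlap_region_url("oryza_sativa", "1", 1_000_000, 1_010_000))`. Building the URL separately from fetching it keeps the query logic testable offline.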

---

## What this package contains

```
Reviewer_webpage/
├── README.md                  ← you are here
├── supplementary_note_10/     Expanded "Data Access and Visualization" note (§§10.1–10.6)
├── notebooks/                 Runnable example workflows + the `oryza19k` access-cookbook helper
├── precomputed_tables/        DOI-citable summary tables (+ Zenodo deposit manifest)
├── response/                  Point-by-point reply to the reviewer + rebuttal paragraph
└── site/                      Static documentation website (vanilla HTML/CSS/JS)
    ├── index.html  start.html  workflows.html  api.html  about.html
    ├── platforms/             one page per platform (README · Tutorial · Workflow · Examples)
    ├── partials/              shared header/footer/assistant
    └── assets/                css · js · data · fonts · img · icons
```

## How it maps to the reviewer's three requests

- **APIs** → `site/api.html` (access matrix + endpoint reference) and §§10.1–10.5 of the note; per‑platform access tiers that state plainly what each service does and does not support, with copy‑paste curl/Python/R snippets.
- **Precomputed summaries** → `precomputed_tables/` (allele frequencies, per‑gene variant summaries, the core‑SNP matrix, trait tables, benchmarks), published to Zenodo with a DOI.
- **Example workflows** → `notebooks/` + `site/workflows.html`: three end‑to‑end recipes anchored to results in the paper, runnable in Colab or Docker, with committed outputs and an offline fallback.
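
Most common questions can be answered from the precomputed tables without touching a live service. The sketch below filters an allele-frequency table by region and minimum frequency. It assumes a tab-separated layout, optionally gzipped, with hypothetical columns `chrom`, `pos` and `alt_freq`; the real column names should be taken from the Zenodo deposit manifest.

```python
import csv
import gzip
from typing import Iterable, Iterator, TextIO


def open_table(path: str) -> TextIO:
    """Open a TSV table for reading, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def rows_in_region(lines: Iterable[str], chrom: str, start: int, end: int,
                   min_freq: float = 0.0) -> Iterator[dict]:
    """Yield rows on `chrom` with start <= pos <= end and alt_freq >= min_freq.

    The column names (chrom, pos, alt_freq) are hypothetical placeholders.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["chrom"] != chrom:
            continue
        pos = int(row["pos"])
        if start <= pos <= end and float(row["alt_freq"]) >= min_freq:
            yield row
```

Typical use: `with open_table("allele_freq.tsv.gz") as fh: hits = list(rows_in_region(fh, "1", 1_000_000, 1_010_000, min_freq=0.05))`, where the file name is illustrative. Streaming rows through a generator keeps memory flat even for genome-wide tables.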

## Efficient data access — the headline technique

You do **not** need to download 19,035 genomes to query a locus. The package documents **remote `tabix` streaming** of the bgzipped VCFs (slice any region across all accessions over HTTPS), plus precomputed summary tables for the most common questions.
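
Remote slicing works because the VCFs are bgzip-compressed and indexed: htslib reads the index first and then fetches only the compressed blocks that overlap the requested region. The sketch below wraps the `tabix` command line from Python. It assumes `tabix` from htslib is on `PATH` (built with libcurl for HTTPS support) and that the `.tbi` index sits next to the `.vcf.gz` on the server; the helper names are illustrative and the VCF URL must be taken from the GrameneOryza FTP or archive pages.

```python
import subprocess
from typing import Iterable, Iterator, List


def tabix_command(vcf_url: str, chrom: str, start: int, end: int,
                  header: bool = False) -> List[str]:
    """Build a tabix command that slices chrom:start-end (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    cmd = ["tabix"]
    if header:
        cmd.append("-h")  # also print the VCF header lines
    cmd += [vcf_url, f"{chrom}:{start}-{end}"]
    return cmd


def parse_vcf_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield CHROM/POS/ID/REF/ALT plus the raw per-sample columns of each record."""
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        yield {"chrom": f[0], "pos": int(f[1]), "id": f[2], "ref": f[3],
               "alt": f[4].split(","), "samples": f[9:]}


def stream_region(vcf_url: str, chrom: str, start: int, end: int) -> List[dict]:
    """Run tabix against a remote bgzipped, indexed VCF and parse its output."""
    proc = subprocess.run(tabix_command(vcf_url, chrom, start, end),
                          capture_output=True, text=True, check=True)
    return list(parse_vcf_records(proc.stdout.splitlines()))
```

Each record carries one sample column per accession, so with ~19,000 accessions it is worth keeping regions small (a gene or a few kilobases) and aggregating as you go rather than holding large slices in memory.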

---

## Conventions & compliance (please read before contributing)

- **No credentials in any tracked file.** Review‑only passwords live solely in the editor/cover‑letter channel. The website ships credential‑free; any review build toggle lives in an untracked `site/assets/js/config.local.js`.
- **No endpoint is documented unless it returns HTTP 200 with no login.** Endpoints are tagged by access tier; anything unverified is served as a precomputed table instead.
- **Notebooks always run for a reader** — every networked cell ships with committed output and falls back to a precomputed table if a live service is unavailable.
- **Cite DOIs, not bare hostnames**, for anything durable.
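
The "HTTP 200 with no login" and "notebooks always run" rules can be checked mechanically. The sketch below is one possible shape for both, not the `oryza19k` helper's actual API: the function names are hypothetical, a redirect (for example to a login page) is conservatively treated as a failure, and the fallback assumes the precomputed table is a tab-separated file.

```python
import csv
import urllib.error
import urllib.request
from typing import Callable, List, Tuple


def returns_200_without_login(url: str, timeout: float = 15.0) -> bool:
    """True only if a plain, credential-free GET answers HTTP 200 without redirecting."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen follows redirects; a changed final URL may mean a login page.
            return resp.status == 200 and resp.geturl() == url
    except (OSError, ValueError):  # URLError/HTTPError are OSErrors; bad URLs raise ValueError
        return False


def load_with_fallback(fetch_live: Callable[[], List[dict]],
                       fallback_path: str) -> Tuple[List[dict], str]:
    """Try the live service; on a network or decoding error, read the precomputed TSV.

    Returns (rows, source), where source is "live" or "precomputed".
    """
    try:
        return fetch_live(), "live"
    except (OSError, ValueError):  # includes urllib errors and json.JSONDecodeError
        with open(fallback_path, newline="") as fh:
            return list(csv.DictReader(fh, delimiter="\t")), "precomputed"
```

Returning the data source alongside the rows lets a notebook print which path it took, so a reader can tell live results from the committed fallback.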

## Status

Local‑first build (this folder). Public deployment (org GitHub Pages) and the Zenodo deposit are prepared; the site will be published and the DOI minted only after co‑author sign‑off. See the approved plan for the full design and open items.
