Precomputed Summary Tables — Zenodo Deposit Manifest
DOI-citable summary tables for the 19K Rice Genome Project (19K-RGP), so that the most common questions about the resource can be answered without running a large query. This directly answers Reviewer #1.6's request for precomputed summaries.
Hosting plan (at publication): Zenodo (mints a versioned DOI) → mirrored on the Gramene FTP (…/oryza/19K-RGP/summaries/) and the KAUST repository (DOI 10.25781/N3AF-NP78). Cite the Zenodo DOI, not a hostname.
Provenance: tables 1–4 are regenerated from local data by generate_tables.py (reproducible, no credentials). Tables 5–9 are exported from cluster-scale variant data by the platform teams and are listed here so the catalog is complete.
Generated here (generate_tables.py)
| # | File | Key columns | Format / size | Answers |
|---|---|---|---|---|
| 1 | genomic_prediction_benchmark.tsv | model, family, trait, spearman, r2, training_time_sec | TSV, ~10 KB (115 rows = 23 models × 5 traits) | example workflows; head-to-head vs rrBLUP/GBLUP/BayesA-C (Rev #1.4) |
| 2 | accession_passport.tsv | accession, varietal_group, n_phenotypes_scored | TSV, ~0.5 MB (16,779 accessions) | precomputed summaries; panel composition (Rev #1.2) |
| 3 | phenotypes_by_accession.tsv | accession, 24 phenotypes, varietal_group | TSV, ~3 MB | precomputed summaries; example workflows |
| 4 | allele_frequency_core_snps.tsv | snp, chrom, pos, ref, alt, n_called, alt_allele_freq, maf, af_\<group\> | TSV, ~15 MB (165,640 SNPs) | precomputed summaries; rare/group-specific variants (Rev #1.3, #2) |
Varietal groups present in the local labels: XI_indica, GJ_trop, GJ_temp, GJ-adm, cAus, cBasmati (plus "Unknown" for accessions not yet group-assigned).
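As a rough illustration of how the table 4 columns relate to each other, the sketch below derives n_called, alt_allele_freq, maf and the per-group af_<group> values for a single SNP. It assumes genotypes are coded as diploid alt-allele dosages (0, 1, 2) with None for a missing call; that encoding and the function name allele_freq_summary are assumptions for illustration, and generate_tables.py may compute these columns differently.

```python
from collections import defaultdict


def allele_freq_summary(dosages, groups):
    """Summarise one SNP.

    dosages: alt-allele count per accession (0, 1 or 2), None = missing call.
             (Assumed encoding; not confirmed by the manifest.)
    groups:  varietal-group label per accession, in the same order.
    """
    # Keep only accessions with a called genotype.
    called = [(d, g) for d, g in zip(dosages, groups) if d is not None]
    n_called = len(called)
    if n_called == 0:
        return {"n_called": 0, "alt_allele_freq": None, "maf": None}

    # Each called diploid accession contributes two alleles.
    alt_af = sum(d for d, _ in called) / (2 * n_called)
    row = {
        "n_called": n_called,
        "alt_allele_freq": alt_af,
        "maf": min(alt_af, 1 - alt_af),
    }

    # Per-group alt-allele frequency, one af_<group> key per label seen.
    per_group = defaultdict(list)
    for d, g in called:
        per_group[g].append(d)
    for g, ds in per_group.items():
        row[f"af_{g}"] = sum(ds) / (2 * len(ds))
    return row


# Toy input: six accessions, one missing call.
example = allele_freq_summary(
    [0, 2, 2, None, 1, 2],
    ["XI_indica", "XI_indica", "GJ_trop", "GJ_trop", "cAus", "cAus"],
)
print(example)
```

Under this encoding, a variant that is rare overall (low maf) but has a high af_<group> in one group is the kind of group-specific variant the table is meant to surface.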
Exported by the platform teams (cluster-scale; full ~57 M variants)
| # | File | Key columns | Owner | Answers |
|---|---|---|---|---|
| 5 | gene_variant_summary.tsv | gene, n_variants, n_high_impact, consequence_counts, pangene_class | Gramene | precomputed summaries |
| 6 | hev_table.tsv | variant, gene, pred_delta_accessibility, group, motif (e.g. OsBZR1) | KAUST | example workflow B; transparency (Rev #2.2/#2.4) |
| 7 | gea_hits_by_group.tsv | variant, climate_variable, effect, P, FDR, group | CLIMtools | lets reviewers re-test FDR (Rev #2.6) |
| 8 | mads14_mads50_haplotypes.tsv | gene, group, haplotype, defining_variants, mean_heading_date, mean_BIO6 | KAUST | flagship workflow A reproducibility |
| 9 | shap_top_snp_to_gene.tsv | trait, snp, shap_importance, nearest_gene, known_novel | KAUST (Halimi) | interpretability; example workflow C |
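For table 7, re-testing FDR from the P column could look like the sketch below. It implements the Benjamini-Hochberg step-up adjustment in plain Python. The manifest does not say which FDR procedure produced the FDR column, nor whether the correction was applied per group, per climate variable, or across all hits, so both the choice of Benjamini-Hochberg and the set of P values passed in are assumptions to be checked against the export.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy P values (made up for illustration, not taken from gea_hits_by_group.tsv).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A reviewer could apply this to the P values within whichever grouping the export used and compare the result to the FDR column; a systematic mismatch would suggest a different procedure or a different test family.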
Large master artifacts (Zenodo, not git)
- genotypes_with_phenos.parquet: the 165,640 core-SNP × 16,779-accession genotype+phenotype matrix (the input to the AI models). ~1–2 GB.
- Per-reference bgzipped + tabix-indexed VCFs (for remote tabix streaming): served from the public Gramene FTP at publication.
Reproduce tables 1–4
cd precomputed_tables
python generate_tables.py # all four (allele-freq pass reads ~1–2 GB)
python generate_tables.py --quick # skip the allele-frequency pass
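After regenerating, a quick sanity check on table 1 is to confirm that it forms a complete model × trait grid (the manifest states 115 rows = 23 models × 5 traits). The sketch below is one way to do that with the standard library; benchmark_shape is a hypothetical helper, not part of generate_tables.py, and the demo rows use placeholder trait names and made-up scores.

```python
import csv
import io


def benchmark_shape(handle):
    """Return (n_rows, n_models, n_traits, is_full_grid) for an open TSV handle."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    models = {r["model"] for r in rows}
    traits = {r["trait"] for r in rows}
    pairs = {(r["model"], r["trait"]) for r in rows}
    # Full grid: no duplicate pairs and every model scored on every trait.
    is_full_grid = len(rows) == len(pairs) == len(models) * len(traits)
    return len(rows), len(models), len(traits), is_full_grid


# In-memory demo: 2 models x 2 placeholder traits, scores are made up.
demo = io.StringIO(
    "model\ttrait\tspearman\n"
    "rrBLUP\ttrait_a\t0.50\n"
    "rrBLUP\ttrait_b\t0.40\n"
    "GBLUP\ttrait_a\t0.52\n"
    "GBLUP\ttrait_b\t0.41\n"
)
print(benchmark_shape(demo))
```

For the real file, one would open genomic_prediction_benchmark.tsv and, if the manifest's counts hold, expect 115 rows, 23 models, 5 traits and a full grid.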