Archives & Bulk Data
The cold-storage, citable layer: raw reads, the full variant archive, and permanent DOIs — for when you need the underlying data rather than an interactive query.
README — accessions & what's there
| Resource | Accession / DOI | Contents |
|---|---|---|
| NCBI BioProject | PRJNA954521 | 9K-RGP clean sequencing data |
| NCBI BioProject | PRJNA597070 | 3K-RGP data |
| NCBI BioProject | PRJNA952097 | PacBio HiFi for IR64RS2 |
| European Variation Archive | PRJEB105137 | Variants — SNP set ERZ28769989, InDel set ERZ28769990 |
| KAUST repository | DOI 10.25781/N3AF-NP78 | Raw reads (permanent handle) |
Use this when… you need raw FASTQ/gVCF, the complete variant archive, or a permanent citable handle — not for interactive queries.
Tutorial — fetch from the archives
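- Look up the accession from the table above: PRJEB and ERZ accessions at the EVA/ENA, PRJNA accessions at NCBI (BioProject/SRA).
- Download through the archive's web interface or with command-line tools (for example `wget` for EVA files, `prefetch` and `fasterq-dump` for SRA runs).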
- Verify checksums before use.
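The checksum step can be sketched as a small shell helper. This is a minimal sketch, assuming the archive lists an MD5 digest for the file and that GNU `md5sum` is available (on macOS, `md5 -q` is the usual substitute). The function name `verify_md5` is hypothetical, and the expected digest must be copied from the archive's own listing.

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED_DIGEST
# Hypothetical helper: compares a file's MD5 digest with the value
# published by the archive. Returns non-zero on a mismatch.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

A call would look like `verify_md5 19K-RGP.IRGSP-1.0.snps.vcf.gz "$EXPECTED_MD5"`, where `EXPECTED_MD5` holds the digest taken from the archive. Because the function returns non-zero on a mismatch, it can gate later steps with `&&`.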
Workflow — get a VCF, then slice it locally
Goal: get the genome-wide SNP VCF for one reference from the EVA, then slice a region locally with tabix (bridging to efficient access).
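The slicing half of this goal can be sketched as follows, once the VCF has been downloaded and indexed as in the examples in this section. This is a sketch under stated assumptions: the helper names `region_string` and `slice_vcf` are hypothetical, and the contig name `Chr1` and the coordinates are placeholders. Run `tabix -l` on the file first to see the contig names it actually uses.

```shell
#!/usr/bin/env bash
# Build a tabix region string CHROM:START-END (1-based, inclusive).
region_string() {
    printf '%s:%d-%d\n' "$1" "$2" "$3"
}

# Extract one region from a bgzipped, tabix-indexed VCF.
# -h keeps the VCF header so the output is a valid standalone VCF.
slice_vcf() {
    local vcf="$1" region="$2" out="$3"
    tabix -h "$vcf" "$region" > "$out"
}

# Placeholder values: "Chr1" and the coordinates are assumptions, and the
# contig name must match the CHROM column of the downloaded file.
VCF="19K-RGP.IRGSP-1.0.snps.vcf.gz"

# Guarded so nothing runs unless tabix and the .tbi index are present.
if command -v tabix >/dev/null 2>&1 && [ -f "$VCF.tbi" ]; then
    tabix -l "$VCF"    # list the contig names present in the index
    slice_vcf "$VCF" "$(region_string Chr1 1 100000)" region.vcf
fi
```

Because tabix reads only the indexed blocks that overlap the region, the slice does not require decompressing the whole genome-wide file.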
Examples
# Download a variant file by accession, then index for tabix
wget https://ftp.ebi.ac.uk/eva/.../19K-RGP.IRGSP-1.0.snps.vcf.gz
tabix -p vcf 19K-RGP.IRGSP-1.0.snps.vcf.gz
# Fetch raw reads for one run
prefetch SRR24065674 && fasterq-dump SRR24065674
Access & cite
Public: open, anonymous archive access. Some records are embargoed during peer review and released at publication.
Cite the manuscript and the relevant archive accession (see Cite & about).