Archives & Bulk Data

The cold-storage, citable layer: raw reads, the full variant archive, and permanent DOIs — for when you need the underlying data rather than an interactive query.

README — accessions & what's there

Resource | Accession / DOI | Contents
NCBI BioProject | PRJNA954521 | 9K-RGP clean sequencing data
NCBI BioProject | PRJNA597070 | 3K-RGP data
NCBI BioProject | PRJNA952097 | PacBio HiFi for IR64RS2
European Variation Archive | PRJEB105137 | Variants: SNP set ERZ28769989, InDel set ERZ28769990
KAUST repository | DOI 10.25781/N3AF-NP78 | Raw reads (permanent handle)

Use this when you need raw FASTQ/gVCF files, the complete variant archive, or a permanent citable handle. It is not meant for interactive queries.

Tutorial — fetch from the archives

  1. Look up the accession (above) at the archive that hosts it: EVA/ENA for PRJEB accessions, NCBI SRA for PRJNA BioProjects.
  2. Download via the archive's interface or command-line tools.
  3. Verify checksums before use.

Workflow — get a VCF, then slice it locally

Goal: get the genome-wide SNP VCF for one reference from the EVA, then slice a region locally with tabix (bridging to efficient access).

Examples

# Download the SNP VCF from the EVA FTP site (the path is elided here)
wget https://ftp.ebi.ac.uk/eva/.../19K-RGP.IRGSP-1.0.snps.vcf.gz
# Build the .tbi index so tabix can query regions (requires a bgzip-compressed VCF)
tabix -p vcf 19K-RGP.IRGSP-1.0.snps.vcf.gz
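
Once the index exists, slicing a region is a single tabix query. The following is a minimal sketch, assuming htslib's tabix is installed. The helper names make_region and slice_vcf are hypothetical, and so is the contig name chr01 in the usage comment: it must match the contig names in the VCF, which tabix -l lists.

```shell
#!/bin/sh
# Sketch only: helper names and the contig name in the usage line are hypothetical.

# make_region CONTIG START END
# Prints a tabix region string "CONTIG:START-END" (1-based, inclusive)
# after checking that both coordinates are integers with 1 <= START <= END.
make_region() {
  for coord in "$2" "$3"; do
    case "$coord" in
      ''|*[!0-9]*) echo "coordinates must be positive integers" >&2; return 1 ;;
    esac
  done
  [ "$2" -ge 1 ] && [ "$2" -le "$3" ] || { echo "need 1 <= start <= end" >&2; return 1; }
  printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# slice_vcf VCF REGION
# Writes the records overlapping REGION to stdout; -h keeps the VCF header
# so the output is itself a valid VCF.
slice_vcf() {
  tabix -h "$1" "$2"
}

# Usage (needs the indexed file from the example above):
#   tabix -l 19K-RGP.IRGSP-1.0.snps.vcf.gz    # list the contig names
#   slice_vcf 19K-RGP.IRGSP-1.0.snps.vcf.gz "$(make_region chr01 1 100000)" > slice.vcf
```

Validating the coordinates before calling tabix gives a clear error message for a malformed region.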

Access & cite

Public: open, anonymous archive access. Some records are embargoed during peer review and released at publication.

Cite the manuscript and the relevant archive accession (see Cite & about).