Data Releases

An ARGO data release is a curated set of clinical and molecular data submitted to the ARGO Data Platform. Data releases happen approximately four times a year and are cumulative: each release adds to the data made available in earlier releases. Released data can be browsed using the File Repository and downloaded using a client tool, provided that access to controlled data has been granted. To apply for access to controlled data, see the DACO application process.

Data Release 8.0

Release Date: March 11, 2024

Data Release 8.0 includes new clinical and molecular data from the Mutographs (MUTO-INTL) and Polyethnic-1000 (P1000-US) programs.

This release includes:

  • MUTO-INTL: 448 donors with clinical data, aligned WGS reads, and Mutect2 variant calls
    • MUTO-INTL: 285/448 donors additionally have Sanger variant calls
    • MUTO-INTL: 9/448 donors have open access Sanger variant calls
  • P1000-US: 67 donors with clinical data, aligned WGS reads, and Mutect2 variant calls

Data Release 7.0

Release Date: November 06, 2023

New Updates

ICGC ARGO is excited to announce the first release of new clinical data, available now for the Australian Pancreatic Cancer Genome Initiative (APGI-AU), Pancreatic Cancer (PACA-CA), and Papillary Thyroid Cancer (PTC-SA) programs. APGI-AU was a member of the original ICGC-25K data portal as PACA-AU.

This release includes:

  • APGI-AU: New clinical data has been added for 84 donors
  • PACA-CA: New clinical data has been added for 38 donors
  • PTC-SA: New clinical data has been added for 239 donors

Data Release 6.0

Release Date: April 05, 2023

New Updates

ICGC ARGO is excited to announce the first release of a new RNA-Seq analysis workflow: RNA Seq Alignment, available now for transcriptome sequencing data from the Australian Pancreatic Cancer Genome Initiative (APGI-AU). APGI-AU was a member of the original ICGC-25K data portal as PACA-AU. While clinical data is being actively submitted by the program, a selection of its RNA-Seq data has been reprocessed against the latest GRCh38 Human Reference Genome using the ARGO RNA Seq Alignment Workflow.

This release includes:

  • APGI-AU: 28 previously released donors now have:
    • STAR aligned reads in both genome and transcript coordinates
    • HISAT2 aligned reads in genome coordinates
    • High confidence splice junctions generated by both STAR and HISAT2

Data Release 5.0

Release Date: March 7, 2022

New Updates

Data Release 5.0 includes the first release of a new DNA-Seq Pipeline: Open Access Variant Filtering Workflow, available now for both WXS and WGS samples.

This release includes:

  • APGI-AU: 89 previously released donors now have Open Access variant calls
  • PACA-CA: 136 previously released donors now have Open Access variant calls
  • OCCAMS-GB: 400 previously released donors now have Open Access variant calls
  • PTC-SA: 239 previously released donors now have Open Access variant calls
  • LUCA-KR: 167 previously released donors now have Open Access variant calls

Data Release 4.0

Release Date: September 27, 2021

New Updates

This release makes data from a new program available:

  • APGI-AU: 89 new donors with aligned WXS reads, Sanger and Mutect2 variant calls.

Additionally, more data has been released for:

  • OCCAMS-GB: 168 new donors with aligned WGS reads, Sanger and Mutect2 variant calls.
  • LUCA-KR: 8 new donors with aligned reads, Sanger and Mutect2 variant calls.

Data Release 3.0

Release Date: May 10, 2021

New Updates

We are happy to announce the release of a new DNA-Seq Pipeline variant calling data type: GATK Mutect2, available now for both WXS and WGS samples.

This release includes:

  • PACA-CA: 1 new donor with aligned reads, Sanger and Mutect2 variant calls. Additionally, 135 previously released donors now have GATK Mutect2 variant calls.
  • OCCAMS-GB: 130 new donors with aligned WGS reads, Sanger and Mutect2 variant calls. Additionally, 94 previously released donors now have Mutect2 variant calls.
  • PTC-SA: 99 new donors with aligned WXS reads, Sanger and Mutect2 variant calls. Additionally, 140 previously released donors now have Mutect2 variant calls.
  • LUCA-KR: 130 new donors with aligned reads, Sanger and Mutect2 variant calls. Additionally, 29 previously released donors now have Mutect2 variant calls.

Bug Fixes

  • PACA-CA/DO35226 was removed because only its tumour sample has completed workflow processing. We are working on resolving processing issues for this donor.

Data Release 2.0

Release Date: October 23, 2020

New Updates

Data Release 2.0 includes the first release of whole exome sequencing (WXS) data from papillary thyroid cancer (PTC-SA) and the first release of whole genome sequencing (WGS) data from lung cancer (LUCA-KR). Both programs were members of the original ICGC-25K data portal. While clinical data will soon be submitted by the programs, a selection of their molecular data has been reprocessed against the latest GRCh38 Human Reference Genome using the ARGO DNA Seq Pipeline.

This release includes:

  • PACA-CA: 52 new donors with WGS data, for a total of 133 donors with aligned WGS reads and 121 donors with variant calls
  • PACA-CA: 12 new donors with aligned WXS reads
  • LUCA-KR: 29 donors with aligned WGS reads, and 29 donors with variant calls.
  • PTC-SA: 142 donors with aligned WXS reads, and 140 donors with variant calls.

In addition to these genomic files, the first release of pipeline Quality Control (QC) metrics is also included. For a breakdown of the provided QC metrics, please review the QC metrics file type documentation. For an explanation of how QC metrics are generated through the pipeline, please review the DNA-Seq Analysis Pipeline documentation.
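
Because releases are cumulative, the PACA-CA WGS total in the list above can be cross-checked against Data Release 1.0, which reported 81 donors with aligned WGS reads; adding the 52 new donors from this release gives the stated 133. A minimal Python sketch of that tally follows. The dictionary layout and the function name `cumulative_totals` are illustrative assumptions, not part of any ARGO tool or API.

```python
# PACA-CA donors with aligned WGS reads added in each release,
# copied from the release notes (1.0: 81 donors, 2.0: 52 new donors).
# The structure and names here are hypothetical, for illustration only.
NEW_WGS_DONORS = {
    "1.0": 81,
    "2.0": 52,
}


def cumulative_totals(new_per_release):
    """Return the running donor total after each release, in insertion order."""
    totals = {}
    running = 0
    for release, added in new_per_release.items():
        running += added
        totals[release] = running
    return totals


totals = cumulative_totals(NEW_WGS_DONORS)
print(totals["2.0"])  # prints 133
```

The same check only works where a release states how many donors are new; counts described as "previously released donors" refer to donors already included in earlier totals.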

Bug Fixes

  • Three OCCAMS-GB donors had alignment and variant data removed from this release due to a file naming issue in the analysis workflow. The data is being reprocessed and will be included again in the next release.

Known Issues

None to report.

Data Release 1.0

Release Date: June 19, 2020

New Updates

ICGC ARGO is excited to announce its initial data release, including whole genome sequencing (WGS) data from pancreatic cancer (PACA-CA) and esophageal cancer (OCCAMS-GB). Both programs were members of the original ICGC-25K data portal. While clinical data will soon be submitted by the programs, a selection of their molecular data has been reprocessed against the latest GRCh38 Human Reference Genome using the ARGO DNA Seq Pipeline. The resulting somatic mutation calls include single nucleotide variations (SNVs), insertions and deletions (indels), copy number variations (CNVs) and structural variations (SVs).

This release includes:

  • PACA-CA: 81 donors with aligned WGS reads, and 62 donors with variant calls.
  • OCCAMS-GB: 96 donors with aligned WGS reads, and 95 donors with variant calls.