Quality Control Metrics

ICGC ARGO provides access to data from ARGO member programs generated through the standardized ARGO Analysis pipelines. This page lists the quality control (QC) metrics analyses and data files that those pipelines generate.

Quality control metrics are collected and recorded at several checkpoints in the ARGO Analysis pipeline to ensure that only high-quality data is released. Each set of QC metrics is associated with the analysis or file set that it annotates.

Sequencing QC

Data files containing metrics related to sequencing quality.

File Types

| Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
|---|---|---|---|---|---|
| Read Group Metrics | `*.ubam_qc_metrics.tgz` | Data files containing read group (lane) level QC metrics generated by Picard:CollectQualityYieldMetrics. | `qc_metrics` | Quality Control Metrics | DNA Seq Alignment |

Aligned Reads QC

Data files containing various quality control metrics for aligned CRAM files.

File Types

| Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
|---|---|---|---|---|---|
| Alignment Metrics | `*.qc_metrics.tgz` | Data files containing comprehensive statistics for aligned CRAM files generated by Samtools:stats. | `qc_metrics` | Quality Control Metrics | DNA Seq Alignment |
| Alignment Metrics | `*.bas_metrics.tgz` | Data files containing comprehensive statistics for aligned CRAM files generated by Sanger:bam_stats. | `qc_metrics` | Quality Control Metrics | Sanger WGS Variant Calling, Sanger WXS Variant Calling |
| Alignment Metrics | `*.collectrnaseqmetrics.tgz` | Data files containing comprehensive statistics for aligned CRAM files generated by Picard:CollectRnaSeqMetrics. | `qc_metrics` | Quality Control Metrics | RNA Seq Alignment |
| Duplicates Metrics | `*.duplicates_metrics.tgz` | Data files containing duplicates metrics generated by biobambam2:bammarkduplicates2. Multiple reads that match at the same position in the genome are located and tagged as duplicate reads in the CRAM file, since duplicate reads are defined as originating from a single fragment of DNA. | `qc_metrics` | Quality Control Metrics | DNA Seq Alignment, RNA Seq Alignment |
| OxoG Metrics | `*.oxog_metrics.tgz` | Data files containing OxoG metrics generated by GATK:CollectOxoGMetrics. OxoG quantifies the error rate resulting from oxidative artifacts for aligned CRAM files. | `qc_metrics` | Quality Control Metrics | DNA Seq Alignment |
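
The Duplicates Metrics description above says that reads matching at the same genomic position are tagged as duplicates. The idea can be illustrated with a deliberately simplified sketch. This is not the biobambam2 algorithm: the function name, the read tuples, and the use of strand as part of the key are assumptions made for illustration, and real duplicate markers take more information into account.

```python
def mark_duplicates(reads):
    """Flag every read after the first one seen at a given position.

    `reads` is an iterable of (name, chromosome, position, strand) tuples.
    This record layout is hypothetical and chosen only for this sketch.
    Returns a list of (name, is_duplicate) tuples in input order.
    """
    seen = set()
    flagged = []
    for name, chrom, pos, strand in reads:
        key = (chrom, pos, strand)
        # A read is a duplicate if an earlier read had the same key.
        flagged.append((name, key in seen))
        seen.add(key)
    return flagged


# Hypothetical reads, for illustration only.
example_reads = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 100, "+"),  # same position and strand as r1
    ("r3", "chr1", 100, "-"),  # same position, opposite strand
    ("r4", "chr2", 100, "+"),  # different chromosome
]
flags = mark_duplicates(example_reads)
```

In this sketch only `r2` is flagged, because it is the only read that shares chromosome, position, and strand with an earlier read.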

Analysis QC

Data files containing metrics for assessing the quality of various bioinformatics analyses.

File Types

| Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
|---|---|---|---|---|---|
| Ploidy, Tumour Purity | `*.ascat_metrics.tgz` | Data files containing tumour purity and ploidy estimated by the Sanger ASCAT CNV caller. | `qc_metrics` | Quality Control Metrics | Sanger WGS Variant Calling |
| Genotyping Stats | `*.genotyped_gender_metrics.tgz` | Data files containing genotyping stats of CRAM files from the same donor generated by Sanger:compareBamGenotypes, including the fraction of matched genotypes and inferred donor gender. | `qc_metrics` | Quality Control Metrics | Sanger WGS Variant Calling |
| Variant Filtering Stats | `*.mutect_filtering_metrics.tgz` | Data file generated by GATK:FilterMutectCalls, including the probability threshold chosen to optimize the F score and the number of false positives and false negatives expected for each filter from this choice. | `qc_metrics` | Quality Control Metrics | GATK Mutect2 Variant Calling |
| Variant Callable Stats | `*.mutect_callable_metrics.tgz` | Data file generated by GATK:Mutect2, containing the number of sites considered callable for GATK Mutect2 calling, that is, sites with read depth equal to or higher than the callable-depth threshold (default 10). | `qc_metrics` | Quality Control Metrics | GATK Mutect2 Variant Calling |
| Runtime Stats | `*.timings-supplement.tgz` | Runtime statistics collected for the different processing steps of the Sanger variant caller. | `variant_calling_supplement` | Quality Control Metrics | Sanger WGS Variant Calling, Sanger WXS Variant Calling |
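
The Variant Callable Stats entry defines a callable site as one whose read depth is equal to or higher than the callable-depth threshold, which defaults to 10. A minimal sketch of that definition follows. It does not reproduce how Mutect2 computes the figure internally; the function name and the example depths are hypothetical.

```python
def count_callable_sites(depths, callable_depth=10):
    """Count sites whose read depth is at least `callable_depth`.

    `depths` is an iterable of per-site read depths. The default threshold
    of 10 matches the default stated on this page.
    """
    return sum(1 for depth in depths if depth >= callable_depth)


# Hypothetical per-site read depths, for illustration only.
example_depths = [0, 9, 10, 11, 35]
callable_sites = count_callable_sites(example_depths)
```

With these example depths, the sites at depth 10, 11, and 35 meet the threshold, while those at 0 and 9 do not.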

Sample QC

Data files containing assessment metrics for sample quality control.

File Types

| Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
|---|---|---|---|---|---|
| Cross Sample Contamination | `*.contamination_metrics.tgz` | Data files containing cross-sample contamination estimates from either Sanger:verifyBamHomChk or GATK:CalculateContamination, which help determine whether the sample may be contaminated or swapped. | `qc_metrics` | Quality Control Metrics | Sanger WGS Variant Calling, GATK Mutect2 Variant Calling |
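
When working with downloaded QC files, it can help to map a filename back to its data subtype using the filename patterns listed on this page. The following is a minimal sketch using Python's standard `fnmatch` module. The patterns and subtypes are copied from the tables above, while the function name and the example filenames are hypothetical.

```python
from fnmatch import fnmatchcase

# Filename patterns and data subtypes as listed in the tables on this page.
QC_PATTERNS = [
    ("*.ubam_qc_metrics.tgz", "Read Group Metrics"),
    ("*.qc_metrics.tgz", "Alignment Metrics"),
    ("*.bas_metrics.tgz", "Alignment Metrics"),
    ("*.collectrnaseqmetrics.tgz", "Alignment Metrics"),
    ("*.duplicates_metrics.tgz", "Duplicates Metrics"),
    ("*.oxog_metrics.tgz", "OxoG Metrics"),
    ("*.ascat_metrics.tgz", "Ploidy, Tumour Purity"),
    ("*.genotyped_gender_metrics.tgz", "Genotyping Stats"),
    ("*.mutect_filtering_metrics.tgz", "Variant Filtering Stats"),
    ("*.mutect_callable_metrics.tgz", "Variant Callable Stats"),
    ("*.timings-supplement.tgz", "Runtime Stats"),
    ("*.contamination_metrics.tgz", "Cross Sample Contamination"),
]


def classify_qc_file(filename):
    """Return the data subtype for `filename`, or None if no pattern matches."""
    for pattern, subtype in QC_PATTERNS:
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatchcase(filename, pattern):
            return subtype
    return None


# Hypothetical filenames, for illustration only.
read_group = classify_qc_file("sample1.ubam_qc_metrics.tgz")
alignment = classify_qc_file("sample1.qc_metrics.tgz")
unknown = classify_qc_file("sample1.notes.txt")
```

Because the patterns are matched as shell-style globs against the whole filename, a name ending in `.ubam_qc_metrics.tgz` does not also match `*.qc_metrics.tgz`, so each file resolves to a single subtype.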