Quality Control Metrics
ICGC ARGO provides access to data from ARGO member programs generated through the standardized ARGO Analysis pipelines. This page lists the quality control metrics analyses and data files generated by those pipelines.
Quality control metrics are collected and recorded at several checkpoints in the ARGO Analysis pipeline to ensure that only high-quality data is released. All QC metrics are associated with the analysis or file set that they annotate.
Sequencing QC
Data files containing sequencing quality metrics.
File Types
Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
---|---|---|---|---|---|
Read Group Metrics | *.ubam_qc_metrics.tgz | Data files containing read group (lane) level QC metrics generated by Picard:CollectQualityYieldMetrics. | qc_metrics | Quality Control Metrics | DNA Seq Alignment |
Aligned Reads QC
Data files containing various quality control metrics for aligned CRAM files.
File Types
Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
---|---|---|---|---|---|
Alignment Metrics | *.qc_metrics.tgz | Data files containing comprehensive statistics for aligned CRAM files generated by Samtools:stats. | qc_metrics | Quality Control Metrics | DNA Seq Alignment |
Alignment Metrics | *.bas_metrics.tgz | Data files containing comprehensive statistics for aligned CRAM files generated by Sanger:bam_stats. | qc_metrics | Quality Control Metrics | |
Alignment Metrics | *.collectrnaseqmetrics.tgz | Data files containing comprehensive statistics for aligned CRAM files generated by Picard:CollectRnaSeqMetrics. | qc_metrics | Quality Control Metrics | RNA Seq Alignment |
Duplicates Metrics | *.duplicates_metrics.tgz | Data files containing duplicates metrics generated by biobambam2:bammarkduplicates2. Reads that align to the same position in the genome are identified and tagged as duplicates in the CRAM file, because such reads are presumed to originate from a single fragment of DNA. | qc_metrics | Quality Control Metrics | |
OxoG Metrics | *.oxog_metrics.tgz | Data files containing OxoG metrics generated by GATK:CollectOxoGMetrics. OxoG quantifies the error rate resulting from oxidative artifacts for aligned CRAM files. | qc_metrics | Quality Control Metrics | DNA Seq Alignment |
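The duplicate-marking idea described for the Duplicates Metrics row can be illustrated with a minimal sketch. This is a toy model, not the algorithm used by biobambam2:bammarkduplicates2: the `Read` type, its fields, and the rule of keeping the read with the highest summed base quality are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified view of an aligned read."""
    name: str
    chrom: str
    pos: int               # leftmost aligned position
    reverse: bool          # True if aligned to the reverse strand
    base_quality_sum: int  # assumed tie-breaking score


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing (chrom, pos, strand) are treated as copies of one
    DNA fragment; the read with the highest summed base quality is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.reverse)].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the best-scoring read; ties go to the first one seen.
        best = max(group, key=lambda r: r.base_quality_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 100, False, 900),
    Read("r2", "chr1", 100, False, 950),
    Read("r3", "chr1", 100, True, 800),
    Read("r4", "chr2", 500, False, 700),
]
dups = mark_duplicates(reads)
duplication_rate = len(dups) / len(reads)
```

In this example only `r1` is flagged, because it shares position and strand with the higher-scoring `r2`. Production tools also account for paired-end mates, soft clipping, and library identity, which this sketch ignores.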
Analysis QC
Data files containing quality control metrics for the bioinformatics analyses.
File Types
Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
---|---|---|---|---|---|
| *.ascat_metrics.tgz | Data files containing tumour purity and ploidy estimated by the Sanger ASCAT CNV caller. | qc_metrics | Quality Control Metrics | Sanger WGS Variant Calling |
Genotyping Stats | *.genotyped_gender_metrics.tgz | Data files containing genotyping stats of CRAM files from the same donor generated by Sanger:compareBamGenotypes, including the fraction of matched genotypes and inferred donor gender. | qc_metrics | Quality Control Metrics | Sanger WGS Variant Calling |
Variant Filtering Stats | *.mutect_filtering_metrics.tgz | Data file generated by GATK:FilterMutectCalls, including the probability threshold chosen to optimize the F score and the number of false positives and false negatives expected for each filter under that threshold. | qc_metrics | Quality Control Metrics | GATK Mutect2 Variant Calling |
Variant Callable Stats | *.mutect_callable_metrics.tgz | Data file generated by GATK:Mutect2, containing the number of sites considered callable by GATK Mutect2, that is, sites with read depth equal to or higher than callable-depth (default 10). | qc_metrics | Quality Control Metrics | GATK Mutect2 Variant Calling |
Runtime Stats | *.timings-supplement.tgz | Runtime statistics collected for the different processing steps of the Sanger variant caller. | variant_calling_supplement | Quality Control Metrics | |
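Two of the descriptions above (callable sites and the F-score threshold) can be made concrete with a short sketch. It is only an illustration under stated assumptions, not GATK's implementation: both function names are hypothetical, the inputs are made-up per-site depths and per-call error probabilities, and the F score is taken to be the balanced F1 computed from expected counts.

```python
def count_callable_sites(depths, callable_depth=10):
    """Count sites whose read depth is at or above callable_depth."""
    return sum(1 for depth in depths if depth >= callable_depth)


def best_f_score_threshold(error_probs):
    """Pick the error-probability cutoff that maximizes the expected F1 score.

    error_probs[i] is the assumed probability that call i is an artifact.
    Calls with probability <= threshold are kept.
    Returns (threshold, expected_false_positives, expected_false_negatives),
    or None if there are no calls.
    """
    total_true = sum(1 - p for p in error_probs)  # expected real variants
    best = None
    for threshold in sorted(set(error_probs)):
        kept = [p for p in error_probs if p <= threshold]
        exp_tp = sum(1 - p for p in kept)   # kept calls that are real
        exp_fp = sum(kept)                  # kept calls that are artifacts
        exp_fn = total_true - exp_tp        # real variants filtered out
        f_score = 2 * exp_tp / (2 * exp_tp + exp_fp + exp_fn)
        if best is None or f_score > best[0]:
            best = (f_score, threshold, exp_fp, exp_fn)
    return None if best is None else best[1:]


callable_sites = count_callable_sites([0, 5, 10, 12, 9, 30])
threshold, exp_fp, exp_fn = best_f_score_threshold([0.01, 0.02, 0.10, 0.60, 0.95])
```

With these toy inputs, three sites meet the default depth of 10, and the cutoff that maximizes the expected F1 is 0.10. The actual files produced by the pipeline report the values chosen by the tools themselves.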
Sample QC
Data files containing assessment metrics for sample quality control.
File Types
Data Subtypes | Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
---|---|---|---|---|---|
Cross Sample Contamination | *.contamination_metrics.tgz | Data files containing cross sample contamination estimated by either Sanger:verifyBamHomChk or GATK:CalculateContamination, which helps determine whether the sample may be contaminated or swapped. | qc_metrics | Quality Control Metrics | |
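The filename patterns listed in the tables on this page can be used to sort downloaded QC files by section and data subtype. The following is a minimal sketch, assuming the patterns are matched as shell-style globs. The `classify_qc_file` helper and the example filenames are hypothetical and not part of any ARGO tooling, and the ASCAT entry has no subtype because the table leaves that cell empty.

```python
from fnmatch import fnmatchcase

# Filename pattern -> (section, data subtype), copied from the tables above.
QC_FILE_PATTERNS = {
    "*.ubam_qc_metrics.tgz": ("Sequencing QC", "Read Group Metrics"),
    "*.qc_metrics.tgz": ("Aligned Reads QC", "Alignment Metrics"),
    "*.bas_metrics.tgz": ("Aligned Reads QC", "Alignment Metrics"),
    "*.collectrnaseqmetrics.tgz": ("Aligned Reads QC", "Alignment Metrics"),
    "*.duplicates_metrics.tgz": ("Aligned Reads QC", "Duplicates Metrics"),
    "*.oxog_metrics.tgz": ("Aligned Reads QC", "OxoG Metrics"),
    "*.ascat_metrics.tgz": ("Analysis QC", None),
    "*.genotyped_gender_metrics.tgz": ("Analysis QC", "Genotyping Stats"),
    "*.mutect_filtering_metrics.tgz": ("Analysis QC", "Variant Filtering Stats"),
    "*.mutect_callable_metrics.tgz": ("Analysis QC", "Variant Callable Stats"),
    "*.timings-supplement.tgz": ("Analysis QC", "Runtime Stats"),
    "*.contamination_metrics.tgz": ("Sample QC", "Cross Sample Contamination"),
}


def classify_qc_file(filename):
    """Return (section, data subtype) for a QC filename, or None if unknown."""
    for pattern, label in QC_FILE_PATTERNS.items():
        if fnmatchcase(filename, pattern):
            return label
    return None


# Hypothetical filenames, used only to exercise the lookup.
lane_qc = classify_qc_file("sample1.lane1.ubam_qc_metrics.tgz")
aln_qc = classify_qc_file("sample1.aln.cram.qc_metrics.tgz")
unknown = classify_qc_file("sample1.snv.vcf.gz")
```

Note that `*.qc_metrics.tgz` does not match the read group files, because their suffix is `ubam_qc_metrics.tgz` with an underscore rather than a dot before `qc_metrics`.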