DNA-Seq Analysis Pipeline

The DNA-Sequencing (DNA-Seq) analysis pipeline identifies multiple types of somatic variants from both Whole Exome Sequencing (WXS) and Whole Genome Sequencing (WGS) sample data. DNA-Seq analysis is implemented across two main procedures:

  • Sequence Alignment
  • Variant Calling

In the future, these procedures will be extended to include:

  • Variant Masking
  • Variant Annotation
  • Consensus Calling

Alignment

The ARGO Data Platform accepts raw sequencing data in both FASTQ and BAM (aligned or unaligned) formats. The first processing step in the DNA-Seq pipeline is to uniformly align samples to the GRCh38 reference genome. For details, please see the latest version of the ARGO DNA Alignment workflow.

Inputs

  • All alignments are performed using GRCh38 as the human reference genome
  • Submitted FASTQ or BAM file(s)

Preprocessing

  • Submitted sequencing reads (FASTQ or BAM) are converted into lane-level (i.e., read-group-level) BAMs
  • Picard CollectQualityYieldMetrics is used for read group level BAM QC
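
As an illustration of what "read group level" means in practice, the sketch below lists the read groups declared in a SAM/BAM header, since each `@RG` record would correspond to one lane-level BAM. It is a minimal sketch, not the ARGO implementation: the function names and the output file naming scheme are hypothetical, and it assumes the header text has already been extracted from the submitted file.

```python
def read_groups_from_header(header_text):
    """Return one dict of tag -> value per @RG line in a SAM header."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        fields = {}
        for field in line.split("\t")[1:]:
            # Split on the first ':' only, because values such as
            # timestamps may themselves contain ':'.
            tag, _, value = field.partition(":")
            fields[tag] = value
        groups.append(fields)
    return groups


def lane_bam_names(header_text, prefix):
    """Hypothetical naming scheme: one lane-level BAM per read group ID."""
    return [
        f"{prefix}.{rg['ID']}.lane.bam"
        for rg in read_groups_from_header(header_text)
    ]


if __name__ == "__main__":
    example_header = (
        "@HD\tVN:1.6\tSO:unsorted\n"
        "@RG\tID:rg1\tSM:sample1\tPU:flowcell1.1\n"
        "@RG\tID:rg2\tSM:sample1\tPU:flowcell1.2\n"
    )
    for name in lane_bam_names(example_header, "sample1"):
        print(name)
```

Each resulting lane-level BAM would then be the unit on which read-group-level QC is collected.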

Processing

Outputs

  • Aligned read CRAM and index files
  • Alignment QC metrics files

Alignment Workflow

Sanger WGS Variant Calling

Whole genome sequencing (WGS) aligned CRAM files are processed through the Sanger WGS Variant Calling Workflow as tumour/normal pairs. The ARGO DNA-Seq pipeline has adopted the Sanger Whole Genome Sequencing Analysis Docker Image as the base workflow. For details, please see the latest version of the ARGO Sanger WGS Variant Calling workflow.

Inputs

  • Normal WGS aligned CRAM and index files
  • Tumour WGS aligned CRAM and index files
  • Reference files
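
Because variant calling operates on tumour/normal pairs, the inputs have to be matched per donor before the workflow can run. The sketch below shows one way such pairing could be expressed. It is an assumption-laden illustration: the record fields (`donor`, `type`, `cram`) and the rule of pairing every tumour with every normal from the same donor are hypothetical, not the platform's documented matching logic.

```python
from collections import defaultdict


def pair_tumour_normal(samples):
    """Pair each tumour CRAM with each normal CRAM from the same donor.

    `samples` is a list of dicts with the hypothetical keys
    'donor', 'type' ('Normal' or 'Tumour') and 'cram'.
    Donors lacking either a normal or a tumour yield no pairs.
    """
    by_donor = defaultdict(lambda: {"Normal": [], "Tumour": []})
    for sample in samples:
        by_donor[sample["donor"]][sample["type"]].append(sample["cram"])

    pairs = []
    for donor, group in by_donor.items():
        for normal in group["Normal"]:
            for tumour in group["Tumour"]:
                pairs.append((donor, normal, tumour))
    return pairs


if __name__ == "__main__":
    samples = [
        {"donor": "DO1", "type": "Normal", "cram": "DO1.normal.cram"},
        {"donor": "DO1", "type": "Tumour", "cram": "DO1.tumour.cram"},
        {"donor": "DO2", "type": "Tumour", "cram": "DO2.tumour.cram"},
    ]
    for pair in pair_tumour_normal(samples):
        print(pair)
```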

Processing

  • Pindel InDel caller is used for somatic insertion/deletion variant detection.
  • ASCAT CNV caller is used for somatic copy number variant analysis.
  • CaVEMan SNV caller is used for somatic single nucleotide variant analysis.
  • BRASS SV caller is used for somatic structural variation detection.

Collect QC Metrics

  • WGS aligned read statistics are generated by the Sanger bam_stats.pl script. The files containing normal/tumour aligned read statistics are then used by the Pindel and BRASS callers.
  • Cross-sample contamination is estimated by the Sanger verifyBamHomChk.pl script for both normal and tumour samples.
  • Purity and ploidy are estimated by the ASCAT CNV caller.
  • Genotypes of the CRAM files from the matched normal/tumour pair are compared by the Sanger compareBamGenotypes.pl script, which reports the fraction of matched genotypes. It also checks whether the inferred genders match.
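
To make the genotype comparison concrete, the sketch below computes a fraction of matched genotypes over the sites genotyped in both samples, plus a simple gender agreement check. This is a simplified illustration and not the logic of compareBamGenotypes.pl itself: the input representation (a mapping from site identifier to genotype string) and the function names are assumptions made for the example.

```python
def genotype_match_fraction(normal, tumour):
    """Fraction of shared sites whose genotype calls are identical.

    `normal` and `tumour` map a site identifier to a genotype string.
    Returns None when the two samples share no genotyped sites.
    """
    shared = normal.keys() & tumour.keys()
    if not shared:
        return None
    matched = sum(1 for site in shared if normal[site] == tumour[site])
    return matched / len(shared)


def genders_match(normal_gender, tumour_gender):
    """True when the genders inferred for both samples agree."""
    return normal_gender == tumour_gender


if __name__ == "__main__":
    normal = {"site1": "AA", "site2": "AG", "site3": "GG", "site4": "AA"}
    tumour = {"site1": "AA", "site2": "AG", "site3": "AG", "site4": "AA"}
    print(genotype_match_fraction(normal, tumour))  # 0.75
    print(genders_match("XY", "XY"))  # True
```

A low matched fraction or a gender mismatch would suggest that the tumour and normal files do not come from the same donor.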

Outputs

  • SNV VCF and index files
  • InDel VCF and index files
  • CNV VCF and index files
  • SV VCF and index files
  • Variant calling supplement files
  • QC metrics files

Sanger WGS Variant Calling Workflow

Sanger WXS Variant Calling

Whole exome sequencing (WXS) aligned CRAM files are processed through the Sanger WXS Variant Calling Workflow as tumour/normal pairs. The ARGO DNA-Seq pipeline has adopted the Sanger Whole Exome Sequencing Analysis Docker Image as the base workflow. For details, please see the latest version of the ARGO Sanger WXS Variant Calling workflow.

Inputs

  • Normal WXS aligned CRAM and index files
  • Tumour WXS aligned CRAM and index files
  • Reference files

Processing

  • Pindel InDel caller is used for somatic insertion/deletion variant detection.
  • CaVEMan SNV caller is used for somatic single nucleotide variant analysis.
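
The WXS workflow runs a subset of the callers used for WGS. The sketch below encodes the caller lists from this page as a lookup table and selects the callers for a given sequencing strategy. The table structure and the function name are illustrative assumptions, not part of the Sanger workflows or their configuration format.

```python
# Caller -> variant type and the strategies it runs under,
# as listed in the WGS and WXS "Processing" sections of this page.
CALLERS = {
    "Pindel": {"variant_type": "InDel", "strategies": {"WGS", "WXS"}},
    "ASCAT": {"variant_type": "CNV", "strategies": {"WGS"}},
    "CaVEMan": {"variant_type": "SNV", "strategies": {"WGS", "WXS"}},
    "BRASS": {"variant_type": "SV", "strategies": {"WGS"}},
}


def callers_for(strategy):
    """Return the caller names that run for 'WGS' or 'WXS' data."""
    if strategy not in {"WGS", "WXS"}:
        raise ValueError(f"unknown sequencing strategy: {strategy!r}")
    return [
        name for name, info in CALLERS.items()
        if strategy in info["strategies"]
    ]


if __name__ == "__main__":
    print(callers_for("WGS"))  # ['Pindel', 'ASCAT', 'CaVEMan', 'BRASS']
    print(callers_for("WXS"))  # ['Pindel', 'CaVEMan']
```

This also explains why the WXS outputs contain only SNV and InDel VCFs, while the WGS outputs additionally include CNV and SV VCFs.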

Collect QC Metrics

  • WXS aligned read statistics are generated by the Sanger bam_stats.pl script. The files containing normal/tumour aligned read statistics are then used by the Pindel caller.

Outputs

  • SNV VCF and index files
  • InDel VCF and index files
  • Variant calling supplement files
  • QC metrics files

Sanger WXS Variant Calling Workflow