
XML Variant Ingestion

The ARGO Data Platform supports ingestion of variant calls that are submitted in XML format and were called against the GRCh37 (hg19) reference genome. A standardized workflow converts these XML files to VCF format and then lifts the resulting coordinates over to the GRCh38 reference genome. For details, see the latest version of the ARGO xml_variant_ingestion workflow.

Inputs

  • Submitted XML file(s): Containing variant calls based on the GRCh37 (hg19) genome reference.
  • Metadata mapping file: Used to map identifiers and provide additional sequencing and variant calling context.
  • Human reference genome:
    • GRCh38: the target assembly of the liftover.
    • GRCh37: the reference against which the variants in the XML file were called.
  • Genome liftover chain file: Required for coordinate transformation from GRCh37 to GRCh38.
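
To illustrate what the chain file contributes, here is a minimal Python sketch of coordinate mapping using the UCSC chain format (a header line followed by `size dt dq` alignment blocks). It does not reproduce the ARGO workflow, which presumably relies on an established liftover tool. The chain text, the function names `parse_chain` and `lift_position`, and the coordinates are illustrative assumptions, and only forward-strand chains are handled.

```python
# Toy chain in UCSC chain format; the values are made up for illustration only.
TOY_CHAIN = """\
chain 1000 chr1 1000 + 100 400 chr1 1200 + 150 460 1
100 50 60
150
"""

def parse_chain(text):
    """Parse chain text into a list of chains with ungapped alignment blocks."""
    chains = []
    current = None
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # Header: chain score tName tSize tStrand tStart tEnd
            #         qName qSize qStrand qStart qEnd id
            current = {
                "t_name": fields[2],
                "q_name": fields[7],
                "q_strand": fields[9],
                "blocks": [],
            }
            chains.append(current)
            t_pos, q_pos = int(fields[5]), int(fields[10])
        else:
            size = int(fields[0])
            # Each block is (source start, destination start, length), 0-based.
            current["blocks"].append((t_pos, q_pos, size))
            if len(fields) == 3:
                # Skip the gaps that follow this block on each assembly.
                t_pos += size + int(fields[1])
                q_pos += size + int(fields[2])
    return chains

def lift_position(chains, chrom, pos_1based):
    """Map a 1-based source position to the destination assembly, or None."""
    pos0 = pos_1based - 1  # chain coordinates are 0-based
    for chain in chains:
        # Simplification: reverse-strand chains are ignored in this sketch.
        if chain["t_name"] != chrom or chain["q_strand"] != "+":
            continue
        for t_start, q_start, size in chain["blocks"]:
            if t_start <= pos0 < t_start + size:
                return chain["q_name"], q_start + (pos0 - t_start) + 1
    return None  # position falls in a gap or on an unmapped contig

chains = parse_chain(TOY_CHAIN)
print(lift_position(chains, "chr1", 121))
print(lift_position(chains, "chr1", 226))
```

A position that falls inside an aligned block is shifted by that block's offset. A position that falls in a gap between blocks has no counterpart in the target assembly and returns `None`, which is why some records can be dropped during a liftover.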

Processing

  1. XML to VCF Conversion:

    • Variant records are parsed from XML and converted into corresponding VCF records.
    • Variants are separated by type:
      • Short Variants: SNVs and InDels
      • Copy Number Alterations (CNVs)
      • Structural Variants / Rearrangements (SVs)
  2. Reference Genome Liftover:

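Step 1 (XML to VCF conversion and separation by variant type) can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual parser: the XML element and attribute names (`short-variant`, `copy-number-alteration`, `rearrangement`, and their attributes) are hypothetical, the coordinates are toy values, and the breakend orientation used for rearrangements is an assumed simplification.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout with toy coordinates; the real schema is defined
# by the submitted files and the xml_variant_ingestion workflow.
SAMPLE_XML = """
<variant-report>
  <short-variant chrom="chr1" pos="100" ref="A" alt="T"/>
  <short-variant chrom="chr1" pos="200" ref="C" alt="CT"/>
  <copy-number-alteration chrom="chr2" start="500" end="900" type="amplification"/>
  <rearrangement chrom="chr3" pos="700" mate-chrom="chr4" mate-pos="800"/>
</variant-report>
"""

def vcf_line(chrom, pos, ref, alt, info="."):
    """Build one tab-separated VCF data line (ID, QUAL, FILTER left as '.')."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info])

def xml_to_vcf_records(xml_text):
    """Parse the XML and group the VCF lines by variant type."""
    records = defaultdict(list)
    root = ET.fromstring(xml_text.strip())
    for element in root:
        attrs = element.attrib
        if element.tag == "short-variant":
            # SNV if both alleles are a single base, otherwise an InDel.
            is_snv = len(attrs["ref"]) == 1 and len(attrs["alt"]) == 1
            info = "TYPE=snv" if is_snv else "TYPE=indel"
            records["short"].append(
                vcf_line(attrs["chrom"], attrs["pos"], attrs["ref"], attrs["alt"], info)
            )
        elif element.tag == "copy-number-alteration":
            # Symbolic alleles: amplification becomes <DUP>, anything else <DEL>.
            svtype = "DUP" if attrs["type"] == "amplification" else "DEL"
            info = f"SVTYPE={svtype};END={attrs['end']}"
            records["cnv"].append(
                vcf_line(attrs["chrom"], attrs["start"], "N", f"<{svtype}>", info)
            )
        elif element.tag == "rearrangement":
            # Breakend notation; the bracket orientation is assumed here.
            alt = f"N[{attrs['mate-chrom']}:{attrs['mate-pos']}["
            records["sv"].append(
                vcf_line(attrs["chrom"], attrs["pos"], "N", alt, "SVTYPE=BND")
            )
    return dict(records)

for variant_type, lines in xml_to_vcf_records(SAMPLE_XML).items():
    print(variant_type)
    for line in lines:
        print("  " + line)
```

Grouping the records by type before writing mirrors the separation into short variants, CNVs, and SVs described above. A real converter would also write VCF header lines and look up the true reference base from the GRCh37 genome instead of using `N`.
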
Outputs

Workflow Diagram

[Figure: XML-2-VCF workflow diagram]