Clinical Data Validation Rules
Clinical Data Encoding Rules
The data dictionary contains certain data elements regarded as "identifiers". These fields have a descriptor in the data dictionary and include:
- Primary Diagnosis:
- Follow Up:
These fields must be coded specifically for ICGC ARGO purposes using the following rules:
- These identifiers should not be derived from biobank or hospital identifiers or any other personally identifying information. They must be coded in such a way that they cannot be traced back to the individual donors, except by the submitting program. Only the program will keep the key that permits the data to be linked back to the individual donors. This key must not be communicated to the data users.
- Identifiers are assigned by each submitting program and must be unique within all the data submitted by that program (no duplicate IDs allowed).
- Identifiers referring to the same entity should be consistent across separate program submissions and should not be re-used for different entities. For example, the same donor should not be assigned different identifiers in different files or subsequent data submissions.
- Identifiers are case-sensitive.
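The uniqueness and case-sensitivity rules above can be checked before submission. The following is a minimal sketch, not the platform's own validator; the function name `find_duplicate_ids` and the sample identifiers are hypothetical.

```python
from collections import Counter

def find_duplicate_ids(ids):
    """Return identifiers that occur more than once.

    Comparison is case-sensitive, so "DO-001" and "do-001"
    are treated as two different identifiers.
    """
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)

# Hypothetical identifiers assigned to distinct donors by one program.
donor_ids = ["DO-001", "DO-002", "do-001", "DO-002"]
duplicates = find_duplicate_ids(donor_ids)
print(duplicates)  # ['DO-002']
```

Only `DO-002` is reported, because the two spellings of the first identifier differ in case and therefore count as distinct.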
Primary Diagnosis, Treatment and Follow Up Identifiers:
These identifier fields allow for linking across the different clinical events and should be coded using the following rules:
- Each primary diagnosis should be assigned a unique submitter_primary_diagnosis_id, so where a donor has multiple primary diagnoses, each primary diagnosis should have a different submitter_primary_diagnosis_id. You will be required to submit the submitter_primary_diagnosis_id in the Specimen file - this provides information about which primary diagnosis the specimen is linked to. The submitter_primary_diagnosis_id is also required in the Treatment file, so it is understood which primary diagnosis the treatment is being administered for.
- Each treatment regimen in the Treatment file should be assigned a unique submitter_treatment_id. If the treatment regimen consists of chemotherapy, hormone therapy or radiation therapy, then you will use the same submitter_treatment_id in the appropriate Chemotherapy, Radiation or Hormone Therapy files. For example, a treatment regimen consisting of Chemotherapy and Radiation therapy is assigned a submitter_treatment_id (cr01) in the Treatment file. You would then submit the relevant clinical treatment information in the Chemotherapy and Radiation files using the same submitter_treatment_id (cr01) in those files. This allows the information in the two files to be linked together so it is understood that the two therapies were combined.
- Each follow up should be assigned a unique submitter_follow_up_id. Optionally, if a follow up is linked to a specific treatment, you may include the submitter_treatment_id for that follow up.
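To illustrate how a shared submitter_treatment_id ties records together across files, here is a small sketch. The helper `group_by_treatment_id` and the in-memory row layout are assumptions made for illustration; real submissions are files, and only the field name and the example value cr01 come from the text above.

```python
def group_by_treatment_id(*files):
    """Map each submitter_treatment_id to the files it appears in.

    Each argument is a (file_name, rows) pair, where rows is a
    list of dicts holding at least "submitter_treatment_id".
    """
    linked = {}
    for file_name, rows in files:
        for row in rows:
            linked.setdefault(row["submitter_treatment_id"], []).append(file_name)
    return linked

# Hypothetical rows for a combined chemotherapy and radiation regimen.
treatment = [{"submitter_treatment_id": "cr01"}]
chemotherapy = [{"submitter_treatment_id": "cr01"}]
radiation = [{"submitter_treatment_id": "cr01"}]

linked = group_by_treatment_id(
    ("Treatment", treatment),
    ("Chemotherapy", chemotherapy),
    ("Radiation", radiation),
)
print(linked)  # {'cr01': ['Treatment', 'Chemotherapy', 'Radiation']}
```

Because all three files carry the same identifier, the regimen can be read as one treatment with two therapy components.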
To prevent potential identification of donors, actual calendar dates are not permitted. The timing of clinical events is collected in days counted from the date of primary diagnosis. The date of primary diagnosis should be based on the earliest diagnosis of cancer. Validation checks are in place to ensure the values submitted for the different time interval fields make sense according to the following assumptions:
- age_at_primary_diagnosis is used as the reference time point.
- The day the patient dies is the clinical endpoint (
Examples of time interval validation checks:
- If a patient's
Deceased, all time intervals must be less than or equal to
- relapse_interval must be less than the interval_of_followup in the follow up entry in which the relapse was recorded.
- If a follow up is associated with a particular treatment (via the
interval_of_followup must be greater than the
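The relapse check among the examples above can be sketched as follows. This is an illustrative check under the assumption that a follow up record is available as a dict with both values expressed in days; the function name `check_relapse_interval` is hypothetical and is not the platform's validation script.

```python
def check_relapse_interval(follow_up):
    """Return an error message if relapse_interval is not strictly less
    than interval_of_followup, otherwise None.

    Records missing either value are skipped (None is returned).
    """
    relapse = follow_up.get("relapse_interval")
    followup = follow_up.get("interval_of_followup")
    if relapse is None or followup is None:
        return None  # nothing to compare
    if relapse >= followup:
        return (
            f"relapse_interval ({relapse}) must be less than "
            f"interval_of_followup ({followup})"
        )
    return None

# Hypothetical follow up records, intervals in days.
ok = {"relapse_interval": 180, "interval_of_followup": 365}
bad = {"relapse_interval": 400, "interval_of_followup": 365}
print(check_relapse_interval(ok))
print(check_relapse_interval(bad))
```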
Donors Older than 90 Years
Because individuals over the age of 90 are rare, an age above 90 is considered a potentially identifying value. The allowed value for the age_at_diagnosis field is therefore capped at 90.
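A submitter might apply the cap when preparing data. The sketch below assumes that ages above 90 are recorded as 90 before submission; the text states only that the allowed value is capped, so this handling, the constant name and the function name are assumptions.

```python
MAX_REPORTABLE_AGE = 90  # cap stated in the text, in years

def cap_age_at_diagnosis(age_in_years):
    """Return the age to submit, replacing any value above the cap with the cap."""
    if age_in_years < 0:
        raise ValueError("age cannot be negative")
    return min(age_in_years, MAX_REPORTABLE_AGE)

print(cap_age_at_diagnosis(67))  # 67
print(cap_age_at_diagnosis(94))  # 90
```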
Submitting Missing Values for Extended Clinical Fields
If reporting missing values is required for extended fields, data submitters must use the appropriate term, as defined below:
| Missing value term | Definition | Example |
|---|---|---|
| Unknown | A value that would be meaningful for analysis if observed, but is not available. | The ER status is relevant for breast cancer, but the value cannot be found in the patient's medical record. |
| Not applicable | The determination of the value is not relevant in the current context. (Reference NCIt C48660) | Clinical data regarding tobacco smoking status is not relevant for pediatric cancers. |
| Cannot be assessed/determined | Aspects of the context prevent the evaluation needed to determine a value. (Reference NCIt C48657) | Lymph nodes cannot be assessed for metastases because the lymph nodes were previously removed, or surgery is not possible because the patient is too frail. |
Cross Field Validations
View Script buttons in the notes column. Examples include:
- Criteria for staging fields are dependent on the selected
- tumour_grade is checked against selected
- Valid values for specimen_type are cross-checked with the
- The requirement for fields related to relapse/recurrence is dependent on the
- The requirement for survival_time is dependent on the
Cross File Validations
Relationships between different clinical fields across files are validated to ensure data integrity and correctness. This requires checking the existence and relationships of different identifiers in different files, and checking the value of a field in another file to validate the current field or enforce supplemental file requirements. Examples include:
- submitter_sample_id must belong to only one
- submitter_specimen_id must belong to only one
- submitter_specimen_id submitted in any of the clinical submission files must have been submitted in the
Specimen file must belong to a registered
Follow Up file must have been submitted using the
Follow Up file must have been submitted using the
- The value of a specimen's tumour_normal_designation field in the Sample Registration file is checked to determine whether fields in the Specimen file are required.
- If survival_time is submitted in the Donor file, all time interval fields are validated to ensure they are less than or equal to the submitted survival_time.
- Depending on the treatment_type selected in the Treatment file, additional treatment details may be required to be submitted. For example, if the treatment_type is Chemotherapy, the supplemental Chemotherapy treatment file is required.
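The supplemental-file requirement in the last example can be sketched as a lookup from treatment type to required file. This is a simplified illustration that assumes each treatment row carries a single treatment_type string; the mapping name, function name and data layout are hypothetical, and only the Chemotherapy entry is stated in the text.

```python
# Only the Chemotherapy entry comes from the text; other types would be
# added according to the data dictionary.
SUPPLEMENTAL_FILE_FOR_TYPE = {"Chemotherapy": "Chemotherapy"}

def missing_supplemental_files(treatment_rows, submitted_ids_by_file):
    """List (submitter_treatment_id, file_name) pairs whose required
    supplemental record has not been submitted.

    submitted_ids_by_file maps a file name to the set of
    submitter_treatment_id values present in that file.
    """
    missing = []
    for row in treatment_rows:
        required = SUPPLEMENTAL_FILE_FOR_TYPE.get(row["treatment_type"])
        if required is None:
            continue  # no supplemental file known for this type
        submitted = submitted_ids_by_file.get(required, set())
        if row["submitter_treatment_id"] not in submitted:
            missing.append((row["submitter_treatment_id"], required))
    return missing

# Hypothetical Treatment rows; t02 has no matching Chemotherapy record.
rows = [
    {"submitter_treatment_id": "t01", "treatment_type": "Chemotherapy"},
    {"submitter_treatment_id": "t02", "treatment_type": "Chemotherapy"},
]
problems = missing_supplemental_files(rows, {"Chemotherapy": {"t01"}})
print(problems)  # [('t02', 'Chemotherapy')]
```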
Clinical Data Completion
Once all core clinical fields and files have been submitted for a donor, the donor is considered "clinically complete".
A donor must be clinically complete before any of their molecular analysis files are released to the program members for download.
How is clinical data completion calculated?
Complete clinical data means that a donor has a valid value submitted for all fields labelled "core" in the data dictionary, for a minimum set of clinical files. In more detail:
- A donor must have a donor file submitted with all core fields provided.
- A donor must have at least one primary diagnosis with all core fields provided.
- A donor must have at least one tumour and one normal specimen submitted.
- For each registered specimen, a donor must have all specimen core fields provided.
- A donor must have at least one treatment and a corresponding treatment detail file (if applicable, e.g. for chemotherapy, hormonal therapy, radiation, or immunotherapy) with all core fields provided.
- A donor must have at least one follow up with all core fields provided.
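The criteria above can be read as a conjunction of checks. The sketch below is one way to express them, not the platform's actual calculation: the donor summary structure, the `core_complete` flags and the designation values "Tumour" and "Normal" are assumptions, and treatment detail files are folded into each treatment's `core_complete` flag for brevity.

```python
def is_clinically_complete(donor):
    """Return True if a hypothetical donor summary meets every criterion."""
    specimens = donor["specimens"]
    designations = {s["tumour_normal_designation"] for s in specimens}
    return all([
        # Donor file submitted with all core fields.
        donor["donor_core_complete"],
        # At least one primary diagnosis with all core fields.
        any(d["core_complete"] for d in donor["primary_diagnoses"]),
        # At least one tumour and one normal specimen.
        {"Tumour", "Normal"} <= designations,
        # Every registered specimen has all core fields.
        all(s["core_complete"] for s in specimens),
        # At least one treatment (including its detail file, if applicable).
        any(t["core_complete"] for t in donor["treatments"]),
        # At least one follow up with all core fields.
        any(f["core_complete"] for f in donor["follow_ups"]),
    ])

# Hypothetical summary of one donor's submitted records.
example_donor = {
    "donor_core_complete": True,
    "primary_diagnoses": [{"core_complete": True}],
    "specimens": [
        {"tumour_normal_designation": "Tumour", "core_complete": True},
        {"tumour_normal_designation": "Normal", "core_complete": True},
    ],
    "treatments": [{"core_complete": True}],
    "follow_ups": [{"core_complete": True}],
}
print(is_clinically_complete(example_donor))  # True
```

Removing the follow up, or leaving one specimen with a missing core field, makes the function return False.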