Submitting Molecular Data
Molecular data consists of raw molecular data files (e.g. sequencing reads), as well as any associated file metadata (data that describes your data).
Raw molecular data is submitted to a Regional Data Processing Centre (RDPC). RDPCs are responsible for processing your program's molecular data according to the Analysis Pipeline. If you are unsure which RDPC you should submit to, please contact the DCC.
Note: Sample Registration is the first step in the data submission process. You must register samples before submitting molecular data. Please ensure that your samples are registered on the ARGO Data Platform before continuing with this step.
Data Submission Client Configuration
Molecular data is uploaded to the ARGO Data Platform using the Song and Score CLIs (Command Line Clients). Song is an open source system used to track and validate metadata about raw data submissions. Score securely manages upload and download of files to cloud repositories managed by the RDPCs. The Song and Score clients are used in conjunction to upload raw data files while maintaining file metadata and provenance.
Song-Client
Download the latest version of the song-client. Once you have unzipped the tarball, change directories into the unzipped folder:
Update the conf/application.yaml configuration file with the correct user and data submission program values, including:
- serverURL: The Song server URL for your local RDPC metadata storage server.
- accessToken: Your personal API Token.
- studyID: The ARGO Program ID for which you are submitting data.
To do this, change into the conf folder and open the application.yaml file. This is an example of how your application.yaml configuration file should look:
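As a rough sketch, a file holding these three values could be written from the shell as follows. The `client:` nesting and the exact key spellings (`serverUrl`, `studyId`) are assumptions about the template that ships with the client, so compare against your downloaded copy before overwriting it. The URL, program ID, and token are placeholders, and `CONF_DIR` is a hypothetical variable used only for this illustration.

```shell
# Sketch only: writes a minimal song-client configuration file.
# Every value below is a placeholder; replace it with your own.
CONF_DIR="${CONF_DIR:-conf}"   # hypothetical: path to the client's conf folder
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/application.yaml" <<'EOF'
client:
  # Song server URL for your RDPC (placeholder)
  serverUrl: https://song.example.org
  # ARGO Program ID you are submitting data for (placeholder)
  studyId: EXAMPLE-CA
  # Your personal API Token (placeholder)
  accessToken: YOUR-API-TOKEN
EOF
```

Editing the shipped file by hand achieves the same result and keeps any other settings it already contains.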
Score-Client
Download the latest version of the score-client. Once you have unzipped the tarball, change directories into the unzipped folder:
Update the conf/application.properties configuration file with the correct user and data submission program values, including:
- accessToken: Your personal API Token.
- metadata.url: The file metadata Song server URL for your local RDPC.
- storage.url: The object storage Score server URL for your local RDPC.
To do this, change into the conf folder and open the application.properties file. This is an example of how your application.properties configuration file should look:
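A minimal sketch of such a file, written from the shell, might look like this. The three property names are the ones listed above. Both URLs and the token are placeholders, and `CONF_DIR` is a hypothetical variable used only for this illustration.

```shell
# Sketch only: writes a minimal score-client configuration file.
# Every value below is a placeholder; replace it with your own.
CONF_DIR="${CONF_DIR:-conf}"   # hypothetical: path to the client's conf folder
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/application.properties" <<'EOF'
# Your personal API Token (placeholder)
accessToken=YOUR-API-TOKEN
# Song (file metadata) server URL for your RDPC (placeholder)
metadata.url=https://song.example.org
# Score (object storage) server URL for your RDPC (placeholder)
storage.url=https://score.example.org
EOF
```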
How to Upload Molecular Data
Step 1. Prepare molecular metadata sequencing_experiment payload
Before proceeding, please read the instructions on how to prepare and validate molecular metadata payloads.
Step 2. Upload the metadata file
Once you have formatted the payload correctly, use the song-client submit command to upload the payload.
If your payload is not formatted correctly, you will receive an error message detailing what is wrong. Please fix any errors and resubmit. If your payload is formatted correctly, you will get an analysisId in response:
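The submit step, and reading the analysisId back out of its response, could be scripted roughly as follows. This is a sketch under assumptions: `SING` is a hypothetical variable pointing at the song-client launcher (assumed here to be `bin/sing` inside the unzipped folder), the payload is passed with `-f`, and the response is assumed to be JSON with a string `analysisId` field. The payload filename in the usage comment is illustrative.

```shell
# Sketch only. SING is a hypothetical variable; bin/sing is an assumption
# about where the launcher sits in the unzipped song-client folder.
SING="${SING:-bin/sing}"

# Submit the payload file given as $1 and print the server's response.
submit_payload() {
  "$SING" submit -f "$1"
}

# Read a JSON response on stdin and print the value of its analysisId field.
extract_analysis_id() {
  sed -n 's/.*"analysisId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (not run here):
#   analysis_id=$(submit_payload sequencing_experiment.json | extract_analysis_id)
```

Keeping the ID in a shell variable avoids copying it by hand into the later manifest and publish steps.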
At this point, since the payload data has successfully been submitted and accepted by Song, it is now referred to as an analysis. The newly created analysis will be in the UNPUBLISHED state.
Step 3. Generate a manifest file
Use the returned analysisId from step 2 to generate a manifest for file upload. This manifest will be used with the score-client in the next step. Using the song-client manifest command, define:
- the analysis ID with the -a parameter
- the location of your input files with the -d parameter
- the output file path for the manifest file with the -f parameter
The manifest.txt file will be written out to the directory /some/output/dir/. If the output directory does not exist, it will be automatically created.
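Put together, the three parameters above could be wrapped in a small shell function like the one below. This is a sketch: `SING` is a hypothetical variable pointing at the song-client launcher (assumed to be `bin/sing`), and the input directory in the usage comment is a placeholder.

```shell
# Sketch only. SING is a hypothetical variable; bin/sing is an assumption
# about where the launcher sits in the unzipped song-client folder.
SING="${SING:-bin/sing}"

# $1: analysisId returned in step 2
# $2: directory containing the input files
# $3: output path for the manifest file
generate_manifest() {
  "$SING" manifest -a "$1" -d "$2" -f "$3"
}

# Example (not run here):
#   generate_manifest "$analysis_id" /path/to/input/files /some/output/dir/manifest.txt
```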
Step 4. Upload sequencing files
Using the score-client upload command, upload all files associated with the payload. This requires the manifest file generated in step 3.
If the files upload successfully, you will receive an Upload completed message.
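The upload call could be wrapped in the same way. In this sketch, `SCORE` is a hypothetical variable pointing at the score-client launcher (assumed to be `bin/score-client` inside the unzipped folder), and passing the manifest with `--manifest` is an assumption to check against the client's own help output.

```shell
# Sketch only. SCORE is a hypothetical variable; bin/score-client is an
# assumption about where the launcher sits in the unzipped folder.
SCORE="${SCORE:-bin/score-client}"

# $1: path to the manifest file generated in step 3
upload_files() {
  "$SCORE" upload --manifest "$1"
}

# Example (not run here):
#   upload_files /some/output/dir/manifest.txt
```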
Step 5. Publish the analysis
The final step in submitting molecular data is to set the state of the analysis to PUBLISHED. A published analysis signals to the DCC that this data is ready to be processed.
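Publishing might be scripted as shown below. The command name `publish` and its `-a` parameter are assumptions modelled on the manifest step, so confirm them against the song-client help before relying on this. `SING` is a hypothetical variable pointing at the song-client launcher.

```shell
# Sketch only. SING is a hypothetical variable; bin/sing is an assumption
# about where the launcher sits in the unzipped song-client folder.
SING="${SING:-bin/sing}"

# $1: analysisId of the analysis to publish
publish_analysis() {
  "$SING" publish -a "$1"
}

# Example (not run here):
#   publish_analysis "$analysis_id"
```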
Once your sequencing_experiment analysis has been successfully submitted and published, it will be queued for data processing. You can follow the progress of molecular data processing for submitted data on your Program Dashboard.