Introduction

This command-line utility uses the SOPHiA DDM API to automate the upload of raw sequencing data.

The CLI can be used to trigger the upload of a specific run and to download run outputs. Uploads can use an Illumina sample sheet file to define the sample details.

Setting up this functionality requires support from SOPHiA GENETICS' IT team and familiarity with command-line utilities. Please contact the support team at support@sophiagenetics.com.

This utility is a Java tool that is delivered separately; this user guide helps you get started with it. It has no graphical user interface and can only be used from the command line.

Requirements

The CLI tool requires either a Java Runtime Environment (the latest update of Oracle JRE 1.8 is recommended) or a Java Development Kit in order to start.

The wrapper script requires Python 3.8 or higher.

Prior to using the CLI tool, you should have logged in to the SOPHiA DDM application at least once to change your password from the auto-generated one sent to you via email to a password of your choice. This is a security layer implemented in SOPHiA DDM that prevents users from performing any actions with the auto-generated password.

Installation

Download the Python script sg-upload-v2-wrapper.py, which keeps you up to date with the latest bug fixes and security upgrades of the tool.

Run the following command. If the output matches the listing below, you have the latest version in the current directory.

$ python3 sg-upload-v2-wrapper.py --help

Sophia Genetics - Downloader v
Usage: Sophia Genetics upload-api client [-hvV] [COMMAND]
-h, --help Show this help message and exit.
-v, --verbose Talks a bit more
-V, --version Print version information and exit.
Commands:
login, li Login with your Sophia Genetics credentials
logout, lo Logs out the current user
login-iam
logout-iam
new, n New batch request analysis (run)
upload, up Uploads the last created batch analysis request (run)
manage, mg Manages batch analysis requests (runs)
status, s Get the status of one or many batch requests (runs)
file, f File management commands
export, e Export files from completed interpretations
tests, t Tests commands
version, cv Check version
patient Get patient information
order Manage patient orders
pipeline Pipeline management commands
sample Get sample information
userInfo User management commands

Enabling TLS 1.2 for Uploads with Java 11+

If you are using Java 11 or later, you may experience connection issues when executing upload commands. Java 11+ defaults to TLS 1.3, which does not support TLS renegotiation, a feature required by the upload process.

To resolve this, add the following option to your Upload command to explicitly enforce TLS 1.2:

-Djdk.tls.client.protocols=TLSv1.2

This will instruct Java to use TLS 1.2, ensuring successful and secure communication during the upload.
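If you prefer not to modify the upload command itself, the same option can be supplied through the standard JAVA_TOOL_OPTIONS environment variable, which the JVM reads at startup. This is a sketch, not part of the CLI's documented usage; adjust it to your shell:

```shell
# Make every JVM started from this shell session use TLS 1.2.
# JAVA_TOOL_OPTIONS is a standard JVM environment variable, so the
# upload command itself does not need to change.
export JAVA_TOOL_OPTIONS="-Djdk.tls.client.protocols=TLSv1.2"
```

After exporting the variable, run the upload command as usual; the JVM typically prints a "Picked up JAVA_TOOL_OPTIONS" notice at startup confirming the option was applied.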

NOTE

The terms Batch Analysis Request and run are used interchangeably in this document.


Sample Sheet upload workflow

The SOPHiA DDM™ CLI enables automated run upload using a Sample Sheet CSV file generated with Illumina® sequencers. The CLI supports both the new Illumina sample sheet v2 format (with [SOPHIA_DDM_Settings] and [SOPHIA_DDM_Data] sections) and the legacy format (with [SOPHIA_DDM_Data_v1] section) for backward compatibility. The CLI Sample Sheet upload workflow supports the upload of large runs in multiple batches, organized by hybrid capture groups or other custom grouping. It also enables specification of SOPHiA DDM™ Bundle Serial Numbers and SOPHiA DDM™ Pipeline IDs for each sample.

Option 1 - all samples from the same assay, to be analyzed using the same pipeline: in this case, you can exclude the Pipeline_ID column from the sample sheet.

$ python3 sg-upload-v2-wrapper.py new --folder /path/to/fastq/files --sampleSheet samplesheet.csv --pipeline=PIPID --upload

Option 2 - samples from different assays, to be analyzed using different pipelines: the sample sheet must then contain the Pipeline_ID column, filled in for each sample.

$ python3 sg-upload-v2-wrapper.py new --folder /path/to/fastq/files --ref Run_reference --sampleSheet samplesheet.csv --upload

See example: full format example.

The CLI uses the Illumina sample sheet v2 format (recommended). For CLI usage, only the [SOPHIA_DDM_Data] section is required. The [SOPHIA_DDM_Settings] section is optional and defaults to version 1 if omitted. The Illumina header sections ([Header], [BCLConvert_Data]) are also optional and only needed if you use the sample sheet with Illumina BCL converter tools.

1. Minimal format example

Download: sampleSheet_minimal.csv

[SOPHIA_DDM_Data]
Sample_ID,Capture_ID,Pipeline_ID
SG10000008,1,1234
SG10000009,1,1234
library01,2,1235
library02,2,1235
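To sanity-check a sheet before invoking the CLI, the [SOPHIA_DDM_Data] section can be read with Python's standard csv module. This is an illustrative sketch; read_ddm_data is not part of the CLI:

```python
import csv
import io

def read_ddm_data(samplesheet_text):
    """Extract rows from the [SOPHIA_DDM_Data] section of a sample sheet.

    Illustrative helper, not part of the CLI: returns one dict per sample
    row, keyed by the section's header columns.
    """
    lines = samplesheet_text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines)
                     if line.split(",")[0].strip() == "[SOPHIA_DDM_Data]")
    except StopIteration:
        raise ValueError("missing [SOPHIA_DDM_Data] section")
    body = []
    for line in lines[start + 1:]:
        if line.startswith("["):   # next section begins
            break
        if line.strip(","):        # skip empty/padding lines
            body.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(body))))

sheet = """[SOPHIA_DDM_Data]
Sample_ID,Capture_ID,Pipeline_ID
SG10000008,1,1234
library01,2,1235
"""
rows = read_ddm_data(sheet)
```

The same function also works on the full format, because parsing stops at the next bracketed section header.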

2. Full format example

Full format example (with optional Illumina sections for BCL converter compatibility):

Download: sampleSheet_full.csv

[Header],,,,
FileFormatVersion,2,,,
RunName,MyRun,,,
InstrumentPlatform,NextSeq1k2k,,,
InstrumentType,NextSeq2000,,,
,,,,
[Reads],,,,
Read1Cycles,125,,,
Read2Cycles,125,,,
Index1Cycles,8,,,
Index2Cycles,8,,,
,,,,
[BCLConvert_Settings],,,,
SoftwareVersion,x.y.z,,,
,,,,
[BCLConvert_Data],,,,
Lane,Sample_ID,index,index2,,
1,S01-TOO-12plex-P1-rep1,ATCCACTG,AGGTGCGT,,
1,S02-TOO-12plex-P1-rep2,GCTTGTCA,GAACATAC,,
,,,,
[SOPHIA_DDM_Settings],,,,
version,1,,,
,,,,
[SOPHIA_DDM_Data],,,,
Sample_ID,Capture_ID,Upload_Batch,Bundle_SN,Pipeline_ID,Patient_Ref,SIS_No,Order_ID,HPO_ID,Virtual_Panel_ID,Gene_Filter_ID,Patient_Lock,Disease_ID,Patient_First_Name,Patient_Last_Name,Patient_DOB,Patient_Gender,Test_Date,Date_Collected,Sample_Type_ID,Library_Type
SG10000008,1,1,BDS-1111111111-10,5,SDSD12,sis-744587603-44,Order1,1-2-3,123,32323,0,12-23-22,John,Doe,2000-02-03,M,2025-09-03,2025-09-01,308000,DNA
SG10000009,1,1,BDS-1111111112-11,6,DFDF111,sis-744587603-44,Order2,2-3-5,345,12344,1,334-44,John,Doe,2000-02-03,M,2025-09-03,2025-09-01,408000,RNA
library01,1,2,BDS-1111111113-12,6,DSFF111,sis-744587603-44,Order3,4-6-9,445,2444,1,34-54,Jane,Doe,2000-02-03,F,2025-08-03,2025-08-01,608000,-
library02,1,2,BDS-1111111114-13,6,DFDS11,-,Order4,1-8-7,2323,55666,0,34-65-64,Jane,Doe,2000-02-03,F,2025-08-03,2025-08-01,808000,-

Format Structure:

  • [SOPHIA_DDM_Settings]: Optional section supporting the following fields:
      • version: Sample sheet version. Defaults to 1 if omitted.
      • filetype: File type for the upload. Set to VCF for VCF-based uploads. If omitted, FASTQ files are assumed.
  • [SOPHIA_DDM_Data]: Main data section containing sample information with all required and optional columns. Required.
  • [Header] and [BCLConvert_Data]: Optional Illumina sections. Only needed if using the sample sheet with Illumina BCL converter tools. Not required for CLI usage.

Note: The CLI also supports the legacy v1 format (with [SOPHIA_DDM_Data_v1] section) for backward compatibility. For information about the legacy format and how to migrate from v1 to v2, see the Migration Guides section.

Sample Sheet Columns

Sample_ID: the ID of the sample. It needs to match the first part of the FASTQ file names (e.g. SAMPLEID in SAMPLEID_S01_L001_R1_001.fastq.gz)
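As an illustration, the naming convention above can be checked with a small regular expression. The pattern and helper are assumptions based on the example file name, not part of the CLI:

```python
import re

# Illumina-style FASTQ naming: <Sample_ID>_S<num>_L<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample_id>.+?)_S\d+_L\d{3}_R[12]_001\.fastq\.gz$")

def sample_id_from_fastq(filename):
    """Illustrative helper: recover the Sample_ID prefix from a FASTQ name."""
    match = FASTQ_RE.match(filename)
    if not match:
        raise ValueError(f"unrecognized FASTQ name: {filename}")
    return match.group("sample_id")
```

Running it against the documented example recovers the sample ID used in the sheet.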

Capture_ID: sample capture group ID; an identical ID means the samples were captured together in the hybridization capture workflow. By default, run splitting is based on Capture_ID groups.

Upload_Batch: if the lab prefers multiple captures to be included in one upload (e.g. 2 captures per upload batch), this column can be used to define an upload batch ID; samples with the same ID will be uploaded together as one batch/run

Bundle_SN: for SOPHiA Bundle Solutions only - the Serial Number from the box containing reagents; replaces the former separate mapping file

SIS_No: the SIS or DIS number corresponding to the purchase order if sample has been processed as part of Integrated or Dispatch service. Use - (hyphen) if the sample is not part of SIS.

Pipeline_ID: the ID of the pipeline to be launched for the sample - specifying different ids allows mixing multiple panels in the same run / upload batch. (Can be retrieved using the pipeline -l command.)

Patient_Ref: the patient reference to be associated with the sample - defaults to Sample_ID (like when uploading via SOPHiA DDM UI)

Disease_ID: the disease IDs to be added to the patient. (Multiple disease IDs separated by "-" hyphen)

Order_ID: the order ID to be added for the sample

HPO_ID: the HPO IDs to be added for the order. (Multiple HPO IDs separated by "-" hyphen)

Virtual_Panel_ID: the Virtual Panel ID associated with the Order.

Gene_Filter_ID: the Gene Filter ID for the Order

Patient_Lock: sets the patient lock for the order. ("1" sets the lock, "0" leaves it unset)

Gene_List: a semicolon-separated list of gene names used to dynamically create a virtual gene panel for the order (e.g., BRCA1;TP53;MAP2K1). Only letters, digits, hyphens, and semicolons are accepted. Gene panel resolution is performed before run creation: if the gene list cannot be resolved (e.g. unknown gene names), the upload fails immediately and no orphaned run or samples are left in the system. The error message will indicate which sample's gene list could not be resolved.
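The character constraints above can be pre-checked before upload. This is an illustrative validator mirroring the documented rule, not the CLI's own resolution logic:

```python
import re

# Gene names separated by semicolons; each name may contain only
# letters, digits, and hyphens, as documented for Gene_List.
GENE_LIST_RE = re.compile(r"^[A-Za-z0-9-]+(;[A-Za-z0-9-]+)*$")

def is_valid_gene_list(value):
    """Illustrative format check for a Gene_List cell."""
    return bool(GENE_LIST_RE.match(value))
```

Note this only checks the format; whether each gene name resolves to a known gene is decided server-side before run creation.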

Patient_First_Name: The patient's first name, formatted as a string.

Patient_Last_Name: The patient's last name, formatted as a string.

Patient_DOB: The patient's date of birth, formatted as a string in the format yyyy-mm-dd (e.g., 1990-01-01).

Patient_Gender: The patient's gender, formatted as a string. Accepted values are "M" (male), "F" (female), or "U" (unknown).

Test_Date: The date the test was performed, formatted as a string in the format yyyy-mm-dd (e.g., 2025-06-13).

Date_Collected: The date the sample was collected, formatted as a string in the format yyyy-mm-dd (e.g., 2025-06-13).

Sample_Type_ID: The sample type ID of the sample. The mapping to be used is listed below:

| Sample Type ID | Sample Type Name               |
|----------------|--------------------------------|
| 108000         | PERIPHERAL_BLOOD               |
| 208000         | FRESH_TUMOR                    |
| 308000         | FFPE                           |
| 408000         | BIOPSY                         |
| 508000         | CELL_LINE                      |
| 1308000        | CFDNA                          |
| 608000         | CTDNA                          |
| 708000         | BUCCAL_SWAB                    |
| 808000         | NASOPHARYNGEAL_SWAB            |
| 908000         | SPUTUM                         |
| 1008000        | BRONCHOALVEOLAR_LAVAGE         |
| 1108000        | SALIVA                         |
| 1208000        | BONE_MARROW                    |
| 8000           | OTHER                          |
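For scripted validation of sample sheets, the table above translates directly into a lookup. This is an illustrative helper, not part of the CLI:

```python
# Sample_Type_ID to name mapping, as listed in the table above.
SAMPLE_TYPES = {
    108000: "PERIPHERAL_BLOOD",
    208000: "FRESH_TUMOR",
    308000: "FFPE",
    408000: "BIOPSY",
    508000: "CELL_LINE",
    1308000: "CFDNA",
    608000: "CTDNA",
    708000: "BUCCAL_SWAB",
    808000: "NASOPHARYNGEAL_SWAB",
    908000: "SPUTUM",
    1008000: "BRONCHOALVEOLAR_LAVAGE",
    1108000: "SALIVA",
    1208000: "BONE_MARROW",
    8000: "OTHER",
}

def sample_type_name(type_id):
    """Resolve a Sample_Type_ID (int or CSV string) to its name."""
    try:
        return SAMPLE_TYPES[int(type_id)]
    except (KeyError, ValueError):
        raise ValueError(f"unknown Sample_Type_ID: {type_id!r}")
```

Accepting both ints and CSV strings keeps the helper usable directly on cells read from the sample sheet.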

Library_Type: The library type of the sample, used to specify which libraries to include and how to group them for analysis. Either add a Library_Type column or use the suffixes below in the Sample_ID column. For none, use "-" as the column value. Allowed values are described below.

| Library_Type     | Sample ID Suffix         |
|------------------|--------------------------|
| DNA              | -D                       |
| DNA_WGS          | -DW                      |
| WGS              | -W                       |
| RNA              | -R                       |
| LIB1             | -lib1                    |
| LIB2             | -lib2                    |
| NORMAL           | -N                       |
| TUMOR            | -T                       |
| NONE             |                          |

Library type grouped DNA/RNA using column:

[SOPHIA_DDM_Data],,,,
Sample_ID,Library_Type,,
sample1,DNA,,
sample1,RNA,,
sample2,DNA,,
sample3,RNA,,

Library type grouped DNA/RNA using suffix

[SOPHIA_DDM_Data],,,,
Sample_ID,,,
sample1-D,,,
sample1-R,,,
sample2-D,,,
sample3-R,,,

Library type grouped Tumor/Normal using suffix:

[SOPHIA_DDM_Data],,,,
Sample_ID,,,
sample1-T,,,
sample1-N,,,
sample2-T,,,
sample3-T,,,

Only the Sample_ID column is mandatory in the sheet; all others are optional.
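The suffix table above can be expressed as a simple lookup. This is an illustrative sketch of the documented convention, not the CLI's implementation; longest suffixes are matched first so "-lib1" is not confused with a single-letter suffix:

```python
# Suffix-to-library-type mapping from the table above.
SUFFIX_TO_LIBRARY = {
    "-lib1": "LIB1",
    "-lib2": "LIB2",
    "-DW": "DNA_WGS",
    "-D": "DNA",
    "-W": "WGS",
    "-R": "RNA",
    "-N": "NORMAL",
    "-T": "TUMOR",
}

def library_type_from_sample_id(sample_id):
    """Infer the library type from a Sample_ID suffix (illustrative)."""
    # Check longer suffixes first so "-DW" wins over "-W", etc.
    for suffix in sorted(SUFFIX_TO_LIBRARY, key=len, reverse=True):
        if sample_id.endswith(suffix):
            return SUFFIX_TO_LIBRARY[suffix]
    return "NONE"
```

For example, "sample1-D" maps to DNA and an unsuffixed ID falls back to NONE.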

VCF Sample Sheet

For VCF-based uploads, set filetype,VCF in the [SOPHIA_DDM_Settings] section. The VCF sample sheet uses a dedicated column set instead of the FASTQ columns above.

VCF format example (tumor-normal pair):

[SOPHIA_DDM_Settings]
version,2
filetype,VCF
[SOPHIA_DDM_Data]
Sample_ID,Patient_Ref,Pipeline_ID,Library_Type,Group_ID,File_Name,Order_ID,Order_Date,Icd10_Info,Tumor_Site
ND-PATIENT-001,PT-2026-001,8579,NORMAL,GROUP_001,short_variants.vcf,ORD-001,2024-06-15,C34.10 Lung malignancy,lung
TD-PATIENT-001,PT-2026-001,8579,TUMOR,GROUP_001,short_variants.vcf,ORD-001,2024-06-15,C34.10 Lung malignancy,lung

VCF-specific columns:

File_Name: the VCF file name to associate with the sample row. VCF files must contain the standard header columns (#CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

Order_Date: the date the order was placed, in yyyy-MM-dd format. Requires Order_ID to be present.

Icd10_Info: ICD-10 diagnosis code and description (e.g., C34.10 Lung malignancy). Requires Order_ID to be present.

Tumor_Site: the anatomical site of the tumor (e.g., lung).

Group_ID: groups related sample rows (e.g., a tumor-normal pair) into a single analysis. When Group_ID is used, Patient_Ref is required for patient identification.

Tumor-normal pairs and Order_ID deduplication:

In a tumor-normal VCF analysis, the NORMAL and TUMOR rows typically share the same Order_ID. The CLI automatically deduplicates orders: when multiple rows in the same batch share the same Order_ID, the order is created only once. A console warning is displayed for each duplicate:

Order ID 'ORD-001' appears on multiple rows (e.g. tumor-normal pair) — order will be created only once.
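The deduplication described above can be sketched as follows. This is an illustration of the documented behaviour, not the CLI's actual code:

```python
def deduplicate_orders(rows):
    """Create each Order_ID only once per batch, warning on duplicates.

    Illustrative sketch: returns unique order IDs in first-seen order,
    plus one warning per duplicate row, mirroring the console message.
    """
    seen, warnings = [], []
    for row in rows:
        order_id = row.get("Order_ID")
        if not order_id or order_id == "-":
            continue  # no order attached to this row
        if order_id in seen:
            warnings.append(
                f"Order ID '{order_id}' appears on multiple rows "
                "(e.g. tumor-normal pair) - order will be created only once."
            )
        else:
            seen.append(order_id)
    return seen, warnings
```

For a tumor-normal pair sharing ORD-001, this yields a single order and a single warning.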
  • By default, all samples are considered a single run if neither Capture_ID nor Upload_Batch is present.
  • If Upload_Batch is present, it takes precedence over Capture_ID when splitting the samples into runs.
  • Bundle_SN can be provided here or via the "--bdsNumber" or "--bdsMappingFile" options.
  • By default, underscores (_) in the sample ID are converted to hyphens (-). To preserve underscores, pass the "--includeUnderscores" flag when creating the run.
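The batch-splitting rules above can be sketched in Python. This illustrates the documented precedence only (not the CLI's code); treating "-" as an empty placeholder is an assumption:

```python
from collections import defaultdict

def split_into_batches(rows):
    """Group sample rows into upload batches following the documented rules.

    Illustrative sketch: Upload_Batch takes precedence over Capture_ID;
    with neither present, all rows form a single run.
    """
    def batch_key(row):
        for column in ("Upload_Batch", "Capture_ID"):
            value = row.get(column)
            if value not in (None, "", "-"):   # "-" treated as absent (assumption)
                return (column, value)
        return ("run", "all")

    batches = defaultdict(list)
    for row in rows:
        batches[batch_key(row)].append(row["Sample_ID"])
    return dict(batches)
```

With only Capture_ID populated, samples sharing a capture group land in the same batch; adding an Upload_Batch column overrides that grouping.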

Note: sample numbers within a batch must be unique.

Upload size limits

There are technical limitations on the total run size that can be processed by SOPHiA DDM in one upload batch. Application-specific limitations are described in the corresponding product's Instructions for Use (IFU) document. Additionally, the upload size limits described below apply to all CLI uploads:

> Enhanced Exome Solutions: maximum 512 GiB per upload batch
> All other applications: maximum 420 GiB per upload batch

Attempting to upload a batch that exceeds these limits will result in an error. To keep upload batches within the accepted size limits, use the Sample Sheet upload workflow with the Capture_ID or Upload_Batch column so that the upload is performed in multiple batches.

Commands

The script is run by calling sg-upload-v2-wrapper.py followed by one of the commands below and its options. Each command has a --help option (or -h) which will display more information.

login-iam

(recommended)

Recommended authentication mechanism for logging in to the Sophia Genetics platform.

This command is used to log in with IAM/SSO. You will be redirected to the browser to log in; after a successful login you can close the tab.

After logging in, you will not need to log in again as long as you have used the CLI at least once within 90 days.

IAM/SSO is a new flow that lets you authenticate securely. It was introduced to raise the security standards of the application.

Users can now log in to multiple accounts using the --client-id option and create runs/uploads specifying the client ID. If no client ID is provided, the main account's client ID is used.

Typical usage

$ python3 sg-upload-v2-wrapper.py login-iam

Login successful

$ python3 sg-upload-v2-wrapper.py login-iam --client-id 12345

Login successful
You're logged to IAM with client id: 12345

$ python3 sg-upload-v2-wrapper.py login-iam --client-id 67890

Login successful
You're logged to IAM with client id: 67890

Options

| Option      | Alias | Mandatory | Description | Example |
|-------------|-------|-----------|-------------|---------|
| --force     | -f    | No        | Forces a re-login with IAM/SSO even if you are already logged in | login-iam --force |
| --headless  |       | No        | Use this when running on a system with no GUI access | login-iam --headless |
| --client-id |       | No        | If specified, logs in to the account related to this client ID; otherwise logs in to the main account. Allows logging in to multiple accounts and creating runs/uploads for a particular account. | login-iam --client-id 123456 |

logout-iam

This command logs out the currently logged-in IAM/SSO user. Any subsequent commands (other than login/login-iam) will fail.

Typical usage

$ python3 sg-upload-v2-wrapper.py logout-iam

You have logged out

login

(not recommended, will be deprecated, use login-iam)

Note: We are moving customers from grid-card-based authentication to a new authentication method. It is recommended to use the login-iam command instead of the login command; please refer to the login-iam section for more information. The login command will be phased out in the upcoming months.

Only one user can be logged in at a time. If another user logs in, any previous user will automatically be logged out.

Typical usage

$ python3 sg-upload-v2-wrapper.py login -u myemail@mycompany.org -p

Enter value for --password (Password): <enter your password>
Provide token for coordinate [8, D]: <enter your grid token>
Log in success

Options

| Option          | Alias | Mandatory | Description | Example |
|-----------------|-------|-----------|-------------|---------|
| --user          | -u    | Yes       | Your Sophia Genetics username | login --user myemail@my-company.org |
| --password      | -p    | *         | The command line will interactively ask for your password | login --user myemail@my-company.org -p |
| --password:env  | -pe   | *         | Password is extracted from an environment variable | login --user myemail@my-company.org -pe ENV_PASSWORD_VAR |
| --password:file | -pf   | *         | Password is read from a UTF-8 encoded text file | login --user myemail@my-company.org -pf /path_to/password_file |
| --help          | -h    | No        | Explain previous options | login -h |

*one of -p, -pe or -pf is mandatory

logout

The currently logged in user will be disconnected, and subsequent commands (other than login) will fail.

Typical usage

$ python3 sg-upload-v2-wrapper.py logout

User logged out

adegen (DEPRECATED: see the sample sheet workflow)

Generate ADE formatted JSON from FastQ files. This command provides a convenient way to create ADE format JSON files that can be used with the new command.

Typical usage

$ python3 sg-upload-v2-wrapper.py adegen --folder /path/to/fastq/files --ref MyRun123 --output output.json

Available pipelines:
1234: BRCA Analysis (MISEQ)
5678: Hereditary Cancer Solution (NEXTSEQ)
9012: Solid Tumor Solution (NOVASEQ)
Enter pipeline ID: 1234

Created JSON file: output.json

Options

| Option           | Alias | Mandatory | Description | Example |
|------------------|-------|-----------|-------------|---------|
| --folder         | -f    | Yes       | Path to the folder containing FastQ files | adegen --folder /path/to/fastq/files |
| --output         | -o    | No        | Output JSON file. If not specified, JSON is written to stdout | adegen --folder /path/to/files --output run.json |
| --pipeline       | -p    | No        | Pipeline ID (defaults to -1, which triggers interactive pipeline selection) | adegen --folder /path/to/files --pipeline 12345 |
| --sampletype     | -s    | No        | Sample Type ID (defaults to 108000) | adegen --folder /path/to/files --sampletype 108001 |
| --ref            | -r    | No        | Run name | adegen --folder /path/to/files --ref MyRun123 |
| --deep           | -d    | No        | Recurse through the folder when searching for FastQ files | adegen --folder /path/to/files --deep |
| --bdsNumber      |       | No        | Serial Number for all SOPHiA GENETICS bundle solutions. When provided, this number is applied to all samples in the run. | adegen --folder /path/to/files --bdsNumber BDS-123456 |
| --bdsMappingFile |       | No        | Path to a mapping file containing patient reference to Serial Number mappings | adegen --folder /path/to/files --bdsMappingFile /path/to/mapping.csv |
| --help           | -h    | No        | Explain previous options | adegen -h |

BDS Number Format

When using either the --bdsNumber or --bdsMappingFile option:

  • BDS numbers must start with the "BDS-" prefix
  • The prefix must be followed by a valid serial number
  • --bdsNumber and --bdsMappingFile cannot be used together

BDS Mapping File Format

When using --bdsMappingFile, the file should contain one mapping per line, consisting of the patient reference and the BDS number:

patient1,BDS-123456
patient2,BDS-789012
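A validator for this format might look like the following. It is illustrative only; the CLI performs its own validation, and the exact serial-number syntax after "BDS-" is an assumption based on the examples in this guide:

```python
import re

# "BDS-" prefix followed by a serial number; serials in this guide
# contain digits and hyphens (e.g. BDS-1111111111-10), hence [\w-]+.
BDS_RE = re.compile(r"^BDS-[\w-]+$")

def parse_bds_mapping(text):
    """Parse a --bdsMappingFile body: one 'patient_ref,BDS-number' per line.

    Illustrative sketch mirroring the documented format; raises ValueError
    with the offending line number on malformed input.
    """
    mapping = {}
    for n, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        try:
            patient_ref, bds = (part.strip() for part in line.split(",", 1))
        except ValueError:
            raise ValueError(f"line {n}: expected 'patient_ref,BDS-number'")
        if not BDS_RE.match(bds):
            raise ValueError(f"line {n}: invalid BDS number {bds!r}")
        mapping[patient_ref] = bds
    return mapping
```

This catches both documented error conditions: a missing comma and a number lacking the "BDS-" prefix.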

Pipeline Selection

When not specifying a pipeline ID:

1. The system displays the available pipelines with IDs and sequencer codes
2. You are prompted to select a pipeline by entering its ID
3. The selection is validated against the available pipelines

Errors

If there are any issues with the input, an error message will explain what went wrong.


Example 1 - Invalid pipeline ID

Error: Invalid pipeline ID

Example 3 - Invalid BDS mapping file format

Error: Invalid format in mapping file. Each line should contain: patient_ref,BDS-number

Important Notes

1. This command requires the new Platform Services to be activated
2. The generated JSON file can be used as input for the new command
3. When no output file is specified, the JSON is printed to stdout

new

Create a new batch analysis request. This can be done in a few different ways, but the recommended workflow uses a sample sheet .csv file. See Sample Sheet.

If you prefer to use a JSON file, there are two formats: legacy and the more recent ADE. Python scripts are provided to generate both formats.

Typical usage

Using JSON file:

$ python3 sg-upload-v2-wrapper.py new --json /path_to/file.json

Run successfully created with id 200002747

Using folder with FastQ files:

$ python3 sg-upload-v2-wrapper.py new --folder /path/to/fastq/files --ref MyRun123

Available pipelines:
1234: BRCA Analysis (MISEQ)
5678: Hereditary Cancer Solution (NEXTSEQ)
9012: Solid Tumor Solution (NOVASEQ)
Enter pipeline ID: 1234

Run successfully created with id 200002748

Using client-id flag with upload:

$ python3 sg-upload-v2-wrapper.py new --folder /path/to/fastq --client-id 12345 --pipeline 1234 --upload --ref MyRun123

Run successfully created with id 200002749
Starting upload after analysis creation...
Upload ended in 123456ms

$ python3 sg-upload-v2-wrapper.py new --folder /path/to/fastq --client-id 67890 --pipeline 1234 --upload --ref MyRun456

Run successfully created with id 200002750
Starting upload after analysis creation...
Upload ended in 123456ms

Using BaseSpace project ID:

$ python3 sg-upload-v2-wrapper.py new --basespace-project 12345678 --pipeline 1234 --ref MyRun123

BaseSpace project integration:
  Project ID: 12345678
  Virtual folder: /tmp/basespace_project_12345678_xyz
  Note: Files will be processed via BaseSpace URLs during analysis
  FASTQ files discovered: 8

Available pipelines:
1234: BRCA Analysis (MISEQ)
5678: Hereditary Cancer Solution (NEXTSEQ)
Enter pipeline ID: 1234

Run successfully created with id 200002751

Using BaseSpace project ID with sample sheet:

$ python3 sg-upload-v2-wrapper.py new --basespace-project 12345678 --sampleSheet samplesheet.csv --ref MyRun123

BaseSpace project integration:
  Project ID: 12345678
  Virtual folder: /tmp/basespace_project_12345678_xyz
  Note: Files will be processed via BaseSpace URLs during analysis
  FASTQ files discovered: 8

Run successfully created with id 200002752

Options

| Option              | Alias | Mandatory | Description | Example |
|---------------------|-------|-----------|-------------|---------|
| --json              | -j    | *         | Path to your JSON file describing the new run | new --json /path_to/file.json |
| --folder            | -f    | *         | Path to the folder containing FastQ files | new --folder /path/to/fastq/files |
| --deep              | -d    | No        | Recurse through the folder when searching for FastQ files | new --folder /path/to/files --deep |
| --ref               | -r    | No        | Run name (required when using --folder) | new --folder /path/to/files --ref MyRun123 |
| --pipeline          | -p    | No        | Pipeline ID (defaults to -1, which triggers interactive pipeline selection) | new --folder /path/to/files --pipeline 12345 |
| --sampletype        | -s    | No        | Sample Type ID (defaults to 108000) | new --folder /path/to/files --sampletype 108001 |
| --legacy            |       | No        | Use this option if you provide the JSON in the legacy format | new --json /path_to/file.json --legacy |
| --client-id         |       | **        | If specified, creates the run on the account related to this client ID. Important notes: 1. If you create runs for two clients located on different data centers, the run ID can be the same for both; in that case the newest one overrides the oldest. To avoid this, run the "upload" command between each creation. 2. This flag is only relevant for Core Services (it does not work with the new Platform Services). 3. When specifying --client-id, either --upload or --sampleSheet must be supplied. | new --json /path_to/file.json --client-id 123456 |
| --force-platform    | -fp   | No        | If specified, creates the run with the new Platform Services | new --json /path_to/file.json -fp |
| --bdsNumber         |       | No        | Serial Number for all SOPHiA GENETICS bundle solutions. When provided, this number is applied to all samples in the run. Works only with the One command rules flow. | new --folder /path/to/fastq/files --pipeline 12345 --bdsNumber BDS-123456 |
| --bdsMappingFile    |       | No        | Path to a mapping file containing patient reference to Serial Number mappings, in CSV format with each line containing patient_ref,BDS-number. Works only with the One command rules flow. | new --folder /path/to/fastq/files --pipeline 12345 --bdsMappingFile /path/to/mapping.csv |
| --basespace-project |       | No        | BaseSpace project ID for file discovery and upload. When specified, files are processed via BaseSpace URLs (no upload needed). Can be used with --pipeline or --sampleSheet. Requires BaseSpace authentication (see basespace auth login). | new --basespace-project 12345678 --pipeline 1234 --ref MyRun123 |
| --upload            |       | **        | When set, automatically starts the upload process after successfully creating the analysis. Note: upload is automatically skipped when using --basespace-project since files are handled via BaseSpace URLs. | new --json /path_to/file.json --upload |
| --help              | -h    | No        | Explain previous options | new -h |

*one of --json or --folder is mandatory

**when specifying client-id one of --upload or --samplesheet is mandatory

BDS Number Format

When using either the --bdsNumber or --bdsMappingFile option:

  • BDS numbers must start with the "BDS-" prefix
  • The prefix must be followed by a valid serial number
  • --bdsNumber and --bdsMappingFile cannot be used together

BDS Mapping File Format

When using --bdsMappingFile, the file should contain one mapping per line:

patient1,BDS-123456
patient2,BDS-789012

BaseSpace Project Integration

When using --basespace-project:

  • The command creates a virtual folder that represents the BaseSpace project
  • Files are discovered from the BaseSpace project and processed via BaseSpace URLs during workflow execution
  • Upload is automatically skipped; BaseSpace files are handled via URLs, so no file upload is needed
  • BaseSpace authentication is required (run basespace auth login first)
  • Can be used with either --pipeline (pipeline-based import) or --sampleSheet (sample sheet-based import)
  • The BaseSpace project ID can be found using basespace project list

Note

Sample numbers within a run must be unique.

Errors

If the JSON file is incorrect, an error message will explain what went wrong (the message will not appear in red in the command-line output).


Example 1 - "germatic" does not exist; it is either germline or somatic

Bad request
The experimentType germatic passed in parameters does not exist

Example 2 - one of the files does not exist

Unprocessable request
The file /path_to/SG10000001_S1_L001_R1_001.fastq.gz passed in parameters does not exist

upload

Upload the files and execute the batch analysis requests created with the new command. If an upload was in progress, it will be resumed. If several batch analysis requests have been created, they will be uploaded sequentially.

Typical usage

$ python3 sg-upload-v2-wrapper.py upload

Getting upload run information
Found 3 runs to upload.
Uploading run n°
Upload ended in 123456ms
Uploading run n°
Upload ended in 789012ms
Uploading run n°
Upload ended in 3456789ms

Options

| Option          | Alias | Mandatory | Description | Example |
|-----------------|-------|-----------|-------------|---------|
| --id            | -i    | No        | Uploads or resumes a given batch analysis request | upload --id 200003035 |
| --dry-run       | -d    | No        | Verifies that everything is okay, but doesn't start the upload process | upload --dry-run |
| --port          | -p    | No        | The upload process opens a socket (default: 40530) on your computer to ensure that only one upload runs at a time. If this socket is already in use, you can specify another one. To run multiple uploads in parallel on the same computer, use --port 0. | upload --port 56789, or upload --port 0 to launch uploads in parallel |
| --show-progress | -sp   | No        | Show the upload rate while uploading | upload -i 200003035 --show-progress |
| --help          | -h    | No        | Explain previous options | upload -h |

WARNING

Do not upload any files containing nominative information or any other direct identifier related to a patient (e.g. patient's full name).


manage

This command returns the status of batch analysis requests and can remove or reset local files that may have been created by mistake. It never modifies data on the server side.

Typical usage

$ python3 sg-upload-v2-wrapper.py manage

Found 3 upload ready to be sent :
- Run with id '200003035' for client id '3' has 1 analysis
- Run with id '200003036' for client id '3' has 1 analysis
- Run with id '200003037' for client id '3' has 4 analysis
You can upload a specific run by using the 'upload' command with the '--id' option

Options

| Option   | Alias | Mandatory | Description | Example |
|----------|-------|-----------|-------------|---------|
| --status | -s    | No        | Returns status information for the current runs, if they have not been uploaded yet | manage --status |
| --delete | -d    | No        | Deletes local files of a given batch analysis request ID (run) | manage --delete 123456789 |
| --reset  | -r    | No        | Deletes local files of all batch analysis requests (runs) | manage --reset |
| --help   | -h    | No        | Explain previous options | manage -h |

status

Get the status of one or more batch analysis requests.

Typical usage

$ python3 sg-upload-v2-wrapper.py status --limit 3

200002934: Waiting for upload
200002933: Waiting for upload
200002932: Waiting for upload

Options

| Option          | Alias      | Mandatory | Description | Example |
|-----------------|------------|-----------|-------------|---------|
| --id            | -i         | *         | Get the status of one batch analysis request, given its identifier. The second line of the terminal output will be the status, for example "Waiting for upload". | status --id 200002934 |
| --limit         | -l         | *         | Get the status of many batch analysis requests, given a specified limit. Each status is prefixed with the entity identifier and a colon character. Records are ordered from the most recently created batch analysis request to the oldest. | status --limit 3 |
| --run-ref       | -run-ref   | *         | Get the status of the batch analysis with the specified run reference and sample ID. Returns the latest run matching this condition. | status --run-ref run1 --sample-id sample1 |
| --sample-id     | -sample-id | *         | Get the status of the batch analysis with the specified run reference and sample ID. Returns the latest run matching this condition. | status --run-ref run1 --sample-id sample1 |
| --pipeline-info | -pipeline  | No        | Lists the pipeline version for each sample in a run. To be used in combination with the --sample-id and --run-ref options. | status --run-ref run1 --sample-id sample1 --pipeline-info |
| --help          | -h         | No        | Explain previous options | status -h |

*one of --id, --limit, or (--run-ref and --sample-id) is mandatory


Status Responses

  • Waiting for upload
  • Upload in progress
  • Pipeline running
  • Finished
  • Status code unknown (####)
  • Error

patient

Create new patients, list existing ones, or manage patient diseases.

Typical usage

List patients

$ python3 sg-upload-v2-wrapper.py patient --list patient1,patient2

[
  {
    "medicalInformationId": 111111111,
    "personalInformationId": 1111123222,
    "userRef": "patient1"
  },
  {
    "medicalInformationId": 2222222222,
    "personalInformationId": 22222444444,
    "userRef": "patient2"
  }
]

Add diseases to a patient

$ python3 sg-upload-v2-wrapper.py patient --add-diseases --patient-ref PATIENT_REF --diseases 1,2,3

Diseases added successfully for patient PATIENT_REF

Options

Option Alias Mandatory Description Example
--list -l * Retrieve technical IDs of the specified patients.

When at least one patient does not exist in the system, an error message lists the patients that were not found:

Unable to find following patients in SGP: notFound1,notFound2
patient --list patient1,patient2
--create -c * Create the patients specified on the command line.
Only patients that do not already exist are created. It then prints the patients as with the --list option.
patient --create p3,p4,p5
--add-diseases * Add diseases to a specified patient using comma-separated disease IDs patient --add-diseases --patient-ref PATIENT_REF --diseases 1,2,3
--diseases ** Comma-separated list of disease IDs to add to the patient patient --add-diseases --patient-ref PATIENT_REF --diseases 1,2,3
--force-platform -fp If specified, will list patients using new Platform Services patient --list -fp patient1,patient2
--help -h Explain previous options patient -h

*one of --list, --create, or --add-diseases is mandatory; **required when using --add-diseases
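The `patient --list` JSON output can be consumed directly from a script. A minimal sketch (field names taken from the example output above) that maps each patient reference to its technical IDs:

```python
import json

# Sketch: map each patient userRef to its technical IDs from the JSON
# printed by `patient --list` (field names as in the example above).
def index_patients(raw_json):
    return {p["userRef"]: (p["medicalInformationId"], p["personalInformationId"])
            for p in json.loads(raw_json)}

listing = '''[
  {"medicalInformationId": 111111111, "personalInformationId": 1111123222, "userRef": "patient1"},
  {"medicalInformationId": 2222222222, "personalInformationId": 22222444444, "userRef": "patient2"}
]'''
ids = index_patients(listing)
print(ids["patient1"])  # → (111111111, 1111123222)
```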

order

Manage orders for patients with disease support. The system supports two modes:

  • GEN1: Diseases are managed via patient command (default behavior)
  • GEN2: Diseases are added to order table via pmi-svc

Note: This command is only available with Platform Services.

Typical usage

GEN1 Orders (Default)

Basic order

$ python3 sg-upload-v2-wrapper.py order --add --patient-ref patient1 --order-id ORDER123

Order 'ORDER123' added successfully for patient patient1

GEN1 order with phenotypes

$ python3 sg-upload-v2-wrapper.py order --add \
  --patient-ref PATIENT123 \
  --order-id ORDER123 \
  --virtual-panel VP789 \
  --phenotypes PHENO101,PHENO102,PHENO103 \
  --filter FILTER303

Order 'ORDER123' added successfully for patient PATIENT123

Explicitly specify GEN1 (optional)

$ python3 sg-upload-v2-wrapper.py order --add \
  --patient-ref PATIENT123 \
  --order-id ORDER123 \
  --order-type GEN1

Order 'ORDER123' added successfully for patient PATIENT123

GEN2 Orders with Diseases

GEN2 order with diseases (order-type must be explicitly set to GEN2)

$ python3 sg-upload-v2-wrapper.py order --add \
  --patient-ref PATIENT123 \
  --order-id ORDER123 \
  --order-type GEN2 \
  --disease-ids "100,200,300"

Order 'ORDER123' added successfully for patient PATIENT123

GEN2 order with all parameters

$ python3 sg-upload-v2-wrapper.py order --add \
  --patient-ref PATIENT123 \
  --order-id ORDER123 \
  --order-type GEN2 \
  --disease-ids "100,200" \
  --phenotypes "PHENO101" \
  --virtual-panel VP789 \
  --filter FILTER303 \
  --patient-lock

Order 'ORDER123' added successfully for patient PATIENT123

List orders for a patient

JSON format

$ python3 sg-upload-v2-wrapper.py order --list --patient-ref patient1

[ {
  "id" : 22,
  "patientId" : 627721022,
  "orderId" : "ORDER-1",
  "diseaseIds" : [],
  "orderType" : "GEN1",
  "createdAt" : "2025-03-10T13:58:12Z",
  "updatedAt" : "2025-03-10T13:58:12Z"
}, {
  "id" : 32,
  "patientId" : 627721022,
  "orderId" : "ORDER-2",
  "diseaseIds" : ["100", "200"],
  "orderType" : "GEN2",
  "createdAt" : "2025-03-10T14:02:07Z",
  "updatedAt" : "2025-03-10T14:02:07Z"
} ]

Flat table format

$ python3 sg-upload-v2-wrapper.py order --list --patient-ref patient1 --flat

Orders for patient patient1:
ID         Order ID             Created At                    Updated At                    Virtual Panel        Phenotypes                    Filter                    Lock       Diseases                        Type
---------- -------------------- ------------------------------ ------------------------------ -------------------- ------------------------------ ------------------------------ ---------- ------------------------------ ----------
22         ORDER-1             2025-03-10T13:58:12Z          2025-03-10T13:58:12Z          VP789               PHENO101,PHENO102,PHENO103   FILTER303               No         []                              GEN1
32         ORDER-2             2025-03-10T14:02:07Z          2025-03-10T14:02:07Z          VP789               []                             []                      No         [100, 200]                      GEN2

Note: The Diseases column shows comma-separated disease IDs when present, or empty brackets [] when no diseases are associated with the order.

Options

Option Alias Mandatory Description Example
--add -a * Add a new order for a specified patient order --add --patient-ref patient1 --order-id ORDER123
--list -l * List all orders for a specified patient order --list --patient-ref patient1
--patient-ref -p Yes The patient reference to add an order for or list orders from order --add --patient-ref patient1 --order-id ORDER123
--order-id -o ** The order ID to add (required when using --add) order --add --patient-ref patient1 --order-id ORDER123
--virtual-panel -vp No Virtual panel ID for the order order --add --patient-ref patient1 --order-id ORDER123 --virtual-panel VP789
--phenotypes -ph No Comma-separated list of phenotype IDs order --add --patient-ref patient1 --order-id ORDER123 --phenotypes PHENO101,PHENO102
--filter -f No Cascade filter ID for the order order --add --patient-ref patient1 --order-id ORDER123 --filter FILTER303
--patient-lock No Lock the patient to prevent modifications order --add --patient-ref patient1 --order-id ORDER123 --patient-lock
--disease-ids No Comma-separated list of disease IDs (only allowed with GEN2) order --add --patient-ref patient1 --order-id ORDER123 --order-type GEN2 --disease-ids "100,200,300"
--order-type No Order type (GEN1 or GEN2). Disease IDs only allowed with GEN2 order --add --patient-ref patient1 --order-id ORDER123 --order-type GEN2
--flat No Display orders in flat format instead of JSON (for list command) order --list --patient-ref patient1 --flat
--force-platform -fp No If specified, will manage orders using Platform Services order --list --patient-ref patient1 -fp
--help -h Explain previous options order -h

*one of --add or --list is mandatory; **required when using --add

Validation Rules

  • Order Type: Defaults to GEN1 if not specified
  • Disease IDs with GEN1: Disease IDs can only be provided when order type is explicitly set to GEN2
  • Missing Order ID: Throws exception if order ID is not provided when adding orders
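The validation rules above can be expressed as a small helper. This is an illustrative sketch of the rules, not the tool's actual implementation:

```python
# Sketch of the validation rules listed above (not the tool's own code):
# GEN1 default, disease IDs only with GEN2, and a mandatory order ID.
def validate_order(order_id, order_type=None, disease_ids=None):
    order_type = order_type or "GEN1"  # defaults to GEN1 if not specified
    if not order_id:
        raise ValueError("order ID is required when adding orders")
    if disease_ids and order_type != "GEN2":
        raise ValueError("disease IDs are only allowed with GEN2 orders")
    return {"orderId": order_id, "orderType": order_type,
            "diseaseIds": disease_ids or []}

print(validate_order("ORDER123", "GEN2", ["100", "200"]))
# → {'orderId': 'ORDER123', 'orderType': 'GEN2', 'diseaseIds': ['100', '200']}
```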

pipeline

Get all pipelines available to currently logged in user.

Typical usage

$ python3 sg-upload-v2-wrapper.py pipeline --list

[
  {
    "pipeline_id": 123,
    "pipeline_name": "Pipeline 123",
    "analysis_type": "BRCA",
    "analysis_type_id": 30078000,
    "kit": "Multiplicom_MASTR_assay",
    "sequencer_id": 123456
    "sequencer": "ILLUMINA_MiSeq",
    "experiment_type": "germline",
    "pairend": true
  },
  {
    "pipeline_id": 456,
    "pipeline_name": "Pipeline 456",
    "analysis_type": "HCS_v1_1",
    "analysis_type_id": 6003000,
    "kit": "IDT",
    "sequencer_id": 123456
    "sequencer": "ILLUMINA_MiSeq",
    "experiment_type": "germline",
    "pairend": true
  }
]
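When many pipelines are available, you may want to select one programmatically. A minimal sketch (field names taken from the example JSON above) that filters pipeline IDs by sequencer:

```python
import json

# Sketch: pick pipeline IDs for a given sequencer from the JSON printed
# by `pipeline --list` (field names as in the example output above).
def pipelines_for(raw_json, sequencer):
    return [p["pipeline_id"] for p in json.loads(raw_json)
            if p["sequencer"] == sequencer]

listing = '''[
  {"pipeline_id": 123, "sequencer": "ILLUMINA_MiSeq"},
  {"pipeline_id": 456, "sequencer": "ILLUMINA_NextSeq"}
]'''
print(pipelines_for(listing, "ILLUMINA_MiSeq"))  # → [123]
```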

Options

Option Alias Mandatory Description Example
--list -l Yes Retrieve the list of allowed pipelines to the logged in user pipeline --list
--force-platform -fp No If specified, will list pipelines using the new Platform Services. pipeline --list -fp
--file-out -o No The path where the pipeline list will be written. pipeline --list --file-out /test/example/pipelineOutput.txt
--help -h Explain previous options pipeline -h

sample

List all samples of a run or a specific sample.

Typical usage

View by Run ID

$ python3 sg-upload-v2-wrapper.py sample --run-id 1234567890

[
  {
    "id": 1111111111,
    "sgacltId": 10,
    "userRef": "dnoble",
    "analysisType": "BRCA_Tumor",
    "kit": "Multiplicom_MASTR_Plus",
    "sampleType": "Other",
    "sequencer": "ILLUMINA_MiSeq",
    "status": "Creation",
    "experimentType": "somatic",
    "isPairend": true,
    "isControl": false
  },
  {
    "id": 2222222222,
    "sgacltId": 10,
    "userRef": "dnoble2",
    "analysisType": "BRCA_Tumor",
    "kit": "Multiplicom_MASTR_Plus",
    "sampleType": "Other",
    "sequencer": "ILLUMINA_MiSeq",
    "status": "Creation",
    "experimentType": "somatic",
    "isPairend": true,
    "isControl": false
  }
]

View by Sample ID

$ python3 sg-upload-v2-wrapper.py sample --sample-id 1111111111

{
  "id": 1111111111,
  "sgacltId": 10,
  "userRef": "dnoble",
  "analysisType": "BRCA_Tumor",
  "kit": "Multiplicom_MASTR_Plus",
  "sampleType": "Other",
  "sequencer": "ILLUMINA_MiSeq",
  "status": "Creation",
  "experimentType": "somatic",
  "isPairend": true,
  "isControl": false
}

Options

Option Alias Mandatory Description Example
--run-id * Get all samples of the specified run sample --run-id 200002934
--sample-id * Get the description of the specified sample sample --sample-id 123456
--force-platform -fp If specified, will list samples using the new Platform Services. sample --run-id 200002934 -fp
--help -h Explain previous options sample -h

*one of --run-id or --sample-id is mandatory

file

List and download files from a batch request or an analysis.

Typical usage

List files by run ID

$ python3 sg-upload-v2-wrapper.py file --list --run-id 1234567890

[{"id":204177851,"name":"SG10000001_S1_L001_R1_001.fastq.gz","size":345676,"patient":"SG10000001","analysisId":300042016},
{"id":204177852,"name":"SG10000001_S1_L001_R2_001.fastq.gz","size":345682,"patient":"SG10000001","analysisId":300042016}]

List files by date (time in milliseconds, e.g. epoch Unix timestamp) and extension

$ python3 sg-upload-v2-wrapper.py file --list --date 1704063600000 --extension .bam

Download file by file ID to non-existent destination

$ python3 sg-upload-v2-wrapper.py file --download --file-id 1234567890 --file-out /tmp/test/test.fastq

Will copy file 1234567890 into /tmp/test/test.fastq
Will create the parent folder: /tmp/test
Have created the parent folder: /tmp/test
Your file has been downloaded and is available here: /tmp/test/test.fastq

Download file by file ID to current directory

$ python3 sg-upload-v2-wrapper.py file --download --file-id 1234567890 --file-out report.pdf

Will copy file 204183269 into report.pdf
Your file has been downloaded and is available here: report.pdf

Download a JSON report based on order ID and patient reference

$ python3 sg-upload-v2-wrapper.py file --download-reports --order-id testorder --patient-ref testpatient

Download output files of an analysis based on run reference and sample ID

$ python3 sg-upload-v2-wrapper.py file --download --run-ref "testRun2025"  --sample-id "mySampleName"

Options

Option Alias Mandatory Description Example
--list -l * List the files of a batch analysis request (run) or an analysis.

Requires at least --run-id or --analysis-id argument

By default results are in JSON format.

For files that have this information associated, the analysis ID and patient user reference are also provided.

File list for run 200002764
[{"id":123,"name":"file.fastq.gz","size":0}, {"id":123,"name":"file.fastq.gz","size":0,"analysisId":123,"patient":"dnoble"}]
file --list --run-id 200002934
--run-id Goes along with --list; this represents the batch analysis request ID file --list --run-id 200002934
--analysis-id Goes along with --list; this represents the analysis ID file --list --analysis-id 200002934
--flat Outputs the results in CSV format:
File list for run 200002764 id;name;size;analysisId;patient
204177851;SG10000001_S1_L001_R1_001.fastq.gz;0;;
204177852;SG10000001_S1_L001_R2_001.fastq.gz;0;;
file --list --run-id 200002934 --flat
--download -d * Downloads the file given by its --file-id argument in the --file-out file file --download --file-id 234565645 --file-out fastq.gz
--sample-id The sample ID of the analysis (for use with the --download option)
--run-ref The run reference name of the batch (for use with the --download option)
--file-id The id of the file to download (mandatory when using the --download option) file --download --file-id 234565645 --file-out fastq.gz
--file-out -o The path where the file will be downloaded (mandatory when using the --download option)
  • If you provide a full path that doesn't exist, the folders will be automatically created
  • If the name ends with ".gz", the file will be automatically gzipped
  • If not specified, the file will be downloaded to the current directory
file --download --file-id 234565645 --file-out /test/example/fastq.gz
--run-id -r Download all output files of the specified RUN.

Input files (fastq and bam files) will not be downloaded by default.
file --download --run-id 123456
--with-input-files Usable only with the --run-id option.

If set to true, will download all the files of the run, including fastq and bam files.
file --download --run-id 123456 --with-input-files
--folder-user-ref Usable only with the --run-id option.

If set to true, will download all the files of the run and create subfolders by patient/userRef instead of SOPHiA's internal analysis ID.
file --download --run-id 123456 --folder-user-ref
--skip-zip Usable only with the --download option.

If set to true, will download all the files of the run and skip zipping the folder.
file --download --run-id 123456 --skip-zip
--out The destination folder where to download files.

If used with the --run-id option, it will download the zip into the specified folder.
The zip name will be "runId-out.zip".
This option cannot be used with the --file-id and --file-out options.
file --download --run-id 123456 --out /my/custom/path
→ Download all files as /my/custom/path/123456-out.zip
--date List files uploaded since a specific date (time in milliseconds, e.g. epoch Unix timestamp). Important: This flag can only be used when the --extension flag is provided. file --list --date 1704063600000 --extension .bam
--extension List files with a specific extension. Important: This flag can only be used when the --date flag is provided. file --list --date 1704063600000 --extension .bam
--force-platform -fp If specified, will download files using the new Platform Services. file --list -fp
--help -h Explain previous options file -h

*one of --list or --download is mandatory
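The --date flag expects epoch milliseconds rather than a calendar date. A small sketch for computing that value; note that the timezone matters (the 1704063600000 used in the example above corresponds to 2024-01-01 midnight in a UTC+1 timezone, while midnight UTC gives a different value):

```python
from datetime import datetime, timezone

# Sketch: convert a calendar date to the epoch-millisecond value the
# --date flag expects. Defaults to UTC; pass another tzinfo if your
# dates are local.
def to_epoch_millis(year, month, day, tz=timezone.utc):
    return int(datetime(year, month, day, tzinfo=tz).timestamp() * 1000)

print(to_epoch_millis(2024, 1, 1))  # → 1704067200000
```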

Downloadable files

All fastq, bam, bai, fna, qual, sff, ab1, warnings, zip files, as well as the following:

  • full_variant_table.txt
  • ampCov_patient_merge.txt
  • combined_cov_stats.txt
  • combined_var_stats.txt
  • combined_hotspot_stats.txt
  • exon_coverage_stats.txt
  • exon_coverage_stats_v2.txt
  • exon_coverage_stats_v3.txt
  • hCoV2_detection.txt
  • hCoV2_detection_per_patient.txt
  • QA-report.pdf
  • QA-patient.pdf
  • CNV-Report.pdf
  • MSI-Report.pdf
  • Gene-Expression-Report.pdf
  • full_variant_table.vcf
  • sampleCheckId.vcf
  • ontarget-mapping-statistics-table.csv
  • pcr-duplicates-table.csv
  • read-counts-overview-table.csv
  • softclip-percentage-table.csv
  • target-region-coverage-table.csv
  • alignment-stats-RNA-table.csv
  • detected-fusions-table.csv
  • full_variant_seq.fa
  • fasta_sequences.fa

export

Export files from completed interpretations. This command allows you to list and download output files (e.g., JSON reports) from analyses that have been completed since a specified date.

Typical usage

List files from completed interpretations since a specific date

$ python3 sg-upload-v2-wrapper.py export --since-date 2025-01-01 --file-pattern "*.json" --list

Found 15 completed interpretations since 2025-01-01
Found 45 files matching pattern '*.json'
[ {
  "id" : 204178491,
  "name" : "report.json",
  "checksum" : "686031b1b1ea63c3fea69b0061237e3b",
  "length" : 61812
}, {
  "id" : 204175927,
  "name" : "variant_report.json",
  "checksum" : "57a2743b3e8f5155f90172fb1fbf79c2",
  "length" : 129583
} ]

Output Modes

The --list command supports two output modes to suit different use cases:

Concise Mode (--json-only)

Use the --json-only flag for less verbose output — clean JSON with only essential fields, no informational messages. Ideal for scripting and piping to other tools:

$ python3 sg-upload-v2-wrapper.py export --since-date 2025-01-01 --file-pattern "*ORDER-46#*.json" -fp --list --json-only

[ {
  "id" : 422532316,
  "name" : "report-json-analysis_400163833-interpretation_30856-#ORDER-46#-rev19549.json",
  "checksum" : "9eee3cc1fc5fce788064b4c26e800b27",
  "length" : 12585
}, {
  "id" : 422532283,
  "name" : "report-json-analysis_400163833-interpretation_30856-#ORDER-46#-rev19518.json",
  "checksum" : "8678f5dff1d00d266e0a0feba7a804e7",
  "length" : 12587
} ]

Output fields:

Field Description
id Unique file identifier for download
name Filename
checksum MD5 checksum of the file
length File size in bytes

Verbose Mode (--flat)

Use the --flat flag for detailed output — includes informational messages and full file attributes with timestamps, associations, and encryption status:

$ python3 sg-upload-v2-wrapper.py export --since-date 2025-01-01 --file-pattern "*ORDER-46#*.json" -fp --list --flat

Found 39 completed interpretations since 2025-01-01
Found 2 files matching pattern '*ORDER-46#*.json'
[ {
  "id" : 422532316,
  "fileAttributes" : {
    "name" : "report-json-analysis_400163833-interpretation_30856-#ORDER-46#-rev19549.json",
    "dataAttributes" : {
      "checksum" : "9eee3cc1fc5fce788064b4c26e800b27",
      "length" : 12585
    }
  },
  "createdAt" : "2025-12-11T01:47:07Z",
  "fileAssociations" : [ {
    "entityType" : "SGAANA",
    "entityId" : 400163833,
    "ioType" : "OUTPUT"
  } ],
  "downloadable" : true,
  "hasWrappedEncryption" : true
}, {
  "id" : 422532283,
  "fileAttributes" : {
    "name" : "report-json-analysis_400163833-interpretation_30856-#ORDER-46#-rev19518.json",
    "dataAttributes" : {
      "checksum" : "8678f5dff1d00d266e0a0feba7a804e7",
      "length" : 12587
    }
  },
  "createdAt" : "2025-12-10T17:15:41Z",
  "fileAssociations" : [ {
    "entityType" : "SGAANA",
    "entityId" : 400163833,
    "ioType" : "OUTPUT"
  } ],
  "downloadable" : true,
  "hasWrappedEncryption" : true
} ]

Additional fields in verbose mode:

Field Description
createdAt File creation timestamp
fileAssociations Entity type, ID, and I/O type
downloadable Whether the file can be downloaded
hasWrappedEncryption Whether the file uses wrapped encryption

Download files from completed interpretations

$ python3 sg-upload-v2-wrapper.py export --since-date 2025-01-01 --file-pattern "*.json" --download

Found 15 completed interpretations since 2025-01-01
Found 45 files matching pattern '*.json'
Downloading 45 files to export-out
Downloading: report.json
...
Download complete: 45 succeeded, 0 failed

Download to a specific output folder

$ python3 sg-upload-v2-wrapper.py export --since-date 2025-01-01 --file-pattern "*.json" --download --out /path/to/output

Found 15 completed interpretations since 2025-01-01
Created output directory: /path/to/output
Downloading 45 files to /path/to/output
...
Download complete: 45 succeeded, 0 failed

Options

Option Alias Mandatory Description Example
--since-date Yes* Filter completed interpretations since date (YYYY-MM-DD format) export --since-date 2025-01-01 --list
--file-pattern No Filename pattern filter with wildcards (defaults to *.json) export --since-date 2025-01-01 --file-pattern "*.pdf" --list
--list -l * List files matching criteria in JSON format export --since-date 2025-01-01 --list
--download -d * Download files matching criteria export --since-date 2025-01-01 --download
--out No Output folder for downloads (defaults to 'export-out') export --since-date 2025-01-01 --download --out /my/folder
--json-only -j No Concise mode: Output only JSON with essential fields (id, name, checksum, length), no informational messages. Ideal for scripting export --since-date 2025-01-01 --list --json-only
--flat No Verbose mode: Output full file attributes including timestamps, associations, and encryption status export --since-date 2025-01-01 --list --flat
--analysis-id ** Analysis ID for variant/CNV/QC export export --analysis-id 123456 --variant-output --filter myfilter
--variant-output ** Download variant CSV file export --analysis-id 123456 --variant-output --filter myfilter
--cnv-output ** Download CNV CSV file export --analysis-id 123456 --cnv-output --filter myfilter
--qc-output ** Download QC file export --analysis-id 123456 --qc-output
--filter No Filter for variant/CNV export export --analysis-id 123456 --variant-output --filter myfilter
--help -h Explain previous options export -h

*when using --since-date, one of --list or --download is mandatory; **for the existing export functionality (variant/CNV/QC), use --analysis-id with the appropriate output flag

How it works

The export command combines two service calls:

  1. Get completed interpretations: Queries all analyses with 'completed' status since the specified date
  2. Filter files by pattern: Queries the file service for files matching the filename pattern (e.g., *.json) for the retrieved analysis IDs

This allows you to efficiently export output files (such as JSON reports) from all completed interpretations without needing to know individual analysis IDs.

Notes

  • The --since-date option accepts dates in YYYY-MM-DD format
  • The --file-pattern option supports wildcards (e.g., *.json, *report*.pdf, *.vcf)
  • Files are downloaded to the current directory by default, or to the folder specified with --out
  • The command only retrieves files marked as 'downloadable' and 'output' type
  • Use --json-only for concise mode (clean JSON output suitable for scripting)
  • Use --flat for verbose mode (full file details with timestamps and associations)

userInfo

Display basic user information.

Typical usage

$ python3 sg-upload-v2-wrapper.py userInfo

{
  "userId": 405,
  "loginUsername": "dnoble",
  "clientId": 12
}

basespace

BaseSpace integration commands allow you to authenticate with Illumina BaseSpace, list projects, and automatically import sequencing data from BaseSpace projects into SOPHiA DDM.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace auth login

Authenticating to BaseSpace region: us
API Server: https://api.basespace.illumina.com
Enter your BaseSpace access token: <token>

✓ Successfully authenticated to BaseSpace!

basespace auth

BaseSpace authentication commands.

basespace auth login

Authenticate to BaseSpace. You need to obtain an access token from BaseSpace first.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace auth login

Authenticating to BaseSpace region: us
API Server: https://api.basespace.illumina.com
Please obtain an access token from BaseSpace:
1. Go to BaseSpace and navigate to your account settings
2. Create an API application or use an existing one
3. Generate an access token with appropriate scopes

Enter your BaseSpace access token: <token>
Validating token...
✓ Successfully authenticated to BaseSpace!

Options

Option Alias Mandatory Description Example
--region No BaseSpace region (us, euc1, aps1). Defaults to 'us' basespace auth login --region euc1
--api-server No BaseSpace API server URL basespace auth login --api-server https://api.euc1.sh.basespace.illumina.com
--token No BaseSpace access token. If not provided, will prompt for manual entry basespace auth login --token
--scope No Token scope (default: 'read write') basespace auth login --scope "read write"
--help -h No Explain previous options basespace auth login -h

Note: If you specify --region, the API server will be automatically set. If you specify --api-server, the region will be automatically detected. If neither is specified, defaults to US region.

basespace auth logout

Clear BaseSpace authentication.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace auth logout

✓ Successfully logged out from BaseSpace

basespace auth status

Show BaseSpace authentication status.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace auth status

Authenticated to BaseSpace (region: us)
Current region: us
Current API server: https://api.basespace.illumina.com

basespace project

BaseSpace project management commands.

basespace project list

List all BaseSpace projects for the authenticated user.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace project list

Listing BaseSpace projects (region: us)

✓ Found 3 projects:

Project: My Sequencing Run
  ID: 12345678
  Description: WGS run from 2025-01-15
  Created: 2025-01-15T10:30:00.0000000Z
  Owner: John Doe (98765432)

Project: Cancer Panel Run
  ID: 87654321
  Created: 2025-01-20T14:20:00.0000000Z
  Owner: Jane Smith (12345678)

Options

Option Alias Mandatory Description Example
--help -h No Explain previous options basespace project list -h

basespace project files

List files in a BaseSpace project.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace project files --project-id 12345678

Listing contents of BaseSpace project: 12345678

✓ Found 2 datasets:

Dataset: Sample1
  ID: dataset-001
  Description: Sample 1 sequencing data
  Created: 2025-01-15T10:35:00.0000000Z
  Files (4):
    - Sample1_S1_L001_R1_001.fastq.gz (1.2 GB)
    - Sample1_S1_L001_R2_001.fastq.gz (1.2 GB)
    - Sample1_S2_L001_R1_001.fastq.gz (1.1 GB)
    - Sample1_S2_L001_R2_001.fastq.gz (1.1 GB)

Options

Option Alias Mandatory Description Example
--project-id -p Yes BaseSpace project ID basespace project files -p 12345678
--show-all-files No Show all files, not just FASTQ files basespace project files -p 12345678 --show-all-files
--help -h No Explain previous options basespace project files -h

basespace auto-import

Automatically import all BaseSpace projects. This command will:

  • List all BaseSpace projects (optionally filtered by creation date)
  • Check each project for a sample sheet or FASTQ files
  • Create runs in SOPHiA DDM for each project
  • Track processed projects using lock files to avoid duplicate imports

Sample Sheet Support

The auto-import command supports two import methods:

  1. Sample Sheet Import (preferred): If a sample sheet is found in the project, it will be used to create the run. The sample sheet must follow the SOPHiA DDM format (see Sample Sheet section). The Pipeline_ID column in the sample sheet will be used if present.

  2. Pipeline-based Import: If no sample sheet is found, the command will use the specified pipeline ID to create the run. This requires the --pipeline option.

Typical usage

Import all projects with a specific pipeline:

$ python3 sg-upload-v2-wrapper.py basespace auto-import --pipeline 12345

BaseSpace Auto-Import
====================
Region: us

Listing BaseSpace projects...
Found 5 projects

Processing project: 12345678 (My Sequencing Run)
  Success: 12345678 → BS-12345678-20250115-143022
Processing project: 87654321 (Cancer Panel Run)
  Success: 87654321 → BS-87654321-20250115-143045

Summary:
  Processed: 2
  Skipped: 3
  Failed: 0

Import projects with sample sheets (no pipeline needed if sample sheets contain Pipeline_ID):

$ python3 sg-upload-v2-wrapper.py basespace auto-import

BaseSpace Auto-Import
====================
Region: us

Listing BaseSpace projects...
Found 3 projects

Processing project: 12345678 (My Sequencing Run)
  Found sample sheet: SampleSheet.csv
  Success: 12345678 → BS-12345678-20250115-143022

Dry-run to see what would be processed:

$ python3 sg-upload-v2-wrapper.py basespace auto-import --pipeline 12345 --dry-run

BaseSpace Auto-Import
====================
Region: us

Listing BaseSpace projects...
Found 5 projects

DRY RUN MODE - No projects will be imported
===========================================

[WOULD PROCESS] Project: 12345678 (My Sequencing Run)
  Created: 2025-01-15T10:30:00.0000000Z

[SKIP - Already processed] Project: 87654321 (Cancer Panel Run)
  Created: 2025-01-14T08:20:00.0000000Z
  Lock file: 2025-01-14_14:30:15|SUCCESS|BS-87654321-20250114-143015

Summary (DRY RUN):
  Would process: 4
  Would skip: 1
  Total: 5

Filter by date:

$ python3 sg-upload-v2-wrapper.py basespace auto-import --pipeline 12345 --from-date 2025-01-15T00:00:00Z

BaseSpace Auto-Import
====================
Region: us

Listing BaseSpace projects...
Found 10 projects
Filtering projects created on or after: 2025-01-15T00:00:00Z
After date filtering: 3 projects

Options

Option Alias Mandatory Description Example
--pipeline -p ** Pipeline ID. Required if no sample sheet is found in projects basespace auto-import --pipeline 12345
--sampletype -s No Sample Type ID (defaults to 8000) basespace auto-import --pipeline 12345 --sampletype 8000
--from-date No Only import projects created on or after this date. Format: 2025-05-09T22:11:20.0000000Z or 2025-05-09T22:11:20Z basespace auto-import --pipeline 12345 --from-date 2025-01-15T00:00:00Z
--dry-run No Simulate the import process without actually importing projects. Shows which projects would be processed basespace auto-import --pipeline 12345 --dry-run
--help -h No Explain previous options basespace auto-import -h

Pipeline ID requirement: The --pipeline option is required if:

  • No sample sheet is found in the project, OR
  • The sample sheet is found but doesn't contain a Pipeline_ID column

If a sample sheet with Pipeline_ID is found, the pipeline ID from the sample sheet will be used and --pipeline is not required.

Lock Files

The auto-import command uses lock files to track processed projects and avoid duplicate imports. Lock files are stored in:

~/.sophia/basespace/<region>/<project-id>.lock

Each lock file contains:

  • Timestamp of processing
  • Status (SUCCESS or FAILED)
  • Run reference (for successful imports)

Projects with existing lock files are automatically skipped. Failed imports create .lock.failed files.
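The skip decision amounts to checking for the lock file at the documented path. A minimal sketch (the base directory defaults to ~/.sophia/basespace as described above; the base parameter is added here only to make the sketch testable):

```python
from pathlib import Path

# Sketch of the lock-file check described above. The default base
# directory follows the documented layout:
#   ~/.sophia/basespace/<region>/<project-id>.lock
def already_processed(region, project_id, base=None):
    base = Path(base) if base else Path.home() / ".sophia" / "basespace"
    return (base / region / f"{project_id}.lock").exists()

# A project with no lock file would be imported; one with a lock file
# is skipped.
print(already_processed("us", "87654321"))
```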

Date Format

The --from-date option accepts ISO 8601 dates with or without fractional seconds:

  • 2025-05-09T22:11:20Z (without fractional seconds)
  • 2025-05-09T22:11:20.000Z (with milliseconds)
  • 2025-05-09T22:11:20.0000000Z (with seven fractional digits)
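If you generate --from-date values from a script, all three accepted variants can be parsed uniformly. A sketch that normalizes the trailing "Z" and over-long fractional seconds before handing the string to Python's parser (older Python versions accept neither):

```python
import re
from datetime import datetime

# Sketch: parse the --from-date formats listed above. fromisoformat()
# on older Python versions rejects a trailing "Z" and more than six
# fractional digits, so normalize both first.
def parse_from_date(s):
    s = s.replace("Z", "+00:00")
    s = re.sub(r"\.(\d{6})\d+", r".\1", s)  # trim to microsecond precision
    return datetime.fromisoformat(s)

for s in ("2025-05-09T22:11:20Z",
          "2025-05-09T22:11:20.000Z",
          "2025-05-09T22:11:20.0000000Z"):
    print(parse_from_date(s).isoformat())
```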

Behavior

  • Projects without FASTQ files are automatically skipped
  • Projects with sample sheets are processed using the sample sheet (preferred method)
  • Projects without sample sheets fall back to pipeline-based import (requires --pipeline)
  • Already processed projects (with lock files) are skipped
  • The command generates run references in the format: BS-{projectId}-{timestamp}
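The run reference format can be reproduced for your own bookkeeping. A sketch assuming the timestamp layout shown in the sample output above (e.g. BS-12345678-20250115-143022):

```python
from datetime import datetime

# Sketch: build a run reference in the BS-{projectId}-{timestamp}
# format described above; the YYYYmmdd-HHMMSS timestamp layout is
# inferred from the sample output.
def run_reference(project_id, now=None):
    now = now or datetime.now()
    return f"BS-{project_id}-{now.strftime('%Y%m%d-%H%M%S')}"

print(run_reference("12345678", datetime(2025, 1, 15, 14, 30, 22)))
# → BS-12345678-20250115-143022
```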

basespace status

Show BaseSpace connection and authentication status.

Typical usage

$ python3 sg-upload-v2-wrapper.py basespace status

BaseSpace Integration Status
============================

Authentication: Authenticated to BaseSpace (region: us)
Region: us
API Server: https://api.basespace.illumina.com

Testing connection...
✓ Token is valid and connection is working

Available commands:
  basespace auth login    - Authenticate to BaseSpace
  basespace auth logout   - Clear authentication
  basespace project list  - List all BaseSpace projects
  basespace project files -p <id> - List files in a BaseSpace project

Options

Option Alias Mandatory Description Example
--help -h No Explain previous options basespace status -h

One Command to Rule Them All

(New since 6.4.0)

The most efficient way to create and upload an analysis is the new command, which enables direct folder analysis with automatic uploading. You can either let the system guide you through selecting the appropriate pipeline or specify your choice directly.

Direct pipeline selection:

$ python3 sg-upload-v2-wrapper.py new --folder /path/to/fastq/files --ref MyRun123 --pipeline 1234 --upload

Run successfully created with id 200002747
Starting upload after analysis creation...
Upload ended in 123456ms

This single command does it all:

  • Automatically scans your FASTQ folder
  • Creates the analysis request (with either interactive or direct pipeline selection)
  • Immediately starts the upload

No need to manually create JSON files or to remember to initiate the upload after run creation; everything is handled in a single step. Ideal for both interactive use and automated scripts, making your workflow seamless and efficient.


Migration Guides

Sample Sheet v1 to v2 Migration

This guide explains how to migrate from v1 to v2. For information about the v2 format, see the Sample Sheet upload workflow section.

Key differences of the v1 format from v2:

  • Uses the [SOPHIA_DDM_Data_v1] section header instead of [SOPHIA_DDM_Data]
  • Does not support the [SOPHIA_DDM_Settings] section

How to Migrate from v1 to v2

To migrate an existing v1 sample sheet to v2 format:

  1. Change the section header: Replace [SOPHIA_DDM_Data_v1] with [SOPHIA_DDM_Data]

  2. Add the Settings section (optional but recommended): Add a [SOPHIA_DDM_Settings] section before the data section:

     [SOPHIA_DDM_Settings],,,,
     version,1,,,

  3. Keep all data columns unchanged: The column structure remains the same, so no changes are needed to your sample data rows

Example Migration:

Before (v1):

[SOPHIA_DDM_Data_v1],,,,
Sample_ID,Capture_ID,Bundle_SN,Pipeline_ID,Patient_Ref
SG10000008,1,BDS-1111111111-10,5,SDSD12

After (v2):

[SOPHIA_DDM_Settings],,,,
version,1,,,
[SOPHIA_DDM_Data],,,,
Sample_ID,Capture_ID,Bundle_SN,Pipeline_ID,Patient_Ref
SG10000008,1,BDS-1111111111-10,5,SDSD12
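The migration steps above are mechanical, so they can be scripted. A minimal sketch that applies both changes (header swap and settings section) to a v1 sample sheet read as text:

```python
# Sketch: migrate a v1 sample sheet to v2 as described in the steps
# above: swap the section header and prepend the settings section.
V2_SETTINGS = "[SOPHIA_DDM_Settings],,,,\nversion,1,,,\n"

def migrate_v1_to_v2(text):
    migrated = text.replace("[SOPHIA_DDM_Data_v1]", "[SOPHIA_DDM_Data]")
    return V2_SETTINGS + migrated

v1 = ("[SOPHIA_DDM_Data_v1],,,,\n"
      "Sample_ID,Capture_ID,Bundle_SN,Pipeline_ID,Patient_Ref\n"
      "SG10000008,1,BDS-1111111111-10,5,SDSD12\n")
print(migrate_v1_to_v2(v1))
```

Running this on the "Before (v1)" example reproduces the "After (v2)" sheet shown above.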

Documentation Version: 7.16.0-6.7.0 | Commit: 0fe5b5e7 | Built: 2026-04-01 08:05:23 UTC