Legacy Generation Script

A Python script can be downloaded to generate this JSON input. Run the script on a folder containing the FastQ files to analyze. The pipeline to apply to those files is described in a configuration file ("pipeline.json", for example), so the same pipeline is applied to every sample of a run.

Usage

Move all the fastq files to analyze into a folder.

Then run the following command, which will generate the fastqFolder.json file.

python create-bar-input.py pipeline.json fastqFolder

Example pipeline

{
  "sequencer": "ILLUMINA_MiSeq",
  "pairend": true,
  "analysis_type": "BRCA_Tumor",
  "experiment_type": "somatic",
  "kit": "Manufacturer_kit_code"
}

See the end of this page for more pipeline examples.

By default, the file is saved in the current working directory. This can be overridden with the optional --save-folder argument. For example:

python create-bar-input.py pipeline.json /my/fastq-folder --save-folder /some/path

Note that the name of the output file cannot be changed: it is always the name of the FastQ folder with a .json extension (fastqFolder.json in the first command above). An existing file with that name is overwritten without warning.

Both folder arguments can be absolute or relative. For example, this will process a "subfolder" of the current working directory and save the output file one level up:

python create-bar-input.py pipeline.json subfolder --save-folder ..
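
The naming rule above can be sketched in a few lines of Python. This is not the code of create-bar-input.py; the function name output_json_path is hypothetical, and the sketch assumes the output file is simply the last component of the FastQ folder path plus ".json".

```python
from pathlib import Path
from typing import Optional


def output_json_path(fastq_folder: str, save_folder: Optional[str] = None) -> Path:
    """Return the path where the generated JSON file would be written."""
    # Assumption: the file is named after the last component of the FastQ folder.
    # resolve() turns inputs such as "." or "subfolder/" into a usable folder name.
    folder_name = Path(fastq_folder).resolve().name
    # Without --save-folder, the current working directory is used.
    target_dir = Path(save_folder) if save_folder else Path.cwd()
    return target_dir / f"{folder_name}.json"


print(output_json_path("subfolder", ".."))
```

Because the name is derived only from the folder, two runs on the same folder target the same file, which is why an existing file gets overwritten.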

The script also supports an optional -v/--verbose flag which increases the output verbosity.

All files in the processed FastQ folder are assumed to follow this convention:

patient_mid_lane_r1/r2_ignored

The parts are separated by underscores. "patient" must be the first part of the file name, and the last part is always ignored. The positions of "mid", "lane" and "r1/r2" are interchangeable, and "lane" is optional.

For example:

GN2804-12A3456-2-subset2_S7_L002_R1_001.fastq.gz

This example file would produce these values:

Patient: GN2804-12A3456-2-subset2
Mid: S7
Lane: L002
R#: R1
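
The convention can be illustrated with a small parser. This is a minimal sketch, not the script's actual implementation: the function name parse_fastq_name is hypothetical, and the token shapes (S followed by digits for the MID, L followed by digits for the lane, R1 or R2 for the read) are assumptions inferred from the example file name.

```python
import re
from typing import Dict, Optional

# Assumption: token shapes inferred from the example file name above.
MID_RE = re.compile(r"S\d+", re.IGNORECASE)
LANE_RE = re.compile(r"L\d+", re.IGNORECASE)
READ_RE = re.compile(r"R[12]", re.IGNORECASE)


def parse_fastq_name(file_name: str) -> Dict[str, Optional[str]]:
    """Split a FastQ base name into patient, mid, lane and read."""
    tokens = file_name.split("_")
    # At least patient, mid, r1/r2 and the ignored last part are needed.
    if len(tokens) < 4:
        raise ValueError(f"too few parts in {file_name!r}")
    fields: Dict[str, Optional[str]] = {
        "patient": tokens[0],  # always the first part
        "mid": None,
        "lane": None,          # optional, stays None when absent
        "read": None,
    }
    # The last part (which carries the extension) is ignored; the
    # middle parts may come in any order.
    for token in tokens[1:-1]:
        if fields["mid"] is None and MID_RE.fullmatch(token):
            fields["mid"] = token
        elif fields["lane"] is None and LANE_RE.fullmatch(token):
            fields["lane"] = token
        elif fields["read"] is None and READ_RE.fullmatch(token):
            fields["read"] = token
    if fields["mid"] is None or fields["read"] is None:
        raise ValueError(f"no mid or r1/r2 part found in {file_name!r}")
    return fields


print(parse_fastq_name("GN2804-12A3456-2-subset2_S7_L002_R1_001.fastq.gz"))
# {'patient': 'GN2804-12A3456-2-subset2', 'mid': 'S7', 'lane': 'L002', 'read': 'R1'}
```

Taking the patient by position rather than by pattern means a patient reference may contain hyphens and digits freely, but not underscores.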

WARNING

Do not upload any files containing nominative information or any other direct identifier related to a patient (e.g. patient's full name).


Legacy Format Description

Restricted properties accept only specific values, which are provided by Sophia Genetics.

Level: root

Property    Type     Restricted  Description
analyses    array    No          List of all the analyses to be done in this batch analysis request.
user_ref    string   No          Name of the batch analysis request (run) in Sophia DDM.
sequencer   string   Yes         Code of the sequencer used for this batch analysis request.
pairend     boolean  No          Whether the analyses of the run are paired-end.

Level: analyses

Property         Type     Restricted  Description
analysis_type    string   Yes         The gene panel code as it appears in Sophia DDM.
user_ref         string   No          Patient user reference; it is created if not found in Sophia DDM.
mid              string   No          MID of the sample.
experiment_type  string   Yes         "germline" or "somatic".
kit              string   Yes         Manufacturer code of the kit.
pairend          boolean  No          Whether the analysis is paired-end.
files            array    No          List of analysis files.

Level: analyses - files

Property  Type    Description
r1        string  Path to the R1 file.
r2        string  Path to the R2 file.
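
The three levels above can be checked mechanically before a request is submitted. The following is a rough sketch only: the helper names are hypothetical, and it assumes that every listed property must be present, which the tables do not state explicitly. It checks names and JSON types only, not the restricted values.

```python
from typing import Any, Dict, List

# Property names and types taken from the three tables above.
ROOT_TYPES = {"user_ref": str, "sequencer": str, "pairend": bool, "analyses": list}
ANALYSIS_TYPES = {
    "analysis_type": str, "user_ref": str, "mid": str,
    "experiment_type": str, "kit": str, "pairend": bool, "files": list,
}
FILE_TYPES = {"r1": str, "r2": str}


def _check(obj: Any, expected: Dict[str, type], where: str) -> List[str]:
    """Report missing or wrongly typed properties of one JSON object."""
    if not isinstance(obj, dict):
        return [f"{where}: should be an object"]
    problems = []
    for name, typ in expected.items():
        if name not in obj:
            # Assumption: every documented property is required.
            problems.append(f"{where}: missing '{name}'")
        elif not isinstance(obj[name], typ):
            problems.append(f"{where}: '{name}' should be {typ.__name__}")
    return problems


def find_problems(request: Any) -> List[str]:
    """Walk root, analyses and files, collecting every problem found."""
    problems = _check(request, ROOT_TYPES, "root")
    analyses = request.get("analyses") if isinstance(request, dict) else None
    for i, analysis in enumerate(analyses if isinstance(analyses, list) else []):
        where = f"analyses[{i}]"
        problems += _check(analysis, ANALYSIS_TYPES, where)
        files = analysis.get("files") if isinstance(analysis, dict) else None
        for j, entry in enumerate(files if isinstance(files, list) else []):
            problems += _check(entry, FILE_TYPES, f"{where}.files[{j}]")
    return problems
```

An empty list means the structure matches the tables; each string in a non-empty list names the level and the property at fault.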

Full Example

{
  "v": 1,
  "user_ref": "run_name_reference",
  "sequencer": "Sequencer_code",
  "pairend": true,
  "analyses": [
    {
      "analysis_type": "BRCA_Tumor",
      "user_ref": "Seq_01_sample_21",
      "mid": "S1",
      "experiment_type": "somatic",
      "kit": "Manufacturer_kit_code",
      "pairend": true,
      "files": [
        {
          "r1": "/path_to/SG10000001_S1_L001_R1_001.fastq.gz",
          "r2": "/path_to/SG10000001_S1_L001_R2_001.fastq.gz"
        }
      ]
    },
    {
      "analysis_type": "BRCA_Tumor",
      "user_ref": "Seq_01_sample_22",
      "mid": "S2",
      "experiment_type": "somatic",
      "kit": "Manufacturer_kit_code",
      "pairend": true,
      "files": [
        {
          "r1": "/path_to/SG10000001_S2_L001_R1_001.fastq.gz",
          "r2": "/path_to/SG10000001_S2_L001_R2_001.fastq.gz"
        }
      ]
    }
  ]
}
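
As an illustration of how a pipeline configuration and a set of samples combine into a document like the one above, here is a hedged sketch. It is not the script's code: build_request and the sample dictionary keys (patient, mid, r1, r2) are hypothetical, the "v": 1 entry is copied from the full example without further knowledge of its meaning, and using the patient part as user_ref and omitting r2 for single-end runs are assumptions.

```python
import json
from typing import Any, Dict, List


def build_request(pipeline: Dict[str, Any], run_name: str,
                  samples: List[Dict[str, str]]) -> Dict[str, Any]:
    """Combine one pipeline configuration with per-sample file information."""
    analyses = []
    for sample in samples:
        files = {"r1": sample["r1"]}
        if pipeline["pairend"]:
            # Assumption: r2 is only given for paired-end runs.
            files["r2"] = sample["r2"]
        analyses.append({
            # The same pipeline values are repeated for every sample.
            "analysis_type": pipeline["analysis_type"],
            "user_ref": sample["patient"],  # assumption: patient part of the file name
            "mid": sample["mid"],
            "experiment_type": pipeline["experiment_type"],
            "kit": pipeline["kit"],
            "pairend": pipeline["pairend"],
            "files": [files],
        })
    return {
        "v": 1,  # copied from the full example above
        "user_ref": run_name,
        "sequencer": pipeline["sequencer"],
        "pairend": pipeline["pairend"],
        "analyses": analyses,
    }


pipeline = {
    "sequencer": "ILLUMINA_MiSeq", "pairend": True,
    "analysis_type": "BRCA_Tumor", "experiment_type": "somatic",
    "kit": "Manufacturer_kit_code",
}
samples = [{
    "patient": "SG10000001", "mid": "S1",
    "r1": "/path_to/SG10000001_S1_L001_R1_001.fastq.gz",
    "r2": "/path_to/SG10000001_S1_L001_R2_001.fastq.gz",
}]
print(json.dumps(build_request(pipeline, "run_name_reference", samples), indent=2))
```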

Pipeline examples

ILLUMINA_MiSeq - BRCA

{
  "sequencer": "ILLUMINA_MiSeq",
  "analysis_type": "BRCA",
  "experiment_type": "germline",
  "kit": "Multiplicom_MASTR_assay",
  "pairend": true
}

ILLUMINA_MiSeq - BRCA_Tumor

{
  "sequencer": "ILLUMINA_MiSeq",
  "analysis_type": "BRCA_Tumor",
  "experiment_type": "germline",
  "kit": "Multiplicom_MASTR_Plus",
  "pairend": true
}

ILLUMINA_MiniSeq - STS_v1

{
  "sequencer": "ILLUMINA_MiniSeq",
  "analysis_type": "STS_v1",
  "experiment_type": "somatic",
  "kit": "IDT",
  "pairend": true
}
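
A pipeline file can be sanity-checked before it is passed to the script. The sketch below is an illustration with hypothetical function names; it assumes that all five properties shown in the examples are required, and it checks experiment_type against the two values listed in the format description. The restricted codes for sequencer, analysis_type and kit are not checked, since only Sophia Genetics provides the valid values.

```python
import json
from typing import Any, Dict

# Assumption: every pipeline needs the five properties seen in the examples.
PIPELINE_KEYS = {"sequencer", "analysis_type", "experiment_type", "kit", "pairend"}


def check_pipeline(config: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if the pipeline configuration looks incomplete."""
    missing = sorted(PIPELINE_KEYS - set(config))
    if missing:
        raise ValueError("pipeline is missing: " + ", ".join(missing))
    if not isinstance(config["pairend"], bool):
        raise ValueError("'pairend' must be true or false")
    if config["experiment_type"] not in ("germline", "somatic"):
        raise ValueError("'experiment_type' must be 'germline' or 'somatic'")
    return config


def load_pipeline(path: str) -> Dict[str, Any]:
    """Read a pipeline JSON file and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_pipeline(json.load(handle))
```

Calling load_pipeline("pipeline.json") returns the parsed configuration when the file passes these checks and raises ValueError otherwise.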