Manual

Introduction

IMSEQ is a tool for the analysis of T- and B-cell receptor chain sequences. It can be used to analyse either single-read data, where the reads cover the V-CDR3-J region sufficiently for an identification, or paired-end data, where one read covers the V-region and the other covers the J- and CDR3-region. The latter read has to cover only a small fraction of the V-segment, sufficient to localize the Cys-104 motif. This manual begins with a listing of the command line arguments and their intended use, and then presents some vignettes that demonstrate the intended use of IMSEQ.

The simplest IMSEQ command shows a complete list of options on the command line:

$ imseq --help

This command will show the expected syntax of each of the command line arguments, e.g.:

-oa, --out-amino STR
      Output file path for translated clonotypes.

For each argument, there is a short form (here: -oa), which is preceded by a single dash, and an equivalent long form (--out-amino), which is preceded by two dashes. The two forms can be used interchangeably. For each argument, the required parameter type is given:

STR A string, i.e., any combination of letters, numerals, or the underscore character (_)
NUM A valid number
FILE Absolute or relative path to a file that already exists on the file system.

If no parameter type is specified, the argument does not accept any parameters and is used as a flag to enable or disable a feature.

Basic usage and input files

For single-end V(D)J-reads, IMSEQ is called as follows:

$ imseq -ref <segment sequences> {-o,-oa,-on} <output file> <VDJ-reads>

For paired-end data, the program call takes two input files:

$ imseq -ref <segment sequences> {-o,-oa,-on} <output file> <V-reads> <VDJ-reads>

The segment sequences have to be provided in FASTA format as specified in the IMSEQ FASTA ID Specification. At least one output file has to be specified using the -o, -oa or -on option. To write multiple output files, give each option its own output file name. The input reads can be provided in either FASTA or FASTQ format; however, all quality-related features work only when FASTQ files are provided. The input files can be GZIP-compressed.
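As a rough illustration of the two call patterns above, the following sketch assembles the command line for either one read file (single-end) or two read files (paired-end) and prints it instead of running it. The helper name build_imseq_cmd and all file names are hypothetical placeholders, not part of IMSEQ; only the -ref and -o options are taken from the syntax shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of IMSEQ): assemble an imseq call from
# the options described above and print it as a dry run.
build_imseq_cmd() {
    ref=$1    # segment sequences in FASTA format
    out=$2    # output file passed to -o
    shift 2
    # Remaining arguments: one file (VDJ-reads, single-end) or
    # two files (V-reads followed by VDJ-reads, paired-end).
    if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
        echo "expected one or two read files, got $#" >&2
        return 1
    fi
    echo "imseq -ref $ref -o $out $*"
}

# Placeholder file names, chosen for illustration only.
single_cmd=$(build_imseq_cmd segments.fa clonotypes.tsv reads.fastq.gz)
paired_cmd=$(build_imseq_cmd segments.fa clonotypes.tsv v_reads.fastq.gz vdj_reads.fastq.gz)

echo "$single_cmd"
echo "$paired_cmd"
```

In the paired-end case the V-reads file comes first and the VDJ-reads file second, matching the order in the call shown above. The helper does not quote its arguments, so it is only suitable for file names without spaces.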

Output files

IMSEQ supports three output files to write the clonotyping and repertoire generation results:

Furthermore, the user can choose to specify one of the following options:

Read preprocessing

V/J segment alignment

V/J segment alignment (paired-end)

V/J segment alignment (Expert settings)

Quality filtering

Barcoding

Postprocessing / Clustering

Performance

Other options