MSQT/Admin Manual


Introduction

With MSQT/Admin you can upload and organize your datasets and compile SNPs using the 'Snipe4SNPs' module. If you are an OS X user, please also consult the section 'Note to OSX users' below.


Dataset Manager

The Dataset Manager provides an overview table of all datasets you have previously uploaded with the 'Data Uploader'.

You can make a dataset the default dataset by clicking on the 'set default' link in the last column. The default dataset is the first dataset shown in the dataset selection boxes in MSQT/SBE, MSQT/ADF and MSQT/SNIPED!.

To delete a dataset, click the 'delete' link. You will be asked to confirm before the dataset is removed.


Data Uploader

Use the 'Data Uploader' to add your datasets to MSQT. Fill out the form:

  • Name of dataset: Choose an identifier for your dataset. Use only letters, numbers and underscores, e.g. my_set01
  • Structure of data: Choose either 'Position name' or 'Chromosome position'.
  • Chromosomes: If your dataset is in 'Chromosome position' structure, provide the number of chromosomes (digits only).
  • Archive File: The archive containing your data. Please read the section 'MSQT Data Input Format Specification' for further details on the archive format.
  • Click 'upload'. On the next screen you can inspect the names of all individuals that MSQT detected in your dataset. If corrections are needed, you have to return to your original dataset, correct it, recreate the archive and upload again. Once you have pressed 'continue with data processing', MSQT creates the database schema and loads the dataset. If you need to correct your data beyond this point, you must additionally either use a new name for the dataset or delete the previous dataset using the 'Dataset Manager'.
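
The naming rule for datasets can be checked before you fill out the form. The following is a minimal sketch and not part of MSQT: the function name is hypothetical, and it assumes that the allowed characters are ASCII letters, digits and underscores, which the form description does not state explicitly.

```python
import re

def is_valid_dataset_name(name):
    """Return True if name uses only ASCII letters, digits and underscores.

    Assumption: MSQT may accept or reject other characters; this only
    mirrors the rule given for the 'Name of dataset' field.
    """
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None
```

With this sketch, `is_valid_dataset_name("my_set01")` is true, while a name containing a dash or a space is rejected.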


    Snipe4SNPs

    On an uploaded dataset you can precompile SNPs for the 'SNIPED!' module with 'Snipe4SNPs'. This will analyze each SNP in the dataset and load the results into a new table. This table can then be queried via the SNIPED! frontend.

    Choose a dataset and click 'compile SNPs' in the 'Action' column. On the next screen you will be asked to provide some parameters. Adjusting them is likely to be useful only for very large datasets. For normal operation we recommend the default settings, since subselections can be performed later with the SNIPED! frontend (defaults: diff_threshold = 1.1, right_neigbh_threshold = 1, left_neigbh_threshold = 1).

    diff_threshold (DT)

    For each SNP position the software determines the two unambiguous alleles (A, G, C, T, -) whose allele frequencies are closest to 0.5 and computes the difference (diff) between these two frequencies. If diff is greater than diff_threshold, the SNP is skipped and not analyzed. Setting diff_threshold to 1.0 or above disables this filter, because the difference between two frequencies cannot exceed 1.0.
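
    The diff computation can be illustrated with a short sketch. This is not the Snipe4SNPs source code: the function names are hypothetical, and two details are assumptions, namely that frequencies are taken over the unambiguous calls only and that a position with fewer than two alleles yields no diff value.

```python
from collections import Counter

UNAMBIGUOUS = set("AGCT-")

def allele_diff(column):
    """Return diff for one alignment column, or None if fewer than two alleles.

    column: string with one character per individual at this position.
    Assumption: ambiguous characters such as 'N' are ignored entirely.
    """
    calls = [c for c in column.upper() if c in UNAMBIGUOUS]
    counts = Counter(calls)
    if len(counts) < 2:
        return None
    freqs = [n / len(calls) for n in counts.values()]
    # Order the allele frequencies by their distance from 0.5.
    freqs.sort(key=lambda f: abs(f - 0.5))
    return abs(freqs[0] - freqs[1])

def passes_diff_threshold(column, diff_threshold=1.1):
    """Return True if the position would be kept under this reading."""
    diff = allele_diff(column)
    return diff is not None and diff <= diff_threshold
```

    For a column "AAAG" the two frequencies are 0.75 and 0.25, so diff is 0.5: the position is kept with the default threshold of 1.1 and skipped with a threshold of 0.4.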

    right_neigbh_threshold (RNT)

    Threshold for the right neighborhood length.

    This is the minimum number of bases from the current SNP to the next SNP to the right.

    left_neigbh_threshold (LNT)

    Threshold for the left neighborhood length.

    The same as right_neigbh_threshold, but to the left.

    Setting both thresholds to 1 will skip indels, which is probably what you want. Keep in mind that MSQT treats every position within an indel as a SNP.
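
    The effect of the two neighborhood thresholds can be sketched as follows. This is an illustration under a stated assumption, not the Snipe4SNPs implementation: the neighborhood length is taken to be the number of non-SNP bases strictly between two SNP positions, which is the reading under which thresholds of 1 remove directly adjacent positions. The function name is hypothetical.

```python
def filter_by_neighborhood(snp_positions,
                           left_neigbh_threshold=1,
                           right_neigbh_threshold=1):
    """Return the SNP positions whose neighborhoods are long enough.

    Assumption: the neighborhood length is the count of bases strictly
    between a SNP and its nearest neighboring SNP on that side. A SNP
    without a neighbor on one side is not restricted on that side.
    """
    positions = sorted(snp_positions)
    kept = []
    for i, pos in enumerate(positions):
        if i > 0:
            left_gap = pos - positions[i - 1] - 1
            if left_gap < left_neigbh_threshold:
                continue
        if i < len(positions) - 1:
            right_gap = positions[i + 1] - pos - 1
            if right_gap < right_neigbh_threshold:
                continue
        kept.append(pos)
    return kept
```

    With the default thresholds, the positions 20, 21 and 22 in `[10, 20, 21, 22, 40]` are dropped because they are adjacent to each other, as the positions of an indel would be, while 10 and 40 are kept.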

    After you start 'Snipe4SNPs', a popup window, the 'Snipe4SNPs Processing Monitor', will appear. Do not close or reload this window until you see the message 'Snipe4SNPs is finished. You can close this window now'!


    Server Info

    This module shows information about the database and the web server used by the current MSQT installation.


    MSQT Data Input Format Specification

    Input format for MSQT datasets to be imported with MSQT/Admin 'Data Uploader'

    The MSQT/Admin 'Data Uploader' requires a compressed archive (zip or tar/gz) of a directory that contains multiple-alignment FASTA files arranged in a specific directory hierarchy.

    Directory hierarchy for a dataset in 'Chromosome position' structure

    Use this format if you know the base-pair position of your fragment in the reference genome. There must be one top-level directory containing as many subdirectories as there are chromosomes. The subdirectory names must be prefixed with 'chromosome_' and numbered consecutively, starting with 1. The number must be an integer, so please also assign numbers to the sex chromosomes.

    Each filename must be an integer and must correspond to the position of the first base of the aligned sequences in the reference genome.

    Example for my_dogs:


    my_dogs
    |
    |-- chromosome_1
    |   |-- 10095031
    |   |-- 112445
    |   |-- 197433
    |   |-- 2079956
    |   |-- 29215
    |   |-- 5757110
    |   |-- 7672503
    |   `-- 9343234
    |
    |-- chromosome_2
    |   |-- 1043454
    |   |-- 2037121
    |   |-- 208721
    |   |-- 347341
    |   |-- 419364
    |   |-- 4796716
    |   |-- 5020871
    |   `-- 9256049
    |
    |-- chromosome_3
    |   |-- 1073635
    |   |-- 1901310
    |   |-- 2072648
    |   |-- 3176691
    |   |-- 4056767
    |   |-- 70672
    |   |-- 8041706
    |   |-- 9279010
    |   `-- 964879
    |
    |-- chromosome_4
    |   |-- 1055093
    |   |-- 142440
    |   |-- 3006871
    |   |-- 48286
    |   |-- 5077409
    |   |-- 7077771
    |   `-- 8078388
    |
    `-- chromosome_5
        |-- 10102075
        |-- 105340
        |-- 375923
        |-- 4010373
        |-- 43643
        |-- 5021995
        |-- 6022450
        |-- 8331140
        `-- 947444
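
    The layout rules for the 'Chromosome position' structure can be checked locally before the archive is created. The following is a rough sketch based only on the rules stated in this section; the function name is hypothetical, and MSQT itself may apply further checks or tolerate entries that this sketch reports.

```python
import re
from pathlib import Path

def check_chromosome_layout(top_dir):
    """Return a list of layout problems found below top_dir (empty if none)."""
    problems = []
    numbers = []
    for entry in sorted(Path(top_dir).iterdir()):
        match = re.fullmatch(r"chromosome_([0-9]+)", entry.name)
        if match is None or not entry.is_dir():
            problems.append(f"unexpected entry: {entry.name}")
            continue
        numbers.append(int(match.group(1)))
        # Every alignment file must be named after an integer position.
        for alignment in sorted(entry.iterdir()):
            if re.fullmatch(r"[0-9]+", alignment.name) is None:
                problems.append(
                    f"filename is not an integer: {entry.name}/{alignment.name}"
                )
    if sorted(numbers) != list(range(1, len(numbers) + 1)):
        problems.append(
            "chromosome directories are not numbered consecutively from 1"
        )
    return problems
```

    Calling `check_chromosome_layout("my_dogs")` on a directory laid out like the example returns an empty list.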

    Directory hierarchy for a dataset in 'Position name' structure

    Use this format if you work with an organism without a reference genome. In this case there are no chromosomes and hence no subdirectories: a single top-level directory contains all sequence alignment files. The filenames are used to compose the unique SNP identifiers.

    Example for my_cats:


    my_cats
    |
    |-- AR266f
    |-- gamma2_Oro52m
    |-- ANNEX_AR266f_fw
    |-- beta59f1_rev
    |-- Qua59f1_rev
    |-- Fili15f_fw
    |-- 1TIF4_rev
    |-- AMT
    |-- Ari18f_fw
    |-- betaBlue99f
    `-- ANNEX_Fili15f_fw

    Remark: the dogs and cats datasets are just examples and are not related to the animals with the same name.

    MSQT FASTA format:

    Each file must contain one multiple alignment from one locus in FASTA format with Unix line breaks. One sequence in each file must be defined as the 'target' sequence. The sequences must be aligned and must have exactly the same length; pad with 'N' where necessary. The filenames must not start with a dot ('.') and must not contain dashes ('-').

    Each FASTA file should contain the same set of individuals, or at least a subset of it. Where individuals are missing from a file, they are filled in automatically with a sequence containing only Ns during the data upload process.
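
    The automatic filling of missing individuals can be pictured with the sketch below. It only illustrates the behavior described above and is not the uploader's code; the function name and the in-memory data layout (a dictionary per file) are assumptions made for the example.

```python
def fill_missing_individuals(alignments):
    """Add all-N sequences for individuals that are missing from a file.

    alignments: dict mapping a file name to a non-empty dict that maps
    individual names to their aligned sequences in that file.
    """
    all_names = set()
    for sequences in alignments.values():
        all_names.update(sequences)
    filled = {}
    for file_name, sequences in alignments.items():
        # All sequences in one file have the same length by specification.
        length = len(next(iter(sequences.values())))
        filled[file_name] = {
            name: sequences.get(name, "N" * length)
            for name in sorted(all_names)
        }
    return filled
```

    If one file contains 'target' and 'individual_1' and a second file contains only 'target', the second file receives an 'individual_1' entry consisting of Ns with the length of that file's alignment.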

    Example Fasta file:


    >target
    AGTACAGCCCAGAGTACAAGGACGTTTATCAGACGCAGAGCCTGATCTCCGAGCTGGATGTGAGCTTCA
    >individual_1
    AGTACAGCCCAGAGTACAAGGACGTTTATCAGACGCAGAGCCTGATCTCCGAGCTGGATGTGANNNNNN
    >individual_2
    NNNNNNNNNNNNNNNNCAAGGACGTTTATCAGACGCAGAGCCTGATCTCCGAGCTGGATGTGAGCTTCA
    >individual_n
    AGTACAGCCCAGAGTACAAGGACGTTTATCAGACGCAGAGCCTGATCTCCGAGCTGGATGTGAGCTTCA

    Multi-line or interleaved sequences, comments and additional newlines are not allowed.
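
    The format rules above can be checked with a small script before uploading. This is a sketch under assumptions and not MSQT's own parser: the function name is hypothetical, the target sequence is assumed to carry the header '>target' as in the example, and a single newline at the end of the file is assumed to be acceptable.

```python
def validate_msqt_fasta(text):
    """Return a list of problems found in one alignment file (empty if none)."""
    problems = []
    if "\r" in text:
        problems.append("non-Unix line breaks found")
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # assumption: one terminating newline is tolerated
    if not lines or len(lines) % 2 != 0:
        problems.append("expected alternating header and sequence lines")
        return problems
    names = []
    lengths = set()
    for header, sequence in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">") or len(header) < 2:
            problems.append("bad header line: %r" % header)
        if not sequence or sequence.startswith(">"):
            problems.append("missing sequence after %r" % header)
        names.append(header[1:])
        lengths.add(len(sequence))
    if len(lengths) > 1:
        problems.append("sequences differ in length")
    if "target" not in names:
        problems.append("no 'target' sequence")
    return problems
```

    An empty result means that none of the checked rules is violated; it does not guarantee that the upload will succeed.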


    Note to OSX users

    Unix line breaks?

    You will need an editor that can save text files with Unix line breaks. We recommend TextWrangler (from Bare Bones Software). Choose File -> "save as ...", click on "Options" and choose "Line breaks: Unix".
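
    As an alternative to an editor, existing files can be converted with a few lines of Python. This is a minimal sketch and not part of MSQT; the function names are hypothetical. It rewrites a file in place, so keep a copy of your original data.

```python
def to_unix_line_breaks(data):
    """Convert Windows (CR LF) and old Mac (CR) line breaks in bytes to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def convert_file(path):
    """Rewrite the file at path with Unix line breaks."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_breaks(data))
```

    Files that already use Unix line breaks are left unchanged by the conversion.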

    Compressing your directory

    Do NOT compress your directory via the right-click menu (Create Archive of "..."), because this creates additional directories and files within your archive which interfere with the data upload. Instead, open a terminal window (Applications -> Utilities -> Terminal), change into the parent directory of the directory to be compressed and use tar:


    tar -cvzf directoryname.tar.gz directoryname

    On OS X this will not create additional directories. It will, however, still create additional files, but the Data Uploader ignores those, just as it ignores .DS_Store files.
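
    If you prefer a script to the command line, a comparable archive can be written with Python's standard tarfile module. This is a sketch and not an MSQT tool: the function names are hypothetical, and leaving out entries whose names start with a dot is a choice made here for tidiness, since the uploader is described as ignoring such extra files anyway.

```python
import os
import tarfile

def skip_hidden(tarinfo):
    """Leave out any entry whose name starts with a dot (e.g. .DS_Store)."""
    if os.path.basename(tarinfo.name).startswith("."):
        return None
    return tarinfo

def compress_dataset(directory, archive_path):
    """Write directory into a gzip-compressed tar archive at archive_path."""
    # Store the directory under its own name, as 'tar -cvzf' does when it
    # is run from the parent directory.
    name = os.path.basename(os.path.abspath(directory))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(directory, arcname=name, filter=skip_hidden)
```

    For example, `compress_dataset("my_dogs", "my_dogs.tar.gz")` produces an archive with 'my_dogs' as its single top-level directory.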


    About

    For credits and copyright notices, please see about.html
