123SNP - SNP Discovery Software

Cheng-Ting Yeh, Sanzhen Liu, and Patrick S. Schnable

This software/script is provided AS IS and without user support

123SNP discovers SNPs from the native alignment output files produced by bowtie, novoalign, and GSNAP. It tallies the polymorphisms those aligners report between the reads and a reference genome, then filters the candidate SNPs using a set of user-specified parameters. The script does not align reads itself; it relies entirely on the results of the external alignment programs.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the Iowa State University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED 
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CHENG-TING YEH, SANZHEN LIU, OR PATRICK S. SCHNABLE
(OR IOWA STATE UNIVERSITY) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Download 123SNP (22 Kb - 20 NOV 2011)

Citation

Li X, C Zhu, CT Yeh, W Wu, EM Takacs, KA Petsch, F Tian, G Bai, ES Buckler, GJ Muehlbauer, MCP Timmermans, MJ Scanlon, PS Schnable, J Yu (2012) Genic and non-genic contributions to natural variation of quantitative traits in maize. Genome Res, 22(12): 2436-2444. (Selected as an Editors' Choice by MaizeGDB, 8/12)
doi:10.1101/gr.140277.112

SUGGESTED BOWTIE, GSNAP, AND NOVOALIGN PARAMETERS:
  % bowtie <ebwt> -q <input.fq> --solexa-quals -a -n 2 -l 25 -m 1 -B 1 --best --strata
  % gsnap -D <db_dir> -d <db_name> -B 2 -m 10 -i 2 -N 1 -n 3 <input.fas or input.fq>
  % novoalign -d <index db> -f <input.fq> -F ILMFQ -R 0 -r None

SNP DISCOVERY USING BOWTIE, GSNAP, AND NOVOALIGN OUTPUT FILES
=============================================================
USAGE:
  perl SNP_Discovery.pl [--bowtie <bowtie files>] [--gsnap <gsnap files>] [--gsnapQuality <gsnap quality file>]
          [--novoalign <novoalign files>] [--native <native files>] --output|-o <output.gff3> [OPTIONS]

WHERE:
  --bowtie <bowtie files>               : Path to alignment output files generated by bowtie. The specified
                                          files are assumed to only include the best and unique alignments.
                                          Refer to the suggested bowtie parameters above.
  --gsnap <gsnap files>                 : Path to alignment output files generated by gsnap. Refer to the
                                          suggested gsnap parameters above.
  --gsnapQual <gsnap quality file>      : Path to the quality file containing nucleotide quality values.
                                          This option is only required if --gsnap is specified
                                          and --assignQual is omitted.
  --novoalign <novoalign files>         : Path to alignment output files generated by novoalign. Refer to
                                          the suggested novoalign parameters above.
  --native <native files>               : Path to previously generated native files.
  --output|-o <output.gff3>             : Path to output file where SNP calls will be saved
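
The input rules above can be sketched in a few lines of Python. This is only an illustration, not part of 123SNP: the helper name build_command and the file names are hypothetical, each flag is given a single path for simplicity, and the --gsnapQual spelling follows the WHERE list. It checks the one stated dependency (gsnap input needs a quality file unless --assignQual is given) and assembles an argument list from documented flags only.

```python
import shlex


def build_command(output, bowtie=None, gsnap=None, gsnap_qual=None, assign_qual=None):
    """Assemble an SNP_Discovery.pl argument list (hypothetical helper)."""
    # Documented rule: --gsnapQual is required when --gsnap is used
    # and --assignQual is omitted.
    if gsnap is not None and gsnap_qual is None and assign_qual is None:
        raise ValueError("--gsnap needs --gsnapQual unless --assignQual is given")

    args = ["perl", "SNP_Discovery.pl"]
    if bowtie is not None:
        args += ["--bowtie", bowtie]
    if gsnap is not None:
        args += ["--gsnap", gsnap]
    if gsnap_qual is not None:
        args += ["--gsnapQual", gsnap_qual]
    if assign_qual is not None:
        args += ["--assignQual", str(assign_qual)]
    args += ["--output", output]
    return args


# File names below are placeholders, not files shipped with 123SNP.
print(shlex.join(build_command("snps.gff3", bowtie="reads.bowtie")))
print(shlex.join(build_command("snps.gff3", gsnap="reads.gsnap", assign_qual=20)))
```

The printed strings can be pasted into a shell, or the list can be passed directly to subprocess.run.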

OPTIONS:
  --alleles|-a <num>                    : Maximum number of read alleles to encounter per site [DEFAULT: 1]
  --mismatches|-mm|-m <num>             : Maximum number of mismatches allowed per read. This option
                                          evaluates each read and accepts or discards it based on its number
                                          of polymorphisms, regardless of read length [DEFAULT: 3]
  --mismatches|-mm|-m <num> <length>    : Maximum number of mismatches per <length> nucleotides of the read.
                                          Use this form when the input reads have variable lengths, so that
                                          the mismatch tolerance scales with read length when deciding whether
                                          a read is accepted or discarded. For instance, if '--mismatches 2 36'
                                          is specified and the read is 75 bp in length, the total allowed number
                                          of mismatches for this read is: CEILING((75 x 2) / 36) = 5 mismatches
  --num|-n <number of reads>            : Minimum number of reads per allele in each SNP site [DEFAULT: 3]
  --allelecoverage|-ac <coverage>       : Minimum allele coverage allowed for a SNP [DEFAULT: 0.80]
  --coverage|-c <coverage>              : Minimum overall coverage of all allowed alleles
                                          per SNP site [DEFAULT: 0.80]
  --quality|-q <quality value>          : Minimum quality value of nucleotides to consider in
                                          phred scale (0 ~ 40) [DEFAULT: 15]
  --assignQual <quality value>          : Assign the specified quality value to all nucleotides aligned by
                                          bowtie, gsnap, and/or novoalign, overwriting any existing values.
                                          The quality value is an integer in phred scale (0 ~ 40)
  --ignore <number of bases>            : Specify the number of bases to ignore at the beginning and
                                          end of each read. Ignored nucleotides do not participate in
                                          or affect SNP discovery procedures [DEFAULT: 3]
  --maxsplice <max. splice distance>    : Specify the maximum number of bases to allow as splice distance
                                          for each read (GSNAP output reads only) [DEFAULT: 10000]
  --source <string>                     : Source string to be included in GFF3 file [DEFAULT: SNP_Discovery]
  --feature <string>                    : Feature string to be included in GFF3 file [DEFAULT: SNP]
  --stacking|--nostacking               : Filters stacked reads by comparing the start/end chromosomal
                                          coordinates of each read to identify reads that stack in certain
                                          regions and remove them. [DEFAULT: --nostacking]
  --trimmedlog|trlog <trim. log file>   : Specify the trimmed log file produced by the in-house fastq trimming
                                          script. When this file is provided, the program can determine the
                                          exact coordinates of each read when removing stacked reads.
  --temp <temp directory>               : Path to a temporary directory to save files to. Default is to save
                                          temporary files in current directory.
  --clean|--noclean                     : Enable/Disable the cleaning of temporary files [DEFAULT: --clean]
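
The length-scaled form of --mismatches can be checked with a short Python sketch. It only reproduces the CEILING formula quoted in the option description; the function name allowed_mismatches is hypothetical and the code is not taken from SNP_Discovery.pl.

```python
def allowed_mismatches(read_length, max_mismatches=3, per_length=None):
    """Mismatch limit for one read, following the documented formula.

    With only <num>, the limit is fixed regardless of read length.
    With <num> <length>, the limit is CEILING((read_length x num) / length).
    """
    if per_length is None:
        return max_mismatches
    # Integer ceiling division, avoiding floating-point rounding.
    return -((-read_length * max_mismatches) // per_length)


# '--mismatches 2 36' applied to a 75 bp read, as in the option text.
print(allowed_mismatches(75, 2, 36))   # 5
# The same setting applied to a 36 bp read.
print(allowed_mismatches(36, 2, 36))   # 2
# Single-argument form with the default of 3.
print(allowed_mismatches(75))          # 3
```

Under this reading, a read exactly <length> bp long is allowed <num> mismatches, and longer reads are allowed proportionally more, rounded up.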