123SNP - SNP Discovery Software
Cheng-Ting Yeh, Sanzhen Liu, and Patrick S. Schnable
123SNP discovers SNPs directly from the native alignment output files produced by bowtie, novoalign, and GSNAP. Polymorphisms reported by those aligners are tallied to identify variants between the reads and a reference genome. The script relies on the results of the external alignment programs and filters SNPs according to a set of user-specified parameters.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the Iowa State University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CHENG-TING YEH, SANZHEN LIU, OR PATRICK S. SCHNABLE (OR IOWA STATE UNIVERSITY) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Li X, C Zhu, CT Yeh, W Wu, EM Takacs, KA Petsch, F Tian, G Bai, ES Buckler, GJ Muehlbauer, MCP Timmermans, MJ Scanlon, PS Schnable, J Yu (2012) Genic and non-genic contributions to natural variation of quantitative traits in maize. Genome Res, 22(12): 2436-2444. (Selected as an Editors' Choice by MaizeGDB, 8/12)
doi:10.1101/gr.140277.112
SUGGESTED BOWTIE, GSNAP, AND NOVOALIGN PARAMETERS:

  % bowtie <ebwt> -q <input.fq> --solexa-quals -a -n 2 -l 25 -m 1 -B 1 --best --strata
  % gsnap -D <db_dir> -d <db_name> -B 2 -m 10 -i 2 -N 1 -n 3 <input.fas or input.fq>
  % novoalign -d <index db> -f <input.fq> -F ILMFQ -R 0 -r None

SNP DISCOVERY USING BOWTIE, GSNAP, AND NOVOALIGN OUTPUT FILES
=============================================================

USAGE:
  perl SNP_Discovery.pl [--bowtie <bowtie files>] [--gsnap <gsnap files>]
       [--gsnapQuality <gsnap quality file>] [--novoalign <novoalign files>]
       [--native <native files>] --output|-o <output.gff3> [OPTIONS]

WHERE:
  --bowtie <bowtie files> :
      Path to alignment output files generated by bowtie. The specified
      files are assumed to include only the best and unique alignments.
      Refer to the suggested bowtie parameters above.
  --gsnap <gsnap files> :
      Path to alignment output files generated by gsnap. Refer to the
      suggested gsnap parameters above.
  --gsnapQual <gsnap quality file> :
      Path to the quality file containing nucleotide quality values. This
      option is required only if --gsnap is specified and --assignQual is
      omitted.
  --novoalign <novoalign files> :
      Path to alignment output files generated by novoalign. Refer to the
      suggested novoalign parameters above.
  --native <native files> :
      Path to previously generated native files.
  --output|-o <output.gff3> :
      Path to the output file where SNP calls will be saved.

OPTIONS:
  --alleles|-a <num> :
      Maximum number of read alleles allowed per site [DEFAULT: 1]
  --mismatches|-mm|-m <num> :
      Maximum number of mismatches allowed per read. Each read is accepted
      or discarded based on its number of polymorphisms, regardless of
      read length [DEFAULT: 3]
  --mismatches|-mm|-m <num> <length> :
      Maximum number of mismatches allowed per <length> nucleotides of the
      read. Use this form when the input reads have variable lengths, so
      that the tolerance for accepting or discarding a read scales with
      its length. For instance, if '--mismatches 2 36' is specified and
      the read is 75 bp in length, the total allowed number of mismatches
      for this read is:
          CEILING((75 x 2) / 36) = 5 mismatches
  --num|-n <number of reads> :
      Minimum number of reads per allele in each SNP site [DEFAULT: 3]
  --allelecoverage|-ac <coverage> :
      Minimum allele coverage allowed for a SNP [DEFAULT: 0.80]
  --coverage|-c <coverage> :
      Minimum overall coverage of all allowed alleles per SNP site
      [DEFAULT: 0.80]
  --quality|-q <quality value> :
      Minimum quality value of nucleotides to consider, in phred scale
      (0 ~ 40) [DEFAULT: 15]
  --assignQual <quality value> :
      Assign the specified quality value to all nucleotides aligned by
      bowtie, gsnap, and/or novoalign, overwriting any existing values.
      The quality value is an integer in phred scale (0 ~ 40).
  --ignore <number of bases> :
      Number of bases to ignore at the beginning and end of each read.
      Ignored nucleotides do not participate in or affect the SNP
      discovery procedures [DEFAULT: 3]
  --maxsplice <max. splice distance> :
      Maximum number of bases to allow as splice distance for each read
      (GSNAP output reads only) [DEFAULT: 10000]
  --source <string> :
      Source string to be included in the GFF3 file
      [DEFAULT: SNP_Discovery]
  --feature <string> :
      Feature string to be included in the GFF3 file [DEFAULT: SNP]
  --stacking|--nostacking :
      Remove stacked reads by comparing the start/end chromosomal
      coordinates of each read to identify reads stacking in certain
      regions [DEFAULT: --nostacking]
  --trimmedlog|trlog <trim. log file> :
      Trimmed log file produced by the in-house fastq trimming script.
      When this file is present, the program can determine the exact
      coordinates of each read when removing stacked reads.
  --temp <temp directory> :
      Path to a temporary directory to save files to. By default,
      temporary files are saved in the current directory.
  --clean|--noclean :
      Enable/disable the cleaning of temporary files [DEFAULT: --clean]
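
The length-scaled form of --mismatches can be sketched in a few lines of Python to make the arithmetic concrete. This is only an illustration of the CEILING formula given in the option description, not code taken from SNP_Discovery.pl. The function names allowed_mismatches and accept_read are hypothetical, and the sketch assumes that a read whose mismatch count equals the limit is still accepted.

```python
def allowed_mismatches(read_length: int, max_mismatches: int, per_length: int) -> int:
    """Mismatch limit for one read under '--mismatches <num> <length>'.

    Implements CEILING((read_length x max_mismatches) / per_length).
    Hypothetical helper for illustration; not part of SNP_Discovery.pl.
    """
    if read_length <= 0 or per_length <= 0 or max_mismatches < 0:
        raise ValueError("lengths must be positive and mismatches non-negative")
    # Integer ceiling division: avoids floating-point rounding issues.
    return -(-(read_length * max_mismatches) // per_length)


def accept_read(read_length: int, observed_mismatches: int,
                max_mismatches: int = 2, per_length: int = 36) -> bool:
    """Return True if the read stays within its length-scaled mismatch limit.

    Assumption: a read with exactly the allowed number of mismatches is kept.
    """
    limit = allowed_mismatches(read_length, max_mismatches, per_length)
    return observed_mismatches <= limit


if __name__ == "__main__":
    # The example from the option description: '--mismatches 2 36', 75 bp read.
    print(allowed_mismatches(75, 2, 36))
    # A 36 bp read under the same setting gets exactly <num> mismatches.
    print(allowed_mismatches(36, 2, 36))
```

With '--mismatches 2 36', a 36 bp read is allowed 2 mismatches and a 75 bp read is allowed 5, so longer reads are not penalized for carrying proportionally more polymorphisms.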