Differences

This shows you the differences between two versions of the page.

bioinformatic_tools_to_detect_microsatellites_loci_from_genomic_data [2013/05/30 15:04]
anniearchambault
bioinformatic_tools_to_detect_microsatellites_loci_from_genomic_data [2013/08/08 15:21] (current)
anniearchambault
Line 30: Line 30:
 ^TRA     | 2004     | 2004     | [[http://scholar.google.ca/scholar?cites=13056073161710952438&as_sdt=5&sciodt=0&hl=fr|21]]     | Bilgen et al 2004(([[http://bioinformatics.oxfordjournals.org/cgi/doi/10.1093/bioinformatics/bth410|Bilgen, M., Karaca, M., Onus, A. N., and Ince, A. G. (2004). A software program combining sequence motif searches with keywords for finding repeats containing DNA sequences. Bioinformatics 20, 3379-3386.]]))     | ?     | Heuristic     | ?     | Multiple files with multiple sequences each (Max 1 Mb sequence) from ESTs.     | ?     | Yes, searches for exact–inexact TRs and exact–inexact compound repeats     | No?     | ?     | ? [[ftp://ftp.akdeniz.edu.tr/Araclar/TRA/|Download]]     | Windows     | C++ (with Microsoft Visual C++)     | ?     | Searches among the organisms, organs, tissue types and development stages     | ?     |     |
 ^MsatFinder     | 2005     | 2007     | 48     | Thurston and Field 2005(([[http://www.genomics.ceh.ac.uk/msatfinder/|Thurston, M. I., and Field, D. (2006). Msatfinder, (Oxford, UK: Centre for Ecology and Hydrology. Computer program)]]))     | One to 6 bp long     | ?     | 1) length of repeat 2) number repeat unit in the site 3) search engine (regex, multipass or iterative search)     | Limit of 10 Mb of sequence in the online access. Accepts GenBank, EMBL, Swissprot, FASTA, ASCII.     | Repeats, GFF, Counts, Msat_tabs, Flank_tabs, Fasta, MINE, Primers     | No, but detects compound perfect repeats     |      | [[http://www.genomics.ceh.ac.uk/cgi-bin/msatfinder/msatfinder.cgi|Yes]]     | Command-line [[http://www.genomics.ceh.ac.uk/msatfinder/#download|Download]]     | Unix (may work on Mac OSX)     | perl script     | ?     | Nucleic acid or amino acid sequence     | ?     |     |
-^FireµSat     | 2006     | 2011     | [[http://scholar.google.ca/scholar?cites=17060705618169575807&as_sdt=2005&sciodt=0,5&hl=fr|5]] and [[http://scholar.google.ca/scholar?cites=66353776030260478&as_sdt=5&sciodt=0&hl=fr|this]]     | de Ridder et al. 2006(([[http://portal.acm.org/citation.cfm?doid=1216262.1216289|de Ridder, C., Kourie, D. G., and Watson, B. W. (2006). FireµSat. In Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries - SAICSIT ’06 (Somerset West, South Africa), pp. 247-256.]])) and [[http://upetd.up.ac.za/thesis/available/etd-08172010-202532/unrestricted/dissertation.pdf|de Ridder 2006]]     | 1 to 5 bp. Length set by the user. The next update should allow for detection of 6 to 100 bp repeats.     | Uses Counting Finite Automata (which are regular language acceptors)     | Max Motif Error (per motif); Max adjacent ATR elements; Motif Range Options; Min required TR elements; Max substring error (a threshold); Mismatch penalty (m_p); Delete penalty (d_p); Insert penalty (i_p).     | One fasta file with one sequence     | File in .csv format.     | Yes, substitutions and indels, but not compound loci.     | No     | No     | GUI and command-line, [[http://www.dna-algo.co.za/|Download]]     | Windows; Linux in progress     | C++ and MatLab     | Run time increases linearly with the sequence length; does not increase with longer motif lengths.     | Designed for microsatellites, but can detect any type of TR     | Fast, simple and flexible.     |     |
+^FireµSat     | 2006     | 2011     | [[http://scholar.google.ca/scholar?cites=17060705618169575807&as_sdt=2005&sciodt=0,5&hl=fr|5]] and [[http://scholar.google.ca/scholar?cites=66353776030260478&as_sdt=5&sciodt=0&hl=fr|this]]     | de Ridder et al. 2006(([[http://portal.acm.org/citation.cfm?doid=1216262.1216289|de Ridder, C., Kourie, D. G., and Watson, B. W. (2006). FireµSat. In Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries - SAICSIT ’06 (Somerset West, South Africa), pp. 247-256.]])) and de Ridder et al. 2013(([[http://www.sciencedirect.com/science/article/pii/S1570866712001657|De Ridder, C., D.G. Kourie, B.W. Watson, T.R. Fourie, and P.V. Reyneke (2013). Fine-tuning the search for microsatellites. Journal of Discrete Algorithms 20: 21–37.]]))     | 1 to 5 bp. Length set by the user. The next update should allow for detection of 6 to 100 bp repeats.     | Uses Counting Finite Automata (which are regular language acceptors)     | Max Motif Error (per motif); Max adjacent ATR elements; Motif Range Options; Min required TR elements; Max substring error (a threshold); Mismatch penalty (m_p); Delete penalty (d_p); Insert penalty (i_p).     | One fasta file with one sequence     | File in .csv format.     | Yes, substitutions and indels, but not compound loci.     | No     | No     | GUI and command-line, [[http://www.dna-algo.co.za/|Download]]     | Windows; Linux in progress     | C++ and MatLab     | Run time increases linearly with the sequence length; does not increase with longer motif lengths.     | Designed for microsatellites, but can detect any type of TR     | Fast, simple and flexible.     |     |
 ^Phobos     | 2006     | 2010     | ?     | Mayer 2010(([[http://www.ruhr-uni-bochum.de/spezzoo/cm/cm_phobos.htm |Mayer, C. (2010). Phobos: Highly accurate search for perfect and imperfect tandem repeats in complete genomes by Christoph Mayer, (Bochum, Germany: Ruhr-Universität Bochum, Faculty of Biological Sciences and Biotechnology). Computer program.]]))      | Perfect and imperfect TR, with a pattern size of 1 - 10 000 bp     | Exhaustive, uses alignment scores     | Mismatch score, indel score, minimum score, minimum length, minimum perfection, and others.     | One file in fasta format, with multiple sequence. No limit in sequence length.     | Text file, different formats, including gff and fasta     | Yes. Substitutions and indels.      | Not in itself, but yes as implemented in STAMP or Geneious.      | No     | User friendly GUI and easily scriptable Command-line program [[http://www.ruhr-uni-bochum.de/ecoevo/cm/cm_phobos.htm|Download]]     | MacOSX, Linux, Windows.     | C++     | Execution time increases with pattern size range. Very fast in the size range 1-10 bp, slow for patterns in the size range above 10-20 bp.     | Can be incorporated into pipelines. Implemented in STAMP and Geneious.     | Free only for academic users.     |     |
 ^SSRscanner     | 2006     | ?     |  [[http://scholar.google.ca/scholar?cites=15246530144073441813&as_sdt=2005&sciodt=0,5&hl=fr|3]]     | Anwar and Khan 2006(([[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1891659/|Anwar, T., and Khan, A. (2006). SSRscanner: a program for reporting distribution and exact location of simple sequence repeats. Bioinformation 1, 89-91.]]))     | Only searches for predefined motifs     | Exhaustive, uses dictionary approach.     | File containing motifs of different repeat types; number of times for the motifs to be repeated     | One file with one sequence     | (1) Motifposition.txt (gives the frequency of each repeat provided in the motif file) and (2) Motifresult.exe (gives the specific location of each repeat)     | No, perfect only     | No     | No     | Command-line Availability unknown, contact the [[mailto:huzzi99@hotmail.com|author]]     | Platform independent     | perl script     | ?     | ?     | ?     |     |
Line 126: Line 126:
  
 ==FireµSat==
-FireµSat, described in de Ridder et al. 2006(([[http://portal.acm.org/citation.cfm?doid=1216262.1216289|de Ridder, C., Kourie, D. G., and Watson, B. W. (2006). FireµSat. In Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries - SAICSIT ’06 (Somerset West, South Africa), pp. 247-256.]])), combines straightforward finite automaton (FA) technology with a flavour of Moore machine technology. It uses Counting Finite Automata, which are regular language acceptors. The parameter details are as follows:
+FireµSat, described in de Ridder et al. 2006(([[http://portal.acm.org/citation.cfm?doid=1216262.1216289|de Ridder, C., Kourie, D. G., and Watson, B. W. (2006). FireµSat. In Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries - SAICSIT ’06 (Somerset West, South Africa), pp. 247-256.]])) and de Ridder et al. 2013(([[http://www.sciencedirect.com/science/article/pii/S1570866712001657|De Ridder, C., D.G. Kourie, B.W. Watson, T.R. Fourie, and P.V. Reyneke (2013). Fine-tuning the search for microsatellites. Journal of Discrete Algorithms 20: 21–37.]])), combines straightforward finite automaton (FA) technology with a flavour of Moore machine technology. It uses Counting Finite Automata, which are regular language acceptors. The parameter details are as follows:
   * Max Motif Error: The number of errors the user allows per motif (allowed error types: deletions, mismatches, insertions).
   * Max adjacent ATR elements: The number of ATREs that the user allows next to each other.