IBC 2012, vol. 4, article no. 14, pp. 1-6 | doi: 10.4051/ibc.2012.4.4.0014
Review (Bioinformatics/Computational biology/Molecular modeling, Omics (Physiomics/metabolomics/proteomics/genomics))
Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction
A.T.M. Golam Bari¹, Mst. Rokeya Reaz¹, Ho-Jin Choi² and Byeong-Soo Jeong¹,*
¹ Department of Computer Engineering, Kyung Hee University, Suwon, Korea
² Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
*Corresponding author
received: December 12, 2012; revised: December 30, 2012; accepted: December 31, 2012; published: December 31, 2012
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Splice site prediction in a DNA sequence is a basic search problem: finding the exon/intron (donor) and intron/exon (acceptor) boundaries. Removing the introns and joining the exons together forms the mRNA sequence, which is the input of the translation process; splicing is therefore a necessary step in the central dogma of molecular biology. Splice site prediction first locates the candidate sequences that contain the consensus GT (donor) and AG (acceptor) dinucleotides, and then determines which of these candidates are true splice sites and which are false. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). These works differ mainly in their nucleotide encoding technique and their choice of SVM kernel. Some methods encode the DNA sequence sparsely, whereas others encode it probabilistically. The encoded sequences serve as input to the SVM, which classifies them using its learned model. Classification accuracy depends largely on selecting a kernel suited to sequence data and on choosing appropriate kernel parameters. We examine each encoding technique and group the techniques by similarity. We then discuss kernel selection and kernel parameter selection. This survey provides a basic understanding of encoding approaches and of proper SVM kernel selection for splice site prediction.
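The pipeline summarized in the abstract (extract GT/AG candidates, encode them, compare them with a kernel) can be illustrated with a minimal sketch. It assumes a fixed number of flanking bases on each side of the candidate dinucleotide, one-hot vectors as the sparse encoding, a simple position-specific frequency table as one possible probabilistic encoding, and the Gaussian (RBF) kernel as an example kernel. All function names are hypothetical, and the surveyed methods differ in window size, encoding details, and kernel choice, so this is not the implementation of any particular method.

```python
import math

NUCLEOTIDES = "ACGT"


def find_candidates(seq, dinucleotide, flank):
    """Return (position, window) for each occurrence of `dinucleotide`
    that has at least `flank` bases on both sides (hypothetical helper)."""
    seq = seq.upper()
    k = len(dinucleotide)
    candidates = []
    for i in range(flank, len(seq) - k - flank + 1):
        if seq[i:i + k] == dinucleotide:
            candidates.append((i, seq[i - flank:i + k + flank]))
    return candidates


def sparse_encode(window):
    """Sparse (one-hot) encoding: each base becomes a 4-bit indicator
    vector over A, C, G, T; any other symbol (e.g. N) becomes all zeros."""
    vec = []
    for base in window:
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def position_frequencies(true_windows):
    """Per-position nucleotide frequencies estimated from known true sites
    (assumes all windows have the same length)."""
    freqs = []
    for pos in range(len(true_windows[0])):
        column = [w[pos] for w in true_windows]
        freqs.append({n: column.count(n) / len(column) for n in NUCLEOTIDES})
    return freqs


def probabilistic_encode(window, freqs):
    """Replace each base by its frequency at that position among true sites."""
    return [freqs[pos].get(base, 0.0) for pos, base in enumerate(window)]


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    seq = "CAGGTAAGTCCCTTTCAGGA"  # toy sequence, not real genomic data
    donors = find_candidates(seq, "GT", flank=3)
    print(donors)  # [(3, 'CAGGTAAG'), (7, 'TAAGTCCC')]
    x, y = (sparse_encode(w) for _, w in donors)
    # The two windows differ at 5 of 8 positions, so ||x - y||^2 = 10.
    print(rbf_kernel(x, y, gamma=0.1))
    freqs = position_frequencies(["CAGGTAAG", "AAGGTGAG"])
    print(probabilistic_encode("CAGGTAAG", freqs))
```

In this sketch the sparse encoding yields 4 values per position and treats positions independently, while the frequency-based encoding yields a single value per position. The kernel parameter `gamma` is an example of the kind of parameter whose selection affects classification accuracy; in practice the kernel values would be passed to an SVM solver, which is not shown here.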

Keywords: coding sequence, exon-intron boundary, intron-exon boundary, splice site, support vector machine, translation process
This article is part of the special issue: Translational Bioinformatics Conference 2012 (TBC 2012)
Edited by Keun Woo Lee
Interdisciplinary Bio Central (IBC) ISSN: 2005-8543