Protein secondary structure prediction using neural networks and support vector machines

Tsilo, L. C. (2009) Protein secondary structure prediction using neural networks and support vector machines. Masters thesis, Rhodes University.

Tsilo-MSc-TR09-55.pdf

1201Kb

Abstract

Predicting the secondary structure of proteins is important in biochemistry because the 3D structure can be determined from the local folds that are found in secondary structures. Moreover, knowing the tertiary structure of proteins can assist in determining their functions. The objective of this thesis is to compare the performance of Neural Networks (NN) and Support Vector Machines (SVM) in predicting the secondary structure of 62 globular proteins from their primary sequence. For each of NN and SVM, we created six binary classifiers to distinguish between the classes helix (H), strand (E), and coil (C). For NN we use Resilient Backpropagation training with and without early stopping. We use NN with either no hidden layer or one hidden layer with 1, 2, ..., 40 hidden neurons. For SVM we use a Gaussian kernel with its parameter fixed at 0.1 and the cost parameter C varied over the range [0.1, 5]. 10-fold cross-validation is used to obtain overall estimates of the probability of making a correct prediction. Our experiments indicate that, for both NN and SVM, the different binary classifiers have varying accuracies: from 69% correct predictions for coil vs. non-coil up to 80% correct predictions for strand vs. non-strand. It is further demonstrated that an NN with no hidden layer, or with no more than 2 neurons in the hidden layer, is sufficient to obtain the best predictions. For SVM we show that the estimated accuracies do not depend on the value of the cost parameter. As a major result, we demonstrate that the accuracy estimates of the NN and SVM binary classifiers cannot be distinguished. This contradicts a modern belief in bioinformatics that SVM outperforms other predictors.
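
The evaluation setup described in the abstract (six binary tasks over the classes H, E, and C, scored by 10-fold cross-validation) can be sketched in a few lines of Python. This is only an illustrative sketch and not the thesis code. The abstract does not list the six tasks, so the split into three one-vs-rest and three one-vs-one tasks is an assumption, and the helper names binary_tasks, relabel, k_fold_indices, and cross_validated_accuracy are hypothetical. The folds here are contiguous and unshuffled for simplicity; how the thesis formed its folds is not stated in the abstract.

```python
from itertools import combinations

CLASSES = ("H", "E", "C")  # helix, strand, coil


def binary_tasks(classes=CLASSES):
    """Return (name, positive set, negative set) for six binary tasks.

    Assumption: three one-vs-rest tasks plus three one-vs-one tasks.
    """
    tasks = []
    # One-vs-rest, e.g. coil vs. non-coil.
    for c in classes:
        tasks.append((f"{c}/~{c}", frozenset({c}), frozenset(classes) - {c}))
    # One-vs-one, e.g. helix vs. strand.
    for a, b in combinations(classes, 2):
        tasks.append((f"{a}/{b}", frozenset({a}), frozenset({b})))
    return tasks


def relabel(labels, positive, negative):
    """Map 3-state labels to +1 / -1; residues outside the task become None."""
    out = []
    for y in labels:
        if y in positive:
            out.append(1)
        elif y in negative:
            out.append(-1)
        else:
            out.append(None)
    return out


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(samples, labels, fit, predict, k=10):
    """Fraction of correct predictions pooled over the k held-out folds."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)
```

Any classifier can be plugged in through the fit and predict callables, for example an NN or an SVM from a library of choice; a trivial majority-class baseline is enough to exercise the cross-validation loop.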

Item Type: Thesis (Masters)
Uncontrolled Keywords: Neural networks; support vector machines; protein secondary structure prediction
Subjects: Q Science > QA Mathematics > QA273 Probabilities. Mathematical statistics
Divisions: Faculty > Faculty of Commerce > Statistics
Faculty > Faculty of Science > Statistics
Supervisors: Jager, G. (Prof.)
ID Code: 1675
Deposited By: Nicolene Mvinjelwa
Deposited On: 18 Jun 2010 09:37
Last Modified: 06 Jan 2012 16:21
539 full-text download(s) since 18 Jun 2010 09:37
349 full-text download(s) in the past 12 months