|M.Sc Student||Heiman Amnon|
|Subject||Protein Identification via Mass Spectrometry|
|Department||Department of Computer Science|
|Supervisors||Professor Dan Geiger, Professor Emeritus Arie Admon|
This thesis addresses the identification of proteins from their mass spectra. Protein identification plays a major role in many research areas, such as biochemistry and medicine. Mass spectrometry represents the fragments of a molecule as a spectrum of masses. Several algorithms are known for identifying proteins from their spectra, and all follow the same meta-principle: they compare the experimentally measured spectrum with the theoretical spectrum of each database protein's digestion. A scoring function quantifies the match between the theoretical spectrum and the experimental one, and the protein with the highest score is declared the correct one. This work proposes a new scoring function based on a Bayesian model, together with new algorithms for fast and accurate identification. The new scoring function can incorporate protein-specific information, so that the protein mass and isoelectric point become part of the score. A second Bayesian model is proposed to account for systematic error in the data, a new approach to the ill-calibration problem: the optimal calibration is calculated for each experiment and then used as part of the scoring function. A new method for candidate extraction is also proposed. It is as efficient as known methods but allows modifications to be considered during the candidate extraction phase, which had been considered infeasible. The results of the proposed identification scheme are compared with those of known identification methods, using synthetic data and real data gathered at the National Protein Center at the Biology department of the Technion.
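The shared meta-principle described in the abstract (digest each database protein in silico, compare the theoretical masses with the measured ones, score the match, and report the top-scoring protein) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names are hypothetical, trypsin is assumed as the digestion enzyme, and the score is a plain count of matched masses, which stands in for, and is much simpler than, the Bayesian scoring function the thesis proposes.

```python
# Illustrative sketch only: a naive peptide-mass matching scheme.
# The scoring here is a simple matched-mass count, NOT the Bayesian
# scoring function proposed in the thesis.

# Monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed enzyme: trypsin)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def match_score(observed_masses, protein_sequence, tolerance=0.05):
    """Count observed masses that lie within `tolerance` Da of a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein_sequence)]
    return sum(
        any(abs(observed - mass) <= tolerance for mass in theoretical)
        for observed in observed_masses
    )


def identify(observed_masses, database, tolerance=0.05):
    """Return the name of the database protein with the highest score."""
    return max(
        database,
        key=lambda name: match_score(observed_masses, database[name], tolerance),
    )


if __name__ == "__main__":
    # Toy database with made-up sequences, used only to exercise the code.
    toy_database = {"P1": "AGKSTR", "P2": "LLKVVR"}
    # Simulated measurement: the peptide masses of P1 with a small offset.
    measured = [peptide_mass(p) + 0.01 for p in tryptic_digest(toy_database["P1"])]
    print(identify(measured, toy_database))
```

A counting score of this kind treats every match equally and ignores systematic mass error; a probabilistic score can instead weight matches and include protein-level information and a calibration term, which are the aspects the abstract attributes to the proposed Bayesian models.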