A Linear Regression Predictor for Identifying N6-Methyladenosine Sites Using Frequent Gapped K-mer Pattern
- School of Information: Published Papers
N6-methyladenosine (m6A) is one of the most common and abundant RNA modifications and is involved in many biological processes in humans. Abnormal RNA modifications are often associated with a range of diseases, including tumors, neurogenic diseases, and embryonic retardation. Identifying m6A sites is therefore of paramount importance in the post-genomic age. Although many lab-based methods have been proposed to annotate m6A sites, they are time-consuming and costly. Given these drawbacks of experimental methods for RNA sequence recognition, computational methods have been suggested as a supplement for identifying m6A sites. In this study, we develop a novel feature extraction algorithm based on the frequent gapped k-mer pattern (FGKP) and apply linear regression to construct the prediction model. The new predictor is used to identify m6A sites in the Saccharomyces cerevisiae database. In 10-fold cross-validation, its performance is better than that of recent methods. Comparative results indicate that our model has great potential to become a useful and effective tool for genome analysis and to provide further insight into locating m6A sites.
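The general idea of pairing gapped k-mer features with a linear regression classifier can be sketched as follows. This is a minimal illustration only, not the FGKP algorithm itself, whose details the abstract does not give. The gap scheme (masking inner positions of a fixed-width window), the support threshold, the 0.5 decision cutoff, all function names, and the four toy sequences are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def gapped_kmers(seq, k=3, gaps=1):
    """Yield patterns with k kept bases and `gaps` masked inner positions ('-')."""
    width = k + gaps
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Only inner positions may be masked, so every pattern starts and ends on a base.
        for masked in combinations(range(1, width - 1), gaps):
            yield "".join("-" if i in masked else b for i, b in enumerate(window))


def frequent_patterns(seqs, k=3, gaps=1, min_support=0.5):
    """Keep patterns that occur in at least `min_support` of the sequences."""
    support = Counter()
    for seq in seqs:
        support.update(set(gapped_kmers(seq, k, gaps)))  # count each sequence once
    threshold = min_support * len(seqs)
    return sorted(p for p, n in support.items() if n >= threshold)


def featurize(seqs, patterns, k=3, gaps=1):
    """Count each frequent pattern per sequence; the last column is a bias term."""
    index = {p: j for j, p in enumerate(patterns)}
    X = np.zeros((len(seqs), len(patterns) + 1))
    X[:, -1] = 1.0
    for i, seq in enumerate(seqs):
        for p in gapped_kmers(seq, k, gaps):
            j = index.get(p)
            if j is not None:
                X[i, j] += 1
    return X


def fit_linear_regression(X, y):
    """Ordinary least squares fit (minimum-norm solution if X is rank deficient)."""
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict(X, weights, cutoff=0.5):
    """Label a site as modified when the regression output reaches the cutoff."""
    return X @ weights >= cutoff


# Toy data invented for illustration; not taken from any real m6A dataset.
seqs = ["GGACUAA", "AGACUGC", "CCAUGCC", "UUAUGGA"]
labels = np.array([1.0, 1.0, 0.0, 0.0])

patterns = frequent_patterns(seqs)
X = featurize(seqs, patterns)
weights = fit_linear_regression(X, labels)
predictions = predict(X, weights)
```

In practice the model would be evaluated on held-out folds, as in the 10-fold cross-validation mentioned above, rather than on the sequences used for fitting.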