基于门循环单元神经网络的中文分词法
A Gated Recurrent Unit Neural Network for Chinese Word Segmentation
Abstract
At present, the mainstream approach to Chinese word segmentation in the research community is traditional machine learning based on character-level sequence labeling. This approach requires manually defined features and suffers from feature sparsity. With the rise of deep learning research and applications, researchers have proposed applying long short-term memory (LSTM) neural networks to Chinese word segmentation. That method learns features automatically and models long-distance dependencies effectively, but the model is relatively complex, which makes training and prediction slow. To address this problem, we propose a Chinese word segmentation method based on a gated recurrent unit (GRU) neural network. The method inherits the LSTM model's ability to learn features automatically and to capture long-distance dependencies, achieves performance comparable to LSTM-based Chinese word segmentation, and is significantly faster in both training and prediction.
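To make the character-based sequence labeling formulation concrete, the following minimal sketch shows how per-character tags could be turned back into words. The abstract does not name the tag set; the four-tag B/M/E/S scheme (begin, middle, end, single-character word) is assumed here, and the function name `tags_to_words` and the example sentence are hypothetical. In a system like the one described, the tags would come from the neural tagger; here they are written by hand.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character tags.

    Assumed tag set (not specified in the abstract):
    B = begins a word, M = middle of a word,
    E = ends a word,   S = single-character word.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts while one is still open (inconsistent tags):
            # close the open word before continuing.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # The word is complete.
            words.append(current)
            current = ""
    if current:
        # Flush a word left open at the end of the sentence.
        words.append(current)
    return words


# Hand-written tags for illustration only; a trained tagger would predict them.
sentence = "我喜欢自然语言处理"
tags = ["S", "B", "E", "B", "E", "B", "E", "B", "E"]
print(tags_to_words(sentence, tags))
```

Under this framing, segmentation quality depends entirely on the predicted tags, which is why the choice of tagger (hand-crafted features, LSTM, or GRU) is the focus of the comparison.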