Chinese Metaphor Computation Based on Automatic Feature Selection
Abstract
Chinese metaphor computation is one of the hard problems in Chinese information processing. Most existing work on metaphor recognition analyzes and extracts metaphor features by hand, which makes it highly subjective, difficult to extend, and heavily dependent on specialist background knowledge. This paper applies machine learning over a large-scale corpus and, using a maximum entropy classification model, proposes a metaphor recognition algorithm that automatically extracts the optimal feature templates. Three levels of feature template are examined: classic simple features, long-distance context information spanning multiple words, and word similarity, which captures semantic information. Experimental results show that the algorithm improves the accuracy of metaphor recognition and is an effective machine learning approach to Chinese metaphor computation.
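The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the template definitions, the search procedure, or the similarity resource, so everything concrete here is an assumption: the three template functions, the toy similarity table `TOY_SIM` with invented scores, the toy sentences, and the use of greedy forward selection on a held-out set as a stand-in for the automatic extraction of optimal feature templates. The maximum entropy model is trained by plain gradient ascent on the regularized conditional log-likelihood.

```python
import math
from collections import defaultdict

# Invented toy similarity scores (assumption; the abstract names no resource).
TOY_SIM = {("时间", "金钱"): 0.2, ("律师", "狐狸"): 0.1, ("鲸鱼", "动物"): 0.7,
           ("父亲", "老师"): 0.6, ("爱情", "火焰"): 0.15, ("麻雀", "鸟"): 0.8}

def simple_features(tokens, i):
    """Level 1 (hypothetical): the target word and its adjacent words."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [f"w0={tokens[i]}", f"w-1={prev_w}", f"w+1={next_w}"]

def long_distance_features(tokens, i, window=4):
    """Level 2 (hypothetical): words in a wider window, excluding adjacent ones."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [f"ctx={tokens[j]}" for j in range(lo, hi) if abs(j - i) > 1]

def similarity_features(tokens, i):
    """Level 3 (hypothetical): bucketed similarity of context words to the target."""
    feats = []
    for j, w in enumerate(tokens):
        if j != i and (w, tokens[i]) in TOY_SIM:
            feats.append("sim=low" if TOY_SIM[(w, tokens[i])] < 0.5 else "sim=high")
    return feats

def predict_proba(w, classes, feats):
    """p(c | x) = exp(sum of weights of active features for c) / Z(x)."""
    scores = {c: sum(w.get((f, c), 0.0) for f in feats) for c in classes}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(data, labels, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    classes = sorted(set(labels))
    w = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, y in zip(data, labels):
            probs = predict_proba(w, classes, feats)
            for c in classes:
                for f in feats:
                    # observed feature count minus expected feature count
                    grad[(f, c)] += (1.0 if c == y else 0.0) - probs[c]
        for k, g in grad.items():
            w[k] += lr * (g / len(data) - l2 * w[k])
    return w, classes

def extract(fns, tokens, i):
    """Concatenate the features produced by every selected template."""
    return [f for fn in fns for f in fn(tokens, i)]

def evaluate(fns, train, dev):
    """Train on `train` with the given templates; return accuracy on `dev`."""
    w, classes = train_maxent([extract(fns, t, i) for t, i, _ in train],
                              [y for _, _, y in train])
    hits = 0
    for t, i, y in dev:
        p = predict_proba(w, classes, extract(fns, t, i))
        hits += max(p, key=p.get) == y
    return hits / len(dev)

def select_templates(templates, train, dev):
    """Greedy forward selection: keep adding the template that raises dev accuracy."""
    chosen, best = [], -1.0
    while len(chosen) < len(templates):
        scored = [(evaluate([templates[m] for m in chosen + [n]], train, dev), n)
                  for n in templates if n not in chosen]
        score, name = max(scored, key=lambda s: s[0])
        if score <= best:
            break
        chosen.append(name)
        best = score
    return chosen, best

TEMPLATES = {"simple": simple_features,
             "long_distance": long_distance_features,
             "similarity": similarity_features}
# Toy samples: (tokens, index of the candidate word, label).
TRAIN = [(["时间", "是", "金钱"], 2, "metaphor"),   # time is money
         (["律师", "是", "狐狸"], 2, "metaphor"),   # the lawyer is a fox
         (["鲸鱼", "是", "动物"], 2, "literal"),    # a whale is an animal
         (["父亲", "是", "老师"], 2, "literal")]    # father is a teacher
DEV = [(["爱情", "是", "火焰"], 2, "metaphor"),     # love is a flame
       (["麻雀", "是", "鸟"], 2, "literal")]        # a sparrow is a bird

chosen_templates, dev_accuracy = select_templates(TEMPLATES, TRAIN, DEV)
```

Each template is an independent function returning string-valued features, so a candidate template set is just a list of functions, and the selection loop can score any combination without changes to the classifier. On a real corpus the toy table would be replaced by an actual lexical similarity measure, and the held-out accuracy would be computed on far more than two sentences.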