Auto-identifying terms based on a place-extending method
- School of Humanities: Conference Papers
The normalized relative frequency ratio is used as the domain differential degree, which estimates how domain-specific a string is; the sequence correlation coefficient is used to judge the stability of a string. Identification proceeds in two steps. 1) Obtain term seeds: extract adjacent character pairs from the domain corpus and from the general corpus, then filter the pairs jointly by the domain differential degree, mutual information, and the taboo character list. 2) Obtain terms: following a character-by-character extension strategy, take the term seeds as anchor points and extend each seed on both sides one character at a time, filtering every newly added character by the sequence correlation coefficient, the exception-correction rules, and the taboo word list, in that order. Terms containing the character "?" are taken as an example. In the test, the algorithm reached a precision of 86.73% and a recall of 85.91%. © 2011 IEEE.
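The two steps can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the normalization, the sequence correlation coefficient, or the exception rules, so the sketch assumes a smoothed relative frequency ratio squashed to (0, 1) as the domain differential degree, pointwise mutual information for the pair filter, and a simple conditional-frequency score as a stand-in for the sequence correlation coefficient. The function names, thresholds, and toy corpus are all hypothetical, and the exception rules and taboo word list are omitted.

```python
import math
from collections import Counter


def bigram_counts(texts):
    """Count adjacent character pairs in a list of strings."""
    counts = Counter()
    for t in texts:
        for a, b in zip(t, t[1:]):
            counts[a + b] += 1
    return counts


def find_seeds(domain_texts, general_texts, taboo_chars,
               min_degree=0.5, min_pmi=0.0):
    """Step 1 (sketch): keep pairs that are domain-specific and cohesive."""
    dom_pairs = bigram_counts(domain_texts)
    gen_pairs = bigram_counts(general_texts)
    dom_chars = Counter(ch for t in domain_texts for ch in t)
    n_dom = sum(dom_pairs.values())
    n_gen = sum(gen_pairs.values())
    n_chars = sum(dom_chars.values())
    seeds = []
    for pair, count in dom_pairs.items():
        if any(ch in taboo_chars for ch in pair):
            continue  # taboo character list
        p_dom = count / n_dom
        p_gen = (gen_pairs[pair] + 1) / (n_gen + 1)  # assumed add-one smoothing
        ratio = p_dom / p_gen
        degree = ratio / (1 + ratio)  # assumed normalization to (0, 1)
        p_a = dom_chars[pair[0]] / n_chars
        p_b = dom_chars[pair[1]] / n_chars
        pmi = math.log2(p_dom / (p_a * p_b))  # pointwise mutual information
        if degree >= min_degree and pmi >= min_pmi:
            seeds.append(pair)
    return seeds


def extend_seed(seed, texts, taboo_chars, min_stability=0.8, max_len=8):
    """Step 2 (sketch): grow a seed one character at a time on either side.

    Stability here is count(neighbour next to term) / count(term), a stand-in
    for the sequence correlation coefficient, which the abstract does not define.
    """
    term = seed
    grew = True
    while grew and len(term) < max_len:
        grew = False
        for side in ("left", "right"):
            base = 0
            neighbours = Counter()
            for t in texts:
                start = t.find(term)
                while start != -1:
                    base += 1
                    idx = start - 1 if side == "left" else start + len(term)
                    if 0 <= idx < len(t):
                        neighbours[t[idx]] += 1
                    start = t.find(term, start + 1)
            if not neighbours:
                continue
            ch, n = neighbours.most_common(1)[0]
            if ch in taboo_chars or n / base < min_stability:
                continue  # the added character is rejected
            term = ch + term if side == "left" else term + ch
            grew = True
            break  # recount neighbours for the longer term
    return term


# Toy corpus (hypothetical): "abcd" plays the role of a domain term.
domain = ["xabcdy", "zabcdw", "qabcdr"]
general = ["xyzwqr"]
seeds = find_seeds(domain, general, taboo_chars={" "})
terms = {extend_seed(s, domain, taboo_chars={" "}) for s in seeds}
print(sorted(terms))  # ['abcd']
```

In this toy corpus the pairs "ab", "bc", and "cd" pass the seed filter, and each one grows to the same string because only the inner neighbours recur consistently. Real thresholds would have to be tuned on actual domain and general corpora.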