A Hybrid Statistical Language Model Applied to the Domain Specific Information Retrieval
- School of Software, Conference Papers
Traditional language models take multi-topic document corpora as their research target. To avoid the interference caused by the multi-topic problem, this paper focuses on domain-specific information retrieval (IR). In domain-specific IR, different terms are considered to contribute differently to the final query result, so the terms in a document can be divided into categories according to their degree of contribution. The statistical information of a term, mainly its probability, is then computed with different estimation methods and smoothing strategies according to its category. This paper proposes an improved hybrid statistical language model for domain-specific IR. In the experimental results, the new model achieves a performance improvement of about 9% to 10%. Finally, some challenges and research directions for statistical language modeling are presented.
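The idea of category-dependent estimation can be illustrated with a minimal sketch. The abstract does not say which categories or smoothing strategies the paper uses, so everything specific here is an assumption made for illustration: the two categories (domain terms versus general terms), the hypothetical `DOMAIN_TERMS` set, the choice of Jelinek-Mercer smoothing for domain terms and Dirichlet smoothing for general terms, and the parameter values `lam` and `mu`.

```python
import math
from collections import Counter

# Hypothetical domain vocabulary; the paper's real term categories are not
# given in the abstract.
DOMAIN_TERMS = {"retrieval", "smoothing", "query"}


def term_probability(term, doc_counts, doc_len, coll_counts, coll_len,
                     lam=0.2, mu=100.0):
    """Smoothed P(term | document), with the smoothing chosen by category."""
    p_coll = coll_counts.get(term, 0) / coll_len
    tf = doc_counts.get(term, 0)
    if term in DOMAIN_TERMS:
        # Assumed choice: Jelinek-Mercer interpolation for domain terms.
        p_doc = tf / doc_len if doc_len else 0.0
        return (1 - lam) * p_doc + lam * p_coll
    # Assumed choice: Dirichlet prior smoothing for general terms.
    return (tf + mu * p_coll) / (doc_len + mu)


def query_log_likelihood(query, doc, collection):
    """Score a tokenized document by the log-likelihood of the query."""
    doc_counts = Counter(doc)
    coll_counts = Counter(t for d in collection for t in d)
    coll_len = sum(coll_counts.values())
    score = 0.0
    for term in query:
        p = term_probability(term, doc_counts, len(doc), coll_counts, coll_len)
        if p == 0.0:
            # Term never occurs in the collection; skip it rather than take log(0).
            continue
        score += math.log(p)
    return score


collection = [
    ["query", "smoothing", "model", "retrieval"],
    ["language", "model", "corpus", "topic"],
]
query = ["smoothing", "model"]
scores = [query_log_likelihood(query, d, collection) for d in collection]
```

In this toy collection the first document scores higher, because it contains the domain term "smoothing" while the second receives only the smoothed background probability for it.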