
Optimization of the TF-IDF Algorithm Combined with an Improved CHI Statistical Method

Abstract: Feature selection and feature weighting are two crucial steps in text classification and play a key role in its results. The traditional CHI statistical method has several shortcomings: it does not account for negative correlation between feature items and categories, it ignores the frequency of feature items, and it does not consider the probability that a feature item occurs in a text. To address these, the traditional CHI statistical method is improved by introducing factors such as negative-correlation judgment and frequency, and the TF-IDF algorithm is optimized by combining it with a semantic similarity calculation. K-nearest neighbor (KNN) and support vector machine (SVM) classifiers in the WEKA software are each used to classify a Weibo sentiment corpus. The experimental results show that the new method clearly improves the accuracy of text classification.
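For orientation, the two baseline quantities the abstract builds on can be sketched as follows. This is a minimal illustration of the standard (unimproved) CHI statistic and the textbook TF-IDF weight only; the paper's improved variants are not specified in this abstract and are not reproduced here. The function names and the 2x2 count parameters `a`, `b`, `c`, `d` are assumptions chosen for illustration.

```python
import math


def chi_square(a, b, c, d):
    """Standard CHI statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the statistic is undefined, return 0
    return n * (a * d - b * c) ** 2 / denom


def correlation_sign(a, b, c, d):
    """Sign of (a*d - b*c): +1 positive, -1 negative, 0 no association.

    The square in chi_square discards this sign, so positively and
    negatively correlated terms receive the same score.
    """
    diff = a * d - b * c
    return (diff > 0) - (diff < 0)


def tf_idf(term_count, doc_length, n_docs, doc_freq):
    """Textbook TF-IDF: relative term frequency times log(N / df)."""
    tf = term_count / doc_length
    idf = math.log(n_docs / doc_freq)
    return tf * idf
```

A term that appears only inside the category and one that appears only outside it get the same CHI score, which is why a separate sign check is needed to tell them apart.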

Version History

[V1] 2018-05-24 21:08:12 ChinaXiv:201805.00488V1