Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-08-13; Cooperative journal: 《计算机应用研究》
Abstract: Text vectorization is the basis of text classification, and feature weighting is one of the main factors that directly affect the quality of the text vector representation. Feature weighting schemes based on category information do not express the relationship between features and categories accurately enough: the classification ability of features that share the same category frequency cannot be compared, so the distribution of each feature within a category should also be considered. This paper combines the inverse category frequency (ICF) and the inner-category entropy of features into the term weight calculation and constructs two supervised feature weighting schemes. Experimental results on a Uygur text categorization dataset show that the method clearly improves the spatial distribution of the samples and raises the micro-averaged F1 score of Uygur text classification.
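The abstract does not state the formulas of the two schemes, so the following Python is only a minimal sketch of the general idea: a term weight that combines term frequency, inverse category frequency, and the entropy of the term's distribution inside a category. Everything specific here is an assumption made for illustration and is not taken from the paper: the ICF form log(1 + |C| / cf), the entropy taken over a term's occurrence counts across the documents of one category, the multiplicative combination tf * ICF * (1 + H), and the function names `icf`, `inner_category_entropy`, and `term_weight`.

```python
import math


def icf(term, docs_by_cat):
    """Inverse category frequency, in the assumed form log(1 + |C| / cf).

    docs_by_cat maps a category label to a list of tokenized documents.
    cf is the number of categories in which the term occurs at least once.
    """
    n_cats = len(docs_by_cat)
    cf = sum(
        1 for docs in docs_by_cat.values() if any(term in doc for doc in docs)
    )
    return math.log(1 + n_cats / cf) if cf else 0.0


def inner_category_entropy(term, docs):
    """Entropy of the term's occurrences across the documents of one category.

    A term concentrated in a single document gives 0; a term spread evenly
    over many documents of the category gives a larger value.
    """
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def term_weight(term, doc, category, docs_by_cat):
    """Hypothetical combination: tf * ICF * (1 + inner-category entropy)."""
    tf = doc.count(term)
    h = inner_category_entropy(term, docs_by_cat[category])
    return tf * icf(term, docs_by_cat) * (1 + h)


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus = {
        "sports": [["ball", "goal", "ball"], ["ball", "team"]],
        "tech": [["code", "team"], ["code", "code"]],
    }
    for term in ("ball", "team"):
        print(term, round(icf(term, corpus), 3))
```

In this sketch, two terms with the same category frequency receive the same ICF but can still be separated by the entropy factor, which is the gap in category-frequency-only weighting that the abstract points out. Whether the paper's schemes multiply, add, or otherwise combine these quantities is not stated in the abstract.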