  • 基于卷积神经网络和贝叶斯分类器的句子分类模型 (A Sentence Classification Model Based on a Convolutional Neural Network and a Bayesian Classifier)

    Subjects: Computer Science >> Integration Theory of Computer Science; submitted: 2018-12-13; cooperative journal: 《计算机应用研究》

    Abstract: Traditional sentence classification models suffer from a complex feature-extraction process and low classification accuracy. This paper exploits the strength of the convolutional neural network, a popular deep learning model, in feature extraction and combines it with a traditional sentence classification method, proposing a sentence classification model based on a convolutional neural network and a Bayesian classifier. The model first uses a convolutional neural network to extract text features, then applies principal component analysis to reduce the dimensionality of those features, and finally uses a Bayesian classifier to classify the sentences. Experimental results on Cornell University's public movie review dataset and the Stanford Sentiment Treebank show that the proposed method outperforms both models that use deep learning alone and traditional sentence classification models.
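
    A minimal sketch of the pipeline described above (CNN feature extraction, PCA dimensionality reduction, Bayesian classification), assuming a Keras-style 1-D CNN and scikit-learn's PCA and GaussianNB; the vocabulary size, sequence length, layer sizes, and PCA dimensionality are illustrative choices, not the paper's settings.

    ```python
    # Sketch: CNN text-feature extraction -> PCA -> Bayesian classifier.
    # Vocabulary size, sequence length, layer sizes and PCA dimensionality
    # are illustrative assumptions, not the paper's settings.
    from tensorflow.keras import layers, models
    from sklearn.decomposition import PCA
    from sklearn.naive_bayes import GaussianNB

    VOCAB_SIZE, MAX_LEN = 20000, 50

    def build_cnn():
        """1-D convolutional network; its penultimate layer serves as a
        fixed-length sentence feature vector."""
        inp = layers.Input(shape=(MAX_LEN,))
        x = layers.Embedding(VOCAB_SIZE, 128)(inp)
        x = layers.Conv1D(100, 3, activation="relu")(x)
        x = layers.GlobalMaxPooling1D()(x)
        feats = layers.Dense(64, activation="relu", name="features")(x)
        out = layers.Dense(2, activation="softmax")(feats)  # head used only to train the CNN
        model = models.Model(inp, out)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        return model

    def train_pipeline(x_train, y_train):
        cnn = build_cnn()
        cnn.fit(x_train, y_train, epochs=3, batch_size=64, verbose=0)
        # Re-use the trained network up to the "features" layer as the extractor.
        extractor = models.Model(cnn.input, cnn.get_layer("features").output)
        feats = extractor.predict(x_train, verbose=0)
        pca = PCA(n_components=32).fit(feats)                  # dimensionality reduction
        clf = GaussianNB().fit(pca.transform(feats), y_train)  # Bayesian classifier
        return extractor, pca, clf

    def classify(extractor, pca, clf, x):
        return clf.predict(pca.transform(extractor.predict(x, verbose=0)))
    ```

    One plausible reason for the PCA step is that decorrelating the CNN features before the naive Bayes stage better matches the classifier's feature-independence assumption.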

  • 基于TextRank的自动摘要优化算法 (An Automatic Summarization Optimization Algorithm Based on TextRank)

    Subjects: Computer Science >> Integration Theory of Computer Science; submitted: 2018-04-19; cooperative journal: 《计算机应用研究》

    Abstract: When summarizing Chinese texts, the traditional TextRank algorithm considers only the similarity between sentence nodes and neglects other important information in the text. Targeting Chinese single documents and building on existing research, this paper extends TextRank so that, in addition to the similarity between sentences, it incorporates the overall structure of the text and the context of each sentence: the physical position of a sentence within the document or paragraph, feature sentences, core sentences, and other cues that may increase a sentence's weight are all used to generate the text's candidate summary sentences. Redundancy processing is then applied to the candidate set to remove highly similar sentences. Experiments show that the algorithm improves the accuracy of the generated summaries, demonstrating its effectiveness.
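
    A minimal sketch of such a summarizer, assuming jieba for Chinese word segmentation and networkx for the PageRank iteration; the positional weighting heuristic and the redundancy threshold are illustrative stand-ins for the structural and contextual features described in the abstract.

    ```python
    # Sketch: TextRank summarization with similarity edges, a simple structural
    # weight, and redundancy removal. Segmentation (jieba), weights and
    # thresholds are illustrative assumptions.
    import math
    import re
    import jieba
    import networkx as nx

    def split_sentences(text):
        return [s.strip() for s in re.split(r"[。！？!?]", text) if s.strip()]

    def similarity(s1, s2):
        """Word-overlap similarity from the original TextRank formulation."""
        w1, w2 = set(jieba.lcut(s1)), set(jieba.lcut(s2))
        if len(w1) < 2 or len(w2) < 2:
            return 0.0
        return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

    def structural_weight(i, n):
        """Boost sentences near the start of the text (assumed positional heuristic)."""
        return 1.5 if i == 0 else 1.2 if i < 0.2 * n else 1.0

    def summarize(text, k=3, redundancy=0.6):
        sents = split_sentences(text)
        g = nx.Graph()
        g.add_nodes_from(range(len(sents)))
        for i in range(len(sents)):
            for j in range(i + 1, len(sents)):
                w = similarity(sents[i], sents[j])
                if w > 0:
                    g.add_edge(i, j, weight=w)
        scores = nx.pagerank(g, weight="weight")
        # Combine the TextRank score with the structural weight, then pick the
        # top sentences while skipping ones too similar to those already chosen.
        ranked = sorted(scores, reverse=True,
                        key=lambda i: scores[i] * structural_weight(i, len(sents)))
        chosen = []
        for i in ranked:
            if all(similarity(sents[i], sents[j]) < redundancy for j in chosen):
                chosen.append(i)
            if len(chosen) == k:
                break
        return [sents[i] for i in sorted(chosen)]
    ```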

  • 基于互信息和邻接熵的新词发现算法 (A New Word Discovery Algorithm Based on Mutual Information and Adjacency Entropy)

    Subjects: Computer Science >> Integration Theory of Computer Science; submitted: 2018-04-19; cooperative journal: 《计算机应用研究》

    Abstract: Identifying new words quickly and effectively is an important task in natural language processing. Addressing the problems in new word discovery, this paper proposes an algorithm that finds new words character by character, from left to right, in an unsegmented Weibo corpus. Candidate new words are obtained by computing the mutual information between a candidate word and its right-adjacent character and expanding the candidate character by character; the candidates are then filtered to produce the final set of new words by computing branch entropy, deleting candidates whose first or last character is a stop word, and deleting old words already present in the lexicon from the candidate set. The algorithm solves the problem that some new words cannot be recognized because of word-segmentation errors, and it also avoids the large numbers of repeated and rubbish word strings that the n-gram method identifies as new words. Experiments verify the effectiveness of the algorithm.
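
    A minimal sketch of the expansion-and-filtering procedure described above, operating directly on an unsegmented character string; the mutual-information and branch-entropy thresholds, the stop-word list, and the known-word lexicon are illustrative placeholders.

    ```python
    # Sketch: new-word discovery by left-to-right mutual-information expansion,
    # then branch (adjacency) entropy and stop-word / lexicon filtering.
    # Thresholds, the stop-word list and the known-word lexicon are placeholders.
    import math
    from collections import Counter

    def ngram_counts(corpus, max_n):
        counts = Counter()
        for n in range(1, max_n + 1):
            for i in range(len(corpus) - n + 1):
                counts[corpus[i:i + n]] += 1
        return counts

    def mutual_information(word, right_char, counts, total):
        """Pointwise mutual information between a candidate and its right neighbour."""
        joint = counts[word + right_char] / total
        if joint == 0:
            return float("-inf")
        return math.log(joint / ((counts[word] / total) * (counts[right_char] / total)))

    def branch_entropy(word, corpus):
        """Entropy of the characters to the right of the candidate (left side analogous)."""
        right = Counter(corpus[i + len(word)]
                        for i in range(len(corpus) - len(word))
                        if corpus[i:i + len(word)] == word)
        total = sum(right.values())
        return -sum(c / total * math.log(c / total) for c in right.values()) if total else 0.0

    def discover(corpus, lexicon, stop_words, mi_min=3.0, be_min=1.0, max_len=4):
        counts = ngram_counts(corpus, max_len)
        candidates = set()
        for i in range(len(corpus) - 1):
            word = corpus[i]
            for j in range(i + 1, min(i + max_len, len(corpus))):
                # Expand character by character while the PMI stays high enough.
                if mutual_information(word, corpus[j], counts, len(corpus)) < mi_min:
                    break
                word += corpus[j]
                candidates.add(word)
        return {w for w in candidates
                if branch_entropy(w, corpus) >= be_min        # keep high branch entropy
                and w not in lexicon                          # drop words already in the lexicon
                and w[0] not in stop_words and w[-1] not in stop_words}
    ```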