Your conditions: 严建峰
  • Topic model based on semantic distribution similarity

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-10-11 Cooperative journals: 《计算机应用研究》

    Abstract: Latent Dirichlet allocation (LDA) is a popular three-layer Bayesian probabilistic model that clusters words and documents at the topic level. LDA relies on the bag-of-words assumption, which simplifies modeling but leads to poor semantic coherence of topics and weak text representation. To address this problem, this paper proposes a topic model based on semantic distribution similarity. The model uses the generalized Pólya urn (GPU) model to incorporate word-word and document-topic semantic distribution similarity into topic modeling within the framework of the expectation-maximization (EM) algorithm, which weakens the effect of the bag-of-words assumption on topics at the level of semantic association. Experiments on four public datasets show that the proposed model outperforms currently popular topic modeling algorithms in topic semantic coherence and text classification accuracy, and that it improves convergence speed and topic accuracy.

  • Application of multi-granularity temporal features in churn prediction

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-05-02 Cooperative journals: 《计算机应用研究》

    Abstract: Telecom operators have developed multiple churn prediction models to identify potential churners in different scenarios. Existing churn prediction models first select a single time granularity for feature extraction and then model the extracted data with a machine learning algorithm. Such approaches consider only the influence of the model on classification performance and do not fully account for the role of the data. To solve this problem, this paper proposes a method that extracts multi-granularity temporal features and attempts to integrate features of different granularities at different training phases. Experimental results show that a model trained with multi-granularity features clearly outperforms one trained with single-granularity features.

  • User telecom trajectory recovery algorithm based on the LDA topic model

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-04-24 Cooperative journals: 《计算机应用研究》

    Abstract: With the development of mobile communication technology and the spread of mobile devices, daily trajectory records have become abundant. Massive trajectory data hide valuable knowledge about individuals and human society. For knowledge models built from trajectory data to serve users accurately and effectively, it is particularly important to recover missing telco trajectories accurately and reliably. Most current methods focus on modeling continuous trajectories such as GPS trajectories, and little research addresses the recovery of telco trajectories generated in mobile communication scenarios. This paper therefore transforms the problem of telecommunication trajectory recovery into a matrix completion problem and proposes a recovery algorithm based on the LDA topic model. The experiments compare the algorithm comprehensively with traditional matrix completion algorithms and examine the effect of different parameters on trajectory recovery. The results show that, compared with traditional matrix completion algorithms, the LDA topic model significantly improves the recovery accuracy of missing telco trajectories.
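
The matrix-completion framing in the last abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, for illustration only, that each user is a "document", each observed (time slot, tower) pair is a "word", and that a missing entry is filled with the tower scoring highest under a small collapsed-Gibbs LDA. The function name `recover_trajectories`, its parameters, and the toy data are all hypothetical.

```python
import random


def recover_trajectories(traj, n_topics=2, n_iters=200,
                         alpha=0.1, beta=0.01, seed=0):
    """Fill None entries of a user x time-slot matrix of tower ids.

    Illustrative assumption: users are documents and observed
    (slot, tower) pairs are words of a collapsed-Gibbs LDA.
    """
    rng = random.Random(seed)
    n_slots = len(traj[0])
    towers = sorted({c for row in traj for c in row if c is not None})

    # Vocabulary: one word id per (slot, tower) pair.
    vocab = {}
    for t in range(n_slots):
        for c in towers:
            vocab[(t, c)] = len(vocab)
    n_words = len(vocab)

    # Each user's observed entries become that user's document.
    docs = [[vocab[(t, c)] for t, c in enumerate(row) if c is not None]
            for row in traj]

    # Count tables: user-topic, topic-word, and topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_words for _ in range(n_topics)]
    n_k = [0] * n_topics
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)  # random initial topic
            assignments.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(assignments)

    def weight(d, k, w):
        # Unnormalised p(topic k | user d) * p(word w | topic k).
        return ((n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                / (n_k[k] + n_words * beta))

    # Collapsed Gibbs sampling over topic assignments.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [weight(d, j, w) for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Fill each missing slot with the highest-scoring tower.
    recovered = [list(row) for row in traj]
    for d, row in enumerate(traj):
        for t, c in enumerate(row):
            if c is None:
                recovered[d][t] = max(
                    towers,
                    key=lambda tower: sum(
                        weight(d, k, vocab[(t, tower)])
                        for k in range(n_topics)))
    return recovered


# Toy data: two groups of users with different routines, two gaps.
example = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", None, "B", "B"],
    ["C", "C", "D", "D"],
    ["C", "C", "D", "D"],
    ["C", "C", None, "D"],
]
result = recover_trajectories(example)
```

Observed entries are left untouched and only the `None` cells are filled. Encoding the time slot into the word is one possible design choice that lets a topic capture "where users of this type are at this time"; the paper may encode trajectories differently.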