  • 基于核密度估计的基本概率指派生成方法 (A Basic Probability Assignment Generation Method Based on Kernel Density Estimation)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2019-05-10 Cooperative journals: 《计算机应用研究》

    Abstract: D-S evidence theory is an effective method for processing uncertain information and is widely used in information fusion. However, determining the BPA (basic probability assignment) on which the D-S combination rule operates remains an open problem in applications of the theory. This paper proposes a BPA determination method based on kernel density estimation (KDE). The method first uses training data to build a data attribute model with an optimized bandwidth via kernel density estimation; it then evaluates the density-distance-distribution (Tri-D) value of each test sample against the training-data kernel density model, and obtains the test sample's BPA by assigning the Tri-D value with a nested scheme. Finally, the BPAs are fused with the D-S combination rule to obtain the final result, and the validity of the BPA generation method is judged by classification accuracy. A comparison of classification accuracy with other methods on UCI data sets shows the effectiveness of the method.
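
    The abstract describes the pipeline only at a high level. Below is a minimal sketch of the general flow, assuming Gaussian KDEs fitted per class with scipy (Scott's-rule bandwidth rather than the paper's optimized bandwidth), a simple BPA that spreads normalized class densities over singletons plus the frame, and two-source Dempster combination; the paper's Tri-D value and nested assignment are not reproduced.

```python
# Hedged sketch: KDE-per-class densities -> simple BPA -> Dempster's rule.
# The paper's Tri-D (density-distance-distribution) value and its nested
# assignment are NOT reproduced here; this only illustrates the overall flow.
import numpy as np
from itertools import product
from scipy.stats import gaussian_kde

def class_kdes(X, y):
    """Fit one Gaussian KDE per class on the training data (bandwidth by
    Scott's rule; the paper optimizes the bandwidth instead)."""
    return {c: gaussian_kde(X[y == c].T) for c in np.unique(y)}

def bpa_from_kde(kdes, x, theta_mass=0.1):
    """Turn class densities at x into a BPA: singletons get normalized
    density mass, the full frame Theta keeps a fixed residual mass."""
    dens = {c: float(kde(x)) for c, kde in kdes.items()}
    total = sum(dens.values()) or 1.0
    m = {frozenset([c]): (1 - theta_mass) * d / total for c, d in dens.items()}
    m[frozenset(dens)] = theta_mass              # mass on the whole frame
    return m

def dempster(m1, m2):
    """Dempster's rule of combination for two BPAs over the same frame."""
    combined, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb
    return {k: v / (1 - conflict) for k, v in combined.items()}
```

    Combining the per-source BPAs with dempster and taking the singleton with the largest mass yields a class decision whose accuracy can then be compared across methods, as in the paper's evaluation.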

  • 基于印象空间的互联网广告效果评价 (Internet Advertising Effectiveness Evaluation Based on Impression Space)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2019-04-01 Cooperative journals: 《计算机应用研究》

    Abstract: Evaluating the effectiveness of Internet advertising is a core issue in online marketing. Current evaluation criteria vary widely, and existing methods suffer from a single source of information, the assumption that users are undifferentiated, and global assumptions about the page, all of which pose great challenges to effectiveness evaluation. Finding a new evaluation index for Internet advertising effectiveness has therefore become an urgent task. This paper first proposes the concept of an impression space as a more effective evaluation index of webpage advertising effects, addressing the single-information-source problem. Secondly, it analyzes how user types, behaviors, behavioral processes and other characteristics affect the evaluation criteria, eliminating the bias caused by the user-indifference assumption. Thirdly, it introduces local characteristics of web pages and analyzes the influence of page layout, advertisement-content relevance and other factors on advertising effects, eliminating the global assumptions. Finally, it constructs an impression space model based on multimodal features to predict the effectiveness of Internet advertising. Experimental results show that the accuracy of the proposed impression space model is significantly improved, reaching 92.4%, and that its predictions are not only more accurate but also clearly interpretable.
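
    As a purely illustrative sketch (the feature names and the gradient-boosting classifier are assumptions, not the paper's impression space model), multimodal user, behavior and page features could be concatenated and fed to a binary classifier of ad effectiveness:

```python
# Illustrative only: concatenate user-, behavior- and page-level features
# and fit a binary "effective / not effective" classifier. Feature names and
# the model choice are assumptions, not the paper's impression-space model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
user_feats = rng.normal(size=(n, 4))      # e.g. user type, activity level
behavior_feats = rng.normal(size=(n, 6))  # e.g. dwell time, click-path length
page_feats = rng.normal(size=(n, 5))      # e.g. layout, ad-content relevance
X = np.hstack([user_feats, behavior_feats, page_feats])
y = rng.integers(0, 2, size=n)            # 1 = ad judged effective

clf = GradientBoostingClassifier()
print(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())
```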

  • 融合协同过滤的XGBoost推荐算法 (An XGBoost Recommendation Algorithm Incorporating Collaborative Filtering)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2019-04-01 Cooperative journals: 《计算机应用研究》

    Abstract: To address the user cold-start problem in recommender systems, this paper proposes an XGBoost recommendation algorithm that incorporates collaborative filtering. First, a user-similarity-based collaborative filtering algorithm performs coarse-grained recall to obtain a recall set for each user, and an XGBoost model then predicts scores for the items in that recall set. For users affected by the cold-start problem, the XGBoost model predicts scores for the items in the candidate set directly. Finally, the algorithm was evaluated on the online data set of the CCIR 2018 personalized recommendation evaluation, with the recommendation results submitted to the online platform provided by Zhihu. The evaluation results show that the algorithm handles the user cold-start problem efficiently and accurately; it achieved a notable recommendation effect on the online evaluation platform and won the third prize.
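
    A minimal sketch of the recall-then-rank idea under assumed data (random stand-ins for the CCIR 2018/Zhihu features; the cold-start branch is omitted and all names are hypothetical):

```python
# Sketch: user-similarity collaborative filtering for coarse-grained recall,
# then an XGBoost model re-ranks the recalled items. Data is hypothetical.
import numpy as np
import xgboost as xgb
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
interactions = (rng.random((100, 500)) < 0.05).astype(float)  # user x item

def recall_items(user, k_users=10, k_items=50):
    """Recall items liked by the most similar users (coarse-grained stage)."""
    sims = cosine_similarity(interactions[user:user + 1], interactions)[0]
    neighbours = np.argsort(-sims)[1:k_users + 1]
    scores = interactions[neighbours].sum(axis=0)
    scores[interactions[user] > 0] = -1          # drop already-seen items
    return np.argsort(-scores)[:k_items]

# Ranking stage: XGBoost predicts click probability from (user, item) features.
X_train = rng.normal(size=(2000, 16))            # stand-in feature vectors
y_train = rng.integers(0, 2, size=2000)
ranker = xgb.XGBClassifier(n_estimators=100, max_depth=5, eval_metric="logloss")
ranker.fit(X_train, y_train)

candidates = recall_items(user=0)
cand_feats = rng.normal(size=(len(candidates), 16))  # real features in practice
top10 = candidates[np.argsort(-ranker.predict_proba(cand_feats)[:, 1])][:10]
```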

  • NLOF:基于网格过滤的两阶段离群点检测算法 (NLOF: A Two-Stage Outlier Detection Algorithm Based on Grid Filtering)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2019-01-28 Cooperative journals: 《计算机应用研究》

    Abstract: The purpose of outlier detection is to identify anomalous data effectively and to mine meaningful latent information in a data set. Existing outlier detection algorithms do not preprocess the original data, which leads to high time complexity and unsatisfactory detection results. This paper proposes NLOF, a two-stage outlier detection algorithm based on grid filtering. First, grid filtering screens the original data: points whose cell density is below a given threshold are placed in a candidate outlier subset. Then, to refine the density-based stage, the density of each candidate point is computed from its k-neighborhood as the ratio of the number of points in the neighborhood to the area of the circle that the neighborhood forms, and outlier detection over these densities yields a more accurate outlier set. Experiments on several public data sets show that the method achieves good anomaly detection performance while reducing the algorithm's time complexity.
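
    A minimal sketch of the two NLOF stages on 2-D data, with illustrative grid size, thresholds and k that are not taken from the paper:

```python
# Sketch of the two NLOF stages on 2-D data: (1) grid filtering keeps only
# points from sparsely populated cells as outlier candidates; (2) a density
# score k / (pi * r_k^2) from the k-nearest-neighbour radius ranks candidates.
# Grid size, thresholds and k are illustrative choices, not the paper's.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def grid_filter(X, n_cells=20, max_count=3):
    """Return indices of points lying in sparsely populated grid cells."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    cells = np.floor((X - mins) / (maxs - mins + 1e-12) * n_cells).astype(int)
    keys = [tuple(c) for c in cells]
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return np.array([i for i, key in enumerate(keys) if counts[key] <= max_count])

def knn_density(X, candidates, k=10):
    """Density of each candidate: k neighbours over the area of the k-NN disc."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X[candidates])
    r_k = dist[:, -1]                 # distance to the k-th (non-self) neighbour
    return k / (np.pi * r_k ** 2 + 1e-12)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(500, 2)), rng.uniform(-6, 6, size=(10, 2))])
cand = grid_filter(X)
density = knn_density(X, cand)
outliers = cand[np.argsort(density)[:10]]   # lowest-density candidates
```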

  • 基于CRT机制混合神经网络的特定目标情感分析 (Target-Specific Sentiment Analysis with a CRT-Mechanism Hybrid Neural Network)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-12-13 Cooperative journals: 《计算机应用研究》

    Abstract: The purpose of target-specific sentiment analysis is to predict the sentiment of a text from the perspective of different target words; the key is to associate the appropriate sentiment words with a given target. When a sentence contains several sentiment words describing several targets, a sentiment word may be matched to the wrong target. This paper proposes a hybrid neural network based on the CRT mechanism for target-specific sentiment analysis. The model uses a CNN layer to extract features from word representations transformed by a BiLSTM; the CRT component generates a target-specific representation of each word while preserving the original context information from the BiLSTM layer. Experiments on three open data sets show that, compared with previous models, the proposed model significantly improves the accuracy and stability of target-specific sentiment analysis, demonstrating that the CRT mechanism integrates the advantages of CNNs and LSTMs well and is of practical value for this task.
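
    The CRT component itself is not specified in the abstract; the sketch below only wires up the BiLSTM-then-CNN backbone in PyTorch, with the target embedding handled by plain concatenation as a stand-in (all sizes and names are hypothetical):

```python
# Sketch of the BiLSTM -> CNN backbone described above (PyTorch). The CRT
# component that builds target-specific representations is not detailed in
# the abstract, so the target embedding is simply concatenated here - an
# assumption, not the paper's mechanism.
import torch
import torch.nn as nn

class BiLstmCnn(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=64, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden + emb_dim, 100, kernel_size=3,
                              padding=1)
        self.fc = nn.Linear(100, n_classes)

    def forward(self, tokens, target):
        h, _ = self.bilstm(self.emb(tokens))              # (B, T, 2*hidden)
        tgt = self.emb(target).mean(dim=1, keepdim=True)  # (B, 1, emb_dim)
        h = torch.cat([h, tgt.expand(-1, h.size(1), -1)], dim=-1)
        h = torch.relu(self.conv(h.transpose(1, 2)))      # (B, 100, T)
        return self.fc(h.max(dim=-1).values)              # max-over-time pooling

model = BiLstmCnn(vocab_size=5000)
logits = model(torch.randint(0, 5000, (8, 30)), torch.randint(0, 5000, (8, 2)))
```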

  • 基于卷积神经网络和贝叶斯分类器的句子分类模型 (A Sentence Classification Model Based on a Convolutional Neural Network and a Bayesian Classifier)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-12-13 Cooperative journals: 《计算机应用研究》

    Abstract: Traditional sentence classification models suffer from drawbacks such as a complex feature extraction process and low classification accuracy. Exploiting the strength of convolutional neural networks in feature extraction and combining it with traditional sentence classification methods, this paper proposes a sentence classification model based on a convolutional neural network and a Bayesian classifier. The model first uses the convolutional neural network to extract text features, then applies principal component analysis to reduce their dimensionality, and finally uses a Bayesian classifier to classify the sentences. Experimental results on Cornell University's public movie review data set and the Stanford Sentiment Treebank show that the proposed method outperforms both purely deep learning models and traditional sentence classification models.
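
    A minimal sketch of the PCA and Bayesian-classifier stages, with the CNN feature extractor stubbed out by random vectors (in the paper these would be the convolutional sentence features):

```python
# Sketch of the PCA + Bayesian-classifier stages. The CNN feature extractor
# is stubbed out with random vectors standing in for convolutional sentence
# features; dimensions and component counts are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=(2000, 300))   # stand-in for CNN sentence vectors
labels = rng.integers(0, 2, size=2000)        # e.g. positive / negative review

model = make_pipeline(PCA(n_components=50), GaussianNB())
print(cross_val_score(model, cnn_features, labels, cv=5).mean())
```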

  • 联合特征选择和潜在子空间回归的跨媒体检索 (Cross-Media Retrieval via Joint Feature Selection and Latent Subspace Regression)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-08-13 Cooperative journals: 《计算机应用研究》

    Abstract: Cross-modal retrieval has recently drawn much attention due to the widespread existence of multi-modality data. It generally involves two basic problems: measuring cross-modal relevance and coupled feature selection. However, most current methods focus only on the first problem, mapping multi-modality data into a common subspace in which the similarity between different modalities can be measured. To solve the second problem, this paper imposes ℓ2,1-norm penalties on the projection matrices separately, which selects relevant and discriminative features from the different feature spaces. It then adopts spectral regression to learn, under orthogonality constraints, the optimal latent space shared by data of all modalities, and constructs a graph model to project the multi-modality data into this latent space while preserving intra-modality similarity relationships. Extensive cross-modal retrieval experiments on two data sets show that the method is effective.
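
    For reference, the ℓ2,1-norm penalty that drives the coupled feature selection sums the ℓ2 norms of a projection matrix's rows, so entire rows (i.e. entire features) are pushed to zero; a small sketch:

```python
# The l2,1-norm of a projection matrix W sums the l2 norms of its rows,
# driving whole rows (whole features) to zero - which is why it performs
# the coupled feature selection described above.
import numpy as np

def l21_norm(W):
    """||W||_{2,1} = sum_i ||W_i||_2 over the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(l21_norm(W))   # 5 + 0 + 1 = 6
```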

  • COPD多维特征提取与集成诊断方法 (Multi-Dimensional Feature Extraction and Ensemble Diagnosis for COPD)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-06-19 Cooperative journals: 《计算机应用研究》

    Abstract: Chronic obstructive pulmonary disease (COPD) is a chronic lung disease that leads to a gradual decline in respiratory function, so big data analysis and algorithms are needed to help doctors diagnose it more accurately. Current COPD studies have two limitations: on the one hand, they analyze only the impact of single features on the disease; on the other hand, they validate their results on case data with only simple algorithmic models. This paper therefore proposes a multi-dimensional feature extraction and ensemble diagnosis method for COPD. First, the MDF-RS algorithm is proposed to extract the optimal combination of multi-dimensional features. Secondly, the DSA-SVM ensemble model is proposed to build the classifier for diagnosis and prediction. Finally, cross-validation is used to verify accuracy and other performance indicators. Experimental comparison shows the effectiveness of the proposed method.
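
    A sketch of the evaluation scaffolding only (an off-the-shelf SVM scored by cross-validation on stand-in features); the paper's MDF-RS feature selection and DSA-SVM ensemble are not reproduced here:

```python
# Sketch of the evaluation scaffolding: an SVM classifier scored by
# cross-validation on stand-in multi-dimensional patient features.
# MDF-RS feature selection and the DSA-SVM ensemble are not reproduced.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))        # stand-in multi-dimensional COPD features
y = rng.integers(0, 2, size=400)      # 1 = COPD diagnosed

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
```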

  • 基于字典学习的跨媒体检索技术 (Cross-Media Retrieval Based on Dictionary Learning)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-05-02 Cooperative journals: 《计算机应用研究》

    Abstract: In cross-media retrieval, how to capture and correlate heterogeneous features originating from different modalities remains a challenge. To cope with this problem, this paper presents a cross-modal retrieval framework based on coupled dictionary learning. It first obtains sparse coefficients for each modality through dictionary learning, then projects the data samples from the different modalities into a common feature space, and further leverages label information to align cross-modal sample pairs in that space so as to strengthen the inherent correlation across modalities. Experimental results on two public data sets show that the dictionary-learning-based method achieves better recognition performance than methods based on traditional mid-level feature subspaces and outperforms several state-of-the-art methods.
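
    A rough sketch under stated assumptions: one dictionary is learned per modality with scikit-learn, and each modality's sparse codes are mapped toward shared label indicators as a stand-in for the paper's coupled common space (the actual coupling and alignment terms are omitted):

```python
# Sketch: learn one dictionary per modality, then linearly map each modality's
# sparse codes towards shared label indicators as a stand-in for the paper's
# coupled common space. The framework's alignment terms are omitted.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Ridge
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
n, classes = 200, [0, 1, 2]
img = rng.normal(size=(n, 64))                   # image features
txt = rng.normal(size=(n, 32))                   # text features
y = rng.integers(0, 3, size=n)
Y = label_binarize(y, classes=classes)           # shared label space

codes = {}
for name, X in {"img": img, "txt": txt}.items():
    dico = DictionaryLearning(n_components=20, max_iter=200, random_state=0)
    codes[name] = dico.fit_transform(X)          # sparse coefficients

proj = {name: Ridge(alpha=1.0).fit(c, Y) for name, c in codes.items()}
# Retrieval: compare proj["img"].predict(image codes) against
# proj["txt"].predict(text codes) in the shared label space.
```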

  • 基于词语相关性的对话系统话题分割 (Topic Segmentation for Dialogue Systems Based on Word Correlation)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-05-02 Cooperative journals: 《计算机应用研究》

    Abstract: In open-domain dialogue systems, topics shift frequently and the dialogue content contains a large number of short texts, so traditional similarity-based processing methods have many limitations. This paper proposes a method that judges whether the dialogue topic has shifted based on the correlation between sentences, and compares how correlation-based and similarity-based measures differ in revealing the relationship between sentences. It further presents a correlation-based algorithm for computing word correlation and applies it to topic segmentation of sentences, which addresses some of the challenges of topic shift detection. Experimental results demonstrate that the correlation-based method outperforms existing methods.
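
    The abstract does not give the exact correlation measure; the sketch below uses sentence-level PMI co-occurrence statistics as an assumed stand-in for word correlation and flags a topic shift when adjacent utterances share little related vocabulary:

```python
# Sketch: PMI-style word relevance from sentence co-occurrence counts, used
# to flag a topic shift when adjacent utterances contain few related words.
# PMI is an assumed stand-in for the paper's word-correlation measure.
import math
from collections import Counter
from itertools import combinations

def build_pmi(sentences):
    word_c, pair_c, n = Counter(), Counter(), len(sentences)
    for s in sentences:
        words = set(s.split())
        word_c.update(words)
        pair_c.update(frozenset(p) for p in combinations(sorted(words), 2))
    def pmi(a, b):
        pab = pair_c[frozenset((a, b))] / n
        return math.log(pab * n * n / (word_c[a] * word_c[b])) if pab else 0.0
    return pmi

def topic_shift(prev, curr, pmi, threshold=0.5):
    """Average pairwise word relevance between two adjacent utterances."""
    pairs = [(a, b) for a in set(prev.split()) for b in set(curr.split())]
    score = sum(pmi(a, b) for a, b in pairs) / max(len(pairs), 1)
    return score < threshold

corpus = ["i like jazz music", "jazz concerts are fun", "what is for dinner"]
pmi = build_pmi(corpus)
print(topic_shift(corpus[1], corpus[2], pmi))    # likely True: topic changed
```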

  • 基于TextRank的自动摘要优化算法 (An Optimized Automatic Summarization Algorithm Based on TextRank)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-04-19 Cooperative journals: 《计算机应用研究》

    Abstract: When summarizing Chinese texts, the traditional TextRank algorithm considers only the similarity between nodes and neglects other important information in the text. Building on existing research on Chinese single-document summarization, this paper extends TextRank so that, in addition to sentence similarity, it incorporates the overall structural information of the text and the contextual information of sentences, such as a sentence's physical position in the document or paragraph, feature sentences, core sentences, and other cues that may increase a sentence's weight, and uses these to generate the candidate summary sentence group of the text. Redundancy processing then removes highly similar sentences from the candidate group. Experiments show that the algorithm improves the accuracy of the generated summaries, indicating its effectiveness.
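
    A minimal sketch of weighted TextRank under assumptions: word-overlap sentence similarity, power iteration, and a simple positional boost standing in for the paper's richer structural features and redundancy removal:

```python
# Sketch: TextRank over a sentence-similarity graph, with a simple positional
# weight boosting the leading sentence. The paper's full feature set (paragraph
# position, feature/core sentences, redundancy removal) is not reproduced.
import numpy as np

def textrank(sentences, d=0.85, iters=50):
    words = [set(s.split()) for s in sentences]
    n = len(sentences)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                sim[i, j] = len(words[i] & words[j]) / (
                    np.log(len(words[i]) + 1) + np.log(len(words[j]) + 1))
    row_sum = sim.sum(axis=1, keepdims=True)
    trans = np.divide(sim, row_sum, out=np.zeros_like(sim), where=row_sum > 0)
    score = np.ones(n) / n
    for _ in range(iters):                       # power iteration
        score = (1 - d) / n + d * trans.T @ score
    position_w = np.array([1.2 if i == 0 else 1.0 for i in range(n)])
    return score * position_w                    # graph score x position weight

sents = ["the cat sat on the mat", "dogs and cats are pets",
         "stock prices fell sharply today"]
print(textrank(sents))
```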

  • 基于互信息和邻接熵的新词发现算法 (A New Word Discovery Algorithm Based on Mutual Information and Adjacency Entropy)

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2018-04-19 Cooperative journals: 《计算机应用研究》

    Abstract: Identifying new words quickly and efficiently is an important task in natural language processing. To address the problems in new word discovery, this paper proposes an algorithm that discovers new words character by character, from left to right, in an unsegmented Weibo corpus. Candidate new words are obtained by computing the mutual information between a candidate word and its right-adjacent word and expanding the candidate word by word; the candidates are then filtered into a new word set by computing adjacency (branching) entropy, deleting candidates whose first or last character is a stop word, and removing existing dictionary words from the candidate set. The algorithm solves the problem that some new words cannot be recognized because of word segmentation errors, and also avoids the large number of repetitive and garbage strings that n-gram methods identify as new words. Experiments verify the effectiveness of the algorithm.
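
    A minimal sketch of the two statistics named in the title, computed over a tiny unsegmented corpus; the character-by-character expansion and the stop-word/dictionary filtering rules are not reproduced, and the thresholds needed in practice are omitted:

```python
# Sketch: score a candidate string by the mutual information (PMI) of its two
# halves and by its right-adjacency entropy over an unsegmented corpus. High
# PMI (tight internal cohesion) plus high adjacency entropy (varied outer
# context) suggests a genuine new word; thresholds are left out here.
import math
from collections import Counter

def count_ngrams(text, max_n=4):
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return counts

def pmi(word, counts, total):
    left, right = word[:1], word[1:]             # split: first char vs rest
    return math.log(counts[word] * total / (counts[left] * counts[right]))

def right_entropy(word, text):
    nexts = Counter(text[i + len(word)] for i in range(len(text) - len(word))
                    if text[i:i + len(word)] == word)
    total = sum(nexts.values())
    return (-sum(c / total * math.log(c / total) for c in nexts.values())
            if total else 0.0)

text = "小猪佩奇真好看小猪佩奇又更新了大家都爱看小猪佩奇"
counts = count_ngrams(text)
cand = "小猪佩奇"
print(pmi(cand, counts, len(text)), right_entropy(cand, text))
```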