• Abnormal User Detection in Social Networks Based on XGBoost

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2019-01-03; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: Traditional abnormal-account detection algorithms suffer from low recall and poor runtime efficiency on imbalanced social network datasets. To address this, the paper extracts user content, behavior, attribute, and relationship features from social network datasets, selects features with XGBoost (a gradient boosting ensemble classifier), builds a classification model, constructs imbalanced datasets, and identifies three types of spam accounts. Experimental results show that, compared with traditional classifiers such as random forest, XGBoost effectively improves the recall and F1 score for all three types of abnormal users in both binary and multi-class classification, on both balanced and imbalanced datasets. Moreover, using only the few features selected by XGBoost, the classifiers achieve the same performance as with the full feature set, which demonstrates the effectiveness of the method.
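
Because the abstract's claims rest on recall and F1 under class imbalance, the following minimal sketch shows how per-class recall and F1 (and their unweighted macro average) are typically computed. The labels are hypothetical, and the abstract does not say whether macro, weighted, or per-class figures were reported; this is an illustration of the metrics, not the paper's evaluation code.

```python
from collections import Counter


def per_class_recall_f1(y_true, y_pred):
    """Return {label: (recall, f1)} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was different
            fn[t] += 1  # an instance of t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class recall and F1 (every class counts equally)."""
    n = len(scores)
    return (sum(r for r, _ in scores.values()) / n,
            sum(f for _, f in scores.values()) / n)


# Hypothetical imbalanced example: 8 normal accounts, 2 spam accounts,
# and a classifier that labels everything "normal" (80% accuracy).
scores = per_class_recall_f1(["normal"] * 8 + ["spam"] * 2, ["normal"] * 10)
print(scores["spam"])  # (0.0, 0.0): every spam account is missed
```

The example shows why accuracy alone is misleading on imbalanced data: the majority-class classifier scores 80% accuracy while its recall on the minority class is zero.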

  • A Cross-Social-Network User Identity Linkage Method Based on User Relations

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-12-13; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: To identify accounts that belong to the same person, this paper proposes a method for linking user identities across social networks based on user relations. First, we design a user-relation feature extraction module based on network representation learning, which embeds large information networks into low-dimensional vector spaces. Second, we propose CSN_LINE, an algorithm for heterogeneous information networks that learns network representations jointly with the anchor links across networks. Finally, we build a user identity linkage model based on a multi-layer perceptron. Experiments show that the F1 score and accuracy of this method are more than 12% higher than those of current state-of-the-art algorithms, which demonstrates the validity and soundness of the method.
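
CSN_LINE extends LINE (Large-scale Information Network Embedding). As a hedged illustration of the base model only, the sketch below evaluates LINE's first-order proximity loss with negative sampling for a single observed edge. It does not model the paper's anchor-link extension, and the 2-D embeddings are made-up values.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def first_order_edge_loss(u_i, u_j, negatives):
    """LINE first-order loss for one observed edge (i, j) with negative sampling.

    loss = -[log sigma(u_j . u_i) + sum over sampled n of log sigma(-u_n . u_i)]
    Minimizing it pulls linked nodes together and pushes sampled non-neighbours apart.
    """
    positive = log_sigmoid(dot(u_i, u_j))
    negative = sum(log_sigmoid(-dot(u_n, u_i)) for u_n in negatives)
    return -(positive + negative)


# Made-up 2-D embeddings: a linked pair that is already close vs. one that is far apart.
close = first_order_edge_loss([1.0, 0.0], [1.0, 0.0], negatives=[[-1.0, 0.0]])
far = first_order_edge_loss([1.0, 0.0], [-1.0, 0.0], negatives=[[1.0, 0.0]])
print(close < far)  # True
```

The stable `log_sigmoid` avoids the overflow that `math.log(1 / (1 + math.exp(-x)))` would hit for large negative dot products.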

  • Research on Chinese Microblog Author Identification Based on Deep Learning

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-11-29; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: Author identification has long played an important role in public security and literary examination work. However, text feature extraction is cumbersome and does not generalize well. To solve this problem, this paper proposes CABLSTM, a Chinese microblog author identification model that requires no expert feature engineering, and evaluates its accuracy on an open microblog corpus. The model maximizes the extraction of short-text features: it fuses an attention mechanism into the CNN and removes the pooling layer, and it captures contextual information with a bidirectional LSTM. The identification result is output through a softmax layer. Experimental results show that, on the Chinese microblog author identification task, the model improves accuracy, recall, and F-score to some extent compared with traditional machine learning algorithms as well as TextCNN and LSTM.
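
The abstract states that attention is fused into the CNN in place of the pooling layer but does not give the formulation. A common choice, assumed here, is soft attention: score each position's feature vector against a context vector, normalize the scores with softmax, and take the weighted sum. The sketch shows only that aggregation step; the context vector is a hypothetical stand-in for a learned parameter.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, context):
    """Collapse a sequence of feature vectors into one vector with soft attention.

    features: equal-length vectors, e.g. one CNN output per text position
    context:  scoring vector; learned in a real model, fixed here (hypothetical)
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(f * c for f, c in zip(vec, context)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# With a zero context vector every position gets equal weight (mean pooling);
# a context aligned with the first feature concentrates weight on position 0.
print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # ([0.5, 0.5], [0.5, 0.5])
```

Unlike max pooling, which keeps only the strongest activation per dimension, every position contributes in proportion to its weight, so positional detail is not discarded before the downstream layers.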

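  • Optimization Algorithms for Feature Point Processing and Pose Recovery in 3D Reconstruction Systems

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-09-12; Cooperative journal: Application Research of Computers (《计算机应用研究》)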

    Abstract: Improving the accuracy of feature point detection and matching and optimizing camera pose recovery are key factors in improving the overall efficiency of 3D reconstruction. Building on the principles of the SIFT algorithm, this paper constructs an entirely new algorithmic framework. The framework uses an FCN (Fully Convolutional Network) and a BP (back-propagation) neural network to jointly consider the semantic segmentation of the image's main target and the image's gray-level co-occurrence matrix, so that the range and number of detected feature points adapt to the scene; during feature point matching, it uses the offset stability of the camera position to eliminate false matches. In addition, it refines the pose recovery results with graph-based nonlinear optimization to obtain a more accurate camera pose. Finally, comparisons with existing mainstream algorithms verify the effectiveness of the proposed algorithm: the experimental results show better scene adaptivity in feature point detection, higher feature matching precision, higher pose recovery precision, and better 3D reconstruction efficiency.
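
One input the framework feeds to its BP network is the image's gray-level co-occurrence matrix (GLCM). As a reference for what that matrix contains, here is a minimal sketch that counts co-occurring gray levels for a single pixel offset and derives contrast, one common GLCM texture statistic. The offset, the number of gray levels, and the choice of contrast are assumptions; the abstract does not say which GLCM features were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i at (r, c) co-occurs with level j at (r + dy, c + dx).

    image: 2D list of ints in [0, levels)
    Returns a levels x levels matrix of raw (non-symmetric, unnormalized) counts.
    """
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c]][image[r2][c2]] += 1
    return m


def contrast(m):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over the normalized matrix."""
    total = sum(sum(row) for row in m)
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total


# A 4x4 image with 4 gray levels, offset = one pixel to the right.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
print(glcm(img, 4))  # [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
```

Smooth regions concentrate counts on the diagonal (low contrast), while textured regions spread them off the diagonal, which is the kind of signal that can drive how many feature points to detect in a region.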

  • Text Similarity Calculation Based on the WMF_LDA Topic Model

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-06-19; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: Text similarity calculation is an important topic of great research value in NLP (Natural Language Processing). Computing text similarity with the LDA (Latent Dirichlet Allocation) model takes semantic features into account, but it suffers from a large vocabulary, inconsistent word semantics, and an inability to mine and exploit the inherent domain differences between texts of different categories. This paper proposes the WMF_LDA (Word Merging and Filtering LDA) topic model. The model maps domain words and synonyms to common forms, filters words by part of speech (POS), and finally applies LDA topic modeling to the processed result. Experiments show that this method greatly reduces the vocabulary size during modeling, shortens the modeling process, and speeds up the final text clustering. Compared with other text similarity methods, the proposed method also improves accuracy to some degree.
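
The word merging and filtering step can be pictured as a preprocessing pass that runs before LDA. The sketch below assumes tokens arrive already POS-tagged by some segmenter/tagger and uses a hypothetical merge table and a hypothetical set of kept POS tags; it shows only the vocabulary reduction, not the LDA model or the similarity measure.

```python
# Hypothetical merge table: synonyms and domain-word variants -> one canonical word.
MERGE_TABLE = {"automobile": "car", "vehicle": "car", "physician": "doctor"}

# Hypothetical POS tags to keep (noun, verb, adjective); all other tokens are dropped.
KEEP_POS = {"n", "v", "a"}


def merge_and_filter(tagged_tokens, merge_table=MERGE_TABLE, keep_pos=KEEP_POS):
    """tagged_tokens: list of (word, pos) pairs. Returns the reduced token list for LDA."""
    kept = []
    for word, pos in tagged_tokens:
        if pos not in keep_pos:
            continue  # POS filtering: drop determiners, conjunctions, etc.
        kept.append(merge_table.get(word, word))  # word merging: map to canonical form
    return kept


def vocabulary(docs):
    return sorted({w for doc in docs for w in doc})


docs = [
    [("the", "d"), ("automobile", "n"), ("is", "v"), ("fast", "a")],
    [("a", "d"), ("vehicle", "n"), ("and", "c"), ("car", "n")],
]
raw_vocab = vocabulary([[w for w, _ in doc] for doc in docs])
reduced_vocab = vocabulary([merge_and_filter(doc) for doc in docs])
print(len(raw_vocab), len(reduced_vocab))  # 8 3
```

Merging "automobile", "vehicle", and "car" into one token also lets LDA see them as co-occurring evidence for the same topic, rather than three sparse, unrelated words.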

  • Image Tampering Detection Based on a Sparse Autoencoder Feature Clustering Algorithm

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-05-20; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: Copy-move forgery is a common type of image forgery. Block-matching detection often suffers from low accuracy and high time complexity. To improve accuracy and significantly reduce time complexity, this paper combines deep learning features with a clustering algorithm for detection. First, a sparse autoencoder is trained on a large sample set to learn the internal regularities of images and obtain the hidden-layer weight matrix; applying this weight matrix to the image under inspection yields its hidden-layer features, i.e., the sparse autoencoder features. Second, the K-means algorithm is applied in two passes: the first clustering of the autoencoder features removes smooth image regions, and a second clustering of the texture features yields the detection results. Euclidean distance checks and the RANSAC (random sample consensus) algorithm then remove abnormal blocks to localize the tampered region. Experimental results show that, compared with other algorithms, the proposed algorithm improves accuracy by about 14.3% and time efficiency by 72%. Combining deep learning features with a clustering algorithm thus improves copy-move forgery detection in both time efficiency and accuracy.
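
Both clustering passes use K-means. For reference, here is a plain Lloyd's-algorithm sketch over feature vectors. Initializing the centroids from the first k points is a simplification assumed for determinism (the paper's initialization, k, and feature dimensionality are not stated), and the smooth-region removal and RANSAC steps are not modeled.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels); initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return centroids, labels


# Made-up 2-D "block features": three near the origin, three near (10, 10).
pts = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
centroids, labels = kmeans(pts, 2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

In a copy-move setting, blocks whose features land in the same tight cluster are candidate duplicates; a distance check on their image positions then separates genuine copies from naturally similar neighbours.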

  • An Improved Resource Scheduling Algorithm for Cloud Storage Environments

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-04-17; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: Storing users' massive data in the data center in the minimum time is the key issue in improving cloud storage efficiency and overcoming the bottleneck in its development. This paper first proves that finding a resource scheduling scheme that minimizes storage time in a cloud storage environment is an NP-complete problem. Because existing scheduling algorithms consider the influencing factors incompletely and their results tend to fall into local optima, a new resource scheduling algorithm is proposed. The algorithm first uses the triangular fuzzy analytic hierarchy process (AHP) to comprehensively analyze the factors affecting scheduling and obtains a judgment matrix for the storage nodes, which is used to construct the objective function of the subsequent genetic algorithm. It then improves the simple genetic algorithm in its encoding, its crossover and mutation operations, and the self-improvement of lethal chromosomes, adapting it to the cloud storage environment. Finally, the paper compares the algorithm with the Cinder block storage algorithm in OpenStack and with existing improved algorithms. The experimental results verify the effectiveness of the proposed algorithm and show that it achieves more efficient resource scheduling.
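
To illustrate the genetic-algorithm half of the method, here is a minimal simple-GA sketch that assigns data blocks to storage nodes so as to minimize total storage time (makespan). It is not the paper's algorithm: the fuzzy-AHP-derived objective is replaced by a plain makespan computed from hypothetical block sizes and node write speeds, and the encoding, tournament selection, one-point crossover, per-gene mutation, and elitism are generic textbook choices.

```python
import random


def makespan(assignment, sizes, speeds):
    """Storage time: nodes write in parallel, so the busiest node finishes last."""
    load = [0.0] * len(speeds)
    for block, node in enumerate(assignment):
        load[node] += sizes[block] / speeds[node]
    return max(load)


def genetic_schedule(sizes, speeds, pop_size=30, generations=200,
                     mutation_rate=0.05, seed=0):
    """Simple GA: returns (best_assignment, its_makespan)."""
    rng = random.Random(seed)
    n_blocks, n_nodes = len(sizes), len(speeds)

    def fitness(chrom):
        return makespan(chrom, sizes, speeds)  # lower is better

    # Encoding: gene i is the index of the node that stores block i.
    pop = [[rng.randrange(n_nodes) for _ in range(n_blocks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: keep the two best chromosomes unchanged
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) for each parent.
            a = min(rng.sample(pop, 3), key=fitness)
            b = min(rng.sample(pop, 3), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, n_blocks) if n_blocks > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: move each block to a random node with small probability.
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return best, fitness(best)


# Hypothetical workload: six blocks (MB) and two equally fast nodes (MB/s).
best, time_needed = genetic_schedule([4, 4, 2, 2, 2, 2], [1.0, 1.0])
```

With 16 MB in total spread over two 1 MB/s nodes, no assignment can finish in under 8 s, which gives a quick sanity bound for the result. Elitism guarantees the best makespan never gets worse from one generation to the next.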

  • Optimizing the Selection of Column-Store Compression Strategies Based on HBase

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-04-12; Cooperative journal: Application Research of Computers (《计算机应用研究》)

    Abstract: In the era of big data, column-store databases are increasingly used, which has promoted research in column-oriented storage. Existing compression strategies for column-based databases suffer from high learning cost and low compression efficiency, caused by high data dispersion, fine classification granularity, and defects in the classification algorithms they apply. To solve this problem, this paper designs a sort-based hybrid compression strategy that combines column-based and sector-based compression. First, we design a method that sorts the data in each column according to the characteristics of HBase to improve data compactness. Second, according to the characteristics of the data, we apply a hybrid column-based compression strategy and a hybrid sector-based compression strategy, respectively, to recommend the compression algorithm. Experiments on the TPC-DS benchmark data demonstrate that the proposed strategy achieves excellent performance in both compression ratio and compression/decompression time.
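
The intuition behind sorting each column before compression is that sorting groups equal values into long runs, which run-length-style encodings then store compactly. The sketch below demonstrates that effect with a plain run-length encoder on a hypothetical column; it does not use HBase's compression API, and the paper's column/sector hybrid strategies and algorithm-recommendation step are not modeled.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def rle_decode(runs):
    """Inverse of rle_encode."""
    return [v for v, n in runs for _ in range(n)]


# Hypothetical "state" column in arrival order.
column = ["CA", "NY", "CA", "TX", "NY", "CA", "TX", "CA"]
print(len(rle_encode(column)))       # 8 runs: no two neighbours are equal
print(rle_encode(sorted(column)))    # [('CA', 4), ('NY', 2), ('TX', 2)]
```

Sorting one column independently reorders its values relative to the other columns, so a real system must also keep the row keys or the sort permutation to reconstruct rows; the sketch ignores that cost.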