  • Research on Gender Prediction of Chinese Social Media Users: Taking Sina Weibo Short Text Content as an Example

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-10-08; cooperative journal: 《知识管理论坛》 (Knowledge Management Forum)

    Abstract: [Purpose/significance] Personal information security protection has lagged behind the rapid development of the Internet. Predicting the gender of social media users can help provide them with better privacy protection. [Method/process] Short texts posted by users of the social media platform Sina Weibo were taken as the research object, and linguistic features and topic features were extracted from them. For each user, feature vectors were constructed from the linguistic features, the topic features, and the superposition of the two, and a support vector machine (SVM) classifier was then built for gender prediction. [Result/conclusion] Experiments show that linguistic and topic features can accurately predict users' gender and outperform other features used in gender prediction.
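    The abstract does not specify how the two feature groups are "superposed" or which SVM implementation was used. The following is a minimal sketch, assuming superposition means concatenating each user's linguistic and topic vectors, and swapping in a plain-Python linear SVM trained by dual coordinate descent (the hinge-loss solver popularized by LIBLINEAR). The feature names, toy values, and the +1 = female / -1 = male coding are hypothetical.

```python
from typing import List, Sequence


def combine_features(linguistic: Sequence[float], topic: Sequence[float]) -> List[float]:
    """Superpose two feature groups by concatenation (an assumption, not the paper's spec)."""
    return list(linguistic) + list(topic)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: List[List[float]], y: List[int], C: float = 10.0, epochs: int = 200) -> List[float]:
    """Linear SVM (hinge loss) trained by dual coordinate descent.

    Labels must be +1/-1. A constant 1.0 feature is appended so the bias is
    learned (and regularized) together with the other weights.
    """
    data = [list(x) + [1.0] for x in X]
    alpha = [0.0] * len(data)        # dual variables, kept in [0, C]
    w = [0.0] * len(data[0])         # primal weights: w = sum_i alpha_i * y_i * x_i
    q = [dot(x, x) for x in data]    # diagonal of the dual Hessian
    for _ in range(epochs):
        for i, x in enumerate(data):
            grad = y[i] * dot(w, x) - 1.0                          # dual gradient for alpha_i
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), C)   # clipped Newton step
            delta = new_alpha - alpha[i]
            if delta:
                alpha[i] = new_alpha
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1


# Hypothetical users: [emoticon rate, exclamation rate] + [topic share A, topic share B]
users = [
    ([0.9, 0.8], [0.9, 0.1]), ([0.8, 0.9], [0.8, 0.2]), ([0.7, 0.8], [0.7, 0.3]),
    ([0.1, 0.2], [0.1, 0.9]), ([0.2, 0.1], [0.2, 0.8]), ([0.1, 0.1], [0.2, 0.9]),
]
labels = [1, 1, 1, -1, -1, -1]
X = [combine_features(ling, topic) for ling, topic in users]
weights = train_linear_svm(X, labels)
print([predict(weights, x) for x in X])
```

    Comparing the three feature settings from the abstract would amount to training the same classifier on the linguistic block alone, the topic block alone, and the concatenated vector.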

  • Research Method and Application of Hidden Themes Influencing the Interactive Effect of Movie Microblog

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-10-08; cooperative journal: 《知识管理论坛》 (Knowledge Management Forum)

    Abstract: [Purpose/significance] Exploring the hidden themes that affect the interactive effect of movie microblogs can reveal the hot issues that users pay attention to and provide effective marketing strategies for enterprises. [Method/process] This paper crawled popular microblog posts about 123 movies released in 2017 from Sina Weibo, used topic modeling to mine the hidden themes in the movie microblog texts, and used regression analysis to examine the impact of these hidden themes on the interactive effect of movie microblogs. [Results/conclusions] Six interpretable themes were found: movie characters, movie promotion, interactive marketing, movie content, movie evaluation, and offline activities. Four of them (movie promotion, interactive marketing, movie content, and movie evaluation) have a positive impact on the interactive effect of movie microblogs. In addition, the number of the posting user's followers and the popularity of the topic discussion also positively affect the interactive effect.
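    The abstract only says "the regression method". A minimal sketch, assuming ordinary least squares on a per-post table in which the interactive effect is regressed on the share of one theme plus a control variable; the variable names and numbers are hypothetical. If all six theme shares (which sum to 1) were entered together with an intercept, the design matrix would be collinear, so one theme would have to be left out as the reference category.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS via the normal equations (X^T X) beta = X^T y; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical posts: [share of the "movie promotion" theme, log10(follower count)]
posts = [[0.1, 5.0], [0.4, 5.5], [0.7, 6.2], [0.3, 4.8], [0.9, 6.0]]
# Hypothetical outcome, e.g. log(reposts + comments + likes), built from known coefficients
interaction = [1.0 + 2.0 * p + 0.5 * f for p, f in posts]
print(ols(posts, interaction))   # recovers roughly [1.0, 2.0, 0.5]
```

    A positive, significant coefficient on a theme share is what "has a positive impact on the interactive effect" would correspond to in this setup; significance testing is omitted here.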

  • Influence Factors for Consumers to Provide Online Reviews on O2O Platform: An Empirical Study

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-10-08; cooperative journal: 《知识管理论坛》 (Knowledge Management Forum)

    Abstract: [Purpose/significance] With the development of the O2O e-commerce model, online reviews, as part of its successful operation, have become increasingly important. How to motivate consumers to provide high-quality online reviews has gradually become a key topic for the success of the O2O model. [Method/process] Based on social exchange theory and public goods theory, this paper constructed a theoretical model of the factors influencing consumers' intention to provide online reviews on O2O platforms. It then collected 386 valid questionnaires and tested the hypotheses with a structural equation model in AMOS 17.0. [Result/conclusion] The empirical results show that helping others, self-improvement, sense of belonging, and moral responsibility have a significant positive impact on the intention to provide online reviews on O2O platforms. Helping enterprises and economic rewards show no significant relationship with the intention to participate in the online review system of O2O platforms, and execution cost has no significant negative impact on the intention to provide online reviews. These results help deepen the understanding of O2O platform consumers' online review behavior and provide a theoretical reference for enterprises seeking to encourage consumers to write online reviews and to improve platform services.

  • User Profiling Based on the Behaviour and Content Combined Model

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-08-27; cooperative journal: 《图书情报工作》 (Library and Information Service)

    Abstract: [Purpose/significance] To identify and remove online reviews posted by irrational investors, improve the professionalism and quality of comments, and promote rational investment, this article carries out a user profiling study, taking as an example the task of identifying whether users of the Guba website are noise investors. [Method/process] A deep user representation learning method was used to learn from textual information such as users' posts. A behavior and content combined model (BCCM) was then proposed that also incorporates behavioral characteristics such as number of fans, influence, bar age, and number of posts, and an empirical and comparative study was conducted on an annotated dataset. [Result/conclusion] The BCCM model achieved an F1 score of 79.47%, outperforming the Decision Tree (69.90%), SVM (75.61%), KNN (73.21%), and ANN (74.83%) models. In the specific user profiling task of identifying noise traders, using deep user representation learning to obtain text content features markedly improves the evaluation metrics of user profiling.
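    The F1 comparison above treats noise traders as the positive class. A minimal sketch of how such a score is computed, together with one hypothetical way to join a learned text vector with behavioral features. The function `combine_user_features`, measuring bar age in days, and log-scaling the count-like features are all assumptions; the abstract does not describe the BCCM architecture at this level.

```python
import math
from typing import List, Sequence


def combine_user_features(text_vec: Sequence[float], fans: int, influence: float,
                          bar_age_days: int, posts: int) -> List[float]:
    """Hypothetical content + behavior vector: the text embedding followed by
    log-scaled behavioral counts (log1p keeps zero counts finite)."""
    behavior = [math.log1p(fans), influence, math.log1p(bar_age_days), math.log1p(posts)]
    return list(text_vec) + behavior


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2PR / (P + R) for the positive class (here: noise trader)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


vec = combine_user_features([0.2, -0.1, 0.4], fans=120, influence=0.3, bar_age_days=800, posts=35)
print(len(vec))
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

    In the last line two of three noise traders are found and two of three flagged users are correct, so precision and recall are both 2/3 and F1 is 2/3.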

  • A Cross-domain Text Sentiment Analysis Based on Deep Recurrent Neural Network

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-08-26; cooperative journal: 《图书情报工作》 (Library and Information Service)

    Abstract: [Purpose/significance] To address the difficulty of building a classification model in a target domain that lacks data, this study first trains a model on a source domain with rich labeled data and then projects source- and target-domain documents into the same feature space. [Method/process] Chinese-language reviews of three product categories on Amazon (books, DVDs, and music) are taken as the experimental data, and cross-domain text sentiment analysis is the research task. A novel model, the Cross Domain Deep Recurrent Neural Network (CD-DRNN), is proposed to achieve knowledge transfer between domains. The average accuracy of CD-DRNN reaches 81.70%, exceeding that of Stacked Long Short-Term Memory (79.90%), Bidirectional Long Short-Term Memory (80.50%), Convolutional Neural Network with Long Short-Term Memory (74.70%), and Merged Convolutional Neural Network with Long Short-Term Memory (80.90%). [Result/conclusion] Knowledge transfer between the source and target domains can effectively address the difficulty of achieving good classification performance on small datasets. The proposed method can effectively select features from unlabeled data, greatly reducing the data annotation workload in the target domain.

  • Research of Abstractive Chinese Text Summarization Based on Seq2seq Model

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-07-26; cooperative journal: 《图书情报工作》 (Library and Information Service)

    Abstract: [Purpose/significance] To handle out-of-vocabulary (OOV) words in text summarization while avoiding repetition in the generated summaries, this article focuses on solving the OOV problem and the self-repetition problem. [Method/process] Based on the sequence-to-sequence model, a pointer-generator module and a coverage module are added. The pointer-generator module attempts to copy OOV words from the source text into the abstractive summary, solving the OOV problem. The coverage module keeps the attention mechanism from repeatedly attending to the same positions, solving the repetition problem. The model is evaluated on the Chinese summarization dataset LCSTS. [Result/conclusion] Experimental results show that the ROUGE scores of the generated summaries are much higher than those of the seq2seq model and the extractive model, indicating that in abstractive Chinese text summarization the pointer-generator module and the coverage module can effectively solve the OOV and repetition problems and thereby significantly improve summary quality.
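    The abstract does not give equations, but pointer-generator and coverage modules are usually formulated as in See et al. (2017): the final distribution is P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention a_i over source positions i whose token is w), and the coverage loss at decoder step t is sum_i min(a_i^t, c_i^t), where c^t is the sum of the attention vectors of all earlier steps. A minimal sketch of both quantities, assuming this standard formulation; the tokens and probabilities are made up.

```python
from typing import Dict, List


def final_distribution(p_vocab: Dict[str, float], attention: List[float],
                       source_tokens: List[str], p_gen: float) -> Dict[str, float]:
    """Mix generating from the vocabulary with copying from the source.

    Source tokens missing from p_vocab (OOV words) still receive probability
    through the copy term, which is how the pointer handles OOV words.
    """
    dist = {w: p_gen * p for w, p in p_vocab.items()}
    for a, tok in zip(attention, source_tokens):
        dist[tok] = dist.get(tok, 0.0) + (1.0 - p_gen) * a
    return dist


def coverage_loss(attention_steps: List[List[float]]) -> float:
    """Sum over decoder steps of sum_i min(a_i^t, c_i^t), where c^t is the
    running sum of attention from all earlier steps."""
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attn in attention_steps:
        total += sum(min(a, c) for a, c in zip(attn, coverage))
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total


dist = final_distribution({"model": 0.6, "data": 0.4}, [0.7, 0.3], ["LCSTS", "model"], p_gen=0.8)
print(dist)                                      # "LCSTS" is OOV but gets copy probability
print(coverage_loss([[0.9, 0.1], [0.8, 0.2]]))   # repeated focus on position 0: higher loss
print(coverage_loss([[0.9, 0.1], [0.1, 0.9]]))   # attention moves on: lower loss
```

    Minimizing the coverage loss penalizes attending again to positions that have already been covered, which is the mechanism the abstract credits with reducing repetition.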

  • Research on Scale Adaptation of Text Sentiment Analysis Algorithm in Big Data Environment: Using Twitter as Data Source

    Subjects: Library Science, Information Science >> Library Science; submitted time: 2023-07-26; cooperative journal: 《图书情报工作》 (Library and Information Service)

    Abstract: [Purpose/significance] This paper studies the scale adaptation of text sentiment analysis algorithms in big data environments, providing a reference for choosing the best balance between efficiency and cost when information science researchers analyze data in big data environments. [Method/process] We use Stanford University's Sentiment140 dataset. Building on an analysis of traditional sentiment analysis algorithms, we propose five text sentiment analysis algorithms for big data, test how well each adapts to different environments and data sizes, and compare them empirically in terms of accuracy, scalability, and efficiency. [Result/conclusion] The experimental results show that the cluster built in this paper has good operational efficiency, correctness, and scalability. Spark clusters have a clear efficiency advantage in large-scale text sentiment analysis, and this advantage grows as the data size increases. In terms of resource utilization, the overall operating efficiency of the cluster changes significantly as the number of nodes and cores increases. We find that a configuration of five slave nodes with 4 cores and 4 GB of memory saves resource costs while still completing the classification task efficiently.

  • Weibo Rumor Identification in Public Health Emergencies

    Subjects: Library Science, Information Science >> Information Science; submitted time: 2023-04-01; cooperative journal: 《图书情报工作》 (Library and Information Service)

    Abstract: [Purpose/significance] During public health emergencies such as the COVID-19 epidemic, a large number of statements about the epidemic quickly appear on social media, including many rumors that endanger public mental health and hinder the implementation of national policies. Detecting these statements and identifying the rumors helps people respond correctly to public health emergencies and supports social stability and network governance. [Method/process] First, rumors confirmed during the epidemic were collected and analyzed in depth, and the main features of the rumor texts were extracted, including context features, topic category features, sentiment level features, and keyword features. Then, to address the relatively single text feature representation of existing text classification models, different models were used to vectorize the extracted features, and a rumor recognition model based on multi-feature fusion was constructed. In this model, TF-IDF was used to strengthen the word vectors so that they incorporate word-level keyword information while also capturing context. Finally, a BiLSTM+DNN model was used to classify the fused feature vectors. [Result/conclusion] The experimental results show that features such as topic category and sentiment level all contribute to rumor recognition. In particular, fusing the strengthened word vectors with the other features significantly improves accuracy, recall, and F1 measure, all of which exceed 90%, and the method outperforms other rumor recognition models. This indicates that the proposed method handles rumor recognition well in the context of public health emergencies.
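    The abstract does not specify the TF-IDF variant or exactly how the weight is applied to the word vectors. A minimal sketch, assuming each token's embedding is simply scaled by its TF-IDF weight within the post, using idf = ln(N/df) + 1 (one common variant), so that the weighted sequence could then be fed to a BiLSTM. The embeddings and posts are toy values.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """idf(t) = ln(N / df(t)) + 1, so terms found in every document keep weight 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}


def tfidf_weighted_vectors(doc: List[str], idf: Dict[str, float],
                           embeddings: Dict[str, List[float]]) -> List[List[float]]:
    """Scale each token's embedding by its TF-IDF weight within `doc`.

    Tokens without an embedding map to a zero vector; unseen tokens get idf 1.0.
    """
    dim = len(next(iter(embeddings.values())))
    counts = Counter(doc)
    out = []
    for tok in doc:
        weight = (counts[tok] / len(doc)) * idf.get(tok, 1.0)
        vec = embeddings.get(tok, [0.0] * dim)
        out.append([weight * v for v in vec])
    return out


posts = [["garlic", "cures", "covid"], ["covid", "news"], ["covid", "update"]]
emb = {"garlic": [1.0, 0.0], "cures": [0.5, 0.5], "covid": [0.0, 1.0]}
weighted = tfidf_weighted_vectors(posts[0], idf_table(posts), emb)
print(weighted)   # "garlic" and "cures" (rare keywords) are scaled up relative to "covid"
```

    Because "covid" appears in every post, its idf is 1.0, while "garlic" and "cures" get ln(3) + 1, so the keyword signal is carried into the vector sequence.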

  • Research on Scientific Collaboration Recommendation in the Financial Field Based on Multi-feature Fusion

    Subjects: Library Science, Information Science >> Information Science; submitted time: 2017-12-05; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

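    Abstract: [Objective] Scientific collaboration relationships form an important social network. To promote research collaboration and improve research productivity, this paper studies collaboration recommendation models for the financial field. [Methods] Research collaboration networks in the financial field were built at three levels (individual, institution, and region), and a new collaboration recommendation model that fuses neighbor-based and path-based network features was proposed and empirically tested at each of the three levels. [Results] A collaboration network was built from 68,905 finance articles published from 2000 to 2014. At the individual, institutional, and regional levels, the AUC values of the feature-fusion link prediction method were 84.25%, 87.34%, and 91.84% respectively, all higher than those of the neighbor-based and path-based algorithms. [Limitations] The training and test sets were split only by time; more splitting schemes are needed to refine the experimental results. [Conclusions] This paper helps individuals, institutions, and regions in financial research find collaborators, and provides new methods and ideas for scholars studying research networks and collaboration recommendation.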
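    The abstract does not say which neighbor-based and path-based indices are used or how they are fused. A minimal sketch, assuming common neighbors as the neighbor-based feature, the number of length-3 paths as the path-based feature, and a fixed weighted sum as the fusion (the same shape as the Local Path index); the weight 0.1 and the graph are hypothetical. AUC is computed in the usual link-prediction way: the probability that a held-out true link scores higher than a non-existent one, with ties counted as 0.5.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    g: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        g[u].add(v)
        g[v].add(u)
    return dict(g)


def common_neighbors(g: Graph, u: str, v: str) -> int:
    """Neighbor-based feature: size of the shared neighborhood of u and v."""
    return len(g[u] & g[v])


def paths_of_length_3(g: Graph, u: str, v: str) -> int:
    """Path-based feature: walks u-a-b-v (simple paths when u and v are not linked)."""
    return sum(1 for a in g[u] for b in g[a] if v in g[b])


def fused_score(g: Graph, u: str, v: str, path_weight: float = 0.1) -> float:
    return common_neighbors(g, u, v) + path_weight * paths_of_length_3(g, u, v)


def auc(g: Graph, test_links: List[Tuple[str, str]], non_links: List[Tuple[str, str]]) -> float:
    """Compare every (held-out link, non-link) pair; ties count as 0.5."""
    total = 0.0
    for u, v in test_links:
        s_true = fused_score(g, u, v)
        for x, y in non_links:
            s_false = fused_score(g, x, y)
            total += 1.0 if s_true > s_false else 0.5 if s_true == s_false else 0.0
    return total / (len(test_links) * len(non_links))


# Hypothetical co-authorship graph from an earlier period; A-D appears only later.
train = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
print(common_neighbors(train, "A", "D"), paths_of_length_3(train, "A", "D"))
print(auc(train, [("A", "D")], [("A", "E"), ("B", "E"), ("C", "E")]))
```

    Splitting into a training graph and later test links by time, as here, mirrors the time-based split the abstract lists as a limitation.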

  • Cross-domain Sentiment Analysis Based on Deep Representation Learning

    Subjects: Library Science, Information Science >> Information Science; submitted time: 2017-12-05; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

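    Abstract: [Objective] By learning in a source domain rich in annotated resources and projecting target-domain documents into the same feature space as the source domain, this paper addresses the difficulty of obtaining a good classification model in a target domain with little data. [Methods] Chinese, English, and Japanese reviews in the book, DVD, and music categories of the Amazon online shopping site were selected as experimental data. Building on convolutional neural networks and structural correspondence learning, a Cross-Domain Deep Representation Model (CDDRM) was proposed to achieve knowledge transfer across domains, and it was applied to the cross-domain sentiment analysis task. [Results] Experimental results show that the best F-value of CDDRM in the cross-domain setting reaches 0.7368, demonstrating the model's effectiveness. [Limitations] The F-value of CDDRM for cross-domain sentiment classification of long texts still needs improvement. [Conclusions] Knowledge transfer can solve the problem that supervised learning struggles to achieve good classification on small datasets; unlike the basic assumption of traditional supervised learning, it does not require the training and test sets to follow the same or similar data distributions.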
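    Structural correspondence learning (SCL), which CDDRM builds on, starts by choosing pivot features that behave similarly in both domains. A minimal sketch of pivot selection in the style of SCL-MI (Blitzer et al., 2007): keep features that occur in enough documents of both domains, then rank them by mutual information with the source-domain labels. This covers only that classic first step, not the CDDRM model; the thresholds `min_df` and `k` and the toy reviews are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


def mutual_information(docs: List[Set[str]], labels: List[int], feature: str) -> float:
    """Mutual information (in nats) between binary feature presence and the label."""
    n = len(docs)
    joint = Counter((feature in d, y) for d, y in zip(docs, labels))
    px: Counter = Counter()
    py: Counter = Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def select_pivots(source_docs: List[Set[str]], source_labels: List[int],
                  target_docs: List[Set[str]], min_df: int = 2, k: int = 2) -> List[str]:
    """Features frequent in both domains, ranked by MI with the source labels."""
    src_df = Counter(f for d in source_docs for f in d)
    tgt_df = Counter(f for d in target_docs for f in d)
    # Alphabetical first so ties in MI are broken deterministically (sort is stable).
    candidates = sorted(f for f in src_df if src_df[f] >= min_df and tgt_df[f] >= min_df)
    candidates.sort(key=lambda f: mutual_information(source_docs, source_labels, f), reverse=True)
    return candidates[:k]


books = [{"great", "plot", "story"}, {"great", "author"}, {"bad", "plot", "story"}, {"bad", "author"}]
book_labels = [1, 1, -1, -1]
dvds = [{"great", "picture", "story"}, {"bad", "picture", "story"}, {"great", "sound"}, {"bad", "sound"}]
print(select_pivots(books, book_labels, dvds))
```

    Here "plot" and "author" are excluded because they never occur in the DVD reviews, and "story" occurs in both domains but carries no label information, so "great" and "bad" become the pivots. Full SCL would then train pivot predictors on unlabeled data from both domains and project features through an SVD of their weights, which is not shown.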

  • Research on Scientific Collaboration Recommendation in the Financial Field Based on Multi-feature Fusion

    Subjects: Library Science, Information Science >> Information Science; submitted time: 2017-11-30; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

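    Abstract: [Objective] Scientific collaboration relationships form an important social network. To promote research collaboration and improve research productivity, this paper studies collaboration recommendation models for the financial field. [Methods] Research collaboration networks in the financial field were built at three levels (individual, institution, and region), and a new collaboration recommendation model that fuses neighbor-based and path-based network features was proposed and empirically tested at each of the three levels. [Results] A collaboration network was built from 68,905 finance articles published from 2000 to 2014. At the individual, institutional, and regional levels, the AUC values of the feature-fusion link prediction method were 84.25%, 87.34%, and 91.84% respectively, all higher than those of the neighbor-based and path-based algorithms. [Limitations] The training and test sets were split only by time; more splitting schemes are needed to refine the experimental results. [Conclusions] This paper helps individuals, institutions, and regions in financial research find collaborators, and provides new methods and ideas for scholars studying research networks and collaboration recommendation.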

  • Cross-domain Sentiment Analysis Based on Deep Representation Learning

    Subjects: Library Science, Information Science >> Information Science; submitted time: 2017-11-30; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

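    Abstract: [Objective] By learning in a source domain rich in annotated resources and projecting target-domain documents into the same feature space as the source domain, this paper addresses the difficulty of obtaining a good classification model in a target domain with little data. [Methods] Chinese, English, and Japanese reviews in the book, DVD, and music categories of the Amazon online shopping site were selected as experimental data. Building on convolutional neural networks and structural correspondence learning, a Cross-Domain Deep Representation Model (CDDRM) was proposed to achieve knowledge transfer across domains, and it was applied to the cross-domain sentiment analysis task. [Results] Experimental results show that the best F-value of CDDRM in the cross-domain setting reaches 0.7368, demonstrating the model's effectiveness. [Limitations] The F-value of CDDRM for cross-domain sentiment classification of long texts still needs improvement. [Conclusions] Knowledge transfer can solve the problem that supervised learning struggles to achieve good classification on small datasets; unlike the basic assumption of traditional supervised learning, it does not require the training and test sets to follow the same or similar data distributions.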