• A Hierarchical Discovery Method of Scientific Knowledge Structure

    Subjects: Library Science,Information Science >> Library Science submitted time 2023-08-27 Cooperative journals: 《图书情报工作》

    Abstract: [Purpose/significance] This paper proposes a new hierarchical discovery method for scientific knowledge structure, providing a reference for optimizing the knowledge structure discovery process and improving the form of knowledge organization. [Method/process] First, a hierarchical discovery method for scientific knowledge structure was constructed using the LDA topic model. Next, the hierarchy of the knowledge structure was determined automatically from the average similarity among topics, and literature subsets were selected automatically by applying a filtering threshold to the "document-topic" probability matrix. Finally, a tree diagram was used to display the scientific knowledge structure and to explore the correlation and inheritance of knowledge points. The method was also compared with HLDA, a hierarchical topic model. [Result/conclusion] The results show that the method yields a better knowledge structure, more representative knowledge topics, and higher operational efficiency. Compared with HLDA, it substantially improves topic differentiation within a single layer and topic inheritance between layers.
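
The two automatic steps described in this abstract (average similarity among topics, and thresholding the "document-topic" matrix) might look roughly like the sketch below. It is only an illustration under stated assumptions: cosine similarity as the similarity measure, the toy matrices, the threshold value 0.3, and the function names are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def average_topic_similarity(topic_word):
    """Mean cosine similarity over all unordered pairs of topics."""
    pairs = list(combinations(topic_word, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def document_subsets(doc_topic, threshold):
    """For each topic, the indices of documents whose probability for
    that topic is at least `threshold` (subsets may overlap)."""
    n_topics = len(doc_topic[0])
    return [[d for d, row in enumerate(doc_topic) if row[k] >= threshold]
            for k in range(n_topics)]

# Toy stand-ins for LDA output: 3 topics over 4 words, 4 documents.
topic_word = [[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]]
doc_topic = [[0.8, 0.1, 0.1],
             [0.4, 0.5, 0.1],
             [0.1, 0.2, 0.7],
             [0.4, 0.4, 0.2]]

avg_sim = average_topic_similarity(topic_word)        # hypothetical stopping signal
subsets = document_subsets(doc_topic, threshold=0.3)  # hypothetical threshold
print(round(avg_sim, 3), subsets)
```

Each subset could then be fed to a fresh LDA run to build the next layer; how the paper actually decides when to stop splitting is not stated in the abstract.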

  • Multi-Dimensional Subject Knowledge Network Fusion Method Based on Graph Convolution Self-Encoding Model

    Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》

    Abstract: [Purpose/significance] A knowledge network containing a single type of knowledge unit cannot fully reflect the knowledge structure of a subject. To address this, a method for integrating knowledge network structures across different dimensions is proposed, providing a reference for knowledge structure mining in subject areas. [Method/process] This paper used LDA and TF-IDF to extract subject knowledge units, then used semantic similarity and keyword co-occurrence analysis to construct three subject knowledge sub-networks: a topic network, a keyword network, and an entity network. Spatial node transfer alignment was used to align the nodes of the sub-networks, a self-encoding model based on graph convolution was designed to represent the knowledge nodes, and the disciplinary knowledge network was finally reconstructed by calculating cosine similarity. [Result/conclusion] Taking artificial intelligence as an example, the experiment constructs and analyzes a subject knowledge network that integrates topics, keywords, and entities. The results show that the proposed method can effectively reveal the research content and knowledge structure of the subject area and provides a useful reference for research on discovering and organizing subject knowledge.
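
As a rough illustration of the last two steps (representing nodes with a graph convolution, then rebuilding the network from cosine similarity), the sketch below applies one untrained propagation step with the symmetrically normalized adjacency matrix and links nodes whose embeddings are similar. It is a simplification, not the paper's model: there are no learned weights or decoder, and the toy graph, features, threshold 0.95, and function names are assumptions.

```python
import math

def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, the propagation matrix commonly used in graph convolution."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(adj, features):
    """One graph-convolution step without weights or nonlinearity."""
    norm = normalized_adjacency(adj)
    n, dim = len(adj), len(features[0])
    return [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(dim)]
            for i in range(n)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def reconstruct_edges(embeddings, threshold):
    """Link every pair of nodes whose embeddings have cosine similarity >= threshold."""
    n = len(embeddings)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if cosine(embeddings[i], embeddings[j]) >= threshold]

# Toy graph: nodes 0-1 are linked, nodes 2-3 are linked.
adj = [[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

embeddings = propagate(adj, features)
edges = reconstruct_edges(embeddings, threshold=0.95)  # hypothetical threshold
print(edges)
```

A trained autoencoder would add weight matrices and a reconstruction loss; the propagation and similarity steps keep the same shape.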

  • Multi-attribute Mining Method for Technology Innovation Subject from the Perspective of Patent: The Case of Chip Patents

    Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》

    Abstract: [Purpose/significance] Combining multiple attributes makes it possible to mine multiple technological innovation themes in a field quickly and effectively, providing a reference for determining the direction of technological innovation. [Method/process] This paper combined the LDA (Latent Dirichlet Allocation) topic model with evaluation indicators of patent value and proposed a quantitative method for mining patent innovation themes. First, TF-IDF, mean perplexity, and the quartile method were used to construct an LDA topic model of the domain patents and mine technological topics. Next, the probability distribution matrix output by LDA was combined with the patent value indicators (claims and IPC) to construct a quantitative indicator system. Patents in the chip field were then selected for verification experiments, and the quantitative indicators were calculated and visualized as a heat map to identify the technological innovation themes in the field. Finally, based on the mapping among patents, the LDA output matrix, innovation themes, and quantitative indicators, innovative patents were screened and the technological innovation themes were labeled. [Result/conclusion] Experts in microelectronics were invited to evaluate the experimental results against the latest chip technology at home and abroad. Their scores show that the multi-attribute method can mine multiple technological innovation themes quickly and effectively. At the practical level, it offers enterprises and scientists in related fields ideas for technological innovation themes.
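
One way to picture combining the LDA output matrix with patent value indicators is a topic-level score that weights each patent's value by its topic probability. The abstract does not give the actual indicator formulas, so everything in this sketch is an assumption: the min-max normalization, the equal weighting of claim count and IPC count, the toy data, and the function names.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def topic_value_scores(doc_topic, claims, ipc_counts):
    """Probability-weighted mean patent value per topic.

    Patent value here is the average of normalized claim count and
    normalized IPC count (a hypothetical choice, not the paper's).
    """
    value = [(c + i) / 2 for c, i in zip(min_max(claims), min_max(ipc_counts))]
    scores = []
    for k in range(len(doc_topic[0])):
        weight = sum(row[k] for row in doc_topic)
        scores.append(sum(row[k] * v for row, v in zip(doc_topic, value)) / weight)
    return scores

# Toy data: 3 patents, 2 topics.
doc_topic = [[0.9, 0.1],
             [0.8, 0.2],
             [0.1, 0.9]]
claims = [20, 10, 5]     # number of claims per patent
ipc_counts = [4, 2, 1]   # number of IPC codes per patent

scores = topic_value_scores(doc_topic, claims, ipc_counts)
print([round(s, 3) for s in scores])
```

Scores of this kind form a topic-by-indicator table that can be drawn as a heat map, which is the visualization the abstract mentions.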

  • Identifying and Tracing Technological Innovation Combination Based on Deep Learning and Semantic Mining

    Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》

    Abstract: [Purpose/Significance] With the rapid development of strategic emerging technology industries, identifying technological innovation combinations with potential synergistic effects and clarifying the core innovation relationships within them is an important prerequisite for effectively planning industrial development routes and enhancing industrial competitive advantages. [Method/Process] Guided by the theory of technology portfolio evolution, this paper used patent data to propose a scheme for recognizing technological innovation combinations and their evolutionary relationships, combining deep learning, SAO semantic mining, and the CFDP algorithm. The scheme has three steps. First, a domain search strategy based on keywords and patent classification numbers was designed, and the acquired data were cleaned and segmented into words. Second, a word-vector semantic network of the technical topics in the domain was built with Word2Vec, and the CFDP algorithm was used to identify potential innovation elements and the ways they combine. Finally, the core SAO structures in each combination were mined, their evolutionary relationships were classified with an LSTM deep learning model, and the core innovation approaches of the technology were explored, so as to discover potential technology opportunities in the domain. [Result/Conclusion] Taking speech recognition as an example, in-depth mining of DII patent text data in this field identified and tracked five types of potential technological innovation combinations and their core innovation methods. The study also finds that, in China, speech recognition has great potential for technological innovation in smart chip design, speech recognition algorithms, and new scenarios and applications.
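
The clustering step can be sketched as follows, assuming that CFDP here refers to clustering by fast search and find of density peaks; the abstract does not spell the acronym out. The sketch computes the two quantities that method relies on, a local density `rho` and the distance `delta` to the nearest denser point, and picks the points where both are large. The 2-D toy points stand in for word vectors, and the cutoff `d_c` and function names are hypothetical.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point with strictly higher rho,
               or the largest distance from i if no such point exists
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Two toy groups; indices 3 and 7 sit in the middle of their groups.
points = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (10.5, 10.5)]

rho, delta = density_peaks(points, d_c=0.8)
# Cluster centers: the points with the largest rho * delta.
ranked = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)
centers = sorted(ranked[:2])
print(centers)
```

In the original density-peaks formulation the remaining points are then assigned to the cluster of their nearest denser neighbor; that assignment pass is omitted here for brevity.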

  • Quantitative Analysis of Sentiment Polarity in Review Texts Based on Attribute Features

    Subjects: Library Science,Information Science >> Information Science submitted time 2017-12-05 Cooperative journals: 《数据分析与知识发现》

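    Abstract: [Objective] To address the quantification of sentiment polarity starting from the attribute features of the review object. [Methods] Online review texts are decomposed into a three-layer review system: review object, object attribute, and review description. Attribute word sets and the corresponding review sets are extracted at the attribute level. To account for the differing influence of the review object's attribute features, an attribute factor is introduced, and TF-IDF is improved to compute it. Combining review patterns and review context, an attribute-feature-based algorithm for quantifying review sentiment is proposed and implemented in Python. [Results] Compared with traditional machine learning classifiers (NB, SVM) and with the setting in which attribute factors are equally weighted, the proposed algorithm significantly improves the accuracy of sentiment classification for review texts. [Limitations] The choice of review domains is limited, and the coefficient settings of the quantification algorithm are subjective. [Conclusions] The proposed algorithm effectively solves the sentiment polarity quantification problem and further improves sentiment classification accuracy.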
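
The idea of weighting each attribute's sentiment by an attribute factor can be sketched as below. The paper's improved TF-IDF is not described in the abstract, so this sketch substitutes a plain smoothed TF-IDF as the attribute factor; the toy reviews, polarity values, and function names are likewise hypothetical.

```python
import math
from collections import Counter

def attribute_factors(reviews):
    """Normalized smoothed TF-IDF weight per attribute word.

    reviews: list of reviews, each a list of the attribute words it mentions.
    """
    n = len(reviews)
    tf = Counter(word for review in reviews for word in review)
    df = Counter(word for review in reviews for word in set(review))
    raw = {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in tf}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}

def review_score(polarities, factors):
    """Factor-weighted mean of per-attribute polarities in one review."""
    weight = sum(factors[a] for a in polarities)
    return sum(factors[a] * p for a, p in polarities.items()) / weight

# Toy corpus: which attributes each review talks about.
reviews = [["screen", "battery"], ["screen"], ["screen", "price"]]
factors = attribute_factors(reviews)

# One review praising the screen (+1) and criticizing the battery (-1).
polarities = {"screen": 1.0, "battery": -1.0}
weighted = review_score(polarities, factors)
equal = sum(polarities.values()) / len(polarities)
print(round(weighted, 3), equal)
```

With equal weights the two opinions cancel out; with attribute factors the more heavily weighted attribute decides the sign, which is the kind of difference the abstract's comparison against equal weighting refers to.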

  • An Improved Topic Model Integrating External Features

    Subjects: Library Science,Information Science >> Information Science submitted time 2017-10-11 Cooperative journals: 《数据分析与知识发现》

    Abstract: [Objective] To reveal the relationships among the contents, topics, and authors of documents, this paper presents the Dynamic Author Topic (DAT) model, which extends the LDA model. [Context] Extracting features from large-scale texts is an important task for informatics researchers. [Methods] First, NIPS conference papers were collected as the data set and preprocessed. The data set was then divided into time slices by publication date, forming a first-order Markov chain. Perplexity was used to determine the number of topics. Finally, Gibbs sampling was used to estimate the author-topic and topic-word distributions in each time slice. [Results] The experiments show that each document is represented by topic-word and author-topic probability distributions, and that the evolution of authors and topics can be observed along the time dimension. [Conclusions] The DAT model can efficiently integrate content with external features and accomplish text mining tasks.
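
Choosing the number of topics by perplexity can be illustrated with the standard definition, the exponential of the negative average log-likelihood per token. The sketch below scores two already-fitted toy models and keeps the one with lower perplexity. The models, the word-id documents, and the function name are made up for illustration, and in practice perplexity is usually measured on held-out documents rather than training data.

```python
import math

def perplexity(docs, doc_topic, topic_word):
    """exp(-(sum of log p(w | d)) / number of tokens), with
    p(w | d) = sum_k doc_topic[d][k] * topic_word[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta * phi[w] for theta, phi in zip(doc_topic[d], topic_word))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

# Documents as lists of word ids over a 4-word vocabulary.
docs = [[0, 0, 1], [2, 3, 3]]

# Hypothetical fitted models, keyed by number of topics.
models = {
    1: ([[1.0], [1.0]],
        [[0.25, 0.25, 0.25, 0.25]]),
    2: ([[0.9, 0.1], [0.1, 0.9]],
        [[0.45, 0.45, 0.05, 0.05],
         [0.05, 0.05, 0.45, 0.45]]),
}

scores = {k: perplexity(docs, theta, phi) for k, (theta, phi) in models.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

The same function applies unchanged within each time slice if the number of topics is tuned per slice.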

  • Research on a Hadoop-Based Microblog Public Opinion Monitoring System Model

    Subjects: Library Science,Information Science >> Information Science submitted time 2017-10-11 Cooperative journals: 《数据分析与知识发现》

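    Abstract: [Objective] For the current big data environment, this paper proposes a Hadoop-based microblog public opinion monitoring system model that collects, mines, monitors, and analyzes massive volumes of microblog information. [Methods] Public opinion monitoring techniques were analyzed, a monitoring system model was constructed, the related algorithms were improved, and a big data platform was built on Hadoop for simulation experiments to verify the usability of the model. [Results] The experiments show that the model monitors and analyzes massive microblog data well and achieves the goal of public opinion monitoring. [Limitations] The Hadoop cluster was small, and multiple clustering algorithms were not compared, so the relative strengths and weaknesses of the improved algorithm against other algorithms were not established. [Conclusions] The model can perform public opinion monitoring and analysis on massive microblog data and provides decision makers with scientific information support for responding to public opinion crises.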