Subjects: Library Science, Information Science >> Library Science; Submitted: 2023-08-27; Cooperative journal: 《图书情报工作》 (Library and Information Service)
Abstract: [Purpose/significance] Analyzing patent topics at the level of individual words makes the resulting topics difficult to interpret. To address this problem, this paper proposes a patent topic discovery model that integrates term knowledge. [Method/process] The proposed model first introduces class entropy to recognize terms in patent literature effectively. It then applies the Generalized Pólya Urn (GPU) model to increase the probability that semantically similar terms are assigned to the same topic, alleviating the data sparsity problem that arises when terms serve as the basic analysis unit of the topic model. [Result/conclusion] Experimental results show that, by incorporating term information, the proposed model improves the quality of the generated topics, making topic representations more readable and topics more discriminative.
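To make the Generalized Pólya Urn step concrete, here is a minimal Python sketch of the count update used in GPU-style collapsed Gibbs sampling; it is an illustration, not the authors' implementation. It assumes a promotion matrix in which each term promotes itself by 1 and each semantically similar term by a hypothetical weight `mu`. The helper names (`build_promotion_matrix`, `update_counts`, `conditional`), the similarity pair, and the hyperparameter values are all illustrative assumptions, and only the topic-term counts are promoted here.

```python
import numpy as np

def build_promotion_matrix(vocab_size, similar_pairs, mu=0.3):
    # Each term promotes itself by 1; each semantically similar term
    # is promoted by mu (an assumed, hand-set weight).
    A = np.eye(vocab_size)
    for w, v in similar_pairs:
        A[w, v] = mu
        A[v, w] = mu
    return A

def update_counts(n_kw, n_k, A, w, k, sign):
    # Generalized Polya urn: assigning term w to topic k (sign=+1) also
    # adds fractional counts for terms similar to w; sign=-1 undoes it.
    n_kw[k] += sign * A[w]
    n_k[k] += sign * A[w].sum()

def conditional(n_dk, n_kw, n_k, w, alpha=0.1, beta=0.01):
    # Collapsed Gibbs conditional p(z = k | rest) for term w, computed
    # on the promoted topic-term counts.
    V = n_kw.shape[1]
    p = (n_dk + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
    return p / p.sum()

# Toy setup: 2 topics, 4 terms; terms 0 and 1 are assumed to be similar.
K, V = 2, 4
A = build_promotion_matrix(V, similar_pairs=[(0, 1)])
n_kw = np.zeros((K, V))   # topic-term counts
n_k = np.zeros(K)         # total counts per topic
n_dk = np.zeros(K)        # topic counts for the current document (not promoted)

update_counts(n_kw, n_k, A, w=0, k=1, sign=+1)  # sample term 0 into topic 1
n_dk[1] += 1
p_term1 = conditional(n_dk, n_kw, n_k, w=1)
print(n_kw[1], p_term1)
```

Term 1 receives a fractional count under topic 1 although it was never sampled there, so the sampler becomes more likely to place it in the same topic as term 0. This is how the urn scheme counteracts the sparsity of multi-word terms.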
Subjects: Library Science, Information Science >> Library Science; Submitted: 2023-08-26; Cooperative journal: 《图书情报工作》 (Library and Information Service)
Abstract: [Purpose/significance] Research on automatically selecting domain-specific stopwords for topic models of patent text is insufficient. This paper therefore proposes a new method for automatically selecting domain-specific stopwords for patent text topic modeling, in order to improve the discrimination and modeling quality of patent topic models. [Method/process] In essence, domain-specific stopwords are less important words that carry relatively little information, and they discriminate poorly between different kinds of patents. This paper therefore introduces an auxiliary multi-category patent text dataset and measures how each word is distributed across categories by its category entropy. It then selects the words with the highest category entropy as domain-specific stopwords. [Result/conclusion] Experimental results demonstrate the feasibility and validity of the proposed method, which improves the discrimination and quality of topic models for patent text analysis.
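The category-entropy criterion can be sketched in a few lines of Python as H(w) = -Σ_c p(c|w) log p(c|w), with the highest-entropy words taken as stopwords. This is an illustration under assumptions, not the paper's exact formula: word frequencies are normalized by each category's total token count before computing p(c|w) (an assumed choice so that large categories do not dominate), and the toy corpus, category names, and `top_n` cutoff are hypothetical.

```python
import math
from collections import Counter, defaultdict

def category_entropy(docs_by_category):
    # docs_by_category: {category: [token list, ...]} from an auxiliary
    # multi-category patent corpus.
    freq = defaultdict(Counter)   # word -> Counter(category -> count)
    totals = Counter()            # category -> total tokens
    for cat, docs in docs_by_category.items():
        for doc in docs:
            for tok in doc:
                freq[tok][cat] += 1
                totals[cat] += 1
    entropy = {}
    for word, per_cat in freq.items():
        # Normalize by category size (assumption), then turn into p(c|w).
        rel = [per_cat[c] / totals[c] for c in per_cat]
        z = sum(rel)
        entropy[word] = -sum((r / z) * math.log(r / z) for r in rel)
    return entropy

def select_stopwords(entropy, top_n):
    # Highest-entropy words are spread most evenly across categories.
    ranked = sorted(entropy.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]

corpus = {
    "battery": [["method", "electrode", "device"], ["method", "lithium", "electrode"]],
    "antenna": [["method", "antenna", "device"], ["method", "signal", "antenna"]],
    "drug": [["method", "compound", "device"], ["method", "dose", "compound"]],
}
H = category_entropy(corpus)
stopwords = select_stopwords(H, top_n=2)
print(stopwords)
```

In this toy corpus, "method" and "device" occur evenly in all three categories, reach the maximum entropy ln 3, and are selected, while category-specific words such as "electrode" have entropy 0 and are kept as content words.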
Subjects: Library Science, Information Science >> Library Science; Submitted: 2023-07-26; Cooperative journal: 《图书情报工作》 (Library and Information Service)
Abstract: [Purpose/significance] Chinese patent term extraction suffers from low accuracy, and it is difficult to write different pattern-matching rules for different datasets. To address these problems, this paper proposes a method for selecting Chinese patent candidate terms based on dependency syntax parsing, in order to improve the accuracy of Chinese patent term extraction. [Method/process] The method consists of three main steps: dependency syntax parsing, pruning, and dependency subtree generation. First, dependency syntax parsing is performed on the Chinese patent text to obtain dependency trees. Then, dependency subtrees are generated by removing dependency relations that do not meet the requirements. Finally, continuous word strings are selected as candidate terms, from which Chinese patent terms are extracted. [Result/conclusion] Experimental results show that, compared with existing related methods, the proposed dependency-parsing-based method effectively improves the accuracy of Chinese patent term extraction.
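The three steps can be illustrated with a hand-written dependency parse, so no parser library is needed. The parse of 锂离子电池正极材料的制备方法 ("preparation method of lithium-ion battery cathode material"), its LTP-style relation labels (ATT, RAD, HED), and the rule of keeping only attributive (ATT) edges are all hypothetical; the paper's actual pruning criteria may differ. For each node, the sketch collects its subtree over the kept edges and takes the maximal continuous word string containing that node as a candidate term.

```python
# Hypothetical parse: heads[i] is the index of token i's head (-1 = root).
tokens = ["锂离子", "电池", "正极", "材料", "的", "制备", "方法"]
heads = [1, 2, 3, 6, 3, 6, -1]
rels = ["ATT", "ATT", "ATT", "ATT", "RAD", "ATT", "HED"]

KEEP_RELATIONS = {"ATT"}  # assumed whitelist; other relations are pruned

def kept_children(heads, rels):
    # Pruning: keep only edges whose relation is in the whitelist.
    children = {i: [] for i in range(len(heads))}
    for i, (h, r) in enumerate(zip(heads, rels)):
        if h >= 0 and r in KEEP_RELATIONS:
            children[h].append(i)
    return children

def subtree(node, children):
    # Dependency subtree rooted at node, using only the kept edges.
    stack, seen = [node], set()
    while stack:
        n = stack.pop()
        seen.add(n)
        stack.extend(children[n])
    return seen

def candidate_terms(tokens, heads, rels, min_len=2):
    children = kept_children(heads, rels)
    out = []
    for node in range(len(tokens)):
        span = subtree(node, children)
        # Maximal continuous word string inside the subtree around node.
        lo = hi = node
        while lo - 1 in span:
            lo -= 1
        while hi + 1 in span:
            hi += 1
        if hi - lo + 1 >= min_len:
            term = "".join(tokens[lo:hi + 1])
            if term not in out:
                out.append(term)
    return out

candidates = candidate_terms(tokens, heads, rels)
print(candidates)
```

Pruning the RAD edge of 的 splits the head's subtree into two continuous strings, so 制备方法 is extracted separately from the nested noun phrases built on 锂离子电池.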
Subjects: Library Science, Information Science >> Library Science; Submitted: 2023-07-26; Cooperative journal: 《图书情报工作》 (Library and Information Service)
Abstract: [Purpose/significance] To help college teachers and students make full use of online recruitment information, this paper proposes a curriculum knowledge model and a method for constructing it automatically by mining large-scale online recruitment text. [Method/process] The paper proposes a three-level curriculum knowledge model with the levels "post", "curriculum", and "knowledge point", constructs it automatically using natural language text mining techniques, and verifies the construction process through experiments. [Result/conclusion] Experimental results show that the proposed model and method are highly feasible and effective, and they provide a teaching and learning reference for colleges and students.
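The three-level structure can be sketched as a nested mapping from job posts to courses to knowledge-point counts. This is a hypothetical illustration: the course-to-knowledge-point dictionary, the sample postings, and the case-insensitive keyword matching all stand in for the natural language text mining that the paper uses but does not detail here.

```python
from collections import Counter, defaultdict

# Hypothetical course -> knowledge-point dictionary (not from the paper).
COURSE_KNOWLEDGE = {
    "Database Systems": ["SQL", "indexing", "transactions"],
    "Machine Learning": ["regression", "clustering", "neural network"],
}

def build_curriculum_model(postings):
    # postings: list of (post_title, posting_text).
    # Returns post -> course -> Counter(knowledge point -> number of
    # postings that mention it).
    model = defaultdict(lambda: defaultdict(Counter))
    for post, text in postings:
        lowered = text.lower()
        for course, points in COURSE_KNOWLEDGE.items():
            for kp in points:
                if kp.lower() in lowered:
                    model[post][course][kp] += 1
    return model

postings = [
    ("Data Analyst", "Requires SQL and experience with regression and clustering."),
    ("Backend Engineer", "Familiar with SQL, indexing and transactions."),
    ("Data Analyst", "Strong SQL skills; regression modeling is a plus."),
]
model = build_curriculum_model(postings)
for post, courses in model.items():
    for course, points in courses.items():
        print(post, "->", course, "->", dict(points))
```

Aggregating counts per post lets the most frequently demanded knowledge points of each course surface for a given job, which is the kind of teaching reference the abstract describes; a real system would replace the fixed dictionary with mined terms.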