Your conditions: 周鹏程
  • Importance Based Entity Ranking for News Documents

    Subjects: Library Science, Information Science >> Library Science; Submitted: 2023-08-26; Cooperating journal: 《图书情报工作》

    Abstract: [Purpose/significance] We propose an importance-based method for entity ranking. The entities in a given document differ in importance. Much existing research focuses on documents or on entities, for example text categorization and entity linking, but few studies address the importance of entities within a document. This research has both theoretical and practical value. [Method/process] Given a document consisting of words and entities, our method computes the relative importance of each entity in the document and then ranks the entities by their importance with respect to that document. We run experiments on the Sogou News dataset and evaluate the results with metrics such as NDCG and the inverse pair rate. [Result/conclusion] Experimental results show that the methods based on entity frequency, TF*IDF, distribution entropy and TextRank perform comparatively well, whereas the method based on the clustering coefficient does not work well. In terms of NDCG, the TF*IDF method gives the best result at 95.86%; in terms of the inverse pair rate, the ensemble method gives the best result at 84.46%.
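As a rough illustration of two ideas named in the abstract above, the following sketch ranks the entities of one document by TF*IDF and scores a ranking with NDCG. It is not the paper's implementation: the smoothed IDF formula, the log2 discount, the function names and the toy corpus are all assumptions made for this example, and the inverse pair rate is left out because the abstract does not define it.

```python
import math
from collections import Counter


def tfidf_rank(doc_entities, corpus):
    """Rank the entities of one document by TF*IDF (assumed smoothed variant).

    doc_entities: list of entity mentions in the document (repeats allowed).
    corpus: list of documents, each a list of entity mentions.
    """
    n_docs = len(corpus)
    df = Counter()
    for entities in corpus:
        df.update(set(entities))  # document frequency counts each doc once
    tf = Counter(doc_entities)
    scores = {
        e: (count / len(doc_entities))
        * (math.log((1 + n_docs) / (1 + df[e])) + 1)
        for e, count in tf.items()
    }
    # Highest score first; ties broken by name so the output is deterministic.
    return sorted(scores, key=lambda e: (-scores[e], e))


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranked_gains):
    """NDCG of a ranking, given the true importance grade at each rank."""
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


# Toy data (hypothetical): entity "A" is frequent here and rare elsewhere.
corpus = [["A", "B", "A"], ["B", "C"], ["B", "D"]]
ranking = tfidf_rank(corpus[0], corpus)
gold = {"A": 3, "B": 1}  # made-up human importance grades
score = ndcg([gold[e] for e in ranking])
```

Here `ranking` places "A" ahead of "B", which matches the made-up gold grades, so `score` reaches its maximum; reversing the ranking lowers it.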

  • Short-Text Entity Linking Based on Multiple Knowledge Bases: A Case Study of Wikipedia and Freebase

    Subjects: Library Science, Information Science >> Information Science; Submitted: 2017-10-11; Cooperating journal: 《数据分析与知识发现》

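    Abstract: [Objective] Perform entity linking over multiple knowledge bases to address the low coverage of entity linking based on a single knowledge base. [Methods] The method first generates the n-grams of the text and obtains candidate mentions using part-of-speech information and several mention-entity dictionaries. It then generates mention combinations and keeps those with maximal coverage that are not contained in any other combination. Next, it generates candidate entity sequences and computes the relatedness of each sequence using information from the multiple knowledge bases. Finally, it selects the entity sequence with the highest relatedness as the result. [Results] Experiments using Wikipedia and Freebase show that entity linking based on Wikipedia+Freebase reaches a precision, recall and F-value of 71.81%, 76.86% and 74.25%, respectively. [Limitations] Filtering n-grams by part of speech lacks a theoretical basis, and the FACC1 dataset is characterized by high precision and low recall. [Conclusions] Using entity information from multiple knowledge bases can improve entity linking performance.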
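The first step of the method above (n-gram generation plus lookup in several mention-entity dictionaries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it splits on whitespace, skips the part-of-speech filter and the later combination and relatedness steps, and the dictionaries and entity identifiers are toy values invented for the example. The small `f_measure` helper assumes the reported F-value is the harmonic mean of precision and recall, which agrees with the reported figures.

```python
def ngrams(tokens, max_n=3):
    """Yield (start, end, text) for every n-gram of up to max_n tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, " ".join(tokens[i:i + n])


def candidate_mentions(tokens, dictionaries, max_n=3):
    """Map each n-gram found in any dictionary to its candidate entities.

    dictionaries: {kb_name: {lowercased mention: [entity ids]}}.
    Returns {(start, end, text): {(kb_name, entity_id), ...}}.
    """
    found = {}
    for start, end, text in ngrams(tokens, max_n):
        for kb_name, mention_dict in dictionaries.items():
            for entity in mention_dict.get(text.lower(), ()):
                found.setdefault((start, end, text), set()).add((kb_name, entity))
    return found


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F)."""
    return 2 * precision * recall / (precision + recall)


# Toy dictionaries with made-up identifiers (hypothetical, for illustration).
dictionaries = {
    "wikipedia": {"new york": ["wiki:toy_new_york"]},
    "freebase": {"new york": ["fb:toy_1"], "knicks": ["fb:toy_2"]},
}
mentions = candidate_mentions("New York Knicks".split(), dictionaries)
reported_f = round(f_measure(71.81, 76.86), 2)
```

In this toy run, "Knicks" is found only through the second dictionary, which is the kind of coverage gain the abstract attributes to combining knowledge bases.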