Your conditions: 王家亮
  • Multiple schema matching method based on SimHash and mixed similarity

    Subjects: Computer Science >> Integration Theory of Computer Science. Submitted: 2018-11-29. Cooperative journal: 《计算机应用研究》 (Application Research of Computers)

    Abstract: Multiple schema matching during the integration of multi-source heterogeneous civil aviation passenger service data suffers from low efficiency, low accuracy, and the difficulty of obtaining complete schema information. To address these problems, this paper proposes a multiple schema matching method based on SimHash and mixed similarity. First, the method computes the weights of feature units using pointwise mutual information (PMI) and generates a SimHash signature for each column, which represents the attribute's features while reducing the feature dimension. Next, it applies K-means++ to cluster the columns into candidate matching sets. Finally, it constructs an attribute mapping graph from the mixed similarity between attributes, which displays the matching relationships between attributes intuitively. At the same time, the method improves the efficiency of multiple schema matching. Experimental results verify the feasibility of the proposed method, which offers a new solution for efficiently resolving schema conflicts when integrating multi-source heterogeneous civil aviation passenger service data.
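
    The signature step described in the abstract can be illustrated with a minimal Python sketch of weighted SimHash. This is not the paper's implementation: the function names (`token_hash`, `simhash`, `hamming`, `column_weights`), the 64-bit signature length, the MD5-based token hash, and the whitespace tokenization are all assumptions made for illustration. In particular, `column_weights` uses plain term frequency as a stand-in for the PMI-based weights, which the abstract does not specify in detail.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable


def token_hash(token: str, bits: int = 64) -> int:
    """Hash a feature unit to a `bits`-wide integer (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(weights: Dict[str, float], bits: int = 64) -> int:
    """Weighted SimHash: each feature adds +w or -w to every bit position."""
    acc = [0.0] * bits
    for token, w in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            acc[i] += w if (h >> i) & 1 else -w
    signature = 0
    for i, value in enumerate(acc):
        if value > 0:  # positive total means the bit is set in the signature
            signature |= 1 << i
    return signature


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def column_weights(values: Iterable[str]) -> Dict[str, float]:
    """Hypothetical stand-in for PMI weights: raw term frequency per token."""
    counts = Counter(tok for v in values for tok in v.lower().split())
    return {tok: float(c) for tok, c in counts.items()}


# Two hypothetical columns holding the same values in a different order.
col_a = ["Beijing Capital", "Shanghai Pudong", "Guangzhou Baiyun"]
col_b = ["Shanghai Pudong", "Beijing Capital", "Guangzhou Baiyun"]
sig_a = simhash(column_weights(col_a))
sig_b = simhash(column_weights(col_b))
distance = hamming(sig_a, sig_b)
```

    Columns with the same weighted features receive identical signatures, so their Hamming distance is zero; columns with mostly overlapping features tend to differ in only a few bits. Fixed-length signatures of this kind could then be clustered (for example with K-means++, as the abstract describes) to form candidate matching sets.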