• The Efficient Indexing and Fusion Algorithms for Large-scale Catalogs Based on File

    Subjects: Astronomy; Submitted: 2023-12-13; Cooperative journal: Progress in Astronomy (《天文学进展》)

    Abstract: Transient source searching, which aims to discover changing objects in the sky, requires wide-field telescopes to survey the sky continuously. After sources are extracted from the images, they are cross-matched against existing large-scale catalogs to determine which objects have changed. This step must be very fast; otherwise it slows down the data processing and makes real-time discovery impossible. However, current reference catalogs such as SDSS, Gaia, and Pan-STARRS contain billions of objects, and it is very difficult to complete the cross-matching step within seconds using traditional methods. In this paper, we propose a solution for the fast retrieval of catalogs of hundreds of GB or even TB in size with limited memory. Catalogs are indexed as individual files rather than in a database. We introduce a multi-resolution dynamic splitting algorithm based on HEALPix, which divides the catalogs into files of appropriate and roughly uniform size according to the density of objects in different sky regions. A search scheme is designed to work with this algorithm. To improve file reading speed, we create an intermediate storage mechanism for saving the files, based on Protocol Buffers, an open-source component. The Peano-Hilbert curve is also used in place of HEALPix's original Z-curve to traverse the catalog quickly. Tests show that this effectively improves the cache hit ratio and the data fusion efficiency on large-scale catalogs. With these improvements, searching and cross-matching billions of objects on limited hardware becomes feasible. Our solution will help implement real-time detection of changing objects and support other rapid detection projects.
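The density-driven splitting idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `split_adaptive` and `find_tile`, the threshold `max_per_file`, and the stopping rule are all assumptions made for illustration. It relies only on the standard property of the HEALPix NESTED scheme that pixel p at order k has children 4p to 4p+3 at order k+1, so each source is represented by its NESTED pixel index at the finest order.

```python
from bisect import bisect_left


def split_adaptive(deep_pixels, max_order, max_per_file, min_order=0):
    """Hypothetical sketch: split the sky into multi-resolution tiles.

    deep_pixels: NESTED pixel index at `max_order`, one entry per source.
    Returns {(order, pixel): source_count}; each key stands for one file.
    """
    pix_sorted = sorted(deep_pixels)
    tiles = {}

    def count_in(order, pix):
        # Descendants of a coarse NESTED pixel form a contiguous index
        # range at the finest order, so two binary searches suffice.
        shift = 2 * (max_order - order)
        lo = bisect_left(pix_sorted, pix << shift)
        hi = bisect_left(pix_sorted, (pix + 1) << shift)
        return hi - lo

    # Start from all pixels at the coarsest order (12 base pixels at order 0).
    stack = [(min_order, p) for p in range(12 * 4 ** min_order)]
    while stack:
        order, pix = stack.pop()
        n = count_in(order, pix)
        if n == 0:
            continue  # empty sky region: no file needed
        if n > max_per_file and order < max_order:
            # Too dense: subdivide into the four NESTED children.
            stack.extend((order + 1, 4 * pix + c) for c in range(4))
        else:
            tiles[(order, pix)] = n
    return tiles


def find_tile(tiles, deep_pixel, max_order, min_order=0):
    """Return the (order, pixel) tile covering a finest-order pixel, or None."""
    for order in range(min_order, max_order + 1):
        key = (order, deep_pixel >> (2 * (max_order - order)))
        if key in tiles:
            return key
    return None


# Toy example: a dense clump in pixel 5, sparse sources elsewhere.
sources = [5] * 10 + [6, 100]
tiles = split_adaptive(sources, max_order=2, max_per_file=4)
```

In this toy input the dense region is split down to the finest order while the sparse region stays a single coarse tile, which is the behaviour the abstract describes: file sizes are kept roughly uniform by letting the resolution follow the source density. A lookup then walks from coarse to fine orders until it hits an existing tile.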