Subjects: Other Disciplines >> Synthetic discipline; Submitted: 2023-03-19; Cooperative journal: Bulletin of Chinese Academy of Sciences (《中国科学院院刊》)
Abstract: As modern scientific discovery depends heavily on big data management, research into how to manage scientific big data efficiently has become an urgent task. In this paper, we first introduce the application scenarios and requirements of scientific big data. We then summarize four challenges in scientific big data management (SPUS): Scale dynamics, Pipeline management, Unified access, and Sharing management. Next, we present our proposed scientific big data management system, which consists of four components: computing and storage management, data processing management, data fusion management, and data sharing management, and we describe the key techniques it uses. Finally, we introduce the ongoing Big Scientific Data Management System (BigSDMS) program, which is a national key research and development program.
Subjects: Computer Science >> Computer Application Technology; Submitted: 2020-07-20
Abstract:
Peer Review Status: Awaiting Review
Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-05-20; Cooperative journal: Application Research of Computers (《计算机应用研究》)
Abstract: Accurately estimating the write latency of NWR databases under various consistency levels can guide the construction and operation of database clusters by identifying the combination of cluster size and replication factor that minimizes building and operating costs. Existing approaches based on benchmarking or queue simulation give only incomplete results, because they are limited to specific configurations and testbeds. This paper presents the first closed-form analysis of the (n, r, k) fork-join queueing process of write operations in Cassandra (a typical NWR database) and, based on this analysis, proposes the first theoretical write latency model for NWR databases. The model yields more comprehensive latency results. Experiments on simulated queues and on a Cassandra cluster validate the closed-form analysis of (n, r, k) fork-join queues and the write latency model, respectively.
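To make the (n, r, k) fork-join write process more concrete, here is a minimal Monte Carlo sketch. It is not the paper's closed-form model; it is the kind of queue simulation the abstract contrasts with that model. It reads n as the cluster size, r as the replication factor, and k as the number of replica acknowledgements required by the consistency level, which is our interpretation of the notation. It also assumes Poisson write arrivals, replicas chosen uniformly at random, and first-come-first-served nodes with exponentially distributed service times. The function `simulate_fork_join_write_latency` and all of its parameters are hypothetical.

```python
import heapq
import random


def simulate_fork_join_write_latency(n, r, k, arrival_rate, service_rate,
                                     num_writes=100_000, seed=0):
    """Estimate the mean write latency of an (n, r, k) fork-join queue.

    Hypothetical sketch under illustrative assumptions (not from the paper):
    writes arrive as a Poisson process; each write is forked to r distinct
    nodes out of n, chosen uniformly at random; each node serves its
    sub-requests first-come-first-served with exponential service times;
    a write completes (joins) once k of its r replicas have finished.
    """
    if not 1 <= k <= r <= n:
        raise ValueError("require 1 <= k <= r <= n")
    rng = random.Random(seed)
    node_free_at = [0.0] * n  # time at which each node's FCFS queue drains
    t = 0.0
    total_latency = 0.0
    for _ in range(num_writes):
        t += rng.expovariate(arrival_rate)       # arrival time of the next write
        replicas = rng.sample(range(n), r)       # fork: r distinct replica nodes
        finish_times = []
        for node in replicas:
            # The sub-request waits behind earlier work queued at this node.
            start = max(t, node_free_at[node])
            finish = start + rng.expovariate(service_rate)
            node_free_at[node] = finish
            finish_times.append(finish)
        # Join: the write is acknowledged when the k-th replica finishes.
        kth_finish = heapq.nsmallest(k, finish_times)[-1]
        total_latency += kth_finish - t
    return total_latency / num_writes


if __name__ == "__main__":
    # Compare consistency levels k = 1..3 on a 6-node cluster with r = 3.
    for k in (1, 2, 3):
        light = simulate_fork_join_write_latency(6, 3, k, 0.01, 1.0)
        heavy = simulate_fork_join_write_latency(6, 3, k, 1.0, 1.0)
        print(f"k={k}: light load {light:.3f}, heavy load {heavy:.3f}")
```

Because every node is FCFS and writes are processed in arrival order, each node's start time follows the Lindley recursion `max(arrival, node_free_at)`. Under very light load, queueing is negligible, so the estimate should approach the expected k-th order statistic of r exponential service times, (1/μ)·Σ_{i=0}^{k−1} 1/(r−i). Higher arrival rates add queueing delay on top of that value.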