Selected filter: Computer Software
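  • Construction and Research of a Gansu Dialect Database

    Category: Computer Science >> Computer Software  Submitted: 2024-06-12

    Abstract: This paper discusses the importance, current state, and future trends of dialect databases. It first introduces the concept of a dialect database: a corpus-like resource built by digitizing, annotating, and classifying the dialects spoken across China. It then analyzes the current state of dialect databases in China, including the construction of dialect speech databases and dialect document databases, as well as progress in dialect digitization and dialect translation technology. Next, it forecasts future trends, including the use of big data and cloud computing, deep learning, and blockchain technology, along with new research methods and technical updates. The paper places particular emphasis on research into and construction of a Gansu dialect database, covering the steps and key technical points of building it as well as its research outcomes and significance. Overall, research on and development of dialect databases is vital for protecting dialect culture, promoting the national language, and advancing dialect studies.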

  • Guiding Large Language Models to Generate Computer-Parsable Content

    Category: Computer Science >> Computer Software  Submitted: 2024-04-23

    Abstract: We propose a method that guides Large Language Models (LLMs) to generate structured content adhering to specific conventions without fine-tuning. Using coroutine-based content-generation constraints derived from a pre-agreed context-free grammar (CFG), LLMs are steered during decoding to produce outputs that conform to a formal language. This improves the stability and consistency of generated target data structures, types, or instructions and reduces application-development complexity. In a matching-brackets experiment, the error rates of GPT-2 and Gemma exceed 95% for DSLs longer than 36 and 282 tokens, respectively. We introduce YieldLang, a coroutine-based DSL generation framework, and evaluate it with LLMs on various tasks, including JSON and Mermaid flowchart generation. Compared with the baselines, our approach raises accuracy to 1.09 to 11.6 times the baseline level, and in the best case LLMs need only about 16.5% of the baseline number of samples to generate JSON. This makes LLM-generated content more usable by computer programs.

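  • Guiding Large Language Models to Generate Computer-Parsable Content

    Category: Computer Science >> Computer Software  Category: Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics  Submitted: 2024-04-21

    Abstract: These slides present the research "Guiding Large Language Models to Generate Computer-Parsable Content" in six parts: background, motivation, method, results, outlook, and acknowledgments. For the full paper, see: https://arxiv.org/abs/2404.05499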

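  • Guiding Large Language Models to Generate Computer-Parsable Content

    Category: Computer Science >> Computer Software  Category: Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics  Submitted: 2024-04-07

    Abstract: Large Language Models (LLMs) can learn patterns from the context of large corpora, including relationships between words, sentence structure, and even more complex semantic and pragmatic information. However, getting pretrained language models to generate structured content that strictly follows agreed conventions remains a challenge. This paper proposes a scheme for guiding LLMs to generate content that computers can reliably use. It requires no fine-tuning and no additional neural-network inference. A pre-agreed context-free grammar (CFG) introduces a coroutine-based constraint mechanism on content generation. During the decoding phase of the autoregressive Transformer, this mechanism guides the model to sample correct tokens, so that the output forms a formal language conforming to the program's conventions. This effectively improves the stability and consistency with which LLMs generate target data structures, types, or instructions, and lowers the difficulty of application development and integration. Using a "matching bracket pairs" experiment, the authors first show that models such as GPT-2 and Gemma reach error rates of 95% once the generated DSL is longer than 36 and 282, respectively, which illustrates current LLMs' performance problems in generating specific DSLs. The authors also propose YieldLang, a coroutine-based DSL generation framework, and run experiments with LLMs on several task datasets, including JSON, Mermaid flowchart, and function-call expression generation. These experiments show that, compared with the baselines, the method raises accuracy to between 109% and 1160% of the baseline value and, in the best case, reduces the number of samples LLMs need to generate JSON to about 16.5% of the baseline. This effectively improves the usability of LLM-generated content for computer programs.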
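
    The constrained-decoding idea above can be sketched on the paper's own "matching bracket pairs" task. This is a toy illustration under stated assumptions, not YieldLang's API. The following are all hypothetical: the five-token vocabulary, the names `bracket_grammar`, `constrained_decode`, and `naive_model`, and the use of greedy selection instead of sampling. The grammar runs as a Python generator, i.e. a coroutine, that yields the tokens allowed next and receives the chosen token through `send()`. The decoder masks the model's scores to that set.

```python
import random

CLOSE = {"(": ")", "[": "]"}  # opener -> matching closer (toy grammar, hypothetical)


def bracket_grammar(max_len):
    """Coroutine: yields the list of allowed next tokens, receives the chosen one via send()."""
    assert max_len >= 2, "need room for at least one bracket pair"
    stack, length = [], 0
    while True:
        remaining = max_len - length
        allowed = []
        # Open only if there is still room to close this bracket and every open one.
        if remaining - 1 >= len(stack) + 1:
            allowed += ["(", "["]
        # The only legal closer is the one matching the most recent opener.
        if stack:
            allowed.append(CLOSE[stack[-1]])
        # End of sequence only when everything is closed and something was emitted.
        if not stack and length > 0:
            allowed.append("<eos>")
        tok = yield allowed
        length += 1
        if tok in CLOSE:
            stack.append(tok)
        else:
            stack.pop()


def constrained_decode(score_fn, max_len=8):
    """Greedy decoding in which the model's scores are masked to grammar-allowed tokens."""
    grammar = bracket_grammar(max_len)
    allowed = next(grammar)
    out = []
    while True:
        scores = score_fn(out)  # unconstrained scores for every vocabulary token
        tok = max(allowed, key=lambda t: scores[t])
        if tok == "<eos>":
            return "".join(out)
        out.append(tok)
        allowed = grammar.send(tok)


def naive_model(prefix):
    # Hypothetical stand-in for an LLM that always prefers opening another "(".
    return {"(": 3.0, "[": 2.0, ")": 1.0, "]": 0.5, "<eos>": 0.0}


def random_model(seed):
    # Hypothetical stand-in that assigns random scores to every token.
    rng = random.Random(seed)
    return lambda prefix: {t: rng.random() for t in ["(", ")", "[", "]", "<eos>"]}


def is_balanced(s):
    stack = []
    for ch in s:
        if ch in CLOSE:
            stack.append(ch)
        elif not stack or CLOSE[stack.pop()] != ch:
            return False
    return not stack


print(constrained_decode(naive_model))  # prints (((())))
```

    Unconstrained, `naive_model` would emit "(" forever. The mask removes only grammar-violating tokens, so it is forced to close its brackets within `max_len`.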

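  • Fault Diagnosis of Industrial Process Control Loops Based on Graph Neural Networks

    Category: Computer Science >> Computer Software  Submitted: 2024-01-07

    Abstract: This paper proposes a graph-neural-network-based method for fault diagnosis of industrial process control loops. By monitoring the output signals of the loop's sensors, the graph neural network captures abnormal behavior in the loop and automatically diagnoses the type of loop fault. Experimental results show that the method detects loop faults efficiently and achieves high accuracy in both single-fault and multi-fault scenarios. The method provides a reliable fault-diagnosis scheme for industrial process control and has significant value for practical industrial applications.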
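
    The abstract does not describe the network architecture. The following is therefore only a generic sketch of one graph-convolution step, the symmetric-normalized propagation rule of Kipf and Welling's GCN, applied to a hypothetical three-sensor loop graph. The sensor graph, the [mean, std] features, and the weights are made up for illustration. A real diagnoser would learn `weight` and feed the output of `readout` to a fault classifier.

```python
import math


def gcn_layer(adj, features, weight):
    """One graph-convolution step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency between sensors; features: n x f; weight: f x o.
    """
    n = len(adj)
    # Self-loops let each sensor keep its own signal during aggregation.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f_in = len(features[0])
    # Each sensor aggregates normalised features from itself and its neighbours.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f_in)]
           for i in range(n)]
    f_out = len(weight[0])
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f_in))) for o in range(f_out)]
            for i in range(n)]


def readout(h):
    """Mean-pool node embeddings into one graph-level vector (input to a fault classifier)."""
    return [sum(col) / len(h) for col in zip(*h)]


# Hypothetical loop: setpoint -> controller output -> process variable, as a chain graph.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Hypothetical per-sensor features over a time window: [mean, std].
features = [[1.0, 0.1], [0.8, 0.5], [0.9, 0.9]]
weight = [[0.5, -0.2], [1.0, 0.3]]
graph_vec = readout(gcn_layer(adj, features, weight))
print(graph_vec)
```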

  • HS-ES-DE: HS-ES Followed by L-SHADE-EpSin for Real Parameter Single Objective Optimization

    Category: Computer Science >> Computer Software  Submitted: 2022-12-07

    Abstract: For real-parameter single-objective optimization, Differential Evolution (DE) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) both perform strongly. Nevertheless, no single algorithm in this field can perform well on all fitness landscapes, and practice has shown that ensembles of different algorithms can improve solution quality. In this paper, building on two well-known population-based metaheuristics, LSHADE-EpSin and HS-ES, we propose HS-ES-DE, an ensemble whose constituent algorithms are executed in succession: HS-ES is replaced by L-SHADE-EpSin once stagnation is detected. Besides HS-ES-DE, 12 population-based metaheuristics are included in our experiments, which employ three benchmark test suites. Experimental results show that our algorithm is highly competitive.
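
    The switching idea can be sketched as follows: run one optimizer until stagnation, then hand over to DE. Both phases are deliberately simple stand-ins, not HS-ES or L-SHADE-EpSin. The first phase is a (1+1)-ES with the 1/5th success rule, and the second is classic DE/rand/1/bin. Several elements are assumptions: the stagnation test (improvement below `tol` over the last `window` evaluations), the sphere objective, and all parameter values.

```python
import random


def sphere(x):
    return sum(v * v for v in x)


def es_then_de(f, dim, budget=20000, window=200, tol=1e-8, seed=1):
    """Run a (1+1)-ES until stagnation is detected, then hand over to DE/rand/1/bin."""
    rng = random.Random(seed)
    # Phase 1: (1+1)-ES with the 1/5th success rule (a simple stand-in for HS-ES).
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx, sigma, evals = f(x), 1.0, 1
    history = [fx]
    while evals < budget:
        y = [xi + sigma * rng.gauss(0, 1) for xi in x]
        fy = f(y)
        evals += 1
        if fy < fx:
            x, fx = y, fy
            sigma *= 1.5
        else:
            sigma *= 1.5 ** -0.25  # step size is stable at a 1/5 success rate
        history.append(fx)
        # Stagnation: best value improved by less than tol over the last `window` evaluations.
        if len(history) > window and history[-window - 1] - fx < tol:
            break
    switch_at, f_switch = evals, fx

    # Phase 2: DE/rand/1/bin (a simple stand-in for L-SHADE-EpSin), seeded with the ES best.
    pop_size, F, CR = 10 * dim, 0.5, 0.9
    pop = [x] + [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size - 1)]
    fit = [fx] + [f(p) for p in pop[1:]]
    evals += pop_size - 1
    while evals < budget:
        for i in range(pop_size):
            if evals >= budget:
                break
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                     if rng.random() < CR or d == j_rand else pop[i][d]
                     for d in range(dim)]
            f_trial = f(trial)
            evals += 1
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], switch_at, f_switch


best_x, best_f, switch_at, f_switch = es_then_de(sphere, dim=5)
print(switch_at, f_switch, best_f)
```

    The DE population is seeded with the ES solution, and DE uses greedy one-to-one selection. Together these guarantee that the second phase never returns a worse best value than the first.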

  • Zero-knowledge Based Proof-chain: A methodology for blockchain-partial system

    Category: Computer Science >> Computer Software  Submitted: 2022-07-07

    Abstract: Intuitively, there is a sharp distinction between purely decentralized blockchain systems such as DeFi applications and systems that use blockchain only as an enhancing technology. The latter remain centralized, with real-world business models and conventional technologies such as databases and application servers. Our study explores this distinction extensively from a methodological point of view. It classifies such systems as blockchain-complete or blockchain-partial, analyzes the key features of each type, and reveals the root cause of the distinction. We analyze the function, or more strongly, the ultimate purpose, of blockchain in blockchain-partial systems. We then present a conceptual model, which we name proof-chain, that satisfactorily captures the general paradigm of blockchain in blockchain-partial systems. A universal tension between the strength of a proof-chain and privacy is then revealed, and from it the zero-knowledge-based proof-chain takes shape. Several case studies demonstrate the explanatory power of the proof-chain methodology. We then apply the methodology to analyze the ecosystem of a group of collaborating blockchain-partial systems, representing the paradigm of public and private data domains whose border the proof-chain crosses. Finally, several guidelines derived from the methodology demonstrate its usefulness.
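
    The proof-chain idea is a chain that stores proofs of off-chain business records rather than the records themselves. As a rough illustration, here is a minimal sketch using salted SHA-256 commitments. This is an assumption-laden simplification, not the paper's model, and it is not zero-knowledge. Here a verifier must be given the record and salt, whereas a zero-knowledge proof would let it check a property of the record without seeing it. The `ProofChain` class and its methods are hypothetical.

```python
import hashlib
import os


def commit(record: bytes, salt: bytes) -> str:
    """Salted SHA-256 commitment: binds to the record without revealing it on-chain."""
    return hashlib.sha256(salt + record).hexdigest()


class ProofChain:
    """Hypothetical append-only chain that stores only commitments, each linked to its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []  # list of (prev_hash, commitment, block_hash)

    def append(self, commitment: str) -> str:
        prev = self.blocks[-1][2] if self.blocks else self.GENESIS
        block_hash = hashlib.sha256((prev + commitment).encode()).hexdigest()
        self.blocks.append((prev, commitment, block_hash))
        return block_hash

    def verify_chain(self) -> bool:
        # Recompute every link; any edited commitment or reordered block breaks it.
        prev = self.GENESIS
        for p, c, h in self.blocks:
            if p != prev or hashlib.sha256((p + c).encode()).hexdigest() != h:
                return False
            prev = h
        return True

    def verify_record(self, index: int, record: bytes, salt: bytes) -> bool:
        # The centralized system discloses (record, salt) only to the party that needs the proof.
        return self.blocks[index][1] == commit(record, salt)


# The business database keeps records and salts off-chain; the chain only sees commitments.
chain = ProofChain()
salt = os.urandom(16)
chain.append(commit(b"order#1: 100 units", salt))
```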

  • A Creativity Survey of Distributed Database System

    Category: Computer Science >> Computer Software  Submitted: 2022-03-10

    Abstract: Distributed database systems are widely used owing to the rapid development of the Internet. As demand keeps growing, boosting performance and minimizing resource and data contention become key considerations. A good distributed physical design helps here: it determines where to place data and which data items to replicate and partition. This paper classifies the development of physical design, based on Michaels' work and its references, in terms of research problems, research methods, and measurement methods. Finally, we put forward some suggestions for future research.
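
    The sketch below makes "where to place data, and which data items to replicate and partition" concrete with one common baseline scheme: hash partitioning with replicas on the following nodes. It is an illustrative assumption, not a design taken from the surveyed work. The node names, key format, and replication factor are hypothetical.

```python
import hashlib
from collections import Counter


def placement(key: str, nodes: list, replicas: int = 2) -> list:
    """Hash-partition `key` onto a primary node and replicate it on the next replicas-1 nodes."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    # Use a stable hash: Python's built-in hash() of str is salted per process.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    primary = h % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas)]


def load_per_node(keys, nodes, replicas=2):
    """Count how many data items (primary or replica) each node stores."""
    return Counter(n for k in keys for n in placement(k, nodes, replicas))


nodes = ["db0", "db1", "db2", "db3"]
keys = [f"customer:{i}" for i in range(1000)]
print(load_per_node(keys, nodes))
```

    Placing replicas on consecutive nodes keeps lookup trivial. The trade-off is that adding a node remaps most keys under plain modulo hashing, one of the placement costs such surveys weigh.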

  • A Creativity Survey of Fully Dynamic Maximal Independent Set in Expected Poly-log Update Time

    Category: Computer Science >> Computer Software  Submitted: 2022-02-24

    Abstract: This paper focuses on research into the Maximal Independent Set (MIS) problem. Based on reading and analyzing several recent papers, we classify MIS problems in several ways. The first classification is by research object, covering the computation and the maintenance of an MIS. The second is by research method, covering serial, parallel, deterministic, and randomized algorithms. The third is by complexity analysis, covering worst-case and expected time complexity.
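
    The sketch below illustrates the difference between computing and maintaining an MIS. It builds an MIS with the serial deterministic greedy algorithm, then repairs it after each edge insertion or deletion. The repair rule is a simple local fix chosen for illustration. Its cost is proportional to a vertex degree; it does not achieve the expected poly-logarithmic update time of the randomized algorithms the survey covers. Function names are hypothetical.

```python
import random


def greedy_mis(adj):
    """Serial deterministic MIS: scan vertices in order, keep each one with no chosen neighbour."""
    mis = set()
    for v in sorted(adj):
        if not adj[v] & mis:
            mis.add(v)
    return mis


def is_mis(adj, mis):
    independent = all(not (adj[v] & mis) for v in mis)
    maximal = all(v in mis or adj[v] & mis for v in adj)
    return independent and maximal


def _fill(adj, mis, candidates):
    """Add every candidate that is no longer dominated by a vertex of the MIS."""
    for w in sorted(candidates):
        if w not in mis and not adj[w] & mis:
            mis.add(w)


def insert_edge(adj, mis, u, v):
    adj[u].add(v)
    adj[v].add(u)
    if u in mis and v in mis:          # the new edge breaks independence
        loser = max(u, v)              # arbitrary deterministic tie-break
        mis.discard(loser)
        _fill(adj, mis, adj[loser])    # only the loser's neighbours can become undominated


def delete_edge(adj, mis, u, v):
    adj[u].discard(v)
    adj[v].discard(u)
    _fill(adj, mis, (u, v))            # an endpoint may have lost its only MIS neighbour


rng = random.Random(42)
adj = {v: set() for v in range(12)}
mis = greedy_mis(adj)                  # empty graph: every vertex is in the MIS
for _ in range(300):
    u, v = rng.sample(range(12), 2)
    if v in adj[u]:
        delete_edge(adj, mis, u, v)
    else:
        insert_edge(adj, mis, u, v)
    assert is_mis(adj, mis)
```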

  • From simple digital twin to complex digital twin Part I: A novel modeling method for multi-scale and multi-scenario digital twin

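    Category: Mechanical Engineering >> Other Disciplines of Mechanical Engineering  Category: Computer Science >> Computer Software  Submitted: 2022-02-16

    Abstract: In recent years, digital twins have attracted wide attention and are becoming increasingly complex. Most current digital-twin cases focus on a single, specific scenario. For multi-level, multi-scenario working environments, and even for interacting and coupled models, methods for building complex digital twins are still lacking. This paper proposes a standardized method for modeling complex digital twins based on model segmentation and assembly. First, the complex digital-twin model is divided into several simple models according to the 4C architecture: hierarchy level (Composition), scenario (Context), component (Component), and code (Code). Composition and Context focus the digital twin on the relevant elements at a specific scale and in a specific scenario, while Component and Code are used to develop the simple digital-twin models. Second, the simple models are assembled into a complex model through information fusion, multi-scale association, and multi-scenario interaction. An ontology model builds a complete information base of the entities in different digital twins. A knowledge graph bridges the relationships between digital twins at different scales. Scenario iteration enables behavioral interaction and improves the accuracy of computed results. The paper provides a practical method for building complex digital-twin models and supports reuse of components and code to speed up digital-twin development.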

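  • Frontier Exploration: Monte Carlo Adaptive Light-Field Sampling and Reconstruction Using Deep Reinforcement Learning

    Category: Computer Science >> Computer Software  Submitted: 2022-01-01

    Abstract: When Monte Carlo methods are used to render global illumination, too few path-traced samples leave the rendered image full of noise, which seriously limits its usability. One solution is to guide how path tracing generates Monte Carlo samples during sampling, improving the quality of the final rendering; this approach optimizes the sampling process itself. This paper explores and summarizes recent advances in using deep reinforcement learning for Monte Carlo adaptive light-field sampling and reconstruction.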
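
    The deep-reinforcement-learning methods surveyed here learn where to spend samples. As a point of reference, the sketch below uses a much simpler, non-learned heuristic. It takes a few samples per pixel and estimates each pixel's noise from their standard deviation. It then gives the remaining budget to the noisiest pixels. Two choices are assumptions: modeling a pixel as a 1-D integrand on [0, 1), and the budget values.

```python
import random
import statistics


def render_adaptive(pixels, base_spp=8, extra_budget=256, seed=0):
    """Two-pass Monte Carlo estimate per pixel, spending extra samples where noise is highest.

    pixels: list of integrands f(u) on [0, 1); each stands in for one pixel's light transport.
    """
    rng = random.Random(seed)
    # Pass 1: a uniform number of samples per pixel.
    samples = [[f(rng.random()) for _ in range(base_spp)] for f in pixels]
    # Estimate each pixel's noise level from its first-pass samples.
    noise = [statistics.pstdev(s) for s in samples]
    total = sum(noise)
    # Pass 2: split the extra budget in proportion to estimated noise (floored; remainder unused).
    for i, f in enumerate(pixels):
        share = int(extra_budget * noise[i] / total) if total > 0 else extra_budget // len(pixels)
        samples[i].extend(f(rng.random()) for _ in range(share))
    estimates = [statistics.fmean(s) for s in samples]
    counts = [len(s) for s in samples]
    return estimates, counts


# Hypothetical scene: a flat pixel (zero variance) and a pixel whose radiance varies.
flat = lambda u: 0.5
ramp = lambda u: u
estimates, counts = render_adaptive([flat, ramp])
print(counts)  # prints [8, 264]
```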

  • Resonance Algorithm: A New Look at the Shortest Path Problem

    Category: Mathematics >> Applied Mathematics  Category: Computer Science >> Computer Software  Category: Information Science and Systems Science >> Other Disciplines of Information and Systems Science  Submitted: 2021-10-11

    Abstract: The shortest path problem (SPP) is a classic problem that appears in a wide range of applications. Although many algorithms already exist, new advances are still being made, mostly tuned to particular scenarios for better performance. As a result, these algorithms have become increasingly complex and sophisticated. Here we develop a novel nature-inspired algorithm, the Resonance Algorithm (RA), which computes all possible shortest paths between two nodes in a graph and is surprisingly simple and intuitive. Besides being simple, RA turns out to be much more time-efficient on large-scale graphs than Dijkstra's algorithm extended to return all possible shortest paths. Moreover, RA handles undirected, directed, and mixed graphs, whether or not they contain loops and whether their edges are unweighted or positively weighted, and it can be implemented in a fully decentralized manner. These properties give RA a wide range of potential applications.
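
    The abstract's baseline, Dijkstra's algorithm extended to return all shortest paths, can be sketched as follows. RA itself is not described in enough detail to reproduce here. The extension keeps every predecessor that ties for the shortest distance, then enumerates paths backwards from the target. It assumes positive edge weights. The number of paths it lists can grow exponentially with graph size.

```python
import heapq


def all_shortest_paths(graph, src, dst):
    """Dijkstra that keeps every predecessor on a shortest path, then enumerates all paths.

    graph: {node: [(neighbour, weight), ...]} with positive weights; edges are directed,
    so an undirected edge is listed in both directions.
    """
    dist = {src: 0}
    preds = {src: []}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], preds[v] = nd, [u]  # strictly shorter: reset predecessors
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                preds[v].append(u)  # tie: another shortest way into v
    if dst not in dist:
        return []
    paths = []

    def backtrack(node, suffix):
        if node == src:
            paths.append([src] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(dst, [])
    return paths


g = {"A": [("B", 1), ("C", 1), ("D", 3)], "B": [("D", 1)], "C": [("D", 1)]}
print(sorted(all_shortest_paths(g, "A", "D")))  # [['A', 'B', 'D'], ['A', 'C', 'D']]
```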

  • RLEPSO: Reinforcement learning based Ensemble particle swarm optimizer

    Category: Computer Science >> Computer Software  Submitted: 2021-06-29

    Abstract: Evolution drives the development of biological intelligence, and learning drives human civilization; together, evolution and learning shape the entire natural world. Reinforcement learning has recently shown significant results in many areas. However, researchers in optimization algorithms currently focus mainly on evolution strategies, and there is very little research on learning. Inspired by these ideas, this paper proposes RLEPSO (Reinforcement learning based Ensemble particle swarm optimizer), a new particle swarm optimization algorithm that incorporates reinforcement learning. In the design phase, the algorithm uses reinforcement learning for pre-training. This automatically finds a more effective parameter combination, so that the algorithm runs better and completes optimization tasks faster. In addition, the algorithm integrates two robust particle swarm variants and assigns weight parameters to them, so that it adapts better to the requirements of a variety of optimization problems. This significantly improves its robustness. RLEPSO also uses a number of sub-swarms to increase both the probability of finding the global optimum and the diversity of the swarm. RLEPSO is evaluated on the CEC2013 benchmark set of 28 optimization test functions and compared with eight other particle swarm optimization variants, including three state-of-the-art algorithms. The results show that RLEPSO outperforms all compared algorithms.
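
    RLEPSO tunes the parameters of particle swarm variants. For reference, the sketch below is the standard global-best PSO update. Its inertia weight `w` and coefficients `c1` and `c2` are the kind of parameters such tuning targets. It is plain PSO, not RLEPSO: there is no reinforcement-learning pre-training, ensemble weighting, or sub-swarm logic. The parameter defaults are common textbook values, not values from the paper.

```python
import random


def pso(f, dim, n_particles=20, iters=200, w=0.7298, c1=1.49618, c2=1.49618,
        bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO with inertia weight w and acceleration coefficients c1, c2."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards the particle's own best + pull towards the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f


best_x, best_f = pso(lambda x: sum(v * v for v in x), dim=3)
print(best_f)
```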

  • Self-Supervised Image Enhancement and Denoising

    Category: Computer Science >> Computer Software  Submitted: 2021-03-01

    Abstract: This paper proposes a self-supervised, deep-learning-based low-light image enhancement method. It improves image contrast and reduces noise at the same time, which avoids the blur caused by pre- or post-denoising. The method contains two deep sub-networks: an Image Contrast Enhancement Network (ICE-Net) and a Re-Enhancement and Denoising Network (RED-Net). ICE-Net takes the low-light image as input and produces a contrast-enhanced image. RED-Net takes the output of ICE-Net together with the low-light image as input, and re-enhances and denoises the low-light image simultaneously. Both networks can be trained with low-light images only. This is made possible by a Maximum Entropy based Retinex (ME-Retinex) model and the assumption that noise is independently distributed. The ME-Retinex model introduces a new constraint on the reflectance image: its maximum channel should conform to the maximum channel of the low-light image, and its entropy should be maximal. This constraint turns the ill-posed decomposition into reflectance and illumination in the Retinex model into a well-posed problem and allows ICE-Net to be trained in a self-supervised way. The loss functions of RED-Net are carefully formulated to separate noise from detail during training. They rest on the idea that, if noise is independently distributed, then after a smoothing filter (e.g., a mean filter) the gradient of the noise component should be smaller than the gradient of the detail component. Qualitative and quantitative experiments demonstrate that the proposed method is effective.
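
    RED-Net's losses rest on an assumption that can be checked numerically: after a smoothing filter, independently distributed noise loses much more gradient than genuine detail. The sketch below uses 1-D signals as a stand-in for images. A single step edge plays the role of "detail", and i.i.d. Gaussian noise plays the role of noise. The window size and noise level are arbitrary choices. The code does not implement the paper's loss functions.

```python
import random


def mean_filter(x, k=5):
    """1-D moving-average (mean) filter; the window shrinks at the borders."""
    r = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - r):i + r + 1]
        out.append(sum(window) / len(window))
    return out


def mean_abs_gradient(x):
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)


def gradient_ratio(signal, k=5):
    """Fraction of the signal's mean |gradient| that survives mean filtering."""
    return mean_abs_gradient(mean_filter(signal, k)) / mean_abs_gradient(signal)


rng = random.Random(0)
n = 200
detail = [0.0] * (n // 2) + [1.0] * (n // 2)     # one sharp edge (image detail)
noise = [rng.gauss(0.0, 0.1) for _ in range(n)]  # i.i.d. noise
print(round(gradient_ratio(detail), 3), round(gradient_ratio(noise), 3))
```

    The edge keeps essentially all of its total variation, with a ratio close to 1. With a 5-tap mean filter, the noise keeps only roughly a fifth of it, and losses of this kind can exploit that gap.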

  • Applying Ricci flow to Manifold Learning

    Category: Computer Science >> Computer Software  Submitted: 2017-04-10

    Abstract: Traditional manifold learning algorithms often assume that the local neighborhood of any point on the embedded manifold is approximately equal to the tangent space at that point, ignoring curvature. This curvature-indifferent treatment of the manifold often makes traditional dimensionality reduction poor at preserving neighborhoods. To overcome this drawback, we propose a new algorithm, RF-ML, which operates on the manifold with the help of Ricci flow before reducing its dimension.

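  • pSnort: A Parallel Intrusion Detection System Based on Multi-Core Processors

    Category: Computer Science >> Computer Software  Submitted: 2017-03-09

    Abstract: Network intrusion detection and prevention systems play an important role in IP network security today. Internet traffic is surging, and single-core processors face a bottleneck in packet processing. As a result, traditional single-threaded network intrusion detection and prevention systems running on a single core can no longer meet the needs of network growth. To address this problem, this paper builds on Snort, the mainstream single-threaded network intrusion detection and prevention software, and designs pSnort, a parallel intrusion detection system based on software pipelining. pSnort divides traditional Snort into two stages and parallelizes the more time-consuming stage to improve performance. Through careful program design, pSnort also avoids the severe synchronization and mutual-exclusion problems that parallelization would otherwise introduce. In experiments, pSnort achieves a packet-processing rate of over 1 Gbps on a general-purpose Intel quad-core Xeon platform. Compared with traditional Snort, pSnort achieves up to a 147% performance improvement and a 2.5x speedup.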
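
    A two-stage software pipeline in the spirit of pSnort can be sketched with a serial decode stage feeding a pool of rule-matching workers. Several details are assumptions for illustration: the split into "decode" and "detect", the packet format, and the byte-string rules. In CPython, the GIL prevents real speedup for CPU-bound matching. The sketch therefore shows only the structure, including how keeping stage 2 free of shared mutable state avoids locks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical byte signatures standing in for Snort rules.
RULES = [b"/etc/passwd", b"cmd.exe", b"SELECT * FROM"]


def decode(raw: bytes):
    """Stage 1 (serial): parse a made-up 4-byte packet-id header, return (id, payload)."""
    return int.from_bytes(raw[:4], "big"), raw[4:]


def detect(packet):
    """Stage 2 (parallel): match rules; reads only immutable shared data, so no locks are needed."""
    pid, payload = packet
    return pid, [rule.decode() for rule in RULES if rule in payload]


def pipeline(raw_packets, workers=4):
    decoded = (decode(raw) for raw in raw_packets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so alerts stay in packet order.
        return list(pool.map(detect, decoded))


packets = [i.to_bytes(4, "big") + body for i, body in
           enumerate([b"GET /index.html", b"GET /../../etc/passwd", b"run cmd.exe /c dir"])]
print(pipeline(packets))  # [(0, []), (1, ['/etc/passwd']), (2, ['cmd.exe'])]
```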

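  • A Novel, Efficient Algorithm-Level Fault-Tolerance Technique and Its Implementation

    Category: Computer Science >> Computer Software  Submitted: 2016-06-08

    Abstract: As high-performance computing systems keep growing in scale, node failures become increasingly frequent. Most traditional fault-tolerance techniques are based on checkpointing. However, the overhead of checkpointing grows with system scale, and at exascale (Exaflops) its fault-tolerance efficiency can hardly meet system requirements. Algorithm-based failure recovery is more efficient than checkpointing, but it still follows a stop-and-wait model. For large-scale systems, stop-and-wait considerably reduces a program's parallel efficiency. This paper proposes a non-stop, algorithm-level fault-tolerance strategy called hot replacement. If a node fails while the program is running, the data on the failed node is not recovered in a stop-and-wait manner. Instead, a redundant node replaces the failed node so that computation can continue. The final correct result can then be obtained through a linear transformation. To demonstrate the effectiveness of the scheme, we implemented a fault-tolerant High Performance Linpack (HPL) using the fault-tolerance features of MPICH and evaluated its performance. Experimental results show that, even at small scale, our scheme clearly outperforms algorithm-based failure recovery.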
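
    The "correct result via a linear transformation" idea is easiest to see with classic checksum-based algorithm-level fault tolerance. In the sketch below, a redundant node works on the sum of all data blocks. Because the computation is linear, a failed node's result can be rebuilt at the end without stopping the run. This is a simplified illustration that assumes a linear per-node computation, here a shared matrix-vector product. It is not the paper's hot-replacement HPL.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def run_with_failure(M, blocks, failed):
    """Each 'node' i computes M @ blocks[i]; a redundant node computes M @ (sum of blocks).

    Node `failed` is lost mid-run; its result is rebuilt by linearity:
    M @ b_failed = M @ sum(b) - sum over survivors of M @ b_i.
    """
    checksum = [sum(col) for col in zip(*blocks)]  # data held by the redundant node
    results = {i: matvec(M, b) for i, b in enumerate(blocks) if i != failed}
    check_result = matvec(M, checksum)             # computation continues without stopping
    recovered = [c - sum(r[j] for r in results.values()) for j, c in enumerate(check_result)]
    results[failed] = recovered
    return [results[i] for i in range(len(blocks))]


M = [[1, 2], [3, 4]]
blocks = [[1, 0], [0, 1], [2, 2]]
print(run_with_failure(M, blocks, failed=1))  # [[1, 3], [2, 4], [6, 14]]
```

    With floating-point data, the subtraction introduces rounding error, so real ABFT schemes compare recovered values within a tolerance rather than exactly.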

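  • Traffic-Aware Reconfigurable Routing Algorithms

    Category: Computer Science >> Computer Software  Submitted: 2016-06-08

    Abstract: In many-core processor systems, networks-on-chip (NoCs) are commonly used to provide high-bandwidth, low-latency, highly reliable on-chip communication. To reduce network congestion and improve performance, traffic-balancing routing algorithms have received wide attention from researchers. Traffic-balancing algorithms typically rely on fully adaptive routing to provide path diversity. However, current fully adaptive routing algorithms either require many virtual channels or assume a conservative flow-control policy. Virtual channels are relatively expensive resources, while a conservative flow-control policy may degrade network performance. Researchers have therefore proposed using application traffic information to improve routing performance. Without using virtual channels, these algorithms can be reconfigured for different traffic characteristics, allocating routing adaptivity on demand. Depending on the type of traffic information used, traffic-aware reconfigurable routing algorithms can be classified as offline or online. Offline algorithms need to know a program's traffic characteristics in advance, so most of them target application-specific multi-core systems-on-chip. Online algorithms reconfigure based on traffic information collected at run time and can therefore be used in general-purpose processor systems. This paper discusses two well-known offline algorithms recently proposed internationally. It focuses on the authors' online reconfigurable routing algorithm based on the Abacus Turn Model, published at the 2011 International Symposium on Computer Architecture (ISCA '11).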
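
    The turn-model idea behind deadlock-free routing without virtual channels can be sketched with the classic west-first model on a 2-D mesh. All westward hops come first. After that, the router may adaptively pick among the remaining productive directions. This is not the Abacus Turn Model, whose reconfiguration the abstract does not describe. The per-link `load` map used to choose the least-congested direction is a hypothetical stand-in for online traffic information.

```python
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def west_first_options(cur, dst):
    """Directions permitted by the west-first turn model that also move towards dst."""
    (x, y), (dx, dy) = cur, dst
    if dx < x:
        return ["W"]                 # all westward hops must be taken first
    options = []                     # after that, never turn west again
    if dx > x:
        options.append("E")
    if dy > y:
        options.append("N")
    if dy < y:
        options.append("S")
    return options


def route(src, dst, load):
    """Minimal adaptive route; `load` maps (node, direction) to a congestion estimate."""
    path, cur = [src], src
    while cur != dst:
        options = west_first_options(cur, dst)
        d = min(options, key=lambda o: load.get((cur, o), 0))  # least-loaded permitted link
        cur = (cur[0] + STEP[d][0], cur[1] + STEP[d][1])
        path.append(cur)
    return path


print(route((0, 0), (2, 2), {((0, 0), "E"): 5}))  # [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
```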

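  • Performance Study of DGEMM on a Hybrid CPU/ATI GPU Architecture

    Category: Computer Science >> Computer Software  Submitted: 2016-06-08

    Abstract: This paper reports our work on optimizing double-precision general matrix multiplication (DGEMM) on a hybrid CPU/ATI GPU architecture. In real applications, data transfer between the CPU and the graphics processing unit (GPU) is a key performance factor. Because software pipelining can reduce data-transfer overhead, we propose three software-pipelining algorithms: Double Buffering, Data Reuse, and Data Placement. On AMD's ATI HD5970 GPU, the optimized DGEMM reaches 758 GFLOP/s, an efficiency of 82% and twice the performance of ACML-GPU v1.1. On a heterogeneous system combining an Intel Westmere EP and an ATI HD5970, performance reaches 844 GFLOP/s with 80% efficiency. We further examine DGEMM scalability on multiple CPUs and GPUs and analyze the architectural factors in detail. The analysis shows that contention on the PCIe bus and the memory bus is a major cause of performance degradation on heterogeneous systems.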
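
    Double buffering, the first of the three pipelining schemes, overlaps the transfer of the next panel with computation on the current one. The sketch below shows the control flow in Python, with a background thread standing in for the CPU-to-GPU copy. The panel layout and the `transfer` function are assumptions. Pure Python will not reproduce the paper's performance effect.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(panel):
    """Stand-in for a host-to-device copy of one panel (here: a plain copy)."""
    return [row[:] for row in panel]


def dgemm_double_buffered(a_panels, b_panels, n, m):
    """C = sum_k A_k @ B_k, where A_k is n x kb and B_k is kb x m.

    While panel k is being multiplied, the copy of panel k+1 runs in the background,
    so two panel buffers are live at a time (double buffering).
    """
    def load(k):
        return transfer(a_panels[k]), transfer(b_panels[k])

    c = [[0.0] * m for _ in range(n)]
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(load, 0)
        for k in range(len(a_panels)):
            a, b = pending.result()                   # wait for panel k to arrive
            if k + 1 < len(a_panels):
                pending = copier.submit(load, k + 1)  # start fetching panel k+1
            for i in range(n):                        # compute on panel k meanwhile
                for p in range(len(b)):
                    a_ip = a[i][p]
                    for j in range(m):
                        c[i][j] += a_ip * b[p][j]
    return c


# A is 2 x 4 and B is 4 x 2, each split into two panels along the shared dimension.
a_panels = [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
b_panels = [[[1, 0], [0, 1]], [[1, 1], [2, -1]]]
print(dgemm_double_buffered(a_panels, b_panels, n=2, m=2))  # [[12.0, 1.0], [28.0, 5.0]]
```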

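  • Parallel Simulation of Large-Scale Many-Core Architectures

    Category: Computer Science >> Computer Software  Submitted: 2016-06-08

    Abstract: As the number of processor cores per chip grows, multi-core processors are trending toward many-core designs, and this new architecture poses challenges for computer simulation. Serial simulation can no longer meet speed requirements. The multi-core resources of existing parallel host machines must be fully exploited to speed up simulation without sacrificing accuracy. Taking many-core and many-core cluster architectures as examples, this paper shows the necessity and feasibility of parallel simulation techniques for simulating parallel computer architectures. For many-core simulation, simulation speed improves tenfold with unchanged accuracy. For many-core cluster simulation, the total number of simulated small processor cores reaches the thousand-core scale, and a hybrid programming and runtime environment is implemented. This provides a basis for scalability testing of the architecture.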