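GMD 2.0的扩展–多GPU并行及基于GPU的MD约束算法 (Extension of GMD 2.0: Multi-GPU Parallelism and GPU-Based MD Constraint Algorithms)
Alternative Title: Multi-GPU Paralleled MD and GPU-enabled Constraint Algorithms for GMD 2.0
刘忠亮 (Liu Zhongliang)
Subtype: Master's thesis
Thesis Advisor: 李晓霞 (Li Xiaoxia)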
2013-04-01
Degree Grantor: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Degree Discipline: Chemical Engineering
Keyword: molecular dynamics (MD); LINCS; SETTLE; CUDA; GMD 2.0
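Abstract: Molecular dynamics (MD) is a class of molecular simulation methods widely used in biology, materials science, and other fields. Because MD is computationally intensive, the length and time scales it can currently simulate still fall far short of those of the physical processes in real systems. Over the past six years, as the peak computing power and memory bandwidth of graphics processing units (GPUs) have far outpaced those of contemporary central processing units (CPUs), using GPUs to improve MD performance has become an active international research topic.
This thesis builds on GMD 2.0, a single-GPU-accelerated MD program previously developed by the author's High Performance Computing and Cheminformatics group at the Institute of Process Engineering, Chinese Academy of Sciences, and on the NVIDIA C2050 GPU. To extend the functionality of GMD 2.0, it implements a GPU-based LINCS bond-length constraint algorithm (GMD_LINCS) and a GPU-based SETTLE constraint algorithm for rigid water molecules (GMD_SETTLE). In addition, to further improve the single-node performance of GMD 2.0 and enlarge the systems it can simulate, the thesis presents an initial version of MGMD, an MD program that runs in parallel on multiple GPUs within a single node. The main work is as follows:
(1) The GPU-based MD constraint algorithms GMD_LINCS and GMD_SETTLE were implemented, and their performance was significantly improved through optimizations such as careful thread organization and efficient use of GPU memory. The single-core CPU program of GROMACS 4.5.3, a mainstream MD simulation package, served as the performance baseline. On a polyacrylonitrile (PAN) test case with 189,054 particles, GMD_LINCS achieves a speedup of about 17 times; on a water test case with 99,678 particles, GMD_SETTLE achieves a speedup of about 30 times. Both algorithms have been integrated into GMD 2.0, extending it to constrained dynamics simulation at a small additional computational cost.
(2) MGMD, a single-node multi-GPU parallel MD program, was implemented with the POSIX threads shared-memory multithreaded programming model. The core MD modules (neighbor search, van der Waals force calculation, bonded force calculation, and particle update) all run on the GPUs. Elaborate handling of the migration of particles and bonded-interaction data greatly reduces the GPU-CPU communication time and significantly improves the performance of MGMD. Recent versions of GROMACS served as the performance baseline. On a polyethylene (PE) test case with 270,000 particles, MGMD on 6 GPUs, compared with GROMACS 4.5.3 on a single CPU core, achieves a speedup of about 260 times for the van der Waals and bonded force modules, about 90 times for neighbor search, and about 50 times overall. On a PE test case with 180,000 particles, compared with the GPU program of GROMACS 4.6.1, the latest version with multi-GPU support, MGMD is about 2.4 times as fast when both run on 2 GPUs and about 1.6 times as fast when both run on 4 GPUs. The single-node multi-GPU strategy, and in particular the elaborate migration handling for particles and bonded interactions, pays off in computing performance, and MGMD achieves a significant speedup.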
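Item (1) of the abstract concerns the LINCS bond-length constraint. As a rough illustration of what such a constraint step does, the following is a minimal sketch for one isolated bond, written in plain Python on the CPU. It is not the thesis's GMD_LINCS kernel: the function name `lincs_single_bond` and all numbers are illustrative assumptions. For a single constraint the constraint coupling matrix reduces to a scalar, so the matrix series expansion that LINCS uses for coupled constraints is not needed here.

```python
import math

def lincs_single_bond(r1_old, r2_old, r1_unc, r2_unc, m1, m2, d):
    """LINCS-style constraint of one bond to length d (illustrative sketch).

    r*_old: positions before the unconstrained update (define the bond direction)
    r*_unc: positions after the unconstrained update
    """
    # Unit vector along the old bond; LINCS projects along this direction.
    diff = [a - b for a, b in zip(r1_old, r2_old)]
    norm = math.sqrt(sum(c * c for c in diff))
    b = [c / norm for c in diff]
    inv_m = 1.0 / m1 + 1.0 / m2  # scalar form of B M^-1 B^T

    def set_projection(r1, r2, target):
        # Move both atoms along b (mass-weighted) so that the bond's
        # projection onto b equals `target`.
        proj = sum(bc * (x - y) for bc, x, y in zip(b, r1, r2))
        lam = (proj - target) / inv_m
        r1 = [x - bc * lam / m1 for x, bc in zip(r1, b)]
        r2 = [y + bc * lam / m2 for y, bc in zip(r2, b)]
        return r1, r2

    # Step 1: projection sets the bond component along the old direction to d.
    r1, r2 = set_projection(r1_unc, r2_unc, d)
    # Step 2: correct the lengthening caused by bond rotation,
    # using p = sqrt(2 d^2 - l^2) with l the length after step 1.
    l2 = sum((x - y) ** 2 for x, y in zip(r1, r2))
    p = math.sqrt(max(2.0 * d * d - l2, 0.0))  # clamped for very large rotations
    return set_projection(r1, r2, p)

# Hypothetical heavy atom (mass 12) bonded to a light atom (mass 1), d = 1.
r1, r2 = lincs_single_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                           (0.0, 0.1, 0.0), (1.2, -0.1, 0.0),
                           12.0, 1.0, 1.0)
bond_length = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
```

In this single-bond case the two steps restore the bond length to `d` (unless the rotation is so large that the clamp applies), and the mass-weighted displacements leave the center of mass unchanged. A GPU implementation would additionally have to handle coupled constraints and thread assignment, which this sketch does not attempt.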
Other Abstract: Molecular dynamics (MD), a useful molecular simulation method, is widely used to explore the properties of biomolecules, materials, and other systems, but the system sizes and timescales it can reach still fall far short of the spatio-temporal scales of physical processes in real systems. In the last six years, as its peak computing performance and memory bandwidth have far exceeded those of the CPU, the graphics processing unit (GPU) has shown great potential, and many efforts have been devoted to accelerating MD simulation with GPUs. Based on GMD 2.0, a GPU-enabled molecular dynamics program from the author's group, this thesis presents two GPU-enabled constraint algorithms: GMD_LINCS for bond-length constraints and GMD_SETTLE for rigid water molecules. The two algorithms have been implemented and integrated into GMD 2.0 for constrained dynamics. In addition, MGMD, an MD simulation program that uses multiple GPUs on a single node, was created to simulate larger systems over longer timescales on desktop workstations. Recent versions of GROMACS, a leading MD simulation package, are used as the baseline for the performance benchmarks in this thesis. The work can be summarized as follows.
(1) The GPU-enabled LINCS and SETTLE constraint algorithms (GMD_LINCS and GMD_SETTLE) were implemented on an NVIDIA C2050 GPU. Their computational performance is significantly improved by proper organization of threads and efficient use of memory. In terms of the simulation time averaged over 1000 time steps for polyacrylonitrile (PAN) with 189,054 particles, GMD_LINCS achieves a speedup of 17 times over GROMACS 4.5.3 on a single CPU core. Simulations of water with 99,678 particles show that GMD_SETTLE achieves a speedup of 30 times over GROMACS 4.5.3 on a single CPU core. Both GMD_LINCS and GMD_SETTLE have been integrated into GMD 2.0.
(2) MGMD was implemented with the POSIX threads interface so that multiple threads can directly access the shared memory of a node. The major MD computing kernels, including neighbor list generation, calculation of van der Waals and bonded interactions, and the update algorithms, are implemented on GPUs. Elaborate migration operations for particles and bonded interactions greatly reduce the time for data transfer between the GPUs and CPUs and significantly improve the performance of MGMD. The performance of MGMD on 6 GPUs is benchmarked in a simulation of polyethylene (PE) with 270,000 particles against GROMACS 4.5.3 on a single CPU core, in terms of simulation time averaged over 1000 time steps. Of the major algorithms in MGMD, the van der Waals and bonded interactions achieve the best speedups, as high as 260 times, and the speedup of neighbor list generation is up to 90 times. Overall, MGMD achieves a speedup of up to 50 times. When benchmarked in a simulation of a PE system with 180,000 particles, MGMD is about 2.4 and 1.6 times as fast as GROMACS 4.6.1 on 2 GPUs and 4 GPUs, respectively. These benchmarks show that the multi-GPU parallel strategy in MGMD, which combines POSIX threads with elaborate migration operations for particles and bonded interactions, pays off in significantly improved computing performance.
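Item (2) describes a shared-memory design in which several host threads work on one node. The sketch below illustrates only that decomposition idea, under loud assumptions: Python's `threading` module stands in for POSIX threads, each worker stands in for the host thread that would drive one GPU, no GPU is involved, and the 1D Lennard-Jones system, the function names, and the particle positions are all hypothetical. It does not reproduce MGMD's migration handling.

```python
import threading

def lj_force_on(i, pos, eps=1.0, sigma=1.0):
    """1D Lennard-Jones force on particle i from all other particles."""
    f = 0.0
    for j, xj in enumerate(pos):
        if j == i:
            continue
        r = pos[i] - xj  # signed separation
        sr6 = (sigma / abs(r)) ** 6
        # F = -dU/dr for U = 4*eps*((sigma/r)^12 - (sigma/r)^6)
        f += 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r
    return f

def compute_forces(pos, n_workers):
    """Split particles into contiguous slices, one slice per worker thread.

    All workers read the shared `pos` list and write disjoint entries of the
    shared `forces` list, so no locking is needed for this step.
    """
    forces = [0.0] * len(pos)
    chunk = (len(pos) + n_workers - 1) // n_workers  # ceiling division

    def worker(k):
        start = k * chunk
        stop = min(start + chunk, len(pos))
        for i in range(start, stop):
            forces[i] = lj_force_on(i, pos)

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all slices are complete before the forces are used
    return forces

positions = [0.0, 1.1, 2.3, 3.2, 4.5, 5.4]  # hypothetical particle positions
serial = compute_forces(positions, 1)
parallel = compute_forces(positions, 3)
```

Because each worker owns a disjoint slice of the output, the threaded result matches the single-thread result. In a real multi-GPU code the costly part is what this sketch leaves out: keeping each device's particle and bonded-interaction data current as particles move between slices.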
Pages: 89
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ipe.ac.cn/handle/122111/8352
Collection: Institute (batch import)
Recommended Citation
GB/T 7714
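刘忠亮 (Liu Zhongliang). GMD 2.0的扩展–多GPU并行及基于GPU的MD约束算法 [Multi-GPU Paralleled MD and GPU-enabled Constraint Algorithms for GMD 2.0][D]. Graduate University of the Chinese Academy of Sciences, 2013.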
Files in This Item:
File Name/Size: GMD 2.0的扩展–多GPU并行及基于 (2245KB)
Access: Restricted
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.