Parallel Computing

  • Tier 4, Chinese Academy of Sciences (CAS) journal ranking
  • Q2, JCR quartile

Highly cited articles

Article title Citations
Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures 8
Optimizations of the eigensolvers in the ELPA library 7
Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression 7
DVFS-aware application classification to improve GPGPUs energy efficiency 5
Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs 4
Comparing load-balancing algorithms for MapReduce under Zipfian data skews 4
Proteus: Exploiting precision variability in deep neural networks 3
SAGE: Percipient Storage for Exascale Data Centric Computing 3
Manila: Using a densely populated PMC-space for power modelling within large-scale systems 3
Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors 3
Benchmarking the GPU memory at the warp level 3
Performance of asynchronous optimized Schwarz with one-sided communication 3
IR+: Removing parallel I/O interference of MPI programs via data replication over heterogeneous storage devices 3
Exponential integrators with parallel-in-time rational approximations for the shallow-water equations on the rotating sphere 3
A distributed-memory hierarchical solver for general sparse linear systems 3
Distributed ant colony optimization based on actor model 2
PSelInv - A distributed memory parallel algorithm for selected inversion: The non-symmetric case 2
A hybrid CPU/GPU approach for optimizing sorting throughput 2
Characterizing the performance benefit of hybrid memory system for HPC applications 2
Overcoming the No Free Lunch Theorem in Cut-off Algorithms for Fork-Join programs 2
The time and energy efficiency of modern multicore systems 2
Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations 2
Microwave tomographic imaging of cerebrovascular accidents by using high-performance computing 2
Incomplete Sparse Approximate Inverses for Parallel Preconditioning 2
PMIx: Process management for exascale environments 2
Machine Learning in Multi-Agent Systems using Associative Arrays 2
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? 2
Integrating blocking and non-blocking MPI primitives with task-based programming models 2
Utility-based resource management in an oversubscribed energy-constrained heterogeneous environment executing parallel applications 2
A comparative evaluation of three volume rendering libraries for the visualization of sheared thermal convection 1
Targeting GPUs with OpenMP directives on Summit: A simple and effective Fortran experience 1
Searching for common patterns on protein sequences by means of a parallel hybrid honey-bee mating optimization algorithm 1
Client-side straggler-aware I/O scheduler for object-based parallel file systems 1
Concurrency of three-dimensional refined isogeometric analysis 1
Parallel eigenvalue computation for banded generalized eigenvalue problems 1
A time-stamping system to detect memory consistency errors in MPI one-sided applications 1
Characterizing MPI matching via trace-based simulation 1
Hybrid parallelization of a multi-tree path search algorithm: Application to highly-flexible biomolecules 1
Petascale scramjet combustion simulation on the Tianhe-2 heterogeneous supercomputer 1
Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms 1
Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations 1
GeneaLog: Fine-grained data streaming provenance in cyber-physical systems 1
Computation of the 100 quadrillionth hexadecimal digit of pi on a cluster of Intel Xeon Phi processors 1
Introducing the explicitly many-processor approach 1
Parallel accelerated vector similarity calculations for genomics applications 1
Practical, distributed, low overhead algorithms for irregular gather and scatter collectives 1
Superlinear speedup phenomenon in parallel 3D Discrete Element Method (DEM) simulations of complex-shaped particles 1
Data staging for efficient high throughput stream processing 1
Exploring stream parallel patterns in distributed MPI environments 1
The OpenACC data model: Preliminary study on its major challenges and implementations 1