Instructions to Configure and Run Quantum ESPRESSO Benchmarks

1 - Brief Description of Quantum ESPRESSO

Quantum ESPRESSO (http://www.quantum-espresso.org) is an integrated suite of computer codes for electronic-structure calculations and materials modelling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). Quantum ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License.

Quantum ESPRESSO builds upon newly restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied over the last twenty years by some of the leading materials modelling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures and a great effort devoted to user friendliness. Quantum ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate by contributing their own codes or by implementing their own ideas into existing codes. Quantum ESPRESSO is written mostly in Fortran90 and parallelised using MPI and OpenMP.

2 - Download and Install Quantum ESPRESSO Benchmark Suite

For this benchmark suite the latest version of Quantum ESPRESSO, 5.2.0, will be used. The code is publicly available from the Quantum ESPRESSO web site (www.quantum-espresso.org) or from the download pages of the developers' portal (qe-forge.org). No authentication/registration is required. Tarballs of the Quantum ESPRESSO source code are available at:

  http://www.quantum-espresso.org/download/

Links to the download and installation instruction pages for the GPU-enabled version and the Intel Phi version are available on the same page.

QE is self-contained; nevertheless, to obtain optimal performance it is usually better to link the code against external standard libraries: BLAS, LAPACK, FFTW, and BLACS/ScaLAPACK (for a parallel build). For installation instructions please refer to the Installation section of the "User's Guide for QUANTUM ESPRESSO" at:

  http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/

If everything goes fine you should find the executable "pw.x" in the directory espresso-5.2.0/bin.

3 - List and Purpose of Datasets

Two datasets are provided together with this benchmark: a small one (AUSURF.tar), to be used to run benchmarks inside a single node (or device) and as a test bed for code changes, and a large one (GRIR686.tar), to run benchmarks on many nodes.
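As a convenience, below is a minimal, illustrative sketch of a typical build of pw.x and of unpacking the two datasets. It assumes a GNU/Linux system with an MPI toolchain already available; the tarball name, the configure options, and the working-directory variable WORKDIR are assumptions to be adapted to your platform (see the User's Guide for the supported configure flags).

  # Build pw.x from the 5.2.0 source tarball (tarball name and configure options may differ on your system)
  tar xzf espresso-5.2.0.tar.gz
  cd espresso-5.2.0
  ./configure                  # add options such as --enable-openmp and library paths as needed
  make pw                      # produces bin/pw.x
  export QE_PATH=$PWD

  # Unpack the benchmark datasets in a working directory (WORKDIR is a placeholder)
  mkdir -p $WORKDIR && cd $WORKDIR
  tar xf AUSURF.tar
  tar xf GRIR686.tar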
4 - Run the Benchmarks

Quantum ESPRESSO accepts a number of command line parameters that control the internal distribution of its data structures, in addition to reading the standard input. For parallel execution Quantum ESPRESSO requires a system launcher command (e.g. mpirun or mpiexec) to distribute the instances of Quantum ESPRESSO over different nodes.

The relevant command line parameters for this benchmark are:

-input MY_INPUT_FILE (tells Quantum ESPRESSO to read input from MY_INPUT_FILE)

-npool P (tells Quantum ESPRESSO to use P pools to distribute data. P must be less than or equal to the number of k-points, and maximum scalability is usually reached with P exactly equal to the number of k-points. You can read the output to find out the number of k-points of your system)

-ntg T (tells Quantum ESPRESSO to use T task groups to distribute the FFTs. Usually optimal performance can be reached with T ranging from 2 to 8)

-ndiag D (tells Quantum ESPRESSO to use D processors to perform the parallel linear algebra computation with ScaLAPACK. D can range from 1 to the maximum number of MPI tasks; the optimal value of D depends on the bandwidth and latency of your network)

Below are possible command lines to execute QE:

AUSURF test case (MPI)

a) export OMP_NUM_THREADS=1; mpirun -np 16 $QE_PATH/bin/pw.x \
   -ntg 2 -npool 2 -input ausurf.in > ausurf.out

GRIR686 test case (MPI & OpenMP)

a) export OMP_NUM_THREADS=1; mpirun -np 512 $QE_PATH/bin/pw.x \
   -ntg 2 -npool 2 -input grir686.in > grir686.out

b) export OMP_NUM_THREADS=4; mpirun -np 512 $QE_PATH/bin/pw.x \
   -ntg 4 -ndiag 64 -npool 8 -input grir686.in > grir686.out

So download the input dataset tar file, extract it in a working directory, and then launch the command to execute the benchmark in that same directory.

5 - Collect and Report the Results

To validate a benchmark result you have to check the value of the total energy (ETOT) at convergence for the AUSURF test case and after the second SCF step for the GRIR686 test case. Proceed as follows:

- Inside the run directory issue the command:

  > grep ' total energy' MY_OUTPUT_FILE | tail -1

- You should see a string like:

  total energy = -XXXX.YYYYYYYY Ry

  or

  ! total energy = -XXXX.YYYYYYYY Ry

- Note that if this string is not present the result is not valid!

- The value XXXX.YYYYYYYY is the ETOT. It may vary depending on the number of tasks and the command line parameters (its variation should be limited to the last 4 digits for the AUSURF test case).

Below are the reference values for the datasets:

For the AUSURF test case valid results should have ETOT: -11427.0820YYYY Ry +- 0.0001
For the GRIR686 test case valid results should have ETOT: -3543YY.YYYYYYYY Ry +- 10

where Y can be any digit.

QE already has internal profiling and timing functions, so to evaluate the performance of a given execution you simply need to locate the execution wall time (PWSCF_WTIME), which can be found in the PWSCF timing string (e.g. PWSCF : 1m52.95s CPU 0m32.17s WALL) at the end of the output. You can use the command:

  > grep -E 'PWSCF.*WALL' MY_OUTPUT_FILE

Here "h", "m" and "s" stand for hours, minutes and seconds.

Please, for each dataset, provide us with at least four results, including the reference configurations (16 cores and 512 cores, or the maximum configuration available).
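To gather the two quantities required for reporting (ETOT and PWSCF_WTIME) in one step, a small helper along the following lines can be used. This is only a sketch that wraps the two grep commands shown above; the script name check_result.sh is hypothetical.

  #!/bin/bash
  # check_result.sh (hypothetical helper) - print the ETOT and wall-time lines of a QE output file
  # Usage: ./check_result.sh MY_OUTPUT_FILE
  OUT="$1"
  echo "ETOT line:"
  grep ' total energy' "$OUT" | tail -1
  echo "Wall-time line:"
  grep -E 'PWSCF.*WALL' "$OUT"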