Dear Developers,
I am running Lumen 2.0.1 and noticed the following entry in the new release notes: “Calculation of Xo response function in G-space: The implementation of the Xo response function has been re-written by taking advantage of lapack GEMM. The new implementation is fully GPU ported by using devxlib_xGEMM interface. Roughly a factor 2 speed-up is obtained and also a memory reduction of a factor 2 for the allocation of the G-space response function.”
I have also found a new pair of options:
BZINDX_CPU= "" # [PARALLEL] CPUs for each role
BZINDX_ROLEs= "" # [PARALLEL] CPUs roles (k)
Do these parameters relate to the new Xo response function calculation? If not, how should we set the parameters related to the new feature?
Thanks!
Xiaomeng
parameter
Re: parameter
Dear Xiaomeng,
The two points are not related.
1 - The new implementation of Xo (thanks to Andrea Ferretti and co-workers) is used automatically; there is nothing to do on your side.
2 - X_{G,G'} can be further distributed in memory if you use the parallelization input variables for the response function, with the "g" role as below. Please notice that the other roles duplicate X_{G,G'} in memory, but are in general more efficient, especially "c", "v", and "k".
3 - Regardless of X_{G,G'} being memory distributed or not, the solution of its Dyson equation is usually performed with LAPACK, but it can also be performed with ScaLAPACK (this is completely independent from points 1 and 2).
4 - The BZINDX variables instead are not new, and they refer to the memory distribution of the variable qindx_B, used for the bse runlevel. This variable is usually duplicated in memory. You can activate its distribution with the input below (it might be needed if you have very dense k-meshes). The implementation has been recently improved; see the release note quoted at the end of this reply.
Code (point 2):
X_and_IO_ROLEs= "g.q.k.c.v"
X_and_IO_CPU= "4.1.2.1.1"
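To make the layout in point 2 concrete, here is a minimal Python sketch (not part of the code itself) that pairs each role with its task count and gives a rough estimate of how much of X_{G,G'} each task holds when the "g" role is used. The helper names, the value of N_G, the 16 bytes per element (double-precision complex) and the even split over the "g" tasks are all assumptions made for illustration. It also assumes, as is usual for this kind of role/CPU string, that the product of the task counts should match the total number of MPI tasks.

```python
from math import prod


def parse_parallel_roles(roles: str, cpus: str) -> dict[str, int]:
    """Pair each role letter with its task count, e.g. "g.q" and "4.1" -> {"g": 4, "q": 1}."""
    role_list = roles.split(".")
    cpu_list = [int(n) for n in cpus.split(".")]
    if len(role_list) != len(cpu_list):
        raise ValueError("ROLEs and CPU strings must have the same number of fields")
    return dict(zip(role_list, cpu_list))


def x_memory_per_task_gib(n_gvec: int, g_tasks: int, bytes_per_element: int = 16) -> float:
    """Rough share of X_{G,G'} held by one task, in GiB.

    Assumption: the N_G x N_G matrix is split evenly over the "g" tasks and
    stored as double-precision complex numbers (16 bytes per element).
    """
    return n_gvec**2 * bytes_per_element / g_tasks / 2**30


# Values taken from the input shown above.
layout = parse_parallel_roles("g.q.k.c.v", "4.1.2.1.1")
total_tasks = prod(layout.values())  # 4 * 1 * 2 * 1 * 1 = 8

# Hypothetical matrix size: N_G = 8192 is an illustrative number, not from the thread.
share = x_memory_per_task_gib(n_gvec=8192, g_tasks=layout["g"])

print(layout, total_tasks, share)
```

With the strings above, the layout uses 8 tasks in total, 4 of them on "g". Under the stated assumptions, moving tasks from "g" to "c", "v" or "k" keeps the total the same but increases the per-task share of X_{G,G'}, which is the trade-off described in point 2.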
Code (point 3):
X_nCPU_LinAlg_INV= 4
Code (point 4):
BZINDX_CPU= "8" # [PARALLEL] CPUs for each role
BZINDX_ROLEs= "k" # [PARALLEL] CPUs roles (k)
Release note for point 4: "Fixed performance issue in qindx_B distribution in BSE (switched from HDF5 I/O to MPI WinCreate)"
Best,
D.
Davide Sangalli, PhD
Piazza Leonardo Da Vinci, 32, 20133 – Milano
CNR, Istituto di Struttura della Materia (ISM)
https://sites.google.com/view/davidesangalli