by Andris Gulans for exciting neon
(Jupyter notebook by Andris Gulans & Mara Voiculescu)
Purpose: This tutorial gives an introduction into running ground state calculations for low-dimensional and sparse systems efficiently. Assuming the bottleneck of the calculation is the diagonalization, the computational effort can be significantly reduced by applying the Davidson eigensolver.
Read the following paragraphs before starting with the rest of this tutorial!
Before running any Jupyter tutorials, please refer to the 00_before_starting.md
document on how to correctly set up the environment. This only needs to be done once. After which, the venv can be (re)activated from exciting
's root directory:
source tools/excitingjupyter/venv/excitingvenv/bin/activate
As a first step, you may create a running directory for the notebook.
%%bash
mkdir -p run_graphyne
Low-dimensional systems attract particular interest in modern materials research, and exciting
can be used for studying them. The LAPW basis assumes periodicity in three dimensions, and specifying geometry of such systems involves large vacuum regions to prevent interactions between periodic images. As a result, one obtains large unit cells even for several atoms, and it leads to large basis sets and heavy calculations.
Suppose the unit cell of the studied system contains $N_\mathrm{at}$ atoms, and the basis consists of $N_\mathrm{LAPW}$ LAPWs. In the limit of large $N_\mathrm{LAPW}$, the most laborious step of a ground-state DFT calculation employing a (semi)local exchange-correlation functional is a Hamiltonian construction and diagonalization. Standard LAPW implementations use direct matrix diagonalization algorithms as implemented in LAPACK and other linear algebra libraries. With the matrix size $N_\mathrm{LAPW} \times N_\mathrm{LAPW}$, these algorithms require $O(N_\mathrm{LAPW}^3)$ floating-point operations (FLOPs).
Considering low-dimensional systems, we notice that the required number of bands (all occupied and a few unoccupied) is small compared to $N_\mathrm{LAPW}$, and it is typically well under 1%. This is a scenario where iterative algorithms are expected to have an advantage over the direct ones. This tutorial shows how to apply the Davidson eigensolver implemented in exciting
. The method was orinally introduced by E.R. Davidson in Journal of Computational Physics 17, 87 (1975), and applied in an LAPW calculation for the first time by D. Singh in Physical Review B 40, 5428 (1989). The implementation in exciting
extends this approach to (L)APW+lo taking into account that the basis becomes essentially linearly dependent at large rgkmax values. The implementation is briefly described by A. Gulans, A. Kozhevnikov, C. Draxl in Phys. Rev. B 97, 161105(R) (2018).
The iterative eigensolver implemented in exciting
constructs a subspace of trial wavefunctions, and it crucially depends on the ability to apply the Hamiltonian on the trial wavefunction, but it does not require that the Hamiltonian matrix is explicitly constructed. exciting
supports both options: (i) the Hamiltonian matrix is explicitly constructed (occupies more memory, but it is more flexible in terms what kind of calculations are supported), (ii) the Hamiltonian matrix is never built (advantageous in many cases as it uses less memory and is typically faster).
$\alpha$-graphene is a two-dimensional carbon allotrope consisting of $sp$- and $sp^2$-hybridized atoms. In this tutorial, we discuss applying the iterative eigensolver for calculating the band dispersion of this material. We start with preparing an input for a standard calculation that employs the default option where an eigensolver from LAPACK is used.
<input>
<title>Graphyne</title>
<structure speciespath="." >
<crystal scale="1.0">
<basevect>13.163831236 0.00000000000000 0.00000000000000</basevect>
<basevect>-6.581915618 11.400212262 0.00000000000000</basevect>
<basevect>0.00000000000000 0.00000000000000 20.0</basevect>
</crystal>
<species chemicalSymbol="C" speciesfile="C.xml" rmt="1.15">
<atom coord="0.0000000000 0.0000000000 0.0000000000"/>
<atom coord="0.6666666667 0.3333333333 0.0000000000"/>
<atom coord="0.2313939997 0.1156969998 0.0000000000"/>
<atom coord="0.4352726670 0.2176363335 0.0000000000"/>
<atom coord="0.7823636665 0.2176363335 0.0000000000"/>
<atom coord="0.8843030002 0.1156969998 0.0000000000"/>
<atom coord="0.7823636665 0.5647273330 0.0000000000"/>
<atom coord="0.8843030002 0.7686060003 0.0000000000"/>
</species>
</structure>
<groundstate
ngridk="4 4 1"
rgkmax="4"
gmaxvr="20"
maxscl="30"
radialgridtype="cubic-2"
xctype="LDA_PW"
>
<solver
type='Lapack'
/>
</groundstate>
</input>
As explained above, we introduce a large vacuum gap (20 bohr) between the graphyne layers to eliminate interactions between the periodic images. You can explore the structure by visualizing the structure as shown below. But, before you do it, please, notice that we chose a small value of the rgkmax
parameter just for getting started, and, in principle, it should be converged with respect to the quantity of interest. Also, the attribute type in the element solver is specified. Its default value is Lapack
, and therefore providing it is not mandatory at this point.
import os
from excitingjupyter.utilities import get_input_xml_from_notebook
# Extract input file content from this notebook:
input_str = get_input_xml_from_notebook("Low_dimensional_systems", "GRAPHYNE")
# Write out the input as an XML file:
with open(os.path.join(os.getcwd(), 'run_graphyne/input.lapack.xml'), "w") as fid:
fid.write(input_str)
# Extract species file content from this notebook:
species_str = get_input_xml_from_notebook("Low_dimensional_systems", "CARBON_SPECIES")
# Write out the input as an XML file:
with open(os.path.join(os.getcwd(), 'run_graphyne/C.xml'), "w") as fid:
fid.write(species_str)
%%bash
cd run_graphyne
xcrysden --exciting input.lapack.xml >/dev/null 2>&1 &
cd ..
Now we provide our favourite species definition for carbon. Note that the LAPW energy parameter for the radial functions with $l=0$ is set to correspond to the $2s$ band energy rather than $1s$. It is irrelevant when you use the direct eigensolvers, but it is crucially important for using the Davidson's method.
<spdb xsi:noNamespaceSchemaLocation="../../xml/species.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sp chemicalSymbol="C" name="carbon" z="-6.00000" mass="21894.16673">
<muffinTin rmin="0.100000E-05" radius="1.4500" rinf="21.1565" radialmeshPoints="300"/>
<atomicState n="1" l="0" kappa="1" occ="2.00000" core="false"/>
<atomicState n="2" l="0" kappa="1" occ="2.00000" core="false"/>
<atomicState n="2" l="1" kappa="1" occ="1.00000" core="false"/>
<atomicState n="2" l="1" kappa="2" occ="1.00000" core="false"/>
<basis>
<default type="lapw" trialEnergy="0.15" searchE="false"/>
<custom l="0" type="lapw" trialEnergy="-0.70" searchE="false"/>
<lo l="0">
<wf matchingOrder="0" trialEnergy="-0.70" searchE="false"/>
<wf matchingOrder="1" trialEnergy="-0.70" searchE="false"/>
</lo>
<lo l="0">
<wf matchingOrder="0" trialEnergy="-0.70" searchE="false"/>
<wf matchingOrder="0" trialEnergy="-9.9" searchE="false"/>
</lo>
<lo l="0">
<wf matchingOrder="0" trialEnergy="-9.9" searchE="false"/>
<wf matchingOrder="1" trialEnergy="-9.9" searchE="false"/>
</lo>
<lo l="0">
<wf matchingOrder="1" trialEnergy="-9.9" searchE="false"/>
<wf matchingOrder="2" trialEnergy="-9.9" searchE="false"/>
</lo>
<custom l="1" type="lapw" trialEnergy="-0.20" searchE="false"/>
<lo l="1">
<wf matchingOrder="0" trialEnergy="-0.20" searchE="false"/>
<wf matchingOrder="1" trialEnergy="-0.20" searchE="false"/>
</lo>
<lo l="1">
<wf matchingOrder="1" trialEnergy="-0.20" searchE="false"/>
<wf matchingOrder="2" trialEnergy="-0.20" searchE="false"/>
</lo>
</basis>
</sp>
</spdb>
Now we save the inputs and run the calculation.
%%bash
pwd
cd run_graphyne
time exciting_smp input.lapack.xml
cat TOTENERGY.OUT
cd ..
Constructing the Hamiltonian and overlap matrices and diagonalizing the Hamiltonian problem take a large portion of the total run time. We can speed up this calculation by using the Davidson eigensolver. This adjustment is simple, as you have to the attribute type
in the element groundstate/solver
from Lapack
to Davidson
. We run the code again.
%%bash
cd run_graphyne
time exciting_smp input.iterative.xml
cat TOTENERGY.OUT
cd ..
Notice the reduction in the run time, while the total energy remains unchanged (within the convergence criteria) compared to the LAPACK calculation.
In the next step, we explore what is the effect of constructing or not constructing the Hamiltonian matrix during the iterative diagonalization. This choice is defined in the attribute constructHS
of the element groundstate/solver
. The default value of the attribute is true
meaning that the Hamiltonian is explicitly constructed. We will change it to false
and rerun the calculation.
%%bash
cd run_graphyne
time exciting_smp input.iterative_v2.xml
cat TOTENERGY.OUT
cd ..
We managed to reduce the run time even further, and this is the optimal option for this system. At fixed rgkmax
, the require number of FLOPs for the iterative solver is $O(N_\mathrm{at}N_\mathrm{LAPW}^2)$ and $O(N_\mathrm{at}^2 N_\mathrm{LAPW})$ when we construct or do not construct the Hamiltonian, respectively. If rgkmax
is varied, the asymptotic scaling changes slightly, although the choice not to construct the Hamiltonian matrix remains superior for sparse systems.
The diagonalization is performed during the self-consistence cycle, but not exclusively. For example, plotting the band structure requires it too. Although the choice of the eigensolver is made in the groundstate/solver element, it applies to the properties calculations too. The input below plots the band dispersion.
<input>
<title>Graphyne</title>
<structure speciespath="." >
<crystal scale="1.0">
<basevect>13.163831236 0.00000000000000 0.00000000000000</basevect>
<basevect>-6.581915618 11.400212262 0.00000000000000</basevect>
<basevect>0.00000000000000 0.00000000000000 20.0</basevect>
</crystal>
<species chemicalSymbol="C" speciesfile="C.xml" rmt="1.15">
<atom coord="0.0000000000 0.0000000000 0.0000000000"/>
<atom coord="0.6666666667 0.3333333333 0.0000000000"/>
<atom coord="0.2313939997 0.1156969998 0.0000000000"/>
<atom coord="0.4352726670 0.2176363335 0.0000000000"/>
<atom coord="0.7823636665 0.2176363335 0.0000000000"/>
<atom coord="0.8843030002 0.1156969998 0.0000000000"/>
<atom coord="0.7823636665 0.5647273330 0.0000000000"/>
<atom coord="0.8843030002 0.7686060003 0.0000000000"/>
</species>
</structure>
<groundstate
ngridk="4 4 1"
rgkmax="4"
gmaxvr="20"
maxscl="30"
radialgridtype="cubic-2"
xctype="LDA_PW"
do="skip"
nempty="10"
>
<solver
type='Davidson'
constructHS='false'
/>
</groundstate>
<properties>
<bandstructure>
<plot1d>
<path steps="25">
<point coord="0.0 0.0 0.0" label="Gamma"/>
<point coord="0.0 0.5 0.0" label="M"/>
<point coord="0.3333333333 0.3333333333 0" label="K"/>
<point coord="0.0 0.0 0.0" label="Gamma"/>
</path>
</plot1d>
</bandstructure>
</properties>
</input>
%%bash
cd run_graphyne
time exciting_smp input.xml
cd ..
%%bash
cd run_graphyne
python3 -m excitingscripts.plot.band_structure -e -10 5
cd ..
from IPython.display import Image
Image(filename="run_graphyne/PLOT.png")
In the previous sections, we performed initial calculations with the parameter selection that allows us to get started quickly. A production calculation would require higher rgkmax grid and a finer $\mathbf{k}$-grid. In this tutorial, we limit ourselves to discussing rgkmax.
The choice of the rgkmax parameter depends on the precision targets and the quantities of the interest, but typically the reasonable values are in the range 6--10. Let us assume that the most optimistic scenario where rgkmax
=6 is sufficient. How much more expensive our calculation will get in comparison with what we did in the previous sections? Here is "a back of the envelope" estimate. The number of the basis functions scales as $N_\mathrm{LAPW}\propto \mathrm{rgkmax}^3$, and $N_\mathrm{LAPW}$ increases $(\mathrm{rgkmax}^\mathrm{new}/\mathrm{rgkmax}^\mathrm{old})^3 \approx 3.4$ times. Direct eigensolvers from the LAPACK library require $O(N_\mathrm{LAPW}^3)$ FLOPs which translates into $3.4^3\approx 38$ times longer runtime spent on the diagonalization!
How long would it take to run the calculation using the iterative eigensolver? Let us run and see!
new_input_rgkmax6_xml = ExcitingInputXML.from_xml(os.path.join(os.getcwd(), 'run_graphyne/input.lapack.xml'))
new_input_rgkmax6_xml.groundstate.rgkmax = "6"
new_input_rgkmax6_xml.groundstate.solver.type = "Davidson"
new_input_rgkmax6_xml.groundstate.solver.constructHS = "false"
with open(os.path.join(os.getcwd(), 'run_graphyne/input.rgkmax6.xml'), "w") as fid:
fid.write(new_input_rgkmax6_xml.to_xml_str())
%%bash
cd run_graphyne
time exciting_smp input.rgkmax6.xml
cd ..