Purpose: In this tutorial, you will learn how calculate the absorption spectrum of diamond on using fastBSE within exciting.
Read the following paragraph before starting with the rest of this tutorial!
Before running any Jupyter tutorials, please refer to the 00_before_starting.md
document on how to correctly set up the environment. This only needs to be done once. After which, the venv can be (re)activated from exciting's root directory:
source tools/excitingjupyter/venv/excitingvenv/bin/activate
Important note:
In this tutorial you will learn how to calculate the optical absorption spectrum of with the fastBSE algorithm. To demonstrate that the results of the new method are equivalent to the old one, you will also calculate the spectrum with the direct algorithm. If you are not familiar how this is done, begin with the tutorial Excited states from BSE.
fastBSE is low scaling, matrix-free, iterative algorithm to solve the Bethe-Salpeter equation (BSE). It scales with $\mathcal O(N_v N_c(1 + \log N_k))$. This is massive speed up compared to the direct algorithm that scales with $\mathcal O(N_v^3 N_c^3 N_k^3)$.
The fastBSE algorithm was published by Henneke and collegues.
In this tutorial, we only consider the Tamm-Dancoff-approximation (TDA) and neglect the coupling between the resonant and anti-resonant components of the Bethe-Salpeter Hamiltonian (BSH). We also neglect finite momentum transitions such that $\mathbf q = \mathbf 0$.
The heavy lifting of solving the BSE is setting up and diagonalizing the BSH:
$$ H^{BSE} = D + 2 V - W, $$where $D$ is the diagonal part, taking into account the single electron properties, $V$ is the repulsive, long-range exchange interaction and $W$ the attractive screened interaction. For mapping the problem to linear algebra, we expand the exciton wave function in the two particle basis:
$$ \Phi^\lambda(\mathbf r, \mathbf r') = \sum_{vck} C^\lambda_{vck} \psi_{vk}(\mathbf r)\psi^*_{ck}(\mathbf r'), $$where $\psi_{ik}(\mathbf r)$ is a single electron wave function of band $i$, at $\mathbf r$ and the $k$'th Brillouin zone (BZ) vector. The indices $v$ and $c$ refer to valence and conduction bands, respectively, and $\lambda$ is the index of the exciton. Thus the diagonalization can be rewritten as the following eigenvalue problem:
$$ H^{BSE}|\Phi_\lambda\rangle = E^\lambda |\Phi_\lambda\rangle, $$In this expansion the diagonal part is given by
$$ D_{vck, v'c'k'} = \big(\epsilon_{ck} - \epsilon_{vk}\big) \delta_{cc'} \delta_{vv'} \delta_{kk'}, $$the exchange interaction kernel by
$$ V_{vck, v'c'k'} = \int d^3r \int d^3 r' \frac{ u^*_{v'k'}(\mathbf r')u_{c'k'}(\mathbf r')u^*_{ck}(\mathbf r)u_{vk}(\mathbf r) } {| \mathbf{r - r'} |}, $$and the screened interaction kernel by
$$ W_{vck, v'c'k'} = \int d^3r \int d^3 r' u^*_{v'k'}(\mathbf r')u_{vk}(\mathbf r') W(\mathbf{r, r'})_{\mathbf {k-k'}} u^*_{ck}(\mathbf r)u_{c'k'}(\mathbf r). $$Here was taken into account that for periodic materials, the wavefunctions are Blochwaves and can be written as
$$ \psi_{ik}(\mathbf r) = u_{ik}(\mathbf r) \cdot e^{i\mathbf r \mathbf k}, $$where $u_{ik}(\mathbf r)$ is the periodic part of the wavefunctions.
In total this results in a matrix with $(N_v N_c N_k)^2$ elements. Solving the BSE directly means to calculate the full matrix and diagonalizing it. This scales with $\mathcal O\big((N_v N_c N_k)^3\big)$. For many problems, a converged solution of the BSE can not be calculated with reasonable resources.
The fastBSE algorithm avoids to calculate the full matrix by using interpolative seperable density fitting (ISDF) for approximating the wavefunction products as $u_{ik}(\mathbf r)u^*_{jk}(\mathbf r)$ in the formulars for the integrals:
$$ u_{ik}^*(\mathbf r)u_{jk'}(\mathbf r) \approx \sum_{\mu=1}^{N_\mu} \zeta_\mu(\mathbf r) u_{ik}^*(\mathbf r_\mu) u_{jk}(\mathbf r_\mu). $$When discretized on a real space grid, $\{\mathbf r_i\}_{i=1}^{N_r}$, $u_{ik}^*(\mathbf r)u_{jk'}(\mathbf r)$ can be viewed as a matrix $Z \in \mathbb C^{N_r \times N_iN_jN_k^2}$, where $N_i$ and $N_j$ are the number of bands of $u_{ik}$ and $u_{jk'}$, respectively, to be combined. Thus, for a given $\mathbf r$ $u_{ik}(\mathbf r)^*u_{jk}(\mathbf r)$ can be viewed as a row vector of size $N_iN_jN_k^2$. The ISDF expands all such rows approximately in a linear combination of matrix rows with respect to a selected set of rows or interpolation points and then interpolated on a sub grid $\{\mathbf r_\mu\}_{\mu=1}^{N_\mu} \subseteq \{\mathbf r_i\}_{i=1}^{N_r}$ with typically $N_\mu \ll N_r$. Similarly we can combine the interpolation coeffiecients $\zeta_\mu(\mathbf r)$ in a matrix $\Theta \in \mathbb C^{N_r \times N_\mu}$ and the interpolation rows $u_{ik}(\mathbf r)u_{jk'}(\mathbf r)$ in a matrix $C \in \mathbb C^{N_\mu \times N_iN_jN_k^2}$. Using linear algebra notation, we can write the approximation as
$$ Z \approx \Theta C. $$With respect to the interpolation coefficients $\Theta$ this equation is an over-determined linear system and can be solved by the least squares approximation:
$$ \Theta = ZC^* (CC^*)^{-1}. $$Due to the tensor structure of $Z$ and $C$, $ZC^*$ and $CC^*$ can be carried out efficiently. The overall cost of evaluating $\Theta$ is bounded by $\mathcal O (N_g N N_kN_\mu + N_\mu^3 + N_gN_\mu^2)$, where $N=N_i + N_j$.
The interpolation points are calculated with controidal Voronoi tessalation (CVT). This method splits up the unit cell in cluster with the same weight with respect to the electron density and can be calculated very efficeintly with in $\mathcal O(N_rN_\mu)$.
Since we have three different wavefunction pairings we need to calculate three times ISDF:
$$ u^*_{ck}(\mathbf r)u_{vk}(\mathbf r) \approx \sum_{\mu=1}^{N_\mu^V} \zeta^V_\mu(\mathbf r) u^*_{ck}(\mathbf r^V_\mu)u_{vk}(\mathbf r^V_\mu). $$$$ u^*_{vk}(\mathbf r)u_{vk'}(\mathbf r) \approx \sum_{\mu=1}^{N_\mu^{W_v}} \zeta^{W_v}_\mu(\mathbf r) u^*_{vk}(\mathbf r^{W_c}_\mu)u_{vk'}(\mathbf r^{W_c}_\mu), $$$$ u^*_{ck}(\mathbf r)u_{ck'}(\mathbf r) \approx \sum_{\mu=1}^{N_\mu^{W_c}} \zeta^{W_c}_\mu(\mathbf r) u^*_{ck}(\mathbf r^{W_c}_\mu)u_{ck'}(\mathbf r^{W_c}_\mu). $$Replacing the wavefunction pairs in the formulars by these approximations, we obtain for $V$
$$ V_{vc\mathbf k, v'c'\mathbf k'} \approx \frac 1 {N^2_k} \sum_{\mu=1}^{N_\mu^V} \sum_{\nu=1}^{N_\mu^V} u^*_{c\mathbf k} (\mathbf r_\mu) u_{v\mathbf k} (\mathbf r_\mu) \tilde V_{\mu \nu} u^*_{c'\mathbf k} (\mathbf r_\nu) u_{v'\mathbf k} (\mathbf r_\nu), $$with
$$ \tilde V_{\mu \nu} = \int_{\Omega^l\times \Omega^l}drdr' \zeta_\mu^{*V}(\mathbf r) V(\mathbf r,\mathbf r') \zeta_\nu^{V}(\mathbf r') $$and for $W$
$$ W_{vc\mathbf k, v'c'\mathbf k'} \approx \frac 1 {N^2_k} \sum_{\mu=1}^{N_{\mu}^{W_c}} \sum_{\nu=1}^{N_{\mu}^{W_v}} u^*_{c\mathbf k} (\mathbf r_\mu) u_{c'\mathbf k'} (\mathbf r_\mu) \tilde W_{\mu\nu, \mathbf{k-k'}} u^*_{v'\mathbf k'} (\mathbf r_\nu) u_{v\mathbf k} (\mathbf r_\nu). $$with
$$ \tilde W_{\mu\nu, \mathbf{k-k'}} = \int_{\Omega^l\times \Omega^l}drdr' \zeta_\mu^{*W_c}(\mathbf r) W_{\mathbf{k}-\mathbf{k'}}(\mathbf r,\mathbf r') \zeta_\nu^{W_v}(\mathbf r'). $$First thing to note is that we reduced with that the number of integrals to solve drastically from $\left( N_k N_v N_c \right)^2$ to $\left( N_\mu^V \right)^2$ and $N_{\mu}^{W_v}N_{\mu}^{W_c}$ for $V$ and $W$, respectively, given that we can chose
$$ N_\mu^V \ll N_k N_v N_c \\ N_{\mu}^{W_v} \ll N_k^2N_v^2 \\ N_{\mu}^{W_c} \ll N_k^2N_c^2, $$which is always true.
The second thing is that the number of matrix elements is not reduced this way. To make use of the deconstructed matrix we use the Lanczos algorithm for the diagonalization. The Lanczos algorithm constructs an approximation to the lowest eigenpairs of a matrix by multiplying it iteratively to a vector. Therefore the scaling of the diagonalization with the matrix size reduces to that of matrix-vector multiplication. Since we are only interested in the lowest eigenvalues, we don't loose anything of interest by not computing the full solution. In the decomposed form as shown above, we can apply the inteaction kernels very efficiently to an vector $X \in \mathbb C^{N_vN_cN_k}$:
For $V$ we obtain
$$ [V \cdot X]_{vck} = \frac{1} {N_k} \sum_{k'} \sum_{v'} \sum_{c'} \sum_{\mu=1}^{N_\mu^V} \sum_{\nu=1}^{N_\mu^V} u_{ck}^*(\mathbf{r}_\mu) u_{vk}(\mathbf{r}_\mu) \tilde{V}_{\mu\nu} u^*_{v'k'}(\mathbf{r}_\nu) u_{c'k'}(\mathbf{r}_\nu) \cdot X(\mathbf{k'} v' c'). $$Reordering the sums reveals how the matrix vector product can be evaluated efficiently:
$$ [V \cdot X]_{vck} = \frac{1} {N_k} \sum_{\mu=1}^{N_\mu^V} u^*_{ck}(\mathbf{r}_\mu) u_{vk}(\mathbf{r}_\mu) \left\{ \sum_{\nu=1}^{N_\mu^V} \tilde V_{\mu\nu} \left[ \sum_{k'} \left( \sum_{c'} u_{c'k'}(\mathbf{r}_\nu) \left[ \sum_{v'} u^*_{v'k'}(\mathbf{r}_\nu) \cdot X(k' v' c') \right] \right) \right] \right\}, $$which scales with $\mathcal O(N_\mu^V N_v N_c N_k + N_\mu N_c N_k)$ and therefor linear with respect to the number of $\mathbf k$-points and quadratically with the number of electrons ($\mathcal O(N_e) \approx \mathcal O(N_v) \approx \mathcal O(N_c)$).
For $W$ we obtain
$$ [W_A\cdot X]_{vck} = \frac{1}{N_k} \sum_{k'} \sum_{v'} \sum_{c'} \sum_{\nu=1}^{N_\mu^{W_v}} \sum_{\mu=1}^{N_\mu^{W_c}} u^*_{v'k'}(\mathbf r_\nu^{W_v}) u_{vk}(\mathbf r_\nu^{W_v}) \tilde{W}_{\mu\nu, \mathbf{k-k'}} u^*_{ck}(\mathbf r_\mu^{W_c}) u_{c'k'}(\mathbf r_\mu^{W_c}) \cdot X_{v'c'k'}. $$Here aswell, reordering the sums reveals the efficient evaluation of the matrix vector product:
$$ [W_A\cdot X]_{vck} = \frac{1} {N_k} \sum_{\nu=1}^{N_\mu^{W_v}} u_{vk}(\mathbf r_\nu^{W_v}) \left\{ \sum_{\mu=1}^{N_\mu^{W_c}} u^*_{ck}(\mathbf r_\mu^{W_c}) \left[\sum_{\mathbf{k'}} \tilde{W}_{\mu\nu, \mathbf{k-k'}} \left(\sum_{c'} u_{c'k'}(\mathbf r_\mu^{W_c}) \left[ \sum_{v'} \cdot u^*_{\mathbf{k'}j_o}(\mathbf{\hat{r}}_\nu^W) X(\mathbf{k'} j_o j_u)\right]\right)\right]\right\}, $$which scales with $\mathcal O (N_\mu^{W_v}N_v N_c N_k + N_\mu^{W_v}N_\mu^{W_c}(N_c N_k + N_k \log N_k))$. The $\mathcal O (N_k \log N_k)$ comes from the sum over $k'$
$$ \sum_{k'} W_{k-k'}A_{k'} $$which is a discrete convolution that can be calculated efficently using fast fourier transformations.
As an example, you will calculate the absorption spectrum for diamond on a $4 \times 4 \times 4$ $\mathbf k$-grid. Please note that the parameters are not converged. A converged calculation would require too many resources for a tutorial calculation.
As for all other tutorials, you can choose between executing all steps manually from your command line, or to simply execute the python cells in this Jupyter notebook.
%%bash
# Create working directory
mkdir -p run_fastBSE_diamond
First, calculate the common properties for both the direct and the fastBSE algorithm. These are the electronic eigen energies $\epsilon_{ik}$, the corresponding momentum matrix elements and the screened coulomb interaction. In this example we use the electronic properties from a DFT groundstate calculation. As for the direct algorithm, you can also use a GW calculations as starting point for the fastBSE algorithm. The following input.xml tells exciting to calculate what we need:
<input>
<title>Diamond: Groundstate and Screening</title>
<structure speciespath="$EXCITINGROOT/species">
<crystal scale="6.74632">
<basevect> 0.5 0.5 0.0 </basevect>
<basevect> 0.5 0.0 0.5 </basevect>
<basevect> 0.0 0.5 0.5 </basevect>
</crystal>
<species speciesfile="C.xml" rmt="1.25">
<atom coord="0.00 0.00 0.00" />
<atom coord="0.25 0.25 0.25" />
</species>
</structure>
<groundstate
do="fromscratch"
ngridk="13 13 13"
xctype="GGA_PBE_SOL"
rgkmax="8.0"
gmaxvr="18"
nempty="20">
</groundstate>
<xs
xstype="BSE"
ngridk="4 4 4"
ngridq="4 4 4"
vkloff="0.05 0.15 0.25"
nempty="50"
gqmax="3.0"
broad="0.005"
tappinfo="true"
tevout="true">
<qpointset>
<qpoint>0.0 0.0 0.0</qpoint>
</qpointset>
<energywindow
intv="0.0 1.5"
points="1500" />
<screening
screentype="full"
nempty="100"/>
<BSE
bsetype="singlet"
nstlbse="1 4 1 20"
solver="direct"/>
<plan>
<doonly task="scrgeneigvec"/>
<doonly task="scrwritepmat"/>
<doonly task="screen"/>
<doonly task="xsgeneigvec"/>
<doonly task="writepmatxs"/>
</plan>
</xs>
</input>
In the xs
element we define the general proerties of the BSE calculation. We can leave most of this as it if in the following, all important edits are explained.
Note that in the plan
element we define a workflow that only calculates what we need for solving the BSE but not solving it.
solver="direct"
inside the xs
element tells exciting to use the direct solver workflow. This attribute is only important if you have not defined a costum workflow as we did in the plan
element.
In order to run exciting
from the terminal, you need to to execute the following commands.
%%bash
cd run_fastBSE_diamond
# replace the string $EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
time $EXCITINGROOT/bin/exciting_smp input.xml
cd ..
Now calculate the absorption spectrum by solving the BSE directly. Therefore edit the input.xml as follows:
<input>
<title>Diamond: Direct BSE</title>
<structure speciespath="$EXCITINGROOT/species">
<crystal scale="6.74632">
<basevect> 0.5 0.5 0.0 </basevect>
<basevect> 0.5 0.0 0.5 </basevect>
<basevect> 0.0 0.5 0.5 </basevect>
</crystal>
<species speciesfile="C.xml" rmt="1.25">
<atom coord="0.00 0.00 0.00" />
<atom coord="0.25 0.25 0.25" />
</species>
</structure>
<groundstate
do="skip"
ngridk="13 13 13"
xctype="GGA_PBE_SOL"
rgkmax="8.0"
gmaxvr="18"
nempty="20">
</groundstate>
<xs
xstype="BSE"
ngridk="4 4 4"
ngridq="4 4 4"
vkloff="0.05 0.15 0.25"
nempty="50"
gqmax="3.0"
broad="0.005"
tappinfo="true"
tevout="true">
<qpointset>
<qpoint>0.0 0.0 0.0</qpoint>
</qpointset>
<energywindow
intv="0.0 1.5"
points="1500" />
<screening
screentype="full"
nempty="100"/>
<BSE
bsetype="singlet"
nstlbse="1 4 1 20"
solver="direct"/>
<plan>
<doonly task="scrcoulint"/>
<doonly task="exccoulint"/>
<doonly task="bse"/>
</plan>
</xs>
</input>
do="skip"
inside the groundstate
element. We do not need to calculate the groundstate again.
The the plan
element defines now a workflow to setup the interaction kernels and solve the BSE.
In order to run exciting
from the terminal, you need to execute the following commands.
%%bash
cd run_fastBSE_diamond
# replace the string $EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
time $EXCITINGROOT/bin/exciting_smp input.xml
cd ..
<input>
<title>Diamond: fastBSE</title>
<structure speciespath="$EXCITINGROOT/species">
<crystal scale="6.74632">
<basevect> 0.5 0.5 0.0 </basevect>
<basevect> 0.5 0.0 0.5 </basevect>
<basevect> 0.0 0.5 0.5 </basevect>
</crystal>
<species speciesfile="C.xml" rmt="1.25">
<atom coord="0.00 0.00 0.00" />
<atom coord="0.25 0.25 0.25" />
</species>
</structure>
<groundstate
do="skip"
ngridk="13 13 13"
xctype="GGA_PBE_SOL"
rgkmax="8.0"
gmaxvr="18"
nempty="20">
</groundstate>
<xs
xstype="BSE"
ngridk="4 4 4"
ngridq="4 4 4"
vkloff="0.05 0.15 0.25"
nempty="50"
gqmax="3.0"
broad="0.005"
tappinfo="true"
tevout="true">
<qpointset>
<qpoint>0.0 0.0 0.0</qpoint>
</qpointset>
<energywindow
intv="0.0 2.0"
points="1500" />
<screening
screentype="full"
nempty="100"/>
<BSE
bsetype="singlet"
nstlbse="1 4 1 20"
solver="fastBSE"/>
<fastBSE
rsampling="20 20 20"
nisdf="80 200 200"
lanczosmaxits="200"/>
<plan>
<doonly task="write_screened_coulomb"/>
<doonly task="write_wfplot"/>
<doonly task="fastBSE_setup_transitions"/>
<doonly task="fastBSE_isdf_cvt"/>
<doonly task="fastBSE_main"/>
</plan>
</xs>
</input>
The the plan
element defines a workflow with the remaining tasks to calculate the absorption spectrum with the fastBSE algorithm. Without it, exciting will run a workflow, that automatically calculates all necessary properties.
solver="fastBSE"
inside the xs
element tells exciting to use the fastBSE solver. Since we use here a custom plan it has no effect.
The fastBSE
element in the xs
element defines the following parameters for fast BSE.
r_sampling="20 20 20"
inside the xs
defines the real space grid on which $u_{ik}(\mathbf r)$ is calculated.
nisdf="100 200 200"
inside the xs
defines the number of interpolation points for the ISDF of the $u_{vk}(\mathbf r)u_{ck}(\mathbf r)$ pairs in $V$ and $u_{vk}(\mathbf r)u_{v'k'}(\mathbf r)$ and $u_{ck}(\mathbf r)u_{c'k'}(\mathbf r)$ in $W$ respectively.
n_lanczos="200"
inside the xs
defines the maximum number of Lanczos iterations to be calculated.
As you can see, the properties of the BSE calculation do not need to be edited or copied. You only need to set solver="fastBSE"
and define the properties for the fastBSE algorithm.
In order to run exciting
from the terminal, you need to execute the following commands.
%%bash
cd run_fastBSE_diamond
# replace the string $EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
time $EXCITINGROOT/bin/exciting_smp input.xml
cd ..
By comparing the runtime, you already see that the fastBSE algorithm is significantly faster.
To convince yourself that the absorption spectra calculated with both algorithms are equivalent plot the absorption spectrum by running the following commands. The image is saved as PLOT.png and PLOT.eps.
%%bash
cd run_fastBSE_diamond
cp EPSILON/EPSILON_BSE-singlet-TDA-BAR_SCR-full_OC11.OUT direct
cp fastBSE_singlet_eps_im.out fastBSE
$EXCITINGSCRIPTS/PLOT-files.py -t 'Absorption Spectrum: Diamond' -f direct fastBSE -cx 1 -cy 3 -lx '$\omega$ [eV]' -ly 'Im $\epsilon_M$' -x 0 40
cd ..