VQE for BeH2 Molecule Ground State

What You'll Learn:

How to scale VQE from 4 qubits (LiH) to 6 qubits (BeH2)
How the parameter landscape grows and barren plateaus become severe
How linear CX connectivity maps to larger qubit registers
Why BeH2 is the largest molecule in the VQE progression before active-space methods are needed

Level: Intermediate | Time: 35 minutes | Qubits: 6 | Framework: Qiskit

Prerequisites

LiH Ground State — 4-qubit VQE with hardware-efficient ansatz
H2 Ground State — VQE fundamentals on 2 qubits

The Idea

LiH was a solid step up from H2: 4 qubits, 16 parameters, a 9-term Hamiltonian. Now we push further. Beryllium hydride (BeH2) has 6 electrons — 4 from beryllium and 1 from each hydrogen — arranged in a linear H-Be-H geometry. This requires 6 qubits in the minimal basis, a 14-term Hamiltonian, and 24 variational parameters across 2 layers.

BeH2 is the largest molecule in the VQE progression before active-space reduction techniques become essential. At 6 qubits, the Hilbert space is 64-dimensional — still classically tractable for verification, but the optimization landscape is qualitatively harder. The 24-parameter space is 50% larger than LiH, making COBYLA convergence slower and barren plateaus more pronounced.

This is where you start to feel the fundamental tension of variational quantum algorithms: expressibility (enough parameters to represent the ground state) vs. trainability (the optimizer can actually find the minimum).

How It Works

The Ansatz: 2-Layer Hardware-Efficient

Each layer has: (1) RY+RZ rotations on every qubit, then (2) a linear CX chain across all 6 qubits.

CODE
Layer 1                                            Layer 2
┌────────┐┌────────┐                               ┌─────────┐┌─────────┐
q0: ┤ RY(θ₀)├┤ RZ(θ₁)├──■──────────────────       ┤ RY(θ₁₂)├┤ RZ(θ₁₃)├──■──
    ├────────┤├────────┤┌─┴─┐                       ├─────────┤├─────────┤┌─┴─┐
q1: ┤ RY(θ₂)├┤ RZ(θ₃)├┤ X ├──■──────────         ┤ RY(θ₁₄)├┤ RZ(θ₁₅)├┤ X ├──■──
    ├────────┤├────────┤└───┘┌─┴─┐                  ├─────────┤├─────────┤└───┘┌─┴─┐
q2: ┤ RY(θ₄)├┤ RZ(θ₅)├─────┤ X ├──■──            ┤ RY(θ₁₆)├┤ RZ(θ₁₇)├─────┤ X ├──■──
    ├────────┤├────────┤     └───┘┌─┴─┐             ├─────────┤├─────────┤     └───┘┌─┴─┐
q3: ┤ RY(θ₆)├┤ RZ(θ₇)├──────────┤ X ├──■──       ┤ RY(θ₁₈)├┤ RZ(θ₁₉)├──────────┤ X ├──■──
    ├────────┤├────────┤          └───┘┌─┴─┐        ├─────────┤├─────────┤          └───┘┌─┴─┐
q4: ┤ RY(θ₈)├┤ RZ(θ₉)├───────────────┤ X ├──■──  ┤ RY(θ₂₀)├┤ RZ(θ₂₁)├───────────────┤ X ├──■──
    ├─────────┤├─────────┤             └───┘┌─┴─┐   ├─────────┤├─────────┤              └───┘┌─┴─┐
q5: ┤ RY(θ₁₀)├┤ RZ(θ₁₁)├──────────────────┤ X ├  ┤ RY(θ₂₂)├┤ RZ(θ₂₃)├───────────────────┤ X ├
    └─────────┘└─────────┘                  └───┘   └─────────┘└─────────┘                   └───┘

RY: Controls the polar angle on the Bloch sphere (amplitude mixing)
RZ: Controls the azimuthal angle (phase control)
CX chain: Creates nearest-neighbor entanglement (0->1->2->3->4->5)
24 parameters: 6 qubits x 2 rotations x 2 layers

The linear CX connectivity matches most superconducting hardware (IBM, Google), making this ansatz directly executable without transpilation overhead.

The Hamiltonian: 14 Dominant Pauli Terms

The full BeH2 Hamiltonian in STO-3G has hundreds of Pauli terms. We use a simplified version with the 14 largest contributions:

Term	Coefficient	Physical meaning
IIIIII	-14.3847	Constant offset (calibrated to FCI)
ZIIIII	+0.30	Single-qubit Z, qubit 5
IZIIII	-0.25	Single-qubit Z, qubit 4
IIZIII	+0.20	Single-qubit Z, qubit 3
IIIZII	-0.20	Single-qubit Z, qubit 2
IIIIZI	+0.15	Single-qubit Z, qubit 1
IIIIIZ	-0.15	Single-qubit Z, qubit 0
ZZIIII	+0.08	ZZ correlation, qubits 5-4
IIZZII	+0.06	ZZ correlation, qubits 3-2
IIIIZZ	+0.05	ZZ correlation, qubits 1-0
XXIIII	+0.04	XX exchange, qubits 5-4
YYIIII	+0.04	YY exchange, qubits 5-4
IIXXII	+0.03	XX exchange, qubits 3-2
IIYYII	+0.03	YY exchange, qubits 3-2

Measuring Multi-Qubit Pauli Strings

Each 6-character Pauli string requires its own measurement circuit. Before measuring, rotate each qubit into the appropriate basis:

I or Z: No rotation (computational basis)
X: Apply H (Hadamard)
Y: Apply S†H

For example, to measure XXIIII, apply H on qubits 4 and 5 (the X positions), leave qubits 0-3 unchanged, then measure all in the computational basis.

PYTHON
from circuit import run_circuit, optimize_vqe

# Quick evaluation with initial parameters
result = run_circuit()
print(f"Energy: {result['energy']:.4f} Ha")

# Full optimization (takes longer — 24 parameters)
opt = optimize_vqe(max_iterations=300, shots=2048, seed=42)
print(f"Optimized: {opt['optimal_energy']:.4f} Ha")

The Math

Scaling Across the VQE Progression

Property	H2	HeH+	LiH	BeH2	Growth (H2->BeH2)
Electrons	2	2	4	6	3x
Qubits	2	2	4	6	3x
Parameters	4	4	16	24	6x
Hamiltonian terms	5	5	9	14	2.8x
Matrix size	4x4	4x4	16x16	64x64	16x
Measurement circuits	2	2	8	13	6.5x

The parameter count grows as O(qubits x layers), and measurement circuits grow with the number of unique Pauli bases. At 6 qubits, each VQE iteration requires 13 separate circuit executions.

Barren Plateaus at Scale

With 24 parameters and 2 layers on 6 qubits, the barren plateau problem is measurably worse:

CODE
Var(dE/dtheta_i) ~ O(2^(-n))

n=2 (H2):   Var ~ 0.25    — gradients are large, optimization is easy
n=4 (LiH):  Var ~ 0.0625  — gradients shrink, but COBYLA still works
n=6 (BeH2): Var ~ 0.0156  — gradients flatten, convergence becomes fragile

The gradient variance decreases exponentially with qubit count, making the optimization landscape increasingly flat. Strategies to mitigate this:

Use few layers (we use 2 — adding more layers would make it worse)
Initialize near identity (small angles)
Use problem-informed ansatze (UCCSD) instead of hardware-efficient ones
Layer-wise training: optimize one layer at a time

Parameter Landscape Complexity

The 24-parameter landscape has approximately 24! / (24-k)! local minima for depth-k saddle points. In practice, COBYLA with 300 iterations explores only a tiny fraction of this space. The probability of finding the global minimum decreases significantly compared to the 16-parameter LiH case.

Expected Output

Metric	Value
Exact ground state (FCI/STO-3G)	-15.835 Ha
VQE with optimized parameters	-15.7 to -15.9 Ha
Initial guess energy	~-14.5 Ha
Chemical accuracy threshold	+/-1.6 mHa

With a simplified 14-term Hamiltonian, VQE may not reach chemical accuracy. The full Hamiltonian with hundreds of terms is needed for production-quality results. The wider energy range compared to LiH reflects the increased optimization difficulty.

Running the Circuit

PYTHON
from circuit import run_circuit, optimize_vqe, verify_vqe

# Quick evaluation
result = run_circuit()
print(f"Energy: {result['energy']:.4f} Ha (error: {result['error']:.4f})")

# Full optimization (more iterations needed for 24 params)
opt = optimize_vqe(max_iterations=300, shots=2048, seed=42)
print(f"Optimized: {opt['optimal_energy']:.4f} Ha")
print(f"Chemical accuracy: {opt['chemical_accuracy']}")

# Verification (wider tolerance for 6 qubits)
v = verify_vqe()
for check in v["checks"]:
    status = "PASS" if check["passed"] else "FAIL"
    print(f"[{status}] {check['name']}: {check['detail']}")

Try It Yourself

Add a 3rd layer: Change layers=3 and use 36 parameters. Does the extra depth improve the energy, or does it worsen barren plateaus? Compare convergence speed with 2 layers.
Try UCCSD-inspired connectivity: Instead of a linear chain, add CX gates between non-adjacent qubits (0->2, 1->3, 2->4, 3->5). Does this capture more of the BeH2 correlation structure?
Layer-wise training: Optimize layer 1 first (12 params), freeze it, then optimize layer 2. Compare with simultaneous optimization of all 24 parameters.
Compare with LiH scaling: Run both LiH and BeH2 VQE with the same number of COBYLA iterations. Plot convergence curves side by side. How much harder is the 6-qubit case?
Investigate the bond dissociation curve: Vary the H-Be bond length from 1.0 to 3.0 A (adjust Hamiltonian coefficients). Does VQE reproduce the correct equilibrium geometry at ~1.33 A?

What's Next

H2O Ground State — Water molecule with non-linear topology and active-space reduction
QAOA MaxCut — Variational algorithms for combinatorial optimization (not chemistry)

Applications

Domain	Use case
Benchmark	Largest standard VQE benchmark before active-space methods are needed
Education	Demonstrates scaling challenges: barren plateaus, parameter explosion
Materials	BeH2 is relevant to beryllium ceramics and nuclear energy materials
Algorithm R&D	Testing ansatz designs at the boundary of hardware-efficient feasibility

References

Kandala, A. et al. (2017). "Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets." Nature 549, 242-246. DOI: 10.1038/nature23879
Peruzzo, A. et al. (2014). "A variational eigenvalue solver on a photonic quantum processor." Nature Communications 5, 4213. DOI: 10.1038/ncomms5213
Hempel, C. et al. (2018). "Quantum Chemistry Calculations on a Trapped-Ion Quantum Simulator." Physical Review X 8, 031022. DOI: 10.1103/PhysRevX.8.031022
McClean, J.R. et al. (2018). "Barren plateaus in quantum neural network training landscapes." Nature Communications 9, 4812. DOI: 10.1038/s41467-018-07090-4