VQE for BeH2 Molecule Ground State
What You'll Learn:
- How to scale VQE from 4 qubits (LiH) to 6 qubits (BeH2)
- How the parameter landscape grows and barren plateaus become severe
- How linear CX connectivity maps to larger qubit registers
- Why BeH2 is the largest molecule in the VQE progression before active-space methods are needed
Level: Intermediate | Time: 35 minutes | Qubits: 6 | Framework: Qiskit
Prerequisites
- LiH Ground State — 4-qubit VQE with hardware-efficient ansatz
- H2 Ground State — VQE fundamentals on 2 qubits
The Idea
LiH was a solid step up from H2: 4 qubits, 16 parameters, a 9-term Hamiltonian. Now we push further. Beryllium hydride (BeH2) has 6 electrons — 4 from beryllium and 1 from each hydrogen — arranged in a linear H-Be-H geometry. This requires 6 qubits in the minimal basis, a 14-term Hamiltonian, and 24 variational parameters across 2 layers.
BeH2 is the largest molecule in the VQE progression before active-space reduction techniques become essential. At 6 qubits, the Hilbert space is 64-dimensional — still classically tractable for verification, but the optimization landscape is qualitatively harder. The 24-parameter space is 50% larger than LiH, making COBYLA convergence slower and barren plateaus more pronounced.
This is where you start to feel the fundamental tension of variational quantum algorithms: expressibility (enough parameters to represent the ground state) vs. trainability (the optimizer can actually find the minimum).
How It Works
The Ansatz: 2-Layer Hardware-Efficient
Each layer has: (1) RY+RZ rotations on every qubit, then (2) a linear CX chain across all 6 qubits.
CODELayer 1 Layer 2 ┌────────┐┌────────┐ ┌─────────┐┌─────────┐ q0: ┤ RY(θ₀)├┤ RZ(θ₁)├──■────────────────── ┤ RY(θ₁₂)├┤ RZ(θ₁₃)├──■── ├────────┤├────────┤┌─┴─┐ ├─────────┤├─────────┤┌─┴─┐ q1: ┤ RY(θ₂)├┤ RZ(θ₃)├┤ X ├──■────────── ┤ RY(θ₁₄)├┤ RZ(θ₁₅)├┤ X ├──■── ├────────┤├────────┤└───┘┌─┴─┐ ├─────────┤├─────────┤└───┘┌─┴─┐ q2: ┤ RY(θ₄)├┤ RZ(θ₅)├─────┤ X ├──■── ┤ RY(θ₁₆)├┤ RZ(θ₁₇)├─────┤ X ├──■── ├────────┤├────────┤ └───┘┌─┴─┐ ├─────────┤├─────────┤ └───┘┌─┴─┐ q3: ┤ RY(θ₆)├┤ RZ(θ₇)├──────────┤ X ├──■── ┤ RY(θ₁₈)├┤ RZ(θ₁₉)├──────────┤ X ├──■── ├────────┤├────────┤ └───┘┌─┴─┐ ├─────────┤├─────────┤ └───┘┌─┴─┐ q4: ┤ RY(θ₈)├┤ RZ(θ₉)├───────────────┤ X ├──■── ┤ RY(θ₂₀)├┤ RZ(θ₂₁)├───────────────┤ X ├──■── ├─────────┤├─────────┤ └───┘┌─┴─┐ ├─────────┤├─────────┤ └───┘┌─┴─┐ q5: ┤ RY(θ₁₀)├┤ RZ(θ₁₁)├──────────────────┤ X ├ ┤ RY(θ₂₂)├┤ RZ(θ₂₃)├───────────────────┤ X ├ └─────────┘└─────────┘ └───┘ └─────────┘└─────────┘ └───┘
- RY: Controls the polar angle on the Bloch sphere (amplitude mixing)
- RZ: Controls the azimuthal angle (phase control)
- CX chain: Creates nearest-neighbor entanglement (0->1->2->3->4->5)
- 24 parameters: 6 qubits x 2 rotations x 2 layers
The linear CX connectivity matches most superconducting hardware (IBM, Google), making this ansatz directly executable without transpilation overhead.
The Hamiltonian: 14 Dominant Pauli Terms
The full BeH2 Hamiltonian in STO-3G has hundreds of Pauli terms. We use a simplified version with the 14 largest contributions:
| Term | Coefficient | Physical meaning |
|---|---|---|
| IIIIII | -14.3847 | Constant offset (calibrated to FCI) |
| ZIIIII | +0.30 | Single-qubit Z, qubit 5 |
| IZIIII | -0.25 | Single-qubit Z, qubit 4 |
| IIZIII | +0.20 | Single-qubit Z, qubit 3 |
| IIIZII | -0.20 | Single-qubit Z, qubit 2 |
| IIIIZI | +0.15 | Single-qubit Z, qubit 1 |
| IIIIIZ | -0.15 | Single-qubit Z, qubit 0 |
| ZZIIII | +0.08 | ZZ correlation, qubits 5-4 |
| IIZZII | +0.06 | ZZ correlation, qubits 3-2 |
| IIIIZZ | +0.05 | ZZ correlation, qubits 1-0 |
| XXIIII | +0.04 | XX exchange, qubits 5-4 |
| YYIIII | +0.04 | YY exchange, qubits 5-4 |
| IIXXII | +0.03 | XX exchange, qubits 3-2 |
| IIYYII | +0.03 | YY exchange, qubits 3-2 |
Measuring Multi-Qubit Pauli Strings
Each 6-character Pauli string requires its own measurement circuit. Before measuring, rotate each qubit into the appropriate basis:
- I or Z: No rotation (computational basis)
- X: Apply H (Hadamard)
- Y: Apply S†H
For example, to measure XXIIII, apply H on qubits 4 and 5 (the X positions), leave qubits 0-3 unchanged, then measure all in the computational basis.
PYTHONfrom circuit import run_circuit, optimize_vqe # Quick evaluation with initial parameters result = run_circuit() print(f"Energy: {result['energy']:.4f} Ha") # Full optimization (takes longer — 24 parameters) opt = optimize_vqe(max_iterations=300, shots=2048, seed=42) print(f"Optimized: {opt['optimal_energy']:.4f} Ha")
The Math
Scaling Across the VQE Progression
| Property | H2 | HeH+ | LiH | BeH2 | Growth (H2->BeH2) |
|---|---|---|---|---|---|
| Electrons | 2 | 2 | 4 | 6 | 3x |
| Qubits | 2 | 2 | 4 | 6 | 3x |
| Parameters | 4 | 4 | 16 | 24 | 6x |
| Hamiltonian terms | 5 | 5 | 9 | 14 | 2.8x |
| Matrix size | 4x4 | 4x4 | 16x16 | 64x64 | 16x |
| Measurement circuits | 2 | 2 | 8 | 13 | 6.5x |
The parameter count grows as O(qubits x layers), and measurement circuits grow with the number of unique Pauli bases. At 6 qubits, each VQE iteration requires 13 separate circuit executions.
Barren Plateaus at Scale
With 24 parameters and 2 layers on 6 qubits, the barren plateau problem is measurably worse:
CODEVar(dE/dtheta_i) ~ O(2^(-n)) n=2 (H2): Var ~ 0.25 — gradients are large, optimization is easy n=4 (LiH): Var ~ 0.0625 — gradients shrink, but COBYLA still works n=6 (BeH2): Var ~ 0.0156 — gradients flatten, convergence becomes fragile
The gradient variance decreases exponentially with qubit count, making the optimization landscape increasingly flat. Strategies to mitigate this:
- Use few layers (we use 2 — adding more layers would make it worse)
- Initialize near identity (small angles)
- Use problem-informed ansatze (UCCSD) instead of hardware-efficient ones
- Layer-wise training: optimize one layer at a time
Parameter Landscape Complexity
The 24-parameter landscape has approximately 24! / (24-k)! local minima for depth-k saddle points. In practice, COBYLA with 300 iterations explores only a tiny fraction of this space. The probability of finding the global minimum decreases significantly compared to the 16-parameter LiH case.
Expected Output
| Metric | Value |
|---|---|
| Exact ground state (FCI/STO-3G) | -15.835 Ha |
| VQE with optimized parameters | -15.7 to -15.9 Ha |
| Initial guess energy | ~-14.5 Ha |
| Chemical accuracy threshold | +/-1.6 mHa |
With a simplified 14-term Hamiltonian, VQE may not reach chemical accuracy. The full Hamiltonian with hundreds of terms is needed for production-quality results. The wider energy range compared to LiH reflects the increased optimization difficulty.
Running the Circuit
PYTHONfrom circuit import run_circuit, optimize_vqe, verify_vqe # Quick evaluation result = run_circuit() print(f"Energy: {result['energy']:.4f} Ha (error: {result['error']:.4f})") # Full optimization (more iterations needed for 24 params) opt = optimize_vqe(max_iterations=300, shots=2048, seed=42) print(f"Optimized: {opt['optimal_energy']:.4f} Ha") print(f"Chemical accuracy: {opt['chemical_accuracy']}") # Verification (wider tolerance for 6 qubits) v = verify_vqe() for check in v["checks"]: status = "PASS" if check["passed"] else "FAIL" print(f"[{status}] {check['name']}: {check['detail']}")
Try It Yourself
-
Add a 3rd layer: Change
layers=3and use 36 parameters. Does the extra depth improve the energy, or does it worsen barren plateaus? Compare convergence speed with 2 layers. -
Try UCCSD-inspired connectivity: Instead of a linear chain, add CX gates between non-adjacent qubits (0->2, 1->3, 2->4, 3->5). Does this capture more of the BeH2 correlation structure?
-
Layer-wise training: Optimize layer 1 first (12 params), freeze it, then optimize layer 2. Compare with simultaneous optimization of all 24 parameters.
-
Compare with LiH scaling: Run both LiH and BeH2 VQE with the same number of COBYLA iterations. Plot convergence curves side by side. How much harder is the 6-qubit case?
-
Investigate the bond dissociation curve: Vary the H-Be bond length from 1.0 to 3.0 A (adjust Hamiltonian coefficients). Does VQE reproduce the correct equilibrium geometry at ~1.33 A?
What's Next
- H2O Ground State — Water molecule with non-linear topology and active-space reduction
- QAOA MaxCut — Variational algorithms for combinatorial optimization (not chemistry)
Applications
| Domain | Use case |
|---|---|
| Benchmark | Largest standard VQE benchmark before active-space methods are needed |
| Education | Demonstrates scaling challenges: barren plateaus, parameter explosion |
| Materials | BeH2 is relevant to beryllium ceramics and nuclear energy materials |
| Algorithm R&D | Testing ansatz designs at the boundary of hardware-efficient feasibility |
References
- Kandala, A. et al. (2017). "Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets." Nature 549, 242-246. DOI: 10.1038/nature23879
- Peruzzo, A. et al. (2014). "A variational eigenvalue solver on a photonic quantum processor." Nature Communications 5, 4213. DOI: 10.1038/ncomms5213
- Hempel, C. et al. (2018). "Quantum Chemistry Calculations on a Trapped-Ion Quantum Simulator." Physical Review X 8, 031022. DOI: 10.1103/PhysRevX.8.031022
- McClean, J.R. et al. (2018). "Barren plateaus in quantum neural network training landscapes." Nature Communications 9, 4812. DOI: 10.1038/s41467-018-07090-4