Quantum Kernel SVM — Classification in Hilbert Space
What You'll Learn:
- How quantum kernels K(x,y) = |⟨φ(x)|φ(y)⟩|² enable SVMs to classify data in exponentially large Hilbert spaces
- Why the ZZ feature map creates entanglement between encoded features, capturing non-linear correlations
- How the SVM dual problem uses quantum kernel matrices to find maximum-margin decision boundaries
- When quantum kernels provably outperform classical kernels (and when they don't)
Level: Advanced | Time: 30 minutes | Qubits: 4 | Framework: PennyLane
Prerequisites
- Quantum Kernel — kernel trick, feature maps, kernel matrices
- Fidelity Kernel — inversion test, SWAP test, kernel evaluation
- Bell State — entanglement, measurement basics
The Idea
Classical SVMs find the hyperplane that best separates two classes of data. The kernel trick lets them work in high-dimensional feature spaces without explicitly computing the transformation. A quantum kernel replaces the classical kernel with a quantum circuit: encode data into a quantum state, then measure the overlap between states.
Why quantum? A quantum state on n qubits lives in a 2^n-dimensional Hilbert space. The ZZ feature map encodes pairwise feature products into entangling phases, creating correlations that are hard to represent classically. For certain carefully constructed data distributions, this yields a provable exponential advantage over any efficient classical learner.
Think of it this way: a classical kernel is like projecting data onto a fixed set of axes. A quantum kernel projects data onto an exponentially larger set of axes defined by quantum interference — some of which have no efficient classical description.
How It Works
Step 1: Encode Data (ZZ Feature Map)
Each data point x is encoded into a quantum state |φ(x)⟩ via the ZZ feature map:
```
     ┌───┐┌─────────┐                        ┌───┐┌─────────┐
q_0: ┤ H ├┤ RZ(x₀π) ├──■─────────────────■──┤ H ├┤ RZ(x₀π) ├──■── ...
     ├───┤├─────────┤┌─┴─┐┌───────────┐┌─┴─┐├───┤├─────────┤┌─┴─┐
q_1: ┤ H ├┤ RZ(x₁π) ├┤ X ├┤ RZ(x₀x₁π) ├┤ X ├┤ H ├┤ RZ(x₁π) ├┤ X ├ ...
     ├───┤├─────────┤└───┘└───────────┘└───┘├───┤├─────────┤└───┘
q_2: ┤ H ├┤ RZ(x₀π) ├──■─────────────────■──┤ H ├┤ RZ(x₀π) ├──■── ...
     ├───┤├─────────┤┌─┴─┐┌───────────┐┌─┴─┐├───┤├─────────┤┌─┴─┐
q_3: ┤ H ├┤ RZ(x₁π) ├┤ X ├┤ RZ(x₀x₁π) ├┤ X ├┤ H ├┤ RZ(x₁π) ├┤ X ├ ...
     └───┘└─────────┘└───┘└───────────┘└───┘└───┘└─────────┘└───┘
     ├─────────────── rep 1 ───────────────┤├────── rep 2 ──────
```
- H + RZ(x_i π): Encodes each feature as a phase rotation in superposition
- CNOT-RZ-CNOT: Implements the ZZ interaction exp(-i x_i x_j π Z_i Z_j / 2), encoding feature products into entangling phases
Step 2: Compute Kernel (Inversion Test)
The kernel K(x,y) = |⟨φ(x)|φ(y)⟩|² is computed via the inversion test:
|0...0⟩ ── U(x) ── U†(y) ── Measure P(|0...0⟩)
If x = y, the inverse perfectly undoes the encoding and P(|0...0⟩) = 1. If x ≠ y, the mismatch leaves residual amplitude in other basis states.
Step 3: Build Kernel Matrix
Evaluate K(x_i, x_j) for all pairs of training points:
```
    ┌                            ┐
K = │ K(x₁,x₁)   K(x₁,x₂)   …   │
    │ K(x₂,x₁)   K(x₂,x₂)   …   │
    │     ⋮          ⋮       ⋱   │
    └                            ┘
```
This requires O(n²) quantum circuit evaluations.
Step 4: Train Classical SVM
Feed the quantum kernel matrix to the SVM dual problem. The optimizer finds support vectors (data points closest to the decision boundary) and the maximum-margin classifier.
Step 5: Predict
For a new point x, compute K(x, x_i) against all training points and evaluate:
```
f(x) = Σ_i α_i y_i K(x, x_i) + b
prediction = sign(f(x))
```
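Steps 4 and 5 can be sketched with scikit-learn's precomputed-kernel mode. To keep the sketch fast and self-contained, a classical RBF Gram matrix stands in for the quantum one (an assumption; in the quantum pipeline K would come from the inversion test):

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data: two clusters, labels in {-1, +1}
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.3, 0.05, (10, 2)),
                     rng.normal(0.7, 0.05, (10, 2))])
y_train = np.array([-1] * 10 + [+1] * 10)

def rbf(A, B, gamma=10.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Step 4: train on the precomputed Gram matrix
K_train = rbf(X_train, X_train)
svm = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)

# Step 5: predicting a new point needs K(x_new, x_i) against all TRAINING points
x_new = np.array([[0.65, 0.72]])
K_new = rbf(x_new, X_train)
pred = svm.predict(K_new)

# Manual decision function f(x) = sum_i alpha_i y_i K(x, x_i) + b,
# using sklearn's dual_coef_ = alpha_i * y_i over the support vectors
f = K_new[:, svm.support_] @ svm.dual_coef_[0] + svm.intercept_[0]
```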
The Math
ZZ Feature Map
The feature map applies the unitary:
U(x) = ∏_{rep} [ U_ZZ(x) · U_φ(x) ]
where:
```
U_φ(x)  = ⊗_i [ RZ(x_i π) · H ]                    (single-qubit encoding)
U_ZZ(x) = ∏_{⟨i,j⟩} exp(-i x_i x_j π Z_i Z_j / 2)  (entangling)
```
The ZZ gate is decomposed as CNOT → RZ → CNOT:
exp(-iθZZ) = CNOT · (I ⊗ RZ(2θ)) · CNOT
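This identity is easy to check numerically; since Z⊗Z is diagonal, its exponential is just an entrywise exponential of the diagonal:

```python
import numpy as np

theta = 0.37  # arbitrary test angle

# Left side: exp(-i * theta * Z⊗Z), with Z⊗Z = diag(1, -1, -1, 1)
zz_diag = np.array([1, -1, -1, 1])
exp_zz = np.diag(np.exp(-1j * theta * zz_diag))

# Right side: CNOT · (I ⊗ RZ(2θ)) · CNOT
RZ = lambda phi: np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
decomposed = CNOT @ np.kron(np.eye(2), RZ(2 * theta)) @ CNOT

print(np.allclose(exp_zz, decomposed))  # the identity holds exactly
```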
Kernel as Fidelity
The quantum kernel is the fidelity between encoded states:
K(x,y) = |⟨0|U†(x)U(y)|0⟩|² = |⟨φ(x)|φ(y)⟩|²
This is a valid Mercer kernel: symmetric and positive semi-definite by construction.
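The PSD property follows because K is the Gram matrix of the projectors |φ(x_i)⟩⟨φ(x_i)| under the Hilbert-Schmidt inner product: K_ij = Tr(ρ_i ρ_j). A quick numpy check with random normalized states standing in for the feature map output:

```python
import numpy as np

# Random normalized "encoded states" stand in for |phi(x_i)> (illustrative)
rng = np.random.default_rng(0)
states = rng.normal(size=(6, 4)) + 1j * rng.normal(size=(6, 4))
states /= np.linalg.norm(states, axis=1, keepdims=True)

# K_ij = |<phi_i|phi_j>|^2: symmetric, unit diagonal, positive semi-definite
K = np.abs(states @ states.conj().T) ** 2
print(np.linalg.eigvalsh(K).min())  # non-negative up to float error
```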
SVM Dual Problem
Given kernel matrix K and labels y ∈ {-1, +1}:
```
max_α   Σ_i α_i − ½ Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)

subject to:   0 ≤ α_i ≤ C       (box constraint)
              Σ_i α_i y_i = 0   (class balance)
```
The solution identifies support vectors (α_i > 0) and the bias term b.
Quantum Advantage Condition
Quantum kernels offer advantage when:
- The target function lies in the RKHS of the quantum kernel (expressivity)
- The quantum kernel values are hard to estimate classically (hardness)
- Sufficient training data to learn the function (generalization)
Liu et al. (2021) proved a rigorous separation for specific data distributions.
Expected Output
| Metric | Expected Value |
|---|---|
| Accuracy | 80-95% (on separable synthetic data) |
| Support vectors | 3-8 (depends on margin) |
| K(x,x) | 1.0 (self-overlap) |
| K(x,y) range | [0, 1] |
| Kernel PSD | True |
| Circuit evaluations | O(n²) for n training points |
Running the Circuit
```python
from circuit import run_circuit, verify_quantum_kernel_svm

# Train and evaluate quantum SVM
result = run_circuit(n_samples=20, n_qubits=4)
print(f"Accuracy: {result['accuracy']:.2%}")
print(f"Support vectors: {result['n_support_vectors']}")

# Verification suite
v = verify_quantum_kernel_svm()
for check in v["checks"]:
    status = "PASS" if check["passed"] else "FAIL"
    print(f"[{status}] {check['name']}: {check['detail']}")
```
Try It Yourself
1. Vary the regularization: Try `C=0.1` (soft margin) vs `C=100` (hard margin) in `train_quantum_svm()`. How does the number of support vectors change?
2. Increase feature map depth: Set `reps=3` or `reps=4` in the feature map. Does the extra expressivity improve classification, or does it lead to overfitting (training accuracy >> test accuracy)?
3. Non-separable data: Modify `run_circuit` to use overlapping clusters (e.g., both centered at 0.5). How does the quantum SVM handle non-linearly-separable data compared to a classical RBF kernel?
4. Scale the qubits: Try `n_qubits=2` vs `n_qubits=6`. More qubits means a larger Hilbert space, but does accuracy always improve? What happens to runtime?
5. Compare with classical: Compute an RBF kernel `exp(-||x-y||²/2σ²)` on the same data with scikit-learn. Is the quantum kernel competitive on this synthetic dataset?
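For the classical-baseline exercise, a minimal scikit-learn sketch (the synthetic clusters here are a hypothetical stand-in for the tutorial's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two well-separated clusters in [0, 1]^2 (illustrative data, not circuit.py's)
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.3, 0.1, (20, 2)), rng.normal(0.7, 0.1, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Classical RBF baseline to compare against the quantum kernel's accuracy
clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
print(f"Classical RBF baseline accuracy: {clf.score(X_te, y_te):.2%}")
```

On easy synthetic data like this, expect the RBF baseline to be very competitive; quantum kernels only help when the data's structure matches the feature map.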
What's Next
- Fidelity Kernel — SWAP test approach to kernel evaluation (compare with inversion test)
- Trainable Kernel — Add variational parameters to the feature map
- Projected Quantum Kernel — Classical post-processing of quantum measurements
Applications
| Domain | Use case |
|---|---|
| Drug discovery | Molecular property classification using quantum-encoded molecular descriptors |
| Anomaly detection | One-class SVM with quantum kernel for detecting outliers in high-dimensional data |
| Financial modeling | Classification of market regimes using quantum feature correlations |
| Material science | Classifying quantum phases of matter using experimentally measurable kernels |
References
- Havlicek, V. et al. (2019). "Supervised learning with quantum-enhanced feature spaces." Nature 567, 209-212. DOI: 10.1038/s41586-019-0980-2
- Schuld, M. & Killoran, N. (2019). "Quantum Machine Learning in Feature Hilbert Spaces." Physical Review Letters 122, 040504. DOI: 10.1103/PhysRevLett.122.040504
- Liu, Y. et al. (2021). "A rigorous and robust quantum speed-up in supervised machine learning." Nature Physics 17, 1013-1017. DOI: 10.1038/s41567-021-01287-z