Projected Quantum Kernel — IQP Encoding with Classical Projection
What You'll Learn:
- Why projecting to classical space beats full fidelity kernels on noisy devices and at scale
- How IQP (Instantaneous Quantum Polynomial) circuits encode data with believed classical intractability
- How random Pauli measurements construct a finite-dimensional kernel without computing inner products
- How to verify projected kernel matrices and interpret their spectral properties
Level: Advanced | Time: 30 minutes | Qubits: 4 | Framework: PennyLane
Prerequisites
- Quantum Kernel SVM — kernel trick, feature maps, SVM classification
- Fidelity Kernel — full fidelity kernel K(x,y) = |<phi(x)|phi(y)>|^2
- Bell State — superposition, entanglement basics
The Idea
The standard quantum kernel computes K(x,y) = |<phi(x)|phi(y)>|^2 -- the full fidelity between two quantum-encoded data points. This is elegant but has serious practical problems:
- Exponential concentration: As qubit count grows, kernel values concentrate around a fixed value (everything looks equally similar), making the kernel useless for classification.
- Noise sensitivity: Each kernel entry requires precise overlap estimation; hardware noise corrupts fidelity more than expectation values.
- No interpretability: A single scalar K(x,y) tells you "how similar" but not "in what way."
The projected quantum kernel (Huang et al. 2021) mitigates all three by trading the exponential Hilbert space for a finite-dimensional classical vector:
Instead of asking "how much do these states overlap?", ask "do they give the same measurement results across a set of observables?"
Each data point x gets mapped to a vector p(x) = [<O_1>, <O_2>, ..., <O_m>] of expectation values, and the kernel becomes a simple dot product K_proj(x,y) = <p(x), p(y)>. The dimension m is a tunable hyperparameter -- more projections capture finer structure, fewer projections are cheaper to measure.
How It Works
Step 1: IQP Feature Map
The IQP (Instantaneous Quantum Polynomial) encoding applies:
|phi(x)> = H^n . ZZ(x) . RZ(x) . H^n |0>^n
where RZ(x) applies single-qubit rotations and ZZ(x) applies pairwise entangling gates:
```
     +---+  +--------+                            +---+
q_0: | H |--| RZ(x0) |---*------------------*-----| H |--
     +---+  +--------+   |                  |     +---+
     +---+  +--------+ +-+-+ +-----------+ +-+-+  +---+
q_1: | H |--| RZ(x1) |-| X |-| RZ(x0*x1) |-| X |--| H |--
     +---+  +--------+ +---+ +-----------+ +---+  +---+
...
```
Each pairwise ZZ(x_j*x_k) phase is compiled as a CNOT, an RZ on the target, and a second CNOT.
IQP circuits are classically hard to simulate (Bremner et al. 2016), giving the feature map its quantum advantage potential. The pairwise ZZ interactions encode feature correlations, not just individual feature values.
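Because RZ(x) and ZZ(x) are both diagonal in the computational basis, only the Hadamard layers mix amplitudes, and the whole feature map can be reproduced in plain NumPy. The sketch below (the function name `iqp_state` and the exp(i x_j Z_j) phase convention are ours, not taken from the tutorial's `circuit.py`) builds |phi(x)> for 4 qubits:

```python
import numpy as np

def iqp_state(x):
    """NumPy simulation of |phi(x)> = H^n . ZZ(x) . RZ(x) . H^n |0>^n.

    Standalone sketch; qubit j carries feature x[j] (bit-ordering is a convention).
    """
    n = len(x)
    dim = 2 ** n
    # H^n |0>^n is the uniform superposition
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)
    # RZ(x) and ZZ(x) are diagonal: attach a phase to each basis state
    for b in range(dim):
        z = np.array([1 - 2 * ((b >> j) & 1) for j in range(n)])  # Z eigenvalues +/-1
        phase = np.dot(x, z)  # single-qubit terms  sum_j x_j z_j
        phase += sum(x[j] * x[k] * z[j] * z[k]     # pairwise terms sum_{j<k} x_j x_k z_j z_k
                     for j in range(n) for k in range(j + 1, n))
        psi[b] *= np.exp(1j * phase)
    # final Hadamard layer, applied one qubit at a time
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    psi = psi.reshape([2] * n)
    for j in range(n):
        psi = np.moveaxis(np.tensordot(H, psi, axes=([1], [j])), 0, j)
    return psi.reshape(dim)

x = np.array([0.3, 0.7, 0.1, 0.5])
psi = iqp_state(x)  # normalized 16-dimensional statevector
```

The nested loop over basis states costs O(4^n) and is only for illustration at small n; a hardware or PennyLane implementation applies the same gates in O(n^2) depth.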
Step 2: Measure Observables
For each data point x, measure m observables to build a projection vector:
| Projection Type | Observables | Range | Interpretability |
|---|---|---|---|
| Random Pauli | Tensor products from {I,X,Y,Z}^n | [-1, 1] | Captures multi-body correlations |
| Computational Basis | Probabilities P(|b>) for bit strings b | [0, 1] | Directly interpretable as state populations |
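The random-Pauli row of the table can be sketched directly: draw a tensor product of single-qubit Paulis, then take expectation values against a statevector. The helper names (`random_pauli`, `projection_vector`) are illustrative, not the tutorial's API:

```python
import numpy as np

rng = np.random.default_rng(0)
PAULIS = {"I": np.eye(2),
          "X": np.array([[0, 1], [1, 0]]),
          "Y": np.array([[0, -1j], [1j, 0]]),
          "Z": np.diag([1.0, -1.0])}

def random_pauli(n_qubits):
    """Random tensor product O = P_1 x ... x P_n, each factor drawn from {I, X, Y, Z}."""
    O = np.array([[1.0 + 0j]])
    for label in rng.choice(list(PAULIS), size=n_qubits):
        O = np.kron(O, PAULIS[label])
    return O

def projection_vector(psi, observables):
    """p(x) = [<psi|O_k|psi>]; real-valued because each O_k is Hermitian."""
    return np.array([np.real(np.vdot(psi, O @ psi)) for O in observables])

# example on a 2-qubit Bell state
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
obs = [random_pauli(2) for _ in range(8)]
p = projection_vector(psi, obs)  # m = 8 entries, each in [-1, 1]
```

On hardware each entry of p(x) would be estimated from shots rather than computed from the statevector, which is exactly why expectation values degrade more gracefully under noise than overlap estimates.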
Step 3: Classical Kernel
Compute the kernel as a standard inner product in R^m:
K_proj(x, y) = p(x) . p(y) = sum_k <O_k>_x * <O_k>_y
This is just a linear kernel on the projected features -- compatible with any classical ML pipeline (SVM, ridge regression, etc.).
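The kernel step itself is ordinary linear algebra: stack the projection vectors as rows of a matrix and the full Gram matrix is a single product. Toy random values stand in for measured expectation values here:

```python
import numpy as np

# Toy projection vectors: 5 samples, m = 3 expectation values each, all in [-1, 1]
rng = np.random.default_rng(1)
P = rng.uniform(-1.0, 1.0, size=(5, 3))

# K_proj(x, y) = p(x) . p(y)  =>  the whole Gram matrix at once is P P^T
K = P @ P.T
```

Because K is a linear-kernel Gram matrix, it is automatically symmetric and positive semidefinite with rank at most m, so it drops into any classical kernel method (e.g. `sklearn.svm.SVC(kernel="precomputed")`).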
The Math
Projected Kernel Formula
Given m observables {O_1, ..., O_m} and IQP-encoded states:
```
p(x) = ( <phi(x)| O_1 |phi(x)>, ..., <phi(x)| O_m |phi(x)> )  in R^m

K_proj(x, y) = p(x)^T p(y) = sum_{k=1}^{m} <O_k>_x <O_k>_y
```
IQP Expressibility
The IQP circuit implements the unitary:
U_IQP(x) = H^n . exp(i sum_{j<k} x_j x_k Z_j Z_k) . exp(i sum_j x_j Z_j) . H^n
The ZZ interaction term creates entanglement that depends on the product of features, giving the encoding non-linear expressibility. The Hadamard layers convert between the X and Z bases, creating interference patterns that depend on the data.
Why Projection Helps
Full fidelity kernel on n qubits lives in a 2^n-dimensional Hilbert space. The projected kernel compresses this to m dimensions where m << 2^n. Huang et al. (2021) showed that:
- For many practical datasets, O(n) projections suffice (no exponential overhead)
- The projected kernel avoids exponential concentration that plagues full fidelity kernels
- With enough projections, the projected kernel can match or exceed full kernel performance
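The concentration effect that motivates all this is easy to see numerically. Using Haar-random states as an idealized stand-in for a highly scrambling feature map (an assumption of this sketch, not a claim about IQP circuits specifically), the average fidelity between two encoded points falls as 1/2^n:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_state(dim):
    """Haar-random pure state -- idealized stand-in for a scrambling feature map."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

mean_fidelity = {}
for n in (2, 4, 6, 8):
    dim = 2 ** n
    fids = [abs(np.vdot(random_state(dim), random_state(dim))) ** 2
            for _ in range(200)]
    mean_fidelity[n] = np.mean(fids)  # concentrates toward 1/2^n
```

As n grows, every off-diagonal kernel entry shrinks toward the same tiny value, so the fidelity kernel carries less and less usable signal per shot; the m-dimensional projected features sidestep this collapse.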
Expected Output
| Metric | Random Projections (m=8) | Basis Projections (m=8) |
|---|---|---|
| Kernel rank | 8 | 8 |
| Diagonal mean | ~2.0-4.0 | ~0.1-0.5 |
| Effective rank | >= 3 | >= 3 |
| Is PSD | True | True |
| Is symmetric | True | True |
| Top eigenvalue | ~15-30 | ~1-5 |
Note: Unlike fidelity kernels, projected kernels do NOT have unit diagonal. The diagonal value K(x,x) = ||p(x)||^2 depends on the projection norms.
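The verification columns in the table can be reproduced with a few lines of NumPy. Here effective rank is computed as the exponential of the spectral entropy, one common definition; the tutorial's `circuit.py` may use a different convention, so treat this as a sketch:

```python
import numpy as np

def verify_kernel(K, tol=1e-8):
    """Sanity checks for a projected kernel Gram matrix K = P P^T."""
    eigvals = np.linalg.eigvalsh(K)
    checks = {
        "is_symmetric": bool(np.allclose(K, K.T)),
        "is_psd": bool(eigvals.min() >= -tol),
    }
    # effective rank: exp of the entropy of the normalized eigenvalue spectrum
    p = np.clip(eigvals, 0.0, None)
    p = p[p > tol]
    p = p / p.sum()
    checks["effective_rank"] = float(np.exp(-(p * np.log(p)).sum()))
    return checks

rng = np.random.default_rng(3)
P = rng.uniform(-1, 1, size=(6, 4))   # 6 samples, m = 4 projections
K = P @ P.T
checks = verify_kernel(K)

# The diagonal is ||p(x)||^2, not 1 -- unlike a fidelity kernel
assert np.allclose(np.diag(K), (P ** 2).sum(axis=1))
```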
Running the Circuit
```python
from circuit import run_circuit

# Random Pauli projections (default)
result = run_circuit(n_samples=10, n_projections=8, projection_type="random")
print(f"Projection dimension: {result['projection_dim']}")
print(f"Kernel rank: {result['kernel_rank']}")
print(f"Top eigenvalues: {result['top_eigenvalues']}")

# Verification checks
v = result["verification"]
print(f"PSD: {v['is_psd']}, Symmetric: {v['is_symmetric']}")
print(f"Effective rank: {v['effective_rank']}")

# Basis projections for comparison
result_basis = run_circuit(n_samples=10, n_projections=8, projection_type="basis")
print(f"Basis kernel rank: {result_basis['kernel_rank']}")
```
Try It Yourself
- Projection dimension sweep: Run with n_projections = 2, 4, 8, 16, 32. Plot kernel rank vs. m. At what point does adding projections stop helping?
- Random vs. basis: Compare `projection_type="random"` and `projection_type="basis"` on the same data. Which gives higher effective rank? Which has more spread in eigenvalues?
- Qubit scaling: Increase n_qubits from 2 to 6 while keeping n_projections=8. Does the kernel quality degrade (exponential concentration) or stay stable?
- Feature dimension: Try n_features=1 vs. n_features=4. With only 1 feature, the ZZ interactions all see the same value -- how does this affect the kernel spectrum?
- Noise robustness (advanced): Add `qml.DepolarizingChannel` after each gate and compare projected vs. fidelity kernel degradation. The projected kernel should be more robust since expectation values average over noise.
What's Next
- Trainable Kernel — learn the kernel parameters via gradient descent
- Kernel Alignment — optimize the kernel to match a target alignment
- Quantum Kernel SVM — plug kernels into SVM classifiers
Applications
| Domain | How Projected Kernels Help |
|---|---|
| Large-scale QML | Measuring m expectation values per sample scales better than estimating all O(N^2) pairwise overlaps for N data points |
| Noisy hardware | Expectation values are more robust than fidelity to depolarizing noise |
| Feature analysis | Each projection dimension corresponds to a physical observable |
| Kernel bandwidth tuning | Projection count m acts as an implicit bandwidth parameter (Shaydulin & Wild 2022) |
| Classical-quantum hybrid | Projection vectors feed directly into classical ML pipelines (SVM, PCA, clustering) |
References
- Huang, H.-Y., Broughton, M., Mohseni, M. et al. "Power of data in quantum machine learning." Nature Communications 12, 2631 (2021). DOI: 10.1038/s41467-021-22539-9
- Shaydulin, R. & Wild, S.M. "Importance of kernel bandwidth in quantum machine learning." PRX Quantum 3, 030327 (2022). DOI: 10.1103/PRXQuantum.3.030327
- Bremner, M.J., Montanaro, A. & Shepherd, D.J. "Average-case complexity versus approximate simulation of commuting quantum computations." Physical Review Letters 117, 080501 (2016). DOI: 10.1103/PhysRevLett.117.080501
- Havlicek, V. et al. "Supervised learning with quantum-enhanced feature spaces." Nature 567, 209-212 (2019). DOI: 10.1038/s41586-019-0980-2