Projected Quantum Kernel — IQP Encoding with Classical Projection
What You'll Learn:
- Why projecting to classical space beats full fidelity kernels on noisy devices and at scale
- How IQP (Instantaneous Quantum Polynomial) circuits encode data with believed classical intractability
- How random Pauli measurements construct a finite-dimensional kernel without computing inner products
- How to verify projected kernel matrices and interpret their spectral properties
Level: Advanced | Time: 30 minutes | Qubits: 4 | Framework: PennyLane
Prerequisites
- Quantum Kernel SVM — kernel trick, feature maps, SVM classification
- Fidelity Kernel — full fidelity kernel K(x,y) = |<phi(x)|phi(y)>|^2
- Bell State — superposition, entanglement basics
The Idea
The standard quantum kernel computes K(x,y) = |<phi(x)|phi(y)>|^2 -- the full fidelity between two quantum-encoded data points. This is elegant but has serious practical problems:
- Exponential concentration: As qubit count grows, kernel values concentrate around a fixed value (everything looks equally similar), making the kernel useless for classification.
- Noise sensitivity: Each kernel entry requires precise overlap estimation; hardware noise corrupts fidelity more than expectation values.
- No interpretability: A single scalar K(x,y) tells you "how similar" but not "in what way."
The projected quantum kernel (Huang et al. 2021) mitigates all three by trading the exponential Hilbert space for a finite-dimensional classical vector:
Instead of asking "how much do these states overlap?", ask "do they give the same measurement results across a set of observables?"
Each data point x gets mapped to a vector p(x) = [<O_1>, <O_2>, ..., <O_m>] of expectation values, and the kernel becomes a simple dot product K_proj(x,y) = <p(x), p(y)>. The dimension m is a tunable hyperparameter -- more projections capture finer structure, fewer projections are cheaper to measure.
How It Works
Step 1: IQP Feature Map
The IQP (Instantaneous Quantum Polynomial) encoding applies:
|phi(x)> = H^n . ZZ(x) . RZ(x) . H^n |0>^n
where RZ(x) applies single-qubit rotations and ZZ(x) applies pairwise entangling gates:
```
     +---+  +--------+                            +---+
q_0: | H |--| RZ(x0) |---*------------------*-----| H |--
     +---+  +--------+   |                  |     +---+
     +---+  +--------+ +-+-+ +-----------+ +-+-+  +---+
q_1: | H |--| RZ(x1) |-| X |-| RZ(x0*x1) |-| X |--| H |--
     +---+  +--------+ +---+ +-----------+ +---+  +---+
...
```
Each pairwise ZZ(x_j*x_k) phase is compiled as a CNOT, an RZ on the target, and a second CNOT.
IQP circuits are classically hard to simulate (Bremner et al. 2016), giving the feature map its quantum advantage potential. The pairwise ZZ interactions encode feature correlations, not just individual feature values.
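Because RZ(x) and ZZ(x) are both diagonal in the computational basis, only the Hadamard layers mix amplitudes, and the whole feature map can be reproduced in plain NumPy. The sketch below (the function name `iqp_state` and the exp(i x_j Z_j) phase convention are ours, not taken from the tutorial's `circuit.py`) builds |phi(x)> for 4 qubits:

```python
import numpy as np

def iqp_state(x):
    """NumPy simulation of |phi(x)> = H^n . ZZ(x) . RZ(x) . H^n |0>^n.

    Standalone sketch; qubit j carries feature x[j] (bit-ordering is a convention).
    """
    n = len(x)
    dim = 2 ** n
    # H^n |0>^n is the uniform superposition
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)
    # RZ(x) and ZZ(x) are diagonal: attach a phase to each basis state
    for b in range(dim):
        z = np.array([1 - 2 * ((b >> j) & 1) for j in range(n)])  # Z eigenvalues +/-1
        phase = np.dot(x, z)  # single-qubit terms  sum_j x_j z_j
        phase += sum(x[j] * x[k] * z[j] * z[k]     # pairwise terms sum_{j<k} x_j x_k z_j z_k
                     for j in range(n) for k in range(j + 1, n))
        psi[b] *= np.exp(1j * phase)
    # final Hadamard layer, applied one qubit at a time
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    psi = psi.reshape([2] * n)
    for j in range(n):
        psi = np.moveaxis(np.tensordot(H, psi, axes=([1], [j])), 0, j)
    return psi.reshape(dim)

x = np.array([0.3, 0.7, 0.1, 0.5])
psi = iqp_state(x)  # normalized 16-dimensional statevector
```

The nested loop over basis states costs O(4^n) and is only for illustration at small n; a hardware or PennyLane implementation applies the same gates in O(n^2) depth.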
Step 2: Measure Observables
For each data point x, measure m observables to build a projection vector:
| Projection Type | Observables | Range | Interpretability |
|---|---|---|---|
| Random Pauli | Tensor products from {I,X,Y,Z}^n | [-1, 1] | Captures multi-body correlations |
| Computational Basis | Probabilities P(|b>) for bit strings b | [0, 1] | Directly interpretable as state populations |
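The random-Pauli row of the table can be sketched directly: draw a tensor product of single-qubit Paulis, then take expectation values against a statevector. The helper names (`random_pauli`, `projection_vector`) are illustrative, not the tutorial's API:

```python
import numpy as np

rng = np.random.default_rng(0)
PAULIS = {"I": np.eye(2),
          "X": np.array([[0, 1], [1, 0]]),
          "Y": np.array([[0, -1j], [1j, 0]]),
          "Z": np.diag([1.0, -1.0])}

def random_pauli(n_qubits):
    """Random tensor product O = P_1 x ... x P_n, each factor drawn from {I, X, Y, Z}."""
    O = np.array([[1.0 + 0j]])
    for label in rng.choice(list(PAULIS), size=n_qubits):
        O = np.kron(O, PAULIS[label])
    return O

def projection_vector(psi, observables):
    """p(x) = [<psi|O_k|psi>]; real-valued because each O_k is Hermitian."""
    return np.array([np.real(np.vdot(psi, O @ psi)) for O in observables])

# example on a 2-qubit Bell state
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
obs = [random_pauli(2) for _ in range(8)]
p = projection_vector(psi, obs)  # m = 8 entries, each in [-1, 1]
```

On hardware each entry of p(x) would be estimated from shots rather than computed from the statevector, which is exactly why expectation values degrade more gracefully under noise than overlap estimates.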
Step 3: Classical Kernel
Compute the kernel as a standard inner product in R^m:
K_proj(x, y) = p(x) . p(y) = sum_k <O_k>_x * <O_k>_y
This is just a linear kernel on the projected features -- compatible with any classical ML pipeline (SVM, ridge regression, etc.).
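The kernel step itself is ordinary linear algebra: stack the projection vectors as rows of a matrix and the full Gram matrix is a single product. Toy random values stand in for measured expectation values here:

```python
import numpy as np

# Toy projection vectors: 5 samples, m = 3 expectation values each, all in [-1, 1]
rng = np.random.default_rng(1)
P = rng.uniform(-1.0, 1.0, size=(5, 3))

# K_proj(x, y) = p(x) . p(y)  =>  the whole Gram matrix at once is P P^T
K = P @ P.T
```

Because K is a linear-kernel Gram matrix, it is automatically symmetric and positive semidefinite with rank at most m, so it drops into any classical kernel method (e.g. `sklearn.svm.SVC(kernel="precomputed")`).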
The Math
Projected Kernel Formula
Given m observables {O_1, ..., O_m} and IQP-encoded states:
```
p(x) = ( <phi(x)| O_1 |phi(x)>, ..., <phi(x)| O_m |phi(x)> )  in R^m

K_proj(x, y) = p(x)^T p(y) = sum_{k=1}^{m} <O_k>_x <O_k>_y
```
IQP Expressibility
The IQP circuit implements the unitary:
U_IQP(x) = H^n . exp(i sum_{j<k} x_j x_k Z_j Z_k) . exp(i sum_j x_j Z_j) . H^n
The ZZ interaction term creates entanglement that depends on the product of features, giving the encoding non-linear expressibility. The Hadamard layers convert between the X and Z bases, creating interference patterns that depend on the data.
Why Projection Helps
Full fidelity kernel on n qubits lives in a 2^n-dimensional Hilbert space. The projected kernel compresses this to m dimensions where m << 2^n. Huang et al. (2021) showed that:
- For many practical datasets, O(n) projections suffice (no exponential overhead)
- The projected kernel avoids exponential concentration that plagues full fidelity kernels
- With enough projections, the projected kernel can match or exceed full kernel performance
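The concentration effect that motivates all this is easy to see numerically. Using Haar-random states as an idealized stand-in for a highly scrambling feature map (an assumption of this sketch, not a claim about IQP circuits specifically), the average fidelity between two encoded points falls as 1/2^n:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_state(dim):
    """Haar-random pure state -- idealized stand-in for a scrambling feature map."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

mean_fidelity = {}
for n in (2, 4, 6, 8):
    dim = 2 ** n
    fids = [abs(np.vdot(random_state(dim), random_state(dim))) ** 2
            for _ in range(200)]
    mean_fidelity[n] = np.mean(fids)  # concentrates toward 1/2^n
```

As n grows, every off-diagonal kernel entry shrinks toward the same tiny value, so the fidelity kernel carries less and less usable signal per shot; the m-dimensional projected features sidestep this collapse.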
Expected Output
| Metric | Random Projections (m=8) | Basis Projections (m=8) |
|---|---|---|
| Kernel rank | 8 | 8 |
| Diagonal mean | ~2.0-4.0 | ~0.1-0.5 |
| Effective rank | >= 3 | >= 3 |
| Is PSD | True | True |
| Is symmetric | True | True |
| Top eigenvalue | ~15-30 | ~1-5 |
Note: Unlike fidelity kernels, projected kernels do NOT have unit diagonal. The diagonal value K(x,x) = ||p(x)||^2 depends on the projection norms.
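The verification columns in the table can be reproduced with a few lines of NumPy. Here effective rank is computed as the exponential of the spectral entropy, one common definition; the tutorial's `circuit.py` may use a different convention, so treat this as a sketch:

```python
import numpy as np

def verify_kernel(K, tol=1e-8):
    """Sanity checks for a projected kernel Gram matrix K = P P^T."""
    eigvals = np.linalg.eigvalsh(K)
    checks = {
        "is_symmetric": bool(np.allclose(K, K.T)),
        "is_psd": bool(eigvals.min() >= -tol),
    }
    # effective rank: exp of the entropy of the normalized eigenvalue spectrum
    p = np.clip(eigvals, 0.0, None)
    p = p[p > tol]
    p = p / p.sum()
    checks["effective_rank"] = float(np.exp(-(p * np.log(p)).sum()))
    return checks

rng = np.random.default_rng(3)
P = rng.uniform(-1, 1, size=(6, 4))   # 6 samples, m = 4 projections
K = P @ P.T
checks = verify_kernel(K)

# The diagonal is ||p(x)||^2, not 1 -- unlike a fidelity kernel
assert np.allclose(np.diag(K), (P ** 2).sum(axis=1))
```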
Running the Circuit
```python
from circuit import run_circuit

# Random Pauli projections (default)
result = run_circuit(n_samples=10, n_projections=8, projection_type="random")
print(f"Projection dimension: {result['projection_dim']}")
print(f"Kernel rank: {result['kernel_rank']}")
print(f"Top eigenvalues: {result['top_eigenvalues']}")

# Verification checks
v = result["verification"]
print(f"PSD: {v['is_psd']}, Symmetric: {v['is_symmetric']}")
print(f"Effective rank: {v['effective_rank']}")

# Basis projections for comparison
result_basis = run_circuit(n_samples=10, n_projections=8, projection_type="basis")
print(f"Basis kernel rank: {result_basis['kernel_rank']}")
```
Try It Yourself
- Projection dimension sweep: Run with n_projections = 2, 4, 8, 16, 32. Plot kernel rank vs. m. At what point does adding projections stop helping?
- Random vs. basis: Compare `projection_type="random"` and `projection_type="basis"` on the same data. Which gives higher effective rank? Which has more spread in eigenvalues?
- Qubit scaling: Increase n_qubits from 2 to 6 while keeping n_projections=8. Does the kernel quality degrade (exponential concentration) or stay stable?
- Feature dimension: Try n_features=1 vs. n_features=4. With only 1 feature, the ZZ interactions all see the same value -- how does this affect the kernel spectrum?
- Noise robustness (advanced): Add `qml.DepolarizingChannel` after each gate and compare projected vs. fidelity kernel degradation. The projected kernel should be more robust since expectation values average over noise.
What's Next
- Trainable Kernel — learn the kernel parameters via gradient descent
- Kernel Alignment — optimize the kernel to match a target alignment
- Quantum Kernel SVM — plug kernels into SVM classifiers
Applications
| Domain | How Projected Kernels Help |
|---|---|
| Large-scale QML | Measuring m expectation values per sample scales better than estimating all O(N^2) pairwise overlaps for N data points |
| Noisy hardware | Expectation values are more robust than fidelity to depolarizing noise |
| Feature analysis | Each projection dimension corresponds to a physical observable |
| Kernel bandwidth tuning | Projection count m acts as an implicit bandwidth parameter (Shaydulin & Wild 2022) |
| Classical-quantum hybrid | Projection vectors feed directly into classical ML pipelines (SVM, PCA, clustering) |
References
- Huang, H.-Y., Broughton, M., Mohseni, M. et al. "Power of data in quantum machine learning." Nature Communications 12, 2631 (2021). DOI: 10.1038/s41467-021-22539-9
- Shaydulin, R. & Wild, S.M. "Importance of kernel bandwidth in quantum machine learning." PRX Quantum 3, 030327 (2022). DOI: 10.1103/PRXQuantum.3.030327
- Bremner, M.J., Montanaro, A. & Shepherd, D.J. "Average-case complexity versus approximate simulation of commuting quantum computations." Physical Review Letters 117, 080501 (2016). DOI: 10.1103/PhysRevLett.117.080501
- Havlicek, V. et al. "Supervised learning with quantum-enhanced feature spaces." Nature 567, 209-212 (2019). DOI: 10.1038/s41586-019-0980-2