Kernel-Target Alignment Optimization
What You'll Learn:
- How centered kernel-target alignment (CKA) measures how well a kernel matches label structure
- Why centering the kernel matrix is essential to avoid trivially high alignment scores
- How to optimize quantum kernel parameters using gradient-based methods on CKA
- How parameterized encodings combine data-dependent and trainable rotations for task-specific kernels
Level: Advanced | Time: 30 minutes | Qubits: 4 | Framework: PennyLane
Prerequisites
- Quantum Kernel — kernel trick, feature maps, kernel matrices
- Variational Classifier — parameterized circuits, gradient-based optimization
- Fidelity Kernel — kernel construction via state overlap
The Idea
Most quantum kernel methods use a fixed feature map — you choose an encoding circuit, compute the kernel matrix, and hope it captures the right structure for your classification task. If it doesn't, you pick a different circuit and try again. This is trial-and-error, not engineering.
Kernel-target alignment flips the approach: instead of guessing a good kernel, you learn one. The idea is to define a score that measures how well a kernel matrix matches the ideal kernel implied by the labels, then optimize the encoding parameters to maximize that score.
The key insight from Cristianini et al. (2001): the ideal kernel for binary classification with labels y in {-1, +1} is the outer product Y = yy^T. If the learned kernel K agrees with Y — high values where same-class pairs appear, low values where different-class pairs appear — then K is well-aligned for the task. The alignment score formalizes this as a normalized inner product in the space of kernel matrices.
Centering (Cortes et al. 2012) makes the score robust: without centering, a trivial kernel K_ij = 1 for all i,j scores positive alignment with any imbalanced dataset, despite carrying no information. The centering matrix H = I - (1/n)11^T removes this mean contribution, ensuring only genuinely informative kernel structure contributes to the score.
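Concretely, the ideal kernel for a toy labeling (the label values here are illustrative) has a block structure: +1 for same-class pairs, -1 for cross-class pairs:

```python
import numpy as np

# Labels for a tiny binary dataset (illustrative values)
y = np.array([1, 1, -1, -1])

# Ideal target kernel: outer product of the label vector
Y = np.outer(y, y)
print(Y)
# [[ 1  1 -1 -1]
#  [ 1  1 -1 -1]
#  [-1 -1  1  1]
#  [-1 -1  1  1]]
```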
How It Works
Centered Kernel-Target Alignment (CKA)
- Build the kernel matrix: For each pair of data points (x_i, x_j), compute k(x_i, x_j; theta) using the parameterized quantum circuit
- Center it: Apply the centering matrix H to get K_centered = HKH
- Build the target kernel: Y = yy^T (outer product of labels), then center it the same way
- Score the alignment: CKA = Frobenius inner product of the centered matrices, normalized by their norms
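The four steps above can be sketched in a few lines of NumPy. This is a classical stand-in: any precomputed kernel matrix K works here, whether it came from a quantum circuit or not.

```python
import numpy as np

def centered_kernel_alignment(K, y):
    """Centered kernel-target alignment between kernel matrix K and labels y."""
    n = len(y)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = H @ K @ H                        # centered kernel
    Yc = H @ np.outer(y, y) @ H           # centered target kernel
    num = np.sum(Kc * Yc)                 # Frobenius inner product
    denom = np.linalg.norm(Kc) * np.linalg.norm(Yc)
    return num / denom

# Sanity check: the ideal kernel aligns perfectly with its own labels
y = np.array([1, 1, -1, -1])
print(centered_kernel_alignment(np.outer(y, y), y))  # -> 1.0
```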
Parameterized Encoding
Each layer of the encoding circuit applies three blocks:
```
     ┌────────────┐┌────────┐     ┌─────────────┐
q_0: ┤ RY(x₀·θ₀)  ├┤ RZ(θ₄) ├──■──┤ RZ(x₀x₁·θ₈) ├──────
     ├────────────┤├────────┤┌─┴─┐└─────────────┘
q_1: ┤ RY(x₁·θ₁)  ├┤ RZ(θ₅) ├┤ X ├──────────────────■──
     ├────────────┤├────────┤└───┘                ┌─┴─┐
q_2: ┤ RY(x₀·θ₂)  ├┤ RZ(θ₆) ├──■──────────────────┤ X ├
     ├────────────┤├────────┤┌─┴─┐                └───┘
q_3: ┤ RY(x₁·θ₃)  ├┤ RZ(θ₇) ├┤ X ├─────────────────────
     └────────────┘└────────┘└───┘
```
- Block 1 (data-dependent RY): Encodes features with trainable scaling, rotation angle x_i * theta_k
- Block 2 (trainable RZ): Pure variational freedom, no data dependence
- Block 3 (entangling): CNOT + RZ with product features, rotation angle x_i * x_j * theta_k
Parameters per layer: n_qubits + n_qubits + (n_qubits - 1) = 3n - 1. For 4 qubits and 2 layers: 22 parameters.
Optimization Loop
The loss function is -CKA (negative because we minimize), optimized with Adam:
L(theta) = -CKA(K_theta, Y)
PennyLane's autograd traces through the kernel matrix construction and CKA computation, providing gradients via the parameter-shift rule for the quantum circuit components.
The Math
Centering Matrix
The centering matrix H removes the mean from the kernel in feature space:
H = I_n - (1/n) 1_n 1_n^T
where 1_n is the n-dimensional all-ones vector. The centered kernel:
K_centered = H K H
This is equivalent to centering the feature vectors in the reproducing kernel Hilbert space (RKHS): K_centered_ij = <phi(x_i) - mu, phi(x_j) - mu> where mu = (1/n) sum_i phi(x_i).
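This equivalence is easy to verify numerically with explicit feature vectors (the random features below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
Phi = rng.normal(size=(5, 3))        # explicit feature vectors phi(x_i)
K = Phi @ Phi.T                      # kernel of pairwise inner products

n = Phi.shape[0]
H = np.eye(n) - np.ones((n, n)) / n

# Centering the kernel matrix...
K_centered = H @ K @ H
# ...matches subtracting the mean feature vector before taking inner products
Phi_centered = Phi - Phi.mean(axis=0)
print(np.allclose(K_centered, Phi_centered @ Phi_centered.T))  # True
```

This works because HKH = (H Phi)(H Phi)^T, and multiplying by H subtracts the column mean from each feature vector.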
CKA Formula
CKA(K, Y) = <HKH, HYH>_F / (||HKH||_F * ||HYH||_F)
where:
- Y = yy^T is the target kernel (outer product of labels)
- <A, B>_F = sum_ij A_ij * B_ij is the Frobenius inner product
- ||A||_F = sqrt(<A, A>_F) is the Frobenius norm
CKA ranges from 0 to 1 when K is positive semidefinite, as fidelity kernels are (the general range is -1 to 1). A value of 1 means perfect alignment: the kernel perfectly reproduces the label structure. A value near 0 means the kernel is uninformative for the task.
Why Centering Matters
Without centering, standard KTA is:
KTA(K, Y) = <K, Y>_F / (||K||_F * ||Y||_F)
A constant kernel K_ij = c for all i,j gives KTA > 0 whenever labels are imbalanced, since <K, Y>_F = c (sum_i y_i)^2, even though the kernel carries no information about the task. Centering eliminates this artifact because HKH = 0 for any constant matrix K.
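A quick numerical check of this artifact (the label counts here are illustrative):

```python
import numpy as np

n = 6
K = np.ones((n, n))                    # constant kernel, carries no information
y = np.array([1, 1, 1, 1, -1, -1])     # imbalanced labels (illustrative)
Y = np.outer(y, y)

# Uncentered KTA is fooled: <K, Y>_F = (sum_i y_i)^2 = 4 here
kta = np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))
print(round(kta, 3))  # 0.111

# Centering sends any constant matrix to (numerically) zero
H = np.eye(n) - np.ones((n, n)) / n
print(np.linalg.norm(H @ K @ H) < 1e-12)  # True
```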
Expected Output
| Metric | Value |
|---|---|
| Initial CKA (random params) | ~0.0 -- 0.2 |
| Final CKA (optimized) | ~0.5 -- 0.8 |
| Improvement | > 0 (always) |
| Parameters (4 qubits, 2 layers) | 22 |
| Framework | PennyLane |
Running the Circuit
```python
from circuit import run_circuit, verify_alignment

# Run optimization
result = run_circuit(n_samples=10, n_epochs=15)
print(f"Initial alignment: {result['initial_alignment']:.4f}")
print(f"Final alignment:   {result['final_alignment']:.4f}")
print(f"Improvement:       {result['improvement']:.4f}")
print(f"Parameters:        {result['n_params']}")

# Run verification suite
v = verify_alignment()
for check in v["checks"]:
    status = "PASS" if check["passed"] else "FAIL"
    print(f"[{status}] {check['name']}: {check['detail']}")
```
Try It Yourself
- More layers: Increase `n_layers` from 2 to 3. Does the final CKA improve? Does it take longer to converge? Plot the alignment history to see.
- Larger dataset: Increase `n_samples` from 10 to 20. The kernel matrix becomes 20x20 — how does training time scale? (Hint: quadratically in n_samples.)
- Different data: Replace the XOR pattern with concentric circles or moon-shaped data from sklearn. Which patterns benefit most from alignment optimization?
- Learning rate sweep: Try `lr` values from 0.01 to 0.5. Too high and CKA oscillates; too low and it barely moves. Find the sweet spot.
- Compare with fixed kernel: Run the Quantum Kernel circuit on the same XOR data without optimization. How does its classification accuracy compare to the alignment-optimized kernel?
What's Next
- Trainable Kernel — Alternative approach: optimize kernel parameters for classification accuracy directly
- Quantum Kernel SVM — Use the optimized kernel for SVM classification
- Projected Quantum Kernel — Classical post-processing of quantum measurements for efficient kernels
Applications
| Domain | Use case |
|---|---|
| Dataset-specific kernels | Learn optimal feature space for a given classification task |
| Automated feature engineering | Replace manual feature map selection with optimization |
| Transfer learning | Pre-train kernel parameters on one dataset, fine-tune on another |
| Kernel benchmarking | CKA provides a principled comparison metric between different kernels |
| Quantum advantage analysis | Compare optimized quantum CKA vs. optimized classical CKA |
References
- Cristianini, N., Shawe-Taylor, J., Elisseeff, A. & Kandola, J. (2001). "On Kernel-Target Alignment." NeurIPS 14.
- Cortes, C., Mohri, M. & Rostamizadeh, A. (2012). "Algorithms for Learning Kernels Based on Centered Alignment." Journal of Machine Learning Research 13, 795-828.
- Hubregtsen, T. et al. (2022). "Training quantum embedding kernels on near-term quantum computers." Physical Review A 106, 042431. DOI: 10.1103/PhysRevA.106.042431