Variational Quantum Classifier
What You'll Learn:
- How to encode classical data into quantum states using rotation gates
- How variational layers create trainable quantum transformations
- How to train a quantum classifier with classical optimization
- The encode → process → measure paradigm of quantum machine learning
Level: Intermediate | Time: 25 minutes | Qubits: 2 | Framework: Qiskit
Prerequisites
- Bell State — CX entanglement basics
- H2 Ground State — variational optimization
The Idea
A neural network takes input data, passes it through trainable layers, and outputs a prediction. A variational quantum classifier does the same thing — on a quantum computer.
The key differences: (1) the "input layer" encodes data as qubit rotations instead of neuron activations, (2) the "hidden layers" are parameterized quantum gates instead of weight matrices, and (3) the "output" is a measurement probability instead of a softmax.
For binary classification (is this email spam or not?), we encode 2 features into 2 qubits, apply trainable layers, and measure qubit 0: if P(|1⟩) > 0.5, predict class 1.
Why quantum? For small problems like this, classical classifiers are faster. The potential advantage comes at scale: quantum feature spaces grow exponentially with qubits, potentially capturing patterns that classical models miss.
How It Works
The Architecture
```
     ┌──────────┐ ┌──────────┐      ┌──────────┐ ┌──────────┐      ┌──────────┐
q_0: ┤ RY(x₀·π)├─┤ RY(θ₀)  ├──■──┤ RZ(θ₂)  ├─┤ RY(θ₄)  ├──■──┤ RZ(θ₆)  ├─ M
     ├──────────┤ ├──────────┤┌─┴─┐├──────────┤ ├──────────┤┌─┴─┐├──────────┤
q_1: ┤ RY(x₁·π)├─┤ RY(θ₁)  ├┤ X ├┤ RZ(θ₃)  ├─┤ RY(θ₅)  ├┤ X ├┤ RZ(θ₇)  ├
     └──────────┘ └──────────┘└───┘└──────────┘ └──────────┘└───┘└──────────┘
     |─ Encode ─| |──────── Layer 1 ──────────| |──────── Layer 2 ──────────|
```
Step 1: Feature Encoding
Each feature x_i ∈ [-1, 1] is mapped to a rotation angle:
```python
qc.ry(x[0] * np.pi, 0)  # x=0 → |0⟩, x=1 → |1⟩, x=0.5 → superposition
qc.ry(x[1] * np.pi, 1)
```
This places the data point on the Bloch sphere — different inputs produce different quantum states.
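The encoding can be checked without any hardware. Below is a minimal NumPy sketch of the same map; the helper names `ry` and `encode` are illustrative, not part of the tutorial's `circuit` module:

```python
import numpy as np

def ry(theta):
    """RY rotation matrix: RY(θ)|0⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def encode(x):
    """Map features x₀, x₁ ∈ [-1, 1] to the state RY(x₁π) ⊗ RY(x₀π)|00⟩."""
    ket0 = np.array([1.0, 0.0])
    return np.kron(ry(x[1] * np.pi) @ ket0, ry(x[0] * np.pi) @ ket0)

print(np.round(np.abs(encode([0.0, 0.0])) ** 2, 3))  # x = (0,0) stays in |00⟩
print(np.round(np.abs(encode([0.5, 0.5])) ** 2, 3))  # equal superposition over all four basis states
```

Different inputs land on different points of the Bloch sphere, which is all the "input layer" does.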
Step 2: Variational Layers
Each layer has 3 components:
- RY rotations: Mix amplitudes (like weights in a neural network)
- CX gate: Entangle qubits (without it, each qubit would process its feature independently)
- RZ rotations: Adjust phases (adds expressibility)
With 2 layers and 4 parameters each: 8 trainable parameters.
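One layer can be written out as a small matrix sketch in plain NumPy (no Qiskit needed). The CX matrix below assumes Qiskit's little-endian qubit ordering, and `layer` is an illustrative name:

```python
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

# CX with control qubit 0, target qubit 1 (little-endian basis order: 00, 01, 10, 11)
CX = np.array([[1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0],
               [0, 1, 0, 0]], dtype=complex)

def layer(theta):
    """One variational layer: RY rotations, then CX, then RZ rotations."""
    rotations = np.kron(ry(theta[1]), ry(theta[0]))  # qubit 1 ⊗ qubit 0
    phases = np.kron(rz(theta[3]), rz(theta[2]))
    return phases @ CX @ rotations

U = layer([0.1, 0.2, 0.3, 0.4])
print(np.allclose(U.conj().T @ U, np.eye(4)))  # → True: the layer is unitary
```

With all four parameters at zero, the layer reduces to a bare CX, which is a useful sanity check.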
Step 3: Measurement
Measure qubit 0: P(|1⟩) = probability of class 1. If P(|1⟩) > 0.5, predict class 1.
```python
from circuit import predict, train_classifier

# Single prediction
result = predict([0.5, 0.5], theta=[0.1] * 8)
print(f"Class: {result['prediction']}, Confidence: {result['confidence']:.1%}")

# Train on dataset
trained = train_classifier(max_iterations=80, seed=42)
print(f"Accuracy: {trained['accuracy']:.1%}")
```
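Under the hood, reading P(|1⟩) for qubit 0 off a 2-qubit statevector is just a sum of two basis-state probabilities. A NumPy sketch (assuming Qiskit's little-endian bit ordering; `prob_class1` is an illustrative helper, not part of the `circuit` module):

```python
import numpy as np

def prob_class1(state):
    """P(qubit 0 = |1⟩): total weight on basis states |01⟩ and |11⟩ (little-endian)."""
    probs = np.abs(np.asarray(state)) ** 2
    return probs[1] + probs[3]

state = np.array([0.6, 0.8, 0.0, 0.0])  # 0.6|00⟩ + 0.8|01⟩
p1 = prob_class1(state)
print(round(p1, 2), "→ class", int(p1 > 0.5))  # 0.64 → class 1
```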
The Math
Encoding Layer
The feature map F(x) prepares state:
|ψ(x)⟩ = RY(x₁π) ⊗ RY(x₀π) |00⟩
Each RY gate acts on |0⟩ as RY(θ)|0⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩, so each feature is encoded as a rotation angle on its qubit's Bloch sphere.
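As a concrete instance of this formula, the point x = (0.5, 0) encodes as:

```latex
\mathrm{RY}(0.5\pi)\,|0\rangle
  = \cos\tfrac{\pi}{4}\,|0\rangle + \sin\tfrac{\pi}{4}\,|1\rangle
  = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle + |1\rangle\bigr),
\qquad
|\psi(x)\rangle
  = \mathrm{RY}(0) \otimes \mathrm{RY}(0.5\pi)\,|00\rangle
  = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |01\rangle\bigr).
```

Qubit 0 (the rightmost bit) ends up in an equal superposition while qubit 1 stays in |0⟩.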
Variational Layer
Each layer applies:
V(θ) = (RZ(θ₃) ⊗ RZ(θ₂)) · CX · (RY(θ₁) ⊗ RY(θ₀))
The full classifier maps input x to class probability:
P(class=1 | x, θ) = ⟨ψ(x, θ)| Π₁ |ψ(x, θ)⟩,  where |ψ(x, θ)⟩ = V_L(θ_L) ··· V_1(θ_1) F(x) |00⟩
and Π₁ = I ⊗ |1⟩⟨1| is the projector onto qubit 0 being |1⟩ (the outcome on qubit 1 is summed over).
Training
Minimize the loss over dataset {(x_i, y_i)}:
L(θ) = 1 - (1/N) Σᵢ [yᵢ = ŷᵢ(θ)]
where ŷᵢ(θ) is the prediction for input xᵢ and [·] is the Iverson bracket (1 when the prediction matches the label, 0 otherwise), so minimizing L(θ) maximizes training accuracy. COBYLA is used because this loss is non-differentiable and COBYLA needs no gradients.
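The whole encode → process → measure → optimize loop fits in a short self-contained sketch: a plain NumPy statevector simulation of the circuit above, trained with SciPy's COBYLA on a toy linearly separable dataset. Everything here (the dataset, `classify`, `loss`) is illustrative, not the tutorial's `circuit` module:

```python
import numpy as np
from scipy.optimize import minimize

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

# CX, control qubit 0 → target qubit 1 (little-endian basis order: 00, 01, 10, 11)
CX = np.array([[1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0],
               [0, 1, 0, 0]], dtype=complex)

def classify(x, theta):
    """Encode → two variational layers → P(qubit 0 = |1⟩)."""
    state = np.kron(ry(x[1] * np.pi), ry(x[0] * np.pi)) @ np.array([1, 0, 0, 0], dtype=complex)
    for k in (0, 4):                                   # two layers, 4 parameters each
        state = np.kron(ry(theta[k + 1]), ry(theta[k])) @ state
        state = CX @ state
        state = np.kron(rz(theta[k + 3]), rz(theta[k + 2])) @ state
    probs = np.abs(state) ** 2
    return probs[1] + probs[3]                         # weight on states where qubit 0 is |1⟩

def loss(theta, X, y):
    preds = np.array([int(classify(x, theta) > 0.5) for x in X])
    return 1.0 - np.mean(preds == y)

# Toy dataset: the sign of the features decides the class
X = np.array([[0.8, 0.8], [0.6, 0.9], [0.9, 0.5],
              [-0.8, -0.8], [-0.6, -0.9], [-0.9, -0.5]])
y = np.array([1, 1, 1, 0, 0, 0])

rng = np.random.default_rng(42)
res = minimize(loss, rng.uniform(0, np.pi, 8), args=(X, y),
               method="COBYLA", options={"maxiter": 80})
print(f"training accuracy: {1.0 - res.fun:.0%}")
```

Note that because the loss is piecewise constant, runs with different seeds can converge to different accuracies, which matches the spread in the table below.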
Expected Output
| Metric | Value |
|---|---|
| Random baseline accuracy | ~50% |
| Trained accuracy (6 samples) | >66% (typically 83-100%) |
| Parameters | 8 (2 layers × 4) |
| Training iterations | ~50-80 |
Running the Circuit
```python
from circuit import run_circuit, train_classifier, verify_classifier, evaluate_dataset

# Single prediction
result = run_circuit(x=[0.8, -0.2])
print(f"Prediction: class {result['prediction']}")

# Train
trained = train_classifier(max_iterations=80, shots=512, seed=42)
print(f"Accuracy: {trained['accuracy']:.1%}")

# Verify
v = verify_classifier()
for check in v["checks"]:
    print(f"[{'PASS' if check['passed'] else 'FAIL'}] {check['name']}")
```
Try It Yourself
- Add more layers: Train with `n_layers=3` (12 parameters). Does accuracy improve? Is training slower?
- Try XOR data: Use `X=[[1,1],[-1,-1],[1,-1],[-1,1]]`, `y=[0,0,1,1]`. Can the classifier learn XOR? (Hint: it needs entanglement.)
- More data: Generate 20 random samples from two Gaussian clusters. Does the classifier generalize?
- Remove entanglement: Replace CX with identity. Train again. How much accuracy is lost?
- Feature map comparison: Replace RY encoding with RZ encoding. Does the classifier still learn?
What's Next
- Quantum Kernel — Kernel methods instead of variational classification
- Data Re-uploading — Universal classification on a single qubit
- Amplitude Encoding — Exponential data compression
Applications
| Domain | Use case |
|---|---|
| Drug discovery | Molecular property classification |
| Finance | Credit risk assessment, fraud detection |
| Materials | Phase classification in quantum materials |
| HEP | Particle identification in detector data |
References
- Schuld, M. et al. (2020). "Circuit-centric quantum classifiers." Physical Review A 101, 032308. DOI: 10.1103/PhysRevA.101.032308
- Havlicek, V. et al. (2019). "Supervised learning with quantum-enhanced feature spaces." Nature 567, 209-212. DOI: 10.1038/s41586-019-0980-2
- Perez-Salinas, A. et al. (2020). "Data re-uploading for a universal quantum classifier." Quantum 4, 226. DOI: 10.22331/q-2020-02-06-226