Quantum Circuit Learning - Mitarai et al. 2018
Overview
Faithful reproduction of the Quantum Circuit Learning (QCL) paper -- one of the foundational works in variational quantum machine learning. This paper demonstrated that parameterized quantum circuits can be trained as universal function approximators using exact gradients computed via the parameter-shift rule.
| Property | Value |
|---|---|
| Category | Research Reproduction |
| Difficulty | Advanced |
| Framework | PennyLane |
| Qubits | 4 (configurable, 3-6) |
| Gates | RY, RZ, CNOT |
| Depth | O(L) variational layers |
| Paper | Physical Review A 98, 032309 (2018) |
| DOI | 10.1103/PhysRevA.98.032309 |
| arXiv | 1803.00745 |
Paper Summary
Mitarai, Negoro, Kitagawa, and Fujii introduced a framework for training parameterized quantum circuits on classical data using a hybrid classical-quantum optimization loop. Their key contributions:
- Parameter-shift rule for exact gradients -- Analytic gradient computation without finite-difference approximation, making training hardware-compatible (Eq. 4-6 of the paper).
- Universal function approximation -- Proof that quantum circuits with sufficient depth and entanglement can approximate any continuous function, analogous to the classical universal approximation theorem.
- Practical training framework -- End-to-end demonstration of data encoding, variational optimization, and measurement-based prediction for both classification and regression tasks.
- Arctan encoding -- A nonlinear feature map RY(arctan(x)) that compresses the real line into a bounded rotation angle, preserving ordering while enabling gradient flow.
Theoretical Background
Why Quantum Circuits Can Learn
A parameterized quantum circuit U(x, theta) maps classical input x and trainable parameters theta to a quantum state. The expectation value f(x) = <0|U^dag Z U|0> defines a function class whose expressivity depends on:
- Number of qubits n -- determines the Hilbert space dimension (2^n)
- Circuit depth L -- controls the number of trainable parameters
- Entanglement topology -- enables correlations between qubits
The paper proves that this function class is dense in the space of continuous functions on compact domains, establishing quantum circuits as universal approximators.
The Parameter-Shift Rule
For a gate with parameter theta, the exact gradient of the expectation value is:
d<O>/dtheta = ( <O>_{theta + pi/2} - <O>_{theta - pi/2} ) / 2
This elegant formula:
- Requires only two circuit evaluations per parameter
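- Yields analytically exact gradients with no finite-difference truncation error (on hardware, each expectation value is still estimated from a finite number of shots)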
- Works on real quantum hardware (no need for backpropagation through the device)
- Enables use of classical optimizers (SGD, Adam, etc.) in the outer loop
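As a quick numerical check of the rule above, here is a minimal sketch for a single qubit prepared as RY(theta)|0>, where <Z> = cos(theta) and the true derivative is -sin(theta). The helper names (`expval_z`, `parameter_shift`, `finite_difference`) are illustrative and not part of this repository.

```python
import math

def expval_z(theta: float) -> float:
    """<Z> for the state RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    amp0 = math.cos(theta / 2)
    amp1 = math.sin(theta / 2)
    return amp0 ** 2 - amp1 ** 2  # equals cos(theta)

def parameter_shift(f, theta: float) -> float:
    """Gradient from two evaluations shifted by +/- pi/2."""
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

def finite_difference(f, theta: float, eps: float = 1e-3) -> float:
    """Forward finite difference, shown only for comparison."""
    return (f(theta + eps) - f(theta)) / eps

theta = 0.7
exact = -math.sin(theta)
ps = parameter_shift(expval_z, theta)
fd = finite_difference(expval_z, theta)

print(f"exact derivative : {exact:.12f}")
print(f"parameter shift  : {ps:.12f}  (error {abs(ps - exact):.2e})")
print(f"finite difference: {fd:.12f}  (error {abs(fd - exact):.2e})")
```

The parameter-shift value agrees with the analytic derivative up to floating-point rounding, while the finite-difference estimate carries a truncation error that depends on the step size.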
The QCL Architecture
Stage 1: Data Encoding
Classical features are mapped to qubit rotations via arctan encoding:
|0_i> --> RY(arctan(x_i)) for qubit i
This maps x in (-inf, +inf) to rotation angles in (-pi/2, +pi/2). For function approximation, multi-frequency encoding is used instead: RY(x * (i+1)) for qubit i, creating a Fourier-like basis.
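The arctan encoding described above can be illustrated with a small sketch. It assumes a single qubit starting in |0> and uses a hypothetical helper `arctan_angle`; it only checks that the angles stay bounded and ordered.

```python
import math

def arctan_angle(x: float) -> float:
    """Rotation angle used by RY(arctan(x)); always lies in (-pi/2, pi/2)."""
    return math.atan(x)

def expval_z_after_encoding(x: float) -> float:
    """<Z> of RY(arctan(x))|0>, which is cos(arctan(x)) = 1 / sqrt(1 + x^2)."""
    return math.cos(arctan_angle(x))

samples = [-1000.0, -1.0, 0.0, 1.0, 1000.0]
angles = [arctan_angle(x) for x in samples]

for x, angle in zip(samples, angles):
    print(f"x = {x:8.1f}  angle = {angle:+.4f}  <Z> = {expval_z_after_encoding(x):.4f}")
```

Note that <Z> of the encoded qubit alone is an even function of x. The sign of x lives in the amplitude sin(angle / 2) and only becomes visible in <Z_0> after the subsequent variational rotations.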
Stage 2: Variational Layers
Each of L layers applies single-qubit rotations followed by entangling gates:
```
Layer l:
|psi> --> RY(theta_{l,0}) RZ(phi_{l,0}) --*-----------
          RY(theta_{l,1}) RZ(phi_{l,1}) --X--*--------
          RY(theta_{l,2}) RZ(phi_{l,2}) -----X--*-----
          RY(theta_{l,3}) RZ(phi_{l,3}) --------X--*--
                                                   |
                                           (wraps to qubit 0)
```
The ring-topology CNOT cascade imposes periodic boundary conditions: the last qubit couples back to qubit 0, so no qubit sits at the end of a chain and entanglement spreads across the whole register.
Stage 3: Measurement
Output = <Z_0> (Pauli-Z expectation on qubit 0)
- Classification: sign(<Z_0>) determines predicted label
- Regression: raw <Z_0> value approximates the target function
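The three stages above can be sketched end to end with a tiny pure-Python statevector simulation. This is an illustrative sketch, not the repository's PennyLane implementation: the function names (`apply_1q`, `apply_cnot`, `qcl_forward`), the bit ordering (qubit 0 as the most significant bit), and the random parameter values are all assumptions made for this example.

```python
import cmath
import math
import random

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return ((c, -s), (s, c))

def rz(t):
    return ((cmath.exp(-0.5j * t), 0), (0, cmath.exp(0.5j * t)))

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (qubit 0 is the most significant bit)."""
    mask = 1 << (n - 1 - q)
    out = list(state)
    for i in range(len(state)):
        if (i & mask) == 0:
            j = i | mask
            out[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            out[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return out

def apply_cnot(state, control, target, n):
    """Swap amplitudes of target=0 and target=1 wherever control=1."""
    cm, tm = 1 << (n - 1 - control), 1 << (n - 1 - target)
    out = list(state)
    for i in range(len(state)):
        if (i & cm) and not (i & tm):
            j = i | tm
            out[i], out[j] = state[j], state[i]
    return out

def qcl_forward(x, thetas, phis):
    """Arctan encoding, then RY/RZ + ring-CNOT layers, then <Z_0>."""
    n = len(x)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for q in range(n):                       # Stage 1: data encoding
        state = apply_1q(state, ry(math.atan(x[q])), q, n)
    for layer_t, layer_p in zip(thetas, phis):   # Stage 2: variational layers
        for q in range(n):
            state = apply_1q(state, ry(layer_t[q]), q, n)
            state = apply_1q(state, rz(layer_p[q]), q, n)
        for q in range(n):
            state = apply_cnot(state, q, (q + 1) % n, n)
    mask0 = 1 << (n - 1)                     # Stage 3: <Z> on qubit 0
    return sum((-1 if (i & mask0) else 1) * abs(a) ** 2 for i, a in enumerate(state))

rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
n_qubits, n_layers = 4, 2
thetas = [[rng.uniform(-math.pi, math.pi) for _ in range(n_qubits)] for _ in range(n_layers)]
phis = [[rng.uniform(-math.pi, math.pi) for _ in range(n_qubits)] for _ in range(n_layers)]

output = qcl_forward([0.5, -1.2, 0.3, 2.0], thetas, phis)
label = 1 if output >= 0 else -1
print(f"<Z_0> = {output:+.4f}, predicted label = {label:+d}")
```

For regression the raw `output` would be compared with the target value. For classification only its sign is used, as in the last two lines.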
Full Circuit Diagram (4 qubits, 2 layers)
```
         Encoding      Layer 1                     Layer 2
q_0: |0>-RY(atan(x0))--RY(t00)--RZ(p00)--*---------RY(t10)--RZ(p10)--*---------<Z>
q_1: |0>-RY(atan(x1))--RY(t01)--RZ(p01)--X--*------RY(t11)--RZ(p11)--X--*------
q_2: |0>-RY(atan(x2))--RY(t02)--RZ(p02)-----X--*---RY(t12)--RZ(p12)-----X--*---
q_3: |0>-RY(atan(x3))--RY(t03)--RZ(p03)--------X-*-RY(t13)--RZ(p13)--------X-*-
                                                 |                           |
                                           (ring to q0)                (ring to q0)
```
Running the Circuit
Classification Task (Paper Section IV.A)
```python
from circuit import run_circuit

result = run_circuit(
    task='classification',
    n_qubits=4,
    n_layers=2,
    n_train=30,
    n_epochs=50
)
print(f"Accuracy: {result['classification']['train_accuracy']:.1%}")
print(f"Loss: {result['classification']['initial_loss']:.4f} -> "
      f"{result['classification']['final_loss']:.4f}")
```
Function Approximation (Paper Section IV.B)
```python
from circuit import run_circuit

result = run_circuit(
    task='function',
    n_qubits=4,
    n_layers=3,
    n_train=30,
    n_epochs=100
)
print(f"MSE: {result['function_approximation']['mse']:.4f}")
print(f"Target: {result['function_approximation']['target_function']}")
```
Reproduction Verification
```python
from circuit import run_circuit

result = run_circuit(task='classification', n_qubits=4, n_layers=2)
verification = result['verification']
print(verification['summary'])
# Report each verification check with a PASS/FAIL status
for check in verification['checks']:
    status = 'PASS' if check['passed'] else 'FAIL'
    print(f"  [{status}] {check['name']}: {check['message']}")
```
Expected Results
Classification (Synthetic Data, seed=42)
| Layers | Qubits | Parameters | Accuracy | Notes |
|---|---|---|---|---|
| 1 | 4 | 8 | ~70% | Underfitting -- insufficient expressivity |
| 2 | 4 | 16 | ~85% | Good accuracy on linearly separable data |
| 3 | 4 | 24 | ~95% | Near-optimal for this dataset |
| 2 | 6 | 24 | ~90% | More qubits help with higher-dim features |
Function Approximation (sin(x), domain [-pi, pi])
| Layers | Qubits | MSE | Notes |
|---|---|---|---|
| 1 | 4 | ~0.3 | Too few parameters for good fit |
| 2 | 4 | ~0.1 | Reasonable approximation |
| 3 | 4 | ~0.02 | Good fit (matches paper's reported quality) |
Convergence Behavior
Training loss should decrease monotonically in early epochs, then plateau. With the default learning rate of 0.1 and gradient descent:
- Classification typically converges in 30-50 epochs
- Function approximation may need 100+ epochs for tight fit
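The convergence behavior described above can be illustrated on a toy problem. This sketch assumes a one-parameter model f(theta) = <Z> of RY(theta)|0> = cos(theta), a squared-error loss against an arbitrary target of -0.5, and plain gradient descent with the learning rate of 0.1 quoted above. The target, starting point, and names are chosen for illustration only.

```python
import math

def model(theta: float) -> float:
    """<Z> for RY(theta)|0>."""
    return math.cos(theta)

def shift_grad(theta: float) -> float:
    """Parameter-shift gradient of the model output."""
    return (model(theta + math.pi / 2) - model(theta - math.pi / 2)) / 2

target = -0.5          # arbitrary regression target (assumption)
theta = 1.0            # arbitrary starting parameter (assumption)
learning_rate = 0.1
losses = []

for epoch in range(100):
    pred = model(theta)
    losses.append((pred - target) ** 2)
    # Chain rule: dLoss/dtheta = 2 * (pred - target) * dpred/dtheta
    grad = 2 * (pred - target) * shift_grad(theta)
    theta -= learning_rate * grad

print(f"initial loss: {losses[0]:.4f}")
print(f"final loss:   {losses[-1]:.2e}")
print(f"final theta:  {theta:.4f}")
```

On this toy problem the loss falls steadily and then flattens near zero. Multi-parameter circuits trained on noisy data will generally converge less cleanly.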
Parameter-Shift Rule: Detailed Derivation
The parameter-shift rule exploits the structure of rotation gates. For a gate R(theta) = exp(-i * theta * G / 2) where G has eigenvalues +/-1:
<O(theta)> = A * cos(theta) + B * sin(theta) + C
Evaluating at theta +/- pi/2:
```
<O(theta + pi/2)> = -A * sin(theta) + B * cos(theta) + C
<O(theta - pi/2)> =  A * sin(theta) - B * cos(theta) + C
```
Subtracting and dividing by 2:
```
d<O>/dtheta = -A * sin(theta) + B * cos(theta)
            = ( <O(theta + pi/2)> - <O(theta - pi/2)> ) / 2
```
This works for all standard rotation gates (RX, RY, RZ) and generalizes to multi-parameter gates. The PennyLane framework implements this automatically via qml.gradients.param_shift.
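The sinusoidal form <O(theta)> = A cos(theta) + B sin(theta) + C used in this derivation follows from G^2 = I. A short worked derivation is sketched below. It assumes O stands for the observable as seen at the gate (that is, already conjugated by whatever part of the circuit follows R(theta)) and that expectation values are taken in the state entering the gate.

```latex
\begin{aligned}
R(\theta) &= e^{-i\theta G/2}
           = \cos\tfrac{\theta}{2}\, I - i \sin\tfrac{\theta}{2}\, G
           \qquad (\text{since } G^2 = I), \\[4pt]
R^\dagger(\theta)\, O\, R(\theta)
  &= \cos^2\tfrac{\theta}{2}\, O
   + \sin^2\tfrac{\theta}{2}\, G O G
   + i \sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}\, [G, O] \\[4pt]
  &= \tfrac{1}{2}\,(O - G O G)\cos\theta
   + \tfrac{i}{2}\,[G, O]\sin\theta
   + \tfrac{1}{2}\,(O + G O G), \\[4pt]
A &= \tfrac{1}{2}\,\langle O - G O G \rangle, \qquad
B = \tfrac{i}{2}\,\langle [G, O] \rangle, \qquad
C = \tfrac{1}{2}\,\langle O + G O G \rangle .
\end{aligned}
```

As a sanity check, for G = Y, O = Z and input state |0>, this gives A = 1 and B = C = 0, recovering <Z> = cos(theta) for RY(theta)|0>.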
Universal Approximation Argument
The paper establishes universality via three observations:
- Encoding: Arctan encoding maps R^n into the bounded hypercube (-pi/2, pi/2)^n of rotation angles, creating a continuous injection from feature space to Hilbert space.
- Variational layers: The alternating rotation + entanglement structure generates a dense subset of SU(2^n) as the number of layers L increases.
- Measurement: The expectation value <Z_0> is a continuous function of the quantum state.
By composition, each map x -> <Z_0>(U(x, theta)|0>) is continuous, and the family of such maps obtained by varying theta is dense in the space of continuous functions on compact subsets of R^n, for sufficiently large L.
Comparison with Classical Models
| Aspect | QCL | Neural Network |
|---|---|---|
| Parameters | O(nL) -- linear in qubits and layers | O(n^2 L) -- quadratic in layer width |
| Gradient | Analytic via parameter-shift (two circuit evaluations per parameter) | Analytic via backpropagation (one backward pass for all parameters) |
| Expressivity | Universal (proven) | Universal (proven) |
| Training difficulty | Barren plateaus risk at large n | Vanishing gradients at large depth |
| Hardware | Runs on quantum processors | Runs on classical hardware |
| Advantage regime | Potentially exponential in specific cases | Well-understood scaling |
Implementation Notes
Encoding Strategy Choice
The paper presents two encoding strategies:
```python
# Classification: arctan encoding (bounded, order-preserving)
RY(arctan(x_i))   # maps R -> (-pi/2, pi/2)

# Function approximation: frequency encoding (Fourier basis)
RY(x * (i+1))     # qubit i encodes frequency (i+1)
```
The frequency encoding is related to the quantum Fourier features framework developed in subsequent work (Schuld et al., 2021).
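To make the Fourier-like character of the frequency encoding concrete, the sketch below computes the single-qubit expectation values right after encoding, before any variational layer. It assumes each qubit starts in |0>; `frequency_features` is a hypothetical helper, not a function from this repository.

```python
import math

def frequency_features(x: float, n_qubits: int) -> list:
    """<Z_i> after RY(x * (i+1))|0> on qubit i, which equals cos((i+1) * x)."""
    features = []
    for i in range(n_qubits):
        angle = x * (i + 1)
        amp0 = math.cos(angle / 2)   # amplitude of |0>
        amp1 = math.sin(angle / 2)   # amplitude of |1>
        features.append(amp0 ** 2 - amp1 ** 2)
    return features

x = math.pi / 3
features = frequency_features(x, n_qubits=4)
for i, value in enumerate(features):
    print(f"qubit {i}: frequency {i + 1}, <Z> = {value:+.4f}")
```

Each qubit contributes a cosine at a distinct integer frequency, so the trainable layers act on a small set of harmonics of the input.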
Ring vs. Linear Entanglement
```python
# Ring topology (classification) -- periodic boundary conditions
for i in range(n_qubits):
    CNOT(i, (i+1) % n_qubits)

# Linear chain (function approximation) -- open boundary conditions
for i in range(n_qubits - 1):
    CNOT(i, i+1)
```
Ring topology spreads entanglement faster (diameter n/2 vs. n-1) but both achieve universal expressivity with sufficient layers.
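The diameter figures quoted above can be checked with a short breadth-first search. The sketch treats each CNOT as an undirected edge in a coupling graph, which is a coarse proxy for how far entanglement can spread per layer. The function names are illustrative.

```python
from collections import deque

def ring_edges(n: int) -> list:
    return [(i, (i + 1) % n) for i in range(n)]

def chain_edges(n: int) -> list:
    return [(i, i + 1) for i in range(n - 1)]

def diameter(n: int, edges: list) -> int:
    """Longest shortest-path distance between any two qubits."""
    adjacency = {i: set() for i in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    best = 0
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

for n in (4, 6):
    print(f"n = {n}: ring diameter = {diameter(n, ring_edges(n))}, "
          f"chain diameter = {diameter(n, chain_edges(n))}")
```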
Known Limitations
- Barren plateaus: Random initialization of deep circuits can lead to exponentially vanishing gradients (McClean et al., 2018). This reproduction mitigates this by using shallow circuits (L=2-3) on few qubits (n=3-4).
- Training data: The paper uses synthetic data for proof-of-concept. Real-world classification tasks may require additional techniques (data preprocessing, output scaling, regularization).
- Simulator only: This reproduction runs on PennyLane's default.qubit simulator. Hardware execution would introduce noise effects not modeled here.
Historical Impact
This paper (published September 2018) was one of the earliest works to:
- Formalize the parameter-shift rule for gradient computation on quantum circuits
- Prove that parameterized quantum circuits are universal function approximators
- Demonstrate end-to-end training of quantum circuits on classical data
- Inspire the design of PennyLane, TensorFlow Quantum, and other QML frameworks
- Establish the QCL framework that underlies modern variational quantum algorithms
As of 2026, the paper has over 1000 citations and remains one of the most-referenced works in variational quantum computing.
Related Work
| Paper | Relation to QCL |
|---|---|
| Schuld et al. (2019) -- Quantum embeddings | Extended encoding theory; introduced kernel perspective |
| McClean et al. (2018) -- Barren plateaus | Identified trainability limitations for deep random circuits |
| Benedetti et al. (2019) -- PQC review | Comprehensive survey placing QCL in broader VQA context |
| Perez-Salinas et al. (2020) -- Data reuploading | Single-qubit universality via repeated encoding |
| Schuld et al. (2021) -- Fourier features | Formal connection between QCL frequency encoding and Fourier analysis |
Paper Citation
```bibtex
@article{mitarai2018quantum,
  title   = {Quantum circuit learning},
  author  = {Mitarai, Kosuke and Negoro, Makoto and Kitagawa, Masahiro and Fujii, Keisuke},
  journal = {Physical Review A},
  volume  = {98},
  number  = {3},
  pages   = {032309},
  year    = {2018},
  doi     = {10.1103/PhysRevA.98.032309}
}
```