Quantum Circuit Learning - Mitarai et al. 2018

Overview

Faithful reproduction of the Quantum Circuit Learning (QCL) paper -- one of the foundational works in variational quantum machine learning. This paper demonstrated that parameterized quantum circuits can be trained as universal function approximators using exact gradients computed via the parameter-shift rule.

Property	Value
Category	Research Reproduction
Difficulty	Advanced
Framework	PennyLane
Qubits	4 (configurable, 3-6)
Gates	RY, RZ, CNOT
Depth	O(L) variational layers
Paper	Physical Review A 98, 032309 (2018)
DOI	10.1103/PhysRevA.98.032309
arXiv	1803.00745

Paper Summary

Mitarai, Negoro, Kitagawa, and Fujii introduced a framework for training parameterized quantum circuits on classical data using a hybrid classical-quantum optimization loop. Their key contributions:

Parameter-shift rule for exact gradients -- Analytic gradient computation without finite-difference approximation, making training hardware-compatible (Eq. 4-6 of the paper).
Universal function approximation -- Proof that quantum circuits with sufficient depth and entanglement can approximate any continuous function, analogous to the classical universal approximation theorem.
Practical training framework -- End-to-end demonstration of data encoding, variational optimization, and measurement-based prediction for both classification and regression tasks.
Arctan encoding -- A nonlinear feature map RY(arctan(x)) that compresses the real line into a bounded rotation angle, preserving ordering while enabling gradient flow.

Theoretical Background

Why Quantum Circuits Can Learn

A parameterized quantum circuit U(x, theta) maps classical input x and trainable parameters theta to a quantum state. The expectation value f(x) = <0|U^dag Z U|0> defines a function class whose expressivity depends on:

Number of qubits n -- determines the Hilbert space dimension (2^n)
Circuit depth L -- controls the number of trainable parameters
Entanglement topology -- enables correlations between qubits

The paper proves that this function class is dense in the space of continuous functions on compact domains, establishing quantum circuits as universal approximators.

The Parameter-Shift Rule

For a gate of the form exp(-i * theta * G / 2) whose generator G satisfies G^2 = I (as for RX, RY, and RZ), the exact gradient of the expectation value is:

d<O>/dtheta = ( <O>_{theta + pi/2} - <O>_{theta - pi/2} ) / 2

This elegant formula:

Requires only two circuit evaluations per parameter
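Computes exact gradients (no finite-difference truncation error; on hardware the estimate is limited by sampling and device noise rather than by a step size)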
Works on real quantum hardware (no need for backpropagation through the device)
Enables use of classical optimizers (SGD, Adam, etc.) in the outer loop
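
The rule can be checked by hand on the smallest possible case. The sketch below is an illustration, not the reproduction's code: it assumes a single qubit prepared as RY(theta)|0> and measured in Z, so that <Z> = cos(theta), and the helper names (expval_z, parameter_shift, finite_difference) are hypothetical.

```python
import math

def expval_z(theta: float) -> float:
    """<Z> for the state RY(theta)|0>, which equals cos(theta)."""
    # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    p0 = math.cos(theta / 2) ** 2
    p1 = math.sin(theta / 2) ** 2
    return p0 - p1

def parameter_shift(f, theta: float) -> float:
    """Gradient from two evaluations shifted by +/- pi/2."""
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

def finite_difference(f, theta: float, eps: float = 1e-3) -> float:
    """Central finite difference, shown only for comparison."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta = 0.4
exact = -math.sin(theta)  # analytic derivative of cos(theta)
shift = parameter_shift(expval_z, theta)
fd = finite_difference(expval_z, theta)
print(f"analytic={exact:.10f}  shift={shift:.10f}  finite-diff={fd:.10f}")
```

The shifted evaluations agree with the analytic derivative to floating-point precision, while the finite-difference estimate carries a small step-size error.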

The QCL Architecture

Stage 1: Data Encoding

Classical features are mapped to qubit rotations via arctan encoding:

|0_i> --> RY(arctan(x_i))   for qubit i

This maps x in (-inf, +inf) to rotation angles in (-pi/2, +pi/2). For function approximation, multi-frequency encoding is used instead: RY(x * (i+1)) for qubit i, creating a Fourier-like basis.
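
As a rough illustration of the two encodings, the following sketch computes only the rotation angles and the resulting single-qubit <Z> value, without simulating a full circuit. The function names are hypothetical and are not taken from the reproduction's code.

```python
import math

def arctan_angles(features):
    """One RY angle per feature; each lies strictly inside (-pi/2, pi/2)."""
    return [math.atan(x) for x in features]

def frequency_angles(x, n_qubits):
    """RY angles x * (i + 1) for qubit i = 0..n_qubits-1."""
    return [x * (i + 1) for i in range(n_qubits)]

def z_after_ry(angle):
    """<Z> of RY(angle)|0>, i.e. cos(angle)."""
    return math.cos(angle)

# Arctan encoding stays bounded and keeps the ordering of the inputs.
angles = arctan_angles([-1e6, -1.0, 0.0, 1.0, 1e6])
print([round(a, 4) for a in angles])

# Frequency encoding: qubit i oscillates (i + 1) times as fast in x.
print([round(z_after_ry(a), 4) for a in frequency_angles(0.5, 4)])
```

Under arctan encoding a single encoded qubit has <Z> = cos(arctan(x)) = 1/sqrt(1 + x^2), which is one way to see the nonlinearity the feature map introduces.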

Stage 2: Variational Layers

Each of L layers applies single-qubit rotations followed by entangling gates:

CODE
Layer l:
  |psi> --> RY(theta_{l,0}) RZ(phi_{l,0}) --*-----------
            RY(theta_{l,1}) RZ(phi_{l,1}) --X--*--------
            RY(theta_{l,2}) RZ(phi_{l,2}) -----X--*-----
            RY(theta_{l,3}) RZ(phi_{l,3}) --------X--*--
                                                      |
                                           (wraps to qubit 0)

The CNOT cascade with ring topology gives periodic boundary conditions: every qubit is the control of one CNOT and the target of another, so correlations can reach any qubit from either direction.

Stage 3: Measurement

Output = <Z_0> (Pauli-Z expectation on qubit 0)

Classification: sign(<Z_0>) determines predicted label
Regression: raw <Z_0> value approximates the target function
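
The measurement step can be sketched directly on a list of amplitudes. This is a minimal illustration under two assumptions that are not stated in the text above: qubit 0 is taken as the most significant bit of the basis-state index, and a value of exactly zero is mapped to the +1 label. The helper names are hypothetical.

```python
import math

def expval_z0(state, n_qubits):
    """<Z_0> = P(qubit 0 = 0) - P(qubit 0 = 1) for a list of amplitudes.

    Assumes qubit 0 is the most significant bit of the basis-state index.
    """
    total = 0.0
    for index, amp in enumerate(state):
        bit0 = (index >> (n_qubits - 1)) & 1
        total += (1 - 2 * bit0) * abs(amp) ** 2
    return total

def predict_label(z0):
    """Classification rule: the sign of <Z_0> (ties mapped to +1 here)."""
    return 1 if z0 >= 0 else -1

# Two-qubit example state: sqrt(0.8)|00> + sqrt(0.2)|10>
state = [math.sqrt(0.8), 0.0, math.sqrt(0.2), 0.0]
z0 = expval_z0(state, n_qubits=2)
print(z0, predict_label(z0))
```

For regression the value returned by expval_z0 would be used as is; for classification only its sign matters.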

Full Circuit Diagram (4 qubits, 2 layers)

CODE
         Encoding         Layer 1                      Layer 2
q_0: |0>-RY(atan(x0))--RY(t00)--RZ(p00)--*---------RY(t10)--RZ(p10)--*---------<Z>
q_1: |0>-RY(atan(x1))--RY(t01)--RZ(p01)--X--*------RY(t11)--RZ(p11)--X--*------
q_2: |0>-RY(atan(x2))--RY(t02)--RZ(p02)-----X--*---RY(t12)--RZ(p12)-----X--*---
q_3: |0>-RY(atan(x3))--RY(t03)--RZ(p03)--------X-*-RY(t13)--RZ(p13)--------X-*-
                                                 |                             |
                                          (ring to q0)                  (ring to q0)
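
To make the diagram concrete, here is a small statevector sketch of the same circuit written with the Python standard library only. It is not the PennyLane implementation used by this reproduction: the function names, the convention that qubit 0 is the most significant bit, and the example inputs are all assumptions made for illustration. It requires at least two qubits, since the ring CNOT needs distinct control and target.

```python
import cmath
import math

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (qubit 0 = most significant bit)."""
    mask = 1 << (n - 1 - q)
    out = list(state)
    for i in range(len(state)):
        if not i & mask:  # i has qubit q = 0, j has qubit q = 1
            j = i | mask
            out[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            out[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return out

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def cnot(state, control, target, n):
    """Flip the target bit on every basis state whose control bit is 1."""
    cm, tm = 1 << (n - 1 - control), 1 << (n - 1 - target)
    out = list(state)
    for i in range(len(state)):
        if i & cm:
            out[i ^ tm] = state[i]
    return out

def qcl_output(x, thetas, phis):
    """<Z_0> for arctan encoding followed by len(thetas) ring-entangled layers."""
    n = len(x)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for q in range(n):  # encoding: RY(arctan(x_q)) on qubit q
        state = apply_1q(state, ry(math.atan(x[q])), q, n)
    for layer_t, layer_p in zip(thetas, phis):  # variational layers
        for q in range(n):
            state = apply_1q(state, ry(layer_t[q]), q, n)
            state = apply_1q(state, rz(layer_p[q]), q, n)
        for q in range(n):  # CNOT ring, last qubit wraps to qubit 0
            state = cnot(state, q, (q + 1) % n, n)
    top = 1 << (n - 1)
    return sum((-1 if i & top else 1) * abs(a) ** 2 for i, a in enumerate(state))

# Arbitrary example inputs for 4 qubits and 2 layers.
x = [0.3, -1.2, 0.8, 2.0]
thetas = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
phis = [[0.0, 0.1, 0.2, 0.3], [0.4, 0.5, 0.6, 0.7]]
print(qcl_output(x, thetas, phis))
```

With no variational layers the output reduces to cos(arctan(x_0)), which gives a quick sanity check of the encoding stage.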

Running the Circuit

Classification Task (Paper Section IV.A)

PYTHON
from circuit import run_circuit

result = run_circuit(
    task='classification',
    n_qubits=4,
    n_layers=2,
    n_train=30,
    n_epochs=50
)

print(f"Accuracy: {result['classification']['train_accuracy']:.1%}")
print(f"Loss: {result['classification']['initial_loss']:.4f} -> "
      f"{result['classification']['final_loss']:.4f}")

Function Approximation (Paper Section IV.B)

PYTHON
from circuit import run_circuit

result = run_circuit(
    task='function',
    n_qubits=4,
    n_layers=3,
    n_train=30,
    n_epochs=100
)

print(f"MSE: {result['function_approximation']['mse']:.4f}")
print(f"Target: {result['function_approximation']['target_function']}")

Reproduction Verification

PYTHON
from circuit import run_circuit

result = run_circuit(task='classification', n_qubits=4, n_layers=2)
verification = result['verification']

print(verification['summary'])
for check in verification['checks']:
    status = 'PASS' if check['passed'] else 'FAIL'
    print(f"  [{status}] {check['name']}: {check['message']}")

Expected Results

Classification (Synthetic Data, seed=42)

Layers	Qubits	Parameters	Accuracy	Notes
1	4	8	~70%	Underfitting -- insufficient expressivity
2	4	16	~85%	Good accuracy on linearly separable data
3	4	24	~95%	Near-optimal for this dataset
2	6	24	~90%	More qubits help with higher-dim features
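
The Parameters column follows from the layer structure: each layer applies one RY and one RZ angle per qubit. A short check, using a hypothetical helper name, reproduces the counts in the table.

```python
def n_parameters(n_qubits: int, n_layers: int, rotations_per_qubit: int = 2) -> int:
    """Trainable angles: one RY and one RZ per qubit in every layer."""
    return rotations_per_qubit * n_qubits * n_layers

# (layers, qubits) pairs taken from the table rows
for n_layers, n_qubits in [(1, 4), (2, 4), (3, 4), (2, 6)]:
    print(n_layers, n_qubits, n_parameters(n_qubits, n_layers))
```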

Function Approximation (sin(x), domain [-pi, pi])

Layers	Qubits	MSE	Notes
1	4	~0.3	Too few parameters for good fit
2	4	~0.1	Reasonable approximation
3	4	~0.02	Good fit (matches paper's reported quality)

Convergence Behavior

Training loss should decrease monotonically in early epochs, then plateau. With the default learning rate of 0.1 and gradient descent:

Classification typically converges in 30-50 epochs
Function approximation may need 100+ epochs for tight fit
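
The shape of this training loop can be illustrated on a deliberately tiny model. The sketch below is not the reproduction's training code: it assumes a single qubit with one trainable angle, where <Z> of RY(theta) RY(x)|0> equals cos(x + theta), a synthetic target generated at theta = 0.7, and plain gradient descent with the learning rate of 0.1 mentioned above.

```python
import math

def model(x, theta):
    """<Z> of RY(theta) RY(x)|0>, which equals cos(x + theta)."""
    return math.cos(x + theta)

def loss_and_grad(theta, xs, ys):
    """Mean squared error and its gradient via the parameter-shift rule."""
    loss, grad = 0.0, 0.0
    for x, y in zip(xs, ys):
        pred = model(x, theta)
        # Two shifted evaluations give d(pred)/d(theta) exactly.
        dpred = (model(x, theta + math.pi / 2) - model(x, theta - math.pi / 2)) / 2
        loss += (pred - y) ** 2
        grad += 2 * (pred - y) * dpred  # chain rule through the squared error
    return loss / len(xs), grad / len(xs)

xs = [-1 + 2 * k / 19 for k in range(20)]
ys = [math.cos(x + 0.7) for x in xs]  # synthetic target, realizable at theta = 0.7

theta, lr, losses = 0.0, 0.1, []
for epoch in range(50):
    loss, grad = loss_and_grad(theta, xs, ys)
    losses.append(loss)
    theta -= lr * grad

print(f"theta={theta:.3f}  loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In this toy setting the loss decreases at every step and theta moves toward the generating value; the full multi-qubit model has more parameters but uses the same outer loop.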

Parameter-Shift Rule: Detailed Derivation

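The parameter-shift rule exploits the structure of rotation gates. For a gate R(theta) = exp(-i * theta * G / 2) where G has eigenvalues +/-1, the expectation value is a sinusoid in theta, with constants A, B, and C that do not depend on theta: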

<O(theta)> = A * cos(theta) + B * sin(theta) + C

Evaluating at theta +/- pi/2:

CODE
<O(theta + pi/2)> = -A * sin(theta) + B * cos(theta) + C
<O(theta - pi/2)> =  A * sin(theta) - B * cos(theta) + C

Subtracting and dividing by 2:

CODE
d<O>/dtheta = -A * sin(theta) + B * cos(theta)
            = ( <O(theta + pi/2)> - <O(theta - pi/2)> ) / 2

This works for all standard rotation gates (RX, RY, RZ) and generalizes to multi-parameter gates. The PennyLane framework implements this automatically via qml.gradients.param_shift.
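
The sinusoidal form assumed at the start of this derivation follows from G^2 = I. A short worked version is given below, where the expectation values are taken in the state just before the gate and O stands for the observable with the rest of the circuit absorbed into it (both are notational assumptions, not taken from the text above):

```latex
\begin{aligned}
R(\theta) &= e^{-i\theta G/2}
  = \cos\tfrac{\theta}{2}\, I - i \sin\tfrac{\theta}{2}\, G
  \qquad (G^2 = I), \\
R^\dagger(\theta)\, O\, R(\theta)
  &= \cos^2\tfrac{\theta}{2}\, O + \sin^2\tfrac{\theta}{2}\, G O G
     + i \sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}\, [G, O] \\
  &= \tfrac{1}{2}(O + G O G) + \tfrac{1}{2}(O - G O G)\cos\theta
     + \tfrac{i}{2}[G, O]\sin\theta , \\
A &= \tfrac{1}{2}\langle O - G O G\rangle, \quad
B = \tfrac{i}{2}\langle [G, O]\rangle, \quad
C = \tfrac{1}{2}\langle O + G O G\rangle .
\end{aligned}
```

As a check, G = Y and O = Z in the state |0> give A = 1 and B = C = 0, recovering <Z> = cos(theta) for RY(theta)|0>.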

Universal Approximation Argument

The paper establishes universality via three observations:

Encoding: Arctan encoding maps R^n into the bounded hypercube (-pi/2, pi/2)^n of rotation angles, giving a continuous injection from feature space to Hilbert space.
Variational layers: The alternating rotation + entanglement structure generates a dense subset of SU(2^n) as the number of layers L increases.
Measurement: The expectation value <Z_0> is a continuous function of the quantum state.

By composition, the family of maps x -> <Z_0>(U(x, theta)|0>), taken over all parameter values theta, is dense in the space of continuous functions on compact subsets of R^n, for sufficiently large L.

Comparison with Classical Models

Aspect	QCL	Neural Network
Parameters	O(nL) -- linear in qubits and layers	O(n^2 L) -- quadratic in layer width
Gradient	Exact via parameter-shift (estimated from finite shots on hardware)	Exact via backpropagation (automatic differentiation)
Expressivity	Universal (proven)	Universal (proven)
Training difficulty	Barren plateaus risk at large n	Vanishing gradients at large depth
Hardware	Runs on quantum processors	Runs on classical hardware
Advantage regime	Potentially exponential in specific cases	Well-understood scaling

Implementation Notes

Encoding Strategy Choice

The paper presents two encoding strategies:

PYTHON
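import numpy as np
import pennylane as qml

x, i = 0.5, 0  # example feature value and qubit index

# Classification: arctan encoding (bounded, order-preserving)
qml.RY(np.arctan(x), wires=i)   # maps R -> (-pi/2, pi/2)

# Function approximation: frequency encoding (Fourier basis)
qml.RY(x * (i + 1), wires=i)    # qubit i encodes frequency (i+1)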

The frequency encoding is related to the quantum Fourier features framework developed in subsequent work (Schuld et al., 2021).

Ring vs. Linear Entanglement

PYTHON
import pennylane as qml

n_qubits = 4  # example value; run these loops inside a QNode

# Ring topology (classification) -- periodic boundary conditions
for i in range(n_qubits):
    qml.CNOT(wires=[i, (i + 1) % n_qubits])

# Linear chain (function approximation) -- open boundary conditions
for i in range(n_qubits - 1):
    qml.CNOT(wires=[i, i + 1])

Ring topology spreads entanglement faster (graph diameter floor(n/2) vs. n-1 for the chain), but both achieve universal expressivity with sufficient layers.
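
The diameter figures can be checked with a breadth-first search over the coupling graph. This sketch treats each CNOT as an undirected edge, which is a simplifying assumption: the diameter is only a proxy for how quickly correlations can spread, not a statement about the entanglement actually generated. The function names are hypothetical.

```python
from collections import deque

def coupling_graph(n, ring):
    """Undirected neighbour sets for a chain of n qubits, optionally closed into a ring."""
    edges = [(i, i + 1) for i in range(n - 1)]
    if ring and n > 2:
        edges.append((n - 1, 0))
    adj = {q: set() for q in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def diameter(adj):
    """Longest shortest-path distance between any two qubits (via BFS)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best

# Columns: n, ring diameter, chain diameter
for n in (4, 5, 6):
    print(n, diameter(coupling_graph(n, ring=True)), diameter(coupling_graph(n, ring=False)))
```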

Known Limitations

Barren plateaus: Random initialization of deep circuits can lead to exponentially vanishing gradients (McClean et al., 2018). This reproduction mitigates this by using shallow circuits (L=2-3) on few qubits (n=3-4).
Training data: The paper uses synthetic data for proof-of-concept. Real-world classification tasks may require additional techniques (data preprocessing, output scaling, regularization).
Simulator only: This reproduction runs on PennyLane's default.qubit simulator. Hardware execution would introduce noise effects not modeled here.

Historical Impact

This paper (published September 2018) was one of the earliest works to:

Formalize the parameter-shift rule for gradient computation on quantum circuits
Prove that parameterized quantum circuits are universal function approximators
Demonstrate end-to-end training of quantum circuits on classical data
Inspire the design of PennyLane, TensorFlow Quantum, and other QML frameworks
Establish the QCL framework that underlies modern variational quantum algorithms

As of 2026, the paper has over 1000 citations and remains one of the most-referenced works in variational quantum computing.

Related Work

Paper	Relation to QCL
Schuld et al. (2019) -- Quantum embeddings	Extended encoding theory; introduced kernel perspective
McClean et al. (2018) -- Barren plateaus	Identified trainability limitations for deep random circuits
Benedetti et al. (2019) -- PQC review	Comprehensive survey placing QCL in broader VQA context
Perez-Salinas et al. (2020) -- Data reuploading	Single-qubit universality via repeated encoding
Schuld et al. (2021) -- Fourier features	Formal connection between QCL frequency encoding and Fourier analysis

Paper Citation

BIBTEX
@article{mitarai2018quantum,
  title   = {Quantum circuit learning},
  author  = {Mitarai, Kosuke and Negoro, Makoto and Kitagawa, Masahiro
             and Fujii, Keisuke},
  journal = {Physical Review A},
  volume  = {98},
  number  = {3},
  pages   = {032309},
  year    = {2018},
  doi     = {10.1103/PhysRevA.98.032309}
}
