QAOA for Traveling Salesman Problem (TSP)

What You'll Learn:

How QAOA handles constrained optimization (not just simple partitions)
How the city-timestep encoding maps TSP routes to qubit states
Why constraint penalties are crucial and how to tune them
How validity rate and tour quality trade off in QAOA solutions

Level: Intermediate | Time: 30 minutes | Qubits: 4 | Framework: Qiskit

Prerequisites

QAOA for MaxCut — QAOA fundamentals (cost/mixer, gamma/beta optimization)
Bell State — entanglement basics
Grover's Search — quantum search over bitstrings

The Idea

Imagine you're a delivery driver with a list of addresses. You need to visit every address exactly once and return home, minimizing total driving distance. That's the Traveling Salesman Problem (TSP) — one of the most famous NP-hard problems in computer science.

Unlike MaxCut (where any bitstring is a valid partition), TSP has constraints: you must visit each city exactly once, and each timestep must have exactly one city. This makes TSP harder for QAOA — the algorithm must learn to avoid invalid routes while minimizing distance.

The key insight: we encode both the objective (minimize distance) and the constraints (valid routes) into the same cost Hamiltonian. Penalty terms push the optimization away from invalid solutions, while distance terms guide it toward short tours.

How It Works

The Problem

Given n cities and pairwise distances, find the shortest round-trip route visiting each city once:

CODE
Default: 2 cities, distance 5

  City 0 <--- 5 ---> City 1

  Valid tours: [0, 1] or [1, 0]
  Optimal tour length: 10 (5 + 5, round trip)

The Encoding: City x Timestep

Each qubit represents "city c is visited at timestep t". For n cities, we need n^2 qubits:

CODE
2 cities, 4 qubits:

         Time 0    Time 1
City 0:   q0        q2
City 1:   q1        q3

Valid state "0110": q1=1 (city 1 at t=0), q2=1 (city 0 at t=1)
  -> Tour: [1, 0] (visit city 1 first, then city 0)

Valid state "1001": q0=1 (city 0 at t=0), q3=1 (city 1 at t=1)
  -> Tour: [0, 1] (visit city 0 first, then city 1)

Constraints

Two types of constraints ensure valid tours:

Row constraint (each city visited exactly once): For each city c, exactly one of the qubits (c, t=0), (c, t=1), ..., (c, t=n-1) should be 1.
Column constraint (one city per timestep): For each timestep t, exactly one of the qubits (c=0, t), (c=1, t), ..., (c=n-1, t) should be 1.

Violations are penalized with weight penalty (default: 10.0, which exceeds the maximum distance).

The QAOA Circuit

CODE
       ┌───┐ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────┐
q_0:   ┤ H ├─┤ RZZ(dist, q0-q3)   ├─┤ RZZ(pen, q0-q2)    ├─┤ RX(2β)  ├── measure
       ├───┤ ├─────────────────────┤ ├─────────────────────┤ ├─────────┤
q_1:   ┤ H ├─┤ RZZ(dist, q1-q2)   ├─┤ RZZ(pen, q1-q3)    ├─┤ RX(2β)  ├── measure
       ├───┤ ├─────────────────────┤ ├─────────────────────┤ ├─────────┤
q_2:   ┤ H ├─┤  (distance terms)  ├─┤ RZZ(pen, q0-q1)    ├─┤ RX(2β)  ├── measure
       ├───┤ ├─────────────────────┤ ├─────────────────────┤ ├─────────┤
q_3:   ┤ H ├─┤  (distance terms)  ├─┤ RZZ(pen, q2-q3)    ├─┤ RX(2β)  ├── measure
       └───┘ └─────────────────────┘ └─────────────────────┘ └─────────┘
             ├── Cost (distance) ───┤├── Cost (penalties) ──┤├─ Mixer ──┤

Step 1: Uniform Superposition

All n^2 qubits start in |+> — equal probability over all 2^(n^2) states (mostly invalid!).

Step 2: Cost Operator

Three types of RZZ interactions:

Distance terms: RZZ(2*gamma*d_{c1,c2}) between qubit (c1, t) and qubit (c2, t+1) for consecutive timesteps. Penalizes long transitions.
Row constraints: RZZ(2*gamma*penalty) between qubit (c, t1) and qubit (c, t2) for same city at different times. Penalizes visiting a city twice.
Column constraints: RZZ(2*gamma*penalty) between qubit (c1, t) and qubit (c2, t) for different cities at same time. Penalizes visiting two cities simultaneously.

Step 3: Mixer Operator

RX(2*beta) on each qubit. Same transverse field mixer as MaxCut — explores neighboring states.

Step 4: Measure and Optimize

Measure all qubits, decode bitstrings to tours, evaluate tour length + constraint violations. Classically optimize gamma and beta to minimize expected cost.

PYTHON
from circuit import run_circuit, optimize_qaoa

# Default parameters
result = run_circuit()
print(f"Validity rate: {result['validity_rate']:.1%}")
print(f"Best tour: {result['best_tour']}, length: {result['best_length']}")

# Optimize parameters
opt = optimize_qaoa(p=1, max_iterations=50, seed=42)
print(f"Optimized validity: {opt['validity_rate']:.1%}")

The Math

Cost Hamiltonian

The TSP cost Hamiltonian has three parts:

C = C_distance + C_row + C_column

Distance cost — minimize total tour length:

C_distance = sum_{t} sum_{c1, c2} d_{c1,c2} * Z_{(c1,t)} * Z_{(c2,t+1)}

Row constraint — each city visited exactly once:

C_row = penalty * sum_c (sum_t Z_{(c,t)} - 1)^2

Column constraint — one city per timestep:

C_column = penalty * sum_t (sum_c Z_{(c,t)} - 1)^2

Mixer Hamiltonian

B = sum_{q} X_q

The standard transverse field mixer. More advanced TSP formulations use XY-mixers that preserve the feasible subspace, but the simple RX mixer works for educational purposes.

QAOA State

After p layers:

|gamma, beta> = exp(-i*beta_p * B) exp(-i*gamma_p * C) ... exp(-i*beta_1 * B) exp(-i*gamma_1 * C) |+>^(n^2)

The expected cost ⟨gamma,beta|C|gamma,beta⟩ is minimized over the 2p parameters (unlike MaxCut which maximizes).

Penalty Tuning

The penalty weight must satisfy:

penalty > max tour length

If penalty is too low, the optimizer finds short but invalid routes. If too high, the optimizer focuses on constraint satisfaction and ignores distance optimization. Default: 10.0 (for 2 cities with distance 5, max tour = 10).

Expected Output

Metric	Value
Optimal tour length (2 cities)	10
Valid tours possible	2 ([0,1] and [1,0])
QAOA p=1 (default params)	validity ~5-15%
QAOA p=1 (optimized)	validity ~10-25%
Random baseline (valid tours)	2/16 = 12.5%

Note: Low validity rates are expected for small instances with simple mixers. The key metric is whether the optimal tour appears among valid samples.

Running the Circuit

PYTHON
from circuit import (
    run_circuit, optimize_qaoa, verify_qaoa,
    compute_exact_solution, decode_bitstring
)
import numpy as np

# Exact solution
exact = compute_exact_solution()
print(f"Optimal: length={exact['optimal_length']}, tours={exact['optimal_tours']}")

# QAOA with defaults
result = run_circuit(shots=2048)
print(f"Validity: {result['validity_rate']:.1%}")
print(f"Best tour: {result['best_tour']}, length: {result['best_length']}")

# Optimize
opt = optimize_qaoa(p=1, shots=2048, seed=42)
print(f"Optimized validity: {opt['validity_rate']:.1%}")

# Custom 3-city problem (9 qubits!)
distances_3 = np.array([
    [0, 5, 10],
    [5, 0, 3],
    [10, 3, 0]
])
result = run_circuit(distances=distances_3, gamma=[0.2], beta=[0.6])

# Verification
v = verify_qaoa()
for check in v["checks"]:
    print(f"[{'PASS' if check['passed'] else 'FAIL'}] {check['name']}")

Scaling Challenge

TSP is much harder to scale than MaxCut due to the n^2 qubit overhead:

Cities	Qubits (n^2)	RZZ gates per layer	Classical tours	Feasibility
2	4	~8	1	Simulator
3	9	~36	3	Simulator
4	16	~96	12	Near-term hardware
5	25	~200	60	Challenging
10	100	~2000	181,440	Future hardware

The valid solution space is tiny: only n! / n = (n-1)! out of 2^(n^2) possible bitstrings are valid tours. For 5 cities, that's 24 valid states out of ~33 million.

Try It Yourself

Increase penalty: Try penalty=20 vs penalty=5. How does the validity rate change? What happens to tour quality?
3-city instance: Use distances=np.array([[0,5,10],[5,0,3],[10,3,0]]). The optimal tour is [0,1,2] with length 18. How many qubits does this need?
Increase p: Run with p=2 (optimize_qaoa(p=2)). Does the validity rate improve? How many more iterations are needed?
Compare with MaxCut: Run both MaxCut and TSP with 4 qubits. Why does TSP have much lower success rates?
Analyze invalid states: Look at the counts dict — what fraction of bitstrings have exactly 2 ones? Why is this relevant?

What's Next

Portfolio Optimization — QAOA for financial portfolio selection (another constrained problem)
VQE H2 Ground State — Another variational algorithm, for quantum chemistry

MaxCut vs TSP: Key Differences

Aspect	MaxCut	TSP
Encoding	1 qubit per vertex	n^2 qubits (city x timestep)
Valid states	All 2^n (any partition)	Only n! out of 2^(n^2)
Constraints	None	Row + column constraints
Objective	Maximize cut edges	Minimize tour length
Success metric	Approximation ratio	Validity rate + tour quality
Difficulty	Moderate for QAOA	Hard (feasible space is tiny)

Applications

Domain	Use case
Logistics	Delivery route optimization, last-mile routing
Manufacturing	Tool path planning, CNC drill sequencing
PCB design	Minimize drilling distance between holes
Genome sequencing	DNA fragment assembly (shortest superstring)

References

Farhi, E., Goldstone, J., Gutmann, S. (2014). "A Quantum Approximate Optimization Algorithm." arXiv: 1411.4028
Lucas, A. (2014). "Ising formulations of many NP problems." Frontiers in Physics 2:5. DOI: 10.3389/fphy.2014.00005
Hadfield, S. et al. (2019). "From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz." Algorithms 12(2):34. DOI: 10.3390/a12020034