## 12.4 Approximating unitaries

So now we know a bit about how norms (or metrics, or inner products) can help us to understand distance between state vectors, can we say something similar about quantum evolutions? Say we have unitary operators U and V acting on the same Hilbert space, where U is some “target” unitary that we want to implement in a real-life circuit, and V is an “approximate” unitary that we can actually implement in practice. We say that V approximates U with precision \varepsilon, or that U and V are \varepsilon-close, if241 \|U-V\| \leqslant\varepsilon where \|\cdot\| is some norm on unitary matrices (of the same size), which we would want to satisfy the following property: if \|U-V\| is “small”, then U should be hard to distinguish from V when acting on any quantum state.

Before defining such a norm, however, we first recall some linear algebra which we briefly touched upon in Exercise 5.14.13. The singular values of an operator A are the square roots of the (necessarily non-negative) eigenvalues of the Hermitian operator A^\dagger A. If A is normal (e.g. a density operator), then its singular values are exactly the absolute values of its eigenvalues. We tend to denote singular values by s_i(A) (or just s_i if it is clear which operator we are talking about), and we write \sigma(A) to mean the set of eigenvalues of A, i.e. \sigma(A) \coloneqq \{\lambda\in\mathbb{C} \mid \det(A-\lambda\mathbf{1})=0\}. This means that \{s_i(A)\} = \{\sqrt{\lambda} \mid \lambda\in\sigma(A)\}.

The operator norm (or spectral norm) \|A\| of an operator A\in\mathcal{B}(\mathcal{H}) is the maximum length of the vector A|v\rangle over all possible normalised vectors |v\rangle\in\mathcal{H}, i.e. \|A\| \coloneqq \max_{|v\rangle\in S_{\mathcal{H}}^1}\Big\{ |A|v\rangle| \Big\} (where S_{\mathcal{H}}^1 is the unit sphere in \mathcal{H}, i.e. the set of vectors of norm 1). One can show that \|A\| is equal to the largest singular value of A.

If A is normal (e.g. a density operator), then \|A\| = \max_{\lambda\in\sigma(A)}\Big\{ |\lambda| \Big\}.

The operator norm satisfies some very useful properties:242

• If A is normal, then \|A^\dagger\|=\|A\|
• \|A\otimes B\|=\|A\|\|B\|
• If U is unitary, then \|U\|=1
• If P\neq0 is an orthogonal projector, then \|P\|=1
• Sub-multiplicativity: \|AB\|\leqslant\|A\|\|B\|.

Now suppose that some quantum system, initially in state |\psi\rangle, evolves according to U or V. Let P be a projector associated with some specific outcome of some measurement that can be performed on the system after either evolution (such as P=|a\rangle\langle a|, as in our earlier example). Let p_U (resp. p_V) be the probability of obtaining the corresponding measurement outcome if the operation U (resp. V) was performed. By definition, we see that \begin{aligned} |p_U-p_V| &= \Big| \langle\psi|U^\dagger PU|\psi\rangle - \langle\psi|V^\dagger PV|\psi\rangle \Big| \\&= \Big| \langle\psi|U^\dagger P(U-V)|\psi\rangle+\langle\psi|(U^\dagger-V^\dagger)PV|\psi\rangle \Big| \\&\leqslant\Big| \langle\psi|U^\dagger P(U-V)|\psi\rangle \Big| + \Big| \langle\psi|(U^\dagger-V^\dagger)PV|\psi\rangle \Big| \end{aligned} where the inequality is exactly the triangle inequality.

By an application of the Cauchy–Schwartz inequality243 followed by sub-multiplicativity, we then have \begin{aligned} |p_U-p_V| &\leqslant\|U^\dagger P\|\|U-V\| + \|U^\dagger-V^\dagger\|\|VP\| \\&\leqslant 2\|U-V\|. \end{aligned}

This tells us what \varepsilon-closeness means: suppose that V and U are \varepsilon-close; then if, instead of applying one, we apply the other, and subsequently measure the resulting physical system, we know that the probabilities of any particular outcome in any measurement will differ by at most 2\varepsilon.

Now what about working with sequences of unitaries, as we do when we construct quantum circuits? It turns out that closeness is additive under multiplication of unitaries: if \|U_1-V_1\|\leqslant\varepsilon_1 and \|U_2-V_2\|\leqslant\varepsilon_2, then \begin{aligned} \|U_2U_1 - V_2V_1\| &= \|U_2U_1 - V_2U_1 + V_2U_1 - V_2V_1\| \\&= \|(U_2-V_2)U_1 + V_2(U_1-V_1)\| \\&\leqslant\|U_2-V_2\|\|U_1\| + \|V_2\|\|U_1-V_1\| \\&= \|U_2-V_2\| + \|U_1-V_1\| \\&\leqslant\varepsilon_1+\varepsilon_2. \end{aligned} We can then apply this argument inductively.

Errors in the approximation of one sequence of unitaries by another accumulate at most linearly in the number of unitary operations: \|U_n\cdots U_1 - V_n\cdots V_1\| \leqslant\sum_{i=1}^n \varepsilon_n if \|U_i-V_i\|\leqslant\varepsilon_i for all i=1,\ldots,n.

This linear error accumulation relies heavily on the fact that the norm of a unitary operator is equal to 1; for non-unitary operators, errors could accumulate exponentially, which would make efficient approximations of circuits practically impossible. Geometrically, this is because unitaries just rotate vectors, without scaling them.

Again, we can appeal to some trigonometry. First note that \|U-V\| = \|UV^\dagger-\mathbf{1}\| since the operator norm is unitarily invariant.244 Since UV^\dagger is also unitary, its eigenvalues are exactly phase factors e^{i\varphi} for \varphi\in\mathbb{R}; the corresponding eigenvalue of UV^\dagger-\mathbf{1} has modulus |e^{i\varphi}-1| = \sqrt{2}\sqrt{1-\cos\varphi}. Putting this all together, we see that asking for \|U-V\|\leqslant\varepsilon is exactly asking for each eigenvalue of UV^\dagger-\mathbf{1} to satisfy \sqrt{2}\sqrt{1-\cos\varphi}\leqslant\varepsilon, which rearranges to \cos\varphi \geqslant 1-\frac{\varepsilon^2}{2} which is simply |\varphi|\leqslant\varepsilon for small enough \varepsilon. So U rotates relative to V by (at worst) an angle of order \varepsilon, and if we compose unitaries in a sequence then the accumulated rotation increases linearly with the number of unitaries.

1. Note that V approximates U with precision \varepsilon if and only if U approximates V with precision \varepsilon. Even though we might think of one as being our ideal unitary and the other as being the best feasible real-life implementation that we can achieve, this is only us giving names to things — the definition does not care which way round we think of them.↩︎

2. Proving these properties, along with some others, is a good thing to practise — see Exercise 12.11.5.↩︎

3. See Exercise 12.11.5.↩︎

4. See Exercise 12.11.5.↩︎