1.11 Remarks and exercises

1.11.1 A historical remark

Back in 1926, Max Born simply postulated the connection between amplitudes and probabilities, but did not get it quite right on his first attempt. In the original paper^1 proposing the probability interpretation of the state vector (wavefunction), he wrote:

… If one translates this result into terms of particles only one interpretation is possible. $\Theta_{\eta,\tau,m}(\alpha,\beta,\gamma)$ [the wavefunction for the particular problem he is considering] gives the probability^* for the electron arriving from the $z$ direction to be thrown out into the direction designated by the angles $\alpha,\beta,\gamma$

^* Addition in proof: More careful considerations show that the probability is proportional to the square of the quantity $\Theta_{\eta,\tau,m}(\alpha,\beta,\gamma)$.

1.11.2 Modifying the Born rule

Suppose that we modified the Born rule, so that probabilities were given by the absolute values of amplitudes raised to the power $p$ (for some $p>0$ not necessarily equal to $2$). Then physically admissible evolutions would still have to preserve the normalisation of probability: mathematically speaking, they would have to be isometries of $p$-norms.

The $p$-norm of a vector $v=(v_1,v_2,\ldots,v_n)$, for $p\in\mathbb{N}$, is defined as^2 $$\sqrt[p]{|v_1|^p + |v_2|^p + \ldots + |v_n|^p}.$$ It is clear that any permutation of vector components and multiplication by phase factors (i.e. unit complex numbers, of the form $e^{i\varphi}$ for some $\varphi$) will leave any $p$-norm unchanged. It turns out that these complex permutations are the only isometries, except for one special case: $p=2$. For $p=2$, the isometries are exactly the unitaries, which form a continuous group; in all other cases we are restricted to discrete permutations. We do not have to go into the details of the proof, since we can see this result pictorially in Figure 1.7.


Figure 1.7: The unit spheres in the $p$-norm for $p=1,2,42,\infty$ (where the definition of the $\infty$-norm is slightly different; we will come back to this in Section 12.11.2).

The image of the unit sphere must be preserved under probability-preserving operations. As we can see in Figure 1.7, the $2$-norm is special because of its rotational invariance (it describes a circle) — the probability measure picks out no preferred basis in the space of state vectors. Moreover, it respects unitary operations and does not restrict them in any way. If the admissible physical evolutions were restricted to discrete symmetries, e.g. permutations, then there would be no continuity, and no concept of “time” as we know it.
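One can check this numerically. The sketch below (an illustration, with an arbitrarily chosen angle and vector) verifies that a rotation — a continuous transformation — preserves the $2$-norm but not, in general, the $1$-norm, whereas only permutations and phase factors preserve every $p$-norm:

```python
import numpy as np

def p_norm(x, p):
    """The p-norm of a real or complex vector x."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# An arbitrary rotation angle and an arbitrary unit vector (illustrative choices).
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
v = np.array([0.6, 0.8])

# The rotation preserves the 2-norm...
print(p_norm(v, 2), p_norm(R @ v, 2))   # both equal 1.0

# ...but changes the 1-norm, so it is not a 1-norm isometry.
print(p_norm(v, 1), p_norm(R @ v, 1))   # these differ
```

Only for $p=2$ does the full continuous rotation group survive as isometries; for other $p$ the isometry group is discrete.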

1.11.3 Many computational paths

A quantum computer starts calculations in some initial state, then follows $n$ different computational paths which lead to the final output. The computational paths are followed with probability amplitudes $\frac{1}{n}e^{ik\varphi}$, where $\varphi$ is a fixed angle $0\leqslant\varphi<2\pi$ and $k=0,1,\ldots,n-1$. Using the fact that $1+z+z^2+\ldots+z^n=\frac{1-z^{n+1}}{1-z}$, show that the probability $P$ of generating the output is given by $$P = \frac{1}{n^2}\left\vert \frac{1-e^{in\varphi}}{1-e^{i\varphi}} \right\vert^2 = \frac{1}{n^2}\frac{\sin^2\left(\frac{n\varphi}{2}\right)}{\sin^2\left(\frac{\varphi}{2}\right)}$$ for $0<\varphi<2\pi$, and that $P=1$ when $\varphi=0$. Plot the probability as a function of $\varphi$.
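A quick numerical check (a sketch, with illustrative choices of $n$ and $\varphi$) confirms that summing the $n$ amplitudes directly agrees with the closed-form expression:

```python
import numpy as np

n = 5        # number of computational paths (illustrative choice)
phi = 1.234  # some angle in (0, 2*pi) (illustrative choice)

# Direct sum of the n amplitudes (1/n) e^{i k phi}, then the Born rule:
amps = np.array([np.exp(1j * k * phi) / n for k in range(n)])
P_direct = abs(amps.sum()) ** 2

# Closed form obtained from the geometric series:
P_closed = np.sin(n * phi / 2) ** 2 / (n**2 * np.sin(phi / 2) ** 2)

print(P_direct, P_closed)  # the two values agree
```

To produce the requested plot, one could evaluate `P_closed` over a grid of $\varphi$ values and draw it with, say, `matplotlib.pyplot.plot`; as $\varphi\to 0$ all paths interfere constructively and $P\to 1$.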

1.11.4 Distant photon emitters

Imagine two distant stars, A and B, that emit identical photons. If you point a single detector towards them you will register a click every now and then, but you never know which star the photon came from. Now prepare two detectors and point them towards the stars. Assume the photons arrive with the probability amplitudes specified in Figure 1.8. Every now and then you will register a coincidence: the two detectors will both click.

  1. Calculate the probability of such a coincidence.
  2. Now assume that $z\approx \frac{1}{r}e^{i2\pi r/\lambda}$, where $r$ is the distance between the detectors and the stars and $\lambda$ is some fixed constant. How can we use this to measure $r$?

Figure 1.8: Two photon detectors pointing at two stars, with the probabilities of detection labelling the arrows.

1.11.5 Quantum Turing machines

The classical theory of computation is essentially the theory of the universal Turing machine — the most popular mathematical model of classical computation. Its significance rests on the fact that, given a possibly very large but still finite amount of time, the universal Turing machine is capable of any computation that can be done by any modern classical digital computer, no matter how powerful. The concept of Turing machines may be modified to incorporate quantum computation, but we will not follow this path. It is much easier to explain the essence of quantum computation by talking about quantum logic gates and quantum Boolean networks, or circuits. The two approaches are computationally equivalent, even though certain theoretical concepts, e.g. in computational complexity, are easier to formulate precisely using the Turing machine model. The main advantage of quantum circuits is that they relate far more directly to proposed experimental realisations of quantum computation.

1.11.6 More time, more memory

A quantum machine has $N$ perfectly distinguishable configurations. What is the maximum number of computational paths connecting a specific input with a specific output after $k$ steps of the machine?

Suppose you are using your laptop to add together amplitudes pertaining to each of the paths. As $k$ and $N$ increase you may need more time and more memory to complete the task. How do the execution time and the memory requirements grow with $k$ and $N$? In particular, which will be the thing that limits you sooner: not having enough memory, not having enough time, or both?
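To get a feel for the cost, here is a small sketch (an illustration with made-up transition amplitudes, not a prescribed method): it sums an amplitude over every $k$-step path of a toy machine with $N$ configurations by brute force, and then obtains the same number far more cheaply by multiplying transition matrices:

```python
import itertools
import numpy as np

N, k = 3, 4  # toy sizes: N configurations, k steps (illustrative choices)

rng = np.random.default_rng(0)
# An arbitrary N x N array of complex "transition amplitudes" (illustrative only).
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

start, end = 0, N - 1

# Brute force: one term per path, looping over all choices of the k-1
# intermediate configurations.
total = 0
for mids in itertools.product(range(N), repeat=k - 1):
    path = (start, *mids, end)
    amp = 1
    for a, b in zip(path, path[1:]):
        amp *= A[b, a]
    total += amp

# The same amplitude via k-fold matrix multiplication, which needs only
# O(N^2) memory and polynomial time in N and k.
total_matrix = np.linalg.matrix_power(A, k)[end, start]

print(abs(total - total_matrix))  # essentially zero
```

Comparing the two loops makes the trade-off in the exercise concrete: the brute-force enumeration visits every path one by one, while the matrix method reuses partial sums.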

1.11.7 Asymptotic behaviour: big-O

In order to make qualitative distinctions between how different functions grow we will often use the asymptotic big-$\mathcal{O}$ notation. For example, suppose an algorithm running on input of size $n$ takes $an^2+bn+c$ elementary steps, for some positive constants $a$, $b$ and $c$. These constants depend mainly on the details of the implementation and the choice of elementary steps. What we really care about is that, for large $n$, the whole expression is dominated by its quadratic term. We then say that the running time of this algorithm grows as $n^2$, and we write it as $\mathcal{O}(n^2)$, ignoring the less significant terms and the constant coefficients.

More precisely, let $f(n)$ and $g(n)$ be functions from positive integers to positive reals. You may think of $f(n)$ and $g(n)$ as the running times of two algorithms on inputs of size $n$. We say $f=\mathcal{O}(g)$,^3 which means that $f$ grows no faster than $g$, if there is a constant $c>0$ such that $f(n)\leqslant c\,g(n)$ for all sufficiently large values of $n$. Essentially, $f=\mathcal{O}(g)$ is a very loose analogue of $f\leqslant g$. In addition to the big-$\mathcal{O}$ notation, computer scientists often use $\Omega$ for lower bounds: $f=\Omega(g)$ means $g=\mathcal{O}(f)$. Again, this is a very loose analogue of $f\geqslant g$.

  1. When we say that $f(n)=\mathcal{O}(\log n)$, why don’t we have to specify the base of the logarithm?
  2. Let $f(n)=5n^3+1000n+50$. Is $f(n)=\mathcal{O}(n^3)$, or $\mathcal{O}(n^4)$, or both?
  3. Which of the following statements are true?
    1. $n^k=\mathcal{O}(2^n)$ for any constant $k$
    2. $n!=\mathcal{O}(n^n)$
    3. if $f_1=\mathcal{O}(g)$ and $f_2=\mathcal{O}(g)$, then $f_1+f_2=\mathcal{O}(g)$.
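The definition can be made concrete numerically. For the cubic from part 2, a sketch like the following exhibits an explicit witnessing constant (the particular values $c=6$ and threshold $32$ are choices made here, not the only ones that work):

```python
def f(n):
    return 5 * n**3 + 1000 * n + 50

# Witnessing f = O(n^3): with c = 6, we have f(n) <= c * n^3 for all n >= 32,
# because the cubic term dominates once n^3 >= 1000*n + 50.
c, n0 = 6, 32
print(all(f(n) <= c * n**3 for n in range(n0, 10_000)))  # True
```

Note that at $n=31$ the inequality $f(n)\leqslant 6n^3$ still fails, which is exactly why the definition only demands the bound for all *sufficiently large* $n$.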

1.11.8 Polynomial is good, and exponential is bad

In computational complexity the basic distinction is between polynomial and exponential algorithms. Polynomial growth is good and exponential growth is bad, especially if you have to pay for it. There is an old story about the legendary inventor of chess who asked the Persian king to be paid in grains of rice, doubled on each of the 64 squares of a chess board. The king placed one grain of rice on the first square, two on the second, four on the third, and he was supposed to keep on doubling until the board was full. The last square would then have $2^{63}=9{,}223{,}372{,}036{,}854{,}775{,}808$ grains of rice, more than has ever been harvested on planet Earth, to which we must add the grains of all previous squares, making the total number about twice as large. If we placed that many grains in an unbroken line we would reach the nearest star Alpha Centauri, our closest celestial neighbour beyond the solar system, about $4.4$ light-years away.^4
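The arithmetic of the story is easy to check directly:

```python
# Grains on the last (64th) square: one grain doubled 63 times.
last_square = 2 ** 63
print(last_square)  # 9223372036854775808

# Total over all 64 squares: 1 + 2 + 4 + ... + 2^63 = 2^64 - 1,
# which is indeed just under twice the last square alone.
total = sum(2 ** i for i in range(64))
print(total == 2 ** 64 - 1)  # True
```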

The moral of the story: if whatever you do requires an exponential use of resources, you are in trouble.

1.11.9 Imperfect prime tester

There exists a randomised algorithm which tests whether a given number $N$ is prime.^5 The algorithm always returns `yes` when $N$ is prime, and the probability it returns `yes` when $N$ is not prime is $\varepsilon$, where $\varepsilon$ is never greater than a half (independently, each time you run the algorithm). You run this algorithm $r$ times (for the same value of $N$), and each time the algorithm returns `yes`. What is the probability that $N$ is not prime?
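A simulation can build intuition here (it is a toy model of such a tester, not the exercise's full answer): in the worst case $\varepsilon=\frac{1}{2}$, a non-prime slips through all $r$ independent runs with probability $\varepsilon^r$.

```python
import random

random.seed(1)  # fixed seed so the experiment is reproducible

def imperfect_test(is_prime, eps=0.5):
    """Toy tester: always `yes` on primes; `yes` with probability eps otherwise."""
    if is_prime:
        return True
    return random.random() < eps

r = 5
trials = 100_000
# Fraction of non-primes that answer `yes` on all r independent runs:
passed = sum(all(imperfect_test(False) for _ in range(r))
             for _ in range(trials)) / trials
print(passed)  # close to (1/2)**5 = 0.03125
```

Turning this worst-case pass rate into the probability that $N$ is not prime *given* $r$ consecutive `yes` answers additionally requires a prior on $N$, which is the point of the exercise.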

1.11.10 Imperfect decision maker

Suppose a randomised algorithm solves a decision problem, returning `yes` or `no` answers. It gets the answer wrong with a probability not greater than $\frac{1}{2}-\delta$, where $\delta>0$ is a constant.^6 If we are willing to accept a probability of error no larger than $\varepsilon$, then it suffices to run the computation $r$ times, where $r=\mathcal{O}(\log 1/\varepsilon)$.

  1. If we perform this computation $r$ times, how many possible sequences of outcomes are there?
  2. Give a bound on the probability of any particular sequence with $w$ wrong answers.
  3. If we look at the set of $r$ outcomes, we will determine the final outcome by performing a majority vote. This can only go wrong if $w>r/2$. Give an upper bound on the probability of any single sequence that would lead us to the wrong conclusion.
  4. Using the bound $1-x\leqslant e^{-x}$, conclude that the probability of our coming to the wrong conclusion is upper bounded by $e^{-2r\delta^2}$.
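The bound in part 4 can be sanity-checked by simulation (a sketch with illustrative parameter choices: $\delta=0.1$, $r=101$): run the worst-case algorithm $r$ times per trial, take the majority vote, and compare the observed error rate with $e^{-2r\delta^2}$.

```python
import math
import random

random.seed(2)  # fixed seed so the experiment is reproducible

delta = 0.1
p_wrong = 0.5 - delta  # worst case: the algorithm errs with probability 1/2 - delta
r = 101                # odd, so the majority vote is never tied

trials = 20_000
errors = 0
for _ in range(trials):
    wrong = sum(random.random() < p_wrong for _ in range(r))
    if wrong > r / 2:  # majority of the r runs were wrong
        errors += 1

observed = errors / trials
chernoff = math.exp(-2 * r * delta**2)  # the bound from part 4

print(observed, chernoff)  # the observed error rate sits below the bound
```

The Chernoff bound is deliberately loose: for these parameters the observed error rate is well below $e^{-2r\delta^2}\approx 0.13$, but the bound's exponential decay in $r$ is what gives $r=\mathcal{O}(\log 1/\varepsilon)$.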

  1. Max Born, “Zur Quantenmechanik der Stoßvorgänge”, Zeitschrift für Physik 37 (1926), pp. 863–867.↩︎

  2. In the case $p=2$, we recover the usual Pythagorean/Euclidean equation that we all know and love: the distance of the point $(v_1,v_2,\ldots,v_n)$ from the origin is $\sqrt{v_1^2+v_2^2+\ldots+v_n^2}$; if we take $n=2$ as well then we recover the Pythagoras theorem.↩︎

  3. $f=\mathcal{O}(g)$ is pronounced as “$f$ is big-oh of $g$”.↩︎

  4. One light year (the distance that light travels through a vacuum in one year) is $9.4607\times10^{15}$ metres.↩︎

  5. Primality used to be given as the classic example of a problem in `BPP` but not `P`. However, in 2002 a deterministic polynomial time test for primality was proposed by Manindra Agrawal, Neeraj Kayal, and Nitin Saxena in the wonderfully titled paper “PRIMES is in `P`”. Thus, since 2002, primality has been in `P`.↩︎

  6. This result is known as the Chernoff bound.↩︎