# Chapter 1 Quantum interference

About complex numbers, called probability amplitudes, that, unlike probabilities, can cancel each other out, leading to quantum interference, and consequently qualitatively new ways of processing information.

The classical theory of computation does not usually refer to physics. Pioneers such as Alan Turing, Alonzo Church, Emil Post, and Kurt Gödel managed to capture the correct classical theory by intuition alone and, as a result, it is often falsely assumed that its foundations are self-evident and purely abstract. They are not!7

The concepts of information and computation can be properly formulated only in the context of a physical theory — information is stored, transmitted and processed always by physical means. Computers are physical objects and computation is a physical process. Indeed, any computation, classical or quantum, can be viewed in terms of physical experiments, which produce outputs that depend on initial preparations called inputs. Once we abandon the classical view of computation as a purely logical notion independent of the laws of physics it becomes clear that whenever we improve our knowledge about physical reality, we may also gain new means of computation. Thus, from this perspective, it is not very surprising that the discovery of quantum mechanics in particular has changed our understanding of the nature of computation. In order to explain what makes quantum computers so different from their classical counterparts, we begin with the rudiments of quantum theory.

Some of what we say in this chapter will be repeated in later chapters, but usually in much more detail. Feel free to think of this chapter as a sort of “aeroplane tour” of the rudiments, knowing that we will soon land on the ground to go out exploring by foot.

## 1.1 Two basic rules

Quantum theory, at least at some instrumental level, can be viewed as a modification of probability theory. We replace positive numbers (probabilities) with complex numbers z (called probability amplitudes) such that the squares of their absolute values, |z|^2, are interpreted as probabilities.

The correspondence between probability amplitudes z and probabilities |z|^2 is known as Born’s Rule.

The rules for combining amplitudes are very reminiscent of the rules for combining probabilities:

1. Whenever something can happen in a sequence of independent steps, we multiply the amplitudes of each step.

1. Whenever something can happen in several alternative ways, we add the amplitudes for each separate way.

That’s it! These two rules are basically all you need to manipulate amplitudes in any physical process, no matter how complicated.8 They are universal and apply to any physical system, from elementary particles through atoms and molecules to white dwarf stars. They also apply to information since, as we have already emphasised, information is physical. The two rules look deceptively simple but, as you will see in a moment, their consequences are anything but trivial.
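As a toy illustration (a minimal sketch with made-up numbers, not tied to any particular experiment), the two rules together with Born’s rule amount to a few lines of arithmetic with complex numbers:

```python
import cmath

# Hypothetical amplitudes for a two-step process that can happen
# in two alternative ways; all numbers here are made up for illustration.
step1_way_a, step2_way_a = 0.8, 0.6 * cmath.exp(1j * 0.3)
step1_way_b, step2_way_b = 0.6, 0.8 * cmath.exp(1j * 2.1)

# Rule 1: multiply the amplitudes along each sequence of steps.
amp_a = step1_way_a * step2_way_a
amp_b = step1_way_b * step2_way_b

# Rule 2: add the amplitudes of the alternative ways.
total_amplitude = amp_a + amp_b

# Born's rule: the probability is the squared absolute value.
probability = abs(total_amplitude) ** 2
print(probability)
```

Because the two ways carry different phases, the resulting probability is not simply the sum of the two separate probabilities — that is the interference we turn to next.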

## 1.2 Quantum interference: the failure of probability theory

Modern mathematical probability theory is based on three axioms, proposed by Andrey Nikolaevich Kolmogorov (1903–1987) in his monograph with the impressive German title Grundbegriffe der Wahrscheinlichkeitsrechnung (“Foundations of Probability Theory”). The Kolmogorov axioms are simple and intuitive:9

1. Once you identify all elementary outcomes, or events, you may then assign probabilities to them.
2. Probability is a number between 0 and 1, and an event which is certain has probability 1.
3. Last but not least, the probability of any event can be calculated using a deceptively simple rule — the additivity axiom: Whenever an event can occur in several mutually exclusive ways, the probability for the event is the sum of the probabilities for each way considered separately.

Obvious, isn’t it? So obvious, in fact, that probability theory was accepted as a mathematical framework, a language that can be used to describe actual physical phenomena. Physics should be able to identify elementary events and assign numerical probabilities to them. Once this is done, we may revert to the mathematical formalism of probability theory. The Kolmogorov axioms will take care of the mathematical consistency and will guide us whenever we need to calculate probabilities of more complex events. This is a very sensible approach, apart from the fact that it does not always work! Today we know that probability theory, as ubiquitous as it is, fails to describe many common quantum phenomena. In order to see the need for quantum theory, let us consider a simple experiment in which probability theory fails to give the right predictions.

### 1.2.1 The double slit experiment

In a double slit experiment, a particle emitted from a source S can reach the detector D by taking two different paths, e.g. through an upper or a lower slit in a barrier between the source and the detector. After sufficiently many repetitions of this experiment we can evaluate the frequency of clicks in the detector D and show that it is inconsistent with the predictions based on probability theory. Let us use the quantum approach to show how the discrepancy arises.

The particle emitted from a source S can reach detector D by taking two different paths, with amplitudes z_1 and z_2 respectively. We may say that the upper slit is taken with probability p_1=|z_1|^2 and the lower slit with probability p_2=|z_2|^2. These are two mutually exclusive events. With the two slits open, probability theory declares (by the additivity axiom) that the particle should reach the detector with probability p_1+p_2= |z_1|^2+|z_2|^2. But this is not what happens experimentally!

Following the “quantum rules”, first we add the amplitudes and then we square the absolute value of the sum to get the probability. Thus, the particle will reach the detector with probability \begin{aligned} p &= |z|^2 \\& = |z_1 + z_2|^2 \\& = |z_1|^2 + |z_2|^2 + z_1^\star z_2 + z_1 z_2^\star \\& = p_1 + p_2 + |z_1||z_2|\left( e^{i(\varphi_2-\varphi_1)} + e^{-i(\varphi_2-\varphi_1)} \right) \\& = p_1 + p_2 + 2 \sqrt{p_1 p_2} \cos(\varphi_2-\varphi_1) \\& = p_1 + p_2 + \text{interference terms} \end{aligned} \tag{1.2.1.1} where we have expressed the amplitudes in their polar forms \begin{aligned} z_1 &= |z_1|e^{i\varphi_1} \\z_2 &= |z_2|e^{i\varphi_2}. \end{aligned} The appearance of the interference terms marks the departure from the classical theory of probability. The probability of an event that can occur in two seemingly mutually exclusive ways is the sum of the probabilities of the individual ways, p_1 + p_2, modified by the interference term 2 \sqrt{p_1p_2}\cos(\varphi_2-\varphi_1). Depending on the relative phase \varphi_2-\varphi_1, the interference term can be either negative (which we call destructive interference) or positive (constructive interference), leading to either suppression or enhancement of the total probability p.
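We can verify Equation (1.2.1.1) numerically; the moduli and phases below are arbitrary values chosen for illustration:

```python
import cmath, math

# Arbitrary amplitudes in polar form, z = |z| e^{i phi}.
phi_1, phi_2 = 0.4, 1.9
z1 = 0.6 * cmath.exp(1j * phi_1)
z2 = 0.7 * cmath.exp(1j * phi_2)
p1, p2 = abs(z1) ** 2, abs(z2) ** 2

# Quantum rule: add the amplitudes first, then square the absolute value.
p_quantum = abs(z1 + z2) ** 2

# Classical additivity plus the interference term of Equation (1.2.1.1).
p_formula = p1 + p2 + 2 * math.sqrt(p1 * p2) * math.cos(phi_2 - phi_1)

print(p_quantum, p_formula)  # the two agree
```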

The algebra is simple; our focus is on the physical interpretation. Firstly, note that the important quantity here is the relative phase \varphi_2-\varphi_1 rather than the individual values \varphi_1 and \varphi_2. This observation is not trivial at all: if a particle reacts only to the difference of the two phases, each pertaining to a separate path, then it must have, somehow, experienced the two paths, right? Thus we cannot say that the particle has travelled either through the upper or the lower slit, because it has travelled through both. In the same way, quantum computers follow, in some tangible way, all computational paths simultaneously, producing answers that depend on all these alternative calculations. Weird, but this is how it is!

Secondly, what has happened to the additivity axiom in probability theory? What was wrong with it? One problem is the assumption that the processes of taking the upper or the lower slit are mutually exclusive; in reality, as we have just mentioned, the two transitions both occur, simultaneously. However, we cannot learn this from probability theory, nor from any other a priori mathematical construct.10

There is no fundamental reason why Nature should conform to the additivity axiom.

We find out how nature works by making intelligent guesses, running experiments, checking what happens and formulating physical theories. If our guess disagrees with experiments then it is wrong, so we try another intelligent guess, and another, etc. Right now, quantum theory is the best guess we have: it offers good explanations and predictions that have not been falsified by any of the existing experiments. This said, rest assured that one day quantum theory will be falsified, and then we will have to start guessing all over again.

## 1.3 Superpositions

Amplitudes are more than just tools for calculating probabilities: they tell us something about physical reality. When we deal with probabilities, we may think about them as numbers that quantify our lack of knowledge. Indeed, when we say that a particle goes through the upper or the lower slit with some respective probabilities, it does go through one of the two slits; we just do not know which one. In contrast, according to quantum theory, a particle that goes through the upper and the lower slit with certain amplitudes does explore both paths, not just one of them. This is a statement about a real physical situation — about something that is out there and something we can experiment with.

The assumption that the particle goes through one of the two slits, but just that we do not know which one, is inconsistent with many experimental observations.

We have to accept that, apart from some easy-to-visualise states, known as the basis states (such as the particle at the upper slit or the particle at the lower slit), there are infinitely many other states, all of them equally real, in which the particle is in a superposition of the two basis states. This rather bizarre picture of reality is the best we have at the moment, and it works, at least for now.

Physicists write such superposition states as11 |\psi\rangle=\alpha |\text{at the upper slit}\rangle +\beta |\text{at the lower slit}\rangle, meaning the particle at the upper slit with amplitude \alpha and at the lower slit with amplitude \beta. Mathematically, you can think about this expression as a vector |\psi\rangle in a two-dimensional complex vector space written in terms of the two basis vectors |\text{at the upper slit}\rangle and |\text{at the lower slit}\rangle. You could also write this vector as a column vector with two complex entries \alpha and \beta, but then you would have to explain the physical meaning of the basis states. Here, we use the |\cdot\rangle notation, introduced by Paul Dirac in the early days of the quantum theory as a useful way to write and manipulate vectors. In Dirac notation you can put into the box |\phantom{0}\rangle anything that serves to specify what the vector is: it could be |\uparrow\rangle for spin up and |\downarrow\rangle for spin down, or |0\rangle for a quantum bit holding logical 0 and |1\rangle for a quantum bit holding logical 1, etc. As we shall see soon, there is much more to this notation, and learning to manipulate it will help you greatly.
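As a minimal sketch, here is the same superposition written as a NumPy vector in the basis {|at the upper slit⟩, |at the lower slit⟩}; the amplitudes \alpha and \beta below are an arbitrary choice:

```python
import numpy as np

# |psi> = alpha |upper slit> + beta |lower slit>, as a column vector.
alpha, beta = 1 / np.sqrt(2), 1j / np.sqrt(2)
psi = np.array([alpha, beta])

# Born's rule: probabilities of finding the particle at each slit.
probs = np.abs(psi) ** 2
print(probs)         # [0.5 0.5]
print(probs.sum())   # normalised: total probability 1
```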

## 1.4 Interferometers

Many modern interference experiments are performed using internal degrees of freedom of atoms and ions. For example, Ramsey interferometry, named after American physicist Norman Ramsey, is a generic name for an interference experiment in which atoms are sent through two separate resonant interaction zones, known as Ramsey zones, separated by an intermediate dispersive interaction zone.

Many beautiful experiments of this type were carried out in the 1990s in Serge Haroche’s lab at the Ecole Normale Supérieure in Paris. Rubidium atoms were sent through two separate interaction zones (resonant interaction in the first and the third cavity) separated by a phase inducing dispersive interaction zone (the central cavity). The atoms were subsequently measured, via a selective ionisation, and found to be in one of the two preselected energy states, here labeled as |0\rangle and |1\rangle. The fraction of atoms found in states |0\rangle or |1\rangle showed a clear dependence on the phase shifts induced by the dispersive interaction in the central cavity. In 2012, Serge Haroche and Dave Wineland shared the Nobel Prize in physics for “ground-breaking experimental methods that enable measuring and manipulation of individual quantum systems.”

The three rectangular boxes in Figure 1.1 represent three cavities, each cavity being an arrangement of mirrors which traps the electromagnetic field (think about standing waves in between two mirrors). The oval shapes represent rubidium atoms with two preselected energy states labelled as |0\rangle and |1\rangle. Each atom is initially prepared in a highly excited internal energy state |0\rangle and zips through the three cavities, from left to right. In each cavity the atom interacts with the cavity field. The first and the third cavities are, for all theoretical purposes, identical: their frequencies are tuned to the resonant frequency of the atom, and the atom exchanges energy with the cavity, going back and forth between its energy states |0\rangle and |1\rangle. In contrast, in the second (central) cavity, the atom undergoes the so-called dispersive interaction: it is too far off-resonance to exchange energy with the field, but its energy states “feel” the field and acquire phase shifts. After experiencing this well timed sequence of resonant–dispersive–resonant interactions, the energy of the atom is measured and the atom is found to be either in state |0\rangle or state |1\rangle. The fraction of atoms found in state |0\rangle or |1\rangle shows a clear dependence on the phase shifts induced by the dispersive interaction in the central cavity.

We can understand this interference better if we follow the two internal states of the atom as it moves through the three cavities.

Suppose we are interested in the probability that the atom, initially in state |0\rangle, will be found, after completing its journey through the three cavities, in state |1\rangle. As you can see in Figure 1.2, this can happen in two ways, as indicated by the two red paths connecting the input state |0\rangle on the left with the output state |1\rangle on the right. Again, let U_{ij} denote the probability amplitude that input |j\rangle generates output |i\rangle (for i,j=0,1). We can see from the diagram that \begin{aligned} U_{10} &= \frac{1}{\sqrt2} e^{i\varphi_0}\frac{1}{\sqrt2} + \frac{1}{\sqrt2} e^{i\varphi_1}\frac{-1}{\sqrt2} \\&= \frac12 e^{i\varphi_0} - \frac12 e^{i\varphi_1} \\&= -ie^{i\frac{\varphi_0+\varphi_1}{2}}\sin\frac{\varphi}{2}, \end{aligned} where \varphi = \varphi_1-\varphi_0 is the relative phase. The corresponding probability reads12 \begin{aligned} P_{10} &= \vert U_{10}\vert^2 \\&= \left\vert \frac12 e^{i\varphi_0} - \frac12 e^{i\varphi_1}\right\vert^2 \\&= \frac12 - \frac12\cos\varphi. \end{aligned} You should recognise the first term, \frac12, as the “classical” probability and the second one, -\frac12\cos\varphi, as the interference term. We can repeat such calculations for any other pair of input–output states. This approach works fine here but, in general, tracking all possible paths in evolving quantum systems can become messy when the number of input and output states increases. There is, however, a neat way of doing it via matrix multiplication.
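A quick numerical check of this path-sum calculation; the phase shifts \varphi_0 and \varphi_1 below are arbitrarily chosen:

```python
import numpy as np

phi0, phi1 = 0.3, 1.1   # arbitrary phase shifts from the central cavity
phi = phi1 - phi0       # the relative phase

# Sum over the two paths: |0> -> |0> -> |1> and |0> -> |1> -> |1>.
U10 = (1/np.sqrt(2)) * np.exp(1j*phi0) * (1/np.sqrt(2)) \
    + (1/np.sqrt(2)) * np.exp(1j*phi1) * (-1/np.sqrt(2))

P10 = abs(U10) ** 2
print(P10)  # equals 1/2 - 1/2 cos(phi)
```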

The effect of each interaction on atomic states can be described by a matrix of transition amplitudes, as illustrated in Figure 1.3. Then a sequence of independent interactions is described by the product of these matrices. \begin{aligned} U &= \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & \frac{-1}{\sqrt2} \end{bmatrix} \begin{bmatrix} e^{i\varphi_0} & 0 \\ 0 & e^{i\varphi_1} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & \frac{-1}{\sqrt2} \end{bmatrix} \\ &= e^{i\frac{\varphi_0+\varphi_1}{2}} \begin{bmatrix} \cos\frac{\varphi}{2} & -i\sin\frac{\varphi}{2} \\ -i\sin\frac{\varphi}{2} & \cos\frac{\varphi}{2} \end{bmatrix}, \end{aligned} where \varphi = \varphi_1-\varphi_0, as before.
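The same result obtained by matrix multiplication — a sketch assuming NumPy, with the same kind of arbitrary phases:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # resonant interaction
phi0, phi1 = 0.3, 1.1                           # arbitrary phases
P = np.diag([np.exp(1j*phi0), np.exp(1j*phi1)]) # dispersive interaction

# Resonant-dispersive-resonant sequence; matrices multiply right to left.
U = H @ P @ H

phi = phi1 - phi0
closed_form = np.exp(1j*(phi0+phi1)/2) * np.array(
    [[np.cos(phi/2), -1j*np.sin(phi/2)],
     [-1j*np.sin(phi/2), np.cos(phi/2)]])
print(np.allclose(U, closed_form))  # True
```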

In general, quantum operation A followed by another quantum operation B is a quantum operation described by the matrix product BA (watch the order of the matrices). Indeed, the expression (BA)_{ij}=\sum_k B_{ik}A_{kj} is the sum over the amplitudes that input |j\rangle generates output |i\rangle via a specific intermediate state |k\rangle. As you can see, the matrix approach is a wonderful bookkeeping tool: in one fell swoop it takes care of both multiplying and adding the probability amplitudes corresponding to all the contributing paths.

## 1.5 Qubits, gates, and circuits

Atoms, trapped ions, molecules, nuclear spins and many other quantum objects with two pre-selected basis states labelled as |0\rangle and |1\rangle (from now on we will call such objects quantum bits or qubits) can be used to implement simple quantum interference. There is no need to learn about the physics behind these diverse technologies if all you want is to understand the basics of quantum theory. We may now conveniently forget about any specific experimental realisation of a qubit and represent a generic single-qubit interference graphically as a circuit diagram:13

This diagram should be read from left to right. The horizontal line represents a qubit that is inertly carried from one quantum operation to another. We often call this line a quantum wire. The wire may describe translation in space (e.g. atoms travelling through cavities) or translation in time (e.g. a sequence of operations performed on a trapped ion). The boxes or circles on the wire represent elementary quantum operations, called quantum logic gates. Here we have two types of gates: two Hadamard gates H (think about resonant interactions) and one phase gate P_\varphi (think about dispersive interaction), where14 H=\begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\\frac{1}{\sqrt2} & \frac{-1}{\sqrt2} \end{bmatrix} \quad\text{and}\quad P_\varphi = \begin{bmatrix} 1 & 0 \\0 & e^{i\varphi} \end{bmatrix}.

The input qubits appear as state vectors on the left side of circuit diagrams, and the output qubits as state vectors on the right. The product of the three matrices HP_\varphi H (see Figure 1.3) describes the action of the whole circuit: it maps input state vectors to output state vectors15: \begin{array}{lcr} |0\rangle & \longmapsto & \cos\frac{\varphi}{2}|0\rangle - i\sin\frac{\varphi}{2}|1\rangle, \\|1\rangle & \longmapsto &- i\sin\frac{\varphi}{2}|0\rangle + \cos\frac{\varphi}{2}|1\rangle. \end{array}
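A sketch of this circuit’s action on |0\rangle, with an arbitrary phase \varphi. Note that the matrix product carries an overall factor e^{i\varphi/2}, a physically irrelevant global phase, which is why the mapping above can be written without it:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
phi = 0.7                                     # an arbitrary phase
P_phi = np.diag([1, np.exp(1j*phi)])          # phase gate

U = H @ P_phi @ H                             # the whole circuit

ket0 = np.array([1, 0])
out = U @ ket0
# Up to a global phase: |0> -> cos(phi/2)|0> - i sin(phi/2)|1>
expected = np.exp(1j*phi/2) * np.array([np.cos(phi/2), -1j*np.sin(phi/2)])
print(np.allclose(out, expected))  # True
```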

## 1.6 Quantum decoherence

We do need quantum theory to describe many physical phenomena, but, at the same time, there are many other phenomena where the classical theory of probability works pretty well. We hardly see quantum interference on a daily basis. Why? The answer is decoherence. The addition of probability amplitudes, rather than probabilities, applies to physical systems which are completely isolated. However, it is almost impossible to isolate a complex quantum system, such as a quantum computer, from the rest of the world. There will always be spurious interactions with the environment, and when we add amplitudes, we have to take into account not only different configurations of the physical system at hand, but also different configurations of the environment.

For example, consider an isolated system composed of a quantum computer and its environment. The computer is prepared in some input state I and generates output O. Let us look at the following two scenarios:

1. The computer is isolated and quantum computation does not affect the environment. The computer and the environment evolve independently from each other and, as a result, the environment does not hold any physical record of how the computer reached output O. In this case we add the amplitudes for each of the two alternative computational paths.

1. Quantum computation affects the environment. The environment now holds a physical record of how the computer reached output O, which results in two final states of the composed system (computer + environment) which we denote O_1 and O_2. We add the probabilities for each of the two alternative computational paths.

When quantum computation affects the environment, we have to include the environment in our analysis, for it now takes part in the computation. Depending on which computational path was taken, the environment may end up in two distinct states. The computer itself may show output O, but when we include the environment we have not one, but two outputs, O_1 and O_2, denoting, respectively, “computer shows output O and the environment knows that path 1 was taken” and “computer shows output O and the environment knows that path 2 was taken”. There are no alternative ways of reaching O_1 or O_2, hence there is no interference, and the corresponding probabilities read p_1=|z_1|^2 for O_1, and p_2=|z_2|^2 for O_2. The probability that the computer shows output O, regardless of the state of the environment, is the sum of the two probabilities: p=p_1+p_2. We have lost the interference term and any advantages of quantum computation along with it. In the presence of decoherence, the interference formula in Equation (1.2.1.1) is modified and reads p = p_1 + p_2 + 2 v \sqrt{p_1 p_2}\cos (\varphi_2-\varphi_1), where the parameter v, called the visibility of the interference pattern, ranges from 0 (the environment can perfectly distinguish between the two paths, total decoherence, no interference) to 1 (the environment cannot distinguish between the two paths, no decoherence, full interference), with the values in between corresponding to partial decoherence.
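The modified interference formula is easy to explore numerically; here is a minimal sketch:

```python
import numpy as np

def interference_probability(p1, p2, delta_phi, v):
    """Two-path probability with visibility v:
    v = 0 means total decoherence, v = 1 means no decoherence."""
    return p1 + p2 + 2 * v * np.sqrt(p1 * p2) * np.cos(delta_phi)

# With v = 0 the interference term vanishes: classical additivity returns.
print(interference_probability(0.5, 0.5, np.pi, v=0))  # 1.0
# With v = 1 and relative phase pi the two paths cancel completely.
print(interference_probability(0.5, 0.5, np.pi, v=1))  # 0.0
```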

We shall derive this formula later on, and you will see that v quantifies the degree of distinguishability between O_1 and O_2. The more the environment knows about which path was taken the less interference we see.

Decoherence suppresses quantum interference.

Decoherence is chiefly responsible for our classical description of the world — without interference terms we may as well add probabilities instead of amplitudes. While decoherence is a serious impediment to building quantum computers, depriving us of the power of quantum interference, it is not all doom and gloom: there are clever ways around decoherence, such as quantum error correction and fault-tolerant methods we will meet later.

## 1.7 Computation: deterministic, probabilistic, and quantum

Take one physical bit or a qubit. It has two logical states: |0\rangle and |1\rangle. Bring in another qubit and the combined system has four logical states |00\rangle, |01\rangle, |10\rangle and |11\rangle. In general, n qubits will give us 2^n states, representing all possible binary strings of length n. It is important to use subsystems — here qubits — rather than one chunk of matter, for by operating on at most n qubits we can reach any of the 2^n states of the composed system. Now, let the qubits interact in a controllable fashion. We are computing. Think about computation as a physical process that evolves a prescribed initial configuration of a computing machine, called \texttt{INPUT}, into some final configuration, called \texttt{OUTPUT}. We shall refer to the configurations as states. Figure 1.4 shows five consecutive computational steps performed on four distinct states.
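The 2^n counting is pure bookkeeping, which a couple of lines can illustrate:

```python
from itertools import product

# n qubits have 2^n basis states: all binary strings of length n.
n = 3
basis = ["".join(bits) for bits in product("01", repeat=n)]
print(basis)       # ['000', '001', ..., '111']
print(len(basis))  # 2**n = 8
```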

That computation was deterministic: every time you run it with the same input, you get the same output. But a computation does not have to be deterministic — we can augment a computing machine by allowing it to “toss an unbiased coin” and to choose its steps randomly. It can then be viewed as a directed16 tree-like graph where each node corresponds to a state of the machine, and each edge represents one step of the computation, as shown in Figure 1.5.

The computation starts from some initial state (\texttt{INPUT}) and it subsequently branches into other nodes representing states reachable with non-zero probability from the initial state. The probability of a particular final state (\texttt{OUTPUT}) being reached is equal to the sum of the probabilities along all mutually exclusive paths which connect the initial state with that particular state. Figure 1.5 shows only two computational paths, but, in general, there could be many more paths (here, up to 256) contributing to the final probability. Quantum computation can be represented by a similar graph, as in Figure 1.6.

For quantum computations, we associate with each edge in the graph the probability amplitude that the computation follows that edge. The probability amplitude of a particular path to be followed is the product of amplitudes pertaining to transitions in each step. The probability amplitude of a particular final state being reached is equal to the sum of the amplitudes along all mutually exclusive paths which connect the initial state with that particular state: z = \sum_{\mathrm{all\,paths}\,k} z_k. The resulting probability, as we have just seen, is the sum of the probabilities pertaining to each computational path p_k modified by the interference terms: \begin{aligned} p &= |z|^2 \\&= \sum_{k,j} z_j^\star z_k \\&= \sum_k p_k + \sum_{k\ne j} \sqrt{p_k p_j}\cos(\varphi_k-\varphi_j). \end{aligned}
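A numerical sanity check of this multi-path formula, using a handful of made-up amplitudes:

```python
import numpy as np

# Made-up amplitudes for five computational paths.
rng = np.random.default_rng(0)
amps = rng.random(5) * np.exp(1j * rng.random(5) * 2 * np.pi)

# Quantum rule: add the amplitudes over all paths, then square.
p = abs(amps.sum()) ** 2

# The same probability as classical terms plus interference terms.
p_k = np.abs(amps) ** 2
phases = np.angle(amps)
interference = sum(np.sqrt(p_k[k] * p_k[j]) * np.cos(phases[k] - phases[j])
                   for k in range(5) for j in range(5) if k != j)
print(np.isclose(p, p_k.sum() + interference))  # True
```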

Quantum computation can be viewed as a complex multi-particle quantum interference involving many computational paths through a computing device. The art of quantum computation is to shape quantum interference, through a sequence of computational steps, enhancing probabilities of correct outputs and suppressing probabilities of the wrong ones.

## 1.8 Computational complexity

Is there a compelling reason why we should care about quantum computation? It may sound like an extravagant way to compute something that can be computed anyway. Indeed, your standard laptop, given enough time and memory, can simulate pretty much any physical process. In principle, it can also simulate any quantum interference and compute everything that quantum computers can compute. The snag is, this simulation, in general, is very inefficient. And efficiency does matter, especially if you have to wait more than the age of the Universe for your laptop to stop and deliver an answer!17

In order to solve a particular problem, computers (classical or quantum) follow a precise set of instructions called an algorithm. Computer scientists quantify the efficiency of an algorithm according to how rapidly its running time, or the use of memory, increases when it is given ever larger inputs to work on. An algorithm is said to be efficient if the number of elementary operations taken to execute it increases no faster than a polynomial function of the size of the input.18 We take the input size to be the total number of binary digits (bits) needed to specify the input. For example, using the algorithm taught in elementary school, one can multiply two n-digit numbers in a time that grows like the number of digits squared, n^2. In contrast, the fastest-known method for the reverse operation — factoring an n-digit integer into prime numbers — takes a time that grows exponentially, roughly as 2^n. That is considered inefficient.
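To get a feel for the gulf between these two scalings (ignoring all constant factors, which this rough comparison does not capture):

```python
# Rough step counts from the scalings quoted above: ~ n^2 for
# multiplication versus ~ 2^n for factoring an n-digit number.
for n in (10, 30, 50):
    print(n, n**2, 2**n)
```

Already at fifty digits the exponential count dwarfs the polynomial one by many orders of magnitude.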

The class of problems that can be solved by a deterministic computer in polynomial time is represented by the capital letter \texttt{P}, for polynomial time. The class of problems that can be solved in polynomial time by a probabilistic computer is called \texttt{BPP}, for bounded-error probabilistic polynomial time. It is clear that \texttt{BPP} contains \texttt{P}, since a deterministic computation is a special case of a probabilistic computation in which we never consult the source of randomness. When we run a probabilistic (a.k.a. randomised) computation many times on the same input, we will not get the same answer every time, but the computation is useful if the probability of getting the right answer is high enough. Finally, the complexity class \texttt{BQP}, for bounded-error quantum polynomial time, is the class of problems that can be solved in polynomial time by a quantum computer.

Since a quantum computer can easily generate random bits and simulate a probabilistic classical computer, \texttt{BQP} certainly contains the class \texttt{BPP}. Here we are interested in problems that are in \texttt{BQP} but not known to be in \texttt{BPP}. The most popular example of such a problem is factoring.

A quantum algorithm, discovered by Peter Shor in 1994, can factor n-digit numbers in a number of steps that grows only as n^2, as opposed to the 2^n that we have classically.19 Since the intractability of factorisation underpins the security of many methods of encryption, Shor’s algorithm was soon hailed as the first ‘killer application’ for quantum computation: something very useful that only a quantum computer could do. Since then, the hunt has been on for interesting things for quantum computers to do, and at the same time, for the scientific and technological advances that could allow us to build quantum computers.

## 1.9 Outlook

When the physics of computation was first investigated, starting in the 1960s, one of the main motivations was a fear that quantum-mechanical effects might place fundamental bounds on the accuracy with which physical objects could render the properties of the abstract entities, such as logical variables and operations, that appear in the theory of computation. It turned out, however, that quantum mechanics itself imposes no significant limits, but does break through some of those that classical physics imposed. The quantum world has a richness and intricacy that allows new practical technologies, and new kinds of knowledge. In this course we will merely scratch the surface of the rapidly developing field of quantum computation. We will concentrate mostly on the fundamental issues and skip many experimental details. However, it should be mentioned that quantum computing is a serious possibility for future generations of computing devices. At present it is not clear how and when fully-fledged quantum computers will eventually be built; but this notwithstanding, the quantum theory of computation already plays a much more fundamental role in the scheme of things than its classical predecessor did. I believe that anyone who seeks a fundamental understanding of either physics, computation or logic must incorporate its new insights into their world view.

## 1.10 Remarks and exercises

### 1.10.1 A historical remark

Back in 1926, Max Born simply postulated the connection between amplitudes and probabilities, but did not get it quite right on his first attempt. In the original paper20 proposing the probability interpretation of the state vector (wavefunction) he wrote:

> … If one translates this result into terms of particles only one interpretation is possible. \Theta_{\eta,\tau,m}(\alpha,\beta,\gamma) [the wavefunction for the particular problem he is considering] gives the probability^* for the electron arriving from the z direction to be thrown out into the direction designated by the angles \alpha,\beta,\gamma
>
> ^* Addition in proof: More careful considerations show that the probability is proportional to the square of the quantity \Theta_{\eta,\tau,m}(\alpha,\beta,\gamma).

### 1.10.2 Modifying the Born rule

Suppose that we modified the Born rule, so that probabilities were given by the absolute values of amplitudes raised to power p (for some p not necessarily equal to 2). Then admissible physical evolutions would still have to preserve the normalisation of probability: mathematically speaking, they would have to be isometries of p-norms.

Recall that the p-norm of a vector v, with components v_1, v_2,\ldots, v_n, is defined as \sqrt[{}^p]{|v_1|^p + |v_2|^p + \ldots + |v_n|^p}. It is clear that any permutation of vector components and multiplication by phase factors (i.e. unit complex numbers) will leave any p-norm unchanged. It turns out that these complex permutations are the only isometries, except for one special case! For p=2, the isometries are unitary operations, which form a continuous group; in all other cases we are restricted to discrete permutations. We do not have to go into the details of the proof, since we can see this result geometrically.
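A small numerical illustration of why p=2 is special: a rotation (a unitary) preserves the 2-norm of a vector but generally changes, say, its 4-norm. The vector and rotation angle below are arbitrary examples:

```python
import numpy as np

def p_norm(v, p):
    """The p-norm of a complex vector v."""
    return np.sum(np.abs(v) ** p) ** (1 / p)

v = np.array([0.6, 0.8j])    # a unit vector in the 2-norm

theta = 0.4                  # an arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The rotation preserves the 2-norm...
print(np.isclose(p_norm(U @ v, 2), p_norm(v, 2)))  # True
# ...but not the 4-norm: only permutations and phase factors
# preserve p-norms with p != 2.
print(np.isclose(p_norm(U @ v, 4), p_norm(v, 4)))  # False
```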

In particular, the unit sphere (the set of vectors of unit norm) must be mapped to itself by any probability-preserving operation. As we can see in Figure 1.7, the 2-norm is special because of its rotational invariance: the probability measure picks out no preferred basis in the space of state vectors. Moreover, it respects unitary operations and does not restrict them in any way. If the admissible physical evolutions were restricted to discrete symmetries, e.g. permutations, then there would be no continuity, and no concept of “time” as we know it.
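To see numerically why p=2 is special, here is a small sketch (not part of the exercise): permutations and phase factors preserve every p-norm, whereas a rotation preserves only the 2-norm.

```python
import math

def p_norm(v, p):
    """p-norm of a vector with complex components."""
    return sum(abs(z) ** p for z in v) ** (1 / p)

v = [0.6, 0.8j]                 # a unit vector in the 2-norm

# Swapping components and multiplying by phases preserves every p-norm...
w = [1j * v[1], -v[0]]
for p in (1, 2, 3):
    assert math.isclose(p_norm(v, p), p_norm(w, p))

# ...but a rotation by 45 degrees preserves only the 2-norm.
c = math.cos(math.pi / 4)
rot = [c * v[0] - c * v[1], c * v[0] + c * v[1]]
print(p_norm(rot, 2))   # still 1.0
print(p_norm(rot, 3))   # differs from p_norm(v, 3)
```

The rotated vector still lies on the unit 2-sphere, but its 3-norm has changed: only the p=2 sphere is rotationally invariant.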

### 1.10.3 Complex numbers

Complex numbers have many applications in physics; however, not until the advent of quantum theory was their ubiquitous and fundamental role in the description of the actual physical world so evident. Even today, their profound link with probabilities remains rather mysterious. Mathematically speaking, the set of complex numbers is a field. This is an important algebraic structure used in almost all branches of mathematics. You do not have to know much about algebraic fields to follow these lectures, but you should know the basics. Look them up.

1. The set of rational numbers and the set of real numbers are both fields, but the set of integers is not. Why?
2. What does it mean to say that the field of complex numbers is algebraically closed?
3. Evaluate each of the following quantities: 1+e^{-i\pi}, \quad |1+i|, \quad (1+i)^{42}, \quad \sqrt{i}, \quad 2^i, \quad i^i.
4. Here is a simple proof that +1=-1: 1=\sqrt{1}=\sqrt{(-1)(-1)}=\sqrt{-1}\sqrt{-1}=i^2=-1. What is wrong with it?
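If you want to sanity-check your hand calculations for part 3, Python’s cmath module evaluates all of these directly (a convenience, not a substitute for working them out on paper; note that cmath.sqrt returns the principal root only):

```python
import cmath, math

print(1 + cmath.exp(-1j * cmath.pi))  # 0 (up to rounding)
print(abs(1 + 1j))                    # sqrt(2)
print((1 + 1j) ** 42)                 # 2^21 * i = 2097152i
print(cmath.sqrt(1j))                 # (1 + i)/sqrt(2), the principal root
print(2 ** 1j)                        # cos(ln 2) + i sin(ln 2)
print(1j ** 1j)                       # e^(-pi/2) ~ 0.2079, a real number!
```

That i^i comes out real (indeed, real and positive) is a classic surprise worth verifying by hand via i = e^{i\pi/2}.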

### 1.10.4 Many computational paths

A quantum computer starts calculations in some initial state, then follows n different computational paths which lead to the final output. The computational paths are followed with probability amplitudes \frac{1}{\sqrt n}e^{i k \varphi}, where \varphi is a fixed angle 0< \varphi <2\pi and k=0,1,\ldots,n-1. Show that the probability of generating the output is21 \frac{1}{n}\left\vert \frac{1-e^{i n\varphi}}{1-e^{i\varphi}} \right\vert^2 = \frac{1}{n} \frac{\sin^2 (n\frac{\varphi}{2})}{\sin^2 (\frac{\varphi}{2})} for 0<\varphi<2\pi, and 1 for \varphi=0. Plot the probability as a function of \varphi.
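The sum of amplitudes is a geometric series, so you can check the closed form numerically before proving it (a sketch; the helper name is ours):

```python
import cmath, math

def output_probability(n, phi):
    """|sum of the n path amplitudes (1/sqrt(n)) e^{ik phi}|^2, summed directly."""
    amp = sum(cmath.exp(1j * k * phi) for k in range(n)) / math.sqrt(n)
    return abs(amp) ** 2

n, phi = 7, 1.3
closed_form = math.sin(n * phi / 2) ** 2 / (n * math.sin(phi / 2) ** 2)
print(output_probability(n, phi), closed_form)  # the two values agree
```

Trying \varphi = 2\pi k/n for integer k not divisible by n is instructive: the amplitudes are then the n-th roots of unity and cancel completely.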

### 1.10.5 Distant photon emitters

Imagine two distant stars, A and B, that emit identical photons. If you point a single detector towards them you will register a click every now and then, but you never know which star the photon came from. Now prepare two detectors and point them towards the stars. Assume the photons arrive with the probability amplitudes specified in Figure 1.8. Every now and then you will register a coincidence: the two detectors will fire.

1. Calculate the probability of a coincidence.
2. Now, assume that z\approx \frac{1}{r}e^{i\frac{2r\pi}{\lambda}}, where r is the distance between detectors and the stars. How can we use this to measure r?

### 1.10.6 Physics against logic?

Now that we have poked our heads into the quantum world, let us see how quantum interference challenges conventional logic and leads to qualitatively different computations. Consider the following task (which we will return to a few more times in later chapters): design a logic gate that operates on a single bit such that, when it is followed by another, identical, logic gate, the output is always the negation of the input. Let us call this logic gate the square root of \texttt{NOT}, or \sqrt{\texttt{NOT}}. A simple check, such as an attempt to construct a truth table, should persuade you that there is no such operation in logic. It may seem reasonable to argue that since there is no such operation in logic, \sqrt{\texttt{NOT}} is impossible. Think again!

Figure 1.9 shows a simple computation, two identical computational steps performed on two states labelled as 0 and 1, i.e. on one bit. An interplay of constructive and destructive interference makes some transitions impossible and the result is the logical \texttt{NOT}. Thus, quantum theory declares, the square root of \texttt{NOT} is possible. And it does exist! Experimental physicists routinely construct this and many other “impossible” gates in their laboratories. They are the building blocks of a quantum computer. Quantum theory explains the behaviour of \sqrt{\texttt{NOT}}, hence, reassured by the physical experiments that corroborate this theory, logicians are now entitled to propose a new logical operation \sqrt{\texttt{NOT}}. Why? Because a faithful physical model for it exists in nature.

Write a 2\times 2 matrix which describes the \sqrt{\texttt{NOT}} operation. Is there just one such matrix? Suppose you are given a supply of Hadamard and phase gates with tuneable phase settings. How would you construct the \sqrt{\texttt{NOT}} gate?
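As a sanity check on your answer, here is one candidate matrix (there are others, which is part of the point of the exercise) squared to give \texttt{NOT}. The arithmetic is short enough to do by hand, but a few lines of Python confirm it:

```python
def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# One square root of NOT (not the only one): (1/2) [[1+i, 1-i], [1-i, 1+i]].
M = [[(1 + 1j) / 2, (1 - 1j) / 2],
     [(1 - 1j) / 2, (1 + 1j) / 2]]

print(matmul2(M, M))  # [[0, 1], [1, 0]], i.e. the NOT gate
```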

### 1.10.7 Quantum bomb tester

You have been drafted by the government to help in the demining effort in a former war-zone.22 In particular, retreating forces have left very sensitive bombs in some of the sealed rooms. The bombs are configured such that if even one photon of light is absorbed by the fuse (i.e. if someone looks into the room), the bomb will go off. Each room has an input and output port which can be hooked up to external devices. An empty room will let light go from the input to the output ports unaffected, whilst a room with a bomb will explode if light is shone into the input port and the bomb absorbs even just one photon — see Figure 1.10.

Your task is to find a way of determining whether a room has a bomb in it without blowing it up, so that specialised (limited and expensive) equipment can be devoted to defusing that particular room. You would like to know with certainty whether a particular room had a bomb in it.

1. To start with, consider the setup in Figure 1.11, where the input and output ports are hooked up in the lower arm of a Mach–Zehnder interferometer.23
    1. Assume an empty room. Send a photon to input port |0\rangle. Which detector, at the output port, will register the photon?
    2. Now assume that the room does contain a bomb. Again, send a photon to input port |0\rangle. Which detector will register the photon and with which probability?
    3. Design a scheme that allows you — at least some of the time — to decide whether a room has a bomb in it without blowing it up. If you iterate the procedure, what is its overall success rate for the detection of a bomb without blowing it up?
2. Assume that the two beam splitters in the interferometer are different. Say the first beam splitter reflects incoming light with probability r and transmits with probability t=1-r, and the second one transmits with probability r and reflects with probability t. Would the new setup improve the overall success rate of the detection of a bomb without blowing it up?
3. There exists a scheme, involving many beam splitters and something called the quantum Zeno effect, such that the success rate for detecting a bomb without blowing it up approaches 100%. Try to work it out, or find a solution on the internet.
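If you want to check your answers to part 1, a few lines of amplitude bookkeeping suffice. This sketch assumes one common balanced beam-splitter convention, B = \frac{1}{\sqrt 2}\begin{bmatrix}1 & i\\ i & 1\end{bmatrix}; other phase conventions merely permute which detector fires.

```python
import math

# Balanced beam-splitter convention assumed here:
# |0> -> (|0> + i|1>)/sqrt(2),  |1> -> (i|0> + |1>)/sqrt(2),
# with |1> labelling the arm that passes through the room.
s = 1 / math.sqrt(2)

def beam_splitter(a0, a1):
    return s * (a0 + 1j * a1), s * (1j * a0 + a1)

# Empty room: the photon simply traverses both beam splitters.
a0, a1 = beam_splitter(1, 0)
a0, a1 = beam_splitter(a0, a1)
print(abs(a0) ** 2, abs(a1) ** 2)     # 0.0 1.0 -- one detector never fires

# Room with a bomb: the lower arm amounts to a which-path measurement.
a0, a1 = beam_splitter(1, 0)
p_boom = abs(a1) ** 2                 # photon absorbed by the fuse
b0, b1 = beam_splitter(a0, 0)         # surviving amplitude continues alone
print(p_boom, abs(b0) ** 2, abs(b1) ** 2)
```

With this convention, a click in the detector that never fires for an empty room can only mean the room contains a bomb.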

### 1.10.8 More time, more memory

A quantum machine has N perfectly distinguishable configurations. What is the maximum number of computational paths connecting a specific input with a specific output after k steps of the machine?

Suppose you are using your laptop to add together amplitudes pertaining to each of the paths. As k and N increase you may need more time and more memory to complete the task. How do the execution time and the memory requirements grow with k and N? In particular, will you need more time, or more memory, or both?
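To make the trade-off concrete, here is a sketch with a made-up 3-configuration machine (the amplitude table U below is an arbitrary choice, not required to be unitary for the counting argument). Brute-force summation touches every one of the N^{k-1} paths, while applying the single-step matrix k times to a state vector needs only O(N) memory:

```python
import itertools

# A made-up single-step amplitude table U[b][a] for a 3-configuration machine.
U = [[0.5, 0.5j, -0.5],
     [0.5j, 0.5, 0.5],
     [-0.5, 0.5, 0.5j]]
N, k = 3, 4
inp, out = 0, 2

# Brute force: one product of k amplitudes per path, N^(k-1) paths in all.
amp_paths = 0
for mid in itertools.product(range(N), repeat=k - 1):
    states = [inp, *mid, out]
    term = 1
    for a, b in zip(states, states[1:]):
        term *= U[b][a]
    amp_paths += term

# Iterative: apply U to the state vector k times; memory stays O(N).
v = [1, 0, 0]
for _ in range(k):
    v = [sum(U[b][a] * v[a] for a in range(N)) for b in range(N)]

print(N ** (k - 1), "paths summed")
print(amp_paths, v[out])   # the two amplitudes agree
```

The iterative method trades the exponential-in-k path enumeration for k matrix applications, which is the heart of the answer to the exercise.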

### 1.10.9 Quantum Turing machines

The classical theory of computation is essentially the theory of the universal Turing machine — the most popular mathematical model of classical computation. Its significance rests on the fact that, given a large but finite amount of time, the universal Turing machine is capable of any computation that can be done by any modern classical digital computer, no matter how powerful. The concept of the Turing machine can be modified to incorporate quantum computation, but we will not follow this path. It is much easier to explain the essence of quantum computation in terms of quantum logic gates and quantum Boolean networks or circuits. The two approaches are computationally equivalent, even though certain theoretical concepts, e.g. in computational complexity, are easier to formulate precisely using the Turing machine model. The main advantage of quantum circuits is that they relate far more directly to proposed experimental realisations of quantum computation.

### 1.10.10 Polynomial = good; exponential = bad

In computational complexity the basic distinction is between polynomial and exponential algorithms. Polynomial growth is good and exponential growth is bad, especially if you have to pay for it. There is an old story about the legendary inventor of chess who asked the Persian king to be paid only in grains of rice, doubled on each of the 64 squares of a chess board. The king placed one grain of rice on the first square, two on the second, four on the third, and was supposed to keep on doubling until the board was full. The last square would then hold 2^{63}=9,223,372,036,854,775,808 grains of rice, more than has ever been harvested on planet Earth, to which we must add the grains on all the previous squares, making the total number about twice as large. If we placed that many grains in an unbroken line we would reach the nearest star, Alpha Centauri, our closest celestial neighbour beyond the solar system, about 4.4 light-years away.24 The moral of the story: if whatever you do requires an exponential use of resources, you are in trouble.
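The doubling is easy to verify with exact integer arithmetic:

```python
last_square = 2 ** 63                   # grains on the 64th square alone
total = sum(2 ** k for k in range(64))  # grains on all 64 squares together

print(last_square)           # 9223372036854775808
print(total == 2 ** 64 - 1)  # True: the doubling sums to one less than 2^64
```

So the whole board carries 2^{64}-1 grains, just under twice the count on the last square, exactly as the story says.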

### 1.10.11 Big O

In order to make qualitative distinctions between how different functions grow, we will often use the asymptotic big-O notation. For example, suppose an algorithm running on input of size n takes a n^2+bn+c elementary steps, for some positive constants a, b and c. These constants depend mainly on the details of the implementation and the choice of elementary steps. What we really care about is that, for large n, the whole expression is dominated by its quadratic term. We then say that the running time of this algorithm grows as n^2, and we write it as O(n^2), ignoring the less significant terms and the constant coefficients. More precisely, let f(n) and g(n) be functions from positive integers to positive reals. You may think of f(n) and g(n) as the running times of two algorithms on inputs of size n. We say that f=O(g),25 meaning that f grows no faster than g, if there is a constant c>0 such that f(n)\leqslant c g(n) for all sufficiently large values of n. Essentially, f=O(g) is a very loose analogue of f \leqslant g. In addition to the big-O notation, computer scientists often use \Omega for lower bounds: f=\Omega(g) means g=O(f). Again, this is a very loose analogue of f \geqslant g.

1. When we say that f(n)=O(\log n), why don’t we have to specify the base of the logarithm?

2. Let f(n)=5n^3+1000n+50. Is f(n)=O(n^3), or O(n^4), or both?

3. Which of the following statements are true?

1. n^k=O(2^n) for any constant k
2. n!=O(n^n)
3. if f_1=O(g) and f_2=O(g), then f_1+f_2=O(g).
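A numerical illustration for part 2 (the witness constants below are one valid choice among many, not the only ones):

```python
def f(n):
    return 5 * n**3 + 1000 * n + 50

# f = O(n^3): the witness c = 6 works for every n >= 32...
assert all(f(n) <= 6 * n**3 for n in range(32, 5000))
assert not f(31) <= 6 * 31**3   # ...and 32 is where c = 6 first kicks in.

# Alternatively c = 1055 works from n = 1, since 1000n <= 1000n^3
# and 50 <= 50n^3 for every n >= 1.
assert all(f(n) <= 1055 * n**3 for n in range(1, 5000))

# For part 3.1: the exponential eventually dwarfs any fixed power.
print(2 ** 100 > 100 ** 10)     # True, and the gap keeps growing
```

The definition only demands that *some* constant c works for all sufficiently large n, which is why f is simultaneously O(n^3) and O(n^4).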

### 1.10.12 Imperfect prime tester

There exists a randomised algorithm which tests whether a given number N is prime.26 The algorithm always returns \texttt{yes} when N is prime, and the probability it returns \texttt{yes} when N is not prime is \epsilon, where \epsilon is never greater than a half (independently, each time you run the algorithm). You run this algorithm (for the same N) r times and each time the algorithm returns \texttt{yes}. What is the probability that N is not prime?
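Since the runs are independent, r consecutive \texttt{yes} answers on a composite N occur with probability \epsilon^r \leqslant (1/2)^r. A quick Monte Carlo sketch (with a made-up tester that errs at the worst-case rate \epsilon = 1/2) matches this:

```python
import random

random.seed(1)
eps, r, trials = 0.5, 5, 100_000   # worst-case error rate, runs, experiments

def runs_all_yes():
    """Simulate r independent runs of the tester on a composite N."""
    return all(random.random() < eps for _ in range(r))

estimate = sum(runs_all_yes() for _ in range(trials)) / trials
print(estimate, eps ** r)   # the estimate is close to 0.5**5 = 0.03125
```

Note that turning this into "the probability that N is not prime" also requires a prior over the inputs, which is the subtlety the exercise is probing.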

### 1.10.13 Imperfect decision maker

Suppose a randomised algorithm solves a decision problem, returning \texttt{yes} or \texttt{no} answers. It gets the answer wrong with a probability not greater than \frac12-\delta, where \delta>0 is a constant.27 If we are willing to accept a probability of error no larger than \epsilon, then it suffices to run the computation r times, where r=O(\log 1/\epsilon).

1. If we perform this computation r times, how many possible sequences of outcomes are there?
2. Give a bound on the probability of any particular sequence with w wrong answers.
3. If we look at the set of r outcomes, we will determine the final outcome by performing a majority vote. This can only go wrong if w>r/2. Give an upper bound on the probability of any single sequence that would lead us to the wrong conclusion.
4. Using the bound 1-x\leqslant e^{-x}, conclude that the probability of our coming to the wrong conclusion is upper bounded by e^{-2r\delta^2}.
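The bound in part 4 can be checked against the exact binomial tail (a sketch assuming the worst case, i.e. each run wrong with probability exactly \frac12-\delta):

```python
import math

def p_majority_wrong(r, delta):
    """Exact probability that more than r/2 of r independent runs are wrong,
    with each run wrong with probability 1/2 - delta."""
    p = 0.5 - delta
    return sum(math.comb(r, w) * p**w * (1 - p) ** (r - w)
               for w in range(r // 2 + 1, r + 1))

delta = 0.1
for r in (11, 51, 101):
    print(r, p_majority_wrong(r, delta), math.exp(-2 * r * delta**2))
```

In every row the exact tail probability sits below e^{-2r\delta^2}, and both fall exponentially with r, which is why r=O(\log 1/\epsilon) repetitions suffice.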

1. Computation is a physical process. Computation is a physical process. Computation is …↩︎

2. We will, however, amend the two rules later on when we touch upon particle statistics.↩︎

3. I always found it an interesting coincidence that the two basic ingredients of modern quantum theory, namely probability and complex numbers, were discovered by the same person, an extraordinary man of many talents: a gambling scholar by the name of Girolamo Cardano (1501–1576).↩︎

4. According to the philosopher Karl Popper (1902–1994) a theory is genuinely scientific only if it is possible, in principle, to establish that it is false. Genuinely scientific theories are never finally confirmed because, no matter how many confirming observations have been made, observations that are inconsistent with the empirical predictions of the theory are always possible.↩︎

5. Dirac notation will likely be familiar to physicists, but may look odd to mathematicians or computer scientists. Love it or hate it (and I suggest the former), the notation is so common that you simply have no choice but to learn it, especially if you want to study anything related to quantum theory.↩︎

6. From the classical probability theory perspective the resonant interaction induces a random switch between |0\rangle and |1\rangle (why?) and the dispersive interaction has no effect on these two states (why?). Hence, one random switch followed by another random switch gives exactly a single random switch, which gives \frac12 for the probability that input |0\rangle becomes output |1\rangle.↩︎

7. Do not confuse the interference diagrams of Figure 1.1 and Figure 1.3 with the circuit diagram. In the circuit diagrams, which we will use a lot from now on, a single qubit is represented by a single line.↩︎

8. Global phase factors are irrelevant, it is the relative phase \varphi =\varphi_1-\varphi_0 that matters. In a single qubit phase gate we usually factor out e^{i\varphi_0}, which leaves us with the two diagonal entries: 1 and e^{i\varphi}.↩︎

9. HP_\varphi H = \begin{bmatrix} \cos\frac{\varphi}{2} & -i\sin\frac{\varphi}{2} \\ -i\sin\frac{\varphi}{2} & \cos\frac{\varphi}{2} \end{bmatrix}↩︎

10. So we read left to right, and omit the arrowheads.↩︎

11. The age of the Universe is currently estimated at 13.772 billion years.↩︎

12. Note that the technological progress alone, such as increasing the speed of classical computers, will never turn an inefficient algorithm (exponential scaling) into an efficient one (polynomial scaling). Why?↩︎

13. It must be stressed that not all quantum algorithms are so efficient, in fact many are no faster than their classical counterparts. Which particular problems will lend themselves to quantum speed-ups is an open question.↩︎

14. Max Born, “Zur Quantenmechanik der Stoßvorgänge”, Zeitschrift für Physik 37 (1926), 863–867.↩︎

15. 1+z+z^2+\ldots + z^n= \frac{1-z^{n+1}}{1-z}↩︎

16. This is a slightly modified version of a bomb testing problem described by Avshalom Elitzur and Lev Vaidman in Quantum mechanical interaction-free measurements, Found. Phys. 23 (1993), 987–997.↩︎

17. Read about Mach–Zehnder interferometers in Chapter 3.↩︎

18. One light year (the distance that light travels through a vacuum in one year) is 9.4607\times10^{15} metres.↩︎

19. f=O(g) is pronounced as “f is big-oh of g”.↩︎

20. Primality used to be given as the classic example of a problem in \texttt{BPP} but not \texttt{P}. However, in 2002 a deterministic polynomial time test for primality was proposed by Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. Thus, since 2002, primality has been in \texttt{P}.↩︎

21. This result is known as the Chernoff bound.↩︎