8.2 Statistical mixtures

Let us start with probability distributions¹⁵⁷ over state vectors. Suppose Alice prepares a quantum system and hands it over to Bob, who subsequently measures observable M. If Alice’s preparation is described by a state vector |\psi\rangle, then, quantum theory declares, the average value of any observable M is given by \langle\psi|M|\psi\rangle, which we have previously also written as¹⁵⁸ \langle M\rangle = \langle\psi|M|\psi\rangle = \operatorname{tr}M|\psi\rangle\langle\psi|.
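As a quick numerical sanity check of the identity \langle\psi|M|\psi\rangle = \operatorname{tr}M|\psi\rangle\langle\psi|, here is a minimal NumPy sketch. The particular qubit state and the choice of M as the Pauli Z matrix are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative (assumed) example: a normalised qubit state and the observable Z.
psi = np.array([np.sqrt(0.8), np.sqrt(0.2)], dtype=complex)
M = np.array([[1, 0], [0, -1]], dtype=complex)

# <psi|M|psi>: np.vdot conjugates its first argument, giving the bra.
bra_ket = np.vdot(psi, M @ psi).real

# tr(M |psi><psi|): the projector |psi><psi| is the outer product of psi with its conjugate.
proj = np.outer(psi, psi.conj())
trace_form = np.trace(M @ proj).real
```

Both expressions evaluate to 0.8 - 0.2 = 0.6 up to floating-point rounding, which is the point of the trace form: the preparation enters only through the projector `proj`.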

This way of expressing the average value makes a clear separation between the contributions from the state preparation and from the choice of the measurement. We have two operators inside the trace: |\psi\rangle\langle\psi| describes the state preparation, and M describes the measurement.

Now, suppose Alice prepares the quantum system in one of the (normalised, but not necessarily orthogonal) states |\psi_1\rangle,\ldots,|\psi_m\rangle, choosing state |\psi_i\rangle with probability p_i. She then hands the system to Bob without telling him which state she chose. We call this situation a (statistical) mixture of the states |\psi_i\rangle, or a mixed state for short.¹⁵⁹

It is important to note that a mixture of states is very different from a superposition of states: a superposition always yields a definite state vector, whereas a mixture does not, and so must be described by a density operator.

Let’s be extra clear about this distinction between superpositions and statistical mixtures. If Alice had prepared the system in the superposition \sum_i \sqrt{p_i}|\psi_i\rangle (up to normalisation), then both she and Bob would describe it by this single state vector. If she instead follows the above random procedure, then she knows which state |\psi_i\rangle she prepared, and so describes the system by that state vector, but the best “description”¹⁶⁰ available to Bob is \sum_i p_i|\psi_i\rangle\langle\psi_i|, as we will now justify.
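The difference between the two preparations shows up in the statistics of a suitable observable. The sketch below is an illustrative assumption: it takes the orthogonal states |0\rangle and |1\rangle with p_1=p_2=\frac{1}{2} (so the superposition with amplitudes \sqrt{p_i} is automatically normalised) and uses the Pauli X matrix as the observable M.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
p = [0.5, 0.5]

# Superposition: the sqrt(p_i) act as amplitudes, giving ONE state vector.
sup = np.sqrt(p[0]) * ket0 + np.sqrt(p[1]) * ket1
rho_sup = np.outer(sup, sup.conj())  # |sup><sup|

# Mixture: the p_i weight the projectors |psi_i><psi_i|, not the kets.
rho_mix = p[0] * np.outer(ket0, ket0.conj()) + p[1] * np.outer(ket1, ket1.conj())

X = np.array([[0, 1], [1, 0]], dtype=complex)
exp_sup = np.trace(X @ rho_sup).real  # approximately 1
exp_mix = np.trace(X @ rho_mix).real  # 0
```

The superposition carries off-diagonal terms that the mixture lacks, so Bob measuring X would see different averages in the two cases.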

What Bob does know is the ensemble of states |\psi_1\rangle,\ldots,|\psi_m\rangle as well as the corresponding probability distribution p_1,\ldots,p_m. Using this, he can calculate \langle M\rangle as follows: \begin{aligned} \langle M\rangle &= \sum_i p_i\left( \operatorname{tr}M|\psi_i\rangle\langle\psi_i| \right) \\&= \operatorname{tr}M \left( \sum_i p_i|\psi_i\rangle\langle\psi_i| \right) \\&=\operatorname{tr}M\rho \end{aligned} where the second equality uses the linearity of the trace, and where we have simply defined \rho=\sum_i p_i|\psi_i\rangle\langle\psi_i|. As before, we have two operators under the trace: \rho, which pertains to the state preparation, and M, which describes the measurement. We call \rho the associated density operator, since it has all the defining properties of a density operator (it is a convex sum of rank-one projectors). It depends on the constituent states |\psi_i\rangle and their probabilities, and it describes our ignorance about the state preparation. Conversely, given a density operator \rho, we call a set \{(p_i,|\psi_i\rangle\langle\psi_i|)\} a convex decomposition of \rho if it expresses \rho as a convex sum of rank-one projectors, i.e. if \rho=\sum_i p_i|\psi_i\rangle\langle\psi_i|.
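The derivation can be checked numerically: averaging \operatorname{tr}M|\psi_i\rangle\langle\psi_i| over the ensemble gives the same number as \operatorname{tr}M\rho, and \rho comes out Hermitian, positive semidefinite, and of unit trace. In this minimal sketch the helper `density_operator`, the (non-orthogonal) ensemble \{|0\rangle, |+\rangle\} with probabilities 0.3 and 0.7, and the random Hermitian M are all hypothetical choices made for illustration.

```python
import numpy as np

def density_operator(probs, kets):
    """Hypothetical helper: rho = sum_i p_i |psi_i><psi_i| for normalised kets."""
    return sum(p * np.outer(k, k.conj()) for p, k in zip(probs, kets))

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # not orthogonal to ket0
probs = [0.3, 0.7]
kets = [ket0, plus]
rho = density_operator(probs, kets)

# A random Hermitian observable M = (A + A^dagger) / 2.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
M = (A + A.conj().T) / 2

# First line of the derivation: the ensemble average of tr(M |psi_i><psi_i|).
avg_over_ensemble = sum(
    p * np.trace(M @ np.outer(k, k.conj())).real for p, k in zip(probs, kets)
)
# Last line of the derivation: tr(M rho).
avg_from_rho = np.trace(M @ rho).real

# Defining properties of a density operator.
is_hermitian = np.allclose(rho, rho.conj().T)
is_psd = bool(np.all(np.linalg.eigvalsh(rho) >= -1e-12))
has_unit_trace = np.isclose(np.trace(rho).real, 1.0)
```

The tolerance `-1e-12` in the positivity check only absorbs floating-point rounding in the eigenvalues.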

Once we have \rho we can make statistical predictions: we have just shown that, for any observable M, its expected value is given by \langle M\rangle = \operatorname{tr}M\rho. The exact composition of the mixture does not enter this formula: for computing the statistics associated with any observable property of a system, all that matters is the density operator itself, not its decomposition into a mixture of states. This is important because any given density operator, with the notable exception of a pure state, can arise from many different mixtures of pure states. Consider, for example, the following three scenarios:

Alice flips a fair coin. If the result is heads then she prepares the qubit in the state |0\rangle, and if the result is tails then she prepares the qubit in the state |1\rangle. She gives Bob the qubit without revealing the result of the coin-flip. Bob’s knowledge of the qubit is described by the density matrix \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \begin{bmatrix} \frac{1}{2} & 0 \\0 & \frac{1}{2} \end{bmatrix}.
Alice flips a fair coin. If the result is heads then she prepares the qubit in the state |+\rangle\coloneqq\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle), and if the result is tails then she prepares the qubit in the state |-\rangle\coloneqq\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle). Bob’s knowledge of the qubit is now described by the density matrix \begin{aligned} \frac{1}{2}|+\rangle\langle+| + \frac{1}{2}|-\rangle\langle-| &= \frac{1}{2} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\\frac{1}{2} & \frac{1}{2} \end{bmatrix} +\frac{1}{2} \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\-\frac{1}{2} & \frac{1}{2} \end{bmatrix} \\&= \begin{bmatrix} \frac{1}{2} & 0 \\0 & \frac{1}{2} \end{bmatrix}. \end{aligned}
Alice flips a fair coin, having already picked an arbitrary pair of orthonormal states |u_1\rangle and |u_2\rangle. If the result is heads then she prepares the qubit in the state |u_1\rangle, and if the result is tails then she prepares the qubit in the state |u_2\rangle. Since any two orthonormal states of a qubit form a basis, they satisfy the completeness relation |u_1\rangle\langle u_1|+|u_2\rangle\langle u_2|=\mathbf{1}, so the mixture \frac{1}{2}|u_1\rangle\langle u_1|+\frac{1}{2}|u_2\rangle\langle u_2| is again \frac{1}{2}\mathbf{1}.
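A short sketch can confirm that all three preparations give \frac{1}{2}\mathbf{1} and hence the same average for any observable. For the third scenario it assumes a random orthonormal pair obtained from the columns of the unitary factor of a QR decomposition; the helper `density_operator`, the seed, and the random Hermitian M are illustrative choices, not part of the text.

```python
import numpy as np

def density_operator(probs, kets):
    """Hypothetical helper: rho = sum_i p_i |psi_i><psi_i|."""
    return sum(p * np.outer(k, k.conj()) for p, k in zip(probs, kets))

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

# Scenario 3: the columns of a unitary Q form an arbitrary orthonormal pair.
rng = np.random.default_rng(1)
Z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Q, _ = np.linalg.qr(Z)
u1, u2 = Q[:, 0], Q[:, 1]

rhos = [
    density_operator([0.5, 0.5], [ket0, ket1]),   # scenario 1
    density_operator([0.5, 0.5], [plus, minus]),  # scenario 2
    density_operator([0.5, 0.5], [u1, u2]),       # scenario 3
]
all_equal_half_identity = all(np.allclose(r, np.eye(2) / 2) for r in rhos)

# Any observable gives the same average for all three preparations.
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
M = (A + A.conj().T) / 2
expectations = [np.trace(M @ r).real for r in rhos]
```

Rerunning with a different seed picks a different orthonormal pair for the third scenario, and the resulting density matrix is still \frac{1}{2}\mathbf{1}.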

As you can see, these three different preparations yield precisely the same density matrix and are thus statistically indistinguishable. In general, two different mixtures can be distinguished (in a statistical, experimental sense) if and only if they yield different density matrices. That said, finding the optimal way of distinguishing quantum states with different density operators is still an active area of research.
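One way to see why different density matrices are always distinguishable in principle: if \rho\neq\sigma, then M=\rho-\sigma is Hermitian, and \operatorname{tr}M\rho-\operatorname{tr}M\sigma=\operatorname{tr}(\rho-\sigma)^2>0, so this observable has different averages for the two preparations. The sketch below assumes, for illustration, \rho=\frac{1}{2}\mathbf{1} and the hypothetical mixture \sigma=\frac{1}{2}|0\rangle\langle 0|+\frac{1}{2}|+\rangle\langle+|. It only exhibits a difference in expectation values; it is not an optimal discrimination strategy.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

rho = np.eye(2, dtype=complex) / 2  # e.g. any of the three scenarios above
sigma = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(plus, plus.conj())

# Witness observable: the (Hermitian) difference of the two density operators.
M = rho - sigma
gap = np.trace(M @ rho).real - np.trace(M @ sigma).real

# For Hermitian rho - sigma, tr (rho - sigma)^2 is the sum of |entries|^2.
hs_norm_sq = np.sum(np.abs(rho - sigma) ** 2)
```

Here \sigma=\begin{bmatrix}\frac{3}{4}&\frac{1}{4}\\\frac{1}{4}&\frac{1}{4}\end{bmatrix}, so every entry of \rho-\sigma has magnitude \frac{1}{4} and the gap is 4\cdot\frac{1}{16}=\frac{1}{4}.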

For brevity, we often simply say “probability distribution” to mean “a finite set of non-negative real numbers p_k such that \sum_k p_k=1”.↩︎
If M is one of the orthogonal projectors P_k describing the measurement, then the average \langle P_k\rangle is the probability of the outcome k associated with this projector.↩︎
A pure state can be seen as a special case of a mixed state, where all but one of the probabilities p_i equal zero. So by talking about mixed states, we’re still able to talk about everything that we’ve already seen up to this point.↩︎
This description is not one that we have seen before — it’s not a linear combination of kets, but instead a linear combination of projectors!↩︎