6.2 Statistical mixtures

Let us start with probability distributions over state vectors. Suppose Alice prepares a quantum system and hands it over to Bob, who subsequently measures an observable M. If Alice's preparation is described by a state vector |\psi\rangle, then, quantum theory declares, the average value of any observable M is given by \langle\psi|M|\psi\rangle, which can also be written as \langle M\rangle = \operatorname{tr}M|\psi\rangle\langle\psi| (see footnote 1). This way of expressing the average value makes a clear separation between the contributions from the state preparation and from the choice of the measurement. We have two operators under the trace: one of them, |\psi\rangle\langle\psi|, describes the state preparation, and the other one, M, the measurement.

Now, suppose Alice prepares the quantum system in one of the states |\psi_1\rangle,\ldots,|\psi_m\rangle, choosing state |\psi_i\rangle with probability p_i, and hands the system to Bob without telling him which state was chosen. The possible states |\psi_i\rangle are normalised but need not be orthogonal. We call this situation a mixture of the states |\psi_i\rangle, or a mixed state for short.

Remember, a mixture of states is very different from a superposition of states: a superposition always yields a definite state vector, whereas a mixture does not, and so must be described by a density operator.
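To make the distinction concrete, here is a minimal sketch in plain Python (no external libraries) that compares an equal superposition of |0\rangle and |1\rangle with an equal mixture of the same two states. The observable, the Pauli X matrix, and the helper name `expectation` are illustrative choices, not taken from the text. The mixture average is computed as the probability-weighted average of the individual state averages.

```python
import math

def expectation(psi, M):
    """Return <psi|M|psi> for a state vector psi (list) and a Hermitian matrix M (nested list)."""
    n = len(psi)
    total = sum(
        complex(psi[i]).conjugate() * M[i][j] * psi[j]
        for i in range(n)
        for j in range(n)
    )
    return total.real  # M is Hermitian, so the imaginary part vanishes

# Pauli X, chosen here as an example observable (an assumption, not from the text).
X = [[0, 1],
     [1, 0]]

s = 1 / math.sqrt(2)
ket0 = [1, 0]
ket1 = [0, 1]
plus = [s, s]  # the superposition (|0> + |1>)/sqrt(2)

# Superposition: a single, definite state vector.
avg_superposition = expectation(plus, X)

# Mixture: |0> or |1>, each with probability 1/2, averaged classically.
avg_mixture = 0.5 * expectation(ket0, X) + 0.5 * expectation(ket1, X)

print("superposition:", avg_superposition)
print("mixture:      ", avg_mixture)
```

Up to floating-point rounding, the superposition gives an average of 1 while the mixture gives 0, so this one observable already tells the two preparations apart.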

Bob knows the ensemble of states |\psi_1\rangle,\ldots,|\psi_m\rangle and the corresponding probability distribution p_1,\ldots,p_m, and can hence calculate \langle M\rangle (see footnote 2) as \begin{aligned} \langle M\rangle &= \sum_i p_i\left( \operatorname{tr}M|\psi_i\rangle\langle\psi_i| \right) \\&= \operatorname{tr}M \underbrace{\left( \sum_i p_i|\psi_i\rangle\langle\psi_i| \right)}_{\rho} \\&=\operatorname{tr}M\rho. \end{aligned} Again, we have two operators under the trace: \rho=\sum_i p_i|\psi_i\rangle\langle\psi_i|, which pertains to the state preparation, and M, which describes the measurement. We shall call the operator \rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| the density operator, since it has all the defining properties of a density operator (it is a convex sum of rank-one projectors). It depends on the constituent states |\psi_i\rangle and their probabilities, and it describes our ignorance about the state preparation.
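The derivation above can be checked numerically. The following sketch, in plain Python, builds \rho for a small ensemble of non-orthogonal states and confirms that the ensemble average \sum_i p_i\langle\psi_i|M|\psi_i\rangle agrees with \operatorname{tr}M\rho. The ensemble (|0\rangle with probability 1/4 and (|0\rangle+|1\rangle)/\sqrt{2} with probability 3/4), the observable (Pauli Z), and all function names are hypothetical choices made only for illustration.

```python
import math

def outer(psi):
    """Return the projector |psi><psi| as a nested list."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

def expectation(psi, M):
    """Return <psi|M|psi> as a complex number."""
    n = len(psi)
    return sum(
        complex(psi[i]).conjugate() * M[i][j] * psi[j]
        for i in range(n)
        for j in range(n)
    )

def trace_product(A, B):
    """Return tr(AB) without forming the full matrix product."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

# Hypothetical ensemble of normalised, non-orthogonal states.
s = 1 / math.sqrt(2)
ensemble = [(0.25, [1, 0]), (0.75, [s, s])]

# Pauli Z, used here as an example observable (an assumption).
Z = [[1, 0],
     [0, -1]]

# rho = sum_i p_i |psi_i><psi_i|
rho = [[0, 0], [0, 0]]
for p, psi in ensemble:
    proj = outer(psi)
    rho = [[rho[i][j] + p * proj[i][j] for j in range(2)] for i in range(2)]

# Average computed state by state, then weighted by the probabilities.
avg_from_ensemble = sum(p * expectation(psi, Z) for p, psi in ensemble).real

# Average computed from the density operator alone.
avg_from_rho = trace_product(Z, rho).real

print(avg_from_ensemble, avg_from_rho)
```

Both numbers should agree (0.25 for this ensemble, up to rounding), and the diagonal of `rho` should sum to 1, as expected for a density operator.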

Once we have \rho we can make statistical predictions: for any observable M we have \langle M\rangle = \operatorname{tr}M\rho. We see that the exact composition of the mixture does not enter this formula: for computing the statistics associated with any observable property of a system, all that matters is the density operator itself, and not its decomposition into the mixture of states. This is important because any given density operator, with the remarkable exception of a pure state, can arise from many different mixtures of pure states. Consider, for example, the following three scenarios:

  1. Alice flips a fair coin. If the result is \texttt{Heads} then she prepares the qubit in the state |0\rangle, and if the result is \texttt{Tails} then she prepares the qubit in the state |1\rangle. She gives Bob the qubit without revealing the result of the coin-flip. Bob’s knowledge of the qubit is described by the density matrix \frac12|0\rangle\langle 0| + \frac12|1\rangle\langle 1| = \begin{bmatrix} \frac12 & 0 \\0 & \frac12 \end{bmatrix}.

  2. Suppose Alice flips a fair coin, as before, but now if the result is \texttt{Heads} then she prepares the qubit in the state |\bar{0}\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), and if the result is \texttt{Tails} then she prepares the qubit in the state |\bar{1}\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). Bob’s knowledge of the qubit is now described by the density matrix \begin{aligned} \frac12|\bar{0}\rangle\langle\bar{0}| + \frac12|\bar{1}\rangle\langle\bar{1}| &= \frac12 \begin{bmatrix} \frac12 & \frac12 \\\frac12 & \frac12 \end{bmatrix} + \frac12 \begin{bmatrix} \frac12 & -\frac12 \\-\frac12 & \frac12 \end{bmatrix} \\&= \begin{bmatrix} \frac12 & 0 \\0 & \frac12 \end{bmatrix}. \end{aligned}

  3. Suppose Alice picks any pair of orthonormal states of a qubit and then flips the coin to choose one of them. Any two orthonormal states of a qubit, |u_1\rangle, |u_2\rangle, form a complete basis, so |u_1\rangle\langle u_1|+|u_2\rangle\langle u_2| = \mathbf{1}, and the mixture \frac12|u_1\rangle\langle u_1|+\frac12|u_2\rangle\langle u_2| gives \frac12\mathbf{1}.

As you can see, these three different preparations yield precisely the same density matrix and are hence statistically indistinguishable. In general, two different mixtures can be distinguished (in a statistical sense) if and only if they yield different density matrices. In fact, the optimal way of distinguishing quantum states with different density operators is still an active area of research.
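The three scenarios can be reproduced with a short sketch in plain Python. The helper names `mixture` and `close`, and the particular angles used to generate the orthonormal pair in the third scenario, are arbitrary illustrative choices. Any other values of `theta` and `phi` should give the same result.

```python
import cmath
import math

def mixture(states, probs):
    """Return rho = sum_i p_i |psi_i><psi_i| as a nested list."""
    n = len(states[0])
    rho = [[0j] * n for _ in range(n)]
    for p, psi in zip(probs, states):
        for i in range(n):
            for j in range(n):
                rho[i][j] += p * psi[i] * complex(psi[j]).conjugate()
    return rho

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

s = 1 / math.sqrt(2)
half = [0.5, 0.5]

# Scenario 1: |0> or |1>.
rho1 = mixture([[1, 0], [0, 1]], half)

# Scenario 2: (|0> + |1>)/sqrt(2) or (|0> - |1>)/sqrt(2).
rho2 = mixture([[s, s], [s, -s]], half)

# Scenario 3: an orthonormal pair parametrised by two angles (arbitrary values).
theta, phi = 1.1, 0.4
u1 = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]
u2 = [-math.sin(theta / 2), cmath.exp(1j * phi) * math.cos(theta / 2)]
rho3 = mixture([u1, u2], half)

maximally_mixed = [[0.5, 0], [0, 0.5]]
print(close(rho1, maximally_mixed),
      close(rho2, maximally_mixed),
      close(rho3, maximally_mixed))
```

All three comparisons should come out `True`: Bob receives the same density matrix \frac12\mathbf{1} whichever of the three procedures Alice follows.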


  1. If M is one of the orthogonal projectors P_k describing the measurement, then the average \langle P_k\rangle is the probability of the outcome k associated with this projector.

  2. A pure state can be seen as a special case of a mixed state, where all but one of the probabilities p_i equal zero.