# Chapter 6 Density matrices

About density matrices, and how they help to solve the problem introduced by entangled states, as well as how they let us talk about mixtures and subsystems. Also a first look at the partial trace.

We cannot always assign a definite state vector to a quantum system.
It may be that the system is part of a composite system that is in an entangled state, or it may be that our knowledge of the preparation of a particular system is insufficient to determine its state (for example, someone may prepare a particle in one of the states

We have already mentioned that the existence of entangled states raises an obvious question: if we cannot attribute a state vector to an individual quantum system, then how shall we describe its quantum state?
In this chapter we will introduce an alternative description of quantum states that can be applied both to a composite system and to any of its subsystems.
Our new mathematical tool is called a **density operator**.^{83}
We will start with the density operator as a description of the mixture of quantum states, and will then discuss the partial trace, which is a unique operation that takes care of the reduction of a density operator of a composite system to density operators of its components.

## 6.1 Definitions

If you are an impatient mathematically minded person, who feels more comfortable when things are properly defined right from the beginning, here is your definition:^{84}

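A **density operator** $\rho$ on a finite-dimensional Hilbert space is any non-negative self-adjoint operator with trace equal to one.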

It follows that any such ^{85}

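An important example of a density operator is a rank one projector.^{86}
Any quantum state that can be described by a state vector $|\psi\rangle$, called a **pure state**, can also be described by the density operator $|\psi\rangle\langle\psi|$. All other states, called **mixed states**, can always be written as a convex sum of pure states: $\sum_i p_i|\psi_i\rangle\langle\psi_i|$, with $p_i\geqslant 0$ and $\sum_i p_i = 1$.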

## 6.2 Statistical mixtures

Let us start with probability distributions over state vectors.
Suppose Alice prepares a quantum system and hands it over to Bob who subsequently measures observable ^{87}
**mixture of the states** **mixed state** for short.

Remember, a mixture of states is very different from a superposition of states: a superposition *always* yields a definite state vector, whereas a mixture does *not*, and so must be described by a density operator.

Bob knows the ensemble of states ^{88}
**density operator**, since it has all the defining properties of the density operator (the convex sum of rank one projectors).
It depends on the constituent states

Once we have

1. Alice flips a fair coin. If the result is $\texttt{Heads}$ then she prepares the qubit in the state $|0\rangle$, and if the result is $\texttt{Tails}$ then she prepares the qubit in the state $|1\rangle$. She gives Bob the qubit without revealing the result of the coin-flip. Bob's knowledge of the qubit is described by the density matrix
   $$
   \frac12|0\rangle\langle 0| + \frac12|1\rangle\langle 1| = \begin{bmatrix} \frac12 & 0 \\ 0 & \frac12 \end{bmatrix}.
   $$

2. Suppose Alice flips a fair coin, as before, but now if the result is $\texttt{Heads}$ then she prepares the qubit in the state $|\bar{0}\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, and if the result is $\texttt{Tails}$ then she prepares the qubit in the state $|\bar{1}\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. Bob's knowledge of the qubit is now described by the density matrix
   $$
   \begin{aligned}
   \frac12|\bar{0}\rangle\langle\bar{0}| + \frac12|\bar{1}\rangle\langle\bar{1}|
   &= \frac12 \begin{bmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{bmatrix} + \frac12 \begin{bmatrix} \frac12 & -\frac12 \\ -\frac12 & \frac12 \end{bmatrix} \\
   &= \begin{bmatrix} \frac12 & 0 \\ 0 & \frac12 \end{bmatrix}.
   \end{aligned}
   $$

3. Suppose Alice picks any pair of orthogonal states of a qubit and then flips the coin to choose one of them. Any two orthonormal states of a qubit, $|u_1\rangle$ and $|u_2\rangle$, form a complete basis, so the mixture $\frac12|u_1\rangle\langle u_1|+\frac12|u_2\rangle\langle u_2|$ gives $\frac12\mathbf{1}$.

As you can see, these three different preparations yield precisely the same density matrix and are hence statistically indistinguishable. In general, two different mixtures can be distinguished (in a statistical sense) if and only if they yield different density matrices. In fact, the optimal way of distinguishing quantum states with different density operators is still an active area of research.
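The three preparations above can be checked numerically. The following is a minimal sketch in plain Python (no external libraries); the helper names `outer`, `mixture`, and `close`, and the particular angles used to build the "generic" orthonormal pair, are illustrative choices rather than anything taken from the text.

```python
import cmath
import math

def outer(v):
    """Return the projector |v><v| as a nested list for a state vector v."""
    return [[a * b.conjugate() for b in v] for a in v]

def mixture(weighted_states):
    """Return sum_i p_i |psi_i><psi_i| for a list of (p_i, psi_i) pairs."""
    dim = len(weighted_states[0][1])
    rho = [[0j] * dim for _ in range(dim)]
    for p, psi in weighted_states:
        proj = outer(psi)
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * proj[r][c]
    return rho

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices of the same shape."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)

# Preparation 1: |0> or |1> with probability 1/2 each.
rho_standard = mixture([(0.5, [1, 0]), (0.5, [0, 1])])

# Preparation 2: (|0> + |1>)/sqrt(2) or (|0> - |1>)/sqrt(2).
rho_hadamard = mixture([(0.5, [s, s]), (0.5, [s, -s])])

# Preparation 3: some other orthonormal pair (angles chosen arbitrarily).
theta, phi = 0.7, 1.3
u1 = [math.cos(theta), cmath.exp(1j * phi) * math.sin(theta)]
u2 = [-cmath.exp(-1j * phi) * math.sin(theta), math.cos(theta)]
rho_generic = mixture([(0.5, u1), (0.5, u2)])

half_identity = [[0.5, 0], [0, 0.5]]
print(close(rho_standard, half_identity))
print(close(rho_hadamard, half_identity))
print(close(rho_generic, half_identity))
```

All three comparisons should agree with $\frac12\mathbf{1}$ up to floating-point rounding, which is the numerical counterpart of the statement that the preparations are statistically indistinguishable.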

## 6.3 A few instructive examples, and some less instructive remarks

1. The density matrix corresponding to the state vector $|\psi\rangle$ is the rank one projector $|\psi\rangle\langle\psi|$. Observe that there is no phase ambiguity, since $|\psi\rangle\mapsto e^{i\phi}|\psi\rangle$ leaves the density matrix unchanged, and each state vector (up to this global phase) gives rise to a distinct density matrix.

2. If Alice prepares a qubit in the state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ then the corresponding density matrix is the projector
   $$
   |\psi\rangle\langle\psi| = \begin{bmatrix} |\alpha|^2 & \alpha\beta^\star \\ \alpha^\star\beta & |\beta|^2 \end{bmatrix}.
   $$

3. You are given a qubit and you are told that it was prepared either in state $|0\rangle$ with probability $|\alpha|^2$ or in state $|1\rangle$ with probability $|\beta|^2$. In this case all you can say is that your qubit is in a mixed state described by the density matrix
   $$
   |\alpha|^2|0\rangle\langle 0| + |\beta|^2|1\rangle\langle 1| = \begin{bmatrix} |\alpha|^2 & 0 \\ 0 & |\beta|^2 \end{bmatrix}.
   $$
   Diagonal density matrices correspond to classical probability distributions on the set of basis vectors.

4. Suppose you want to distinguish between preparations described by the density matrices in examples 2 and 3. Assume that you are given sufficiently many identically prepared qubits described either by the density matrix in example 2 or by the density matrix in example 3. Which of the two measurements would you choose: the measurement in the standard basis $\{|0\rangle,|1\rangle\}$, or the measurement in the basis $\{|\psi\rangle,|\psi_\perp\rangle\}$? One of the two measurements is completely useless. Which one, and why?

5. In general, the diagonal entries of a density matrix describe the probability distributions on the set of basis vectors. They must add up to one, which is why the trace of any density matrix is one. The off-diagonal elements, often called **coherences**, signal departure from the classical probability distribution and quantify the degree to which a quantum system can interfere (we will discuss this in detail later on). The process in which off-diagonal entries (the parameter $\epsilon$ in the matrices below) go to zero is called **decoherence**.
   $$
   \begin{bmatrix} |\alpha|^2 & \alpha\beta^\star \\ \alpha^\star\beta & |\beta|^2 \end{bmatrix}
   \mapsto
   \begin{bmatrix} |\alpha|^2 & \epsilon \\ \epsilon^\star & |\beta|^2 \end{bmatrix}
   \mapsto
   \begin{bmatrix} |\alpha|^2 & 0 \\ 0 & |\beta|^2 \end{bmatrix}
   $$
   For $\epsilon = \alpha\beta^\star$ we have a pure quantum state ("full interference capability") and for $\epsilon=0$ we have a classical probability distribution over the standard basis ("no interference capability").

6. Suppose it is equally likely that your qubit was prepared either in state $\alpha|0\rangle + \beta|1\rangle$ or in state $\alpha|0\rangle - \beta|1\rangle$. This means that your qubit is in a mixed state described by the density matrix
   $$
   \frac12 \begin{bmatrix} |\alpha|^2 & \alpha\beta^\star \\ \alpha^\star\beta & |\beta|^2 \end{bmatrix}
   + \frac12 \begin{bmatrix} |\alpha|^2 & -\alpha\beta^\star \\ -\alpha^\star\beta & |\beta|^2 \end{bmatrix}
   = \begin{bmatrix} |\alpha|^2 & 0 \\ 0 & |\beta|^2 \end{bmatrix}.
   $$
   You cannot tell the difference between the equally weighted mixture of $\alpha|0\rangle\pm\beta|1\rangle$ and a mixture of $|0\rangle$ and $|1\rangle$ with (respective) probabilities $|\alpha|^2$ and $|\beta|^2$.

7. For any density matrix $\rho$, the most natural mixture that yields $\rho$ is its spectral decomposition: $\rho=\sum_i p_i|u_i\rangle\langle u_i|$, with eigenvectors $|u_i\rangle$ and eigenvalues $p_i$.

8. If the states $|u_1\rangle,\ldots,|u_m\rangle$ form an orthonormal basis, and each occurs with equal probability $1/m$, then the resulting density matrix is proportional to the identity:
   $$
   \frac{1}{m}\sum_{i=1}^m |u_i\rangle\langle u_i| = \frac{1}{m}\mathbf{1}.
   $$
   This is called the **maximally mixed state**. For qubits, any pair of orthogonal states taken with equal probabilities gives the maximally mixed state $\frac12\mathbf{1}$. In maximally mixed states, outcomes of *any* measurement are completely random.

9. It is often convenient to write density operators in terms of projectors on states which are not normalised, incorporating the probabilities into the length of the state vector:
   $$
   \rho = \sum_i|\widetilde\psi_i\rangle\langle\widetilde\psi_i|
   $$
   where $|\widetilde\psi_i\rangle = \sqrt{p_i}|\psi_i\rangle$, i.e. $p_i=\langle\widetilde\psi_i|\widetilde\psi_i\rangle$. This form is more compact, but you have to remember that the state vectors are *not* normalised. We tend to mark such states with the tilde, e.g. $|\widetilde\psi\rangle$, but you may have your own way to remember.
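The unnormalised-vector form of a density operator, and the claim that an equally weighted mixture of $\alpha|0\rangle\pm\beta|1\rangle$ is the same as a classical mixture of $|0\rangle$ and $|1\rangle$, can both be illustrated with a short sketch. The function name `density_from_unnormalised` and the amplitudes `alpha = 0.6`, `beta = 0.8j` are assumptions picked for the illustration.

```python
import math

def density_from_unnormalised(vectors):
    """Return rho = sum_i |v_i><v_i| where each v_i = sqrt(p_i)|psi_i>."""
    dim = len(vectors[0])
    rho = [[0j] * dim for _ in range(dim)]
    for v in vectors:
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += v[r] * complex(v[c]).conjugate()
    return rho

def trace(rho):
    """Sum of the diagonal entries."""
    return sum(rho[i][i] for i in range(len(rho)))

# Illustrative amplitudes with |alpha|^2 + |beta|^2 = 1.
alpha, beta = 0.6, 0.8j
w = math.sqrt(0.5)  # square root of the probability 1/2

# Equal mixture of alpha|0> + beta|1> and alpha|0> - beta|1>,
# with the probability absorbed into the length of each vector.
rho_pm = density_from_unnormalised([
    [w * alpha, w * beta],
    [w * alpha, -w * beta],
])

# Mixture of |0> and |1> with probabilities |alpha|^2 and |beta|^2.
rho_classical = density_from_unnormalised([
    [abs(alpha), 0],
    [0, abs(beta)],
])

same = all(
    abs(rho_pm[r][c] - rho_classical[r][c]) < 1e-12
    for r in range(2) for c in range(2)
)
print(same)
print(abs(trace(rho_pm) - 1) < 1e-12)
```

Because the probabilities sit inside the vector lengths, the trace of the result is the sum of the squared lengths, which is a convenient sanity check that the ensemble was entered correctly.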

## 6.4 The Bloch ball

We have already talked in some depth about the Bloch sphere in Chapter 2 and Chapter 3, but now that we are considering density operators (which are strictly more general than state vectors), we are actually interested in the Bloch *ball*, i.e. not just the sphere of vectors of magnitude $1$, but also all the vectors of magnitude *less than or equal to* $1$ contained inside it.

The most general Hermitian ^{89}
**Bloch vector** for the density operator

Let us compute the eigenvalues of ^{90}
We can now visualise the convex set of
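As a numerical companion to this section, here is a minimal sketch that extracts a Bloch vector from a qubit density matrix. It assumes the standard parametrisation $\rho = \frac12(\mathbf{1} + \vec{s}\cdot\vec{\sigma})$ with the usual Pauli matrices, under which $s_k = \operatorname{tr}(\rho\,\sigma_k)$ and the eigenvalues of $\rho$ are $\frac12(1\pm|\vec{s}|)$; the function names are hypothetical.

```python
import math

def bloch_vector(rho):
    """Bloch vector (s_x, s_y, s_z) of a 2x2 density matrix rho,
    assuming rho = (1 + s . sigma) / 2."""
    off_diagonal = complex(rho[0][1])
    s_x = 2 * off_diagonal.real       # tr(rho sigma_x)
    s_y = -2 * off_diagonal.imag      # tr(rho sigma_y)
    s_z = complex(rho[0][0]).real - complex(rho[1][1]).real  # tr(rho sigma_z)
    return (s_x, s_y, s_z)

def eigenvalues_from_bloch(s):
    """Eigenvalues (1 + |s|)/2 and (1 - |s|)/2 of the density matrix."""
    length = math.sqrt(sum(component ** 2 for component in s))
    return ((1 + length) / 2, (1 - length) / 2)

# A pure state sits on the surface of the ball (|s| = 1) ...
rho_plus = [[0.5, 0.5], [0.5, 0.5]]
s_plus = bloch_vector(rho_plus)

# ... while the maximally mixed state sits at the centre (|s| = 0).
rho_mixed = [[0.5, 0], [0, 0.5]]
s_mixed = bloch_vector(rho_mixed)

print(s_plus, eigenvalues_from_bloch(s_plus))
print(s_mixed, eigenvalues_from_bloch(s_mixed))
```

Non-negativity of both eigenvalues is equivalent to $|\vec{s}|\leqslant 1$, which is why the admissible Bloch vectors fill a ball rather than just its surface.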

## 6.5 Subsystems of entangled systems

We have already trumpeted that one of the most important features of the density operator formalism is its ability to describe the quantum state of a subsystem of a composite system. Let me now show you how it works.

Given a quantum state of the composite system

Here is a simple example.
Suppose a composite system

## 6.6 Partial trace, revisited

If you are given a matrix you calculate the trace by summing its diagonal entries.
How about the partial trace?
Suppose someone writes down for you a density matrix of two qubits in the standard basis, ^{91}
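For a two-qubit density matrix written in the standard basis, the partial trace amounts to summing selected entries of the $4\times 4$ matrix. The sketch below assumes the basis ordering $|00\rangle,|01\rangle,|10\rangle,|11\rangle$ (so that the row index is $2a+b$ for the state $|a\rangle|b\rangle$); the function name `partial_trace` and its `keep` argument are illustrative.

```python
def partial_trace(rho, keep):
    """Reduce a two-qubit density matrix (4x4 nested list, basis order
    |00>, |01>, |10>, |11>) to a single-qubit density matrix.

    keep=0 traces out the second qubit; keep=1 traces out the first.
    """
    reduced = [[0j, 0j], [0j, 0j]]
    for i in range(2):
        for j in range(2):
            for k in range(2):  # k labels the qubit being traced out
                if keep == 0:
                    reduced[i][j] += rho[2 * i + k][2 * j + k]
                else:
                    reduced[i][j] += rho[2 * k + i][2 * k + j]
    return reduced

# Density matrix of the Bell state (|00> + |11>)/sqrt(2).
rho_bell = [
    [0.5, 0, 0, 0.5],
    [0,   0, 0, 0],
    [0,   0, 0, 0],
    [0.5, 0, 0, 0.5],
]

rho_first = partial_trace(rho_bell, keep=0)
rho_second = partial_trace(rho_bell, keep=1)
print(rho_first)
print(rho_second)
```

In block terms, tracing out the second qubit takes the trace of each $2\times 2$ block, whereas tracing out the first qubit adds the two diagonal blocks together. For the Bell state both reductions give the maximally mixed state.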

## 6.7 Mixtures and subsystems

We have used the density operators to describe two distinct situations: the statistical properties of the mixtures of states, and the statistical properties of subsystems of composite systems.
In order to see the relationship between the two, consider a joint state of a bipartite system

Then the partial trace over

Now, let us see how

But suppose Bob chooses to measure his subsystem in some other basis.
Will it have any impact on Alice’s statistical predictions?
Measurement in the new basis will result in a different mixture, but Alice’s density operator will not change.
Suppose Bob chooses basis

If Bob measures in the ^{92}

*It does not.*

After all, Alice and Bob may be miles away from each other, and if any of Bob's actions were to result in something physically detectable at Alice's location, that would amount to instantaneous communication between the two of them.
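The independence of Alice's density operator from Bob's choice of basis can be seen in a small numerical experiment. This is a sketch under stated assumptions: the joint state is taken to be $0.6|00\rangle + 0.8|11\rangle$ (an arbitrary entangled example, not one from the text), stored as a coefficient table `c[a][b]`, and the helper names are hypothetical. For each of Bob's outcomes $|b_k\rangle$, Alice is left with the unnormalised vector $(\mathbf{1}\otimes\langle b_k|)|\psi\rangle$, whose squared length is the probability of that outcome.

```python
import math

def conditional_vectors(c, bob_basis):
    """Alice's unnormalised vectors (1 (x) <b_k|)|psi>, one per basis vector
    b_k of Bob's measurement, for |psi> = sum_ab c[a][b] |a>|b>."""
    return [
        [sum(complex(bk[b]).conjugate() * c[a][b] for b in range(len(bk)))
         for a in range(len(c))]
        for bk in bob_basis
    ]

def density(vectors):
    """rho = sum_k |v_k><v_k| for a list of unnormalised vectors."""
    dim = len(vectors[0])
    rho = [[0j] * dim for _ in range(dim)]
    for v in vectors:
        for r in range(dim):
            for col in range(dim):
                rho[r][col] += v[r] * complex(v[col]).conjugate()
    return rho

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

c = [[0.6, 0.0], [0.0, 0.8]]          # assumed state 0.6|00> + 0.8|11>
s = 1 / math.sqrt(2)
standard_basis = [[1, 0], [0, 1]]
rotated_basis = [[s, s], [s, -s]]     # (|0> +/- |1>)/sqrt(2)

ensemble_standard = conditional_vectors(c, standard_basis)
ensemble_rotated = conditional_vectors(c, rotated_basis)

# The two ensembles consist of different vectors ...
print(ensemble_standard)
print(ensemble_rotated)
# ... but they average to the same density operator for Alice.
print(close(density(ensemble_standard), density(ensemble_rotated)))
```

The two measurements hand Alice different mixtures, yet the density matrices built from them coincide, so no statistics collected on Alice's side alone can reveal which basis Bob used.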

From the operational point of view it does not really matter whether the density operator represents our ignorance of the actual state (mixtures) or provides the only description we can have after discarding one part of an entangled state (partial trace).^{93}
In the former case, the system is in some definite pure state but we do not know which.
In contrast, when the density operator arises from tracing out irrelevant, or unavailable, degrees of freedom, the individual system cannot be thought to be in some definite state of which we are ignorant.
Philosophy aside, the fact that the two interpretations give exactly the same predictions is useful: switching back and forth between the two pictures often offers additional insights and may even simplify lengthy calculations.

## 6.8 Partial trace, yet again

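The partial trace is the only map $\rho^{\mathcal{AB}}\mapsto\rho^{\mathcal{A}}$ such that $\rho^{\mathcal{A}}$ satisfies $\operatorname{tr}[X\rho^{\mathcal{A}}] = \operatorname{tr}[(X\otimes 1)\rho^{\mathcal{AB}}]$ for any observable $X$ on $\mathcal{A}$.^{94}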
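It may help to see why the partial trace over $\mathcal{B}$ has the property $\operatorname{tr}[X\rho^{\mathcal{A}}] = \operatorname{tr}[(X\otimes 1)\rho^{\mathcal{AB}}]$ (the mirror image of the statement in footnote 94). The following short derivation is a sketch that assumes two things not spelled out here: that $\rho^{\mathcal{AB}}$ is expanded as a sum of tensor products, $\rho^{\mathcal{AB}} = \sum_\mu A_\mu\otimes B_\mu$ (always possible, though the $A_\mu$ and $B_\mu$ need not be density operators), and that the partial trace is defined by linearity from $\operatorname{tr}_{\mathcal{B}}(A\otimes B) = A\operatorname{tr}[B]$.

$$
\begin{aligned}
\operatorname{tr}\big[(X\otimes 1)\rho^{\mathcal{AB}}\big]
&= \sum_\mu \operatorname{tr}\big[(XA_\mu)\otimes B_\mu\big] \\
&= \sum_\mu \operatorname{tr}[XA_\mu]\,\operatorname{tr}[B_\mu] \\
&= \operatorname{tr}\Big[X\sum_\mu A_\mu\operatorname{tr}[B_\mu]\Big] \\
&= \operatorname{tr}\big[X\,\operatorname{tr}_{\mathcal{B}}\rho^{\mathcal{AB}}\big].
\end{aligned}
$$

This shows only that the partial trace satisfies the property; the converse, that no other map does, is the uniqueness statement.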

For example, let us go back to the state in Equation (9.7.1) and assume that Alice measures some observable

**!!!TODO!!! The uniqueness of the partial trace, for now see Nielsen & Chuang Box 2.6.**

## 6.9 Remarks and exercises

### 6.9.1

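Consider two qubits in the state
$$
|\psi\rangle = \frac{1}{\sqrt2}\left( |0\rangle\otimes\left( \sqrt{\frac23}|0\rangle - \sqrt{\frac13}|1\rangle \right) + |1\rangle\otimes\left( \sqrt{\frac23}|0\rangle + \sqrt{\frac13}|1\rangle \right) \right).
$$

- What is the density operator $\rho$ of the two qubits corresponding to the state $|\psi\rangle$? Write it in both Dirac notation and explicitly as a matrix in the computational basis $\{|00\rangle,|01\rangle,|10\rangle,|11\rangle\}$.
- Find the reduced density operators $\rho_1$ and $\rho_2$ of the first and second qubit (respectively). Again, write them in both Dirac notation and explicitly as a matrix in the computational basis.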

### 6.9.2 Purification of mixed states

Given a mixed state $\rho$, a **purification** of $\rho$ is a pure state $|\psi\rangle\langle\psi|$ of some potentially larger system such that $\rho$ is equal to a partial trace of $|\psi\rangle\langle\psi|$.

- Show that an arbitrary mixed state $\rho$ always has a purification.
- Show that purification is unique up to unitary equivalence.
- Let $|\psi_1\rangle$ and $|\psi_2\rangle$ in $\mathcal{H}_{\mathcal{A}}\otimes\mathcal{H}_{\mathcal{B}}$ be two pure states such that $\operatorname{tr}_{\mathcal{B}}|\psi_1\rangle\langle\psi_1| = \operatorname{tr}_{\mathcal{B}}|\psi_2\rangle\langle\psi_2|$. Show that $|\psi_1\rangle = \mathbf{1}\otimes U|\psi_2\rangle$ for some unitary operator $U$ on $\mathcal{H}_{\mathcal{B}}$.

### 6.9.3

Two qubits are in the state described by the density operator

### 6.9.4

Write the density matrix of two qubits corresponding to the mixture of the Bell state

### 6.9.5 The trace norm

The **trace norm** of a matrix

- Show that, if $A$ is self-adjoint, then its trace norm is equal to the sum of the absolute values of its eigenvalues.
- What is the trace norm of an arbitrary density matrix?

The distance induced by the trace norm is called the **trace distance**, defined as

- What is the trace distance between two arbitrary pure states?
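For experimenting with these questions on qubits, the following sketch computes the trace norm of a self-adjoint $2\times 2$ matrix as the sum of the absolute values of its eigenvalues (the statement of the first part of this exercise, used here without proof). The trace distance is then taken to be `prefactor` times the trace norm of the difference; the default `prefactor=0.5` is a common convention and is an assumption here, so adjust it if the definition you are working with differs. All function names are hypothetical.

```python
import math

def hermitian_eigenvalues_2x2(m):
    """Eigenvalues of a self-adjoint 2x2 matrix [[a, b], [conj(b), d]]."""
    a = complex(m[0][0]).real
    d = complex(m[1][1]).real
    b = complex(m[0][1])
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return (mean + radius, mean - radius)

def trace_norm_2x2(m):
    """Trace norm of a self-adjoint 2x2 matrix: sum of |eigenvalues|."""
    return sum(abs(x) for x in hermitian_eigenvalues_2x2(m))

def trace_distance(rho1, rho2, prefactor=0.5):
    """prefactor * || rho1 - rho2 ||_tr (the prefactor is an assumed convention)."""
    diff = [[rho1[r][c] - rho2[r][c] for c in range(2)] for r in range(2)]
    return prefactor * trace_norm_2x2(diff)

rho_zero = [[1, 0], [0, 0]]            # |0><0|
rho_one = [[0, 0], [0, 1]]             # |1><1|
rho_plus = [[0.5, 0.5], [0.5, 0.5]]    # (|0> + |1>)(<0| + <1|)/2

print(trace_distance(rho_zero, rho_one))
print(trace_distance(rho_zero, rho_plus))
print(trace_distance(rho_plus, rho_plus))
```

Trying a few pairs of pure states and comparing the output with their overlaps is a reasonable way to guess the general formula asked for in the last question before proving it.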

### 6.9.6 Distinguishability and the trace distance

Say we have a physical system which has been prepared in one of two states (say,

- Suppose that $\rho_1$ and $\rho_2$ commute.^{95} Using the spectral decompositions of $\rho_1$ and $\rho_2$ in their common eigenbasis, describe the optimal measurement that can distinguish between the two states. What is its probability of success?
- Suppose that you are given one of the two, randomly selected, qubits of the state
  $$
  |\psi\rangle = \frac{1}{\sqrt2}\left( |0\rangle\otimes\left( \sqrt{\frac23}|0\rangle - \sqrt{\frac13}|1\rangle \right) + |1\rangle\otimes\left( \sqrt{\frac23}|0\rangle + \sqrt{\frac13}|1\rangle \right) \right)
  $$
  from above. What is the maximal probability with which you can determine whether it is the first or second qubit?

### 6.9.7 Spectral decompositions and common eigenbases

**!!!TODO!!!**

^{83} If we choose a particular basis, operators become matrices. Here I will use both terms (density *operators* and density *matrices*) interchangeably.

^{84} A self-adjoint matrix $M$ is said to be **non-negative**, or **positive semi-definite**, if $\langle v|M|v\rangle\geqslant 0$ for any vector $|v\rangle$, or if all of its eigenvalues are non-negative, or if there exists a matrix $A$ such that $M=A^\dagger A$. (This is called a **Cholesky factorization**.)

^{85} A subset of a vector space is said to be **convex** if, for any two points in the subset, the straight line segment joining them is also entirely contained inside the subset.

^{86} The **rank** of a matrix is the number of its non-zero eigenvalues.

^{87} If $M$ is one of the orthogonal projectors $P_k$ describing the measurement, then the average $\langle P_k\rangle$ is the probability of the outcome $k$ associated with this projector.

^{88} A pure state can be seen as a special case of a mixed state, where all but one of the probabilities $p_i$ equal zero.

^{89} Physicists usually still refer to the Bloch *ball* as the Bloch *sphere*, even though it really is a ball now, not a sphere.

^{90} One might hope that there is an equally nice visualisation of the density operators in higher dimensions. Unfortunately there isn't.

^{91} Take any of the Bell states, write its $(4\times 4)$ density matrix explicitly, and then trace over each qubit. In each case you should get the maximally mixed state.

^{92} The $U_{ij}$ are the components of a unitary matrix, hence $\sum_k U_{ik}U^\star_{jk}=\delta_{ij}$.

^{93} The two interpretations of density operators have filled volumes of academic papers. The terms **proper mixtures** and **improper mixtures** are used, mostly by philosophers, to describe the statistical mixture and the partial trace approach, respectively.

^{94} One can repeat the same argument for $\rho^{\mathcal{AB}}\mapsto\rho^{\mathcal{B}}$: the partial trace is the unique map $\rho^{\mathcal{AB}}\mapsto\rho^{\mathcal{B}}$ such that $\rho^{\mathcal{B}}$ satisfies $\operatorname{tr}[Y\rho^{\mathcal{B}}] = \operatorname{tr}[(1\otimes Y)\rho^{\mathcal{AB}}]$ for any observable $Y$ on $\mathcal{B}$.

^{95} The commutativity assumption makes this problem essentially a special case of a purely classical one: distinguishing between two probability distributions.