Partial trace, revisited
You can calculate the trace of a matrix by summing its diagonal entries.
Can you do something similar to calculate the partial trace of a density matrix?
Suppose someone writes down for you a density matrix of two qubits in the standard basis, \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}, and asks you to find the reduced density matrices of the individual qubits.
The tensor product structure of this (4\times 4) matrix means that it is has a block form:
\rho_{\mathcal{AB}}
= \left[
\begin{array}{c|c}
P & Q
\\\hline
R & S
\end{array}
\right]
where P,Q,R,S are (2\times 2) sized sub-matrices.
The two partial traces can then be evaluated as
\begin{aligned}
\rho_{\mathcal{A}}
&=
\operatorname{tr}_{\mathcal{B}}\rho_{\mathcal{AB}}
= \left[
\begin{array}{c|c}
\operatorname{tr}P & \operatorname{tr}Q
\\\hline
\operatorname{tr}R & \operatorname{tr}S
\end{array}
\right]
\\\rho_{\mathcal{B}}
&= \operatorname{tr}_{\mathcal{A}}\rho_{\mathcal{AB}}
= P+S.
\end{aligned}
In general, for any matrix \rho in \mathcal{H}_{\mathcal{A}}\otimes\mathcal{H}_{\mathcal{B}} that is written in the tensor product basis, the partial trace over \mathcal{A} is the sum of the diagonal block matrices, and the partial trace over \mathcal{B} is the matrix in which the block sub-matrices are replaced by their traces — see Figure 8.1.
To better understand the partial trace, it helps to give a more abstract definition.
It turns out that the partial trace over \mathcal{B} can be defined as the unique map \rho_{\mathcal{AB}}\mapsto\rho_{\mathcal{A}} such that
\operatorname{tr}[X\rho_{\mathcal{A}}] = \operatorname{tr}[(X\otimes\mathbf{1})\rho_{\mathcal{AB}}]
\tag{$\circledast$}
holds for any observable X acting on \mathcal{A}, where \mathbf{1} is the identity operator acting on \mathcal{B}.
This condition ensures the consistency of statistical predictions: any observable X on \mathcal{A} can be viewed as an observable X\otimes\mathbf{1} on the composite system \mathcal{AB}; when constructing \rho_{\mathcal{A}}, we had better make sure that for any observable X the average value of X in the state \rho_{\mathcal{A}} is the same as the average value of X\otimes\mathbf{1} in the state \rho_{\mathcal{AB}}.
This is exactly what the condition in (\circledast) guarantees.
To show that our more ad-hoc definition of the partial trace agrees with this slightly more abstract one, consider again some state |\psi_{\mathcal{AB}}\rangle written in the form
\begin{aligned}
|\psi_{\mathcal{AB}}\rangle
&= \sum_{i,j} c_{ij}|a_i\rangle\otimes|b_j\rangle
\\&= \sum_{j} |\widetilde\psi_j\rangle|b_j\rangle
= \sum_{j} \sqrt{p_j}|\psi_j\rangle|b_j\rangle.
\end{aligned}
Now assume that Alice measures some observable X on her part of the system.
Such an observable can be thought of as X\otimes\mathbf{1}, acting on the entire system.
The expected value of this observable in the state |\psi_{\mathcal{AB}}\rangle is, by definition, \operatorname{tr}(X\otimes\mathbf{1})|\psi_{\mathcal{AB}}\rangle\langle\psi_{\mathcal{AB}}|, and
\begin{aligned}
\operatorname{tr}[(X\otimes \mathbf{1}) \rho_{\mathcal{AB}}]
&= \operatorname{tr}\left[
(X\otimes\mathbf{1}) \left(
\sum_{i,j} |\widetilde\psi_i\rangle\langle\widetilde\psi_j| \otimes |b_i\rangle\langle b_j|
\right)
\right]
\\&= \sum_{i,j} \left[
\operatorname{tr}\left(X |\widetilde\psi_i\rangle\langle\widetilde\psi_j|\right)
\right]
\underbrace{\left[\operatorname{tr}\left(|b_i\rangle\langle b_j|\right)\right]}_{\delta_{ij}}
\\&= \sum_i \operatorname{tr}\big[X |\widetilde\psi_i\rangle\langle\widetilde\psi_i|\big]
\\&= \operatorname{tr}\left[
X \underbrace{\sum_i p_i|\psi_i\rangle\langle\psi_i|}_{\rho_{\mathcal{A}} = \operatorname{tr}_{\mathcal{B}}\rho_{\mathcal{AB}}}
\right]
\\&= \operatorname{tr}[X\rho_{\mathcal{A}}]
\end{aligned}
as required.
We can also quickly prove why the partial trace is the unique map satisfying the condition (\circledast).
Suppose that we had some arbitrary map T satisfying this condition, i.e. such that
\operatorname{tr}[XT(\rho_{\mathcal{AB}})] = \operatorname{tr}[(X\otimes\mathbf{1})\rho_{\mathcal{AB}}]
for all density matrices \rho_{\mathcal{AB}} and for all observables X acting on \mathcal{A}.
Now, take some orthonormal (with respect to the Hilbert–Schmidt inner product (A|B)=\frac{1}{2}\operatorname{tr}A^\dagger B) basis \{M_i\} of the space of Hermitian matrices.
Since the M_i are Hermitian, the inner product (M_i|T(\rho_{\mathcal{AB}})) is just \operatorname{tr}[M_iT(\rho_{\mathcal{AB}})].
So when we expand T(\rho_{\mathcal{AB}}) in this basis we get
\begin{aligned}
T(\rho_{\mathcal{AB}})
&= \sum_i (M_i|T(\rho_{\mathcal{AB}})) M_i
\\&= \sum_i \operatorname{tr}[M_iT(\rho_{\mathcal{AB}})] M_i.
\end{aligned}
But now we can substitute in the condition that T satisfies, giving
T(\rho_{\mathcal{AB}})
= \sum_i \operatorname{tr}[(M_i\otimes\mathbf{1})\rho_{\mathcal{AB}}] M_i.
And we’re done!
Indeed, if we had started with some other such map T' then we would have arrived at the same expression, which is independent of our choice of T or T', whence T=T'.