Partial trace, revisited
If you are given a matrix you calculate the trace by summing its diagonal entries.
How about the partial trace?
Suppose someone writes down for you a density matrix of two qubits in the standard basis, \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}, and asks you to find the reduced density matrices of the individual qubits.
The tensor product structure of this (4\times 4) matrix means that it has a block form:
\rho^{\mathcal{AB}}
=
\left[
\begin{array}{c|c}
P & Q
\\\hline
R & S
\end{array}
\right]
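where P,Q,R,S are (2\times 2) sub-matrices. With the basis ordered as above, the position of a block is labelled by the state of qubit \mathcal{A}, and the position of an entry within a block by the state of qubit \mathcal{B}.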
The two partial traces can then be evaluated as
\begin{aligned}
\rho^\mathcal{A}
&=
\operatorname{tr}_{B}\rho^{\mathcal{AB}}
=
\left[
\begin{array}{c|c}
\operatorname{tr}P & \operatorname{tr}Q
\\\hline
\operatorname{tr}R & \operatorname{tr}S
\end{array}
\right]
\\\rho^\mathcal{B}
&= \operatorname{tr}_{A}\rho^{\mathcal{AB}}
= P+S.
\end{aligned}
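As a quick check of these two formulas, consider the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle), chosen here purely as an illustration. Its density matrix, with the four blocks marked, is
\rho^{\mathcal{AB}}
=
\frac{1}{2}
\left[
\begin{array}{cc|cc}
1 & 0 & 0 & 1
\\
0 & 0 & 0 & 0
\\ \hline
0 & 0 & 0 & 0
\\
1 & 0 & 0 & 1
\end{array}
\right]
so that \operatorname{tr}P = \operatorname{tr}S = \tfrac{1}{2} and \operatorname{tr}Q = \operatorname{tr}R = 0. Taking the block traces, and separately adding the two diagonal blocks, gives
\begin{aligned}
\rho^\mathcal{A}
&=
\left[
\begin{array}{cc}
\operatorname{tr}P & \operatorname{tr}Q
\\
\operatorname{tr}R & \operatorname{tr}S
\end{array}
\right]
=
\frac{1}{2}
\left[
\begin{array}{cc}
1 & 0
\\
0 & 1
\end{array}
\right]
\\\rho^\mathcal{B}
&= P+S
=
\frac{1}{2}
\left[
\begin{array}{cc}
1 & 0
\\
0 & 1
\end{array}
\right].
\end{aligned}
Each qubit on its own is in the maximally mixed state, as expected for a maximally entangled pair.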
The same holds for a general \rho^{\mathcal{AB}} on any \mathcal{H}_{\mathcal{A}}\otimes\mathcal{H}_{\mathcal{B}} written in the corresponding block form, an (m\times m) array of (n\times n) sub-matrices, where m and n are the dimensions of \mathcal{H}_{\mathcal{A}} and \mathcal{H}_{\mathcal{B}}, respectively: \rho^\mathcal{A} is the (m\times m) matrix of block traces, and \rho^\mathcal{B} is the sum of the m diagonal blocks.
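The block recipe can be sketched numerically. The following is a minimal illustration using NumPy, not a definitive implementation: the function name `partial_traces` and its argument names are hypothetical, and it assumes the matrix is written in the product basis ordered with the \mathcal{A} index first, as in the text. Reshaping the (mn\times mn) matrix into a four-index array separates the block labels from the labels inside each block, after which the two partial traces are plain index contractions.

```python
import numpy as np


def partial_traces(rho, m, n):
    """Return (rho_A, rho_B) for a matrix rho on H_A (dim m) tensor H_B (dim n).

    Hypothetical helper for illustration; assumes the basis is ordered
    |a b> with the A label first, so rho is an m x m array of n x n blocks.
    """
    rho = np.asarray(rho)
    if rho.shape != (m * n, m * n):
        raise ValueError("rho must be an (m*n) x (m*n) matrix")

    # blocks[a, b, a2, b2]: (a, a2) selects a block, (b, b2) an entry inside it.
    blocks = rho.reshape(m, n, m, n)

    # rho_A: replace every block by its trace (sum over b == b2).
    rho_A = np.einsum("ibjb->ij", blocks)

    # rho_B: add up the diagonal blocks (sum over a == a2).
    rho_B = np.einsum("aiaj->ij", blocks)
    return rho_A, rho_B


# Example: the two-qubit Bell state (|00> + |11>)/sqrt(2).
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_AB = np.outer(psi, psi.conj())

rho_A, rho_B = partial_traces(rho_AB, 2, 2)
print(rho_A)  # reduced state of qubit A
print(rho_B)  # reduced state of qubit B
```

For the Bell state both reduced matrices come out as half the identity, up to floating-point rounding. For a product state built with `np.kron(A, B)`, where `A` and `B` have unit trace, the function returns `A` and `B` again, which is a convenient sanity check on the index ordering.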