Basic quantum coding and decoding
Suppose Alice uniformly at random chooses one of the pre-agreed N signal states |s_1\rangle,\ldots|s_N\rangle and sends it to Bob, who tries to identify the signal states by performing a measurement defined by the projectors P_1,\ldots,P_N.
Let P be a projector on the subspace spanned by the signal states |s_1\rangle,\ldots|s_N\rangle, i.e. P|s_k\rangle = |s_k\rangle for all k=1,\ldots,N.
The dimension d of this subspace is given by d=\operatorname{tr}P.
We shall assume, without any loss of generality, that Bob designed his measurement in such a way that, whenever he gets outcome P_k, he concludes that Alice sent state |s_k\rangle.
His probability of successfully identifying which state Alice sent to him is given by
\Pr(\text{success})
= \frac{1}{N} \sum_k \langle s_k|P_k|s_k\rangle
which is the probability that signal state |s_k\rangle is selected (here equal to 1/N, since Alice chose between all N signal states with equal probability) times the probability that the selected signal state is correctly identified by Bob (which is \langle s_k|P_k|s_k\rangle), and we sum over all possible signal states.
Let us use this as a chance to practice some of the trace identities.
In particular, it is often convenient to write expressions such as \langle\psi|A|\psi\rangle in terms of the trace: for any vector |\psi\rangle and operator A we have
\begin{aligned}
\langle\psi|A|\psi\rangle
&= \operatorname{tr}(A|\psi\rangle\langle\psi|)
\\&= \operatorname{tr}(|\psi\rangle\langle\psi| A).
\end{aligned}
In our case,
\begin{aligned}
\Pr(\text{success})
&= \frac{1}{N} \sum_k \langle s_k|P_k|s_k\rangle
\\&= \frac{1}{N} \sum_k \langle s_k|PP_kP|s_k\rangle
\\&= \frac{1}{N} \sum_k \operatorname{tr}(PP_kP|s_k\rangle\langle s_k|)
\end{aligned}
where we have also used that P|s_k\rangle=|s_k\rangle.
If B is a positive semi-definite operator, and P is a projector, then
\operatorname{tr}BP \leqslant\operatorname{tr}B.
To prove this, consider the projector Q=\mathbf{1}-P, and note that
\begin{aligned}
\operatorname{tr}B
&= \operatorname{tr}B(P+Q)
\\&= \operatorname{tr}BP + \operatorname{tr}BQ
\end{aligned}
and that \operatorname{tr}BQ is non-negative.
We can use this inequality to bound the expression above:
\begin{aligned}
\sum_k\frac{1}{N} \langle s_k|P_k|s_k\rangle
&= \frac{1}{N} \sum_k \operatorname{tr}(PP_kP|s_k\rangle\langle s_k|)
\\&\leqslant\frac{1}{N} \sum_k \operatorname{tr}(PP_kP)
\\&= \frac{1}{N} \operatorname{tr}\left(P\left(\sum_k P_k\right)P\right)
\\&= \frac{1}{N} \operatorname{tr}(P^3)
\\&= \frac{1}{N} \operatorname{tr}(P)
\\&= \frac{d}{N}.
\end{aligned}
So if Alice encodes N equally likely messages as states in a quantum system that, mathematically speaking, lives in the Hilbert space of dimension d, and if Bob decodes by performing a measurement and inferring the message from the result, then Bob’s probability of success is bounded above by \frac{d}{N}.
If the number N of possible signals exceeds the dimension d, then Bob will not be able to reliably distinguish between the signals by any measurement.
In particular:
With this setup, one qubit can store at most one bit of information that can reliably be read by a measurement.