0.11 Probabilities
In a sense, the basics of quantum theory boil down to the combination of two bits of mathematics: linear algebra over the complex numbers, and probability theory. We have just gone over all the linear algebra that we will need, so now let’s tackle the other topic (though we will immediately revisit it in Chapter 1).
Probability theory is a vast and beautiful subject which has undergone many transformations over the centuries. What started as something understood in terms of gambling odds later evolved into the theory of measure spaces, and is now even able to be expressed in terms of diagrammatic category theory. But for our purposes, we only need the very elementary parts of the subject, so we will stick with the first interpretation: probability tells us the odds of something happening.19
The setup is always the same: we have some process (rolling some dice, flipping a coin, drawing a card, etc.) that has some possible outcomes (getting a 5, heads, or an ace of hearts, etc.) but is realised in some way which means that we cannot be certain which outcome we will see whenever we run the process (or “perform the experiment”).
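The first thing to define in any such scenario is the sample space, usually denoted by $\Omega$, which is the set of all possible outcomes. An event is then a set of outcomes, i.e. a subset of $\Omega$, and we say that an event occurs if the outcome that we observe is one of its elements. The set of all events that we wish to consider is called the event space, written $\mathcal{F}$. Here is a table of some sample spaces.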
Process | Sample space |
---|---|
Rolling a six-sided die | $\{1,2,3,4,5,6\}$ |
Flipping a coin | $\{H,T\}$ |
Flipping two distinct coins | $\{(H,H),(H,T),(T,H),(T,T)\}$ |
Flipping two identical coins | $\{(H,H),(H,T),(T,T)\}$ |
And here’s a table of some, but not all, of the events corresponding to these sample spaces (for a single coin flip the list is complete, apart from the empty event $\varnothing$).
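Process | Example events | Interpretation |
---|---|---|
Rolling a six-sided die | $\{1\}$ | rolling a 1 |
Rolling a six-sided die | $\{1,3,5\}$ | rolling an odd number |
Rolling a six-sided die | $\{1,2\}$ | rolling a number less than 3 |
Rolling a six-sided die | $\{2,3\}$ | rolling a prime number less than 4 |
Flipping a coin | $\{H\}$ | getting heads |
Flipping a coin | $\{T\}$ | getting tails |
Flipping a coin | $\{H,T\}$ | any outcome at all |
Flipping two distinct coins | $\{(H,H)\}$ | getting two heads |
Flipping two distinct coins | $\{(H,H),(H,T),(T,H)\}$ | getting at least one heads |
Flipping two distinct coins | $\{(H,H),(T,T)\}$ | getting two the same |
Flipping two identical coins | $\{(H,H)\}$ | getting two heads |
Flipping two identical coins | $\{(H,H),(H,T)\}$ | getting at least one heads |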
In the table above we can see that, for example, if we rolled a 1 on a six-sided die then many events occurred: we rolled a 1, but we also rolled an odd number, and a number less than 3; but we did not roll a prime number.
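One way to make this concrete is to model the sample space and its events as Python sets, so that an event occurs exactly when the observed outcome is one of its elements. This is only an illustrative sketch: the names `omega`, `events`, and `occurred` are ours, not notation from the text.

```python
# Sample space for rolling a six-sided die.
omega = {1, 2, 3, 4, 5, 6}

# Some events, each a subset of the sample space (the labels are illustrative).
events = {
    "rolling a 1": {1},
    "rolling an odd number": {1, 3, 5},
    "rolling a number less than 3": {1, 2},
    "rolling a prime number less than 4": {2, 3},
}

def occurred(outcome, events):
    """Return the names of all events that contain the given outcome."""
    return [name for name, event in events.items() if outcome in event]

# Rolling a 1 triggers the first three events, but not the fourth.
print(occurred(1, events))
```

Calling `occurred(2, events)` instead would list the last two events, since 2 is both less than 3 and a prime number less than 4.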
Something else that arises in these examples is the notion of distinguishable outcomes, which we see when we look at how the sample space of flipping two coins depends on whether or not the coins are identical.
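That is, if we have a gold coin and a silver coin then it makes sense to say that $(H,T)$ (gold heads, silver tails) is a different outcome from $(T,H)$ (gold tails, silver heads).20 But if the two coins are identical, so that we cannot tell which is which, then all we can say is that we got one heads and one tails, and the two cases collapse into a single outcome.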
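The difference between the two sample spaces can be sketched with the standard library: ordered pairs for distinguishable coins, and unordered selections for identical ones. The variable names here are illustrative assumptions, and representing "one heads, one tails" by the tuple `("H", "T")` is just a convention of this sketch.

```python
from itertools import combinations_with_replacement, product

faces = ["H", "T"]

# Distinct coins: order matters, so (H, T) and (T, H) are different outcomes.
distinct = set(product(faces, repeat=2))

# Identical coins: order does not matter, so we only record which faces showed.
identical = set(combinations_with_replacement(faces, 2))

print(len(distinct))   # 4
print(len(identical))  # 3
```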
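This approach towards probability, where we think of a “scenario” as a triple $(\Omega, \mathcal{F}, P)$ consisting of a sample space $\Omega$, a set $\mathcal{F}$ of events, and a function $P$ assigning a probability to each event, is the standard one: such a triple is known as a probability space.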
Now we can define probability rather succinctly.
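In a fair process, where all outcomes are equally likely, the probability $P(A)$ of an event $A$ is the number of outcomes in $A$ divided by the total number of outcomes: $$P(A) = \frac{|A|}{|\Omega|}.$$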
Running through some of the above examples of events, we see that this definition of probability agrees with what we might already expect.
Event | Probability |
---|---|
Getting heads on a single coin flip | $1/2$ |
Rolling a 6 with a single die | $1/6$ |
Rolling an odd number with a single die | $3/6 = 1/2$ |
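These values can be checked with a short sketch that counts outcomes, assuming a fair process with a finite sample space. The function name `probability` is our own choice, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def probability(event, omega):
    """P(A) = |A| / |Omega| for a fair process with a finite sample space."""
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

print(probability({"H"}, coin))     # 1/2
print(probability({6}, die))        # 1/6
print(probability({1, 3, 5}, die))  # 1/2
```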
Flipping a fair coin (or actually, even an unfair one) is a common scenario in discussing probability, because it has just two outcomes: the smallest number you can have without things becoming purely deterministic. Certain numbers turn up time and time again when calculating probabilities for events with binary outcomes, and most often they are binomial coefficients. These are the numbers that can be read directly from the rows of Pascal’s triangle (which, as is often the case in mathematics, is more deserving of being named after a different person: Al-Karaji, or maybe Omar Khayyam), and they satisfy many interesting combinatorial patterns.
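As a small illustration of where binomial coefficients appear, the sketch below computes the probability of getting exactly $k$ heads in $n$ flips. It assumes a fair coin, for which this probability is $\binom{n}{k}/2^n$; the function name `prob_k_heads` is illustrative.

```python
from fractions import Fraction
from math import comb

def prob_k_heads(n, k):
    """Probability of exactly k heads in n flips of a fair coin."""
    return Fraction(comb(n, k), 2**n)

# Row n = 4 of Pascal's triangle.
row = [comb(4, k) for k in range(5)]
print(row)                 # [1, 4, 6, 4, 1]
print(prob_k_heads(4, 2))  # 3/8
```

Summing `prob_k_heads(n, k)` over all `k` from 0 to `n` gives 1, as it must, because the entries of row `n` of the triangle add up to `2**n`.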
Now let’s look at what happens when we’re interested in more than one event occurring.
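We might study the possibility of either event $A$ or event $B$ occurring, which is the event $A \cup B$, or the possibility of both $A$ and $B$ occurring, which is the event $A \cap B$.
First of all, let’s consider both events $A$ and $B$ occurring.
We say that $A$ and $B$ are
- mutually exclusive if $P(A \cap B) = 0$
- independent if $P(A \cap B) = P(A)P(B)$.
In words, two events are mutually exclusive if they can never both occur, and they are independent if the probability of both occurring is simply the product of their individual probabilities, which expresses the idea that neither event has any influence on the other.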
Usually, when we talk of mutually exclusive events we are referring to a single run of an experiment, and for independent events we are referring to multiple runs. For example, “rolling an even number” and “rolling an odd number” are mutually exclusive events when rolling a single die once, but independent events when rolling a single die twice.21 Basically, we should be careful when talking about events and make sure to be precise as to what our sample space is, and how the event is actually realised as a subset of this.
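The die example can be checked by enumeration, using the usual definitions: two events are mutually exclusive when $P(A \cap B) = 0$ and independent when $P(A \cap B) = P(A)P(B)$. This is a sketch assuming a fair die; to be precise about the two-roll case, we take the events to be "even on the first roll" and "odd on the second roll", and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

def probability(event, omega):
    """P(A) = |A| / |Omega| for a fair process."""
    return Fraction(len(event & omega), len(omega))

# One roll: the sample space is {1, ..., 6}.
one_roll = set(range(1, 7))
even = {2, 4, 6}
odd = {1, 3, 5}
# Mutually exclusive: the two events share no outcome.
print(probability(even & odd, one_roll))  # 0

# Two rolls: the sample space is all ordered pairs (first, second).
two_rolls = set(product(range(1, 7), repeat=2))
first_even = {(a, b) for (a, b) in two_rolls if a % 2 == 0}
second_odd = {(a, b) for (a, b) in two_rolls if b % 2 == 1}
# Independent: P(A and B) equals P(A) * P(B).
lhs = probability(first_even & second_odd, two_rolls)
rhs = probability(first_even, two_rolls) * probability(second_odd, two_rolls)
print(lhs, rhs)  # 1/4 1/4
```

Note how the same informal descriptions ("even", "odd") become different subsets once the sample space changes from single rolls to pairs of rolls.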
We can think of mutually exclusive and independent as extreme ends of a scale: on one side we have events that affect each other so strongly that if one occurs then we know with absolute certainty that the other one did not; on the other we have events that have absolutely no effect on each other whatsoever.
One might wonder about what the opposite of mutually exclusive might be, and there are two ideas that seem like they might be interesting: events
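Now let’s think about either event $A$ or event $B$ occurring, i.e. the event $A \cup B$.
The relationship between $P(A \cup B)$ and $P(A \cap B)$ is given by the following.
Inclusion–exclusion principle. For any two events $A$ and $B$, $$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$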
We will not prove this, but it’s a fun exercise to think about why this must be true.22
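Short of a proof, the two-event form of the principle, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, can at least be checked numerically. The sketch below does this for two events on a fair six-sided die; the choice of events and the helper `P` are illustrative assumptions.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}  # rolling an odd number
B = {1, 2}     # rolling a number less than 3

def P(event):
    """Probability of an event for a fair six-sided die."""
    return Fraction(len(event), len(omega))

lhs = P(A | B)                # probability of "A or B"
rhs = P(A) + P(B) - P(A & B)  # subtract the overlap, which was counted twice
print(lhs, rhs)               # 2/3 2/3
```

The subtraction is needed because the outcome 1 lies in both events, so adding $P(A)$ and $P(B)$ counts it twice.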
In fact, the general inclusion–exclusion principle describes what happens for an arbitrary finite number of events.
Using this, we can see why mutually exclusive events are particularly nice: if $A$ and $B$ are mutually exclusive then $P(A \cap B) = 0$, and so $P(A \cup B) = P(A) + P(B)$.
In the same way that mutually exclusive events are special in the eyes of the inclusion–exclusion principle, independent events are special in the eyes of conditional probability.
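Oftentimes we consider events that are not independent, such as drawing cards from a deck without replacing them afterwards: if I take the ace of hearts, then (for a standard 52-card deck) the probability of me drawing a heart the next time has gone down from $13/52 = 1/4$ to $12/51 = 4/17$. To describe this, we write $P(B \mid A)$ for the conditional probability of $B$ given $A$, meaning the probability that $B$ occurs given that $A$ has occurred. Provided that $P(A) \neq 0$, it is defined by $$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$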
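The card example can be reproduced by brute force, assuming a standard 52-card deck and taking the sample space to be all ordered pairs of distinct cards (first draw, second draw). The helper names `P` and `P_given` are our own, and `P_given` uses the usual formula $P(B \mid A) = P(A \cap B)/P(A)$.

```python
from fractions import Fraction
from itertools import permutations

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [(rank, suit) for suit in suits for rank in ranks]

# All ways of drawing two cards in order, without replacement.
omega = set(permutations(deck, 2))

def P(event):
    """Probability of an event when all ordered draws are equally likely."""
    return Fraction(len(event), len(omega))

def P_given(B, A):
    """Conditional probability P(B | A), assuming P(A) != 0."""
    return P(A & B) / P(A)

first_is_ace_of_hearts = {(c1, c2) for (c1, c2) in omega if c1 == ("A", "hearts")}
second_is_heart = {(c1, c2) for (c1, c2) in omega if c2[1] == "hearts"}

print(P(second_is_heart))                                # 1/4
print(P_given(second_is_heart, first_is_ace_of_hearts))  # 4/17
```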
Conditional probabilities are the source of many misunderstandings.
For example, it’s intuitively obvious that the probability of flipping a coin 100 times and getting heads every single time is very small.
So say we’ve flipped a coin 99 times and managed to get heads every single time: are we now more likely to flip tails, because the chance of getting 100 heads in a row must be small?
Well, we don’t even need mathematics to answer this: the coin has no way of remembering what has happened on the previous flips!
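In other words, although the probability of getting 100 heads in a row is tiny (it is $(1/2)^{100}$ for a fair coin), the conditional probability of getting heads on the 100th flip, given that the first 99 flips were all heads, is still $1/2$.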
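The same point can be checked by enumeration. The sketch below assumes a fair coin and uses 10 flips as a small stand-in for the 100 flips discussed here, since listing all $2^{100}$ sequences is not feasible; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

n = 10  # a small stand-in for the 100 flips in the text
omega = set(product("HT", repeat=n))

def P(event):
    """Probability of an event when all flip sequences are equally likely."""
    return Fraction(len(event), len(omega))

# Every flip is heads.
all_heads = {w for w in omega if all(f == "H" for f in w)}
# The first n - 1 flips are heads; the last flip is unconstrained.
first_heads = {w for w in omega if all(f == "H" for f in w[:-1])}

print(P(all_heads))                                 # 1/1024
print(P(all_heads & first_heads) / P(first_heads))  # 1/2
```

The first number shrinks rapidly as `n` grows, while the second stays at 1/2 for every `n`.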
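Now we can see how independent events are special: if $A$ and $B$ are independent (and $P(B) \neq 0$) then $$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$ That is, knowing that $B$ has occurred tells us nothing new about the probability of $A$.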
Finally, we mention what might be called the fundamental theorem of conditional probability.
Bayes’ theorem.
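Let $A$ and $B$ be events with $P(A) \neq 0$ and $P(B) \neq 0$. Then $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$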
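Bayes’ theorem, in its usual form $P(A \mid B) = P(B \mid A)P(A)/P(B)$, can be sanity-checked on a small example. The sketch below assumes a fair six-sided die; the events and the helpers `P` and `P_given` are chosen purely for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}  # rolling an odd number
B = {1, 2}     # rolling a number less than 3

def P(event):
    """Probability of an event for a fair six-sided die."""
    return Fraction(len(event), len(omega))

def P_given(X, Y):
    """Conditional probability P(X | Y), assuming P(Y) != 0."""
    return P(X & Y) / P(Y)

lhs = P_given(A, B)                # computed directly from the definition
rhs = P_given(B, A) * P(A) / P(B)  # computed via Bayes' theorem
print(lhs, rhs)                    # 1/2 1/2
```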
You should now be able to answer the following questions:
- When you roll a normal six-sided die, what is the set of distinguishable outcomes?
- What is the probability of getting a 5?
- What is the probability of getting a number (strictly) less than 3?
Now imagine that you have two six-sided dice.
- If you roll both dice at the same time, what is the probability of them both landing on a 6?
- What is the probability of getting two numbers that add up to 6?
Finally, we give our two six-sided dice to a friend for them to roll in secret.
- If they tell us that they rolled two numbers that added up to 6, what is the probability that they rolled a 1?
We mentioned in Section 0.6 that a lot of the structure inherent in our formalism of quantum theory can be encapsulated by the notion of a dagger compact category, and can thus be investigated with a diagrammatic approach. It turns out that parts of probability theory — specifically Markov processes, which describe scenarios where different events can happen with varying probabilities, but where nothing depends on the history of the scenario, only on the here-and-now — are also amenable to such an approach. This leads to the definition of a Markov category.
19. What we actually mean by “the odds of something happening” and how we should really interpret probabilities “in the real world” is a profound philosophical problem that we shall completely pass over.
20. Another possibility would be to distinguish the coins in time instead of space, i.e. to flip one coin first and then the other afterwards. A coin cannot remember what happened the last time it was flipped, so is there really a difference between flipping a single coin twice or two coins once? In the eyes of probability theory, the answer is “no”.
21. Exercise. Are the events “rolling an even number” and “rolling an odd number” still independent when we think of rolling two dice simultaneously?
22. Drawing a Venn diagram might help.