Bayesian statistics for dummies
'Bayesian statistics' is a big deal at the moment. It has been put forward
as a solution to a number of important problems in, among other
disciplines, law and medicine. These problems are concerned with such matters
as determining the likelihood that a particular suspect committed a
murder if his fingerprints are found on the weapon, or the
likelihood that a person who tests positive for
HIV really has an HIV infection. These are, clearly, important matters.
However, most people can't see that there are any difficulties
here at all.
If the likelihood of two people having the same fingerprints is, say,
one in 500,000, and fingerprints matching person X are found on the
murder weapon, then surely it is a near-certainty that X is the
murderer? And if I tell you that 99.5% of people with a confirmed HIV
infection test positive for HIV, then surely a person who tests
positive is 99.5% likely to have the virus? In fact, both of
these conclusions are wrong. Dead wrong, if you'll pardon the
expression. The importance of Bayes' theorem is that it will
tell you the true likelihood of a person having an HIV infection if
he tests positive, and the true likelihood of person X being the
murderer if his fingerprints turn up on the weapon. Or, at least,
it will do so if you can feed in the right data, which isn't always possible.
In a celebrated court case (R v Adams  1 Cr App R 377,
for any lawyers that
are interested) Lord Bingham, one of the UK's most senior judges,
refused to allow the defence to present an argument to the jury
based on Bayes' theorem.
He conceded that it was a methodologically
sound approach, but held that it would 'confuse the jury'. In fact,
Bayes' theorem is extremely straightforward, and need not confuse
anyone who can add, multiply, and divide. The result of this
judge's decision could well have been that an innocent person
was convicted, because a likelihood based on a Bayesian calculation
is not merely better than the intuitive result; it
is the only right answer. Any different answer is simply wrong.
Not wrong in the sense of 'stealing is wrong', but wrong in the
sense that '2+2=5' is wrong.
In this article I will explain how Bayes' theorem works, from
first principles. I assume of the reader no knowledge of
mathematics beyond elementary arithmetic. At the end of the
article I will attempt to show how Bayes' theorem can be derived from
the common-sense example I present, and to follow this does require
a knowledge of basic algebra. But you don't need to be able to follow
the derivation to appreciate how Bayes' method works.
I will start by considering, as the Reverend Mr Bayes himself
did, the best way to bet on a horse race.
How to bet on the gee-gees
Let's assume that there is a race between two horses: Fleetfoot
and Dogmeat, and I want to determine which horse to bet my shirt on.
It turns out that Fleetfoot and Dogmeat have raced against each other
on twelve previous occasions, all two-horse races.
Of these last twelve races, Dogmeat won five and Fleetfoot won the other seven.
Therefore, all other things being equal (which they're not — see below),
the probability of Dogmeat winning the next race can be estimated from
his previous wins: 5 / 12, or 0.417, or 41.7%, however you prefer to express
it. Fleetfoot, on the other hand, appears to be a better bet at 58.3%.
So, given only the information that we have, and everything else
being equal (including the odds offered by the bookmakers on these two
horses) you'd want to back Fleetfoot. A 58.3% likelihood of winning is
more favourable than a 41.7% likelihood.
Now let's add a new factor into the calculation. It turns out that on
three of Dogmeat's previous five wins, it had rained heavily before
the race. However, it had rained only once on
any of the days that he lost. It appears, therefore, that
Dogmeat is a horse who
likes 'soft going', as the bookies say. On the day of the race in question,
it is raining. Now, how does this extra information
affect where you should put your money? If you ignore the information
about the weather,
and use only the counts of previous wins and losses, you should, of course,
back Fleetfoot as we previously determined.
On the other hand, if you use only the new information about the
weather, and neglect
the previous counts of wins and losses, you
would perhaps back Dogmeat. This is because three of his previous five
wins have been on rainy days. You might estimate his probability of winning
as 3 / 5, or 60%, on this basis.
However, to do this would be to ignore a
crucial piece of information — that overall Dogmeat has won fewer races
than Fleetfoot. What we need to do is to combine the two pieces of information
to get some kind of overall probability of Dogmeat winning the race.
To do this, we need to examine four possible situations:
Dogmeat wins when it rains; Dogmeat wins when it doesn't rain;
Dogmeat loses when it rains; Dogmeat loses when it doesn't rain.
We know that Dogmeat achieved three of his five wins on rainy
days, and it only rained once when he lost. This means he must
have won twice on a sunny day. Since there were twelve races,
the number of times he lost on a sunny day must be
12 - (3 + 1 + 2), which is 6. We can tabulate these results:
                 Raining   Not raining   Total
Dogmeat wins        3           2          5
Dogmeat loses       1           6          7
Total               4           8         12
What we need to know is the probability of Dogmeat winning, given that it
is raining (because it is now raining). Of course, we could work out the
probability of Fleetfoot winning instead, but since (we assume) one
horse must win, we can always work out one horse's probability by
subtracting the other's from 1.0. Note that the probability of Dogmeat
winning given that it is raining is not at all the same as
the probability that it was raining given that Dogmeat won. We have already
calculated the latter as 60% (three rainy days out of five wins).
So what is the probability of Dogmeat winning, given that it is raining?
Like any other probability, we calculate it by dividing the number
of times something
happened, by the number of times it could have happened. We know
that Dogmeat won on three occasions on which it rained, and there
were four rainy days in total. So Dogmeat's probability of
winning, given that it is now raining,
is 3 / 4, or 0.75, or 75%, however you like to write it.
Note that the additional information, that Dogmeat won three times
out of four on rainy days, shifts his probability of winning
this current race from 41.7% to 75%. More importantly, it means
that now Dogmeat is more likely to win than Fleetfoot, even
though Fleetfoot has won more races overall. This is important information
if you plan to bet your shirt — if it is raining you should back Dogmeat;
if it is not, you should back Fleetfoot.
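The counting argument above can be sketched in a few lines of Python. This is only an illustration: the dictionary `counts` and the helper `p_win_given` are names invented here, and the four counts are the ones worked out above from Dogmeat's twelve races.

```python
from fractions import Fraction

# Dogmeat's twelve races, keyed by (result, weather), as counted in the text.
# The key labels ("win", "lose", "rain", "dry") are illustrative names only.
counts = {
    ("win", "rain"): 3,
    ("win", "dry"): 2,
    ("lose", "rain"): 1,
    ("lose", "dry"): 6,
}

def p_win_given(weather):
    """Probability that Dogmeat wins, counting only races run in `weather`."""
    wins = counts[("win", weather)]
    races = wins + counts[("lose", weather)]
    return Fraction(wins, races)

total_races = sum(counts.values())
total_wins = counts[("win", "rain")] + counts[("win", "dry")]

p_win = Fraction(total_wins, total_races)                         # 5/12, weather ignored
p_rain_given_win = Fraction(counts[("win", "rain")], total_wins)  # 3/5, not what we want
p_win_rain = p_win_given("rain")                                  # 3/4, what we want
p_win_dry = p_win_given("dry")                                    # 2/8 = 1/4

print(float(p_win), float(p_rain_given_win), float(p_win_rain), float(p_win_dry))
```

On a dry day the same count gives Dogmeat only two wins in eight races, which is why Fleetfoot is the better bet when it is not raining.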
Bayes' theorem is nothing more than a generalization into algebra of
the procedure I described above — it is a way to work out the
likelihood of something in the face of some particular piece, or pieces,
of evidence. The theorem is usually written like this:
p(A|B) = p(B|A) p(A) / p(B)
p(A|B) (usually read 'probability of A given B') is the probability of
outcome A, given that some piece of evidence B is present.
In the example above, 'A' is the outcome in which Dogmeat wins, and 'B'
is the piece of evidence that it is raining. p(A|B) is what we want
to find out. p(B|A) is the probability of the evidence turning up, given
that the outcome obtains. In other words, this is the probability that
it is raining, given that Dogmeat wins. Since it was raining three of the five
times that Dogmeat won, p(B|A) is 3 / 5, or 0.6. p(A) is the probability
of the outcome occurring, without knowledge of the new evidence. In our
example, p(A) is 5 / 12, or 0.417, because Dogmeat won on five out of twelve
occasions. Finally, p(B) is the probability of the evidence arising, without
regard for the outcome. In our example, it is the probability of rain,
without regard for which horse won the race. Since we know it rained on
four days, this value is 4 / 12, or 0.333. So the value of p(A|B),
the probability of Dogmeat winning given that it is raining, is
p(A|B) = p(B|A) p(A) / p(B)
= 0.6 * 0.417 / 0.333
~= 0.75
You'll see that this is exactly the same answer we arrived at by
tabulating the counts of wins and losses on rainy and fine days
earlier. In short, Bayes' theorem is a way to perform the same calculation,
but starting with probabilities, and not counts. This can be important
because very often we don't know the exact counts involved, only
the probabilities. But Bayes' method doesn't tell us anything we can't
work out simply by counting and dividing.
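As a rough check, the same calculation can be written as a small Python function. The function name `bayes` and the variable names are my own choices for this sketch; the three input probabilities are the ones derived above for Dogmeat and the rain.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return p(A|B) computed from p(B|A), p(A) and p(B)."""
    return p_b_given_a * p_a / p_b

# Exact fractions taken from the race counts in the text.
p_rain_given_win = Fraction(3, 5)   # p(B|A): rain on three of five wins
p_win = Fraction(5, 12)             # p(A): five wins in twelve races
p_rain = Fraction(4, 12)            # p(B): four rainy days in twelve

p_win_given_rain = bayes(p_rain_given_win, p_win, p_rain)
print(p_win_given_rain)             # 3/4

# The rounded decimals used in the text give almost the same value;
# the small difference is rounding error only.
approx = bayes(0.6, 0.417, 0.333)
print(round(approx, 3))
```

Using exact fractions shows that the result is exactly 3 / 4, the figure obtained earlier by counting.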
Here is another example of Bayes' theorem at work. Consider a
system that tests whether computer components are working or faulty,
based on the temperature they run at when they are working. Suppose,
for example, that 99.5% of components that are faulty run at
over 50 degrees. Now, if we take a particular component and test
it, and the temperature is 53 degrees, how likely is it to be a
faulty component? If you can answer that question with a number — particularly if that number is '99.5%' —
you need to
read the first part of this article again, because your answer
will be wrong. The correct answer is 'I don't know'. I simply
haven't given you all the information you need to answer the question.
What you need to know is the probability of the component's being
faulty, given that it runs too hot. We can write this as 'p(faulty|hot)'.
But what I've told you is the probability of a faulty component running
hot, which is p(hot|faulty). These two figures are related by Bayes' theorem:
p(faulty|hot) = p(hot|faulty) * p(faulty) / p(hot)
p(faulty) is the overall probability of a component's turning out to
be faulty, whether or not it runs too hot. Suppose, for example,
that 1 in 100 components is faulty overall. So p(faulty) is 0.01.
p(hot) is the probability of a component's running hot, whether or not
it is faulty. Suppose, for example, that 1 in 20 components runs too
hot, whether or not it is faulty. So p(hot) is 0.05. This gives:
p(faulty|hot) = p(hot|faulty) * p(faulty) / p(hot)
= 0.995 * 0.01 / 0.05
~= 0.2 or 20%
So in fact, in these circumstances, if a component tests as faulty on
the basis of its temperature,
the probability that it is really faulty is only 20%, not 99.5% at all.
This could have profound consequences if you're using this test to
determine which components to sell and which to put out with the rubbish.
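The component example can be checked with a short Python sketch. The function and variable names are illustrative; the three figures are the ones supposed in the text (99.5% of faulty components run hot, 1 in 100 components is faulty, 1 in 20 runs hot).

```python
def posterior(p_evidence_given_h, p_h, p_evidence):
    """Bayes' theorem: p(H|E) = p(E|H) * p(H) / p(E)."""
    return p_evidence_given_h * p_h / p_evidence

p_hot_given_faulty = 0.995   # given in the text
p_faulty = 0.01              # supposed in the text: 1 in 100 is faulty
p_hot = 0.05                 # supposed in the text: 1 in 20 runs hot

p_faulty_given_hot = posterior(p_hot_given_faulty, p_faulty, p_hot)
p_fine_given_hot = 1 - p_faulty_given_hot

print(f"p(faulty|hot) = {p_faulty_given_hot:.3f}")
print(f"p(fine|hot)   = {p_fine_given_hot:.3f}")
```

Put another way, out of 10,000 components about 500 run hot, and only about 100 of those hot components are faulty, so roughly four out of every five hot components are perfectly good.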
Deriving Bayes' theorem
In this section, I will show how Bayes' theorem, in the form expressed
above, can be derived from our tabulation and counting procedure.
Remember that the quantity we want to calculate is the probability
of Dogmeat winning given that it is raining. We will write this
'p(w|r)' (probability of win given rain). What we know at the outset
is the probability of it raining on occasions that Dogmeat wins.
We will write this 'p(r|w)'. Now p(r|w) is just a probability, and
can be determined by dividing the number of times that Dogmeat
won on a rainy day, by the number of times he won in total. In
symbols, we express this:
p(r|w) = n(r and w)/n(w)
n(r and w) is the number of times that Dogmeat won on a rainy day,
and n(w) is the number of times he won in total.
Dividing top and bottom of the right-hand side by N, the number
of races, we have:
p(r|w) = [n(r and w) / N] / [n(w) / N]
But the number of actual somethings divided by the number of
possible somethings is just the probability of
something. That's what probability means. So we can write:
p(r|w) = p(r and w) / p(w)
That is, the probability of rain, given that Dogmeat wins, is the probability
of Dogmeat winning and it raining, divided by the probability of his winning.
And, if we just swap the symbols around, we can also write:
p(w|r) = p(w and r) / p(r)
We haven't proved anything here — we've just swapped symbols 'r' for 'w'.
But 'w and r' is the same event as 'r and w', so if we write the previous equation like this:
p(w|r) * p(r) = p(r and w)
we can then substitute for p(r and w) in the previous version (before we
swapped 'r' and 'w'), and this gives us:
p(r|w) = p(w|r) * p(r) / p(w)
or, with a bit of symbol juggling:
p(w|r) = p(r|w) * p(w) / p(r)
And this is the classical Bayes' theorem.
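The steps of the derivation can be checked numerically against the race counts. This is a sketch, not a proof: the variable names mirror the symbols used above (N, n(w), n(r and w)), and n_r, the number of rainy races, is the count of four established earlier.

```python
from fractions import Fraction

N = 12          # total number of races
n_w = 5         # races Dogmeat won
n_r = 4         # races run in the rain
n_r_and_w = 3   # races Dogmeat won in the rain

# Counts divided by N become probabilities.
p_w = Fraction(n_w, N)
p_r = Fraction(n_r, N)
p_r_and_w = Fraction(n_r_and_w, N)

# p(r|w) = p(r and w) / p(w), and the same relation with r and w swapped.
p_r_given_w = p_r_and_w / p_w
p_w_given_r = p_r_and_w / p_r

# Bayes' theorem: p(w|r) = p(r|w) * p(w) / p(r)
bayes_rhs = p_r_given_w * p_w / p_r

# Both routes agree with the direct count n(r and w) / n(r).
assert bayes_rhs == p_w_given_r == Fraction(n_r_and_w, n_r)
print(p_r_given_w, p_w_given_r)   # 3/5 3/4
```

The assertion passes because N cancels out at every step, which is all the derivation relies on.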
I have tried to show in this article that there is nothing mystical or
profound about Bayes' theorem although, of course, we've only scratched
the surface of how it can be applied in practice. The theorem does not
really allow us to calculate anything that we can't find out by other
means, but it has the advantage of being notationally compact, and
works directly with probabilities rather than counts.
Bayes' method frequently produces results that are in stark contrast to
our intuitive understanding. Consider the notorious
'fingerprint' example: it is intuitive that if the likelihood of
two people's having the same fingerprints is one in 100,000, then finding
person X's fingerprint on a murder weapon means that person X is the
murderer. Or, at least, it means that person X had his hand on the
murder weapon at some point. But, in fact, it means nothing of
the sort. The information we have is, in a sense, p(fingerprint|person X);
what we need is p(person X|fingerprint). To work that out, we
need to know also p(person X) and p(fingerprint). p(person X) is the likelihood
that person X is the culprit, leaving aside the fingerprint evidence;
p(fingerprint) is the likelihood that the particular fingerprint will be found
on the weapon, leaving aside any other evidence. Now, it is highly
probable (pardon the pun) that we don't know p(person X) or p(fingerprint)
with any precision.
But the important point is that if p(person X) is very much less than
p(fingerprint) (perhaps because person X has a very strong alibi), then
the probability that person X is innocent will be very much higher
than one in 100,000. The incorrect assumption that
p(fingerprint|person X) = p(person X|fingerprint)
is what is known as the 'prosecutor's fallacy' in forensic science,
and Bayes tells us how serious an error it is.
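To get a feel for the size of the error, here is a sketch with made-up numbers. Everything except the one-in-100,000 figure is a hypothetical assumption of mine, not data from any real case: I assume that the true culprit's print is certain to be on the weapon, that one in 100,000 is the chance of an innocent person's print matching, and that, before the fingerprint evidence, person X is just one of a million people who could have done it.

```python
from fractions import Fraction

# HYPOTHETICAL ASSUMPTIONS, chosen only to illustrate the calculation.
p_match_if_guilty = Fraction(1)             # assumed: culprit's print is always found
p_match_if_innocent = Fraction(1, 100_000)  # chance-match figure quoted in the text
prior_guilt = Fraction(1, 1_000_000)        # assumed: X is one of a million candidates

# p(fingerprint): the print matches either because X is guilty,
# or by chance although X is innocent.
p_match = (p_match_if_guilty * prior_guilt
           + p_match_if_innocent * (1 - prior_guilt))

# Bayes' theorem: p(person X|fingerprint)
posterior_guilt = p_match_if_guilty * prior_guilt / p_match
posterior_innocence = 1 - posterior_guilt

print(f"p(guilty|match)   = {float(posterior_guilt):.3f}")
print(f"p(innocent|match) = {float(posterior_innocence):.3f}")
```

Under these assumptions the probability that X is innocent comes out at roughly 91%, nowhere near one in 100,000. A different prior would give a different answer, which is exactly why the prior cannot be ignored.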