The Bayesian approach to statistics is a fascinating subject, which I cover at some length in my book Dice World. What Bayes' theorem enables you to do is improve an estimate of the chances of something happening when you have additional information, and to use one set of probabilities to calculate another linked one.
This can be extremely useful and powerful when, for instance, calculating the effectiveness of disease screening tests, which can be very confusing due to wildly varying conditional probabilities. It's worth getting your head around a bit of probability symbology to get on top of this. In these simple formulae, the '|' sign is read as 'given'.
So, for instance if I have a test that will flag up the presence of a disease 90% of the time, which isn't too bad, I can write that as
P (Positive result | Disease ) = 90% - the probability of a positive result in the test, given the person has the disease, is 90%.
The problem comes, and sadly this has happened for real, when this is represented by the medical profession or (more often) the press office of a university or the press in general as the test being '90% accurate.' This is because it's perfectly possible with the same test for P (Disease | Positive Result ) to be, say, just 20%.
What that's saying is that the probability of a person who tested positive having the disease is only 20%. This, of course, is an important part of what people want to know after a test. I've just had the test and it came up positive. What's the chance that I actually have the disease? In this case it is surprisingly low, given the apparent 'accuracy' of the test.
The reason this can happen is that to work out the first figure, P (Positive result | Disease), we are only considering the population of people who have the disease, which might be quite small. But for the second figure, P (Disease | Positive Result), we are looking at everyone who took the test, which could be a much bigger group. The false positives from that large tested population can then overwhelm the correct positive results coming from the much smaller population of sufferers.
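To see the numbers at work, here is a minimal sketch of the calculation using Bayes' theorem. All the figures (a 2.5% prevalence and a 9% false positive rate) are illustrative assumptions chosen to produce roughly the 90%/20% split described above, not data from any real test.

```python
# Bayes' theorem applied to a screening test.
# All input figures are illustrative assumptions, not real test data.

def posterior(prevalence, sensitivity, false_positive_rate):
    """P(Disease | Positive result) via Bayes' theorem."""
    true_positives = sensitivity * prevalence            # sufferers correctly flagged
    false_positives = false_positive_rate * (1 - prevalence)  # healthy people wrongly flagged
    return true_positives / (true_positives + false_positives)

# Hypothetical: the disease affects 2.5% of those tested, the test flags
# 90% of sufferers (so P(Positive | Disease) = 90%), but it also wrongly
# flags 9% of healthy people.
p = posterior(prevalence=0.025, sensitivity=0.90, false_positive_rate=0.09)
print(f"P(Disease | Positive) = {p:.1%}")  # about 20%
```

Even though the test catches 90% of sufferers, most of the positive results come from the far larger healthy group, which is why the probability of actually having the disease after a positive result drops to around 20%.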
This makes those in the business who understand probability wary of mass screening programmes for relatively rare conditions - they result in tests that, even when very likely to get the result right on any particular individual, can come up with a distressing false positive more often than a true outcome, putting the patient through a time of horrible stress unnecessarily.
See David Colquhoun's blog for more detail on the risks of using these kinds of screening tests.