Interpreting Medical Statistics
We include the explanations of p-values and confidence intervals below as a guide to interpreting the statistics in studies that other pages of this website may reference. We also include them because these concepts come up again and again, and they are something we all (doctors and scientists included) struggle with and sometimes misinterpret. For more information on math and medicine, see our article “Guide to Evidence Based Medicine.”
The p-value of an experiment tells us how easily the results we got could have happened just by chance, which helps us judge whether we are looking at a real* effect. Lower p-values mean the results would be harder to get by chance alone, so it is more likely that they reflect a real effect and not just something that might have happened anyway**. We usually say any result with a p-value less than 0.05 is a real effect and not just due to chance. It helps to remember that 0.05 is the same as 1/20. So if somebody tells you the results of an experiment have a p-value of 0.05, they’re saying that if what they did in the experiment actually did nothing, and you repeated the experiment 20 times, you’d expect to get results at least as striking as theirs about once just by chance.
That’s why a lower p-value is better. For example, a p-value of 0.01 means that you’d expect to get results like theirs only one time out of 100 just by chance. As the p-value gets lower, it’s basically saying louder and louder, “Hey, whatever I’m doing in this experiment is causing these results I’m getting; they’re not just random things that might have happened anyway!” Of course, things do happen just by chance, so experiments need to be repeated even if the p-value is low. After all, this could be that 1 time out of 20, 100, or whatever when the result really was just chance, and whatever was done in the experiment didn’t actually cause the results.
*I’m using “real” as a way of saying “statistically significant.” They are not the same thing. Statistical significance is an arbitrary measure usually taken to mean that the p-value is less than 0.05. Jordan Ellenberg makes the difference abundantly clear in his great book How Not to Be Wrong: The Power of Mathematical Thinking.
** This idea that what we did in the experiment actually does nothing and the results we got just happened by chance is called the “null hypothesis.” When we do an experiment, we’re trying to show that this “did nothing” idea is not that likely and so we can be reasonably sure that what we did actually caused the result we got.
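To make the “just by chance” idea concrete, here is a minimal sketch in Python. It is not taken from any study; it assumes a made-up experiment of 10 coin flips, where the “did nothing” idea is simply that the coin is fair. The function names p_value_at_least and simulated_rate are our own, chosen for illustration. The first works out the exact chance of a result at least that lopsided, and the second repeats the “does nothing” experiment many times to count how often chance alone produces it.

```python
import random
from math import comb


def p_value_at_least(heads, flips):
    """Chance of seeing `heads` or more heads in `flips` tosses of a fair coin."""
    # Count every outcome with at least `heads` heads, out of 2**flips outcomes.
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


def simulated_rate(heads, flips, trials=20_000, seed=0):
    """Repeat the 'does nothing' experiment and count how often chance
    alone gives `heads` or more heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        count = sum(rng.random() < 0.5 for _ in range(flips))
        if count >= heads:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(p_value_at_least(8, 10))  # 0.0546875, just above the 0.05 cutoff
    print(p_value_at_least(9, 10))  # 0.0107421875, about 1 time in 100
    print(simulated_rate(8, 10))    # should land close to the exact value
```

In this toy setup, 8 heads out of 10 would not quite count as significant, while 9 out of 10 would. Real studies use tests suited to their data, but the logic is the same: assume nothing is going on, then ask how often chance alone would produce a result like this.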
The p-value just tells us something about whether an effect is real or not, but it doesn’t tell us about the range of effects we might expect from the experiment we did. This is where the confidence interval comes in. Confidence intervals tell us about this range. For example, a confidence interval of [+5%, +16%] means that we’re likely to get an effect somewhere from +5% to +16% by doing whatever it is we did in the experiment. Because the range is +5% to +16%, we wouldn’t expect an effect of 0%, or in other words, we wouldn’t expect our experiment to do nothing. When the confidence interval doesn’t include zero, like in this case, it is kind of the same thing as saying the result is significant, because we can throw out the “does nothing” idea. Of course, when the confidence interval is [+20%, +21%] we’ll be more certain about what kind of effect to expect than we would with a confidence interval of [+4%, +32%]. And if we get a confidence interval of [-0.5%, +0.5%] we’ll be pretty sure that our experiment doesn’t do anything, but if we get an interval of [-20%, +20%], we really just don’t know what is going on*.
*This last example and most of the thinking in this paragraph are taken from Jordan Ellenberg’s book How Not to Be Wrong: The Power of Mathematical Thinking.
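As a rough illustration of where a confidence interval comes from, the Python sketch below computes an approximate 95% interval for an average effect and checks whether it includes zero. The measurements are invented for the example, the function names confidence_interval and excludes_zero are our own, and the calculation assumes the simple normal approximation. With a sample this small, a real analysis would more likely use a t-distribution, which gives a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(effects, confidence=0.95):
    """Approximate confidence interval for the average effect
    (normal approximation; an assumption made to keep the sketch short)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    center = mean(effects)
    margin = z * stdev(effects) / sqrt(len(effects))
    return center - margin, center + margin


def excludes_zero(interval):
    """True when the whole interval sits on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# Invented percentage changes for ten hypothetical patients.
effects = [12.0, 8.5, 10.0, 14.5, 9.0, 11.5, 7.0, 13.0, 10.5, 9.5]

low, high = confidence_interval(effects)
print(f"[{low:+.1f}%, {high:+.1f}%]")
print("Excludes zero:", excludes_zero((low, high)))

# The intervals discussed in the text:
print(excludes_zero((5, 16)))    # True: the "does nothing" idea can be thrown out
print(excludes_zero((-20, 20)))  # False: zero is still a plausible effect
```

The width of the interval matters as much as whether it includes zero: excludes_zero returns False for both [-0.5%, +0.5%] and [-20%, +20%], yet the first says the effect is almost certainly tiny and the second says very little at all.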