The Bradford Hill Criteria

At the end of last week’s look at HRT and breast cancer, I promised that I would look at “The Bradford Hill criteria” this week. The thing that prompted me to do this was that the headlines for the HRT/breast cancer study were “Risk of breast cancer nearly tripled…” The study claimed that the risk was 2-3 times higher for women taking the oestrogen/progesterone form of HRT relative to women not taking HRT at all.

That study was the first that I can remember dissecting in years where the relative risk was as high as 2-3 times. Differences are usually 20-40%, or similarly small – not 2-3 times. Take the recent exercise study – I dissected this one to show that if you did more than 13 times the recommended amount of exercise (13 times!) vs. less than even the recommended amount, you had a claimed 28% reduced risk of diabetes.

Two to three times is thus unusual and it meets one of nine criteria, which should be known far more widely than they are and used far more often than they are. Welcome to a brilliant test of association studies, by a brilliant emeritus professor of medical statistics…

Bradford Hill

Sir Austin Bradford Hill was a British epidemiologist who was born in 1897 and died in 1991 (94 years – not bad!). He was best known for his collaborative research with Sir Richard Doll, which linked smoking with cancer and other chronic illnesses. (Epidemiology, as a reminder, is the study of populations to observe patterns/associations.)

Hill served as a pilot in World War I, but he contracted tuberculosis and was seriously ill for four years. This scuppered plans to do a medical degree, so he studied economics instead, but clearly maintained a passion for health. Hill was the statistician for the Medical Research Council’s Streptomycin in Tuberculosis Trials Committee, set up to investigate the value of Streptomycin in treating tuberculosis. Hill had originally suggested the method for randomised controlled trials in the 1930s and this study is generally accepted as being the first randomised controlled trial.

Both Hill and Doll were smokers. They set up the British Doctors Study, which ran from 1951 to 2001, outliving Hill, but not Doll (1912-2005). The British Doctors Study was a “prospective cohort study”. A prospective cohort study just means that a group of people (a population) is identified – in this case 30,000 British doctors – and lots of lifestyle details about that group are captured at the start of the study. The population is then followed for many years to see which diseases develop. The numbers are large enough that patterns can be observed between lifestyle factors and illness. In this case, by 1956, Hill and Doll were able to demonstrate that smoking caused lung cancer. The British Doctors Study went on to connect smoking to many diseases. Apparently Hill continued pipe smoking until 1954 – presumably no longer able to reconcile the enjoyment with the known risk.

Hill’s President’s Address, to the Royal Society of Medicine, was published in May 1965 in the Proceedings of the Royal Society of Medicine. In the speech, Hill set out to answer the question: “we see that the event B is associated with the environmental feature A… In what circumstances can we pass from this observed association to a verdict of causation?”

Specifically, he set out to define: “What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”

That’s a great question and the researchers pumping out association studies, reported almost daily in the media, should be asking this question every day, but they are not. Hill answered his own question in the speech by setting out nine criteria to help decide whether an association can be considered causation. Here they are…

1) Strength

The first factor on Hill’s list is the one that prompted this article – the strength of the association. Having worked in the field of cancer, Hill gave the example that deaths of chimney sweeps from scrotal cancer were “some 200 times” that of workers who were not chimney sweeps. That’s as strong as it gets. Few people could see the strength of that relationship and fail to conclude that being a chimney sweep causes scrotal cancer.

Hill then referred to smoking and lung cancer, as the topic he had been studying for 15 years, and reported that the death rate from lung cancer in smokers was 9-10 times that of non-smokers. This is why Hill and Doll reported that smoking caused lung cancer, not just that they were associated. In contrast, Hill said that the death rate from coronary thrombosis (a blood clot in an artery) in smokers was no more than twice that in non-smokers. Hill did not consider that a factor below two times was sufficient to claim causality. Interesting!
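To make the arithmetic behind "strength" concrete, here is a minimal Python sketch of how a relative risk is worked out from two groups. The function name `relative_risk` and the counts are hypothetical, chosen only for illustration; they are not figures from Hill's speech or from any study.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Event rate in the exposed group divided by the rate in the unexposed group."""
    exposed_rate = exposed_cases / exposed_total
    unexposed_rate = unexposed_cases / unexposed_total
    return exposed_rate / unexposed_rate


# Hypothetical counts, chosen only to illustrate the calculation.
rr = relative_risk(exposed_cases=90, exposed_total=10_000,
                   unexposed_cases=10, unexposed_total=10_000)
print(f"Relative risk: {rr:.1f}")
```

A result of around 9 would sit in the range Hill reported for smoking and lung cancer, whereas the 20-40% differences behind most headlines correspond to a relative risk of only 1.2 to 1.4.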

2) Consistency

This factor asks: do we observe the association in different people, in different circumstances, at different times? Is the observation consistent? If we observe that smokers in the UK get lung cancer, but smokers in France do not, then we cannot say that smoking causes lung cancer. If men who smoke get lung cancer, but women who smoke do not, we cannot say that smoking causes lung cancer.

3) Specificity

Statisticians are, by nature, very precise and this is the criterion of precision. Hill here is saying “what specifically/exactly is the relationship?” Is it smoking, or is it smoking in later life for example? Is it smoking and cancer or smoking and lung cancer? Hill and Doll noted many associations between smoking and disease (many different cancers and heart disease), but they were careful to be specific when suggesting causation.

4) Temporality

I would have called this one “Direction of causation.” The pure definition of temporality is: “the state of existing within or having some relationship with time.” Hill asks: with an association, which is the cart and which is the horse? i.e. what comes first? The particular example that Hill used in the speech was to ask: Does a particular diet lead to disease or do the early stages of disease lead to those dietary habits? How relevant that is to today’s studies. I personally ask this very question with, for example, the five-a-day myth. Do healthy people eat fruit and veg or does eating fruit and veg make someone healthy?

5) Biological gradient

What Hill calls “biological gradient”, we would call “dose response” nowadays. If smoking is associated with lung cancer and the more people smoke/the longer they smoke for, the more likely they are to get lung cancer, we say that there is a dose response – more of one thing is seen at the same time as more of the other. Causation is more likely when we see two things not only associated, but more of one being associated with more of the other. Hill had observed that the death rate from cancer of the lung rises with the number of cigarettes smoked daily. That’s what he called a biological gradient; that’s what we would call a dose response.
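A dose response can be checked very simply: line up the groups from lowest to highest exposure and see whether the rate rises at every step. The sketch below does just that. The function name `shows_dose_response` and all the rates are hypothetical, and a real analysis would normally use a formal trend test rather than this bare comparison.

```python
def shows_dose_response(rate_by_dose):
    """Return True if the rate rises at every step up in dose."""
    rates = [rate for _, rate in sorted(rate_by_dose.items())]
    return all(lower < higher for lower, higher in zip(rates, rates[1:]))


# Hypothetical deaths per 1,000 people per year, keyed by cigarettes smoked per day.
hypothetical_rates = {0: 0.1, 10: 0.5, 20: 1.0, 40: 2.0}
print(shows_dose_response(hypothetical_rates))

# A pattern that rises and then falls would not count as a gradient.
print(shows_dose_response({0: 0.1, 10: 0.9, 20: 0.4}))
```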

6) Plausibility

This is a simple one to understand – we observe smoking and lung cancer together, is it plausible that one causes the other? Dr John Briffa used a great example in one of our conferences – ice cream sales and deaths from shark attacks are associated. Does that mean that one causes the other? The answer is, of course, no. The common factor is warm weather. Warm weather ‘causes’ ice cream sales to go up and warm weather ‘causes’ more people to swim in the sea – being more likely to be attacked by sharks. The notion that ice cream consumption causes shark attacks (or vice versa) is implausible.
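The ice cream and shark example can be played out with a toy data set. In this sketch, every record and the helper `attack_rate` are invented for illustration. The data are built so that warm weather drives both ice cream sales and shark attacks, while ice cream itself has no effect.

```python
# Hypothetical daily records: (warm_day, high_ice_cream_sales, shark_attack).
days = (
    [(True, True, True)] * 4 + [(True, True, False)] * 4      # warm, high sales
    + [(True, False, True)] * 1 + [(True, False, False)] * 1  # warm, low sales
    + [(False, True, False)] * 2                               # cold, high sales
    + [(False, False, False)] * 8                              # cold, low sales
)


def attack_rate(records, high_sales, warm=None):
    """Share of days with an attack, filtered by sales level and optionally by weather."""
    selected = [
        attack for is_warm, is_high, attack in records
        if is_high == high_sales and (warm is None or is_warm == warm)
    ]
    return sum(selected) / len(selected)


# Ignoring the weather, high-sales days look far more dangerous.
print(attack_rate(days, high_sales=True), attack_rate(days, high_sales=False))

# Within warm days only, the sales level makes no difference.
print(attack_rate(days, True, warm=True), attack_rate(days, False, warm=True))
```

Once the days are split by weather, the apparent link between ice cream and shark attacks disappears, which is what we would expect when a third factor is doing the work.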

Every finding against any real food (butter, eggs, red meat) should fail the plausibility test. It is implausible (try inconceivable!) that the food that has sustained us for 3.5 million years will be proven to increase deaths.

7) Coherence

The question Hill asked here is: does the association fit with other facts? If we see an association between smoking and lung cancer and we have seen an increase in population smoking and an increase in lung cancer cases – this is coherent. It fits.

Saturated fat intake almost halved (1975-1999) during the period in which obesity increased almost tenfold. The idea that saturated fat causes obesity doesn’t fit; it’s incoherent.

8) Experiment

Having designed and presided over the first randomised controlled trial, Hill was well aware that the true test of causation is an experiment. Hill suggested that any experimental evidence would be useful to determine whether association was indeed causation. He gave the example that not doing the observed thing was valid as an experiment e.g. stopping smoking. From the many people whom Hill and Doll could observe in the British Doctors Study, some people would have stopped smoking. If the incidence of lung cancer falls in people who stop smoking, it is more likely that smoking causes lung cancer.

9) Analogy

Hill’s final criterion was summed up with the word “analogy”. The fuller explanation would be – have we seen the likes before? If we observed that a drug or a virus were associated with birth defects, we would be able to recall thalidomide and rubella and be open to the idea that a drug or virus could cause birth defects. Had nothing ever been known to cause a birth defect, it would be more difficult to hypothesise that, for example, the Zika virus could cause birth defects.

Testing the nine criteria

The Bradford Hill criteria are brilliant. Every researcher should be compelled to review their association against the nine Bradford Hill criteria before any paper is accepted for publication.

As an example, let’s have a quick review of the Bradford Hill criteria for the HRT and breast cancer study. I judge that eight of the nine criteria were met:

1) Strength (the risk – still relative – was more than two times);

3) Specificity (the conclusion specified that the combined oestrogen and progesterone form of HRT alone was the issue);

4) Temporality (breast cancer could follow HRT; it would be most unlikely for women to start HRT after a breast cancer diagnosis);

5) Biological Gradient/Dose response was addressed (the researchers observed higher risk for longer durations of the combined HRT);

6) It is Plausible that HRT causes breast cancer;

7) There is evidence to support the notion of Coherence (the study claims that breast cancer incidence fell following falls in combined HRT usage);

8) Experiment: Interestingly, this followed the exact example given by Hill with stopping smoking. The study found that stopping HRT removed the observed risk; and finally,

9) Analogy is met – we have observed harm from drugs/synthetic hormones in other circumstances and thus we have ‘seen the likes before’.

The one criterion perhaps not met from last week’s review was 2) Consistency. The Million Women Study, Women’s Health Initiative and the Collaborative Reanalysis were not consistent in their findings.
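The review above can be written down as a simple checklist, which is roughly what a researcher would need to fill in before publication. This is only a sketch: the name `hrt_review` is hypothetical, the verdicts are the ones given in the list above, and "perhaps not met" is recorded as a plain `False` for simplicity.

```python
# The nine criteria, with the verdicts from the HRT and breast cancer review above.
hrt_review = {
    "Strength": True,
    "Consistency": False,  # "perhaps not met" in the review; simplified to False here
    "Specificity": True,
    "Temporality": True,
    "Biological gradient": True,
    "Plausibility": True,
    "Coherence": True,
    "Experiment": True,
    "Analogy": True,
}

met = [name for name, passed in hrt_review.items() if passed]
not_met = [name for name, passed in hrt_review.items() if not passed]
print(f"{len(met)} of {len(hrt_review)} criteria met; not met: {', '.join(not_met)}")
```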


The Bradford Hill criteria are a way of assessing if association may be causation. They don’t necessarily tell us what to worry about, or how much to worry. Clearly chimney sweeps should worry about scrotal cancer, at 200 times the incidence, but a factor of 2-3 times may not be an issue…

Back to the breast cancer and HRT study, we have a high likelihood of causation, not just association. However, we still have the fact that 2-3 times higher came down to a handful of cases in 1,000. The worst absolute risk I have calculated has been approximately 1 in 100. I still don’t know many people who would worry about a 1 in 100 risk, let alone a few in 1,000.
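The gap between a relative risk and an absolute risk is easy to see with a little arithmetic. In the sketch below, the function name `absolute_risk_increase` and the baseline of 3 cases per 1,000 are assumptions made purely for illustration; the baseline is not a figure taken from the HRT study.

```python
def absolute_risk_increase(baseline_risk, relative_risk):
    """Extra risk in the exposed group, given the baseline risk and the relative risk."""
    return baseline_risk * relative_risk - baseline_risk


# Hypothetical baseline of 3 cases per 1,000, not a figure from the HRT study.
baseline = 3 / 1000
for rr in (2, 3):
    extra_per_1000 = absolute_risk_increase(baseline, rr) * 1000
    print(f"Relative risk {rr}: about {extra_per_1000:.0f} extra cases per 1,000")
```

With a small baseline, even a doubling or tripling adds only a few cases per 1,000, which is why the relative figure makes the headline and the absolute figure rarely does.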

A shorthand takeaway from virtually every health headline in the media therefore would be “Don’t worry”!
