Cancer, bad luck, and a pair of paradoxes

April 4, 2015 by Rob Noble

Among the highlights of my recent visit to IMO were several stimulating discussions with Artem Kaznatcheev. I’m still thinking over my response to his recent post about reductionist versus operationalist approaches in math biology, which is very relevant to some of my current research. Meanwhile, at Artem’s suggestion, this post will discuss a reanalysis of the “cancer and bad luck” paper that spurred so many headlines at the start of this year. Whereas many others have written critiques of that paper’s statistical methods and interpretations, my colleagues and I instead tried fitting alternative models to the underlying data. We thus found ourselves revisiting a couple of celebrated scientific paradoxes.

To start this post, I will introduce you to Simpson’s paradox and Peto’s paradox. With this pair of paradoxes in mind, we’ll turn a critical eye to Tomasetti & Vogelstein (2015), and I will explain our reanalysis of their data set.

Edward H Simpson has several claims to fame. He began his statistical career as a wartime code breaker at Bletchley Park, applying Bayesian methods to speed up the deciphering of German and Japanese ciphers (for the history, see Simpson, 2010). In ecology, he is noted for introducing a widely used measure of diversity (Simpson, 1949). After the war he became a distinguished civil servant. But perhaps Simpson’s best-known achievement is a paper (Simpson, 1951) he wrote as a graduate student, characterising a puzzling quirk of statistics that now bears his name.

To understand the phenomenon, consider the relationship between the numbers of heads (H) and tails (T) observed in experiments, where each experiment comprises N = 100, 200, or 300 coin tosses. Across the whole data set, we’d expect to see a positive correlation between H and T because the expected values of each variable scale with N (as illustrated in the figure at right). However, for each distinct value of N we have H = N – T. Therefore if we split the data into three subsets according to sample size (N = 100, N = 200, and N = 300) then H and T will be negatively correlated within each subset (the differently coloured groups in the figure at right). This is Simpson’s paradox: an observed trend in data can change and even reverse when the data are split into subsets, leading to all sorts of counterintuitive results.
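The coin-toss example can be checked with a short simulation. The following is a minimal sketch, assuming fair coins and 50 experiments for each value of N; the function names, the seed, and the number of experiments are illustrative choices, not part of the original analysis.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(sizes=(100, 200, 300), experiments_per_size=50, seed=1):
    """Return {N: [(heads, tails), ...]} for repeated experiments of N fair tosses."""
    rng = random.Random(seed)
    data = {}
    for n in sizes:
        rows = []
        for _ in range(experiments_per_size):
            heads = sum(rng.random() < 0.5 for _ in range(n))
            rows.append((heads, n - heads))  # T = N - H by construction
        data[n] = rows
    return data

data = simulate()

# Correlation between H and T across the whole data set.
pooled = [row for rows in data.values() for row in rows]
pooled_r = pearson([h for h, _ in pooled], [t for _, t in pooled])

# Correlation between H and T within each subset of fixed N.
within_r = {
    n: pearson([h for h, _ in rows], [t for _, t in rows])
    for n, rows in data.items()
}

print("pooled correlation:", round(pooled_r, 3))
for n, r in within_r.items():
    print(f"N = {n}: within-group correlation = {round(r, 3)}")
```

The pooled correlation comes out positive because both H and T grow with N, while each within-group correlation is negative because T is fully determined by H once N is fixed.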

Another English statistician, Sir Richard Peto, gave his name to our second paradox, which concerns cancer incidence across species (something that Artem has touched on before; for the original papers, see Peto et al., 1975; Peto, 1977). It is thought that cancer most commonly arises due to DNA modifications during cell division (specifically, stem cell division). Therefore we might expect large animals to have many more tumours than small ones, simply due to the vast difference in total number of cells and therefore in cell divisions per body. Scaling up from human cancer rates, we’d expect blue whales to be overwhelmed by cancer. Peto’s paradox is the observation that, at the species level, there appears to be no such correlation between the incidence of cancer and the number of organismal cells. Evolutionary theory provides a possible solution to this paradox, because selection for cancer-suppressing traits should be stronger in larger organisms.

Thus primed in paradoxes, let’s return to the work of Tomasetti & Vogelstein (2015). The idea behind their study is appealingly simple. As already mentioned, it’s thought that cancer usually starts during stem cell division. Consistent with this theory, Tomasetti & Vogelstein (2015) found a correlation between cancer incidence per tissue and the lifetime number of stem cell divisions within the tissue. For example, stem cells in the brain seldom divide and brain tumours are generally rare, whereas the colon has a high turnover of stem cells and is more commonly afflicted by cancer.

Two observations inspired me, my boss Michael Hochberg, and our colleague Oliver Kaltz to reanalyse this data set. First, if every stem cell division carries the same risk of seeding cancer then cancer risk should be proportional to the number of divisions, so the slope of the correlation on the log-log scale should be 1. In other words, doubling the number of divisions should double the risk of cancer. However, the correlation identified by Tomasetti & Vogelstein (2015) has a slope of only ~0.5 (on the log-log scale). Second, the data set is not representative of all cancer types: breast and prostate cancers are omitted (due to lack of reliable measurements), whereas other cancers such as osteosarcoma (bone cancer) are seemingly overrepresented. Such sample bias could skew the analysis.
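To make the slope argument concrete, here is a short worked calculation. It assumes that a straight line on the log-log plot corresponds to a power law; the symbols R (lifetime risk), D (lifetime stem cell divisions), c, and β are introduced here for illustration and are not notation from the papers.

```latex
% A straight line on a log-log plot is a power law.
% R = lifetime cancer risk, D = lifetime number of stem cell divisions;
% c and \beta are illustrative symbols (intercept and slope).
\begin{align*}
\log R &= \log c + \beta \log D
  \quad\Longleftrightarrow\quad R = c\,D^{\beta} \\
\frac{R(2D)}{R(D)} &= 2^{\beta} \\
\beta = 1 &:\quad 2^{1} = 2
  \quad \text{(doubling the divisions doubles the risk)} \\
\beta = 0.5 &:\quad 2^{0.5} \approx 1.41
  \quad \text{(doubling the divisions raises the risk by only about 41\%)}
\end{align*}
```

So a slope of ~0.5 means that risk grows roughly with the square root of the number of divisions, which is not what a constant risk per division would produce.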

Cancer risk versus lifetime number of stem cell divisions (lscd) on a log-log scale. Each colour corresponds to a different anatomical site, and the coloured lines are the best fits of our model. The length of each coloured line represents the variance in risk due to the lifetime number of stem cell divisions for that anatomical region or tissue type, and the spacing between coloured lines represents the variation due to tissue type. Based on figures 5A and 6A of Noble et al. (2015).

Our response to these issues was to change the statistical model. Instead of fitting a regression model to all cancer types together, we subsetted the data by anatomical site (bone, thyroid, pancreas, etc.). Thus we divided the variation in cancer risk into two parts: between-group variation associated with anatomical site, and within-group variation associated with number of stem cell divisions (see figure at right). Our hypothesis was that the correlation within groups might differ from the correlation across the whole data set. In other words, we predicted that the location of a cancer within the body might be one of the “lurking explanatory variables” that underlie Simpson’s paradox.

As suspected, the subsets revealed a very different pattern compared to the combined data (see figure at right). Within each of the groups defined by anatomical site, the slope of the correlation between cancer risk and lifetime number of stem cell divisions is approximately 1, exactly as predicted by Tomasetti & Vogelstein’s biological hypothesis. However, there is also large variation between subsets, which means that the cancer risk per stem cell division varies enormously depending on anatomical site. For example, our results suggest that a stem cell division is ~10,000 times more likely to lead to cancer if it occurs in bone or in the thyroid than in the small intestine.
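The contrast between the pooled fit and the within-site fit can be illustrated with a toy calculation. This is a minimal sketch on made-up numbers, not the Tomasetti & Vogelstein data and not the model fitted in Noble et al. (2015): it assumes a common slope with a separate intercept for each site, estimated by centring each site's points on their own means. The site names and values are hypothetical.

```python
from collections import defaultdict

# Synthetic, illustrative records only (NOT the real data set):
# (site, log10 lifetime stem cell divisions, log10 lifetime cancer risk).
# Each hypothetical site follows a line of slope 1 with its own intercept.
records = [
    ("site_A", 6.0, -4.0), ("site_A", 7.0, -3.0), ("site_A", 8.0, -2.0),
    ("site_B", 9.0, -4.0), ("site_B", 10.0, -3.0), ("site_B", 11.0, -2.0),
    ("site_C", 12.0, -4.0), ("site_C", 13.0, -3.0),
]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_site_fit(rows):
    """Common slope plus one intercept per site, via within-site centring."""
    groups = defaultdict(list)
    for site, x, y in rows:
        groups[site].append((x, y))
    means = {
        site: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for site, pts in groups.items()
    }
    # Subtract each site's mean so that only within-site variation remains.
    dx = [x - means[site][0] for site, x, _ in rows]
    dy = [y - means[site][1] for site, _, y in rows]
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    intercepts = {site: my - slope * mx for site, (mx, my) in means.items()}
    return slope, intercepts

pooled_slope = ols_slope([x for _, x, _ in records], [y for _, _, y in records])
within_slope, intercepts = within_site_fit(records)

print("pooled slope:", round(pooled_slope, 3))
print("within-site slope:", round(within_slope, 3))
print("site intercepts:", {s: round(b, 2) for s, b in intercepts.items()})
```

In this toy data set the within-site slope is 1 while the pooled slope is much shallower, because sites with more divisions have lower intercepts. On the log10 scale, a gap of 4 units between two intercepts would correspond to a 10,000-fold difference in risk per stem cell division.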

And here is where we find the connection to the second paradox. Just as in differently sized species, we suggest that anatomical sites with high numbers of stem cell divisions have evolved more powerful anti-cancer mechanisms. The lining of the colon is constantly renewing itself through stem cell divisions, so we would expect the colon to be especially good at lessening the risk of tumourigenesis per division. Conversely, bone stem cell divisions are rare, so we would not expect bones to invest so heavily in cancer prevention. Although the risk of each cancer type correlates with the lifetime number of stem cell divisions within each anatomical site, there is no such correlation between anatomical sites. Instead, the risks of the most common cancers of each anatomical site appear to saturate at around 1%. Similarly, there may be a correlation between cancer risk and body size within species (e.g. dogs and humans) but not between species. It seems that Peto’s paradox applies to anatomical sites just as well as it applies to species.

In summary, by applying an only slightly more complex statistical model, and by viewing our results in the light of evolutionary theory, we have obtained quite different results from Tomasetti & Vogelstein’s data, adding to the understanding of carcinogenesis and the evolution of cancer control. You can find out more by downloading our arXiv preprint (Noble et al., 2015). Comments are especially welcome now as we’re preparing to submit this work to a journal soon.

[Editor’s note: this work has now appeared as: Noble, R., Kaltz, O., & Hochberg, M. E. (2015). Peto’s paradox and human cancers. Phil. Trans. R. Soc. B, 370(1673): 20150104.]

References

Noble, R., Kaltz, O., & Hochberg, M.E. (2015). Statistical interpretations and new findings on Variation in Cancer Risk Among Tissues. arXiv preprint: 1502.01061.

Peto, R., Roe, F.J.C., Lee, P.N., Levy, L., & Clack, J. (1975). Cancer and ageing in mice and men. British Journal of Cancer, 32(4): 411–426.

Peto, R. (1977). Epidemiology, multistage models, and short-term mutagenicity tests. In: Hiatt HH, Watson JD, Winsten JA, editors. The Origins of Human Cancer. NY: Cold Spring Harbor Conferences on Cell Proliferation, 4, Cold Spring Harbor Laboratory. pp. 1403–1428.

Simpson, E.H. (1949). Measurement of diversity. Nature, 163: 688.

Simpson, E.H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society. Series B (Methodological), 13(2): 238–241.

Simpson, E.H. (2010). Edward Simpson: Bayes at Bletchley Park. Significance, 7(2): 76–80.

Tomasetti, C., & Vogelstein, B. (2015). Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science, 347(6217): 78–81.

21 Responses to Cancer, bad luck, and a pair of paradoxes

Steve says:

April 4, 2015 at 15:52

Very nice post! It seems that you’ve done the work that T&V should have done in the first place.

Steve says:

April 4, 2015 at 16:46

One more comment. The remark about the adaptive benefits of lower cancer risk per stem cell division in the colon vs. bone makes eminent sense to me. Yet I have a nagging suspicion that invoking natural selection in this way might not be justified. Since cancer is highly correlated with aging, it could be that getting cancer is adaptive and in fact nature evolved a special mechanism to make sure that bone will eventually get cancer despite its lower cell division count. In other words, the evolutionary logic could be entirely reversed. Colon is what it is and evolved with no care about cancer, which kicks in with aging. On the other hand, nature put a lot of effort into making sure that bone would eventually get cancer and old people would die. So instead of the evolution of anti-cancer mechanisms, we may have had the evolution of pro-cancer mechanisms. I am not saying I believe any of this. I am only wondering how it can be ruled out and, if it cannot, why is it useful to bring in natural selection at the level of the species (as opposed to the cellular level). This is not a criticism: it’s an honest question that only reflects my ignorance.

- Rob Noble says:
  
  April 5, 2015 at 09:39
  
  Hi Steve. I’m glad you like the post! With regard to bone cancer specifically, it’s worth noting that this is (unusually among cancers) primarily a disease of children and young adults, and seems to be a rare side effect of rapid growth. More generally, most evolutionary biologists think that diseases of ageing are not adaptive (i.e. they are unlikely to be selected by evolution) but instead occur due to weaker selection in older individuals, who are past reproductive age. The main argument is that group selection is expected to be relatively weak. However, there are some who advocate the alternative argument that you describe (e.g. http://joshmitteldorf.scienceblog.com/2015/03/30/fertility-is-kaput-but-life-goes-on/).
  
  - Steve says:
    
    April 5, 2015 at 10:51
    
    Thanks for the reply and the link. Gives me lots to think about.
    
  - alistair bain says:
    
    November 11, 2015 at 07:54
    
    If there was a risk that a species could eat all the food in its area, so that the food became extinct, then I can see why some animals could be programmed to die early so that their numbers don’t become too great.
    
    - Rob Noble says:
      
      November 18, 2015 at 06:12
      
      Hi Alistair. I think you’re assuming that natural selection acts for the good of the species, which is a widespread misconception. For an explanation of this issue see http://evolution.berkeley.edu/evolibrary/misconceptions_faq.php#b4
      
      For an introduction to theories about the evolution of ageing and death, see http://io9.com/what-is-the-evolutionary-advantage-of-death-743044300
      
David Colquhoun says:

April 4, 2015 at 19:10

This is a fascinating analysis. I suggest that the paper (and the blog post) should contain a clear, and (as far as possible) non-technical statement of the extent to which cancer can be regarded as a result of chance, “luck” (excluding, of course, lung cancer in cigarette smokers). It is this aspect of T&V which caused most comment and, not infrequently, outrage. If it isn’t dealt with explicitly in the paper, you’ll probably have to write another ‘clarification’.

Michael E. Hochberg says:

April 5, 2015 at 15:27

David, in reply to your comments, our analysis (Noble et al. 2015) indicates that incidence can be predicted for the cancers covered in the Tomasetti and Vogelstein dataset based on knowledge about anatomical sites and the total number of stem cell divisions. Two important points emerge. First, based only on this data set, it is not possible to assess to what extent incidence and variation in incidence across cancers are due to the environment and/or “natural” probabilities of cellular transformation to malignancy (e.g., random mutations). That the slope in our statistical analysis is approximately 1 suggests that if environments were having a significant impact on cancer incidence (again, for the subset of cancers used in our analysis), then these environmental effects would appear to be affecting one or more anatomical sites in some proportional kind of way (so that slopes remain little changed). It is impossible to assess this hypothesis without additional data and therefore, it is premature to conclude that even for the cancer types without any apparent major environmental causes, more subtle environmental effects (i.e., which would reduce the “bad luck” component of the inference) do not play a role. This said, even if we are able to identify how such environmental factors may increase cancer incidences across the board, to the extent that exposure is unavoidable and secondary prevention non-existent, natural and added environmental effects are both effectively “bad luck”. Thus a slope of 1 indicates that variation within each anatomical site is due to “bad luck”, but we cannot assess the relative impacts of bad luck and the environment between anatomical sites.

Second, our analysis says that simple rules (effects of total number of stem cell divisions and anatomical site) predict incidence with high confidence. This is a surprising result and suggests that having fewer or greater numbers of stem cell divisions for a given anatomical site would put a given individual at lower or greater risk, respectively, for cancer in that site. Our analysis did not statistically investigate how differences in the total number of stem cell divisions between individuals were predictive of cancer risk, but some studies are suggestive of this type of effect (e.g., Roychoudhuri et al. 2006; Kabat et al. 2013). Thus, to the extent that a given individual is potentially more prone to certain cancers based on more expected lifetime stem cell divisions, this can be regarded as an uncontrollable causal factor should the cancer occur (i.e., “bad luck”).

Therefore, we cautiously conclude that if we call “bad luck” our lack of ability (based on for example, life-style changes or chemoprevention) to reduce cancer risk, and Tomasetti and Vogelstein’s data are indeed accurate, then random effects do explain some and possibly most of the variation, when accounting for anatomical site (Noble et al. 2015). Nevertheless, we do know that primary and secondary prevention can reduce probabilities of morbidity and mortality for a range of cancers (e.g., Martin-Moreno et al. 2008). Thus, even if some degree of “bad luck” contributes to explaining variation in cancer incidence, this does not lessen the fact that life-style and active forms of prevention will influence a person’s life-time risk of cancer.

Kabat GC, Anderson ML, Heo M, Hosgood HD, Kamensky V, Bea JW, Hou L, Lane DS, Wactawski-Wende J, Manson JE, Rohan TE. (2013). Adult stature and risk of cancer at different anatomic sites in a cohort of postmenopausal women. Cancer Epidemiol Biomarkers Prev. 22(8):1353-63.

Martin-Moreno JM, Soerjomataram I, Magnusson G. (2008) Cancer causes and prevention: A condensed appraisal in Europe in 2008, European Journal of Cancer 44(10):1390-1403.

Noble R, Kaltz O, Hochberg ME. 2015. Statistical interpretations and new findings on Variation in Cancer Risk Among Tissues. arXiv q-bio arXiv:1502.01061

Roychoudhuri, R, Putcha, V, and Moller, H. (2006). Cancer and laterality: a study of the five major paired organs (UK). Cancer Causes & Control 17(5):655-662

- David Colquhoun says:
  
  April 5, 2015 at 19:13
  
  Thanks very much for that clear exposition. I hope that some version of it will appear in your final paper.
  
  Martin-Moreno et al. (2008) is, of course, all based on associations. If you don’t smoke cigarettes, and drink alcohol in moderation, the other lifestyle effects are small, and causality is dubious. I’d presume, therefore, that for this group at least, most cancers are a matter of bad luck.
  
  - Michael E. Hochberg says:
    
    April 6, 2015 at 03:56
    
    Agreed, indeed to the extent that we could do better controls (RE the Martin-Moreno et al study), we may be able to detect a signal, but still would not necessarily be able to determine causation, nor whether causation is apparent, but other unquantified intervening factors (past events, genetics, etc) were not also ‘causal’. Doll and Peto’s classic work (BMJ 1976; pg 1531, tables IX and X) is more suggestive about the effects of ceasing to smoke. Cuzick et al’s work (Fig 1, Annals of Oncology 26: 47–57, 2015) on aspirin too would appear to be as far as we can go with human subjects in estimating causal effects.
    
    This all said, the results published in our arXiv article are indicative that “bad luck” could play a considerable role in the subset of cancer data we analyzed. Again, we use “bad luck” in a contrasting way to the use in Tomasetti and Vogelstein. In our study, “bad luck” could have environmental components, increasing incidence beyond expectations based on background levels of variation in stem cell number, division rate, and mutational probabilities. This will be clarified in subsequent publications of our work.
    
Ian Johnson says:

April 6, 2015 at 04:50

Excellent post. I have a couple of questions. The original analysis was based, I think, on US incidence rates only. However, some cancers, notably colon and esophagus, show marked variations across populations (at least 10-fold for colon and even higher for squamous cell esophagus within e.g. China). Can these issues be addressed or acknowledged within your analysis? Secondly, from memory, I think colon and small intestine have somewhat similar stem cell divisions but markedly different incidence rates. Any comments?

- Artem Kaznatcheev says:
  
  April 6, 2015 at 11:53
  
  A similar question was asked on my G+ share of this post, and since the discussion is slightly splintered I will try to recreate some of the answer I wrote there. Rob or Mike will probably clarify this further, since this is based on my impression/memory of their work.
  
  When he was presenting their reanalysis, Rob made a good point about the single-patient-type data in relation to skin cancers. The idea is that having single patient types, even weird ones, is actually useful for their analysis because they are breaking things down by tissue type, and different tissues can have different base rates in different environments. So you see the fit with skin cancer in the west, but maybe the base rate is elevated because people generally like to sunbathe. If you looked at some other demographic where sunbathing or exposure was less common then you might see a lower base rate but still the same slope of 1 on the log-log scale. Similar with colon and esophagus cancers, in China you might have a base rate that is higher (so the green and black lines, say, are further to the right on the main plot), but you will still expect the same slope (i.e. still slope 1 on the green and black lines).
  
  I am not 100% sure if this is how Rob is thinking about this, so hopefully he will correct me if I am wrong.
  
  The exciting point for me here, is also that as we move to the stomach, say, we also can get interesting effects from a perspective that is not as reductionist as ‘mutations within individual cells’ by ecological interactions with things like H. pylori. Of course, even in that case, though, it might be still linked to higher rates of stem cell divisions due to inflammation.
  
  - Ian Johnson says:
    
    April 6, 2015 at 15:12
    
    Thanks. It would be interesting to see the effect of introducing epidemiological data for some cancers which do show significant geographical variations into this analysis. Also, on closer inspection I see the two points for large and small intestine, but how close is the slope to 1? I think that, unlike colon, the incidence rates for small bowel are consistently low across populations.
    
    - Rob Noble says:
      
      April 7, 2015 at 07:43
      
      Artem’s reply saves me answering your first question, Ian, as he expresses my argument very well. We agree that it would be interesting to repeat the analysis with data from different populations; this is one of a few follow-up studies we have in mind.
      
      The difference between colon and small intestine is especially interesting, I think. It may be that grouping these two organs into a single subset is invalid, because they have different stem cell populations and different microenvironments. Richard Peto predicted that the pattern he observed between species might also apply to human organs, and he specifically mentioned the small intestine as a site expected to have very powerful anti-cancer mechanisms (see page 13 of this 1979 article: http://libgallery.cshl.edu/items/show/74089). In the published version of our article, I hope we may be able to say a bit more about differences between the colon and the small intestine (and, perhaps, even between the duodenum, jejunum, and ileum).
      
Ian Johnson says:

April 8, 2015 at 03:11

Yes that would be valuable. One of the problems with the T&V paper was the failure to make such distinctions. For example they did not distinguish between squamous cell carcinoma of the esophagus and adenocarcinoma, although the epidemiology differs considerably. Their analysis also sorted esophageal cancer into the group mostly attributable to “bad luck”, and yet the adverse effects of alcohol and tobacco use on the risk of squamous type esophageal cancer seem indisputable to me.

Rob Noble says:

June 12, 2015 at 04:23

An updated version of our analysis has been published as “Peto’s paradox and human cancers” in Philosophical Transactions of the Royal Society B: http://rstb.royalsocietypublishing.org/content/370/1673/20150104.

alistair bain says:

November 11, 2015 at 07:22

Do you think small intestine cancer is rare because of antioxidant enzymes in intestinal cells? There are many more oxidant molecules in the small intestine because of the large number of mitochondria providing energy to process food. Or is DNA polymerase more reliable in intestinal cells? Is DNA polymerase in the intestine flexible, allowing the genes that code for it to mutate but still produce a normal polymerase?

- Rob Noble says:
  
  November 18, 2015 at 06:21
  
  I’m unaware of any such theory, and I lack the expertise to comment, except to note that antioxidants don’t always act against tumours: http://www.scientificamerican.com/article/antioxidants-may-make-cancer-worse/
  
  An alternative reason for lower cancer risk in the small intestine might be the way cells are organised into crypts. For example see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3744108/
  
alistair bain says:

November 11, 2015 at 07:51

It would be interesting to plot the number of mitochondria per cell type versus the cancer incidence. Given the ability of mitochondria to damage DNA with oxygen species, would this be a straight line?
