Everyone is aware of the "lifestyle diseases" debate. The backward evolution image is taken from a dietary advice book by Andreas Eenfeldt that claims to debunk the widely held view that fat is bad for you. It exposes the fact that awareness and a sensible response do not necessarily go together.

Witness the "recent revelation" that lack of exercise is twice as dangerous as obesity*. Ten years of data collection from 334,161 individuals, followed by twelve years of follow-up, written up by 48 authors: this is EPIC (no, really! European Prospective Investigation into Cancer and Nutrition). Cambridge were so excited about the exercise component that the University website misidentified the journal as the American Journal of Clinical Exercise (it should be Nutrition). If the study's objective was to make the population healthier, then it does seem a little like using a sledgehammer to crack a nut. Maybe there was a different objective? I will watch McDonald's share price with interest!

EPIC highlight their 6M person-years of data but say little about the quality of the data collected. Nutritional intake assessment is notoriously unreliable even when standardised across a multi-centre database; EPIC's was not, although they did try. The idea that size can overcome imprecision has been around for a long time, although "Big Data" probably wasn't even a term when EPIC began.

If you really want to find big data, a good starting point would be closer to our dairy home. The YouTube video is lovely, but the idea that data sensing from dairy cows could generate 35 petabytes (mega-giga-tera-peta; kilo ceased to exist years ago!) of data a year is pinch-of-salt stuff!

Which brings me to the crux. An eminent colleague and I were discussing the maternal-investment claim that Holstein cattle favour female offspring, i.e. that cows which give birth to a heifer calf have a higher milk yield.
During pregnancy, endocrine signals from the fetus and placenta stimulate the development of the mammary gland, and the authors suggest that the sex of the fetus influences the intensity of these signals. Published last year**, the conclusion was drawn from 2.39M lactation records collected in the USA in the 1990s.

I remember the graduated jars and manual milk yield recording. At one time my father developed a feigned interest in Isaiah 43 because the very pretty milk recorder was a Jehovah's Witness! I also remember our neighbour explaining exactly how he massaged yield records by not milking out selected cows the day before. Which cows did he select? Those whose heifer calves might attract the best price on the strength of their dam's excellent yield!

In the US study, the sex bias disappeared in cows treated with BST. Biology? Possibly, but on the other hand farmer-reported BST use in this study covered less than 4% of lactations, perhaps representing a subset of honest farmers, whereas evidence from BST sales indicates that around 33% of cows were being treated. Nevertheless, I shall add sexed-semen sales to my list of shares to watch!

Big data is attracting so much interest that the EU has decided that www wwwill no longer be able to cope, and has set up a ppp (you get the pppoint!) to develop a new, cloud-based "Future Internet", or FI. Smart AgriFood is one of the sectors identified as most likely to use big data, and a Wageningen-led consortium called FIspace has set up an ABCDEF (wait for it: Agri-Business Collaboration and Data Exchange Facility!) to lead the way.

I am not against big data per se, but I do have problems with the blithe assumption that having millions of data points will render those data reliable. It won't, if there is a systematic fault: more records shrink random error, but they reproduce a systematic bias just as faithfully. We will find ways of using very large datasets properly, probably based on a better knowledge of how the data were obtained and a HACCP-type analysis of where errors might occur.
In the meantime we should perhaps be cautious, which, incidentally, is exactly the view that currently predominates in business advice.
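The point about systematic faults can be illustrated with a small simulation. This is only a sketch under entirely hypothetical assumptions, not anything taken from EPIC or the Hinde et al. data: a true mean daily yield of 30 kg, unbiased random recording noise with a standard deviation of 3 kg, and 10% of records inflated by 2 kg (say, by the not-milking-out trick). The function name simulate_recorded_mean and all of these numbers are illustrative inventions.

```python
import random


def simulate_recorded_mean(n_records, true_mean=30.0, noise_sd=3.0,
                           biased_fraction=0.1, bias=2.0, seed=1):
    """Mean of n simulated daily yield records (kg), some systematically inflated.

    All parameters are hypothetical illustrations, not real recording data.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(n_records):
        # Random recording error: unbiased, so it averages out with more records.
        record = rng.gauss(true_mean, noise_sd)
        # Systematic fault: a fixed share of records is inflated (hypothetical).
        if rng.random() < biased_fraction:
            record += bias
        total += record
    return total / n_records


if __name__ == "__main__":
    expected_offset = 0.1 * 2.0  # biased_fraction * bias, in kg/day
    for n in (100, 10_000, 1_000_000):
        error = simulate_recorded_mean(n) - 30.0
        print(f"{n:>9} records: error {error:+.3f} kg "
              f"(systematic part {expected_offset:.2f} kg)")
```

As the number of records grows, the random scatter in the estimate shrinks, but the estimate should settle close to 30.2 kg (30 kg plus 10% of 2 kg) rather than the true 30 kg: more data makes the wrong answer more precise, not more correct.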
* Ekelund et al., doi: 10.3945/ajcn.114.100065.
** Hinde et al., doi: 10.1371/journal.pone.0086169.