Exploring data: The difference between data and an anecdote

 

If you only have one data point, then you have an “anecdote”, like the photo below:

The baboon was on the roof of a holiday house, snacking on the red lentils it had found inside (http://vmus.adu.org.za/?vm=MammalMAP-26870)

 

The information we have is that of a single Chacma Baboon Papio ursinus on a date (30 December 2018) at a locality (Bettys Bay, Western Cape, South Africa), engaging in a nefarious activity (housebreaking and theft). This is an anecdote. From this single observation, we cannot draw the conclusion that lone baboons regularly raid holiday homes at Bettys Bay in summer. This is a sample of size one. To make decisions about how and when baboons in Bettys Bay need to be managed, a much larger sample of data is needed.

Sometimes a sample of size one is massively important. It can alert us to a new and emerging issue. There is an awesome paper in Biodiversity Observations by citizen scientists John Fincham and Nollie Lambrechts. It is called “How many tortoises do a pair of Pied Crows Corvus albus need to kill to feed their chicks?” The abstract reads: “This paper presents proof of heavy predation on tortoises by a pair of Pied Crows at a single nest site in order to rear successive broods of chicks.” The operative word is “single”. This is a sample of size one. From this single observation, it would be irresponsible to decide to cull Pied Crows to save the tortoise.

The question in the title of the paper in the ejournal Biodiversity Observations is: “How many tortoises do a pair of Pied Crows Corvus albus need to kill to feed their chicks?”. This is the photograph which contains the answer. There are 315 Angulate Tortoises Chersina angulata in it. The paper is at https://journals.uct.ac.za/index.php/BO/article/view/230. The record of the tortoises is curated in ReptileMAP at http://vmus.adu.org.za/?vm=ReptileMAP-171312. Photo: Nollie Lambrechts

 

The paper by John Fincham and Nollie Lambrechts comes to precisely the correct conclusion by saying: “A comprehensive survey to establish the extent to which this degree of damage is replicated needs to be undertaken urgently.” This is an important “biodiversity observation”. The authors might be onto a real conservation issue for tortoises. But they might equally well have discovered an unusual pair of Pied Crows! You cannot take management action on an anecdote, a sample of size one.

If the sample size is two, then you really just have two anecdotes, two data points. You cannot draw conclusions from a sample of size two. How about three? How large a sample do you need to be able to draw reliable conclusions? How many data points do you need before you can decide whether an intervention is needed? A statistician would talk about “sample size” and denote this unknown number with the letter n.

There is (unfortunately) no straightforward answer to questions about sample size. There is no rule of thumb. Ultimately the answer lies in discovering how variable the thing you are trying to measure is.

Suppose you are a budding astronomer, say in Ancient Egypt, and you want to find out the number of days from one full moon to the next. The answer is very dull: 29½ days. After you have got the same answer repeatedly, it is clear that you got it right first time. All you really needed was a sample of size one. If there is no variability, a sample of size one is adequate. But you cannot know that at first!
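To make the link between variability and sample size concrete, here is a minimal Python sketch. It uses the textbook normal approximation for the number of observations needed so that the sample mean lands within a chosen margin of the true mean about 95% of the time. The function name `required_sample_size`, the margin and the standard deviations are illustrative assumptions, not figures from any study mentioned here. The formula also assumes you already have a rough idea of the standard deviation, which is exactly the thing you do not know at first.

```python
import math


def required_sample_size(std_dev, margin, z=1.96):
    """Approximate n so the sample mean is within `margin` of the true mean.

    Normal approximation: n = (z * std_dev / margin) ** 2, rounded up.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if std_dev < 0:
        raise ValueError("std_dev cannot be negative")
    # With no variability at all, a single observation is enough.
    return max(1, math.ceil((z * std_dev / margin) ** 2))


# Hypothetical standard deviations, all with the same margin of 0.5 units.
for sd in (0.0, 2.0, 20.0):
    print(f"std_dev = {sd:5.1f}  ->  n = {required_sample_size(sd, 0.5)}")
```

With zero variability the function returns 1, which is the full-moon case. A tenfold increase in the standard deviation pushes the required sample size up about a hundredfold, because n grows with the square of the variability.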

 

Nola Parsons measuring an oystercatcher egg at the Koeberg Nuclear Power Station. She is using “dial callipers”; the “ruler” shows that the egg is a little bit more than 60 mm, and the “dial” reads about 1 mm (but you need to look at it from directly above to get an accurate reading, to 0.1 mm). She used the size measurements of the egg and its mass to estimate how long the egg had been incubated for. You can read about this in her PhD thesis, entitled “Quantifying abundance, breeding and behaviour of the African Black Oystercatcher”. Here is a photo of one of her study birds: http://vmus.adu.org.za/?vm=BirdPix-2392

 

The eggs of the African Black Oystercatcher Haematopus moquini are variable in length, so you definitely need to measure more than one egg to get a good handle on average egg length. But this is not a particularly variable characteristic, so once you have measured a small sample, you have a pretty accurate estimate of egg length.

This herd of African Bush Elephants Loxodonta africana is curated at http://vmus.adu.org.za/?vm=MammalMAP-706

 

In contrast to the lengths of oystercatcher eggs, the sizes of African Elephant herds are very variable. So to get a good estimate of “average” herd size, a large sample size is essential.
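The contrast between eggs and herds can be illustrated with a small simulation. The Python sketch below repeatedly draws samples of size n from two made-up populations and reports how much the sample average wobbles from one sample to the next, relative to the average itself. The distributions and their parameters (egg lengths around 60 mm with a standard deviation of 2 mm, and a strongly skewed distribution for herd sizes) are assumptions chosen purely for illustration; they are not measurements from the oystercatcher or elephant data.

```python
import random
import statistics


def relative_spread_of_means(draw, n, repeats=2000, seed=1):
    """Std dev of the sample mean over many samples, divided by the grand mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means) / statistics.fmean(means)


def egg_length(rng):
    # Hypothetical low-variability quantity: mean 60 mm, std dev 2 mm.
    return rng.gauss(60.0, 2.0)


def herd_size(rng):
    # Hypothetical high-variability quantity: skewed lognormal values.
    return rng.lognormvariate(2.3, 0.9)


for n in (5, 20, 80):
    eggs = relative_spread_of_means(egg_length, n)
    herds = relative_spread_of_means(herd_size, n)
    print(f"n = {n:3d}   eggs: {eggs:6.1%}   herds: {herds:6.1%}")
```

Under these assumptions, the average of even a handful of eggs is already stable, while the average herd size is still noticeably unstable at the largest sample size tried. The two hypothetical populations differ only in how variable they are, and that alone drives the difference.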

In this blog, we have learnt that a sample of size one can (usually) be dismissed as an “anecdote”. We have learnt that, as the thing we want to measure gets more variable, we need larger and larger sample sizes to be able to draw conclusions from the data. Most of the time, large variability is a pain, requiring that we get a large sample to estimate the “average” of the thing we want to measure.

In future blogs, we will think about sensible ways to measure the average in a sample of data, and about how we measure variability.

 

 

 

Les Underhill
Prof Les Underhill was Director of the Animal Demography Unit (ADU) at the University of Cape Town from its start in 1991 until he retired. Although citizen science in biology is Les’s passion, his academic background is in mathematical statistics. He was awarded his PhD in abstract multivariate analyses in 1973 at UCT and what he likes to say about his PhD is that he solved a problem that no one has ever had. He soon grasped that this was not the field to which he wanted to devote his life, so he retrained himself as an applied statistician, solving real-world problems.

2 Comments

  1. A very clear explanation about the sample size concept. I have really enjoyed reading this article and with your permission will like to use it to begin during the introductory discussions with the students in my biostatistics class.
