Exploring data: The difference between data and an anecdote

If you only have one data point, then you have an “anecdote”. Like the photo below:

*The baboon was on the roof of a holiday house, snacking on the red lentils it had found inside (http://vmus.adu.org.za/?vm=MammalMAP-26870)*

The information we have is that of a single Chacma Baboon Papio ursinus on a date (30 December 2018) at a locality (Bettys Bay, Western Cape, South Africa), engaging in a nefarious activity (housebreaking and theft). This is an anecdote. From this single observation, we cannot draw the conclusion that lone baboons regularly raid holiday homes at Bettys Bay in summer. This is a sample of size one. To make decisions about how and when baboons in Bettys Bay need to be managed, a much larger sample of data is needed.

Sometimes a sample of size one is massively important. It can alert us to a new and emerging issue. There is an awesome paper in Biodiversity Observations by citizen scientists John Fincham and Nollie Lambrechts. It is called “How many tortoises do a pair of Pied Crows Corvus albus need to kill to feed their chicks?” The abstract reads: “This paper presents proof of heavy predation on tortoises by a pair of Pied Crows at a single nest site in order to rear successive broods of chicks.” The operative word is “single”. This is a sample of size one. From this single observation, it would be irresponsible to decide to cull Pied Crows to save the tortoise.

The question in the title of the paper in the ejournal Biodiversity Observations is: “How many tortoises do a pair of Pied Crows Corvus albus need to kill to feed their chicks?”. This is the photograph which contains the answer. There are 315 Angulate Tortoises Chersina angulata in it. The paper is at https://journals.uct.ac.za/index.php/BO/article/view/230. The record of the tortoises is curated in ReptileMAP at http://vmus.adu.org.za/?vm=ReptileMAP-171312. Photo: Nollie Lambrechts

The paper by John Fincham and Nollie Lambrechts comes to precisely the correct conclusion by saying: “A comprehensive survey to establish the extent to which this degree of damage is replicated needs to be undertaken urgently.” This is an important “biodiversity observation”. The authors might be onto a real conservation issue for tortoises. But they might equally well have discovered an unusual pair of Pied Crows! You cannot take management action on an anecdote, a sample of size one.

If the sample size is two, then you really just have two anecdotes, two data points. You cannot draw conclusions from a sample of size two. How about three? How large a sample do you need to be able to draw reliable conclusions? How many data points do you need before you can decide whether an intervention is needed? A statistician would talk about “sample size” and denote this unknown number with the letter n.

There is (unfortunately) no straightforward answer to questions about sample size. There is no rule of thumb. Ultimately the answer lies in discovering how variable the thing you are trying to measure is.
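To make the link between variability and sample size concrete, here is a minimal sketch in Python. It uses a standard textbook approximation for estimating an average to within a chosen margin of error at roughly 95% confidence, assuming the measurements are roughly normally distributed. The function name and all the numbers are made up for illustration. The catch is that the formula needs the standard deviation (a measure of variability), which is exactly the thing you do not know before you start collecting data.

```python
import math


def required_sample_size(sd, margin, z=1.96):
    """Approximate sample size needed to estimate a mean to within +/- margin.

    sd     : standard deviation of the thing being measured (assumed known,
             which in practice it is not; it must be guessed or estimated)
    margin : the margin of error we are willing to accept, in the same units
    z      : 1.96 corresponds to roughly 95% confidence under a normal
             approximation
    """
    n = math.ceil((z * sd / margin) ** 2)
    # Even with no variability at all, we still need one observation.
    return max(1, n)


# Hypothetical values, purely for illustration.
print(required_sample_size(sd=0, margin=1))   # no variability at all
print(required_sample_size(sd=2, margin=1))   # not very variable
print(required_sample_size(sd=10, margin=1))  # very variable
```

The required sample size grows with the square of the variability: if the thing you are measuring is five times as variable, you need about 25 times as many data points for the same precision.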

Suppose you are a budding astronomer, say in Ancient Egypt, and you want to find out the number of days from one full moon to the next. The answer is very dull: 29½ days. After you have got the same answer repeatedly, it is clear that you got it right first time. All you really needed was a sample of size one. If there is no variability, a sample of size one is adequate. But you cannot know that at first!

Nola Parsons measuring an oystercatcher egg at the Koeberg Nuclear Power Station. She is using “dial callipers”; the “ruler” shows that the egg is a little bit more than 60 mm, and the “dial” reads about 1 mm (but you need to look at it from directly above to get an accurate reading, to 0.1 mm). She used the size measurements of the egg and its mass to estimate how long the egg had been incubated for. You can read about this in her PhD thesis, which is entitled Quantifying abundance, breeding and behaviour of the African Black Oystercatcher. Here is a photo of one of her study birds: http://vmus.adu.org.za/?vm=BirdPix-2392

The eggs of the African Black Oystercatcher Haematopus moquini are variable in length, so you definitely need to measure more than one egg to get a good handle on average egg length. But this is not a particularly variable characteristic, so once you have measured a small sample, you have a pretty accurate estimate of egg length.

*This herd of African Bush Elephants Loxodonta africana is curated at http://vmus.adu.org.za/?vm=MammalMAP-706*

In contrast to the lengths of oystercatcher eggs, the sizes of African Elephant herds are very variable. So to get a good estimate of “average” herd size, a large sample size is essential.
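The contrast between eggs and herds can be explored with a small simulation, sketched here in Python. It repeatedly draws samples of a given size and looks at how much the sample averages jump around from one sample to the next. The averages and spreads used below are invented for illustration and are not real measurements of oystercatcher eggs or elephant herds. Drawing values from a normal distribution is also a simplification (real herd sizes are whole numbers and cannot be negative).

```python
import random
import statistics


def spread_of_sample_means(true_mean, true_sd, n, repeats=2000, seed=1):
    """Draw `repeats` samples of size n; return the SD of the sample means.

    A small result means that any one sample of size n gives an average
    close to the true average; a large result means the estimate is shaky.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.stdev(means)


# Hypothetical (made-up) values: (average, standard deviation).
EGG_LENGTH_MM = (60.0, 2.0)   # not very variable
HERD_SIZE = (12.0, 10.0)      # very variable

for n in (5, 20, 80):
    egg = spread_of_sample_means(*EGG_LENGTH_MM, n)
    herd = spread_of_sample_means(*HERD_SIZE, n)
    print(f"n = {n:2d}: egg length average varies by about {egg:.2f} mm, "
          f"herd size average by about {herd:.2f} animals")
```

With these invented numbers, the average egg length settles down with a handful of eggs, while the average herd size is still wobbling noticeably at the same sample size. In both cases the wobble shrinks as n grows, but slowly: roughly in proportion to one over the square root of n.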

In this blog, we have learnt that a sample of size one can (usually) be dismissed as an “anecdote”. We have learnt that, as the thing we want to measure gets more variable, we need larger and larger sample sizes to be able to draw conclusions from the data. Most of the time, large variability is a pain, requiring that we get large samples to estimate the “average” of the thing we want to measure.

In future blogs, we will think about sensible ways to measure the average in a sample of data, and about how we measure variability.

2 Comments

Sam Ivande says:

27/06/2019 at 13:01

A very clear explanation about the sample size concept. I have really enjoyed reading this article and with your permission will like to use it to begin during the introductory discussions with the students in my biostatistics class.

admin says:

27/06/2019 at 17:29

Thank you, Sam. You (and everyone else) are welcome to use this in any way. This is going to be part of a series, so please keep coming back. There is already one on means and medians (http://thebdi.org/blog/2019/06/26/exploring-data-the-median-and-the-mean-and-everything-in-between/)
