Understanding Statistical Error



This accessible introductory textbook provides a straightforward, practical explanation of how statistical analysis and error measurements should be applied in biological research.

Understanding Statistical Error - A Primer for Biologists:

Introduces the essential topic of error analysis to biologists
Contains mathematics at a level that all biologists can grasp
Presents the formulas required to calculate each confidence interval for use in practice
Is based on a successful series of lectures from the author's established course
Assuming no prior knowledge of statistics, this book covers the central topics needed for efficient data analysis, ranging from probability distributions, statistical estimators, confidence intervals, error propagation and uncertainties in linear regression, to advice on how to use error bars in graphs properly. Using simple mathematics, all these topics are carefully explained and illustrated with figures and worked examples. The emphasis throughout is on visual representation and on helping the reader to approach the analysis of experimental data with confidence.

This useful guide explains how to evaluate uncertainties of key parameters, such as the mean, median, proportion and correlation coefficient. Crucially, the reader will also learn why confidence intervals are important and how they compare against other measures of uncertainty.

Understanding Statistical Error - A Primer for Biologists can be used both by students and researchers to deepen their knowledge and find practical formulae to carry out error analysis calculations. It is a valuable guide for students, experimental biologists and professional researchers in biology, biostatistics, computational biology, cell and molecular biology, ecology, biological chemistry, drug discovery, biophysics, as well as wider subjects within life sciences and any field where error analysis is required.


Dr Marek Gierlinski is a bioinformatician at the College of Life Science, University of Dundee, UK. He attained his PhD in astrophysics and studied X-ray emission from black holes and neutron stars for many years. In 2009 he started a new career in bioinformatics, bringing his knowledge and skills in statistics and data analysis to a biological institute. He works on a variety of topics, including proteomics, DNA and RNA sequencing, imaging and numerical modelling.

Reading sample
Chapter 2
Probability distributions

Misunderstanding of probability may be the greatest of all impediments to scientific literacy.

-Stephen Jay Gould

Consider an experiment in which we determine the number of viable bacteria in a sample. To do this, we can use a simple technique of dilution plating. The sample is diluted in five consecutive steps, and each time the concentration is reduced 10-fold. After the final step, we achieve a dilution of 10⁻⁵. The diluted sample is then spread on a Petri dish and cultured in conditions appropriate for the bacteria. Each colony on the plate corresponds to one bacterium in the diluted sample. From this, we can estimate the number of bacteria in the original, undiluted sample.

Now, think of exactly the same experiment, repeated six times under the same conditions. Let us assume that in these six replicates, we found the following numbers of bacterial colonies: 5, 3, 3, 7, 3 and 9. What can we say about these results?

We notice that replicated experiments give different results. This is an obvious thing for an experimental biologist, but can we express it in more strict, mathematical terms? Well, we can interpret these counts as realizations of a random variable. But not just any completely random variable. This variable would follow a certain law, a Poisson law in this case. We can estimate and theoretically predict its probability distribution. We can use this knowledge to predict future results from similar experiments. We can also estimate the uncertainty, or error, of each result.
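
As a rough illustration of this idea, the sketch below treats the six colony counts as draws from a Poisson distribution. It is not taken from the book. It assumes that the Poisson mean can be estimated by the sample mean of the replicates, and that the plated volume stands in for the same volume of undiluted sample when scaling back by the five 10-fold dilutions. The names `poisson_pmf`, `mean_count` and `estimated_original` are hypothetical.

```python
import math

# Colony counts from the six replicates quoted in the text.
counts = [5, 3, 3, 7, 3, 9]


def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the Poisson mean is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


# Assumption: estimate the Poisson mean by the sample mean of the replicates.
mean_count = sum(counts) / len(counts)

# For a Poisson variable the variance equals the mean, so the standard
# deviation of a single count is the square root of the mean.
sd_count = math.sqrt(mean_count)

# Five consecutive 10-fold dilutions give an overall factor of 10**5.
# Assumption: the plated volume represents the same volume of original sample.
dilution_factor = 10**5
estimated_original = mean_count * dilution_factor

print(f"mean count per plate: {mean_count:.1f}")
print(f"Poisson standard deviation of one count: {sd_count:.2f}")
print(f"estimated bacteria in the undiluted sample: {estimated_original:.0f}")

# Predicted probability of each possible colony count in a future replicate.
for k in range(11):
    print(f"P(count = {k:2d}) = {poisson_pmf(k, mean_count):.3f}")
```

The printed probabilities show which counts a future replicate is likely to give under these assumptions. The square root of the mean gives a first idea of the scatter to expect between plates.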

Firstly, I'm going to introduce the concept of a random variable and a probability distribution. These two are very closely related. Later in this chapter, I will show examples of a few important probability distributions, without which it would be difficult to understand error analysis.
2.1 Random variables

I will not go into gory technical details. A random variable is a mathematical concept, and it has a formal definition. For the purpose of this book, let us say that a random variable can take random values. It sounds a bit tautological, but this is probably the simplest possible definition. In practice, a random variable is a result of an experiment. Its randomness manifests itself in the differing values of repeated measurements of the same quantity. It is quite common that each time you make your measurement, you obtain a different number.

A random variable is a numerical outcome of an experiment. It will vary from trial to trial as the experiment is repeated.

Consider this example. Let us throw two dice and calculate the sum of the numbers shown. This can be any number between 2 and 12. More importantly, some results are more likely than others. For example, there is only one way of getting a 12 (a double 6), but there are five different combinations resulting in the sum of 6 (1+5, 2+4, 3+3, 4+2 and 5+1). It is easy to see that throwing a 6 is five times more likely than throwing a 12.
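
The counting argument for the two dice can be checked by listing every outcome. The sketch below is not from the book. It assumes two fair six-sided dice, so that all 36 ordered pairs are equally likely, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of throwing two six-sided dice
# and count how many of them give each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(ways.values())

# Probability of each sum as an exact fraction.
probability = {s: Fraction(n, total) for s, n in sorted(ways.items())}

for s, p in probability.items():
    print(f"sum {s:2d}: {ways[s]} way(s), probability {p}")

# Ratio of the probability of a sum of 6 to that of a sum of 12.
print("P(6) / P(12) =", probability[6] / probability[12])
```

The table of sums and probabilities produced here is a simple example of the probability distribution of a discrete random variable.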

An example of a non-random variable could be the number of mice used in an experiment. If you have five mice, you have five mice and the result stays the same unless you drink too much whisky and begin to see little white mice everywhere.

Hold on. In Chapter 1, I showed an example of a repeated measurement that gave a different value each time. So, what is going to happen if you repeat your murine experiment many times? Well, if you come back to the cage after a minute, you are quite likely to find five mice again (unless you forgot to lock the cage). The result is not going to change regardless of how many times you count them. This type of repeated measurement is called pseudoreplication.

More about replication and pseudoreplication in Section 5.11.

But this is not what we are asking about. Typically, you would be conducting an experiment (e.


Introduction
Why would you read an introduction?
What is this book about?
Who is this book for?
About maths
Acknowledgements

Chapter 1 Why do we need to evaluate errors?

Chapter 2 Probability distributions
2.1 Random variables
2.2 What is a probability distribution?
    Probability distribution of a discrete variable
    Probability distribution of a continuous variable
    Cumulative probability distribution
2.3 Mean, median, variance and standard deviation
2.4 Gaussian distribution
    Example: estimate an outlier
2.5 Central limit theorem
2.6 Log-normal distribution
2.7 Binomial distribution
2.8 Poisson distribution
    Classic example: horse kicks
    Inter-arrival times
2.9 Student's t-distribution
2.10 Exercises

Chapter 3 Measurement errors
3.1 Where do errors come from?
    Systematic errors
    Random errors
3.2 Simple model of random measurement errors
3.3 Intrinsic variability
3.4 Sampling error
    Sampling in time
3.5 Simple measurement errors
    Reading error
    Counting error
3.6 Exercises

Chapter 4 Statistical estimators
4.1 Population and sample
4.2 What is a statistical estimator?
4.3 Estimator bias
4.4 Commonly used statistical estimators
    Mean
    Weighted mean
    Geometric mean
    Median
    Standard deviation
    Unbiased estimator of standard deviation
    Mean deviation
    Pearson's correlation coefficient
    Proportion
4.5 Standard error
4.6 Standard error of the weighted mean
4.7 Error in the error
4.8 Degrees of freedom
4.9 Exercises

Chapter 5 Confidence intervals
5.1 Sampling distribution
5.2 Confidence interval: what does it really mean?
5.3 Why 95%?
5.4 Confidence interval of the mean
    Example
5.5 Standard error versus confidence interval
    How many standard errors are in a confidence interval?
    What is the confidence of the standard error?
5.6 Confidence interval of the median
    Simple approximation
    Example
5.7 Confidence interval of the correlation coefficient
    Significance of correlation
5.8 Confidence interval of a proportion
5.9 Confidence interval for count data
    Simple approximation
    Errors on count data are not integers
5.10 Bootstrapping
5.11 Replicates
    Sample size to find the mean
5.12 Exercises

Chapter 6 Error bars
6.1 Designing a good plot
    Elements of a good plot
    Lines in plots
    A digression on plot labels
    Logarithmic plots
6.2 Error bars in plots
    Various types of errors
    How to draw error bars
    Box plots
    Bar plots
    Pie charts
    Overlapping error bars
6.3 When can you get away without error bars?
    On a categorical variable
    When presenting raw data
    Large groups of data points
    When errors are small and negligible
    Where errors are not known
6.4 Quoting numbers and errors
    Significant figures
    Writing significant figures
    Errors and significant figures
    Error wi...


Product details

Understanding Statistical Error
A Primer for Biologists
eBook (epub)
Digital copy protection
Adobe DRM
File size
4.39 MB