Accuracy, Precision, Uncertainty, and Error
All measurements of physical quantities are subject to uncertainty. It is not possible to measure anything exactly. Of course, steps can be taken to limit the amount of uncertainty, but it is always there. In order to interpret data correctly and draw valid conclusions, the uncertainty must be indicated and dealt with properly.
Consider the measurement of a person's height. Assume that a woman's height has been determined to be 5' 8" by a measurement. How accurate is this measurement? Well, the height of this person depends on how straight she stands, whether she just got up (most people are slightly taller when getting up from a long rest in a horizontal position), whether she has her shoes on, and the length of her hair and how it is styled. These inaccuracies can all be called errors of definition. A quantity such as height cannot be exactly defined without specifying many other circumstances. Even if the "circumstances" could be precisely controlled, the result would still have an error associated with it. This is because the scale was manufactured with a certain level of quality, it is often difficult to read the scale perfectly, fractional estimations between scale markings may be made, and so on.
For the result of a measurement to have clear meaning, the value cannot consist of the measured value alone. An indication of how precise and accurate the result is must also be included. Thus, the result of any physical measurement has two essential components: (1) a numerical value (in a specified system of units) giving the best estimate possible of the quantity measured, and (2) the degree of uncertainty associated with this estimated value. For example, a measurement of the width of a table might yield a result such as 95.3 +/- 0.1 cm. This result communicates that the person making the measurement believes the value to be closest to 95.3 cm, but it could have been 95.2 cm or 95.4 cm.
The first step in communicating the results of a measurement or group of measurements is to understand the terminology related to measurement quality. It can be confusing, partly because some of the terms have subtle differences and partly because the terminology is often used wrongly and inconsistently. Over the last several decades, some progress has been made in lessening the confusion, partly due to the development of standard terminology by the International Organization for Standardization (ISO). The following material is provided in an attempt to help further reduce the confusion, since using the proper terminology is key to ensuring that results are properly communicated. The terminology addressed includes accuracy, error, trueness, bias, precision, repeatability, reproducibility and uncertainty.
Accuracy and Error
Accuracy is the closeness of agreement between a measured value and the true value. Since the true value cannot be absolutely determined, in practice an accepted reference value is used. Error is the difference between a measurement and the true value of the measurand (the quantity being measured). The total error is usually composed of both random error and systematic error. Random error is a component of the total error which, in the course of a number of measurements, varies in an unpredictable way. It is not possible to correct for random error. Systematic error is a component of the total error which, in the course of a number of measurements, remains constant or varies in a predictable way. Systematic errors and their causes may be known or unknown and can often be corrected for.
Trueness and Bias
Trueness is the closeness of agreement between the average value obtained from a large series of test results and an accepted reference value. The terminology is very similar to that used in accuracy but trueness applies to the average value of a large number of measurements. Bias is the difference between the average value of the large series of measurements and the accepted reference value. Bias is equivalent to the total systematic error in the measurement and a correction to negate the systematic error can be made by adjusting for the bias.
Precision, Repeatability and Reproducibility
Precision is the closeness of agreement between independent measurements of a quantity under the same conditions. It is a measure of how well a measurement can be made without reference to a theoretical or true value. The number of divisions on the scale of the measuring device generally affects the consistency of repeated measurements and, therefore, the precision. Since precision is not based on a true value there is no bias or systematic error in the value, but instead it depends only on the distribution of random errors. The precision of a measurement is usually indicated by the uncertainty or fractional relative uncertainty of a value.
Repeatability is simply the precision determined under conditions where the same methods and equipment are used by the same operator to make measurements on identical specimens. Reproducibility is simply the precision determined under conditions where the same methods but different equipment are used by different operators to make measurements on identical specimens.
Uncertainty
Uncertainty is the component of a reported value that characterizes the range of values within which the true value is asserted to lie. An uncertainty estimate incorporates uncertainties from all possible effects and, therefore, is usually the most appropriate means of expressing the accuracy and precision of results.
The first step in writing a meaningful value is to make sure the number is expressed with the proper number of significant figures. (See page on significant figures for more information.)
There are also specific rules for how to consistently express the uncertainty associated with a number. In general, the last significant figure in any result should be of the same order of magnitude (i.e., in the same decimal position) as the uncertainty. Also, the uncertainty should be rounded to one or two significant figures. Always work out the uncertainty after finding the number of significant figures for the actual measurement. The uncertainty of each of the following three measurements is expressed correctly.
9.82 +/- 0.02
10.0 +/- 1.5
4 +/- 1
The uncertainty of each of the following measurements is expressed incorrectly; a corrected form is shown alongside.
9.82 +/- 0.02385 is wrong but 9.82 +/- 0.02 is fine
10.0 +/- 2 is wrong but 10.0 +/- 2.0 is fine
4 +/- 0.5 is wrong but 4.0 +/- 0.5 is fine
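The rounding convention above can be automated. The following is a minimal sketch (the helper name round_uncertainty is our own, not a standard function) that rounds an uncertainty to a chosen number of significant figures and then rounds the value to the same decimal position:

```python
import math

def round_uncertainty(value, uncertainty, sig_figs=1):
    """Round the uncertainty to sig_figs significant figures and
    round the value to the same decimal position."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Decimal position of the uncertainty's last kept digit
    exponent = math.floor(math.log10(uncertainty)) - (sig_figs - 1)
    factor = 10.0 ** exponent
    u = round(uncertainty / factor) * factor
    v = round(value / factor) * factor
    # Digits to print after the decimal point (0 if exponent >= 0)
    decimals = max(0, -exponent)
    return f"{v:.{decimals}f} +/- {u:.{decimals}f}"

print(round_uncertainty(9.82, 0.02385))           # "9.82 +/- 0.02"
print(round_uncertainty(10.04, 1.53, sig_figs=2)) # "10.0 +/- 1.5"
print(round_uncertainty(4.2, 1.1))                # "4 +/- 1"
```

Note that the helper reproduces the three correctly expressed examples above.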
The Idea of Error
The concept of error needs to be well understood. What is and what is not meant by "error"? A measurement may be made of a quantity that has an accepted value which can be looked up in a handbook (e.g., the density of brass). The difference between the measurement and the accepted value is not what is meant by error. Such accepted values are not "right" answers; they are measurements that have errors associated with them as well. Furthermore, it cannot be determined exactly how far off a measurement is; if this could be done, it would be possible to just give a more accurate, corrected value. Nor does error mean "blunder." Reading a scale backwards, misreading a value, or using equipment that is not working properly are blunders; they can be caught and explained as mistakes, and the affected data should be excluded from the data set.
Error, then, has to do with uncertainty in measurements that simply cannot be eliminated, or that is impractical to eliminate for economic or other reasons. Error is what causes values to differ when a measurement is repeated and none of the results can be preferred over the others. Although it is not possible to do anything about such error, it can be characterized. For instance, the repeated measurements may cluster tightly together or they may spread widely. This pattern can be analyzed systematically. Often, more effort goes into determining the error or uncertainty in a measurement than into performing the measurement itself.
Classification of Error
Generally, errors can be divided into two broad classes: systematic and random. Systematic errors are those that tend to shift all measurements in a systematic way, so that their mean value is displaced. This may be due to such things as incorrect calibration of equipment, consistently improper use of equipment, or failure to properly account for some effect. In a sense, a systematic error is rather like a blunder, and large systematic errors can and must be eliminated in a good experiment. But small systematic errors will always be present. For instance, no instrument can ever be calibrated perfectly. Other sources of systematic error are external effects which can change the results of the experiment, but for which the corrections are not well known. In science, the reason why several independent confirmations of experimental results are often required (especially using different techniques) is that different apparatus at different places may be affected by different systematic effects. Aside from outright mistakes (such as thinking one is using the x10 scale while actually using the x100 scale), the reason why experiments sometimes yield results far outside the quoted errors is systematic effects that were not accounted for.
Random errors are errors that fluctuate from one measurement to the next. They yield results distributed about some mean value. Random errors can occur for a variety of reasons such as:
- Lack of equipment sensitivity. An instrument may not be able to respond to or indicate a change in some quantity that is too small, or the observer may not be able to discern the change.
- Noise in the measurement. Noise consists of extraneous disturbances that are unpredictable or random and cannot be completely accounted for.
- Imprecise definition. It is difficult to exactly define the dimensions of an object. For example, it is difficult to determine the ends of a crack when measuring its length. Two people may well pick different starting and ending points.
Random errors displace measurements in an arbitrary direction whereas systematic errors displace measurements in a single direction. Some systematic error can be substantially eliminated (or properly taken into account). Random errors are unavoidable and must be lived with.
Many times results are quoted with two errors. The first error quoted is usually the random error, and the second is the systematic error. If only one error is quoted, it is the combined error (combined in quadrature, as described in the section on propagation of errors).
A good example of random error is the statistical error associated with sampling or counting. For example, consider radioactive decay, which occurs randomly at some (average) rate. If a sample has, on average, 1000 radioactive decays per second, then the expected number of decays in 5 seconds is 5000. A particular measurement in a 5-second interval will, of course, vary from this average, but it will generally yield a value within 5000 +/- 71 decays, since the square root of 5000 is about 70.7. Behavior like this, where the error Δn in a count equals the square root of the expected count n,

Δn = √n,

is called a Poisson statistical process. Typically, if the expected count is not known, it is assumed that Δn = √n, where n is the measured count, in order to estimate this error.
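The square-root behavior of counting statistics can be checked by simulation. This sketch (plain Python, no external libraries; the poisson helper is our own implementation of Knuth's algorithm) simulates many 5-second counting intervals and compares the spread of the counts with √5000:

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson-distributed count via Knuth's algorithm
    (valid only for modest lam, where exp(-lam) does not underflow)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1

# Simulate 200 five-second intervals at 1000 decays per second.
# A Poisson(5000) count is the sum of 5000 independent Poisson(1) counts.
rng = random.Random(0)
counts = [sum(poisson(1.0, rng) for _ in range(5000)) for _ in range(200)]
mean = statistics.fmean(counts)
spread = statistics.stdev(counts)
print(f"mean = {mean:.0f}, spread = {spread:.1f}, sqrt(5000) = {math.sqrt(5000):.1f}")
```

The printed spread should come out close to 70.7, the square root of the expected count.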
A. Mean Value
Suppose an experiment were repeated many, say N, times to get N measurements of the same quantity, x: x1, x2, ..., xN. If the errors were random, the errors in these results would differ in sign and magnitude. So if the average or mean value of the measurements is calculated,

x̄ = (x1 + x2 + ... + xN)/N,

some of the random variations can be expected to cancel out with others in the sum. This is the best that can be done to deal with random errors: repeat the measurement many times, varying as many "irrelevant" parameters as possible, and use the average as the best estimate of the true value of x. (It should be pointed out that this estimate for a given N will differ from the true mean value, which is the limit as N goes to infinity; though, of course, for larger N it will be closer to that limit.) In the case of the previous example: measure the height at different times of day, using different scales, with different helpers to read the scale, etc.
Doing this should give a result with less error than any of the individual measurements. But it is obviously expensive, time consuming and tedious. So, eventually one must compromise and decide that the job is done. Nevertheless, repeating the experiment is the only way to gain confidence in and knowledge of its accuracy. In the process an estimate of the deviation of the measurements from the mean value can be obtained.
B. Measuring Error
There are several different ways to specify the distribution of the measured values of a repeated experiment such as the one discussed above.
The maximum and minimum values of the data set, x_max and x_min, could be specified. In these terms, the quantity

d_max = (x_max − x_min)/2

is the maximum error. And virtually no measurement should ever fall outside x̄ +/- d_max.
The probable error, PE, specifies the range x̄ +/- PE which contains 50% of the measured values.
The average deviation is the average of the absolute deviations from the mean,

d = (|x1 − x̄| + |x2 − x̄| + ... + |xN − x̄|)/N.

For a Gaussian distribution of the data, about 58% of the measurements will lie within x̄ +/- d.
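The 58% figure can be checked numerically. A short sketch (plain Python, arbitrary seed and sample size) draws a large Gaussian sample and measures the fraction of points lying within one average deviation of the mean:

```python
import random
import statistics

# Draw a large Gaussian sample and check what fraction of points
# lies within one average deviation of the mean (~58% for a Gaussian).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
mean = statistics.fmean(data)
avg_dev = statistics.fmean(abs(x - mean) for x in data)
inside = sum(1 for x in data if abs(x - mean) <= avg_dev) / len(data)
print(f"average deviation = {avg_dev:.3f}, fraction within +/- avg_dev = {inside:.3f}")
```

For a unit Gaussian the average deviation itself comes out near 0.80 standard deviations, and the inside fraction near 0.575.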
For the data to have a Gaussian distribution means that the probability of obtaining the result x is

P(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),

where μ is the most probable value and σ, which is called the standard deviation, determines the width of the distribution. Because of the central limit theorem, this assumption tends to be valid for random errors. And so it is common practice to quote error in terms of the standard deviation of a Gaussian distribution fit to the observed data distribution. This is the way you should quote error in your reports.
It is just as wrong to indicate an error which is too large as one which is too small. In the measurement of the height of a person, we would reasonably expect the error to be +/-1/4" if a careful job was done, and maybe +/-3/4" if we did a hurried sample measurement. Certainly saying that a person's height is 5' 8.250"+/-0.002" is ridiculous (a single jump will compress your spine more than this) but saying that a person's height is 5' 8"+/- 6" implies that we have, at best, made a very rough estimate!
C. Standard Deviation
The mean is the most probable value of a Gaussian distribution. In terms of the mean, the standard deviation of any distribution is

σ = √( (1/N) Σ (xi − μ)² ).

The quantity σ², the square of the standard deviation, is called the variance. The best estimate of the true standard deviation is

s = √( (1/(N − 1)) Σ (xi − x̄)² ).

The reason why we divide by N to get the best estimate of the mean, but only by N − 1 for the best estimate of the standard deviation, needs to be explained. The true mean value μ is not available for calculating the variance; only the average x̄ of the measurements can be used as the best estimate of it. Thus, the sum of squared deviations calculated about x̄ is always a little smaller than the sum calculated about μ, the quantity really wanted. In the theory of probability (that is, using the assumption that the data have a Gaussian distribution), it can be shown that this underestimate is corrected by using N − 1 instead of N.
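The correction can be demonstrated numerically: draw many small samples from a Gaussian with known variance and compare the two estimators. (This is a sketch; the sample size and trial count are arbitrary.)

```python
import random

# Draw many samples of size N = 5 from a Gaussian with true sigma = 1
# and compare dividing by N versus N - 1 in the variance estimate.
rng = random.Random(2)
N, trials = 5, 20_000
biased_sum = 0.0
unbiased_sum = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)
    biased_sum += ss / N          # divides by N: systematically low
    unbiased_sum += ss / (N - 1)  # divides by N - 1
print(f"divide by N:     {biased_sum / trials:.3f}")   # tends toward 0.8
print(f"divide by N - 1: {unbiased_sum / trials:.3f}") # tends toward 1.0
```

Dividing by N underestimates the true variance by the factor (N − 1)/N, which for N = 5 is 0.8; dividing by N − 1 removes the bias.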
If one made one more measurement of x, then (this is also a property of a Gaussian distribution) it would have a 68% probability of lying within x̄ +/- s. Note that this means that about 32% of all measurements will differ from the mean by more than one standard deviation!
However, we are also interested in the error of the mean, which is smaller than s if there are several measurements. An exact calculation yields

σ_m = s/√N

for the standard error of the mean. This means that, for example, if there were 20 measurements, the error on the mean itself would be √20 = 4.47 times smaller than the error of each measurement. The number to report for this series of N measurements of x is x̄ +/- σ_m, where σ_m = s/√N. The meaning of this is that if the N measurements of x were repeated, there would be a 68% probability that the new mean value would lie within x̄ +/- σ_m (that is, between x̄ − σ_m and x̄ + σ_m). Note that this also means that there is a 32% probability that it will fall outside this range. That is, out of 100 experiments of this type, on average, 32 experiments will obtain a value outside the standard errors.
For a Gaussian distribution, there is a 5% probability that the true value is outside the range x̄ +/- 2σ_m, i.e., twice the standard error, and only a 0.3% chance that it is outside the range x̄ +/- 3σ_m.
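The 68% coverage of the standard error can also be verified by simulation. This sketch (arbitrary true value, sigma, and sample size) repeats a 20-measurement experiment many times and counts how often the sample mean lands within one standard error of the truth:

```python
import math
import random
import statistics

# Check that the mean of N measurements scatters about the true value
# with width sigma/sqrt(N): roughly 68% of repeated runs should land
# within one standard error of the truth.
rng = random.Random(3)
true_value, sigma, N = 10.0, 2.0, 20
std_err = sigma / math.sqrt(N)
runs = 10_000
hits = 0
for _ in range(runs):
    m = statistics.fmean(rng.gauss(true_value, sigma) for _ in range(N))
    if abs(m - true_value) <= std_err:
        hits += 1
print(f"fraction within one standard error: {hits / runs:.3f}")
```

The printed fraction should come out near 0.68.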
Suppose the number of cosmic ray particles passing through some detecting device every hour is measured nine times and the results are those in the following table.

  i      x     (x − x̄)²
 ------------------------
  1     80       400
  2     95        25
  3    100         0
  4    110       100
  5     90       100
  6    115       225
  7     85       225
  8    120       400
  9    105        25
 ------------------------
  Σ    900      1500

Thus we have x̄ = 900/9 = 100 and s² = 1500/8 = 188, or s = 14. Then the probability that one more measurement of x will lie within 100 +/- 14 is 68%.

The value to be reported for this series of measurements is 100 +/- (14/3), or 100 +/- 5. If one were to make another series of nine measurements of x, there would be a 68% probability that the new mean would lie within the range 100 +/- 5.

Random counting processes like this example obey a Poisson distribution, for which s = √x̄. So one would expect the value of s to be √100 = 10. This is somewhat less than the value of 14 obtained above, indicating either that the process is not quite random or, what is more likely, that more measurements are needed.
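The arithmetic for the cosmic ray table can be reproduced with Python's statistics module (statistics.stdev divides by N − 1, as discussed above):

```python
import math
import statistics

counts = [80, 95, 100, 110, 90, 115, 85, 120, 105]  # the nine hourly counts
mean = statistics.fmean(counts)            # 900/9 = 100
s = statistics.stdev(counts)               # sqrt(1500/8), about 13.7
std_err = s / math.sqrt(len(counts))       # about 4.6, reported as 5
print(f"{mean:.0f} +/- {std_err:.0f}")
```

This prints 100 +/- 5, matching the value reported above.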
The same error analysis can be used for any set of repeated measurements, whether they arise from random processes or not. For example, in the Atwood's machine experiment to measure g, you are asked to measure the time five times for a given distance of fall s. The mean value of the time is

t̄ = (t1 + t2 + ... + tn)/n,

and the standard error of the mean is

σ_m = s_t/√n,

where n = 5 and s_t is the standard deviation of the time measurements.
For the distance measurement you will have to estimate Δs, the precision with which you can measure the drop distance (probably of the order of 2-3 mm).
Propagation of Errors
Frequently, the result of an experiment will not be measured directly. Rather, it will be calculated from several measured physical quantities (each of which has a mean value and an error). What is the resulting error in the final result of such an experiment?
For instance, what is the error in Z = A + B where A and B are two measured quantities with errors ΔA and ΔB respectively?
A first thought might be that the error in Z would be just the sum of the errors in A and B. After all,

Z + ΔZ = (A + ΔA) + (B + ΔB) = (A + B) + (ΔA + ΔB),

so that ΔZ = ΔA + ΔB.
But this assumes that, when combined, the errors in A and B have the same sign and maximum magnitude; that is, that they always combine in the worst possible way. This could only happen if the errors in the two variables were perfectly correlated (i.e., if the two variables were not really independent).
If the variables are independent, then sometimes the error in one variable will happen to cancel out some of the error in the other, and so, on average, the error in Z will be less than the sum of the errors in its parts. A reasonable way to take this into account is to treat the perturbations in Z produced by perturbations in its parts as if they were "perpendicular" and add them according to the Pythagorean theorem,

ΔZ = √((ΔA)² + (ΔB)²).
That is, if A = (100 +/- 3) and B = (6 +/- 4), then Z = (106 +/- 5), since √(3² + 4²) = 5.
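In code, adding independent errors in quadrature is a one-liner (the helper name below is our own):

```python
import math

def add_in_quadrature(*errors):
    """Combine independent errors as the square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))

# Z = A + B with A = 100 +/- 3 and B = 6 +/- 4:
print(106, "+/-", add_in_quadrature(3, 4))  # combined error is 5, not 3 + 4 = 7
```

Taking *errors as a variadic argument lets the same helper combine any number of independent contributions.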
This idea can be used to derive a general rule. Suppose there are two measurements, A and B, and the final result is Z = F(A, B) for some function F. If A is perturbed by ΔA, then Z will be perturbed by

(∂F/∂A) ΔA,

where the partial derivative ∂F/∂A is the derivative of F with respect to A with B held constant. Similarly, the perturbation in Z due to a perturbation ΔB in B is

(∂F/∂B) ΔB.

Combining these by the Pythagorean theorem yields

ΔZ = √( (∂F/∂A)² (ΔA)² + (∂F/∂B)² (ΔB)² ).
In the example of Z = A + B considered above, ∂F/∂A = ∂F/∂B = 1, so

ΔZ = √((ΔA)² + (ΔB)²),

and this gives the same result as before. Similarly, if Z = A − B, then ∂F/∂A = 1 and ∂F/∂B = −1, so again

ΔZ = √((ΔA)² + (ΔB)²),

which also gives the same result. Errors combine in the same way for both addition and subtraction. However, if Z = AB, then ∂F/∂A = B and ∂F/∂B = A, so

ΔZ = √(B²(ΔA)² + A²(ΔB)²), or ΔZ/Z = √((ΔA/A)² + (ΔB/B)²);

the fractional error in Z is the square root of the sum of the squares of the fractional errors in its parts. (You should be able to verify that the result is the same for division as it is for multiplication.) For example,

(10 +/- 1) × (20 +/- 2) = 200 +/- 28,

since the fractional errors are each 10% and √(0.1² + 0.1²) ≈ 0.14.
It should be noted that, since the above applies only when the two measured quantities are independent of each other, it does not apply when, for example, one physical quantity is measured and what is required is its square. If Z = A², then the perturbation in Z due to a perturbation ΔA in A is

ΔZ = 2A ΔA.

Thus, in this case,

Z = A² (1 +/- 2ΔA/A),

and not A² (1 +/- √2 ΔA/A), as would be obtained by misapplying the rule for independent variables. For example,

(10 +/- 1)² = 100 +/- 20, and not 100 +/- 14.
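The general rule can be checked numerically by estimating the partial derivatives with central finite differences. This sketch (the propagate function is our own and assumes independent errors) reproduces both the product rule and the Z = A² result:

```python
import math

def propagate(f, values, errors, h=1e-6):
    """Propagate independent errors through f: estimate each partial
    derivative by a central difference and add contributions in quadrature."""
    total = 0.0
    for i, (v, e) in enumerate(zip(values, errors)):
        step = h * max(1.0, abs(v))
        hi = list(values); hi[i] = v + step
        lo = list(values); lo[i] = v - step
        partial = (f(*hi) - f(*lo)) / (2 * step)
        total += (partial * e) ** 2
    return math.sqrt(total)

# Z = A * B with independent errors: sqrt(800), about 28.3
print(propagate(lambda a, b: a * b, [10.0, 20.0], [1.0, 2.0]))
# Z = A**2, a single variable: 2*A*dA = 20, not the quadrature value 14
print(propagate(lambda a: a * a, [10.0], [1.0]))
```

Because A appears only once in the argument list of the squared function, the single-variable case correctly yields 2A ΔA rather than the misapplied independent-variable result.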
If a variable Z depends on one or two variables (A and B) which have independent errors (ΔA and ΔB), then the rule for calculating the error in Z is tabulated in the following table for a variety of simple relationships. These rules may be compounded for more complicated situations.

    Relation between Z and (A, B)     Relation between errors ΔZ and (ΔA, ΔB)
 ----------------------------------------------------------------
 1  Z = A + B                         (ΔZ)² = (ΔA)² + (ΔB)²
 2  Z = A − B                         (ΔZ)² = (ΔA)² + (ΔB)²
 3  Z = AB                            (ΔZ/Z)² = (ΔA/A)² + (ΔB/B)²
 4  Z = A/B                           (ΔZ/Z)² = (ΔA/A)² + (ΔB/B)²
 5  Z = A^n                           ΔZ/Z = n(ΔA/A)
 6  Z = ln A                          ΔZ = ΔA/A
 7  Z = e^A                           ΔZ/Z = ΔA
 ----------------------------------------------------------------
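The tabulated rules translate directly into small helper functions (names are our own); note which rules return an absolute error and which a fractional one:

```python
import math

def err_sum(dA, dB):
    """Rules 1 and 2 (Z = A + B or Z = A - B): absolute errors in quadrature."""
    return math.hypot(dA, dB)

def frac_err_product(A, dA, B, dB):
    """Rules 3 and 4 (Z = A*B or Z = A/B): fractional errors in quadrature."""
    return math.hypot(dA / A, dB / B)

def frac_err_power(A, dA, n):
    """Rule 5 (Z = A**n): fractional error scales by n."""
    return abs(n * dA / A)

def err_log(A, dA):
    """Rule 6 (Z = ln A): absolute error in Z is the fractional error in A."""
    return dA / A

def frac_err_exp(dA):
    """Rule 7 (Z = e**A): fractional error in Z is the absolute error in A."""
    return dA

# Example: (10 +/- 1)**2 has fractional error 0.2, i.e. 100 +/- 20.
print(100 * frac_err_power(10, 1, 2))
```

Compounding the rules for a more complicated expression just means applying them from the inside out, converting between absolute and fractional errors as needed.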