8.7. NB6: Measurements and their uncertainty#

  • Learning objectives

  • Mean, standard deviation and standard uncertainty of repeated readings

  • Normal distribution

  • Agreement analysis

  • Outliers: Chauvenet’s criterium

  • Poisson distribution

  • Error propagation

  • Gaussian fit

  • Solutions to Exercises

8.8. Pre/Post-test#

This test is for testing your current skills in Python. You can use it in two ways:

  • pre-test: to test your skills beforehand. If you are already proficient in Python, and can do this test within approximately 15 minutes, you can scan through the notebook rather than carfefully reading each sentence.

  • post-test: test your skills after Notebook 6. Check whether you learned enough.

8.8.1. Diode#

In an experiment 10 measurements were taken of the voltage over and current through a Si-diode. The results are displayed in the following table together with their uncertainties:

I (μA)

\(\alpha_I\) (μA)

V (mV)

\(\alpha_V\) (mV)

3034

4

670

4

1494

2

636

4

756.1

0.8

604

4

384.9

0.4

572

4

199.5

0.3

541

4

100.6

0.2

505

4

39.93

0.05

465

3

20.11

0.03

432

3

10.23

0.02

400

3

5.00

0.01

365

3

2.556

0.008

331

3

1.269

0.007

296

2

0.601

0.007

257

2

0.295

0.006

221

2

0.137

0.006

181

2

0.067

0.006

145

2

The diode is expected to behave according to the following relation: \(I = a(e^{bV}-1)\), where \(a\) and \(b\) are unknown constants .

Exercise: Use the curve_fit function to determine whether this is a valid model to describe the dataset. Is the found parameter \(a\) in agreement with the value found in another experiment \(a_{2} = 2.0 \pm 0.5\) nA?

Hint: use a logscale on the y-axis

I = np.array([3034,1494,756.1,384.9,199.5,100.6,39.93,20.11,10.23,5.00,2.556,1.269,0.601,0.295,0.137,0.067])*1e-6
a_I = np.array([4,2,0.8,0.4,0.3,0.2,0.05,0.03,0.02,0.01,0.008,0.007,0.007,0.006,0.006,0.006])*1e-6
V = np.array([670,636,604,572,541,505,465,432,400,365,331,296,257,221,181,145])*1e-3
a_V = np.array([4,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2])*1e-3


def current(V,a,b):
    return a*(np.exp(b*V)-1)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 I = np.array([3034,1494,756.1,384.9,199.5,100.6,39.93,20.11,10.23,5.00,2.556,1.269,0.601,0.295,0.137,0.067])*1e-6
      2 a_I = np.array([4,2,0.8,0.4,0.3,0.2,0.05,0.03,0.02,0.01,0.008,0.007,0.007,0.006,0.006,0.006])*1e-6
      3 V = np.array([670,636,604,572,541,505,465,432,400,365,331,296,257,221,181,145])*1e-3

NameError: name 'np' is not defined

8.9. Learning objectives#

Now that we know how to do some basic Python, it is time to use it to understand features and the nature of physical measurements.

In this notebook, we will introduce you to measurements, variance in repeated measurements, and measurement uncertainties. If we do calculations with a value with an uncertainty, we end up with an answer that has an uncertainty as well… But how do we calculate this uncertainty? Moreover, we often do not calculate a value on a single set of repeated measurements but rather on datasets with various mean values of repeated measurements. How do we properly analyse these data? For each topic, a summary is given. What is new, compared to the previous notebooks, is that we have * assignments. These are not mandatory, but can be made to help you improve your knowledge of Python and data-analysis.

Now that we know how to do some basic Python, it is time to use it to understand features of physical measurements. It is very useful to study the chapter in the online manual as well!

After completing this notebook, you are able to:

  • calculate the mean, standard deviation and uncertainty of a dataset

  • report findings using the scientific conventions

  • understand and use Gaussian and Poisson distribution

  • identify outliers

  • carry out an agreement analysis

  • calculate how uncertainties propagate when calculations are performed

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit 
from scipy.stats import norm, poisson
from IPython.lib.display import YouTubeVideo
YouTubeVideo('8uj8rbIQlus')

What is, in science, the value of a value? In physics, this depends on how certain you are of this value. As quantities are determined using experiments, and these experiments are subject to error and uncertainties, the ‘true’ value can only be determined to some degree. We are never 100% sure of the exact value. Therefore, values are given with their uncertainty: \(g\) = 9.81 ± 0.01 m/s\(^2\). So how do we determine to what extent a value is certain?

8.9.1. 6.1 Mean, standard deviation and standard uncertainty of repeated readings#

If you measure a physical quantity with a proper instrument, variability in measurements is inevitable. When a certain value of a physical quantity \(x\) is determined \(N\) times, the best estimate for that value would be the average or mean value \((\mu(x) = \bar x)\). The average of a series of \(N\) measurements \(x_i\) is calculated by: $\( \mu(x) = \bar x = \frac{x_1+x_2+x_3+...+x_N}{N} = \frac{1}{N}\sum_{i=1}^N x_i\)\( However, how precise does this average reflect the real value? Would you be more confident in your result if the individual measurement are close together? To quantify the deviation in the measurements, we can calculate the *standard deviation*: \)\(\sigma(x) = \sqrt{\frac{\sum_{i=1}^N(x_i-\bar x)^2}{N-1}}\)\( As you can see, the standard deviation is a measure of the average distance from the measurements to the mean. We expect, with a certainty of roughly 2/3, that a next measurement is within \)\mu(x) \pm \sigma(x)\( or roughly 0.95 that a next measurement is within \)\mu(x) \pm 2\sigma(x)$.

This standard deviation does not really increase or decrease if more measurements are done (given a sufficient number at first). Therefore it is not a good measure of uncertainty in the result. The uncertainty is therefore defined to be given by $\(u(x) = \frac{\sigma(x)}{\sqrt {N}}\)\( This uncertainty is equal to the deviation of the mean. That is, if the measurement series is repeated, another mean value will be found. The uncertainty is the deviation of that mean. Roughly speaking, the probabilty that a repeated experiment will yield a mean in the interval \)\bar x \pm u(x) \approx 2/3$.

The final outcome of a measurement is to be reported as: measurement \(\pm\) uncertainty + unit. The number of significant digits of the result is determined by the uncertainty. When less than 1000 measurements are done, the uncertainty should have 1 significant figure. The number of decimals of the result should match the number of decimals of the uncertainty.

YouTubeVideo('Dl4AGXWjNfc')

Exercise 6.1
The results of eight titrated volumes (in ml) are: 25.8, 26.2, 26.0, 26.5, 25.8, 26.1, 25.8 and 26.3.

Calculate:

(i) the average,

(ii) the standard deviation,

(iii) the uncertainty in the mean of the volume,

(iv) present the best estimate of the true volume with its associated uncertainty.

# Answer exercise 6.1:
V = np.array([25.8, 26.2, 26.0, 26.5, 25.8, 26.1, 25.8, 26.3])
#Your code


print('The average is: ')

print('The standard deviation is: ')

print('The uncertainty in the mean of the volume: ')

print('The best estimate of the true volume is: ' )

Exercise 6.1*
You probably know the np.mean and the np.std commands. However you can also do them yourself using the definitions: $\( \overline{x} = \frac{1}{N} \sum_{i=1}^N x_i \)$

\[ \sigma_x = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \mu_x)^2 } \]

Use these on the following dataset and see if you get the same results as the np.mean and the np.std commands.
Data

Tree Number

Total Height (m)

1

14.02

2

14.15

3

15.27

4

11.81

5

14.22

6

20.52

7

18.2

8

16.92

9

16.28

10

20.8

11

4.5

Models of knot and stem development in black spruce trees indicate a shift in allocation priority to branches when growth is limited. Available from: https://www.researchgate.net/figure/Characteristics-of-the-10-sample-trees-in-the-dataset_tbl1_274717928 [accessed 24 Jul, 2020]

a Calculate in two different ways the mean value and the standard deviation.

b If you notice a difference between the np.std function and the manual function I would encourage you to look at the documentation of the np.std function Here.

c How would you as a physicist write down the result?

L = np.array([14.02,14.15,15.27,11.81,14.22,20.52,18.2,16.92,16.28,20.8,4.5])
#your code

Exercise 6.2
Below are eight resistor measurements, each based on ten repeated readings. However, only two measurements are adequately displayed.

Which ones?

(i) \((99.8 \pm 0.270) \cdot 10^3 \,\Omega\)

(ii) \((100 \pm 0.3) \cdot 10^3 \,\Omega\)

(iii) \((100.0 \pm 0.3) \cdot 10^3 \,\Omega\)

(iv) $(100.1 \pm 0.3) \cdot 10^3 $

(v) \(99.4 \cdot 10^3 \pm 36.0 \cdot 10^2 \,\Omega\)

(vi) \(101.5 \cdot 10^3 \pm 0.3 \cdot 10^1 \,\Omega\)

(vii) \((99.8 \pm 0.3) \cdot 10^3 \,\Omega\)

(viii) \(95.2 \cdot 10^3 \pm 273 \,\Omega\)

# Answer exercise 6.2
print('The measurements adequately displayed are: ')

Exercise 6.3
Round the following numbers displaying two figures and use scientific notation.

(i) 602.20

(ii) 0.00135

(iii) 0.0225

(iv) 1.60219

(v) 91.095

# Answer exercise 6.3
print('The correct notation of (i) is: ')
print('The correct notation of (ii) is: ')
print('The correct notation of (iii) is: ')
print('The correct notation of (iv) is: ')
print('The correct notation of (v) is: ')

So, let us look at what happens when we repeat our measurements. How does this change our best estimate of the true value, and how does repeating measurements decrease the measurement uncertainty?

Exercise 6.4
During an experiment, an automated reading of a certain quantity is carried out. The data is stored in the file ‘experimental_results.csv’.

a Load the data.

b Calculate the average value and the standard deviation. Plot as well the histogram of the readings.

# your code for 6.4 a and b.

We expect that the average value converges to a certain value (the true value) when we increase the number of repeated readings.

c Plot the average value as function of the number of repeated readings. Thereto, make an array in which the average value as function of the number of repeated readings is stored.

We also expect that the uncertainty in the average value decreases with an increasing number of repeated readings.

d Plot the uncertainty in the average value as function of the number of repeated readings. Use a log-log scale.

# your code for 6.4 c and d.

8.9.2. 6.2 Normal distribution#

A measurement \(M\) can be interpreted as the physical quantities’ true value \(G\) and some unwanted noise \(s\): \(M = G + s\). In many cases, this noise is normally distributed. For a normal distribution with mean \(\mu\), and standard deviation \(\sigma\), the probability density function is given by $\(p(x) = \frac{1}{\sigma \sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma ^2}\)$

In general, a probability density \(p(x)\) is a function that has the property that the probabilty that a random variable \(X\) has a value smaller than \(x\) is given by: $\(\mathbb{P}[X<x] = \int_{-\infty}^{x}p(\tilde{x}) d \tilde{x}\)$

The normal distribution is sometimes called a Gaussian distribution, and its density function is sometimes called a Bell curve. Unfortunately, it is not possible to analytically solve the integral of the density of the normal distribution. Luckily, good numerical approximations of this integral, that is also called the errorfunction \(Erf(x,\mu, \sigma)\), exist. In python you can use scipy.stats.norm.pdf for the probabilty density function, and scipy.stats.norm.cdf for the error function. [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html].

Exercise 6.5
Bags of rice with indicated weight of 500 g are sold. In practice, this weight is normally distributed with an average weight of 502 g and a standard deviation of 14 g.

(a) What is the chance that a bag of rice weights less than 500 g?

(b) If we would buy 1000 bags of rice, how many can be expected to weigh at least 530 g? Use the error function, Erf(\(x\); \(\bar{x}\), \(\sigma\)) or scipy.stats.norm.cdf to compute this expectation.

# Answer exercise 6.5

print('(a)  P = ... ')
print('(b)  number = ...')
from scipy.stats import norm

x = np.linspace(-10,10,100)
plt.plot(x, norm.pdf(x, 0,1))
plt.plot(x, norm.cdf(x,0,1))
plt.show()

Exercise 6.6
In the movieclip below you see a blinking LED. Here we investigate how long the LED is ON.

(a) Take 15 (3 sets of 5) repeated readings of the time that the LED is ON. Note these measurements in the array below. Also ask the dataset of someone else so you have 6x5 repeated readings.

(b) Provide a reason for a potential systematic error and two reasons for potential random errors.

(c) For each of set of five repeated readings calculate the average time \(\mu(t)\) the LED is on and the associated standard deviation \(\sigma(t)\) and the uncertainty \(u(t)\).

The average time found is probably not the same each time. We can calculate the standard deviation of the mean of each set of five readings.

(d) Calculate the standard deviation of the mean.

(e) Compare the standard deviation of the mean with the mean of the uncertainty of the 6 sets. Are these two values comparable?

(f) Provide the best estimate of the time \(\mu(t) \pm u(t)\) that the LED is on. Use the scientific conventions for presenting your result.

YouTubeVideo('wEtzi3oTLCk')
# Answer exercise 6.6
Measurements = np.array([[,,,,],\
                     [,,,,],\
                     [,,,,],\
                     [,,,,],\
                     [,,,,],\
                     [,,,,]]) #s


Measurements_av = np.array([np.mean(),\
                       np.mean(),\
                       np.mean()])
YouTubeVideo('cXcfwthH1SE')

8.9.3. 6.3 Agreement analysis#

If we have found a certain value and want to compare it with a known value, when do these values agree with each other? Certainly this depends on the average value found. But in the two figures below, we can also see that this depends on how (un)certain our determined values are. The first values might be in agreement with each other, as their pdf’s overlap. But the second values certainly are not.

To see whether two values \(P\) and \(Q\) are in agreement, we check whether their difference in values are less than twice their combined uncertainties: \(|P-Q|<2\sqrt{u(P)^2+u(Q)^2} \).

Exercise 6.7
For a certain physics quantity \(E\), Eva has found a value of 14.5\(\pm\)0.2. The literature value of this quantity is 14.9\(\pm\)0.1.

a) Check whether these values are in agreement with each other.

# Answer exercise 6.7a

b) Write a function that makes use of four inputvariables, namely two quantities with their uncertainties. The function should return whether the values are in agreement with each other.

# Answer exercise 6.7b

8.9.4. 6.4 Outliers: Chauvenet’s criterium#

In a repeated measurement series it is possible that there is an outlier, that deviates largely from the average. You may have made an error while performing the measurement, or it may be caused by random noise. However, it is not allowed to just selectively discard datapoints. Chauvenet’s criterium provides a rule of thumb on when to remove a datapoint, arguing that that a statistical fluctuation is unlikely and there has likely been made a mistake. A datapoint may be removed if $\(2N \cdot P_{\text{outlier}} < 0.5\)\( where \)P_{\text{outlier}} = Erf(x_{\text{outlier}}, \bar x, \sigma)\( when the outlier is below average, and \)P_{\text{outlier}} = 1-Erf(x_{\text{outlier}}, \bar x, \sigma)$ else.

Exercise 6.8
Erin has conducted seven measurements on a capacitor (in \(\mu F\)): 45.7, 53.2, 48.4, 45.1, 51.4, 62.1 and 49.3. The sixth measurement seems suspicious. Use Chauvenet’s criterium to decide whether she should discard that measurement.

# Answer exercise 6.8
C = np.array([45.7, 53.2, 48.4, 45.1, 51.4, 62.1, 49.3])

print('The sixth measurement should (not) be discarded, because ... ')

Exercise 6.8*
In excercise 6.1* you looked at different tree heights. One height stood out.

Would you be allowed to discard the measurment of height 4.5 m? Give a supporting and a counter argument.

8.9.5. 6.5 Poisson distribution#

Another widely used probability distribution is the Poisson distribution. Many situations that involve counting are described by the poisson distribution, such as radioactive decay or the number of people in a queue.

In a poisson process with parameter \(\lambda\), the probabilty that \(k\) events occur is: $\(\mathbb{P}[X = k] = \frac{\lambda^k}{k!}e^{-\lambda}\)\( The mean counts would be \)\lambda\( and the standard deviation \)\sigma = \sqrt{\lambda}$.

Exercise 6.9
In an experiment with a radioactive source, the number of counts in a specific timeinterval is measured 58 times. This resulted in the following distribution of the counts \(N\):

N (counts)

Frequency

1

1

2

0

3

2

4

3

5

6

6

9

7

11

8

8

9

8

10

6

11

2

12

1

13

1

Calculate:

(a) the total number of counts;

(b) the average number of counts \(\mu\);

(c) the standard deviation \(\sigma\).

If the experiment is to be repeated again using 58 measurements

(d) calculate the expected number of measurements with five or less counts.

# Answer exercise 6.9

print('(a) The total number of counts ... ')
print('(b) The average number of counts ... ')
print('(c) The standard deviation is ... ')
print('(d) The expected number of measurements with five or less counts is ... ')

Exercise 6.9*
A shop owner wants to know how many people visit his shop. He installs a device that counts the number of people that enter the shop every minute. In total 1000 measurements were done.

a Import the person_count.dat file.

b Determine the average value as well as the standard deviation.

c Plot the data in a histogram. What kind of distribution do you think the data is best described by, and why? Plot the distribution function on top of the histogram to check.

8.9.6. 6.6 Error propagation #

Let’s say you have determined a value in an experiment. When you want to use it in an equation, the uncertainty propagates to the new calculated quantity. In general, we use two methods: the functional approach and the calculus approach.

Functional approach:
For a quantity \(Z(A)\) that depends on \(A\) we have: $\(u(Z) = \frac{Z(A+u(A)) - Z(A-u(A))}{2}\)$

Calculus approach:
For a quantity \(Z(A)\) that depends on \(A\) we have: $\(u(Z) = \frac{d Z}{d A} u(A)\)$

Multiple variables:
For a quantity \(Z(A,B)\) that depends on multiple variables, the uncertainty is calculated by taking the square root of the sum of squared individual uncertainties:

Functional: $\(u(Z) = \sqrt{(\frac{Z(A+u(A)) - Z(A-u(A))}{2})^2 + (\frac{Z(B+u(B)) - Z(B-u(B))}{2})^2}\)\( **Calculus:** \)\(u(Z) = \sqrt{(\frac{\partial Z}{\partial A} u(A))^2 + (\frac{\partial Z}{\partial B} u(B))^2}\)$

YouTubeVideo('Qbwikf7TqqI')
YouTubeVideo('Lb6lhWIxqhg')

Exercise 6.10
Give the most important steps in your calculations.

The value of a physical quantity \(A\) is determined to be \(A = 9.274 \pm 0.005\). Calculate the value of the physical quantity \(Z\) and its uncertainty \(\alpha_{Z}\) using the calculus approach and the functional approach, when \(Z\) depends on \(A\) as follows:

(a) \(Z = \sqrt{A}\)

(b) \(Z = \exp⁡{(A^{2})}\)

# Answer exercise 6.10
# add your calculations here
# ...
print('(a) Z =  ... ± ... ')
print('(b) Z =  ... ± ... ')

Exercise 6.11
Give the most important steps in your calculations.

Three physical quantities have been determined to be \(A = 12.3 \pm 0.4\), \(B = 5.6 \pm 0.8\) en C = \(89.0 \pm 0.2\). Calculate the value of the physical quantity \(Z\) and its uncertainty \(u(Z)\) using the calculus approach and/or the functional approach, when \(Z\) depends on \(A\), \(B\), and \(C\) as follows:

(a) \(Z = A - B\)

(b) \(Z = \frac{A B }{C}\)

(c) \(Z = \exp{⁡(\frac{A B}{C})}\)

# Answer exercise 6.11
# add your calculations here

print('(a) Z =  ... ± ... ')
print('(b) Z =  ... ± ... ')
print('(c) Z =  ... ± ... ')

Exercise 6.12
You have both seen the functional and the analytical method for calculating the propagation of an error.

Apply both methods on the following functions: $\(Z(x)=\frac{x-1}{x+1}\)$

With \(x = 3.2 \pm 0.2\) $\( Z(x)=e^{x^2} \)$

With \( x = 8.745 \pm 0.005\)

# Solution 6.12
# add your calculations here

8.9.7. 6.7 Gaussian Fit #

In Notebook 5 (section 5.3.5 Residuals) we have already seen that residuals that are the result of noise in the measurements should typically be normally distributed. To verify if the residuals are indeed normally distributed one can make a histogram of these residuals and compare it with the related normal distribution, as will be done in the following exercise.

Exercise 6.13

Interest in ratios of the human body is of all times. A linear relation between height and weight is suspected. In this exercise, we examine this relation.
Attached is a csv file which contains the height (first column) and the weight (second column) of a (male) person.

a Convert the data to metric units as it is now in inches and in lbs.

b Make a scatter plot and investigate whether there is a linear relation.

c Determine the mean and std of both the height and weight.

d Make a histogram for both and overlay a Gaussian to see if the data follows a normal distribution.

#import data
H_W_data = np.genfromtxt('weight-height.csv',delimiter=',',dtype=float,skip_header=1)
height = []
weight = []

#Convert to metric units


#Calculate mean and std


#Linear function to fit


#Curvefit


#Scatter plot
plt.figure(figsize=(12,4))


plt.show()


#Histogram plot

8.9.7.1. Additional Exercises#

Exercise 6.14 Fitting with a measurement error

When measuring there is always a very real possibility of a systematic error. One of these systematic erros can be found in a mass-springsystem. Normally the period of a mass-spring system is given by: \(T = 2\pi \sqrt{\frac{m}{C}}\). Here \(m\) is the mass and \(C\) is the spring constant. However this formula assumes that you have a massless spring, this is not true unfortunately. This means that the mass of the spring is also vibrating, we should thus change the formula to take this into account. This gives the following equation: \(T = 2\pi \sqrt{\frac{m + \Delta m}{C}}\), where \(\Delta m\) is the systematic error.

With the measurements that we have we can find both the spring constant and its uncertainties. The array m is an array with the values for the measured m and the array T is an array with all the measured data for the period. You can disregard the invalid use of significant figures.

a Plot the data

b Find the parameters \(\Delta m\) and \(C\) with its corresponding uncertainties

c Plot the fitted function over the data and examine the residuals

m = np.array([50, 100, 150, 200, 250, 300])
T = np.array([2.47, 3.43, 4.17, 4.80, 5.35, 5.86])

Exercise 6.15 Gravitational force

The gravitational force between two bodies can be described with Newton’s law of universal gravitation: \(F = \frac{Gm_1m_2}{r^2}\), where \(G\) is the gravitational constant, \(m_i\) the masses of the bodies and \(r\) the distance between the bodies.

Suppose that a meteorite of mass \((4.739\pm0.154)\cdot10^8\)kg at a distance of \((2.983\pm0.037)\cdot10^6\)m is moving towards the earth.

Exercise: Determine the attracting force between the meteorite and earth. Use both the functional and the calculus approach to calculate the uncertainty in \(F\) and compare the results. You can use the following values:

  • Earth mass: \((5.9722\pm0.0006)\cdot10^{24}\)kg

  • Gravitational constant: \((6.67259\pm0.00030)\cdot10^{-11}\)m\(^3\) s\(^{-2}\) kg\(^{-1}\)

#function for gravitational force
def FG(G,m1,m2,r):
    
#values
G = 6.6759e-11
u_G = 0.00030e-11

m1 = 4.739e8
u_m1 = 0.154e8

m2 = 5.9722e24
u_m2 = 0.0006e24

r = 2.983e6
u_r = 0.037e6

#value of gravitatonal force


#Calculus approach


#functional approach

Exercise 6.16 A mass-spring system

A student measures the position of a simple mass-spring system. Unfortunately, he accidently moves his measuring device during the experiment. He is not sure if the device was put back in the right position and wants to know if there is a systematic error in his data. The dataset consists of 400 position measurements (in cm) over the course of 5 seconds. The data is expected to follow a sine function with an amplitude of 4.5 cm and a period of 0.4 s.

a Import the meting_massaveer.dat file.

b Plot the raw data, calculate and plot the residuals, and use it to determine if there is a systematic error. If so, determine the size and time of the shift.

m_C_data = np.genfromtxt('meting_massaveer.dat')
t = np.linspace(0,5,400)

def sine(x):                        #sine function to fit the data
      return 4.5*np.sin(2*np.pi*2.5*x)

#plot of data and sine function

#calculate residuals

#plot of Residuals

8.9.8. 6.8 Solutions to Exercises#

# Solution 6.1
# Data
V = np.array([25.8, 26.2, 26.0, 26.5, 25.8, 26.1, 25.8, 26.3])

# Edit
Vmean = np.mean(V)
Vstd = np.std(V,ddof=1)
Verr = Vstd/np.sqrt(len(V))

#Print
print('(a) The average is: {} ml' .format(round(Vmean,2)))
print('(b) The standard deviation is: {} ml'.format(round(Vstd,2)))
print('(c) The uncertainty in the mean of the volume: {} ml'.format(round(Verr,2)))
print('(d) The best estimate of the true volume is: {:.2f} \u00B1 {:.2f} ml'.format(Vmean,Verr))
# Solution 6.1*
# add your calculations here

#calculate mean and std 'manually'
Tree_av = np.sum(L)/len(L)
Tree_std = np.sqrt(np.sum((L-Tree_av)**2)/(len(L)-1))

print('The best estimate for the height %.0f +/- %.0f m' %(Tree_av, Tree_std/np.sqrt(len(L))))

#Compare values with numpy functions, if true, nothing happens
assert Tree_av == np.mean(L), 'The results for the mean are different'
assert Tree_std == np.std(L,ddof=1), 'The results for the standard deviation are different'
# Solution 6.2
opg2_1 = 'iii'
opg2_2 = 'vii'
print('The measurements adequately displayed are: {} en {}'.format(opg2_1,opg2_2))
#Solution 6.3
#Data
numbers = np.array([602.20,0.00135,0.0225,1.60219,91.095])

#Print
print('The correct notation of (a) is: {:.1e}'.format(numbers[0]))
print('The correct notation of (b) is: {:.1e}'.format(numbers[1]))
print('The correct notation of (c) is: {:.1e}'.format(numbers[2]))
print('The correct notation of (d) is: {:.1e}'.format(numbers[3]))
print('The correct notation of (e) is: {:.1e}'.format(numbers[4]))
#Solution 6.4
data = np.loadtxt('experimental_results.csv')

N = len(data)
mean_val = np.mean(data)

plt.hist(data,bins=np.linspace(0,2*mean_val,50))
plt.xlabel('Value (-)')
plt.ylabel('Counts (-)')
plt.show()

# Calculation of mean and standard deviation
mean_array = np.zeros(N)
sigma_array = np.zeros(N)

for i in range(N):
    mean_array[i] = np.mean(data[:i+1])
    sigma_array[i] = np.std(data[:i+1], ddof=1)/(i+1)
    
# Plotting
plt.figure()
plt.plot(mean_array)
plt.axhline(mean_val,linestyle='dashed',color='grey')
plt.grid()
plt.xlabel('Data length (-)')
plt.ylabel('Average (-)')
plt.show()

plt.figure()
plt.loglog(sigma_array)
plt.grid()
plt.xlabel('Data length (-)')
plt.ylabel('Uncertainty (-)')
plt.show()
#Solution 6.5

# Input
mu = 502                   # mean
sigma = 14                 # standard deviation

# Values to be measured
x1 = 500
x2 = 530

# Calculation
Pa = norm.cdf(x1,mu,sigma)      # calc: Pa = 0.44320150
Pb = 1 - norm.cdf(x2,mu,sigma)  # calc: Pb = 1 - 0.97724987
aantal = Pb * 1000

#Print
print('(a) P = {:.4f}'.format(Pa))
print('(b) Expected amount = {:.2f}'.format(aantal))
#Solution 6.6

# a. Each series of measurements consists of five repeated measurements.
Measurements = np.array([[4.560,4.470,4.620,4.570,4.570],\
                     [4.570,4.540,4.700,4.510,4.150],\
                     [4.540,4.570,4.550,4.430,4.460],\
                     [4.470,5.130,4.490,5.150,4.530],\
                     [4.400,5.050,4.490,5.080,4.530],\
                     [4.530,5.130,4.560,4.910,4.560]]) #s

# b
# Random error: reaction time, teacher walks in front of the setup
# Systematic error: inertia of lamp in turning off and on (the lamps keeps glowing for a certain amount of time)

# c. The average of the measurement series
Measurements_av = np.array([np.mean(Measurements[0,:]),np.mean(Measurements[1,:]),\
                       np.mean(Measurements[2,:]),np.mean(Measurements[3,:]),\
                       np.mean(Measurements[4,:]),np.mean(Measurements[5,:])])


print('The average times, in s, are: ', np.round(Measurements_av,1))


# Standerd deviation of the measurements series
Measurements_std = np.array([np.std(Measurements[0,:],ddof=1),np.std(Measurements[1,:],ddof=1),\
                         np.std(Measurements[2,:],ddof=1),np.std(Measurements[3,:],ddof=1),\
                         np.std(Measurements[4,:],ddof=1),np.std(Measurements[0,:],ddof=1)])

print('The standard deviation in the times, in s, are: ', np.round(Measurements_std,1))
print('')


# d. Standard deviation in the mean
std_van_gem = np.std(Measurements_av,ddof=1)
print('The standard deviation in the mean is: ', np.round(std_van_gem,1), 's')


# Uncertainty in the measurement series
Measurements_unc = Measurements_std/np.sqrt(5)
print('The uncertainty in each measurement series: ', np.round(Measurements_unc,2), 's')
print('The mean uncertainty: ', np.round(np.mean(Measurements_unc),2), 's')
print()


# e. Final answer
print('The time during which the lamp is on: ', round(np.mean(Measurements),2), '+/-', round(np.std(Measurements,ddof=1)/np.sqrt(30),2),'s')
#Note: We act like the the 6 measurement series are just one big data set
# Solution 6.7

E = 14.5
u_E = 0.2
L = 14.9
u_L = 0.1

if (E-L) > 2*np.sqrt(u_E**2+u_L**2):
    print('No agreement')
else:
    print('The values are in agreement')
#Solution 6.8

#Data
Q = np.array([45.7, 53.2, 48.4, 45.1, 51.4, 62.1, 49.3])
N = len(Q)        # Total number of measurements

#Number to be determined
n = 6             # Number of measurement

#Editing
Qmean = np.mean(Q)
Qstd = np.std(Q,ddof=1)
Qerr = Qstd/np.sqrt(N)
Pout = 1-norm.cdf(Q[n-1],Qmean,Qstd)

#Print
print('The average is: {:.4f} uC'.format(Qmean))
print('The standard deviation is: {:.4f} uC'.format(Qstd))
print('The standard error is: {:.4f} uC'.format(Qerr))
print('Pout = {:.4f}'.format(Pout))
if 2*Pout*N < 0.5:
    print('Measurement {} has to be excluded'.format(n))
else:
    print('Measurement {} cannot be excluded'.format(n))
#Solution 6.8*
print('The 4.5 m may not be a faulty measurement, and can therefore not be discarded.')
print('The 4.5 m deviates from the other tree heights and may be another type or may be planted later.')
#Solution 6.9

#Data
N = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13])
f = np.array([0,1,0,2,3,6,9,11,8,8,6,2,1,1])

#Calculations (a)
N_tot = sum(N*f)              # (a) Total number of counts
print('(a) The total number of counts is {:.0f}'.format(Ntot))
print()

#Calculations (b)
N_mean = sum(N*f)/sum(f)        # (b) Average number of counts
print('(b) The average number of counts is {:.2f}'.format(N_mean))
print()

#Calculations (c)
Nstd = np.sqrt(N_mean)              # (c) Standard deviation
Nerr = Nstd / np.sqrt(N_tot)

print('(c) The standard deiation is {:.2f}'.format(Nstd))
print()

#Calculations (d)
P=0
for i in range(6):
    P += N_mean**i*np.exp(-N_mean)/math.factorial(i)
print('(d) The expected number is {:.0f}'.format(P*58))
#Solution 6.9*

persons = np.genfromtxt("person_count.dat") #! aantal_peronen = persons
persons_mean = np.mean(persons)
persons_std = np.std(persons, ddof=1)

print('Average: %.3f \n Standard deviation: %.3f' %(persons_mean,persons_std))

num_bins = int(max(persons)-min(persons))

x = np.linspace(min(persons),max(persons),num_bins+1)
prob = poisson(persons_mean)

f,g,_ = plt.hist(persons,bins=num_bins,density=1)
plt.plot(x+.5,prob.pmf(x),'k.')
plt.xlabel('Number of visitors per minute')
plt.ylabel('Probability')
plt.show()

print('It looks like a Poisson distribution. It concerns a counting experiment, so that makes sense.')
# Solution 6.10

def func2a(a):
    z = np.sqrt(a)
    return z

def calc2a(a,ua):
    uz = np.abs( (1/2) * a**(-1/2) ) * ua
    return uz

def func2b(a):
    z = np.exp(a**2)
    return z

def calc2b(a,ua):
    uz = np.abs( 2 * a * np.exp(a**2) ) * ua
    return uz

a = 1.274
ua = 0.005

za = func2a(a)
uza_func = ( func2a(a+ua)-func2a(a-ua) ) / 2
uza_calc = calc2a(a,ua)
zb = func2b(a)
uzb_func = ( func2b(a+ua)-func2b(a-ua) ) / 2
uzb_calc = calc2b(a,ua)

# The number of significant figures is only given for demonstrative purposes
print('(a) functional: Z = {:.4f} \u00B1 {:.6f}'.format(za,uza_func))
print('      calculus: Z = {:.4f} \u00B1 {:.6f}'.format(za,uza_calc))
print('(b) functional: Z = {:.4f} \u00B1 {:.5f}'.format(zb,uzb_func))
print('      calculus: Z = {:.4f} \u00B1 {:.5f}'.format(zb,uzb_calc))
# Solution 6.11
# add your calculations here

# data
a = 12.3
ua = 0.4

b = 5.6
ub = 0.8

c = 89.0
uc = 0.2

# (a)
za = a - b
uza = np.sqrt( ua**2 + ub**2 )

# (b)
zb = a*b/c
uzb = zb * np.sqrt( (ua/a)**2 + (ub/b)**2 + (uc/c)**2 )

# (c)
zc = np.exp(a*b/c)
dzc_da = (b/c) * np.exp(a*b/c)
dzc_db = (a/c) * np.exp(a*b/c)
dzc_dc = (-a*b/c**2) * np.exp(a*b/c)
uzc = np.sqrt( dzc_da**2 * ua**2 + dzc_db**2 * ub**2 + dzc_dc**2 * uc**2 )

print('(a) Z = {:.1f} \u00B1 {:.1f}  (calculus)'.format(za,uza))
print('(b) Z = {:.1f} \u00B1 {:.1f}  (simplified calculus)'.format(zb,uzb))
print('(c) Z = {:.1f} \u00B1 {:.1f}  (calculus)'.format(zc,uzc))
# Solution 6.12
# add your calculations here
def func1(x):                      #first function
    return (x-1)/(x+1)

def func2(r):                      #second function
    return np.exp(r**2)

def functional(x,sig,f):           #function for the functional method
    return (f(x+sig) - f(x-sig))/2

#first function 
x = 3.2
sig = 0.2
error_functional = functional(x,sig,func1)
error_calculus = sig*((x+1)-(x-1))/(x+1)**2 

print('First function \n\
Value of Z: %.2f \n\
Error with functional approach: %.2f \n\
Error with calculus approach: %.2f \n\
' %(func1(x),error_functional,error_calculus))

#second function
r = 8.745
sig = 0.005
error_functional = functional(r,sig,func2)
error_calculus = sig*2*r*func2(r)  


print('Second function \n\
Value of Z: %.1e \n\
Error with functional approach: %.0e \n\
Error with calculus approach: %.0e \
' %(func2(x),error_functional,error_calculus))
#Solution 6.13
#import data
H_W_data = np.genfromtxt('weight-height.csv',delimiter=',',dtype=float,skip_header=1)
height = []
weight = []

for i in H_W_data:
    height.append(i[0]*2.54)   #convert to cm
    weight.append(i[1]*0.4536) #convert to kg

#Calculate mean and std
height_mean = np.mean(height)
height_std = np.std(height, ddof=1)
weight_mean = np.mean(weight)
weight_std = np.std(weight)

#Linear function to fit
def lin(x,a,b):
    return a*x + b

#Curvefit
val, cov = curve_fit(lin,height,weight)

x = np.linspace(145,205)
y = lin(x,val[0],val[1])

#Scatterplot
plt.figure(figsize=(12,4))
plt.plot(height,weight,'.',label='data')
plt.plot(x,y,label='fit')
plt.xlabel('Height (cm)')
plt.ylabel('Mass (kg)')
plt.xlim(min(x),max(x))
plt.show()

#Histogram plot
N=200
x2 = np.linspace(np.min(height),np.max(height),N)

plt.figure()
x1, y1, _ = plt.hist(height,bins=100)
dist = norm.pdf(x2,height_mean,height_std)
plt.plot(x2,dist*x1.max()/np.max(dist),'r')
plt.xlabel('Height (cm)')
plt.ylabel('Occurence')
plt.show()

#Histogram plot
N=50
x2 = np.linspace(np.min(weight),np.max(weight),N)

plt.figure()
x1, y1, _ = plt.hist(weight,bins=100)
dist = norm.pdf(x2,weight_mean,weight_std)
plt.plot(x2,dist*x1.max()/np.max(dist),'r')
plt.xlabel('Mass (kg)')
plt.ylabel('Occurence')
plt.show()
#Solution 6.14

m = np.array([50,100,150,200,250,300])
T = np.array([2.47,3.43,4.17,4.80,5.35,5.86])


#Define Function to fit
def Per(m,dm,C):
    return 2*np.pi*np.sqrt((m+dm)/C)

#Make figure
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(121)
ax.errorbar(m,T,linestyle='none',marker='o')
ax.set_xlabel('m [kg]')
ax.set_ylabel('T [s]')

#Make Fit
vals, uncmat = curve_fit(Per,m,T)

#Make linspace
linm = np.linspace(0.7*np.min(m),1.1*np.max(m),200)

#Plot the fitted line
ax.plot(linm,Per(linm,*vals))

#Set the x-limit
ax.set_xlim(np.min(linm),np.max(linm))

#Find the uncertainties
u_dm, u_C = np.sqrt(np.diag(uncmat))


#Print the values gotten and its uncertainties
print('dm = {:.1f} +/- {:.1f} m'.format(vals[0],u_dm))
print('C = {:.1f} +/- {:.1f} kg m/s^2'.format(vals[1],u_C))

#Make a residual analysis
r = T - Per(m,*popt)

#Plot the risiduals
ax2 = fig.add_subplot(122)
ax2.plot(m,r,linestyle='none',marker='o')
ax2.set_xlabel('m [kg]')
ax2.set_ylabel('R [s]')

#Make it a bit pretier
plt.tight_layout()
#Solution 6.15
#function for gravitational force
def FG(G,m1,m2,r):
    return G * m1 * m2 /(r*r)

#values
G = 6.6759e-11
u_G = 0.00030e-11

m1 = 4.739e8
u_m1 = 0.154e8

m2 = 5.9722e24
u_m2 = 0.0006e24

r = 2.983e6
u_r = 0.037e6

#value of gravitatonal force
F_m = FG(G,m1,m2,r)

#calculus appraoch
r2 = r**2
u_F2 = (m1*m2/r2*u_G)**2 + (G*m2/r2*u_m1)**2 + (G*m1/r2*u_m2)**2 + (-2*G*m1*m2/(r2*r)*u_r)**2
u_F_calc = np.sqrt(u_F2)

#Func
U_F2 = ((FG(G+u_G,m1,m2,r)-FG(G-u_G,m1,m2,r))/2)**2 + ((FG(G,m1+u_m1,m2,r)-FG(G,m1-u_m1,m2,r))/2)**2 + ((FG(G,m1,m2+u_m2,r)-FG(G,m1,m2-u_m2,r))/2)**2 + ((FG(G,m1,m2,r+u_r)-FG(G,m1,m2,r-u_r))/2)**2
u_F_func = np.sqrt(U_F2)

print("F = %.2f +/- %.2f 10^10 N \n" %(F_m/1e10,u_F_func/1e10))

diff = np.abs(u_F_calc-u_F_func)

#onzekerheid in veel decimale cijfers om verschil te tonen
print("Uncertainties \n\
Caluculus approach: %f 10^8 N \n\
Functional approach: %f 10^8 N \n\
Difference: %f 10^8 N" %(u_F_calc/1e8,u_F_func/1e8,diff/1e8))
#Solution 6.16
data = np.genfromtxt('meting_massaveer.dat')
t = np.linspace(0,5,400)

def sine(x):                        #sine function to fit the data
      return 4.5*np.sin(x*5*np.pi)

#plot of data and sine function
plt.figure(figsize=(12,4))
plt.title('Data')
plt.plot(t,data, '.', label='data')
plt.plot(t,sine(t), label='$4.5\sin(5\pi x$)')
plt.xlim(min(t),max(t))
plt.ylabel('Position (cm)')
plt.xlabel('Time (s)')
plt.legend()
plt.show()

R = data - sine(t)  #Residuals

#plot of Residuals
plt.figure(figsize=(12,4))
plt.title('Residuals')
plt.plot(t,R)
plt.ylabel('Position (cm)')
plt.xlabel('Time (s)')
plt.show()

plt.hist(R[:100])
plt.show()
# Solution (PRE/POST EXERCISE)
I = np.array([3034,1494,756.1,384.9,199.5,100.6,39.93,20.11,10.23,5.00,2.556,1.269,0.601,0.295,0.137,0.067])*1e-6
a_I = np.array([4,2,0.8,0.4,0.3,0.2,0.05,0.03,0.02,0.01,0.008,0.007,0.007,0.006,0.006,0.006])*1e-6
V = np.array([670,636,604,572,541,505,465,432,400,365,331,296,257,221,181,145])*1e-3
a_V = np.array([4,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2])*1e-3

def current(V,a,b):
    return a*(np.exp(b*V)-1)

popt, pcov = curve_fit(current,V,I)
x = np.linspace(0.1,0.7)
y = current(x,popt[0],popt[1])

plt.figure(figsize=(8,4))
plt.errorbar(V,I,xerr=a_V,yerr=a_I,linestyle='none',marker= '.',label='data')
plt.plot(x,y,label='fit')
plt.yscale('log')
plt.xlabel('Voltage (V)')
plt.ylabel('Current (A)')
#plt.legend()
plt.xlim(0.1,0.7)
plt.show()
print('The model is in good agreement with the measurements, it is therefore a valid model.')


# agreement analysis
print('a = %.3g +/- %.1g' %(popt[0], np.sqrt(pcov[0,0])))

a_2 = 2.0e-9
u_a_2 = 5e-10

if 2*np.sqrt(u_a_2**2 + pcov[0,0]) > np.abs(a_2 - popt[0]):
    print('The value of $a$ found in the experiment is in agreement with the value from theory,')
else:
    print('The value of $a$ found in the experiment does not agree with the value from theory,')