8.5. NB4: Scientific Computing in Python with Numpy#

  • Learning objectives

  • Numpy arrays

    • Indexing arrays

    • Slicing numpy arrays

    • Mathematical operations on arrays

  • Functions for creating numpy arrays

    • np.linspace()

    • np.arange()

    • Random numbers

    • Multidimensional arrays (matrices)

  • Numpy functions

    • Vectorisation” and fast code with numpy functions

  • Monte Carlo Simulation

  • Solutions to exercises

8.5.1. Pre/Post-test#

This test is for testing your current skills in Python. You can use it in two ways:

  • pre-test: to test your skills beforehand. If you are already proficient in Python, and can do this test within approximately 15 minutes, you can scan through the notebook rather than carefully reading each sentence.

  • post-test: to test your skills after Notebook 4. Check whether you learned enough.

8.5.2. Calculating a derivative#

Eric is asked to develop a tool for people that need to do some mathematical calculations. He is asked to write a function that calculates and plots the derivative of a given function. The derivative of a continuous function \(f(x)\) can be approximated by \(f'(x) = \frac{f(x+\epsilon)-f(x-\epsilon)}{2\epsilon}\) for some small value of \(\epsilon\). As Eric knows that the derivative of \(f(x) = \sin(x)\) is \(f'(x) = \cos(x)\), he uses this function to test whether his function works correct.

  • Make an array in the domain [0, 2*\(\pi\)] with 1e4 even spaced values.

  • Plot the function \(f(x)\) for this domain.

  • Write a function that calculates the derivative using the approach above.

  • Plot the graph of the derivative in the same figure.

  • Test the correctness of the function by using any other input function.

### Your Code

8.5.3. Learning objectives#

Numpy (Numerical Python) is a library designed for performing scientific computing in Python.

In this notebook, we will introduce numpy arrays, a data structure introduced in numpy for working with vectors and matrices. We will explore how to create them, how to manipulate them, and how to use them for efficient numerical calculations using numpy functions.

After completing this notebook, you are able to:

  • create (multidimensional) numpy arrays from a list of numbers

  • use indexing and slicing with (multidimensional) numpy arrays

  • iterate over a numpy array

  • perform mathematical operations on numpy arrays

  • use functions for creating arrays (eg. np.zeros(), np.linspace(), np.random.random()

  • use numpy functions for vectorized calculations

  • to demonstrate the speed increase of vectorized calculations using time()

import matplotlib.pyplot as plt
from IPython.lib.display import YouTubeVideo
YouTubeVideo('xECXZ3tyONo')

8.5.4. Numpy Arrays#

You have encountered numpy arrays in previous notebooks. If you don’t remember, open the first notebook and read the section where we introduce numpy arrays.

To use numpy arrays, we first need to import the numpy library, which we will do using the shortened name “np”:

import numpy as np

Now that we have imported numpy, we can use functions in numpy to create a numpy array. A simple way to do this is to use the function np.array() to make a numpy array from a comma-separated list of numbers in square brackets:

a = np.array([1,2,3,4,5])
print(a)
[1 2 3 4 5]

Note that numpy does not make a distinction between row vectors and column vectors: they are just vectors.

Look at the cell below, what is the difference with the cell above?

a = np.array([1,2,3,4,5],dtype=float)
print(a)
[1. 2. 3. 4. 5.]

8.5.4.1. Indexing arrays (and counting from zero)#

One useful thing about arrays is that you can access the elements of the array using square brackets:

a[n] will give you the n-th element of the array a.

This process of extracting a single element from the array is called indexing.

Note that here we encounter for the first time what is known as the python counting from zero convention. What is the counting from zero convention? In the example above, we created an array:

a = np.array([1,2,3,4,5])

The first element of a is 1. You might think that if you want to access the first element of a, you would use the notation a[1]. Right?

Let’s try it:

print(a[1])
2.0

WRONG! Why? Because the makers of Python decided to start counting from zero: the first element of a tuple a is actually a[0].

(This is a long-standing discussion among computer scientists, and the convention is different in many different languages. There are advantages and disadvantages of both, and even essays written about it…but in any case, Python chose to start arrays at zero.)

This also helps better understand the range() function: for example, to loop over all the elements in a, I can use this code:

for i in range(len(a)):
    n = a[i]
    print('a[%d] is %d' % (i,n))
a[0] is 1
a[1] is 2
a[2] is 3
a[3] is 4
a[4] is 5

Here the len function returns the length of the array a. As we saw before, Python has very smart for loops that can automatically iterate over many types of objects, which means we can also print out all the elements of our array like this:

for n in a:
    print(n)
1.0
2.0
3.0
4.0
5.0

In Python, if you try to index beyond the end of the array, you will get an error:

a[5]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 a[5]

IndexError: index 5 is out of bounds for axis 0 with size 5

(Remember: indexing starts at zero!)

Python also has a handy feature: negative indices count backwards from the end, and index -1 corresponds to the last element in the array!

a[-1]
a[-2]

We can also use indexing to change the values of elements in our array:

print(a)
a[2] = -1
print(a)

Exercise 4.1
Set the first three, and the last two, entries of the following array to zero:

a = np.array([1,2,3,4,5,6,7,8,9,10,11,32,55,78,22,99,55,33.2,55.77,99,101.3])

#some code to set the first three and last two entries to zero 

print(a)

8.5.4.2. Slicing numpy arrays#

Numpy arrays also support a special type of indexing called “slicing” that does not just return a single element of an array, but instead returns a whole part of array.

To do this, I put not just a single number inside my square brackets, but instead two numbers, separated by a colon :

a[n:m] will return a new tuple that consist of all the elements in a, starting at element n and ending at element m-1.

Let’s look at a concrete example:

a = np.array(range(10))
print(a)
print(a[0:5])

The notation a[0:5] has “sliced” out the first five elements of the array.

With slicing, you can also leave off either n or m from the slice: if leave off n it will default to n=0, and if you leave off m, it will default to the end of the array (also the same as m=-1 in Python indexing):

print(a[:5])
print(a[5:])

Also handy: you can can have Python slice an array with a “step” size that is more than one by adding another : and a number after that. Find out its operation using:

print(a[0:10:2])

Fun: you can also use negative steps:

print(a[-1:-11:-1])

And finally, unlike indexing, Python is a bit lenient (merciful) if you slice off the end of an array:

print(a[0:20])

Exercise 4.2
Slicing can also be used to set multiple values in an array at the same time. Use slicing to set first 10 entries of the array below to zero in one line of code.

a = np.array(range(20))+1
print(a)
#some code that sets the first 10 entries to zero

print(a)

Exercise 4.2*
The same exercise as 4.1 but now code in a smarter way.

#your code here 

8.5.4.3. Mathematical operations on arrays#

An advantage of using numpy arrays for scientific computing is the way they behave under mathematical operations. In particular, they very often do exactly what we would want them to do if they were a vector:

a = np.array([1,2,3,4,5])
print(2*a)
print(a+a)
print(a+1)
print(a-a)
print(a/2)
print(a**2)

What about if I multiply two vectors together?

In mathematics, if I multiply two vectors, what I get depends on if I use the “dot product” or the “outer product” for my multiplication:

https://en.wikipedia.org/wiki/Row_and_column_vectors#Operations

The “dot product” corresponds to multiplying a column vector by a row vector to produce a single number. The “outer product” (also called the “tensor product”) corresponds to multiplying the column vector by the row vector to make a matrix.

Question: If I type a*a, or more generally a*b, does Python use the inner or outer product?

It turns out: it uses neither! In Python, the notation a*a produces what is commonly called the “element-wise” product: specifically,

a*b = [a[0]*b[0] a[1]*b[1] a[2]*b[2] ...]

(Mathematically, this has a fancy name called the Hadamard product, but as you can see, despite the fancy name, it’s actually very simple…)

We can see this in action here:

print(a*a)

What if I actually want the dot product or the outer product? For that, Python has functions np.dot() and np.outer():

print(np.dot(a,a))
print(np.outer(a,a))

A useful mathematical operator is the cross product (https://en.wikipedia.org/wiki/Cross_product) where one calculates a vector which is perpendicular to vectors x and y.

x = np.array([1,0,0])
y = np.array([0,1,0])
z = np.cross(x,y)

Pretty much all operators work with numpy arrays, even comparison operators, which can sometimes be very handy:

print(a)
print(a>3)

Exercise 4.3
Generate a sequence of the first 20 powers of 2 in a numpy array (starting at \(2^0\)).

Your output should be an array \([2^0, 2^1, 2^2, 2^3, ...]\).

(Hint: Start with a numpy array created using an appropriate range function that makes an array [0,1,2,3,…])

# your code that makes the desired array

8.5.5. Functions for creating numpy arrays#

In numpy, there are also several handy functions for automatically creating arrays, see https://numpy.org/devdocs/reference/routines.array-creation.html. We provide some examples:

a = np.zeros(5)
print('zeros', a)
a = np.ones(5)
print('ones', a)
a = np.eye(5)
print('diagonal')
print(a)
a = np.tri(5)
print('ones below and on diagonal')
print(a)

8.5.5.1. np.linspace#

To automatically generate an array with linearly increasing values you can use np.linspace():

https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html

np.linspace takes three arguments: the starting number, the ending number, and the number of points.

This is a bit like the range function we saw before, but allows you to pick the total number of points, automatically calculating the (non-integer) step size you need:

a = np.linspace(0,20,40)
print(a)
print()
print("Length is: ", len(a))
print("Step size is: ", a[1]-a[0])

Note that if we wanted to have a step size of exactly 0.5, we need a total of 41 points:

a = np.linspace(0,20,41)
print(a)
print()
print("Length is: ", len(a))
print("Step size is: ", a[1]-a[0])

Exercise 4.4
Generate an array that runs from -2 to 1 with 20 points using linspace.

a = your code
print(a)

8.5.5.2. np.arange()#

If we want to have more control on the exact spacing, we can use the np.arange() function. It is like range(), asking you for the start, stop, and step size:

a = np.arange(0,20,0.5)
print(a)
print()
print("Length is: ", len(a))
print("Step size is: ", a[1]-a[0])

Here, we already see a small quirk of arange: the stopcondition. It does not include this specified number (rather < than <=). If we want to get a range that stops at 20.0, we need to make the stop point any number a bit bigger than 20 (but smaller than our step size):

a = np.arange(0,20.00000001,0.5)
print(a)
print()
print("Length is: ", len(a))
print("Step size is: ", a[1]-a[0])

For this reason, I do not find myself using np.arange() very often, and mostly use np.linspace(). There are also several other useful functions, such as np.geomspace(), which produces geometrically spaced points (such that they are evenly spaced on a log scale).

Exercise 4.5
Generate a numpy array that has a first element with value 60 and last element 50 and takes steps of -0.5 between the values.

a = your code
print(a)

8.5.5.3. Random numbers#

Numpy can also generate arrays of random numbers:

a = np.random.random(40)
print(a)

This will generate uniform random numbers on the range of 0 to 1, but there are also several other random number generator functions that can make normally distributed random numbers, or random integers, and more.

Exercise 4.6
Generate a numpy array, rounded_grades that contains 300 random grades that have a distribution of a bell-shaped curve that might represent the final grades of the students in this course, with an average grade of 7.5 and a standard deviation of 1. Make sure your grades are rounded to a half point.

(You may find it useful have to look at the help of the np.random.normal() function.

(Because of the properties of a normal distribution a small pecentage of the grades may be above a 10, you may leave this for now.)

#Your code here that results in a numpy array rounded_grades
...some code...
print(rounded_grades)

Exercise 4.7
There are various ways in which you can analyse your grade distribution.

You can plot a histogram of the grade distribution. Make sure that the width of our histogram bars is 0.5, corresponding to our rounded grades.

import matplotlib.pyplot as plt

... some code ...

8.5.5.4. Multidimensional arrays (matrices)#

We have looked at 1D arrays especially. However, numpy also supports two-dimensional (or N-dimensional) numpy arrays, that can represent matrices. To make a 2D numpy array, you can use the zeros() function, for example, but with a two-entry list of numbers specifying the size N and M of the matrix:

m = np.zeros([10,10])
print(m)

For two dimensional matrices, the usual function len() is not enough to tell us about the shape of our matrix. Instead, we can use a property of the numpy matrix itself called its shape:

print(len(m))
print(m.shape)

Indexing two dimensional arrays works by using commas inside square brackets to specify the index of the first and second dimensions:

a = np.array(range(1,6))
m = np.outer(a,a)
print("Initial matrix:")
print(m)

# First index in the row number (counting from zero), second index in the column number
m[2,3] = -1 
print("\nAfter changing entry [2,3] to -1:")
print(m)

You can also use slicing to to assign values to an array from a vector, which can be a handy way to enter a matrix by hand:

m = np.zeros([3,3])
m[0,:] = [1,2,3]
m[1,:] = [4,5,6]
m[2,:] = [7,8,9]
print(m)

Similarly, slicing also can be used to extract rows, columns, or blocks of the matrix:

# A row
print(m[1,:])
# A column
print(m[:,1])
# Extract a block as a sub-matrix
print(m[1:,1:])

There are several functions for making matrices which you may find useful someday:

https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html

including this one which is used often:

# The identity matrix
print(np.eye(10))
# A band diagonal matrix
print(np.eye(10,k=-1))

Exercise 4.8
Use Python to calculate the following matrix multiplication:

\[\begin{split} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 0 \\ 3 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \end{split}\]

To perform this multiplication, you will need to use the matmul or dot routine of numpy:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

(For the nerds: do these two matrices commute?)

(For the real nerds: have your program check if they commute and inform the user!)

m1 = np.zeros([3,3])
some code to fill the matrix
m2 = np.zeros([3,3])
some code to fill the matrix

# Check the matrices  
print(m1)
print()
print(m2)
print()

product = some code
print(product)

do nerd stuff if you want...

We can do seemingly smart things using these ‘special’ vectors and matrices. For instance, what if we want to take the sum of all values in an array? We can use the dotproduct: $\( \begin{bmatrix} 1 & 1 & 1 \\ \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix} = 6 \)$

ones = np.ones(3)
array = np.linspace(1,3,3)
print(np.dot(one_diag,array))

Moreover, if we want to create a cummulative sum of values in an array, we can use the tri matrix:

\[\begin{split} \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1\cdot1 + 0\cdot2 + 0\cdot3 \\ 1\cdot1 + 1\cdot2 + 0\cdot3 \\ 1\cdot1 + 1\cdot2 + 1\cdot3 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 6 \end{bmatrix} \end{split}\]
tri = np.tri(3)
print(np.dot(tri,array))

Another way to do so is to use a for-loop:

a = np.linspace(1,3,3)
sum_a = 0
cumsum = np.array([])

for x in a:
    sum_a += x
    cumsum = np.append(cumsum,sum_a)
    
print("The sum is", sum_a)
print("The cummulative sum is", cumsum)

Exercise 4.9

Below we have a set of repeated measurements. We want to understand how the average value of our repeated measurements evaluates over time, that is, how does the average value change as function of \(N\) repeated measurements? To investigate this: calculate and plot the average value as function of the number of measurements.

measurements = np.random.normal(5,1,20)
print(measurements)

#your code here

But wait… is there not a more direct way to do all of the above… Sure! We can make use of Numpy functions.

8.5.6. Numpy functions#

For many common (mathematical) operations, fuctions exists. For instance, we can calculate the average value of our measurements of above using a for loop, but also using the function np.average:

sum_meas = 0

for measurement in measurements:
    sum_meas += measurement

print(sum_meas/len(measurements))
print(np.average(measurements))

This is very handy: it saves us loads of thinking and typing! From the function name, it is also easy to understand what you are doing, making the code clearer and easier to read. However, the purpose of numpy functions is not only to save lots of typing: they also can often perform calculations MUCH faster than if you do program the calculation yourself with a for loop, as we will see in the next section.

Python also has many other useful functions for performing calculations using arrays:

# Calculate the standard deviation
print(np.std(measurements,ddof=1))
# The square root
print(np.sqrt(a))
# Numpy also has max and min, here is an example of min
a = np.linspace(-10,10,100)
print(np.min(a**2))

Good question for you to think about: why is the minimum value not zero? And what would I have to change above to get the code to return zero?

In addition to finding the minimum value in a vector, the function argmin can tell you where (what index number) the minimum is:

# Find the index number of the minimum of the array
i = np.argmin(a**2)
print(i)
print((a**2)[i])

Note also here that we used round brackets () around the a**2 in the print statement to be able to then index the resulting array a**2 array using square brackets [].

You can find the full list of mathematical numpy functions on the documentation website:

https://docs.scipy.org/doc/numpy/reference/routines.math.html

and the full list of all functions in the reference guide:

https://docs.scipy.org/doc/numpy/reference/index.html

Exercise 4.10

a) Create an array with values ranging from 14 to 42, skipping the odd numbers.

b) Make an array x that runs from 0 to 4 with 20 points, and calculate an array y whose entries are equal to the square root of the entries in x.

your code to make the requested arrays x and y
print(y)

8.5.6.1. Vectorisation” and fast code with numpy functions#

In the first example above, we showed two ways of calculating an average: one using a for loop, and one using the numpy function.

Functionally, they are equivalent: they do exactly the same thing.

A curious feature of Python is that if you use functions instead of coding loops yourself, often things are MUCH MUCH faster.

To show this quantitatively, we will use the time library to calculate the time it takes to find the average of a pretty big array using both techniques:

# The time() function from the time library will return a floating point number representing 
# the number of seconds since January 1, 1970, 00:00:00 (UTC), with millisecond or even microsecond
# precision
# 
# We will use this to make a note of the starting time and the ending time, 
# and then print out the time difference 
from time import time

# A pretty big array, 50 million random numbers
a = np.random.random(int(50e6))

# Set timer
t1 = time()

# Caluclate the average
avg = 0
for x in a:
    avg += x    
avg /= len(a)
t2 = time()
t_forloop = t2-t1
print("The 'for' loop took %.3f seconds" % (t2-t1))

t1 = time()
avg = np.average(a)
t2 = time()
t_np = t2-t1
print("Numpy took %.3f seconds" % (t_np))

# Now let's compare them
print("\nNumpy was %.1f times faster!" % (t_forloop/t_np))

Why is numpy so much faster? The reason is that Python is an interpreted language. In each of the steps of the for loop, the Python kernel reads in the next step it has to do, translates that into an instruction for your computer processor, asks the computer to perform the step, gets the result back, reads in the next step, translates that into a processor instruction, sends that as an instruction to the computer processor, etc, etc.

If we did the same test in a compiled programing language like C, there would be no difference if we used a library function or if we wrote our own for loop.

When you use smart functions in Python libraries, like (many of) those in numpy, numpy will actually use an external library compiled in a language like C or Fortran that is able to send all of the calculation in one step to your computer processor, and in one step, get all the data back. This makes Python nearly as fast as a compiled language like C or Fortran, as long as you are smart in how you use it and avoid having “manual” for loops for large or long calculations.

(For small calculations, Python coded for loops are perfectly fine and very handy!)

In the language of interpreted programmers, finding smart ways of getting what you need done using “compiled library functions” is often referred to as vectorisation.

Note that even normal mathematical operators are actually “vectorized functions” when they operate:

# This is actually a vectorized 'for' loop, it involves multiplying 50 million numbers by 5
b = 5*a

Here is a nice example of a vectorized way of counting the number of times the number ‘5’ occurs in a random sample of 100 integers between 0 and 20:

nums = np.random.randint(0,21,100)
print("There are %d fives" % np.sum(nums == 5))

To see how this works, we can look at the intermediate steps:

nums = np.random.randint(0,21,100)
print(nums)
print(nums == 5)
print("There are %d fives" % np.sum(nums == 5))

Note that in this case, np.sum() will convert the bool value True into 1 and False into 0 for calculating the sum, according the the standard convertion of bool types to int types. You can see this in action if you want using the function astype() that is built into numpy arrays:

print(nums == 5)
print((nums == 5).astype('int'))

Another neat feature that numpy has is that is can ‘vectorize’ normal Python functions so that they can take numpy functions and make them a bit faster (10-20%). This is done using the np.frompyfunc function. An example is given below.

def fac(N):
    if (N == 1) or (N == 0):
        return 1
    return N * fac(N-1)

#As you can see applying the following array does not work
N = np.arange(9)

try:
    print(fac(N))
    print("This works")
except:
    print("This does not work \n")
    
#Now if we make this a numpy function is will work 
#(the numbers in the frompyfunc represent the input and output values)

numpy_fac = np.frompyfunc(fac,1,1)


try:
    print(numpy_fac(N))
    print("This works")
except:
    print("This does not work")

8.5.7. Monte Carlo Simulation#

A commonly used method for numerical calculations is a Monte Carlo simulation (named after the (in)famous casino district). Simply stated, the method consists of generating a great number of random points and checking afterwards which points satisfy the boundary conditions of the calculation.

For example: to calculate the area of a circle with radius one, one can generate a great number of random points within a square of size 1 by 1, and check for each points whether they fall within the circle \(r < 1\). The area of the circle can then be obtained by multiplying the area of the square with the number of points that are within the circle divided by the total number of points.

This may sound a bit cumbersome, and of course in this example calculating \(\pi \cdot r^2\) is a lot faster. There are however many applications where the Monte Carlo method proves to be very useful; for example in calculating integrals that cannot (easily) be solved analytically.

In the next exercises, you are going to calculate the integral of the function \(f(x) = e^{x^2}\) in the interval [0,1] using the Monte Carlo method.

Exercise 4.11

First, make a plot to get an idea of what the function looks like.

What is the maximum value of \(f(x)\) in the interval [0,1]?

Make a (rough) estimate of the value of the integral by looking a the graph

import matplotlib.pyplot as plt

#First we define our function
def f(x):
    return np.exp(x**2)

Now we are going to generate random samples in a square that has an area that is greater than the integral that we want to calculate.

For this, we will use the np.random.uniform() function, which generates random float in the interval [a,b) - notice, this is an half-open interval so b is not included.

Moreover, it is wise to define the area by four points: \(x_a\), \(x_b\), \(y_a\) and \(y_b\).

Exercise 4.12

Complete the code below to generate a random sample of y-values in an appropriate range.

N = int(1e6)   #number of points that we are going to use in our calculation
x_a = 
x_b = 
y_a = 
y_b = 
#Generate samples:
random_x = np.random.uniform( , ,size=N)   
random_y = np.random.uniform( , ,size=N)

Next, we want to determine the number of points in our sample that fall within the area under the line f(x).

Exercise 4.13

Determine the number of points that satisfy this condition using a for loop and calculate the value of the integral.

from time import time

t1 = time()

### your code ###



solution_integral = 

print('The solution of the integral is %.6f' %(solution_integral))            
print('Time for calculation: %.3f s' %(time()-t1))
1.464291
The solution of the integral is 1.464291
Time for calculation: 5.187 s

We have a solution of the integral (much easier than on paper, right?)! But how do we know how accurate our solution is? First, compare your result to your estimate from the previous exercise. Does it make sense?

To make a substantiated claim about the accuracy of our solution we have to determine the uncertainty in our result. As we are basically performing a count, Poisson statistics tells us that the uncertainty in the counted number of points that satisfy the condition is the square root of the counted number: \(u(N_{counted}) = \sqrt{N_{counted}}\). We can then determine the uncertainty in the area using the calculus approach.

Let’s check our solution by comparing it to the answer given by another numerical method from the scipy library.

from scipy.integrate import quad

Area = np.abs(x_b-x_a)*np.abs(y_b-y_a)
err_solution = Area*np.sqrt(s)/N          #estimated error of our solution, which is the area of our random uniform box divided by the number of points used in the calculation times the poisson uncertainty

scipy_solution = quad(f,0,1)       #calculating integral using quad function from scipy, returns: value of integral and estimated error

abs_difference = np.abs(solution_integral-scipy_solution[0]) #determine absolute difference between solutions


assert abs_difference <= 2*np.sqrt(err_solution**2 + scipy_solution[1]**2), 'The results are not in agreement'  #check if the values are in agreement 

print('Our solution: %.3f +/- %.3f' %(solution_integral, err_solution))
print('Scipy solution: %.14f +/- %.14f' %(scipy_solution[0], scipy_solution[1]))
print('Difference between the solutions: %.6f' %(abs_difference))
Our solution: 1.464 +/- 0.002
Scipy solution: 1.46265174590718 +/- 0.00000000000002
Difference between the solutions: 0.001639

Looks pretty good!

Our method gives a reasonable answer, but as you can see the uncertainty is relatively high…If we want to improve our result we should use more points to lower the uncertainty (which scales with \(\sqrt{N}\)). However, the for loop we used is already quite slow. More points would mean an annoyingly long calculation.

We should therefore try to avoid the use of a for loop and instead use a comparison operator to speed up the calculation.

Exercise 4.14

Again, complete the code below and calculate the integral, this time without using a for loop

t1 = time()

s = np.sum(  <= )

solution_integral = 

print('The solution of the integral is %.6f' %(solution_integral))            
print('Time for calculation: %.3f s' %(time()-t1))

This is much faster!

Now lastly we want to plot our random generated points that are within the area underneath f(x) and those that are not in the same graph with a different colour. For this, we need the actual values of the points that satisfy the condition.

As you have seen before, a comparison operation on a numpy array returns a type boolean array:

array = np.array([4,2,6,8,5,4])

boolean_array = array > 4

print(boolean_array)

We can subsequently use this boolean array to get the values of the array that satisfy the condition:

new_array = array[boolean_array]

print(new_array)

Or, more simply:

new_array = array[array > 4]

print(new_array)

Exercise 4.15

Make a figure in which you plot:

  • the function f(x)

  • the random points that are within the area under the line f(x)

  • the random points that are outside this area (in a different colour)

We will generate new random samples with fewer points, otherwise we cannot distinguish them.

Don’t forget to label your axes!

random_x = np.random.uniform( , ,1000)
random_y = np.random.uniform( , ,1000)

#your code

Exercise 4.16

One way to ‘improve’ the quality of your data, making it less noisy, is the use of a moving average filter. The moving average filter calculates the average value of N elements, N often being 3 or 5. This smooth out any abrupt changes and allows a better focus on long term trends. Of course you will lose two elements of the entire dataset (think yourselves why).

The filtered data is stored in an array: \(F(i) = \frac{p(i)+p(i+1)+p(i+2)}{3}\), looped over all elements \(p(i)\). More precise: \(F(i)=\frac{1}{k}\sum_{i}^{i+k-1}p(i)\)

Z = np.arange(1,20)
#def moving_average_1(p, k):
#     F = np.empty(len(p)-(k-1))

print(Z)
print(moving_average_1(Z,3))

8.5.8. Solutions to exercises#

Exercise 4.1

a = np.array([1,2,3,4,5,6,7,8,9,10,11,32,55,78,22,99,55,33.2,55.77,99,101.3])

#some code to set the first three and last two entries to zero 

# A bit tricky, but these work
a[0] = 0
a[1] = 0
a[2] = 0

# We could count forwards (a[19] = 0 and a[20] = 0), but is much 
# easier to count backwards from the  end
a[-2] = 0
a[-1] = 0
print(a)

Exercise 4.2:

a = np.array(range(20))+1
print(a)
a[0:10]  =  0
print(a)

Exercise 4.2\(*\):

a = np.array([1,2,3,4,5,6,7,8,9,10,11,32,55,78,22,99,55,33.2,55.77,99,101.3])
a[-2:]=0
a[:2]=0
print(a)

Exercise 4.3:

n = np.array(range(21))
out = 2**n
print(out)

Exercise 4.4:

a = np.linspace(-2,1,20)
print(a)

Exercise 4.5:

a = np.arange(60,49.9,-0.5)
print(a)

Exercise 4.6:

raw = np.random.normal(7.5,1,300)
rounded_grades = np.round(raw*2)/2
print(rounded_grades)

Exercise 4.7:

import matplotlib.pyplot as plt
bins = np.arange(np.min(rounded_grades),np.max(rounded_grades)+0.5,0.5)

plt.figure()
plt.hist(rounded_grades, bins=bins)
plt.xlabel('Grade')
plt.ylabel('#')
plt.show()

Exercise 4.8:

m1 = np.zeros([3,3])
m1[0,:] = np.array([1,1,0])
m1[1,:] = np.array([0,2,1]) 
m1[2,:] = np.array([1,0,1]) 
m2 = np.zeros([3,3])
m2[:,0] = np.array([1,3,1])
m2[:,1] = np.array([3,1,1]) 
m2[:,2] = np.array([0,1,1]) 

# Check the matrices  
print(m1)
print()
print(m2)
print()

product = np.matmul(m1,m2)
print(product)

Exercise 4.9:

import numpy as np
import matplotlib.pyplot as plt

measurements = np.random.normal(5,1,20)
print(measurements)

average_n = np.cumsum(measurements)/np.arange(1,len(measurements)+1)

plt.figure()
plt.plot(np.arange(1,len(measurements)+1), average_n)
plt.xlabel('average')
plt.ylabel('n')
plt.show()

Exercise 4.10:

import numpy as np

array_skip = np.arange(14,44,2)
print(array_skip)


x = np.linspace(0,4,20)
y = np.sqrt(x)
print(y)

Excercise 4.11

import matplotlib.pyplot as plt

#First we define our function
def f(x):
    return np.exp(x**2)

x = np.linspace(0,1,1000)         #generate an array with x values

#make the figure
plt.figure(figsize=(6,4))
plt.plot(x,f(x))
plt.xlim(0,1)
plt.ylim(0)
plt.ylabel('f(x)')
plt.xlabel('x')
plt.show()

print('Maximum value of f(x) in the interval [0,1] is %.3f' %(np.max(f(x))))

The area under the graph is roughly equal to half of the total area of the square 1x3, i.e. 1.5.

Exercise 4.12

N = int(1e6)   #number of points that we are going to use in our calculation
x_a = 0
x_b = 1
y_a = 0
y_b = 3
#Generating our samples:
random_x = np.random.uniform(x_a,x_b,size=N)   
random_y = np.random.uniform(y_a,y_b,size=N)

Exercise 4.13

from time import time

t1 = time()

s = 0

for i in range(len(random_x)):
    if random_y[i] <= np.exp(random_x[i]**2):
        s+= 1

solution_integral = (x_b-x_a)*(y_b-y_a)*s/N

print('The solution of the integral is %.6f' %(solution_integral))            
print('Time for calculation: %.3f s' %(time()-t1))

Exercise 4.14

t1 = time()

s = np.sum(random_y <= np.exp(random_x**2))

solution_integral = 3*s/N

print('The solution of the integral is %.6f' %(solution_integral))            
print('Time for calculation: %.3f s' %(time()-t1))

Exercise 4.15

random_x = np.random.uniform(0,1,1000)
random_y = np.random.uniform(0,3,1000)

points_within_area = np.array([random_x[random_y <= np.exp(random_x**2)],random_y[random_y <= np.exp(random_x**2)]])
points_outside_area = np.array([random_x[random_y > np.exp(random_x**2)],random_y[random_y > np.exp(random_x**2)]])

plt.figure(figsize=(6,4))
plt.plot(points_outside_area[0],points_outside_area[1], '.',c='red')
plt.plot(points_within_area[0],points_within_area[1], '.', c='green')
plt.plot(x,f(x),linewidth=3,label='f(x)')
plt.xlim(0,1)
plt.ylim(0,3)
plt.ylabel('f(x)')
plt.xlabel('x')
plt.legend()
plt.show()

Exercise 4.16

def moving_avg(p, k):
    
    F = np.empty(len(p)-(k-1))
    
    for i in range(len(p)-(k-1)):
        F[i] = (1/k)*np.sum(p[i:i+k])
        
    return F

Z = np.arange(4,20)
print(Z)
print(moving_average(Z, 3))