#  Module 7: NumPy Arrays
## Chapter 7 from the Alex DeCaria textbook: 'NumPy Arrays'

The NumPy module, which comes preinstalled with many Python distributions, is designed to work with large data sets, particularly those with multiple dimensions. However, unlike Python lists and tuples, a NumPy array cannot hold multiple data types: every element of a given array must be the same type (all floating-point numbers, all integers, all strings, etc.). Despite this restriction, working with NumPy arrays is much more computationally efficient than working with lists/tuples. In this lecture, we will:
- Create arrays
- Review common NumPy data types
- Go over useful array functions
- Discuss array indexing and subsetting
- Learn how to reshape arrays
- Combine logical operators with arrays

**Note:** There are *A LOT* of things you can do with NumPy arrays, so in the interest of time, we will not be able to go over every function/trick related to them. *It will be up to you* to read the DeCaria book and review other online resources! This lecture is meant to give you the tools to work with basic NumPy arrays.

**Before starting:** Make sure that you open up a Jupyter notebook session using OnDemand so you can interactively follow along with today's lecture! Also, be sure to copy this script into your atmos5340/module_7 subdirectory!

<br><br>


# Creating an array

There are many ways to create an array using NumPy. The simplest is to call NumPy's `array` function and give it a list/tuple as its input. Before starting, you must load the NumPy module:

In [1]:
import numpy as np

Then, we can create a 1D array by doing the following:

In [2]:
a = np.array([1,5,3,-6,-2,4,-9,2,2])

The array that we created will be all integers, since we only supplied it with integer values. 

In [3]:
type(a)

numpy.ndarray

In [4]:
type(a[0])

numpy.int64

However, we can also create an array and predefine the data type using the `dtype` argument:

In [5]:
a = np.array([1,5,3,-6,-2,4,-9,2,2],dtype=np.float64)
print(type(a[0]))

<class 'numpy.float64'>


Here, you can see that the first element in array a is now a floating-point number.

<br>
Listed below are the data types most commonly used within NumPy arrays:

>- `np.float64`: Double precision (64 bit) floating point
>- `np.int64`: 64 bit integer
>- `np.complex128`: Complex number, with a real and imaginary part that are each 64 bits
>- `np.bool_`: Boolean (True/False) data type. Note the underscore `_`. 
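
The sketch below illustrates how NumPy settles on a single data type per array when the inputs are mixed. The variable names are made up for illustration, and the `astype()` method is not covered above; it is shown here only as one common way to convert an existing array to another type.

```python
import numpy as np

# Mixing integers and floats: NumPy upcasts everything to float64
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)      # float64

# Mixing numbers and strings: every element is stored as a string
mixed_text = np.array([1, "a"])
print(mixed_text.dtype.kind)    # U (unicode string)

# astype() returns a converted copy; float -> int drops the decimal part
as_int = mixed_numbers.astype(np.int64)
print(as_int)                   # [1 2 3]
```

Note that converting floats to integers truncates rather than rounds, so it is worth checking the `dtype` before doing arithmetic on an array.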

<br>

You can also create 2-D arrays (or other multidimensional arrays) by passing nested lists/tuples as the input to the `np.array` function...


In [6]:
a = np.array([[2,5],[1,-4]])
print(a)

[[ 2  5]
 [ 1 -4]]


Finally, you can also create arrays that are filled with a single value:

In [7]:
a_zero = np.zeros((10,10))
print(a_zero)

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


or

In [8]:
a_nan = np.empty((10,10))
a_nan[:] = np.nan 
print(a_nan)

[[nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]]


Personally, I like using NaN arrays when I want to fill in the data later on...
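
As a possible shortcut for the two-step `np.empty` approach, NumPy also provides `np.full`, which fills an array with a chosen value in one call, and `np.ones`, which mirrors `np.zeros`. These two functions are not used elsewhere in this lecture; the small shapes and variable names below are just for illustration.

```python
import numpy as np

# np.full(shape, fill_value) builds an array filled with one value
a_nan_small = np.full((2, 3), np.nan)
print(a_nan_small)

# np.ones works like np.zeros, but fills the array with 1.0
a_ones = np.ones((2, 3))
print(a_ones)
```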

<br>

# Sequential arrays

There are 3 functions that are primarily used to generate sequential arrays in Python. These functions are `arange()`, `linspace()`, and `logspace()`. The `arange()` function behaves very similarly to the `range()` function in Python:


In [9]:
seq_array = np.arange(0,11,2)
print(seq_array)

[ 0  2  4  6  8 10]


In [10]:
seq_array = np.arange(11,0,-2)
print(seq_array)

[11  9  7  5  3  1]


And you get the idea... The data type of the array is determined by the input. So if all the inputs are integers, it will be an integer-type data array. Note that you can define the dtype argument when using the `np.arange` function.
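
A short sketch of how the inputs and the `dtype` argument affect the result of `np.arange`. The variable names are illustrative. The exact integer type that is printed can depend on the platform, so no specific output is claimed for it here.

```python
import numpy as np

int_seq = np.arange(0, 5)                      # integer inputs -> integer array
float_seq = np.arange(0, 5, dtype=np.float64)  # same values, forced to float
step_seq = np.arange(0, 1, 0.25)               # a float step gives a float array

print(int_seq.dtype, float_seq.dtype, step_seq.dtype)
print(step_seq)   # values 0, 0.25, 0.5, 0.75 (the stop value 1 is excluded)
```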


<br>

The `linspace()` function allows the user to specify a beginning and end value and the number of points to create:


In [11]:
seq_array = np.linspace(1,10,30)
print(seq_array)
len(seq_array)

[ 1.          1.31034483  1.62068966  1.93103448  2.24137931  2.55172414
  2.86206897  3.17241379  3.48275862  3.79310345  4.10344828  4.4137931
  4.72413793  5.03448276  5.34482759  5.65517241  5.96551724  6.27586207
  6.5862069   6.89655172  7.20689655  7.51724138  7.82758621  8.13793103
  8.44827586  8.75862069  9.06896552  9.37931034  9.68965517 10.        ]


30

<br>

Finally, the `np.logspace()` function works similarly to the `linspace()` function, but here, the values are spaced logarithmically. See the DeCaria text for more examples of this!
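
As a minimal illustration of the difference, the sketch below builds four points with each function. With `np.logspace`, the start and stop arguments are exponents of 10 (the default base), so the points are evenly spaced in the exponent rather than in the value.

```python
import numpy as np

# Four points from 10**0 to 10**3, evenly spaced in the exponent
log_array = np.logspace(0, 3, 4)
print(log_array)   # [   1.   10.  100. 1000.]

# Four points from 1 to 1000, evenly spaced in value
lin_array = np.linspace(1, 1000, 4)
print(lin_array)   # [   1.  334.  667. 1000.]
```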


<br>

# Indexing and subsetting arrays

Specific elements in a NumPy array can be accessed by indexing or subsetting, similar to lists/tuples. For multidimensional arrays, the different dimensions are separated by commas. The first index often refers to the row of the array, while the second index refers to the column. For 3 dimensional arrays, the 3rd index would represent the height, and so on...

⚠️ Note: *Technically*, subsetting an array with a slice and saving it to a variable does not create a new copy of the data; the new variable is essentially a pointer to the original array. NumPy calls this a *view* (it is sometimes loosely described as a *shallow* copy). This saves memory, which can be important when working with large data sets, but it also means that changing values through the subset changes the original array as well. For the purposes of this class, this is mostly just good to know. If you need an independent copy of an array, you can use the `np.copy()` function, but this is rarely needed.
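
To make the note above concrete, here is a small sketch (with made-up variable names) showing that a sliced subset shares memory with the original array, while `np.copy()` produces an independent array.

```python
import numpy as np

original = np.arange(0, 11, 2)         # [ 0  2  4  6  8 10]

subset = original[1:4]                 # shares memory with original
subset[0] = -99                        # this also changes original[1]
print(original)                        # [  0 -99   4   6   8  10]

independent = np.copy(original[1:4])   # a real, separate copy
independent[0] = 500                   # original is left untouched
print(original)                        # [  0 -99   4   6   8  10]
```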

Here are some examples of how to index arrays, which is similar to indexing tuples/lists. For the most part, this should be review... ;-)


In [12]:
seq_array = np.arange(0,11,2)
print(seq_array)

[ 0  2  4  6  8 10]


In [13]:
print(seq_array[3])

6


In [14]:
print(seq_array[3:])

[ 6  8 10]


In [15]:
print(seq_array[-1])

10


<br>

And some striding examples...

In [16]:
seq_array = np.arange(0,21,1) 
print(seq_array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]


In [17]:
print(seq_array[0:12:2])

[ 0  2  4  6  8 10]


In [18]:
print(seq_array[::-1])

[20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0]


In [19]:
print(seq_array[::-2])

[20 18 16 14 12 10  8  6  4  2  0]


And you get the idea of striding...!

<br>

Finally, it is also possible to index with lists, which can be very useful:

In [20]:
my_list = [0,2,4,6,8,10]
print(seq_array[my_list])

[ 0  2  4  6  8 10]


This is useful, especially when utilizing the `where()` function. More on this later!

<br>

Indexing multidimensional arrays is very similar to indexing 1-D arrays, except that there are 2 or more dimensions that you need to consider. For example, let's say I wanted to grab the first element of a 2D array (upper left corner or the '1'):


In [21]:
array_2D = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(array_2D)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [22]:
print(array_2D[0,0])

1


<br>

# Do it yourself #1

1) What if I want to grab the top middle index (the '2')?

2) What if I wanted to grab the entire middle row (4, 5, 6)?

3) What if I wanted to subset for the numbers 5, 6, 7, 8?

4) What if we wanted to grab the left middle value of our array (the '4')?


<br>

# Broadcasting arrays

After defining an array, we can use a technique referred to as *broadcasting* to perform mathematical expressions or other Python functions. Think of this as if you are telling Python to broadcast a command to an audience, with this audience being all of the elements within your array. Some examples:
    

In [23]:
print(array_2D)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [24]:
print(array_2D * 2)

[[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]


This works with other mathematical operations like addition, division, subtraction, etc.

You can also broadcast to specific elements within an array as well...

In [25]:
print(array_2D[1:,1:] + 10)

[[15 16]
 [18 19]]


In [26]:
print(array_2D[:2,:2]**2)

[[ 1  4]
 [16 25]]


Arrays of the same shape can also be added to or subtracted from one another...

In [27]:
array1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
array2 = np.array([[1,2,3],[4,5,6],[7,8,9]])
    
array3 = array1 + array2
    
print(array3)

[[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]
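
Broadcasting is not limited to a single number or to arrays of identical shape. As a sketch (the `row` and `column` arrays are made up for illustration), NumPy can also stretch a 1-D array across every row of a 2-D array, or a single-column array across every column, as long as the shapes are compatible.

```python
import numpy as np

array_2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200], [300]])   # shape (3, 1)

# row is added to each row of array_2D
print(array_2D + row)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]

# column is added to each column of array_2D
print(array_2D + column)
# [[101 102 103]
#  [204 205 206]
#  [307 308 309]]
```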


<br>

# Explicit and implicit loops

For more complicated expressions, especially those that require multiple lines of code, we can use the `for` loop construct to go through elements in our array. Generally, this is less efficient, and so it should only be used when absolutely necessary.

⚠️ Note: If Python is your first programming language, you may find yourself using loops more often than not until you start mastering programming. As time goes on you will become a more proficient programmer, but this does not happen overnight, so be patient!

An example of an explicit loop:


In [28]:
x = np.linspace(0,4*np.pi,100) 
y = np.zeros_like(x)
    
for i, val in enumerate(x):
    y[i] =  np.sin(val)
    print(y[i])

0.0
0.12659245357374926
0.2511479871810792
0.3716624556603276
0.4861967361004687
0.5929079290546404
0.690079011482112
0.7761464642917568
0.8497254299495144
0.9096319953545183
0.9549022414440739
0.984807753012208
0.998867339183008
0.9968547759519424
0.9788024462147787
0.9450008187146685
0.8959937742913359
0.8325698546347714
0.7557495743542583
0.6667690005162917
0.5670598638627709
0.4582265217274105
0.3420201433256689
0.2203105327865408
0.09505604330418244
-0.03172793349806786
-0.15800139597335008
-0.28173255684142984
-0.4009305354066138
-0.5136773915734064
-0.6181589862206053
-0.7126941713788629
-0.7957618405308321
-0.8660254037844388
-0.9223542941045814
-0.9638421585599422
-0.9898214418809327
-0.9998741276738751
-0.9938384644612541
-0.9718115683235417
-0.9341478602651068
-0.881453363447582
-0.8145759520503358
-0.7345917086575332
-0.6427876096865396
-0.5406408174555974
-0.4297949120891719
-0.31203344569848707
-0.18925124436040974
-0.06342391965656452
0.06342391965656492
0.18925124436041

Of course we can also simplify the above code by doing the following...

In [29]:
x = np.linspace(0,4*np.pi,100) 
y2 = np.sin(x)
print(y2)

[ 0.00000000e+00  1.26592454e-01  2.51147987e-01  3.71662456e-01
  4.86196736e-01  5.92907929e-01  6.90079011e-01  7.76146464e-01
  8.49725430e-01  9.09631995e-01  9.54902241e-01  9.84807753e-01
  9.98867339e-01  9.96854776e-01  9.78802446e-01  9.45000819e-01
  8.95993774e-01  8.32569855e-01  7.55749574e-01  6.66769001e-01
  5.67059864e-01  4.58226522e-01  3.42020143e-01  2.20310533e-01
  9.50560433e-02 -3.17279335e-02 -1.58001396e-01 -2.81732557e-01
 -4.00930535e-01 -5.13677392e-01 -6.18158986e-01 -7.12694171e-01
 -7.95761841e-01 -8.66025404e-01 -9.22354294e-01 -9.63842159e-01
 -9.89821442e-01 -9.99874128e-01 -9.93838464e-01 -9.71811568e-01
 -9.34147860e-01 -8.81453363e-01 -8.14575952e-01 -7.34591709e-01
 -6.42787610e-01 -5.40640817e-01 -4.29794912e-01 -3.12033446e-01
 -1.89251244e-01 -6.34239197e-02  6.34239197e-02  1.89251244e-01
  3.12033446e-01  4.29794912e-01  5.40640817e-01  6.42787610e-01
  7.34591709e-01  8.14575952e-01  8.81453363e-01  9.34147860e-01
  9.71811568e-01  9.93838

This gives you the same result...

In [30]:
print(y2 - y)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0.]
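
To get a rough feel for the efficiency difference between the explicit and implicit versions, the sketch below times both on a larger array using Python's `time.perf_counter`. The array size is an arbitrary choice, and the timings will vary from machine to machine, so no specific numbers are claimed here.

```python
import time
import numpy as np

x = np.linspace(0, 4 * np.pi, 100_000)

# Explicit loop: one np.sin call per element
start = time.perf_counter()
y_loop = np.zeros_like(x)
for i, val in enumerate(x):
    y_loop[i] = np.sin(val)
loop_seconds = time.perf_counter() - start

# Implicit loop: one np.sin call on the whole array
start = time.perf_counter()
y_vec = np.sin(x)
vec_seconds = time.perf_counter() - start

print(f"explicit loop: {loop_seconds:.4f} s")
print(f"implicit loop: {vec_seconds:.4f} s")
print(np.allclose(y_loop, y_vec))   # both approaches give the same values
```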


<br>

# Other useful array-related commands

Listed below are useful functions and methods that I use when working with arrays!

>- `<var name>.sum()`: Computes the sum of an array
>- `<var name>.mean()`: Computes the mean of an array
>- `<var name>.std()`: Computes the standard deviation of an array
>- `<var name>.var()`: Computes the variance of an array
>- `shape()`: Returns the shape of an array
>- `size()`: Returns the number of elements in an array
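
A quick sketch of the commands listed above on a small, made-up array. The `axis` argument in the last two lines is not covered above; it is included as an assumption about what you may want next, since it restricts the calculation to the columns (`axis=0`) or the rows (`axis=1`).

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

print(data.sum())          # 21
print(data.mean())         # 3.5
print(data.std())          # standard deviation of all six values
print(data.var())          # variance of all six values
print(np.shape(data))      # (2, 3)
print(np.size(data))       # 6

print(data.sum(axis=0))    # column sums: [5 7 9]
print(data.mean(axis=1))   # row means:   [2. 5.]
```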



<br>

# Reshaping, transposing, and shifting arrays

There are also a number of functions available for manipulating and changing the shape of an array, or for moving elements around within an array. 

>- `<var name>.flatten()`: Flattens a multidimensional array to a 1-D version
>- `reshape(a,ns)`: Returns the array (a) with new shape ns. ⚠️ The new shape must have the same number of elements!
>- `roll(a,shift,axis)`: Moves elements of a by the amount of shift. For multidimensional arrays, the argument axis specifies the axis to roll along; if it is left out, the array is flattened, rolled, and then restored to its original shape. For 1D arrays this can be left out.
>- `transpose(a)`: Returns a transposed version of a.
>- `rot90(a,n)`: Returns 'a' rotated counterclockwise by n x 90 degrees. A negative 'n' will rotate 'a' clockwise.
>- `squeeze(a)`: Returns 'a' with single-element dimensions removed (i.e. a 1 x 10 array will just be 10).


Create some arrays and play around with some of these functions and methods!
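
As one possible starting point, the sketch below applies each of these functions to a small, made-up array. Note that in NumPy a positive count passed to `np.rot90` turns the array counterclockwise.

```python
import numpy as np

a = np.arange(6)              # [0 1 2 3 4 5]
b = np.reshape(a, (2, 3))     # [[0 1 2]
                              #  [3 4 5]]

print(b.flatten())            # [0 1 2 3 4 5]
print(np.roll(a, 2))          # [4 5 0 1 2 3]
print(np.transpose(b).shape)  # (3, 2)

# One 90-degree counterclockwise rotation
print(np.rot90(b))
# [[2 5]
#  [1 4]
#  [0 3]]

# A 1 x 10 array squeezed down to 10 elements
print(np.squeeze(np.zeros((1, 10))).shape)   # (10,)
```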


<br>

**Appending:** Elements can also be appended to arrays. For example:


In [31]:
seq_array = np.arange(0,11,2)
print(seq_array)

[ 0  2  4  6  8 10]


In [32]:
seq_array = np.append(seq_array,[12,14])
print(seq_array)

[ 0  2  4  6  8 10 12 14]


<br>

**Inserting:** Elements can also be inserted into an array using the `np.insert()` function. Its arguments are 'a', the array we are inserting into; 'ind', the index of 'a' before which the new values are placed; and 'elements', the values that will be inserted into 'a':

In [33]:
print(seq_array)

[ 0  2  4  6  8 10 12 14]


In [34]:
seq_array = np.insert(seq_array,2,[24,22,20])
print(seq_array)

[ 0  2 24 22 20  4  6  8 10 12 14]


<br>

**Deleting:** Elements can be deleted from an array. The `np.delete()` function, which has arguments 'a' and 'index', removes the elements of 'a' at the specified indices.

In [35]:
seq_array = np.delete(seq_array,[2,3,4])
print(seq_array)

[ 0  2  4  6  8 10 12 14]
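
One detail worth seeing in code: `np.append`, `np.insert`, and `np.delete` each return a new array and leave the input untouched, which is why the examples above assign the result back to `seq_array`. A minimal sketch with made-up variable names:

```python
import numpy as np

base = np.arange(0, 11, 2)

appended = np.append(base, [12, 14])   # new array; base is not modified
inserted = np.insert(base, 1, 99)      # 99 is placed before index 1
deleted = np.delete(base, [0, 1])      # drops the first two elements

print(base)       # [ 0  2  4  6  8 10]
print(appended)   # [ 0  2  4  6  8 10 12 14]
print(inserted)   # [ 0 99  2  4  6  8 10]
print(deleted)    # [ 4  6  8 10]
```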


<br>

Elements within an array can also be reassigned following the syntax below:


In [36]:
seq_array = np.arange(0,11,2)
print(seq_array)

[ 0  2  4  6  8 10]


In [37]:
seq_array[3] = 99
print(seq_array)

[ 0  2  4 99  8 10]


In [38]:
seq_array[2:5] = [-999,-999,-999]
print(seq_array)

[   0    2 -999 -999 -999   10]


<br>

# Stacking and splitting arrays

NumPy arrays can also be combined to form a new, multidimensional array, or they can be split into multiple 'subarrays'.

**Stacking:** Multiple arrays can be stacked horizontally (by column) or vertically (by row) to form a single array. This can be done using the `np.hstack()` or `np.vstack()` functions, respectively. An example using `np.vstack()` can be seen below:


In [39]:
array1 = np.array([1,2,3])
array2 = np.array([4,5,6])
array3 = np.array([7,8,9])
    
print(array1)
print(' ')
print(array2)
print(' ')
print(array3)

[1 2 3]
 
[4 5 6]
 
[7 8 9]


In [40]:
array_2D = np.vstack((array1,array2,array3))
print(array_2D)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
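
For comparison, here is a short sketch of `np.hstack()`. With 1-D inputs it simply joins the arrays end to end, while with 2-D inputs it places them side by side. The `left` and `right` arrays are made up for illustration.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# 1-D inputs are joined end to end
print(np.hstack((array1, array2)))   # [1 2 3 4 5 6]

# 2-D inputs are placed side by side (row counts must match)
left = np.array([[1, 2], [3, 4]])
right = np.array([[5], [6]])
print(np.hstack((left, right)))
# [[1 2 5]
#  [3 4 6]]
```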


<br>

**Splitting:** Arrays can also be separated into subarrays using the `np.split()`, `np.hsplit()`, & `np.vsplit()` functions. Each of these takes the array we are splitting, 'a', and the number of subarrays we want to split our main array into. A `np.vsplit()` example can be seen below:


In [41]:
arrays = np.vsplit(array_2D,3)
print(arrays)

[array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]
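
The other two splitting functions work along the same lines. The sketch below shows `np.hsplit()` cutting the array into column blocks, and `np.split()` with a list of indices, which cuts at those positions instead of into equal parts. The variable names are illustrative.

```python
import numpy as np

array_2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Split into three column blocks, each of shape (3, 1)
columns = np.hsplit(array_2D, 3)
print(columns[0])
# [[1]
#  [4]
#  [7]]

# Split a 1-D array at index 2 rather than into equal pieces
first, rest = np.split(np.arange(6), [2])
print(first, rest)   # [0 1] [2 3 4 5]
```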


<br>

**Merging:** Finally, two 1-D arrays can be combined into a pair of 2-D coordinate arrays using the `np.meshgrid(array1,array2)` function, which takes array1 (first array) and array2 (second array) as arguments:


In [42]:
lon = np.linspace(-119,-110,10)
lat = np.linspace(41,50,10)
    
x2d, y2d = np.meshgrid(lon,lat)

What happens when we do this?
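
One way to explore this is with a much smaller grid, where the result is easy to read. In the sketch below (the three longitudes and two latitudes are made up), each output array has one row per latitude and one column per longitude: the first repeats the longitudes along every row, and the second repeats the latitudes along every column.

```python
import numpy as np

lon = np.array([-112, -111, -110])
lat = np.array([40, 41])

x2d, y2d = np.meshgrid(lon, lat)

print(x2d.shape, y2d.shape)   # (2, 3) (2, 3)
print(x2d)
# [[-112 -111 -110]
#  [-112 -111 -110]]
print(y2d)
# [[40 40 40]
#  [41 41 41]]
```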

<br>


# Logical operations with arrays

The `np.where` function provides a way for the programmer to search through an array and determine which elements meet certain criteria. This function then returns the indices of our array where these conditions are met. For example, let's say we have a 3 x 3 array (2D):


In [43]:
array_2D = np.array([[3,2,0],[4,-4,-10],[-1,4,11]])

Using the `np.where` function, let's determine which indices have elements that are less than 0:

In [44]:
negative_indices = np.where(array_2D < 0)
print(negative_indices)

(array([1, 1, 2]), array([1, 2, 0]))


Did this work? Let's check!

In [45]:
print(array_2D[negative_indices])

[ -4 -10  -1]


<br>

We can also combine multiple conditions when using the `np.where` function...


In [46]:
x = np.arange(-10,10,1)
idx = np.where((x > -5) & (x < 5))
print(x)


[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
   8   9]


Does this work?

In [47]:
print(x[idx])

[-4 -3 -2 -1  0  1  2  3  4]
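
A related technique, sketched here as an aside, is to index with the condition itself. A comparison such as `x > -5` produces a boolean array (often called a mask), and that mask can be used directly inside the square brackets. The `mask` and `outside` names are made up for illustration.

```python
import numpy as np

x = np.arange(-10, 10, 1)

mask = (x > -5) & (x < 5)   # boolean array with the same shape as x
print(mask.sum())           # 9 elements satisfy the condition
print(x[mask])              # [-4 -3 -2 -1  0  1  2  3  4]

# | means "or": keep values below -8 or above 8
outside = x[(x < -8) | (x > 8)]
print(outside)              # [-10  -9   9]
```

The parentheses around each condition are needed because `&` and `|` bind more tightly than the comparison operators.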


<br><br>

# Do it yourself #2
    
1) Replace all negative numbers within the array below with a NaN

    array([0, -1.2, 2, -1, 4.9, -1, 6.1, -1, 8, -1])

<br>

2) Create a sequence that goes from 20 to 0 with intervals of 2

<br>

3) Create a 10 by 10 array that goes from 0 and ends at 99.

<br>

4) Compute the mean, min and max of our 10 by 10 array that we just created.

<br>

5) Check the data type of our newly created array.

<br>

6) What indices are greater than or equal to 40 but less than 50 in our 10 by 10 array?



<br><br>

# Want more practice!?
Check out the following webpages:<br>  
https://www.tutorialspoint.com/numpy/index.htm<br>
https://www.w3schools.com/python/default.asp (left navigation bar)<br>
<br>
