{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 4: IO and NumPy Arrays\n", "## Chapters 6 and 7 from the Alex DeCaria textbook: \n", "\n", "Chapter 6: opening and closing files properly is important, but the approaches given in the text are less useful than the ways we will use to read and write files from/to NumPy arrays (and later pandas).\n", "\n", "The NumPy module, which is readily installed with Python distributions, is designed to work with large data sets, particularly those with multiple dimensions. However, unlike Python lists and tuples, each NumPy array can hold only one data type. For example, a given NumPy array must be all floating-point numbers, all strings, all integers, etc. Despite this restriction, it is much more computationally efficient to work with NumPy arrays than with lists/tuples. In this module, we will:\n", "- Read data into NumPy arrays\n", "- Create arrays from scratch\n", "- Review common NumPy data types\n", "- Go over useful array functions\n", "- Discuss array indexing and subsetting\n", "- Learn how to reshape arrays\n", "- Combine logical operators with arrays\n", "\n", "**Note:** There are *A LOT* of things you can do with NumPy arrays, so in the interest of time, we will not be able to go over every function/trick related to NumPy arrays. *It will be up to you* to read the DeCaria book and review other online resources! This module is meant to give you the tools so that you can work with basic NumPy arrays.\n", "\n", "**Before starting:** Make sure that you open up a Jupyter notebook session using OnDemand so you can interactively follow along with this week's sessions! Also, be sure to copy this script so you have the original plus the one we will go through during class.\n", "\n", "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Revisiting opening numpy arrays in Chapter1.ipynb notebook\n", "\n", "Run the chapter1.ipynb notebook that is duplicated from the chapter1 subdirectory in this module_4 " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# import numpy\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the description of the data file used in chapter 1.\n", "\n", "Alta snowfall https://utahavalanchecenter.org/alta-monthly-snowfall\n", "\n", "Look in the data folder to see the file alta_snow.csv created from that resource.\n", "\n", "Open the alta_snow.csv file to see the column contents and the units.\n", "\n", "- The 0th column is the Year at Season End\n", "- The 1st-6th columns are the total snowfall in each month from November to April (in inches)\n", "- The 7th column is the Nov-Apr total snowfall (inches)\n", "- The record begins in the 1946 season and ends in 2022" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Ending Year' 'NOV' 'DEC' 'JAN' 'FEB' 'MAR' 'APR' 'TOTAL']\n", "[1946. 1947. 1948. 1949. 1950. 1951. 1952. 1953. 1954. 1955. 1956. 1957.\n", " 1958. 1959. 1960. 1961. 1962. 1963. 1964. 1965. 1966. 1967. 1968. 1969.\n", " 1970. 1971. 1972. 1973. 1974. 1975. 1976. 1977. 1978. 1979. 1980. 1981.\n", " 1982. 1983. 1984. 1985. 1986. 1987. 1988. 1989. 1990. 1991. 1992. 1993.\n", " 1994. 1995. 1996. 1997. 1998. 1999. 2000. 2001. 2002. 2003. 2004. 2005.\n", " 2006. 2007. 2008. 2009. 2010. 2011. 2012. 2013. 2014. 2015. 2016. 2017.\n", " 2018. 2019. 2020. 2021. 
2022.]\n", "[1145.54 949.96 1394.46 1328.42 1211.58 886.46 1628.14 1043.94\n", " 972.82 1198.88 1168.4 980.44 1421.13 980.44 1004.57 828.04\n", " 1019.81 1018.54 1437.64 1455.42 1099.82 1381.76 1217.93 1437.894\n", " 1165.86 1223.01 1185.164 1261.11 1512.824 1536.7 1116.33 798.83\n", " 1332.23 1493.52 1305.56 993.14 1767.84 1617.98 1888.49 1160.78\n", " 1521.46 969.772 1042.162 1477.01 1137.92 1473.708 1003.3 1652.016\n", " 1245.362 1893.316 1427.48 1521.714 1460.246 1164.336 1132.84 1193.038\n", " 1441.958 1014.476 1449.832 1406.144 1609.09 904.24 1661.16 1468.12\n", " 1092.2 1404.62 836.93 971.55 908.05 679.45 998.22 1347.47\n", " 731.52 1206.5 1056.64 949.96 717.296]\n", "Min: 679.5 Max: 1893.3\n", "\n" ] } ], "source": [ "#use the numpy genfromtxt function to read the csv data\n", "\n", "#notes: \n", "#access the file in the data subdirectory using the path relative format\n", "#read the header line of the Alta snowfall data\n", "#specify the delimiter in the data file\n", "\n", "#first get the header row that are string values\n", "headers = np.genfromtxt('../data/alta_snow.csv', delimiter=',', max_rows=1,dtype=(str))\n", "print(headers)\n", "\n", "#read the year column for the Alta snowfall data\n", "#have to skip over the first row\n", "year = np.genfromtxt('../data/alta_snow.csv', delimiter=',', usecols=0, skip_header=1)\n", "print(year)\n", "\n", "#read the seasonal totals from the 8th column and convert from inches to cm\n", "snow = 2.54 * np.genfromtxt('../data/alta_snow.csv', delimiter=',', usecols=7, skip_header=1)\n", "#print out the data after converting it to cm\n", "print(snow)\n", "\n", "#what are the min and max values?\n", "#note the simpler way to format printing than shown in Chapter 3\n", "print(\"Min: %.1f Max: %.1f\" % (np.min(snow),np.max(snow)))\n", "\n", "#note that the numpy arrays \"look\" like lists, but they are not they are np.ndarrays\n", "print(type(snow))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 6.8 
Internet File Access\n", "Now let's get the Alta snow file directly from the web.\n", "\n", "We want to avoid parsing the HTML of the web page as much as possible:\n", "Alta snowfall https://utahavalanchecenter.org/alta-monthly-snowfall\n", "\n", "If you select the data table on that page and change output=html to output=csv in its link,\n", "\n", "you get a link that returns the data table in csv format:\n", "https://docs.google.com/spreadsheets/d/1VoKi4OY-pj33uvyTZdQFclQ7rUshF-7wOyFOWvQ27dA/pub?output=csv\n", "\n", "Let's then grab that file" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saved alta_monthly_snow_from_url.csv 3.649 KB\n" ] } ], "source": [ "#access a function from the urllib module\n", "from urllib.request import urlretrieve\n", "#also use the os module to get the file size\n", "import os\n", "#sys could be used to exit the program if we can't access the file (here we only warn)\n", "import sys\n", "\n", "\n", "#define where the file is on the web\n", "url = \"https://docs.google.com/spreadsheets/d/1VoKi4OY-pj33uvyTZdQFclQ7rUshF-7wOyFOWvQ27dA/pub?output=csv\"\n", "# define the file to write the data into\n", "filename = \"alta_monthly_snow_from_url.csv\"\n", "#let's try if we can get the file from the web\n", "try:\n", " #get the file over the web\n", " urlretrieve(url, filename)\n", " print(\"Saved\", filename, os.path.getsize(filename)/1000., 'KB')\n", "except Exception as err:\n", " print(\"something wrong grabbing the file:\", err)\n", " print(\"but the program continues, so may be in error\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Season' 'Year ending' 'Nov.' 'Dec.' 'Jan.' 'Feb.' 'Mar.' 'Apr.' 'Total'] \n", "\n", "[1946. 1947. 1948. 1949. 1950. 1951. 1952. 1953. 1954. 1955. 1956. 1957.\n", " 1958. 1959. 1960. 1961. 1962. 1963. 1964. 1965. 1966. 1967. 1968. 1969.\n", " 1970. 1971. 1972. 1973. 1974. 1975. 1976. 
1977. 1978. 1979. 1980. 1981.\n", " 1982. 1983. 1984. 1985. 1986. 1987. 1988. 1989. 1990. 1991. 1992. 1993.\n", " 1994. 1995. 1996. 1997. 1998. 1999. 2000. 2001. 2002. 2003. 2004. 2005.\n", " 2006. 2007. 2008. 2009. 2010. 2011. 2012. 2013. 2014. 2015. 2016. 2017.\n", " 2018. 2019. 2020. 2021. 2022.] \n", "\n", "[1145.54 949.96 1394.46 1328.42 1211.58 886.46 1628.14 1043.94\n", " 972.82 1198.88 1168.4 980.44 1421.13 980.44 1004.57 828.04\n", " 1019.81 1018.54 1437.64 1455.42 1099.82 1381.76 1217.93 1437.894\n", " 1165.86 1223.01 1185.164 1261.11 1512.824 1536.7 1116.33 798.83\n", " 1332.23 1239.52 1305.56 993.14 1767.84 1617.98 1888.49 1160.78\n", " 1521.46 969.772 1042.162 1477.01 1137.92 1473.708 1003.3 1652.016\n", " 1245.362 1893.316 1427.48 1521.714 1460.246 1164.336 1132.84 1193.038\n", " 1441.958 1014.476 1449.832 1406.144 1609.09 904.24 1661.16 1468.12\n", " 1092.2 1404.62 836.93 971.55 908.05 679.45 998.22 1347.47\n", " 731.52 1206.5 1056.64 949.452 717.296]\n", "Min: 679.5 Max: 1893.3\n" ] } ], "source": [ "#do you have the file in your module_4 directory?\n", "# look at it- the file is more complicated with different columns and content at the bottom, etc.\n", "# note first year has missing values (--) so, we skip over 1945\n", "#we have to change the file read in order to have this work\n", "#experiment with this\n", "#read the headers\n", "headers = np.genfromtxt(filename, delimiter=',', max_rows=1,dtype=(str),skip_header=3)\n", "print(headers,\"\\n\")\n", "\n", "#read the year of the Alta snowfall data\n", "year = np.genfromtxt(filename, delimiter=',',usecols=1,skip_header = 6,skip_footer=5)\n", "print(year,'\\n')\n", "#read the seasonal total and convert from inches to cm\n", "snow_new = 2.54 * np.genfromtxt(filename, delimiter=',', usecols=8, skip_header=6,skip_footer=5)\n", "#print out the data after converting it to cm\n", "print(snow_new)\n", "print(\"Min: %.1f Max: %.1f\" % (np.min(snow_new),np.max(snow_new)))" ] }, { "cell_type": "code", 
"execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 254.0\n", "\n", "(array([33, 75]),)\n", "[1979. 2021.] [1493.52 949.96] [1239.52 949.452]\n" ] } ], "source": [ "# check that reading from the web is the same as that from the file previously downloaded\n", "diff_snow = snow - snow_new\n", "print(np.min(diff_snow),np.max(diff_snow))\n", "#hmmn why the difference?\n", "#use a numpy function \"where\" that will be discussed later\n", "indices= np.where(diff_snow > 0)\n", "print(type(indices))\n", "print(indices)\n", "#what years are those and what are the values?\n", "print(year[indices],snow[indices],snow_new[indices])\n", "#Are the differences important? Which do you trust?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating an array\n", "\n", "There are many ways to create an array from scratch using NumPy. A simple way is use NumPy's array function and give it a list/tuple as its input. Before starting, you must load the NumPy module (we did above, so really no need to do so here)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "then, we can create a 1D array by doing the following:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,5,3,-6,-2,4,-9,2,2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array that we created will be all integers, since we only supplied it with integer values. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(a)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(a[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, we can also create an array and predefine the data type using the `dtype` agrument:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "a = np.array([1,5,3,-6,-2,4,-9,2,2],dtype=np.float64)\n", "print(type(a[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, you can see that the first element in array a is now a floating number." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Listed below are the data types most commonly used within NumPy arrays:\n", "\n", ">- `np.float64`: Double precision (64 bit) floating point\n", ">- `np.int64`: 64 bit integer\n", ">- `np.complex128`: Complex number, with a real and imaginary part that are each 64 bits\n", ">- `np.bool_`: Boolean (True/False) data type. Note the underscore `_`. \n", "\n", "
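The data types listed above can be inspected and converted on an existing array. The following is only a minimal sketch with made-up values; the variable names are hypothetical and not part of the course files.

```python
import numpy as np

# hypothetical values chosen only for illustration
a_int = np.array([1, 5, 3, -6])
a_float = a_int.astype(np.float64)   # returns a new array, a_int is unchanged
a_bool = a_int > 0                   # comparisons give a boolean array

print(a_int.dtype, a_float.dtype, a_bool.dtype)
print(a_float.itemsize)              # bytes per element: 8 for a 64 bit float
```

The `dtype` attribute reports the type shared by every element, and `astype` is one way to change it after the array has been created.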
\n", "\n", "You can also create 2-D arrays (or other multidimensional arrays) by inserting nested lists/tuples as the input for `np.array` function...\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2 5]\n", " [ 1 -4]]\n" ] } ], "source": [ "a = np.array([[2,5],[1,-4]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2, 2) \n", "\n", "4 \n", "\n" ] } ], "source": [ "#what is the shape and size of a?\n", "#shape function returns array dimensions\n", "print(np.shape(a),'\\n')\n", "#size function returns total number of elements in array\n", "print(np.size(a),'\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you can also create arrays using single values:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]\n" ] } ], "source": [ "a_zero = np.zeros((10,10))\n", "print(a_zero)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 
0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]\n", "[[3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]\n", " [3.14159265 3.14159265 3.14159265 3.14159265 3.14159265 3.14159265\n", " 3.14159265 3.14159265 3.14159265 3.14159265]]\n" ] } ], "source": [ "#fill array elements with the same value, can be np.nan too\n", "#np.empty does not initialize the values, so the first print shows whatever was in memory\n", "a_pi = np.empty((10,10))\n", "print(a_pi)\n", "a_pi[:] = np.pi \n", "print(a_pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pre-filling an array with a known value like this makes it easier to spot elements that were filled in incorrectly later on..." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.00000000e-04 1.00099900e+03 3.14159265e+02]\n", "[ 0. 1001. 
314.16] \n", "\n", "[[3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]\n", " [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14]]\n" ] } ], "source": [ "#printing numpy arrays neatly\n", "z = np.array([0.0001,1000.999,np.pi*100])\n", "#reset the print option to the default\n", "np.set_printoptions(suppress=False,precision=8)\n", "print(z)\n", "# add the following to decrease precision and turn off scientific notation\n", "np.set_printoptions(precision=2,suppress=True)\n", "print(z,'\\n')\n", "\n", "#what about for the a_pi array?\n", "print(a_pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
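As an alternative to creating an empty array and then filling it, NumPy's `np.full` function builds the filled array in one step. This is only a minimal sketch; the variable names are hypothetical.

```python
import numpy as np

a_pi_full = np.full((10, 10), np.pi)   # same values as np.empty followed by a_pi[:] = np.pi
a_nan = np.full((2, 3), np.nan)        # NaN is a handy marker for not-yet-filled values

print(a_pi_full.shape)
print(np.isnan(a_nan).all())
```

Either approach gives an array whose starting value is known, so which one to use is mostly a matter of taste.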
\n", "\n", "# Sequential arrays\n", "\n", "There are 3 functions that are primarily used to generate sequential arrays in NumPy. These functions are `arange()`, `linspace()`, and `logspace()`. The `arange()` function behaves very similarly to the `range()` function in Python, taking a start value, a stop value (which is not included), and a step:\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10]\n" ] } ], "source": [ "seq_array = np.arange(0,11,2)\n", "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[11 9 7 5 3 1]\n" ] } ], "source": [ "seq_array = np.arange(11,0,-2)\n", "print(seq_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you get the idea... The data type of the array is determined by the input. So if all the inputs are integers, it will be an integer-type data array. Note that you can also set the `dtype` argument when using the `np.arange` function.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
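The `dtype` argument mentioned above can be sketched as follows. The variable names are hypothetical and the values simply repeat the earlier `np.arange` example.

```python
import numpy as np

ints = np.arange(0, 11, 2)                      # integer inputs give an integer array
floats = np.arange(0, 11, 2, dtype=np.float64)  # same values stored as floats

print(ints, floats)
print(np.arange(0, 10, 2))   # the stop value (10) is not included
```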
\n", "\n", "The `linspace()` function allows the user to specify a beginning and end value (both are included) and the number of points to create:\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 1.31 1.62 1.93 2.24 2.55 2.86 3.17 3.48 3.79 4.1 4.41\n", " 4.72 5.03 5.34 5.66 5.97 6.28 6.59 6.9 7.21 7.52 7.83 8.14\n", " 8.45 8.76 9.07 9.38 9.69 10. ]\n" ] }, { "data": { "text/plain": [ "30" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seq_array = np.linspace(1,10,30)\n", "print(seq_array)\n", "len(seq_array)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "#note that if you want values at even fractions, say every 0.25, you need to adjust the number of points\n", "#from 0 to 10 in steps of 0.25 there are 4*10+1 values, counting both the first and last values" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.25 0.5 0.75 1. 1.25 1.5 1.75 2. 2.25 2.5 2.75\n", " 3. 3.25 3.5 3.75 4. 4.25 4.5 4.75 5. 5.25 5.5 5.75\n", " 6. 6.25 6.5 6.75 7. 7.25 7.5 7.75 8. 8.25 8.5 8.75\n", " 9. 9.25 9.5 9.75 10. 
]\n" ] }, { "data": { "text/plain": [ "41" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seq_array = np.linspace(0,10,41)\n", "print(seq_array)\n", "len(seq_array)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 32.0\n", "0.25 32.45\n", "0.5 32.9\n", "0.75 33.35\n", "1.0 33.8\n", "1.25 34.25\n", "1.5 34.7\n", "1.75 35.15\n", "2.0 35.6\n", "2.25 36.05\n", "2.5 36.5\n", "2.75 36.95\n", "3.0 37.4\n", "3.25 37.85\n", "3.5 38.3\n", "3.75 38.75\n", "4.0 39.2\n", "4.25 39.65\n", "4.5 40.1\n", "4.75 40.55\n", "5.0 41.0\n", "5.25 41.45\n", "5.5 41.9\n", "5.75 42.35\n", "6.0 42.8\n", "6.25 43.25\n", "6.5 43.7\n", "6.75 44.15\n", "7.0 44.6\n", "7.25 45.05\n", "7.5 45.5\n", "7.75 45.95\n", "8.0 46.4\n", "8.25 46.85\n", "8.5 47.3\n", "8.75 47.75\n", "9.0 48.2\n", "9.25 48.650000000000006\n", "9.5 49.1\n", "9.75 49.55\n", "10.0 50.0\n" ] } ], "source": [ "#these sequential arrays are useful for looping\n", "for t in seq_array:\n", " f = t * 1.8 + 32\n", " print(t,f)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.25 0.5 0.75 1. 1.25 1.5 1.75 2. 2.25 2.5 2.75\n", " 3. 3.25 3.5 3.75 4. 4.25 4.5 4.75 5. 5.25 5.5 5.75\n", " 6. 6.25 6.5 6.75 7. 7.25 7.5 7.75 8. 8.25 8.5 8.75\n", " 9. 9.25 9.5 9.75 10. ] \n", " [32. 32.45 32.9 33.35 33.8 34.25 34.7 35.15 35.6 36.05 36.5 36.95\n", " 37.4 37.85 38.3 38.75 39.2 39.65 40.1 40.55 41. 41.45 41.9 42.35\n", " 42.8 43.25 43.7 44.15 44.6 45.05 45.5 45.95 46.4 46.85 47.3 47.75\n", " 48.2 48.65 49.1 49.55 50. ]\n" ] } ], "source": [ "#but there is a simpler way. just compute using the array (this is broadcasting or implicit looping)\n", "f = seq_array * 1.8 + 32\n", "print(seq_array,'\\n',f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Finally, the `np.logspace()` function works similarly to the `linspace()` function, but the values are spaced logarithmically. See the DeCaria text for more examples of this!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
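A minimal sketch of `np.logspace` follows, with hypothetical variable names. Note that the start and stop arguments are exponents of the base (10 by default), not the values themselves.

```python
import numpy as np

# start and stop are exponents: 10**0 up to 10**3, with 4 points
log_array = np.logspace(0, 3, 4)
print(log_array)             # 1, 10, 100, 1000

# equivalent construction from linspace
same = 10.0 ** np.linspace(0, 3, 4)
print(np.allclose(log_array, same))
```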
\n", "\n", "# Indexing and subsetting arrays\n", "\n", "Specific elements in a NumPy array can be accessed by indexing or subsetting, similar to lists/tuples. For multidimensional arrays, the different dimensions are separated by commas. The first index often refers to the row of the array, while the second index refers to the column. For 3 dimensional arrays, the 3rd index would represent the height, and so on...\n", "\n", "⚠️ Note: *Technically*, subsetting an array with a slice and saving it to a variable does not create a new copy of the data. The result is a *view* that points to the memory of the original array, so changing a value through the view also changes the original. This ultimately saves memory for the computer, which can be important when working with large data sets. For the purposes of this class, this is mostly just good to know. If you need an independent copy of an array, you can use the `np.copy()` function, but this is rarely needed. \n", "\n", "Here are some examples of how to index arrays, which is similar to indexing tuples/lists. For the most part this should be review... ;-)\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10]\n" ] } ], "source": [ "seq_array = np.arange(0,11,2)\n", "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n" ] } ], "source": [ "print(seq_array[3])" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 6 8 10]\n" ] } ], "source": [ "print(seq_array[3:])" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "print(seq_array[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
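Returning to the note above about subsets sharing memory with the original array, here is a minimal sketch of the difference between a slice and an explicit copy. The variable names are hypothetical.

```python
import numpy as np

original = np.arange(0, 11, 2)     # [ 0  2  4  6  8 10]
subset = original[3:]              # a slice: shares memory with original
subset[0] = -99                    # this also changes original[3]
print(original)

independent = original[3:].copy()  # a true copy with its own memory
independent[0] = 6                 # original is not affected
print(original)
```

Both prints show the same array, with -99 in position 3, because only the change made through the slice reached the original.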
\n", "\n", "And some striding examples..." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]\n" ] } ], "source": [ "seq_array = np.arange(0,21,1) \n", "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10]\n" ] } ], "source": [ "print(seq_array[0:12:2])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0]\n" ] } ], "source": [ "print(seq_array[::-1])" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[20 18 16 14 12 10 8 6 4 2 0]\n" ] } ], "source": [ "print(seq_array[::-2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you get the idea of striding...!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Finally, it is also possible to index with lists, which can be very useful:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10]\n" ] } ], "source": [ "my_list = [0,2,4,6,8,10]\n", "print(seq_array[my_list])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is useful, especially when utilizing the `where()` function as shown above and discussed more below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
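Indexing with the result of `where()` can be sketched as follows, assuming a small made-up array (the name `temps` and its values are hypothetical).

```python
import numpy as np

temps = np.array([3, 12, 7, 15, 9])   # hypothetical values
# np.where returns a tuple holding one index array per dimension
indices = np.where(temps > 8)
print(indices[0])       # positions where the condition is True
print(temps[indices])   # the values at those positions
```

The positions can be reused to index any other array of the same shape, which is how the years and snowfall values were matched up earlier in this notebook.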
\n", "Where indexing becomes more of a challenge is when there is more than one dimension.\n", "\n", "Generally it is a good idea to not go crazy and use 3 or more dimensions unless you really have to. Keeping track and debugging becomes more of a challenge.\n", "\n", "Indexing multidimensional arrays is very similar to indexing 1-D arrays, except that there are 2 or more dimensions that you need to consider. For example, let's say I wanted to grab the first element of a 2D array (upper left corner or the '1'):\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n" ] } ], "source": [ "array_2D = np.array([[1,2,3],[4,5,6],[7,8,9]])\n", "print(array_2D)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "print(array_2D[0,0])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "#remember- in a 2d array, the first index is row and the second index is column " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Do it yourself #1\n", "\n", "1) What if I wanted to grab the top middle value (the '2')?\n", "\n", "2) What if I wanted to grab the left middle value of our array (the '4')?\n", "\n", "3) What if I wanted to grab the entire middle row (4, 5, 6)?\n", "\n", "4) What if I wanted to subset for the values 5, 6, 7, 8?\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "4\n", "[4 5 6]\n", "[5 6] [7 8]\n", "[1 2 3 4 5 6 7 8 9]\n", "[5 6 7 8]\n" ] } ], "source": [ "#1\n", "print(array_2D[0,1])\n", "#2\n", "print(array_2D[1,0])\n", "#3\n", "print(array_2D[1,:])\n", "#4\n", "#this is more complicated\n", "print(array_2D[1,1:],array_2D[2,0:2])\n", "#another way is to flatten the array into a single dimension array\n", "a_2D = array_2D.flatten()\n", "print(a_2D)\n", "print(a_2D[4:8])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Broadcasting arrays\n", "\n", "After defining an array, we can use *broadcasting* to perform mathematical expressions or other Python functions. You are telling Python to broadcast a command across all elements of the array. \n", "\n", "For those familiar with Matlab, broadcasting is described there as element-wise operations (e.g., the math operator .*, etc.).\n", "\n", "Some examples of broadcasting:\n", " " ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n" ] } ], "source": [ "print(array_2D)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2 4 6]\n", " [ 8 10 12]\n", " [14 16 18]]\n" ] } ], "source": [ "print(array_2D * 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " This works with other mathematical operations like addition, division, subtraction, etc...." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also broadcast to specific elements within an array..." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[15 16]\n", " [18 19]]\n" ] } ], "source": [ "print(array_2D[1:,1:] + 10)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 4]\n", " [16 25]]\n" ] } ], "source": [ "print(array_2D[:2,:2]**2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays can also be added to and subtracted from each other..." 
] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]] \n", " [[ 2 3 4]\n", " [ 5 6 7]\n", " [ 8 9 10]] \n", " [[ 3 5 7]\n", " [ 9 11 13]\n", " [15 17 19]]\n" ] } ], "source": [ "array1 = np.array([[1,2,3],[4,5,6],[7,8,9]])\n", "array2 = np.array([[2,3,4],[5,6,7],[8,9,10]])\n", " \n", "array3 = array1 + array2\n", " \n", "print(array1,'\\n',array2,'\\n',array3)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "# you need to be careful when attempting to broadcast when the dimensions are not the same\n", "# example: multiply every row by the same 4-element array, so each column is scaled by its own value" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]\n", "(4,) (3, 4) (3, 4)\n", "[0 1 2 3] \n", " [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]\n", "[[ 0 2 6 12]\n", " [ 0 6 14 24]\n", " [ 0 10 22 36]]\n" ] } ], "source": [ "array4 = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]\n", "print(array4)\n", "valc = np.array([0,1,2,3])\n", "array_mult = valc * array4\n", "print(np.shape(valc),np.shape(array4),np.shape(array_mult))\n", "print(valc, '\\n',array4)\n", "print(array_mult)\n", "\n", "#hint: some of these variables are used later in the program " ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "# you need to be careful when attempting to broadcast when the dimensions are not the same\n", "# for most applications, you need to ensure that either the number of rows or columns (or both) are the same\n", "# example: multiply every column by the same 3-element column array\n", "# so each row is scaled by its own value" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 1) (3, 4) (3, 4)\n", 
"[[0]\n", " [1]\n", " [2]] \n", " [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]\n", "[[ 0 0 0 0]\n", " [ 5 6 7 8]\n", " [18 20 22 24]]\n" ] } ], "source": [ "#broadcasting across rows\n", "valr = np.array([[0],[1],[2]])\n", "array_mult = valr * array4\n", "print(np.shape(valr),np.shape(array4),np.shape(array_mult))\n", "print(valr, '\\n',array4)\n", "print(array_mult)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Explicit and implicit loops\n", "\n", "For more complicated expressions, especially those that require multiple lines of code, we can use the `for` loop construct to go through elements in our array. Generally, this is less efficient, and so it should only be used when absolutely necessary.\n", "\n", "⚠️ Note: If Python is your first programming language, you may find yourself using loops more often until you start mastering programming. However, for/while loops are a last resort rather than what you should do at first\n", "\n", "Implicit looping (broadcasting) will become more natural if you just recognize you are applying some action to each element. \n", "\n", "Another example of an explicit loop:\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x: 0.00 y: 0.00\n", "x: 0.13 y: 0.13\n", "x: 0.25 y: 0.25\n", "x: 0.38 y: 0.37\n", "x: 0.51 y: 0.49\n", "x: 0.63 y: 0.59\n", "x: 0.76 y: 0.69\n", "x: 0.89 y: 0.78\n", "x: 1.02 y: 0.85\n", "x: 1.14 y: 0.91\n", "x: 1.27 y: 0.95\n", "x: 1.40 y: 0.98\n", "x: 1.52 y: 1.00\n", "x: 1.65 y: 1.00\n", "x: 1.78 y: 0.98\n", "x: 1.90 y: 0.95\n", "x: 2.03 y: 0.90\n", "x: 2.16 y: 0.83\n", "x: 2.28 y: 0.76\n", "x: 2.41 y: 0.67\n", "x: 2.54 y: 0.57\n", "x: 2.67 y: 0.46\n", "x: 2.79 y: 0.34\n", "x: 2.92 y: 0.22\n", "x: 3.05 y: 0.10\n", "x: 3.17 y: -0.03\n", "x: 3.30 y: -0.16\n", "x: 3.43 y: -0.28\n", "x: 3.55 y: -0.40\n", "x: 3.68 y: -0.51\n", "x: 3.81 y: -0.62\n", "x: 3.93 y: -0.71\n", "x: 4.06 y: -0.80\n", "x: 4.19 y: -0.87\n", "x: 4.32 y: -0.92\n", "x: 4.44 y: -0.96\n", "x: 4.57 y: -0.99\n", "x: 4.70 y: -1.00\n", "x: 4.82 y: -0.99\n", "x: 4.95 y: -0.97\n", "x: 5.08 y: -0.93\n", "x: 5.20 y: -0.88\n", "x: 5.33 y: -0.81\n", "x: 5.46 y: -0.73\n", "x: 5.59 y: -0.64\n", "x: 5.71 y: -0.54\n", "x: 5.84 y: -0.43\n", "x: 5.97 y: -0.31\n", "x: 6.09 y: -0.19\n", "x: 6.22 y: -0.06\n", "x: 6.35 y: 0.06\n", "x: 6.47 y: 0.19\n", "x: 6.60 y: 0.31\n", 
"x: 6.73 y: 0.43\n", "x: 6.85 y: 0.54\n", "x: 6.98 y: 0.64\n", "x: 7.11 y: 0.73\n", "x: 7.24 y: 0.81\n", "x: 7.36 y: 0.88\n", "x: 7.49 y: 0.93\n", "x: 7.62 y: 0.97\n", "x: 7.74 y: 0.99\n", "x: 7.87 y: 1.00\n", "x: 8.00 y: 0.99\n", "x: 8.12 y: 0.96\n", "x: 8.25 y: 0.92\n", "x: 8.38 y: 0.87\n", "x: 8.50 y: 0.80\n", "x: 8.63 y: 0.71\n", "x: 8.76 y: 0.62\n", "x: 8.89 y: 0.51\n", "x: 9.01 y: 0.40\n", "x: 9.14 y: 0.28\n", "x: 9.27 y: 0.16\n", "x: 9.39 y: 0.03\n", "x: 9.52 y: -0.10\n", "x: 9.65 y: -0.22\n", "x: 9.77 y: -0.34\n", "x: 9.90 y: -0.46\n", "x: 10.03 y: -0.57\n", "x: 10.15 y: -0.67\n", "x: 10.28 y: -0.76\n", "x: 10.41 y: -0.83\n", "x: 10.54 y: -0.90\n", "x: 10.66 y: -0.95\n", "x: 10.79 y: -0.98\n", "x: 10.92 y: -1.00\n", "x: 11.04 y: -1.00\n", "x: 11.17 y: -0.98\n", "x: 11.30 y: -0.95\n", "x: 11.42 y: -0.91\n", "x: 11.55 y: -0.85\n", "x: 11.68 y: -0.78\n", "x: 11.80 y: -0.69\n", "x: 11.93 y: -0.59\n", "x: 12.06 y: -0.49\n", "x: 12.19 y: -0.37\n", "x: 12.31 y: -0.25\n", "x: 12.44 y: -0.13\n", "x: 12.57 y: -0.00\n" ] } ], "source": [ "x = np.linspace(0,4*np.pi,100) \n", "#create another array with exactly the same shape as x\n", "y = np.zeros_like(x)\n", "\n", " #loop over all values\n", "for i, val in enumerate(x):\n", " y[i] = np.sin(val)\n", " print(\"x: %.2f y: %.2f\" % (x[i],y[i]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course we can also simplify the above code by doing the following..." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.13 0.25 0.37 0.49 0.59 0.69 0.78 0.85 0.91 0.95 0.98\n", " 1. 1. 0.98 0.95 0.9 0.83 0.76 0.67 0.57 0.46 0.34 0.22\n", " 0.1 -0.03 -0.16 -0.28 -0.4 -0.51 -0.62 -0.71 -0.8 -0.87 -0.92 -0.96\n", " -0.99 -1. -0.99 -0.97 -0.93 -0.88 -0.81 -0.73 -0.64 -0.54 -0.43 -0.31\n", " -0.19 -0.06 0.06 0.19 0.31 0.43 0.54 0.64 0.73 0.81 0.88 0.93\n", " 0.97 0.99 1. 
0.99 0.96 0.92 0.87 0.8 0.71 0.62 0.51 0.4\n", " 0.28 0.16 0.03 -0.1 -0.22 -0.34 -0.46 -0.57 -0.67 -0.76 -0.83 -0.9\n", " -0.95 -0.98 -1. -1. -0.98 -0.95 -0.91 -0.85 -0.78 -0.69 -0.59 -0.49\n", " -0.37 -0.25 -0.13 -0. ]\n" ] } ], "source": [ "y2 = np.sin(x)\n", "print(y2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gives you the same result..." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0.]\n" ] } ], "source": [ "print(y2 - y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
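As a further illustration of explicit versus implicit looping, here is a minimal sketch that converts temperatures both ways. The array `temps_f` and its values are made up for this example and are not part of the course data.

```python
import numpy as np

# Hypothetical temperatures in degrees Fahrenheit (made-up values)
temps_f = np.array([32.0, 50.0, 68.0, 86.0, 104.0])

# Explicit loop: convert one element at a time
temps_c_loop = np.zeros_like(temps_f)
for i, val in enumerate(temps_f):
    temps_c_loop[i] = (val - 32.0) * 5.0 / 9.0

# Implicit loop: the same arithmetic applied to the whole array at once
temps_c_vec = (temps_f - 32.0) * 5.0 / 9.0

# Both approaches should agree to within floating-point round-off
print(np.allclose(temps_c_loop, temps_c_vec))
```

The one-line version does the same work as the loop, which is why it is usually the first thing to try.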
\n", "\n", "# Other useful array-related commands\n", "\n", "Listed below are useful functions and methods that I use when working with arrays!\n", "\n", ">- `.sum()`: Computes the sum of an array\n", ">- `.mean()`: Computes the mean of an array\n", ">- `.std()`: Computes the standard deviation of an array\n", ">- `.var()`: Computes the variance of an array\n", "Important note: NumPy's default for std and var is `ddof=0` (delta degrees of freedom), so they divide by N, \n", "which means they are \"population\" std devs and variances; pass `ddof=1` for the \"sample\" versions. More on this later\n", "(and this is good that it is the default!)\n", ">- `np.shape(a)` (or the attribute `a.shape`): Returns the shape of an array\n", ">- `np.size(a)` (or the attribute `a.size`): Returns the number of elements in an array\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
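A short sketch of the summary methods on a small array. The values in `data` are hypothetical, chosen only so the results are easy to check by hand.

```python
import numpy as np

# Hypothetical data: the mean is 5 and the squared deviations sum to 32
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(data.sum())            # total of all elements
print(data.mean())           # arithmetic mean
print(data.var())            # default ddof=0: divide by N
print(data.std())            # square root of the variance above
print(data.std(ddof=1))      # ddof=1: divide by N-1 instead
print(data.shape, data.size) # attributes, so no parentheses
```

Note that `shape` and `size` are written without parentheses when used as attributes of the array.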
\n", "\n", "# Reshaping, transposing, and shifting arrays\n", "\n", "There are also a number of functions available for manipulating and changing the shape of an array, or for moving elements around within an array. \n", "\n", ">- `.flatten()`: Flattens a multidimensional array to a 1-D copy\n", ">- `reshape(a,ns)`: Returns an array with the data of 'a' in the new shape ns (a view of the same data when possible, otherwise a copy). ⚠️ The new shape must have the same number of elements!\n", ">- `roll(a, shift,axis)`: Moves elements of 'a' by the amount of shift, wrapping around at the ends. For multidimensional arrays, the argument axis specifies the axis to roll along; if it is left out, the array is flattened, rolled, and then restored to its original shape. For 1D arrays it can be left out.\n", ">- `transpose(a)`: Returns 'a' with its axes reversed (rows and columns swapped for a 2-D array).\n", ">- `rot90(a,n)`: Returns 'a' rotated counterclockwise by n x 90 degrees. A negative 'n' will rotate 'a' clockwise\n", ">- `squeeze(a)`: Returns 'a' with any single-element dimensions removed (i.e. a 1 x 10 array will just be 10). \n", "\n", "\n", "Create some arrays and play around with some of these functions and methods!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
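To get started with the functions listed above, here is a minimal sketch on a six-element array. The names `a` and `b` are placeholders for this example only.

```python
import numpy as np

a = np.arange(6)              # [0 1 2 3 4 5]
b = np.reshape(a, (2, 3))     # 2 rows, 3 columns; same 6 elements
print(b)
print(b.flatten())            # back to 1-D
print(np.roll(a, 2))          # shifted right by 2, wrapping around
print(np.transpose(b).shape)  # rows and columns swapped
print(np.rot90(b))            # one counterclockwise 90-degree rotation
print(np.squeeze(np.zeros((1, 10))).shape)  # length-1 dimension removed
```

Printing the shape before and after each call is a quick way to confirm what a function did.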
\n", "\n", "**Appending:** Elements can also be appended to arrays. For example:\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10]\n" ] } ], "source": [ "seq_array = np.arange(0,11,2)\n", "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10 12 14]\n" ] } ], "source": [ "seq_array = np.append(seq_array,[12,14])\n", "print(seq_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Inserting:** Elements can also be inserted into an array using the `np.insert()` function. This function has arguments 'a', which is our array we are inserting into, 'ind', which is the index of 'a' that we are inserting into. 'Elements' is the last argument, which will be the elements that we will be inserting into array 'a':" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10 12 14]\n" ] } ], "source": [ "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 24 22 20 4 6 8 10 12 14]\n" ] } ], "source": [ "seq_array = np.insert(seq_array,2,[24,22,20])\n", "print(seq_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Deleting:** Elements can be deleted from an array. The `np.delete()` function, which has arguments 'a' and 'index', can remove elements from 'a' from the specified indices. " ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10 12 14]\n" ] } ], "source": [ "seq_array = np.delete(seq_array,[2,3,4])\n", "print(seq_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Elements within an array can also be reassigned following the syntax below:\n" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 6 8 10]\n" ] } ], "source": [ "seq_array = np.arange(0,11,2)\n", "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 4 99 8 10]\n" ] } ], "source": [ "seq_array[3] = 99\n", "print(seq_array)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 -999 -999 -999 10]\n" ] } ], "source": [ "seq_array[2:5] = [-999,-999,-999]\n", "print(seq_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Stacking and splitting arrays\n", "\n", "NumPy arrays can also be combined to form a new, multidimensional array or they can be split into multiple 'subarrays'.\n", "\n", "**Stacking:** Multiple arrays can be stacked horizontally (by column) or vertically (by row) to form a single array. This can be done using the `np.vstack()` or `np.hstack()` functions. An example of `np.vstack` can be seen below:\n" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3]\n", " \n", "[4 5 6]\n", " \n", "[7 8 9]\n" ] } ], "source": [ "array1 = np.array([1,2,3])\n", "array2 = np.array([4,5,6])\n", "array3 = np.array([7,8,9])\n", " \n", "print(array1)\n", "print(' ')\n", "print(array2)\n", "print(' ')\n", "print(array3)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n" ] } ], "source": [ "array_2D = np.vstack((array1,array2,array3))\n", "print(array_2D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
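For comparison with the vertical stack, a minimal sketch of `np.hstack`. The small arrays `left` and `right` are made up for this example.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Horizontal stacking joins 1-D arrays end to end
joined = np.hstack((array1, array2))
print(joined)

# For 2-D inputs, hstack adds columns (vstack would add rows)
left = np.array([[1], [2]])
right = np.array([[3], [4]])
side_by_side = np.hstack((left, right))
print(side_by_side)
```

Like `np.vstack`, the arrays are passed together as a single tuple.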
\n", "\n", "**Splitting:** Arrays can also be separated into subarrays using the `np.split()`, `np.hsplit()`, & `np.vsplit()` functions. Each of these has arguments for the array we are splitting 'a' and the number of subarrays we want to split our main array into. A `np.vsplit()` example can be seen below:\n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]\n" ] } ], "source": [ "arrays = np.vsplit(array_2D,3)\n", "print(arrays)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
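The other two splitting functions work along the same lines. A minimal sketch, with `array_2D` rebuilt here so the block runs on its own:

```python
import numpy as np

array_2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# hsplit cuts along columns: three subarrays of shape 3 x 1
cols = np.hsplit(array_2D, 3)
print(cols[0])

# split also accepts a list of indices, cutting at those positions
parts = np.split(np.arange(10), [3, 7])
print(parts)
```

When an integer is given, the array must divide evenly into that many pieces, otherwise NumPy raises an error.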
\n", "\n", "**Merging:** Finally, two 1-D arrays can be combined into a pair of 2-D coordinate arrays using the `np.meshgrid(array1,array2)` function, which takes array1 (first array) and array2 (second array) as arguments:\n", "\n", "This is super important for plotting. We will discuss this a great deal more later.\n" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "lon = np.linspace(-119,-110,10)\n", "lat = np.linspace(41,50,10)\n", " \n", "x2d, y2d = np.meshgrid(lon,lat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens when we do this?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
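One way to see what `np.meshgrid` does is to try it on much smaller inputs. This is a sketch with a made-up 3-longitude by 2-latitude grid, not the grid used above.

```python
import numpy as np

lon = np.linspace(-119, -117, 3)   # 3 longitudes
lat = np.linspace(41, 42, 2)       # 2 latitudes

x2d, y2d = np.meshgrid(lon, lat)
print(x2d.shape, y2d.shape)  # rows follow lat, columns follow lon
print(x2d)                   # each row repeats the longitudes
print(y2d)                   # each column repeats the latitudes
```

Taken together, `x2d[i, j]` and `y2d[i, j]` give the longitude and latitude of grid point (i, j).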
\n", "\n", "\n", "# Logical operations with arrays\n", "\n", "The `np.where` function provides a way for the programmer to search through an array and determine which elements meet a certain criterion. This function then returns the indices of our array where the condition is met, as a tuple with one index array per dimension. For example, let's say we have a 3 x 3 array (2D):\n" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "array_2D = np.array([[3,2,0],[4,-4,-10],[-1,4,11]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the `np.where` function, let's determine which indices have elements that are less than 0:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(array([1, 1, 2]), array([1, 2, 0]))\n" ] } ], "source": [ "negative_indices = np.where(array_2D < 0)\n", "print(negative_indices)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Did this work? Let's check!" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ -4 -10 -1]\n" ] } ], "source": [ "print(array_2D[negative_indices])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "We can also combine multiple conditions in `np.where` using the element-wise `&` (and) and `|` (or) operators. Each condition must be wrapped in its own parentheses:\n" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7\n", " 8 9]\n" ] } ], "source": [ "x = np.arange(-10,10,1)\n", "idx = np.where((x > -5) & (x < 5))\n", "print(x)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Does this work?" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-4 -3 -2 -1 0 1 2 3 4]\n" ] } ], "source": [ "print(x[idx])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
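As an aside, the same selection can be sketched without index arrays by using a boolean mask directly. The names `mask` and `clipped` are introduced only for this example.

```python
import numpy as np

x = np.arange(-10, 10, 1)

# A boolean mask can index the array directly, without np.where
mask = (x > -5) & (x < 5)
selected = x[mask]
print(selected)

# With three arguments, np.where chooses values element by element:
# keep x where the mask is True, otherwise use 0
clipped = np.where(mask, x, 0)
print(clipped)
```

The three-argument form always returns an array with the same shape as the input, which is convenient when the result has to line up with the original data.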

\n", "\n", "# Do it yourself #2\n", " \n", "1) Replace all negative numbers within the array below with a NaN\n", "\n", " array([0, -1.2, 2, -1, 4.9, -1, 6.1, -1, 8, -1])\n", "\n", "
\n", "\n", "2) Create a sequence of numbers between 0 and 20 that when divided by 4 have no remainder\n", "\n", "\n", "
\n", "\n", "3) Create a 10 by 10 array that goes from 0 and ends at 99.\n", "\n", "
\n", "\n", "4) Compute the mean and std of our 10 by 10 array that we just created.\n", "\n", "
\n", "\n", "5) Check the data type of our newly created array.\n", "\n", "
\n", "\n", "6) What indices and values are greater than or equal to 40 but less than 50 in our 10 by 10 array?\n", "\n", "7) What are the mean and std of the values computed in #6?\n", "\n" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 5)\n", "[0. nan 2. nan 4.9 nan 6.1 nan 8. nan]\n" ] } ], "source": [ "#1\n", "a = np.array([0, -1.2, 2, -1, 4.9, -1, 6.1, -1, 8, -1])\n", "a_neg_ind = np.where(a<0)\n", "print(np.shape(a_neg_ind))\n", "a[a_neg_ind] = np.nan\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 4 8 12 16]\n" ] } ], "source": [ "# 2\n", "vals = np.arange(0,20)\n", "val_4_index = np.where(np.mod(vals,4)==0)\n", "print(vals[val_4_index])" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23\n", " 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47\n", " 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71\n", " 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95\n", " 96 97 98 99]\n", "[[ 0 1 2 3 4 5 6 7 8 9]\n", " [10 11 12 13 14 15 16 17 18 19]\n", " [20 21 22 23 24 25 26 27 28 29]\n", " [30 31 32 33 34 35 36 37 38 39]\n", " [40 41 42 43 44 45 46 47 48 49]\n", " [50 51 52 53 54 55 56 57 58 59]\n", " [60 61 62 63 64 65 66 67 68 69]\n", " [70 71 72 73 74 75 76 77 78 79]\n", " [80 81 82 83 84 85 86 87 88 89]\n", " [90 91 92 93 94 95 96 97 98 99]]\n" ] } ], "source": [ "#3\n", "vals = np.arange(0,100)\n", "print(vals)\n", "a = np.reshape(vals,[10,10])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "49.5 28.86607004772212\n" ] } ], "source": [ "#4\n", "a_mean = 
np.mean(a)\n", "a_std = np.std(a)\n", "print(a_mean,a_std)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "#5\n", "print(type(a))" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])) \n", "[40 41 42 43 44 45 46 47 48 49]\n" ] } ], "source": [ "#6\n", "a_40s_indices = np.where(((a>=40) & (a< 50)))\n", "print(a_40s_indices,type(a_40s_indices))\n", "a_40s_values = a[a_40s_indices]\n", "print(a_40s_values)\n" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "44.5 2.8722813232690143\n" ] } ], "source": [ "#7\n", "av_mean = np.mean(a_40s_values)\n", "av_std = np.std(a_40s_values)\n", "print(av_mean,av_std)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "# for completeness, the code used to generate Figure 1.3 is repeated here since it relies on the np.random module\n", "# the plot is shown but not written to a file" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "#generate a Gaussian type empirical distribution for figure 1.3\n", "from numpy.random import normal,uniform\n", "sample = normal(loc=0, scale=1, size=1000000)\n", "# plot the histogram\n", "fig,(ax1) = plt.subplots(1,1,figsize=(5,5))\n", "ax1.hist(sample, bins=31, color='cyan',edgecolor='black',linewidth=1,align='mid')\n", "ax1.set(xlim=(-3,3),ylim=(0,150000))\n", "ax1.set(xlabel=\"Magnitude\",ylabel=\"Count\")\n", "ax1.set(title=\"Figure 1.3\")\n", "#add grids to the plot\n", "ax1.grid(linestyle='--', color='grey', linewidth=.2)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#lets do a uniform distribution\n", "sample = uniform(-3.,3., size=1000)\n", "# plot the histogram\n", "fig,(ax1) = plt.subplots(1,1,figsize=(5,5))\n", "ax1.hist(sample, bins=31, color='cyan',edgecolor='black',linewidth=1,align='mid')\n", "ax1.set(xlim=(-3,3),ylim=(0,100))\n", "ax1.set(xlabel=\"Magnitude\",ylabel=\"Count\")\n", "ax1.set(title=\"Figure 1.3\")\n", "#add grids to the plot\n", "ax1.grid(linestyle='--', color='grey', linewidth=.2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Do it yourself #3\n", "\n", "This is a big question, so don't expect to do this quickly.\n", "\n", "1) Start from alta_monthly_snow_from_url.csv file that you created above and extract the snow totals in cm in each January from 1946 to 2022 into the variable \"JAN\"\n", "\n", "Print out the values\n", "\n", "2) Create a variable year_2000s that contains the years from 2000 to 2022. Your code should include a way to determine the set of values from 2000 to 2020, not just by counting, etc. Hint- use the where function to find the index values and then use the indices to assign the values to years_2000s\n", "\n", "Print out the index values and the values\n", "\n", "3) Create a new variable JAN_2000s that contains only the January values from 2000 to 2022. You should be able to use info from #2\n", "\n", "Print out the values\n", "\n", "4) Copy the relevant code from the chapter.ipynb file to plot the time series for the years from 2000 to the present of January snow totals from Alta and create a file Alta_JAN_snow_2000s.png.\n", "\n", "While we have discussed plotting in depth yet, the point here is to be able to look at existing code and make small modifications. 
Do it sequentially and you will likely not run into any errors you cannot resolve.\n", "\n", "Be sure you import matplotlib (see chapter1 code)\n", "\n", "Your name and unid should be in the title. \n", "\n", "Tick marks on the time axis should be every 5 years from 2000 to 2020. \n", "\n", "The range in values should be between 0 and 500 cm\n", "\n", "5) Compute and print with appropriate precision the min,max,and median values of January snowfall at Alta during the 2000's\n" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[214.63 154.94 116.84 335.28 337.82 284.48 292.1 284.48 137.16 340.36\n", " 261.62 218.44 212.09 205.74 149.86 2.54 218.44 215.9 274.32 381.\n", " 185.42 426.72 99.06 287.02 262.89 147.32 240.03 163.83 265.43 264.16\n", " 189.23 128.27 252.73 199.39 363.22 185.42 363.22 191.77 106.68 111.76\n", " 142.24 243.84 266.95 179.71 273.05 210.31 106.17 419.86 311.66 507.24\n", " 474.98 359.41 327.41 267.46 254. 168.15 256.29 66.04 188.72 288.29\n", " 375.92 97.79 472.44 287.02 224.79 106.68 163.83 144.78 162.56 72.39\n", " 246.38 391.16 130.81 215.9 297.18 126.49 54.1 ]\n" ] } ], "source": [ "#read the Jan total and convert from inches to cm\n", "JAN = 2.54 * np.genfromtxt(filename, delimiter=',', usecols=4, skip_header=6,skip_footer=5)\n", "print(JAN)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2000. 2001. 2002. 2003. 2004. 2005. 2006. 2007. 2008. 2009. 2010. 2011.\n", " 2012. 2013. 2014. 2015. 2016. 2017. 2018. 2019. 2020. 2021. 2022.]\n" ] } ], "source": [ "year_2000s_index = np.where(year>=2000)\n", "year_2000s = year[year_2000s_index]\n", "print(year_2000s)\n" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[254. 
168.15 256.29 66.04 188.72 288.29 375.92 97.79 472.44 287.02\n", " 224.79 106.68 163.83 144.78 162.56 72.39 246.38 391.16 130.81 215.9\n", " 297.18 126.49 54.1 ]\n" ] } ], "source": [ "JAN_2000s = JAN[year_2000s_index]\n", "print(JAN_2000s)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "#Create bar plot time series of Alta seasonal snowfall in the 2000s\n", "#create a list for the times for tick marks on the x axis. This will stop at 2020 (not 2030)\n", "year5_ticks = np.arange(2000,2025,5)\n", "\n", "#create a fig of Alta snowfall time series\n", "fig,(ax1) = plt.subplots(1,1,figsize=(10,3))\n", "ax1.bar(year_2000s,JAN_2000s,color='green')\n", "ax1.set(xlim=(1999,2023),ylim=(0,500))\n", "ax1.set(xlabel=\"Year\",ylabel=\"Snowfall (cm)\")\n", "ax1.set(xticks=year5_ticks)\n", "ax1.set(title=\"Alta January Snowfall: 1946-2022\")\n", "#add grids to the plot\n", "ax1.grid(linestyle='--', color='grey', linewidth=.2)\n", "\n", "#save the figure to \n", "plt.savefig('alta_JAN_snowfall.png')" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "54.102000000000004 188.722 472.44\n" ] } ], "source": [ "min_jan = np.min(JAN_2000s)\n", "med_jan = np.median(JAN_2000s)\n", "max_jan = np.max(JAN_2000s)\n", "print(min_jan,med_jan,max_jan)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "# Want more practice!?\n", "Check out the following webpages:
\n", "https://www.tutorialspoint.com/numpy/index.htm
\n", "https://www.w3schools.com/python/default.asp (left navigation bar)
\n", "
\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }