{"cells":[{"cell_type":"markdown","id":"7f6f372d","metadata":{"id":"7f6f372d"},"source":["[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ai2es/WAF_ML_Tutorial_Part1/blob/main/colab_notebooks/Notebook03_TrainValTest.ipynb)\n","\n","# Notebook 03: TrainValTest [Colab Version]\n","\n","### Goal: Understand how to do train/val/test splitting\n","\n","#### Background\n","\n","In the paper, Section X.X, we discuss the importance of separating the total dataset into ```train```, ```validation``` and ```test``` subsets. Please go re-read this section if it is not clear to you why we do this.\n","\n","#### Step 0: Get the github repo (we need some of the functions there)\n","\n","The first step with all of these Google Colab notebooks will be to grab the github repo and cd into the notebooks directory. \n","\n","To run things from the command line, put a ```!``` before your code\n","\n","\n"]},{"cell_type":"code","source":["#get the github repo \n","!git clone https://github.com/ai2es/WAF_ML_Tutorial_Part1.git \n","\n","#cd into the repo so the paths work \n","import os \n","os.chdir('/content/WAF_ML_Tutorial_Part1/jupyter_notebooks/')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"jckPeyZGTxaK","executionInfo":{"status":"ok","timestamp":1648651932935,"user_tz":300,"elapsed":25813,"user":{"displayName":"Randy C","userId":"12301342565919601334"}},"outputId":"928dc619-bbb9-40b1-d0f9-37327be3fea6"},"id":"jckPeyZGTxaK","execution_count":1,"outputs":[{"output_type":"stream","name":"stdout","text":["Cloning into 'WAF_ML_Tutorial_Part1'...\n","remote: Enumerating objects: 301, done.\u001b[K\n","remote: Counting objects: 100% (301/301), done.\u001b[K\n","remote: Compressing objects: 100% (197/197), done.\u001b[K\n","remote: Total 301 (delta 139), reused 236 (delta 96), pack-reused 0\u001b[K\n","Receiving objects: 100% (301/301), 195.77 MiB | 16.15 MiB/s, done.\n","Resolving deltas: 100% 
(139/139), done.\n","Checking out files: 100% (100/100), done.\n"]}]},{"cell_type":"markdown","source":["#### Step 1: Import packages and load data. \n","These are basically the same steps as in the previous notebooks, and we load the same dataframe as in the previous notebook."],"metadata":{"id":"T5DOYHZTT3J1"},"id":"T5DOYHZTT3J1"},{"cell_type":"code","execution_count":2,"id":"b9b4e1de","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"b9b4e1de","executionInfo":{"status":"ok","timestamp":1648651938352,"user_tz":300,"elapsed":3120,"user":{"displayName":"Randy C","userId":"12301342565919601334"}},"outputId":"5b1549ae-f1ed-4836-f9c3-b25684f4ca9a"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" q000 q001 q010 q025 q050 q075 q090 q099 \\\n","2018-02-04 16:40:00 -61.31 -59.45 -55.30 -52.59 -47.00 -36.92 -26.560 -4.4163 \n","2018-02-04 16:45:00 -61.90 -59.13 -55.37 -52.52 -47.68 -38.00 -26.753 -6.9400 \n","2018-02-04 16:50:00 -61.16 -58.93 -55.35 -52.55 -48.17 -39.07 -27.013 -8.8400 \n","2018-02-04 16:55:00 -61.63 -58.87 -55.38 -52.68 -48.34 -40.16 -27.030 -8.9163 \n","2018-02-04 17:00:00 -61.67 -58.82 -55.42 -52.81 -48.38 -40.75 -27.240 -9.0526 \n","\n"," q100 event \n","2018-02-04 16:40:00 15.19 Thunderstorm Wind \n","2018-02-04 16:45:00 15.32 Thunderstorm Wind \n","2018-02-04 16:50:00 15.62 Thunderstorm Wind \n","2018-02-04 16:55:00 14.27 Thunderstorm Wind \n","2018-02-04 17:00:00 6.57 Thunderstorm Wind "],"text/html":["\n","
\n","|  | q000 | q001 | q010 | q025 | q050 | q075 | q090 | q099 | q100 | event |\n","
|---|---|---|---|---|---|---|---|---|---|---|\n","
| 2018-02-04 16:40:00 | -61.31 | -59.45 | -55.30 | -52.59 | -47.00 | -36.92 | -26.560 | -4.4163 | 15.19 | Thunderstorm Wind |\n","
| 2018-02-04 16:45:00 | -61.90 | -59.13 | -55.37 | -52.52 | -47.68 | -38.00 | -26.753 | -6.9400 | 15.32 | Thunderstorm Wind |\n","
| 2018-02-04 16:50:00 | -61.16 | -58.93 | -55.35 | -52.55 | -48.17 | -39.07 | -27.013 | -8.8400 | 15.62 | Thunderstorm Wind |\n","
| 2018-02-04 16:55:00 | -61.63 | -58.87 | -55.38 | -52.68 | -48.34 | -40.16 | -27.030 | -8.9163 | 14.27 | Thunderstorm Wind |\n","
| 2018-02-04 17:00:00 | -61.67 | -58.82 | -55.42 | -52.81 | -48.38 | -40.75 | -27.240 | -9.0526 | 6.57 | Thunderstorm Wind |\n","