lab12
April 18, 2020
1 Lab 12: Regression
Welcome to Lab 12!
Today we will get some hands-on practice with linear regression. You can find more information
about this topic in section 15.2.
[1]: # Run this cell, but please don't change it.
# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *
# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
warnings.simplefilter('ignore', UserWarning)
# These lines load the tests.
import otter
grader = otter.Notebook()
1.1 1. How Faithful is Old Faithful? Revisited
Let’s revisit a question from lab 1. Last lab, we investigated Old Faithful, a geyser in Yellowstone
National Park in the western United States. It’s famous for erupting on a fairly regular schedule.
To recap, some of Old Faithful’s eruptions last longer than others. Today, we will use the same
dataset on eruption durations and waiting times to see if we can predict the wait time from
the eruption duration using linear regression.
The dataset has one row for each observed eruption. It includes the following columns:
- duration: Eruption duration, in minutes
- wait: Time between this eruption and the next, also in minutes
Run the next cell to load the dataset.
[2]: faithful = Table.read_table("faithful.csv")
faithful
[2]: duration | wait
3.6 | 79
1.8 | 54
3.333 | 74
2.283 | 62
4.533 | 85
2.883 | 55
4.7 | 88
3.6 | 85
1.95 | 51
4.35 | 85
… (262 rows omitted)
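As a preview of the prediction task described above, here is a minimal sketch that fits a least-squares line to only the ten rows displayed in the output. It uses plain NumPy rather than the datascience Table, and the helper name predict_wait is hypothetical, not part of the lab. Because it sees only ten of the 272 eruptions, its slope and intercept will differ from a fit on the full dataset.

```python
import numpy as np

# The ten rows displayed above (the full table has 272 rows).
duration = np.array([3.6, 1.8, 3.333, 2.283, 4.533, 2.883, 4.7, 3.6, 1.95, 4.35])
wait = np.array([79, 54, 74, 62, 85, 55, 88, 85, 51, 85])

# Degree-1 polynomial fit: wait is approximated by slope * duration + intercept.
slope, intercept = np.polyfit(duration, wait, 1)

def predict_wait(d):
    """Hypothetical helper: predicted wait (minutes) after an eruption of d minutes."""
    return slope * d + intercept

print("slope:", slope, "intercept:", intercept)
print("predicted wait after a 3-minute eruption:", predict_wait(3.0))
```

A least-squares line always passes through the point of averages, so predict_wait(np.mean(duration)) returns the mean of wait (up to rounding).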
Remember from last lab that we concluded eruption duration and waiting time are positively correlated.
The table below, called faithful_standard, contains the eruption durations and waiting times in
standard units.
[3]: duration_mean = np.mean(faithful.column("duration"))
duration_std = np.std(faithful.column("duration"))
wait_mean = np.mean(faithful.column("wait"))
wait_std = np.std(faithful.column("wait"))
faithful_standard = Table().with_columns(
    "duration (standard units)", (faithful.column("duration") - duration_mean) / duration_std,
    "wait (standard units)", (faithful.column("wait") - wait_mean) / wait_std
)
faithful_standard
[3]: duration (standard units) | wait (standard units)
0.0984989 | 0.597123
-1.48146 | -1.24518
-0.135861 | 0.228663
-1.0575 | -0.655644
0.917443 | 1.03928
-0.530851 | -1.17149
1.06403 | 1.26035
0.0984989 | 1.03928
-1.3498 | -1.4
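The conversion in cell [3] can also be written as a small reusable function, and the same function makes it easy to check the positive correlation mentioned above. The sketch below uses plain NumPy on the ten rows shown in the first table. The names standard_units and correlation are hypothetical helpers, not definitions from this lab. Since the mean and SD here come from ten rows only, the values will not match the faithful_standard table, which uses all 272 rows.

```python
import numpy as np

def standard_units(values):
    """Hypothetical helper: subtract the mean, then divide by the SD."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

def correlation(x, y):
    """Hypothetical helper: r is the mean of the products of standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# The ten rows displayed in the first table of this section.
duration = np.array([3.6, 1.8, 3.333, 2.283, 4.533, 2.883, 4.7, 3.6, 1.95, 4.35])
wait = np.array([79, 54, 74, 62, 85, 55, 88, 85, 51, 85])

duration_su = standard_units(duration)
wait_su = standard_units(wait)
r = correlation(duration, wait)

print("duration in standard units:", duration_su)
print("correlation on these ten rows:", r)
```

Any array in standard units has mean 0 and SD 1, which is a quick sanity check on the conversion. np.std computes the population SD by default, matching the calculation in cell [3].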