Added details to quickstart documentation on missing r-squared value #473

mitevpi · 2018-06-09T20:42:18Z

This PR is in response to #406

Added more detail explaining where the value we use to decide which model to use comes from. I linked to the source sklearn .score() documentation rather than diving into details on the quickstart page.

Syncing from original

Syncing latest from DistrictDataLabs head.

bbengfort

@mitevpi thank you for expanding that paragraph; it certainly adds more detail! Just a few typos and a quick suggestion. Thanks again!

bbengfort · 2018-06-11T01:53:54Z

docs/quickstart.rst


 Finally the residuals are colored by training and test set. This helps us identify errors in creating train and test splits. If the test error doesn't match the train error then our model is either overfit or underfit. Otherwise it could be an error in shuffling the dataset before creating the splits.

-Because our coefficient of determination for this model is 0.328, let's see if we can fit a better model using *regularization*, and explore another visualizer at the same time.
+Along with generating the residuals plot, we also measured the peformance, or "scored" our model on the test data above: ``visualizer.score(X_test, y_test)``. Because we used a Linear Regression model, the `scoring consists of finding the R-squared value of the data <http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.score>`_, which is a statistical measure of how close the data are to the fitted regression line. The R-squared value of any model may vary slightly between prediction/test runs, however it should generally be comparable. In our case, the R-squared value for this model was only 0.328, suggestion that linear correlation may not be the most appropriate to use for fitting this data. Let's see if we can fit a better model using *regularization*, and explore another visualizer at the same time.
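
For reference, here is a minimal sketch of the quantity the added paragraph describes. It assumes the standard definition of the coefficient of determination, R^2 = 1 - SS_res / SS_tot, which is what the linked sklearn `score()` documentation describes for regressors. The helper name `r_squared` and the toy values are hypothetical and are not taken from the quickstart or from the data behind the 0.328 figure.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy targets and three sets of predictions
y_test = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]    # matches the targets exactly
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicts the mean of y_test
reversed_ = [9.0, 7.0, 5.0, 3.0]  # worse than predicting the mean

print(r_squared(y_test, perfect))    # 1.0
print(r_squared(y_test, mean_only))  # 0.0
print(r_squared(y_test, reversed_))  # -3.0
```

Read this way, a score of 0.328 means the model removes roughly a third of the squared error that a constant mean prediction would leave, which is consistent with the paragraph's suggestion to try a different model.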


minor typos:

peformance --> performance
Linear Regression --> LinearRegression or "linear regression".
was only 0.328, suggestion --> suggesting

Also, how about: "we also measured the performance by 'scoring' our model on the test data, e.g. the code snippet visualizer.score(X_test, y_test)."?

Thanks for the review! I'll make the changes and re-commit.

bbengfort

Thank you again @mitevpi for working on the documentation - looks great!

mitevpi and others added 8 commits June 7, 2018 14:41

Merge pull request #1 from DistrictDataLabs/develop

b87fe90

Syncing from original

Documentation fixes #425

acea91c

Addressing review comments in #471

c51913d

Merge pull request #2 from DistrictDataLabs/develop

9f681f6

Syncing latest from DistrictDataLabs head.

Updated .gitignore - ignore vscode json pref file

8d5dbbe

Adding r-squared details on quickstart documentation ref #406

4ed0c0e

rst Formatting fix ref #406

aa39c9e

fixed space

23f42cf

bbengfort reviewed Jun 11, 2018


Petr Mitev added 2 commits June 12, 2018 20:35

Adressing comments in #473

4c0b3dc

Removed stray quotation mark.

3e93448

bbengfort approved these changes Jun 13, 2018


bbengfort merged commit 90355a3 into DistrictDataLabs:develop Jun 13, 2018

mitevpi deleted the quickstart-documentation branch June 13, 2018 20:55
