Issues for lesson 5 #18
Comments
I am tackling this rewrite now. I'm happy to address points 2-4, but how far into the weeds do we want to get with data cleaning? It's about 90% of data analytics, but if we add too much it might be overwhelming. I also notice the name of the lesson ends in 1. Is there an expectation of adding a 2 with more in-depth information at some point?
Lesson 6 is Data Cleaning II and has an overview of these topics:
So the question for this lesson/problem set pair is whether we want to provide additional information in lesson 5 or adjust the assignment to reflect only what is already presented. Comparing what we have in lessons 5 and 6 with the issues @jrmcgarvey brought up, I believe the following in lesson 6 would need to be moved, without additional editing, if we choose not to adjust the assignment.
I believe transformation of a column is already addressed in lesson 5 using
I believe this is a reasonable change, although we may want to remove these topics from lesson 6, since otherwise nearly half of that lesson would be information already taught.
Chance, I think you should just use your judgment. I think it is appropriate to move material around into a more reasonable organization, so you may take considerable liberties, e.g. moving lesson objectives to the most appropriate place. One point: this example, df['Age'] = df['Age'].astype(int), is a pretty spare explanation of transforming a column. That probably should be fixed in lesson 3. There are Series methods that can be used, like astype() and, perhaps most importantly, map(). Then there are Series operations, like df['column'] += 4. And finally there are NumPy functions like numpy.sqrt() that operate on a Series. I think we need to explain each. One of my concerns about some of the lessons is that they give just one example and move on, without really explaining. We need to make sure that students understand, not just emulate.
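To make the three kinds of column transformation above concrete, here is a minimal sketch of how they might sit side by side in a lesson. The DataFrame, its values, and the AgeGroup and Side column names are made up for illustration and are not taken from the existing lesson material.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values and column names are illustrative only.
df = pd.DataFrame({"Age": ["25", "31", "47"], "Area": [4.0, 9.0, 16.0]})

# 1. Series method: astype() converts the dtype of the whole column
#    (here, strings to integers).
df["Age"] = df["Age"].astype(int)

# 2. Series method: map() applies a function (or a dict lookup) to each value.
df["AgeGroup"] = df["Age"].map(lambda age: "under 30" if age < 30 else "30 and over")

# 3. Series operation: arithmetic is applied element-wise to every row.
df["Age"] += 4

# 4. NumPy function: np.sqrt() accepts a Series and returns a Series.
df["Side"] = np.sqrt(df["Area"])

print(df)
```

Note that AgeGroup is computed before the += 4 step, so it reflects the original ages. Showing each form on the same small table may help students see when to reach for which one.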
1. The lesson is very short. A lot should be added.
2. There is no explanation of the handling of outliers, which is required for the assignment.
3. There is no explanation of the transformation of a column by a function, e.g. astype(float) or equivalent, which is required for the assignment.
4. There is no explanation of the standardization of string appearance, e.g. LA should be Los Angeles, as is required by the assignment.
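As a starting point for the missing explanations of outliers and string standardization, a sketch along these lines could be adapted. The data, the city_names lookup, and the choice of the 1.5 x IQR rule are all assumptions made for illustration; the assignment may expect a different outlier method.

```python
import pandas as pd

# Hypothetical sample data; not taken from the assignment.
df = pd.DataFrame({
    "City": [" LA", "Los Angeles", "la", "NYC", "New York", "LA "],
    "Salary": [52000, 58000, 61000, 55000, 60000, 990000],
})

# Standardize string appearance: trim whitespace, lower-case, then map
# every known variant to one canonical spelling. Values missing from the
# lookup would become NaN, which makes unmapped variants easy to spot.
city_names = {
    "la": "Los Angeles",
    "los angeles": "Los Angeles",
    "nyc": "New York",
    "new york": "New York",
}
df["City"] = df["City"].str.strip().str.lower().map(city_names)

# Flag outliers with the 1.5 * IQR rule (one common convention, assumed here).
q1 = df["Salary"].quantile(0.25)
q3 = df["Salary"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = ~df["Salary"].between(lower, upper)

# Drop the flagged rows; capping or investigating them are alternatives.
cleaned = df[~is_outlier]
print(cleaned)
```

Whether outliers should be dropped, capped, or kept depends on the data, so the lesson would probably want to discuss that choice rather than present removal as the default.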