Data Sciencing

Identifying, answering & communicating relevant questions.

"Work that takes more programming skills than most statisticians have, and more statistics skills than a programmer has." - kdnuggets

Know Your ...

Project Domain

Data science is not a stand-alone endeavor. Unless you go into data science research, odds are you will always be working on projects in a domain outside of your expertise. Study up! Be prepared to follow and contribute to conversations that are relevant to your project but outside of your comfort zone.

Context

The work you do, how you do it, and how you work with others will change in each professional context. Are you working in research? With a sales & marketing team? Is the project a long-term project, or a quick turnaround? Are you learning & practicing, or performing to a deadline? Will your results be used as advice, or to make the final decision? All of these contextual considerations will modify the way you approach your project.

Team

You will be working with other humans. You will rely on other humans, and they will rely on you. Know your teammates; for better or worse, you will be stuck with each other. Be prepared to support each other when needed, and ask for help before it becomes necessary. Failure is more likely to come from between your team members than from within them.

Questions

The ultimate purpose of your data analyses is to answer questions relevant to your project's main objective. To do this you need well-defined questions that can be explored effectively by data analysis. Your whole team must agree on exactly what questions are being asked, and what qualifies as a satisfactory answer. These questions will act as the central pillar of your investigation, and every decision you make will have to circle back to them in one way or another.

Data

Know your data and how it relates to your central question. Where does it come from? How was it collected? What might be missing? How might it be corrupted? Is there extra data? Which dimensions are most relevant to your investigation? What format should it be in for your analysis? Before moving on to any analysis, minimize and simplify your data as much as possible.
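The questions above can be turned into a few small checks before any analysis starts. The following is only a minimal sketch using the Python standard library; the `records` list and its field names are hypothetical stand-ins for rows loaded from a JSON or CSV file, and the three helper functions are illustrative, not part of any library.

```python
from collections import Counter

# Hypothetical records, standing in for rows loaded from a JSON or CSV file.
records = [
    {"id": 1, "age": 34, "city": "Nairobi"},
    {"id": 2, "age": None, "city": "Mombasa"},
    {"id": 3, "age": 29, "city": ""},
    {"id": 3, "age": 29, "city": ""},  # exact duplicate of the previous row
    {"id": 4, "age": 41, "city": "Kisumu", "notes": "extra field"},
]


def missing_counts(rows):
    """What might be missing? Count, per field, the rows with no usable value."""
    fields = {key for row in rows for key in row}
    counts = Counter()
    for row in rows:
        for field in fields:
            if row.get(field) in (None, ""):
                counts[field] += 1
    return dict(counts)


def drop_duplicates(rows):
    """Is there extra data? Keep the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def keep_fields(rows, wanted):
    """Which dimensions are relevant? Reduce each row to the wanted fields."""
    return [{field: row.get(field) for field in wanted} for row in rows]


print(missing_counts(records))
simplified = keep_fields(drop_duplicates(records), ["id", "age"])
print(simplified)
```

Keeping each check in its own small function makes it easy to rerun them whenever the data source changes.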

Strategy

How will you ask the data your question? What's the simplest possible analysis? What are the possible pitfalls of your strategy? The less complexity, the less room for error, and the easier it will be to find your mistakes when you make them. Identify key milestones in your analysis that can be used for testing and communication.
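One way to picture "simplest possible analysis" and "key milestones" is to write each milestone as a small function with a check after it. This is a sketch under assumptions: the `rows` data, the group labels, and the function names are all hypothetical, and comparing group means is just one example of a simple first analysis.

```python
from statistics import mean

# Hypothetical data: (group, amount) pairs standing in for a loaded dataset.
rows = [("a", 12.0), ("a", 15.5), ("b", 9.0), ("b", 11.0), ("a", None)]


def milestone_clean(rows):
    """Milestone 1: drop rows with a missing amount and report how many were dropped."""
    cleaned = [(group, amount) for group, amount in rows if amount is not None]
    return cleaned, len(rows) - len(cleaned)


def milestone_group_means(cleaned):
    """Milestone 2: the simplest analysis here, one mean per group."""
    groups = {}
    for group, amount in cleaned:
        groups.setdefault(group, []).append(amount)
    return {group: mean(amounts) for group, amounts in groups.items()}


# Each milestone is checked before moving on, so a mistake is caught
# close to where it was made.
cleaned, dropped = milestone_clean(rows)
assert dropped == 1, "expected exactly one row with a missing amount"

means = milestone_group_means(cleaned)
assert set(means) == {"a", "b"}, "every group should survive cleaning"
print(means)
```

The intermediate values (`cleaned`, `dropped`, `means`) also double as things you can show teammates at each stage.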

Tools

Which tool set is best for your question, team, context, and data? Either take the time to learn the chosen tools, or find a way to do the project with tools you do know. Working with unfamiliar tools, techniques, or libraries not only slows down a project but is also likely to lead to mistakes.

Conclusions

Be prepared to be wrong, or to not find anything conclusive!
Keep your conclusions tight and simple, tied directly to what the data says. Be careful not to use the results simply as support for your own ideas. You have to let the data answer your question. Your conclusion should serve only to consolidate what your analysis has uncovered.
Make sure the whole team knows how you understand the findings. This is more than just a friendly thing to do: it will help you all learn and catch mistakes that evade even the most experienced analysts.

Audience

Communicate to the audience you do have, not the one you'd like. What level of understanding do they have of the domain, context, and data science? What do they want from you: clear, actionable advice? Further research questions? When in doubt, ask.

Resources

DataCamp assignments

Infinite Practice

Write-up Guide

General:

Off-DataCamp dev workflow:

Working in terminal (for Unix)
- Cmder (Unix emulator for Windows)
Git: Study this video, practice here
- For saving and versioning your work
Git & GitHub - for sharing and collaborating
- Git?
  - for practicing: "learngitbranching.js.org"
  - for how it works behind the scenes: "THE Git video"
  - for a quick reference: Roger Dudler's handbook
- GitHub?
- cloning, pushing & pulling
- pull requests & merging
- LearnGitBranching
Visual Studio Code: Download
Installing Python
Jupyter Notebook

Practice:

Awesome JSON data sets

Analysis & Inference:

descriptive vs inferential stats
common distributions: article, quick reference
parametric vs non-parametric
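
To make the distinctions in this list concrete, here is a rough sketch using only the Python standard library. The `sample` values are made up, and the function names are illustrative. The descriptive step only summarizes the data in hand; the two inferential steps both estimate an interval for the population mean, one parametric (a normal approximation) and one non-parametric (a percentile bootstrap). These are two common choices, not the only ones.

```python
import random
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample, e.g. task completion times in minutes (one outlier).
sample = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 25.0, 11.7, 10.4, 12.6]


def describe(xs):
    """Descriptive: summarize the data you have, with no claim beyond it."""
    return {"n": len(xs), "mean": mean(xs), "median": median(xs), "stdev": stdev(xs)}


def normal_ci(xs, level=0.95):
    """Inferential, parametric: interval for the population mean, assuming
    the sample mean is approximately normally distributed."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(xs) / len(xs) ** 0.5
    return mean(xs) - half_width, mean(xs) + half_width


def bootstrap_ci(xs, level=0.95, draws=2000, seed=0):
    """Inferential, non-parametric: percentile bootstrap interval for the mean,
    which resamples the data instead of assuming a distribution shape."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(mean(rng.choices(xs, k=len(xs))) for _ in range(draws))
    lower = means[int(draws * (1 - level) / 2)]
    upper = means[int(draws * (1 + level) / 2) - 1]
    return lower, upper


print(describe(sample))
print(normal_ci(sample))
print(bootstrap_ci(sample))
```

With a small sample and an outlier like the one above, the two intervals can differ noticeably, which is a useful prompt to ask which assumptions your data can actually support.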

Software Design:

Data Science perspectives:

elewa-academy/data-science
