Near term
- machinedatahub.ai site live
- Metadata fully populated
- Make sure the JSON schema is the same from dataset to dataset
- Change "Dataset 1", "Dataset 2", etc. to "File 1", "File 2", etc.
- Rebrand GitHub organization and repos to match
- (In Progress) Netlify Open Source plan application submitted
- License
- Code of Conduct in the top-level directory of the project repository or prominently in the documentation (with a link in the navigation, footer, or homepage)
- Must feature a link to Netlify service
- (In Progress) Review that all conditions are met, fill out the form and submit
- Nested dataset schema
- each dataset can contain multiple files
- break out per file metrics vs. dataset metrics
- "Submit a Dataset" feature fully functioning
- Front end form
- Back end saves suggestion via the GitHub API (preferred) or to Postgres
- machine-data-hub published to PyPI
- unit tests run on every push
- Sphinx documentation pushes to Read the Docs on tag
- library builds and pushes to PyPI on tag
- release notes section added to Sphinx documentation
- Blog functionality added to web app
- blog content can be added to repo in markdown format
- (In Progress) create a getting started page
- Add general step by step process
- Add why people should use it
- Add python package section
- Three documented examples of ML models built from a dataset
- Get working ML model in notebook
- Write blog post tutorial with example
- Get feedback from LM mentors
- Implement feedback from LM mentors and update on website
- (In Progress) UW ML Course students use machine-data-hub as data source for class project
- Talk to UW 416 Course Instructor and send out email about response
- Follow up with Wes for feedback
- Web app receives a 90+ rating from [Lighthouse](https://developers.google.com/web/tools/lighthouse) for performance
- (In Progress) Fix slow image loading
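
The schema items above (same JSON schema across datasets, multiple files per dataset, per-file vs. dataset metrics) could be sketched roughly as follows. This is only an illustration under assumptions: the class names `Dataset` and `DataFile`, the fields `size_bytes`, `downloads`, and `likes`, and the helper functions are hypothetical and not taken from the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class DataFile:
    # Per-file metrics live on the file (field names are hypothetical).
    name: str
    size_bytes: int
    downloads: int = 0


@dataclass
class Dataset:
    # Dataset-level metrics live on the dataset (field names are hypothetical).
    name: str
    likes: int = 0
    files: List[DataFile] = field(default_factory=list)

    def total_downloads(self) -> int:
        # Dataset-level rollup derived from the per-file metric.
        return sum(f.downloads for f in self.files)


def key_paths(value: Any, prefix: str = "") -> Set[str]:
    """Collect every key path found in a decoded JSON value."""
    paths: Set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        # All list items share one path segment, so item order does not matter.
        for child in value:
            paths |= key_paths(child, f"{prefix}[]")
    return paths


def schema_mismatches(records: List[Dict[str, Any]]) -> Dict[int, Set[str]]:
    """Map record index -> key paths that differ from the first record."""
    if not records:
        return {}
    reference = key_paths(records[0])
    mismatches: Dict[int, Set[str]] = {}
    for index, record in enumerate(records[1:], start=1):
        diff = reference ^ key_paths(record)  # keys missing or extra
        if diff:
            mismatches[index] = diff
    return mismatches
```

A check like `schema_mismatches` compares key names only, not value types; a JSON Schema validator would be a stricter alternative if the project adopts a formal schema.
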
Longer term
- machine-data-hub CLI does local ETL on at least three of the datasets
- Web App automated end to end testing
- Auth-N (Authentication) implemented
- Up Voting datasets
- mitigation plan for duplicate votes (i.e. require Auth-N to cast a vote)
- Dataset content pre-rendered, only user interaction elements (upvote controls and counts) load after hydration
- machinedatahub analytics (page views, dataset download counts) with Postgres
- User trial with survey and reward to get feedback from potential users (possibly also used to incentivize the UW students above)
- External user submits a new dataset
- First pull request merged from non-original team member
- Academic Paper Published
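
One possible shape for the duplicate-vote mitigation listed above is to key each vote on the authenticated user and the dataset, so a repeat vote is rejected. The sketch below is an in-memory stand-in under assumptions: `VoteLedger`, `upvote`, and `count` are hypothetical names, and a real deployment would more likely enforce the same rule with a unique constraint in Postgres.

```python
from typing import Set, Tuple


class VoteLedger:
    """In-memory stand-in for a votes table with a unique (user, dataset) key."""

    def __init__(self) -> None:
        self._votes: Set[Tuple[str, str]] = set()

    def upvote(self, user_id: str, dataset_id: str) -> bool:
        """Record a vote; return False for anonymous or repeat votes."""
        if not user_id:
            # No authenticated identity, so the vote cannot be deduplicated.
            return False
        key = (user_id, dataset_id)
        if key in self._votes:
            return False
        self._votes.add(key)
        return True

    def count(self, dataset_id: str) -> int:
        """Number of distinct users who upvoted the dataset."""
        return sum(1 for _, voted in self._votes if voted == dataset_id)
```
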
Maybe
- Auth-Z (Authorization) - allow private datasets