Datasets Google Sheet #25

Open
cbarnes7 opened this issue Feb 11, 2021 · 5 comments

@cbarnes7

@PeterSulcs https://docs.google.com/spreadsheets/d/10uP6UNHjwNCjbushx1TaSFz8pueOpEEjYvJRCtEGwXI/edit?usp=sharing

@PeterSulcs PeterSulcs self-assigned this Feb 11, 2021
@PeterSulcs
Contributor

PeterSulcs commented Feb 16, 2021

  • I would recommend breaking this out into two tables:
    • one that lists the datasets and includes all the metadata, and
    • one for the individual files that are associated with each dataset.
    • The only exception would be cases where the files are drastically different. Download stats can be tracked per file, but most of the characteristics can be tracked for the dataset as a whole.
  • I would recommend tracking the highest-level source that each dataset came from; I think this might just be column P.
  • In general, the list doesn't seem to be comprehensive.
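
As a rough illustration of the two-table split suggested above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`datasets`, `files`), the column names, and the sample rows are all hypothetical and are not taken from the sheet. The only point is that metadata lives once per dataset while download counts live per file.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not copied from the Google Sheet.
SCHEMA = """
CREATE TABLE datasets (
    dataset_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    source     TEXT              -- highest-level source of the dataset
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    filename   TEXT NOT NULL,
    downloads  INTEGER NOT NULL DEFAULT 0  -- download stats tracked per file
);
"""


def build_example():
    """Create an in-memory database with one made-up dataset and two files."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO datasets (dataset_id, name, source) VALUES (?, ?, ?)",
        (1, "example-dataset", "example-source"),
    )
    conn.executemany(
        "INSERT INTO files (dataset_id, filename, downloads) VALUES (?, ?, ?)",
        [(1, "part1.csv", 10), (1, "part2.csv", 5)],
    )
    return conn


def downloads_per_dataset(conn):
    """Roll per-file download counts up to the dataset level."""
    rows = conn.execute(
        "SELECT d.name, SUM(f.downloads) "
        "FROM datasets d JOIN files f ON f.dataset_id = d.dataset_id "
        "GROUP BY d.dataset_id ORDER BY d.dataset_id"
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(downloads_per_dataset(build_example()))
```

The same layout maps directly onto two tabs in a spreadsheet, with the `dataset_id` column acting as the link between them.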

@cbarnes7
Author

@mattsul @pjw901015

@PeterSulcs PeterSulcs assigned mattsul and pjw901015 and unassigned PeterSulcs Apr 13, 2021
@PeterSulcs
Contributor

I think this task might be complete. @mattsul and @pjw901015, please review.

@jwaltner

Please add this dataset to the list. It appears to be a fairly large set of hard drive data (~162k units) with quite a few failures that you could try to predict (~1300 failed drives) and 51M days' worth of data. On the surface, it seems like a fairly good dataset for multiple uses, including PHM.

https://www.backblaze.com/b2/hard-drive-test-data.html

@jwaltner

@cbarnes7 Also, what was decided on the nomenclature for "Date Donated", since this is a listing and the owners of the data did not actually donate it? Date logged / aggregated? I forget.
