Add links to alternatives to the readme #1006
README.md
* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/),
which is perfect for quickly reading small files.
Note that it returns `Array`s, so for this to be perfect, your delimited files should probably be homogeneous with respect to types; otherwise you'll get a `Matrix{Any}`, which I wouldn't call perfect, even for small files.
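To make the point above concrete, here is a minimal sketch of how the element type changes with the file contents. The inline sample data is invented for illustration, and it assumes a Julia version where `DelimitedFiles` is available.

```julia
using DelimitedFiles

# Purely numeric input: readdlm infers a concrete element type.
numeric = readdlm(IOBuffer("1,2\n3,4\n"), ',')

# Text mixed with numbers: readdlm falls back to an untyped matrix.
mixed = readdlm(IOBuffer("a,1\nb,2\n"), ',')

@show eltype(numeric)
@show eltype(mixed)
```

The second matrix still holds the parsed values, but every element access is dynamically typed, which is the cost the comment refers to.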
Good point. Maybe something like this:
* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/).
This returns a `Matrix` rather than a [Tables.jl](https://github.com/JuliaData/Tables.jl)-style container, thus works best for files of homogenous element type.
On large files, CSV.jl will be much faster.
A couple more notes:
CSV doesn't return a DataFrame by default, it returns a CSV.File, so we should be careful not to imply it does
DelimitedFiles won't be a stdlib in upcoming Julia versions, I think
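The first note can be illustrated with a short sketch. It assumes CSV.jl and DataFrames.jl are installed, and the inline sample data is made up for the example.

```julia
using CSV, DataFrames

data = "x,y\n1,a\n2,b\n"

# CSV.File is CSV.jl's own Tables.jl-compatible type, not a DataFrame.
file = CSV.File(IOBuffer(data))

# A DataFrame has to be requested explicitly, either by converting the file...
df = DataFrame(file)

# ...or by passing the sink type to CSV.read.
df2 = CSV.read(IOBuffer(data), DataFrame)

@show typeof(file) typeof(df)
```

Any other Tables.jl sink can be used in place of `DataFrame`, which is why the README wording should not imply a DataFrame is the default result.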
Looks good.
Clearly I shouldn't be writing this... see what you think, DataFrames -> Tables.jl now.
And I hadn't seen it before, but JuliaLang/julia#44663 is the proposal to remove DelimitedFiles from the stdlib. It will surely be a while, though, and the package will probably remain the right choice for reading 10 lines.
> DelimitedFiles won't be a stdlib in upcoming Julia versions

Well that's slightly horrifying
Co-authored-by: Chris Elrod <[email protected]>
Yeah..........this is mostly fine. I'm hopeful that by the CSV.jl 1.0 release, we can get the time-to-first-read to be really competitive with DelimitedFiles.jl and then I don't think there's really any reason to use it. DLMReader.jl is a great reference because it supports some more exotic parsing configurations if you need to get really custom. I'd prefer we remove the reference to CSVFiles.jl/TextParse.jl; they haven't been updated or had any real work done for a long time, and they don't provide any kind of functionality not supported in CSV.jl.
Unless you want a matrix not a dataframe, right? There is a place for loading/saving 10 numbers without figuring out any complicated types.
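The "10 numbers" case might look like the following sketch. The temporary path and the values are placeholders chosen for illustration, and it again assumes `DelimitedFiles` is available.

```julia
using DelimitedFiles

numbers = collect(1.0:10.0)
path = tempname()          # throwaway file location for the example

# Save one number per line, then read them straight back as a matrix.
writedlm(path, numbers)
loaded = readdlm(path)

@show size(loaded)
@show vec(loaded) == numbers
```

No column types or schemas are declared anywhere; the result is a plain one-column matrix.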
I guess my vote is to then say roughly that. It's useful information if you are trying to figure out how all these packages relate to each other. Otherwise you have to try to infer from the dates... is X not mentioned by Y because it's the newer nice thing which didn't exist when Y's summary was written, or because it's the older attempt at the same which is no longer needed, or just because the authors of X and Y don't get along?
Yeah, that's fair. I'm fine adding that text then.
I'm fine either way, but it is probably worth pointing out that this set of packages works well and for some applications not having updates frequently is a huge plus. We are using these packages in my research group extensively, because they work just fine, and not having to deal with updates is a plus for many of our use-cases. And neither package is deprecated.
* [DLMReader.jl](https://github.com/sl-solution/DLMReader.jl) also aims to be fast for large files,
closely associated with [InMemoryDatasets.jl](https://github.com/sl-solution/InMemoryDatasets.jl).
> DLMReader.jl is a great reference because it supports some more exotic parsing configurations if you need to get really custom.

Should this link say something like "exotic custom parsing"?
As discussed here https://discourse.julialang.org/t/how-do-i-know-if-a-package-is-good/82133, it might be nice if this package linked to alternative ways to read CSV files. Really all packages should do this, but this one is what you find if you google "csv julia"... and I bet that many people googling that just want DelimitedFiles.
I'm not really qualified to write the list of fancier alternatives, since DelimitedFiles does what I need right now. But @juliohm @sl-solution @chriselrod from discourse thread may have better ideas. And in particular about what most important differences should be mentioned.
I'm especially unqualified to list Python or R alternatives. That's less obviously necessary, but their names are terms people might search for.
Xref JuliaPy/Pandas.jl#87 about Pandas.jl -> DataFrames.jl.