One or multiple resources #7

Open
loleg opened this issue Jul 8, 2019 · 5 comments

Comments

@loleg

loleg commented Jul 8, 2019

Please clarify the behaviour of the library with respect to having one or multiple resources in the Data Package, i.e. under what conditions the read_datapackage call directly returns a DataFrame vs. an array of them. It seems to me that the latter only happens when there are multiple resources of compatible types (CSV or GeoJSON). This is not very logical, and it is error-prone when trying to build an application for arbitrary data input.
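For application code that has to accept arbitrary packages, one workaround is to normalise whatever the call returns into a single shape. This is only a minimal sketch: it assumes the result is a single table, a dict keyed by resource name, or a list of tables, and `as_resource_dict` and `default_name` are hypothetical names, not part of the library.

```python
from collections.abc import Mapping


def as_resource_dict(result, default_name="resource"):
    """Normalise a reader result into a dict mapping name -> table.

    Assumption: `result` is a single table, a dict of tables, or a
    list/tuple of tables. Hypothetical helper, not library API.
    """
    if isinstance(result, Mapping):
        # Already keyed by name: return a shallow copy.
        return dict(result)
    if isinstance(result, (list, tuple)):
        # No names available, so generate positional ones.
        return {f"{default_name}_{i}": item for i, item in enumerate(result)}
    # A single table: wrap it under the default name.
    return {default_name: result}
```

With this, calling code could always write `for name, df in as_resource_dict(read_datapackage(url)).items(): ...` regardless of how many resources were loaded.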

@rgieseke
Owner

rgieseke commented Jul 8, 2019

Can you elaborate? Do you mean clarifying the documentation?

@loleg
Author

loleg commented Jul 8, 2019

I'd be happy to update the docs, but first I want to make sure that this behaviour is "by design".

What about a way to make it more explicit? For example, with a top() function that returns the first resource in the package. Do other datapackage-reader libraries do it similarly, i.e. is this specced somewhere?
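To make the idea concrete, here is a rough sketch of what such a `top()` helper could look like. It is hypothetical and does not exist in the library. It assumes the reader returns either a single table or a dict/list of tables, and that a dict preserves the order in which resources were loaded.

```python
from collections.abc import Mapping


def top(result):
    """Return the first resource from a reader result (hypothetical helper)."""
    if isinstance(result, Mapping):
        # Dicts keep insertion order; whether that matches the order in
        # datapackage.json depends on the reader (an assumption here).
        values = list(result.values())
    elif isinstance(result, (list, tuple)):
        values = list(result)
    else:
        # Already a single table: nothing to unwrap.
        return result
    if not values:
        raise ValueError("no resources were loaded")
    return values[0]
```

The caller then always gets exactly one table, and an empty result fails loudly instead of passing through unnoticed.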

@rgieseke
Owner

rgieseke commented Jul 8, 2019

An update would be great. I think you described the behaviour correctly. It's definitely not well documented at the moment, and the silent discarding of unsupported resources can be confusing.

https://github.com/frictionlessdata/datapackage-py has a more general approach where you can/need to iterate over "resources".
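As an illustration of the iterate-over-resources approach, the sketch below walks the `resources` list of an already parsed `datapackage.json` descriptor and reports which entries would be loaded and which would be skipped, so nothing is discarded silently. It uses only the standard library and is not the datapackage-py API. `SUPPORTED_FORMATS`, `resource_format` and `partition_resources` are hypothetical names, and the set of supported formats is an assumption based on the formats mentioned in this thread.

```python
import os

# Assumption: the formats named in this thread; adjust as needed.
SUPPORTED_FORMATS = {"csv", "geojson"}


def resource_format(resource):
    """Return the lower-cased format of a resource descriptor.

    Falls back to the file extension of `path` when `format` is absent.
    Only string paths are handled in this sketch.
    """
    fmt = resource.get("format")
    if not fmt and isinstance(resource.get("path"), str):
        fmt = os.path.splitext(resource["path"])[1].lstrip(".")
    return (fmt or "").lower()


def partition_resources(descriptor):
    """Split resource names into (loadable, skipped) lists."""
    loadable, skipped = [], []
    for resource in descriptor.get("resources", []):
        name = resource.get("name", "<unnamed>")
        if resource_format(resource) in SUPPORTED_FORMATS:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped
```

An application could log or raise on a non-empty `skipped` list before deciding how to load the rest.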

@augusto-herrmann

There is already a package that supports reading multiple resources of a data package into a pandas DataFrame. Even though its last commit was in 2017, at first glance it seems to offer more functionality than this one. @danfowler even wrote a post about it on the Open Knowledge Labs blog. Would it make sense to merge these efforts?

@rgieseke
Owner

@augusto-herrmann I don't know. When I started this tool, I wanted something that would quickly load CSVs from a Data Package into pandas DataFrames. I think the scope of the tableschema tool was more general, and it requires more knowledge of the Data Package toolchain.


3 participants