-
-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add constant time Series/Dataframe.count(), trivial implementations of iloc(integer) #176
Comments
@nevion thanks for you feedback. Are you able to propose a solution? |
return this.getContent().values.length, similiar for Series. iloc(index: number) : this.getContent().values[index] also noticed dtypes aren't stored as part of the context - these should be preserved for 0 length data frames |
Thanks @nevion, are you up for getting that in a PR? |
I think one thing that needs evaluation is remove allowing deferred values for construction which is something for instance pandas doesn't do and which is satisfiable by another strategy which is user holding a promise around a dataframe / using an async expression , not having the dataframe deal with that internally. This would remove the checking for deferred values inside the dataframe and allow the trivial implementations above which optimize out well and simplify + optimize to the common case. Wdyt? |
I think it could be a good idea, but I'm worried removing something could break existing code. I suppose you could implement it so that the deferred values are evaluated immediately - would that still result in the desired simplification? |
yea I think so, test if it's a generator vs an array and if generator eval in constructor |
Sounds like it shouldn't be too hard. Do you want to try it and submit a PR? For the first step (evaluating the lazy argument in the constructor) shouldn't require any new tests... just need to keep existing tests passing and try to make sure the interface to the user is the same. No hurry though... take your time with it. |
count requires O(N) and at(offset) - the most frequent type of index - does hashmap lookups.
consider an O(N) for loop becomes O(N^2) and the performance of this sort of loop could have.... The random places in practice one needs the shape of a dataframe are high and shouldn't incur computation or heavy operations.
The text was updated successfully, but these errors were encountered: