
support lazy batching again, support general iterators #76

Merged
merged 2 commits into master from lazy_batching on Oct 3, 2024

Conversation

@d-maurer (Contributor)

Fixes #75.

dtml-in's lazy batching was broken by listifying the incoming sequence in order to support some iterators (e.g. dict key views).
This PR replaces the listifying with a sequence_ensure_subscription wrapper that ensures the returned object has sequence-like behavior sufficient for dtml-in. It uses the original object if it already fulfills the requirement; otherwise, it uses a wrapper which lazily emulates sequence behavior for an arbitrary iterator.
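To illustrate the lazy emulation, here is a minimal sketch (the class name is invented; the PR's actual sequence_ensure_subscription wrapper may differ in detail):

```python
class LazySequence:
    """Minimal sketch: emulate read-only sequence subscription
    over an arbitrary iterator, fetching items only on demand."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._cache = []  # items already fetched from the iterator

    def __getitem__(self, index):
        # only non-negative integer indexes are handled in this sketch;
        # slices and negative indexes are omitted for brevity
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._iter))
            except StopIteration:
                raise IndexError(index)
        return self._cache[index]
```

With such a wrapper, dtml-in can fetch batch items one by one instead of forcing the whole iterator into a list.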

The main problem with the approach is recognizing whether the incoming sequence already fulfills the requirements. For this, it uses the following heuristics (a code sketch follows the list):

  • it accesses seq[0]. If this raises IndexError, it assumes "sufficiently sequence like"; if it raises AttributeError, TypeError or KeyError, it assumes "not sufficiently sequence like"
  • it tries to verify that seq is not a mapping. For this, it accesses seq[None]. Most sequence types will raise an exception if the index is not an integer or a slice. The heuristic checks for TypeError, ValueError and RuntimeError (the latter because ZTUtils.Lazy.Lazy used this exception). If one of those exceptions is raised, it assumes "sufficiently sequence like"
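A sketch of these heuristics in code (a hypothetical helper assembled from the description above, not the PR's literal implementation):

```python
def seems_sequence_like(seq):
    """Heuristically decide whether ``seq`` already offers
    sequence subscription sufficient for dtml-in."""
    try:
        seq[0]
    except IndexError:
        return True   # empty, but clearly sequence like
    except (AttributeError, TypeError, KeyError):
        return False  # no usable integer subscription
    try:
        seq[None]
    except (TypeError, ValueError, RuntimeError):
        # a genuine sequence rejects a non-integer, non-slice index;
        # RuntimeError is included because ZTUtils.Lazy.Lazy used it
        return True
    except KeyError:
        return False  # ``None`` treated as a missing key: a mapping
    return False      # ``None`` accepted as a subscript: likely a mapping
```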

The approach is not very robust, as it makes assumptions about the raised exceptions. Nevertheless, the PR relies on explicit exception enumeration because, in the context of Zope, temporary exceptions can arise which we do not want to catch. I am open to other heuristics.

@dataflake (Member) left a comment


Looks good to me but I don't have any good test case to see what difference it makes.

@d-maurer (Contributor, Author)

> but I don't have any good test case to see what difference it makes.

I hope that @sauzher will check this with his large catalog.

@sauzher commented Sep 30, 2024

@d-maurer
I've checked with a catalog populated with ~50,000 objects.

It still takes ~7000 ms to load. The call that takes the time is this inner call.

@d-maurer (Contributor, Author) commented Sep 30, 2024 via email

@d-maurer (Contributor, Author) commented Oct 1, 2024 via email

@icemac removed their request for review on October 1, 2024 at 06:53
@sauzher commented Oct 1, 2024

@d-maurer You're right; in fact I modified my comment just 5 minutes after the post with the right expensive call, and indeed it seems that accessing the first element, sequence[0], costs time.

Zope caching works well; the second time I call manage_catalogView it renders very fast. But we are still at the same point: loading that ZMI page in production for the first time (in a load-balanced environment) could lead to proxy errors for each ZEO client.

... reading your last comment just now, it seems we are on the same page.

Paying the batch loading time only when the path filter is set is something a manager could take into account, but as @drfho pointed out here, there are catalogs where no path index is defined and one may want to check some brains immediately to test catalog consistency (which makes sense).

Perhaps manage_catalogView should be refactored so that, at least on the first opening call, it shows only the first 20 brains and uses an action button (or a path search, if present) to load the next ones. The page can still show how many items are in the catalog via a len(lazyMap) call; a rough sketch of the idea follows.
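A rough illustration of that suggestion (a hypothetical helper with invented names; it assumes len() on the lazy result is cheap, as implied above):

```python
BATCH_SIZE = 20

def first_batch(lazyMap):
    """Data for the first manage_catalogView page: only the
    first BATCH_SIZE brains, plus the total item count."""
    total = len(lazyMap)  # assumed cheap: no brains are loaded for this
    brains = [lazyMap[i] for i in range(min(BATCH_SIZE, total))]
    return brains, total
```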

@d-maurer (Contributor, Author) commented Oct 1, 2024 via email

Commit: …bscription avoiding actual item access (as it may be expensive)
@d-maurer (Contributor, Author) commented Oct 1, 2024

I have changed the heuristics for distinguishing sequence from mapping subscription in order to avoid a potentially costly sequence access. It now uses
hasattr(obj, "__getitem__") and hasattr(obj, "__len__") and not hasattr(obj, "get") and not hasattr(obj, "keys") to detect sequence subscription.
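Wrapped in a helper for illustration (the function name is invented; the expression itself is quoted from the change):

```python
def has_sequence_subscription(obj):
    """Detect sequence (rather than mapping) subscription
    without accessing any item."""
    return (hasattr(obj, "__getitem__") and hasattr(obj, "__len__")
            and not hasattr(obj, "get") and not hasattr(obj, "keys"))
```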

@d-maurer merged commit f08870f into master on Oct 3, 2024
13 checks passed
@d-maurer deleted the lazy_batching branch on October 3, 2024 at 05:51
Development

Successfully merging this pull request may close these issues.

dtml-in batching accesses the complete list