Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Simplified RequestQueueV2 implementation #2775

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from
Draft

Conversation

janbuchar
Copy link
Contributor

@janbuchar janbuchar commented Dec 17, 2024

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Dec 17, 2024
@github-actions github-actions bot added this to the 105th sprint - Tooling team milestone Dec 17, 2024
@janbuchar janbuchar marked this pull request as draft December 17, 2024 14:59
Comment on lines 281 to 299
const giveUpLock = async (id?: string, uniqueKey?: string) => {
if (id === undefined) {
return;
}

try {
await this.client.deleteRequestLock(id);
} catch {
this.log.debug('Failed to delete request lock', { id, uniqueKey });
}
};

// If we tried to read new forefront requests, but another client appeared in the meantime, we can't be sure we'll only read our requests.
// To retain the correct queue ordering, we rollback this head read.
if (hasPendingForefrontRequests && headData.hadMultipleClients) {
this.log.debug(`Skipping this read - forefront requests may not be fully consistent`);
await Promise.all(headData.items.map(({ id, uniqueKey }) => giveUpLock(id, uniqueKey)));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barjin I'm pretty sure this is equivalent to the previous version, but please check it.

/**
* @inheritDoc
*/
override async isFinished(): Promise<boolean> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drobnikj I didn't remove the inheritance from RequestProvider completely, just overwrote this method. I don't think there's any other stuff in RequestProvider that could cause any trouble, but feel free to prove me wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote, I would remove original implementation from Provider do not to confuse future developers.

Copy link
Member

@drobnikj drobnikj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 💪

I would do some testing myself, but the first what about some unit tests, did you consider,add some? There are none -> https://github.com/apify/crawlee/blob/03951bdba8fb34f6bed00d1b68240ff7cd0bacbf/test/core/storages/request_queue.test.ts
Honesly, we are dealing with various bugs during time and we do not have any tests for these features still.

packages/core/src/storages/request_provider.ts Outdated Show resolved Hide resolved
Comment on lines +293 to +298
// If we tried to read new forefront requests, but another client appeared in the meantime, we can't be sure we'll only read our requests.
// To retain the correct queue ordering, we rollback this head read.
if (hasPendingForefrontRequests && headData.hadMultipleClients) {
this.log.debug(`Skipping this read - forefront requests may not be fully consistent`);
await Promise.all(headData.items.map(async ({ id, uniqueKey }) => giveUpLock(id, uniqueKey)));
}
Copy link
Contributor

@barjin barjin Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about those scenarios during the last batch of forefront fixes, but I considered them too edge case-y to handle them.

I'm not sure about these changes, though - let's say two clients are using the queue. Won't this cause any forefront request to be locked and then immediately unlocked? headData.hadMultipleClients doesn't iirc say "a new client just appeared", once it's set, it's never false again, no? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered them too edge case-y to handle them

My point here: I'd be happy not to do this at all. Both the clients are using the same queue - as a user, I'd be fine with some request intermingling.

@drobnikj
Copy link
Member

drobnikj commented Jan 6, 2025

The build did not finish, can you check @janbuchar ?
I would like to test it in some Actors.

@janbuchar
Copy link
Contributor Author

The build did not finish, can you check @janbuchar ? I would like to test it in some Actors.

I can, but only later this week - I have different stuff to finish first.

@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Jan 22, 2025
@janbuchar
Copy link
Contributor Author

@drobnikj the unit tests are now passing so you should be able to build. I'm still working on some e2e tests, if you have any ideas for scenarios to test (e2e, unit, doesn't matter), I'd love to hear those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Utilize queueHasLockedRequests to simplify RequestQueue v2
3 participants