Add new option to run just new jobs or jobs without history #831

Merged
merged 1 commit into from
Jan 5, 2025

Conversation

nille02
Contributor

@nille02 nille02 commented Nov 10, 2024

My problem is that I have some url.yaml files with hundreds to thousands of jobs. A full run can take hours, so I generally do one only once a month or even less often.
Before this option, I checked for the new IDs and ran them manually, but that was quite a chore, so I looked for a way to avoid it.

My only concern is that get_history_data() is quite slow, but I guess SQLite is the reason?
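
To illustrate the idea behind this option, here is a minimal sketch of selecting only the jobs that have no stored history. The `Job` and `InMemoryCache` classes and the `jobs_without_history()` helper are hypothetical stand-ins invented for this example; they are not urlwatch's actual classes, and only the method name `has_history_data()` is taken from this conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for a job; only the GUID matters here."""
    guid: str
    url: str


class InMemoryCache:
    """Hypothetical cache storage keyed by job GUID (not urlwatch's real class)."""

    def __init__(self):
        self._snapshots = {}

    def save(self, guid, data):
        # Append a snapshot to the history of the given job.
        self._snapshots.setdefault(guid, []).append(data)

    def has_history_data(self, guid):
        # Existence check only; no snapshot contents are returned.
        return bool(self._snapshots.get(guid))


def jobs_without_history(jobs, cache):
    """Return the jobs that have never been stored in the cache."""
    return [job for job in jobs if not cache.has_history_data(job.guid)]


cache = InMemoryCache()
cache.save("guid-a", "old content")
jobs = [
    Job("guid-a", "https://example.org/a"),
    Job("guid-b", "https://example.org/b"),
]
new_jobs = jobs_without_history(jobs, cache)
print([job.url for job in new_jobs])
```

Only the second job is selected, because the first one already has a snapshot in the cache.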

@nille02 nille02 marked this pull request as ready for review November 10, 2024 12:34
Owner

@thp thp left a comment

See comments.

lib/urlwatch/command.py Outdated Show resolved Hide resolved
lib/urlwatch/command.py Outdated Show resolved Hide resolved
lib/urlwatch/command.py Outdated Show resolved Hide resolved
lib/urlwatch/command.py Outdated Show resolved Hide resolved
lib/urlwatch/command.py Outdated Show resolved Hide resolved
lib/urlwatch/command.py Outdated Show resolved Hide resolved
lib/urlwatch/config.py Outdated Show resolved Hide resolved
@nille02 nille02 force-pushed the run-jobs branch 3 times, most recently from ae415cc to c3e02a9 Compare November 25, 2024 08:13
lib/urlwatch/storage.py Outdated Show resolved Hide resolved
@nille02
Contributor Author

nille02 commented Nov 26, 2024

I added a Redis version of has_history_data(), but it just wraps get_history_data(), since Redis is already fast.

For this benchmark I used a new cache.db and a Redis server running on a Debian stable VM.

Redis          Benchmark: 864 Jobs took 0:00:00.680740
sqllite        Benchmark: 864 Jobs took 0:00:30.835610
sqllite Cached Benchmark: 864 Jobs took 0:00:00.322071

The time covers only the loop over all jobs calling has_history_data() or get_history_data().
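
As a rough illustration of why a dedicated existence check can be cheaper than loading full history, the sketch below times both approaches against an in-memory SQLite database. The table name, schema, and function bodies are assumptions made up for this example and do not reflect urlwatch's real storage layout or the cause of the numbers above; only the two function names come from this conversation.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per stored snapshot of a job.
conn.execute("CREATE TABLE snapshots (guid TEXT, timestamp INTEGER, data TEXT)")
conn.execute("CREATE INDEX idx_snapshots_guid ON snapshots (guid)")
rows = [(f"guid-{j}", t, "x" * 1000) for j in range(100) for t in range(20)]
conn.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", rows)


def get_history_data(conn, guid):
    """Load every snapshot for a job, keyed by content (latest timestamp wins)."""
    history = {}
    query = "SELECT data, timestamp FROM snapshots WHERE guid = ? ORDER BY timestamp"
    for data, timestamp in conn.execute(query, (guid,)):
        history[data] = timestamp
    return history


def has_history_data(conn, guid):
    """Only ask whether at least one snapshot exists; no payload is read."""
    query = "SELECT 1 FROM snapshots WHERE guid = ? LIMIT 1"
    return conn.execute(query, (guid,)).fetchone() is not None


def timed(check):
    """Run the check for 150 GUIDs (100 known, 50 unknown) and time the loop."""
    start = time.perf_counter()
    results = [check(conn, f"guid-{j}") for j in range(150)]
    return results, time.perf_counter() - start


full, full_seconds = timed(lambda c, g: bool(get_history_data(c, g)))
cheap, cheap_seconds = timed(has_history_data)
print(f"full fetch: {full_seconds:.4f}s, existence check: {cheap_seconds:.4f}s")
```

Both loops give the same answer per job; the difference is only in how much data each query has to read. Actual timings depend on the machine and on the real schema, so none are quoted here.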

@nille02 nille02 requested a review from thp November 26, 2024 19:20
Owner

@thp thp left a comment

See comments.

lib/urlwatch/config.py Outdated Show resolved Hide resolved
lib/urlwatch/storage.py Outdated Show resolved Hide resolved
lib/urlwatch/storage.py Outdated Show resolved Hide resolved
lib/urlwatch/storage.py Outdated Show resolved Hide resolved
@nille02 nille02 requested a review from thp December 11, 2024 11:36
Owner

@thp thp left a comment

Works for me. Please also add a changelog entry to the PR.

@nille02
Contributor Author

nille02 commented Jan 1, 2025

Just an entry in CHANGELOG.md? If so, it's done.

And I hope you all had a happy new year.

Owner

@thp thp left a comment

Looks good, thanks for the updates!

@thp
Owner

thp commented Jan 5, 2025

@nille02 Can you fix up the last remaining code style issues?

----------------------------- Captured stdout call -----------------------------
/home/runner/work/urlwatch/urlwatch/lib/urlwatch/command.py:151:36: E226 missing whitespace around arithmetic operator
/home/runner/work/urlwatch/urlwatch/lib/urlwatch/storage.py:608:89: E231 missing whitespace after ','
/home/runner/work/urlwatch/urlwatch/lib/urlwatch/storage.py:609:115: W504 line break after binary operator
/home/runner/work/urlwatch/urlwatch/lib/urlwatch/storage.py:610:121: E261 at least two spaces before inline comment
=========================== short test summary info ============================

After that, this is ready to merge :)

@nille02
Contributor Author

nille02 commented Jan 5, 2025

Should be fixed now

@thp
Owner

thp commented Jan 5, 2025

There are still 3 more:

/home/runner/work/urlwatch/urlwatch/lib/urlwatch/storage.py:608:89: E231 missing whitespace after ','
/home/runner/work/urlwatch/urlwatch/lib/urlwatch/storage.py:608:108: W291 trailing whitespace
/home/runner/work/urlwatch/urlwatch/lib/urlwatch/storage.py:609:114: W291 trailing whitespace

(you can run this locally to verify it's fixed, no need to wait for CI -- use pytest -v after installing dependencies, see https://github.com/thp/urlwatch/blob/master/.github/workflows/unit-tests.yml)
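
For a quick pre-check before running the full test suite, a small standard-library script can locate trailing whitespace like the W291 findings above. This is only a sketch: `find_trailing_whitespace()` is a hypothetical helper written for this example, it also flags whitespace-only lines, and it is not a replacement for the project's own style check.

```python
import tempfile
from pathlib import Path


def find_trailing_whitespace(path):
    """Return (line_number, column) pairs where trailing whitespace starts (1-based)."""
    hits = []
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        if stripped != line:
            hits.append((number, len(stripped) + 1))
    return hits


# Demonstrate on a throwaway file: lines 1 and 3 end in a space and a tab.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.py"
    sample.write_text("x = 1 \ny = 2\nz = 3\t\n", encoding="utf-8")
    hits = find_trailing_whitespace(sample)
    print(hits)
```

To use it on a real file, pass its path, for example `find_trailing_whitespace("lib/urlwatch/storage.py")`.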

New command-line option `--prepare-jobs` to initialize new jobs or jobs without history

Fix some typos

Fix Again
@nille02
Contributor Author

nille02 commented Jan 5, 2025

Now it should be done.

I did try pytest, but I got some different errors, and I guess that's due to Windows. So I had to skip the documentation test, but pep8 is happy now.

@thp thp merged commit b601917 into thp:master Jan 5, 2025
6 checks passed
@thp
Owner

thp commented Jan 5, 2025

Thanks for working on this and seeing it through! Merged :)

@nille02 nille02 deleted the run-jobs branch January 5, 2025 19:30
2 participants