Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize backoff handling and improve error handling in api_get_post #13576

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 13 additions & 6 deletions apigetpost.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,14 +50,16 @@ def __setitem__(self, key, value):
setattr(self, key, value)

def __getitem__(self, item):
getattr(self, item)
return getattr(self, item)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This correction fixes a clear bug. Thank you.



def api_get_post(post_url):
with GlobalVars.api_request_lock:
# Respect backoff, if we were given one
if GlobalVars.api_backoff_time > time.time():
time.sleep(GlobalVars.api_backoff_time - time.time() + 2)
# Use max(0, ...) to ensure no negative sleep times
time.sleep(max(0, GlobalVars.api_backoff_time - time.time() + 2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not wrong, the only ways for this to be needed is

  1. if it takes > 2 seconds between the if in the previous line and the call to time.sleep() or
  2. if GlobalVars.api_backoff_time changes between the two statements. I'd have to double-check everywhere else in the code, but we should be using the GlobalVars.api_request_lock to lock any changes to GlobalVars.api_backoff_time, so that shouldn't be an issue.

Overall, I'd probably rather not have this guard against a negative value here and have the normal exception raised when/if the value passed to time.sleep() is negative, because if the value is negative, then something has probably gone wrong somewhere and we should be made aware, by the exception, that there's an issue somewhere, even if that causes a reboot (which would result in a safe recovery).


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with all of the additional blank lines added in this commit, please don't change the style used when making other changes, particularly when not a primary contributor the project. While I wouldn't object to some additional blank lines in this file, the current style in this file uses blank lines very sparingly. Please maintain the current style.

If style changes are to be made, they would ideally be made in a separate commit, even if in the same PR.

Incidental changes would reasonable, but the changes here are a definite change it the style used in this file, which really should be separate.

d = parsing.fetch_post_id_and_site_from_url(post_url)
if d is None:
return None
Expand All @@ -66,9 +68,12 @@ def api_get_post(post_url):
request_url = get_se_api_url_for_route("{}s/{}".format(post_type, post_id))
params = get_se_api_default_params_questions_answers_posts_add_site(site)
response = requests.get(request_url, params=params, timeout=GlobalVars.default_requests_timeout).json()

if "backoff" in response:
# Update backoff time if required
if GlobalVars.api_backoff_time < time.time() + response["backoff"]:
GlobalVars.api_backoff_time = time.time() + response["backoff"]

if 'items' not in response or len(response['items']) == 0:
return False

Expand All @@ -78,6 +83,7 @@ def api_get_post(post_url):
post_data.post_url = parsing.url_to_shortlink(item['link'])
post_data.post_type = post_type
post_data.title = html.unescape(item['title'])

if 'owner' in item and 'link' in item['owner']:
post_data.owner_name = html.unescape(item['owner']['display_name'])
post_data.owner_url = item['owner']['link']
Expand All @@ -86,17 +92,18 @@ def api_get_post(post_url):
post_data.owner_name = ""
post_data.owner_url = ""
post_data.owner_rep = 1

post_data.site = site
post_data.body = item['body']
post_data.body_markdown = item['body_markdown']
post_data.score = item['score']
post_data.up_vote_count = item['up_vote_count']
post_data.down_vote_count = item['down_vote_count']
post_data.creation_date = item['creation_date']
try:
post_data.last_edit_date = item['last_edit_date']
except KeyError:
post_data.last_edit_date = post_data.creation_date # Key not present = not edited

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are making a style change, this blank line is inconsistent with the new style. If you are making a change, please keep it consistent.

post_data.last_edit_date = item.get('last_edit_date', post_data.creation_date) # Use .get() for safer access
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is a direct equivalent to the try/except block which it is replacing. There's significant discussion in the Python world as to which is "better", for various values of "better". Personally, I prefer the .get(), but not enough to have previously pushed the change, but I wouldn't object to it being changed.

However, I would retain the functional comment that's in the old code, rather than have the comment that's been added here, which is just a comment on how to write code. In other words, the old comment is a reasonable code comment. The added comment is something that would be more appropriate in a commit comment instead of a code comment.


if post_type == "answer":
post_data.question_id = item['question_id']

return post_data
Loading