Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/2881 url shortener #2358

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from
Open

Conversation

philipkcl
Copy link
Contributor

@philipkcl philipkcl commented Mar 5, 2024

[2881] Make our own short URLs

Please don't delete any sections when completing this PR template; instead enter N/A for checkboxes or sections which are not applicable, unless otherwise stated below

See #

Describe the scope/purpose of the PR here in as much detail as you like

Categorisation

This PR...

  • has scripts to run
  • has migrations to run
  • adds new infrastructure
  • changes the CI pipeline
  • affects the public site
  • affects the editorial area
  • affects the publisher area
  • affects the monitoring

Basic PR Checklist

Instructions for developers:

  • For each checklist item, if it is N/A to your PR check the N/A box
  • For each item that you have done and confirmed for yourself, check Developer box (including if you have checked the N/A box)

Instructions for reviewers:

  • For each checklist item that has been confirmed by the Developer, check the Reviewer box if you agree
  • For multiple reviewers, feel free to add your own checkbox with your github username next to it if that helps with review tracking

Code Style

  • No deprecated methods are used

    • N/A
    • Developer
    • Reviewer
  • No magic strings/numbers - all strings are in constants or messages files

    • N/A
    • Developer
    • Reviewer
  • ES queries are wrapped in a Query object rather than inlined in the code

    • N/A
    • Developer
    • Reviewer
  • Where possible our common library functions have been used (e.g. dates manipulated via dates)

    • N/A
    • Developer
    • Reviewer
  • Cleaned up commented out code, etc

    • N/A
    • Developer
    • Reviewer
  • Urls are constructed with url_for not hard-coded

    • N/A
    • Developer
    • Reviewer

Testing

  • Unit tests have been added/modified

    • N/A
    • Developer
    • Reviewer
  • Functional tests have been added/modified

    • N/A
    • Developer
    • Reviewer
  • Code has been run manually in development, and functional tests followed locally

    • N/A
    • Developer
    • Reviewer
  • Have CSS/style changes been implemented? If they are of a global scope (e.g. on base HTML elements) have the downstream impacts of the change in other areas of the system been considered?

    • N/A
    • Developer
    • Reviewer

Documentation

Release Readiness

Testing

List the Functional Tests that must be run to confirm this feature

  1. ...
  2. ...

Deployment

What deployment considerations are there? (delete any sections you don't need)

Configuration changes

What configuration changes are included in this PR, and do we need to set specific values for production

Scripts

What scripts need to be run from the PR (e.g. if this is a report generating feature), and when (once, regularly, etc).

Migrations

What migrations need to be run to deploy this

Monitoring

What additional monitoring is required of the application as a result of this feature

New Infrastructure

What new infrastructure does this PR require (e.g. new services that need to run on the back-end).

Continuous Integration

What CI changes are required for this

@philipkcl
Copy link
Contributor Author

  • new Goals Urlshort should be added to plausible
  • edges library updated for generate url shorten

Copy link
Contributor

@richard-jones richard-jones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some structural changes, and a question about why a change to the core of edges, and a request to roll that back and submit it as a PR

doajtest/unit/test_lib_urlshort.py Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a code organisation perspective, I'd like this to be a ShortURLService in portality.bll rather than in portality.lib and to follow the usual service pattern

Copy link
Contributor Author

@philipkcl philipkcl Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it to portality.bll should be fine.
About usual service pattern,

portality.bll.doaj.shortUrlService() should already resolve circle import problem, class a less flexible and complex than simple module file.

what is the benefit to make service as a class ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of reasons it's a class: one is just style preference, the other is that services can then be instantiated with arguments if needed, to enable alternative behaviours in different contexts (e.g. testing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure what's happening here, is this just an update to the latest edges develop branch? I can see at least one commit from a username I don't recognise in the edges develop branch, that needs to be rolled back and submitted as PR at least, because I don't see how a change to the URL shortener that DOAJ uses requires a core edges change (it shouldn't).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I just update develop branch because I don't know the procedure of update submodule . That user name was my default name.

Reason of changing edges is because generateShortUrl only passing query to url shortener handler , logic of the value that showing in textbox and value that used to generate shortened url are different. That easy to making mismatch in what the user see and what actually generated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipkcl philipkcl requested a review from richard-jones March 20, 2024 10:53
Copy link
Contributor

@richard-jones richard-jones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some architectural changes still

@@ -443,6 +460,10 @@ def run_server(host=None, port=None, fake_https=False):
that can help for debugging Plausible
:return:
"""

if app.config.get('DEBUG_DEV_LOG', False):
setup_dev_log()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally clear what this does, do we not get DEBUG level logging just by being in DEBUG mode?

If this is relevant to all DEBUG level stuff, I'd suggest we just use this on DEBUG, but if we do need a new config value, can we also have it in settings.py set to False with a note about what it does?

Copy link
Contributor Author

@philipkcl philipkcl Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will show all log of each modules.
I will add that new variable to settings.

2024-04-19 13:42:23,645 WARN MainProcessMainThread -  * Debugger is active! --- [werkzeug][_log:225]
2024-04-19 13:42:23,645 INFO MainProcessMainThread -  * Debugger PIN: 119-706-544 --- [werkzeug][_log:225]
2024-04-19 13:42:28,030 DEBU MainProcessThread-7 - http://localhost:9200 "GET /doaj-account/_doc/hiJuZXveKy HTTP/1.1" 200 545 --- [urllib3.connectionpool][_make_request:474]
2024-04-19 13:42:28,030 INFO MainProcessThread-7 - GET http://localhost:9200/doaj-account/_doc/hiJuZXveKy [status:200 request:0.001s] --- [elasticsearch][log_request_success:270]
2024-04-19 13:42:28,031 DEBU MainProcessThread-7 - > None --- [elasticsearch][log_request_success:273]

def urlshortService(cls):
from portality.bll.services import urlshort
return urlshort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to follow the pattern of the other services, for code style consistency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be structured the same as the other services, for code style consistency

Copy link
Contributor Author

@philipkcl philipkcl Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class have a lot of problem because it is stateful which mean high coupling, to make a program more simple I prefer not using class except the design need it.
functions in urlshort are no relation and independence each other.

about code style consistency, if you check javascript files, some components are start defined with class now instead of transitional function. If you read python build-in library, every module implemented in different style but build-in library still maintain very well.

So, Sorry but I will not update it if no other reason.

# global current status of the alias length
alias_len = 6
ALIAS_CHARS = string.ascii_letters + string.digits
alias_lock = threading.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of these globals; each service worker will have their own one and I'm not sue threading locking will affect that, I think they're separate processes (I may be wrong).

I'd suggest just not doing this, and instead start with a minimum alias size and increase it per request if needed. The number of aliases we're going to generate is small, collisions are extremely unlikely, and even if it gets very full the number of retries will never be large.

Set the minimum alias length in settings.py and then we can increase it if we really need to later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will become no logic to defend duplicate alias issues If remove both lock and alias count checking.

CPU have no real Random logic, if the program or computer restarted, the program choice the same seed to generate "random" number, the collision will be happen and those bugs is very difficult to be debug.

I will

  • remove lock
  • remove auto increase aliases length
  • keep alias exit checking

@@ -1371,7 +1370,7 @@ $.extend(true, doaj, {
this.toggleShorten = function(element) {
if (!this.component.shortUrl) {
var callback = edges.objClosure(this, "updateShortUrl");
this.component.generateShortUrl(callback);
this.component.generateShortUrl(this.component.edge.fullUrl(), callback);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look at the downstream change to edges that this introduces, and we can't accept it. It changes the URL shortening approach, which will have downstream consequences for other users/projects. I'm not sure why you need to put this fullUrl cal here, edges can figure out the full url and ensure that there are no unwanted elements, so we should just leave this as it was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't merge this change on edges, as noted above: it could cause problems for other users of the library, and takes away edges ability to control the URL. It's not a generic url shortener, it's specifically for creating short versions of edges urls, which means that internally it knows better how to generate them.

@philipkcl philipkcl requested a review from richard-jones May 3, 2024 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants