Skip to content
This repository has been archived by the owner on Oct 21, 2024. It is now read-only.

Make use of bulk_create to avoid scaling issues #11

Open
tbrlpld opened this issue Dec 11, 2020 · 5 comments
Open

Make use of bulk_create to avoid scaling issues #11

tbrlpld opened this issue Dec 11, 2020 · 5 comments

Comments

@tbrlpld
Copy link
Contributor

tbrlpld commented Dec 11, 2020

It was pointed out to me by Wagtail core developer @kaedroho that we should consider making use of Django's bulk_create to avoid scaling issues with the automatic redirects.

We are currently using create_or_update (here and here).

I was wondering if the update part of that method is ever needed. If we already have a redirect pointing from an old path to the page object then this should still point to the right page object, even it if its slug is changed again (or the page is moved). Please correct me if I am wrong. Might be worth adding a test to make sure this works.

In that case the redirect don't need updating, we could drop the update part and create only new redirects in an efficient manner.

@dest81
Copy link

dest81 commented Jan 15, 2021

@tbrlpld I see the case when update is needed.
Let me try to explain... For example:
We have a page: slug=blog1, path=/blogs/blog1, id=1
change slug blog1->blog3 - this will auto create redirect /blogs/blog1 -> 1
then create a page with the same data (of course id will be different): slug=blog1, path=/blogs/blog1, id=2
and now if you would change slug of the second page (with id=2) it will/should update existing redirect to /blogs/blog1 -> 2

@tbrlpld
Copy link
Contributor Author

tbrlpld commented Jan 17, 2021

@dest81 I understand the case you want to address. I am wondering though, what URL/slug will the page with id 2 have? Because on the client side, the redirect needs to be going from one URL to another.

It seems like the case you are describing would create a redirect loop. URL of page 2 is /blog/blog1, but there is a redirect for this URL to page 2, lookup of the URL of page 2 will result in blog/blog1, ...

This would happen unless Wagtails URL resolution first looks which page would be served and then serve that one and only look through the redirects if no page with the given URL is found. In this case though, we could delete the redirect as it is never used. This of course would be bad from a SEO perspective, because the search engine could have already received a 301 Permanent Redirect (/blogs/blog1-> /blogs/blog3 ).

Maybe it should be prevented to create pages with URLs that match a permanent redirect? This does not seem to be a issue for this package, but rather for Wagtail itself.

@tbrlpld
Copy link
Contributor Author

tbrlpld commented Jan 17, 2021

Also, for updates there is also bulk_update

@dest81
Copy link

dest81 commented Jan 17, 2021

@dest81 I understand the case you want to address. I am wondering though, what URL/slug will the page with id 2 have? Because on the client side, the redirect needs to be going from one URL to another.

It seems like the case you are describing would create a redirect loop. URL of page 2 is /blog/blog1, but there is a redirect for this URL to page 2, lookup of the URL of page 2 will result in blog/blog1, ...

This would happen unless Wagtails URL resolution first looks which page would be served and then serve that one and only look through the redirects if no page with the given URL is found. In this case though, we could delete the redirect as it is never used. This of course would be bad from a SEO perspective, because the search engine could have already received a 301 Permanent Redirect (/blogs/blog1-> /blogs/blog3 ).

Wagtail serves existing page first even if there is a redirect for with the same url
https://github.com/wagtail/wagtail/blob/master/wagtail/contrib/redirects/middleware.py#L37

Maybe it should be prevented to create pages with URLs that match a permanent redirect? This does not seem to be a issue for this package, but rather for Wagtail itself.

Agree it sounds reasonable on my opinion and as you said it's not an issue of this package. But I think it would be good to remove redirect in case if page with the same url is published. Then preventing creation of page with url which exists in redirects will be up to wagtail.

@tbrlpld
Copy link
Contributor Author

tbrlpld commented Jan 18, 2021

Wagtail serves existing page first even if there is a redirect for with the same url
https://github.com/wagtail/wagtail/blob/master/wagtail/contrib/redirects/middleware.py#L37

I take it then that the update of the redirect is not necessary, since Wagtail will already serve the desired page (id = 2) at the original URL (blog/blog1).

It could be considered to delete the redirect if there is a page served at the original URL. Though again, this also seems like more of a general way for how Wagtail deals with redirects.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

2 participants