Make use of `bulk_create` to avoid scaling issues #11

tbrlpld · 2020-12-11T14:53:46Z

It was pointed out to me by Wagtail core developer @kaedroho that we should consider making use of Django's bulk_create to avoid scaling issues with the automatic redirects.

We are currently using create_or_update (here and here).

I was wondering if the update part of that method is ever needed. If we already have a redirect pointing from an old path to the page object then this should still point to the right page object, even it if its slug is changed again (or the page is moved). Please correct me if I am wrong. Might be worth adding a test to make sure this works.

In that case the redirect don't need updating, we could drop the update part and create only new redirects in an efficient manner.

The text was updated successfully, but these errors were encountered:

dest81 · 2021-01-15T12:23:53Z

@tbrlpld I see the case when update is needed.
Let me try to explain... For example:
We have a page: slug=blog1, path=/blogs/blog1, id=1
change slug blog1->blog3 - this will auto create redirect /blogs/blog1 -> 1
then create a page with the same data (of course id will be different): slug=blog1, path=/blogs/blog1, id=2
and now if you would change slug of the second page (with id=2) it will/should update existing redirect to /blogs/blog1 -> 2

tbrlpld · 2021-01-17T18:19:03Z

@dest81 I understand the case you want to address. I am wondering though, what URL/slug will the page with id 2 have? Because on the client side, the redirect needs to be going from one URL to another.

It seems like the case you are describing would create a redirect loop. URL of page 2 is /blog/blog1, but there is a redirect for this URL to page 2, lookup of the URL of page 2 will result in blog/blog1, ...

This would happen unless Wagtails URL resolution first looks which page would be served and then serve that one and only look through the redirects if no page with the given URL is found. In this case though, we could delete the redirect as it is never used. This of course would be bad from a SEO perspective, because the search engine could have already received a 301 Permanent Redirect (/blogs/blog1-> /blogs/blog3 ).

Maybe it should be prevented to create pages with URLs that match a permanent redirect? This does not seem to be a issue for this package, but rather for Wagtail itself.

tbrlpld · 2021-01-17T18:19:38Z

Also, for updates there is also bulk_update

dest81 · 2021-01-17T19:17:56Z

@dest81 I understand the case you want to address. I am wondering though, what URL/slug will the page with id 2 have? Because on the client side, the redirect needs to be going from one URL to another.

It seems like the case you are describing would create a redirect loop. URL of page 2 is /blog/blog1, but there is a redirect for this URL to page 2, lookup of the URL of page 2 will result in blog/blog1, ...

This would happen unless Wagtails URL resolution first looks which page would be served and then serve that one and only look through the redirects if no page with the given URL is found. In this case though, we could delete the redirect as it is never used. This of course would be bad from a SEO perspective, because the search engine could have already received a 301 Permanent Redirect (/blogs/blog1-> /blogs/blog3 ).

Wagtail serves existing page first even if there is a redirect for with the same url
https://github.com/wagtail/wagtail/blob/master/wagtail/contrib/redirects/middleware.py#L37

Maybe it should be prevented to create pages with URLs that match a permanent redirect? This does not seem to be a issue for this package, but rather for Wagtail itself.

Agree it sounds reasonable on my opinion and as you said it's not an issue of this package. But I think it would be good to remove redirect in case if page with the same url is published. Then preventing creation of page with url which exists in redirects will be up to wagtail.

tbrlpld · 2021-01-18T00:35:46Z

Wagtail serves existing page first even if there is a redirect for with the same url
https://github.com/wagtail/wagtail/blob/master/wagtail/contrib/redirects/middleware.py#L37

I take it then that the update of the redirect is not necessary, since Wagtail will already serve the desired page (id = 2) at the original URL (blog/blog1).

It could be considered to delete the redirect if there is a page served at the original URL. Though again, this also seems like more of a general way for how Wagtail deals with redirects.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make use of `bulk_create` to avoid scaling issues #11

Make use of `bulk_create` to avoid scaling issues #11

tbrlpld commented Dec 11, 2020

dest81 commented Jan 15, 2021

tbrlpld commented Jan 17, 2021

tbrlpld commented Jan 17, 2021

dest81 commented Jan 17, 2021

tbrlpld commented Jan 18, 2021

Make use of bulk_create to avoid scaling issues #11

Make use of bulk_create to avoid scaling issues #11

Comments

tbrlpld commented Dec 11, 2020

dest81 commented Jan 15, 2021

tbrlpld commented Jan 17, 2021

tbrlpld commented Jan 17, 2021

dest81 commented Jan 17, 2021

tbrlpld commented Jan 18, 2021

Make use of `bulk_create` to avoid scaling issues #11

Make use of `bulk_create` to avoid scaling issues #11