If http://domain/dir/page1.html contains a link to page2.html, the parser resolves it as http://domain/page2.html; the correct URL is http://domain/dir/page2.html.
Furthermore, on a page containing references to parent directories (..), these are changed to . by self.clean_link.
I recommend using urllib.parse.urljoin(crawling_url, link) to convert a link to an absolute URL. This will handle everything except "//" in the path.
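
A minimal sketch of the suggested fix, assuming the crawler has the URL of the page being crawled available as `crawling_url`. The helper name `make_absolute` is hypothetical and not part of the project's code; it only wraps the standard-library `urljoin` call.

```python
from urllib.parse import urljoin


def make_absolute(crawling_url: str, link: str) -> str:
    """Resolve a link found on crawling_url into an absolute URL.

    Hypothetical helper for illustration; the project may wire this in differently.
    """
    return urljoin(crawling_url, link)


crawling_url = "http://domain/dir/page1.html"

# A relative link stays inside the directory of the current page.
print(make_absolute(crawling_url, "page2.html"))     # http://domain/dir/page2.html

# A parent-directory reference (..) is resolved instead of being rewritten to "."
print(make_absolute(crawling_url, "../page3.html"))  # http://domain/page3.html

# A root-relative link is resolved against the host.
print(make_absolute(crawling_url, "/page4.html"))    # http://domain/page4.html
```

Doubled slashes inside a path are not covered by this sketch and would still need separate handling.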