Many documents use relative links like overview.html instead of https://www.example.com/overview.html. It would be useful to convert those into absolute links before the page content is saved.
The first step would be to find each link and determine whether it is absolute or relative. This check must also cover schemes other than http and https (for example ftp:, mailto: or tel:).
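A minimal sketch of the absolute/relative check, using `urllib.parse.urlparse` from the standard library. The helper name `is_absolute` is hypothetical. It treats any link with a scheme as absolute, so links to other protocols are left alone. Protocol-relative links such as `//cdn.example.com/lib.js` are deliberately treated as relative, because they still take their scheme from the page.

```python
from urllib.parse import urlparse


def is_absolute(link: str) -> bool:
    """Return True if the link names its own scheme (hypothetical helper).

    Any scheme counts (http, https, ftp, mailto, tel, data, ...), so links
    to other protocols are not rewritten. Protocol-relative links such as
    //cdn.example.com/lib.js have no scheme and are reported as relative,
    because they still inherit the scheme of the page.
    """
    return bool(urlparse(link.strip()).scheme)


print(is_absolute("https://www.example.com/overview.html"))  # True
print(is_absolute("mailto:info@example.com"))                # True
print(is_absolute("overview.html"))                          # False
```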
The document may also set a base URL with a `<base href="...">` element that differs from the URL of the page just crawled. If present, it has to be detected and used as the base for resolving relative links.
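One way to detect such a base URL with only the standard library is `html.parser.HTMLParser`. This is a sketch under stated assumptions: the names `_BaseHrefFinder` and `effective_base_url` are hypothetical. Following the HTML specification, only the first `<base>` element with an `href` is used. Because that `href` may itself be relative, it is resolved against the crawled page URL.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "base" and self.base_href is None:
            for name, value in attrs:
                if name == "href" and value:
                    self.base_href = value
                    break


def effective_base_url(page_url: str, html: str) -> str:
    """Return the URL that relative links in `html` should be resolved against."""
    finder = _BaseHrefFinder()
    finder.feed(html)
    finder.close()
    if finder.base_href is None:
        return page_url
    # The base href may be relative itself, so resolve it against the page URL.
    return urljoin(page_url, finder.base_href)
```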
To resolve cases like ../../foobar.html correctly, the urllib.parse.urljoin function should be used.
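For illustration, here is how `urllib.parse.urljoin` resolves a few typical links against an example page URL (the URLs are made up). It collapses `../` segments, handles root-relative and protocol-relative links, and returns links with a non-hierarchical scheme such as `mailto:` unchanged.

```python
from urllib.parse import urljoin

base = "https://www.example.com/a/b/c/page.html"

print(urljoin(base, "overview.html"))            # https://www.example.com/a/b/c/overview.html
print(urljoin(base, "../../foobar.html"))        # https://www.example.com/a/foobar.html
print(urljoin(base, "/index.html"))              # https://www.example.com/index.html
print(urljoin(base, "//cdn.example.com/x.js"))   # https://cdn.example.com/x.js
print(urljoin(base, "mailto:info@example.com"))  # mailto:info@example.com
```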
The best place for this functionality seems to be an optional feature of the prettify_html function.
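A rough sketch of the rewriting step that prettify_html could run when an opt-in parameter is set. The function name `absolutize_links` is hypothetical, and the actual signature of prettify_html is not shown here, so no wrapper is attempted. `base_url` is assumed to already be the effective base (the `<base href>` if the document sets one, otherwise the crawled URL). This regex-based version only handles quoted `href`/`src` attributes, leaves the `<base>` element itself untouched, and would be confused by a literal `>` inside an attribute value. A real HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin

# Opening tags other than <base>; the <base> element itself is not rewritten.
_TAG = re.compile(r"<(?!base\b)[a-zA-Z][^>]*>", re.IGNORECASE)
# Quoted href/src attributes inside a single tag (unquoted values are skipped).
_ATTR = re.compile(r"""(\b(?:href|src)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL)


def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite relative href/src values in `html` to absolute URLs (hypothetical helper)."""

    def fix_attr(match: re.Match) -> str:
        prefix, quote, link = match.groups()
        # urljoin leaves absolute links and schemes like mailto: unchanged.
        return f"{prefix}{quote}{urljoin(base_url, link.strip())}{quote}"

    def fix_tag(match: re.Match) -> str:
        return _ATTR.sub(fix_attr, match.group(0))

    return _TAG.sub(fix_tag, html)


page = '<a href="overview.html">Overview</a> <img src="../../img/logo.png">'
print(absolutize_links(page, "https://www.example.com/a/b/page.html"))
```

Keeping this as a separate helper lets prettify_html call it only when the caller asks for absolute links, so the default behavior stays unchanged.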