Add support for skipping nofollow links #548

Closed
cipriancraciun opened this issue Mar 10, 2022 · 3 comments · Fixed by #572
Labels
enhancement New feature or request workaround

Comments

cipriancraciun commented Mar 10, 2022

Especially in relation to #78 (though this applies independently of recursion), it would be nice to have a flag that instructs lychee to skip any <a> links marked with rel="nofollow", or with nofollow as one of several space-separated rel values (rel="... nofollow ..."), as specified in the HTML standard (https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/rel).

The reasoning is this: say one has a site with lots of user-generated content (wikis, comments, blogs, etc.), and the code already marks these "untrusted" links with nofollow. At the moment, running lychee on such a site will at best produce loads of errors, and at worst it could be used to DoS the linked sites. (Granted, one could use exclusions, but that list would be so long and change so often that an --include-only option might be better.)

With nofollow support, however, the site administrator can still use lychee to check that important (and trusted) resources and links are functional, while skipping the untrusted links in the user-generated content.
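
To make the request concrete, here is a minimal sketch of the check in Rust. It is not lychee's implementation: `has_nofollow`, `Link`, `links_to_check`, and the `skip_nofollow` switch are hypothetical names used for illustration only. It assumes the rel value is a whitespace-separated list of tokens that are compared case-insensitively.

```rust
/// Returns true if a raw `rel` attribute value contains the `nofollow` token.
/// Assumes tokens are separated by ASCII whitespace and matched case-insensitively.
fn has_nofollow(rel: &str) -> bool {
    rel.split_ascii_whitespace()
        .any(|token| token.eq_ignore_ascii_case("nofollow"))
}

/// Hypothetical record for an extracted <a> element: its target and raw `rel` value.
struct Link<'a> {
    href: &'a str,
    rel: Option<&'a str>,
}

/// Returns the targets that would still be checked.
/// With `skip_nofollow` set, links whose `rel` contains `nofollow` are dropped.
fn links_to_check<'a>(links: &[Link<'a>], skip_nofollow: bool) -> Vec<&'a str> {
    links
        .iter()
        .filter(|link| !(skip_nofollow && link.rel.map_or(false, has_nofollow)))
        .map(|link| link.href)
        .collect()
}
```

Links without a rel attribute, or with other values such as noopener, pass through unchanged, so trusted links are still checked.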

@lebensterben
Member

Yes. This should definitely be the default.

cipriancraciun changed the title from "Add support for skipping nofollow or noindex links" to "Add support for skipping nofollow links" on Mar 10, 2022
@mre mre added the enhancement New feature or request label Mar 10, 2022
Member

mre commented Mar 10, 2022

Sounds good. We should add support for it.
In case anyone stumbles upon this in the future, there is an --include option as a workaround for now:

--include <include>... URLs to check (supports regex). Has preference over all excludes
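
The precedence rule in that help text ("Has preference over all excludes") can be sketched roughly as below. This is an illustration, not lychee's code: the `Filter` type and `should_check` are hypothetical, plain substring matching stands in for the regex support, and checking a URL that matches neither list is an assumption of this sketch.

```rust
/// Hypothetical URL filter. Patterns are plain substrings here,
/// standing in for the regexes that the real option accepts.
struct Filter<'a> {
    includes: &'a [&'a str],
    excludes: &'a [&'a str],
}

impl Filter<'_> {
    /// Decides whether a URL should be checked.
    fn should_check(&self, url: &str) -> bool {
        // An include match wins, even if an exclude pattern also matches.
        if self.includes.iter().any(|p| url.contains(*p)) {
            return true;
        }
        // Otherwise any exclude match skips the URL.
        if self.excludes.iter().any(|p| url.contains(*p)) {
            return false;
        }
        // Assumption of this sketch: URLs matching neither list are checked.
        true
    }
}
```

With a broad exclude such as a whole domain and a narrower include for its trusted section, only the trusted section of that domain would be checked.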

Member

mre commented Mar 27, 2022

@cipriancraciun, thanks for the suggestion. Please check out #572 if you find the time.

@mre mre closed this as completed in #572 Apr 4, 2022