Commit

Generically handle some inaccessible links
Josh-Cena committed Aug 8, 2024
1 parent 04689ff commit 85440f1
Showing 3 changed files with 19 additions and 29 deletions.
30 changes: 4 additions & 26 deletions config/inaccessible-links.txt
@@ -1,33 +1,11 @@
Links in this file will be ignored by the link checker, usually because they are behind auth or crawler checks.
Note that known firewall pages and redirections to login are handled generically in the link checker.
Use two-space indent for comments. Use {...} to embed regex.

Cloudflare protection (response status 403 and returned HTML contains "Just a moment..."):
https://codepen.io{/?}
https://gitlab.com/projects/new
https://help.glitch.com/hc/{.*}
https://journals.sagepub.com/doi/{.*}
https://linux.die.net/man/{.*}
https://live.browserstack.com/dashboard
https://onlinelibrary.wiley.com/doi/{.*}
https://pixabay.com/
https://www.browserstack.com/{(users|accounts)/.*}
https://www.cloudflare.com/{.*}
https://www.researchgate.net/publication/{.*}
https://www.udemy.com/{(topic|course)/.*}

Other kinds of firewall:
Custom firewalls:
https://www.canva.com/colors/color-wheel/
https://www.openwebanalytics.com{/?}
https://www.techopedia.com/definition/{.*}
https://www.webpagetest.org{/?}
https://www.reddit.com/r/{.*}

Goes to login:
https://cloud.mongodb.com/v2
https://console.cloud.google.com/{.*}
https://docs.google.com/drawings
https://shell.cloud.google.com/{.*}
https://sites.google.com/{.*}
https://github.com/new
https://github.com/{.*}/issues/new{.*}
https://github.com/orgs/mdn/teams{.*}
404 on purpose:
https://konmari.com/404
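The `{...}` convention described in the file header can be illustrated with a small sketch. The helper names `escapeRegExp`, `patternToRegExp`, and `parseIgnoreList` are hypothetical and not taken from the repository. The sketch assumes that text outside braces is matched literally, that braces are never nested, and that every line starting with `http://` or `https://` is a pattern while all other lines are headings or comments.

```typescript
// Hypothetical helpers; the repository's actual parser is not shown in this commit.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn one line such as "https://codepen.io{/?}" into an anchored RegExp.
// Text outside {...} is literal; text inside {...} is used as regex source.
// Assumption: braces are not nested and do not occur inside the embedded regex.
function patternToRegExp(pattern: string): RegExp {
  let source = "";
  let rest = pattern;
  while (rest.length > 0) {
    const open = rest.indexOf("{");
    if (open === -1) {
      source += escapeRegExp(rest);
      break;
    }
    const close = rest.indexOf("}", open);
    if (close === -1) {
      throw new Error(`Unclosed "{" in pattern: ${pattern}`);
    }
    source += escapeRegExp(rest.slice(0, open));
    source += `(?:${rest.slice(open + 1, close)})`;
    rest = rest.slice(close + 1);
  }
  return new RegExp(`^${source}$`);
}

// Assumption: pattern lines start with a URL scheme; everything else is skipped.
function parseIgnoreList(text: string): RegExp[] {
  return text
    .split("\n")
    .filter((line) => /^https?:\/\//.test(line))
    .map(patternToRegExp);
}
```

Anchoring each pattern with `^` and `$` keeps an entry like `https://pixabay.com/` from silently covering every page under that host; whether the real checker anchors its patterns is not stated in the source.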
2 changes: 1 addition & 1 deletion src/server/create-graph.ts
@@ -27,7 +27,7 @@ const allowedSpacedCodeLink = [
// HTTP status
/^\d+ [\w '-]+$/,
// HTTP header
- /^(Cache-Control|Clear-Site-Data|Connection|Content-Length|Content-Security-Policy|Cross-Origin-Opener-Policy|Cross-Origin-Resource-Policy|Feature-Policy|Permissions-Policy|Sec-Purpose|Transfer-Encoding): ([\w-]+|"[\w-]+")$/,
+ /^(Cache-Control|Clear-Site-Data|Connection|Content-Length|Content-Security-Policy|Cross-Origin-Opener-Policy|Cross-Origin-Resource-Policy|Expect|Feature-Policy|Permissions-Policy|Sec-Purpose|Transfer-Encoding): ([\w-]+|"[\w-]+")$/,
// MIME
/^[a-z]+\/[\w+-]+; [a-z]+=("[\w ,.-]+"|\w+);?$/,
// Macro calls
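As a quick check of what adding `Expect` to the header alternation changes, the sketch below copies the updated pattern and tests a few strings against it. The constant `httpHeaderCodeLink` and the function `isAllowedHeaderText` are hypothetical names used only for illustration; how `allowedSpacedCodeLink` is consumed elsewhere in `create-graph.ts` is not shown in this commit.

```typescript
// The updated "HTTP header" pattern, copied from the diff above.
const httpHeaderCodeLink =
  /^(Cache-Control|Clear-Site-Data|Connection|Content-Length|Content-Security-Policy|Cross-Origin-Opener-Policy|Cross-Origin-Resource-Policy|Expect|Feature-Policy|Permissions-Policy|Sec-Purpose|Transfer-Encoding): ([\w-]+|"[\w-]+")$/;

// Hypothetical wrapper: true when the text looks like "<known header>: <value>".
function isAllowedHeaderText(text: string): boolean {
  return httpHeaderCodeLink.test(text);
}

console.log(isAllowedHeaderText("Expect: 100-continue"));
console.log(isAllowedHeaderText("Cache-Control: no-cache"));
console.log(isAllowedHeaderText("X-Custom: value"));
```

With the new alternative in place, `Expect: 100-continue` is accepted because `100-continue` consists only of word characters and hyphens; headers outside the list are still rejected.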
16 changes: 14 additions & 2 deletions src/server/process-warnings.ts
@@ -133,20 +133,32 @@ async function checkLink(href: string) {
};
}
}
+ } else if (res.status === 403) {
+ const text = await res.text();
+ // Cloudflare firewall & similar
+ if (
+ text.includes("<title>Just a moment...</title>") ||
+ text.includes("Verify you are human")
+ ) {
+ return {
+ type: "ok",
+ };
+ }
}
return {
type: "error status",
data: res.status,
};
}
if (res.url !== href) {
- const resURL = new URL(res.url);
const hrefURL = new URL(href);
if (
// Allow root URLs even if the root URL goes elsewhere
(hrefURL.pathname === "/" && res.url.startsWith(href)) ||
// Allow if the only change is addition of queries
- resURL.href === hrefURL.href && hrefURL.search === ""
+ hrefURL.href === res.url.split("?")[0] ||
+ // Allow redirection to login
+ /\/(login|signin)\b/.test(res.url)
) {
return {
type: "ok",
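The two generic checks in this hunk can be read as pure predicates, which makes them easy to try without any network access. The sketch below restates them under that assumption. The function names `isFirewallChallenge` and `isAcceptableRedirect` are hypothetical, and the real `checkLink` works on a fetched response rather than on plain arguments.

```typescript
// Hypothetical restatement of the checks in the hunk above; not the repository's code.

// A 403 whose body looks like a bot-challenge page is treated as "ok",
// because the page most likely exists behind the firewall.
function isFirewallChallenge(status: number, body: string): boolean {
  return (
    status === 403 &&
    (body.includes("<title>Just a moment...</title>") ||
      body.includes("Verify you are human"))
  );
}

// Decide whether a redirect from href to finalUrl is harmless.
function isAcceptableRedirect(href: string, finalUrl: string): boolean {
  const hrefURL = new URL(href);
  return (
    // Root URLs may redirect to a page under the same root.
    (hrefURL.pathname === "/" && finalUrl.startsWith(href)) ||
    // The only change is an added query string.
    hrefURL.href === finalUrl.split("?")[0] ||
    // The target is a login or sign-in page.
    /\/(login|signin)\b/.test(finalUrl)
  );
}

console.log(isFirewallChallenge(403, "<title>Just a moment...</title>"));
console.log(isAcceptableRedirect("https://github.com/new", "https://github.com/login?return_to=%2Fnew"));
console.log(isAcceptableRedirect("https://example.com/a", "https://example.com/b"));
```

One consequence of the `\b` in the login pattern is that `/login?next=...` and `/signin/` match while a path such as `/logins` does not. A redirect to any path containing `/login` is accepted regardless of host, which is the behavior the diff shows.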
