Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split cleaning logic into functions #102

Closed

Conversation

jugaltheshah
Copy link

@jugaltheshah jugaltheshah commented Jan 5, 2024

Created this PR based on the discussion in #101.

For reference, here are the mappings from the code comments to the 10 items mentioned in the issue. I split into functions the code I found under these comments.

  1. Fetching matching rules - Loop through the rules and match them to the host name
  2. Handle AMP redirects - Check if the match has any amp rules, if not we can redirect
  3. Delete parameters - Delete any matching parameters
  4. Updating path names - Update the pathname if needed
  5. Rebuild URL
  6. Redirecting based on parameter - Redirect if the redirect parameter exists
  7. Decode handler
  8. Empty hashes - Handle empty hash / anchors
  9. Removing empty values - Remove empty values when requested
  10. Checking reduction/difference - Reset the original URL if there is no change, just to be safe;

@jugaltheshah
Copy link
Author

@DrKain please review. This is a first pass solution - purely splitting out the functionality from clean() into functions. Definitely open to feedback & suggestions.

I wasn't able to test as the test array inside test.ts is empty, but please let me know if there are any issues. Also, are you open to me adding e.g. jest and adding some unit tests for the new functions?

@DrKain
Copy link
Owner

DrKain commented Jan 5, 2024

This fails several of the example links in the README section. Did you use copilot for this PR?
Many of the functions do not even return and update the data object, I can't use this.

Input: https://poetsroad.bandcamp.com/?from=search&search_item_id=1141951669&search_item_type=b&search_match_part=%3F&search_page_id=1748155363&search_page_no=1&search_rank=1&search_sig=a9a9cbdfc454df7c2999f097dc8a216b
Clean: https://poetsroad.bandcamp.com/?from=search&search_item_id=1141951669&search_item_type=b&search_match_part=%3F&search_page_id=1748155363&search_page_no=1&search_rank=1&search_sig=a9a9cbdfc454df7c2999f097dc8a216b
New Host: false
0% smaller (0 characters)


Input: https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K?plink=GZIIiCHG0Uo5V8ND&ref=a_hp_c9_adblp13nmpxxp13n-mpl-dt-c_1_2&pf_rd_p=164101a8-2aab-4c5e-91ee-1f39e10719e6&pf_rd_r=2Q5R6VH8HJAD48PSQRS4
Clean: https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K?plink=GZIIiCHG0Uo5V8ND&ref=a_hp_c9_adblp13nmpxxp13n-mpl-dt-c_1_2&pf_rd_p=164101a8-2aab-4c5e-91ee-1f39e10719e6&pf_rd_r=2Q5R6VH8HJAD48PSQRS4
New Host: false
0% smaller (0 characters)


Input: https://www.amazon.com/Alexander-Theatre-Sessions-Poets-Fall/dp/B08NT852YT/ref=sr_1_1?dchild=1&keywords=Poets+of+the+fall&qid=1621684985&sr=8-1
Clean: https://www.amazon.com/Alexander-Theatre-Sessions-Poets-Fall/dp/B08NT852YT/ref=sr_1_1?dchild=1&keywords=Poets+of+the+fall&qid=1621684985&sr=8-1
New Host: false
0% smaller (0 characters)


Input: https://open.spotify.com/track/1hhZQVLXpg10ySFQFxGbih?si=-k8RwDQwTCK923jxZuy07w&utm_source=copy-link
Clean: https://open.spotify.com/track/1hhZQVLXpg10ySFQFxGbih?si=-k8RwDQwTCK923jxZuy07w&utm_source=copy-link
New Host: false
0% smaller (0 characters)


Input: https://www.aliexpress.com/item/1005001913861188.html?spm=a2g0o.productlist.0.0.b1c55e86sFKsxH&algo_pvid=b4648621-2371-4d1e-9a9c-89b4d6c59395&algo_expid=b4648621-2371-4d1e-9a9c-89b4d6c59395-0&btsid=0b0a556816216865399393168e562d&ws_ab_test=searchweb0_0,searchweb201602_,searchweb201603_       
Clean: https://www.aliexpress.com/item/1005001913861188.html?spm=a2g0o.productlist.0.0.b1c55e86sFKsxH&algo_pvid=b4648621-2371-4d1e-9a9c-89b4d6c59395&algo_expid=b4648621-2371-4d1e-9a9c-89b4d6c59395-0&btsid=0b0a556816216865399393168e562d&ws_ab_test=searchweb0_0,searchweb201602_,searchweb201603_       
New Host: false
0% smaller (0 characters)


Invalid URL: null
Missing redirect target: url
Input: https://www.google.com/search?q=cat&source=hp&ei=AwGpYKzyE7uW4-EPy_CnSA&iflsig=AINFCbYAAAAAYKkPE4rmSi0Im0sHgmOVb3DYosyq2B0B&oq=cat&gs_lcp=Cgdnd3Mtd2l6EAMyBQguEJMCMgIILjICCAAyAggAMgIILjICCAAyAggAMgIILjICCC4yAgguOggIABDqAhCPAToLCC4QxwEQowIQkwI6CAguEMcBEKMCUNgEWIQHYMwIaAFwAHgAgAHIAYgB2ASSAQMyLTOYAQCgAQGqAQdnd3Mtd2l6sAEK&sclient=gws-wiz&ved=0ahUKEwjs_9PdrN3wAhU7yzgGHUv4CQkQ4dUDCAY&uact=5
Clean: https://www.google.com/search?q=cat&source=hp&ei=AwGpYKzyE7uW4-EPy_CnSA&iflsig=AINFCbYAAAAAYKkPE4rmSi0Im0sHgmOVb3DYosyq2B0B&oq=cat&gs_lcp=Cgdnd3Mtd2l6EAMyBQguEJMCMgIILjICCAAyAggAMgIILjICCAAyAggAMgIILjICCC4yAgguOggIABDqAhCPAToLCC4QxwEQowIQkwI6CAguEMcBEKMCUNgEWIQHYMwIaAFwAHgAgAHIAYgB2ASSAQMyLTOYAQCgAQGqAQdnd3Mtd2l6sAEK&sclient=gws-wiz&ved=0ahUKEwjs_9PdrN3wAhU7yzgGHUv4CQkQ4dUDCAY&uact=5
New Host: false
0% smaller (0 characters)

@jugaltheshah
Copy link
Author

This fails several of the example links in the README section. Did you use copilot for this PR? Many of the functions do not even return and update the data object, I can't use this.

Input: https://poetsroad.bandcamp.com/?from=search&search_item_id=1141951669&search_item_type=b&search_match_part=%3F&search_page_id=1748155363&search_page_no=1&search_rank=1&search_sig=a9a9cbdfc454df7c2999f097dc8a216b
Clean: https://poetsroad.bandcamp.com/?from=search&search_item_id=1141951669&search_item_type=b&search_match_part=%3F&search_page_id=1748155363&search_page_no=1&search_rank=1&search_sig=a9a9cbdfc454df7c2999f097dc8a216b
New Host: false
0% smaller (0 characters)


Input: https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K?plink=GZIIiCHG0Uo5V8ND&ref=a_hp_c9_adblp13nmpxxp13n-mpl-dt-c_1_2&pf_rd_p=164101a8-2aab-4c5e-91ee-1f39e10719e6&pf_rd_r=2Q5R6VH8HJAD48PSQRS4
Clean: https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K?plink=GZIIiCHG0Uo5V8ND&ref=a_hp_c9_adblp13nmpxxp13n-mpl-dt-c_1_2&pf_rd_p=164101a8-2aab-4c5e-91ee-1f39e10719e6&pf_rd_r=2Q5R6VH8HJAD48PSQRS4
New Host: false
0% smaller (0 characters)


Input: https://www.amazon.com/Alexander-Theatre-Sessions-Poets-Fall/dp/B08NT852YT/ref=sr_1_1?dchild=1&keywords=Poets+of+the+fall&qid=1621684985&sr=8-1
Clean: https://www.amazon.com/Alexander-Theatre-Sessions-Poets-Fall/dp/B08NT852YT/ref=sr_1_1?dchild=1&keywords=Poets+of+the+fall&qid=1621684985&sr=8-1
New Host: false
0% smaller (0 characters)


Input: https://open.spotify.com/track/1hhZQVLXpg10ySFQFxGbih?si=-k8RwDQwTCK923jxZuy07w&utm_source=copy-link
Clean: https://open.spotify.com/track/1hhZQVLXpg10ySFQFxGbih?si=-k8RwDQwTCK923jxZuy07w&utm_source=copy-link
New Host: false
0% smaller (0 characters)


Input: https://www.aliexpress.com/item/1005001913861188.html?spm=a2g0o.productlist.0.0.b1c55e86sFKsxH&algo_pvid=b4648621-2371-4d1e-9a9c-89b4d6c59395&algo_expid=b4648621-2371-4d1e-9a9c-89b4d6c59395-0&btsid=0b0a556816216865399393168e562d&ws_ab_test=searchweb0_0,searchweb201602_,searchweb201603_       
Clean: https://www.aliexpress.com/item/1005001913861188.html?spm=a2g0o.productlist.0.0.b1c55e86sFKsxH&algo_pvid=b4648621-2371-4d1e-9a9c-89b4d6c59395&algo_expid=b4648621-2371-4d1e-9a9c-89b4d6c59395-0&btsid=0b0a556816216865399393168e562d&ws_ab_test=searchweb0_0,searchweb201602_,searchweb201603_       
New Host: false
0% smaller (0 characters)


Invalid URL: null
Missing redirect target: url
Input: https://www.google.com/search?q=cat&source=hp&ei=AwGpYKzyE7uW4-EPy_CnSA&iflsig=AINFCbYAAAAAYKkPE4rmSi0Im0sHgmOVb3DYosyq2B0B&oq=cat&gs_lcp=Cgdnd3Mtd2l6EAMyBQguEJMCMgIILjICCAAyAggAMgIILjICCAAyAggAMgIILjICCC4yAgguOggIABDqAhCPAToLCC4QxwEQowIQkwI6CAguEMcBEKMCUNgEWIQHYMwIaAFwAHgAgAHIAYgB2ASSAQMyLTOYAQCgAQGqAQdnd3Mtd2l6sAEK&sclient=gws-wiz&ved=0ahUKEwjs_9PdrN3wAhU7yzgGHUv4CQkQ4dUDCAY&uact=5
Clean: https://www.google.com/search?q=cat&source=hp&ei=AwGpYKzyE7uW4-EPy_CnSA&iflsig=AINFCbYAAAAAYKkPE4rmSi0Im0sHgmOVb3DYosyq2B0B&oq=cat&gs_lcp=Cgdnd3Mtd2l6EAMyBQguEJMCMgIILjICCAAyAggAMgIILjICCAAyAggAMgIILjICCC4yAgguOggIABDqAhCPAToLCC4QxwEQowIQkwI6CAguEMcBEKMCUNgEWIQHYMwIaAFwAHgAgAHIAYgB2ASSAQMyLTOYAQCgAQGqAQdnd3Mtd2l6sAEK&sclient=gws-wiz&ved=0ahUKEwjs_9PdrN3wAhU7yzgGHUv4CQkQ4dUDCAY&uact=5
New Host: false
0% smaller (0 characters)

Ahh, I missed the examples in the README. Didn't use copilot but wasn't able to test. Let me see what's wrong and fix it up.

@jugaltheshah
Copy link
Author

jugaltheshah commented Jan 5, 2024

@DrKain ok please re-check when you can, found the issue and fixed it:

Input: https://poetsroad.bandcamp.com/?from=search&search_item_id=1141951669&search_item_type=b&search_match_part=%3F&search_page_id=1748155363&search_page_no=1&search_rank=1&search_sig=a9a9cbdfc454df7c2999f097dc8a216b
Clean: https://poetsroad.bandcamp.com/
New Host: false
85.31% smaller (180 characters)


Input: https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K?plink=GZIIiCHG0Uo5V8ND&ref=a_hp_c9_adblp13nmpxxp13n-mpl-dt-c_1_2&pf_rd_p=164101a8-2aab-4c5e-91ee-1f39e10719e6&pf_rd_r=2Q5R6VH8HJAD48PSQRS4
Clean: https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K
New Host: false
68.14% smaller (139 characters)


Input: https://www.amazon.com/Alexander-Theatre-Sessions-Poets-Fall/dp/B08NT852YT/ref=sr_1_1?dchild=1&keywords=Poets+of+the+fall&qid=1621684985&sr=8-1
Clean: https://www.amazon.com/Alexander-Theatre-Sessions-Poets-Fall/dp/B08NT852YT/ref=sr_1_1
New Host: false
40.56% smaller (58 characters)


Input: https://open.spotify.com/track/1hhZQVLXpg10ySFQFxGbih?si=-k8RwDQwTCK923jxZuy07w&utm_source=copy-link
Clean: https://open.spotify.com/track/1hhZQVLXpg10ySFQFxGbih
New Host: false
47% smaller (47 characters)


Input: https://www.aliexpress.com/item/1005001913861188.html?spm=a2g0o.productlist.0.0.b1c55e86sFKsxH&algo_pvid=b4648621-2371-4d1e-9a9c-89b4d6c59395&algo_expid=b4648621-2371-4d1e-9a9c-89b4d6c59395-0&btsid=0b0a556816216865399393168e562d&ws_ab_test=searchweb0_0,searchweb201602_,searchweb201603_
Clean: https://www.aliexpress.com/item/1005001913861188.html
New Host: false
81.47% smaller (233 characters)


Invalid URL: null
Missing redirect target: url
Input: https://www.google.com/search?q=cat&source=hp&ei=AwGpYKzyE7uW4-EPy_CnSA&iflsig=AINFCbYAAAAAYKkPE4rmSi0Im0sHgmOVb3DYosyq2B0B&oq=cat&gs_lcp=Cgdnd3Mtd2l6EAMyBQguEJMCMgIILjICCAAyAggAMgIILjICCAAyAggAMgIILjICCC4yAgguOggIABDqAhCPAToLCC4QxwEQowIQkwI6CAguEMcBEKMCUNgEWIQHYMwIaAFwAHgAgAHIAYgB2ASSAQMyLTOYAQCgAQGqAQdnd3Mtd2l6sAEK&sclient=gws-wiz&ved=0ahUKEwjs_9PdrN3wAhU7yzgGHUv4CQkQ4dUDCAY&uact=5
Clean: https://www.google.com/search?q=cat
New Host: false
90.93% smaller (351 characters)


Input: https://www.emjcd.com/links-i/?d=eyJzdXJmZXIiOiIxMDAzMDQ3Mjg5ODMzODAxMDI6VlBTbFlUN3JBeHpsIiwibGFzdENsaWNrTmFtZSI6IkxDTEsiLCJsYXN0Q2xpY2tWYWx1ZSI6ImNqbyF4aTU5LXZ0Zm1nOTkiLCJkZXN0aW5hdGlvblVybCI6Imh0dHBzOi8vd3d3LnZ1ZHUuY29tL2NvbnRlbnQvbW92aWVzL2RldGFpbHMvTW9vbmxpZ2h0LVNlYXNvbi0xLzEzMzEyMCIsInNpZCI6IltzdWJpZF92YWx1ZV0iLCJ0eXBlIjoiZGxnIiwicGlkIjo5MDExNjczLCJldmVudElkIjoiMGFjZGE1ZDdmNzNlMTFlYzgyYWM3NDliMGExYzBlMGUiLCJjalNlc3Npb24iOiIyZjBjNGNjYi1lNmVmLTQ0YzItYjIzYy02NzNjZjY2MTZlMTYiLCJsb3lhbHR5RXhwaXJhdGlvbiI6MCwicmVkaXJlY3RlZFRvTGl2ZXJhbXAiOmZhbHNlLCJjakNvbnNlbnRFbnVtIjoiTkVWRVJfQVNLRUQifQ%3D%3D
Clean: https://www.vudu.com/content/movies/details/Moonlight-Season-1/133120
New Host: true
88.44% smaller (528 characters)


{
  url: 'https://tomscott.com/',
  info: {
    original: 'https://www.youtube.com/redirect?event=video_description&redir_token=QCFCLUhqbUVVVVc2Vm53OGdFMi15NU1vSzloWkZveGcyUXxBQ3Jtc0tsR143azQxRVpxZ3lUampXUEkyaTdpdy1reU1OVGcyb3pmOUhzU22Ldm5QZ0tueEMzMy1TQTA1Mm85SEpCUW14UHlq11ZCUVlhU3QzdW52U2Uyd01pbTVINDRjNlhf124ySEZqMHBJbnFEWDdiMTNUVQ&q=https%3A%2F%2Ftomscott.com%2F&v=k7fXbdRH9v4',
    reduction: 93.46,
    difference: 300,
    replace: [],
    removed: [],
    match: [ [Object], [Object] ],
    decoded: null,
    is_new_host: true,
    full_clean: true
  }
}
Input: https://www.youtube.com/redirect?event=video_description&redir_token=QCFCLUhqbUVVVVc2Vm53OGdFMi15NU1vSzloWkZveGcyUXxBQ3Jtc0tsR143azQxRVpxZ3lUampXUEkyaTdpdy1reU1OVGcyb3pmOUhzU22Ldm5QZ0tueEMzMy1TQTA1Mm85SEpCUW14UHlq11ZCUVlhU3QzdW52U2Uyd01pbTVINDRjNlhf124ySEZqMHBJbnFEWDdiMTNUVQ&q=https%3A%2F%2Ftomscott.com%2F&v=k7fXbdRH9v4
Clean: https://tomscott.com/
New Host: true
93.46% smaller (300 characters)

PS: TIL you can perform operations such as push on arrays passed to functions, but you can't replace the array entirely:

// Works
function testPassByRef1(a) {
	a.push(4);
	a.push(5);  
}

// Doesn't work!
function testPassByRef2(a) {
  a = ['a', 'b', 'c'];
}


let arr = [1,2,3];
console.log(arr); // [1,2,3]

testPassByRef1(arr);
console.log(arr); // [1,2,3,4,5]

testPassByRef2(arr);
console.log(arr); // [1,2,3,4,5]

@DrKain
Copy link
Owner

DrKain commented Jan 5, 2024

At a glance the amazon link is not being cleaned correctly and empty page anchors will still throw an error.
I appreciate the effort you've put into this PR but I can't merge, there's too many problems with the new functions.

As for the array comment you should return the modified data and update accordingly.

function testPassByRef1(a) {
    a.push(4);
    a.push(5);  
    return a;
}

function testPassByRef2(a) {
    a = ['a', 'b', 'c'];
    return a;
}


let arr = [1,2,3];
console.log(arr); // [ 1, 2, 3 ]

arr = testPassByRef1(arr);
console.log(arr); // [ 1, 2, 3, 4, 5 ]

arr = testPassByRef2(arr);
console.log(arr); // [ "a", "b", "c" ]

This is why your PR is not functioning correctly.

@DrKain
Copy link
Owner

DrKain commented Jan 10, 2024

Just letting you know I'll be away for an indeterminate amount of time. If you want to keep working on this PR you're more than welcome to, I will take another look if/when I return.

@DrKain DrKain mentioned this pull request Jan 13, 2024
@DrKain
Copy link
Owner

DrKain commented Feb 13, 2024

Closing due to broken code, inactivity, lack of communication and largely outdated handlers.
If you'd like to take another shot at this in the future please open a new PR.

@DrKain DrKain closed this Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants