Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop trailing slash of prefix in #get_local_files #425

Merged
merged 2 commits into from
Aug 29, 2024

Conversation

alanbrent
Copy link
Contributor

@alanbrent alanbrent commented Jan 26, 2022

The local filesystem glob in AssetSync::Storage#get_local_files uses fuzzy matching when config.prefix is present. This can present a problem in some cases, as it doesn't allow for distinguishing between (e.g.) a folder called assets/ and another folder called assets-temp/. A situation could arise where the latter folder has thousands/millions of files and we mistakenly publish local-only files, or worse we could clobber another directory in the bucket managed in a completely different context.

This change allows developers to be more specific in their config.prefix by using a trailing slash for their folder name.

The local filesystem glob in `AssetSync::Storage#get_local_files` uses
fuzzy matching when config.prefix is present. This can present a problem
in some cases, as it doesn't allow for distinguishing between (e.g.) a
folder called `assets/` and another folder called `assets-temp/`. A
situation could arise where the latter folder has thousands/millions of
files and we mistakenly publish local-only files, or worse we could
clobber another directory in the bucket managed in a completely
different context.

This change allows developers to be more specific in their
`config.prefix` by using a trailing slash for their folder name.
@alanbrent alanbrent force-pushed the integration/upstream-v2.15.1.1 branch from bbd258a to 26c45ed Compare January 27, 2022 00:47
@PikachuEXE
Copy link
Member

Can you explain this PR in usage aspect?
I am unable to comprehend your description here sorry 😑

@alanbrent
Copy link
Contributor Author

Apologies. I am actually still gathering years-old context, and am attempting to forward-port a patch our org has maintained on top of v1.1.0. It's possible that the patch is not actually necessary. I will circle back in the next few days to find out and update this PR when I've done so. Thanks for taking a look.

@alanbrent alanbrent marked this pull request as draft February 1, 2022 21:29
@obrie
Copy link
Contributor

obrie commented Aug 26, 2024

I believe the reason this is necessary is due to the following scenario:

This is relevant because it would mean that if you don't set the existing_remote_files config to keep, then it would delete files from all directories in the remote bucket with a prefix of assets, such as a folder that might be named assets-temp (and has nothing to do with what we're sync'ing).

  • If you specify a prefix of assets/, it prevents the above scenario but now causes incorrect behavior on upload.

The trailing slash in the asset prefix would mean that the list of local files would include a double slash. For example:

> Dir['tmp//**/**']
=> ["tmp//cache", ...]

The reason this is problematic is because the diff between local_files and remote_files will fail because of the double slash:

local_files_to_upload = local_files - ignored_files - remote_files + always_upload_files
.
remote_files will contain a single slash while local_files will contain a double slash. This would result in files being unnecessarily uploaded. For example:

[14] pry(main)> AssetSync.storage.local_files.select {|f| f =~ /colorbox-2f.*.css$/}
=> ["assets//colorbox-2fc06e452b448d7129291faf21995751702615d91027eec14ef42f0b1c66460d.css"]
[15] pry(main)> AssetSync.storage.remote_files.select {|f| f =~ /colorbox-2f.*.css$/}
=> ["assets/colorbox-2fc06e452b448d7129291faf21995751702615d91027eec14ef42f0b1c66460d.css"]

Even though this file does exist in the remote bucket, it will still be uploaded because of the trailing slash in the asset prefix.


We could avoid this by not using a trailing slash and ensuring that the bucket prefix only ever matches 1 directory, but it's risky and prone to human error. Protecting against this scenario with this PR I think would be better long-term.

Thoughts?

@PikachuEXE
Copy link
Member

Can you guys test this PR for a while and see if it works?
If so I can merge and release (I no longer use it)

@obrie
Copy link
Contributor

obrie commented Aug 27, 2024

We've been using this patch since 2017 on our primary production application. I can confirm it's been working for us since then.

Let me know if there's anything other data / specific tests you'd like to run. Thanks!

@PikachuEXE
Copy link
Member

There should be test cases for get_local_files with different assets_prefix values
Can be merged after adding test cases

@obrie
Copy link
Contributor

obrie commented Aug 28, 2024

Assuming specs look reasonable, I can squash the commits prior to merge.

@PikachuEXE PikachuEXE marked this pull request as ready for review August 28, 2024 22:35
@PikachuEXE
Copy link
Member

squash is not needed, anything else that should be changed/added before I merge?

@obrie
Copy link
Contributor

obrie commented Aug 28, 2024

Nope, all set on our end!

@PikachuEXE PikachuEXE merged commit 9531c16 into AssetSync:master Aug 29, 2024
16 checks passed
@PikachuEXE
Copy link
Member

I should make a new release in a week
Still have to deal with all the CI config update

@obrie
Copy link
Contributor

obrie commented Aug 29, 2024

No worries, thanks!

@PikachuEXE
Copy link
Member

Released in https://rubygems.org/gems/asset_sync/versions/2.19.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants