Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-479] Raise exception for copy/move replace folder that orphans itself. #274

Closed
wants to merge 5 commits into from

Conversation

TomBaxter
Copy link
Member

@TomBaxter TomBaxter commented Sep 25, 2017

Purpose:

SVCS-479
Replace folder from inside folder deletes both.

Changes:

Updated waterbutler/core/provider.py
-- Add def replace_will_orphan
-- Update def move calls replace_will_orphan
-- Update def copy calls replace_will_orphan
Update waterbutler/core/exceptions.py
-- Add class OrphanSelfError
Update tests/providers/osfstorage/test_provider.py
-- Add test test_intra_copy_folder_orphan

Side effects

Folder move/copies that use conflict=replace may now be more costly

@coveralls
Copy link

coveralls commented Sep 25, 2017

Coverage Status

Coverage increased (+0.03%) to 82.969% when pulling baf91c6 on TomBaxter:feature/svcs-479 into 6058fd0 on CenterForOpenScience:develop.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 82.969% when pulling 78e2273 on TomBaxter:feature/svcs-479 into 6058fd0 on CenterForOpenScience:develop.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 82.969% when pulling 78e2273 on TomBaxter:feature/svcs-479 into 6058fd0 on CenterForOpenScience:develop.

@TomBaxter TomBaxter changed the title [RFR] [SVCS-479] Raise exception for copy/move replace folder that orphans itself. [SVCS-479] Raise exception for copy/move replace folder that orphans itself. Oct 5, 2017
@@ -476,6 +485,25 @@ def can_intra_move(self,
await self.delete(src_path)
return data, created

async def replace_will_orphan(self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function makes no awaits so it shouldn't need to be async

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete.


This can be an expensive operation. Should be used as last resort. Should be overridden if
provider has a cheaper option.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this can be expensive, is there a possible check you can do here that will quickly tell you if you need to go down the whole chain, ie if

src_path.name != dest_path.name:
   return False

Unsure if this works or if this check is already done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is separate logic/exception, from a different ticket for checking for self overwrite. Perhaps they should be combined.

@@ -381,6 +381,22 @@ def file_lineage():

@pytest.mark.asyncio
@pytest.mark.aiohttpretty
async def test_intra_copy_folder_orphan(provider_and_mock, provider_and_mock2, children_response):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I already asked but putting this here for thought anyway.
Any possibility of moving this to core/test_provider? (feels more complete that way)
If it is too large of a change, go ahead and ignore this and let Longze or Fitz decide.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with Fitz. For now going with this for brevity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -193,6 +193,12 @@ def __init__(self, path):
'folder onto itself is not supported.'.format(path))


class OrphanSelfError(InvalidParameters):
def __init__(self, path):
super().__init__('Unable to move or copy \'{}\'. Moving or copying a folder onto parent '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: The error message is a bit hard for a user to understand. Not a deal breaker though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it is a lot to parse, but not sure how to phrase it in a more clear way that still provides all needed information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to rephrase this but failed. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@AddisonSchiller
Copy link
Contributor

AddisonSchiller commented Oct 30, 2017

Forgot to add message to review....

Local testing went fine (just with osfstorage however)
1 change for sure (look at use of async/await)
some rewording and possible moving of test?
1 possible change on orphan function.
Otherwise looks good

@coveralls
Copy link

coveralls commented Nov 13, 2017

Coverage Status

Coverage decreased (-0.0006%) to 89.103% when pulling 37c8756 on TomBaxter:feature/svcs-479 into 473191c on CenterForOpenScience:develop.

Copy link
Contributor

@AddisonSchiller AddisonSchiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have 2 very tiny lines not hit in test coverage.
#L499 and #L248. Not a deal breaker, can be done in next phase of review

Ready for next phase now.

if not dest_path.kind == 'folder':
return False
if src_path.name != dest_path.name:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't hit by test coverage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed.

conflict == 'replace' and
dest_provider.replace_will_orphan(dest_path, src_path)
):
raise exceptions.OrphanSelfError(src_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line isn't hit by test coverage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed.

@coveralls
Copy link

coveralls commented Nov 22, 2017

Coverage Status

Coverage decreased (-0.003%) to 89.939% when pulling 71e2c08 on TomBaxter:feature/svcs-479 into 26bf209 on CenterForOpenScience:develop.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and works as expected locally! Please add more tests and add/update the doc str for new methods. Thanks! 🔥 🔥

@@ -208,6 +208,12 @@ def __init__(self, path):
'folder onto itself is not supported.'.format(path))


class OrphanSelfError(InvalidParameters):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a doc str for this exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed.

src_path: wb_path.WaterButlerPath) -> bool:
"""Check if copy/move conflict=replace will orphan src_path.

This can be an expensive operation. Should be used as last resort. Should be overridden if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a test, this is not as bad as it seems. Only when there is a nested structure like the following will trigger iteration on each level (for both valid and invalid move and copy).

nested_folders

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -342,6 +342,24 @@ class TestIntraCopy:
src_mock.validate_v1_path.assert_not_called()


@pytest.mark.asyncio
@pytest.mark.aiohttpretty
async def test_intra_copy_folder_orphan(self, provider_and_mock, provider_and_mock2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @AddisonSchiller , this test only covers one situation when it is an invalid copy. Can you add tests to cover the following for both copy and move? So in total 4 * 2 = 8.

  • invalid one
  • return false if is not folder
  • return false if name is not the same
  • loop breaks if materialized path is not the same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved tests to core/provider and added move test for invalid.
Added single tests for 'not folder', 'not same name', and 'loop break' since they aren't running through copy/move.
Completed.

@@ -193,6 +193,12 @@ def __init__(self, path):
'folder onto itself is not supported.'.format(path))


class OrphanSelfError(InvalidParameters):
def __init__(self, path):
super().__init__('Unable to move or copy \'{}\'. Moving or copying a folder onto parent '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to rephrase this but failed. 👍

@coveralls
Copy link

coveralls commented Dec 11, 2017

Coverage Status

Coverage increased (+0.03%) to 89.806% when pulling 9b3b8b7 on TomBaxter:feature/svcs-479 into 6249a7a on CenterForOpenScience:develop.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and move to PCR 🎆 🎆 !

@felliott
Copy link
Member

There seems to be a lot of overlap between this and #270 so they may end up getting merged into one super-PR. Leaving both open for now for reference until a final PR is merged.

AddisonSchiller and others added 5 commits December 19, 2017 14:01
Fixing an issue where when a child component is hooked up
To a project, and they share the same provider, it is
possible to silently delete files
[#SVCS-479] Replace folder from inside folder deletes both

Add replace_will_orphan(src_path,dest_path)
Checks if dest_path overwrite will orphan src_path
Call replace_will_orphan in copy/move defs in waterbutler.core.provider
SVCS-479

Move three pieces of overwrite conflict logic from move/copy into
handle_naming and rename handle_naming to handle_conflict. Improved
logic and reduced code.
@coveralls
Copy link

coveralls commented Dec 19, 2017

Coverage Status

Coverage increased (+0.06%) to 89.829% when pulling 4f985e3 on TomBaxter:feature/svcs-479 into 3ec764a on CenterForOpenScience:develop.


if (
self == src_provider and
conflict == 'replace' and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean operators should be at the start of new lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually no

if not dest_path.kind == 'folder':
return False
if src_path.name != dest_path.name:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more readable x != y rather than not x == y; It'll save a few bytes, (I know we're pressed for disk), but we should probably be a little more consistent about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we have a 4th one, which is used here ... Instead of doing str comparison, we should use one of the other three which is of type bool.

    @property
    def kind(self) -> str:
        """ Returns `folder` if the path represents a folder, otherwise returns `file`. """
        return 'folder' if self._is_folder else 'file'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like its a private method because it's prefixed with an underscore. Should this be private, or can consuming code rely on self._is_folder?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ignore this for now. It's everywhere in the code base, not only for WB but also for OSF and MFR ...

dest_path = WaterButlerPath('/folder1/',
_ids=['root','folder'],
folder=True)
assert provider1.replace_will_orphan(src_path, dest_path) == False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a test that covers the case the replace will orphan; i.e assert provider1.replace_will_orphan(src_path, dest_path) == True?

self.shares_storage_root(src_provider) and
src_path.materialized_path == dest_path.materialized_path
):
raise exceptions.OverwriteSelfError(src_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that OverwriteSelf is a degenerate case of OrphanSelf, why do we need to check twice and/or can these checks be combined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src_path has an is_dir attribute. Does dest_path too? If there isn't there anything other than path or dir that it can be it should be possible for line 405 to just be if dest_path.is_dir:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Both dest_path and src_path are instances of WaterButlerPath, which has is_dir, is_folder and is_file. I don't know why we have three when we only need one. Let's choose one and be consistent at least within each provider. For new providers, I use is_folder.
    @property
    def is_dir(self) -> bool:
        """ Returns `True` if the path represents a folder. """
        return self._is_folder

    @property
    def is_folder(self) -> bool:
        return self._is_folder

    @property
    def is_file(self) -> bool:
        """ Returns `True` if the path represents a file. """
        return not self._is_folder

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For the two exception, my understanding is that they handle different cases.

    • OverwriteSelfError handles the case when a move/copy's dest_path is exactly the same as the src_path.
    • OrphanSelfError handles the case when src_path and dest_path are different but dest_path happens to be one of the parent folders of the src_path up in the tree.
  • My suggestion is that we keep the two separate. If necessary, please update the docstr for both class and maybe a few comments on what each checks does if necessary.

raise exceptions.OverwriteSelfError(src_path)

# files and folders shouldn't overwrite themselves
# TODO: how is this different from above will_self_overwrite call?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the todo? the former deals specifically with the case the destination is a dir, and the original will be kept?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please!


if handle_conflicts:

dest_path = await dest_provider.handle_conflicts(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra whitespace?

dest_path = await dest_provider.handle_naming(
if handle_conflicts:

dest_path = await dest_provider.handle_conflicts(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra whitespace

@cslzchen cslzchen removed the Add'l Dev label Apr 4, 2018
Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@birdbrained Thanks for the initial review. If you are taking this PR over, here are a few things:

  • Improve the code style, control/condition statement and consistency the way you prefer. I am always 👍 x 💯 for it.

  • For OverwriteSelfError and OrphanSelfError, I suggest keep them separate based on my understanding. However, feel free to change it if you have a good reason that I am not aware of.

self.shares_storage_root(src_provider) and
src_path.materialized_path == dest_path.materialized_path
):
raise exceptions.OverwriteSelfError(src_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Both dest_path and src_path are instances of WaterButlerPath, which has is_dir, is_folder and is_file. I don't know why we have three when we only need one. Let's choose one and be consistent at least within each provider. For new providers, I use is_folder.
    @property
    def is_dir(self) -> bool:
        """ Returns `True` if the path represents a folder. """
        return self._is_folder

    @property
    def is_folder(self) -> bool:
        return self._is_folder

    @property
    def is_file(self) -> bool:
        """ Returns `True` if the path represents a file. """
        return not self._is_folder

if not dest_path.kind == 'folder':
return False
if src_path.name != dest_path.name:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we have a 4th one, which is used here ... Instead of doing str comparison, we should use one of the other three which is of type bool.

    @property
    def kind(self) -> str:
        """ Returns `folder` if the path represents a folder, otherwise returns `file`. """
        return 'folder' if self._is_folder else 'file'

self.shares_storage_root(src_provider) and
src_path.materialized_path == dest_path.materialized_path
):
raise exceptions.OverwriteSelfError(src_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For the two exception, my understanding is that they handle different cases.

    • OverwriteSelfError handles the case when a move/copy's dest_path is exactly the same as the src_path.
    • OrphanSelfError handles the case when src_path and dest_path are different but dest_path happens to be one of the parent folders of the src_path up in the tree.
  • My suggestion is that we keep the two separate. If necessary, please update the docstr for both class and maybe a few comments on what each checks does if necessary.

raise exceptions.OverwriteSelfError(src_path)

# files and folders shouldn't overwrite themselves
# TODO: how is this different from above will_self_overwrite call?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please!

@NyanHelsing
Copy link
Contributor

New PR #333

@cslzchen
Copy link
Contributor

This PR is replaced and closed.

@cslzchen cslzchen closed this Apr 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants