-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-426] Update googledrive provider to use googledrive v3 API #276
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly style changes + a few questions to start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One very long line length change needed.
One moving of functions change.
after this, will be good for next phase
@@ -75,10 +74,12 @@ class GoogleDriveProvider(provider.BaseProvider): | |||
NAME = 'googledrive' | |||
BASE_URL = settings.BASE_URL | |||
FOLDER_MIME_TYPE = 'application/vnd.google-apps.folder' | |||
FILE_FIELDS = {'fields': 'version, id, name, size, modifiedTime, createdTime, mimeType, webViewLink, webContentLink, md5Checksum, capabilities(canDelete, canEdit, canTrash, canDownload, canRename, canReadRevisions, canShare, canCopy)'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed.
@@ -140,6 +141,45 @@ def can_intra_move(self, | |||
path=None) -> bool: | |||
return self == other | |||
|
|||
def build_move_url(self, src_path, dest_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: you split up can_intra_move
and can_intra_copy
with all the build URL methods.
Move all the can_<do something>
methods so they are all right by each other again (including can_duplicate_names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Ready for next phase.
@TomBaxter Let's sit down and do an in-person review for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick look view a few questions. Let's discuss in person @TomBaxter 🎆
@@ -159,7 +159,8 @@ def created_utc(self): | |||
|
|||
@property | |||
def etag(self): | |||
return self.raw['etag'] | |||
# Google Doc revision representations do not return etag | |||
return '{}::{}'.format(self.raw['id'], self.raw['modifiedTime']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we use etag
or its alternative?
etag
tracks both file and metadata change while modifiedTime
only tracks file content change. Can this be a problem? Is it possible to still use v2 API for etag
?
It is possible for file have the different modifiedTime
but the same hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I brought this up with @felliott at the time of writing PR. Can't remember the specifics of the conversation though. Let's revisit with him. Not sure which is the less of two evils. Sub par etag or staying on V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More question for @felliott: What do we use etag
for?
'id': dest_path.parent.identifier | ||
}], | ||
'title': dest_path.name | ||
'name': dest_path.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dropping 'parents'
intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For myself, double check where parents
is retrieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, v2 has a new way of selecting parents.
is_docs_file = drive_utils.is_docs_file(metadata.raw) | ||
# GoogleDrive v3 API does not allow the download of GoogleDoc revisions | ||
# Use v2 API for this functionality | ||
if revision and is_docs_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
The if..elif...else...
makes the logic much clearer than metadata.raw.get('downloadUrl') or drive_utils.get_export_link(metadata.raw)
. I used to have a hard time understanding the how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments to make the meaning of the four conditions more clear.
@@ -146,6 +149,45 @@ def can_intra_copy(self, | |||
# gdrive doesn't support intra-copy on folders | |||
return self == other and (path and path.is_file) | |||
|
|||
def build_move_url(self, src_path, dest_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many methods here. It is not intuitive to use. Let's discuss if we can make only two util methods: build_v3_url()
and build_v2_url()
? The actions are parsed as arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I made them separate for ease of use in tests. Have now re-factored them all out. Complete.
# GoogleDrive v3 API does not allow the download of GoogleDoc revisions | ||
# Use v2 API for this functionality | ||
if revision and is_docs_file: | ||
meta_url = self.build_url('files', path.identifier, 'revisions', revision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, let's make a build_url_v2()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to do this, however, I was holding off creating a new method for logic that will only be run in one place and likely removed entirely in the future. If it turns out that we want to use v2 for retrieving etag
s then it might make more sense to me to make build_url_v2
it's own thing.
meta_url = self.build_url('files', path.identifier, 'revisions', revision) | ||
meta_url = meta_url.replace('/v3/', '/v2/', 1) | ||
|
||
async with self.request( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some difference between the old and the new logic. Let's discuss.
|
||
def _build_upload_metadata(self, folder_id: str, name: str) -> dict: | ||
return { | ||
'parents': [ | ||
{ | ||
'kind': 'drive#parentReference', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to V3 upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why it was coded that way previously. I don't believe there was a change in the API. And I don't believe it was coded correctly originally.
@@ -24,6 +24,7 @@ | |||
'type': 'application/vnd.openxmlformats-officedocument.presentationml.presentation', | |||
}, | |||
] | |||
DOCS_MIMES = [format['mime_type'] for format in DOCS_FORMATS] | |||
DOCS_DEFAULT_FORMAT = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add 'mime_type': None
to the default one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked code. mime_type
is only referenced after a check for is_google_doc
, so adding mime_type = ''
for consistency, though it currently will never be referenced.
@@ -685,22 +763,22 @@ def _build_upload_metadata(self, folder_id: str, name: str) -> dict: | |||
resp = await self.make_request( | |||
'GET', | |||
self.build_url('files', | |||
q="'{}' in parents".format(file_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have trashed = false
before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevents returning deleted files.
if drive_utils.is_docs_file(data): | ||
if can_access_revisions: | ||
if data['capabilities'].get('canReadRevisions', None) and data['capabilities']['canReadRevisions']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
if data.get('capabilities', {}).get('canReadRevisions', None):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These do the same thing, but yours is shorter. Change made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. Looks good to me. I will test it fully locally (and make a PR with minor changes if necessary).
[SVCS-426 ] GoogleDrive migration guide https://developers.google.com/drive/v3/web/migration Of particular note: GD v3 API has no method of downloading GoogleDoc revisions. This PR leaves behind GD v2 calls, in order to maintain this functionality as long as possible. GD v3 returns minimal representations of resources. Fields must be specified, to be returned. Many field names changed. Most commonly in the provider: title -> name modifiedDate -> modifiedTime fileSize -> size etags are no longer available from GoogleDrive exportLinks are no longer available in the 'file' representation of GoogleDoc files. GD v3 returns lists of resources as either 'files' or 'revisions' as opposed to GD v2 which returned 'items' for all resource lists.
Update: secondary PR https://github.com/TomBaxter/waterbutler/pull/1 pending. |
[SVCS-426] Style Refactor and Minor Code Change
Update: a second secondary PR https://github.com/TomBaxter/waterbutler/pull/2 pending. |
- gdrive metadata and provider tests
[SVCS-426] Update Style and Trivial Code Change for Tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
] | ||
|
||
# Use dummy ID if no revisions found | ||
metadata = await self.metadata(path, raw=True) | ||
# TODO: considering keep using v2 to obtain etag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phase 2: etag
is no longer available through v3
, but we can still use v2
.\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phase 2: please be ware that using id
, modifiedTime
(and even with md5Checksum
) is not equivalent to etag
. Here is a useful discussion I found: https://stackoverflow.com/questions/42174600/alternative-for-etag-field-in-google-drive-v3.
Ticket
SVCS-426
Purpose
Update GoogleDrive provider to v3 of the GoogleDrive API
Changes
Substantial changes to all aspects of the provider.
Of particular note:
GD v3 API has no method of downloading GoogleDoc revisions.
This PR leaves behind GD v2 calls, in order to maintain this
functionality as long as possible.
GD v3 returns minimal representations of resources. Fields must be
specified, to be returned.
Many field names changed. Most commonly in the provider:
title -> name
modifiedDate -> modifiedTime
fileSize -> size
etags are no longer available from GoogleDrive
exportLinks are no longer available in the 'file' representation of
GoogleDoc files.
GD v3 returns lists of resources as either 'files' or 'revisions' as
opposed to GD v2 which returned 'items' for all resource lists.
Side effects
None expected
QA Notes
This provider will need a full test of all functionality.
Deployment Notes
We will need to keep an eye on 'End of Life' announcements for GoogleDrive v2 API. As we have kept v2 call for GoogleDocs revisions downloads.