-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-528] [Blocked] Improve gdoc rendering - WB Part #304
base: develop
Are you sure you want to change the base?
[SVCS-528] [Blocked] Improve gdoc rendering - WB Part #304
Conversation
else: | ||
|
||
# TODO figure out metadata.raw.get('downloadUrl') | ||
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_export_link(metadata.raw) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that metadata.raw.get('downloadUrl')
is always None
. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of this line is that downloadUrl is always present unless it's a file that that isn't exportable in it's native type, so a .txt will have a downloadUrl, but a .gdoc won't. Google files have frustratingly nonstandard metadata.
@@ -234,9 +234,18 @@ def can_intra_copy(self, | |||
|
|||
metadata = await self.metadata(path, revision=revision) | |||
|
|||
if kwargs.get('mfr', None) and kwargs['mfr'].lower() == 'true': | |||
download_url = drive_utils.get_alt_export_link(metadata.raw) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_alt_export_link(metadata.raw)
4768410
to
38b4e1a
Compare
We can export .gdoc as a pdf if we know its coming from mfr. Added `alt_export` functions and properties to Gdrive utils and metadata. Gdrive download now looks for 'mfr' in its kwargs and will request file as pdf if used properly.
raises key error and update gdrive utils to use `dict.get()` instead of `dict[]`.
38b4e1a
to
a7c70ef
Compare
@@ -234,9 +234,16 @@ def can_intra_copy(self, | |||
|
|||
metadata = await self.metadata(path, revision=revision) | |||
|
|||
if kwargs.get('mfr', None) and kwargs['mfr'].lower() == 'true': | |||
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_alt_export_link(metadata.raw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, fixed. Need to check metadata.raw.get('downloadUrl')
first. There are file types on gdrive (e.g. .txt
) that do not have export_links
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about
if kwargs.get('mfr', '').lower() == 'true':
Also I don't get the or
here.
@@ -234,9 +234,16 @@ def can_intra_copy(self, | |||
|
|||
metadata = await self.metadata(path, revision=revision) | |||
|
|||
if kwargs.get('mfr', None) and kwargs['mfr'].lower() == 'true': | |||
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_alt_export_link(metadata.raw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about
if kwargs.get('mfr', '').lower() == 'true':
Also I don't get the or
here.
if format['mime_type'] == metadata['mimeType']: | ||
return format | ||
for format_type in DOCS_FORMATS: | ||
if format_type.get('mime_type') == metadata.get('mimeType'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata.get('mimeType') should be outside the for loop so it doesn't need to be accessed each iteration. Given that this data structure is tiny so I doubt there will be any perf. difference.
format = get_format(metadata) | ||
return metadata['exportLinks'][format['type']] | ||
format_type = get_format(metadata) | ||
return metadata.get('exportLinks').get(format_type.get('type')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if metadata doesn't have an exportLinks
?
I think some of the stuff here may be better passed as a query param that waterbutler passes through to gdrive transparently, this pr currently puts code that deals with exporting into mfr, I'd like to look at if theres a way to put that logic in mfr |
Ticket
https://openscience.atlassian.net/browse/SVCS-528
MFR counterpart: CenterForOpenScience/modular-file-renderer#303
Purpose
This PR replaces #290. Credit goes to @AddisonSchiller 🎆 🎆 .
Allow MFR to request
.gdoc
files in.pdf
format while keep.docx
format for downloading. This will allow MFR to cut out the expensive unoconv conversion for .gdoc filesWe can export .gdoc as a pdf if we know its coming from mfr. Added
alt_export
functions and properties to Gdrive utils and metadata. Gdrive download now looks for 'mfr' in its kwargs and will request file as pdf if used properly.Changes
WB V1 provider now looks for a
mfr=
query param. Ifmfr=true
, the google drive provider will usealt_export_name
andalt_export_link()
instead of its normal counterparts. This will cause the file being sent to be a.pdf
instead of.docx
.Side effects
None
QA Notes
Test that
.gdoc
rendering works as usual.Deployment Notes
This PR must be merged/deployed at the same time as its MFR counterpart.
This PR should be blocked by @TomBaxter 's v3 update: #276.