Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache for the discovery loading #89

Open
wants to merge 31 commits into
base: main
Choose a base branch
from
Open

Conversation

donquixote
Copy link
Collaborator

@donquixote donquixote commented Jan 17, 2025

Sibling MR: https://git.drupalcode.org/project/collabora_online/-/merge_requests/6

Changes in this PR

  • Move the discovery classes to different namespace.
  • Add persistent cache in the discovery fetcher, as a decorator.
  • Add memory cache in the discovery itself (no decorator here)
  • Let the main discovery fetcher set cache metadata (tags, ttl), because this is the correct food chain.

Still missing: Additional tests to cover the caching.

Features:

  • Discovery methods always return up-to-date info after a config change, in the same request.
    • this might not be needed for most uses in production code, but it was easy enough to do and is useful during the kernel tests.

Optional additional changes not in this PR:

  • Convert discovery to a value object and a separate loader, discovery-cache...discovery-cache-and-loader
    • This adds more clutter overall, I don't like it as much.
    • It would be useful if we need the cache metadata outside of the discovery. But this is not the case for now, none of the other Collabora operations that depend on discovery are cached. It might change if we let CoolUtils::canEdit() depend on the discovery.
  • Split the expire vs ttl conversion to separate class or service: discovery-cache-and-loader...discovery-cache-and-loader-and-timestamp-converter
    • The diff shows two different directions, one is a service and the other is a utility class with static methods. Also the param signatures are different.

So.. probably we don't need these optional changes.
I had these changes locally before I simplified, and pushed them instead of discarding.

@donquixote donquixote force-pushed the discovery-cache branch 7 times, most recently from dd904e2 to f8fa3c1 Compare January 22, 2025 18:33
* @return string
* The full contents of discovery.xml.
*
* @throws \Drupal\collabora_online\Exception\CollaboraNotAvailableException
* The client url cannot be retrieved.
*/
public function getDiscoveryXml(): string;
public function getDiscoveryXml(RefinableCacheableDependencyInterface $cacheability): string;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are forced to add this because we are separating the cache into a different class. But it feels unneeded.
We should just put the caching inside the fetcher itself. See e.g. \Drupal\media\OEmbed\ResourceFetcher. The caching is coded inside.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I don't see ResourceFetcher adding any cache tags to the cache entry.
It makes sense here because the data is coming from an external source, and does not depend on any Drupal config or content. The $url parameter might be, but that already goes into the $cache_id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now all this stuff is in one class, CollaboraDiscoveryFetcher. No more passing around of cache metadata.

* {@inheritdoc}
*/
public function getDiscoveryXml(RefinableCacheableDependencyInterface $cacheability): string {
$cached = $this->cache->get($this->cid);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a configuration value that determines for how long the xml is cached. We should put a standard value of 5 minutes, which sounds quite sensible.
See https://git.drupalcode.org/project/media_avportal/-/blob/8.x-1.x/src/AvPortalClient.php?ref_type=heads#L109 for a similar approach .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft recommends 12-24 hours TTL for this cache.
(I have not found a recommendation from Collabora Online)
https://learn.microsoft.com/en-us/microsoft-365/cloud-storage-partner-program/online/discovery
But they also mention other options.

12-24 hours is a good cadence to refresh, although in practice it's updated much less frequently.

A more dynamic option is to re-run discovery when proof key validation fails, or when it succeeds using the old key. That implies that the keys have been rotated, so discovery should definitely be re-run to get the new public key.

Lastly, another option is to run discovery whenever one of your machines restart. All these approaches, as well as combinations of them, have been used by hosts in the past. Whichever approach makes the most sense depends on your infrastructure.

I don't know what is the typical circulation period for the proof keys in Collabora Online.

I would be careful with the "when proof key validation fails".
This would also apply to maliciously crafted requests from untrusted sources.
Not sure if we care.

We should put a standard value of 5 minutes, which sounds quite sensible.

In a low traffic scenario, this would mean cold cache on almost every request.

For now I chose the 12h as default, but we can discuss this.

use Drupal\Component\Datetime\TimeInterface;
use Drupal\Core\Cache\Cache;
use Drupal\Core\Cache\CacheableMetadata;
use Drupal\Core\Cache\MemoryCache\MemoryCacheInterface;
use Symfony\Component\ErrorHandler\ErrorHandler;

/**
* Service to get values from the discovery.xml.
*/
class CollaboraDiscovery implements CollaboraDiscoveryInterface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end this service is a value object. Probably a factory would help in retrieving either a fresh or a cached instance of it, otherwise it gets cached in the service container and doesn't get really invalidated.
That's why I would not implement any static caching at the moment. We don't want to have to know here the cache metadata regarding how the xml was retrieved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the "factory" is the *Fetcher->getDiscovery().
The value object is no longer cached (as an object) but the raw xml is cached.

@donquixote donquixote force-pushed the discovery-cache branch 5 times, most recently from 7b8ec0a to 3054303 Compare January 30, 2025 15:22
@donquixote donquixote force-pushed the discovery-cache branch 2 times, most recently from 43eea19 to 8341d85 Compare January 30, 2025 16:18
@donquixote donquixote marked this pull request as ready for review January 30, 2025 18:31
/**
* Creates a discovery value object.
*/
class CollaboraDiscoveryFetcher implements CollaboraDiscoveryFetcherInterface {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing TBD here is the names of the classes.
E.g. instead of CollaboraDiscoveryFetcher we could just have DiscoveryFetcher. Or we could rely on the namespace and name it Fetcher.

To me, the shortname of a class should be sufficiently unique and distinguishable in most places where it is meant to be used, but not necessarily in all.

// This is known to happen when $xml is an empty string.
// Instead we could check for $xml === '' earlier, but we don't know for
// sure if this is, and always will be, the only such case.
throw new CollaboraNotAvailableException('The discovery.xml file is empty.');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the text, we can't assure that is empty. At least given the comment above.

Comment on lines +12 to +13
* Most of this code is copied from openeuropa/oe_showcase, which is published
* under the EUPL license.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?


/** @var \Drupal\Core\Cache\CacheTagsInvalidatorInterface $invalidator */
$invalidator = \Drupal::service(CacheTagsInvalidatorInterface::class);
$invalidator->invalidateTags(['config:collabora_online.settings']);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check after cache TTL period has expired?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with value 0?

fn () => simplexml_load_string($xml),
);
$this->assertNotFalse($parsed_xml, "XML: '$xml'");
return new CollaboraDiscovery($parsed_xml, $cache_tags, $cache_max_age);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 params?

Comment on lines +200 to +201
$this->assertTrue($this->httpClientGetCalls[1][1]['verify']);
$this->assertTrue($this->httpClientGetCalls[1][1]['verify']);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it checking the same array index twice?

try {
$keys = $this->getKeys();
}
catch (CollaboraNotAvailableException $e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a check for this case in the tests?

@@ -81,6 +83,13 @@ public function editor(MediaInterface $media, Request $request, $edit = FALSE):
['content-type' => 'text/plain'],
);
}
if ($wopi_client_url === NULL) {
return new Response(
(string) $this->t('The Collabora Online editor/viewer is not available for this file type.'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this. Add a check in the tests?

@@ -1,6 +1,8 @@
cool:
# The address of the COOL server.
server: https://localhost:9980/
# Microsoft recommends 12-24 hours for this cache.
discovery_cache_ttl: 43200
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see more about this. Can you post the source, please?

),
'#field_suffix' => $this->t('seconds'),
'#default_value' => $cool_settings['discovery_cache_ttl'] ?? 43200,
'#step' => 1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why step?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants