Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New API for downloading LDML files from projects that allow sharing WS data #1309

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

rmunn
Copy link
Contributor

@rmunn rmunn commented Dec 9, 2024

Fixes #1306.

To test locally, you'll need to get a project with addToSldr="true" in its writing system configuration. None of our default test projects have that, but you can add that with the following method:

  • kubectl --context=docker-desktop exec -it deploy/hg -- bash
  • Inside the container, cd /var/hg/repos/s/sena-3 or /e/elawa-dev-flex
  • hg co tip to populate the repo's working directory
  • edit the CachedSettings/SharedSettings/LexiconSettings.plsx file
  • change the <WritingSystems> line to <WritingSystems addToSldr="true">
  • hg commit -m "Add to SLDR"
    • You might have to run hg config --edit to set your username
  • hg co null to empty the working directory now that the commit is in the repo
  • Exit kubectl, go to browser
  • Go to http://localhost/api/project/getLdmlZip

If all went well, the zipfile should contain a folder named for a project ID at the root. If you get a 22-byte zip file that's empty, then something went wrong.

Currently the directory structure for each project looks like this:

{project ID}/CachedSettings/WritingSystemStore/lots of .ldml files

Do we want to change that to just this?

{project ID}/lots of .ldml files

Discussion items

  • What should the API endpoint be called?
  • Should it be GET or POST? We don't want GoogleBot or other crawlers archiving the link (hopefully the AdminRequired attribute will prevent that).
  • What permissions should be use? AdminRequired, or something else?
  • Flatten directory structure by removing CachedSettings/WritingSystemStore/?

@rmunn rmunn self-assigned this Dec 9, 2024
hgweb/command-runner.sh Outdated Show resolved Hide resolved
@rmunn
Copy link
Contributor Author

rmunn commented Dec 9, 2024

Unfortunately it looks as though we won't be able to use ZipArchive to add files to the output stream asynchronously as they come in: we'd probably run into this issue where ZipArchive tries to write the end-of-zipfile directory synchronously when it's disposed, but Kestrel doesn't allow synchronous writes to its HTTP output stream. So we might have to extract everything, then send the zip file separately. Which means that the HTTP connection might time out, so it's possible we'll have to make two API endpoints, one to prepare the zip file and one to download it. Testing required.

Will return 403 Forbidden if project does not allow sharing ws data with
SLDR. Will also return same 403 Forbidden error code if project does not
exist, to avoid possibly leaking project codes.

If project exists and allows data sharing, command will return a zipfile
containing CachedSettings/WritingSystems/*.ldml from the tip revision.
@rmunn rmunn force-pushed the feat/api-for-ldml-zips branch from 8888014 to ae60963 Compare January 15, 2025 16:09
@rmunn
Copy link
Contributor Author

rmunn commented Jan 15, 2025

Rebased on top of current develop since it's been a month since this was started, and a lot has changed on the develop branch in that time.

rmunn added 2 commits January 15, 2025 14:11
Alas, this fails (when a ProjectController method is added to call
PrepareLdmlZip) with "System.InvalidOperationException: Synchronous
operations are disallowed. Call WriteAsync or set AllowSynchronousIO to
true instead."

We'll have to change this to prepare the entire zip file first, then
send it.
This is less efficient, but the only method that actually works given
the current state of the ZipFile / ZipArchive code in .NET.
Copy link

github-actions bot commented Jan 15, 2025

C# Unit Tests

104 tests  ±0   101 ✅  - 3   5s ⏱️ ±0s
 16 suites ±0     0 💤 ±0 
  1 files   ±0     3 ❌ +3 

For more details on these failures, see this check.

Results for commit f8b0dba. ± Comparison against base commit bb51178.

♻️ This comment has been updated with latest results.

@rmunn
Copy link
Contributor Author

rmunn commented Jan 15, 2025

Now working; marking as ready for review. Not ready to merge yet, as there are several discussion items (marked with TODO comments) in this PR, such as what the permission structure should be. But it's tested and working.

To test locally, you'll need to get a project with addToSldr="true" in its writing system configuration. None of our default test projects have that, but you can add that with the following method:

  • kubectl --context=docker-desktop exec -it deploy/hg -- bash
  • Inside the container, cd /var/hg/repos/s/sena-3 or /e/elawa-dev-flex
  • hg co tip to populate the repo's working directory
  • edit the CachedSettings/SharedSettings/LexiconSettings.plsx file
  • change the <WritingSystems> line to <WritingSystems addToSldr="true">
  • hg commit -m "Add to SLDR"
    • You might have to run hg config --edit to set your username
  • hg co null to empty the working directory now that the commit is in the repo
  • Exit kubectl, go to browser
  • Go to http://localhost/api/project/getLdmlZip

If all went well, the zipfile should contain a folder named for a project ID at the root. If you get a 22-byte zip file that's empty, then something went wrong.

@rmunn rmunn requested a review from hahn-kev January 15, 2025 20:15
@rmunn rmunn marked this pull request as ready for review January 15, 2025 20:15
Copy link
Collaborator

@hahn-kev hahn-kev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a bunch of feedback, let me know if I missed any questions you still have.

One thought is to return 404 if there's no projects found, that should make it clear what's going on rather than an empty zip being an expected result.

data[nameof(JobTriggerTraceId)] = Activity.Current?.Context.TraceId.ToHexString() ?? string.Empty;
data[nameof(JobTriggerSpanParentId)] = Activity.Current?.Context.SpanId.ToHexString() ?? string.Empty;
var trigger = TriggerBuilder.Create()
// TODO: Is there a simpler way of telling Quartz "Hey, enqueue this job after X delay"? Picking a unique trigger name each time seems unnecessarily complicated.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, actually if you look at the source, the method that we're using in LexJob scheduler.TriggerJob internally creates a random identity for the trigger

var trigger = TriggerBuilder.Create()
// TODO: Is there a simpler way of telling Quartz "Hey, enqueue this job after X delay"? Picking a unique trigger name each time seems unnecessarily complicated.
.WithIdentity(key.Name + "_Trigger_" + now.Ticks.ToString(), key.Group)
.StartAt(now.Add(delay))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably validate that delay is positive


namespace LexBoxApi.Jobs;

public abstract class DelayedLexJob() : LexJob
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class isn't really doing anything, can we just rename the QueueJob method and move it into LexJob then remove this class, maybe call it schedule job and or DelayedQueueJob something like that?

public class DeleteTempDirectoryJob() : DelayedLexJob
{
public static async Task Queue(ISchedulerFactory schedulerFactory,
string path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where, but this should be heavily validated, probably before queing the job, and right before the job actually runs to avoid deleting anything we care about. For example this should never be used to delete repo data. I would suggest only allowing paths prefixed with Path.GetTempPath() though I would validate that the temp path is not empy otherwise that'll match anything.

@@ -500,6 +508,16 @@ private async Task<HttpContent> ExecuteHgCommandServerCommand(ProjectCode code,
return response.Content;
}

private async Task<HttpContent?> ExecuteHgCommandServerCommand_ErrorsOk(ProjectCode code, string command, IEnumerable<HttpStatusCode> okErrors, CancellationToken token)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of this method is non standard C#. Also okErrors could just be a parameter of the original ExecuteHgCommandServerCommand, with a default value of null or empty and it would have the same result

public async Task<DirectoryInfo?> ExtractLdmlZip(Project project, string destRoot, CancellationToken token = default)
{
if (project.Type != ProjectType.FLEx) return null;
var zip = await hgService.GetLdmlZip(project.Code, token);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a using statement to dispose of the zip once this method returns

protected override Task ExecuteJob(IJobExecutionContext context)
{
ArgumentException.ThrowIfNullOrEmpty(Path);
return Task.Run(() =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be wrapped in a task since it's not running as part of a web request

Response.Headers.ContentDisposition = "attachment;filename=\"ldml.zip\""; // TODO: Put timestamp in filename, or use the filename that PrepareLdmlZip returns once it has a timestamp in it
Response.ContentType = "application/zip";
Response.StatusCode = 200;
await Response.SendFileAsync(path, token);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you return File then you don't need to handle status codes or content disposition manually. You'll have to change the return type of this method

@@ -232,6 +232,17 @@ private async Task StreamHttpResponse(HttpContent hgResult)
await hgResult.CopyToAsync(writer.AsStream());
}

[HttpGet("getLdmlZip")] // TODO: Discuss endpoint name, and whether it should be GET or POST, at next opportunity
[AdminRequired] // TODO: Decide on permissions, because we don't want everyone triggering this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

admin only is a good choice, it won't be exposed in the UI so I'm not worried about our admins executing it

@@ -232,6 +232,17 @@ private async Task StreamHttpResponse(HttpContent hgResult)
await hgResult.CopyToAsync(writer.AsStream());
}

[HttpGet("getLdmlZip")] // TODO: Discuss endpoint name, and whether it should be GET or POST, at next opportunity
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a GET is fine since it's not actually modifying data. maybe sldr-export?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New API to export ldml files for projects which have "Share ws data" set
2 participants