-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nmt download #164
Nmt download #164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 21 files at r1, 4 of 13 files at r2, all commit messages.
Reviewable status: 6 of 23 files reviewed, 4 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine/Translation/ModelDownloadUrl.cs
line 0 at r2 (raw file):
This should be moved to SIL.Machine.AspNetCore/Models
.
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 211 at r2 (raw file):
builder.Services.AddHangfireServer(o => { o.Queues = queues.ToArray();
I like ToArray
better. It is clearer.
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 421 at r2 (raw file):
); builder.Services.AddSingleton<ICleanupOldModelsJob, CleanupOldModelsJob>(); RecurringJob.AddOrUpdate<ICleanupOldModelsJob>(
This should be run using a BackgroundService
. See RecurrentTask
. ClearMLMonitorService
and SmtTransferEngineCommitService
are existing examples.
src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs
line 10 at r2 (raw file):
public string SourceLanguage { get; set; } = default!; public string TargetLanguage { get; set; } = default!; public bool IsModelRetrievable { get; set; } = false;
This should be IsModelPersisted
.
Previously, ddaspit (Damien Daspit) wrote…
Done |
Previously, ddaspit (Damien Daspit) wrote…
Quick fix likes the collection expression, but I can change it. |
Previously, ddaspit (Damien Daspit) wrote…
ok - that makes sense |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 23 files reviewed, 4 unresolved discussions (waiting on @ddaspit)
src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs
line 10 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This should be
IsModelPersisted
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 21 files at r1, 2 of 13 files at r2, 12 of 12 files at r3, all commit messages.
Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 410 at r3 (raw file):
public static IMachineBuilder AddModelCleanupService(this IMachineBuilder builder) { builder.Services.AddSingleton<CleanupOldModelsService>();
You only need to add a hosted service as a singleton if you want to allow it to be injected into other services.
src/SIL.Machine.AspNetCore/Services/IFileStorage.cs
line 17 at r3 (raw file):
Task<Stream> OpenWriteAsync(string path, CancellationToken cancellationToken = default); Task<string> GetPresignedUrlAsync(string path, int minutesToExpire, CancellationToken cancellationToken = default);
This should be named GetDownloadUrlAsync
.
src/SIL.Machine.AspNetCore/Services/InMemoryStorage.cs
line 105 at r3 (raw file):
) { return Task.FromResult(path);
We should throw NotSupportedException
.
src/SIL.Machine.AspNetCore/Services/ISharedFileService.cs
line 5 at r3 (raw file):
public interface ISharedFileService { public const string ModelDirectory = "models/";
This constant should be moved to NmtEngineService
.
src/SIL.Machine.AspNetCore/Services/LocalStorage.cs
line 45 at r3 (raw file):
) { return Task.FromResult(path);
We should throw NotSupportedException
.
src/SIL.Machine.AspNetCore/Services/NmtClearMLBuildJobFactory.cs
line 47 at r3 (raw file):
+ $" 'shared_file_folder': '{folder}',\n" + (buildOptions is not null ? $" 'build_options': '''{buildOptions}''',\n" : "") + (engine.IsModelPersisted ? $" 'save_model': '{engineId}_{engine.BuildRevision + 1}',\n" : "")
Why + 1
? Aren't we getting the current build revision?
src/SIL.Machine.AspNetCore/Services/NmtEngineService.cs
line 124 at r3 (raw file):
TranslationEngine engine = await GetEngineAsync(engineId, cancellationToken); if (!engine.IsModelPersisted) throw new InvalidOperationException(
You should throw a NotSupportedException
.
src/SIL.Machine.AspNetCore/Services/NmtEngineService.cs
line 148 at r3 (raw file):
).ToString(), ModelRevision = engine.BuildRevision, ExipiresAt = DateTime.UtcNow.AddMinutes(MinutesToExpire)
We should either pass in the DateTime
into the _sharedFileService.GetPresignedUrlAsync
call or return the expiration time from the method.
src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs
line 146 at r3 (raw file):
throw new RpcException(new Status(StatusCode.Aborted, e.Message)); } catch (FileNotFoundException e)
We shouldn't handle this exception. It only happens if there is something really wrong.
src/SIL.Machine.AspNetCore/Services/SharedFileService.cs
line 62 at r3 (raw file):
string presignedUrl = path; if (_baseUri is not null) if (_baseUri.Scheme == "s3")
Why are we checking if this is an S3 file storage here? The IFileStorage
abstraction should hide any internal implementation details like this.
src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs
line 46 at r3 (raw file):
SourceLanguage = sourceLanguage, TargetLanguage = targetLanguage, IsModelPersisted = isModelPersisted
SMT models are always persisted.
src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs
line 223 at r3 (raw file):
) { throw new NotImplementedException();
This should be NotSupportedException
.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 3 at r3 (raw file):
namespace SIL.Machine.AspNetCore.Services; public class CleanupOldModelsService(
This should be named ModelCleanupService
.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 10 at r3 (raw file):
) : RecurrentTask("Cleanup Old Models Service", services, RefreshPeriod, logger) { public ISharedFileService SharedFileService { get; } = sharedFileService;
Is this public
for a reason?
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 19 at r3 (raw file):
protected override async Task DoWorkAsync(IServiceScope scope, CancellationToken cancellationToken) { await CheckModelsAsync();
You should pass in the cancellation token and use it for all async calls.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 22 at r3 (raw file):
} public async Task CheckModelsAsync()
Is this public
for a reason?
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 25 at r3 (raw file):
{ _logger.LogInformation("Running model cleanup job"); var paths = await SharedFileService.ListFilesAsync(ISharedFileService.ModelDirectory);
Please try to remember to only use var
when the type is explicit elsewhere in the line.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 68 at r3 (raw file):
{ // If a file has been requested to be deleted twice, delete it. Otherwise, mark it for deletion. if (_filesPreviouslyMarkedForDeletion.Contains(filename))
We need a more deterministic method for cleaning up old models that is dependent on temporary state that gets wiped out when the server restarts. For example, we could delete models 30 days after they were created.
Previously, ddaspit (Damien Daspit) wrote…
Ah - forgot. |
Previously, johnml1135 (John Lambert) wrote…
Hmm. If I remove Hosted service, it doesn't get called. I believe I do want it as a hosted service: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/host/hosted-services?view=aspnetcore-8.0&tabs=visual-studio |
Previously, ddaspit (Damien Daspit) wrote…
No. |
Previously, ddaspit (Damien Daspit) wrote…
Done |
Previously, ddaspit (Damien Daspit) wrote…
I was wanting to test it. I was able to find a work around as per here: https://stackoverflow.com/questions/9122708/unit-testing-private-methods-in-c-sharp. I could have also used "InternalsVisibleTo", but was unsure the best way to approach it. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Any filename that does not correspond to a database entry gets wiped out. I think that is strong enough. |
Previously, ddaspit (Damien Daspit) wrote…
I thought I had gotten rid of all of them. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Added comment for clarity. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
You're right - I knew it was funky. I will pass it down. |
Previously, ddaspit (Damien Daspit) wrote…
Sure - a 500 code is appropriate. |
Previously, ddaspit (Damien Daspit) wrote…
I don't know - or why I chose the Uri format. Moved everything to strings. I think I was thinking that we would want an implementation for inmemory or filestorage. |
Previously, ddaspit (Damien Daspit) wrote…
Ok - hardcode it? |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, johnml1135 (John Lambert) wrote…
I will update this logic to be dead reckoning. We know what filenames should exist - let's just delete any that aren't those. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
55ebe3e
to
9dd74a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 17 files at r4, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 410 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Hmm. If I remove Hosted service, it doesn't get called. I believe I do want it as a hosted service: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/host/hosted-services?view=aspnetcore-8.0&tabs=visual-studio
All you should need is
builder.Services.AddHostedService<ModelCleanupService>();
src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs
line 149 at r4 (raw file):
}; } catch (InvalidOperationException e)
I don't think we need to catch InvalidOperationException
anymore.
src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs
line 46 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Ok - hardcode it?
That works for Machine, but what about Serval? We should make IsModelPersisted
optional. We can throw an exception if they pass an invalid value.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 10 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
No.
Now that it is private, this should be a field instead of a property.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 68 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I will update this logic to be dead reckoning. We know what filenames should exist - let's just delete any that aren't those.
I'm not sure I understand the purpose of marking it for deletion. Why not just delete it?
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 15 at r4 (raw file):
private List<string> _filesPreviouslyMarkedForDeletion = []; private readonly List<string> _filesNewlyMarkedForDeletion = []; private static readonly TimeSpan RefreshPeriod = TimeSpan.FromSeconds(10);
This seems very short to me.
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 25 at r4 (raw file):
{ _logger.LogInformation("Running model cleanup job"); IReadOnlyCollection<string> paths = await SharedFileService.ListFilesAsync(
I really don't love that we enumerate all model files and all engines in order to check for which models to delete. Is there any way we can design this so that we avoid these expensive operations?
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 31 at r4 (raw file):
// Get all engine ids from the database Dictionary<string, int> engineIdsToRevision = _engines .GetAllAsync(cancellationToken: cancellationToken)
You should await this call.
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 32 at r4 (raw file):
Dictionary<string, int> engineIdsToRevision = _engines .GetAllAsync(cancellationToken: cancellationToken) .Result.Select(e => (e.EngineId, e.BuildRevision))
There is no need for separate Select
and ToDictionary
calls. You can specify the key and value in the ToDictionary
call.
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
This was used for testing - I was supposed to change it back to 1 day. |
Previously, ddaspit (Damien Daspit) wrote…
The engines are not keys on engineId's, therefore every engine lookup is a O(n) operation. Therefore, if each engine had a model, the opearation would be O(n*n). By listing them out once, first, we can reduce this burden. |
Previously, ddaspit (Damien Daspit) wrote…
Reworked for clarity. |
Previously, ddaspit (Damien Daspit) wrote…
The wonders of dotnet... |
Previously, ddaspit (Damien Daspit) wrote…
If the file is actively being downloaded... |
Previously, ddaspit (Damien Daspit) wrote…
Yes we still need it - if the model has not been built yet. |
Previously, ddaspit (Damien Daspit) wrote…
Is invalid false, or anything other than null? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r5, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs
line 10 at r5 (raw file):
public string SourceLanguage { get; set; } = default!; public string TargetLanguage { get; set; } = default!; public bool? IsModelPersisted { get; set; } = false;
This does not need to be nullable. It will always be set.
src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs
line 149 at r4 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Yes we still need it - if the model has not been built yet.
Right.
src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs
line 46 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Is invalid false, or anything other than null?
We should throw an exception if false
. null
or true
is fine.
src/SIL.Machine.AspNetCore/Services/CleanupOldModelsService.cs
line 68 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
If the file is actively being downloaded...
It is still not clear to me what this is buying us. I also don't like relying on temporary state to cleanup the files. I think we should simplify the logic and just delete the file.
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 25 at r4 (raw file):
Previously, johnml1135 (John Lambert) wrote…
The engines are not keys on engineId's, therefore every engine lookup is a O(n) operation. Therefore, if each engine had a model, the opearation would be O(n*n). By listing them out once, first, we can reduce this burden.
I can live with it, but there is an alternative. If we tracked the model files in Mongo, we could retrieve just the files that need to be deleted instead of retrieving all of the model files and all of the engines.
Previously, ddaspit (Damien Daspit) wrote…
I think this is a simple and good (enough) path forward right now and am inclined just to move forward with it. |
Previously, ddaspit (Damien Daspit) wrote…
The biggest thing is deleting a model as it is being downloaded and then a build finishes, the download will crash ungraciously. If this is ok, then we can delete right away. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Fixed. |
cleaning script IsModelPersisted nullable Return IsModelPersistedState to Serval Check for modelRevision + 1 in cleanup and just delete without keeping internal states.
52826e9
to
5fe8bcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r6, 9 of 9 files at r7, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs
line 46 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Done.
I don't see where the exception is thrown. The SMT engine doesn't currently support IsModelPersisted == false
, so we should throw an InvalidOperationException
that gets caught in ServalTranslationEngineServiceV1
and rethrown as an RpcException
with the status code InvalidArgument
.
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 13 at r7 (raw file):
private ILogger<ModelCleanupService> _logger = logger; private IRepository<TranslationEngine> _engines = engines; private List<string> _filesPreviouslyMarkedForDeletion = [];
__filesPreviouslyMarkedForDeletion
and _filesNewlyMarkedForDeletion
can be removed. The rest of the fields should be readonly
.
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
10 lines above there is a "NotSupportedException". What is the logic we should use for these two exceptions? https://stackoverflow.com/questions/12669805/when-to-use-invalidoperationexception-or-notsupportedexception? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs
line 46 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
10 lines above there is a "NotSupportedException". What is the logic we should use for these two exceptions? https://stackoverflow.com/questions/12669805/when-to-use-invalidoperationexception-or-notsupportedexception?
I didn't see it. I think the NotSupportedException
is sufficient for this case.
Fixes sillsdev/serval#268
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)