diff --git a/.editorconfig b/.editorconfig
index 43d37152a..db9ec586d 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -201,6 +201,7 @@ dotnet_diagnostic.CA1825.severity = warning
 
 # CA1826: Do not use Enumerable methods on indexable collections. Instead use the collection directly
 dotnet_diagnostic.CA1826.severity = warning
+dotnet_code_quality.CA1826.exclude_ordefault_methods = true
 
 # CA1827: Do not use Count() or LongCount() when Any() can be used
 dotnet_diagnostic.CA1827.severity = warning
diff --git a/src/SIL.Machine.AspNetCore/Models/Build.cs b/src/SIL.Machine.AspNetCore/Models/Build.cs
index bedfc6032..40eef7969 100644
--- a/src/SIL.Machine.AspNetCore/Models/Build.cs
+++ b/src/SIL.Machine.AspNetCore/Models/Build.cs
@@ -14,12 +14,12 @@ public enum BuildJobRunner
     ClearML
 }
 
-public class Build
+public record Build
 {
-    public string BuildId { get; set; } = default!;
-    public BuildJobState JobState { get; set; }
-    public string JobId { get; set; } = default!;
-    public BuildJobRunner JobRunner { get; set; }
-    public string Stage { get; set; } = default!;
-    public string? Options { get; set; } = default;
+    public required string BuildId { get; init; }
+    public required BuildJobState JobState { get; init; }
+    public required string JobId { get; init; }
+    public required BuildJobRunner JobRunner { get; init; }
+    public required string Stage { get; init; }
+    public string? Options { get; set; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/ClearMLMetricsEvent.cs b/src/SIL.Machine.AspNetCore/Models/ClearMLMetricsEvent.cs
index 8c1cc26b5..b4546028b 100644
--- a/src/SIL.Machine.AspNetCore/Models/ClearMLMetricsEvent.cs
+++ b/src/SIL.Machine.AspNetCore/Models/ClearMLMetricsEvent.cs
@@ -1,12 +1,12 @@
 namespace SIL.Machine.AspNetCore.Models;
 
-public class ClearMLMetricsEvent
+public record ClearMLMetricsEvent
 {
-    public string Metric { get; set; } = default!;
-    public string Variant { get; set; } = default!;
-    public double Value { get; set; }
-    public double MinValue { get; set; }
-    public int MinValueIteration { get; set; }
-    public double MaxValue { get; set; }
-    public int MaxValueIteration { get; set; }
+    public required string Metric { get; init; }
+    public required string Variant { get; init; }
+    public required double Value { get; init; }
+    public required double MinValue { get; init; }
+    public required int MinValueIteration { get; init; }
+    public required double MaxValue { get; init; }
+    public required int MaxValueIteration { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/ClearMLProject.cs b/src/SIL.Machine.AspNetCore/Models/ClearMLProject.cs
index 6d378428d..ca6cf6883 100644
--- a/src/SIL.Machine.AspNetCore/Models/ClearMLProject.cs
+++ b/src/SIL.Machine.AspNetCore/Models/ClearMLProject.cs
@@ -1,6 +1,6 @@
 namespace SIL.Machine.AspNetCore.Models;
 
-public class ClearMLProject
+public record ClearMLProject
 {
-    public string Id { get; set; } = default!;
+    public required string Id { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/ClearMLTask.cs b/src/SIL.Machine.AspNetCore/Models/ClearMLTask.cs
index 8fcbb449a..ba2feadcf 100644
--- a/src/SIL.Machine.AspNetCore/Models/ClearMLTask.cs
+++ b/src/SIL.Machine.AspNetCore/Models/ClearMLTask.cs
@@ -14,16 +14,19 @@ public enum ClearMLTaskStatus
     Unknown
 }
 
-public class ClearMLTask
+public record ClearMLTask
 {
-    public string Id { get; set; } = default!;
-    public string Name { get; set; } = default!;
-    public ClearMLProject Project { get; set; } = default!;
-    public ClearMLTaskStatus Status { get; set; }
-    public string StatusReason { get; set; } = default!;
-    public string StatusMessage { get; set; } = default!;
-    public DateTime Created { get; set; }
-    public int LastIteration { get; set; }
-    public int ActiveDuration { get; set; }
-    public Dictionary<string, Dictionary<string, ClearMLMetricsEvent>> LastMetrics { get; set; } = default!;
+    public required string Id { get; init; }
+    public required string Name { get; init; }
+    public required ClearMLProject Project { get; init; }
+    public required ClearMLTaskStatus Status { get; init; }
+    public required string StatusReason { get; init; }
+    public required string StatusMessage { get; init; }
+    public required DateTime Created { get; init; }
+    public required int LastIteration { get; init; }
+    public required int ActiveDuration { get; init; }
+    public required IReadOnlyDictionary<
+        string,
+        IReadOnlyDictionary<string, ClearMLMetricsEvent>
+    > LastMetrics { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/Corpus.cs b/src/SIL.Machine.AspNetCore/Models/Corpus.cs
index bf741e298..c2d421149 100644
--- a/src/SIL.Machine.AspNetCore/Models/Corpus.cs
+++ b/src/SIL.Machine.AspNetCore/Models/Corpus.cs
@@ -1,16 +1,16 @@
 namespace SIL.Machine.AspNetCore.Models;
 
-public class Corpus
+public record Corpus
 {
-    public string Id { get; set; } = default!;
-    public string SourceLanguage { get; set; } = default!;
-    public string TargetLanguage { get; set; } = default!;
-    public bool TrainOnAll { get; set; }
-    public bool PretranslateAll { get; set; }
-    public Dictionary<string, HashSet<int>>? TrainOnChapters { get; set; }
-    public Dictionary<string, HashSet<int>>? PretranslateChapters { get; set; }
-    public HashSet<string> TrainOnTextIds { get; set; } = default!;
-    public HashSet<string> PretranslateTextIds { get; set; } = default!;
-    public List<CorpusFile> SourceFiles { get; set; } = default!;
-    public List<CorpusFile> TargetFiles { get; set; } = default!;
+    public required string Id { get; init; }
+    public required string SourceLanguage { get; init; }
+    public required string TargetLanguage { get; init; }
+    public required bool TrainOnAll { get; init; }
+    public required bool PretranslateAll { get; init; }
+    public IReadOnlyDictionary<string, IReadOnlySet<int>>? TrainOnChapters { get; init; }
+    public IReadOnlyDictionary<string, IReadOnlySet<int>>? PretranslateChapters { get; init; }
+    public required IReadOnlySet<string> TrainOnTextIds { get; init; }
+    public required IReadOnlySet<string> PretranslateTextIds { get; init; }
+    public required IReadOnlyList<CorpusFile> SourceFiles { get; init; }
+    public required IReadOnlyList<CorpusFile> TargetFiles { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/CorpusFile.cs b/src/SIL.Machine.AspNetCore/Models/CorpusFile.cs
index 378d41469..4df8dd8c1 100644
--- a/src/SIL.Machine.AspNetCore/Models/CorpusFile.cs
+++ b/src/SIL.Machine.AspNetCore/Models/CorpusFile.cs
@@ -6,9 +6,9 @@ public enum FileFormat
     Paratext = 1
 }
 
-public class CorpusFile
+public record CorpusFile
 {
-    public string Location { get; set; } = default!;
-    public FileFormat Format { get; set; }
-    public string TextId { get; set; } = default!;
+    public required string Location { get; init; }
+    public required FileFormat Format { get; init; }
+    public required string TextId { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/Lock.cs b/src/SIL.Machine.AspNetCore/Models/Lock.cs
index 87a38d662..3ea408614 100644
--- a/src/SIL.Machine.AspNetCore/Models/Lock.cs
+++ b/src/SIL.Machine.AspNetCore/Models/Lock.cs
@@ -1,8 +1,8 @@
 namespace SIL.Machine.AspNetCore.Models;
 
-public class Lock
+public record Lock
 {
-    public string Id { get; set; } = default!;
-    public DateTime? ExpiresAt { get; set; }
-    public string HostId { get; set; } = default!;
+    public required string Id { get; init; }
+    public DateTime? ExpiresAt { get; init; }
+    public required string HostId { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/ModelDownloadUrl.cs b/src/SIL.Machine.AspNetCore/Models/ModelDownloadUrl.cs
index 7c8ae88ed..a00da8f57 100644
--- a/src/SIL.Machine.AspNetCore/Models/ModelDownloadUrl.cs
+++ b/src/SIL.Machine.AspNetCore/Models/ModelDownloadUrl.cs
@@ -1,8 +1,8 @@
-namespace SIL.Machine.AspNetCore.Models;
+namespace SIL.Machine.AspNetCore.Models;
 
-public class ModelDownloadUrl
+public record ModelDownloadUrl
 {
-    public string Url { get; set; } = default!;
-    public int ModelRevision { get; set; } = default!;
-    public DateTime ExpiresAt { get; set; } = default!;
+    public required string Url { get; init; }
+    public required int ModelRevision { get; init; }
+    public required DateTime ExpiresAt { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/Pretranslation.cs b/src/SIL.Machine.AspNetCore/Models/Pretranslation.cs
index 170b48d66..31d895184 100644
--- a/src/SIL.Machine.AspNetCore/Models/Pretranslation.cs
+++ b/src/SIL.Machine.AspNetCore/Models/Pretranslation.cs
@@ -1,9 +1,9 @@
 namespace SIL.Machine.AspNetCore.Models;
 
-public class Pretranslation
+public record Pretranslation
 {
-    public string CorpusId { get; set; } = default!;
-    public string TextId { get; set; } = default!;
-    public List<string> Refs { get; set; } = default!;
-    public string Translation { get; set; } = default!;
+    public required string CorpusId { get; init; }
+    public required string TextId { get; init; }
+    public required IReadOnlyList<string> Refs { get; init; }
+    public required string Translation { get; init; }
 }
diff --git a/src/SIL.Machine.AspNetCore/Models/RWLock.cs b/src/SIL.Machine.AspNetCore/Models/RWLock.cs
index 895cbcfb6..6475e29f5 100644
--- a/src/SIL.Machine.AspNetCore/Models/RWLock.cs
+++ b/src/SIL.Machine.AspNetCore/Models/RWLock.cs
@@ -1,12 +1,12 @@
namespace SIL.Machine.AspNetCore.Models; -public class RWLock : IEntity +public record RWLock : IEntity { - public string Id { get; set; } = default!; - public int Revision { get; set; } - public Lock? WriterLock { get; set; } - public List ReaderLocks { get; set; } = new List(); - public List WriterQueue { get; set; } = new List(); + public string Id { get; set; } = ""; + public int Revision { get; set; } = 1; + public Lock? WriterLock { get; init; } + public required IReadOnlyList ReaderLocks { get; init; } + public required IReadOnlyList WriterQueue { get; init; } public bool IsAvailableForReading() { diff --git a/src/SIL.Machine.AspNetCore/Models/TrainSegmentPair.cs b/src/SIL.Machine.AspNetCore/Models/TrainSegmentPair.cs index 8e1d40ea9..471b5296d 100644 --- a/src/SIL.Machine.AspNetCore/Models/TrainSegmentPair.cs +++ b/src/SIL.Machine.AspNetCore/Models/TrainSegmentPair.cs @@ -1,11 +1,11 @@ namespace SIL.Machine.AspNetCore.Models; -public class TrainSegmentPair : IEntity +public record TrainSegmentPair : IEntity { - public string Id { get; set; } = default!; + public string Id { get; set; } = ""; public int Revision { get; set; } = 1; - public string TranslationEngineRef { get; set; } = default!; - public string Source { get; set; } = default!; - public string Target { get; set; } = default!; - public bool SentenceStart { get; set; } + public required string TranslationEngineRef { get; init; } + public required string Source { get; init; } + public required string Target { get; init; } + public required bool SentenceStart { get; init; } } diff --git a/src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs b/src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs index 07b94f759..7cd8a918c 100644 --- a/src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs +++ b/src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs @@ -1,13 +1,13 @@ namespace SIL.Machine.AspNetCore.Models; -public class TranslationEngine : IEntity +public record TranslationEngine : IEntity { - 
public string Id { get; set; } = default!; + public string Id { get; set; } = ""; public int Revision { get; set; } = 1; - public string EngineId { get; set; } = default!; - public string SourceLanguage { get; set; } = default!; - public string TargetLanguage { get; set; } = default!; - public bool IsModelPersisted { get; set; } - public int BuildRevision { get; set; } - public Build? CurrentBuild { get; set; } + public required string EngineId { get; init; } + public required string SourceLanguage { get; init; } + public required string TargetLanguage { get; init; } + public required bool IsModelPersisted { get; init; } + public int BuildRevision { get; init; } + public Build? CurrentBuild { get; init; } } diff --git a/src/SIL.Machine.AspNetCore/SIL.Machine.AspNetCore.csproj b/src/SIL.Machine.AspNetCore/SIL.Machine.AspNetCore.csproj index ce36bf58c..ea3fd1f6f 100644 --- a/src/SIL.Machine.AspNetCore/SIL.Machine.AspNetCore.csproj +++ b/src/SIL.Machine.AspNetCore/SIL.Machine.AspNetCore.csproj @@ -39,7 +39,7 @@ - + diff --git a/src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs b/src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs index 422861b6c..58b537494 100644 --- a/src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs +++ b/src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs @@ -327,7 +327,7 @@ private async Task UpdateTrainJobStatus( private static double GetMetric(ClearMLTask task, string metric, string variant) { - if (!task.LastMetrics.TryGetValue(metric, out Dictionary? metricVariants)) + if (!task.LastMetrics.TryGetValue(metric, out IReadOnlyDictionary? metricVariants)) return 0; if (!metricVariants.TryGetValue(variant, out ClearMLMetricsEvent? 
metricEvent)) diff --git a/src/SIL.Machine.AspNetCore/Services/DistributedReaderWriterLockFactory.cs b/src/SIL.Machine.AspNetCore/Services/DistributedReaderWriterLockFactory.cs index a30b68716..64a3c3aa5 100644 --- a/src/SIL.Machine.AspNetCore/Services/DistributedReaderWriterLockFactory.cs +++ b/src/SIL.Machine.AspNetCore/Services/DistributedReaderWriterLockFactory.cs @@ -24,7 +24,15 @@ public async Task CreateAsync( { try { - await _locks.InsertAsync(new RWLock { Id = id }, cancellationToken); + await _locks.InsertAsync( + new RWLock + { + Id = id, + ReaderLocks = [], + WriterQueue = [] + }, + cancellationToken + ); } catch (DuplicateKeyException) { diff --git a/src/SIL.Machine.AspNetCore/Services/InMemoryStorage.cs b/src/SIL.Machine.AspNetCore/Services/InMemoryStorage.cs index e92109a39..7deccb6e9 100644 --- a/src/SIL.Machine.AspNetCore/Services/InMemoryStorage.cs +++ b/src/SIL.Machine.AspNetCore/Services/InMemoryStorage.cs @@ -1,3 +1,4 @@ +using SIL.ObjectModel; using static SIL.Machine.AspNetCore.Utils.SharedFileUtils; namespace SIL.Machine.AspNetCore.Services; diff --git a/src/SIL.Machine.AspNetCore/Services/LanguageTagService.cs b/src/SIL.Machine.AspNetCore/Services/LanguageTagService.cs index 67f69c784..bc2d64982 100644 --- a/src/SIL.Machine.AspNetCore/Services/LanguageTagService.cs +++ b/src/SIL.Machine.AspNetCore/Services/LanguageTagService.cs @@ -19,10 +19,8 @@ public class LanguageTagService : ILanguageTagService private readonly Dictionary _flores200Languages; - private static readonly Regex LangTagPattern = new Regex( - "(?'language'[a-zA-Z]{2,8})([_-](?'script'[a-zA-Z]{4}))?", - RegexOptions.ExplicitCapture - ); + private static readonly Regex LangTagPattern = + new("(?'language'[a-zA-Z]{2,8})([_-](?'script'[a-zA-Z]{4}))?", RegexOptions.ExplicitCapture); public LanguageTagService() { diff --git a/src/SIL.Machine.AspNetCore/Services/LocalStorage.cs b/src/SIL.Machine.AspNetCore/Services/LocalStorage.cs index 9fc26c097..38e9049bd 100644 --- 
a/src/SIL.Machine.AspNetCore/Services/LocalStorage.cs +++ b/src/SIL.Machine.AspNetCore/Services/LocalStorage.cs @@ -1,3 +1,4 @@ +using SIL.ObjectModel; using static SIL.Machine.AspNetCore.Utils.SharedFileUtils; namespace SIL.Machine.AspNetCore.Services; diff --git a/src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs b/src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs index efa041bc9..df10a018e 100644 --- a/src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs +++ b/src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs @@ -11,6 +11,9 @@ public class NmtPreprocessBuildJob( ILanguageTagService languageTagService ) : HangfireBuildJob>(platformService, engines, lockFactory, buildJobService, logger) { + private static readonly JsonSerializerOptions PretranslateSerializerOptions = + new() { WriteIndented = true, PropertyNamingPolicy = JsonNamingPolicy.CamelCase }; + private readonly ISharedFileService _sharedFileService = sharedFileService; private readonly ICorpusService _corpusService = corpusService; private readonly ILanguageTagService _languageTagService = languageTagService; @@ -24,7 +27,12 @@ protected override async Task DoWorkAsync( CancellationToken cancellationToken ) { - IDictionary counts = await WriteDataFilesAsync(buildId, data, buildOptions, cancellationToken); + (int trainCount, int pretranslateCount) = await WriteDataFilesAsync( + buildId, + data, + buildOptions, + cancellationToken + ); // Log summary of build data JsonObject buildPreprocessSummary = @@ -32,12 +40,10 @@ CancellationToken cancellationToken { { "Event", "BuildPreprocess" }, { "EngineId", engineId }, - { "BuildId", buildId } + { "BuildId", buildId }, + { "NumTrainRows", trainCount }, + { "NumPretranslateRows", pretranslateCount } }; - foreach (KeyValuePair kvp in counts) - { - buildPreprocessSummary.Add(kvp.Key, kvp.Value); - } TranslationEngine? 
engine = await Engines.GetAsync(e => e.EngineId == engineId, cancellationToken); if (engine is null) throw new OperationCanceledException($"Engine {engineId} does not exist. Build canceled."); @@ -64,7 +70,7 @@ CancellationToken cancellationToken } } - private async Task> WriteDataFilesAsync( + private async Task<(int TrainCount, int PretranslateCount)> WriteDataFilesAsync( string buildId, IReadOnlyList corpora, string? buildOptions, @@ -76,17 +82,13 @@ CancellationToken cancellationToken { buildOptionsObject = JsonSerializer.Deserialize(buildOptions); } - await using var sourceTrainWriter = new StreamWriter( - await _sharedFileService.OpenWriteAsync($"builds/{buildId}/train.src.txt", cancellationToken) - ); - await using var targetTrainWriter = new StreamWriter( - await _sharedFileService.OpenWriteAsync($"builds/{buildId}/train.trg.txt", cancellationToken) - ); + await using StreamWriter sourceTrainWriter = + new(await _sharedFileService.OpenWriteAsync($"builds/{buildId}/train.src.txt", cancellationToken)); + await using StreamWriter targetTrainWriter = + new(await _sharedFileService.OpenWriteAsync($"builds/{buildId}/train.trg.txt", cancellationToken)); - Dictionary counts = new(); - counts["CorpusSize"] = 0; - counts["NumTrainRows"] = 0; - counts["NumPretranslateRows"] = 0; + int trainCount = 0; + int pretranslateCount = 0; async IAsyncEnumerable ProcessRowsAsync() { foreach (Corpus corpus in corpora) @@ -98,35 +100,19 @@ async IAsyncEnumerable ProcessRowsAsync() corpus.TargetFiles ); - var parallelCorpora = new List(); - IParallelTextCorpus parallelTextCorpus = sourceCorpora[CorpusType.Text] .AlignRows(targetCorpora[CorpusType.Text], allSourceRows: true, allTargetRows: true); - parallelCorpora.Add(parallelTextCorpus); - if ( - (bool?)buildOptionsObject?["use_key_terms"] - ?? 
true && sourceCorpora.ContainsKey(CorpusType.Term) && targetCorpora.ContainsKey(CorpusType.Term) - ) - { - IParallelTextCorpus parallelKeyTermsCorpus = sourceCorpora[CorpusType.Term] - .AlignRows(targetCorpora[CorpusType.Term]); - IEnumerable keyTermsTextIds = parallelKeyTermsCorpus.Select(r => r.TextId).Distinct(); - if (keyTermsTextIds.Count() == 1) - corpus.TrainOnTextIds.Add(keyTermsTextIds.First()); //Should only be one textId for key terms - parallelCorpora.Add(parallelKeyTermsCorpus); - } - - foreach (ParallelTextRow row in parallelCorpora.Flatten()) + foreach (ParallelTextRow row in parallelTextCorpus) { bool isInTrainOnChapters = false; bool isInPretranslateChapters = false; if (targetCorpora[CorpusType.Text] is ScriptureTextCorpus stc) { - bool IsInChapters(Dictionary> bookChapters, object rowRef) + bool IsInChapters(IReadOnlyDictionary> bookChapters, object rowRef) { if (rowRef is not VerseRef vr) return false; - return bookChapters.TryGetValue(vr.Book, out HashSet? chapters) + return bookChapters.TryGetValue(vr.Book, out IReadOnlySet? chapters) && (chapters.Contains(vr.ChapterNum) || chapters.Count == 0); } if (corpus.TrainOnChapters is not null) @@ -138,7 +124,8 @@ bool IsInChapters(Dictionary> bookChapters, object rowRef) { await sourceTrainWriter.WriteAsync($"{row.SourceText}\n"); await targetTrainWriter.WriteAsync($"{row.TargetText}\n"); - counts["NumTrainRows"] += 1; + if (!row.IsEmpty) + trainCount++; } if ( ( @@ -159,17 +146,30 @@ bool IsInChapters(Dictionary> bookChapters, object rowRef) { refs = row.TargetRefs; } - counts["NumPretranslateRows"] += 1; yield return new Pretranslation { CorpusId = corpus.Id, TextId = row.TextId, - Refs = refs.Select(r => r.ToString()!).ToList(), + Refs = refs.Select(r => r.ToString() ?? "").ToList(), Translation = row.SourceText }; + pretranslateCount++; + } + } + + if ( + (bool?)buildOptionsObject?["use_key_terms"] + ?? 
true && sourceCorpora.ContainsKey(CorpusType.Term) && targetCorpora.ContainsKey(CorpusType.Term) + ) + { + IParallelTextCorpus parallelKeyTermsCorpus = sourceCorpora[CorpusType.Term] + .AlignRows(targetCorpora[CorpusType.Term]); + foreach (ParallelTextRow row in parallelKeyTermsCorpus) + { + await sourceTrainWriter.WriteAsync($"{row.SourceText}\n"); + await targetTrainWriter.WriteAsync($"{row.TargetText}\n"); + trainCount++; } - if (!row.IsEmpty) - counts["CorpusSize"]++; } } } @@ -182,11 +182,11 @@ bool IsInChapters(Dictionary> bookChapters, object rowRef) await JsonSerializer.SerializeAsync( sourcePretranslateStream, ProcessRowsAsync(), - new JsonSerializerOptions { WriteIndented = true, PropertyNamingPolicy = JsonNamingPolicy.CamelCase }, + PretranslateSerializerOptions, cancellationToken: cancellationToken ); - return counts; + return (trainCount, pretranslateCount); } protected override async Task CleanupAsync( diff --git a/src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs b/src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs index ce2d840d7..a9d265e9c 100644 --- a/src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs +++ b/src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs @@ -1,3 +1,4 @@ +using SIL.ObjectModel; using static SIL.Machine.AspNetCore.Utils.SharedFileUtils; namespace SIL.Machine.AspNetCore.Services; diff --git a/src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs b/src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs index ee7971161..3661dbc09 100644 --- a/src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs +++ b/src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs @@ -288,12 +288,14 @@ private static Models.Corpus Map(Serval.Translation.V1.Corpus source) TargetLanguage = source.TargetLanguage, TrainOnAll = source.TrainOnAll, PretranslateAll = source.PretranslateAll, - TrainOnChapters = source - .TrainOnChapters.Select(kvp => (kvp.Key, 
kvp.Value.Chapters.ToHashSet())) - .ToDictionary(), - PretranslateChapters = source - .PretranslateChapters.Select(kvp => (kvp.Key, kvp.Value.Chapters.ToHashSet())) - .ToDictionary(), + TrainOnChapters = source.TrainOnChapters.ToDictionary( + kvp => kvp.Key, + kvp => (IReadOnlySet)kvp.Value.Chapters.ToHashSet() + ), + PretranslateChapters = source.PretranslateChapters.ToDictionary( + kvp => kvp.Key, + kvp => (IReadOnlySet)kvp.Value.Chapters.ToHashSet() + ), TrainOnTextIds = source.TrainOnTextIds.ToHashSet(), PretranslateTextIds = source.PretranslateTextIds.ToHashSet(), SourceFiles = source.SourceFiles.Select(Map).ToList(), diff --git a/src/SIL.Machine.AspNetCore/Usings.cs b/src/SIL.Machine.AspNetCore/Usings.cs index 2e541bc3a..f42422c20 100644 --- a/src/SIL.Machine.AspNetCore/Usings.cs +++ b/src/SIL.Machine.AspNetCore/Usings.cs @@ -53,6 +53,5 @@ global using SIL.Machine.Translation; global using SIL.Machine.Translation.Thot; global using SIL.Machine.Utils; -global using SIL.ObjectModel; global using SIL.Scripture; global using SIL.WritingSystems; diff --git a/src/SIL.Machine.AspNetCore/Utils/AsyncDisposableBase.cs b/src/SIL.Machine.AspNetCore/Utils/AsyncDisposableBase.cs index ec551079b..b0f2d3bee 100644 --- a/src/SIL.Machine.AspNetCore/Utils/AsyncDisposableBase.cs +++ b/src/SIL.Machine.AspNetCore/Utils/AsyncDisposableBase.cs @@ -1,4 +1,6 @@ -namespace SIL.Machine.AspNetCore.Utils; +using SIL.ObjectModel; + +namespace SIL.Machine.AspNetCore.Utils; public class AsyncDisposableBase : DisposableBase, IAsyncDisposable { diff --git a/tests/SIL.Machine.AspNetCore.Tests/Services/DistributedReaderWriterLockFactoryTests.cs b/tests/SIL.Machine.AspNetCore.Tests/Services/DistributedReaderWriterLockFactoryTests.cs index b8c6b29be..61a5cbfd2 100644 --- a/tests/SIL.Machine.AspNetCore.Tests/Services/DistributedReaderWriterLockFactoryTests.cs +++ b/tests/SIL.Machine.AspNetCore.Tests/Services/DistributedReaderWriterLockFactoryTests.cs @@ -6,12 +6,14 @@ public class 
DistributedReaderWriterLockFactoryTests [Test] public async Task InitAsync_ReleaseWriterLocks() { - var env = new TestEnvironment(); + TestEnvironment env = new(); env.Locks.Add( new RWLock { Id = "resource1", - WriterLock = new Lock { Id = "lock1", HostId = "this_service" } + WriterLock = new() { Id = "lock1", HostId = "this_service" }, + ReaderLocks = [], + WriterQueue = [] } ); @@ -24,15 +26,13 @@ public async Task InitAsync_ReleaseWriterLocks() [Test] public async Task InitAsync_ReleaseReaderLocks() { - var env = new TestEnvironment(); + TestEnvironment env = new(); env.Locks.Add( new RWLock { Id = "resource1", - ReaderLocks = - { - new Lock { Id = "lock1", HostId = "this_service" } - } + ReaderLocks = [new() { Id = "lock1", HostId = "this_service" }], + WriterQueue = [] } ); @@ -45,16 +45,14 @@ public async Task InitAsync_ReleaseReaderLocks() [Test] public async Task InitAsync_RemoveWaiters() { - var env = new TestEnvironment(); + TestEnvironment env = new(); env.Locks.Add( new RWLock { Id = "resource1", - WriterLock = new Lock { Id = "lock1", HostId = "other_service" }, - WriterQueue = - { - new Lock { Id = "lock2", HostId = "this_service" } - } + WriterLock = new() { Id = "lock1", HostId = "other_service" }, + ReaderLocks = [], + WriterQueue = [new() { Id = "lock2", HostId = "this_service" }] } ); @@ -69,7 +67,7 @@ private class TestEnvironment public TestEnvironment() { Locks = new MemoryRepository(); - var serviceOptions = new ServiceOptions { ServiceId = "this_service" }; + ServiceOptions serviceOptions = new() { ServiceId = "this_service" }; Factory = new DistributedReaderWriterLockFactory( new OptionsWrapper(serviceOptions), Locks, diff --git a/tests/SIL.Machine.AspNetCore.Tests/Services/NmtClearMLBuildJobFactoryTests.cs b/tests/SIL.Machine.AspNetCore.Tests/Services/NmtClearMLBuildJobFactoryTests.cs index 55f689234..edb2f2a46 100644 --- a/tests/SIL.Machine.AspNetCore.Tests/Services/NmtClearMLBuildJobFactoryTests.cs +++ 
b/tests/SIL.Machine.AspNetCore.Tests/Services/NmtClearMLBuildJobFactoryTests.cs @@ -78,7 +78,15 @@ public TestEnvironment() SourceLanguage = "es", TargetLanguage = "en", BuildRevision = 1, - CurrentBuild = new Build { BuildId = "build1", JobState = BuildJobState.Pending } + IsModelPersisted = false, + CurrentBuild = new() + { + BuildId = "build1", + JobId = "job1", + JobRunner = BuildJobRunner.ClearML, + Stage = NmtBuildStages.Train, + JobState = BuildJobState.Pending + } } ); Options = Substitute.For>(); diff --git a/tests/SIL.Machine.AspNetCore.Tests/Services/NmtEngineServiceTests.cs b/tests/SIL.Machine.AspNetCore.Tests/Services/NmtEngineServiceTests.cs index cf627c89b..d3a2ba336 100644 --- a/tests/SIL.Machine.AspNetCore.Tests/Services/NmtEngineServiceTests.cs +++ b/tests/SIL.Machine.AspNetCore.Tests/Services/NmtEngineServiceTests.cs @@ -89,7 +89,7 @@ public async Task DeleteAsync_WhileBuilding() Assert.That(env.Engines.Contains("engine1"), Is.False); } - private class TestEnvironment : DisposableBase + private class TestEnvironment : ObjectModel.DisposableBase { private readonly Hangfire.InMemory.InMemoryStorage _memoryStorage; private readonly BackgroundJobClient _jobClient; @@ -110,7 +110,8 @@ public TestEnvironment() EngineId = "engine1", SourceLanguage = "es", TargetLanguage = "en", - BuildRevision = 1 + BuildRevision = 1, + IsModelPersisted = false } ); _memoryStorage = new Hangfire.InMemory.InMemoryStorage(); diff --git a/tests/SIL.Machine.AspNetCore.Tests/Services/NmtPreprocessBuildJobTests.cs b/tests/SIL.Machine.AspNetCore.Tests/Services/NmtPreprocessBuildJobTests.cs index c8f77c291..eb0add632 100644 --- a/tests/SIL.Machine.AspNetCore.Tests/Services/NmtPreprocessBuildJobTests.cs +++ b/tests/SIL.Machine.AspNetCore.Tests/Services/NmtPreprocessBuildJobTests.cs @@ -3,26 +3,6 @@ namespace SIL.Machine.AspNetCore.Services; [TestFixture] public class NmtPreprocessBuildJobTests { - [SetUp] - public void SetUp() - { - ZipFile.CreateFromDirectory( - 
Path.Combine("..", "..", "..", "Services", "data", "paratext"), - Path.Combine(Path.GetTempPath(), "Project.zip") - ); - ZipFile.CreateFromDirectory( - Path.Combine("..", "..", "..", "Services", "data", "paratext2"), - Path.Combine(Path.GetTempPath(), "Project2.zip") - ); - } - - [TearDown] - public void TearDown() - { - File.Delete(Path.Combine(Path.GetTempPath(), "Project.zip")); - File.Delete(Path.Combine(Path.GetTempPath(), "Project2.zip")); - } - [Test] [TestCase(false, false, null, null, 0, 0)] [TestCase(false, true, null, null, 5, 0)] @@ -39,7 +19,7 @@ public async Task BuildJobTest( int numEntriesWrittenToPretranslate ) { - using var env = new TestEnvironment(); + using TestEnvironment env = new(); var corpus1 = new Corpus { Id = "corpusId1", @@ -47,45 +27,41 @@ int numEntriesWrittenToPretranslate TargetLanguage = "en", PretranslateAll = pTAll, TrainOnAll = tOAll, - PretranslateTextIds = pTTextIds is null ? new HashSet() : pTTextIds.ToHashSet(), - TrainOnTextIds = tOTextIds is null ? new HashSet() : tOTextIds.ToHashSet(), - SourceFiles = new List - { - new CorpusFile + PretranslateTextIds = pTTextIds?.ToHashSet() ?? [], + TrainOnTextIds = tOTextIds?.ToHashSet() ?? 
[], + SourceFiles = + [ + new() { TextId = "textId1", Format = FileFormat.Text, Location = Path.Combine("..", "..", "..", "Services", "data", "source1.txt") } - }, - TargetFiles = new List - { - new CorpusFile + ], + TargetFiles = + [ + new() { TextId = "textId1", Format = FileFormat.Text, Location = Path.Combine("..", "..", "..", "Services", "data", "target1.txt") } - } + ] }; - var corpora = new ReadOnlyList(new List { corpus1 }); - await env.BuildJob.RunAsync("engine1", "build1", corpora, null, default); - using (var stream = await env.SharedFileService.OpenReadAsync("builds/build1/train.src.txt")) + await env.BuildJob.RunAsync("engine1", "build1", [corpus1], null, default); + using (StreamReader reader = new(await env.SharedFileService.OpenReadAsync("builds/build1/train.src.txt"))) { - using (var reader = new StreamReader(stream)) - { - //Split yields one more segment that there are new lines; thus, the "- 1" - Assert.That(reader.ReadToEnd().Split("\n").Length - 1, Is.EqualTo(numLinesWrittenToTrain)); - } + //Split yields one more segment that there are new lines; thus, the "- 1" + Assert.That(reader.ReadToEnd().Split("\n").Length - 1, Is.EqualTo(numLinesWrittenToTrain)); } - using (var stream = await env.SharedFileService.OpenReadAsync("builds/build1/pretranslate.src.json")) + + using ( + StreamReader reader = new(await env.SharedFileService.OpenReadAsync("builds/build1/pretranslate.src.json")) + ) { - using (var reader = new StreamReader(stream)) - { - JsonArray? pretranslationJsonObject = JsonSerializer.Deserialize(reader.ReadToEnd()); - Assert.NotNull(pretranslationJsonObject); - Assert.That(pretranslationJsonObject!.ToList().Count, Is.EqualTo(numEntriesWrittenToPretranslate)); - } + JsonArray? 
pretranslationJsonObject = JsonSerializer.Deserialize(reader.ReadToEnd());
+            Assert.That(pretranslationJsonObject, Is.Not.Null);
+            Assert.That(pretranslationJsonObject, Has.Count.EqualTo(numEntriesWrittenToPretranslate));
         }
     }
@@ -98,7 +74,7 @@ public async Task BuildJobTest_Paratext(
         int numEntriesWrittenToPretranslate
     )
     {
-        using var env = new TestEnvironment();
+        using TestEnvironment env = new();
         var corpus1 = new Corpus
         {
             Id = "corpusId1",
@@ -108,43 +84,39 @@ int numEntriesWrittenToPretranslate
             TrainOnAll = false,
             PretranslateTextIds = new HashSet(),
             TrainOnTextIds = new HashSet(),
-            SourceFiles = new List
-            {
-                new CorpusFile
+            SourceFiles =
+            [
+                new()
                 {
                     TextId = "textId1",
                     Format = FileFormat.Paratext,
                     Location = Path.Combine(Path.GetTempPath(), "Project.zip")
                 }
-            },
-            TargetFiles = new List
-            {
-                new CorpusFile
+            ],
+            TargetFiles =
+            [
+                new()
                 {
                     TextId = "textId1",
                     Format = FileFormat.Paratext,
                     Location = Path.Combine(Path.GetTempPath(), "Project.zip")
                 }
-            }
+            ]
         };
-        var corpora = new ReadOnlyList(new List { corpus1 });
-        await env.BuildJob.RunAsync("engine1", "build1", corpora, buildOptions, default);
-        using (var stream = await env.SharedFileService.OpenReadAsync("builds/build1/train.src.txt"))
+        await env.BuildJob.RunAsync("engine1", "build1", [corpus1], buildOptions, default);
+        using (StreamReader reader = new(await env.SharedFileService.OpenReadAsync("builds/build1/train.src.txt")))
         {
-            using (var reader = new StreamReader(stream))
-            {
-                //Split yields one more segment that there are new lines; thus, the "- 1"
-                Assert.That(reader.ReadToEnd().Split("\n").Length - 1, Is.EqualTo(numLinesWrittenToTrain));
-            }
+            //Split yields one more segment that there are new lines; thus, the "- 1"
+            Assert.That(reader.ReadToEnd().Split("\n").Length - 1, Is.EqualTo(numLinesWrittenToTrain));
         }
-        using (var stream = await env.SharedFileService.OpenReadAsync("builds/build1/pretranslate.src.json"))
+
+        using (
+            StreamReader reader = new(await env.SharedFileService.OpenReadAsync("builds/build1/pretranslate.src.json"))
+        )
         {
-            using (var reader = new StreamReader(stream))
-            {
-                JsonArray? pretranslationJsonObject = JsonSerializer.Deserialize(reader.ReadToEnd());
-                Assert.NotNull(pretranslationJsonObject);
-                Assert.That(pretranslationJsonObject!.ToList().Count, Is.EqualTo(numEntriesWrittenToPretranslate));
-            }
+            JsonArray? pretranslationJsonObject = JsonSerializer.Deserialize(reader.ReadToEnd());
+            Assert.That(pretranslationJsonObject, Is.Not.Null);
+            Assert.That(pretranslationJsonObject, Has.Count.EqualTo(numEntriesWrittenToPretranslate));
         }
     }
@@ -161,10 +133,10 @@ public async Task BuildJobTest_Chapterlevel(
         bool throwsException = false
     )
     {
-        using var env = new TestEnvironment();
+        using TestEnvironment env = new();
         var parser = new ScriptureRangeParser();
-        Corpus corpus1 = new Corpus();
+        Corpus corpus1;
         if (throwsException)
         {
             Assert.Throws(() =>
@@ -178,32 +150,30 @@ public async Task BuildJobTest_Chapterlevel(
                     TrainOnAll = false,
                     PretranslateChapters = parser
                         .GetChapters(pretranslateBiblicalRangeChapters)
-                        .Select(kvp => (kvp.Key, kvp.Value.ToHashSet()))
-                        .ToDictionary(),
+                        .ToDictionary(kvp => kvp.Key, kvp => (IReadOnlySet)kvp.Value.ToHashSet()),
                     TrainOnChapters = parser
                         .GetChapters(trainOnBiblicalRangeChapters)
-                        .Select(kvp => (kvp.Key, kvp.Value.ToHashSet()))
-                        .ToDictionary(),
+                        .ToDictionary(kvp => kvp.Key, kvp => (IReadOnlySet)kvp.Value.ToHashSet()),
                     PretranslateTextIds = new HashSet(),
                     TrainOnTextIds = new HashSet(),
-                    SourceFiles = new List
-                    {
-                        new CorpusFile
+                    SourceFiles =
+                    [
+                        new()
                         {
                             TextId = "textId1",
                             Format = FileFormat.Paratext,
                             Location = Path.Combine(Path.GetTempPath(), "Project.zip")
                         }
-                    },
-                    TargetFiles = new List
-                    {
-                        new CorpusFile
+                    ],
+                    TargetFiles =
+                    [
+                        new()
                         {
                             TextId = "textId1",
                             Format = FileFormat.Paratext,
                             Location = Path.Combine(Path.GetTempPath(), "Project2.zip")
                         }
-                    }
+                    ]
                 };
             });
             return;
@@ -219,61 +189,54 @@ public async Task BuildJobTest_Chapterlevel(
                 TrainOnAll = false,
                 PretranslateChapters = parser
                     .GetChapters(pretranslateBiblicalRangeChapters)
-                    .Select(kvp => (kvp.Key, kvp.Value.ToHashSet()))
-                    .ToDictionary(),
+                    .ToDictionary(kvp => kvp.Key, kvp => (IReadOnlySet)kvp.Value.ToHashSet()),
                 TrainOnChapters = parser
                     .GetChapters(trainOnBiblicalRangeChapters)
-                    .Select(kvp => (kvp.Key, kvp.Value.ToHashSet()))
-                    .ToDictionary(),
+                    .ToDictionary(kvp => kvp.Key, kvp => (IReadOnlySet)kvp.Value.ToHashSet()),
                 PretranslateTextIds = new HashSet(),
                 TrainOnTextIds = new HashSet(),
-                SourceFiles = new List
-                {
-                    new CorpusFile
+                SourceFiles =
+                [
+                    new()
                     {
                         TextId = "textId1",
                         Format = FileFormat.Paratext,
                         Location = Path.Combine(Path.GetTempPath(), "Project.zip")
                     }
-                },
-                TargetFiles = new List
-                {
-                    new CorpusFile
+                ],
+                TargetFiles =
+                [
+                    new()
                     {
                         TextId = "textId1",
                         Format = FileFormat.Paratext,
                         Location = Path.Combine(Path.GetTempPath(), "Project2.zip")
                     }
-                }
+                ]
             };
         }
-        var corpora = new ReadOnlyList(new List { corpus1 });
-        await env.BuildJob.RunAsync("engine1", "build1", corpora, "{\"use_key_terms\":false}", default);
-        using (var stream = await env.SharedFileService.OpenReadAsync("builds/build1/train.src.txt"))
+        await env.BuildJob.RunAsync("engine1", "build1", [corpus1], "{\"use_key_terms\":false}", default);
+        using (StreamReader reader = new(await env.SharedFileService.OpenReadAsync("builds/build1/train.src.txt")))
         {
-            using (var reader = new StreamReader(stream))
-            {
-                //Split yields one more segment that there are new lines; thus, the "- 1"
-                string text = reader.ReadToEnd();
-                Assert.That(text.Split("\n").Length - 1, Is.EqualTo(numLinesWrittenToTrain), text);
-            }
+            //Split yields one more segment that there are new lines; thus, the "- 1"
+            string text = reader.ReadToEnd();
+            Assert.That(text.Split("\n").Length - 1, Is.EqualTo(numLinesWrittenToTrain), text);
         }
-        using (var stream = await env.SharedFileService.OpenReadAsync("builds/build1/pretranslate.src.json"))
+
+        using (Stream stream = await env.SharedFileService.OpenReadAsync("builds/build1/pretranslate.src.json"))
+        using (StreamReader reader = new(stream))
         {
-            using (var reader = new StreamReader(stream))
-            {
-                JsonArray? pretranslationJsonObject = JsonSerializer.Deserialize(reader.ReadToEnd());
-                Assert.NotNull(pretranslationJsonObject);
-                Assert.That(
-                    pretranslationJsonObject!.ToList().Count,
-                    Is.EqualTo(numEntriesWrittenToPretranslate),
-                    JsonSerializer.Serialize(pretranslationJsonObject)
-                );
-            }
+            JsonArray? pretranslationJsonObject = JsonSerializer.Deserialize(reader.ReadToEnd());
+            Assert.That(pretranslationJsonObject, Is.Not.Null);
+            Assert.That(
+                pretranslationJsonObject,
+                Has.Count.EqualTo(numEntriesWrittenToPretranslate),
+                JsonSerializer.Serialize(pretranslationJsonObject)
+            );
         }
     }
-    private class TestEnvironment : DisposableBase
+    private class TestEnvironment : ObjectModel.DisposableBase
     {
         public ISharedFileService SharedFileService { get; }
         public ICorpusService CorpusService { get; }
@@ -288,6 +251,19 @@ private class TestEnvironment : DisposableBase
         public TestEnvironment()
         {
+            if (!Sldr.IsInitialized)
+                Sldr.Initialize(offlineMode: true);
+
+            CleanupProjectFiles();
+            ZipFile.CreateFromDirectory(
+                Path.Combine("..", "..", "..", "Services", "data", "paratext"),
+                Path.Combine(Path.GetTempPath(), "Project.zip")
+            );
+            ZipFile.CreateFromDirectory(
+                Path.Combine("..", "..", "..", "Services", "data", "paratext2"),
+                Path.Combine(Path.GetTempPath(), "Project2.zip")
+            );
+
             Engines = new MemoryRepository();
             Engines.Add(
                 new TranslationEngine
@@ -297,7 +273,15 @@ public TestEnvironment()
                     SourceLanguage = "es",
                     TargetLanguage = "en",
                     BuildRevision = 1,
-                    CurrentBuild = new Build { BuildId = "build1", JobState = BuildJobState.Pending }
+                    IsModelPersisted = false,
+                    CurrentBuild = new()
+                    {
+                        BuildId = "build1",
+                        JobId = "job1",
+                        JobState = BuildJobState.Pending,
+                        JobRunner = BuildJobRunner.Hangfire,
+                        Stage = NmtBuildStages.Preprocess
+                    }
                 }
             );
             CorpusService = new CorpusService();
@@ -352,5 +336,16 @@ public TestEnvironment()
                 new LanguageTagService()
             );
         }
+
+        protected override void DisposeManagedResources()
+        {
+            CleanupProjectFiles();
+        }
+
+        private static void CleanupProjectFiles()
+        {
+            File.Delete(Path.Combine(Path.GetTempPath(), "Project.zip"));
+            File.Delete(Path.Combine(Path.GetTempPath(), "Project2.zip"));
+        }
     }
 }
diff --git a/tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs b/tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
index 38c9a61f2..dd5b2a5b0 100644
--- a/tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
+++ b/tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
@@ -215,7 +215,7 @@ public async Task GetWordGraphAsync()
         );
     }
-    private class TestEnvironment : DisposableBase
+    private class TestEnvironment : ObjectModel.DisposableBase
     {
         private readonly Hangfire.InMemory.InMemoryStorage _memoryStorage;
         private readonly BackgroundJobClient _jobClient;
@@ -234,7 +234,8 @@ public TestEnvironment()
                     EngineId = "engine1",
                     SourceLanguage = "es",
                     TargetLanguage = "en",
-                    BuildRevision = 1
+                    BuildRevision = 1,
+                    IsModelPersisted = false
                 }
             );
             TrainSegmentPairs = new MemoryRepository();
diff --git a/tests/SIL.Machine.AspNetCore.Tests/Usings.cs b/tests/SIL.Machine.AspNetCore.Tests/Usings.cs
index 4aafc901d..c4806736a 100644
--- a/tests/SIL.Machine.AspNetCore.Tests/Usings.cs
+++ b/tests/SIL.Machine.AspNetCore.Tests/Usings.cs
@@ -21,5 +21,4 @@
 global using SIL.Machine.Tokenization;
 global using SIL.Machine.Translation;
 global using SIL.Machine.Utils;
-global using SIL.ObjectModel;
 global using SIL.WritingSystems;