-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix issue 4322, enable lda summary output #5260
Changes from all commits
a41290c
5d7bc1c
aa75df9
cd7fc3e
2a7c2d9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -5,6 +5,8 @@ | |||||
using System; | ||||||
using System.Collections.Generic; | ||||||
using System.IO; | ||||||
using System.IO.Compression; | ||||||
using System.Linq; | ||||||
using Microsoft.ML.CommandLine; | ||||||
using Microsoft.ML.Data; | ||||||
using Microsoft.ML.Internal.Utilities; | ||||||
|
@@ -1047,6 +1049,18 @@ public void SavePipeLda() | |||||
"loader=Text{col=F1V:Num:0-2}", | ||||||
"xf=Lda{col={name=Result src=F1V numtopic=3 alphasum=3 ns=3 reset=+ t=1} summary=+}", | ||||||
}, forceDense: true); | ||||||
|
||||||
// topic summary text file saved inside the model.zip file. | ||||||
string name = TestName + ".zip"; | ||||||
string modelPath = GetOutputPath("SavePipe", name); | ||||||
using (var file = Env.OpenInputFile(modelPath)) | ||||||
using (var strm = file.OpenReadStream()) | ||||||
using (var zip = new ZipArchive(strm, ZipArchiveMode.Read)) | ||||||
{ | ||||||
Comment on lines
+1052
to
+1059
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (appended to code to allow for threaded comment) Can you please also add a test to test your PR on ML.NET's API instead of MAML? machinelearning/test/Microsoft.ML.Functional.Tests/IntrospectiveTraining.cs Lines 190 to 191 in 45a16dc
Thanks.! #Resolved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great idea, I will test both MAML and from transformer. Actually test from MAML and check summary was how TLC originally test topic summary. In reply to: 446344620 [](ancestors = 446344620) |
||||||
var entry = zip.Entries.First(source => source.Name == "word_topic_summary-Result.txt"); | ||||||
Assert.True(entry != null); | ||||||
} | ||||||
|
||||||
Done(); | ||||||
} | ||||||
|
||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(appended to code to allow for threaded discussion)
Should we also add the
outputWordSummary
option to the ML.NET public API found here on the TextCatalog?:https://github.com/frank-dong-ms/machinelearning/blob/aa75df98f549741893deb3a042a5dd287a56ab99/src/Microsoft.ML.Transforms/Text/TextCatalog.cs#L597-L614 #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be break public API, I think we should avoid break public API as much as possible. User still can get summary via the public interface "GetLdaDetails" from lda transformer as this test shows
In reply to: 448136744 [](ancestors = 448136744)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I don't think it's a good idea to add this option to this public API. No need for a new API only for this.
In reply to: 448522335 [](ancestors = 448522335,448512997,448136744)