Releases: dotnet/machinelearning
ML.NET 1.5.4
New Features
- New API for exporting models to ONNX (#5544). A new API has been added to the ONNX converter that lets you specify only the output columns you care about. In many cases this exports a smaller, faster model.
Enhancements
- Performance improvement for TopK accuracy, and the Classification Evaluator now returns TopK accuracy for all K (#5395) (Thank you @jasallen)
- Update OnnxRuntime to 1.6 (#5529)
- Updated tensorflow.net to 0.20.0 (#5404)
- Added DcgTruncationLevel to the AutoML API and increased its default level to 10 (#5433)
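The TopK accuracy change is easier to picture with a small sketch. Assuming each example has one score per class, the 0-based rank of the true class can be computed once per example; a running sum over those ranks then yields top-k accuracy for every k in a single pass. This is plain C# with a hypothetical helper (`ComputeTopKForAllK`), not ML.NET's evaluator code, and the tie-breaking rule is our own assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not ML.NET's evaluator code.
// scores[i][c] is the model score for class c on example i; labels[i] is the true class.
// Returns acc where acc[k - 1] is the top-k accuracy, for every k from 1 to the class count.
static double[] ComputeTopKForAllK(double[][] scores, int[] labels)
{
    int numClasses = scores[0].Length;
    var hitsAtRank = new int[numClasses];
    for (int i = 0; i < scores.Length; i++)
    {
        double trueScore = scores[i][labels[i]];
        // 0-based rank of the true class: how many classes scored strictly higher.
        // Ties are resolved in favor of the true class (an assumption of this sketch).
        int rank = scores[i].Count(s => s > trueScore);
        hitsAtRank[rank]++;
    }

    // An example is a top-k hit for every k > rank, so a running sum
    // over ranks yields the accuracy for all k in one pass.
    var accuracy = new double[numClasses];
    int cumulative = 0;
    for (int k = 0; k < numClasses; k++)
    {
        cumulative += hitsAtRank[k];
        accuracy[k] = (double)cumulative / scores.Length;
    }
    return accuracy;
}

double[][] exampleScores =
{
    new[] { 0.1, 0.7, 0.2 },
    new[] { 0.5, 0.3, 0.2 },
    new[] { 0.2, 0.3, 0.5 },
};
int[] exampleLabels = { 1, 1, 0 };
double[] topK = ComputeTopKForAllK(exampleScores, exampleLabels);
for (int k = 1; k <= topK.Length; k++)
    Console.WriteLine($"top-{k} accuracy: {topK[k - 1]:F3}");
```

Computing the rank once per example avoids re-scanning the scores separately for each k.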
Bug Fixes
- AutoML.NET specific fixes.
- Fixed AutoFitMaxExperimentTimeTest (#5506)
- Fixed code generator tests failure (#5520)
- Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (#5445)
- Handled exception during GetNextPipeline for AutoML (#5455)
- Fixed internationalization bug (#5162) in AutoML parameter sweeping caused by culture-dependent float parsing. (#5163)
- Fixed MaxModels exit criteria for AutoML unit test (#5471)
- Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (#5548)
- Fixed bug in TensorFlow Transformer with handling primitive types (#5547)
- Fixed MLNet.CLI build error (#5546)
- Fixed memory leaks from OnnxTransformer (#5518)
- Fixed memory leak in object pool (#5521)
- Fixed Onnx Export for ProduceWordBags (#5435)
- Upgraded boundary calculation and expected value calculation in SrCnnEntireAnomalyDetector (#5436)
- Fixed SR anomaly score calculation at beginning (#5502)
- Improved error message in ColumnConcatenatingEstimator (#5444)
- Fixed issue #5020: allow ML.NET to load TensorFlow models with primitive input and output columns (#5468)
- Fixed issue #4322: enable LDA summary output (#5260)
- Fixed perf regression in ShuffleRows (#5417)
- Changed the _maxCalibrationExamples default in CalibratorUtils (#5415)
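The internationalization fix (#5163) concerns numbers that are written and read back as text under different cultures. A minimal sketch of the general remedy, assuming hyperparameter values are round-tripped as strings (the helper names are hypothetical, not AutoML.NET's), is to pin the invariant culture on both sides:

```csharp
using System;
using System.Globalization;

// Hypothetical helpers for persisting a numeric hyperparameter as text.
// Both directions pin CultureInfo.InvariantCulture, so the text always uses '.'
// as the decimal separator regardless of the machine's regional settings.
static string FormatParam(float value) => value.ToString("R", CultureInfo.InvariantCulture);

// NumberStyles.Float accepts a sign, a decimal point and an exponent but not
// thousands separators, so a culture-formatted value such as "1,5" is rejected
// instead of being silently misread.
static bool TryParseParam(string text, out float value) =>
    float.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value);

string stored = FormatParam(0.125f);
bool ok = TryParseParam(stored, out float restored);
Console.WriteLine($"{stored} -> {ok}, {restored.ToString(CultureInfo.InvariantCulture)}");
Console.WriteLine($"\"1,5\" accepted: {TryParseParam("1,5", out _)}");
```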
Build / Test updates
- Migrated to the Arcade build system, which is used by multiple .NET projects. This will give increased build/CI efficiencies going forward. Updated build instructions can be found in the docs/building folder.
- Fixed MacOS builds (#5467 and #5457)
Documentation Updates
Breaking Changes
- None
ML.NET 1.5.2
New Features
- New API and algorithms for time series data. In this release ML.NET introduces new capabilities for working with time series data.
- Ranking experiments in AutoML.NET API. ML.NET now adds support for automating ranking experiments. (#5150, #5246) Corresponding support will soon be added to Model Builder in Visual Studio.
- Cross validation support in ranking (#5263)
- CountTargetEncodingEstimator. This transformer converts a categorical column into a set of features that includes the count of each label class, the log-odds for each label class, and the back-off indicator (#4514)
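To make the CountTargetEncodingEstimator description concrete, here is a plain C# sketch of count-based target encoding over a tiny in-memory table. It is not the estimator's implementation: the smoothing prior, the back-off threshold, and the log-odds formula are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: the smoothing prior, the back-off threshold and the exact
// log-odds formula are assumptions of this sketch, not ML.NET's defaults.
const int NumClasses = 2;
const int BackoffThreshold = 2;  // categories seen fewer times than this are flagged
const double Prior = 1.0;        // additive smoothing used in the log-odds

// Hypothetical training rows: (category value, label class index).
var rows = new (string Category, int Label)[]
{
    ("a", 1), ("a", 1), ("a", 0), ("b", 0), ("c", 1),
};

// Count table: category -> number of rows with each label class.
var counts = new Dictionary<string, int[]>();
foreach (var (category, label) in rows)
{
    if (!counts.TryGetValue(category, out var perClass))
    {
        perClass = new int[NumClasses];
        counts[category] = perClass;
    }
    perClass[label]++;
}

// Features for one category: [count per class..., log-odds per class..., back-off flag].
double[] Encode(string category)
{
    if (!counts.TryGetValue(category, out var perClass))
        perClass = new int[NumClasses];  // unseen category: all counts are zero
    int total = 0;
    foreach (int c in perClass) total += c;

    var features = new double[2 * NumClasses + 1];
    for (int c = 0; c < NumClasses; c++)
    {
        features[c] = perClass[c];
        // Smoothed log-odds of class c against all other classes.
        features[NumClasses + c] = Math.Log((perClass[c] + Prior) / (total - perClass[c] + Prior));
    }
    features[2 * NumClasses] = total < BackoffThreshold ? 1 : 0;
    return features;
}

Console.WriteLine(string.Join(", ", Encode("a")));
```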
Enhancements
- Onnx Enhancements
- Support more types for ONNX export of HashEstimator (#5104)
- Added ONNX export support for NaiveCalibrator (#5289)
- Added ONNX export support for StopWordsRemovingEstimator and CustomStopWordsRemovingEstimator (#5279)
- Support onnx export with previous OpSet version (#5176)
- Added a sample for Onnx conversion (#5195)
- New features in old transformers
- Output topic summary to model file for LDATransformer (#5260)
- Use Channel Instead of BufferBlock (#5123, #5313). (Thanks @jwood803)
- Support specifying command timeout while using the database loader (#5288)
- Added cross entropy support to validation training, edited metric reporting (#5255)
- Allow TextLoader to load empty float/double fields as NaN instead of 0 (#5198)
Bug Fixes
- Changed default value of RowGroupColumnName from null to GroupId (#5290)
- Updated AveragedPerceptron default iterations from 1 to 10 (#5258)
- Properly normalize column names in Utils.GetSampleData() for duplicate cases (#5280)
- Add two-variable scenario in Tensor shape inference for TensorflowTransform (#5257)
- Fixed score column name and order bugs in CalibratorTransformer (#5261)
- Fix for conditional error in root cause analysis additions (#5269)
- Ensured Sanitized Column Names are Unique in AutoML CLI (#5177)
- Ensure that the graph is set to be the current graph when scoring with multiple models (#5149)
- Uniform onnx conversion method when using non-default column names (#5146)
- Fixed multiple issues related to splitting data. (#5227)
- Changed default NGram length from 1 to 2. (#5248)
- Improve exception msg by adding column name (#5232)
- Use model schema type instead of class definition schema (#5228)
- Use GetRandomFileName when creating random temp folder to avoid conflict (#5229)
- Filter anomalies according to boundaries under AnomalyAndMargin mode (#5212)
- Improve error message when defining custom type for variables (#5114)
- Fixed OnnxTransformer output column mapping. (#5192)
- Fixed version format of built packages (#5197)
- Improvements to "Invalid TValue" error message (#5189)
- Added IDisposable to OnnxTransformer and fixed memory leaks (#5348)
- Fixes #4392. Added AddPredictionEnginePool overload for implementation factory (#4393)
- Updated codegen to make it work with mlnet 1.5 (#5173)
- Updated codegen to support object detection scenario. (#5216)
- Fix issue #5350, check file lock before reload model (#5351)
- Improve handling of infinity values in AutoML.NET when calculating average CV metrics (#5345)
- Throw when PCA generates invalid eigenvectors (#5349)
- RobustScalingNormalizer entrypoint added (#5310)
- Replaced "whitelist" terminology with "allow list" (#5328) (Thanks @LetticiaNicoli)
- Fixed #5352, issues caused by equality with non-string values in root cause localization (#5354)
- Added catch in R^2 calculation for case with few samples (#5319)
- Added support for RankingMetrics with CrossValSummaryRunner (#5386)
Test updates
- Refactor of OnnxConversionTests.cs (#5185)
- New code coverage (#5169)
- Test fix using breast cancer dataset and test cleanup (#5292)
Documentation Updates
- Updated ORT version info for OnnxScoringEstimator (#5175)
- Updated OnnxTransformer docs (#5296)
- Improve VectorTypeAttribute(dims) docs (#5301)
Breaking Changes
- None
ML.NET 1.5.0
New Features
- New anomaly detection algorithm (#5135). ML.NET has previously supported anomaly detection through DetectAnomalyBySrCnn. This function operates in a streaming manner: it scores each arriving point by examining a window around it. Now we introduce a new function, DetectEntireAnomalyBySrCnn, that computes anomalies by considering the entire dataset and also supports setting the sensitivity and outputting margins.
- Root Cause Detection (#4925). ML.NET now also supports root cause detection for anomalies detected in time series data.
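DetectAnomalyBySrCnn and DetectEntireAnomalyBySrCnn build on the Spectral Residual (SR) technique. The sketch below shows only SR's saliency-map step applied to a whole series, assuming a naive DFT and a circular moving average for simplicity. ML.NET's detectors add scoring, thresholds, sensitivity, and margin computation on top, none of which is reproduced here.

```csharp
using System;
using System.Linq;
using System.Numerics;

// Naive O(n^2) discrete Fourier transform: sign = -1 for the forward transform,
// +1 for the (unscaled) inverse. A real implementation would use an FFT.
static Complex[] Dft(Complex[] input, int sign)
{
    int n = input.Length;
    var output = new Complex[n];
    for (int k = 0; k < n; k++)
        for (int t = 0; t < n; t++)
            output[k] += input[t] * Complex.FromPolarCoordinates(1.0, sign * 2 * Math.PI * k * t / n);
    return output;
}

// Spectral Residual saliency map: subtract the locally averaged log-amplitude
// spectrum, keep the original phase, and transform back to the time domain.
static double[] SpectralResidualSaliency(double[] values, int window = 3)
{
    int n = values.Length;
    Complex[] spectrum = Dft(Array.ConvertAll(values, x => new Complex(x, 0)), -1);
    double[] logAmp = Array.ConvertAll(spectrum, c => Math.Log(c.Magnitude + 1e-8)); // epsilon avoids log(0)

    var residual = new Complex[n];
    for (int k = 0; k < n; k++)
    {
        // Circular moving average of the log amplitude (boundary handling is a simplification).
        double avg = 0;
        for (int j = -(window / 2); j <= window / 2; j++)
            avg += logAmp[((k + j) % n + n) % n];
        avg /= window;
        residual[k] = Complex.FromPolarCoordinates(Math.Exp(logAmp[k] - avg), spectrum[k].Phase);
    }

    Complex[] back = Dft(residual, +1);
    return Array.ConvertAll(back, c => c.Magnitude / n);
}

// Hypothetical series: a flat signal with one injected spike at index 20.
var signal = new double[32];
Array.Fill(signal, 1.0);
signal[20] = 11.0;

double[] saliency = SpectralResidualSaliency(signal);
int argMax = Array.IndexOf(saliency, saliency.Max());
Console.WriteLine($"most salient index: {argMax}");
```

Because the smooth part of the spectrum is removed, the flat baseline nearly vanishes from the saliency map while the spike stands out.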
Enhancements
- Updates to TextLoader
- Onnxruntime updated to v1.3 (#5104). This brings support for additional data types for the HashingEstimator.
- Onnx export for OneHotHashEncodingTransformer and HashingTransformer (#5013, #5152, #5138)
- Support for Categorical features in CalculateFeatureContribution of LightGBM (#5018)
Bug Fixes
In this release we tracked down the bugs behind random, sporadic failures and fixed many subtle bugs. As a result, we have also re-enabled many tests, listed in the Test updates section below.
- Fixed race condition for test MulticlassTreeFeaturizedLRTest (#4950)
- Fix SsaForecast bug (#5023)
- Fixed x86 crash (#5081)
- Fixed and added unit tests for EnsureResourceAsync hanging issue (#4943)
- Added IDisposable support for several classes (#4939)
- Updated libmf and corresponding MatrixFactorizationSimpleTrainAndPredict() baselines per build (#5121)
- Fix MatrixFactorization trainer's warning (#5071)
- Update CodeGenerator's console project to netcoreapp3.1 (#5066)
- Let ImageLoadingTransformer dispose the last image it loads (#5056)
- [LightGBM] Fixed bug for empty categorical values (#5048)
- Converted potentially large variables to type long (#5041)
- Made resource downloading more robust (#4997)
- Updated MultiFileSource.Load to fix inconsistent behavior with multiple files (#5003)
- Removed WeakReference already cleaned up by GC (#4995)
- Fixed Bitmap(file) locking the file. (#4994)
- Remove WeakReference list in PredictionEnginePoolPolicy. (#4992)
- Added the assembly name of the custom transform to the model file (#4989)
- Updated constructor of ImageLoadingTransformer to accept empty imageFolder paths (#4976)
Onnx bug fixes
- ColumnSelectingTransformer now infers ONNX shape (#5079)
- Fixed KMeans scoring differences between ORT and OnnxRunner (#4942)
- CountFeatureSelectingEstimator no selection support (#5000)
- Fixes OneHotEncoding Issue (#4974)
- Fixes multiclass logistic regression (#4963)
- Adding vector tests for KeyToValue and ValueToKey (#5090)
AutoML fixes
- Handle NaN optimization metric in AutoML (#5031)
- Add projects capability in CodeGenerator (#5002)
- Simplify CodeGen - phase 2 (#4972)
- Support sweeping multiline option in AutoML (#5148)
Test updates
- Fix libomp installation for MacOS Builds(#5143, #5141)
- Addressed TensorFlow test download failures by using the resource manager with download retries (#5102)
- Adding OneHotHashEncoding Test (#5098)
- Changed Dictionary to ConcurrentDictionary (#5097)
- Added SQLite database to test loading of datasets in non-Windows builds (#5080)
- Added ability to compare configuration-specific baselines, updated baselines for many tests and re-enabled disabled tests (#5045, #5059, #5068, #5057, #5047, #5029, #5094, #5060)
- Fixed TestCancellation hanging (#4999)
- Fixed benchmark test hanging issue (#4985)
- Added working version of checking whether file is available for access (#4938)
Documentation Updates
- Update OnnxTransformer Doc XML (#5085)
- Updated build docs for .NET Core 3.1 (#4967)
- Updated OnnxScoringEstimator's documentation (#4966)
- Fix xrefs in the LDSVM trainer docs (#4940)
- Clarified parameters on time series (#5038)
- Update ForecastBySsa function specifications and add seealso (#5027)
- Add see also section to TensorFlowEstimator docs (#4941)
Breaking Changes
- None
ML.NET 1.5.0-preview2
New Features (IN-PREVIEW, please provide feedback)
- TimeSeriesImputer (#4623) This data transformer can be used to impute missing rows in time series data.
- LDSVM Trainer (#4060) The "Local Deep SVM" uses trees as its SVM kernel to create a non-linear binary trainer. A sample can be found here.
- Onnxruntime updated to v1.2. This also includes support for GPU execution of ONNX models.
- Export-to-ONNX for below components:
Bug Fixes
- Fix issue in WaiterWaiter caused by race condition (#4829)
- Onnx Export change to allow for running inference on multiple rows in OnnxRuntime (#4783)
- Data splits to default to MLContext seed when not specified (#4764)
- Add Seed property to MLContext and use as default for data splits (#4775)
- Onnx bug fixes
- Updating onnxruntime version (#4882)
- Calculate ReduceSum row by row in ONNX model from OneVsAllTrainer (#4904)
- Several onnx export fixes related to KeyToValue and ValueToKey transformers (#4900, #4866, #4841, #4889, #4878, #4797)
- Fixes to onnx export for text related transforms (#4891, #4813)
- Fixed bugs in OptionalColumnTransform and ColumnSelecting (#4887, #4815)
- Alternate solution for ColumnConcatenatingTransformer (#4875)
- Added slot names support for OnnxTransformer (#4857)
- Fixed output schema of OnnxTransformer (#4849)
- Changed Binarizer node to be cast to the type of the predicted label … (#4818)
- Fix for OneVersusAllTrainer (#4698)
- Enable OnnxTransformer to accept KeyDataViewTypes as if they were UInt32 (#4824)
- Fix off by 1 error with the cats_int64s attribute for the OneHotEncoder ONNX operator (#4827)
- Updated handling of missing values with LightGBM, and added ability to use (0) as missing value (#4695)
- Double cast to float for some onnx estimators (#4745)
- Fix onnx output name for GcnTransform (#4786)
- Added support to run PFI on uncalibrated binary classification models (#4587)
- Fix bug in WordBagEstimator when training on empty data (#4696)
- Added Cancellation mechanism to Image Classification (through the experimental nuget) (fixes #4632) (#4650)
- Changed F1 score to return 0 instead of NaN when Precision + Recall is 0 (#4674)
- TextLoader, BinaryLoader and SvmLightLoader now check the existence of the input file before training (#4665)
- ImageLoadingTransformer now checks the existence of input folder before training (#4691)
- Use random file name for AutoML experiment folder (#4657)
- Use invariant culture when converting to string (#4635)
- Fix NullReferenceException when it comes to Recommendation in AutoML and CodeGenerator (#4774)
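The F1 change (#4674) amounts to a guard on the harmonic-mean formula. A minimal sketch, using a hypothetical `F1Score` helper rather than ML.NET's evaluator code:

```csharp
using System;

// F1 is the harmonic mean of precision and recall: 2PR / (P + R).
// When P + R == 0 the formula is 0 / 0 (NaN); this sketch reports 0 instead.
static double F1Score(double precision, double recall)
{
    double denominator = precision + recall;
    return denominator == 0 ? 0.0 : 2 * precision * recall / denominator;
}

Console.WriteLine(F1Score(0.5, 0.25));
Console.WriteLine(F1Score(0.0, 0.0));
```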
Enhancements
- Added in support for System.DateTime type for the DateTimeTransformer (#4661)
- Additional changes to ExpressionTransformer (#4614)
- Optimize generic MethodInfo for Func (#4588)
- Data splits to default to MLContext seed when not specified (#4764)
- Added in DateTime type support for TimeSeriesImputer (#4812)
Test updates
- Code analysis updates
- Update analyzer test library (#4740)
- Enable the internal code analyzer for test projects (#4731)
- Implement MSML_ExtendBaseTestClass (Test classes should be derived from BaseTestClass) (#4746)
- Enable MSML_TypeParamName for the full solution (#4762)
- Enable MSML_ParameterLocalVarName for the full solution (#4833)
- Enable MSML_SingleVariableDeclaration for the full solution (#4765)
- Better logging from tests
- Enable Conditional Numerical Reproducibility for tests (#4569)
- Changed all MLContext creation to include a fixed seed (#4736)
- Fix incorrect SynchronizationContext use in TestSweeper (#4779)
Documentation Updates
- Update cookbook to latest API (#4706)
- Update documentation to stop mentioning interfaces that no longer exist (#4673)
- Roadmap update (#4704)
- Added release process documentation to README.md (#4402)
- Fix documentation of SvmLightLoader (#4616)
- Correct KMeans scoring function doc (#4705)
- Several typo fixes thanks to @MaherJendoubi (#4627, #4631, #4626, #4617, #4633, #4629, #4642)
- Other typo fixes: (#4628, #4685, #4885)
Breaking Changes
- None
ML.NET 1.5.0-preview
New Features (IN-PREVIEW, please provide feedback)
- Export-to-ONNX for below components:
- WordTokenizingTransformer (#4451)
- NgramExtractingTransformer (#4451)
- OptionalColumnTransform (#4454)
- KeyToValueMappingTransformer (#4455)
- LbfgsMaximumEntropyMulticlassTrainer (#4462)
- LightGbmMulticlassTrainer (#4462)
- LightGbmMulticlassTrainer with SoftMax (#4462)
- OneVersusAllTrainer (#4462)
- SdcaMaximumEntropyMulticlassTrainer (#4462)
- SdcaNonCalibratedMulticlassTrainer (#4462)
- CopyColumn Transform (#4486)
- PriorTrainer (#4515)
- DateTime Transformer (#4521)
- Loader and Saver for SVMLight file format (#4190)
Sample
- Expression transformer (#4548)
The expression transformer takes an expression written as text in a simple expression language and performs the operation it defines on the input columns of each row of the data. The transformer supports a vector input column, in which case it applies the expression to each slot of the vector independently. The expression language is extensible with user-defined operations.
Sample
Bug Fixes
- Fix using permutation feature importance with Binary Prediction Transformer and CalibratedModelParametersBase loaded from disk. (#4306)
- Fixed model saving and loading of OneVersusAllTrainer to include SoftMax. (#4472)
- Ignore hidden columns in AutoML schema checks of validation data. (#4490)
- Ensure BufferBlocks are completed and empty in RowShufflingTransformer. (#4479)
- Create methods not being called when loading models from disk. (#4485)
- Fixes onnx exports for binary classification trainers. (#4463)
- Make PredictionEnginePool.GetPredictionEngine thread safe. (#4570)
- Memory leak when using FeaturizeText transform. (#4576)
- System.ArgumentOutOfRangeException issue in CustomStopWordsRemovingTransformer. (#4592)
- Image Classification low accuracy on EuroSAT Dataset. (#4522)
Stability fixes by Sam Harwell
- Prevent exceptions from escaping FileSystemWatcher events. (#4535)
- Make local functions static where applicable. (#4530)
- Disable CS0649 in OnnxConversionTest. (#4531)
- Make test methods public. (#4532)
- Conditionally compile helper code. (#4534)
- Avoid running API Compat for design time builds. (#4529)
- Pass by reference when null is not expected. (#4546)
- Add Xunit.Combinatorial for test projects. (#4545)
- Use Theory to break up tests in OnnxConversionTest. (#4533)
- Update code coverage integration. (#4543)
- Use std::unique_ptr for objects in LdaEngine. (#4547)
- Enable VSTestBlame to show details for crashes. (#4537)
- Use std::unique_ptr for samplers_ and likelihood_in_iter_. (#4551)
- Add tests for IParameterValue implementations. (#4549)
- Convert LdaEngine to a SafeHandle. (#4538)
- Create SafeBoosterHandle and SafeDataSetHandle. (#4539)
- Add IterationDataAttribute. (#4561)
- Add tests for ParameterSet equality. (#4550)
- Add a test handler for AppDomain.UnhandledException. (#4557)
Breaking Changes
None
Enhancements
- Hash Transform API that takes in advanced options. (#4443)
- Image classification performance improvements and option to create validation set from train set. (#4522)
- Upgraded OnnxRuntime to v1.0 and Google Protobuf to 3.10.1. (#4416)
CLI and AutoML API
- None.
Remarks
- Thank you, Sam Harwell, for making a series of stability fixes that have substantially increased the stability of our build CI.
ML.NET 1.4.0
New Features
- General Availability of Image Classification API
Introduces the Microsoft.ML.Vision package, which enables image classification by leveraging an existing pre-trained deep neural network model. The API trains the last classification layer using TensorFlow through its C# bindings from TensorFlow.NET. This is a high-level API that is simple yet powerful. Below are some of the key features:
- GPU training: Supported on Windows and Linux, more information here.
- Early stopping: Saves time by stopping training automatically when the model has stabilized.
- Learning rate scheduler: The learning rate is an integral and potentially difficult part of deep learning. Learning rate schedulers let users start with a high initial learning rate that decays over time. A high initial learning rate introduces randomness into training, which helps the optimizer find a better minimum of the loss function, while the decayed learning rate helps stabilize the loss over time. We have implemented the Exponential Decay and Polynomial Decay learning rate schedulers.
- Pre-trained DNN architectures: The supported DNN architectures used internally for transfer learning are:
- Inception V3
- ResNet V2 101
- ResNet V2 50
- MobileNet V2
Example code:
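```csharp
// Requires the Microsoft.ML and Microsoft.ML.Vision packages.
// "Image" holds the raw image bytes; "Label" must already be key-typed (e.g. via MapValueToKey).
var pipeline = mlContext.MulticlassClassification.Trainers.ImageClassification(
    featureColumnName: "Image",
    labelColumnName: "Label");
ITransformer trainedModel = pipeline.Fit(trainDataView);
```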
Samples
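The two learning rate schedulers mentioned for the Image Classification API are commonly defined as below. This sketch uses the TensorFlow-style formulas for exponential and polynomial decay; we assume ML.NET's schedulers follow similar definitions, but the helper names, parameter names, and defaults here are hypothetical.

```csharp
using System;

// TensorFlow-style exponential decay: lr0 * decayRate^(step / decaySteps).
// With staircase = true the exponent is floored, so the rate drops in discrete steps.
static double ExponentialDecay(double lr0, int step, int decaySteps, double decayRate, bool staircase = false)
{
    double exponent = (double)step / decaySteps;
    if (staircase) exponent = Math.Floor(exponent);
    return lr0 * Math.Pow(decayRate, exponent);
}

// TensorFlow-style polynomial decay from lr0 down to endLr over decaySteps
// (no cycling): after decaySteps the rate stays at endLr.
static double PolynomialDecay(double lr0, int step, int decaySteps, double endLr = 1e-4, double power = 1.0)
{
    int clamped = Math.Min(step, decaySteps);
    return (lr0 - endLr) * Math.Pow(1 - (double)clamped / decaySteps, power) + endLr;
}

for (int step = 0; step <= 40; step += 10)
    Console.WriteLine($"step {step}: exponential={ExponentialDecay(0.01, step, 10, 0.5):G4} polynomial={PolynomialDecay(0.01, step, 40):G4}");
```

Exponential decay never reaches zero and halves the rate every `decaySteps` here, while polynomial decay reaches a fixed floor (`endLr`) after a known number of steps.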
- General Availability of Database Loader
The database loader lets you load data from databases into an IDataView, and therefore enables model training directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, etc. It is important to highlight that, just as when training from files, ML.NET also supports data streaming when training with a database. The whole database doesn't need to fit into memory: the loader reads from the database as needed, so you can handle very large databases (e.g. 50 GB, 100 GB or larger).
Example code:
```csharp
using System.Data.Common;
using Microsoft.ML;
using Microsoft.ML.Data;

// Lines of code for loading data from a database into an IDataView for later model training
// ...
string connectionString = @"Data Source=YOUR_SERVER;Initial Catalog=YOUR_DATABASE;Integrated Security=True";
string commandText = "SELECT * from SentimentDataset";

// Columns are mapped from the public fields of SentimentData.
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<SentimentData>();

// On .NET Core the provider must be registered before GetFactory can find it, e.g.
// DbProviderFactories.RegisterFactory("System.Data.SqlClient", SqlClientFactory.Instance);
DbProviderFactory providerFactory = DbProviderFactories.GetFactory("System.Data.SqlClient");
DatabaseSource dbSource = new DatabaseSource(providerFactory, connectionString, commandText);
IDataView trainingDataView = loader.Load(dbSource);

// ML.NET model training code using the training IDataView
// ...

public class SentimentData
{
    public string FeedbackText;
    public string Label;
}
```
- General Availability of PredictionEnginePool for scalable deployment
When deploying an ML model into multi-threaded and scalable .NET Core web applications and services (such as ASP.NET Core web apps, WebAPIs or an Azure Function), it is recommended to use the PredictionEnginePool instead of directly creating the PredictionEngine object on every request, for performance and scalability reasons. For further background information on why the PredictionEnginePool is recommended, read this blog post.
- General Availability of Enhanced .NET Core 3.0 Support
This means ML.NET can take advantage of new features when running in a .NET Core 3.0 application. The first new feature we are using is hardware intrinsics, which allow .NET code to accelerate math operations by using processor-specific instructions.
Bug Fixes
- Adds a meaningful exception when the user tries to use the OnnxSequenceType attribute without specifying the sequence type. (#4272)
- Image Classification API: Fix processing of incomplete batches (< batchSize), fix images processed per epoch, and enable EarlyStopping without a validation set. (#4289)
- Exception is thrown if NDCG > 10 is used with LightGbm for evaluating ranking. (#4081)
- DatabaseLoader error when using attributes (e.g. ColumnName). (#4308)
- Recommendation experiment got SMAC local search exception during training. (#4358)
- TensorFlow exception triggered: input ended unexpectedly in the middle of a field. (#4314)
- PredictionEngine breaks after saving/loading a model. (#4321)
- Data file locked even after TextLoader goes out of context. (#4404)
- ImageClassification API should save cache files/meta files in user temp directory or user provided workspace path. (#4410)
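The NDCG issue (#4081) and the DcgTruncationLevel setting both refer to computing DCG only over the top k ranked results. A minimal sketch using the common definition (exponential gain 2^rel - 1, log2 position discount); ML.NET's evaluator may use different gain values or log base, so treat this as illustrative, and the helper names are hypothetical.

```csharp
using System;
using System.Linq;

// DCG@k with exponential gain (2^rel - 1) and a log2(position + 1) discount,
// where position is 1-based; only the first k results contribute.
static double DcgAtK(int[] relevances, int k)
{
    double dcg = 0;
    for (int i = 0; i < Math.Min(k, relevances.Length); i++)
        dcg += (Math.Pow(2, relevances[i]) - 1) / Math.Log2(i + 2);
    return dcg;
}

// NDCG@k: DCG of the ranking divided by the DCG of the ideal (sorted) ranking.
static double NdcgAtK(int[] relevancesInRankedOrder, int k)
{
    double ideal = DcgAtK(relevancesInRankedOrder.OrderByDescending(r => r).ToArray(), k);
    return ideal == 0 ? 0 : DcgAtK(relevancesInRankedOrder, k) / ideal;
}

int[] ranked = { 0, 3, 2, 1 };  // relevance labels in the order the model ranked the documents
for (int k = 1; k <= ranked.Length; k++)
    Console.WriteLine($"NDCG@{k}: {NdcgAtK(ranked, k):F4}");
```

A larger truncation level rewards models that also order the lower part of the list correctly, at the cost of diluting the weight of the top positions.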
Breaking Changes
None
Enhancements
- Publish latest nuget to public feed from master branch when commits are made. (#4406)
- Defaults for ImageClassification API. (#4415)
CLI and AutoML API
- Recommendation Task. (#4246, #4391)
- Image Classification Task. (#4395)
- Move AutoML CodeGen to master from feature branch. (#4365)
Remarks
- None.
ML.NET 1.4.0-preview2
New Features
- Deep Neural Networks Training (0.16.0-preview2)
Improves the in-preview ImageClassification API further:
- Early stopping feature stops the training when optimal accuracy is reached (#4237)
- Enables inferencing on in-memory images (#4242)
- PredictedLabel output column now contains actual class labels instead of uint32 class index values (#4228)
- GPU support on Windows and Linux (#4270, #4277)
- Upgraded TensorFlow.NET version to 0.11.3 (#4205)
In-memory image inferencing sample
Early stopping sample
GPU samples
- New ONNX Exporters (1.4.0-preview2)
Bug Fixes
- OnnxSequenceType and ColumnName attributes together don't work (#4187)
- Fix memory leak in TensorflowTransformer (#4223)
- Enable permutation feature importance to be used with model loaded from disk (#4262)
- IsSavedModel returns true when the loaded TensorFlow model is a frozen model (#4262)
- Exception when using the OnnxSequenceType attribute directly without specifying the sequence type (#4272, #4297)
Samples
- TensorFlow full model retrain sample (#4127)
Breaking Changes
None.
Obsolete API
Enhancements
- Improve exception message in LightGBM (#4214)
- FeaturizeText should allow only outputColumnName to be defined (#4211)
- Fix NgramExtractingTransformer GetSlotNames to not allocate a new delegate on every invoke (#4247)
- Resurrect broken code coverage build and re-enable code coverage for pull request (#4261)
- NimbusML entrypoint for permutation feature importance (#4232)
- Reuse memory when copying outputs from TensorFlow graph (#4260)
- DateTime to DateTime standard conversion (#4273)
- CodeCov version upgraded to 1.7.2 (#4291)
CLI and AutoML API
None.
Remarks
None.
ML.NET 1.4.0-preview
New Features
- Deep Neural Networks Training (0.16.0-preview) (#4151)
Improves the in-preview ImageClassification API further:
- Increases DNN training speed by ~10x compared to the same API in the 0.15.1 release.
- Prevents repeated computations by caching featurized image values to disk from intermediate layers to train the final fully-connected layer.
- Reduced and constant memory footprint.
- Simplifies the API by not requiring the user to pre-process the image.
- Introduces callback to provide metrics during training such as accuracy, cross-entropy.
- Improved image classification sample.
```csharp
public static ImageClassificationEstimator ImageClassification(
    this ModelOperationsCatalog catalog,
    string featuresColumnName,
    string labelColumnName,
    string scoreColumnName = "Score",
    string predictedLabelColumnName = "PredictedLabel",
    Architecture arch = Architecture.InceptionV3,
    int epoch = 100,
    int batchSize = 10,
    float learningRate = 0.01f,
    ImageClassificationMetricsCallback metricsCallback = null,
    int statisticFrequency = 1,
    DnnFramework framework = DnnFramework.Tensorflow,
    string modelSavePath = null,
    string finalModelPrefix = "custom_retrained_model_based_on_",
    IDataView validationSet = null,
    bool testOnTrainSet = true,
    bool reuseTrainSetBottleneckCachedValues = false,
    bool reuseValidationSetBottleneckCachedValues = false,
    string trainSetBottleneckCachedValuesFilePath = "trainSetBottleneckFile.csv",
    string validationSetBottleneckCachedValuesFilePath = "validationSetBottleneckFile.csv")
```
- Database Loader (0.16.0-preview) (#4070, #4091, #4138)
Additional DatabaseLoader support:
- Support DBNull.
- Add CreateDatabaseLoader<TInput> to map columns from a .NET Type.
- Read multiple columns into a single vector.
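```csharp
using System.Data.SqlClient;   // SqlClientFactory (System.Data.SqlClient package)
using Microsoft.ML;
using Microsoft.ML.Data;

string connectionString = "YOUR_RELATIONAL_DATABASE_CONNECTION_STRING";
string commandText = "SELECT * from URLClicks";

// Columns are mapped from the UrlClick class (not shown here).
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<UrlClick>();
DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, commandText);
IDataView dataView = loader.Load(dbSource);
```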
- Enhanced .NET Core 3.0 Support
- Use C# hardware intrinsics detection to support AVX, SSE and software fallbacks
- Allows for faster training on AVX-supported machines
- Allows for scoring core ML.NET models on ARM processors. (Note: some components do not support ARM yet, e.g. FastTree, LightGBM, OnnxTransformer)
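The hardware intrinsics support follows a common .NET pattern: query `IsSupported` at run time, use the vectorized path when it is available, and fall back to scalar code otherwise. A minimal sketch of that pattern with a hypothetical `Sum` helper (only an AVX path plus a scalar fallback; ML.NET's own CPU math code also has SSE paths and is not reproduced here):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Sums a span of floats, using 256-bit AVX instructions when the CPU reports support
// and a plain scalar loop otherwise (e.g. on ARM, where Avx.IsSupported is false).
static float Sum(ReadOnlySpan<float> values)
{
    int i = 0;
    float total = 0f;
    if (Avx.IsSupported)
    {
        var acc = Vector256<float>.Zero;
        for (; i + 8 <= values.Length; i += 8)
        {
            // Reinterpret 8 consecutive floats as one 256-bit vector and add lane-wise.
            var chunk = MemoryMarshal.Read<Vector256<float>>(MemoryMarshal.AsBytes(values.Slice(i, 8)));
            acc = Avx.Add(acc, chunk);
        }
        // Horizontal sum of the 8 lanes.
        for (int lane = 0; lane < Vector256<float>.Count; lane++)
            total += acc.GetElement(lane);
    }
    // Scalar fallback; also handles the tail that does not fill a whole vector.
    for (; i < values.Length; i++)
        total += values[i];
    return total;
}

float[] data = new float[20];
for (int j = 0; j < data.Length; j++) data[j] = j + 1;
Console.WriteLine($"AVX supported: {Avx.IsSupported}, sum = {Sum(data)}");
```

The same binary runs on any processor: the JIT treats `Avx.IsSupported` as a constant, so the unused branch is effectively removed.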
Bug Fixes
None.
Samples
- DeepLearning Image Classification Training sample (DNN Transfer Learning) (#633)
- DatabaseLoader sample loading an IDataView from SQL Server localdb (#611)
Breaking Changes
None
Enhancements
None.
CLI and AutoML API
- AutoML codebase has moved from feature branch to master branch (#3882).
Remarks
None.
ML.NET 1.3.1
New Features
- Deep Neural Networks Training (PREVIEW) (#4057)
Introduces the in-preview 0.15.1 Microsoft.ML.DNN package, which enables full DNN model retraining and transfer learning in .NET using the C# bindings for TensorFlow provided by TensorFlow.NET. The goal of this package is to allow high-level DNN training and scoring tasks such as image classification, text classification, object detection, etc. using simple yet powerful APIs that are framework agnostic, although they currently use only TensorFlow as the backend. The APIs below are in early preview, and we hope to get customer feedback that we can incorporate in the next iteration.
```csharp
public static DnnEstimator RetrainDnnModel(
    this ModelOperationsCatalog catalog,
    string[] outputColumnNames,
    string[] inputColumnNames,
    string labelColumnName,
    string tensorFlowLabel,
    string optimizationOperation,
    string modelPath,
    int epoch = 10,
    int batchSize = 20,
    string lossOperation = null,
    string metricOperation = null,
    string learningRateOperation = null,
    float learningRate = 0.01f,
    bool addBatchDimensionInput = false,
    DnnFramework dnnFramework = DnnFramework.Tensorflow)

public static DnnEstimator ImageClassification(
    this ModelOperationsCatalog catalog,
    string featuresColumnName,
    string labelColumnName,
    string outputGraphPath = null,
    string scoreColumnName = "Score",
    string predictedLabelColumnName = "PredictedLabel",
    string checkpointName = "_retrain_checkpoint",
    Architecture arch = Architecture.InceptionV3,
    DnnFramework dnnFramework = DnnFramework.Tensorflow,
    int epoch = 10,
    int batchSize = 20,
    float learningRate = 0.01f,
    bool measureTrainAccuracy = false)
```
- Database Loader (PREVIEW) (#4035)
Introduces a database loader that enables training on databases. This loader supports any relational database supported by System.Data in .NET Framework or .NET Core, meaning that you can use many RDBMS such as SQL Server, Azure SQL Database, Oracle, PostgreSQL, MySQL, etc. This feature is in early preview and can be accessed via the Microsoft.ML.Experimental NuGet package.
```csharp
public static DatabaseLoader CreateDatabaseLoader(this DataOperationsCatalog catalog, params DatabaseLoader.Column[] columns)
```
Bug Fixes
Serious
- SaveOnnxCommand appears to ignore predictors when saving a model to ONNX format: This broke export to ONNX functionality. (#3974)
- Unable to use fasterrcnn onnx model. (#3963)
- PredictedLabel is always true for Anomaly Detection: This bug disabled scenarios like fraud detection using binary classification/PCA. (#4039)
- Update build certifications: This bug broke the official builds because outdated certificates were being used. (#4059)
Other
- Stop LightGbm Warning for Default Metric Input: Fixes the LightGBM warning "Unknown parameter metric=" that was produced when the default metric is used. (#3965)
Samples
Breaking Changes
None
Enhancements
CLI and AutoML API
- Bug fixes.
Remarks
- "Machine Learning at Microsoft with ML.NET" was presented in the KDD 2019 Proceedings.
ML.NET v1.2.0
General Availability
- Microsoft.ML.TimeSeries
- Anomaly detection algorithms (Spike and Change Point):
- Independent and identically distributed.
- Singular spectrum analysis.
- Spectral residual from Azure Anomaly Detector/Kensho team.
- Forecasting models:
- Singular spectrum analysis.
- Prediction Engine for online learning
- Enables updating time series model with new observations at scoring so that the user does not have to re-train the time series with old data each time.
- Microsoft.ML.OnnxTransformer
Enables scoring of ONNX models in the learning pipeline. Uses ONNX Runtime v0.4.
- Microsoft.ML.TensorFlow
Enables scoring of TensorFlow models in the learning pipeline. Uses TensorFlow v1.13. Very useful for image and text classification: users can featurize images or text using DNN models and feed the result into a classical machine learning model like a decision tree or logistic regression trainer.
New Features
- Tree-based featurization (#3812)
Generating features using tree structure has been a popular technique in data mining. It is useful for capturing feature interactions when creating a stacked model, for dimensionality reduction, or for featurizing towards an alternative label. ML.NET's tree featurization trains a tree-based model and then maps an input feature vector to several non-linear feature vectors. Those generated feature vectors are:
- The leaves it falls into: a binary vector with ones at the indexes of the reached leaves,
- The paths that the input vector passes through before hitting the leaves, and
- The values of the reached leaves.
Here are some references:
- p. 9 (a Kaggle solution adopted by FB below).
- Section 3. (Facebook)
- Section of Entity-level personalization with GLMix. (LinkedIn)
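A minimal sketch of the three tree-featurization outputs for a single tree, assuming a hypothetical array encoding where a negative child index denotes a leaf as `~leafIndex`. ML.NET's featurizer works on a trained ensemble and concatenates these vectors across trees; this only shows the per-tree idea, and the tree itself is made up.

```csharp
using System;

// Hypothetical single regression tree, encoded as parallel arrays (an assumption
// of this sketch). Internal node i tests x[splitFeature[i]] <= threshold[i];
// a negative child index c refers to leaf ~c.
int[] splitFeature = { 0, 1 };
float[] threshold = { 0.5f, 1.0f };
int[] lteChild = { 1, ~0 };
int[] gtChild = { ~2, ~1 };
float[] leafValue = { -1.0f, 0.5f, 2.0f };

(float[] Leaves, float[] Paths, float LeafValue) Featurize(float[] x)
{
    var leaves = new float[leafValue.Length];
    var paths = new float[splitFeature.Length];
    int node = 0;
    while (node >= 0)
    {
        paths[node] = 1f;  // mark every internal node the input passes through
        node = x[splitFeature[node]] <= threshold[node] ? lteChild[node] : gtChild[node];
    }
    int leaf = ~node;
    leaves[leaf] = 1f;     // one-hot indicator of the reached leaf
    return (leaves, paths, leafValue[leaf]);
}

var (leafHits, pathHits, value) = Featurize(new[] { 0.2f, 3.0f });
Console.WriteLine($"leaves=[{string.Join(", ", leafHits)}] paths=[{string.Join(", ", pathHits)}] value={value}");
```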
- Microsoft.Extensions.ML integration package. (#3827)
This package makes it easier to use ML.NET with app models that support Microsoft.Extensions, e.g. ASP.NET and Azure Functions.
Specifically it contains functionality for:
- Dependency Injection
- Pooling PredictionEngines
- Reloading models when the file or URI has changed
- Hooking ML.NET logging to Microsoft.Extensions.Logging
Bug Fixes
Serious
- Time series Sequential Transform needs to have a binding mechanism: This bug made it impossible to use time series in NimbusML. (#3875)
- Build errors resulting from upgrading to VS2019 compilers: The default CMAKE_C_FLAG for debug configuration sets /ZI to generate a PDB capable of edit and continue. In the new compilers, this is incompatible with /guard:cf, which we set for security reasons. (#3894)
- LightGBM Evaluation metric parameters: In LightGbm, if a user specified EvaluateMetricType.Default, the metric was not added to the options Dictionary, and LightGbmWrappedTraining would throw because of that. (#3815)
- Change default EvaluationMetric for LightGbm: In ML.NET, the default EvaluationMetric for LightGbm was set to EvaluateMetricType.Error for multiclass, EvaluationMetricType.LogLoss for binary, etc. This led to inconsistent behavior from the user's perspective. (#3859)
Other
- CustomGains should allow multiple values in argument attribute. (#3854)
Breaking Changes
None
Enhancements
- Fixes the hardcoded Sigmoid value from -0.5 to the value specified during training. (#3850)
- Fix TextLoader constructor and add exception message. (#3788)
- Introduce the FixZero argument to the LogMeanVariance normalizer. (#3916)
- Ensemble trainers now work with ITrainerEstimators instead of ITrainers. (#3796)
- LightGBM Unbalanced Data Argument. (#3925)
- Tree-based trainers implement ICanGetSummaryAsIDataView. (#3892)
CLI and AutoML API
Documentation and Samples
- Samples for applying ONNX model to in-memory images. (#3851)
- Reformatted all ~200 samples to 85 character width so the horizontal scrollbar does not appear on docs webpage. (#3930, #3941, #3949, #3950, #3947, #3943, #3942, #3946, #3948)
Remarks
- Roughly 200 GitHub issues were closed; the number of open issues decreased from ~550 to 351. Most of the issues were resolved by the release of a stable API and the availability of samples.