Merge pull request #1547 from shauheen/v07
Cherrypick for release 0.7
Showing 52 changed files with 1,302 additions and 129 deletions.
# ML.NET 0.7 Release Notes

Today we are excited to release ML.NET 0.7, which our algorithms strongly
recommend you try out! This release enables making recommendations with
matrix factorization, identifying unusual events with anomaly detection,
adding custom transformations to your ML pipeline, and more! We also have a
small surprise for those who work in teams that use both .NET and Python.
Finally, we want to thank the many new contributors to the project since the
last release!

### Installation

ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
Core 2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
for more details.

You can install the ML.NET NuGet package from the CLI using:
```
dotnet add package Microsoft.ML
```

From the NuGet Package Manager Console:
```
Install-Package Microsoft.ML
```

### Release Notes

Below are some of the highlights from this release.

* Added matrix factorization for recommendation problems
  ([#1263](https://github.com/dotnet/machinelearning/pull/1263))

    * Matrix factorization (MF) is a common approach to recommendations when
      you have data on how users rated items in your catalog. For example, you
      might know how users rated some movies and want to recommend which other
      movies they are likely to watch next.
    * ML.NET's MF uses [LIBMF](https://github.com/cjlin1/libmf).
    * Example usage of MF can be found
      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs).
      The example is general, but you can imagine that the matrix rows
      correspond to users, matrix columns correspond to movies, and matrix
      values correspond to ratings. This matrix would be quite sparse, as users
      typically rate only a small subset of the catalog.
    * Note: [ML.NET 0.3](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/release-notes/0.3/release-0.3.md)
      included Field-Aware Factorization Machines (FFM) as a learner for
      binary classification. FFM is a generalization of MF, but there are a
      few differences:
        * FFM can take advantage of information beyond the rating a user
          assigns to an item (e.g. movie genre, movie release date, user
          profile).
        * FFM is currently limited to binary classification (the ratings need
          to be converted to 0 or 1), whereas MF solves a regression problem
          (the ratings can be continuous numbers).
        * If the only information available is the user-item ratings, MF is
          likely to be significantly faster than FFM.
        * A more in-depth discussion can be found
          [here](https://www.csie.ntu.edu.tw/~cjlin/talks/recsys.pdf).

* Enabled anomaly detection scenarios
  ([#1254](https://github.com/dotnet/machinelearning/pull/1254))

    * [Anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection)
      enables identifying unusual values or events. It is used in scenarios
      such as fraud detection (identifying suspicious credit card
      transactions) and server monitoring (identifying unusual activity).
    * This release includes the following anomaly detection techniques:
      SSAChangePointDetector, SSASpikeDetector, IidChangePointDetector, and
      IidSpikeDetector.
    * Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/7fb76b026d0035d6da4d0b46bd3f2a6e3c0ce3f1/test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs).

* Enabled using ML.NET in Windows x86 apps
  ([#1008](https://github.com/dotnet/machinelearning/pull/1008))

    * ML.NET can now be used in x86 apps.
    * Some components that are based on external dependencies (e.g.
      TensorFlow) will not be available in x86. Please open an issue on GitHub
      for discussion if this blocks you.

* Added the `CustomMappingEstimator` for custom data transformations
  ([#1406](https://github.com/dotnet/machinelearning/pull/1406))

    * ML.NET has a wide variety of data transformations for pre-processing and
      featurizing data (e.g. processing text, images, categorical features,
      etc.).
    * However, there might be application-specific transformations that would
      be useful to do within an ML.NET pipeline (as opposed to in a separate
      pre-processing step). For example, calculating [cosine
      similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between two
      text columns (after featurization) or something as simple as creating a
      new column that adds the values in two other columns.
    * An example of the `CustomMappingEstimator` can be found
      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/test/Microsoft.ML.Tests/Transformers/CustomMappingTests.cs#L55).

* Consolidated several API concepts in `MLContext`
  ([#1252](https://github.com/dotnet/machinelearning/pull/1252))

    * `MLContext` replaces `LocalEnvironment` and `ConsoleEnvironment`, and
      also includes properties for ML tasks like
      `BinaryClassification`/`Regression`, various transforms/trainers, and
      evaluation. More information can be found in
      [#1098](https://github.com/dotnet/machinelearning/issues/1098).
    * Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/code/MlNetCookBook.md).

* Open sourced [NimbusML](https://github.com/microsoft/nimbusml): experimental
  Python bindings for ML.NET.

    * NimbusML makes it easy for data scientists to train models in Python and
      hand them off to .NET developers to include in their apps and services
      using ML.NET.
    * NimbusML components easily integrate into
      [scikit-learn](http://scikit-learn.org/stable/) pipelines.
    * Note that NimbusML is an experimental project without the same support
      level as ML.NET.
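
The `CustomMappingEstimator` highlight mentions computing cosine similarity between two featurized text columns, or simply adding the values of two columns. As a rough illustration of the per-row logic such a custom mapping might wrap, here is a standalone C# sketch. It does not call any ML.NET API; the helper names `CosineSimilarity` and `AddColumns` are hypothetical, and it assumes both text columns have already been featurized into `float` vectors of equal length.

```csharp
using System;

// Hypothetical per-row mapping: cosine similarity between two equal-length
// feature vectors (e.g. the featurized forms of two text columns).
static float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Cosine similarity is undefined for an all-zero vector; this sketch returns 0.
    if (normA == 0 || normB == 0)
        return 0f;

    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

// Hypothetical mapping for the simpler case: a new column holding the sum of two columns.
static float AddColumns(float x, float y) => x + y;

var features1 = new float[] { 1f, 0f, 1f };
var features2 = new float[] { 1f, 1f, 0f };
float similarity = CosineSimilarity(features1, features2);
float sum = AddColumns(2f, 3f);

Console.WriteLine($"Cosine similarity: {similarity}");
Console.WriteLine($"Sum: {sum}");
```

Returning 0 when either vector is all zeros is a choice made for this sketch, since cosine similarity is undefined in that case.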

### Acknowledgements

Shoutout to [dzban2137](https://github.com/dzban2137),
[beneyal](https://github.com/beneyal),
[pkulikov](https://github.com/pkulikov),
[amiteshenoy](https://github.com/amiteshenoy),
[DAXaholic](https://github.com/DAXaholic),
[Racing5372](https://github.com/Racing5372),
[ThePiranha](https://github.com/ThePiranha),
[helloguo](https://github.com/helloguo),
[elbruno](https://github.com/elbruno),
[harshsaver](https://github.com/harshsaver),
[f1x3d](https://github.com/f1x3d), [rauhs](https://github.com/rauhs),
[nihitb06](https://github.com/nihitb06),
[nandaleite](https://github.com/nandaleite),
[timitoc](https://github.com/timitoc),
[feiyun0112](https://github.com/feiyun0112),
[Pielgrin](https://github.com/Pielgrin),
[malik97160](https://github.com/malik97160),
[Niladri24dutta](https://github.com/Niladri24dutta),
[suhailsinghbains](https://github.com/suhailsinghbains),
[terop](https://github.com/terop), [Matei13](https://github.com/Matei13),
[JorgeAndd](https://github.com/JorgeAndd), and the ML.NET team for their
contributions as part of this release!
docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs (114 additions, 0 deletions)
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Trainers;
using System;
using System.Collections.Generic;

// NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT.
// If you change the existing content, check that the files referencing it in the XML documentation are still correct, as they reference
// line by line.
namespace Microsoft.ML.Samples.Dynamic
{
    public partial class TrainerSamples
    {
        // The following variables define the shape of a matrix. Its shape is _synthesizedMatrixRowCount-by-_synthesizedMatrixColumnCount.
        // The variable _synthesizedMatrixFirstRowIndex (and likewise _synthesizedMatrixFirstColumnIndex for columns) indicates the integer that would be mapped to the first row index. If user data uses
        // 0-based indices for rows, _synthesizedMatrixFirstRowIndex can be set to 0. Similarly, for 1-based indices, _synthesizedMatrixFirstRowIndex
        // could be 1.
        const int _synthesizedMatrixFirstColumnIndex = 1;
        const int _synthesizedMatrixFirstRowIndex = 1;
        const int _synthesizedMatrixColumnCount = 60;
        const int _synthesizedMatrixRowCount = 100;

        // A data structure used to encode a single value in a matrix.
        internal class MatrixElement
        {
            // Matrix column index starts from _synthesizedMatrixFirstColumnIndex and is at most
            // _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount - 1.
            // Contiguous=true means that all values between the min and max indexes are allowed.
            [KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
            public uint MatrixColumnIndex;
            // Matrix row index starts from _synthesizedMatrixFirstRowIndex and is at most
            // _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount - 1.
            // Contiguous=true means that all values between the min and max indexes are allowed.
            [KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
            public uint MatrixRowIndex;
            // The value at the column MatrixColumnIndex and row MatrixRowIndex.
            public float Value;
        }

        // A data structure used to encode a prediction result. Compared with MatrixElement, the field Value in MatrixElement is
        // renamed to Score because Score is the default name of matrix factorization's output.
        internal class MatrixElementForScore
        {
            [KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
            public uint MatrixColumnIndex;
            [KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
            public uint MatrixRowIndex;
            public float Score;
        }

        // This example first creates in-memory data and then uses it to train a matrix factorization model. Afterward, quality metrics are reported.
        public static void MatrixFactorizationInMemoryData()
        {
            // Create an in-memory matrix as a list of (column index, row index, value) elements.
            var dataMatrix = new List<MatrixElement>();
            for (uint i = _synthesizedMatrixFirstColumnIndex; i < _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount; ++i)
                for (uint j = _synthesizedMatrixFirstRowIndex; j < _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount; ++j)
                    dataMatrix.Add(new MatrixElement() { MatrixColumnIndex = i, MatrixRowIndex = j, Value = (i + j) % 5 });

            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
            // as a catalog of available operations and as the source of randomness.
            var mlContext = new MLContext(seed: 0, conc: 1);

            // Convert the in-memory matrix into an IDataView so that ML.NET components can consume it.
            var dataView = ComponentCreation.CreateDataView(mlContext, dataMatrix);

            // Create a matrix factorization trainer that consumes "Value" as the training label, "MatrixColumnIndex" as the
            // matrix's column index, and "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used to extract the
            // field names of the MatrixElement class.
            var pipeline = new MatrixFactorizationTrainer(mlContext, nameof(MatrixElement.Value),
                nameof(MatrixElement.MatrixColumnIndex), nameof(MatrixElement.MatrixRowIndex),
                advancedSettings: s =>
                {
                    s.NumIterations = 10;
                    s.NumThreads = 1; // To eliminate randomness, the number of threads must be 1.
                    s.K = 32;
                });

            // Train a matrix factorization model.
            var model = pipeline.Fit(dataView);

            // Apply the trained model to the training set.
            var prediction = model.Transform(dataView);

            // Calculate regression metrics for the prediction result.
            var metrics = mlContext.Regression.Evaluate(prediction,
                label: nameof(MatrixElement.Value), score: nameof(MatrixElementForScore.Score));

            // Print out some metrics for checking the model's quality.
            Console.WriteLine($"L1 - {metrics.L1}");
            Console.WriteLine($"L2 - {metrics.L2}");
            Console.WriteLine($"LossFunction - {metrics.LossFn}");
            Console.WriteLine($"RMS - {metrics.Rms}");
            Console.WriteLine($"RSquared - {metrics.RSquared}");

            // Create two entries for making predictions. The prediction value, Score, is unknown, so it is left at its default.
            // If either the row or the column index is out of range (e.g., MatrixColumnIndex=99999), the prediction value will be NaN.
            var testMatrix = new List<MatrixElementForScore>() {
                new MatrixElementForScore() { MatrixColumnIndex = 1, MatrixRowIndex = 7, Score = default },
                new MatrixElementForScore() { MatrixColumnIndex = 3, MatrixRowIndex = 6, Score = default } };

            // Again, convert the test data to a format supported by ML.NET.
            var testDataView = ComponentCreation.CreateDataView(mlContext, testMatrix);

            // Feed the test data into the model and then iterate through all predictions.
            foreach (var pred in model.Transform(testDataView).AsEnumerable<MatrixElementForScore>(mlContext, false))
                Console.WriteLine($"Predicted value at row {pred.MatrixRowIndex} and column {pred.MatrixColumnIndex} is {pred.Score}");
        }
    }
}
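
The sample above delegates the factorization to ML.NET's `MatrixFactorizationTrainer`, which is backed by LIBMF. To make the underlying idea concrete, the following standalone C# sketch (no ML.NET dependency) factors the same synthesized 100-by-60 matrix, with 1-based indices and values `(i + j) % 5`, into row factors `p` and column factors `q` using plain stochastic gradient descent, so that each value is approximated by the dot product of a row factor and a column factor. The hyperparameters `k`, `learningRate`, `lambda`, and `epochs` are illustrative assumptions rather than LIBMF defaults (the sample itself uses `K = 32` and 10 iterations), and LIBMF's actual solver is a more refined, parallel SGD method. Mirroring the sample's comment, the hypothetical `Predict` helper returns NaN for out-of-range indices.

```csharp
using System;
using System.Collections.Generic;

// Same synthetic data as the sample: 60 columns x 100 rows, 1-based indices,
// value (i + j) % 5 at column i and row j.
const int firstColumnIndex = 1, firstRowIndex = 1;
const int columnCount = 60, rowCount = 100;

var entries = new List<(int Column, int Row, float Value)>();
for (int i = firstColumnIndex; i < firstColumnIndex + columnCount; ++i)
    for (int j = firstRowIndex; j < firstRowIndex + rowCount; ++j)
        entries.Add((i, j, (i + j) % 5));

// Illustrative hyperparameters (assumptions for this sketch, not LIBMF defaults).
const int k = 8;                   // number of latent factors
const float learningRate = 0.01f;  // SGD step size
const float lambda = 0.01f;        // L2 regularization strength
const int epochs = 50;             // passes over all entries

// Row factors p and column factors q, stored 0-based (slot = index - first index),
// initialized with small random values from a fixed seed.
var random = new Random(0);
float[,] p = new float[rowCount, k];
float[,] q = new float[columnCount, k];
for (int r = 0; r < rowCount; r++)
    for (int f = 0; f < k; f++) p[r, f] = 0.1f * (float)random.NextDouble();
for (int c = 0; c < columnCount; c++)
    for (int f = 0; f < k; f++) q[c, f] = 0.1f * (float)random.NextDouble();

// Predicted value = dot product of the row factor and the column factor.
float Predict(int column, int row)
{
    int rowSlot = row - firstRowIndex, columnSlot = column - firstColumnIndex;
    // Indices outside the key range have no learned factors.
    if (rowSlot < 0 || rowSlot >= rowCount || columnSlot < 0 || columnSlot >= columnCount)
        return float.NaN;
    float dot = 0;
    for (int f = 0; f < k; f++) dot += p[rowSlot, f] * q[columnSlot, f];
    return dot;
}

// Root-mean-square error over all training entries.
double Rmse()
{
    double squaredError = 0;
    foreach (var e in entries)
    {
        double diff = e.Value - Predict(e.Column, e.Row);
        squaredError += diff * diff;
    }
    return Math.Sqrt(squaredError / entries.Count);
}

double initialRmse = Rmse();

for (int epoch = 0; epoch < epochs; epoch++)
{
    foreach (var e in entries)
    {
        int r = e.Row - firstRowIndex, c = e.Column - firstColumnIndex;
        float error = e.Value - Predict(e.Column, e.Row);
        // Gradient step on the squared error plus L2 penalty for this one entry.
        for (int f = 0; f < k; f++)
        {
            float pf = p[r, f], qf = q[c, f];
            p[r, f] += learningRate * (error * qf - lambda * pf);
            q[c, f] += learningRate * (error * pf - lambda * qf);
        }
    }
}

double finalRmse = Rmse();
Console.WriteLine($"Training RMSE: {initialRmse:F3} -> {finalRmse:F3}");
Console.WriteLine($"Prediction at row 7, column 1: {Predict(1, 7):F3}");
```

Because every value depends only on `(i + j) % 5`, the synthesized matrix has at most 5 distinct rows and therefore rank at most 5, so `k = 8` factors are enough to fit it closely. Real rating matrices are sparse, and model quality should be judged on held-out ratings rather than on the training RMSE printed here.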