Merge pull request #1547 from shauheen/v07
Cherrypick for release 0.7
shauheen authored Nov 6, 2018
2 parents c5cef31 + 86088a1 commit abba48a
Showing 52 changed files with 1,302 additions and 129 deletions.
143 changes: 143 additions & 0 deletions docs/release-notes/0.7/release-0.7.md
@@ -0,0 +1,143 @@
# ML.NET 0.7 Release Notes

Today we are excited to release ML.NET 0.7, which our algorithms strongly
recommend you try out! This release enables making recommendations with
matrix factorization, identifying unusual events with anomaly detection,
adding custom transformations to your ML pipeline, and more! We also have a
small surprise for those who work in teams that use both .NET and Python.
Finally, we want to thank the many new contributors to the project since the
last release!

### Installation

ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
Core
2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
for more details.

You can install the ML.NET NuGet package from the .NET Core CLI using:
```
dotnet add package Microsoft.ML
```

From package manager:
```
Install-Package Microsoft.ML
```

### Release Notes

Below are some of the highlights from this release.

* Added matrix factorization for recommendation problems
  ([#1263](https://github.com/dotnet/machinelearning/pull/1263))

    * Matrix factorization (MF) is a common approach to recommendations when
      you have data on how users rated items in your catalog. For example, you
      might know how users rated some movies and want to recommend which other
      movies they are likely to watch next.
    * ML.NET's MF uses [LIBMF](https://github.com/cjlin1/libmf).
    * Example usage of MF can be found
      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs).
      The example is general but you can imagine that the matrix rows
      correspond to users, matrix columns correspond to movies, and matrix
      values correspond to ratings. This matrix would be quite sparse as users
      have only rated a small subset of the catalog.
    * Note: [ML.NET
      0.3](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/release-notes/0.3/release-0.3.md)
      included Field-Aware Factorization Machines (FFM) as a learner for
      binary classification. FFM is a generalization of MF, but there are a
      few differences:
        * FFM enables taking advantage of other information beyond the rating
          a user assigns to an item (e.g. movie genre, movie release date,
          user profile).
        * FFM is currently limited to binary classification (the ratings need
          to be converted to 0 or 1), whereas MF solves a regression problem
          (the ratings can be continuous numbers).
        * If the only information available is the user-item ratings, MF is
          likely to be significantly faster than FFM.
    * A more in-depth discussion can be found
      [here](https://www.csie.ntu.edu.tw/~cjlin/talks/recsys.pdf).
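
To make the idea behind matrix factorization concrete, here is a minimal, self-contained sketch that trains latent factors with plain stochastic gradient descent. It is not the LIBMF or ML.NET implementation: the class `MatrixFactorizationSketch`, its method names, the sample ratings, and the hyperparameter values are all hypothetical and chosen only for illustration. Each known rating is approximated by the dot product of a row (user) vector and a column (movie) vector.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a tiny SGD-based matrix factorization, not LIBMF or ML.NET code.
public static class MatrixFactorizationSketch
{
    // Learns k-dimensional factors so that Dot(P[row], Q[col]) approximates each observed value.
    public static (double[][] P, double[][] Q) Train(
        IReadOnlyList<(int Row, int Col, double Value)> ratings,
        int rowCount, int colCount, int k = 2, int epochs = 500,
        double learningRate = 0.02, double lambda = 0.01)
    {
        var rng = new Random(0); // Fixed seed (an assumption) so runs are repeatable.
        double[][] p = Init(rowCount, k, rng), q = Init(colCount, k, rng);
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            foreach (var (row, col, value) in ratings)
            {
                double error = value - Dot(p[row], q[col]);
                for (int f = 0; f < k; f++)
                {
                    double pf = p[row][f], qf = q[col][f];
                    // Gradient step on the squared error with L2 regularization.
                    p[row][f] += learningRate * (error * qf - lambda * pf);
                    q[col][f] += learningRate * (error * pf - lambda * qf);
                }
            }
        }
        return (p, q);
    }

    // Root-mean-square error over the observed entries only.
    public static double Rmse(
        IReadOnlyList<(int Row, int Col, double Value)> ratings, double[][] p, double[][] q)
    {
        double sum = 0;
        foreach (var (row, col, value) in ratings)
        {
            double error = value - Dot(p[row], q[col]);
            sum += error * error;
        }
        return Math.Sqrt(sum / ratings.Count);
    }

    public static double Dot(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // Small positive random values break the symmetry between factors.
    private static double[][] Init(int count, int k, Random rng)
    {
        var factors = new double[count][];
        for (int i = 0; i < count; i++)
        {
            factors[i] = new double[k];
            for (int f = 0; f < k; f++) factors[i][f] = 0.1 * rng.NextDouble();
        }
        return factors;
    }

    public static void Main()
    {
        // Sparse 3x3 "user x movie" matrix: only six of the nine ratings are known.
        var ratings = new List<(int Row, int Col, double Value)>
        {
            (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)
        };
        var (p, q) = Train(ratings, rowCount: 3, colCount: 3);
        Console.WriteLine($"Training RMSE: {Rmse(ratings, p, q)}");
        Console.WriteLine($"Predicted rating for user 0, movie 2: {Dot(p[0], q[2])}");
    }
}
```

The last line of `Main` asks for an entry that was never observed, which is the recommendation use case: the learned factors fill in the missing cells of the sparse matrix.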

* Enabled anomaly detection scenarios
  ([#1254](https://github.com/dotnet/machinelearning/pull/1254))

    * [Anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection)
      enables identifying unusual values or events. It is used in scenarios
      such as fraud detection (identifying suspicious credit card
      transactions) and server monitoring (identifying unusual activity).
    * This release includes the following anomaly detection techniques:
      SSAChangePointDetector, SSASpikeDetector, IidChangePointDetector, and
      IidSpikeDetector.
    * Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/7fb76b026d0035d6da4d0b46bd3f2a6e3c0ce3f1/test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs).
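
As a rough illustration of what "identifying unusual values" means for a time series, the sketch below flags points that lie far from the mean of the preceding window. This is a simple z-score heuristic swapped in for illustration; it is not the algorithm used by the SSA or IID detectors listed above, and the class `SpikeSketch`, the method `FindSpikes`, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a sliding-window z-score check, not the SSA or IID detectors in ML.NET.
public static class SpikeSketch
{
    // Returns the indices whose value deviates from the mean of the preceding
    // `window` points by more than `threshold` standard deviations.
    public static List<int> FindSpikes(IReadOnlyList<double> series, int window, double threshold)
    {
        var spikes = new List<int>();
        for (int i = window; i < series.Count; i++)
        {
            double mean = 0;
            for (int j = i - window; j < i; j++) mean += series[j];
            mean /= window;

            double variance = 0;
            for (int j = i - window; j < i; j++) variance += (series[j] - mean) * (series[j] - mean);
            double std = Math.Sqrt(variance / window);

            // A perfectly flat window (std == 0) is skipped rather than flagging every change.
            if (std > 0 && Math.Abs(series[i] - mean) > threshold * std) spikes.Add(i);
        }
        return spikes;
    }

    public static void Main()
    {
        var series = new double[] { 5, 5.1, 4.9, 5, 5.2, 4.8, 5, 20, 5.1, 5 };
        foreach (int index in FindSpikes(series, window: 5, threshold: 3))
            Console.WriteLine($"Spike at index {index}: {series[index]}");
    }
}
```

With this sample data only index 7 (the value 20) is reported. The points that follow it are not flagged because the spike itself inflates the standard deviation of their windows, which is one of the weaknesses that more robust detectors are designed to address.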

* Enabled using ML.NET in Windows x86 apps
  ([#1008](https://github.com/dotnet/machinelearning/pull/1008))

    * ML.NET can now be used in x86 apps.
    * Some components that are based on external dependencies (e.g.
      TensorFlow) will not be available in x86. Please open an issue on GitHub
      for discussion if this blocks you.

* Added the `CustomMappingEstimator` for custom data transformations
  ([#1406](https://github.com/dotnet/machinelearning/pull/1406))

    * ML.NET has a wide variety of data transformations for pre-processing and
      featurizing data (e.g. processing text, images, categorical features,
      etc.).
    * However, there might be application-specific transformations that would
      be useful to do within an ML.NET pipeline (as opposed to as a
      pre-processing step). For example, calculating [cosine
      similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between two
      text columns (after featurization) or something as simple as creating a
      new column that adds the values in two other columns.
    * An example of the `CustomMappingEstimator` can be found
      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/test/Microsoft.ML.Tests/Transformers/CustomMappingTests.cs#L55).
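
The two transformations mentioned above (cosine similarity between two featurized columns, and a column that sums two others) can be sketched as a plain row-to-row function. The sketch below uses no ML.NET types; the classes `InputRow` and `OutputRow`, their fields, and the `Map` method are hypothetical names, and how such a function is registered with `CustomMappingEstimator` is left to the linked test.

```csharp
using System;

// Illustrative only: row-level logic that a custom mapping could wrap; no ML.NET types are used.
public static class CustomMappingSketch
{
    // Hypothetical input row: two featurized text columns and two numeric columns.
    public class InputRow
    {
        public float[] FeaturesA;
        public float[] FeaturesB;
        public float X;
        public float Y;
    }

    // Hypothetical output row holding the two derived columns.
    public class OutputRow
    {
        public float Similarity;
        public float Sum;
    }

    // Maps one input row to one output row.
    public static void Map(InputRow input, OutputRow output)
    {
        output.Similarity = Cosine(input.FeaturesA, input.FeaturesB);
        output.Sum = input.X + input.Y;
    }

    public static float Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Treat the similarity with a zero vector as 0 to avoid dividing by zero.
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    public static void Main()
    {
        var input = new InputRow
        {
            FeaturesA = new[] { 1f, 2f, 3f },
            FeaturesB = new[] { 2f, 4f, 6f },
            X = 1f,
            Y = 2f
        };
        var output = new OutputRow();
        Map(input, output);
        Console.WriteLine($"Similarity: {output.Similarity}, Sum: {output.Sum}");
    }
}
```

Keeping the logic in a pure function like `Map` makes it easy to unit test on its own before it is placed inside a pipeline.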

* Consolidated several API concepts in `MLContext`
  ([#1252](https://github.com/dotnet/machinelearning/pull/1252))

    * `MLContext` replaces `LocalEnvironment` and `ConsoleEnvironment` but
      also includes properties for ML tasks like
      `BinaryClassification`/`Regression`, various transforms/trainers, and
      evaluation. More information can be found in
      [#1098](https://github.com/dotnet/machinelearning/issues/1098).
    * Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/code/MlNetCookBook.md).

* Open sourced [NimbusML](https://github.com/microsoft/nimbusml): experimental
  Python bindings for ML.NET.

    * NimbusML makes it easy for data scientists to train models in Python and
      hand them off to .NET developers to include in their apps and services
      using ML.NET.
    * NimbusML components easily integrate into
      [scikit-learn](http://scikit-learn.org/stable/) pipelines.
    * Note that NimbusML is an experimental project without the same support
      level as ML.NET.

### Acknowledgements

Shoutout to [dzban2137](https://github.com/dzban2137),
[beneyal](https://github.com/beneyal),
[pkulikov](https://github.com/pkulikov),
[amiteshenoy](https://github.com/amiteshenoy),
[DAXaholic](https://github.com/DAXaholic),
[Racing5372](https://github.com/Racing5372),
[ThePiranha](https://github.com/ThePiranha),
[helloguo](https://github.com/helloguo),
[elbruno](https://github.com/elbruno),
[harshsaver](https://github.com/harshsaver),
[f1x3d](https://github.com/f1x3d), [rauhs](https://github.com/rauhs),
[nihitb06](https://github.com/nihitb06),
[nandaleite](https://github.com/nandaleite),
[timitoc](https://github.com/timitoc),
[feiyun0112](https://github.com/feiyun0112),
[Pielgrin](https://github.com/Pielgrin),
[malik97160](https://github.com/malik97160),
[Niladri24dutta](https://github.com/Niladri24dutta),
[suhailsinghbains](https://github.com/suhailsinghbains),
[terop](https://github.com/terop), [Matei13](https://github.com/Matei13),
[JorgeAndd](https://github.com/JorgeAndd), and the ML.NET team for their
contributions as part of this release!
114 changes: 114 additions & 0 deletions docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs
@@ -0,0 +1,114 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Trainers;
using System;
using System.Collections.Generic;

// NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT.
// If you change the existing content, check that the files referencing it in the XML documentation are still correct, as they reference
// line by line.
namespace Microsoft.ML.Samples.Dynamic
{
    public partial class TrainerSamples
    {
        // The following variables define the shape of a matrix. Its shape is _synthesizedMatrixRowCount-by-_synthesizedMatrixColumnCount.
        // The variable _synthesizedMatrixFirstRowIndex indicates the integer that would be mapped to the first row index. If user data uses
        // 0-based indices for rows, _synthesizedMatrixFirstRowIndex can be set to 0. Similarly, for 1-based indices, _synthesizedMatrixFirstRowIndex
        // could be 1.
        const int _synthesizedMatrixFirstColumnIndex = 1;
        const int _synthesizedMatrixFirstRowIndex = 1;
        const int _synthesizedMatrixColumnCount = 60;
        const int _synthesizedMatrixRowCount = 100;

        // A data structure used to encode a single value in the matrix.
        internal class MatrixElement
        {
            // Matrix column index starts from _synthesizedMatrixFirstColumnIndex and is at most
            // _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount - 1.
            // Contiguous = true means that all values between the min and max indexes are allowed.
            [KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
            public uint MatrixColumnIndex;
            // Matrix row index starts from _synthesizedMatrixFirstRowIndex and is at most
            // _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount - 1.
            // Contiguous = true means that all values between the min and max indexes are allowed.
            [KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
            public uint MatrixRowIndex;
            // The value at the column MatrixColumnIndex and row MatrixRowIndex.
            public float Value;
        }

        // A data structure used to encode a prediction result. Compared with MatrixElement, the field Value is
        // renamed to Score because Score is the default name of matrix factorization's output.
        internal class MatrixElementForScore
        {
            [KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
            public uint MatrixColumnIndex;
            [KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
            public uint MatrixRowIndex;
            public float Score;
        }

        // This example first creates in-memory data and then uses it to train a matrix factorization model. Afterward, quality metrics are reported.
        public static void MatrixFactorizationInMemoryData()
        {
            // Create an in-memory matrix as a list of tuples (column index, row index, value).
            var dataMatrix = new List<MatrixElement>();
            for (uint i = _synthesizedMatrixFirstColumnIndex; i < _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount; ++i)
                for (uint j = _synthesizedMatrixFirstRowIndex; j < _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount; ++j)
                    dataMatrix.Add(new MatrixElement() { MatrixColumnIndex = i, MatrixRowIndex = j, Value = (i + j) % 5 });

            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
            // as a catalog of available operations and as the source of randomness.
            var mlContext = new MLContext(seed: 0, conc: 1);

            // Convert the in-memory matrix into an IDataView so that ML.NET components can consume it.
            var dataView = ComponentCreation.CreateDataView(mlContext, dataMatrix);

            // Create a matrix factorization trainer which consumes "Value" as the training label, "MatrixColumnIndex" as the
            // matrix's column index, and "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used to extract the field
            // names in the MatrixElement class.
            var pipeline = new MatrixFactorizationTrainer(mlContext, nameof(MatrixElement.Value),
                nameof(MatrixElement.MatrixColumnIndex), nameof(MatrixElement.MatrixRowIndex),
                advancedSettings: s =>
                {
                    s.NumIterations = 10;
                    s.NumThreads = 1; // To eliminate randomness, # of threads must be 1.
                    s.K = 32;
                });

            // Train a matrix factorization model.
            var model = pipeline.Fit(dataView);

            // Apply the trained model to the training set.
            var prediction = model.Transform(dataView);

            // Calculate regression metrics for the prediction result.
            var metrics = mlContext.Regression.Evaluate(prediction,
                label: nameof(MatrixElement.Value), score: nameof(MatrixElementForScore.Score));

            // Print out some metrics for checking the model's quality.
            Console.WriteLine($"L1 - {metrics.L1}");
            Console.WriteLine($"L2 - {metrics.L2}");
            Console.WriteLine($"LossFunction - {metrics.LossFn}");
            Console.WriteLine($"RMS - {metrics.Rms}");
            Console.WriteLine($"RSquared - {metrics.RSquared}");

            // Create two entries for making predictions. The prediction value, Score, is unknown, so it is left at its default.
            // If either the row or column index is out of range (e.g., MatrixColumnIndex=99999), the prediction value will be NaN.
            var testMatrix = new List<MatrixElementForScore>() {
                new MatrixElementForScore() { MatrixColumnIndex = 1, MatrixRowIndex = 7, Score = default },
                new MatrixElementForScore() { MatrixColumnIndex = 3, MatrixRowIndex = 6, Score = default } };

            // Again, convert the test data to a format supported by ML.NET.
            var testDataView = ComponentCreation.CreateDataView(mlContext, testMatrix);

            // Feed the test data into the model and then iterate through all predictions.
            foreach (var pred in model.Transform(testDataView).AsEnumerable<MatrixElementForScore>(mlContext, false))
                Console.WriteLine($"Predicted value at row {pred.MatrixRowIndex} and column {pred.MatrixColumnIndex} is {pred.Score}");
        }
    }
}