Merge pull request #1547 from shauheen/v07

Cherrypick for release 0.7
dotnet · Nov 6, 2018 · abba48a
2 parents c5cef31 + 86088a1
commit abba48a
Showing 52 changed files with 1,302 additions and 129 deletions.
diff --git a/docs/release-notes/0.7/release-0.7.md b/docs/release-notes/0.7/release-0.7.md
@@ -0,0 +1,143 @@
+# ML.NET 0.7 Release Notes
+
+Today we are excited to release ML.NET 0.7, which our algorithms strongly
+recommend you try out! This release enables making recommendations with
+matrix factorization, identifying unusual events with anomaly detection,
+adding custom transformations to your ML pipeline, and more! We also have a
+small surprise for those who work in teams that use both .NET and Python.
+Finally, we want to thank the many new contributors to the project since the
+last release!
+
+### Installation
+
+ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
+Core
+2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
+for more details.
+
+You can install the ML.NET NuGet package from the CLI using:
+```
+dotnet add package Microsoft.ML
+```
+
+From the Package Manager Console:
+```
+Install-Package Microsoft.ML
+```
+
+### Release Notes
+
+Below are some of the highlights from this release.
+
+* Added Matrix factorization for recommendation problems
+  ([#1263](https://github.com/dotnet/machinelearning/pull/1263))
+
+    * Matrix factorization (MF) is a common approach to recommendations when
+      you have data on how users rated items in your catalog. For example, you
+      might know how users rated some movies and want to recommend which other
+      movies they are likely to watch next.
+    * ML.NET's MF uses [LIBMF](https://github.com/cjlin1/libmf).
+    * Example usage of MF can be found
+      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs).
+      The example is general but you can imagine that the matrix rows
+      correspond to users, matrix columns correspond to movies, and matrix
+      values correspond to ratings. This matrix would be quite sparse as users
+      have only rated a small subset of the catalog.
+    * Note: [ML.NET
+      0.3](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/release-notes/0.3/release-0.3.md)
+      included Field-Aware Factorization Machines (FFM) as a learner for
+      binary classification. FFM is a generalization of MF, but there are a
+      few differences:
+        * FFM enables taking advantage of other information beyond the rating
+          a user assigns to an item (e.g. movie genre, movie release date,
+          user profile). 
+        * FFM is currently limited to binary classification (the ratings need
+          to be converted to 0 or 1), whereas MF solves a regression problem
+          (the ratings can be continuous numbers).
+        * If the only information available is the user-item ratings, MF is
+          likely to be significantly faster than FFM.
+        * A more in-depth discussion can be found
+          [here](https://www.csie.ntu.edu.tw/~cjlin/talks/recsys.pdf).
+
+* Enabled anomaly detection scenarios
+  ([#1254](https://github.com/dotnet/machinelearning/pull/1254))
+
+    * [Anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection)
+      enables identifying unusual values or events. It is used in scenarios
+      such as fraud detection (identifying suspicious credit card
+      transactions) and server monitoring (identifying unusual activity). 
+    * This release includes the following anomaly detection techniques:
+      SSAChangePointDetector, SSASpikeDetector, IidChangePointDetector, and
+      IidSpikeDetector. 
+    * Example usage can be found
+      [here](https://github.com/dotnet/machinelearning/blob/7fb76b026d0035d6da4d0b46bd3f2a6e3c0ce3f1/test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs).
+
+* Enabled using ML.NET in Windows x86 apps
+  ([#1008](https://github.com/dotnet/machinelearning/pull/1008))
+
+    * ML.NET can now be used in x86 apps. 
+    * Some components that are based on external dependencies (e.g.
+      TensorFlow) will not be available in x86. Please open an issue on GitHub
+      for discussion if this blocks you.
+
+* Added the `CustomMappingEstimator` for custom data transformations
+  ([#1406](https://github.com/dotnet/machinelearning/pull/1406))
+
+    * ML.NET has a wide variety of data transformations for pre-processing and
+      featurizing data (e.g. processing text, images, categorical features,
+      etc.).
+    * However, there might be application-specific transformations that would
+      be useful to do within an ML.NET pipeline (rather than as a separate
+      pre-processing step). For example, calculating [cosine
+      similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between two
+      text columns (after featurization) or something as simple as creating a
+      new column that adds the values in two other columns.
+    * An example of the `CustomMappingEstimator` can be found
+      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/test/Microsoft.ML.Tests/Transformers/CustomMappingTests.cs#L55).
+
+* Consolidated several API concepts in `MLContext`
+  ([#1252](https://github.com/dotnet/machinelearning/pull/1252))
+
+    * `MLContext` replaces `LocalEnvironment` and `ConsoleEnvironment` but
+      also includes properties for ML tasks like
+      `BinaryClassification`/`Regression`, various transforms/trainers, and
+      evaluation. More information can be found in
+      [#1098](https://github.com/dotnet/machinelearning/issues/1098).
+    * Example usage can be found
+      [here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/code/MlNetCookBook.md).
+
+* Open sourced [NimbusML](https://github.com/microsoft/nimbusml): experimental
+  Python bindings for ML.NET. 
+
+    * NimbusML makes it easy for data scientists to train models in Python and
+      hand them off to .NET developers to include in their apps and services
+      using ML.NET. 
+    * NimbusML components easily integrate into
+      [scikit-learn](http://scikit-learn.org/stable/) pipelines. 
+    * Note that NimbusML is an experimental project without the same support
+      level as ML.NET.
+
+### Acknowledgements
+
+Shoutout to [dzban2137](https://github.com/dzban2137),
+[beneyal](https://github.com/beneyal),
+[pkulikov](https://github.com/pkulikov),
+[amiteshenoy](https://github.com/amiteshenoy),
+[DAXaholic](https://github.com/DAXaholic),
+[Racing5372](https://github.com/Racing5372),
+[ThePiranha](https://github.com/ThePiranha),
+[helloguo](https://github.com/helloguo),
+[elbruno](https://github.com/elbruno),
+[harshsaver](https://github.com/harshsaver),
+[f1x3d](https://github.com/f1x3d), [rauhs](https://github.com/rauhs),
+[nihitb06](https://github.com/nihitb06),
+[nandaleite](https://github.com/nandaleite),
+[timitoc](https://github.com/timitoc),
+[feiyun0112](https://github.com/feiyun0112),
+[Pielgrin](https://github.com/Pielgrin),
+[malik97160](https://github.com/malik97160),
+[Niladri24dutta](https://github.com/Niladri24dutta),
+[suhailsinghbains](https://github.com/suhailsinghbains),
+[terop](https://github.com/terop), [Matei13](https://github.com/Matei13),
+[JorgeAndd](https://github.com/JorgeAndd), and the ML.NET team for their
+contributions as part of this release! 
diff --git a/docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs b/docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs
@@ -0,0 +1,114 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+// See the LICENSE file in the project root for more information.
+
+using Microsoft.ML.Runtime.Api;
+using Microsoft.ML.Runtime.Data;
+using Microsoft.ML.Trainers;
+using System;
+using System.Collections.Generic;
+
+// NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT. 
+// If you change the existing content, check that the files referencing it in the XML documentation are still correct, as they reference
+// it line by line.
+namespace Microsoft.ML.Samples.Dynamic
+{
+    public partial class TrainerSamples
+    {
+        // The following variables define the shape of a matrix. Its shape is _synthesizedMatrixRowCount-by-_synthesizedMatrixColumnCount.
+        // The variable _synthesizedMatrixFirstRowIndex indicates the integer that would be mapped to the first row index. If user data uses
+        // 0-based indices for rows, _synthesizedMatrixFirstRowIndex can be set to 0. Similarly, for 1-based indices, _synthesizedMatrixFirstRowIndex
+        // could be 1.
+        const int _synthesizedMatrixFirstColumnIndex = 1;
+        const int _synthesizedMatrixFirstRowIndex = 1;
+        const int _synthesizedMatrixColumnCount = 60;
+        const int _synthesizedMatrixRowCount = 100;
+
+        // A data structure used to encode a single value in the matrix.
+        internal class MatrixElement
+        {
+            // Matrix column index starts from _synthesizedMatrixFirstColumnIndex and is at most
+            // _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount - 1.
+            // Contiguous=true means that all values between the min and max indexes are allowed.
+            [KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
+            public uint MatrixColumnIndex;
+            // Matrix row index starts from _synthesizedMatrixFirstRowIndex and is at most
+            // _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount - 1.
+            // Contiguous=true means that all values between the min and max indexes are allowed.
+            [KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
+            public uint MatrixRowIndex;
+            // The value at the column MatrixColumnIndex and row MatrixRowIndex.
+            public float Value;
+        }
+
+        // A data structure used to encode a prediction result. Compared with MatrixElement, the field Value is
+        // renamed to Score because Score is the default name of matrix factorization's output.
+        internal class MatrixElementForScore
+        {
+            [KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
+            public uint MatrixColumnIndex;
+            [KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
+            public uint MatrixRowIndex;
+            public float Score;
+        }
+
+        // This example first creates in-memory data and then uses it to train a matrix factorization model. Afterward, quality metrics are reported.
+        public static void MatrixFactorizationInMemoryData()
+        {
+            // Create an in-memory matrix as a list of tuples (column index, row index, value).
+            var dataMatrix = new List<MatrixElement>();
+            for (uint i = _synthesizedMatrixFirstColumnIndex; i < _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount; ++i)
+                for (uint j = _synthesizedMatrixFirstRowIndex; j < _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount; ++j)
+                    dataMatrix.Add(new MatrixElement() { MatrixColumnIndex = i, MatrixRowIndex = j, Value = (i + j) % 5 });
+
+            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
+            // as a catalog of available operations and as the source of randomness.
+            var mlContext = new MLContext(seed: 0, conc: 1);
+
+            // Convert the in-memory matrix into an IDataView so that ML.NET components can consume it.
+            var dataView = ComponentCreation.CreateDataView(mlContext, dataMatrix);
+
+            // Create a matrix factorization trainer that consumes "Value" as the training label, "MatrixColumnIndex" as the
+            // matrix's column index, and "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used to extract the
+            // field names of the MatrixElement class.
+            var pipeline = new MatrixFactorizationTrainer(mlContext, nameof(MatrixElement.Value),
+                nameof(MatrixElement.MatrixColumnIndex), nameof(MatrixElement.MatrixRowIndex),
+                advancedSettings: s =>
+                {
+                    s.NumIterations = 10;
+                    s.NumThreads = 1; // To eliminate randomness, # of threads must be 1.
+                    s.K = 32;
+                });
+
+            // Train a matrix factorization model.
+            var model = pipeline.Fit(dataView);
+
+            // Apply the trained model to the training set.
+            var prediction = model.Transform(dataView);
+
+            // Calculate regression metrics for the prediction result.
+            var metrics = mlContext.Regression.Evaluate(prediction,
+                label: nameof(MatrixElement.Value), score: nameof(MatrixElementForScore.Score));
+
+            // Print out some metrics for checking the model's quality.
+            Console.WriteLine($"L1 - {metrics.L1}");
+            Console.WriteLine($"L2 - {metrics.L2}");
+            Console.WriteLine($"LossFunction - {metrics.LossFn}");
+            Console.WriteLine($"RMS - {metrics.Rms}");
+            Console.WriteLine($"RSquared - {metrics.RSquared}");
+
+            // Create two entries for making predictions. The prediction value, Score, is unknown, so it is left at its default.
+            // If either the row or column index is out of range (e.g., MatrixColumnIndex=99999), the prediction value will be NaN.
+            var testMatrix = new List<MatrixElementForScore>() {
+                new MatrixElementForScore() { MatrixColumnIndex = 1, MatrixRowIndex = 7, Score = default },
+                new MatrixElementForScore() { MatrixColumnIndex = 3, MatrixRowIndex = 6, Score = default } };
+
+            // Again, convert the test data to a format supported by ML.NET.
+            var testDataView = ComponentCreation.CreateDataView(mlContext, testMatrix);
+
+            // Feed the test data into the model and then iterate through all predictions.
+            foreach (var pred in model.Transform(testDataView).AsEnumerable<MatrixElementForScore>(mlContext, false))
+                Console.WriteLine($"Predicted value at row {pred.MatrixRowIndex} and column {pred.MatrixColumnIndex} is {pred.Score}");
+        }
+    }
+}
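
Note on the sample above: for readers who want to see roughly what a matrix factorization trainer is fitting, here is a minimal sketch that does not depend on ML.NET. It assumes plain stochastic gradient descent with L2 regularization over the observed entries, which is not necessarily what the LIBMF-backed `MatrixFactorizationTrainer` does internally. The class `MatrixFactorizationSketch`, its method names, and the hyperparameter values are hypothetical and chosen only for illustration. The synthetic data follows the same `(column + row) % 5` pattern with 1-based indices as the sample.

```csharp
using System;

// Illustrative sketch only: plain SGD matrix factorization, not ML.NET's LIBMF-backed trainer.
public static class MatrixFactorizationSketch
{
    // Learns factors P (rows x k) and Q (columns x k) so that Dot(P[r], Q[c]) approximates values[r, c].
    // Entries equal to NaN are treated as unobserved and skipped.
    public static (double[][] P, double[][] Q) Train(
        double[,] values, int k, int epochs, double learningRate, double lambda, int seed)
    {
        int rows = values.GetLength(0), columns = values.GetLength(1);
        var random = new Random(seed);
        double[][] p = CreateFactors(rows, k, random);
        double[][] q = CreateFactors(columns, k, random);

        for (int epoch = 0; epoch < epochs; epoch++)
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < columns; c++)
                {
                    if (double.IsNaN(values[r, c]))
                        continue;
                    double error = values[r, c] - Dot(p[r], q[c]);
                    for (int f = 0; f < k; f++)
                    {
                        // Gradient step on the squared error plus an L2 penalty on both factors.
                        double pf = p[r][f], qf = q[c][f];
                        p[r][f] += learningRate * (error * qf - lambda * pf);
                        q[c][f] += learningRate * (error * pf - lambda * qf);
                    }
                }
        return (p, q);
    }

    public static double Dot(double[] a, double[] b)
    {
        double sum = 0;
        for (int f = 0; f < a.Length; f++)
            sum += a[f] * b[f];
        return sum;
    }

    // Root mean squared error over the observed (non-NaN) entries.
    public static double Rmse(double[,] values, double[][] p, double[][] q)
    {
        double sum = 0;
        int count = 0;
        for (int r = 0; r < values.GetLength(0); r++)
            for (int c = 0; c < values.GetLength(1); c++)
            {
                if (double.IsNaN(values[r, c]))
                    continue;
                double error = values[r, c] - Dot(p[r], q[c]);
                sum += error * error;
                count++;
            }
        return Math.Sqrt(sum / count);
    }

    // Same pattern as the sample: value = (column + row) % 5, using 1-based indices.
    public static double[,] CreateSyntheticMatrix(int rows, int columns)
    {
        var values = new double[rows, columns];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < columns; c++)
                values[r, c] = ((r + 1) + (c + 1)) % 5;
        return values;
    }

    private static double[][] CreateFactors(int count, int k, Random random)
    {
        var factors = new double[count][];
        for (int i = 0; i < count; i++)
        {
            factors[i] = new double[k];
            for (int f = 0; f < k; f++)
                factors[i][f] = random.NextDouble() * 0.1; // small initial values
        }
        return factors;
    }

    public static void Main()
    {
        double[,] values = CreateSyntheticMatrix(rows: 100, columns: 60);
        var (p, q) = Train(values, k: 8, epochs: 50, learningRate: 0.01, lambda: 0.01, seed: 0);
        Console.WriteLine($"Training RMSE: {Rmse(values, p, q)}");
        // Row 7, column 1 in the sample's 1-based indexing is [6, 0] in this 0-based array.
        Console.WriteLine($"Predicted value at row 7 and column 1 is {Dot(p[6], q[0])}");
    }
}
```

The sketch keeps every entry observed, as the sample does. To mimic a sparse ratings matrix, set the unrated entries to `double.NaN` before calling `Train`, and both training and `Rmse` will skip them.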