Merge branch 'development' of github.com:v3io/tutorials

# Conflicts:
#	demos/image-classification/README.md
#	getting-started/dask-cluster.ipynb
v3io · Dec 5, 2019 · 9dfe18d
2 parents 38ae365 + 95932f5
commit 9dfe18d
Showing 14 changed files with 3,530 additions and 2,120 deletions.
diff --git a/README.md b/README.md
@@ -8,11 +8,11 @@
   - [Deploying Models to Production](#deploying-models-to-production)
   - [Visualization, Monitoring, and Logging](#visualization-monitoring-and-logging)
 - [End-to-End Use-Case Applications](#end-to-end-use-case-applications)
-  - [Smart Stock Trading](demos/stocks/01-gen-demo-data.ipynb)
+  - [Image Classification](demos/image-classification/01-image-classification.ipynb)
   - [Predictive Infrastructure Monitoring](demos/netops/01-generator.ipynb)
-  - [Image Recognition](demos/image-classification/keras-cnn-dog-or-cat-classification.ipynb)
   - [Natural Language Processing (NLP)](demos/nlp/nlp-example.ipynb)
   - [Stream Enrichment](demos/stream-enrich/stream-enrich.ipynb)
+  - [Smart Stock Trading](demos/stocks/01-gen-demo-data.ipynb)
 - [Jupyter Notebook Basics](#jupyter-notebook-basics)
   - [Creating Virtual Environments in Jupyter Notebook](#creating-virtual-environments-in-jupyter-notebook)
   - [Updating the Tutorial Notebooks](#update-notebooks)
@@ -28,11 +28,12 @@ The Iguazio Data Science Platform (**"the platform"**) is a fully integrated and
 The platform incorporates the following components:
 
 - A data science workbench that includes Jupyter Notebook, integrated analytics engines, and Python packages
-- Real-time dashboards based on Grafana
+- Model management with experiments tracking and automated pipeline capabilities
 - Managed data and machine-learning (ML) services over a scalable Kubernetes cluster
 - A real-time serverless functions framework &mdash; Nuclio
 - An extremely fast and secure data layer that supports SQL, NoSQL, time-series databases, files (simple objects), and streaming
 - Integration with third-party data sources such as Amazon S3, HDFS, SQL databases, and streaming or messaging protocols
+- Real-time dashboards based on Grafana
 
 <br><img src="assets/images/igz-self-service-platform.png" alt="Self-service data science platform" width="650"/><br>
 
@@ -115,7 +116,7 @@ When your model is ready, you can train it in Jupyter Notebook or by using scala
 You can find model-training examples in the platform's tutorial Jupyter notebooks:
 
 - The [NetOps demo](demos/netops/03-training.ipynb) tutorial demonstrates predictive infrastructure-monitoring using scikit-learn.
-- The [image-classification demo](demos/image-classification/infer.ipynb) tutorial demonstrates image recognition using TensorFlow and Keras.
+- The [image-classification demo](demos/image-classification/01-image-classification.ipynb) tutorial demonstrates image recognition using TensorFlow and Horovod with MLRun.
 
 If you're a beginner, you might find the following ML guide useful &mdash; [Machine Learning Algorithms In Layman's Terms](https://towardsdatascience.com/machine-learning-algorithms-in-laymans-terms-part-1-d0368d769a7b).
 
@@ -165,11 +166,11 @@ For information on how to create Grafana dashboards to monitor and visualize dat
 Iguazio provides full end-to-end use-case applications that demonstrate how to use the Iguazio Data Science Platform and related tools to address data science requirements for different industries and implementations.
 The applications are provided in the **demos** directory of the platform's tutorial Jupyter notebooks and cover the following use cases; for more detailed descriptions, see the demos README ([notebook](demos/README.ipynb) / [Markdown](demos/README.md)):
 
-- <a id="stocks-use-case-app"></a>**Smart stock trading** ([**stocks**](demos/stocks/read-stocks.ipynb)) &mdash; the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualizing the data on a Grafana dashboard.
+- <a id="image-recog-use-case-app"></a>**Image recognition** ([**image-classification**](demos/image-classification/01-image-classification.ipynb)) &mdash; the application builds and trains an ML model that identifies (recognizes) and classifies images by using Keras, TensorFlow, and scikit-learn.
 - <a id="netops-use-case-app"></a>**Predictive infrastructure monitoring** ([**netops**](demos/netops/01-generator.ipynb)) &mdash; the application builds, trains, and deploys a machine-learning model for analyzing and predicting failure in network devices as part of a network operations (NetOps) flow. The goal is to identify anomalies for device metrics &mdash; such as CPU, memory consumption, or temperature &mdash; which can signify an upcoming issue or failure.
-- <a id="image-recog-use-case-app"></a>**Image recognition** ([**image-classification**](demos/image-classification/keras-cnn-dog-or-cat-classification.ipynb)) &mdash; the application builds and trains an ML model that identifies (recognizes) and classifies images by using Keras, TensorFlow, and scikit-learn.
 - <a id="nlp-use-case-app"></a>**Natural language processing (NLP)** ([**nlp**](demos/nlp/nlp-example.ipynb)) &mdash; the application processes natural-language textual data &mdash; including spelling correction and sentiment analysis &mdash; and generates a Nuclio serverless function that translates any given text string to another (configurable) language.
 - <a id="stream-enrich-use-case-app"></a>**Stream enrichment** ([**stream-enrich**](demos/stream-enrich/stream-enrich.ipynb)) &mdash; the application demonstrates a typical stream-based data-engineering pipeline, which is required in many real-world scenarios: data is streamed from an event streaming engine; the data is enriched, in real time, using data from a NoSQL table; the enriched data is saved to an output data stream and then consumed from this stream.
+- <a id="stocks-use-case-app"></a>**Smart stock trading** ([**stocks**](demos/stocks/read-stocks.ipynb)) &mdash; the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualizing the data on a Grafana dashboard.
 
 <a id="jupyter-notebook-basics"></a>
 ## Jupyter Notebook Basics

diff --git a/demos/README.ipynb b/demos/README.ipynb
@@ -14,11 +14,11 @@
     "**In This Document**\n",
     "\n",
     "- [Overview](#overview)\n",
-    "- [Stock Trading](#stocks-demo)\n",
+    "- [Image Classification](#image-classification-demo)\n",
     "- [Predictive Infrastructure Monitoring](#netops-demo)\n",
-    "- [Image Recognition](#image-classification-demo)\n",
     "- [Natural Language Processing (NLP)](#nlp-demo)\n",
-    "- [Stream Enrichment](#stream-enrich-demo)"
+    "- [Stream Enrichment](#stream-enrich-demo)\n",
+    "- [Stock Trading](#stocks-demo)"
    ]
   },
   {
@@ -35,16 +35,20 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a id=\"stocks-demo\"></a>\n",
-    "## Smart Stock Trading\n",
+    "<a id=\"image-classification-demo\"></a>\n",
+    "## Image Classification\n",
     "\n",
-    "The [**stocks**](stocks/01-gen-demo-data.ipynb) demo demonstrates a smart stock-trading application: \n",
-    "the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualizing the data on a Grafana dashboard.\n",
+    "The [**image-classification**](image-classification/01-image-classification.ipynb) demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.\n",
     "\n",
-    "- The stock data is read from Twitter by using the [TwythonStreamer](https://twython.readthedocs.io/en/latest/usage/streaming_api.html) Python wrapper to the Twitter Streaming API, and saved to TSDB and NoSQL tables in the platform.\n",
-    "- Sentiment analysis is done by using the [TextBlob](https://textblob.readthedocs.io/) Python library for natural language processing (NLP).\n",
-    "- The analyzed data is visualized as graphs on a [Grafana](https://grafana.com/grafana) dashboard, which is created from the Jupyter notebook code.\n",
-    "  The data is read from both the TSDB and NoSQL stock tables."
+    "This example uses TensorFlow, Horovod, and Nuclio to demonstrate an end-to-end solution for image classification.\n",
+    "It consists of four MLRun and Nuclio functions:\n",
+    "\n",
+    "1. Import an image archive from S3 to the cluster file system.\n",
+    "2. Tag the images based on their name structure.\n",
+    "3. Run distributed training using TF, Keras, and Horovod.\n",
+    "4. Automatically deploy a Nuclio model-serving function (from a [notebook](nuclio-serving-tf-images.ipynb) and from a [Dockerfile](./inference-docker)).\n",
+    "\n",
+    "The example also demonstrates an [automated pipeline](mlrun_mpijob_pipe.ipynb) using MLRun and Kubeflow Pipelines."
    ]
   },
   {
@@ -67,28 +71,29 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a id=\"image-classification-demo\"></a>\n",
-    "## Image Recognition\n",
+    "<a id=\"nlp-demo\"></a>\n",
+    "## Natural Language Processing (NLP)\n",
     "\n",
-    "The [**image-classification**](image-classification/keras-cnn-dog-or-cat-classification.ipynb) demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.\n",
+    "The [**nlp**](nlp/nlp-example.ipynb) demo demonstrates natural language processing (NLP): the application processes natural-language textual data &mdash; including spelling correction and sentiment analysis &mdash; and generates a Nuclio serverless function that translates any given text string to another (configurable) language.\n",
     "\n",
-    "- The data is collected by downloading images of dogs and cats from the Iguazio sample data-set AWS bucket.\n",
-    "- The training data for the ML model is prepared by using [pandas](https://pandas.pydata.org/) DataFrames to build a predecition map.\n",
-    "  The data is visualized by using the [Matplotlib](https://matplotlib.org/) Python library.\n",
-    "- An image recognition and classification ML model that identifies the animal type is built and trained by using [Keras](https://keras.io/), [TensorFlow](https://www.tensorflow.org/), and [scikit-learn](https://scikit-learn.org) (a.k.a. sklearn)."
+    "- The textual data is collected and processed by using the [TextBlob](https://textblob.readthedocs.io/) Python NLP library. The processing includes spelling correction and sentiment analysis.\n",
+    "- A serverless function that translates text to another language, which is configured in an environment variable, is generated by using the [Nuclio](https://nuclio.io/) framework."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a id=\"nlp-demo\"></a>\n",
-    "## Natural Language Processing (NLP)\n",
+    "<a id=\"stocks-demo\"></a>\n",
+    "## Smart Stock Trading\n",
     "\n",
-    "The [**nlp**](nlp/nlp-example.ipynb) demo demonstrates natural language processing (NLP): the application processes natural-language textual data &mdash; including spelling correction and sentiment analysis &mdash; and generates a Nuclio serverless function that translates any given text string to another (configurable) language.\n",
+    "The [**stocks**](stocks/01-gen-demo-data.ipynb) demo demonstrates a smart stock-trading application: \n",
+    "the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualizing the data on a Grafana dashboard.\n",
     "\n",
-    "- The textual data is collected and processed by using the [TextBlob](https://textblob.readthedocs.io/) Python NLP library. The processing includes spelling correction and sentiment analysis.\n",
-    "- A serverless function that translates text to another language, which is configured in an environment variable, is generated by using the [Nuclio](https://nuclio.io/) framework."
+    "- The stock data is read from Twitter by using the [TwythonStreamer](https://twython.readthedocs.io/en/latest/usage/streaming_api.html) Python wrapper to the Twitter Streaming API, and saved to TSDB and NoSQL tables in the platform.\n",
+    "- Sentiment analysis is done by using the [TextBlob](https://textblob.readthedocs.io/) Python library for natural language processing (NLP).\n",
+    "- The analyzed data is visualized as graphs on a [Grafana](https://grafana.com/grafana) dashboard, which is created from the Jupyter notebook code.\n",
+    "  The data is read from both the TSDB and NoSQL stock tables."
    ]
   },
   {
@@ -128,5 +133,5 @@
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
diff --git a/demos/README.md b/demos/README.md
@@ -1,30 +1,33 @@
-
 # End-to-End Platform Use-Case Application Demos
 
 **In This Document**
 
 - [Overview](#overview)
-- [Stock Trading](#stocks-demo)
+- [Image Classification](#image-classification-demo)
 - [Predictive Infrastructure Monitoring](#netops-demo)
-- [Image Recognition](#image-classification-demo)
 - [Natural Language Processing (NLP)](#nlp-demo)
 - [Stream Enrichment](#stream-enrich-demo)
+- [Stock Trading](#stocks-demo)
 
 <a id="overview"></a>
 ## Overview
 
 The **demos** tutorials directory contains full end-to-end use-case applications that demonstrate how to use the Iguazio Data Science Platform ("the platform") and related tools to address data science requirements for different industries and implementations.
 
-<a id="stocks-demo"></a>
-## Smart Stock Trading
+<a id="image-classification-demo"></a>
+## Image Classification
 
-The [**stocks**](stocks/read-stocks.ipynb) demo demonstrates a smart stock-trading application: 
-the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualizing the data on a Grafana dashboard.
+The [**image-classification**](image-classification/01-image-classification.ipynb) demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.
 
-- The stock data is read from Twitter by using the [TwythonStreamer](https://twython.readthedocs.io/en/latest/usage/streaming_api.html) Python wrapper to the Twitter Streaming API, and saved to TSDB and NoSQL tables in the platform.
-- Sentiment analysis is done by using the [TextBlob](https://textblob.readthedocs.io/) Python library for natural language processing (NLP).
-- The analyzed data is visualized as graphs on a [Grafana](https://grafana.com/grafana) dashboard, which is created from the Jupyter notebook code.
-  The data is read from both the TSDB and NoSQL stock tables.
+This example uses TensorFlow, Horovod, and Nuclio to demonstrate an end-to-end solution for image classification.
+It consists of four MLRun and Nuclio functions:
+
+1. Import an image archive from S3 to the cluster file system.
+2. Tag the images based on their name structure.
+3. Run distributed training using TF, Keras, and Horovod.
+4. Automatically deploy a Nuclio model-serving function (from a [notebook](nuclio-serving-tf-images.ipynb) and from a [Dockerfile](./inference-docker)).
+
+The example also demonstrates an [automated pipeline](mlrun_mpijob_pipe.ipynb) using MLRun and Kubeflow Pipelines.
 
 <a id="netops-demo"></a>
 ## Predictive Infrastructure Monitoring
@@ -37,16 +40,6 @@ The goal is to identify anomalies for device metrics &mdash; such as CPU, memory
 - The data is generated by using an open-source generator tool that was written by Iguazio.
   This generator enables users to customize the metrics, data range, and many other parameters, and prepare a data set that's suitable for other similar use cases.
 
-<a id="image-classification-demo"></a>
-## Image Recognition
-
-The [**image-classification**](image-classification/keras-cnn-dog-or-cat-classification.ipynb) demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.
-
-- The data is collected by downloading images of dogs and cats from the Iguazio sample data-set AWS bucket.
-- The training data for the ML model is prepared by using [pandas](https://pandas.pydata.org/) DataFrames to build a predecition map.
-  The data is visualized by using the [Matplotlib](https://matplotlib.org/) Python library.
-- An image recognition and classification ML model that identifies the animal type is built and trained by using [Keras](https://keras.io/), [TensorFlow](https://www.tensorflow.org/), and [scikit-learn](https://scikit-learn.org) (a.k.a. sklearn).
-
 <a id="nlp-demo"></a>
 ## Natural Language Processing (NLP)
 
@@ -55,6 +48,17 @@ The [**nlp**](nlp/nlp-example.ipynb) demo demonstrates natural language processi
 - The textual data is collected and processed by using the [TextBlob](https://textblob.readthedocs.io/) Python NLP library. The processing includes spelling correction and sentiment analysis.
 - A serverless function that translates text to another language, which is configured in an environment variable, is generated by using the [Nuclio](https://nuclio.io/) framework.
 
+<a id="stocks-demo"></a>
+## Smart Stock Trading
+
+The [**stocks**](stocks/01-gen-demo-data.ipynb) demo demonstrates a smart stock-trading application: 
+the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualizing the data on a Grafana dashboard.
+
+- The stock data is read from Twitter by using the [TwythonStreamer](https://twython.readthedocs.io/en/latest/usage/streaming_api.html) Python wrapper to the Twitter Streaming API, and saved to TSDB and NoSQL tables in the platform.
+- Sentiment analysis is done by using the [TextBlob](https://textblob.readthedocs.io/) Python library for natural language processing (NLP).
+- The analyzed data is visualized as graphs on a [Grafana](https://grafana.com/grafana) dashboard, which is created from the Jupyter notebook code.
+  The data is read from both the TSDB and NoSQL stock tables.
+
 <a id="stream-enrich-demo"></a>
 ### Stream Enrichment
 

diff --git a/demos/gpu/README.md b/demos/gpu/README.md
@@ -1,4 +1,3 @@
-
 # GPU Demos
 
 - [Overview](#gpu-demos-overview)
@@ -15,13 +14,16 @@ The **demos/gpu** directory includes the following:
 - A **horovod** directory with applications that use Uber's [Horovod](https://eng.uber.com/horovod/) distributed deep-learning framework, which can be used to convert a single-GPU TensorFlow, Keras, or PyTorch model-training program to a distributed program that trains the model simultaneously over multiple GPUs.
     The objective is to speed up your model training with minimal changes to your existing single-GPU code and without complicating the execution.
     Horovod code can also run over CPUs with only minor modifications.
+    For more information and examples, see the [Horovod GitHub repository](https://github.com/horovod/horovod).
+
     The Horovod tutorials include the following:
 
     - An image-recognition demo application for execution over GPUs (**image-classification**).
     - A slightly modified version of the GPU image-classification demo application for execution over CPUs (**cpu/image-classification**).
     - Benchmark tests (**benchmark-tf.ipynb**, which executes **tf_cnn_benchmarks.py**).
 
 - A **rapids** directory with applications that use NVIDIA's [RAPIDS](https://rapids.ai/) open-source libraries suite for executing end-to-end data science and analytics pipelines entirely on GPUs.
+
   The RAPIDS tutorials include the following:
 
     - Demo applications that use the [cuDF](https://rapidsai.github.io/projects/cudf/en/latest/index.html) RAPIDS GPU DataFrame library to perform batching and aggregation of data that's read from a Kafka stream, and then write the results to a Parquet file.<br>