From 57940b235042dd28b153ff78a18745264c9040b8 Mon Sep 17 00:00:00 2001 From: "mr.Shu" Date: Wed, 30 Oct 2019 12:22:38 +0100 Subject: [PATCH] neural-networks: Prepare for 2019 * Move all the necessary imports and downloads inline (in order for Google Colab to work). * Since Google Colab does not like the interactive plots, they had to be ditched. Signed-off-by: mr.Shu --- assignment3/neural_networks.ipynb | 1360 +++++++++++++++++------------ 1 file changed, 803 insertions(+), 557 deletions(-) diff --git a/assignment3/neural_networks.ipynb b/assignment3/neural_networks.ipynb index 9562327..662bac1 100644 --- a/assignment3/neural_networks.ipynb +++ b/assignment3/neural_networks.ipynb @@ -1,558 +1,804 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Third assignment: Neural Nets\n", - "\n", - "Welcome to the third assignment. As we promised last time, in this assigment you are not going to be bothered with implementation details or any enginiering work (well, sort of...). Rather, this time you will try to work with 'state-of-the-art' neural network library (well, sort of...) called **Keras**. Your task will be to play around with some models -- trying to fit them to some data and fine-tune their parameters.\n", - "\n", - "This notebook is going to have some special plots. Each plot should have a small 'power-off' button in the top-right corner. Once you've changed any given plot you **need** to turn it off by clicking this button, otherwise all your plots will be plotted into the currently active plot. The mess you will be exposed to really is not worth it.\n", - "\n", - "You may need a bit more computing power (or time...) for this assignment. We also have a special *bonus task ready* for you.\n", - "\n", - "So lets start as usual with some imports (we highly recomend you just run this part of the code and do not modify it)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", - "import numpy as np\n", - "from matplotlib import pyplot as plt\n", - "import utils\n", - "from keras.models import Sequential\n", - "from keras.layers import Conv2D, Dense, Activation, Flatten, MaxPooling2D\n", - "from keras.optimizers import SGD, Adam\n", - "from keras.utils import to_categorical\n", - "from keras import backend as K\n", - "\n", - "\n", - "%matplotlib notebook" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Lets take a look at our data. Today, we will start with 'datasets' which are commonly refered to as *toy datasets*. Function `show_plot()` can take 4 values as its paramets.\n", - "\n", - "* spiral\n", - "* circles\n", - "* sample\n", - "* anything else (this will results in unkown shape and data will be random)\n", - "\n", - "You can also add new datapoints into the 'dataset' by clicking onto the plot. **Right** click should add red datapoints and **left** click should add green datapoints." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X, y, p = utils.show_plot('spiral')\n", - "\n", - "print(\"Starting shape of data - X: {} | y: {}\".format(X.shape, y.shape))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Keras (https://keras.io/) is a high-level neural networks API, written in Python. 
As a backend it can use one of the more lower-level, 'research like' neural network libraries, such as, Tensorflow, Theano (whose development has sadly ended) or CNTK. \n", - "\n", - "All of these libraries operate on the same principle. Rather than building the whole model at once, they build **computational graphs** which are then used to compute gradients using the backpropagation algorithm (if you are interested you can read more about computational graphs and backpropagation here: http://cs231n.github.io/optimization-2/).\n", - "\n", - "**Keras** library was developed to be used as a quick prototyping tool. It allows you to specify your models and its parts as you would describe them on paper. It supports most of today's standard neural network building blocks as well as most of the deep learning models (Convolutional Neural Networks, Recurent Neural Networks, ...)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The core Keras datastructure is a model. The simplest available model is `Sequential()`, which is just a linear stack of layers (you can build more complex models through Keras functional API). This function returns a model datastructure, to which we can add layers using the `.add()` function.\n", - "\n", - "The simplest model we can build is called a **Multi Layered Perceptron (MLP)**. This model contains just fully-connected layers. In Keras these are called **Dense** layers. \n", - "\n", - "The first parameter to the layer is number of neurons it has. We can then specify the **activation function** to be applied on its outputs by setting the `activation` parameter. First layer of any network also needs to have the `input_dim` parameter set. This tells Keras what is the input dimension of the data that is going to be fed into the model. This is needed so that Keras can prepare its weights." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We prepared a simple example model that consits of two layers. The first one has 6 neurons and uses the `tanh` activation function. The second (output) layer has two neurons (one for each class) and softmax activation function. Values at the last layer can now be represented as the probability that current example is part of one of the two classes.\n", - "\n", - "After you specify your model, you need to compile it so that Keras can build its computational graph and prepare the weights. Each time you change your architecture you need to recompile your model (if you are not a fan of this, check out PyTorch or Chainer which build their graphs on the fly)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Our targets are now in one dimensional array with values 0 and 1. This would be fine if our model would output just the probability value at the end (we could for instance use one sigmoid neuron as the output layer). But since we have two neurons with softmax activation (as we want to interpret this as probability of assigment to each of the classes) we need to transform our data into something called **categorical variable** by adding what is called *dummy variables*.\n", - "\n", - "Each target variable now consits of 2 dimensions. The dimension that represents the *true* value in the previous one dimensional vector is set to **one**. Every other dimension si set to **zero**. Keras provides a handy function `to_categorical` that does this for us. 
Note that this process is sometimes also called one-hot encoding.\n", - "\n", - "Feel free to check what this actually does in the next two cells." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "p.y[90:110]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "to_categorical(p.y[90:110])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally we come to the last step, which is model training. This can be done using the `fit()` function. You will mostly need to provide a **loss** parameter (that is what loss is the model supposed to optimize), and you can also specify a set of metrics to be computed on the data by setting the **metrics** parameter (it expects a list of strings that represent respective metrics).\n", - "\n", - "Optimizer is a required parameter that specifies the algorithm that your model is going to be optimized with. In our case we are going to use standard stochastic gradient descent (SGD). \n", - "\n", - "Keras also prepares the traning/validation set split by using the `validation_split` parameter. It takes float values from 0.0 to 1.0 which represent the fraction of the train set which is going to be used for validation. In our case it is 10%, meaning we set `validation_split=0.1`.\n", - "\n", - "Let's try it all out in the next cell!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "mlp = Sequential()\n", - "mlp.add(Dense(6, activation='tanh', input_dim=X.shape[1]))\n", - "mlp.add(Dense(2, activation='softmax'))\n", - "\n", - "mlp.compile(loss='mse', optimizer=SGD(lr=0.4), metrics=['accuracy'])\n", - "\n", - "history = mlp.fit(p.X, to_categorical(p.y), epochs=30, validation_split=0.1, verbose=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Fit function returns a `history` object. This contains the history of all values of loss and metrics throughout the training. It is often very useful to plot these values, so that we can better understand what is our model doing (is the loss going down, are we overfitting/underfitting/...) throughout the training process." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure()\n", - "plt.plot(history.history['loss'], label='training loss')\n", - "plt.plot(history.history['val_loss'], label='validation loss')\n", - "plt.legend(loc='best')\n", - "\n", - "plt.figure()\n", - "plt.plot(history.history['acc'], label='train accuracy')\n", - "plt.plot(history.history['val_acc'], label='validation accuracy')\n", - "plt.legend(loc='best')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Keras also provides a handy function that alows us to eveluate our models on the metrics that we previously specified. For example:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "score = mlp.evaluate(p.X, to_categorical(p.y))\n", - "print(\"\\n\\nloss: {} | train acc: {}\".format(score[0], score[1]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also plot points that our model missclassified. 
Although this is by far only a toy example, error analysis like this is the key to both getting intuition regarding both the problem and the model at hand, and at the same time improving the model/looking differently at the problem in the future." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Z = np.argmax(mlp.predict(p.X), axis=1)\n", - "\n", - "wrong_points = p.X[Z != p.y]\n", - "\n", - "plt.figure()\n", - "plt.plot(p.X[:, 0], p.X[:, 1], 'wo')\n", - "plt.plot(wrong_points[:, 0], wrong_points[:, 1], 'ro')\n", - "plt.title('missclassified points')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For simple classification task like this we can also plot something called *decission boundary*. It shows a threshold line through the *class space*. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x_min, x_max = p.X[:, 0].min() - 1, p.X[:, 0].max() + 1\n", - "y_min, y_max = p.X[:, 1].min() - 1, p.X[:, 1].max() + 1\n", - "\n", - "xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),\n", - " np.arange(y_min, y_max, 0.02))\n", - "\n", - "Z = np.argmax(mlp.predict(np.c_[xx.ravel(), yy.ravel()]), axis=1)\n", - "\n", - "Z = Z.reshape(xx.shape)\n", - "\n", - "plt.figure()\n", - "plt.contourf(xx, yy, Z, alpha=0.6, cmap=plt.cm.coolwarm)\n", - "plt.axis('off')\n", - "\n", - "plt.scatter(p.X[:, 0], p.X[:, 1], c=p.y, cmap=plt.cm.coolwarm)\n", - "plt.title('Decision boundary')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Please, provide a brief explanation on what efect does changing the number of neurons or adding/subtracting layers have on the decision boundary and the resulting classification.\n", - "\n", - "You can also try to change the learning rate or make use of different activation functions or optimizers (check the docs at https://keras.io/ for more details)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "*Write your explanation here*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "----------------------------------------------------------------------------------------------------------\n", - "\n", - "As the second part of this assigment, we prepared a new (also often times called *toy*) dataset called **MNIST Fashion dataset**. Note that both the name and the content of this dataset is a \"play\" on the (to some extend) overused \"default\" dataset in Machine Learning which for a long time used to be (and in some cases still is) the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database).\n", - "\n", - "\n", - "This (fashion) dataset consits out of 60 000 training examples and 10 000 test examples. Each datapoint is a grayscale picture that is 28x28 pixels in size. It consists out of 10 classes: \n", - "\n", - "* T-shirt/top\n", - "* Trouser\n", - "* Pullover\n", - "* Dress\n", - "* Coat\n", - "* Sandal\n", - "* Shirt\n", - "* Sneaker\n", - "* Bag\n", - "* Ankle boot\n", - "\n", - "We already loaded this data for you so you can focus on optimizing your models. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_train, y_train = utils.load_mnist('train')\n", - "print(\"Train data shape: {} | labels: {}\".format(X_train.shape, y_train.shape))\n", - "\n", - "X_test, y_test = utils.load_mnist('test')\n", - "print(\"Test data shape: {} | labels: {}\".format(X_test.shape, y_test.shape))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "utils.visualize_mnist(X_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we start as before with a simple *MLP*. Since our data is in the shape of 28x28 pixel image we need to create a single vector out of each data point, so that it can be fed into our model.\n", - "\n", - "*Note: Neural networks in general are very sensitive to normalizations. It is almost always a good idea to normalize your data. With images it is usualy as simple as dividing by 255 which is the highest value that pixel can have.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_train = X_train.reshape(-1, 28 * 28)\n", - "X_test = X_test.reshape(-1, 28 * 28)\n", - "print(\"New train data shape: {} | test data: {}\".format(X_train.shape, X_test.shape))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We prepared a simple MLP with 3 layers for you. The first two with 100 neurons each use the `tanh` activaton function. At the end, there is again a softmax activation function with 10 neurons (one for each class). \n", - "\n", - "Here we are going to be optimizing **categorical_crossentropy** with our new optimizer called **Adam** (https://arxiv.org/pdf/1412.6980.pdf, https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/). We also prepared same code do display history objects." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "mlp = Sequential()\n", - "mlp.add(Dense(100, activation='tanh', input_dim=X_train.shape[1]))\n", - "mlp.add(Dense(100, activation='tanh'))\n", - "mlp.add(Dense(10, activation='softmax'))\n", - "\n", - "mlp.compile(loss='categorical_crossentropy', optimizer=Adam(0.04), metrics=['accuracy'])\n", - "\n", - "history = mlp.fit(X_train, to_categorical(y_train), epochs=30, validation_split=0.1, verbose=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure()\n", - "plt.plot(history.history['loss'], label='training loss')\n", - "plt.plot(history.history['val_loss'], label='validation loss')\n", - "plt.legend(loc='best')\n", - "\n", - "plt.figure()\n", - "plt.plot(history.history['acc'], label='train accuracy')\n", - "plt.plot(history.history['val_acc'], label='validation accuracy')\n", - "plt.legend(loc='best')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As you may recall, it is good practice to only compute test scores when you have already chosen the *best* model and only compute it **once**." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "train_score = mlp.evaluate(X_train, to_categorical(y_train))\n", - "print(\"\\n\\ntrain loss: {} | train acc: {}\\n\".format(train_score[0], train_score[1]))\n", - "\n", - "test_score = mlp.evaluate(X_test, to_categorical(y_test))\n", - "print(\"\\n\\ntest loss: {} | test acc: {}\".format(test_score[0], test_score[1]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, we are going to try to use a **Convolutional Neural Network** on our data. First, some setup. Since convolutions can directly process 2D data (which is how our images are represented), we again reshape our data to 28x28x1 (based on the backend or settings you are using this can also be 1x28x28) -- the last digit is the number of channels our images have. \n", - "\n", - "Since we only have grayscale images, the number of channels is going to be 1. If we had full RGB images this value would be 3." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "img_rows = 28\n", - "img_cols = 28\n", - "\n", - "if K.image_data_format() == 'channels_first':\n", - " X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)\n", - " X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)\n", - " input_shape = (1, img_rows, img_cols)\n", - "else:\n", - " X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)\n", - " X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)\n", - " input_shape = (img_rows, img_cols, 1)\n", - " \n", - "print(\"New train data shape: {} | test data: {}\".format(X_train.shape, X_test.shape))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A convolution neural network is again just another `Sequential` model. We can add convolutional layers as before by running `.add(Conv2D())`, where the first parameter is the number of filters, the second is the size of these filters followed again by an activation function and for the first layer we also specify the input shape. \n", - "\n", - "We again prepared a simple starting model. We start with two 2D convolutional layers, the first one having 32 filters and the second one with 64 filters, each of them of size 3x3. After the first two layers we added a MaxPoolling layer which just halved our output from the second convolutional layer by running max filter of size 2x2. At the end we have only one fully-connected layer with 10 neurons (again one for each class) with the softmax activation function.\n", - "\n", - "This time we are going to run our optimizer on **batches** of data. This means that we are going to take a batch of 512 examples, compute the loss on all of these examples and backpropage afterwards. 
This can significantly speed up the process of traning, since we are not computing loss after every single example.\n", - "\n", - "*Note: if you are planning to run this on school computers we suggest to start slow and run this on only chunk of the data and work your way up, for example start with just 10000 training images.* " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cnn = Sequential()\n", - "cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))\n", - "cnn.add(Conv2D(64, (3, 3), activation='relu'))\n", - "cnn.add(MaxPooling2D(pool_size=(2, 2)))\n", - "cnn.add(Flatten())\n", - "cnn.add(Dense(10, activation='softmax'))\n", - "\n", - "cnn.compile(loss='categorical_crossentropy', optimizer=Adam(0.04), metrics=['accuracy'])\n", - "\n", - "history = cnn.fit(X_train, to_categorical(y_train), epochs=10, batch_size=512, \n", - " validation_split=0.1, verbose=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "plt.figure()\n", - "plt.plot(history.history['loss'], label='training loss')\n", - "plt.plot(history.history['val_loss'], label='validation loss')\n", - "plt.legend(loc='best')\n", - "\n", - "plt.figure()\n", - "plt.plot(history.history['acc'], label='train accuracy')\n", - "plt.plot(history.history['val_acc'], label='validation accuracy')\n", - "plt.legend(loc='best')\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "train_score = mlp.evaluate(X_train, to_categorical(y_train))\n", - "print(\"\\n\\ntrain loss: {} | train acc: {}\\n\".format(train_score[0], train_score[1]))\n", - "\n", - "test_score = mlp.evaluate(X_test, to_categorical(y_test))\n", - "print(\"\\n\\ntest loss: {} | test acc: {}\".format(test_score[0], test_score[1]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## For bonus points:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this special bonus you are going to be competing against each other. We provided you with some starting code (also check out utils.py) for the last dataset.\n", - "\n", - "Now, **top solutions** on this Fashion MNIST dataset (in terms of **testing accuracy**) are going to be awarded bonus points. Furthermore, the **best report** (as evaluated by the TAs) will be awarded bonus points as well.\n", - "\n", - "As your final submission, we would like to get the following from you: \n", - "\n", - "* This notebook, fully filled out.\n", - "* Everything necessary to get your results. **Your accuracy needs to be reproducible!** This means that we need to be able to get the same accuracy when we run your scripts.\n", - "* Max 2 page report (preferably in [Lecture Notes of Computer Science](https://github.com/latextemplates/LNCS) format) describing your approach and your results.\n", - "\n", - "Some more notes:\n", - "\n", - "* Your *testing accuracy* needs to be **at least 70%** in order for your solution to participate.\n", - "* Your models need to be neural-network based.\n", - "* Your models need to be trained on *just* the **training** data.\n", - "* Your models need to be implemented by you. 
If you are going to use a model that has already been described/used before, make sure you reference it in your report.\n", - "\n", - "\n", - "If you are stuck or would appreciate some help with this bonus, feel free to reach out to the [TAs](http://compbio.fmph.uniba.sk/vyuka/ml/).\n", - "\n", - "Good luck!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python", - "name": "python2" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.3" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "colab": { + "name": "neural_networks.ipynb", + "provenance": [] + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "VUEd8Ol_uWTo", + "colab_type": "text" + }, + "source": [ + "## Third assignment: Neural Nets\n", + "\n", + "Welcome to the third assignment. As we promised last time, in this assigment you are not going to be bothered with implementation details or any enginiering work (well, sort of...). Rather, this time you will try to work with 'state-of-the-art' neural network library (well, sort of...) called **Keras**. Your task will be to play around with some models -- trying to fit them to some data and fine-tune their parameters.\n", + "\n", + "You may need a bit more computing power (or time...) for this assignment. Thanks to Google Colab this should not be such a big deal but still, expect to spend quite some time fiddling with this assignment. We also have a special *bonus task ready* for you.\n", + "\n", + "So lets start as usual with some imports (we highly recomend you just run this part of the code and do not modify it)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "yvGrw_jXu0B-", + "colab_type": "code", + "colab": {} + }, + "source": [ + "from sklearn.datasets import make_circles, make_moons, make_blobs\n", + "from matplotlib import pyplot as plt\n", + "import numpy as np\n", + "import gzip\n", + "from matplotlib import pyplot as plt\n", + "from keras.models import Sequential\n", + "from keras.layers import Conv2D, Dense, Activation, Flatten, MaxPooling2D\n", + "from keras.optimizers import SGD, Adam\n", + "from keras.utils import to_categorical\n", + "from keras import backend as K\n", + "\n", + "\n", + "def get_data(shape='circles', n_samples=200, std=0.1):\n", + " if (shape == 'sample'):\n", + " X, y = make_blobs(n_samples, centers=2)\n", + " elif(shape == 'circles'):\n", + " X, y = make_circles(n_samples, noise=std)\n", + " elif(shape == 'spiral'):\n", + " X, y = make_moons(n_samples, noise=std)\n", + " else:\n", + " print(\"Unknow shape! 
Drawing random sample data.\")\n", + " X = np.random.rand(n_samples, 2) * std\n", + " y = np.random.randint(0, 2, n_samples)\n", + " return X, y\n", + "\n", + "\n", + "class PlotBuilder:\n", + "\n", + " def __init__(self, plot, n_samples, X, y):\n", + " self.X = X\n", + " self.y = y\n", + " self.plot = plot\n", + " x0_data = np.array(plot[0].get_xdata())\n", + " y0_data = np.array(plot[0].get_ydata())\n", + " x1_data = np.array(plot[1].get_xdata())\n", + " y1_data = np.array(plot[1].get_ydata())\n", + "\n", + " self.class_0 = np.zeros((len(x0_data), 2))\n", + " self.class_1 = np.zeros((len(x1_data), 2))\n", + "\n", + " self.class_0[:, 0] = x0_data\n", + " self.class_0[:, 1] = y0_data\n", + "\n", + " self.class_1[:, 0] = x1_data\n", + " self.class_1[:, 1] = y1_data\n", + "\n", + " plot[0].figure.canvas.mpl_connect('button_press_event', self)\n", + " plot[1].figure.canvas.mpl_connect('button_press_event', self)\n", + "\n", + " def __call__(self, event):\n", + " if event.inaxes != self.plot[0].axes:\n", + " return\n", + "\n", + " if event.inaxes != self.plot[1].axes:\n", + " return\n", + "\n", + " if event.button == 1:\n", + " # right click\n", + " self.class_0 = np.vstack((self.class_0,\n", + " [event.xdata,\n", + " event.ydata]))\n", + " elif event.button == 3:\n", + " # left click\n", + " self.class_1 = np.vstack((self.class_1,\n", + " [event.xdata,\n", + " event.ydata]))\n", + "\n", + " self.X = np.vstack((self.class_0, self.class_1))\n", + " self.y = np.array([0] * len(self.class_0) + [1] * len(self.class_1))\n", + " self.plot[0].set_data(self.class_0[:, 0], self.class_0[:, 1])\n", + " self.plot[1].set_data(self.class_1[:, 0], self.class_1[:, 1])\n", + " self.plot[0].figure.canvas.draw()\n", + " self.plot[1].figure.canvas.draw()\n", + "\n", + "\n", + "def show_plot(shape='cirles', n_samples=200, std=0.1):\n", + " X, y = get_data(shape, n_samples, std)\n", + " X_0 = X[y == 0]\n", + " X_1 = X[y == 1]\n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111)\n", + " ax.set_title('Generated data of \"{}\" shape'.format(shape))\n", + " class_1, = ax.plot(X_0[:, 0], X_0[:, 1], 'go')\n", + " class_2, = ax.plot(X_1[:, 0], X_1[:, 1], 'ro')\n", + " p = PlotBuilder([class_1, class_2], n_samples, X, y)\n", + "\n", + " plt.show()\n", + " return p.X, p.y, p\n", + "\n", + "\n", + "mlp.add(Dense(100, activation='tanh'))\n", + "def load_mnist(kind):\n", + " labels_path = './{}-labels-idx1-ubyte.gz'.format(kind)\n", + " images_path = './{}-images-idx3-ubyte.gz'.format(kind)\n", + "\n", + " with gzip.open(labels_path, 'rb') as lbpath:\n", + " labels = np.frombuffer(lbpath.read(), dtype=np.uint8,\n", + " offset=8)\n", + "\n", + " with gzip.open(images_path, 'rb') as imgpath:\n", + " images = np.frombuffer(imgpath.read(), dtype=np.uint8,\n", + " offset=16).reshape(len(labels), 784)\n", + " return images.reshape(len(images), 28, 28), labels\n", + "\n", + "\n", + "def visualize_mnist(X, y):\n", + " label_idx = [y == i for i in range(10)]\n", + " ims = np.array([X[label_idx[i]][:10] for i in range(10)])\n", + "\n", + " classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n", + " 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\n", + "\n", + " f, axarr = plt.subplots(10, 10, figsize=(12, 12))\n", + " for i in range(10):\n", + " axarr[0][i].set_title(classes[i])\n", + " for j in range(10):\n", + " axarr[i][j].imshow(ims[j][i], 'gray')\n", + " axarr[i][j].axis('off')\n", + "\n", + " plt.show()\n", + "\n", + "# Download data into the notebook environment\n", + "!wget -q 
https://github.com/NaiveNeuron/ml_exercises/raw/master/assignment3/test-images-idx3-ubyte.gz\n", + "!wget -q https://github.com/NaiveNeuron/ml_exercises/raw/master/assignment3/train-images-idx3-ubyte.gz\n", + "!wget -q https://github.com/NaiveNeuron/ml_exercises/raw/master/assignment3/test-labels-idx1-ubyte.gz\n", + "!wget -q https://github.com/NaiveNeuron/ml_exercises/raw/master/assignment3/train-labels-idx1-ubyte.gz\n", + "\n", + "# Ensure matplotlib images are shown\n", + "%matplotlib inline" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WbfMigjpuWUE", + "colab_type": "text" + }, + "source": [ + "Lets take a look at our data. Today, we will start with 'datasets' which are commonly refered to as *toy datasets*. Function `show_plot()` can take 4 values as its paramets.\n", + "\n", + "* spiral\n", + "* circles\n", + "* sample\n", + "* anything else (this will results in unkown shape and data will be random)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JZWBy-FquWUI", + "colab_type": "code", + "colab": {} + }, + "source": [ + "X, y, p = show_plot('spiral')\n", + "\n", + "print(\"Starting shape of data - X: {} | y: {}\".format(X.shape, y.shape))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0_gwVxx4uWUP", + "colab_type": "text" + }, + "source": [ + "Keras (https://keras.io/) is a high-level neural networks API, written in Python. As a backend it can use one of the more lower-level, 'research like' neural network libraries, such as, Tensorflow, Theano (whose development has sadly ended) or CNTK. \n", + "\n", + "All of these libraries operate on the same principle. Rather than building the whole model at once, they build **computational graphs** which are then used to compute gradients using the backpropagation algorithm (if you are interested you can read more about computational graphs and backpropagation here: http://cs231n.github.io/optimization-2/).\n", + "\n", + "**Keras** library was developed to be used as a quick prototyping tool. It allows you to specify your models and its parts as you would describe them on paper. It supports most of today's standard neural network building blocks as well as most of the deep learning models (Convolutional Neural Networks, Recurent Neural Networks, ...)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qQtiyY38uWUS", + "colab_type": "text" + }, + "source": [ + "The core Keras datastructure is a model. The simplest available model is `Sequential()`, which is just a linear stack of layers (you can build more complex models through Keras functional API). This function returns a model datastructure, to which we can add layers using the `.add()` function.\n", + "\n", + "The simplest model we can build is called a **Multi Layered Perceptron (MLP)**. This model contains just fully-connected layers. In Keras these are called **Dense** layers. \n", + "\n", + "The first parameter to the layer is number of neurons it has. We can then specify the **activation function** to be applied on its outputs by setting the `activation` parameter. First layer of any network also needs to have the `input_dim` parameter set. This tells Keras what is the input dimension of the data that is going to be fed into the model. This is needed so that Keras can prepare its weights." 
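+        ,"\n",
+        "As a purely illustrative sketch (assuming the 2-dimensional toy data from above, hence `input_dim=2`), such a model could be declared like this:\n",
+        "\n",
+        "```python\n",
+        "from keras.models import Sequential\n",
+        "from keras.layers import Dense\n",
+        "\n",
+        "model = Sequential()\n",
+        "# 6 neurons with tanh activation; input_dim is only required on the first layer\n",
+        "model.add(Dense(6, activation='tanh', input_dim=2))\n",
+        "# one output neuron per class; softmax turns the outputs into class probabilities\n",
+        "model.add(Dense(2, activation='softmax'))\n",
+        "model.summary()\n",
+        "```"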
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GYCazihRuWUU", + "colab_type": "text" + }, + "source": [ + "We prepared a simple example model that consits of two layers. The first one has 6 neurons and uses the `tanh` activation function. The second (output) layer has two neurons (one for each class) and softmax activation function. Values at the last layer can now be represented as the probability that current example is part of one of the two classes.\n", + "\n", + "After you specify your model, you need to compile it so that Keras can build its computational graph and prepare the weights. Each time you change your architecture you need to recompile your model (if you are not a fan of this, check out PyTorch or Chainer which build their graphs on the fly)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H5p2maq2uWUX", + "colab_type": "text" + }, + "source": [ + "Our targets are now in one dimensional array with values 0 and 1. This would be fine if our model would output just the probability value at the end (we could for instance use one sigmoid neuron as the output layer). But since we have two neurons with softmax activation (as we want to interpret this as probability of assigment to each of the classes) we need to transform our data into something called **categorical variable** by adding what is called *dummy variables*.\n", + "\n", + "Each target variable now consits of 2 dimensions. The dimension that represents the *true* value in the previous one dimensional vector is set to **one**. Every other dimension si set to **zero**. Keras provides a handy function `to_categorical` that does this for us. Note that this process is sometimes also called one-hot encoding.\n", + "\n", + "Feel free to check what this actually does in the next two cells." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CpbhMZRSuWUZ", + "colab_type": "code", + "colab": {} + }, + "source": [ + "y[90:110]" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ogOIKVTBuWUh", + "colab_type": "code", + "colab": {} + }, + "source": [ + "to_categorical(y[90:110])" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gEuVLp24uWUo", + "colab_type": "text" + }, + "source": [ + "Finally we come to the last step, which is model training. This can be done using the `fit()` function. You will mostly need to provide a **loss** parameter (that is what loss is the model supposed to optimize), and you can also specify a set of metrics to be computed on the data by setting the **metrics** parameter (it expects a list of strings that represent respective metrics).\n", + "\n", + "Optimizer is a required parameter that specifies the algorithm that your model is going to be optimized with. In our case we are going to use standard stochastic gradient descent (SGD). \n", + "\n", + "Keras also prepares the traning/validation set split by using the `validation_split` parameter. It takes float values from 0.0 to 1.0 which represent the fraction of the train set which is going to be used for validation. In our case it is 10%, meaning we set `validation_split=0.1`.\n", + "\n", + "Let's try it all out in the next cell!" 
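+        ,"\n",
+        "Purely as a schematic (the next cell does this for real with our toy data), here is roughly how the arguments described above map onto the two-layer model sketched earlier:\n",
+        "\n",
+        "```python\n",
+        "model.compile(loss='binary_crossentropy',  # the loss being optimized\n",
+        "              optimizer=SGD(lr=0.4),       # the optimization algorithm\n",
+        "              metrics=['accuracy'])        # extra metrics to report\n",
+        "\n",
+        "history = model.fit(X, to_categorical(y),\n",
+        "                    epochs=30,\n",
+        "                    validation_split=0.1)  # hold out 10% of the data for validation\n",
+        "```"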
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MsDxjEDquWUs", + "colab_type": "code", + "colab": {} + }, + "source": [ + "mlp = Sequential()\n", + "mlp.add(Dense(6, activation='tanh', input_dim=X.shape[1]))\n", + "mlp.add(Dense(2, activation='softmax'))\n", + "\n", + "mlp.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.4), metrics=['accuracy'])\n", + "\n", + "history = mlp.fit(p.X, to_categorical(p.y), epochs=30, validation_split=0.1, verbose=True)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cCyGttpRuWUz", + "colab_type": "text" + }, + "source": [ + "Fit function returns a `history` object. This contains the history of all values of loss and metrics throughout the training. It is often very useful to plot these values, so that we can better understand what is our model doing (is the loss going down, are we overfitting/underfitting/...) throughout the training process." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "m2UQwXu9uWU3", + "colab_type": "code", + "colab": {} + }, + "source": [ + "plt.figure()\n", + "plt.plot(history.history['loss'], label='training loss')\n", + "plt.plot(history.history['val_loss'], label='validation loss')\n", + "plt.legend(loc='best')\n", + "\n", + "plt.figure()\n", + "plt.plot(history.history['acc'], label='train accuracy')\n", + "plt.plot(history.history['val_acc'], label='validation accuracy')\n", + "plt.legend(loc='best')\n", + "plt.show()" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uPDDZ5ZUuWWN", + "colab_type": "text" + }, + "source": [ + "Keras also provides a handy function that alows us to eveluate our models on the metrics that we previously specified. For example:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "V8o58HWMuWWu", + "colab_type": "code", + "colab": {} + }, + "source": [ + "score = mlp.evaluate(p.X, to_categorical(p.y))\n", + "print(\"\\n\\nloss: {} | train acc: {}\".format(score[0], score[1]))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QVTY8YtOuWXB", + "colab_type": "text" + }, + "source": [ + "We can also plot points that our model missclassified. Although this is by far only a toy example, error analysis like this is the key to both getting intuition regarding both the problem and the model at hand, and at the same time improving the model/looking differently at the problem in the future." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IaYTk84MuWXL", + "colab_type": "code", + "colab": {} + }, + "source": [ + "Z = np.argmax(mlp.predict(p.X), axis=1)\n", + "\n", + "wrong_points = p.X[Z != p.y]\n", + "wrong_classifications = p.y[Z != p.y]\n", + "\n", + "plt.figure()\n", + "\n", + "axes = plt.gca()\n", + "x_min, x_max = p.X[:, 0].min() - 1, p.X[:, 0].max() + 1\n", + "y_min, y_max = p.X[:, 1].min() - 1, p.X[:, 1].max() + 1\n", + "axes.set_xlim([x_min, x_max])\n", + "axes.set_ylim([y_min, y_max])\n", + "\n", + "plt.scatter(wrong_points[:, 0], wrong_points[:, 1], c=wrong_classifications, cmap=plt.cm.coolwarm)\n", + "plt.title('Missclassified points')\n", + "plt.show()" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fTKVZwOOuWYF", + "colab_type": "text" + }, + "source": [ + "For simple classification task like this we can also plot something called *decission boundary*. It shows a threshold line through the *class space*. 
" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UD68wf_ouWYa", + "colab_type": "code", + "colab": {} + }, + "source": [ + "xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),\n", + " np.arange(y_min, y_max, 0.02))\n", + "\n", + "Z = np.argmax(mlp.predict(np.c_[xx.ravel(), yy.ravel()]), axis=1)\n", + "\n", + "Z = Z.reshape(xx.shape)\n", + "\n", + "plt.figure()\n", + "plt.contourf(xx, yy, Z, alpha=0.6, cmap=plt.cm.coolwarm)\n", + "\n", + "plt.scatter(p.X[:, 0], p.X[:, 1], c=p.y, cmap=plt.cm.coolwarm)\n", + "plt.title('Decision boundary')\n", + "plt.show()" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "edNZHB_juWZQ", + "colab_type": "text" + }, + "source": [ + "Please, provide a brief explanation on what efect does changing the number of neurons or adding/subtracting layers have on the decision boundary and the resulting classification.\n", + "\n", + "You can also try to change the learning rate or make use of different activation functions or optimizers (check the docs at https://keras.io/ for more details)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8KnrUYtmuWZu", + "colab_type": "text" + }, + "source": [ + "*Write your explanation here*" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "habZuLrTuWZ8", + "colab_type": "text" + }, + "source": [ + "----------------------------------------------------------------------------------------------------------\n", + "\n", + "As the second part of this assigment, we prepared a new (also often times called *toy*) dataset called **MNIST Fashion dataset**. Note that both the name and the content of this dataset is a \"play\" on the (to some extend) overused \"default\" dataset in Machine Learning which for a long time used to be (and in some cases still is) the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database).\n", + "\n", + "\n", + "This (fashion) dataset consits out of 60 000 training examples and 10 000 test examples. Each datapoint is a grayscale picture that is 28x28 pixels in size. It consists out of 10 classes: \n", + "\n", + "* T-shirt/top\n", + "* Trouser\n", + "* Pullover\n", + "* Dress\n", + "* Coat\n", + "* Sandal\n", + "* Shirt\n", + "* Sneaker\n", + "* Bag\n", + "* Ankle boot\n", + "\n", + "We already loaded this data for you so you can focus on optimizing your models. " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "p9XQzyPnuWaL", + "colab_type": "code", + "colab": {} + }, + "source": [ + "X_train, y_train = load_mnist('train')\n", + "print(\"Train data shape: {} | labels: {}\".format(X_train.shape, y_train.shape))\n", + "\n", + "X_test, y_test = load_mnist('test')\n", + "print(\"Test data shape: {} | labels: {}\".format(X_test.shape, y_test.shape))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "z0P6TGMBuWcp", + "colab_type": "code", + "colab": {} + }, + "source": [ + "visualize_mnist(X_train, y_train)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ZEbjCazuWdS", + "colab_type": "text" + }, + "source": [ + "First, we start as before with a simple *MLP*. Since our data is in the shape of 28x28 pixel image we need to create a single vector out of each data point, so that it can be fed into our model.\n", + "\n", + "*Note: Neural networks in general are very sensitive to input normalization. It is almost always a good idea to normalize your data. 
With images it can usualy be as simple as dividing any value by 255 which is the highest value that a pixel can have.*" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bWGASiIFuWdy", + "colab_type": "code", + "colab": {} + }, + "source": [ + "X_train = X_train.reshape(-1, 28 * 28)\n", + "X_test = X_test.reshape(-1, 28 * 28)\n", + "print(\"New train data shape: {} | test data: {}\".format(X_train.shape, X_test.shape))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kclULjLPuWeW", + "colab_type": "text" + }, + "source": [ + "We prepared a simple MLP with 3 layers for you. The first two with 100 neurons each use the `tanh` activaton function. At the end, there is again a softmax activation function with 10 neurons (one for each class). \n", + "\n", + "Here we are going to be optimizing **categorical_crossentropy** with our new optimizer called **Adam** (https://arxiv.org/pdf/1412.6980.pdf, https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/). We also prepared same code do display history objects." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "B2mtkwYfuWe-", + "colab_type": "code", + "colab": {} + }, + "source": [ + "mlp = Sequential()\n", + "mlp.add(Dense(100, activation='tanh', input_dim=X_train.shape[1]))\n", + "mlp.add(Dense(100, activation='tanh'))\n", + "mlp.add(Dense(10, activation='softmax'))\n", + "\n", + "mlp.compile(loss='categorical_crossentropy', optimizer=Adam(0.04), metrics=['accuracy'])\n", + "\n", + "history = mlp.fit(X_train, to_categorical(y_train), epochs=30, validation_split=0.1, verbose=True)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q5GF_f66uWfd", + "colab_type": "code", + "colab": {} + }, + "source": [ + "plt.figure()\n", + "plt.plot(history.history['loss'], label='training loss')\n", + "plt.plot(history.history['val_loss'], label='validation loss')\n", + "plt.legend(loc='best')\n", + "\n", + "plt.figure()\n", + "plt.plot(history.history['acc'], label='train accuracy')\n", + "plt.plot(history.history['val_acc'], label='validation accuracy')\n", + "plt.legend(loc='best')\n", + "plt.show()" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8Fr6hpfLuWf-", + "colab_type": "text" + }, + "source": [ + "As you may recall, it is good practice to only compute test scores when you have already chosen the *best* model and only compute it **once**." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qnfqmAGmuWgN", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_score = mlp.evaluate(X_train, to_categorical(y_train))\n", + "print(\"\\n\\ntrain loss: {} | train acc: {}\\n\".format(train_score[0], train_score[1]))\n", + "\n", + "test_score = mlp.evaluate(X_test, to_categorical(y_test))\n", + "print(\"\\n\\ntest loss: {} | test acc: {}\".format(test_score[0], test_score[1]))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8T1A4VccuWgn", + "colab_type": "text" + }, + "source": [ + "Now, we are going to try to use a **Convolutional Neural Network** on our data. First, some setup. Since convolutions can directly process 2D data (which is how our images are represented), we again reshape our data to 28x28x1 (based on the backend or settings you are using this can also be 1x28x28) -- the last digit is the number of channels our images have. 
\n", + "\n", + "Since we only have grayscale images, the number of channels is going to be 1. If we had full RGB images this value would be 3." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JKkp0JWNuWg2", + "colab_type": "code", + "colab": {} + }, + "source": [ + "img_rows = 28\n", + "img_cols = 28\n", + "\n", + "if K.image_data_format() == 'channels_first':\n", + " X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)\n", + " X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)\n", + " input_shape = (1, img_rows, img_cols)\n", + "else:\n", + " X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)\n", + " X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)\n", + " input_shape = (img_rows, img_cols, 1)\n", + " \n", + "print(\"New train data shape: {} | test data: {}\".format(X_train.shape, X_test.shape))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X_Vf9FsIuWhW", + "colab_type": "text" + }, + "source": [ + "A convolution neural network is again just another `Sequential` model. We can add convolutional layers as before by running `.add(Conv2D())`, where the first parameter is the number of filters, the second is the size of these filters followed again by an activation function and for the first layer we also specify the input shape. \n", + "\n", + "We again prepared a simple starting model. We start with two 2D convolutional layers, the first one having 32 filters and the second one with 64 filters, each of them of size 3x3. After the first two layers we added a MaxPoolling layer which just halved our output from the second convolutional layer by running max filter of size 2x2. At the end we have only one fully-connected layer with 10 neurons (again one for each class) with the softmax activation function.\n", + "\n", + "This time we are going to run our optimizer on **batches** of data. This means that we are going to take a batch of 512 examples, compute the loss on all of these examples and backpropage afterwards. 
This can significantly speed up the process of traning, since we are not computing loss after every single example.\n", + "\n", + "*Note: if you are planning to run this on school computers we suggest to start slow and run this on only chunk of the data and work your way up, for example start with just 10000 training images.* " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bNy8H5KouWhY", + "colab_type": "code", + "colab": {} + }, + "source": [ + "cnn = Sequential()\n", + "cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))\n", + "cnn.add(Conv2D(64, (3, 3), activation='relu'))\n", + "cnn.add(MaxPooling2D(pool_size=(2, 2)))\n", + "cnn.add(Flatten())\n", + "cnn.add(Dense(10, activation='softmax'))\n", + "\n", + "cnn.compile(loss='categorical_crossentropy', optimizer=Adam(0.04), metrics=['accuracy'])\n", + "\n", + "history = cnn.fit(X_train, to_categorical(y_train), epochs=10, batch_size=512, \n", + " validation_split=0.1, verbose=True)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wUSgCUmouWiK", + "colab_type": "code", + "colab": {} + }, + "source": [ + "plt.figure()\n", + "plt.plot(history.history['loss'], label='training loss')\n", + "plt.plot(history.history['val_loss'], label='validation loss')\n", + "plt.legend(loc='best')\n", + "\n", + "plt.figure()\n", + "plt.plot(history.history['acc'], label='train accuracy')\n", + "plt.plot(history.history['val_acc'], label='validation accuracy')\n", + "plt.legend(loc='best')\n", + "plt.show()" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CLUYOCmcuWjS", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_score = mlp.evaluate(X_train, to_categorical(y_train))\n", + "print(\"\\n\\ntrain loss: {} | train acc: {}\\n\".format(train_score[0], train_score[1]))\n", + "\n", + "test_score = mlp.evaluate(X_test, to_categorical(y_test))\n", + "print(\"\\n\\ntest loss: {} | test acc: {}\".format(test_score[0], test_score[1]))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kOxUR0aAuWkk", + "colab_type": "text" + }, + "source": [ + "## For bonus points:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PLfs5TUvuWkl", + "colab_type": "text" + }, + "source": [ + "In this special bonus you are going to be competing against each other (and the outside world). We provided you with some starting code (also check out the first cell of this notebook) for the last dataset.\n", + "\n", + "Now, **top solutions** on this Fashion MNIST dataset (in terms of **testing accuracy**) are going to be awarded bonus points. Furthermore, the **best report** (as evaluated by the TAs) will be awarded bonus points as well.\n", + "\n", + "As your final submission, we would like to get the following from you: \n", + "\n", + "* This notebook, fully filled out.\n", + "* Everything necessary to get your results. 
**Your accuracy needs to be reproducible!** This means that we need to be able to get the same accuracy when we run your scripts.\n", + "* Max 2 page report (preferably in [Lecture Notes of Computer Science](https://github.com/latextemplates/LNCS) format) describing your approach and your results.\n", + "\n", + "Some more notes:\n", + "\n", + "* Your *testing accuracy* needs to be **at least 70%** in order for your solution to participate.\n", + "* Your models need to be neural-network based.\n", + "* Your models need to be trained on *just* the **training** data.\n", + "* Your models need to be implemented by you. If you are going to use a model that has already been described/used before, make sure you reference it in your report.\n", + "\n", + "\n", + "If you are stuck or would appreciate some help with this bonus, feel free to reach out to the [TAs](http://compbio.fmph.uniba.sk/vyuka/ml/).\n", + "\n", + "Good luck!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9xfYsHx-uWkv", + "colab_type": "code", + "colab": {} + }, + "source": [ + "" + ], + "execution_count": 0, + "outputs": [] + } + ] +} \ No newline at end of file