Famous MNIST dataset ported to Deno land.
This package downloads the MNIST dataset, unpacks it and stores on users's machine. To load the dataset:
import { loadMnist } from "https://deno.land/x/[email protected]/mod.ts";
const mnist = await loadMnist();
Dataset is split up in two parts: train data (60,000 images) and test data (10,000) images. These arrays are, in turn ordered so that first part contains easier to recognize images, than the second part. Why is it so described on Yann LeCun's original page. So, yo probably want to shuffle those images first, for that there is a shuffle util:
import { loadMnist, shuffle } from "https://deno.land/x/[email protected]/mod.ts";
const mnist = await loadMnist();
const trainData = shuffle(mnist.train);
Each image array consist of pairs – image
and it's label
. Image is an array of 784 (28×28)
integers from 0 to 255. 0 represents clear paper, 255 – the deepest (black) ink. You can normalize
these images to values between 0 and 1 using normalize()
utility function:
const trainData = shuffle(mnist.train).map(d => {label: d.label, image: normalize(d.image)});
Label is, of course, the digit that is encoded in the array. You can look at what that digit looks
like using printDigit
function, e.g.:
console.log(printDigit(mnist.test[3378].image));
console.log(mnist.test[3378].label);
Will output:
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ▓ ▓ ░ ░ ░ ░ ░ ░ ░ ▒ ▓ ▓ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ▓ █ █ █ █ █ █ █ █ █ ▓ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ █ █ █ █ █ █ ▓ █ █ █ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ▓ ░ ░ ░ ░ ░ ░ ▒ █ █ ▓ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▓ █ ▓ ░ ░ ░ ░ ░ ░ █ █ ▓ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▓ █ ▒ ░ ░ ░ ░ ░ ▓ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ░ ░ ░ ░ ░ ░ █ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ▓ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▓ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ ▓ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
7
The only useful function that is left in utils is downscaleImage
it wil turn 784 (28×28) array into
196 (14×14) array:
console.log(downscaleImage(mnist.test[3378].image).length); // -> 196
console.log(printDigit(downscaleImage(mnist.test[3378].image)));
Down-scaled output is:
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ▒ ▓ ▒ ▒ ▓ █ ▓ ░ ░ ░
░ ░ ░ ░ ▓ ▓ ▓ ▓ ▓ █ ▒ ░ ░ ░
░ ░ ░ ░ █ ▒ ░ ░ ▓ ▓ ░ ░ ░ ░
░ ░ ░ ░ ▒ ░ ░ ░ █ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ▓ ▓ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ █ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ▒ ▓ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ▓ ▒ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ █ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ █ ░ ░ ░ ░ ░ ░ ░
Data is packed in gzip files and will be unpacked on first run, so don't forget to add
--allow-read
and --allow-write
flags when you first run your program that uses the dataset.
There's only one option cacheDir
which tells the loader where to store the unpacked dataset. By
default it creates .mnist_data
folder with .gitignore
file in it, which ignores the folder
content. You can change this behavior, e.g.:
import { loadMnist } from "https://deno.land/x/[email protected]/mod.ts";
const mnist = await loadMnist({ cacheDir: "/some/other/path" });
Note that in case you defined the folder yourself, the loader won't make .gitignore
in it.