Dataset:
CIFAR-10 Dataset Download Link: https://drive.google.com/file/d/1XpBBM95ZhlF_QuDc54rLQalbQ1aDfv-m/view?usp=sharing
Unzip the file, create a directory named "data", and put "cifar10png" inside it.
Environment Requirements:
- Python == 3.6.9
- numpy == 1.16.5 (anaconda* / pip)
- tensorflow-gpu == 1.14 (anaconda* / pip)
- tensorboard == 1.14 (anaconda* / pip)
- opencv-python == 4.1.1.26 (pip*)
- sklearn == 0.0 (pip*)
( * indicates the installer I used for that package )
Folder Directory Structure:
```
checkpoints/
convert/
data/
    cifar10png/
graphs/
models/
src/
    config.py
utils/
weights/
dataset.py
trainer.py
run.py
```
How to Run:
1. Execute run.py:
```python
T = Trainer()
T.optimize(num_epoch=1)
T.export()
```
Option: select the number of epochs for training via num_epoch.
2. (Optional) Edit the config in src/config.py:
i. Select a different model:
```python
model_arch = "lenet5"

if model_arch == "mobilenetv2":
    img_size = 224
if model_arch == "lenet5":
    img_size = 32
```
Option: the currently supported models are lenet5 and mobilenetv2.
ii. Change the validation size:
```python
# batch size
batch_size = 32
# validation split
validation_size = .16
```
Option: the default batch size is 32 and the train : validation split is 84 : 16.
3. Execute run_test.py after you obtain the lenet5.pb model in weights to get the test-set accuracy.
How to Read Results: After a successful run, TensorBoard writes a result graph to ./graphs/{network name}. Change directory to ./graphs/lenet5 and type in the console: tensorboard --logdir=./
Here is the result of mobilenetv2, acc ≈ 0.84, after 30 epochs of training:
To view the results of the two models, download the following TensorBoard graphs: https://drive.google.com/file/d/1etOCcu3eAZvEH_KY_rYFKighLdfT7CIt/view?usp=sharing
Explanation About the Code:
How are the batches of data loaded to train the network?
Please read dataset.py:
- It reads the images and labels.
- Image paths are stored as the full paths of the images.
- Images undergo normalization and standardization when the batches are loaded into the model:
a. Normalization: multiply by 1/255, so values change from 0-255 to 0-1.
b. Standardization: subtract the per-channel mean and divide by the per-channel standard deviation:
```python
image_normalize_mean = [0.485, 0.456, 0.406]
image_normalize_std = [0.229, 0.224, 0.225]

for i in range(start, end):
    image = cv2.imread(self._images_path[i])
    image = cv2.resize(image, (self._image_size, self._image_size), cv2.INTER_LINEAR)
    images.append(image)

images = np.array(images)
images = images.astype(np.float32)

# Normalization and Standardization
images = np.multiply(images, 1.0 / 255.0)
for image in images:
    for channel in range(3):
        image[:, :, channel] -= image_normalize_mean[channel]
        image[:, :, channel] /= image_normalize_std[channel]
```
Detailed explanation: https://towardsdatascience.com/normalization-vs-standardization-cb8fe15082eb
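As a quick worked example (illustrative numbers, not taken from the code): a red-channel pixel value of 128 is first normalized to 128 / 255 ≈ 0.502, then standardized to (0.502 - 0.485) / 0.229 ≈ 0.07.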
- Labels are in the form [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], where the 1 marks the ground-truth class of the image and the length is 10 since there are 10 classes.
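For illustration, here is one way such a one-hot vector could be built from a class index (a NumPy sketch, not necessarily the exact code in dataset.py):
```python
import numpy as np

num_classes = 10
label_index = 0                        # e.g. the first class folder in cifar10png

one_hot = np.zeros(num_classes, dtype=np.float32)
one_hot[label_index] = 1.0
print(one_hot)                         # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```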
Please read the trainer.py file:
- In TensorFlow, we use a session to run the input and the ground-truth labels. To train the model we feed in the data and let the optimizer do backpropagation and update the model weights:
```python
x_batch, y_true_batch, _, cls_batch = self.data.train.next_batch(self.train_batch_size)
x_valid_batch, y_valid_batch, _, valid_cls_batch = self.data.valid.next_batch(self.train_batch_size)

feed_dict_train = {self.x: x_batch, self.y_true: y_true_batch}
feed_dict_validate = {self.x: x_valid_batch, self.y_true: y_valid_batch}

self.session.run(self.optimizer, feed_dict=feed_dict_train)
```
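For context, here is a minimal sketch of how such a training step might sit inside an epoch loop (the iteration count, the num_examples attribute, and the logging are assumptions, not the exact contents of trainer.py):
```python
# Hypothetical epoch loop around the training step above (names follow the snippet)
iterations_per_epoch = int(self.data.train.num_examples / self.train_batch_size)
for epoch in range(num_epoch):
    for _ in range(iterations_per_epoch):
        x_batch, y_true_batch, _, _ = self.data.train.next_batch(self.train_batch_size)
        self.session.run(self.optimizer, feed_dict={self.x: x_batch, self.y_true: y_true_batch})

    # Check the validation loss on one held-out batch at the end of each epoch
    x_val, y_val, _, _ = self.data.valid.next_batch(self.train_batch_size)
    val_loss = self.session.run(self.cost, feed_dict={self.x: x_val, self.y_true: y_val})
    print("Epoch {}: validation loss {:.4f}".format(epoch + 1, val_loss))
```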
How do we define the loss and optimizer?
Please read the trainer.py file
- Different problems use different loss and activation functions in the final layer.
In image classification, we usually use softmax with cross-entropy:
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.
```python
self.cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=self.model.getOutput(), labels=self.y_true)
```
Detailed Explanation (In Simplified Chinese): https://zhuanlan.zhihu.com/p/35709485 (about computing the loss with cross-entropy)
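softmax_cross_entropy_with_logits_v2 returns a per-example loss; a common follow-up, sketched here with the naming above (not necessarily the exact lines in trainer.py), is to average it into the scalar self.cost that the optimizer minimizes and to derive an accuracy op:
```python
# Average the per-example cross-entropy into a scalar loss for the optimizer
self.cost = tf.reduce_mean(self.cross_entropy)

# Accuracy: compare the predicted class index against the one-hot ground truth
y_pred_cls = tf.argmax(tf.nn.softmax(self.model.getOutput()), axis=1)
y_true_cls = tf.argmax(self.y_true, axis=1)
self.accuracy = tf.reduce_mean(tf.cast(tf.equal(y_pred_cls, y_true_cls), tf.float32))
```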
- For my code, I use the Adam optimizer and a learning rate with exponential decay:
```python
self.learning_rate = tf.train.exponential_decay(self.lr, self.global_step, self.step_rate, self.decay, staircase=True)
self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=0.01).minimize(self.cost, self.global_step)
```
Detailed Explanation (In Traditional Chinese): https://medium.com/%E9%9B%9E%E9%9B%9E%E8%88%87%E5%85%94%E5%85%94%E7%9A%84%E5%B7%A5%E7%A8%8B%E4%B8%96%E7%95%8C/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92ml-note-sgd-momentum-adagrad-adam-optimizer-f20568c968db
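For intuition, with staircase=True the decayed rate is lr * decay^floor(global_step / step_rate); here is a quick sanity check of that formula in plain Python (the numbers are illustrative, not the defaults in config.py):
```python
def decayed_lr(lr, global_step, step_rate, decay):
    # Mirrors tf.train.exponential_decay with staircase=True
    return lr * decay ** (global_step // step_rate)

print(decayed_lr(lr=0.001, global_step=0, step_rate=1000, decay=0.95))     # 0.001
print(decayed_lr(lr=0.001, global_step=2500, step_rate=1000, decay=0.95))  # 0.001 * 0.95**2 ≈ 0.0009025
```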
Brief Introduction to CNNs:
Example of a CNN network:
- The convolutional layer extracts features from the images.
- The pooling layer simplifies the features and makes them smaller.
- The ReLU activation layer makes sure no negative values remain. (Common activation functions: Linear (which does not eliminate negative values), ReLU, Tanh, and Swish; Swish is a fairly new activation function that has been shown to improve network accuracy.)
- In the final layer, we add a fully connected layer and a softmax function (for classification).
Some basic knowledge abt CNN (in Trad Chinese): https://medium.com/%E9%9B%9E%E9%9B%9E%E8%88%87%E5%85%94%E5%85%94%E7%9A%84%E5%B7%A5%E7%A8%8B%E4%B8%96%E7%95%8C/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-ml-note-convolution-neural-network-%E5%8D%B7%E7%A9%8D%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF-bfa8566744e9
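A quick numerical illustration of the activation functions mentioned above (a standalone NumPy sketch, not code from this repo):
```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])

linear = x                               # keeps negative values
relu = np.maximum(0.0, x)                # clips negatives to 0
tanh = np.tanh(x)                        # squashes values into (-1, 1)
swish = x / (1.0 + np.exp(-x))           # x * sigmoid(x)

print(relu)                              # [0. 0. 0. 1. 3.]
print(swish)                             # approx. [-0.238 -0.189  0.     0.731  2.857]
```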
Using LeNet 5 as an example:
FAQ:
- Difference between random normal and truncated normal: https://stackoverflow.com/questions/41704484/what-is-difference-between-tf-truncated-normal-and-tf-random-
- Benefit of the truncated normal distribution for initializing weights in a neural network: https://stats.stackexchange.com/questions/228670/what-is-the-benefit-of-the-truncated-normal-distribution-in-initializing-weights
- What a normal distribution with mean = 0, sigma = 0.1 looks like: https://www.wolframalpha.com/input/?i=normal+distribution+with+mean+%3D+0%2C+sigma+%3D+0.1
- How do we calculate the input and output sizes after a convolution or pooling layer?
For example, how do we know the output is 28x28 after passing a 32x32 input through the first layer?
We need to use this equation: output = (W + 2P - F) / S + 1, where W is the input size, F is the kernel (filter) size, P is the padding, and S is the stride.
```python
# TODO: Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x6.
conv1_w = tf.Variable(tf.truncated_normal(shape=[5, 5, 3, 6], mean=mu, stddev=sigma))
conv1_b = tf.Variable(tf.zeros(6))
conv1 = tf.nn.conv2d(self.inputImage, conv1_w, strides=[1, 1, 1, 1], padding='VALID') + conv1_b

# TODO: Activation.
conv1 = tf.nn.relu(conv1)
```
In the first layer, we can observe that the kernel size is 5, stride = 1, and padding = 0 ('VALID' means no zero padding; 'SAME' means zero padding is used),
and thus: [(32 + 2 * 0 - 5) / 1] + 1 = 27 + 1 = 28.
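The same arithmetic as a tiny helper function, checked against the LeNet-5 shapes below (an illustrative sketch, not part of the repo):
```python
def conv_output_size(w, f, p, s):
    """Spatial output size for input width w, kernel size f, padding p, stride s."""
    return (w + 2 * p - f) // s + 1

print(conv_output_size(32, 5, 0, 1))   # 28  (layer 1 convolution, VALID padding)
print(conv_output_size(28, 2, 0, 2))   # 14  (2x2 max pooling with stride 2)
print(conv_output_size(14, 5, 0, 1))   # 10  (layer 2 convolution)
```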
But why does the depth become 6? Because the kernel shape [5, 5, 3, 6] defines 6 filters, and each filter produces one output feature map.
A guide to receptive field arithmetic for Convolutional Neural Networks (in English): https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807
Detailed explanation of CNNs (in English): https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-networks-on-the-internet-fbb8b1ad5df8
But what is zero-padding? Zero-padding adds a border of zeros around the input so the kernel can also be centred on edge pixels; with padding='SAME' the output keeps the same spatial size (for stride 1), while 'VALID' adds no padding and the output shrinks.
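A minimal TensorFlow 1.x sketch of the difference, using the same 5x5 kernel shape as layer 1 (a shape check only, not code from this repo):
```python
import tensorflow as tf

x = tf.zeros([1, 32, 32, 3])                              # one NHWC input image
k = tf.zeros([5, 5, 3, 6])                                # 5x5 kernel, 3 -> 6 channels

valid = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='VALID')
same = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')

print(valid.shape)   # (1, 28, 28, 6): no padding, spatial size shrinks
print(same.shape)    # (1, 32, 32, 6): zeros added around the border, size preserved
```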
```python
import tensorflow as tf
from tensorflow.contrib.layers import flatten  # TF 1.x helper that flattens NHWC -> NC

from src import config  # project config (src/config.py); adjust the import path to your layout


class LeNet:
    def __init__(self, shape):
        # Hyperparameters for weight initialization
        mu = 0
        sigma = 0.1
        layer_depth = {
            'layer_1': 6,
            'layer_2': 16,
            'layer_3': 120,
            'layer_f1': 84
        }

        self.inputImage = tf.placeholder(tf.float32, shape=shape, name='Image')
        # x = tf.identity(x, "Input")

        # TODO: Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x6.
        conv1_w = tf.Variable(tf.truncated_normal(shape=[5, 5, 3, 6], mean=mu, stddev=sigma))
        conv1_b = tf.Variable(tf.zeros(6))
        conv1 = tf.nn.conv2d(self.inputImage, conv1_w, strides=[1, 1, 1, 1], padding='VALID') + conv1_b

        # TODO: Activation.
        conv1 = tf.nn.relu(conv1)

        # TODO: Pooling. Input = 28x28x6. Output = 14x14x6.
        pool_1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

        # TODO: Layer 2: Convolutional. Output = 10x10x16.
        conv2_w = tf.Variable(tf.truncated_normal(shape=[5, 5, 6, 16], mean=mu, stddev=sigma))
        conv2_b = tf.Variable(tf.zeros(16))
        conv2 = tf.nn.conv2d(pool_1, conv2_w, strides=[1, 1, 1, 1], padding='VALID') + conv2_b

        # TODO: Activation.
        conv2 = tf.nn.relu(conv2)

        # TODO: Pooling. Input = 10x10x16. Output = 5x5x16.
        pool_2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

        # TODO: Flatten. Input = 5x5x16. Output = 400.
        fc1 = flatten(pool_2)

        # TODO: Layer 3: Fully Connected. Input = 400. Output = 120.
        fc1_w = tf.Variable(tf.truncated_normal(shape=(400, 120), mean=mu, stddev=sigma))
        fc1_b = tf.Variable(tf.zeros(120))
        fc1 = tf.matmul(fc1, fc1_w) + fc1_b

        # TODO: Activation.
        fc1 = tf.nn.relu(fc1)

        # TODO: Layer 4: Fully Connected. Input = 120. Output = 84.
        fc2_w = tf.Variable(tf.truncated_normal(shape=(120, 84), mean=mu, stddev=sigma))
        fc2_b = tf.Variable(tf.zeros(84))
        fc2 = tf.matmul(fc1, fc2_w) + fc2_b

        # TODO: Activation.
        fc2 = tf.nn.relu(fc2)

        # TODO: Layer 5: Fully Connected. Input = 84. Output = 10.
        fc3_w = tf.Variable(tf.truncated_normal(shape=(84, config.total_class), mean=mu, stddev=sigma))
        fc3_b = tf.Variable(tf.zeros(config.total_class))
        output = tf.matmul(fc2, fc3_w) + fc3_b

        output = tf.identity(output, "output")
        self.output = output
```
Please read the models/lenet5 file:
Model architecture: (under construction)
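A minimal construction sketch, assuming the NHWC shape convention used above (hypothetical usage; the real graph wiring lives in trainer.py):
```python
# Hypothetical usage of the LeNet class shown above
model = LeNet(shape=[None, 32, 32, 3])   # batch dimension left dynamic
logits = model.output                    # 10-way logits, fed into softmax cross-entropy
```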