# Computer vision
How do we make machines see? Today we'll talk about computer vision, and we'll present a competition. Unlike DeepTraffic, which is designed to explore ideas and teach you the concepts of deep RL, SegFuse, the deep dynamic driving scene segmentation competition I'll present today, is at the very cutting edge. Whoever does well in this competition is likely to produce a publication, or ideas that would lead the world in the area of perception, perhaps together with the people running this class, perhaps on your own, and I encourage you to do so. Even more cats today.

Computer vision today, as it stands, is deep learning. The majority of the successes in how we interpret, form representations of, and understand images and videos utilize neural networks to a significant degree, the very ideas we've been talking about. That applies to the supervised, unsupervised, and RL cases, and the supervised case is the focus of today. The process is the same, and the data is essential: there is annotated data, where a human provides the labels that serve as the ground truth in the training process. The neural network then goes through that data, learning to map from the raw sensory input to the ground-truth labels, and then generalizes over the testing dataset.

The kind of raw sensory input we're dealing with is numbers. I'll say this again and again: for human vision, we take this particular aspect of our ability for granted, taking in raw sensory information through our eyes and interpreting it, but to the machine it's just numbers. That's something you always have to go back and meditate on, whether you're an expert computer vision person or new to the field: what is the machine given, what is the data it is tasked to work with in order to perform the task you're asking it to do? Perhaps the data it's given is highly insufficient to do what you want it to do. That question will come up again and again: are images enough to understand the world around you? Given these numbers, sometimes with one channel, sometimes RGB, where every single pixel has three different color values, the task is to classify or regress: to produce a continuous variable, or one of a set of class labels. As before, we must be careful about our intuition of what is hard and what is easy in computer vision.
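To make "it's just numbers" concrete, here is a minimal sketch, not from the lecture, using NumPy and synthetic 32x32 images, of exactly what the machine receives:

```python
import numpy as np

# A grayscale image: one channel, a single intensity number per pixel.
gray = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)

# An RGB image: every single pixel holds three numbers (red, green, blue).
rgb = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(gray.shape)   # (32, 32)
print(rgb.shape)    # (32, 32, 3)
print(rgb[0, 0])    # the three color values of the top-left pixel
```

Whatever the classifier or regressor does downstream, these arrays of integers are its entire view of the scene.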
## Biological inspiration for computation: neuron
The visual cortex is organized in layers, and as information passes from the eyes to the parts of the brain that make sense of the raw sensory information, higher and higher-order representations are formed. This is the inspiration, the idea behind using deep neural networks for images: higher and higher-order representations form through the layers. The early layers take in the very raw sensory information and extract edges; the next layers connect those edges to form more complex features; and finally come the higher-order semantic meanings that we hope to get from these images.

In computer vision, deep learning is hard. I'll say this again: illumination variability is the biggest challenge, or at least one of the biggest challenges, in driving with visible-light cameras. There is pose variability of objects; as I'll also discuss with some of the advances from Geoffrey Hinton and capsule networks, the idea is that neural networks, as they're currently useful in computer vision, are not good at representing variable pose. These objects in images, in this 2D plane of color and texture, look very different numerically when the object is rotated, or mangled and shaped in different ways: the deformable and truncated cat. There is intra-class variability: for the classification task, which I'll use as an example throughout today to introduce some of the networks over the past decade that have achieved success, and some of the intuition and insight that made those networks work, there is a lot of variability inside the classes and very little variability between the classes. All of these on top are cats; all of those on the bottom are dogs. They look very different.

Then there is what I would say is the second biggest problem in driving perception with visible-light cameras: occlusion. Due to the three-dimensional nature of our world, some objects are in front of others and occlude the background object, and yet we're still tasked with identifying the object when only part of it is visible, and sometimes that part is barely visible. Here (I told you there would be cats) we're tasked with classifying a cat with just the ears visible, or just a leg. And on a philosophical level, as we'll talk about with the motivation for our competition: here's a cat dressed as a monkey eating a banana. Most of us understand what's going on in the scene, and in fact a neural network can today successfully classify this image, this video, as a cat. But the context, the humor of the situation, and the fact that you could argue it's a monkey, are missing. What else is missing is the dynamic information, the temporal dynamics of the scene. That's what's missing in a lot of the perception work that has been done to date in the autonomous-vehicle space with visible-light cameras, and we're looking to expand on that. That's what SegFuse is all about.

The image classification pipeline: there is a bin for each category, each class: cat, dog, mug, hat. Inside those bins there are a lot of examples of each, and you're tasked, when a new example comes along that you've never seen before, with putting that image in a bin. It's the same as the machine learning tasks before, and everything relies on data that has been ground-truthed, that has been labeled by human beings. MNIST is a toy dataset of handwritten digits often used as an example, and COCO, CIFAR, ImageNet, Places, and a lot of other incredible, rich datasets of hundreds of thousands or millions of images are out there, representing scenes, people's faces, and different objects. Those all serve as ground-truth data for testing algorithms and for competing architectures to be evaluated against each other.
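As a minimal sketch of that pipeline, not from the lecture, with random stand-in arrays in place of a real dataset and a hypothetical `classify` placeholder, training data is just human-labeled (image, label) pairs, and evaluation is counting agreement with the ground truth:

```python
import numpy as np

classes = ["cat", "dog", "mug", "hat"]

# Ground truth: human-labeled (image, label) pairs, the "bins" of examples.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
train_labels = rng.integers(0, len(classes), size=100)

def classify(image):
    # Hypothetical placeholder for any classifier trained on the labeled bins.
    return rng.integers(0, len(classes))

# Evaluation: on held-out test images, count agreement with the ground truth.
test_images = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
test_labels = rng.integers(0, len(classes), size=20)
correct = sum(classify(img) == lbl for img, lbl in zip(test_images, test_labels))
print(f"accuracy: {correct / len(test_labels):.0%}")  # ~25% for coin-flip guessing over 4 classes
```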
CIFAR-10, one of the simplest, almost toy, datasets of tiny images with ten categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), is commonly used to explore some of the basic convolutional neural networks we'll discuss. So let's come up with a very trivial classifier to explain the concept of how we could go about this. In fact, if you start to think about how to classify an image without knowing any of these techniques, this is perhaps the approach you would take: you would subtract images. In order to know that an image of a cat is different from an image of a dog, you have to compare them. Given those two images, what's a way to compare them? One way is to subtract one from the other and sum all the pixel-wise differences: subtract the intensities pixel by pixel and sum them up. If that difference is really high, the images are very different. Using that metric, we can build a classifier on CIFAR-10: based on this difference function, for a new image I'm going to find which of the ten bins it belongs in, by finding the image in the dataset that is most like the image I have and putting mine in the same bin that image is in. There are ten classes, so if we just flip a coin, the accuracy of our classifier will be 10%. Using our image-difference classifier we can actually do much better than random, much better than 10%: we can get 35-38% accuracy. That's it: we have our first classifier, k-nearest neighbors.
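Here is a minimal sketch of that nearest-neighbor classifier, not the lecture's code: it uses NumPy, random stand-in arrays shaped like CIFAR-10 rather than the real dataset, and the summed absolute pixel difference (L1 distance) as the difference function:

```python
import numpy as np

def l1_distance(a, b):
    # The difference function: subtract pixel intensities, sum absolute values.
    return np.sum(np.abs(a.astype(np.int32) - b.astype(np.int32)))

def nearest_neighbor_predict(train_images, train_labels, image):
    # Find the training image most like this one; reuse its bin (label).
    distances = [l1_distance(image, t) for t in train_images]
    return train_labels[int(np.argmin(distances))]

# Stand-in arrays shaped like CIFAR-10: 32x32 RGB images, labels 0..9.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(500, 32, 32, 3), dtype=np.uint8)
train_labels = rng.integers(0, 10, size=500)
test_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(nearest_neighbor_predict(train_images, train_labels, test_image))
```

On the real CIFAR-10 training set, this 1-nearest-neighbor scheme with a pixel-wise difference is what yields the roughly 35-38% accuracy mentioned above; k-nearest neighbors generalizes it by voting over the k closest training images instead of trusting only the single closest one.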