Training a Computer to Recognize Your Handwriting

2016-07-03

The remarkable system of neurons is the inspiration behind a widely used machine learning technique called Artificial Neural Networks (ANN), used for image recognition. Learn how you can use it to recognize handwriting.

By Kenneth Soo, Stanford.

Take a look at the picture below and try to identify what it is:

One should be able to tell that it is a giraffe, despite it being strangely fat. We recognize images and objects instantly, even if these images are presented in a form that is very different from what we have seen before. We do this with the 80 billion or more neurons in the human brain working together to transmit information. The remarkable system of neurons is also the inspiration behind a widely used machine learning technique called Artificial Neural Networks (ANN), commonly used for image recognition. In some cases, computers using this technique have even out-performed humans in recognizing images.

The Problem

Image recognition is important for many of the advanced technologies we use today. It is used in visual surveillance, in guiding autonomous vehicles, and even in identifying diseases from X-ray images. Most modern smartphones also come with pre-installed image recognition programs that recognize handwriting and convert it into typed words.

In this chapter, we will look at how we can train an ANN algorithm to recognize images of handwritten digits, using images from the well-known MNIST (Modified National Institute of Standards and Technology) database.

Some of the handwritten digits in the MNIST database

An Illustration

We first train our ANN model (further explained later in the chapter) by giving it 10,000 examples of handwritten digits, together with the correct answers. After the ANN model is trained, we test how well it performs by giving it 1,000 new handwritten digits without the correct answers. The model then has to recognize each digit.

At the start, the ANN translates handwritten images into a language it understands. Black pixels are given the value “0” and white pixels the value “1”. Each pixel in an image is called a variable.
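As a minimal sketch of this translation step, the Python below thresholds a grayscale image into 0/1 values and flattens it into a single list of variables. It assumes pixel intensities from 0 (black) to 255 (white) and a cutoff of 128; both the cutoff and the function name `binarize` are illustrative choices, not part of the original model. For a 28×28 MNIST image this would yield 784 variables.

```python
from typing import List

def binarize(image: List[List[int]], cutoff: int = 128) -> List[int]:
    """Flatten a grayscale image row by row into 0/1 variables.

    Assumes intensities run from 0 (black) to 255 (white); pixels at or
    above the (hypothetical) cutoff count as white (1), others as black (0).
    """
    return [1 if pixel >= cutoff else 0 for row in image for pixel in row]

# A tiny 3x3 "image" standing in for a 28x28 MNIST digit.
tiny = [
    [0, 255, 0],
    [0, 200, 0],
    [0, 90, 0],
]
variables = binarize(tiny)
print(variables)       # [0, 1, 0, 0, 1, 0, 0, 0, 0]
print(len(variables))  # 9 variables; a 28x28 image would give 784
```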

Out of the 1,000 pictures that the model was asked to recognize, it correctly identified 922 of them, which is equivalent to a 92.2% accuracy. We can use a contingency table to view the results, as shown below.

Contingency Table showing the performance of the ANN model. For example, the first row tells us that out of 85 actual digit “0”s given to the model, 84 were correctly identified and 1 was wrongly identified as “6”. The last column indicates prediction accuracy.
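The contingency table and its accuracy figures can be computed directly from pairs of actual and predicted digits. The plain-Python sketch below uses hypothetical function names, and its sample labels simply reproduce the first row described in the caption (84 of 85 zeros correct, one read as “6”); they are not the model's real output.

```python
from collections import Counter
from typing import Dict, List, Tuple

def contingency_table(actual: List[int], predicted: List[int]) -> Dict[Tuple[int, int], int]:
    """Count how often each (actual digit, predicted digit) pair occurs."""
    return dict(Counter(zip(actual, predicted)))

def overall_accuracy(actual: List[int], predicted: List[int]) -> float:
    """Fraction of images whose predicted digit matches the actual digit."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def row_accuracy(table: Dict[Tuple[int, int], int], digit: int) -> float:
    """Prediction accuracy for one row of the table (one actual digit)."""
    row_total = sum(n for (a, _), n in table.items() if a == digit)
    return table.get((digit, digit), 0) / row_total

# Labels reproducing the first row: 85 actual "0"s, 84 predicted as "0"
# and 1 predicted as "6".
actual = [0] * 85
predicted = [0] * 84 + [6]
table = contingency_table(actual, predicted)
print(table)                             # {(0, 0): 84, (0, 6): 1}
print(round(row_accuracy(table, 0), 3))  # 0.988
```

Applied to the full test set, 922 correct answers out of 1,000 images would make `overall_accuracy` return 0.922, the 92.2% quoted above.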

From the table, we can see that when given a handwritten image of either “0” or “1”, the model almost always identifies it correctly. On the other hand, the digit “5” is the trickiest to identify. An advantage of using a contingency table is that it shows how often each misidentification occurs. For example, when given an image of the digit “2”, the model misidentifies it as a “7” or an “8” about 8% of the time. Let’s take an in-depth look at some of these misidentified digits.

While these images obviously look like a “2” to human eyes, the ANN is sometimes unable to recognize shapes and features of images, such as the tail of the digit “2” (see Limitations). Another interesting observation is that the model mixes up the digits “3” and “5” with significant frequency (about 10% of the time).


The Neurons that Inspired the Network

We first take a look at how neurons in our brains work. Our brain has a large network of interlinked neurons, which act as a highway for information to be transmitted from point A to point B. To send different kinds of information from A to B, the brain activates a different set of neurons, essentially using a different route to get from A to B. This is what a typical neuron might look like.

An illustration of a brain neuron, with its main components labeled.

At each neuron, its dendrites receive incoming signals sent by other neurons. If the neuron receives a high enough level of signals within a certain period of time, the neuron sends an electrical pulse into the terminals. These outgoing signals are then received by other neurons.

Technical Explanation I: How the Model Works

A simple Artificial Neural Network map, showing two scenarios with two different inputs but with the same output. Activated neurons along the path are shown in red.

Similarly, in the ANN model, we have an input node, which is the image we give the program, and an output node, which is the digit that the program recognizes. The main characteristics of an ANN are as follows:

Step 1. When the input node is given an image, it activates a unique set of neurons in the first layer, starting a chain reaction that would pave a unique path to the output node. In Scenario 1, neurons A, B, and D are activated in layer 1.

Step 2. The activated neurons send signals to every connected neuron in the next layer. This directly affects which neurons are activated in the next layer. In Scenario 1, neuron A sends a signal to E and G, neuron B sends a signal to E, and neuron D sends a signal to F and G.

Step 3. In the next layer, each neuron is governed by a rule on what combinations of received signals would activate it (further explained later). In Scenario 1, neuron E is activated by the signals from A and B. However, the rules for neurons F and G tell them that they have not received the right signals to be activated, and hence they remain grey.

Step 4. Steps 2-3 are repeated for all the remaining layers (it is possible for the model to have more than 2 layers), until we are left with the output node.

Step 5. The output node deduces the correct digit based on signals received from neurons in the layer directly preceding it (layer 2). Each combination of activated neurons in layer 2 leads to one solution, though each solution can be represented by different combinations of activated neurons. In Scenarios 1 and 2, two different images are given to the input. Because the images are different, the network activates a different set of neurons to get from the input to the output. However, the output node still recognizes that both images are a “6”.
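Steps 1 to 5 can be sketched as a forward pass through layers of threshold neurons. In this minimal Python sketch, each neuron fires when the weighted sum of its incoming signals reaches its threshold (the kind of rule described in the next section), and the output node is modeled, purely as an assumption, by a lookup table from the final layer's activation pattern to a digit. The weights, thresholds and lookup table are hypothetical and do not reproduce the network in the figure.

```python
from typing import Dict, List, Tuple

Neuron = Tuple[List[int], int]  # (weights on previous layer, threshold m)

def fire(inputs: List[int], neuron: Neuron) -> int:
    """Return 1 if the weighted incoming signal reaches the threshold."""
    weights, m = neuron
    signal = sum(w * x for w, x in zip(weights, inputs))
    return 1 if signal >= m else 0

def forward(inputs: List[int], layers: List[List[Neuron]]) -> List[int]:
    """Steps 1-4: propagate activations layer by layer."""
    activations = inputs
    for layer in layers:
        activations = [fire(activations, neuron) for neuron in layer]
    return activations

# Hypothetical network: 4 input variables -> 3 neurons -> 2 neurons.
layers = [
    [([1, 1, 0, 0], 2), ([0, 0, 1, 1], 2), ([1, 0, 0, 1], 1)],
    [([1, 0, 1], 1), ([0, 1, 1], 2)],
]

# Step 5 (assumed): different activation patterns can map to the same digit.
output_node: Dict[Tuple[int, ...], int] = {(1, 0): 6, (1, 1): 6, (0, 1): 1, (0, 0): 0}

for image in ([1, 1, 0, 0], [0, 0, 1, 1]):
    pattern = tuple(forward(image, layers))
    print(image, "->", pattern, "->", output_node[pattern])
# [1, 1, 0, 0] -> (1, 0) -> 6
# [0, 0, 1, 1] -> (1, 1) -> 6
```

As in Scenarios 1 and 2, the two inputs take different paths through the network (ending in patterns `(1, 0)` and `(1, 1)`) yet arrive at the same digit.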

Technical Explanation II: Training the Model

We first need to decide the number of layers and the number of neurons in each layer for our ANN model. While there is no limit on either, a good start is to use 3 layers, with the number of neurons proportional to the number of variables. For the digit recognizer ANN, we used 3 layers with 500 neurons each. The two main ingredients involved in training a model are a metric to evaluate the accuracy of the model, and the rules that tell the neurons whether they are activated or not.
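To get a feel for the size of this model, the sketch below counts its trainable numbers. It assumes 784 input variables (one per pixel of a 28×28 MNIST image), the three layers of 500 neurons described above, 10 output digits, full connections between adjacent layers, and one threshold per neuron; these connection assumptions are ours, not the author's stated architecture.

```python
from typing import List

def count_parameters(layer_sizes: List[int]) -> int:
    """Count weights plus one threshold per neuron in a fully connected net.

    layer_sizes[0] is the number of input variables; every later entry is
    the number of neurons in that layer (the last one being the outputs).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per incoming connection
        total += n_out         # one threshold (m-value) per neuron
    return total

# Assumed sizes: 784 pixels -> 500 -> 500 -> 500 -> 10 digits.
print(count_parameters([784, 500, 500, 500, 10]))  # 898510
```

Under these assumptions, nearly 900,000 numbers must be adjusted during training, which helps explain why ANNs are computationally expensive.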

A common metric is the sum of squared errors (SSE). Roughly speaking, a squared error denotes how close a predicted digit is to the actual digit: the smaller the error, the closer the prediction. The ANN model tries to minimise the SSE by changing the rules of the neurons, and the change is determined by a mathematical concept known as differentiation.
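The sketch below shows both ideas in miniature: the SSE between a predicted output and the correct answer, and gradient descent, which uses the derivative of the SSE to decide how to change a parameter. To keep it short, it assumes the correct digit is encoded as a one-hot vector (1 for the right digit, 0 elsewhere) and fits a single hypothetical weight `w` in the model `prediction = w * x` rather than a full network.

```python
from typing import List

def sse(predicted: List[float], target: List[float]) -> float:
    """Sum of squared errors between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# One-hot target for the digit "2" (10 output positions).
target = [0.0] * 10
target[2] = 1.0
prediction = [0.0] * 10
prediction[2], prediction[7] = 0.6, 0.4  # model unsure between "2" and "7"
print(round(sse(prediction, target), 2))  # 0.32

def fit_weight(xs: List[float], ys: List[float], lr: float = 0.01, steps: int = 200) -> float:
    """Gradient descent on SSE(w) = sum((w*x - y)^2) for a single weight w."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the SSE with respect to w: sum(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # step against the gradient to reduce the SSE
    return w

# Data generated by y = 3x, so the SSE is minimised at w = 3.
print(round(fit_weight([1.0, 2.0, 3.0], [3.0, 6.0, 9.0]), 3))  # 3.0
```

A real ANN applies the same update to every weight at once, with the derivatives computed through all the layers.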

Each neuron’s rule has two components – the weight (i.e. strength) of incoming signals [w], and the minimum received signal strength for activation [m]. In the following example, we illustrate the rules for neuron G. Zero weight is given to the signals from A and B (i.e. no connection), and weights of 1, 2, and -1 are given to the signals from C, D, and E respectively. The m-value for G is 2, so G is activated if:

D is activated and E is not activated, or if,
C and D are activated.

An example of neuron G’s rule. The braces below G indicate the received signal strength.
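The rule for neuron G can be checked by brute force. The Python below encodes the weights and m-value given above, tries every combination of activated inputs, and confirms that G fires exactly when D is activated and E is not, or when C and D are both activated. The dictionary layout is just one convenient encoding.

```python
from itertools import product

# Weights on G's incoming signals and G's minimum signal strength (from the text).
WEIGHTS = {"A": 0, "B": 0, "C": 1, "D": 2, "E": -1}
M = 2

def g_activated(active: dict) -> bool:
    """G fires when the weighted sum of activated inputs reaches M."""
    signal = sum(WEIGHTS[name] * active[name] for name in WEIGHTS)
    return signal >= M

names = list(WEIGHTS)
for bits in product([0, 1], repeat=len(names)):
    active = dict(zip(names, bits))
    # The verbal rule from the text.
    expected = (active["D"] and not active["E"]) or (active["C"] and active["D"])
    assert g_activated(active) == bool(expected)
print("rule verified for all", 2 ** len(names), "input combinations")
```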

Limitations

Computationally Expensive. The CPU power and time needed to train an ANN model are significantly higher than for other types of models that can be used for a similar purpose (e.g. Random Forests), yet the results are often no better. Although ANNs have been known for a long time, they were not widely used until advances in hardware made their computation feasible, which led to their recent resurgence. However, ANNs are the basis for more advanced models, such as Deep Neural Networks (DNNs), which Google used in Oct 2015 and Mar 2016 to defeat human champions at the game of Go, widely viewed as an unsolved “grand challenge” for Artificial Intelligence.

Lack of feature recognition. The ANN is unable to recognize features of an image if they appear in a different shape or location. For example, if we want our ANN to recognize images of cats, and we give it examples in which the cat always appears at the bottom of the image, the ANN will not recognize the same cats if they appear at the top, or if they appear larger. Convolutional Neural Networks (CNNs), an advanced version of ANNs, solve this problem by looking at various regions of the image. CNNs are also more efficient, and they are widely used in image and video recognition. For more information, check out a previous blog post on Introduction to Convolutional Neural Network.
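The idea of “looking at various regions of the image” can be sketched with a sliding-window (convolution-style) filter. In the Python below, a small 2×2 pattern detector is applied at every position of a binary image and the strongest response is kept, so the same stroke scores identically whether it sits at the top or the bottom of the image. The filter values and image sizes are illustrative assumptions; a real CNN learns many such filters from data.

```python
from typing import List

Grid = List[List[int]]

def slide(image: Grid, kernel: Grid) -> Grid:
    """Apply the kernel at every position (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def max_response(image: Grid, kernel: Grid) -> int:
    """Strongest match anywhere in the image (a simple max-pooling step)."""
    return max(max(row) for row in slide(image, kernel))

# Hypothetical 2x2 detector for a diagonal stroke.
kernel = [[1, -1], [-1, 1]]

top = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
bottom = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
print(max_response(top, kernel), max_response(bottom, kernel))  # 2 2
```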

Additional Notes for Advanced Readers

[This section is intended for readers who have a background in mathematics or computer science and wish to implement their own ANN.]

The neuron’s rule described in the technical explanation is actually a mathematical function called an “activation function”. It gives zero or near-zero output when the input is low, and a positive output when the input is high enough. Commonly used activation functions include the sigmoid function and the rectifier function. The output node is also a function, usually the softmax function (a generalization of the logistic function). As such, the ANN can be viewed as “a grand function of functions”, which is why we can use differentiation to find the correct weights through gradient descent. Lastly, note that it is essential to normalize the input data during implementation.
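For readers implementing their own ANN, the functions named above are short to write. The sketch below gives the sigmoid, the rectifier (ReLU), softmax, and one common way to normalize inputs (standardizing each variable to mean 0 and standard deviation 1). The choice of standardization is an assumption on our part, since simply scaling pixels to the 0 to 1 range is also common. Only the Python standard library is used.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    """Smooth activation: near 0 for low input, near 1 for high input."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Rectifier: exactly 0 for negative input, linear above 0."""
    return max(0.0, z)

def softmax(scores: List[float]) -> List[float]:
    """Turn output scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def standardize(values: List[float]) -> List[float]:
    """Rescale one input variable to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]

print(sigmoid(0.0))                             # 0.5
print(relu(-3.0), relu(2.5))                    # 0.0 2.5
print(round(sum(softmax([2.0, 1.0, 0.1])), 6))  # 1.0
print(standardize([0.0, 255.0]))                # [-1.0, 1.0]
```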

Try out ANN using this C++ code by Ben Graham: https://github.com/btgraham/Batchwise-Dropout

For more posts like this, visit annalyzin.wordpress.com.
