How a Computer Recognizes an Image

by OKOK 2017. 6. 11.

1.       Summary


I would like to introduce facial recognition technology, which is my current research area. To understand the field of face recognition, we need a basic knowledge of linear algebra, machine learning, and computer vision. This report focuses on how computers perceive images. I will also explain the Bag-of-Features (BoF) method, which is used in many applications. This report is intended for beginners and those without prior knowledge. If you are interested in the more advanced mathematics behind the algorithms, please refer to the reference materials and standard textbooks. If you leave questions, I will answer as many as I can.



 

2.       Table of contents

a.       Introduction

i. How a computer recognizes an image

 

b.       Main part

i. What are image features

ii. Concept of Bag-of-Features

iii. Image Recognition Algorithm using BoF

iv. Disadvantages of BoF

 

c.       Conclusion

i. Comparison of BoF and Deep Learning

 

d. Reference

 



3.       Introduction


3.1 How a computer recognizes an image

Image recognition is the problem of determining which object appears in an input image. It has long been a hard problem in image processing. Let's think about why it is difficult for a computer to recognize images. How does a computer see an image? To a computer, an image is a two-dimensional array of RGB values. What the computer sees is just a set of numbers. When a person looks at an image of an object, he or she perceives the object as a whole and understands what it represents. This is not easy for a computer. To recognize an image, we need a criterion for deciding which type of object is shown, and such criteria are difficult to establish. For example, consider the problem of identifying strawberries, apples, and oranges in a photograph. Shall we differentiate by color? If you use color as a criterion, you can distinguish a strawberry from an orange. How about distinguishing by shape? Strawberries and apples can be distinguished by shape, but apples and oranges are both round, so shape alone does not separate them. If you use both color and shape, you can distinguish all three objects. But what if tomatoes were added? The criteria would have to be revised again. Building an object recognizer from hand-written rules like these is not simple.
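To make "a two-dimensional array of RGB values" concrete, here is a minimal Python sketch. The tiny 2x3 image and the helper name mean_color are my own illustrations, not part of any library. It only shows that every operation on an image is arithmetic on numbers.

```python
# A tiny 2x3 "image": each pixel is an (R, G, B) tuple with values 0-255.
# The pixel values are made up for illustration.
image = [
    [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
]

def mean_color(img):
    """Average the R, G and B channels over all pixels."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

height, width = len(image), len(image[0])
print(height, width)       # 2 3
print(image[1][1])         # (0, 255, 0): row 1, column 1 is a green pixel
print(mean_color(image))   # (127.5, 42.5, 85.0)
```

A color-based rule such as "red means strawberry" would be a threshold on numbers like these, which is why such rules break as soon as another red object appears.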

Image recognition is a prime example of a task with no clear-cut procedure. Machine learning is effective for letting computers deal with such uncertain tasks. Therefore, various image recognition techniques based on supervised learning have been proposed. I will briefly explain image recognition as supervised machine learning. First, we prepare training data that pairs a large number of images with the names of the objects shown in them. We then create a trained model by feeding the training data into a learner. In the previous example, we prepare a large number of images of strawberries, apples, and oranges, feed them into the learner with each image paired with its object name, and obtain a classifier as the trained model. Given a new image, this classifier tells you whether it shows a strawberry, an apple, or an orange. The accuracy of image recognition is determined by the algorithm the learner uses to produce the classifier, so various algorithms have been proposed to improve accuracy. I will introduce the BoF method, which is well known as a technique that preceded image recognition by deep learning.
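The train-then-classify workflow can be sketched in a few lines of Python. This is only an illustration under my own assumptions: each image has already been reduced to a hypothetical two-number feature (redness, roundness), and the learner is a simple nearest-centroid rule that I chose for brevity. It is not the BoF method itself.

```python
import math

def train(samples):
    """Learner: samples is a list of (feature_vector, label) pairs.
    Returns the model, here the mean feature vector of each label."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(model, vec):
    """Classifier: return the label whose mean vector is closest to vec."""
    return min(model, key=lambda label: math.dist(model[label], vec))

# Hypothetical (redness, roundness) features, both in the range 0..1.
training_data = [
    ((0.90, 0.30), "strawberry"), ((0.95, 0.35), "strawberry"),
    ((0.80, 0.90), "apple"),      ((0.70, 0.95), "apple"),
    ((0.40, 0.90), "orange"),     ((0.45, 0.95), "orange"),
]
model = train(training_data)
print(classify(model, (0.85, 0.30)))  # strawberry
print(classify(model, (0.50, 0.90)))  # orange
```

The point is the division of labor: the training data and the learner produce the model, and the model alone is used on new images.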

 



4.       Main body


4.1 What are image features?

We have already looked at how image recognition is structured and why it is difficult. In this section, I introduce Bag-of-Features (BoF), a typical image recognition technique used before deep learning. BoF uses image features for recognition, so let's first look at image feature points and image feature quantities. An image feature point is a point where the RGB value differs significantly between adjacent pixels. Such points are called feature points because they mark the characteristic locations of the image. An image feature quantity (also called a descriptor) is a vector that describes a feature point numerically. Using feature quantities, you can compare feature points. In other words, you can check whether feature points in different images resemble each other.
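As a rough illustration of "a point where the value differs significantly between adjacent pixels", the sketch below marks pixels of a grayscale grid whose difference to the right or lower neighbor exceeds a threshold. The function names, the threshold, and the two-number descriptor are my own simplifications. Practical detectors and descriptors such as SIFT are far more elaborate.

```python
def descriptor(gray, r, c):
    """Toy feature quantity: signed difference to the right and lower neighbor."""
    return (gray[r][c + 1] - gray[r][c], gray[r + 1][c] - gray[r][c])

def feature_points(gray, threshold):
    """Return (row, col) positions where a neighbor difference reaches threshold."""
    points = []
    for r in range(len(gray) - 1):
        for c in range(len(gray[0]) - 1):
            dx, dy = descriptor(gray, r, c)
            if max(abs(dx), abs(dy)) >= threshold:
                points.append((r, c))
    return points

# A flat gray image with one bright pixel at row 1, column 2.
gray = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
print(feature_points(gray, 100))   # [(0, 2), (1, 1), (1, 2)]
print(descriptor(gray, 1, 1))      # (190, 0)
```

Only the pixels around the bright spot are reported. The flat regions produce no feature points at all.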

 

4.2 Concept of BoF

Next, let's look at the concept of BoF. BoF begins with the idea that images of the same type of object share many similar parts. For example, imagine we have two bicycles. Their shapes differ, but several parts are similar: the handlebars, saddles, and wheels look alike. The basic idea of BoF is that objects of the same kind have many similar parts. Based on this concept, let's see how BoF distinguishes a bicycle from a car. BoF cuts the images of bicycles and cars into small parts, called patches, and compares the patches of two objects with each other. If two images have many similar patches, they are judged to be the same kind of object. Let's pay attention to the name BoF. The word "features" is used because the set of patches cut from an image is a set of features representing the object. The word "bag" is used because collecting the patches of each object is like putting them into a bag. In other words, BoF means separating and collecting the characteristics of an object.

 

4.3 Image Recognition Algorithm using BoF

From now on, let's look at the image recognition algorithm based on BoF. Image recognition by BoF is a supervised learning algorithm. In other words, we prepare training data that pairs images with the types of objects shown in them, and feed the training data into a learner to create an image classifier, which is the trained model. When an image is entered, the classifier identifies the type of object shown in it. In image recognition using BoF, each image is represented as a vector that indicates which patches the image contains. This vector, which represents the appearance of an object, is called a feature vector. The type of object in the image is determined from the feature vector.

First, let's look at the learning process, which is how an image classifier (trained model) is created with BoF. We prepare a large amount of training data that pairs images with object types. Next, the feature points of each training image are extracted, and the feature quantities of those points are calculated and stored. Feature point extraction corresponds to the patch extraction described earlier. An image feature point is a point that represents a characteristic part of an image. In other words, extracting feature points is like extracting the characteristic parts of the image. By extracting feature points from each image in this way, we obtain a large number of feature quantities, each paired with the type of object in its image. With so many feature quantities, searching for similar ones would take a lot of time. Therefore, similar feature quantities are grouped and processed together to shorten the calculation time.

Next, let's look at the process of integrating image features. Among a large number of feature points, many are similar to each other. Therefore, feature quantities with similar values are regarded as similar feature points, and they are grouped through a process called clustering. Clustering is the process of dividing a large amount of data into a small number of groups of similar items. K-means is a well-known clustering technique. Clustering puts similar feature points into the same group, so each group it creates contains similar feature points. One feature point is then set to represent each group. This point is called the representative point. The remaining feature points belonging to each group are replaced with the group's representative point. Representative points are the representative features used in image recognition. The raw set of feature quantities is too large to handle, but by defining representative points with clustering and replacing the other feature points with them, the data can be processed in a realistic amount of time.
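The grouping step can be sketched with a plain k-means loop in standard-library Python. This is a minimal sketch under my own assumptions: the six two-dimensional descriptors are made up, the centers are naively initialized from the first k points, and the loop runs for a fixed number of iterations. The resulting centers play the role of the representative points.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means. Returns (centers, assignments)."""
    centers = [list(p) for p in points[:k]]   # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest center.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its group.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a group became empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments

# Hypothetical 2-D feature quantities forming two obvious groups.
descriptors = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centers, assignments = kmeans(descriptors, k=2)
print(assignments)   # [0, 1, 0, 1, 0, 1]
print(centers)       # approximately [[1.0, 1.0], [9.0, 9.0]]
```

After this step, every descriptor can be referred to by the index of its group instead of its full vector, which is what makes the later counting step cheap.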

Next, we define a feature vector for each image using the representative points. The number of dimensions of this feature vector equals the number of groups created by clustering, with one dimension per group. Let's look at how the feature vector of one image is built. Each feature point extracted from the image belongs to exactly one group. For each feature point, we add one to the dimension that represents its group. After all feature points have been counted, the feature vector is obtained by normalizing the count vector (scaling it so that its length equals 1). This is done for all images to define the feature vector of each image. Finally, a supervised machine learning method is used to train the classifier on these feature vectors.
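The counting and normalization described above can be sketched as follows. I assume that "length" means Euclidean length, and the representative points and descriptors are made-up values chosen so that the arithmetic is easy to follow. The function name bof_vector is my own.

```python
import math

def bof_vector(descriptors, centers):
    """Count how many descriptors fall nearest to each representative point,
    then scale the count vector to Euclidean length 1."""
    counts = [0] * len(centers)
    for d in descriptors:
        nearest = min(range(len(centers)), key=lambda j: math.dist(d, centers[j]))
        counts[nearest] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else [0.0] * len(counts)

# Three hypothetical representative points, so the vector has three dimensions.
centers = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 1), (0, 2), (1, 0),               # three near center 0
               (9, 9), (10, 8), (11, 11), (9, 10)]   # four near center 1
print(bof_vector(descriptors, centers))  # [0.6, 0.8, 0.0]
```

The raw counts are [3, 4, 0], whose length is 5, so the normalized vector is [0.6, 0.8, 0.0]. Normalization keeps an image with many feature points from dominating one with few.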

So far we have created a classifier. Now let's look at how to determine what kind of object an unknown image shows. First, feature points are extracted from the input image. Then the feature quantity of each feature point is compared with the feature quantities of the representative points, and each feature point is replaced with its nearest representative point to create the feature vector of the unknown image. This feature vector is input to the classifier to determine the type of object. This is how the image is recognized.
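Putting the recognition steps together, a minimal end-to-end sketch might look like this. The classifier here is a 1-nearest-neighbor lookup over the training feature vectors, which is my own stand-in because the text does not fix a particular classifier. The representative points, training vectors, and labels are hypothetical.

```python
import math

def nearest(vec, candidates):
    """Index of the candidate vector closest to vec."""
    return min(range(len(candidates)), key=lambda j: math.dist(vec, candidates[j]))

def bof_vector(descriptors, centers):
    """Normalized histogram of nearest representative points."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest(d, centers)] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else [0.0] * len(counts)

def recognize(descriptors, centers, train_vectors, train_labels):
    """Build the feature vector of the unknown image and return the label
    of the most similar training vector (1-nearest-neighbor stand-in)."""
    query = bof_vector(descriptors, centers)
    return train_labels[nearest(query, train_vectors)]

centers = [(0.0, 0.0), (10.0, 10.0)]        # representative points from training
train_vectors = [[1.0, 0.0], [0.0, 1.0]]    # feature vectors of training images
train_labels = ["bicycle", "car"]

unknown = [(1.0, 0.5), (0.5, 1.5), (9.0, 9.5)]   # descriptors of the new image
print(recognize(unknown, centers, train_vectors, train_labels))  # bicycle
```

Two of the three descriptors fall nearest to the first representative point, so the feature vector of the unknown image is closest to the "bicycle" training vector.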


4.4 Disadvantages of BoF

BoF has three main disadvantages. The first is that recognition is affected when the density of image feature points is uneven. A feature point is a point where the difference between RGB values is large. Many images have texture that is strong in some regions and weak in others. Extracting feature points from such images yields many points from the strongly textured regions and few from the rest. BoF performs image recognition using only the features extracted at feature points, so parts of the image that produce no feature points have no influence on recognition. As a result, where feature points are sparse or do not appear well, the characteristics of the whole image are not reflected.

The second is that the positional relationship between feature points is ignored in the data structure BoF uses. BoF converts image feature quantities into a feature vector and processes that vector. At this point, the positions of the feature points are not taken into account. In other words, if two images contain the same feature points, they are judged to be the same type of object even when the positional relationships are completely different.
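This loss of position is easy to demonstrate. In the sketch below, each keypoint is a (position, group name) pair. The group names such as "eye" are hypothetical labels that I use in place of group indices for readability.

```python
from collections import Counter

def bag(keypoints):
    """keypoints: list of ((row, col), group_name). Positions are discarded,
    and only the count per group survives."""
    return Counter(word for _pos, word in keypoints)

# The same four parts, first in a face-like layout, then with positions shuffled.
face = [((10, 20), "eye"), ((10, 40), "eye"), ((30, 30), "nose"), ((45, 30), "mouth")]
scrambled = [((45, 30), "eye"), ((30, 30), "eye"), ((10, 20), "mouth"), ((10, 40), "nose")]

print(bag(face) == bag(scrambled))  # True
```

Both layouts produce the same counts (two "eye", one "nose", one "mouth"), so a classifier that sees only the feature vector cannot tell them apart.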

The third is that BoF is affected by changes in the viewing angle. Most image feature quantities change when the viewing angle changes. Since BoF is built on feature quantities, problems with feature quantities directly affect BoF. In other words, if the angle from which the object is viewed changes, the feature quantities change, and the feature vector changes with them.




5.       Conclusions


All three drawbacks come from image feature points or image feature quantities. In other words, the nature of the feature points and feature quantities affects the accuracy of image recognition. These disadvantages follow naturally from BoF's starting concept that objects of the same kind have many similar parts. On the other hand, what about deep learning? In deep learning, a human designs the neural network to be used but does not specify the feature extraction method. In other words, image recognition by deep learning can be regarded as a method that uses training data to determine automatically the feature extraction that was previously designed by hand. My team members and I are currently working on a project to apply it to face recognition technology.