// The contents of this file are in the public domain. See LICENSE_FOR_EXAMPLE_PROGRAMS.txt
/*
    This is an example illustrating the use of the deep learning tools from the dlib C++
    Library.  In it, we will show how to use the loss_metric layer to do metric learning
    on images.

    The main reason you might want to use this kind of algorithm is because you would like
    to use a k-nearest neighbor classifier or similar algorithm, but you don't know a good
    way to calculate the distance between two things.  A popular example is face
    recognition.  There are a whole lot of papers that train some kind of deep metric
    learning algorithm that embeds face images in some vector space where images of the
    same person are close to each other and images of different people are far apart.
    Then in that vector space it's very easy to do face recognition with some kind of
    k-nearest neighbor classifier.

    In this example we will use a version of the ResNet network from the
    dnn_imagenet_ex.cpp example to learn to map images into some vector space where
    pictures of the same person are close and pictures of different people are far apart.

    You might want to read the simpler introduction to the deep metric learning API,
    dnn_metric_learning_ex.cpp, before reading this example.  You should also have read
    the examples that introduce the dlib DNN API before continuing.  These are
    dnn_introduction_ex.cpp and dnn_introduction2_ex.cpp.
*/

#include <dlib/dnn.h>
#include <dlib/image_io.h>
#include <dlib/misc_api.h>

using namespace dlib;
using namespace std;

// ----------------------------------------------------------------------------------------

// We will need to create some functions for loading data.
// This program will expect to be given a directory structured as follows:
//    top_level_directory/
//        person1/
//            image1.jpg
//            image2.jpg
//            image3.jpg
//        person2/
//            image4.jpg
//            image5.jpg
//            image6.jpg
//        person3/
//            image7.jpg
//            image8.jpg
//            image9.jpg
//
// The specific folder and image names don't matter, nor does the number of folders or
// images.  What does matter is that there is a top level folder, which contains
// subfolders, and each subfolder contains images of a single person.

// This function spiders the top level directory and obtains a list of all the
// image files.
std::vector<std::vector<string>> load_objects_list (
    const string& dir
)
{
    std::vector<std::vector<string>> objects;
    for (auto subdir : directory(dir).get_dirs())
    {
        std::vector<string> imgs;
        for (auto img : subdir.get_files())
            imgs.push_back(img);

        if (imgs.size() != 0)
            objects.push_back(imgs);
    }
    return objects;
}

// This function takes the output of load_objects_list() as input and randomly
// selects images for training.  It should also be pointed out that it's really
// important that each mini-batch contain multiple images of each person.  This
// is because the metric learning algorithm needs to consider pairs of images
// that should be close (i.e. images of the same person) as well as pairs of
// images that should be far apart (i.e. images of different people) during each
// training step.
void load_mini_batch (
    const size_t num_people,     // how many different people to include
    const size_t samples_per_id, // how many images per person to select.
    dlib::rand& rnd,
    const std::vector<std::vector<string>>& objs,
    std::vector<matrix<rgb_pixel>>& images,
    std::vector<unsigned long>& labels
)
{
    images.clear();
    labels.clear();
    DLIB_CASSERT(num_people <= objs.size(), "The dataset doesn't have that many people in it.");

    std::vector<bool> already_selected(objs.size(), false);
    matrix<rgb_pixel> image;
    for (size_t i = 0; i < num_people; ++i)
    {
        size_t id = rnd.get_random_32bit_number()%objs.size();
        // don't pick a person we already added to the mini-batch
        while(already_selected[id])
            id = rnd.get_random_32bit_number()%objs.size();
        already_selected[id] = true;

        for (size_t j = 0; j < samples_per_id; ++j)
        {
            const auto& obj = objs[id][rnd.get_random_32bit_number()%objs[id].size()];
            load_image(image, obj);
            images.push_back(std::move(image));
            labels.push_back(id);
        }
    }

    // You might want to do some data augmentation at this point.  Here we do some simple
    // color augmentation.
    for (auto&& crop : images)
    {
        disturb_colors(crop,rnd);

        // Jitter most crops
        if (rnd.get_random_double() > 0.1)
            crop = jitter_image(crop,rnd);
    }

    // All the images going into a mini-batch have to be the same size.  And really, all
    // the images in your entire training dataset should be the same size for what we are
    // doing to make the most sense.
    DLIB_CASSERT(images.size() > 0);
    for (auto&& img : images)
    {
        DLIB_CASSERT(img.nr() == images[0].nr() && img.nc() == images[0].nc(),
            "All the images in a single mini-batch must be the same size.");
    }
}

// ----------------------------------------------------------------------------------------

// The next page of code defines a ResNet network.  It's basically copied
// and pasted from the dnn_imagenet_ex.cpp example, except we replaced the loss
// layer with loss_metric and made the network somewhat smaller.

template