On Kaggle there’s an ongoing competition from Google where you’re tasked with classifying landmarks using a deep network. Here’s my process flow and what I’ve learnt trying to solve this problem.
I’ve set up my server on Google Cloud with 1 TB of disk space because there are over a million photos to download just for training. It’s also got 8 cores and 15 GB of RAM. You’ll need to download the “train, test & sample-submission” archives and unzip them to get the CSV files.
To download the images you can use this file. Download it to the same folder as train.csv and create a folder to save the downloaded images, like “training_images”. The command to download is python download-data.py train.csv training_images, and when it starts you’ll see messages on the console about files not found or failed conversions. Only a few of the train/test links point to images that have since been removed. Once you’re done with this, you can do the same for test.csv with a test folder.
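For reference, here’s a minimal sketch of what that download step does. The function name is mine and the real script linked above does more (multiprocessing, retries, image conversion); this just shows the id-to-filename mapping and the skipping of dead links:

```python
import os
from urllib.request import urlopen

import pandas as pd

def download_images(csv_path, out_dir):
    """Fetch each url in the CSV and save it as "id".jpg,
    skipping files already present or no longer available."""
    os.makedirs(out_dir, exist_ok=True)
    # train.csv has columns: id, landmark_id, url; test.csv has id, url
    df = pd.read_csv(csv_path)
    for image_id, url in zip(df["id"], df["url"]):
        path = os.path.join(out_dir, f"{image_id}.jpg")
        if os.path.exists(path):
            continue  # already downloaded
        try:
            data = urlopen(url, timeout=10).read()
        except Exception:
            # removed images and bad links end up here
            print(f"Failed to download {image_id}")
            continue
        with open(path, "wb") as f:
            f.write(data)
```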
I’m going to use Keras and TensorFlow to train the network, and here’s the link to the full code. There are 14,951 landmarks to be classified, and in train.csv you’ll see there are three columns: id, landmark_id and url. The downloader names each file “id”.jpg, so to group the images according to their landmark id I’ll use pandas as follows.
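A sketch of that grouping step. The output directory layout matches what Keras’s flow_from_directory expects (one subfolder per class); the 80/20 split and folder names are my own choices:

```python
import os
import shutil

import pandas as pd

def organize_images(csv_path, image_dir, out_dir, val_fraction=0.2):
    """Copy downloaded "id".jpg files into per-landmark folders,
    splitting each class between train/ and validation/."""
    df = pd.read_csv(csv_path)  # columns: id, landmark_id, url
    for landmark_id, group in df.groupby("landmark_id"):
        n_val = int(len(group) * val_fraction)
        for i, image_id in enumerate(group["id"]):
            src = os.path.join(image_dir, f"{image_id}.jpg")
            if not os.path.exists(src):
                continue  # some downloads failed
            split = "validation" if i < n_val else "train"
            dst = os.path.join(out_dir, split, str(landmark_id))
            os.makedirs(dst, exist_ok=True)
            shutil.copy(src, dst)
```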
Now you should have the files distributed into validation and training folders. To train the model, here’s the code.
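The full code is linked above; as a rough sketch, a small convolutional model for this setup might look like the following. The layer sizes and the 64×64 grayscale input are my own illustrative choices, not the exact architecture from the linked code:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 14951  # number of landmark ids in train.csv

def build_model(input_shape=(64, 64, 1), num_classes=NUM_CLASSES):
    """A small CNN; the Keras generators resize images to match
    input_shape, so single-channel grayscale works out of the box."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        # one softmax output per landmark id
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```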
I’ve used EarlyStopping to avoid overfitting. With so many classes to classify (almost 15,000), the model will have a hard time converging, and a bigger model will do better than a small one: during backpropagation it adjusts its weights, but a small network is never going to generalize well over this many classes. The good thing with Keras is that you can resize the images to a smaller size to speed up the process and also turn all the images to grayscale. In turn, the test folder generator should also convert images to grayscale before trying to classify them.
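A sketch of what those generator and callback settings might look like (the directory names, image size and patience value are assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_generator(directory, image_size=(64, 64), batch_size=64):
    """Resize to a small grayscale input; the same settings must be
    used for the train, validation and test directories."""
    return ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        directory,
        target_size=image_size,
        color_mode="grayscale",
        class_mode="categorical",
        batch_size=batch_size,
    )

# Stop training when validation loss stops improving,
# instead of blindly running all epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)

# train_gen = make_generator("data/train")
# val_gen = make_generator("data/validation")
# model.fit(train_gen, validation_data=val_gen,
#           epochs=50, callbacks=[early_stop])
```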
And that’s how you do it.
But wait there’s more…
You could also use a pre-trained model, clip its head, and only train the classification layers. Here’s one attempt with the VGG network.
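A minimal sketch of that transfer-learning setup with Keras’s built-in VGG16 (the head layers are my own choice; note the pre-trained weights expect 3-channel RGB input, so grayscale doesn’t apply here):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_vgg_classifier(num_classes=14951,
                         input_shape=(224, 224, 3),
                         weights="imagenet"):
    """Freeze the pre-trained convolutional base and train only
    a new classification head on top."""
    base = VGG16(weights=weights, include_top=False,
                 input_shape=input_shape)
    base.trainable = False  # keep the ImageNet features fixed

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```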
Problem: the log error starts at 15.x, whereas on an untrained model it usually starts at around 9 or 8. What you need to know about log error is that instead of just penalizing the model on whether it’s right or wrong, it also penalizes its confidence. The confidence score goes from 0 to 1, so a confidence of 0.8 on a correct answer is still penalized for the 0.2 lack of confidence, and the same goes for an incorrect classification. Plus it takes around one and a half hours to complete one epoch, with 50 epochs in total. Ain’t nobody got time for that.
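To make the confidence penalty concrete, here’s a small sketch of the per-example log loss (the helper name is mine):

```python
import math

def log_loss_single(confidence_in_true_class, eps=1e-15):
    """Log loss for one example: the penalty depends on how much
    probability the model assigned to the correct class."""
    p = max(min(confidence_in_true_class, 1 - eps), eps)
    return -math.log(p)

# A correct answer with 0.8 confidence still incurs a penalty...
print(log_loss_single(0.8))        # ~0.223
# ...while putting only 0.01 on the true class is hit much harder.
print(log_loss_single(0.01))       # ~4.605
# Uniform guessing over 14,951 classes gives the ~9.6 starting point.
print(log_loss_single(1 / 14951))  # ~9.61
```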
If you’re trying to train without a GPU I feel bad for you son, I got 99 problems but a chip ain’t one. What you should expect is a log score as near to 1 as possible; below 1 is better but hard to attain. There is a way, though, if you have the time and patience. Like in Jurassic World, “life finds a way”; in data science, “data finds a way”. You just need a better model, and that model is object detection, which can also identify the bounding area of the object.
Instead of dumping raw pixels of landmarks (and everything around them) into the network and expecting it to eventually learn to care only about the landmarks, training it on the association between a landmark and its id across many different images (different angles, occlusion, sizes, lighting…) is going to give a more robust model. And I know a guy who knows “the way”: follow the link.