Practical insights for a data-driven approach to model optimization
In Part 1, we discussed the importance of collecting good image data and assigning proper labels for your image classification project to be successful. Also, we talked about classes and sub-classes of your data. These may seem pretty straight forward concepts, but it’s important to have a solid understanding going forward. So, if you haven’t, please check it out.
Now we will discuss how to build the various data sets and the techniques that have worked well for my application. Then in the next part, we will dive into the evaluation of your models, beyond simple accuracy.
I will again use the example zoo animals image classification app.
Data Sets
As machine learning engineers, we are all familiar with the train-validation-test sets, but when we include the concept of sub-classes discussed in Part 1, and incorporate to concepts discussed below to set a minimum and maximum image count per class, as well as staged and synthetic data to the mix, the process gets a bit more complicated. I had to create a custom script to handle these options.