- Convert each image to grayscale and increase the contrast to make the bacterial colonies more visible.
- Remove the timestamp visible on the Petri dish so that the model doesn't learn it as noise. (This turned out to be hard to automate across different lighting setups.)
- Removing the timestamp might be unnecessary, as the neural network might learn to treat it as noise on its own.
- Augment the data with perturbed copies (horizontal/vertical flips, blur) to produce more samples.
- Agar plates without colonies make up roughly 10% of the dataset, which makes the dataset imbalanced for binary classification.
- Plot recall during training to see how it responds to the decision threshold. Arbitrarily setting the threshold to 0.5 might not be the best idea.
- Use a class balancer in PyTorch to get the same number of samples for both classes.
- Not starting with a small subset of the data.
- Not getting the baseline model as fast as possible.
- Underestimating the influence of the size of each data sample on training.
- Not rescaling pictures to a smaller size early on.
- Not saving the rescaled pictures locally.
- Not looking at the distribution of the data before training.
- Not setting the threshold for binary classification based on the recall plot.
- Finding a research paper that doesn't focus on the architecture.
- Not using Papers with Code.
- Focusing too much on data preprocessing.
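
To make the points about class distribution and balancing concrete, here is a minimal sketch in plain Python. The label encoding (1 = colonies present, 0 = empty plate), the toy data, and the function names `class_distribution` and `balancing_weights` are assumptions made for illustration, not the code used in this project. It first reports the share of each class, then computes per-sample weights that are inversely proportional to class frequency.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def balancing_weights(labels):
    """Return one weight per sample, inversely proportional to its class frequency.

    With these weights, every class carries the same total weight, so a
    weighted sampler draws both classes about equally often.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Hypothetical toy labels: roughly 10% empty plates, as described in the list above.
labels = [1] * 9 + [0] * 1

dist = class_distribution(labels)
weights = balancing_weights(labels)

print(dist)
print(weights)
```

Weights of this kind are what `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` expects, which is one possible way to get a class balancer in PyTorch. Whether that sampler matches the balancer meant in the list is an assumption.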
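
The idea of choosing the threshold from recall rather than fixing it at 0.5 can be sketched as follows. The validation scores, the labels, the target recall of 0.75, and the helper names `recall_at` and `highest_threshold_with_recall` are all hypothetical. Treating label 1 as the positive class is also an assumption.

```python
def recall_at(threshold, scores, labels):
    """Recall = TP / (TP + FN) for the positive class (label 1) at a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def highest_threshold_with_recall(scores, labels, target, candidates):
    """Return the largest candidate threshold whose recall meets the target, or None."""
    ok = [t for t in candidates if recall_at(t, scores, labels) >= target]
    return max(ok) if ok else None


# Hypothetical validation set: predicted probabilities and true labels.
scores = [0.95, 0.80, 0.45, 0.30, 0.60, 0.20]
labels = [1, 1, 1, 1, 0, 0]

candidates = [i / 10 for i in range(1, 10)]  # thresholds 0.1 .. 0.9

# (threshold, recall) pairs; these are the points a recall plot would show.
curve = [(t, recall_at(t, scores, labels)) for t in candidates]

best = highest_threshold_with_recall(scores, labels, target=0.75, candidates=candidates)
print(curve)
print(best)
```

On this toy data the default threshold of 0.5 recovers only half of the positives, while a lower threshold meets the target. In practice the same sweep would be run on a held-out validation set, and precision should be checked alongside recall, since lowering the threshold usually trades one for the other.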