23/08/2020
- Created the GitHub repository
- Shared the link
- Installing some prerequisites and using Anaconda instead of plain Python.
- Figuring out how to generate data
- Tried the first two links -> SynthText and text_renderer. text_renderer feels more suitable than the other one. Going to try the third link as well.
- Trying to generate data using https://github.com/Belval/TextRecognitionDataGenerator
- The third one already has a lot of built-in options compared to text_renderer and can use all CPU threads for image generation, making it a lot faster. Therefore going with this generator (a minimal usage sketch follows today's notes).
- Trying to produce variations in the generated data and figuring out what to do with the bounding boxes around characters
- Trying to figure out how to organize the labels for the generated texts
- Still looking into bounding boxes for the characters in the data and trying to create TFRecords from a small sample
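For reference, a minimal sketch of driving TextRecognitionDataGenerator from its Python API; the parameter names are the ones I recall from its README and may differ between versions:

```python
# Minimal sketch: generate labelled word images with the trdg Python API.
# Parameter names recalled from the TextRecognitionDataGenerator README;
# verify against the installed version.
import os
from trdg.generators import GeneratorFromStrings

os.makedirs("images", exist_ok=True)
words = ["invoice", "total", "HELLO world"]  # placeholder strings

generator = GeneratorFromStrings(
    words,
    count=9,           # total number of images to yield
    size=32,           # output image height in pixels
    skewing_angle=5,
    random_skew=True,  # random skew up to +/-5 degrees
    blur=1,
    random_blur=True,  # random blur between 0 and the given value
)

for i, (image, label) in enumerate(generator):
    image.save(f"images/{i}.jpg")  # PIL image
    print(i, label)                # ground-truth text for the annotations file
```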
24/08/2020
- Resolving version conflicts so the aocr model and the text generator can work side by side. Uninstalled TensorFlow 2.1 and installed TensorFlow 1.15; the rest remains unchanged.
- Adding more fonts, backgrounds, etc. to the data generator and changing the default image height and width options.
- Generating training data while maintaining an annotations file containing the location and label of each image.
- Generated training data (60K images) and the train.tfrecords file. Time taken to create train.tfrecords = 5 min.
- Generated testing data (10K images) and the test.tfrecords file. Time taken to create test.tfrecords = 1 min.
- Training the model with the default hyperparameters: 1000 epochs, initial learning rate 1.0 (the learning rate is adaptive, so the initial value doesn't matter much), batch size 65, and full-ASCII mode since the data contains lowercase letters and sometimes spaces (a command sketch follows below).
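A hedged sketch of the prep-and-train steps, assuming the aocr CLI from emedvedev/attention-ocr; the flag names are recalled from its README, so verify with `aocr train --help`:

```python
# Sketch of the annotations -> TFRecords -> training pipeline.
import subprocess

# One "<image path><separator><label>" line per image; check the aocr
# README for the exact separator it expects (tab assumed here).
samples = [("images/0.jpg", "invoice"), ("images/1.jpg", "total")]
with open("annotations-train.txt", "w") as f:
    for path, label in samples:
        f.write(f"{path}\t{label}\n")

# Pack the images and labels into a single TFRecords file.
subprocess.run(["aocr", "dataset", "annotations-train.txt", "train.tfrecords"],
               check=True)

# Train with the hyperparameters noted above; --full-ascii keeps lowercase
# letters and spaces in the character set (flag names assumed).
subprocess.run([
    "aocr", "train", "train.tfrecords",
    "--batch-size", "65",
    "--num-epoch", "1000",
    "--initial-learning-rate", "1.0",
    "--full-ascii",
], check=True)
```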
25/08/2020
- The default of 1000 epochs seems impossible, as each epoch takes around 50 min even on the GPU, so stopping the training and restarting it with fewer epochs.
- Updating the GitHub repository.
- Trained the model in around 6 hours 30 min (50 epochs). Final loss 0.142, perplexity 1.04, total steps 42713. The model did not seem to improve much after 30 epochs (around 26000 steps) or so, as the loss kept varying between 0.05 and 0.15 throughout.
- Testing the model on a custom test set generated randomly, like the training set
- Testing time = 15 min, Test Sample Size = 10k, Test Accuracy = 69.88%
- Testing the model on the given sample of 30 images -> Accuracy = 31.70%
- Thinking of modifying the dataset: the earlier dataset has strings up to around 30 chars, includes colored images, and mixes upper and lower case
- Training the model again with 37 epochs instead of the previous 50, a reduced batch size of 60, steps per epoch increased to 1000, and most importantly a max prediction length of 15, which was 32 earlier (command sketch below)
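The retraining and evaluation runs, sketched the same hedged way (aocr flag names assumed from its README):

```python
# Re-train with the reduced settings, then evaluate on the test TFRecords.
import subprocess

subprocess.run([
    "aocr", "train", "train.tfrecords",
    "--batch-size", "60",
    "--num-epoch", "37",
    "--steps-per-checkpoint", "1000",  # assumed mapping for "steps per epoch"
    "--max-prediction", "15",          # was 32 earlier
], check=True)

# aocr prints per-sample predictions and the aggregate accuracy.
subprocess.run(["aocr", "test", "test.tfrecords"], check=True)
```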
26/08/2020
- The model is still training. Increased the number of epochs by 10 (from 37 to 47) as the loss was still decreasing, although slowly, when training stopped at 37 epochs. This may be due to the dataset modification, as the previous set had much more variation than required. Also, each step now takes around 0.8 s instead of the earlier 0.5 s, because the max image width has been raised to 650 pixels from the previous 300.
- Finished training in 12 hrs 40 min. Final loss is around 0.06 and perplexity is 1.06. The 9 extra epochs helped reduce the loss a little.
- The loss plots show the loss decreasing rapidly at first and then gradually. After 40 epochs the decrease is very small, and it becomes almost stagnant after around 47 epochs, i.e. only a small change of 0.002 or 0.003.
- Testing the models on the self-generated test set and the sample dataset provided today (around 1786 images):
| model | self-generated test set | provided sample dataset |
|---|---|---|
| trained for 37 epochs | 74.45% | 48.30% |
| trained for 46 epochs | 76.80% | 56.54% |
- Feel like the data I generated has much more distortion than required compared to the provided public dataset.
- Looking into possible changes to the model architecture
27/08/2020
- Looking at the different pretrained model options available and what kinds of tasks they are suited to.
- Trying to use ResNet50 as the base (feature extractor) and then passing the output to the attention layer.
- Having problems with the dimensions being passed versus what the attention layer expects as input.
- Tried a different model, DenseNet121, just to check what happens; still the same dimension-mismatch problem.
- After a long search and a few experiments, now taking the output from one of the intermediate layers of the pretrained model (without the classification head) instead of its final layer; see the sketch after today's notes.
- Still hitting a dimension mismatch somewhere in the attention layer.
- Looking into implementing my own CNN, similar to ResNet50, by feeding the outputs of earlier layers into later layers.
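A sketch of the intermediate-layer idea in Keras: tap ResNet50 (no classification head) at an inner block and flatten the spatial grid into a sequence an attention layer can consume. The layer name and shapes are illustrative, not the exact ones from the project:

```python
import tensorflow as tf

# ResNet50 without the classification head; input size matches the
# (assumed) 64 x 650 text images.
base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(64, 650, 3))

# Tap an intermediate block so the feature map keeps enough spatial
# resolution; the name is valid for recent Keras versions, so use
# base.summary() to pick the equivalent layer in other versions.
features = base.get_layer("conv3_block4_out").output

# Flatten the H' x W' grid into a (timesteps, channels) sequence:
# each spatial position becomes one "token" for the attention layer.
_, h, w, c = features.shape.as_list()  # e.g. (None, 8, 82, 512)
seq = tf.keras.layers.Reshape((h * w, c))(features)

extractor = tf.keras.Model(base.input, seq)
extractor.summary()
```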
28/08/2020
- After adding just 4 or 5 more layers (on top of the previous 8), the GPU is running out of memory. Figuring out a solution.
- The GPU keeps running out of memory once the number of layers goes beyond about 12, and the CPU is not an option because training would then take a year to finish.
- Tried to make a CNN similar to ResNet50. The first few layers follow its architecture, where the output of a layer is added to the output of a deeper one, i.e. the output of layer x is added to the output of layer x + n. On top of this sit a few more convolution and pooling layers without the residual-learning part, with a dropout layer at the end (a sketch of the block follows today's notes).
- Currently training the modified model with 25 epochs, an initial (adaptive) learning rate of 0.05, batch size 60, 60k training images, and a max label length of 15.
- Checking out how to deploy a model using Flask while the model trains.
- Compiling the results of both models and preparing to test them. Analyzing the loss and accuracy plots for both.
- Thinking of training both models a little more on a mix of the provided data and higher-variation custom data so the models become more adaptive and robust.
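A minimal sketch of the hand-rolled residual CNN described above; filter counts and depths are placeholders, not the exact ones used:

```python
import tensorflow as tf

def residual_block(x, filters):
    """y = F(x) + x, with a 1x1 projection when channel counts differ."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape.as_list()[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(64, 650, 3))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
for f in (32, 64, 64):                 # residual-learning section
    x = residual_block(x, f)
    x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = tf.keras.layers.MaxPooling2D()(x)  # plain conv/pool section on top
x = tf.keras.layers.Dropout(0.3)(x)    # dropout at the end

cnn = tf.keras.Model(inputs, x)
```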
29/08/2020
- Training the models some more on the provided public test dataset, because it seems the models have memorized the generated data, which is not good; the modified model especially may need retraining.
- Preparing the assignment report while learning, side by side, how to build and deploy the web app.
- Studying about Flask and Heroku
- Looking into how the model's predict function works, since this is needed for the web-app deployment
- Preparing the documentation
30/08/2020
- Training the original model on a mix of my own and the public dataset.
- Preparing the presentation report.
- Testing both models again on the custom and public datasets and plotting their losses and accuracies.
- Writing the Flask script side by side.
- Exporting the models and writing the Flask script to serve them.
- Uploading the datasets and updating the GitHub repository.
- Created the Flask app (skeleton below)
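A skeleton of the Flask app; the route name and the predict helper are placeholders, with the real app wiring in the exported model:

```python
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

def predict_text(image):
    """Placeholder for the exported model's predict call."""
    return "predicted text"

@app.route("/predict", methods=["POST"])
def predict():
    # Expect the uploaded image under the "image" form field.
    file = request.files["image"]
    image = Image.open(io.BytesIO(file.read())).convert("RGB")
    return jsonify({"prediction": predict_text(image)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```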
31/08/2020
- Compiling everything and updating the GitHub repository
- Creating my_run.sh and editing the prediction part of the main file to communicate properly with it
- Tried deploying on Heroku but hit the memory limit, so switched to ngrok (sketch below).
- Completed the assignment report.
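For the ngrok switch, a sketch using the flask-ngrok helper (assumed installed via `pip install flask-ngrok`); it tunnels the local app to a public URL instead of hosting on Heroku:

```python
from flask import Flask
from flask_ngrok import run_with_ngrok

app = Flask(__name__)
run_with_ngrok(app)  # opens the tunnel when app.run() is called

@app.route("/")
def index():
    return "OCR demo is live"

if __name__ == "__main__":
    app.run()
```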