Code Optimisation #1791
-
Yes, the library has grown somewhat over the years, and the broadening of support and evolutionary growth from the original ESP8266 starting point have added complexity. I assume you mean the ESP32 S3 is slightly slower for SPI displays. This is due to the need for extra commands to initiate each SPI transfer.
The ESP32 parallel code was intended for UNO style displays where the GPIO sequence for the pins was fairly random. This means a lookup table has to be used, which can slow things down in some circumstances. It would be better to use the approach in the STM32 library, where a single register write can be used without a lookup. However, the ESP32 has a GPIO output I/O clock of 20MHz, which means a pin can be toggled at a maximum of 10MHz, limiting the parallel write speed. Using I2S can fix this but forces the use of DMA and a significant configuration overhead.
The "best" processor for I/O is the RP2040 using PIO. In this case the library can clear a screen in 16 bit parallel mode at the limit of the SSD1963 parallel rate (12ms to clear all 800x480 pixels!) AND the processor can do other things while this is happening, as the clear is done by the PIO!
I was not intending to add S3 support yet but found the UM ProS3, which is a perfect fit for a displayless Node-RED project. The S3 does seem rather expensive at the moment, so it would not otherwise have been my processor of choice.
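To illustrate the lookup-table point, here is a minimal host-side sketch in plain C. The pin numbers, the `BUS_SHIFT` value and all function names are hypothetical and chosen only for illustration; they are not taken from the library or from any real setup file. It contrasts scattered data pins, which need a 256-entry table, with contiguous data pins, where the set mask is a single shift.

```c
#include <stdint.h>

/* Hypothetical, non-contiguous GPIO numbers for data bits D0..D7
   (UNO-style shield wiring). Assumption for illustration only. */
static const uint8_t SCATTERED_PINS[8] = {12, 13, 26, 25, 17, 16, 27, 14};

/* Hypothetical contiguous wiring: D0..D7 on GPIO 8..15. */
#define BUS_SHIFT 8u

static uint32_t set_mask_lut[256];

/* Precompute, for every byte value, the 32-bit mask of GPIO bits to set. */
void build_set_mask_lut(void)
{
    for (uint32_t value = 0; value < 256; value++) {
        uint32_t mask = 0;
        for (uint32_t bit = 0; bit < 8; bit++) {
            if (value & (1u << bit)) {
                mask |= 1u << SCATTERED_PINS[bit];
            }
        }
        set_mask_lut[value] = mask;
    }
}

/* Scattered pins: one table read from memory per byte written. */
uint32_t scattered_set_mask(uint8_t value)
{
    return set_mask_lut[value];
}

/* Contiguous pins: a single shift, no table and no memory read. */
uint32_t contiguous_set_mask(uint8_t value)
{
    return (uint32_t)value << BUS_SHIFT;
}
```

On real hardware the resulting mask would be written to the GPIO set register; the sketch only shows how the mask is obtained in each case.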
-
I have done some tests using the ILI9341 at 40MHz as a benchmark to check the performance difference between ESP32 and ESP32 S3:
So the S3 is 2.7% slower for a screen clear and 9.4% slower for the entire graphics test. As expected, low pixel count block writes (e.g. for the "Lines" test) show the biggest performance impact due to the higher extra instruction count. Overall though, for general screen updates in real projects a user probably won't notice a difference. It may be that these extra SPI instructions for the S3 are only needed in particular circumstances, but I have just been pleased to get it working and have not looked into that!
-
Anyway, the good news is that your library can work without any change on the ESP32-S3. Regarding the parallel mode of the ESP32, maybe you could consider rewriting the following function like this:
#define tft_Write_8(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t)(C)); WR_H
becomes:
#define tft_Write_8(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = (set_mask1((uint8_t)(C)) | (1 << TFT_WR))
This way, instead of 3 cycles for each command or pixel, it takes only 2. That's already a very good result ;)
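The saving can be checked on a desktop machine with a small simulation. This is only a sketch: `sim_gpio_t`, the `sim_` helpers, the pin numbers and the bus position are all hypothetical stand-ins for the real write-1-to-clear and write-1-to-set registers, and it assumes the clear mask already covers the WR pin so that WR goes low during the clear step.

```c
#include <stdint.h>

/* Host-side stand-in for the GPIO set/clear registers (hypothetical). */
typedef struct {
    uint32_t level;   /* simulated pin levels */
    uint32_t writes;  /* number of register writes issued */
} sim_gpio_t;

#define SIM_TFT_WR    4u   /* hypothetical WR pin number */
#define SIM_BUS_SHIFT 12u  /* hypothetical: D0..D7 on GPIO 12..19 */
/* Assumption: the clear mask covers the data pins and WR. */
#define SIM_CLR_MASK  ((0xFFu << SIM_BUS_SHIFT) | (1u << SIM_TFT_WR))

/* Write-1-to-clear: every 1 bit in mask drives that pin low. */
static void sim_w1tc(sim_gpio_t *g, uint32_t mask) { g->level &= ~mask; g->writes++; }
/* Write-1-to-set: every 1 bit in mask drives that pin high. */
static void sim_w1ts(sim_gpio_t *g, uint32_t mask) { g->level |= mask; g->writes++; }

static uint32_t sim_set_mask(uint8_t c) { return (uint32_t)c << SIM_BUS_SHIFT; }

/* Three register writes: clear data and WR, set data, then raise WR. */
void write8_three_step(sim_gpio_t *g, uint8_t c)
{
    sim_w1tc(g, SIM_CLR_MASK);
    sim_w1ts(g, sim_set_mask(c));
    sim_w1ts(g, 1u << SIM_TFT_WR);
}

/* Two register writes: WR is raised by the same write that sets the data. */
void write8_two_step(sim_gpio_t *g, uint8_t c)
{
    sim_w1tc(g, SIM_CLR_MASK);
    sim_w1ts(g, sim_set_mask(c) | (1u << SIM_TFT_WR));
}
```

Both variants leave the pins in the same final state, but the second issues two register writes instead of three. Because the data and the rising WR edge then change in the same write, the data setup time of the display controller would need to be checked on real hardware.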
-
Hi Bodmer,
I have been working on your library these last few days. My goal was first to implement 16 bit parallel mode for the ESP32 and second to make it compatible with the ESP32_S3.
I succeeded at both, so you will find my library below.
SSD1963.zip
I have to confess that your code is so complicated for me that I had to rebuild the library with only the ESP32 and the SSD1963 screen. So please consider my file as your work, not mine.
The good news first: the latest version of the TFT_eSPI library works perfectly on the ESP32_S3, but it is a little slower than on the ESP32, and I have not managed to find out why. I tried using a GPIO Bundle to push the data to the screen; the library code gets simpler, but the library gets very slow (about 10 times slower), and a Bundle is limited to 8 pins.
Maybe it could work with I2S (which enables the use of DMA), but that is getting too complicated for me.
During my work I found some optimizations for your code and the result is amazing.
I ran my tests with your TFT_graphictest_one_lib sketch and here are the results:
Test Hardware Settings : ESP32 Devkit C V4 + SSD1963 5 inch screen (800x480)
Arduino IDE with ESP32 2.0.3-RC1 framework.
Original TFT_eSPI library 8 Bit Mode: (V 2.4.50).
Benchmark Time (microseconds)
Screen fill 755002
Text 7853
Lines 226140
Horiz/Vert Lines 70419
Rectangles (outline) 35447
Rectangles (filled) 2822791
Circles (filled) 195771
Circles (outline) 75947
Triangles (outline) 46222
Triangles (filled) 888077
Rounded rects (outline) 47667
Rounded rects (filled) 2812643
Done!
total = 7 983 979 (8.0 Seconds)
SSD1963 library 8 Bit Mode:
Benchmark Time (microseconds)
Screen fill 580776
Text 6073
Lines 171310
Horiz/Vert Lines 46966
Rectangles (outline) 23729
Rectangles (filled) 1881955
Circles (filled) 136820
Circles (outline) 60489
Triangles (outline) 35036
Triangles (filled) 596203
Rounded rects (outline) 33930
Rounded rects (filled) 1883505
Done!
total = 5 456 792 (5.5 Seconds)
difference : - 2 527 187 --> 2.53 seconds less (time reduced by 31.7 %)
SSD1963 library 16 Bit Mode:
Benchmark Time (microseconds)
Screen fill 362989
Text 5012
Lines 143674
Horiz/Vert Lines 17909
Rectangles (outline) 9250
Rectangles (filled) 705864
Circles (filled) 75069
Circles (outline) 55531
Triangles (outline) 27900
Triangles (filled) 238341
Rounded rects (outline) 20593
Rounded rects (filled) 725155
Done!
total = 2 387 287 (2.4 Seconds)
difference : - 5 596 692 --> 5.6 seconds less (time reduced by 70.1 %)
Test Hardware Settings : ESP32_S3 Devkit C V1 + SSD1963 5 inch screen (800x480)
Arduino IDE with ESP32 2.0.3-RC1 framework.
Original TFT_eSPI library 8 Bit Mode: (V 2.4.50).
Benchmark Time (microseconds)
Screen fill 943211
Text 8554
Lines 252627
Horiz/Vert Lines 87774
Rectangles (outline) 44040
Rectangles (filled) 3526440
Circles (filled) 234819
Circles (outline) 83840
Triangles (outline) 51362
Triangles (filled) 1103609
Rounded rects (outline) 56326
Rounded rects (filled) 3511202
Done!
total = 9 903 804 (9.9 Seconds)
SSD1963 library 8 Bit Mode:
Benchmark Time (microseconds)
Screen fill 725546
Text 6305
Lines 182304
Horiz/Vert Lines 58564
Rectangles (outline) 29429
Rectangles (filled) 2350990
Circles (filled) 160624
Circles (outline) 64370
Triangles (outline) 37027
Triangles (filled) 738613
Rounded rects (outline) 39063
Rounded rects (filled) 2350300
Done!
total = 6 704 072 (6.7 Seconds)
difference : - 3 199 732 --> 3.20 seconds less (time reduced by 32.3 %)
SSD1963 library 16 Bit Mode:
TFT_eSPI library test!
Benchmark Time (microseconds)
Screen fill 435332
Text 4912
Lines 148100
Horiz/Vert Lines 19859
Rectangles (outline) 10233
Rectangles (filled) 783855
Circles (filled) 79932
Circles (outline) 57850
Triangles (outline) 28471
Triangles (filled) 262360
Rounded rects (outline) 21626
Rounded rects (filled) 806959
Done!
total = 2 659 489 (2.7 Seconds)
difference : - 7 244 315 --> 7.24 seconds less (time reduced by 73.1 %)
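For anyone who wants to recompute the "time reduced by" figures from the totals, a small helper along these lines would do it. The function name is only illustrative; the inputs are the totals in microseconds listed above.

```c
#include <stdint.h>

/* Percentage by which after_us is shorter than before_us. */
double time_reduction_percent(uint32_t before_us, uint32_t after_us)
{
    return 100.0 * ((double)before_us - (double)after_us) / (double)before_us;
}
```

For example, `time_reduction_percent(7983979, 5456792)` gives roughly 31.7 for the ESP32 in 8 bit mode, and `time_reduction_percent(9903804, 2659489)` gives roughly 73.1 for the ESP32_S3 in 16 bit mode.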
As you can see, it is possible to make it much faster.
I hope this helps you make your powerful library even more powerful.
Regards