| paper | English datasets | Chinese datasets | pretrained model: Google Drive or Baidu Netdisk (passwd:7npu) |
| ----- | ---------------- | ---------------- | -------------------------------------------------------------- |
- This work was tested with PyTorch 1.8.0, CUDA 10.1, Python 3.6.13 and Ubuntu 18.04.
- Requirements: lmdb, Pillow, torchvision, nltk, natsort, timm, mmcv
pip install lmdb pillow torchvision nltk natsort timm mmcv
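If your environment differs from the tested one, a quick version check before running the benchmarks can save debugging time. This is a minimal sketch, not part of the repo:

```python
# Minimal environment sanity check (not part of the original repo):
# confirms the PyTorch / CUDA combination roughly matches the tested setup
# (PyTorch 1.8.0, CUDA 10.1) before running the benchmarks.
import torch

print("PyTorch:", torch.__version__)        # tested with 1.8.0
print("CUDA:", torch.version.cuda)          # tested with 10.1
print("GPU available:", torch.cuda.is_available())
```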
- Synthetic image datasets: MJSynth (MJ), SynthText (ST) and SynthAdd (password:627x);
- Real image datasets: the union of the training sets of IIIT5K, SVT, IC03, IC13, IC15, COCO-Text, SVTP and CUTE80; (baidu|google)
- Validation datasets: the union of the sets IC13 (857), SVT, IIIT5k (3000), IC15 (1811), SVTP and CUTE80;
- Evaluation datasets: English benchmark datasets, consisting of IIIT5k (3000), SVT, IC13 (857), IC15 (1811), SVTP and CUTE80.
- Download the Chinese training, validation and evaluation sets from here. (A quick lmdb sanity check is sketched right after this list.)
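The datasets above are distributed in lmdb format. The snippet below is a minimal sketch for checking that a downloaded lmdb folder opens and decodes; the path is a placeholder, and the key layout (`num-samples`, `image-%09d`, `label-%09d`) is an assumption based on the lmdb format commonly used for scene text recognition benchmarks.

```python
# Minimal sketch: sanity-check a downloaded lmdb dataset.
# Assumes the common scene-text lmdb layout ('num-samples', 'image-%09d', 'label-%09d');
# the dataset path below is a placeholder.
import io
import lmdb
from PIL import Image

dataset_path = "data/evaluation/IIIT5k_3000"  # placeholder path

env = lmdb.open(dataset_path, readonly=True, lock=False, readahead=False, meminit=False)
with env.begin(write=False) as txn:
    n_samples = int(txn.get("num-samples".encode()))
    print("samples:", n_samples)

    # Decode the first sample (indices start at 1) to confirm images and labels are readable.
    img_buf = txn.get("image-%09d".encode() % 1)
    label = txn.get("label-%09d".encode() % 1).decode("utf-8")
    img = Image.open(io.BytesIO(img_buf)).convert("RGB")
    print("first sample:", img.size, label)
```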
- Download the pretrained model from Google Drive or Baidu Netdisk (passwd:7npu); (a quick checkpoint sanity check is sketched after this list)
- Set the models path, testsets path and characters list;
- Run test_benchmark.py:
CUDA_VISIBLE_DEVICES=0 python test_benchmark.py --benchmark_all_eval --Transformation TPS19 --FeatureExtraction VIPTRv1T --SequenceModeling None --Prediction CTC --batch_max_length 25 --imgW 96 --output_channel 192
- Run test_chn_benchmark.py:
CUDA_VISIBLE_DEVICES=0 python test_chn_benchmark.py --benchmark_all_eval --Transformation TPS19 --FeatureExtraction VIPTRv1T --SequenceModeling None --Prediction CTC --batch_max_length 64 --imgW 320 --output_channel 192
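As a quick check that the pretrained weights downloaded correctly, the checkpoint can be loaded on the CPU before launching a benchmark. This is a minimal sketch; the file name is a placeholder for whichever checkpoint you downloaded.

```python
# Minimal sketch: verify a downloaded checkpoint loads and inspect its keys.
# The file name below is a placeholder for the downloaded pretrained model.
import torch

ckpt_path = "saved_models/viptr_v1_tiny.pth"  # placeholder path
state = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are usually either a raw state_dict or a dict wrapping one.
state_dict = state.get("state_dict", state) if isinstance(state, dict) else state
print("number of entries:", len(state_dict))
for name in list(state_dict)[:5]:
    print(name)
```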
Please consider citing this work in your publications if it helps your research.
@article{cheng2024viptr,
  title={VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition},
  author={Cheng, Xianfu and Zhou, Weixiao and Li, Xiang and Chen, Xiaoming and Yang, Jian and Li, Tongliang and Li, Zhoujun},
  journal={arXiv preprint arXiv:2401.10110},
  year={2024}
}