New file `TextSeg/process_textseg.py`:
import json
import os
import shutil

# Output directories for the three TextSeg splits (images and text stroke ground-truth masks).
os.makedirs('train_images', exist_ok=True)
os.makedirs('train_gt', exist_ok=True)
os.makedirs('val_images', exist_ok=True)
os.makedirs('val_gt', exist_ok=True)
os.makedirs('test_images', exist_ok=True)
os.makedirs('test_gt', exist_ok=True)

# split.json from the TextSeg release lists the image ids belonging to each split.
with open('split.json') as f_json:
    split_info = json.load(f_json)
train_img_list = split_info['train']
val_img_list = split_info['val']
test_img_list = split_info['test']

# Copy each image and its foreground (text stroke) mask into the corresponding split folder.
for im_id in train_img_list:
    shutil.copy(os.path.join('image', im_id + '.jpg'), os.path.join('train_images', im_id + '.jpg'))
    shutil.copy(os.path.join('semantic_label', im_id + '_maskfg.png'), os.path.join('train_gt', im_id + '.png'))
for im_id in val_img_list:
    shutil.copy(os.path.join('image', im_id + '.jpg'), os.path.join('val_images', im_id + '.jpg'))
    shutil.copy(os.path.join('semantic_label', im_id + '_maskfg.png'), os.path.join('val_gt', im_id + '.png'))
for im_id in test_img_list:
    shutil.copy(os.path.join('image', im_id + '.jpg'), os.path.join('test_images', im_id + '.jpg'))
    shutil.copy(os.path.join('semantic_label', im_id + '_maskfg.png'), os.path.join('test_gt', im_id + '.png'))
New file:
## Data Preparation

### Step 1. Download

- **HierText**. Follow [the official repo of HierText](https://github.com/google-research-datasets/hiertext) to download the dataset images. I labeled and provide the text stroke segmentation ground-truths (binary PNG format: 0 for background, 255 for text foreground), which can be downloaded via the following OneDrive links: [train_gt (131MB)](https://1drv.ms/u/s!AimBgYV7JjTlgcorK9fmoBp7QImvww?e=zRiNKL), [validation_gt (26MB)](https://1drv.ms/u/s!AimBgYV7JjTlgcooQOfgKDidqWvrAw?e=7NCHiC), [test_gt (25MB)](https://1drv.ms/u/s!AimBgYV7JjTlgcopZbsovlW6JVjomA?e=qw8Dht). A minimal sketch for loading these masks is given at the end of this step.



- **Total-Text**. Follow [the official repo of Total-Text](https://github.com/cs-chan/Total-Text-Dataset) to download the dataset. For text stroke segmentation, please download [the character level mask ground-truths](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Pixel/Character%20Level%20Mask).
- **TextSeg**. Follow [the official repo of TextSeg](https://github.com/SHI-Labs/Rethinking-Text-Segmentation) to apply for the dataset.
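
The HierText ground-truths above are plain single-channel PNGs, so any image library can read them. Below is a minimal loading sketch, assuming PIL and NumPy are available; the file path is illustrative and should point into one of the unpacked `*_gt` folders.

```python
import numpy as np
from PIL import Image

# Illustrative path: one of the downloaded HierText ground-truth masks.
mask_path = 'HierText/train_gt/some_image_id.png'

# Single-channel PNG: 0 = background, 255 = text stroke foreground.
mask = np.array(Image.open(mask_path).convert('L'))
binary_mask = (mask > 127).astype(np.uint8)  # map to {0, 1} for training/evaluation

print(binary_mask.shape, binary_mask.sum())
```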
### Step 2. Processing & Organization

(1) For Total-Text, rename `groundtruth_pixel/Train/img61.JPG` to `groundtruth_pixel/Train/img61.jpg` (a minimal rename sketch follows).
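
This is a one-off fix for the single mixed-case extension; a minimal sketch, assuming the current working directory is the TotalText root:

```python
import os

# Assumes the working directory is the TotalText root; the path comes from the step above.
src = os.path.join('groundtruth_pixel', 'Train', 'img61.JPG')
dst = os.path.join('groundtruth_pixel', 'Train', 'img61.jpg')
if os.path.exists(src):
    os.rename(src, dst)
```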

(2) For TextSeg, see `TextSeg/process_textseg.py` and use it to split the original data into the train/val/test folders. The script reads `split.json`, `image/`, and `semantic_label/` from its working directory, so run it from the directory holding the original TextSeg release.
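
A quick, optional sanity check that those inputs are in place before running the script; `textseg_root` is an illustrative placeholder:

```python
import os

textseg_root = '/path/to/TextSeg'  # illustrative: directory holding the official TextSeg release
for name in ('split.json', 'image', 'semantic_label'):
    path = os.path.join(textseg_root, name)
    assert os.path.exists(path), f'missing {path}'
```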

(3) Organize the datasets in the following structure:

```
|- HierText
|  |- train
|  |- train_gt
|  |- validation
|  |- validation_gt
|  |- test
|  └ test_gt
|- TotalText
|  |- groundtruth_pixel
|  |  |- Test
|  |  └ Train
|  └ Images
|     |- Test
|     └ Train
|- TextSeg
|  |- train_images
|  |- train_gt
|  |- val_images
|  |- val_gt
|  |- test_images
|  └ test_gt
```
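
Optionally, the layout above can be double-checked with a short script; `data_root` is an illustrative placeholder for wherever the three dataset folders live.

```python
import os

data_root = '.'  # illustrative: directory that contains HierText, TotalText and TextSeg
expected = [
    'HierText/train', 'HierText/train_gt',
    'HierText/validation', 'HierText/validation_gt',
    'HierText/test', 'HierText/test_gt',
    'TotalText/groundtruth_pixel/Test', 'TotalText/groundtruth_pixel/Train',
    'TotalText/Images/Test', 'TotalText/Images/Train',
    'TextSeg/train_images', 'TextSeg/train_gt',
    'TextSeg/val_images', 'TextSeg/val_gt',
    'TextSeg/test_images', 'TextSeg/test_gt',
]
missing = [p for p in expected if not os.path.isdir(os.path.join(data_root, p))]
print('Layout looks complete.' if not missing else 'Missing directories: ' + ', '.join(missing))
```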