# Ko-CLIP

This repository contains code to train a Korean CLIP model on MS-COCO using the Korean annotations provided by AI-HUB. To obtain additional Korean annotations, we translate the English captions of the VizWiz dataset into Korean with the Naver Papago translator.

## Pretrained Models

The original CLIP was trained on a very large dataset, whereas ours is much smaller. Because Korean caption data is scarce, we use pretrained language and visual models to obtain strong representations from this limited data.

### Pretrained Language Model

- We fix the PLM to klue/roberta-large on Hugging Face to obtain stronger text representations for Korean (see the sketch below).
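
A minimal sketch of how the fixed PLM can be used to extract Korean text features. The pooling choice (the first-token feature) and the later projection head mentioned in the comments are illustrative assumptions, not necessarily the repository's exact setup.

```python
# Sketch: extracting Korean text features with klue/roberta-large (Hugging Face).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
text_encoder = AutoModel.from_pretrained("klue/roberta-large")

captions = ["고양이가 소파 위에 앉아 있다.", "한 남자가 자전거를 타고 있다."]
batch = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = text_encoder(**batch)

# Use the first-token representation as the sentence-level text feature;
# a learned projection head would then map it into the joint embedding space.
text_features = outputs.last_hidden_state[:, 0]  # shape: (batch, 1024)
print(text_features.shape)
```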

### Pretrained Visual Models

- We use google/vit-base-patch16-224-in21k on Hugging Face and RN101 from torchvision as PVMs to obtain image representations (a combined sketch follows this list).
- The images themselves are not limited by the amount of Korean data, but because CLIP is trained on text-image pairs, Ko-CLIP is trained only on the limited set of images that have Korean captions.
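
The sketch below shows, under stated assumptions, how the two visual backbones can be loaded and combined with text features through a CLIP-style contrastive loss. The projection heads, embedding dimension, and temperature are illustrative choices, not the repository's confirmed hyperparameters.

```python
# Sketch: pretrained visual backbones plus a CLIP-style symmetric contrastive loss.
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.models import resnet101
from transformers import ViTModel

# Option 1: ViT backbone from Hugging Face.
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

# Option 2: ResNet-101 backbone from torchvision (RN101); drop the classifier head.
rn101 = resnet101(weights="DEFAULT")
rn101.fc = nn.Identity()

# Hypothetical projection heads mapping both modalities into a shared space.
embed_dim = 512
image_proj = nn.Linear(768, embed_dim)   # 768 = ViT-base hidden size
text_proj = nn.Linear(1024, embed_dim)   # 1024 = klue/roberta-large hidden size

def clip_loss(image_features, text_features, temperature=0.07):
    """Symmetric InfoNCE loss over matching image-text pairs in a batch."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Example forward pass with dummy data.
pixel_values = torch.randn(4, 3, 224, 224)
image_features = image_proj(vit(pixel_values=pixel_values).last_hidden_state[:, 0])
text_features = text_proj(torch.randn(4, 1024))  # stand-in for PLM text features
loss = clip_loss(image_features, text_features)
```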

See the WandB dashboard to check training records and to compare model performance across the pretrained visual models.

## Zero-shot classification

For zero-shot classification, we evaluate predictions on the CIFAR-10 and CIFAR-100 datasets.
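
A minimal sketch of CLIP-style zero-shot classification. The Korean prompt template and the `encode_text` / `encode_image` stand-ins below are hypothetical, not confirmed APIs of this repository; with a trained Ko-CLIP model they would be replaced by the actual encoders.

```python
# Sketch: zero-shot CIFAR-10 classification by matching images to Korean class prompts.
import torch
import torch.nn.functional as F

cifar10_classes_ko = ["비행기", "자동차", "새", "고양이", "사슴",
                      "개", "개구리", "말", "배", "트럭"]
prompts = [f"{name}의 사진" for name in cifar10_classes_ko]  # "a photo of a {class}"

# Stand-ins for the trained encoders; they return joint-space features.
def encode_text(texts):    # -> (num_classes, embed_dim)
    return torch.randn(len(texts), 512)

def encode_image(images):  # -> (batch, embed_dim)
    return torch.randn(images.size(0), 512)

images = torch.randn(8, 3, 224, 224)  # dummy CIFAR batch resized to 224x224
text_features = F.normalize(encode_text(prompts), dim=-1)
image_features = F.normalize(encode_image(images), dim=-1)

# Each image is assigned the class whose Korean prompt embedding is most similar.
predictions = (image_features @ text_features.t()).argmax(dim=-1)
```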

We refer to CLIP, clip-training for the training code, the koclip idea, and other pretrained models.