# Ko-CLIP

This repository contains code to train a Korean CLIP model on MS-COCO using the Korean annotations provided by AI-HUB. To obtain additional Korean annotations, we translate the English captions of the VizWiz dataset into Korean with the Naver Papago translator.

## Pretrained Models

The original CLIP was trained on a very large dataset, whereas ours is much smaller. Because Korean caption data is scarce, we use pretrained language and visual models to obtain strong representations from this limited data.

### Pretrained Language Model

- We fix the PLM to klue/roberta-large on Hugging Face to obtain stronger text representations for Korean (see the sketch below).
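
A minimal sketch of how the fixed PLM can be used to extract Korean text features. The pooling choice (the first-token feature) and the later projection head mentioned in the comments are illustrative assumptions, not necessarily the repository's exact setup.

```python
# Sketch: extracting Korean text features with klue/roberta-large (Hugging Face).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
text_encoder = AutoModel.from_pretrained("klue/roberta-large")

captions = ["고양이가 소파 위에 앉아 있다.", "한 남자가 자전거를 타고 있다."]
batch = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = text_encoder(**batch)

# Use the first-token representation as the sentence-level text feature;
# a learned projection head would then map it into the joint embedding space.
text_features = outputs.last_hidden_state[:, 0]  # shape: (batch, 1024)
print(text_features.shape)
```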

### Pretrained Visual Models

- We use google/vit-base-patch16-224-in21k on Hugging Face and RN101 from torchvision as PVMs to obtain image representations (a combined sketch follows this list).
- The images themselves are not limited by the amount of Korean data, but because CLIP is trained on text-image pairs, Ko-CLIP is trained only on the limited set of images that have Korean captions.
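
The sketch below shows, under stated assumptions, how the two visual backbones can be loaded and combined with text features through a CLIP-style contrastive loss. The projection heads, embedding dimension, and temperature are illustrative choices, not the repository's confirmed hyperparameters.

```python
# Sketch: pretrained visual backbones plus a CLIP-style symmetric contrastive loss.
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.models import resnet101
from transformers import ViTModel

# Option 1: ViT backbone from Hugging Face.
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

# Option 2: ResNet-101 backbone from torchvision (RN101); drop the classifier head.
rn101 = resnet101(weights="DEFAULT")
rn101.fc = nn.Identity()

# Hypothetical projection heads mapping both modalities into a shared space.
embed_dim = 512
image_proj = nn.Linear(768, embed_dim)   # 768 = ViT-base hidden size
text_proj = nn.Linear(1024, embed_dim)   # 1024 = klue/roberta-large hidden size

def clip_loss(image_features, text_features, temperature=0.07):
    """Symmetric InfoNCE loss over matching image-text pairs in a batch."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Example forward pass with dummy data.
pixel_values = torch.randn(4, 3, 224, 224)
image_features = image_proj(vit(pixel_values=pixel_values).last_hidden_state[:, 0])
text_features = text_proj(torch.randn(4, 1024))  # stand-in for PLM text features
loss = clip_loss(image_features, text_features)
```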

See the WandB dashboard to check training records and to compare model performance across the pretrained visual models.

## Zero-shot classification

For zero-shot classification, we evaluate predictions on the CIFAR-10 and CIFAR-100 datasets.
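
A minimal sketch of CLIP-style zero-shot classification. The Korean prompt template and the `encode_text` / `encode_image` stand-ins below are hypothetical, not confirmed APIs of this repository; with a trained Ko-CLIP model they would be replaced by the actual encoders.

```python
# Sketch: zero-shot CIFAR-10 classification by matching images to Korean class prompts.
import torch
import torch.nn.functional as F

cifar10_classes_ko = ["비행기", "자동차", "새", "고양이", "사슴",
                      "개", "개구리", "말", "배", "트럭"]
prompts = [f"{name}의 사진" for name in cifar10_classes_ko]  # "a photo of a {class}"

# Stand-ins for the trained encoders; they return joint-space features.
def encode_text(texts):    # -> (num_classes, embed_dim)
    return torch.randn(len(texts), 512)

def encode_image(images):  # -> (batch, embed_dim)
    return torch.randn(images.size(0), 512)

images = torch.randn(8, 3, 224, 224)  # dummy CIFAR batch resized to 224x224
text_features = F.normalize(encode_text(prompts), dim=-1)
image_features = F.normalize(encode_image(images), dim=-1)

# Each image is assigned the class whose Korean prompt embedding is most similar.
predictions = (image_features @ text_features.t()).argmax(dim=-1)
```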

We refer to CLIP, clip-training for the training code, the koclip idea, and other pretrained models.