DPLM: A Diachronic Language Model for Long-Time Span Classical Chinese

This repository provides a reference implementation of the following paper:

A Diachronic Language Model for Long-Time Span Classical Chinese
Yuting Wei, Meiling Li, Yangfu Zhu, Yuanxing Xu, Yuqing Li, Bin Wu
Information Processing and Management

Prerequisites

To set up the environment, follow these steps:

Create an Anaconda environment with Python 3.8:

conda create -n temporal python=3.8
conda activate temporal

Install the required PyTorch version and dependencies:

conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

Install the remaining Python packages:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
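After installation, it can help to confirm that the pinned versions from the conda command above are the ones actually present in the active environment. The following is a minimal sketch using only the standard library. The function name check_versions is hypothetical and not part of this repository, and it assumes the packages register under the distribution names torch, torchvision, and torchaudio.

```python
from importlib import metadata

# Version pins copied from the conda install command in this README.
EXPECTED = {"torch": "1.10.1", "torchvision": "0.11.2", "torchaudio": "0.10.1"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Some builds carry a local suffix such as "+cu113"; compare the public part only.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, ok)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

A package reported with installed=None was not found in the current environment, which usually means the temporal environment is not activated.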

Download our training dataset:

Please download the dataset from Google Drive here and place it in the train_datasets folder.
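Before starting a training run, a quick check that the download landed in the expected place can save a failed job. The sketch below only assumes the train_datasets folder name given above. The helper list_dataset_files is hypothetical, and it makes no assumption about the file names or formats inside the folder.

```python
from pathlib import Path


def list_dataset_files(folder="train_datasets"):
    """Return all files under `folder`, or raise if the folder is missing or empty."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"Dataset folder is empty: {root}")
    return files
```

Calling list_dataset_files() from the repository root returns the downloaded files, and raises FileNotFoundError if the folder does not exist or contains no files.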

Training

To train the model, simply run the following command:

bash run.sh

Evaluation

To evaluate the model, use the following command, replacing path_to_trained_dplm with the path to your trained model:

python evaluate.py --model_name path_to_trained_dplm
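When several checkpoints need to be compared, the same command can be issued once per model. The following is a minimal sketch under the assumption that evaluate.py is run from the repository root and takes only the --model_name argument shown above. The helper names build_eval_command and evaluate_all are hypothetical, and the checkpoint paths are whatever you pass in.

```python
import subprocess
import sys


def build_eval_command(model_path, script="evaluate.py"):
    """Build the evaluation command for one trained model."""
    return [sys.executable, script, "--model_name", str(model_path)]


def evaluate_all(model_paths):
    """Run the evaluation script once per model path, stopping on the first failure."""
    for path in model_paths:
        subprocess.run(build_eval_command(path), check=True)
```

For example, evaluate_all(["path_to_trained_dplm"]) runs the single command shown above with the current Python interpreter.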

Citation

Please cite our paper if you use our code or dataset, or compare against our model:

@article{WEI2025103925,
title = {A diachronic language model for long-time span classical Chinese},
journal = {Information Processing \& Management},
volume = {62},
number = {1},
pages = {103925},
year = {2025},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2024.103925},
url = {https://www.sciencedirect.com/science/article/pii/S030645732400284X},
author = {Yuting Wei and Meiling Li and Yangfu Zhu and Yuanxing Xu and Yuqing Li and Bin Wu}
}

Acknowledgements

The code structure is inspired by TempoBERT. The evaluation datasets are adapted from the Ancient Chinese Corpus with Word Sense Annotation. We sincerely thank the authors of these projects for their contributions to the community.
