From 558322079202cd76ddb593c9a858f467de69e986 Mon Sep 17 00:00:00 2001
From: FloChiff
Date: Thu, 21 Mar 2024 11:27:41 +0100
Subject: [PATCH] Fixed some content in the datasets

---
 dataset/ehri/dataset.md | 20 ++++++++++----------
 dataset/pec/dataset.md  |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/dataset/ehri/dataset.md b/dataset/ehri/dataset.md
index 2c46b89..722537e 100644
--- a/dataset/ehri/dataset.md
+++ b/dataset/ehri/dataset.md
@@ -36,16 +36,16 @@ The first case happened with German, Hungarian and Polish, which had full transc
 
 Here is how the dataset is constituted:
 
-| Language | Collection | Documents | Lines | ATR model accuracy |
-| :---: | :---: | :---: | :---: | :---: |
-| German | BF; Nisko; EHT | 56 | 2287 | 97.9% |
-| English | BF; EHT; DR | 54 | 1989 | 97.5% |
-| Czech | BF; EHT | 46 | 1713 | 96.7% |
-| Danish | DR | 36 | 1007 | 97.8% |
-| Hungarian | EHT | 30 | 1334 | 95.7% |
-| Polish | EHT | 15 | 468 | 93.1% |
-| Slovak | BF | 15 | 395 | 93.7% |
-| Multilingual | BF; Nisko; DR; EHT | 252 | 9193 | 97.2% |
+| Language | Collection | Documents | Lines |
+| :---: | :---: | :---: | :---: |
+| German | BF; Nisko; EHT | 56 | 2287 |
+| English | BF; EHT; DR | 54 | 1989 |
+| Czech | BF; EHT | 46 | 1713 |
+| Danish | DR | 36 | 1007 |
+| Hungarian | EHT | 30 | 1334 |
+| Polish | EHT | 15 | 468 |
+| Slovak | BF | 15 | 395 |
+| Multilingual | BF; Nisko; DR; EHT | 252 | 9193 |
 
 In this table, I mentioned each language of the dataset, the collections from which the documents came from, the number of documents and lines, as well as the accuracy that I obtained for the models I developed. As can be observed, there are two truly dominant languages (German and English). It is mostly due to the fact that they are found in the majority of collections. Then, there are three more or less important languages (Danish, Hungarian and Czech) and, finally, two rather limited languages in terms of quantity (Polish and Slovak).
 
diff --git a/dataset/pec/dataset.md b/dataset/pec/dataset.md
index ac4a207..bdcfd0f 100644
--- a/dataset/pec/dataset.md
+++ b/dataset/pec/dataset.md
@@ -1,6 +1,6 @@
 ---
 layout: page
-title: "Dataset"
+title: "Dataset (PEC)"
 date: 2023-06-21
 ---
 