Commit

feat: change the domain proportion yaml file to integrate the language
lucashervier committed Jun 28, 2024
1 parent f0ee085 commit cca8762
Showing 1 changed file with 33 additions and 16 deletions.
49 changes: 33 additions & 16 deletions training/domain_proportions.yml
@@ -1,16 +1,33 @@
-newspaper: 1
-book: 2.5
-technical: 2
-wiki: 3
-legal: 2
-misc: 0.1
-dialogue: 2
-parlementary: 0.05
-forum: 1
-math: 3
-code: 1
-aligned: 3
-legi_dialogue: 2
-legi_spoken: 2
-legi_written: 2
-programming: 1
+fr--newspaper: 13.27
+fr--book: 5.9
+fr--technical: 9.47
+fr--wiki: 1.31
+fr--legi_written: 0.36
+fr--dialogue: 0.01
+fr--legi_spoken: 0.09
+fr--legi_dialogue: 0.08
+en--technical: 16.22
+en--newspaper: 1.62
+en--legi_written: 1.88
+en--wiki: 2.69
+en--forum: 1.41
+en--book: 1.93
+en--math: 4.95
+en--dialogue: 0.26
+en--legi_dialogue: 0.02
+de--wiki: 1.0
+de--legi_written: 0.41
+de--book: 0.06
+de--legi_dialogue: 0.02
+es--wiki: 0.8
+es--legi_written: 0.3
+es--legi_dialogue: 0.01
+es--book: 0.04
+it--wiki: 0.77
+it--legi_written: 0.3
+it--book: 0.04
+es-en--aligned: 0.05
+it-en--aligned: 0.05
+de-fr--aligned: 0.05
+fr-en--aligned: 8.67
+code--programming: 25.96
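
The new keys follow a `<language>--<domain>` pattern, and the 33 new values add up to 100, so they read as percentages of the training mix. The commit does not show how the training code consumes this file, so the following is only a minimal sketch of one way to split the keys and aggregate the weights per language prefix. The function names `parse_proportions` and `totals_by_prefix` are hypothetical and not taken from the repository, and the flat `key: value` lines are parsed by hand to avoid assuming a particular YAML library.

```python
from collections import defaultdict

# Contents of training/domain_proportions.yml after this commit.
NEW_PROPORTIONS = """
fr--newspaper: 13.27
fr--book: 5.9
fr--technical: 9.47
fr--wiki: 1.31
fr--legi_written: 0.36
fr--dialogue: 0.01
fr--legi_spoken: 0.09
fr--legi_dialogue: 0.08
en--technical: 16.22
en--newspaper: 1.62
en--legi_written: 1.88
en--wiki: 2.69
en--forum: 1.41
en--book: 1.93
en--math: 4.95
en--dialogue: 0.26
en--legi_dialogue: 0.02
de--wiki: 1.0
de--legi_written: 0.41
de--book: 0.06
de--legi_dialogue: 0.02
es--wiki: 0.8
es--legi_written: 0.3
es--legi_dialogue: 0.01
es--book: 0.04
it--wiki: 0.77
it--legi_written: 0.3
it--book: 0.04
es-en--aligned: 0.05
it-en--aligned: 0.05
de-fr--aligned: 0.05
fr-en--aligned: 8.67
code--programming: 25.96
"""


def parse_proportions(text):
    """Parse flat `key: value` lines into a dict mapping key -> float weight."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        weights[key.strip()] = float(value)
    return weights


def totals_by_prefix(weights):
    """Sum weights per prefix, i.e. the part of each key before `--`."""
    totals = defaultdict(float)
    for key, weight in weights.items():
        prefix, sep, _domain = key.partition("--")
        if not sep:
            raise ValueError(f"key without '--' separator: {key!r}")
        totals[prefix] += weight
    return dict(totals)


weights = parse_proportions(NEW_PROPORTIONS)
per_prefix = totals_by_prefix(weights)
for prefix, total in sorted(per_prefix.items()):
    print(f"{prefix}: {total:.2f}")
print(f"total: {sum(weights.values()):.2f}")
```

Note that the prefix is not always a single language: the `aligned` entries use a language pair (for example `fr-en`), and `code--programming` uses `code` in the language slot, so any consumer that splits on `--` has to treat those prefixes as opaque labels.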
