diff --git a/CHANGELOG.md b/CHANGELOG.md
index 02d125638..4fa1bfd27 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,6 +20,7 @@
 ## [3.6.0](https://github.com/BLKSerene/Wordless/releases/tag/3.6.0) - ??/??/2024
 ### 🎉 New Features
+- Measures: Add effect size - squared association ratio
 - Utils: Add Stanza's Sindhi dependency parser
 ### 📌 Bugfixes
diff --git a/doc/doc.md b/doc/doc.md
index 3c8666b83..d98138534 100644
--- a/doc/doc.md
+++ b/doc/doc.md
@@ -946,11 +946,13 @@ The following variables would be used in formulas:
**NumCharsAlpha**: Number of alphabetic characters (letters, CJK characters, etc.) Readability Formula|Formula|Supported Languages -------------------|-------|:-----------------: -Al-Heeti's Readability Prediction Formula¹
([Al-Heeti, 1984, pp. 102, 104, 106](#ref-al-heeti-1984))|![Formula](/doc/measures/readability/rd.svg)|**Arabic** +Al-Heeti's readability formula¹
([Al-Heeti, 1984, pp. 102, 104, 106](#ref-al-heeti-1984))|![Formula](/doc/measures/readability/rd.svg)|**Arabic** Automated Arabic Readability Index
([Al-Tamimi et al., 2013](#ref-al-tamimi-et-al-2013))|![Formula](/doc/measures/readability/aari.svg)|**Arabic** Automated Readability Index¹
([Smith & Senter, 1967, p. 8](#ref-smith-senter-1967)
Navy: [Kincaid et al., 1975, p. 14](#ref-kincaid-et-al-1975))|![Formula](/doc/measures/readability/ari.svg)|All languages -Bormuth's Cloze Mean & Grade Placement
([Bormuth, 1969, pp. 152, 160](#ref-bormuth-1969))|![Formula](/doc/measures/readability/bormuths_cloze_mean_gp.svg)
where **C** is the cloze criterion score, whose value could be changed via **Menu Bar → Preferences → Settings → Measures → Readability → Bormuth's Grade Placement → Cloze criterion score**|**English** -Coleman-Liau Index
([Coleman & Liau, 1975](#ref-coleman-liau-1975))|![Formula](/doc/measures/readability/coleman_liau_index.svg)|All languages -Coleman's Readability Formula¹
([Liau et al., 1976](#ref-liau-et-al-1976))|![Formula](/doc/measures/readability/colemans_readability_formula.svg)|All languages²³ -Dale-Chall Readability Formula¹
([Dale & Chall, 1948a](#ref-dale-chall-1948a); [Dale & Chall, 1948b](#ref-dale-chall-1948b)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958)
New: [Chall & Dale, 1995](#ref-chall-dale-1995))|![Formula](/doc/measures/readability/x_c50.svg)|**English** -Danielson-Bryan's Readability Formula¹
([Danielson & Bryan, 1963](#ref-danielson-bryan-1963))|![Formula](/doc/measures/readability/danielson_bryans_readability_formula.svg)|All languages -Dawood's Readability Formula
([Dawood, 1977](#ref-dawood-1977))|![Formula](/doc/measures/readability/dawoods_readability_formula.svg)|**Arabic** +Bormuth's cloze mean & grade placement
([Bormuth, 1969, pp. 152, 160](#ref-bormuth-1969))|![Formula](/doc/measures/readability/bormuths_cloze_mean_gp.svg)
where **C** is the cloze criterion score, whose value could be changed via **Menu Bar → Preferences → Settings → Measures → Readability → Bormuth's Grade Placement → Cloze criterion score**|**English** +Coleman-Liau index
([Coleman & Liau, 1975](#ref-coleman-liau-1975))|![Formula](/doc/measures/readability/coleman_liau_index.svg)|All languages +Coleman's readability formula¹
([Liau et al., 1976](#ref-liau-et-al-1976))|![Formula](/doc/measures/readability/colemans_readability_formula.svg)|All languages²³ +Crawford's readability formula
([Crawford, 1985](#ref-crawford-1985))|![Formula](/doc/measures/readability/crawfords_readability_formula.svg)|**Spanish**² +Dale-Chall readability formula¹
([Dale & Chall, 1948a](#ref-dale-chall-1948a); [Dale & Chall, 1948b](#ref-dale-chall-1948b)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958)
New: [Chall & Dale, 1995](#ref-chall-dale-1995))|![Formula](/doc/measures/readability/x_c50.svg)|**English** +Danielson-Bryan's readability formula¹
([Danielson & Bryan, 1963](#ref-danielson-bryan-1963))|![Formula](/doc/measures/readability/danielson_bryans_readability_formula.svg)|All languages +Dawood's readability formula
([Dawood, 1977](#ref-dawood-1977))|![Formula](/doc/measures/readability/dawoods_readability_formula.svg)|**Arabic** Degrees of Reading Power
([College Entrance Examination Board, 1981](#ref-college-entrance-examination-board-1981))|![Formula](/doc/measures/readability/drp.svg)
where **M** is *Bormuth's cloze mean*.|**English** Devereux Readability Index
([Smith, 1961](#ref-smith-1961))|![Formula](/doc/measures/readability/devereux_readability_index.svg)|All languages Dickes-Steiwer Handformel
([Dickes & Steiwer, 1977](#ref-dickes-steiwer-1977))|![Formula](/doc/measures/readability/dickes_steiwer_handformel.svg)|All languages Easy Listening Formula
([Fang, 1966](#ref-fang-1966))|![Formula](/doc/measures/readability/elf.svg)|All languages² -Flesch-Kincaid Grade Level
([Kincaid et al., 1975, p. 14](#ref-kincaid-et-al-1975))|![Formula](/doc/measures/readability/gl.svg)|All languages² -Flesch Reading Ease¹
([Flesch, 1948](#ref-flesch-1948)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958)
Dutch: [Douma, 1960, p. 453](#ref-douma-1960); [Brouwer, 1963](#ref-brouwer-1963)
French: [Kandel & Moles, 1958](#ref-kandel-moles-1958)
German: [Amstad, 1978](#ref-amstad-1978)
Italian: [Franchina & Vacca, 1986](#ref-franchina-vacca-1986)
Russian: [Oborneva, 2006, p. 13](#ref-oborneva-2006)
Spanish: [Fernández Huerta, 1959](#ref-fernandez-huerta-1959); [Szigriszt Pazos, 1993, p. 247](#ref-szigrisze-pazos-1993)
Ukrainian: [Partiko, 2001](#ref-partiko-2001))|![Formula](/doc/measures/readability/re.svg)|All languages² -Flesch Reading Ease (Farr-Jenkins-Paterson)¹
([Farr et al., 1951](#ref-farr-et-al-1951)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958))|![Formula](/doc/measures/readability/re_farr_jenkins_paterson.svg)|All languages² -FORCAST Grade Level
([Caylor & Sticht, 1973, p. 3](#ref-caylor-sticht-1973))|![Formula](/doc/measures/readability/rgl.svg)

* **One sample of 150 words** would be taken randomly from the text, so the text should be **at least 150 words long**.|All languages² -Fórmula de comprensibilidad de Gutiérrez de Polini
([Gutiérrez de Polini, 1972](#ref-gutierrez-de-polini-1972))|![Formula](/doc/measures/readability/cp.svg)|**Spanish** -Fórmula de Crawford
([Crawford, 1985](#ref-crawford-1985))|![Formula](/doc/measures/readability/formula_de_crawford.svg)|**Spanish**² +Flesch-Kincaid grade level
([Kincaid et al., 1975, p. 14](#ref-kincaid-et-al-1975))|![Formula](/doc/measures/readability/gl.svg)|All languages² +Flesch reading ease¹
([Flesch, 1948](#ref-flesch-1948)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958)
Dutch: [Douma, 1960, p. 453](#ref-douma-1960); [Brouwer, 1963](#ref-brouwer-1963)
French: [Kandel & Moles, 1958](#ref-kandel-moles-1958)
German: [Amstad, 1978](#ref-amstad-1978)
Italian: [Franchina & Vacca, 1986](#ref-franchina-vacca-1986)
Russian: [Oborneva, 2006, p. 13](#ref-oborneva-2006)
Spanish: [Fernández Huerta, 1959](#ref-fernandez-huerta-1959); [Szigriszt Pazos, 1993, p. 247](#ref-szigrisze-pazos-1993)
Ukrainian: [Partiko, 2001](#ref-partiko-2001))|![Formula](/doc/measures/readability/re.svg)|All languages² +Flesch reading ease (Farr-Jenkins-Paterson)¹
([Farr et al., 1951](#ref-farr-et-al-1951)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958))|![Formula](/doc/measures/readability/re_farr_jenkins_paterson.svg)|All languages² +FORCAST
([Caylor & Sticht, 1973, p. 3](#ref-caylor-sticht-1973))|![Formula](/doc/measures/readability/rgl.svg)

* **One sample of 150 words** would be taken randomly from the text, so the text should be **at least 150 words long**.|All languages² Fucks's Stilcharakteristik
([Fucks, 1955](#ref-fucks-1955))|![Formula](/doc/measures/readability/fuckss_stilcharakteristik.svg)|All languages² -Gulpease Index
([Lucisano & Emanuela Piemontese, 1988](#ref-lucisano-emanuela-piemontese-1988))|![Formula](/doc/measures/readability/gulpease_index.svg)|**Italian** +GULPEASE
([Lucisano & Emanuela Piemontese, 1988](#ref-lucisano-emanuela-piemontese-1988))|![Formula](/doc/measures/readability/gulpease.svg)|**Italian** Gunning Fog Index¹
(English: [Gunning, 1968, p. 38](#ref-gunning-1968)
Powers-Sumner-Kearl: [Powers et al., 1958](#ref-powers-et-al-1958)
Navy: [Kincaid et al., 1975, p. 14](#ref-kincaid-et-al-1975)
Polish: [Pisarek, 1969](#ref-pisarek-1969))|![Formula](/doc/measures/readability/fog_index.svg)
where **NumHardWords** is the number of words with 3 or more syllables, except proper nouns and words with 3 syllables ending with *-ed* or *-es*, for **English texts**, and the number of words with 4 or more syllables in their base forms, except proper nouns, for **Polish texts**.|**English & Polish**² +Gutiérrez de Polini's readability formula
([Gutiérrez de Polini, 1972](#ref-gutierrez-de-polini-1972))|![Formula](/doc/measures/readability/cp.svg)|**Spanish** Legibilidad µ
([Muñoz Baquedano, 2006](#ref-munoz-baquedano-2006))|![Formula](/doc/measures/readability/mu.svg)
where **LenWordsAvg** is the average word length in letters, and **LenWordsVar** is the variance of word lengths in letters.|**Spanish** -Lensear Write
([O’Hayre, 1966, p. 8](#ref-o-hayre-1966))|![Formula](/doc/measures/readability/lensear_write.svg)
where **NumWords1Syl** is the number of monosyllabic words excluding *the*, *is*, *are*, *was*, *were*.

* **One sample of 100 words** would be taken randomly from the text, and if the text is **shorter than 100 words**, **NumWords1Syl** and **NumSentences** would be multiplied by 100 and then divided by **NumWords**.|**English**² +Lensear Write Formula
([O’Hayre, 1966, p. 8](#ref-o-hayre-1966))|![Formula](/doc/measures/readability/lensear_write_formula.svg)
where **NumWords1Syl** is the number of monosyllabic words excluding *the*, *is*, *are*, *was*, *were*.

* **One sample of 100 words** would be taken randomly from the text, and if the text is **shorter than 100 words**, **NumWords1Syl** and **NumSentences** would be multiplied by 100 and then divided by **NumWords**.|**English**² Lix
([Björnsson, 1968](#ref-bjornsson-1968))|![Formula](/doc/measures/readability/lix.svg)|All languages Lorge Readability Index¹
([Lorge, 1944](#ref-lorge-1944)
Corrected: [Lorge, 1948](#ref-lorge-1948))|![Formula](/doc/measures/readability/lorge_readability_index.svg)|**English**³ -Luong-Nguyen-Dinh's Readability Formula
([Luong et al., 2018](#ref-luong-et-al-2018))|![Formula](/doc/measures/readability/luong_nguyen_dinh_readability_formula.svg)

* The number of syllables is estimated by tokenizing the text by whitespace and counting the number of tokens excluding punctuation marks|**Vietnamese** -McAlpine EFLAW Readability Score
([Nirmaldasan, 2009](#ref-nirmaldasan-2009))|![Formula](/doc/measures/readability/eflaw.svg)|**English** +Luong-Nguyen-Dinh's readability formula
([Luong et al., 2018](#ref-luong-et-al-2018))|![Formula](/doc/measures/readability/luong_nguyen_dinhs_readability_formula.svg)

* The number of syllables is estimated by tokenizing the text by whitespace and counting the number of tokens excluding punctuation marks|**Vietnamese** +McAlpine EFLAW Readability Score
([McAlpine, 2006](#ref-mcalpine-2006))|![Formula](/doc/measures/readability/eflaw.svg)|**English** neue Wiener Literaturformeln¹
([Bamberger & Vanecek, 1984, p. 82](#ref-bamberger-vanecek-1984))|![Formula](/doc/measures/readability/nwl.svg)|**German**² neue Wiener Sachtextformel¹
([Bamberger & Vanecek, 1984, pp. 83–84](#ref-bamberger-vanecek-1984))|![Formula](/doc/measures/readability/nws.svg)|**German**² OSMAN
([El-Haj & Rayson, 2016](#ref-elhaj-rayson-2016))|![Formula](/doc/measures/readability/osman.svg)
where **NumFaseehWords** is the number of words which have 5 or more syllables and contain ء/ئ/ؤ/ذ/ظ or end with وا/ون.

* The number of syllables in each word is estimated by adding up the number of short syllables and twice the number of long and stress syllables in each word.|**Arabic** Rix
([Anderson, 1983](#ref-anderson-1983))|![Formula](/doc/measures/readability/rix.svg)|All languages -SMOG Grade
([McLaughlin, 1969](#ref-mclaughlin-1969)
German: [Bamberger & Vanecek, 1984, p.78](#ref-bamberger-vanecek-1984))|![Formula](/doc/measures/readability/smog_grade.svg)

* A sample would be constructed using **the first 10 sentences, the last 10 sentences, and the 10 sentences at the middle of the text**, so the text should be **at least 30 sentences long**.|All languages² -Spache Grade Level¹
([Spache, 1953](#ref-spache-1953)
Revised: [Spache, 1974](#ref-spache-1974))|![Formula](/doc/measures/readability/spache_grade_level.svg)

* **Three samples each of 100 words** would be taken randomly from the text and the results would be averaged out, so the text should be **at least 100 words long**.|All languages -Strain Index
([Solomon, 2006](#ref-solomon-2006))|![Formula](/doc/measures/readability/strain_index.svg)

* A sample would be constructed using **the first 3 sentences in the text**, so the text should be **at least 3 sentences long**.|All languages² -Tränkle & Bailer's Readability Formula¹
([Tränkle & Bailer, 1984](#ref-trankle-bailer-1984))|![Formula](/doc/measures/readability/trankle_bailers_readability_formula.svg)

* **One sample of 100 words** would be taken randomly from the text, so the text should be **at least 100 words long**.|All languages³ -Tuldava's Text Difficulty
([Tuldava, 1975](#ref-tuldava-1975))|![Formula](/doc/measures/readability/td.svg)|All languages² -Wheeler & Smith's Readability Formula
([Wheeler & Smith, 1954](#ref-wheeler-smith-1954))|![Formula](/doc/measures/readability/wheeler_smiths_readability_formula.svg)
where **NumUnits** is the number of sentence segments ending in periods, question marks, exclamation marks, colons, semicolons, and dashes.|All languages² +SMOG Grading
([McLaughlin, 1969](#ref-mclaughlin-1969)
German: [Bamberger & Vanecek, 1984, p.78](#ref-bamberger-vanecek-1984))|![Formula](/doc/measures/readability/smog_grading.svg)

* A sample would be constructed using **the first 10 sentences, the last 10 sentences, and the 10 sentences at the middle of the text**, so the text should be **at least 30 sentences long**.|All languages² +Spache readability formula¹
([Spache, 1953](#ref-spache-1953)
Revised: [Spache, 1974](#ref-spache-1974))|![Formula](/doc/measures/readability/spache_readability_formula.svg)

* **Three samples each of 100 words** would be taken randomly from the text and the results would be averaged out, so the text should be **at least 100 words long**.|English +Strain Index
([Nathaniel, 2017](#ref-nathaniel-2017))|![Formula](/doc/measures/readability/strain_index.svg)

* A sample would be constructed using **the first 3 sentences in the text**, so the text should be **at least 3 sentences long**.|All languages² +Tränkle-Bailer's readability formula¹
([Tränkle & Bailer, 1984](#ref-trankle-bailer-1984))|![Formula](/doc/measures/readability/trankle_bailers_readability_formula.svg)

* **One sample of 100 words** would be taken randomly from the text, so the text should be **at least 100 words long**.|All languages³ +Tuldava's readability formula
([Tuldava, 1975](#ref-tuldava-1975))|![Formula](/doc/measures/readability/td.svg)|All languages² +Wheeler-Smith's readability formula
([Wheeler & Smith, 1954](#ref-wheeler-smith-1954))|![Formula](/doc/measures/readability/wheeler_smiths_readability_formula.svg)
where **NumUnits** is the number of sentence segments ending in periods, question marks, exclamation marks, colons, semicolons, and dashes.|All languages²
 > [!NOTE]
 > 1. Variants available and can be selected via **Menu Bar → Preferences → Settings → Measures → Readability**
@@ -1191,7 +1193,9 @@ The following variables would be used in formulas:
**NumTokens**: Number of tokens
Measure of Effect Size|Formula ----------------------|------- -%DIFF
([Gabrielatos & Marchi, 2012](#ref-gabrielatos-marchi-2012))|![Formula](/doc/measures/effect_size/pct_diff.svg) -Cubic Association Ratio
([Daille, 1994](#ref-daille-1994), [1995](#ref-daille-1995))|![Formula](/doc/measures/effect_size/im3.svg) -Dice's Coefficient
([Smadja et al., 1996](#ref-smadja-et-al-1996))|![Formula](/doc/measures/effect_size/dices_coeff.svg) -Difference Coefficient
([Hofland & Johanson, 1982](#ref-hofland-johanson-1982); [Gabrielatos, 2018](#ref-gabrielatos-2018))|![Formula](/doc/measures/effect_size/diff_coeff.svg) -Jaccard Index
([Dunning, 1998](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/jaccard_index.svg) -Kilgarriff's Ratio
([Kilgarriff, 2009](#ref-kilgarriff-2009))|![Formula](/doc/measures/effect_size/kilgarriffs_ratio.svg)
where **α** is the smoothing parameter, whose value could be changed via **Menu Bar → Preferences → Settings → Measures → Effect Size → Kilgarriff's Ratio → Smoothing Parameter**. -Log Ratio
([Hardie, 2014](#ref-hardie-2014))|![Formula](/doc/measures/effect_size/log_ratio.svg) -Log-Frequency Biased MD
([Thanopoulos et al., 2002](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/lfmd.svg) +%DIFF
([Gabrielatos & Marchi, 2011](#ref-gabrielatos-marchi-2011))|![Formula](/doc/measures/effect_size/pct_diff.svg) +Cubic association ratio
([Daille, 1994](#ref-daille-1994))|![Formula](/doc/measures/effect_size/im3.svg) +Dice-Sørensen coefficient
([Smadja et al., 1996](#ref-smadja-et-al-1996))|![Formula](/doc/measures/effect_size/dice_sorensen_coeff.svg) +Difference coefficient
([Hofland & Johansson, 1982](#ref-hofland-johansson-1982); [Gabrielatos, 2018](#ref-gabrielatos-2018))|![Formula](/doc/measures/effect_size/diff_coeff.svg) +Jaccard index
([Dunning, 1998](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/jaccard_index.svg) +Kilgarriff's ratio
([Kilgarriff, 2009](#ref-kilgarriff-2009))|![Formula](/doc/measures/effect_size/kilgarriffs_ratio.svg)
where **α** is the smoothing parameter, whose value could be changed via **Menu Bar → Preferences → Settings → Measures → Effect Size → Kilgarriff's Ratio → Smoothing Parameter**. logDice
([Rychlý, 2008](#ref-rychly-2008))|![Formula](/doc/measures/effect_size/log_dice.svg) -MI.log-f
([Lexical Computing Ltd., 2015](#ref-lexical-computing-ltd-2015); [Kilgarriff & Tugwell, 2002](#ref-kilgarriff-tugwell-2002))|![Formula](/doc/measures/effect_size/mi_log_f.svg) -Minimum Sensitivity
([Pedersen, 1998](#ref-pedersen-1998))|![Formula](/doc/measures/effect_size/min_sensitivity.svg) +Log-frequency biased MD
([Thanopoulos et al., 2002](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/lfmd.svg) +Log Ratio
([Hardie, 2014](#ref-hardie-2014))|![Formula](/doc/measures/effect_size/log_ratio.svg) +MI.log-f
([Kilgarriff & Tugwell, 2002](#ref-kilgarriff-tugwell-2002); [Lexical Computing Ltd., 2015](#ref-lexical-computing-ltd-2015))|![Formula](/doc/measures/effect_size/mi_log_f.svg) +Minimum sensitivity
([Pedersen, 1998](#ref-pedersen-1998))|![Formula](/doc/measures/effect_size/min_sensitivity.svg) Mutual Dependency
([Thanopoulos et al., 2002](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/md.svg) Mutual Expectation
([Dias et al., 1999](#ref-dias-et-al-1999))|![Formula](/doc/measures/effect_size/me.svg) -Mutual Information
([Dunning, 1998](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/mi.svg) -Odds Ratio
([Pojanapunya & Todd, 2016](#ref-pojanapunya-todd-2016))|![Formula](/doc/measures/effect_size/odds_ratio.svg) -Pointwise Mutual Information
([Church & Hanks, 1990](#ref-church-hanks-1990))|![Formula](/doc/measures/effect_size/pmi.svg) -Poisson Collocation Measure
([Quasthoff & Wolff, 2002](#ref-quasthoff-wolff-2002))|![Formula](/doc/measures/effect_size/poisson_collocation_measure.svg) -Squared Phi Coefficient
([Church & Gale, 1991](#ref-church-gale-1991))|![Formula](/doc/measures/effect_size/squared_phi_coeff.svg) +Mutual information
([Dunning, 1998](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/mi.svg) +Odds ratio
([Pojanapunya & Todd, 2016](#ref-pojanapunya-todd-2016))|![Formula](/doc/measures/effect_size/odds_ratio.svg) +Pointwise mutual information
([Church & Hanks, 1990](#ref-church-hanks-1990))|![Formula](/doc/measures/effect_size/pmi.svg) +Poisson collocation measure
([Quasthoff & Wolff, 2002](#ref-quasthoff-wolff-2002))|![Formula](/doc/measures/effect_size/poisson_collocation_measure.svg) +Squared association ratio
([Daille, 1995](#ref-daille-1995))|![Formula](/doc/measures/effect_size/im2.svg) +Squared phi coefficient
([Church & Gale, 1991](#ref-church-gale-1991))|![Formula](/doc/measures/effect_size/squared_phi_coeff.svg) ## [13 References](#doc) 1. [**^**](#ref-rd) Al-Heeti, K. N. (1984). *Judgment analysis technique applied to readability prediction of Arabic reading material* [Doctoral dissertation, University of Northern Colorado]. ProQuest Dissertations and Theses Global. -1. [**^**](#ref-aari) Al-Tamimi, A., Jaradat M., Aljarrah, N., & Ghanim, S. (2013). AARI: Automatic Arabic readability index. *The International Arab Journal of Information Technology*, *11*(4), 370–378. +1. [**^**](#ref-aari) Al-Tamimi, A., Jaradat M., Aljarrah, N., & Ghanim, S. (2013). AARI: Automatic Arabic Readability Index. *The International Arab Journal of Information Technology*, *11*(4), 370–378. 1. [**^**](#ref-re) Amstad, T. (1978). *Wie verständlich sind unsere Zeitungen?* [Unpublished doctoral dissertation]. University of Zurich. 1. [**^**](#ref-rix) Anderson, J. (1983). Lix and Rix: Variations on a little-known readability index. *Journal of Reading*, *26*(6), 490–496. -1. [**^**](#ref-num-word-types-bamberger-vanecek) [**^**](#ref-nwl) [**^**](#ref-nws) [**^**](#ref-smog-grade) Bamberger, R., & Vanecek, E. (1984). *Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache*. Jugend und Volk. +1. [**^**](#ref-num-word-types-bamberger-vanecek) [**^**](#ref-nwl) [**^**](#ref-nws) [**^**](#ref-smog-grading) Bamberger, R., & Vanecek, E. (1984). *Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache*. Jugend und Volk. -1. [**^**](#ref-z-score-berry-rogghes) Berry-Rogghe, G. L. M. (1973). The computation of collocations and their relevance in lexical studies. In A. J. Aiken, R. W. Bailey, & N. Hamilton-Smith (Eds.), *The computer and literary studies* (pp. 103–112). Edinburgh University Press. +1. [**^**](#ref-z-test-berry-rogghes) Berry-Rogghe, G. L. M. (1973). 
The computation of collocations and their relevance in lexical studies. In A. J. Aiken, R. W. Bailey, & N. Hamilton-Smith (Eds.), *The computer and literary studies* (pp. 103–112). Edinburgh University Press. 1. [**^**](#ref-bormuths-cloze-mean-gp) Bormuth, J. R. (1969). *Development of readability analyses*. U.S. Department of Health, Education, and Welfare. http://files.eric.ed.gov/fulltext/ED029166.pdf 1. [**^**](#ref-lix) Björnsson, C.-H. (1968). *Läsbarhet*. Liber. -1. [**^**](#ref-re) Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheid van Nederlands proza. *Paedagogische studiën*, *40*, 454–464. https://objects.library.uu.nl/reader/index.php?obj=1874-205260&lan=en +1. [**^**](#ref-re) Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheid van Nederlands proza. *Paedagogische Studiën*, *40*, 454–464. https://objects.library.uu.nl/reader/index.php?obj=1874-205260&lan=en 1. [**^**](#ref-brunets-index) Brunét, E. (1978). *Le vocabulaire de Jean Giraudoux: Structure et evolution*. Slatkine. @@ -1577,7 +1585,7 @@ Measure of Effect Size|Formula 1. [**^**](#ref-x-c50) Chall, J. S., & Dale, E. (1995). *Readability revisited: The new Dale-Chall readability formula*. Brookline Books. -1. [**^**](#ref-squared-phi-coeff) Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom. +1. [**^**](#ref-squared-phi-coeff) Church, K. W., & Gale, W. A. (1991, September 29–October 1). *Concordances for parallel text* [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom. 1. [**^**](#ref-students-t-test-1-sample) Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. 
Zernik (Ed.), *Lexical acquisition: Exploiting on-line resources to build a lexicon* (pp. 115–164). Psychology Press. @@ -1589,11 +1597,11 @@ Measure of Effect Size|Formula 1. [**^**](#ref-mattr) Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type-token ratio (MATTR). *Journal of Quantitative Linguistics*, *17*(2), 94–100. https://doi.org/10.1080/09296171003643098 -1. [**^**](#ref-formula-de-crawford) Crawford, A. N. (1985). Fórmula y gráfico para determinar la comprensibilidad de textos de nivel primario en castellano. *Lectura y Vida*, *6*(4). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a6n4/06_04_Crawford.pdf +1. [**^**](#ref-crawfords-readability-formula) Crawford, A. N. (1985). Fórmula y gráfico para determinar la comprensibilidad de textos de nivel primario en castellano. *Lectura y Vida*, *6*(4). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a6n4/06_04_Crawford.pdf 1. [**^**](#ref-im3) Daille, B. (1994). *Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques* [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid= -1. [**^**](#ref-im3) Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. *UCREL technical papers* (Vol. 5). Lancaster University. +1. [**^**](#ref-im2) Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. *UCREL technical papers* (Vol. 5). Lancaster University. 1. [**^**](#ref-num-words-dale-769) [**^**](#ref-num-word-types-dale-769) Dale, E. (1931). A comparison of two word lists. *Educational Research Bulletin*, *10*(18), 484–489. @@ -1605,13 +1613,13 @@ Measure of Effect Size|Formula 1. [**^**](#ref-dawoods-readability-formula) Dawood, B.A.K. (1977). 
*The relationship between readability and selected language variables* [Unpublished master’s thesis]. University of Baghdad. -1. [**^**](#ref-z-score) Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), *Proceedings of the symposium on statistical association methods for mechanized documentation* (pp. 61–148). National Bureau of Standards. +1. [**^**](#ref-z-test) Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), *Proceedings of the symposium on statistical association methods for mechanized documentation* (pp. 61–148). National Bureau of Standards. 1. [**^**](#ref-me) Dias, G., Guilloré, S., & Pereira Lopes, J. G. (1999). Language independent automatic acquisition of rigid multiword units from unrestricted text corpora. In A. Condamines, C. Fabre, & M. Péry-Woodley (Eds.), *TALN'99: 6ème Conférence Annuelle Sur le Traitement Automatique des Langues Naturelles* (pp. 333–339). TALN. 1. [**^**](#ref-dickes-steiwer-handformel) Dickes, P. & Steiwer, L. (1977). Ausarbeitung von lesbarkeitsformeln für die deutsche sprache. *Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie*, *9*(1), 20–28. -1. [**^**](#ref-re) Douma, W. H. (1960). *De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules* [Readability of Dutch farm papers: A discussion and application of readability-formulas]. Afdeling sociologie en sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 +1. [**^**](#ref-re) Douma, W. H. (1960). *De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules* [Readability of Dutch farm papers: A discussion and application of readability-formulas]. Afdeling Sociologie en Sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 1. 
[**^**](#ref-logttr) Dugast, D. (1978). Sur quoi se fonde la notion d’étendue théoretique du vocabulaire? *Le Français Moderne*, *46*, 25–32.
@@ -1619,7 +1627,7 @@ Measure of Effect Size|Formula
 1. [**^**](#ref-log-likehood-ratio-test) Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. *Computational Linguistics*, *19*(1), 61–74.
-1. [**^**](#ref-jaccard-index) [**^**](#ref-mi) Dunning, T. E. (1998). *Finding structure in text, genome and other symbolic sequences* [Doctoral dissertation, University of Sheffield]. arXiv. arxiv.org/pdf/1207.1847.pdf
+1. [**^**](#ref-jaccard-index) [**^**](#ref-mi) Dunning, T. E. (1998). *Finding structure in text, genome and other symbolic sequences* [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847
 1. [**^**](#ref-osman) El-Haj, M., & Rayson, P. (2016). OSMAN: A novel Arabic readability metric. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), *Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)* (pp. 250–255). European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2016/index.html
@@ -1627,7 +1635,7 @@ Measure of Effect Size|Formula
 1. [**^**](#ref-elf) Fang, I. E. (1966). The easy listening formula. *Journal of Broadcasting*, *11*(1), 63–68. https://doi.org/10.1080/08838156609363529
-1. [**^**](#ref-re-simplified) Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. *Journal of Applied Psychology*, *35*(5), 333–337. https://doi.org/10.1037/h0062427
+1. [**^**](#ref-re-farr-jenkins-paterson) Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. *Journal of Applied Psychology*, *35*(5), 333–337. https://doi.org/10.1037/h0062427
 1. [**^**](#ref-re) Fernández Huerta, J. (1959). Medidas sencillas de lecturabilidad. *Consigna*, *214*, 29–32.
@@ -1637,27 +1645,29 @@ Measure of Effect Size|Formula
 1. [**^**](#ref-re) Franchina, V., & Vacca, R. (1986). Adaptation of Flesh readability index on a bilingual text written by the same author both in Italian and English languages. *Linguaggi*, *3*, 47–49.
-1. [**^**](#ref-fuckss-stilcharakteristik) Fucks, W. (1955). *Unterschied des Prosastils von Dichtern und anderen Schriftstellern: ein Beispiel mathematischer Stilanalyse*. Bouvier.
+1. [**^**](#ref-fuckss-stilcharakteristik) Fucks, W. (1955). *Unterschied des prosastils von dichtern und anderen schriftstellern: Ein beispiel mathematischer stilanalyse*. Bouvier.
 1. [**^**](#ref-diff-coeff) Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), *Corpus approaches to discourse: A critical review* (pp. 225–258). Routledge.
-
-1. [**^**](#ref-pct-diff) Gabrielatos, C., & Marchi, A. (2012, September 13–14). *Keyness: Appropriate metrics and practical issues* [Conference session]. CADS International Conference 2012, University of Bologna, Italy.
+
+1. [**^**](#ref-pct-diff) Gabrielatos, C., & Marchi, A. (2011, November 5). *Keyness: Matching metrics to definitions* [Conference session]. Corpus Linguistics in the South 1, University of Portsmouth, United Kingdom. https://eprints.lancs.ac.uk/id/eprint/51449/4/Gabrielatos_Marchi_Keyness.pdf
 1. [**^**](#ref-griess-dp) Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. *International Journal of Corpus Linguistics*, *13*(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri
-1. [**^**](#ref-rttr) Guiraud, P. (1954). *Les caractères statistiques du vocabulaire: Essai de méthodologie*. Presses universitaires de France.
+1. [**^**](#ref-rttr) Guiraud, P. (1954). *Les caractères statistiques du vocabulaire: Essai de méthodologie*. Presses Universitaires de France.
 1. [**^**](#ref-fog-index) Gunning, R. (1968). *The technique of clear writing* (revised ed.). McGraw-Hill Book Company.
 1. [**^**](#ref-cp) Gutiérrez de Polini, L. E. (1972). *Investigación sobre lectura en Venezuela* [Paper presentation]. Primeras Jornadas de Educación Primaria, Ministerio de Educación, Caracas, Venezuela.
+
+1. [**^**](#ref-lexical-density) Halliday, M. A. K. (1989). *Spoken and written language* (2nd ed.). Oxford University Press.
-1. [**^**](#ref-log-ratio) Hardie, A. (2014, April 28). *Log ratio: An informal introduction*. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/
+1. [**^**](#ref-log-ratio) Hardie, A. (2014, April 28). *Log Ratio: An informal introduction*. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/
 1. [**^**](#ref-herdans-vm) Herdan, G. (1955). A new derivation and interpretation of Yule's ‘Characteristic’ K. *Zeitschrift für Angewandte Mathematik und Physik (ZAMP)*, *6*(4), 332–339. https://doi.org/10.1007/BF01587632
 1. [**^**](#ref-logttr) Herdan, G. (1960). *Type-token mathematics: A textbook of mathematical linguistics*. Mouton.
-
-1. [**^**](#ref-pearsons-chi-squared-test) [**^**](#ref-diff-coeff) Hofland, K., & Johanson, S. (1982). *Word frequencies in British and American English*. Norwegian Computing Centre for the Humanities.
+
+1. [**^**](#ref-pearsons-chi-squared-test) [**^**](#ref-diff-coeff) Hofland, K., & Johansson, S. (1982). *Word frequencies in British and American English*. Norwegian Computing Centre for the Humanities.
 1. [**^**](#ref-honores-stat) Honoré, A. (1979). Some simple measures of richness of vocabulary. *Association of Literary and
 Linguistic Computing Bulletin*, *7*(2), 172–177.
@@ -1666,7 +1676,7 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
 1. [**^**](#ref-juillands-d) [**^**](#ref-juillands-u) Juilland, A., & Chang-Rodriguez, E. (1964). *Frequency dictionary of Spanish words*. Mouton.
-1. [**^**](#ref-re) Kandel, L., & Moles A. (1958). Application de l’indice de flesch la langue francaise [applying flesch index to french language]. *The Journal of Educational Research*, *21*, 283–287.
+1. [**^**](#ref-re) Kandel, L., & Moles, A. (1958). Application de l’indice de flesch à la langue française. *The Journal of Educational Research*, *21*, 283–287.
 1. [**^**](#ref-mann-whiteney-u-test) Kilgarriff, A. (2001). Comparing corpora. *International Journal of Corpus Linguistics*, *6*(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil
@@ -1688,31 +1698,31 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
 1. [**^**](#ref-lorge-readability-index) Lorge, I. (1948). The Lorge and Flesch readability formulae: A correction. *School and Society*, *67*, 141–142.
-1. [**^**](#ref-gulpease-index) Lucisano, P., & Emanuela Piemontese, M. (1988). GULPEASE: A formula for the prediction of the difficulty of texts in Italian. *Scuola e Città*, *39*(3), 110–124.
+1. [**^**](#ref-gulpease) Lucisano, P., & Emanuela Piemontese, M. (1988). GULPEASE: A formula for the prediction of the difficulty of texts in Italian. *Scuola e Città*, *39*(3), 110–124.
 1. [**^**](#ref-num-syls-luong-nguyen-dinh-1000) [**^**](#ref-luong-nguyen-dinhs-readability-formula) Luong, A.-V., Nguyen, D., & Dinh, D. (2018). A new formula for Vietnamese text readability assessment. *2018 10th International Conference on Knowledge and Systems Engineering (KSE)* (pp. 198–202). IEEE. https://doi.org/10.1109/KSE.2018.8573379
-1. [**^**](#ref-lynes-d3) Lyne, A. A. (1985). Dispersion. In *The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method* (pp. 101–124). Slatkine/Champion.
+1. [**^**](#ref-lynes-d3) Lyne, A. A. (1985). Dispersion. In A. A. Lyne (Ed.), *The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method* (pp. 101–124). Slatkine.
 1. [**^**](#ref-vocdd) Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). *Lexical diversity and language development: Quantification and assessment*. Palgrave Macmillan.
 1. [**^**](#ref-logttr) Maas, H.-D. (1972). Über den zusammenhang zwischen wortschatzumfang und länge eines textes. *Zeitschrift für Literaturwissenschaft und Linguistik*, *2*(8), 73–96.
+
+1. [**^**](#ref-eflaw) McAlpine, R. (2006). *From plain English to global English*. Journalism Online. Retrieved October 31, 2024, from https://www.angelfire.com/nd/nirmaldasan/journalismonline/fpetge.html
 1. [**^**](#ref-mtld) McCarthy, P. M. (2005). *An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD)* [Doctoral dissertation, The University of Memphis]. ProQuest Dissertations and Theses Global.
 1. [**^**](#ref-hdd) [**^**](#ref-mtld) McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. *Behavior Research Methods*, *42*(2), 381–392. https://doi.org/10.3758/BRM.42.2.381
-1. [**^**](#ref-smog-grade) McLaughlin, G. H. (1969). SMOG grading: A new readability formula. *Journal of Reading*, *12*(8), 639–646.
+1. [**^**](#ref-smog-grading) McLaughlin, G. H. (1969). SMOG Grading: A new readability formula. *Journal of Reading*, *12*(8), 639–646.
 1. [**^**](#ref-mu) Muñoz Baquedano, M. (2006). Legibilidad y variabilidad de los textos. *Boletín de Investigación Educacional, Pontificia Universidad Católica de Chile*, *21*(2), 13–26.
-
-1. [**^**](#ref-eflaw) Nirmaldasan. (2009, April 30). *McAlpine EFLAW readability score*. Readability Monitor. Retrieved November 15, 2022, from https://strainindex.wordpress.com/2009/04/30/mcalpine-eflaw-readability-score/
-1. [**^**](#ref-pearsons-chi-squared-test) Oakes, M. P. (1998). *Statistics for Corpus Linguistics*. Edinburgh University Press.
+1. [**^**](#ref-pearsons-chi-squared-test) Oakes, M. P. (1998). *Statistics for corpus linguistics*. Edinburgh University Press.
 1. [**^**](#ref-re) Oborneva, I. V. (2006). *Автоматизированная оценка сложности учебных текстов на основе статистических параметров* [Doctoral dissertation, Institute for Strategy of Education Development of the Russian Academy of Education]. Freereferats.ru. https://static.freereferats.ru/_avtoreferats/01002881899.pdf?ver=3
-1. [**^**](#ref-lensear-write) O’Hayre, J. (1966). *Gobbledygook has gotta go*. U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf
+1. [**^**](#ref-lensear-write-formula) O’Hayre, J. (1966). *Gobbledygook has gotta go*. U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf
 1. [**^**](#ref-students-t-test-2-sample) Paquot, M., & Bestgen, Y. (2009). Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. *Language and Computers*, *68*, 247–269.
@@ -1740,38 +1750,36 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
 1. [**^**](#ref-ald) [**^**](#ref-fald) [**^**](#ref-arf) [**^**](#ref-farf) [**^**](#ref-awt) [**^**](#ref-fawt) Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. *Journal of Quantitative Linguistics*, *9*(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
-1. [**^**](#ref-simpsons-l) Simpson, E. H. (1949). Measurement of diversity. *Nature*, *163*, p. 688. https://doi.org/10.1038/163688a0
+1. [**^**](#ref-simpsons-l) Simpson, E. H. (1949). Measurement of diversity. *Nature*, *163*, 688. https://doi.org/10.1038/163688a0
-1. [**^**](#ref-dices-coeff) Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. *Computational Linguistics*, *22*(1), 1–38.
+1. [**^**](#ref-dice-sorensen-coeff) Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. *Computational Linguistics*, *22*(1), 1–38.
 1. [**^**](#ref-devereux-readability-index) Smith, E. A. (1961). Devereaux readability index. *Journal of Educational Research*, *54*(8), 298–303. https://doi.org/10.1080/00220671.1961.10882728
 1. [**^**](#ref-ari) Smith, E. A., & Senter, R. J. (1967). *Automated readability index*. Aerospace Medical Research Laboratories. https://apps.dtic.mil/sti/pdfs/AD0667273.pdf
-
-1. [**^**](#ref-strain-index) Solomon, N. W. (2006). *Qualitative analysis of media language* [Unpublished doctoral dissertation]. Madurai Kamaraj University.
+
+1. [**^**](#ref-strain-index) Nathaniel, W. S. (2017). *A quantitative analysis of media language* [Master’s thesis, Madurai Kamaraj University]. LAMBERT Academic Publishing.
 1. [**^**](#ref-logttr) Somers, H. H. (1966). Statistical methods in literary analysis. In J. Leeds (Ed.), *The computer and literary style* (pp. 128–140). Kent State University Press.
-1. [**^**](#ref-spache-grade-level) Spache, G. (1953). A new readability formula for primary-grade reading materials. *Elementary School Journal*, *53*(7), 410–413. https://doi.org/10.1086/458513
+1. [**^**](#ref-spache-readability-formula) Spache, G. (1953). A new readability formula for primary-grade reading materials. *Elementary School Journal*, *53*(7), 410–413. https://doi.org/10.1086/458513
-1. [**^**](#ref-num-words-spache) [**^**](#ref-spache-grade-level) Spache, G. (1974). *Good reading for poor readers* (Rev. 9th ed.). Garrard.
+1. [**^**](#ref-num-words-spache) [**^**](#ref-spache-readability-formula) Spache, G. (1974). *Good reading for poor readers* (Rev. 9th ed.). Garrard.
 1. [**^**](#ref-re) Szigriszt Pazos, F. (1993). *Sistemas predictivos de legibilidad del mensaje escrito: Formula de perspicuidad* [Doctoral dissertation, Complutense University of Madrid]. Biblos-e Archivo. https://repositorio.uam.es/bitstream/handle/10486/2488/3907_barrio_cantalejo_ines_maria.pdf?sequence=1&isAllowed=y
 1. [**^**](#ref-lfmd) [**^**](#ref-md) Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), *Proceedings of the Third International Conference on Language Resources and Evaluation* (pp. 620–625). European Language Resources Association.
-1. [**^**](#ref-trankle-bailers-readability-formula) Tränkle, U., & Bailer, H. (1984). *Kreuzvalidierung und Neuberechnung von Lesbarkeitsformeln für die Deutsche Sprache* [Cross-validation and recalculation of the readability formulas for the German language]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, *16*(3), 231–244.
+1. [**^**](#ref-trankle-bailers-readability-formula) Tränkle, U., & Bailer, H. (1984). Kreuzvalidierung und neuberechnung von lesbarkeitsformeln für die Deutsche sprache. *Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie*, *16*(3), 231–244.
-1. [**^**](#ref-td) Tuldava, J. (1975). Ob izmerenii trudnosti tekstov [On measuring the complexity of the text]. *Uchenye zapiski Tartuskogo universiteta. Trudy po metodike prepodavaniya inostrannykh yazykov*, *345*, 102–120.
-
-1. [**^**](#ref-lexical-density) Ure, J. (1971). Lexical density and register differentiation. In G. E. Perren & J. L. M. Trim (Eds.), *Applications of Linguistics* (pp. 443–452). Cambridge University Press.
+1. [**^**](#ref-td) Tuldava, J. (1975). Ob izmerenii trudnosti tekstov. *Uchenye zapiski Tartuskogo universiteta. Trudy po metodike prepodavaniya inostrannykh yazykov*, *345*, 102–120.
 1. [**^**](#ref-wheeler-smiths-readability-formula) Wheeler, L. R., & Smith, E. H. (1954). A practical readability formula for the classroom teacher in the primary grades. *Elementary English*, *31*(7), 397–399.
 1. [**^**](#ref-yules-index-of-diversity) Williams, C. B. (1970). *Style and vocabulary: Numerical studies*. Griffin.
-1. [**^**](#ref-log-likehood-ratio-test) [**^**](#ref-students-t-test-2-sample) Wilson, A. (2013). Embracing Bayes Factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), *New Approaches to the Study of Linguistic Variability* (pp. 3–11). Peter Lang.
+1. [**^**](#ref-log-likehood-ratio-test) [**^**](#ref-students-t-test-2-sample) Wilson, A. (2013). Embracing Bayes factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), *New approaches to the study of linguistic variability* (pp. 3–11). Peter Lang.
 1. [**^**](#ref-yules-characteristic-k) Yule, G. U. (1944). *The statistical study of literary vocabulary*. Cambridge University Press.
-1. [**^**](#ref-zhangs-distributional-consistency) Zhang, H., Huang, C., & Yu, S. (2004). Distributional consistency: As a general method for defining a core lexicon. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), *Proceedings of Fourth International Conference on Language Resources and Evaluation* (pp. 1119–1122). European Language Resources Association.
+1. [**^**](#ref-zhangs-distributional-consistency) Zhang, H., Huang, C., & Yu, S. (2004). Distributional Consistency: As a general method for defining a core lexicon. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), *Proceedings of Fourth International Conference on Language Resources and Evaluation* (pp. 1119–1122). European Language Resources Association.
diff --git a/doc/measures/effect_size/dices_coeff.svg b/doc/measures/effect_size/dice_sorensen_coeff.svg
similarity index 100%
rename from doc/measures/effect_size/dices_coeff.svg
rename to doc/measures/effect_size/dice_sorensen_coeff.svg
diff --git a/doc/measures/effect_size/diff_coeff.svg b/doc/measures/effect_size/diff_coeff.svg
index daaceb5f1..a7657003f 100644
--- a/doc/measures/effect_size/diff_coeff.svg
+++ b/doc/measures/effect_size/diff_coeff.svg
@@ -1,16 +1,12 @@
@@ -19,58 +15,61 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/effect_size/im2.svg b/doc/measures/effect_size/im2.svg
new file mode 100644
index 000000000..8cb48ef21
--- /dev/null
+++ b/doc/measures/effect_size/im2.svg
@@ -0,0 +1,34 @@
[added SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/effect_size/kilgarriffs_ratio.svg b/doc/measures/effect_size/kilgarriffs_ratio.svg
index e4a4996f1..c66b493d4 100644
--- a/doc/measures/effect_size/kilgarriffs_ratio.svg
+++ b/doc/measures/effect_size/kilgarriffs_ratio.svg
@@ -1,6 +1,6 @@
@@ -11,7 +11,6 @@
@@ -25,66 +24,66 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/effect_size/log_ratio.svg b/doc/measures/effect_size/log_ratio.svg
index f78dcdf47..7ecbe0e5a 100644
--- a/doc/measures/effect_size/log_ratio.svg
+++ b/doc/measures/effect_size/log_ratio.svg
@@ -1,6 +1,6 @@
@@ -12,39 +12,39 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/effect_size/odds_ratio.svg b/doc/measures/effect_size/odds_ratio.svg
index e1b9e4957..768aa7423 100644
--- a/doc/measures/effect_size/odds_ratio.svg
+++ b/doc/measures/effect_size/odds_ratio.svg
@@ -1,48 +1,46 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/lexical_density_diversity/herdans_vm.svg b/doc/measures/lexical_density_diversity/herdans_vm.svg
index 4bbe800b2..a35367a61 100644
--- a/doc/measures/lexical_density_diversity/herdans_vm.svg
+++ b/doc/measures/lexical_density_diversity/herdans_vm.svg
@@ -1,16 +1,17 @@
@@ -19,66 +20,65 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/lexical_density_diversity/lexical_density.svg b/doc/measures/lexical_density_diversity/lexical_density.svg
index 0247ac81b..b56141777 100644
--- a/doc/measures/lexical_density_diversity/lexical_density.svg
+++ b/doc/measures/lexical_density_diversity/lexical_density.svg
@@ -1,10 +1,9 @@
@@ -26,46 +25,46 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/lexical_density_diversity/repeat_rate.svg b/doc/measures/lexical_density_diversity/repeat_rate.svg
index 48c9e02d5..5bc32b63f 100644
--- a/doc/measures/lexical_density_diversity/repeat_rate.svg
+++ b/doc/measures/lexical_density_diversity/repeat_rate.svg
@@ -1,6 +1,6 @@
@@ -19,13 +19,12 @@
@@ -45,113 +44,112 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/lexical_density_diversity/shannon_entropy.svg b/doc/measures/lexical_density_diversity/shannon_entropy.svg
index f10194cf0..6c72e0a34 100644
--- a/doc/measures/lexical_density_diversity/shannon_entropy.svg
+++ b/doc/measures/lexical_density_diversity/shannon_entropy.svg
@@ -1,6 +1,6 @@
@@ -23,13 +23,13 @@
@@ -49,155 +49,153 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/lexical_density_diversity/yules_index_of_diversity.svg b/doc/measures/lexical_density_diversity/yules_index_of_diversity.svg
index f74a2a49a..67a11ca40 100644
--- a/doc/measures/lexical_density_diversity/yules_index_of_diversity.svg
+++ b/doc/measures/lexical_density_diversity/yules_index_of_diversity.svg
@@ -1,7 +1,8 @@
@@ -27,77 +28,76 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/coleman_liau_index.svg b/doc/measures/readability/coleman_liau_index.svg
index 04860b3e3..845457141 100644
--- a/doc/measures/readability/coleman_liau_index.svg
+++ b/doc/measures/readability/coleman_liau_index.svg
@@ -1,6 +1,6 @@
@@ -24,7 +24,6 @@
@@ -46,7 +45,7 @@
@@ -56,142 +55,142 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/formula_de_crawford.svg b/doc/measures/readability/crawfords_readability_formula.svg
similarity index 83%
rename from doc/measures/readability/formula_de_crawford.svg
rename to doc/measures/readability/crawfords_readability_formula.svg
index a954b7c65..aba346625 100644
--- a/doc/measures/readability/formula_de_crawford.svg
+++ b/doc/measures/readability/crawfords_readability_formula.svg
@@ -1,6 +1,6 @@
@@ -18,7 +18,6 @@
@@ -37,84 +36,84 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/devereux_readability_index.svg b/doc/measures/readability/devereux_readability_index.svg
index c6c638d6e..2d2a4d541 100644
--- a/doc/measures/readability/devereux_readability_index.svg
+++ b/doc/measures/readability/devereux_readability_index.svg
@@ -1,10 +1,10 @@
@@ -17,7 +17,6 @@
@@ -29,83 +28,84 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/gulpease_index.svg b/doc/measures/readability/gulpease.svg
similarity index 80%
rename from doc/measures/readability/gulpease_index.svg
rename to doc/measures/readability/gulpease.svg
index 277341cb3..e38a34146 100644
--- a/doc/measures/readability/gulpease_index.svg
+++ b/doc/measures/readability/gulpease.svg
@@ -1,6 +1,6 @@
@@ -13,10 +13,13 @@
@@ -32,67 +35,61 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/lensear_write.svg b/doc/measures/readability/lensear_write_formula.svg
similarity index 100%
rename from doc/measures/readability/lensear_write.svg
rename to doc/measures/readability/lensear_write_formula.svg
diff --git a/doc/measures/readability/lorge_readability_index.svg b/doc/measures/readability/lorge_readability_index.svg
index 0dc170ba3..6fee9315a 100644
--- a/doc/measures/readability/lorge_readability_index.svg
+++ b/doc/measures/readability/lorge_readability_index.svg
@@ -1,18 +1,17 @@
@@ -46,205 +45,205 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/luong_nguyen_dinh_readability_formula.svg b/doc/measures/readability/luong_nguyen_dinhs_readability_formula.svg
similarity index 100%
rename from doc/measures/readability/luong_nguyen_dinh_readability_formula.svg
rename to doc/measures/readability/luong_nguyen_dinhs_readability_formula.svg
diff --git a/doc/measures/readability/smog_grade.svg b/doc/measures/readability/smog_grading.svg
similarity index 100%
rename from doc/measures/readability/smog_grade.svg
rename to doc/measures/readability/smog_grading.svg
diff --git a/doc/measures/readability/spache_grade_level.svg b/doc/measures/readability/spache_readability_formula.svg
similarity index 77%
rename from doc/measures/readability/spache_grade_level.svg
rename to doc/measures/readability/spache_readability_formula.svg
index b7d3cd45e..fa2a5c179 100644
--- a/doc/measures/readability/spache_grade_level.svg
+++ b/doc/measures/readability/spache_readability_formula.svg
@@ -1,16 +1,10 @@
@@ -24,7 +18,6 @@
@@ -44,158 +37,164 @@
[changed SVG markup not preserved]
\ No newline at end of file
diff --git a/doc/measures/readability/x_c50.svg b/doc/measures/readability/x_c50.svg
index d28c8114f..03383369f 100644
--- a/doc/measures/readability/x_c50.svg
+++ b/doc/measures/readability/x_c50.svg
@@ -1,9 +1,7 @@
@@ -37,7 +35,6 @@
@@ -51,11 +48,13 @@
@@ -234,12 +233,12 @@
[changed SVG markup not preserved]
diff --git a/doc/measures/statistical_significance/z_score.svg b/doc/measures/statistical_significance/z_test.svg
similarity index 100%
rename from doc/measures/statistical_significance/z_score.svg
rename to doc/measures/statistical_significance/z_test.svg
diff --git a/doc/measures/statistical_significance/z_score_berry_rogghe.svg b/doc/measures/statistical_significance/z_test_berry_rogghe.svg
similarity index 100%
rename from doc/measures/statistical_significance/z_score_berry_rogghe.svg
rename to doc/measures/statistical_significance/z_test_berry_rogghe.svg
diff --git a/tests/tests_measures/test_measures_effect_size.py b/tests/tests_measures/test_measures_effect_size.py
index 0fff027b8..f29af43c1 100644
--- a/tests/tests_measures/test_measures_effect_size.py
+++ b/tests/tests_measures/test_measures_effect_size.py
@@ -63,9 +63,9 @@ def test_im3():
     assert_zeros(wl_measures_effect_size.im3)
 # Reference: Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1), pp. 1–38. (p.
13)
-def test_dices_coeff():
+def test_dice_sorensen_coeff():
     numpy.testing.assert_array_equal(
-        numpy.round(wl_measures_effect_size.dices_coeff(
+        numpy.round(wl_measures_effect_size.dice_sorensen_coeff(
             main,
             numpy.array([130] * 2),
             numpy.array([3121 - 130] * 2),
@@ -75,7 +75,7 @@ def test_dices_coeff():
         numpy.array([0.08] * 2)
     )
-    assert_zeros(wl_measures_effect_size.dices_coeff)
+    assert_zeros(wl_measures_effect_size.dice_sorensen_coeff)
 # Reference: Hofland, K., & Johanson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. (p. 471)
 def test_diff_coeff():
@@ -110,7 +110,13 @@ def test_kilgarriffs_ratio():
     assert_zeros(wl_measures_effect_size.kilgarriffs_ratio, result = 1)
-# Reference: Hardie, A. (2014, April 28). Log ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/.
+def test_log_dice():
+    assert_zeros(wl_measures_effect_size.log_dice, result = 14)
+
+def test_lfmd():
+    assert_zeros(wl_measures_effect_size.lfmd)
+
+# Reference: Hardie, A. (2014, April 28). Log Ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/.
 def test_log_ratio():
     numpy.testing.assert_array_equal(
         wl_measures_effect_size.log_ratio(
@@ -134,12 +140,6 @@ def test_log_ratio():
         numpy.array([float('-inf'), float('inf'), 0])
     )
-def test_lfmd():
-    assert_zeros(wl_measures_effect_size.lfmd)
-
-def test_log_dice():
-    assert_zeros(wl_measures_effect_size.log_dice, result = 14)
-
 def test_mi_log_f():
     assert_zeros(wl_measures_effect_size.mi_log_f)
@@ -164,7 +164,7 @@ def test_md():
 def test_me():
     assert_zeros(wl_measures_effect_size.me)
-# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. arxiv.org/pdf/1207.1847.pdf (p. 51)
+# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 (p. 51)
 def test_mi():
     numpy.testing.assert_array_equal(
         numpy.round(wl_measures_effect_size.mi(
@@ -221,6 +221,9 @@ def test_pmi():
 def test_poisson_collocation_measure():
     assert_zeros(wl_measures_effect_size.poisson_collocation_measure)
+def test_im2():
+    assert_zeros(wl_measures_effect_size.im2)
+
 # Reference: Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom.
 def test_squared_phi_coeff():
     numpy.testing.assert_array_equal(
@@ -239,13 +242,13 @@ def test_squared_phi_coeff():
 if __name__ == '__main__':
     test_pct_diff()
     test_im3()
-    test_dices_coeff()
+    test_dice_sorensen_coeff()
     test_diff_coeff()
     test_jaccard_index()
     test_kilgarriffs_ratio()
-    test_log_ratio()
-    test_lfmd()
     test_log_dice()
+    test_lfmd()
+    test_log_ratio()
     test_mi_log_f()
     test_min_sensitivity()
     test_md()
@@ -254,4 +257,5 @@ def test_squared_phi_coeff():
     test_odds_ratio()
     test_pmi()
     test_poisson_collocation_measure()
+    test_im2()
     test_squared_phi_coeff()
diff --git a/tests/tests_measures/test_measures_readability.py b/tests/tests_measures/test_measures_readability.py
index 32dcf60b8..671bd5d1e 100644
--- a/tests/tests_measures/test_measures_readability.py
+++ b/tests/tests_measures/test_measures_readability.py
@@ -89,12 +89,6 @@ def test_rd():
     rd_ara_12_policy_2 = wl_measures_readability.rd(main, test_text_ara_12)
     rd_eng_12 = wl_measures_readability.rd(main, test_text_eng_12)
-    print("Al-Heeti's Readability Prediction Formula:")
-    print(f'\tara/0: {rd_ara_0}')
-    print(f'\tara/12-policy-1: {rd_ara_12_policy_1}')
-    print(f'\tara/12-policy-2: {rd_ara_12_policy_2}')
-    print(f'\teng/12: {rd_eng_12}')
-
     assert rd_ara_0 == 'text_too_short'
     assert rd_ara_12_policy_1 == 4.41434307 * (45 / 12) - 13.46873475
     assert rd_ara_12_policy_2 == 0.97569509 * (45 / 12) + 0.37237998 * (12 / 3) - 0.90451827 * (12 / 5) - 1.06000414
@@ -105,11 +99,6 @@ def test_aari():
     aari_ara_12 = wl_measures_readability.aari(main, test_text_ara_12)
     aari_eng_12 = wl_measures_readability.aari(main, test_text_eng_12)
-    print('Automated Arabic Readability Index:')
-    print(f'\tara/0: {aari_ara_0}')
-    print(f'\tara/12: {aari_ara_12}')
-    print(f'\teng/12: {aari_eng_12}')
-
     assert aari_ara_0 == 'text_too_short'
     assert aari_ara_12 == 3.28 * 46 + 1.43 * (46 / 12) + 1.24 * (12 / 3)
     assert aari_eng_12 == 'no_support'
@@ -122,12 +111,6 @@ def test_ari():
     ari_eng_12_navy = wl_measures_readability.ari(main, test_text_eng_12)
     ari_spa_12 = wl_measures_readability.ari(main, test_text_spa_12)
-    print('Automated Readability Index:')
-    print(f'\teng/0: {ari_eng_0}')
-    print(f'\teng/12: {ari_eng_12}')
-    print(f'\teng/12-navy: {ari_eng_12_navy}')
-    print(f'\tspa/12: {ari_spa_12}')
-
     assert ari_eng_0 == 'text_too_short'
     assert ari_eng_12 == 0.5 * (12 / 3) + 4.71 * (47 / 12) - 21.43
     assert ari_eng_12_navy == ari_spa_12 == 0.37 * (12 / 3) + 5.84 * (47 / 12) - 26.01
@@ -137,11 +120,6 @@ def test_bormuths_cloze_mean():
     m_eng_12 = wl_measures_readability.bormuths_cloze_mean(main, test_text_eng_12)
     m_other_12 = wl_measures_readability.bormuths_cloze_mean(main, test_text_other_12)
-    print("Bormuth's Cloze Mean:")
-    print(f'\teng/0: {m_eng_0}')
-    print(f'\teng/12: {m_eng_12}')
-    print(f'\tother/12: {m_other_12}')
-
     assert m_eng_0 == 'text_too_short'
     assert m_eng_12 == (
         0.886593 -
@@ -158,11 +136,6 @@ def test_bormuths_gp():
     gp_eng_12 = wl_measures_readability.bormuths_gp(main, test_text_eng_12)
     gp_other_12 = wl_measures_readability.bormuths_gp(main, test_text_other_12)
-    print("Bormuth's Grade Placement:")
-    print(f'\teng/0: {gp_eng_0}')
-    print(f'\teng/12: {gp_eng_12}')
-    print(f'\tother/12: {gp_other_12}')
-
     m = wl_measures_readability.bormuths_cloze_mean(main, test_text_eng_12)
     c = 0.35
@@ -179,11 +152,6 @@ def test_coleman_liau_index():
     grade_level_eng_12 = wl_measures_readability.coleman_liau_index(main, test_text_eng_12)
     grade_level_spa_12 = wl_measures_readability.coleman_liau_index(main, test_text_spa_12)
-    print('Coleman-Liau Index:')
-    print(f'\teng/0: {grade_level_eng_0}')
-    print(f'\teng/12: {grade_level_eng_12}')
-    print(f'\tspa/12: {grade_level_spa_12}')
-
     est_cloze_pct = 141.8401 - 0.21459 * (45 / 12 * 100) + 1.079812 * (3 / 12 * 100)
     assert grade_level_eng_0 == 'text_too_short'
@@ -202,15 +170,6 @@ def test_colemans_readability_formula():
     cloze_pct_tha_12 = wl_measures_readability.colemans_readability_formula(main, test_text_tha_12)
     cloze_pct_other_12 = wl_measures_readability.colemans_readability_formula(main, test_text_other_12)
-    print("Coleman's Readability Formula:")
-    print(f'\teng/0: {cloze_pct_eng_0}')
-    print(f'\teng/12-1: {cloze_pct_eng_12_1}')
-    print(f'\teng/12-2: {cloze_pct_eng_12_2}')
-    print(f'\teng/12-3: {cloze_pct_eng_12_3}')
-    print(f'\teng/12-4: {cloze_pct_eng_12_4}')
-    print(f'\ttha/12: {cloze_pct_tha_12}')
-    print(f'\tother/12: {cloze_pct_other_12}')
-
     assert cloze_pct_eng_0 == 'text_too_short'
     assert cloze_pct_eng_12_1 == 1.29 * (9 / 12 * 100) - 38.45
     assert cloze_pct_eng_12_2 == 1.16 * (9 / 12 * 100) + 1.48 * (3 / 12 * 100) - 37.95
@@ -219,6 +178,15 @@ def test_colemans_readability_formula():
     assert cloze_pct_tha_12 != 'no_support'
     assert cloze_pct_other_12 == 'no_support'
+def test_crawfords_readability_formula():
+    grade_level_spa_0 = wl_measures_readability.crawfords_readability_formula(main, test_text_spa_0)
+    grade_level_spa_12 = wl_measures_readability.crawfords_readability_formula(main, test_text_spa_12)
+    grade_level_eng_12 = wl_measures_readability.crawfords_readability_formula(main, test_text_eng_12)
+
+    assert grade_level_spa_0 == 'text_too_short'
+    assert grade_level_spa_12 == 3 / 12 * 100 * (-0.205) + 18 / 12 * 100 * 0.049
- 3.407 + assert grade_level_eng_12 == 'no_support' + def test_x_c50(): x_c50_eng_0 = wl_measures_readability.x_c50(main, test_text_eng_0) settings['x_c50']['variant'] = 'Original' @@ -229,13 +197,6 @@ def test_x_c50(): x_c50_eng_12_new = wl_measures_readability.x_c50(main, test_text_eng_12) x_c50_spa_12 = wl_measures_readability.x_c50(main, test_text_spa_12) - print('Dale-Chall Readability Formula:') - print(f'\teng/0: {x_c50_eng_0}') - print(f'\teng/12-orig: {x_c50_eng_12_orig}') - print(f'\teng/12-psk: {x_c50_eng_12_psk}') - print(f'\teng/12-new: {x_c50_eng_12_new}') - print(f'\tspa/12: {x_c50_spa_12}') - assert x_c50_eng_0 == 'text_too_short' assert x_c50_eng_12_orig == 0.1579 * (1 / 12 * 100) + 0.0496 * (12 / 3) + 3.6365 assert x_c50_eng_12_psk == 3.2672 + 0.1155 * (1 / 12 * 100) + 0.0596 * (12 / 3) @@ -250,12 +211,6 @@ def test_danielson_bryans_readability_formula(): danielson_bryan_eng_12_2 = wl_measures_readability.danielson_bryans_readability_formula(main, test_text_eng_12) danielson_bryan_other_12 = wl_measures_readability.danielson_bryans_readability_formula(main, test_text_other_12) - print("Danielson-Bryan's Readability Formula:") - print(f'\teng/0: {danielson_bryan_eng_0}') - print(f'\teng/12-1: {danielson_bryan_eng_12_1}') - print(f'\teng/12-2: {danielson_bryan_eng_12_2}') - print(f'\tother/12: {danielson_bryan_other_12}') - assert danielson_bryan_eng_0 == 'text_too_short' assert danielson_bryan_eng_12_1 == 1.0364 * (47 / (12 - 1)) + 0.0194 * (47 / 3) - 0.6059 assert danielson_bryan_eng_12_2 == danielson_bryan_other_12 == 131.059 - 10.364 * (47 / (12 - 1)) - 0.194 * (47 / 3) @@ -265,11 +220,6 @@ def test_dawoods_readability_formula(): dawood_ara_12 = wl_measures_readability.dawoods_readability_formula(main, test_text_ara_12) dawood_eng_12 = wl_measures_readability.dawoods_readability_formula(main, test_text_eng_12) - print("Dawood's Readability Formula:") - print(f'\tara/0: {dawood_ara_0}') - print(f'\tara/12: {dawood_ara_12}') - print(f'\teng/12: 
{dawood_eng_12}') - assert dawood_ara_0 == 'text_too_short' assert dawood_ara_12 == (-0.0533) * (45 / 12) - 0.2066 * (12 / 3) + 5.5543 * (12 / 5) - 1.0801 assert dawood_eng_12 == 'no_support' @@ -279,11 +229,6 @@ def test_drp(): drp_eng_12 = wl_measures_readability.drp(main, test_text_eng_12) drp_other_12 = wl_measures_readability.drp(main, test_text_other_12) - print('Degrees of Reading Power:') - print(f'\teng/0: {drp_eng_0}') - print(f'\teng/12: {drp_eng_12}') - print(f'\tother/12: {drp_other_12}') - assert drp_eng_0 == 'text_too_short' m = wl_measures_readability.bormuths_cloze_mean(main, test_text_eng_12) assert drp_eng_12 == 100 - math.floor(m * 100 + 0.5) @@ -294,11 +239,6 @@ def test_devereux_readability_index(): grade_placement_eng_12 = wl_measures_readability.devereux_readability_index(main, test_text_eng_12) grade_placement_spa_12 = wl_measures_readability.devereux_readability_index(main, test_text_spa_12) - print('Devereux Readability Index:') - print(f'\teng/0: {grade_placement_eng_0}') - print(f'\teng/12: {grade_placement_eng_12}') - print(f'\tspa/12: {grade_placement_spa_12}') - assert grade_placement_eng_0 == 'text_too_short' assert grade_placement_eng_12 == 1.56 * (47 / 12) + 0.19 * (12 / 3) - 6.49 assert grade_placement_spa_12 != 'text_too_short' @@ -308,11 +248,6 @@ def test_dickes_steiwer_handformel(): dickes_steiwer_eng_12 = wl_measures_readability.dickes_steiwer_handformel(main, test_text_eng_12) dickes_steiwer_spa_12 = wl_measures_readability.dickes_steiwer_handformel(main, test_text_spa_12) - print('Dickes-Steiwer Handformel:') - print(f'\teng/0: {dickes_steiwer_eng_0}') - print(f'\teng/12: {dickes_steiwer_eng_12}') - print(f'\tspa/12: {dickes_steiwer_spa_12}') - assert dickes_steiwer_eng_0 == 'text_too_short' assert dickes_steiwer_eng_12 == 235.95993 - numpy.log(45 / 12 + 1) * 73.021 - numpy.log(12 / 3 + 1) * 12.56438 - 5 / 12 * 50.03293 assert dickes_steiwer_spa_12 != 'text_too_short' @@ -323,12 +258,6 @@ def test_elf(): elf_spa_12 = 
wl_measures_readability.elf(main, test_text_spa_12) elf_other_12 = wl_measures_readability.elf(main, test_text_other_12) - print('Easy Listening Formula:') - print(f'\teng/0: {elf_eng_0}') - print(f'\teng/12: {elf_eng_12}') - print(f'\tspa/12: {elf_spa_12}') - print(f'\tother/12: {elf_other_12}') - assert elf_eng_0 == 'text_too_short' assert elf_eng_12 == (15 - 12) / 3 assert elf_spa_12 != 'no_support' @@ -340,12 +269,6 @@ def test_gl(): gl_spa_12 = wl_measures_readability.gl(main, test_text_spa_12) gl_other_12 = wl_measures_readability.gl(main, test_text_other_12) - print('Flesch-Kincaid Grade Level:') - print(f'\teng/0: {gl_eng_0}') - print(f'\teng/12: {gl_eng_12}') - print(f'\tspa/12: {gl_spa_12}') - print(f'\tother/12: {gl_other_12}') - assert gl_eng_0 == 'text_too_short' assert gl_eng_12 == 0.39 * (12 / 3) + 11.8 * (15 / 12) - 15.59 assert gl_spa_12 != 'no_support' @@ -375,22 +298,6 @@ def test_re_flesch(): flesch_re_afr_12 = wl_measures_readability.re_flesch(main, test_text_afr_12) flesch_re_other_12 = wl_measures_readability.re_flesch(main, test_text_other_12) - print('Flesch Reading Ease:') - print(f'\teng/0: {flesch_re_eng_0}') - print(f'\teng/12-psk: {flesch_re_eng_12_psk}') - print(f'\teng/12: {flesch_re_eng_12}') - print(f'\tnld/12-douma: {flesch_re_nld_12_douma}') - print(f'\tnld/12-brouwer: {flesch_re_nld_12_brouwer}') - print(f'\tfra/12: {flesch_re_fra_12}') - print(f'\tdeu/12: {flesch_re_deu_12}') - print(f'\tita/12: {flesch_re_ita_12}') - print(f'\trus/12: {flesch_re_rus_12}') - print(f'\tspa/12-fh: {flesch_re_spa_12_fh}') - print(f'\tspa/12-sp: {flesch_re_spa_12_sp}') - print(f'\tukr/12: {flesch_re_ukr_12}') - print(f'\tafr/12: {flesch_re_afr_12}') - print(f'\tother/12: {flesch_re_other_12}') - assert flesch_re_eng_0 == 'text_too_short' assert flesch_re_eng_12_psk == -2.2029 + 4.55 * (15 / 12) + 0.0778 * (12 / 3) assert flesch_re_eng_12 == 206.835 - 84.6 * (15 / 12) - 1.015 * (12 / 3) @@ -415,13 +322,6 @@ def test_re_farr_jenkins_paterson(): 
re_farr_jenkins_paterson_spa_12 = wl_measures_readability.re_farr_jenkins_paterson(main, test_text_spa_12) re_farr_jenkins_paterson_other_12 = wl_measures_readability.re_farr_jenkins_paterson(main, test_text_other_12) - print('Flesch Reading Ease (Farr-Jenkins-Paterson):') - print(f'\teng/0: {re_farr_jenkins_paterson_eng_0}') - print(f'\teng/12: {re_farr_jenkins_paterson_eng_12}') - print(f'\teng/12-psk: {re_farr_jenkins_paterson_eng_12_psk}') - print(f'\tspa/12: {re_farr_jenkins_paterson_spa_12}') - print(f'\tother/12: {re_farr_jenkins_paterson_other_12}') - assert re_farr_jenkins_paterson_eng_0 == 'text_too_short' assert re_farr_jenkins_paterson_eng_12 == 1.599 * (9 / 12 * 100) - 1.015 * (12 / 3) - 31.517 assert re_farr_jenkins_paterson_eng_12_psk == 8.4335 - 0.0648 * (9 / 12 * 100) + 0.0923 * (12 / 3) @@ -434,70 +334,25 @@ def test_rgl(): rgl_spa_150 = wl_measures_readability.rgl(main, test_text_spa_150) rgl_other_12 = wl_measures_readability.rgl(main, test_text_other_12) - print('FORCAST Grade Level:') - print(f'\teng/12: {rgl_eng_12}') - print(f'\teng/150: {rgl_eng_150}') - print(f'\tspa/150: {rgl_spa_150}') - print(f'\tother/12: {rgl_other_12}') - assert rgl_eng_12 == 'text_too_short' assert rgl_eng_150 == rgl_spa_150 == 20.43 - 0.11 * (6 * 18 + 4) assert rgl_other_12 == 'no_support' -def test_cp(): - cp_spa_0 = wl_measures_readability.cp(main, test_text_spa_0) - cp_spa_12 = wl_measures_readability.cp(main, test_text_spa_12) - cp_eng_12 = wl_measures_readability.cp(main, test_text_eng_12) - - print('Fórmula de Comprensibilidad de Gutiérrez de Polini:') - print(f'\tspa/0: {cp_spa_0}') - print(f'\tspa/12: {cp_spa_12}') - print(f'\teng/12: {cp_eng_12}') - - assert cp_spa_0 == 'text_too_short' - assert cp_spa_12 == 95.2 - 9.7 * (45 / 12) - 0.35 * (12 / 3) - assert cp_eng_12 == 'no_support' - -def test_formula_de_crawford(): - grade_level_spa_0 = wl_measures_readability.formula_de_crawford(main, test_text_spa_0) - grade_level_spa_12 = 
wl_measures_readability.formula_de_crawford(main, test_text_spa_12) - grade_level_eng_12 = wl_measures_readability.formula_de_crawford(main, test_text_eng_12) - - print('Fórmula de Crawford:') - print(f'\tspa/0: {grade_level_spa_0}') - print(f'\tspa/12: {grade_level_spa_12}') - print(f'\teng/12: {grade_level_eng_12}') - - assert grade_level_spa_0 == 'text_too_short' - assert grade_level_spa_12 == 3 / 12 * 100 * (-0.205) + 18 / 12 * 100 * 0.049 - 3.407 - assert grade_level_eng_12 == 'no_support' - def test_fuckss_stilcharakteristik(): stilcharakteristik_eng_0 = wl_measures_readability.fuckss_stilcharakteristik(main, test_text_eng_0) stilcharakteristik_eng_12 = wl_measures_readability.fuckss_stilcharakteristik(main, test_text_eng_12) stilcharakteristik_spa_12 = wl_measures_readability.fuckss_stilcharakteristik(main, test_text_spa_12) stilcharakteristik_other_12 = wl_measures_readability.fuckss_stilcharakteristik(main, test_text_other_12) - print("Fucks's Stilcharakteristik:") - print(f'\teng/0: {stilcharakteristik_eng_0}') - print(f'\teng/12: {stilcharakteristik_eng_12}') - print(f'\tspa/12: {stilcharakteristik_spa_12}') - print(f'\tother/12: {stilcharakteristik_other_12}') - assert stilcharakteristik_eng_0 == 'text_too_short' assert stilcharakteristik_eng_12 == 15 / 3 assert stilcharakteristik_spa_12 != 'no_support' assert stilcharakteristik_other_12 == 'no_support' -def test_gulpease_index(): - gulpease_index_ita_0 = wl_measures_readability.gulpease_index(main, test_text_ita_0) - gulpease_index_ita_12 = wl_measures_readability.gulpease_index(main, test_text_ita_12) - gulpease_index_eng_12 = wl_measures_readability.gulpease_index(main, test_text_eng_12) - - print('Gulpease Index:') - print(f'\tita/0: {gulpease_index_ita_0}') - print(f'\tita/12: {gulpease_index_ita_12}') - print(f'\teng/12: {gulpease_index_eng_12}') +def test_gulpease(): + gulpease_index_ita_0 = wl_measures_readability.gulpease(main, test_text_ita_0) + gulpease_index_ita_12 = 
wl_measures_readability.gulpease(main, test_text_ita_12) + gulpease_index_eng_12 = wl_measures_readability.gulpease(main, test_text_eng_12) assert gulpease_index_ita_0 == 'text_too_short' assert gulpease_index_ita_12 == 89 + (300 * 3 - 10 * 45) / 12 @@ -513,44 +368,35 @@ def test_fog_index(): fog_index_eng_12_navy = wl_measures_readability.fog_index(main, test_text_eng_12) fog_index_spa_12 = wl_measures_readability.fog_index(main, test_text_spa_12) - print('Gunning Fog Index:') - print(f'\teng/0: {fog_index_eng_0}') - print(f'\teng/12-orig: {fog_index_eng_12_propn_orig}') - print(f'\teng/12-psk: {fog_index_eng_12_pron_psk}') - print(f'\teng/12-navy: {fog_index_eng_12_navy}') - print(f'\tspa/12: {fog_index_spa_12}') - assert fog_index_eng_0 == 'text_too_short' assert fog_index_eng_12_propn_orig == 0.4 * (12 / 3 + 1 / 12 * 100) assert fog_index_eng_12_pron_psk == 3.0680 + 0.0877 * (12 / 3) + 0.0984 * (1 / 12 * 100) assert fog_index_eng_12_navy == ((12 + 2 * 0) / 3 - 3) / 2 assert fog_index_spa_12 == 'no_support' +def test_cp(): + cp_spa_0 = wl_measures_readability.cp(main, test_text_spa_0) + cp_spa_12 = wl_measures_readability.cp(main, test_text_spa_12) + cp_eng_12 = wl_measures_readability.cp(main, test_text_eng_12) + + assert cp_spa_0 == 'text_too_short' + assert cp_spa_12 == 95.2 - 9.7 * (45 / 12) - 0.35 * (12 / 3) + assert cp_eng_12 == 'no_support' + def test_mu(): mu_spa_0 = wl_measures_readability.mu(main, test_text_spa_0) mu_spa_12 = wl_measures_readability.mu(main, test_text_spa_12) mu_eng_12 = wl_measures_readability.mu(main, test_text_eng_12) - print('Legibilidad µ:') - print(f'\tspa/0: {mu_spa_0}') - print(f'\tspa/12: {mu_spa_12}') - print(f'\teng/12: {mu_eng_12}') - assert mu_spa_0 == 'text_too_short' assert mu_spa_12 == (12 / 11) * (3.75 / 7.1875) * 100 assert mu_eng_12 == 'no_support' -def test_lensear_write(): - score_eng_0 = wl_measures_readability.lensear_write(main, test_text_eng_0) - score_eng_12 = wl_measures_readability.lensear_write(main, 
test_text_eng_12) - score_eng_100 = wl_measures_readability.lensear_write(main, test_text_eng_100) - score_spa_100 = wl_measures_readability.lensear_write(main, test_text_spa_100) - - print('Lensear Write:') - print(f'\teng/0: {score_eng_0}') - print(f'\teng/12: {score_eng_12}') - print(f'\teng/100: {score_eng_100}') - print(f'\tspa/100: {score_spa_100}') +def test_lensear_write_formula(): + score_eng_0 = wl_measures_readability.lensear_write_formula(main, test_text_eng_0) + score_eng_12 = wl_measures_readability.lensear_write_formula(main, test_text_eng_12) + score_eng_100 = wl_measures_readability.lensear_write_formula(main, test_text_eng_100) + score_spa_100 = wl_measures_readability.lensear_write_formula(main, test_text_spa_100) assert score_eng_0 == 'text_too_short' assert score_eng_12 == 6 * (100 / 12) + 3 * 3 * (100 / 12) @@ -562,11 +408,6 @@ def test_lix(): lix_eng_12 = wl_measures_readability.lix(main, test_text_eng_12) lix_spa_12 = wl_measures_readability.lix(main, test_text_spa_12) - print('Lix:') - print(f'\teng/0: {lix_eng_0}') - print(f'\teng/12: {lix_eng_12}') - print(f'\tspa/12: {lix_spa_12}') - assert lix_eng_0 == 'text_too_short' assert lix_eng_12 == 12 / 3 + 100 * (3 / 12) assert lix_spa_12 != 'no_support' @@ -577,32 +418,18 @@ def test_lorge_readability_index(): lorge_eng_12_corrected = wl_measures_readability.lorge_readability_index(main, test_text_eng_12_prep) settings['lorge_readability_index']['use_corrected_formula'] = False lorge_eng_12 = wl_measures_readability.lorge_readability_index(main, test_text_eng_12_prep) - lorge_tha_12 = wl_measures_readability.lorge_readability_index(main, test_text_tha_12) - lorge_other_12 = wl_measures_readability.lorge_readability_index(main, test_text_other_12) - - print('Lorge Readability Index:') - print(f'\teng/0: {lorge_eng_0}') - print(f'\teng/12-corrected: {lorge_eng_12_corrected}') - print(f'\teng/12: {lorge_eng_12}') - print(f'\ttha/12: {lorge_tha_12}') - print(f'\tother/12: {lorge_other_12}') + 
lorge_spa_12 = wl_measures_readability.lorge_readability_index(main, test_text_spa_12) assert lorge_eng_0 == 'text_too_short' assert lorge_eng_12_corrected == 12 / 3 * 0.06 + 2 / 12 * 0.1 + 2 / 12 * 0.1 + 1.99 assert lorge_eng_12 == 12 / 3 * 0.07 + 2 / 12 * 13.01 + 2 / 12 * 10.73 + 1.6126 - assert lorge_tha_12 != 'no_support' - assert lorge_other_12 == 'no_support' + assert lorge_spa_12 == 'no_support' def test_luong_nguyen_dinhs_readability_formula(): readability_vie_0 = wl_measures_readability.luong_nguyen_dinhs_readability_formula(main, test_text_vie_0) readability_vie_12 = wl_measures_readability.luong_nguyen_dinhs_readability_formula(main, test_text_vie_12) readability_eng_12 = wl_measures_readability.luong_nguyen_dinhs_readability_formula(main, test_text_eng_12) - print("Luong-Nguyen-Dinh's Readability Formula:") - print(f'\tvie/0: {readability_vie_0}') - print(f'\tvie/12: {readability_vie_12}') - print(f'\teng/12: {readability_eng_12}') - assert readability_vie_0 == 'text_too_short' assert readability_vie_12 == 0.004 * (46 / 3) + 0.1905 * (46 / 12) + 2.7147 * 12 / 12 - 0.7295 assert readability_eng_12 == 'no_support' @@ -612,11 +439,6 @@ def test_eflaw(): eflaw_eng_12 = wl_measures_readability.eflaw(main, test_text_eng_12) eflaw_spa_12 = wl_measures_readability.eflaw(main, test_text_spa_12) - print('McAlpine EFLAW Readability Score:') - print(f'\teng/0: {eflaw_eng_0}') - print(f'\teng/12: {eflaw_eng_12}') - print(f'\tspa/12: {eflaw_spa_12}') - assert eflaw_eng_0 == 'text_too_short' assert eflaw_eng_12 == (12 + 6) / 3 assert eflaw_spa_12 == 'no_support' @@ -631,13 +453,6 @@ def test_nwl(): nwl_deu_12_3 = wl_measures_readability.nwl(main, test_text_deu_12) nwl_eng_12 = wl_measures_readability.nwl(main, test_text_eng_12) - print('neue Wiener Literaturformeln:') - print(f'\tdeu/0: {nwl_deu_0}') - print(f'\tdeu/12-1: {nwl_deu_12_1}') - print(f'\tdeu/12-2: {nwl_deu_12_2}') - print(f'\tdeu/12-3: {nwl_deu_12_3}') - print(f'\teng/12: {nwl_eng_12}') - sw = 5 / 5 * 100 
s_100 = 3 / 12 * 100 ms = 0 / 12 * 100 @@ -660,13 +475,6 @@ def test_nws(): nws_deu_12_3 = wl_measures_readability.nws(main, test_text_deu_12) nws_eng_12 = wl_measures_readability.nws(main, test_text_eng_12) - print('neue Wiener Sachtextformel:') - print(f'\tdeu/0: {nws_deu_0}') - print(f'\tdeu/12-1: {nws_deu_12_1}') - print(f'\tdeu/12-2: {nws_deu_12_2}') - print(f'\tdeu/12-3: {nws_deu_12_3}') - print(f'\teng/12: {nws_eng_12}') - ms = 0 / 12 * 100 sl = 12 / 3 iw = 3 / 12 * 100 @@ -691,12 +499,6 @@ def test_osman(): osman_ara_faseeh = wl_measures_readability.osman(main, test_text_ara_faseeh) osman_eng_12 = wl_measures_readability.osman(main, test_text_eng_12) - print('OSMAN:') - print(f'\tara/0: {osman_ara_0}') - print(f'\tara/12: {osman_ara_12}') - print(f'\tara/faseeh: {osman_ara_faseeh}') - print(f'\teng/12: {osman_eng_12}') - assert osman_ara_0 == 'text_too_short' assert osman_ara_12 == 200.791 - 1.015 * (12 / 3) - 24.181 * ((3 + 26 + 3 + 0) / 12) assert osman_ara_faseeh == 200.791 - 1.015 * (1 / 1) - 24.181 * ((0 + 5 + 1 + 1) / 1) @@ -707,28 +509,16 @@ def test_rix(): rix_eng_12 = wl_measures_readability.rix(main, test_text_eng_12) rix_spa_12 = wl_measures_readability.rix(main, test_text_spa_12) - print('Rix:') - print(f'\teng/0: {rix_eng_0}') - print(f'\teng/12: {rix_eng_12}') - print(f'\tspa/12: {rix_spa_12}') - assert rix_eng_0 == 'text_too_short' assert rix_eng_12 == rix_spa_12 == 3 / 3 -def test_smog_grade(): - g_eng_12 = wl_measures_readability.smog_grade(main, test_text_eng_12) - g_eng_120 = wl_measures_readability.smog_grade(main, test_text_eng_120) - g_eng_120 = wl_measures_readability.smog_grade(main, test_text_eng_120) - g_deu_120 = wl_measures_readability.smog_grade(main, test_text_deu_120) - g_spa_120 = wl_measures_readability.smog_grade(main, test_text_spa_120) - g_other_12 = wl_measures_readability.smog_grade(main, test_text_other_12) - - print('SMOG Grade:') - print(f'\teng/12: {g_eng_12}') - print(f'\teng/120: {g_eng_120}') - print(f'\tdeu/120: 
{g_deu_120}') - print(f'\tspa/120: {g_spa_120}') - print(f'\tother/12: {g_other_12}') +def test_smog_grading(): + g_eng_12 = wl_measures_readability.smog_grading(main, test_text_eng_12) + g_eng_120 = wl_measures_readability.smog_grading(main, test_text_eng_120) + g_eng_120 = wl_measures_readability.smog_grading(main, test_text_eng_120) + g_deu_120 = wl_measures_readability.smog_grading(main, test_text_deu_120) + g_spa_120 = wl_measures_readability.smog_grading(main, test_text_spa_120) + g_other_12 = wl_measures_readability.smog_grading(main, test_text_other_12) assert g_eng_12 == 'text_too_short' assert g_eng_120 == 3.1291 + 1.043 * numpy.sqrt(15) @@ -736,19 +526,13 @@ def test_smog_grade(): assert g_spa_120 != 'no_support' assert g_other_12 == 'no_support' -def test_spache_grade_lvl(): - grade_lvl_eng_12 = wl_measures_readability.spache_grade_lvl(main, test_text_eng_12) - settings['spache_grade_lvl']['use_rev_formula'] = True - grade_lvl_eng_100_rev = wl_measures_readability.spache_grade_lvl(main, test_text_eng_100) - settings['spache_grade_lvl']['use_rev_formula'] = False - grade_lvl_eng_100 = wl_measures_readability.spache_grade_lvl(main, test_text_eng_100) - grade_lvl_spa_100 = wl_measures_readability.spache_grade_lvl(main, test_text_spa_100) - - print('Spache Grade Level:') - print(f'\teng/12: {grade_lvl_eng_12}') - print(f'\teng/100-rev: {grade_lvl_eng_100_rev}') - print(f'\teng/100: {grade_lvl_eng_100}') - print(f'\tspa/100: {grade_lvl_spa_100}') +def test_spache_readability_formula(): + grade_lvl_eng_12 = wl_measures_readability.spache_readability_formula(main, test_text_eng_12) + settings['spache_readability_formula']['use_rev_formula'] = True + grade_lvl_eng_100_rev = wl_measures_readability.spache_readability_formula(main, test_text_eng_100) + settings['spache_readability_formula']['use_rev_formula'] = False + grade_lvl_eng_100 = wl_measures_readability.spache_readability_formula(main, test_text_eng_100) + grade_lvl_spa_100 = 
wl_measures_readability.spache_readability_formula(main, test_text_spa_100) assert grade_lvl_eng_12 == 'text_too_short' assert grade_lvl_eng_100_rev == numpy.mean([0.121 * (100 / 25) + 0.082 * 25 + 0.659] * 3) @@ -761,12 +545,6 @@ def test_strain_index(): strain_index_spa_12 = wl_measures_readability.strain_index(main, test_text_spa_12) strain_index_other_12 = wl_measures_readability.strain_index(main, test_text_other_12) - print('Strain Index:') - print(f'\teng/0: {strain_index_eng_0}') - print(f'\teng/12: {strain_index_eng_12}') - print(f'\tspa/12: {strain_index_spa_12}') - print(f'\tother/12: {strain_index_other_12}') - assert strain_index_eng_0 == 'text_too_short' assert strain_index_eng_12 == 15 / 10 assert strain_index_spa_12 != 'no_support' @@ -781,13 +559,6 @@ def test_trankle_bailers_readability_formula(): trankle_bailers_tha_100 = wl_measures_readability.trankle_bailers_readability_formula(main, test_text_tha_100) trankle_bailers_other_100 = wl_measures_readability.trankle_bailers_readability_formula(main, test_text_other_100) - print("Tränkle & Bailer's Readability Formula:") - print(f'\teng/0: {trankle_bailers_eng_0}') - print(f'\teng/100-prep: {trankle_bailers_eng_100_prep_1}') - print(f'\teng/100-conj: {trankle_bailers_eng_100_conj_2}') - print(f'\ttha/100: {trankle_bailers_tha_100}') - print(f'\tother/100: {trankle_bailers_other_100}') - assert trankle_bailers_eng_0 == 'text_too_short' assert trankle_bailers_eng_100_prep_1 == 224.6814 - numpy.log(372 / 100 + 1) * 79.8304 - numpy.log(100 / 25 + 1) * 12.24032 - 1 * 1.292857 assert trankle_bailers_eng_100_conj_2 == 234.1063 - numpy.log(374 / 100 + 1) * 96.11069 - 0 * 2.05444 - 1 * 1.02805 @@ -800,12 +571,6 @@ def test_td(): td_spa_12 = wl_measures_readability.td(main, test_text_spa_12) td_other_12 = wl_measures_readability.td(main, test_text_other_12) - print("Tuldava's Text Difficulty:") - print(f'\teng/0: {td_eng_0}') - print(f'\teng/12: {td_eng_12}') - print(f'\tspa/12: {td_spa_12}') - 
print(f'\tother/12: {td_other_12}') - assert td_eng_0 == 'text_too_short' assert td_eng_12 == (15 / 12) * numpy.log(12 / 3) assert td_spa_12 != 'no_support' @@ -817,12 +582,6 @@ def test_wheeler_smiths_readability_formula(): wheeler_smith_spa_12 = wl_measures_readability.wheeler_smiths_readability_formula(main, test_text_spa_12) wheeler_smith_other_12 = wl_measures_readability.wheeler_smiths_readability_formula(main, test_text_other_12) - print("Wheeler & Smith's Readability Formula:") - print(f'\teng/0: {wheeler_smith_eng_0}') - print(f'\teng/12: {wheeler_smith_eng_12}') - print(f'\tspa/12: {wheeler_smith_spa_12}') - print(f'\tother/12: {wheeler_smith_other_12}') - assert wheeler_smith_eng_0 == 'text_too_short' assert wheeler_smith_eng_12 == (12 / 4) * (3 / 12) * 10 assert wheeler_smith_spa_12 != 'no_support' @@ -836,6 +595,7 @@ def test_wheeler_smiths_readability_formula(): test_bormuths_gp() test_coleman_liau_index() test_colemans_readability_formula() + test_crawfords_readability_formula() test_x_c50() test_danielson_bryans_readability_formula() test_dawoods_readability_formula() @@ -847,13 +607,12 @@ def test_wheeler_smiths_readability_formula(): test_re_flesch() test_re_farr_jenkins_paterson() test_rgl() - test_cp() - test_formula_de_crawford() test_fuckss_stilcharakteristik() - test_gulpease_index() + test_gulpease() test_fog_index() + test_cp() test_mu() - test_lensear_write() + test_lensear_write_formula() test_lix() test_lorge_readability_index() test_luong_nguyen_dinhs_readability_formula() @@ -863,8 +622,8 @@ def test_wheeler_smiths_readability_formula(): test__get_num_syls_ara() test_osman() test_rix() - test_smog_grade() - test_spache_grade_lvl() + test_smog_grading() + test_spache_readability_formula() test_strain_index() test_trankle_bailers_readability_formula() test_td() diff --git a/tests/tests_measures/test_measures_statistical_significance.py b/tests/tests_measures/test_measures_statistical_significance.py index 77bf7878f..d9d925cfa 100644 --- 
a/tests/tests_measures/test_measures_statistical_significance.py +++ b/tests/tests_measures/test_measures_statistical_significance.py @@ -263,22 +263,22 @@ def test_students_t_test_2_sample(): numpy.testing.assert_array_equal(t_stats, numpy.array([0] * 2)) numpy.testing.assert_array_equal(p_vals, numpy.array([1] * 2)) -def test__z_score_p_val(): +def test__z_test_p_val(): numpy.testing.assert_array_equal( - wl_measures_statistical_significance._z_score_p_val(numpy.array([0] * 2), 'Two-tailed'), + wl_measures_statistical_significance._z_test_p_val(numpy.array([0] * 2), 'Two-tailed'), numpy.array([1] * 2) ) numpy.testing.assert_array_equal( - wl_measures_statistical_significance._z_score_p_val(numpy.array([0] * 2), 'Left-tailed'), + wl_measures_statistical_significance._z_test_p_val(numpy.array([0] * 2), 'Left-tailed'), numpy.array([0] * 2) ) numpy.testing.assert_array_equal( - wl_measures_statistical_significance._z_score_p_val(numpy.array([0] * 2), 'Right-tailed'), + wl_measures_statistical_significance._z_test_p_val(numpy.array([0] * 2), 'Right-tailed'), numpy.array([0] * 2) ) -def test_z_score(): - z_scores, p_vals = wl_measures_statistical_significance.z_score( +def test_z_test(): + z_scores, p_vals = wl_measures_statistical_significance.z_test( main, numpy.array([0] * 2), numpy.array([0] * 2), @@ -289,8 +289,8 @@ def test_z_score(): numpy.testing.assert_array_equal(z_scores, numpy.array([0] * 2)) numpy.testing.assert_array_equal(p_vals, numpy.array([1] * 2)) -def test_z_score_berry_rogghe(): - z_scores, p_vals = wl_measures_statistical_significance.z_score_berry_rogghe( +def test_z_test_berry_rogghe(): + z_scores, p_vals = wl_measures_statistical_significance.z_test_berry_rogghe( main, numpy.array([0] * 2), numpy.array([0] * 2), @@ -314,6 +314,6 @@ def test_z_score_berry_rogghe(): test_students_t_test_1_sample() test_students_t_test_2_sample() - test__z_score_p_val() - test_z_score() - test_z_score_berry_rogghe() + test__z_test_p_val() + test_z_test() + 
test_z_test_berry_rogghe() diff --git a/wordless/wl_colligation_extractor.py b/wordless/wl_colligation_extractor.py index 7fedc7986..c61cb65c2 100644 --- a/wordless/wl_colligation_extractor.py +++ b/wordless/wl_colligation_extractor.py @@ -1176,7 +1176,7 @@ def run(self): test_stats = [None] * num_colligations_all p_vals = [None] * num_colligations_all else: - if test_statistical_significance == 'z_score_berry_rogghe': + if test_statistical_significance == 'z_test_berry_rogghe': test_stats, p_vals = func_statistical_significance(self.main, o11s, o12s, o21s, o22s, span) else: test_stats, p_vals = func_statistical_significance(self.main, o11s, o12s, o21s, o22s) diff --git a/wordless/wl_collocation_extractor.py b/wordless/wl_collocation_extractor.py index fa4859be0..4c13aff35 100644 --- a/wordless/wl_collocation_extractor.py +++ b/wordless/wl_collocation_extractor.py @@ -1174,7 +1174,7 @@ def run(self): test_stats = [None] * num_collocations_all p_vals = [None] * num_collocations_all else: - if test_statistical_significance == 'z_score_berry_rogghe': + if test_statistical_significance == 'z_test_berry_rogghe': test_stats, p_vals = func_statistical_significance(self.main, o11s, o12s, o21s, o22s, span) else: test_stats, p_vals = func_statistical_significance(self.main, o11s, o12s, o21s, o22s) diff --git a/wordless/wl_measures/wl_measures_adjusted_freq.py b/wordless/wl_measures/wl_measures_adjusted_freq.py index f7f020a1d..eff4fabd4 100644 --- a/wordless/wl_measures/wl_measures_adjusted_freq.py +++ b/wordless/wl_measures/wl_measures_adjusted_freq.py @@ -27,7 +27,7 @@ C = -scipy.special.digamma(1) # Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. 
https://doi.org/10.1076/jqul.9.3.215.14124 -# Average Logarithmic Distance +# Average logarithmic distance def fald(main, tokens, search_term): dists = wl_measures_dispersion._get_dists(tokens, search_term) @@ -39,11 +39,11 @@ def fald(main, tokens, search_term): return fald -# Average Reduced Frequency +# Average reduced frequency def farf(main, tokens, search_term): return wl_measures_dispersion.arf(main, tokens, search_term) -# Average Waiting Time +# Average waiting time def fawt(main, tokens, search_term): dists = wl_measures_dispersion._get_dists(tokens, search_term) diff --git a/wordless/wl_measures/wl_measures_bayes_factor.py b/wordless/wl_measures/wl_measures_bayes_factor.py index cef94b15c..d144fb1d6 100644 --- a/wordless/wl_measures/wl_measures_bayes_factor.py +++ b/wordless/wl_measures/wl_measures_bayes_factor.py @@ -20,8 +20,8 @@ from wordless.wl_measures import wl_measures_statistical_significance, wl_measure_utils -# Log-likelihood Ratio -# Reference: Wilson, A. (2013). Embracing Bayes Factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), New Approaches to the Study of Linguistic Variability (pp. 3–11). Peter Lang. +# Log-likelihood ratio test +# Reference: Wilson, A. (2013). Embracing Bayes factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), New approaches to the study of linguistic variability (pp. 3–11). Peter Lang. def bayes_factor_log_likelihood_ratio_test(main, o11s, o12s, o21s, o22s): oxxs = o11s + o12s + o21s + o22s @@ -38,7 +38,7 @@ def bayes_factor_log_likelihood_ratio_test(main, o11s, o12s, o21s, o22s): return bics # Student's t-test (2-sample) -# Reference: Wilson, A. (2013). Embracing Bayes Factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), New Approaches to the Study of Linguistic Variability (pp. 3–11). Peter Lang. +# Reference: Wilson, A. (2013). 
Embracing Bayes factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), New approaches to the study of linguistic variability (pp. 3–11). Peter Lang. def bayes_factor_students_t_test_2_sample(main, freqs_x1s, freqs_x2s): # Modify settings temporarily diff --git a/wordless/wl_measures/wl_measures_dispersion.py b/wordless/wl_measures/wl_measures_dispersion.py index 19283922f..5a2572e92 100644 --- a/wordless/wl_measures/wl_measures_dispersion.py +++ b/wordless/wl_measures/wl_measures_dispersion.py @@ -36,7 +36,7 @@ def _get_dists(tokens, search_term): return dists -# Average Logarithmic Distance +# Average logarithmic distance def ald(main, tokens, search_term): dists = _get_dists(tokens, search_term) @@ -47,7 +47,7 @@ def ald(main, tokens, search_term): return ald -# Average Reduced Frequency +# Average reduced frequency def arf(main, tokens, search_term): dists = _get_dists(tokens, search_term) @@ -59,7 +59,7 @@ def arf(main, tokens, search_term): return arf -# Average Waiting Time +# Average waiting time def awt(main, tokens, search_term): dists = _get_dists(tokens, search_term) @@ -121,7 +121,7 @@ def juillands_d(main, freqs): return max(0, d) # Lyne's D₃ -# Reference: Lyne, A. A. (1985). Dispersion. In The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method (pp. 101–124). Slatkine/Champion. +# Reference: Lyne, A. A. (1985). Dispersion. In A. A. Lyne (Ed.), The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method (pp. 101–124). Slatkine. def lynes_d3(main, freqs): freqs = numpy.array(freqs) @@ -146,7 +146,7 @@ def rosengrens_s(main, freqs): return s # Zhang's Distributional Consistency -# Reference: Zhang, H., Huang, C., & Yu, S. (2004). Distributional consistency: As a general method for defining a core lexicon. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. 
Silva (Eds.), Proceedings of Fourth International Conference on Language Resources and Evaluation (pp. 1119–1122). European Language Resources Association. +# Reference: Zhang, H., Huang, C., & Yu, S. (2004). Distributional Consistency: As a general method for defining a core lexicon. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of Fourth International Conference on Language Resources and Evaluation (pp. 1119–1122). European Language Resources Association. def zhangs_distributional_consistency(main, freqs): freqs = numpy.array(freqs) diff --git a/wordless/wl_measures/wl_measures_effect_size.py b/wordless/wl_measures/wl_measures_effect_size.py index daf10b37e..5073da287 100644 --- a/wordless/wl_measures/wl_measures_effect_size.py +++ b/wordless/wl_measures/wl_measures_effect_size.py @@ -23,7 +23,7 @@ from wordless.wl_measures import wl_measures_statistical_significance, wl_measure_utils # %DIFF -# Reference: Gabrielatos, C., & Marchi, A. (2012, September 13–14). Keyness: Appropriate metrics and practical issues [Conference session]. CADS International Conference 2012, University of Bologna, Italy. +# Reference: Gabrielatos, C., & Marchi, A. (2011, November 5). Keyness: Matching metrics to definitions [Conference session]. Corpus Linguistics in the South 1, University of Portsmouth, United Kingdom. https://eprints.lancs.ac.uk/id/eprint/51449/4/Gabrielatos_Marchi_Keyness.pdf def pct_diff(main, o11s, o12s, o21s, o22s): _, _, ox1s, ox2s = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) @@ -40,23 +40,21 @@ def pct_diff(main, o11s, o12s, o21s, o22s): ) ) -# Cubic Association Ratio -# References: -# Daille, B. (1994). Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid= -# Daille, B. (1995). 
Combined approach for terminology extraction: Lexical statistics and linguistic filtering. UCREL technical papers (Vol. 5). Lancaster University. +# Cubic association ratio +# Reference: Daille, B. (1994). Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid= def im3(main, o11s, o12s, o21s, o22s): e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 3, e11s)) -# Dice's Coefficient +# Dice-Sørensen coefficient # Reference: Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1), 1–38. -def dices_coeff(main, o11s, o12s, o21s, o22s): +def dice_sorensen_coeff(main, o11s, o12s, o21s, o22s): o1xs, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) return wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s) -# Difference Coefficient +# Difference coefficient # References: # Hofland, K., & Johanson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. # Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), Corpus approaches to discourse: A critical review (pp. 225–258). Routledge. @@ -72,12 +70,12 @@ def diff_coeff(main, o11s, o12s, o21s, o22s): 0 ) -# Jaccard Index -# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. arxiv.org/pdf/1207.1847.pdf +# Jaccard index +# Reference: Dunning, T. E. (1998). 
Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 def jaccard_index(main, o11s, o12s, o21s, o22s): return wl_measure_utils.numpy_divide(o11s, o11s + o12s + o21s) -# Kilgarriff's Ratio +# Kilgarriff's ratio # Reference: Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (p. 171). University of Liverpool. def kilgarriffs_ratio(main, o11s, o12s, o21s, o22s): smoothing_param = main.settings_custom['measures']['effect_size']['kilgarriffs_ratio']['smoothing_param'] @@ -87,8 +85,22 @@ def kilgarriffs_ratio(main, o11s, o12s, o21s, o22s): wl_measure_utils.numpy_divide(o12s, o12s + o22s) * 1000000 + smoothing_param ) +# logDice +# Reference: Rychlý, P. (2008). A lexicographyer-friendly association score. In P. Sojka & A. Horák (Eds.), Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing. Masaryk University +def log_dice(main, o11s, o12s, o21s, o22s): + o1xs, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) + + return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s), default = 14) + +# Log-frequency biased MD +# Reference: Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association. +def lfmd(main, o11s, o12s, o21s, o22s): + e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) + + return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s)) + wl_measure_utils.numpy_log2(o11s) + # Log Ratio -# Reference: Hardie, A. (2014, April 28). 
Log ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/ +# Reference: Hardie, A. (2014, April 28). Log Ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/ def log_ratio(main, o11s, o12s, o21s, o22s): _, _, ox1s, ox2s = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) @@ -107,30 +119,16 @@ def log_ratio(main, o11s, o12s, o21s, o22s): ) ) -# Log-Frequency Biased MD -# Reference: Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association. -def lfmd(main, o11s, o12s, o21s, o22s): - e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) - - return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s)) + wl_measure_utils.numpy_log2(o11s) - -# logDice -# Reference: Rychlý, P. (2008). A lexicographyer-friendly association score. In P. Sojka & A. Horák (Eds.), Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing. Masaryk University -def log_dice(main, o11s, o12s, o21s, o22s): - o1xs, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) - - return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s), default = 14) - # MI.log-f # References: -# Lexical Computing. (2015, July 8). Statistics used in Sketch Engine. Sketch Engine. https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/ # Kilgarriff, A., & Tugwell, D. (2002). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. 
In Proceedings of the 8th Machine Translation Summit (pp. 187–190). European Association for Machine Translation. +# Lexical Computing. (2015, July 8). Statistics used in Sketch Engine. Sketch Engine. https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/ def mi_log_f(main, o11s, o12s, o21s, o22s): e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s)) * wl_measure_utils.numpy_log(o11s + 1) -# Minimum Sensitivity +# Minimum sensitivity # Reference: Pedersen, T. (1998). Dependent bigram identification. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (p. 1197). AAAI Press. def min_sensitivity(main, o11s, o12s, o21s, o22s): o1xs, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) @@ -154,8 +152,8 @@ def me(main, o11s, o12s, o21s, o22s): return o11s * wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s) -# Mutual Information -# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. arxiv.org/pdf/1207.1847.pdf +# Mutual information +# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 def mi(main, o11s, o12s, o21s, o22s): oxxs = o11s + o12s + o21s + o22s e11s, e12s, e21s, e22s = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) @@ -167,7 +165,7 @@ def mi(main, o11s, o12s, o21s, o22s): return mi_11 + mi_12 + mi_21 + mi_22 -# Odds Ratio +# Odds ratio # Reference: Pojanapunya, P., & Todd, R. W. (2016). Log-likelihood and odds ratio keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory, 15(1), 133–167. 
https://doi.org/10.1515/cllt-2015-0030 def odds_ratio(main, o11s, o12s, o21s, o22s): return numpy.where( @@ -183,14 +181,14 @@ def odds_ratio(main, o11s, o12s, o21s, o22s): ) ) -# Pointwise Mutual Information +# Pointwise mutual information # Reference: Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29. def pmi(main, o11s, o12s, o21s, o22s): e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s, e11s)) -# Poisson Collocation Measure +# Poisson collocation measure # Reference: Quasthoff, U., & Wolff, C. (2002). The poisson collocation measure and its applications. Proceedings of 2nd International Workshop on Computational Approaches to Collocations. IEEE. def poisson_collocation_measure(main, o11s, o12s, o21s, o22s): oxxs = o11s + o12s + o21s + o22s @@ -201,7 +199,14 @@ def poisson_collocation_measure(main, o11s, o12s, o21s, o22s): wl_measure_utils.numpy_log(oxxs) ) -# Squared Phi Coefficient +# Squared association ratio +# Reference: Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. UCREL technical papers (Vol. 5). Lancaster University. +def im2(main, o11s, o12s, o21s, o22s): + e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s) + + return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s)) + +# Squared phi coefficient # Reference: Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom. 
def squared_phi_coeff(main, o11s, o12s, o21s, o22s): o1xs, o2xs, ox1s, ox2s = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s) diff --git a/wordless/wl_measures/wl_measures_lexical_density_diversity.py b/wordless/wl_measures/wl_measures_lexical_density_diversity.py index 7ca417647..e0491c5cb 100644 --- a/wordless/wl_measures/wl_measures_lexical_density_diversity.py +++ b/wordless/wl_measures/wl_measures_lexical_density_diversity.py @@ -29,7 +29,7 @@ _tr = QCoreApplication.translate -# Brunét's Index +# Brunét's index # References: # Brunét, E. (1978). Le vocabulaire de Jean Giraudoux: Structure et evolution. Slatkine. # Bucks, R. S., Singh, S., Cuerden, J. M., & Wilcock, G. K. (2000). Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance. Aphasiology, 14(1), 71–91. https://doi.org/10.1080/026870300401603 @@ -66,7 +66,7 @@ def fishers_index_of_diversity(main, text): return alpha -# Herdan's Vₘ +# Herdan's vₘ # Reference: Herdan, G. (1955). A new derivation and interpretation of Yule's ‘Characteristic’ K. Zeitschrift für Angewandte Mathematik und Physik (ZAMP), 6(4), 332–339. https://doi.org/10.1007/BF01587632 def herdans_vm(main, text): types_freqs = collections.Counter(text.get_tokens_flat()) @@ -99,7 +99,7 @@ def hdd(main, text): return sum(ttrs) -# Honoré's Statistic +# Honoré's statistic # References: # Honoré, A. (1979). Some simple measures of richness of vocabulary. Association of Literary and Linguistic Computing Bulletin, 7(2), 172–177. # Bucks, R. S., Singh, S., Cuerden, J. M., & Wilcock, G. K. (2000). Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance. Aphasiology, 14(1), 71–91. https://doi.org/10.1080/026870300401603 @@ -114,8 +114,8 @@ def honores_stat(main, text): return r -# Lexical Density -# Reference: Halliday, M. A. K. (1989). 
Spoken and written language (2nd ed., p. 64). +# Lexical density +# Reference: Halliday, M. A. K. (1989). Spoken and written language (2nd ed., p. 64). Oxford University Press. def lexical_density(main, text): if text.lang in main.settings_global['pos_taggers']: wl_pos_tagging.wl_pos_tag_universal(main, text.get_tokens_flat(), lang = text.lang, tagged = text.tagged) @@ -164,7 +164,7 @@ def logttr(main, text): return logttr -# Mean Segmental TTR +# Mean segmental TTR # References: # Johnson, W. (1944). Studies in language behavior: I. a program of research. Psychological Monographs, 56(2), 1–15. https://doi.org/10.1037/h0093508 # McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [Doctoral dissertation, The University of Memphis] (p. 37). ProQuest Dissertations and Theses Global. @@ -185,7 +185,7 @@ def msttr(main, text): return msttr -# Measure of Textual Lexical Diversity +# Measure of textual lexical diversity # References: # McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [Doctoral dissertation, The University of Memphis] (pp. 95–96, 99–100). ProQuest Dissertations and Theses Global. # McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. https://doi.org/10.3758/BRM.42.2.381 @@ -388,7 +388,7 @@ def popescus_r4(main, text): return r4 -# Repeat Rate +# Repeat rate # Reference: Popescu, I.-I. (2009). Word frequency studies (p. 166). Mouton de Gruyter. def repeat_rate(main, text): use_data = main.settings_custom['measures']['lexical_density_diversity']['repeat_rate']['use_data'] @@ -408,12 +408,12 @@ def repeat_rate(main, text): # Root TTR # References: -# Guiraud, P. (1954). 
Les caractères statistiques du vocabulaire: Essai de méthodologie. Presses universitaires de France. +# Guiraud, P. (1954). Les caractères statistiques du vocabulaire: Essai de méthodologie. Presses Universitaires de France. # Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment (p. 26). Palgrave Macmillan. def rttr(main, text): return text.num_types / numpy.sqrt(text.num_tokens) -# Shannon Entropy +# Shannon entropy # Reference: Popescu, I.-I. (2009). Word frequency studies (p. 173). Mouton de Gruyter. def shannon_entropy(main, text): use_data = main.settings_custom['measures']['lexical_density_diversity']['shannon_entropy']['use_data'] @@ -432,7 +432,7 @@ def shannon_entropy(main, text): return h # Simpson's l -# Reference: Simpson, E. H. (1949). Measurement of diversity. Nature, 163, p. 688. https://doi.org/10.1038/163688a0 +# Reference: Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688. https://doi.org/10.1038/163688a0 def simpsons_l(main, text): types_freqs = collections.Counter(text.get_tokens_flat()) freqs_nums_types = collections.Counter(types_freqs.values()) @@ -444,7 +444,7 @@ def simpsons_l(main, text): return l -# Type-token Ratio +# Type-token ratio # Reference: Johnson, W. (1944). Studies in language behavior: I. a program of research. Psychological Monographs, 56(2), 1–15. https://doi.org/10.1037/h0093508 def ttr(main, text): return text.num_types / text.num_tokens @@ -479,7 +479,7 @@ def ttr(n, d): return popt[0] -# Yule's Characteristic K +# Yule's characteristic K # Reference: Yule, G. U. (1944). The statistical study of literary vocabulary (pp. 52–53). Cambridge University Press. 
def yules_characteristic_k(main, text): types_freqs = collections.Counter(text.get_tokens_flat()) diff --git a/wordless/wl_measures/wl_measures_readability.py b/wordless/wl_measures/wl_measures_readability.py index df6af4392..d5faaf697 100644 --- a/wordless/wl_measures/wl_measures_readability.py +++ b/wordless/wl_measures/wl_measures_readability.py @@ -182,7 +182,7 @@ def get_num_sentences_sample(text, sample, sample_start): + 1 ) -# Al-Heeti's Readability Prediction Formula +# Al-Heeti's readability formula # Reference: Al-Heeti, K. N. (1984). Judgment analysis technique applied to readability prediction of Arabic reading material [Doctoral dissertation, University of Northern Colorado] (pp. 102, 104, 106). ProQuest Dissertations and Theses Global. def rd(main, text): if text.lang == 'ara': @@ -256,7 +256,7 @@ def ari(main, text): return ari -# Bormuth's Cloze Mean & Grade Placement +# Bormuth's cloze mean & grade placement # Reference: Bormuth, J. R. (1969). Development of readability analyses (pp. 152, 160). U.S. Department of Health, Education, and Welfare. http://files.eric.ed.gov/fulltext/ED029166.pdf def bormuths_cloze_mean(main, text): if text.lang.startswith('eng_'): @@ -294,7 +294,7 @@ def bormuths_gp(main, text): return gp -# Coleman-Liau Index +# Coleman-Liau index # Reference: Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284. https://doi.org/10.1037/h0076540 def coleman_liau_index(main, text): text = get_nums(main, text) @@ -311,7 +311,7 @@ def coleman_liau_index(main, text): return grade_level -# Coleman's Readability Formula +# Coleman's readability formula # Reference: Liau, T. L., Bassin, C. B., Martin, C. J., & Coleman, E. B. (1976). Modification of the Coleman readability formulas. Journal of Reading Behavior, 8(4), 381–386. 
https://journals.sagepub.com/doi/pdf/10.1080/10862967609547193 def colemans_readability_formula(main, text): variant = main.settings_custom['measures']['readability']['colemans_readability_formula']['variant'] @@ -374,7 +374,26 @@ def colemans_readability_formula(main, text): return cloze_pct -# Dale-Chall Readability Formula +# Crawford's readability formula +# Reference: Crawford, A. N. (1985). Fórmula y gráfico para determinar la comprensibilidad de textos de nivel primario en castellano. Lectura y Vida, 6(4). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a6n4/06_04_Crawford.pdf +def crawfords_readability_formula(main, text): + if text.lang == 'spa' and text.lang in main.settings_global['syl_tokenizers']: + text = get_nums(main, text) + + if text.num_words: + grade_level = ( + text.num_sentences / text.num_words * 100 * (-0.205) + + text.num_syls / text.num_words * 100 * 0.049 + - 3.407 + ) + else: + grade_level = 'text_too_short' + else: + grade_level = 'no_support' + + return grade_level + +# Dale-Chall readability formula # References: # Dale, E., & Chall, J. S. (1948a). A formula for predicting readability. Educational Research Bulletin, 27(1), 11–20, 28. # Dale, E., & Chall, J. S. (1948b). A formula for predicting readability: Instructions. Educational Research Bulletin, 27(2), 37–54. @@ -382,7 +401,7 @@ def colemans_readability_formula(main, text): # Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105. https://doi.org/10.1037/h0043254 # New: # Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Brookline Books. -# 清川英男. (1996). CHALL, J. S. and DALE, E.(1995) Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books. 教育メディア研究, 3(1), 59. https://www.jstage.jst.go.jp/article/jaems/3/1/3_KJ00009004543/_pdf +# 清川英男. (1996). CHALL, J. S. 
and DALE, E.(1995) Readability revisited: The new Dale-Chall readability formula. Brookline Books. 教育メディア研究, 3(1), 59. https://www.jstage.jst.go.jp/article/jaems/3/1/3_KJ00009004543/_pdf def x_c50(main, text): if text.lang.startswith('eng_'): text = get_nums(main, text) @@ -416,7 +435,7 @@ def x_c50(main, text): return x_c50 -# Danielson-Bryan's Readability Formula +# Danielson-Bryan's readability formula # Reference: Danielson, W. A., & Bryan, S. D. (1963). Computer automation of two readability formulas. Journalism Quarterly, 40(2), 201–206. https://doi.org/10.1177/107769906304000207 def danielson_bryans_readability_formula(main, text): text = get_nums(main, text) @@ -441,10 +460,10 @@ def danielson_bryans_readability_formula(main, text): return danielson_bryan -# Dawood's Readability Formula +# Dawood's readability formula # References: # Dawood, B.A.K. (1977). The relationship between readability and selected language variables [Unpublished master’s thesis]. University of Baghdad. -# Cavalli-Sforza, V., Saddiki, H., & Nassiri, N. (2018). Arabic readability research—Current state and future directions. Procedia Computer Science, 142, 38–49. +# Cavalli-Sforza, V., Saddiki, H., & Nassiri, N. (2018). Arabic readability research: Current state and future directions. Procedia Computer Science, 142, 38–49. def dawoods_readability_formula(main, text): if text.lang == 'ara': text = get_nums(main, text) @@ -527,7 +546,7 @@ def elf(main, text): return elf -# Flesch-Kincaid Grade Level +# Flesch-Kincaid grade level # Reference: Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75, p. 14). Naval Air Station Memphis. 
https://apps.dtic.mil/sti/pdfs/ADA006655.pdf def gl(main, text): if text.lang in main.settings_global['syl_tokenizers']: @@ -546,18 +565,18 @@ def gl(main, text): return gl -# Flesch Reading Ease +# Flesch reading ease # References: # Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233. https://doi.org/10.1037/h0057532 # Powers-Sumner-Kearl: # Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105. https://doi.org/10.1037/h0043254 # Dutch (Douma): -# Douma, W. H. (1960). De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules [Readability of Dutch farm papers: A discussion and application of readability-formulas] (p. 453). Afdeling sociologie en sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 +# Douma, W. H. (1960). De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules [Readability of Dutch farm papers: A discussion and application of readability-formulas] (p. 453). Afdeling Sociologie en Sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 # Dutch (Brouwer's Leesindex A): -# Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheid van Nederlands proza. Paedagogische studiën, 40, 454–464. https://objects.library.uu.nl/reader/index.php?obj=1874-205260&lan=en +# Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheid van Nederlands proza. Paedagogische Studiën, 40, 454–464. https://objects.library.uu.nl/reader/index.php?obj=1874-205260&lan=en # French: -# Kandel, L., & Moles A. (1958). Application de l’indice de flesch la langue francaise [applying flesch index to french language]. The Journal of Educational Research, 21, 283–287. -# Kopient, A., & Grabar, N. (2020). Rated lexicon for the simplification of medical texts. In B. 
Gersbeck-Schierholz (ed.), HEALTHINFO 2020: The fifth international conference on informatics and assistive technologies for health-care, medical support and wellbeing (pp. 11–17). IARIA. https://hal.science/hal-03095275/document +# Kandel, L., & Moles, A. (1958). Application de l’indice de flesch à la langue française. The Journal of Educational Research, 21, 283–287. +# Sitbon, L., Bellot, P., & Blache, P. (2007). Eléments pour adapter les systèmes de recherche d’information aux dyslexiques. Revue TAL : traitement automatique des langues, 48(2), 123–147. # German: # Amstad, T. (1978). Wie verständlich sind unsere Zeitungen? [Unpublished doctoral dissertation]. University of Zurich. # Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (p. 56). Jugend und Volk. @@ -656,7 +675,7 @@ def re_flesch(main, text): return re -# Flesch Reading Ease (Farr-Jenkins-Paterson) +# Flesch reading ease (Farr-Jenkins-Paterson) # References: # Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. Journal of Applied Psychology, 35(5), 333–337. https://doi.org/10.1037/h0062427 # Powers-Sumner-Kearl: @@ -687,7 +706,7 @@ def re_farr_jenkins_paterson(main, text): return re -# FORCAST Grade Level +# FORCAST # Reference: Caylor, J. S., & Sticht, T. G. (1973). Development of a simple readability index for job reading material (p. 3). Human Resource Research Organization. https://ia902703.us.archive.org/31/items/ERIC_ED076707/ERIC_ED076707.pdf def rgl(main, text): if text.lang in main.settings_global['syl_tokenizers']: @@ -706,49 +725,9 @@ def rgl(main, text): return rgl -# Fórmula de Comprensibilidad de Gutiérrez de Polini -# References: -# Gutiérrez de Polini, L. E. (1972). Investigación sobre lectura en Venezuela [Paper presentation]. Primeras Jornadas de Educación Primaria, Ministerio de Educación, Caracas, Venezuela. -# Rodríguez Trujillo, N. (1980). 
Determinación de la comprensibilidad de materiales de lectura por medio de variables lingüísticas. Lectura y Vida, 1(1). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a1n1/01_01_Rodriguez.pdf -def cp(main, text): - if text.lang == 'spa': - text = get_nums(main, text) - - if text.num_words and text.num_sentences: - cp = ( - 95.2 - - 9.7 * (text.num_chars_alpha / text.num_words) - - 0.35 * (text.num_words / text.num_sentences) - ) - else: - cp = 'text_too_short' - else: - cp = 'no_support' - - return cp - -# Fórmula de Crawford -# Reference: Crawford, A. N. (1985). Fórmula y gráfico para determinar la comprensibilidad de textos de nivel primario en castellano. Lectura y Vida, 6(4). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a6n4/06_04_Crawford.pdf -def formula_de_crawford(main, text): - if text.lang == 'spa' and text.lang in main.settings_global['syl_tokenizers']: - text = get_nums(main, text) - - if text.num_words: - grade_level = ( - text.num_sentences / text.num_words * 100 * (-0.205) - + text.num_syls / text.num_words * 100 * 0.049 - - 3.407 - ) - else: - grade_level = 'text_too_short' - else: - grade_level = 'no_support' - - return grade_level - # Fucks's Stilcharakteristik # References: -# Fucks, W. (1955). Unterschied des Prosastils von Dichtern und anderen Schriftstellern: ein Beispiel mathematischer Stilanalyse. Bouvier. +# Fucks, W. (1955). Unterschied des prosastils von dichtern und anderen schriftstellern: Ein beispiel mathematischer stilanalyse. Bouvier. # Briest, W. (1974). Kann man Verständlichkeit messen? STUF - Language Typology and Universals, 27(1-3), 543–563. https://doi.org/10.1524/stuf.1974.27.13.543 def fuckss_stilcharakteristik(main, text): if text.lang in main.settings_global['syl_tokenizers']: @@ -763,25 +742,25 @@ def fuckss_stilcharakteristik(main, text): return stilcharakteristik -# Gulpease Index +# GULPEASE # References: # Lucisano, P., & Emanuela Piemontese, M. (1988). 
GULPEASE: A formula for the prediction of the difficulty of texts in Italian. Scuola e Città, 39(3), 110–124. -# Indice Gulpease. (2021, July 9). In Wikipedia.https://it.wikipedia.org/w/index.php?title=Indice_Gulpease&oldid=121763335. -def gulpease_index(main, text): +# Indice Gulpease. (2021, July 9). In Wikipedia. https://it.wikipedia.org/w/index.php?title=Indice_Gulpease&oldid=121763335. +def gulpease(main, text): if text.lang == 'ita': text = get_nums(main, text) if text.num_words: - gulpease_index = ( + gulpease = ( 89 + (300 * text.num_sentences - 10 * text.num_chars_alpha) / text.num_words ) else: - gulpease_index = 'text_too_short' + gulpease = 'text_too_short' else: - gulpease_index = 'no_support' + gulpease = 'no_support' - return gulpease_index + return gulpease # Gunning Fog Index # References: @@ -863,6 +842,27 @@ def fog_index(main, text): return fog_index +# Gutiérrez de Polini's readability formula +# References: +# Gutiérrez de Polini, L. E. (1972). Investigación sobre lectura en Venezuela [Paper presentation]. Primeras Jornadas de Educación Primaria, Ministerio de Educación, Caracas, Venezuela. +# Rodríguez Trujillo, N. (1980). Determinación de la comprensibilidad de materiales de lectura por medio de variables lingüísticas. Lectura y Vida, 1(1). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a1n1/01_01_Rodriguez.pdf +def cp(main, text): + if text.lang == 'spa': + text = get_nums(main, text) + + if text.num_words and text.num_sentences: + cp = ( + 95.2 + - 9.7 * (text.num_chars_alpha / text.num_words) + - 0.35 * (text.num_words / text.num_sentences) + ) + else: + cp = 'text_too_short' + else: + cp = 'no_support' + + return cp + # Legibilidad µ # Reference: Muñoz Baquedano, M. (2006). Legibilidad y variabilidad de los textos. Boletín de Investigación Educacional, Pontificia Universidad Católica de Chile, 21(2), 13–26. 
def mu(main, text): @@ -888,9 +888,9 @@ def mu(main, text): return mu -# Lensear Write +# Lensear Write Formula # Reference: O’Hayre, J. (1966). Gobbledygook has gotta go (p. 8). U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf -def lensear_write(main, text): +def lensear_write_formula(main, text): if text.lang.startswith('eng_') and text.lang in main.settings_global['syl_tokenizers']: text = get_nums(main, text) @@ -950,7 +950,7 @@ def lix(main, text): # Lorge, I. (1948). The Lorge and Flesch readability formulae: A correction. School and Society, 67, 141–142. # DuBay, W. H. (2006). In W. H. DuBay (Ed.), The classic readability studies (pp. 46–60). Impact Information. https://files.eric.ed.gov/fulltext/ED506404.pdf def lorge_readability_index(main, text): - if text.lang in main.settings_global['pos_taggers']: + if text.lang.startswith('eng_'): text = get_nums(main, text) if text.num_sentences and text.num_words: @@ -986,7 +986,7 @@ def lorge_readability_index(main, text): return lorge -# Luong-Nguyen-Dinh's Readability Formula +# Luong-Nguyen-Dinh's readability formula # Reference: Luong, A.-V., Nguyen, D., & Dinh, D. (2018). A new formula for Vietnamese text readability assessment. 2018 10th International Conference on Knowledge and Systems Engineering (KSE) (pp. 198–202). IEEE. https://doi.org/10.1109/KSE.2018.8573379 def luong_nguyen_dinhs_readability_formula(main, text): if text.lang == 'vie': @@ -1010,7 +1010,7 @@ def luong_nguyen_dinhs_readability_formula(main, text): return readability # McAlpine EFLAW Readability Score -# Reference: Nirmaldasan. (2009, April 30). McAlpine EFLAW readability score. Readability Monitor. Retrieved November 15, 2022, from https://strainindex.wordpress.com/2009/04/30/mcalpine-eflaw-readability-score/ +# Reference: McAlpine, R. (2006). From plain English to global English. Journalism Online. 
Retrieved October 31, 2024, from https://www.angelfire.com/nd/nirmaldasan/journalismonline/fpetge.html def eflaw(main, text): if text.lang.startswith('eng_'): text = get_nums(main, text) @@ -1169,12 +1169,12 @@ def rix(main, text): return rix -# SMOG Grade +# SMOG Grading # References: -# McLaughlin, G. H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8), 639–646. +# McLaughlin, G. H. (1969). SMOG Grading: A new readability formula. Journal of Reading, 12(8), 639–646. # German: # Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (p. 78). Jugend und Volk. -def smog_grade(main, text): +def smog_grading(main, text): if text.lang in main.settings_global['syl_tokenizers']: text = get_nums(main, text) @@ -1206,14 +1206,15 @@ def smog_grade(main, text): return g -# Spache Grade Level +# Spache readability formula # References: # Spache, G. (1953). A new readability formula for primary-grade reading materials. Elementary School Journal, 53(7), 410–413. https://doi.org/10.1086/458513 +# Revised: # Spache, G. (1974). Good reading for poor readers (Rev. 9th ed.). Garrard. # Michalke, M., Brown, E., Mirisola, A., Brulet, A., & Hauser, L. (2021, May 17). Measure readability. Documentation for package ‘koRpus’ version 0.13-8. Retrieved August 3, 2023, from https://search.r-project.org/CRAN/refmans/koRpus/html/readability-methods.html # Spache word list: # Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2020, November 17). data_char_wordlists.rda. quanteda.textstats. 
Retrieved August 3, 2023, from https://github.com/quanteda/quanteda.textstats/raw/master/data/data_char_wordlists.rda -def spache_grade_lvl(main, text): +def spache_readability_formula(main, text): if text.lang.startswith('eng_'): text = get_nums(main, text) @@ -1227,7 +1228,7 @@ def spache_grade_lvl(main, text): num_sentences = get_num_sentences_sample(text, sample, sample_start) - if main.settings_custom['measures']['readability']['spache_grade_lvl']['use_rev_formula']: + if main.settings_custom['measures']['readability']['spache_readability_formula']['use_rev_formula']: num_difficult_words = get_num_words_outside_list(sample, wordlist = 'spache') grade_lvls.append( 0.121 * (100 / num_sentences) @@ -1252,8 +1253,8 @@ def spache_grade_lvl(main, text): # Strain Index # References: -# Solomon, N. W. (2006). Qualitative analysis of media language [Unpublished doctoral dissertation]. Madurai Kamaraj University. -# Nirmaldasan. (2007, September 25). Strain index: A new readability formula. Readability Monitor. Retrieved August 3, 2023, from https://strainindex.wordpress.com/2007/09/25/hello-world/ +# Nathaniel, W. S. (2017). A quantitative analysis of media language [Master’s thesis, Madurai Kamaraj University]. LAMBERT Academic Publishing. +# Nirmaldasan. (2007, July). Strain Index: A new readability formula. Journalism Online. Retrieved October 31, 2024, from https://www.angelfire.com/nd/nirmaldasan/readability/si.html def strain_index(main, text): if text.lang in main.settings_global['syl_tokenizers']: text = get_nums(main, text) @@ -1277,9 +1278,9 @@ def strain_index(main, text): return strain_index -# Tränkle & Bailer's Readability Formula +# Tränkle-Bailer's readability formula # References: -# Tränkle, U., & Bailer, H. (1984). Kreuzvalidierung und Neuberechnung von Lesbarkeitsformeln für die Deutsche Sprache [Cross-validation and recalculation of the readability formulas for the German language]. 
Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 16(3), 231–244. +# Tränkle, U., & Bailer, H. (1984). Kreuzvalidierung und neuberechnung von lesbarkeitsformeln für die Deutsche sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 16(3), 231–244. # Benoit, K. (2020, November 24). Calculate readability. quanteda: Quantitative Analysis of Textual Data. Retrieved August 3, 2023, from https://quanteda.io/reference/textstat_readability.html def trankle_bailers_readability_formula(main, text): if text.lang in main.settings_global['pos_taggers']: @@ -1322,9 +1323,9 @@ def trankle_bailers_readability_formula(main, text): return trankle_bailers -# Tuldava's Text Difficulty +# Tuldava's readability formula # References: -# Tuldava, J. (1975). Ob izmerenii trudnosti tekstov [On measuring the complexity of the text]. Uchenye zapiski Tartuskogo universiteta. Trudy po metodike prepodavaniya inostrannykh yazykov, 345, 102–120. +# Tuldava, J. (1975). Ob izmerenii trudnosti tekstov. Uchenye zapiski Tartuskogo universiteta. Trudy po metodike prepodavaniya inostrannykh yazykov, 345, 102–120. # Grzybek, P. (2010). Text difficulty and the Arens-Altmann law. In P. Grzybek, E. Kelih, & J. Mačutek (eds.), Text and language: Structures · functions · interrelations quantitative perspectives. Praesens Verlag. https://www.iqla.org/includes/basic_references/qualico_2009_proceedings_Grzybek_Kelih_Macutek_2009.pdf def td(main, text): if text.lang in main.settings_global['syl_tokenizers']: @@ -1342,7 +1343,7 @@ def td(main, text): return td -# Wheeler & Smith's Readability Formula +# Wheeler-Smith's readability formula # Reference: Wheeler, L. R., & Smith, E. H. (1954). A practical readability formula for the classroom teacher in the primary grades. Elementary English, 31(7), 397–399. 
UNIT_TERMINATORS = ''.join(list(wl_sentence_tokenization.SENTENCE_TERMINATORS) + list(dict.fromkeys([ # Colons and semicolons: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:name=/COLON/:]%26[:General_Category=/Punctuation/:] diff --git a/wordless/wl_measures/wl_measures_statistical_significance.py b/wordless/wl_measures/wl_measures_statistical_significance.py index 512414a4b..39aab6407 100644 --- a/wordless/wl_measures/wl_measures_statistical_significance.py +++ b/wordless/wl_measures/wl_measures_statistical_significance.py @@ -70,7 +70,7 @@ def get_alt(direction): return alt -# Fisher's Exact Test +# Fisher's exact test # References: Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference (pp. 188–200). The South–Central Regional SAS Users' Group. def fishers_exact_test(main, o11s, o12s, o21s, o22s): settings = main.settings_custom['measures']['statistical_significance']['fishers_exact_test'] @@ -85,7 +85,7 @@ def fishers_exact_test(main, o11s, o12s, o21s, o22s): return [None] * len(p_vals), p_vals -# Log-likelihood Ratio +# Log-likelihood ratio test # References: Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74. def log_likelihood_ratio_test(main, o11s, o12s, o21s, o22s): settings = main.settings_custom['measures']['statistical_significance']['log_likelihood_ratio_test'] @@ -108,7 +108,7 @@ def log_likelihood_ratio_test(main, o11s, o12s, o21s, o22s): return gs, p_vals -# Mann-Whitney U Test +# Mann-Whitney U test # References: Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 232–263. 
https://doi.org/10.1075/ijcl.6.1.05kil def mann_whitney_u_test(main, freqs_x1s, freqs_x2s): settings = main.settings_custom['measures']['statistical_significance']['mann_whitney_u_test'] @@ -129,10 +129,10 @@ def mann_whitney_u_test(main, freqs_x1s, freqs_x2s): return u1s, p_vals -# Pearson's Chi-squared Test +# Pearson's chi-squared test # References: # Hofland, K., & Johanson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. -# Oakes, M. P. (1998). Statistics for Corpus Linguistics. Edinburgh University Press. +# Oakes, M. P. (1998). Statistics for corpus linguistics. Edinburgh University Press. def pearsons_chi_squared_test(main, o11s, o12s, o21s, o22s): settings = main.settings_custom['measures']['statistical_significance']['pearsons_chi_squared_test'] @@ -202,7 +202,7 @@ def students_t_test_2_sample(main, freqs_x1s, freqs_x2s): return t_stats, p_vals -def _z_score_p_val(z_scores, direction): +def _z_test_p_val(z_scores, direction): p_vals = numpy.empty_like(z_scores) if direction == _tr('wl_measures_statistical_significance', 'Two-tailed'): @@ -217,23 +217,23 @@ def _z_score_p_val(z_scores, direction): return p_vals -# z-score +# Z-test # References: Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), Proceedings of the symposium on statistical association methods for mechanized documentation (pp. 61–148). National Bureau of Standards. 
-def z_score(main, o11s, o12s, o21s, o22s): - settings = main.settings_custom['measures']['statistical_significance']['z_score'] +def z_test(main, o11s, o12s, o21s, o22s): + settings = main.settings_custom['measures']['statistical_significance']['z_test'] oxxs = o11s + o12s + o21s + o22s e11s, _, _, _ = get_freqs_expected(o11s, o12s, o21s, o22s) z_scores = wl_measure_utils.numpy_divide(o11s - e11s, numpy.sqrt(e11s * (1 - wl_measure_utils.numpy_divide(e11s, oxxs)))) - p_vals = _z_score_p_val(z_scores, settings['direction']) + p_vals = _z_test_p_val(z_scores, settings['direction']) return z_scores, p_vals -# z-score (Berry-Rogghe) +# Z-test (Berry-Rogghe) # References: Berry-Rogghe, G. L. M. (1973). The computation of collocations and their relevance in lexical studies. In A. J. Aiken, R. W. Bailey, & N. Hamilton-Smith (Eds.), The computer and literary studies (pp. 103–112). Edinburgh University Press. -def z_score_berry_rogghe(main, o11s, o12s, o21s, o22s, span): - settings = main.settings_custom['measures']['statistical_significance']['z_score_berry_rogghe'] +def z_test_berry_rogghe(main, o11s, o12s, o21s, o22s, span): + settings = main.settings_custom['measures']['statistical_significance']['z_test_berry_rogghe'] o1xs, o2xs, ox1s, _ = get_freqs_marginal(o11s, o12s, o21s, o22s) @@ -242,6 +242,6 @@ def z_score_berry_rogghe(main, o11s, o12s, o21s, o22s, span): es = ps * o1xs * span z_scores = wl_measure_utils.numpy_divide(o11s - es, numpy.sqrt(es * (1 - ps))) - p_vals = _z_score_p_val(z_scores, settings['direction']) + p_vals = _z_test_p_val(z_scores, settings['direction']) return z_scores, p_vals diff --git a/wordless/wl_profiler.py b/wordless/wl_profiler.py index dbc9de06b..a332c1eec 100644 --- a/wordless/wl_profiler.py +++ b/wordless/wl_profiler.py @@ -366,13 +366,14 @@ def update_gui_table(self, err_msg, text_stats_files): class Wl_Table_Profiler_Readability(Wl_Table_Profiler): def __init__(self, parent): HEADERS_READABILITY = [ - 
_tr('Wl_Table_Profiler_Readability', "Al-Heeti's Readability Prediction Formula"), + _tr('Wl_Table_Profiler_Readability', "Al-Heeti's Readability Formula"), _tr('Wl_Table_Profiler_Readability', 'Automated Arabic Readability Index'), _tr('Wl_Table_Profiler_Readability', 'Automated Readability Index'), _tr('Wl_Table_Profiler_Readability', "Bormuth's Cloze Mean"), _tr('Wl_Table_Profiler_Readability', "Bormuth's Grade Placement"), _tr('Wl_Table_Profiler_Readability', 'Coleman-Liau Index'), _tr('Wl_Table_Profiler_Readability', "Coleman's Readability Formula"), + _tr('Wl_Table_Profiler_Readability', "Crawford's Readability Formula"), _tr('Wl_Table_Profiler_Readability', 'Dale-Chall Readability Formula'), _tr('Wl_Table_Profiler_Readability', "Danielson-Bryan's Readability Formula"), _tr('Wl_Table_Profiler_Readability', "Dawood's Readability Formula"), @@ -383,14 +384,13 @@ def __init__(self, parent): _tr('Wl_Table_Profiler_Readability', 'Flesch-Kincaid Grade Level'), _tr('Wl_Table_Profiler_Readability', 'Flesch Reading Ease'), _tr('Wl_Table_Profiler_Readability', 'Flesch Reading Ease (Farr-Jenkins-Paterson)'), - _tr('Wl_Table_Profiler_Readability', 'FORCAST Grade Level'), - _tr('Wl_Table_Profiler_Readability', 'Fórmula de Comprensibilidad de Gutiérrez de Polini'), - _tr('Wl_Table_Profiler_Readability', 'Fórmula de Crawford'), + _tr('Wl_Table_Profiler_Readability', 'FORCAST'), _tr('Wl_Table_Profiler_Readability', "Fucks's Stilcharakteristik"), - _tr('Wl_Table_Profiler_Readability', 'Gulpease Index'), + _tr('Wl_Table_Profiler_Readability', 'GULPEASE'), _tr('Wl_Table_Profiler_Readability', 'Gunning Fog Index'), + _tr('Wl_Table_Profiler_Readability', "Gutiérrez de Polini's Readability Formula"), _tr('Wl_Table_Profiler_Readability', 'Legibilidad μ'), - _tr('Wl_Table_Profiler_Readability', 'Lensear Write'), + _tr('Wl_Table_Profiler_Readability', 'Lensear Write Formula'), _tr('Wl_Table_Profiler_Readability', 'Lix'), _tr('Wl_Table_Profiler_Readability', 'Lorge Readability 
Index'), _tr('Wl_Table_Profiler_Readability', "Luong-Nguyen-Dinh's Readability Formula"), @@ -399,12 +399,12 @@ def __init__(self, parent): _tr('Wl_Table_Profiler_Readability', 'neue Wiener Sachtextformel'), _tr('Wl_Table_Profiler_Readability', 'OSMAN'), _tr('Wl_Table_Profiler_Readability', 'Rix'), - _tr('Wl_Table_Profiler_Readability', 'SMOG Grade'), - _tr('Wl_Table_Profiler_Readability', 'Spache Grade Level'), + _tr('Wl_Table_Profiler_Readability', 'SMOG Grading'), + _tr('Wl_Table_Profiler_Readability', 'Spache Readability Formula'), _tr('Wl_Table_Profiler_Readability', 'Strain Index'), - _tr('Wl_Table_Profiler_Readability', "Tränkle & Bailer's Readability Formula"), - _tr('Wl_Table_Profiler_Readability', "Tuldava's Text Difficulty"), - _tr('Wl_Table_Profiler_Readability', "Wheeler & Smith's Readability Formula") + _tr('Wl_Table_Profiler_Readability', "Tränkle-Bailer's Readability Formula"), + _tr('Wl_Table_Profiler_Readability', "Tuldava's Readability Formula"), + _tr('Wl_Table_Profiler_Readability', "Wheeler-Smith's Readability Formula") ] super().__init__( @@ -589,7 +589,7 @@ def __init__(self, parent): _tr('Wl_Table_Profiler_Lexical_Density_Diversity', "Brunét's Index"), _tr('Wl_Table_Profiler_Lexical_Density_Diversity', 'Corrected TTR'), _tr('Wl_Table_Profiler_Lexical_Density_Diversity', "Fisher's Index of Diversity"), - _tr('Wl_Table_Profiler_Lexical_Density_Diversity', "Herdan's Vₘ"), + _tr('Wl_Table_Profiler_Lexical_Density_Diversity', "Herdan's vₘ"), 'HD-D', _tr('Wl_Table_Profiler_Lexical_Density_Diversity', "Honoré's Statistic"), _tr('Wl_Table_Profiler_Lexical_Density_Diversity', 'Lexical Density'), @@ -1185,6 +1185,7 @@ def run(self): wl_measures_readability.bormuths_gp(self.main, text), wl_measures_readability.coleman_liau_index(self.main, text), wl_measures_readability.colemans_readability_formula(self.main, text), + wl_measures_readability.crawfords_readability_formula(self.main, text), wl_measures_readability.x_c50(self.main, text), 
wl_measures_readability.danielson_bryans_readability_formula(self.main, text), wl_measures_readability.dawoods_readability_formula(self.main, text), @@ -1196,13 +1197,12 @@ def run(self): wl_measures_readability.re_flesch(self.main, text), wl_measures_readability.re_farr_jenkins_paterson(self.main, text), wl_measures_readability.rgl(self.main, text), - wl_measures_readability.cp(self.main, text), - wl_measures_readability.formula_de_crawford(self.main, text), wl_measures_readability.fuckss_stilcharakteristik(self.main, text), - wl_measures_readability.gulpease_index(self.main, text), + wl_measures_readability.gulpease(self.main, text), wl_measures_readability.fog_index(self.main, text), + wl_measures_readability.cp(self.main, text), wl_measures_readability.mu(self.main, text), - wl_measures_readability.lensear_write(self.main, text), + wl_measures_readability.lensear_write_formula(self.main, text), wl_measures_readability.lix(self.main, text), wl_measures_readability.lorge_readability_index(self.main, text), wl_measures_readability.luong_nguyen_dinhs_readability_formula(self.main, text), @@ -1211,8 +1211,8 @@ def run(self): wl_measures_readability.nws(self.main, text), wl_measures_readability.osman(self.main, text), wl_measures_readability.rix(self.main, text), - wl_measures_readability.smog_grade(self.main, text), - wl_measures_readability.spache_grade_lvl(self.main, text), + wl_measures_readability.smog_grading(self.main, text), + wl_measures_readability.spache_readability_formula(self.main, text), wl_measures_readability.strain_index(self.main, text), wl_measures_readability.trankle_bailers_readability_formula(self.main, text), wl_measures_readability.td(self.main, text), diff --git a/wordless/wl_settings/wl_settings_default.py b/wordless/wl_settings/wl_settings_default.py index 247a4a4cb..4b93e6352 100644 --- a/wordless/wl_settings/wl_settings_default.py +++ b/wordless/wl_settings/wl_settings_default.py @@ -2318,7 +2318,7 @@ def init_settings_default(main): 
'variant': '1' }, - 'spache_grade_lvl': { + 'spache_readability_formula': { 'use_rev_formula': True }, @@ -2407,11 +2407,11 @@ def init_settings_default(main): 'direction': _tr('wl_settings_default', 'Two-tailed') }, - 'z_score': { + 'z_test': { 'direction': _tr('wl_settings_default', 'Two-tailed') }, - 'z_score_berry_rogghe': { + 'z_test_berry_rogghe': { 'direction': _tr('wl_settings_default', 'Two-tailed') } }, diff --git a/wordless/wl_settings/wl_settings_global.py b/wordless/wl_settings/wl_settings_global.py index 5af8b99cd..89c570475 100644 --- a/wordless/wl_settings/wl_settings_global.py +++ b/wordless/wl_settings/wl_settings_global.py @@ -3546,8 +3546,7 @@ def init_settings_global(): 'zul': ['vader_zul'] }, - # Only people's names are capitalized - # Case of measure names are preserved + # Only people's names are capitalized and case of measure names are preserved as in original papers 'mapping_measures': { 'dispersion': { _tr('wl_settings_global', 'None'): 'none', @@ -3578,12 +3577,12 @@ def init_settings_global(): _tr('wl_settings_global', 'None'): 'none', _tr('wl_settings_global', "Fisher's exact test"): 'fishers_exact_test', _tr('wl_settings_global', 'Log-likelihood ratio test'): 'log_likelihood_ratio_test', - _tr('wl_settings_global', 'Mann-Whitney U Test'): 'mann_whitney_u_test', + _tr('wl_settings_global', 'Mann-Whitney U test'): 'mann_whitney_u_test', _tr('wl_settings_global', "Pearson's chi-squared test"): 'pearsons_chi_squared_test', _tr('wl_settings_global', "Student's t-test (1-sample)"): 'students_t_test_1_sample', _tr('wl_settings_global', "Student's t-test (2-sample)"): 'students_t_test_2_sample', - _tr('wl_settings_global', 'z-score'): 'z_score', - _tr('wl_settings_global', 'z-score (Berry-Rogghe)'): 'z_score_berry_rogghe' + _tr('wl_settings_global', 'Z-test'): 'z_test', + _tr('wl_settings_global', 'Z-test (Berry-Rogghe)'): 'z_test_berry_rogghe' }, 'bayes_factor': { @@ -3599,9 +3598,9 @@ def init_settings_global(): _tr('wl_settings_global', 
"Dice's coefficient"): 'dices_coeff', _tr('wl_settings_global', 'Difference coefficient'): 'diff_coeff', _tr('wl_settings_global', 'Jaccard index'): 'jaccard_index', - _tr('wl_settings_global', 'Log-frequency biased MD'): 'lfmd', _tr('wl_settings_global', "Kilgarriff's ratio"): 'kilgarriffs_ratio', 'logDice': 'log_dice', + _tr('wl_settings_global', 'Log-frequency biased MD'): 'lfmd', _tr('wl_settings_global', 'Log ratio'): 'log_ratio', 'MI.log-f': 'mi_log_f', _tr('wl_settings_global', 'Minimum sensitivity'): 'min_sensitivity', @@ -3611,6 +3610,7 @@ def init_settings_global(): _tr('wl_settings_global', 'Odds ratio'): 'or', _tr('wl_settings_global', 'Pointwise mutual information'): 'pmi', _tr('wl_settings_global', 'Poisson collocation measure'): 'poisson_collocation_measure', + _tr('wl_settings_global', 'Squared association ratio'): 'im2', _tr('wl_settings_global', 'Squared phi coefficient'): 'squared_phi_coeff' } }, @@ -3791,17 +3791,17 @@ def init_settings_global(): 'keyword_extractor': True }, - 'z_score': { + 'z_test': { 'col_text': _tr('wl_settings_global', 'z-score'), - 'func': wl_measures_statistical_significance.z_score, + 'func': wl_measures_statistical_significance.z_test, 'to_sections': False, 'collocation_extractor': True, 'keyword_extractor': True }, - 'z_score_berry_rogghe': { + 'z_test_berry_rogghe': { 'col_text': _tr('wl_settings_global', 'z-score'), - 'func': wl_measures_statistical_significance.z_score_berry_rogghe, + 'func': wl_measures_statistical_significance.z_test_berry_rogghe, 'to_sections': False, 'collocation_extractor': True, 'keyword_extractor': False @@ -3847,9 +3847,9 @@ def init_settings_global(): 'func': wl_measures_effect_size.im3 }, - 'dices_coeff': { - 'col_text': _tr('wl_settings_global', "Dice's Coefficient"), - 'func': wl_measures_effect_size.dices_coeff + 'dice_sorensen_coeff': { + 'col_text': _tr('wl_settings_global', 'Dice-Sørensen coefficient'), + 'func': wl_measures_effect_size.dice_sorensen_coeff }, 'diff_coeff': { @@ 
-3922,6 +3922,11 @@ def init_settings_global(): 'func': wl_measures_effect_size.poisson_collocation_measure }, + 'im2': { + 'col_text': 'IM²', + 'func': wl_measures_effect_size.im2 + }, + 'squared_phi_coeff': { 'col_text': 'φ2', 'func': wl_measures_effect_size.squared_phi_coeff diff --git a/wordless/wl_settings/wl_settings_measures.py b/wordless/wl_settings/wl_settings_measures.py index 5a0773446..91f00ef1f 100644 --- a/wordless/wl_settings/wl_settings_measures.py +++ b/wordless/wl_settings/wl_settings_measures.py @@ -31,8 +31,8 @@ def __init__(self, main): self.settings_default = self.main.settings_default['measures']['readability'] self.settings_custom = self.main.settings_custom['measures']['readability'] - # Al-Heeti's Readability Prediction Formula - self.group_box_rd = QGroupBox(self.tr("Al-Heeti's Readability Prediction Formula"), self) + # Al-Heeti's Readability Formula + self.group_box_rd = QGroupBox(self.tr("Al-Heeti's Readability Formula"), self) self.label_rd_variant = QLabel(self.tr('Variant:'), self) self.combo_box_rd_variant = wl_boxes.Wl_Combo_Box(self) @@ -207,16 +207,16 @@ def __init__(self, main): self.group_box_nws.layout().setColumnStretch(2, 1) - # Spache Grade Level - self.group_box_spache_grade_lvl = QGroupBox(self.tr('Spache Grade Level'), self) + # Spache Readability Formula + self.group_box_spache_readability_formula = QGroupBox(self.tr('Spache Readability Formula'), self) self.checkbox_use_rev_formula = QCheckBox(self.tr('Use revised formula'), self) - self.group_box_spache_grade_lvl.setLayout(wl_layouts.Wl_Layout()) - self.group_box_spache_grade_lvl.layout().addWidget(self.checkbox_use_rev_formula, 0, 0) + self.group_box_spache_readability_formula.setLayout(wl_layouts.Wl_Layout()) + self.group_box_spache_readability_formula.layout().addWidget(self.checkbox_use_rev_formula, 0, 0) - # Tränkle & Bailer's Readability Formula - self.group_box_trankle_bailers_readability_formula = QGroupBox(self.tr("Tränkle & Bailer's Readability Formula"), 
self) + # Tränkle-Bailer's Readability Formula + self.group_box_trankle_bailers_readability_formula = QGroupBox(self.tr("Tränkle-Bailer's Readability Formula"), self) self.label_trankle_bailers_readability_formula_variant = QLabel(self.tr('Variant:'), self) self.combo_box_trankle_bailers_readability_formula_variant = wl_boxes.Wl_Combo_Box(self) @@ -242,7 +242,7 @@ def __init__(self, main): self.layout().addWidget(self.group_box_lorge_readability_index, 9, 0) self.layout().addWidget(self.group_box_nwl, 10, 0) self.layout().addWidget(self.group_box_nws, 11, 0) - self.layout().addWidget(self.group_box_spache_grade_lvl, 12, 0) + self.layout().addWidget(self.group_box_spache_readability_formula, 12, 0) self.layout().addWidget(self.group_box_trankle_bailers_readability_formula, 13, 0) self.layout().setContentsMargins(6, 4, 6, 4) @@ -262,7 +262,7 @@ def load_settings(self, defaults = False): else: settings = copy.deepcopy(self.settings_custom) - # Al-Heeti's Readability Prediction Formula + # Al-Heeti's Readability Formula self.combo_box_rd_variant.setCurrentText(settings['rd']['variant']) # Automated Readability Index @@ -300,14 +300,14 @@ def load_settings(self, defaults = False): # neue Wiener Sachtextformel self.combo_box_nws_variant.setCurrentText(settings['nws']['variant']) - # Spache Grade Level - self.checkbox_use_rev_formula.setChecked(settings['spache_grade_lvl']['use_rev_formula']) + # Spache Readability Formula + self.checkbox_use_rev_formula.setChecked(settings['spache_readability_formula']['use_rev_formula']) - # Tränkle & Bailer's Readability Formula + # Tränkle-Bailer's Readability Formula self.combo_box_trankle_bailers_readability_formula_variant.setCurrentText(settings['trankle_bailers_readability_formula']['variant']) def apply_settings(self): - # Al-Heeti's Readability Prediction Formula + # Al-Heeti's Readability Formula self.settings_custom['rd']['variant'] = self.combo_box_rd_variant.currentText() # Automated Readability Index @@ -345,10 +345,10 @@ 
def apply_settings(self): # neue Wiener Sachtextformel self.settings_custom['nws']['variant'] = self.combo_box_nws_variant.currentText() - # Spache Grade Level - self.settings_custom['spache_grade_lvl']['use_rev_formula'] = self.checkbox_use_rev_formula.isChecked() + # Spache Readability Formula + self.settings_custom['spache_readability_formula']['use_rev_formula'] = self.checkbox_use_rev_formula.isChecked() - # Tränkle & Bailer's Readability Formula + # Tränkle-Bailer's Readability Formula self.settings_custom['trankle_bailers_readability_formula']['variant'] = self.combo_box_trankle_bailers_readability_formula_variant.currentText() return True @@ -760,33 +760,33 @@ def __init__(self, main): self.group_box_students_t_test_2_sample.layout().setColumnStretch(2, 1) - # z-score - self.group_box_z_score = QGroupBox(self.tr('z-score'), self) + # Z-test + self.group_box_z_test = QGroupBox(self.tr('Z-test'), self) ( - self.label_z_score_direction, - self.combo_box_z_score_direction + self.label_z_test_direction, + self.combo_box_z_test_direction ) = wl_widgets.wl_widgets_direction(self) - self.group_box_z_score.setLayout(wl_layouts.Wl_Layout()) - self.group_box_z_score.layout().addWidget(self.label_z_score_direction, 0, 0) - self.group_box_z_score.layout().addWidget(self.combo_box_z_score_direction, 0, 1) + self.group_box_z_test.setLayout(wl_layouts.Wl_Layout()) + self.group_box_z_test.layout().addWidget(self.label_z_test_direction, 0, 0) + self.group_box_z_test.layout().addWidget(self.combo_box_z_test_direction, 0, 1) - self.group_box_z_score.layout().setColumnStretch(2, 1) + self.group_box_z_test.layout().setColumnStretch(2, 1) - # z-score (Berry-Rogghe) - self.group_box_z_score_berry_rogghe = QGroupBox(self.tr('z-score (Berry-Rogghe)'), self) + # Z-test (Berry-Rogghe) + self.group_box_z_test_berry_rogghe = QGroupBox(self.tr('Z-test (Berry-Rogghe)'), self) ( - self.label_z_score_berry_rogghe_direction, - self.combo_box_z_score_berry_rogghe_direction + 
self.label_z_test_berry_rogghe_direction, + self.combo_box_z_test_berry_rogghe_direction ) = wl_widgets.wl_widgets_direction(self) - self.group_box_z_score_berry_rogghe.setLayout(wl_layouts.Wl_Layout()) - self.group_box_z_score_berry_rogghe.layout().addWidget(self.label_z_score_berry_rogghe_direction, 0, 0) - self.group_box_z_score_berry_rogghe.layout().addWidget(self.combo_box_z_score_berry_rogghe_direction, 0, 1) + self.group_box_z_test_berry_rogghe.setLayout(wl_layouts.Wl_Layout()) + self.group_box_z_test_berry_rogghe.layout().addWidget(self.label_z_test_berry_rogghe_direction, 0, 0) + self.group_box_z_test_berry_rogghe.layout().addWidget(self.combo_box_z_test_berry_rogghe_direction, 0, 1) - self.group_box_z_score_berry_rogghe.layout().setColumnStretch(2, 1) + self.group_box_z_test_berry_rogghe.layout().setColumnStretch(2, 1) self.setLayout(wl_layouts.Wl_Layout()) self.layout().addWidget(self.group_box_fishers_exact_test, 0, 0) @@ -795,8 +795,8 @@ def __init__(self, main): self.layout().addWidget(self.group_box_pearsons_chi_squared_test, 3, 0) self.layout().addWidget(self.group_box_students_t_test_1_sample, 4, 0) self.layout().addWidget(self.group_box_students_t_test_2_sample, 5, 0) - self.layout().addWidget(self.group_box_z_score, 6, 0) - self.layout().addWidget(self.group_box_z_score_berry_rogghe, 7, 0) + self.layout().addWidget(self.group_box_z_test, 6, 0) + self.layout().addWidget(self.group_box_z_test_berry_rogghe, 7, 0) self.layout().setContentsMargins(6, 4, 6, 4) self.layout().setRowStretch(8, 1) @@ -830,11 +830,11 @@ def load_settings(self, defaults = False): self.combo_box_students_t_test_2_sample_use_data.setCurrentText(settings['students_t_test_2_sample']['use_data']) self.combo_box_students_t_test_2_sample_direction.setCurrentText(settings['students_t_test_2_sample']['direction']) - # z-score - self.combo_box_z_score_direction.setCurrentText(settings['z_score']['direction']) + # Z-test + 
self.combo_box_z_test_direction.setCurrentText(settings['z_test']['direction']) - # z-score (Berry-Rogghe) - self.combo_box_z_score_berry_rogghe_direction.setCurrentText(settings['z_score_berry_rogghe']['direction']) + # Z-test (Berry-Rogghe) + self.combo_box_z_test_berry_rogghe_direction.setCurrentText(settings['z_test_berry_rogghe']['direction']) def apply_settings(self): # Fisher's Exact Test @@ -860,11 +860,11 @@ def apply_settings(self): self.settings_custom['students_t_test_2_sample']['use_data'] = self.combo_box_students_t_test_2_sample_use_data.currentText() self.settings_custom['students_t_test_2_sample']['direction'] = self.combo_box_students_t_test_2_sample_direction.currentText() - # z-score - self.settings_custom['z_score']['direction'] = self.combo_box_z_score_direction.currentText() + # Z-test + self.settings_custom['z_test']['direction'] = self.combo_box_z_test_direction.currentText() - # z-score (Berry-Rogghe) - self.settings_custom['z_score_berry_rogghe']['direction'] = self.combo_box_z_score_berry_rogghe_direction.currentText() + # Z-test (Berry-Rogghe) + self.settings_custom['z_test_berry_rogghe']['direction'] = self.combo_box_z_test_berry_rogghe_direction.currentText() return True
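Note on the renamed `z_test`: the statistic in this diff is `(o11 - e11) / sqrt(e11 * (1 - e11 / oxx))`, computed over NumPy arrays. The sketch below restates it for a single 2×2 contingency table using only the standard library. It is a minimal illustration, not the project's implementation: the name `z_test_sketch` is hypothetical, the expected frequency is assumed to be the usual independence estimate (row total × column total / grand total, which is what `get_freqs_expected` is presumed to return), the two-tailed p-value is assumed to come from the standard normal distribution, and returning 0 on a zero denominator is an assumption about how `wl_measure_utils.numpy_divide` behaves.

```python
import math


def z_test_sketch(o11, o12, o21, o22):
    """Return (z, two-tailed p) for one 2x2 contingency table.

    Hypothetical helper for illustration only; the diff's z_test works on
    NumPy arrays and reads the test direction from the settings.
    """
    oxx = o11 + o12 + o21 + o22

    if oxx == 0:
        return 0.0, 1.0

    # Assumption: expected frequency of the top-left cell under independence
    e11 = (o11 + o12) * (o11 + o21) / oxx

    # Same expression as in the diff: sqrt(e11 * (1 - e11 / oxx))
    denom = math.sqrt(e11 * (1 - e11 / oxx))

    # Assumption: a zero denominator yields 0 rather than raising an error
    z = (o11 - e11) / denom if denom else 0.0

    # Assumption: two-tailed p-value under the standard normal distribution,
    # i.e. 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))

    return z, p


if __name__ == '__main__':
    print(z_test_sketch(20, 80, 80, 820))
```

The one-tailed variants selected via `settings['direction']` are left out here because their exact definition is not shown in this diff.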