diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4fa1bfd27..ef5c161f0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,7 +20,7 @@
## [3.6.0](https://github.com/BLKSerene/Wordless/releases/tag/3.6.0) - ??/??/2024
### 🎉 New Features
-- Measures: Add effect size - squared association ratio
+- Measures: Add effect size - conditional probability / squared association ratio
- Utils: Add Stanza's Sindhi dependency parser
### 🐛 Bugfixes
diff --git a/doc/doc.md b/doc/doc.md
index d98138534..36b202682 100644
--- a/doc/doc.md
+++ b/doc/doc.md
@@ -914,6 +914,7 @@ Ukrainian |KOI8-U |✔
Urdu |CP1006 |✔
Vietnamese |CP1258 |✔
+
### [12.4 Supported Measures](#doc)
@@ -946,8 +947,6 @@ The following variables would be used in formulas:
**NumCharsAlpha**: Number of alphabetic characters (letters, CJK characters, etc.)
-Test of Statistical Significance|Measure of Bayes Factor|Formula
---------------------------------|-----------------------|-------
-Fisher's exact test ([Pedersen, 1996](#ref-pedersen-1996))||See: [Fisher's exact test - Wikipedia](https://en.wikipedia.org/wiki/Fisher%27s_exact_test#Example)
-Log-likelihood ratio test ([Dunning, 1993](#ref-dunning-1993))|Log-likelihood ratio test ([Wilson, 2013](#ref-wilson-2013))|![Formula](/doc/measures/statistical_significance/log_likehood_ratio_test.svg)
-Mann-Whitney U test ([Kilgarriff, 2001](#ref-kilgarriff-2001))||See: [Mann–Whitney U test - Wikipedia](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Calculations)
-Pearson's chi-squared test ([Hofland & Johansson, 1982](#ref-hofland-johansson-1982); [Oakes, 1998](#ref-oakes-1998))||![Formula](/doc/measures/statistical_significance/pearsons_chi_squared_test.svg)
-Student's t-test (1-sample) ([Church et al., 1991](#ref-church-et-al-1991))||![Formula](/doc/measures/statistical_significance/students_t_test_1_sample.svg)
-Student's t-test (2-sample) ([Paquot & Bestgen, 2009](#ref-paquot-bestgen-2009))|Student's t-test (2-sample) ([Wilson, 2013](#ref-wilson-2013))|![Formula](/doc/measures/statistical_significance/students_t_test_2_sample.svg)
-Z-test ([Dennis, 1964](#ref-dennis-1964))||![Formula](/doc/measures/statistical_significance/z_test.svg)
-Z-test (Berry-Rogghe) ([Berry-Rogghe, 1973](#ref-berry-rogghe-1973))||![Formula](/doc/measures/statistical_significance/z_test_berry_rogghe.svg) where **S** is the average span size on both sides of the node word.
+Test of Statistical Significance|Measure of Bayes Factor|Formula|Collocation Extraction|Keyword Extraction
+--------------------------------|-----------------------|-------|----------------------|------------------
+Fisher's exact test ([Pedersen, 1996](#ref-pedersen-1996); [Kilgarriff, 2001, p. 105](#ref-kilgarriff-2001))||See: [Fisher's exact test - Wikipedia](https://en.wikipedia.org/wiki/Fisher%27s_exact_test#Example)|✔|✔
+Log-likelihood ratio test ([Dunning, 1993](#ref-dunning-1993); [Kilgarriff, 2001, p. 105](#ref-kilgarriff-2001))|Log-likelihood ratio test ([Wilson, 2013](#ref-wilson-2013))|![Formula](/doc/measures/statistical_significance/log_likehood_ratio_test.svg)|✔|✔
+Mann-Whitney U test ([Kilgarriff, 2001, pp. 103–104](#ref-kilgarriff-2001))||See: [Mann–Whitney U test - Wikipedia](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Calculations)|✖|✔
+Pearson's chi-squared test ([Hofland & Johansson, 1982, p. 12](#ref-hofland-johansson-1982); [Dunning, 1993, p. 63](#ref-dunning-1993); [Oakes, 1998, p. 25](#ref-oakes-1998))||![Formula](/doc/measures/statistical_significance/pearsons_chi_squared_test.svg)|✔|✔
+Student's t-test (1-sample) ([Church et al., 1991, pp. 120–126](#ref-church-et-al-1991))||![Formula](/doc/measures/statistical_significance/students_t_test_1_sample.svg)|✔|✖
+Student's t-test (2-sample) ([Paquot & Bestgen, 2009, pp. 252–253](#ref-paquot-bestgen-2009))|Student's t-test (2-sample) ([Wilson, 2013](#ref-wilson-2013))|![Formula](/doc/measures/statistical_significance/students_t_test_2_sample.svg)|✖|✔
+Z-test ([Dennis, 1964, p. 69](#ref-dennis-1964))||![Formula](/doc/measures/statistical_significance/z_test.svg)|✔|✖
+Z-test (Berry-Rogghe) ([Berry-Rogghe, 1973](#ref-berry-rogghe-1973))||![Formula](/doc/measures/statistical_significance/z_test_berry_rogghe.svg) where **S** is the average span size on both sides of the node word.|✔|✖
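For readers checking the table above against the linked SVG formulas, the log-likelihood ratio test over a 2×2 contingency table reduces to G² = 2 Σ O ln(O/E). A minimal sketch of that computation (an illustrative helper written for this doc, not Wordless's own implementation):

```python
import math

def log_likelihood_ratio(o11, o12, o21, o22):
    """G2 = 2 * sum(O * ln(O / E)) over the four cells of a 2x2 table.

    o11/o12: frequency of the item in the target/reference corpus
    o21/o22: remaining token counts in each corpus
    """
    n = o11 + o12 + o21 + o22
    g2 = 0.0
    for o, row_total, col_total in (
        (o11, o11 + o12, o11 + o21),
        (o12, o11 + o12, o12 + o22),
        (o21, o21 + o22, o11 + o21),
        (o22, o21 + o22, o12 + o22),
    ):
        e = row_total * col_total / n  # expected count under independence
        if o > 0:  # 0 * ln(0) is taken as 0
            g2 += o * math.log(o / e)

    return 2 * g2
```

When the item is distributed proportionally across the two corpora, every O equals its E and G² is 0; any skew pushes G² above 0.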
-Measure of Effect Size|Formula
-----------------------|-------
-%DIFF ([Gabrielatos & Marchi, 2011](#ref-gabrielatos-marchi-2011))|![Formula](/doc/measures/effect_size/pct_diff.svg)
-Cubic association ratio ([Daille, 1994](#ref-daille-1994))|![Formula](/doc/measures/effect_size/im3.svg)
-Dice-Sørensen coefficient ([Smadja et al., 1996](#ref-smadja-et-al-1996))|![Formula](/doc/measures/effect_size/dice_sorensen_coeff.svg)
-Difference coefficient ([Hofland & Johansson, 1982](#ref-hofland-johansson-1982); [Gabrielatos, 2018](#ref-gabrielatos-2018))|![Formula](/doc/measures/effect_size/diff_coeff.svg)
-Jaccard index ([Dunning, 1998](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/jaccard_index.svg)
-Kilgarriff's ratio ([Kilgarriff, 2009](#ref-kilgarriff-2009))|![Formula](/doc/measures/effect_size/kilgarriffs_ratio.svg) where **α** is the smoothing parameter, whose value could be changed via **Menu Bar → Preferences → Settings → Measures → Effect Size → Kilgarriff's Ratio → Smoothing Parameter**.
-logDice ([Rychlý, 2008](#ref-rychly-2008))|![Formula](/doc/measures/effect_size/log_dice.svg)
-Log-frequency biased MD ([Thanopoulos et al., 2002](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/lfmd.svg)
-Log Ratio ([Hardie, 2014](#ref-hardie-2014))|![Formula](/doc/measures/effect_size/log_ratio.svg)
-MI.log-f ([Kilgarriff & Tugwell, 2002](#ref-kilgarriff-tugwell-2002); [Lexical Computing Ltd., 2015](#ref-lexical-computing-ltd-2015))|![Formula](/doc/measures/effect_size/mi_log_f.svg)
-Minimum sensitivity ([Pedersen, 1998](#ref-pedersen-1998))|![Formula](/doc/measures/effect_size/min_sensitivity.svg)
-Mutual Dependency ([Thanopoulos et al., 2002](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/md.svg)
-Mutual Expectation ([Dias et al., 1999](#ref-dias-et-al-1999))|![Formula](/doc/measures/effect_size/me.svg)
-Mutual information ([Dunning, 1998](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/mi.svg)
-Odds ratio ([Pojanapunya & Todd, 2016](#ref-pojanapunya-todd-2016))|![Formula](/doc/measures/effect_size/odds_ratio.svg)
-Pointwise mutual information ([Church & Hanks, 1990](#ref-church-hanks-1990))|![Formula](/doc/measures/effect_size/pmi.svg)
-Poisson collocation measure ([Quasthoff & Wolff, 2002](#ref-quasthoff-wolff-2002))|![Formula](/doc/measures/effect_size/poisson_collocation_measure.svg)
-Squared association ratio ([Daille, 1995](#ref-daille-1995))|![Formula](/doc/measures/effect_size/im2.svg)
-Squared phi coefficient ([Church & Gale, 1991](#ref-church-gale-1991))|![Formula](/doc/measures/effect_size/squared_phi_coeff.svg)
+Measure of Effect Size|Formula|Collocation Extraction|Keyword Extraction
+----------------------|-------|----------------------|------------------
+%DIFF ([Gabrielatos & Marchi, 2011](#ref-gabrielatos-marchi-2011))|![Formula](/doc/measures/effect_size/pct_diff.svg)|✖|✔
+Conditional probability ([Durrant, 2008, p. 84](#ref-durrant-2008))|![Formula](/doc/measures/effect_size/conditional_probability.svg)|✔|✖
+Cubic association ratio ([Daille, 1994, p. 139](#ref-daille-1994); [Kilgarriff, 2001, p. 99](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/im3.svg)|✔|✔
+Dice-Sørensen coefficient ([Smadja et al., 1996, p. 8](#ref-smadja-et-al-1996))|![Formula](/doc/measures/effect_size/dice_sorensen_coeff.svg)|✔|✖
+Difference coefficient ([Hofland & Johansson, 1982, p. 14](#ref-hofland-johansson-1982); [Gabrielatos, 2018, p. 236](#ref-gabrielatos-2018))|![Formula](/doc/measures/effect_size/diff_coeff.svg)|✖|✔
+Jaccard index ([Dunning, 1998, p. 48](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/jaccard_index.svg)|✔|✖
+Kilgarriff's ratio ([Kilgarriff, 2009](#ref-kilgarriff-2009))|![Formula](/doc/measures/effect_size/kilgarriffs_ratio.svg) where **α** is the smoothing parameter, whose value can be changed via **Menu Bar → Preferences → Settings → Measures → Effect Size → Kilgarriff's Ratio → Smoothing Parameter**.|✖|✔
+logDice ([Rychlý, 2008, p. 9](#ref-rychly-2008))|![Formula](/doc/measures/effect_size/log_dice.svg)|✔|✖
+Log-frequency biased MD ([Thanopoulos et al., 2002, p. 621](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/lfmd.svg)|✔|✖
+Log Ratio ([Hardie, 2014](#ref-hardie-2014))|![Formula](/doc/measures/effect_size/log_ratio.svg)|✔|✔
+MI.log-f ([Kilgarriff & Tugwell, 2001](#ref-kilgarriff-tugwell-2002); [Lexical Computing Ltd., 2015, p. 4](#ref-lexical-computing-ltd-2015))|![Formula](/doc/measures/effect_size/mi_log_f.svg)|✔|✖
+Minimum sensitivity ([Pedersen, 1998](#ref-pedersen-1998))|![Formula](/doc/measures/effect_size/min_sensitivity.svg)|✔|✖
+Mutual Dependency ([Thanopoulos et al., 2002, p. 621](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/md.svg)|✔|✖
+Mutual Expectation ([Dias et al., 1999](#ref-dias-et-al-1999))|![Formula](/doc/measures/effect_size/me.svg)|✔|✖
+Mutual information ([Dunning, 1998, pp. 49–52](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/mi.svg)|✔|✖
+Odds ratio ([Pecina, 2005, p. 15](#ref-pecina-2005); [Pojanapunya & Todd, 2016](#ref-pojanapunya-todd-2016))|![Formula](/doc/measures/effect_size/odds_ratio.svg)|✔|✔
+Pointwise mutual information ([Church & Hanks, 1990](#ref-church-hanks-1990); [Kilgarriff, 2001, pp. 104–105](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/pmi.svg)|✔|✔
+Poisson collocation measure ([Quasthoff & Wolff, 2002](#ref-quasthoff-wolff-2002))|![Formula](/doc/measures/effect_size/poisson_collocation_measure.svg)|✔|✖
+Squared association ratio ([Daille, 1995, p. 21](#ref-daille-1995); [Kilgarriff, 2001, p. 99](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/im2.svg)|✔|✔
+Squared phi coefficient ([Church & Gale, 1991](#ref-church-gale-1991))|![Formula](/doc/measures/effect_size/squared_phi_coeff.svg)|✔|✖
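The newly added conditional probability measure is, under the common definition (which the Durrant citation follows), the probability of seeing the collocate given the node, P(w2 | w1) = f(w1, w2) / f(w1). A minimal sketch, shown next to pointwise mutual information for contrast (illustrative helpers, not the Wordless code):

```python
import math

def conditional_probability(freq_pair, freq_node):
    # P(collocate | node) = f(node, collocate) / f(node)
    return freq_pair / freq_node

def pmi(freq_pair, freq_word1, freq_word2, num_tokens):
    # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
    return math.log2((freq_pair * num_tokens) / (freq_word1 * freq_word2))
```

For example, `conditional_probability(5, 20)` gives 0.25: the collocate follows the node in a quarter of the node's occurrences, regardless of how frequent either word is overall, which is what distinguishes it from the frequency-normalized PMI.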
## [13 References](#doc)
@@ -1579,7 +1580,7 @@ Measure of Effect Size|Formula
1. [**^**](#ref-cttr) Carroll, J. B. (1964). *Language and thought*. Prentice-Hall.
-1. [**^**](#ref-carrolls-d2) [**^**](#ref-carrolls-um) Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies and a proposal for a standard frequency index. *Computer Studies in the Humanities and Verbal Behaviour*, *3*(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
+1. [**^**](#ref-carrolls-d2) [**^**](#ref-carrolls-um) Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. *ETS Research Bulletin Series*, *1970*(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
1. [**^**](#ref-rgl) Caylor, J. S., & Sticht, T. G. (1973). *Development of a simple readability index for job reading material*. Human Resource Research Organization. https://ia902703.us.archive.org/31/items/ERIC_ED076707/ERIC_ED076707.pdf
@@ -1613,7 +1614,7 @@ Measure of Effect Size|Formula
1. [**^**](#ref-dawoods-readability-formula) Dawood, B. A. K. (1977). *The relationship between readability and selected language variables* [Unpublished master's thesis]. University of Baghdad.
-1. [**^**](#ref-z-test) Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), *Proceedings of the symposium on statistical association methods for mechanized documentation* (pp. 61â148). National Bureau of Standards.
+1. [**^**](#ref-z-test) Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), *Statistical association methods for mechanized documentation: Symposium proceedings* (pp. 61â148). National Bureau of Standards.
1. [**^**](#ref-me) Dias, G., Guilloré, S., & Pereira Lopes, J. G. (1999). Language independent automatic acquisition of rigid multiword units from unrestricted text corpora. In A. Condamines, C. Fabre, & M. Péry-Woodley (Eds.), *TALN'99: 6ème Conférence Annuelle sur le Traitement Automatique des Langues Naturelles* (pp. 333–339). TALN.
@@ -1625,9 +1626,11 @@ Measure of Effect Size|Formula
-1. [**^**](#ref-logttr) [**^**](#ref-logttr) Dugast, D. (1979). *Vocabulaire et stylistique: I théâtre et dialogue, travaux de linguistique quantitative*. Slatkine.
-1. [**^**](#ref-log-likehood-ratio-test) Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. *Computational Linguistics*, *19*(1), 61–74.
+1. [**^**](#ref-log-likehood-ratio-test) [**^**](#ref-pearsons-chi-squared-test) Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. *Computational Linguistics*, *19*(1), 61–74.
1. [**^**](#ref-jaccard-index) [**^**](#ref-mi) Dunning, T. E. (1998). *Finding structure in text, genome and other symbolic sequences* [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847
+
+1. [**^**](#ref-conditional-probability) Durrant, P. (2008). *High frequency collocations and second language learning* [Doctoral dissertation, University of Nottingham]. Nottingham eTheses. https://eprints.nottingham.ac.uk/10622/1/final_thesis.pdf
1. [**^**](#ref-osman) El-Haj, M., & Rayson, P. (2016). OSMAN: A novel Arabic readability metric. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), *Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)* (pp. 250–255). European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2016/index.html
@@ -1678,11 +1681,11 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
1. [**^**](#ref-re) Kandel, L., & Moles, A. (1958). Application de l'indice de flesch à la langue française. *The Journal of Educational Research*, *21*, 283–287.
-1. [**^**](#ref-mann-whiteney-u-test) Kilgarriff, A. (2001). Comparing corpora. *International Journal of Corpus Linguistics*, *6*(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil
+1. [**^**](#ref-fishers-exact-test) [**^**](#ref-log-likehood-ratio-test) [**^**](#ref-mann-whiteney-u-test) [**^**](#ref-im3) [**^**](#ref-pmi) [**^**](#ref-im2) Kilgarriff, A. (2001). Comparing corpora. *International Journal of Corpus Linguistics*, *6*(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil
-1. [**^**](#ref-kilgarriffs-ratio) Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), *Proceedings of the Corpus Linguistics Conference 2009* (p. 171). University of Liverpool.
+1. [**^**](#ref-kilgarriffs-ratio) Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), *Proceedings of the Corpus Linguistics Conference 2009 (CL2009)* (Article 171). University of Liverpool.
-1. [**^**](#ref-mi-log-f) Kilgarriff, A., & Tugwell, D. (2002). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. In *Proceedings of the 8th Machine Translation Summit* (pp. 187–190). European Association for Machine Translation.
+1. [**^**](#ref-mi-log-f) Kilgarriff, A., & Tugwell, D. (2001). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. In B. Maegaard (Ed.), *Proceedings of Machine Translation Summit VIII* (pp. 187–190). European Association for Machine Translation.
1. [**^**](#ref-ari) [**^**](#ref-gl) [**^**](#ref-fog-index) Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). *Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel* (Report No. RBR 8-75). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf
@@ -1700,7 +1703,7 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
1. [**^**](#ref-gulpease) Lucisano, P., & Emanuela Piemontese, M. (1988). GULPEASE: A formula for the prediction of the difficulty of texts in Italian. *Scuola e Città*, *39*(3), 110–124.
-1. [**^**](#ref-num-syls-luong-nguyen-dinh-1000) [**^**](#ref-luong-nguyen-dinhs-readability-formula) Luong, A.-V., Nguyen, D., & Dinh, D. (2018). A new formula for Vietnamese text readability assessment. *2018 10th International Conference on Knowledge and Systems Engineering (KSE)* (pp. 198–202). IEEE. https://doi.org/10.1109/KSE.2018.8573379
+1. [**^**](#ref-num-syls-luong-nguyen-dinh-1000) [**^**](#ref-luong-nguyen-dinhs-readability-formula) Luong, A.-V., Nguyen, D., & Dinh, D. (2018). A new formula for Vietnamese text readability assessment. In T. M. Phuong & M. L. Nguyen (Eds.), *Proceedings of 2018 10th International Conference on Knowledge and Systems Engineering (KSE)* (pp. 198–202). IEEE. https://doi.org/10.1109/KSE.2018.8573379
1. [**^**](#ref-lynes-d3) Lyne, A. A. (1985). Dispersion. In A. A. Lyne (Ed.), *The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method* (pp. 101–124). Slatkine.
@@ -1710,7 +1713,7 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
1. [**^**](#ref-eflaw) McAlpine, R. (2006). *From plain English to global English*. Journalism Online. Retrieved October 31, 2024, from https://www.angelfire.com/nd/nirmaldasan/journalismonline/fpetge.html
-1. [**^**](#ref-mtld) McCarthy, P. M. (2005). *An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD)* [Doctoral dissertation, The University of Memphis]. ProQuest Dissertations and Theses Global.
+1. [**^**](#ref-mtld) McCarthy, P. M. (2005). *An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD)* (Publication No. 3199485) [Doctoral dissertation, The University of Memphis]. ProQuest Dissertations and Theses Global.
1. [**^**](#ref-hdd) [**^**](#ref-mtld) McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. *Behavior Research Methods*, *42*(2), 381–392. https://doi.org/10.3758/BRM.42.2.381
@@ -1731,6 +1734,8 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
1. [**^**](#ref-fishers-exact-test) Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), *Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference* (pp. 188–200). The South-Central Regional SAS Users' Group.
1. [**^**](#ref-min-sensitivity) Pedersen, T. (1998). Dependent bigram identification. In *Proceedings of the Fifteenth National Conference on Artificial Intelligence* (p. 1197). AAAI Press.
+
+1. [**^**](#ref-odds-ratio) Pecina, P. (2005). An extensive empirical study of collocation extraction methods. In C. Callison-Burch & S. Wan (Eds.), *Proceedings of the Student Research Workshop* (pp. 13–18). Association for Computational Linguistics.
1. [**^**](#ref-fog-index) Pisarek, W. (1969). Jak mierzyć zrozumiałość tekstu? *Zeszyty Prasoznawcze*, *4*(42), 35–48.
@@ -1746,7 +1751,7 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
1. [**^**](#ref-rosengrens-s) [**^**](#ref-rosengrens-kf) Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. *Études de linguistique appliquée*, *1*, 103–127.
-1. [**^**](#ref-log-dice) Rychlý, P. (2008). A lexicographyer-friendly association score. In P. Sojka & A. Horák (Eds.), *Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing*. Masaryk University
+1. [**^**](#ref-log-dice) Rychlý, P. (2008). A lexicographer-friendly association score. In P. Sojka & A. Horák (Eds.), *Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing* (pp. 6–9). Masaryk University.
1. [**^**](#ref-ald) [**^**](#ref-fald) [**^**](#ref-arf) [**^**](#ref-farf) [**^**](#ref-awt) [**^**](#ref-fawt) Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. *Journal of Quantitative Linguistics*, *9*(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
diff --git a/doc/measures/effect_size/conditional_probability.svg b/doc/measures/effect_size/conditional_probability.svg
new file mode 100644
index 000000000..547f7ed3e
--- /dev/null
+++ b/doc/measures/effect_size/conditional_probability.svg
@@ -0,0 +1,29 @@
+
+
+
\ No newline at end of file
diff --git a/tests/test_colligation_extractor.py b/tests/test_colligation_extractor.py
index 36031b8b0..cadaa7fb7 100644
--- a/tests/test_colligation_extractor.py
+++ b/tests/test_colligation_extractor.py
@@ -34,12 +34,12 @@ def test_colligation_extractor():
tests_statistical_significance = [
test_statistical_significance
for test_statistical_significance, vals in main.settings_global['tests_statistical_significance'].items()
- if vals['collocation_extractor']
+ if vals['collocation']
]
measures_bayes_factor = [
measure_bayes_factor
for measure_bayes_factor, vals in main.settings_global['measures_bayes_factor'].items()
- if vals['collocation_extractor']
+ if vals['collocation']
]
measures_effect_size = list(main.settings_global['measures_effect_size'].keys())
diff --git a/tests/test_collocation_extractor.py b/tests/test_collocation_extractor.py
index 90bc79fb5..b0d16f3b1 100644
--- a/tests/test_collocation_extractor.py
+++ b/tests/test_collocation_extractor.py
@@ -34,12 +34,12 @@ def test_collocation_extractor():
tests_statistical_significance = [
test_statistical_significance
for test_statistical_significance, vals in main.settings_global['tests_statistical_significance'].items()
- if vals['collocation_extractor']
+ if vals['collocation']
]
measures_bayes_factor = [
measure_bayes_factor
for measure_bayes_factor, vals in main.settings_global['measures_bayes_factor'].items()
- if vals['collocation_extractor']
+ if vals['collocation']
]
measures_effect_size = list(main.settings_global['measures_effect_size'].keys())
diff --git a/tests/test_keyword_extractor.py b/tests/test_keyword_extractor.py
index 802512301..d5073c8ce 100644
--- a/tests/test_keyword_extractor.py
+++ b/tests/test_keyword_extractor.py
@@ -31,12 +31,12 @@ def test_keyword_extractor():
tests_statistical_significance = [
test_statistical_significance
for test_statistical_significance, vals in main.settings_global['tests_statistical_significance'].items()
- if vals['keyword_extractor']
+ if vals['keyword']
]
measures_bayes_factor = [
measure_bayes_factor
for measure_bayes_factor, vals in main.settings_global['measures_bayes_factor'].items()
- if vals['keyword_extractor']
+ if vals['keyword']
]
measures_effect_size = list(main.settings_global['measures_effect_size'].keys())
diff --git a/tests/tests_measures/test_measures_adjusted_freq.py b/tests/tests_measures/test_measures_adjusted_freq.py
index db22c8c7a..9e88403b5 100644
--- a/tests/tests_measures/test_measures_adjusted_freq.py
+++ b/tests/tests_measures/test_measures_adjusted_freq.py
@@ -22,7 +22,7 @@
main = wl_test_init.Wl_Test_Main()
-# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 410)
+# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 410
def test_fald():
assert round(wl_measures_adjusted_freq.fald(main, test_measures_dispersion.TOKENS, 'a'), 3) == 11.764
assert wl_measures_adjusted_freq.fald(main, test_measures_dispersion.TOKENS, 'aa') == 0
@@ -36,9 +36,9 @@ def test_fawt():
assert wl_measures_adjusted_freq.fawt(main, test_measures_dispersion.TOKENS, 'aa') == 0
# References:
-# Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour, 3(2), 61–65. https://doi.org/10.1002/
-# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. (p. 122)
-# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 409)
+# Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x | p. 13
+# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. | p. 122
+# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 409
def test_carrolls_um():
assert round(wl_measures_adjusted_freq.carrolls_um(main, [2, 1, 1, 1, 0]), 2) == 4.31
assert round(wl_measures_adjusted_freq.carrolls_um(main, [4, 2, 1, 1, 0]), 3) == 6.424
@@ -46,9 +46,9 @@ def test_carrolls_um():
assert wl_measures_adjusted_freq.carrolls_um(main, [0, 0, 0, 0, 0]) == 0
# References
-# Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour, 3(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
-# Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. Études de linguistique appliquée, 1, 103–127. (p. 115)
-# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. (p. 122)
+# Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x | p. 14
+# Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. Études de linguistique appliquée, 1, 103–127. | p. 115
+# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. | p. 122
def test_juillands_u():
assert round(wl_measures_adjusted_freq.juillands_u(main, [0, 4, 3, 2, 1]), 2) == 6.46
assert round(wl_measures_adjusted_freq.juillands_u(main, [2, 2, 2, 2, 2]), 0) == 10
@@ -56,9 +56,9 @@ def test_juillands_u():
assert wl_measures_adjusted_freq.juillands_u(main, [0, 0, 0, 0, 0]) == 0
# References:
-# Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. Études de linguistique appliquée, 1, 103–127. (p. 117)
-# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. (p. 122)
-# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 409)
+# Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. Études de linguistique appliquée, 1, 103–127. | p. 117
+# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. | p. 122
+# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 409
def test_rosengres_kf():
assert round(wl_measures_adjusted_freq.rosengrens_kf(main, [2, 2, 2, 2, 1]), 2) == 8.86
assert round(wl_measures_adjusted_freq.rosengrens_kf(main, [4, 2, 1, 1, 0]), 3) == 5.863
@@ -66,14 +66,14 @@ def test_rosengres_kf():
assert wl_measures_adjusted_freq.rosengrens_kf(main, [0, 0, 0, 0, 0]) == 0
# References:
-# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. (p. 122)
-# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 409)
+# Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. | p. 122
+# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 409
def test_engwalls_fm():
assert round(wl_measures_adjusted_freq.engwalls_fm(main, [4, 2, 1, 1, 0]), 1) == 6.4
assert round(wl_measures_adjusted_freq.engwalls_fm(main, [1, 2, 3, 4, 5]), 0) == 15
assert wl_measures_adjusted_freq.engwalls_fm(main, [0, 0, 0, 0, 0]) == 0
-# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 409)
+# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 409
def test_kromers_ur():
assert round(wl_measures_adjusted_freq.kromers_ur(main, [2, 1, 1, 1, 0]), 1) == 4.5
assert wl_measures_adjusted_freq.kromers_ur(main, [0, 0, 0, 0, 0]) == 0
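The expected values asserted in this test file can be reproduced from the standard textbook definitions of these adjusted frequencies. A standalone sketch (not the `wl_measures_adjusted_freq` implementation), assuming Juilland's U = f·D with D derived from the population coefficient of variation, and Carroll's Um = f·D2 + (1 − D2)·f/n with D2 the entropy-based dispersion:

```python
import math

def juillands_u(freqs):
    # U = f * D, where D = 1 - (sd / mean) / sqrt(n - 1), using the population sd
    n = len(freqs)
    f = sum(freqs)
    mean = f / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in freqs) / n)

    return f * (1 - (sd / mean) / math.sqrt(n - 1))

def carrolls_um(freqs):
    # Um = f * D2 + (1 - D2) * f / n, where D2 = H / log2(n)
    # and H is the entropy of the per-part proportions
    n = len(freqs)
    f = sum(freqs)
    props = [x / f for x in freqs if x > 0]
    d2 = -sum(p * math.log2(p) for p in props) / math.log2(n)

    return f * d2 + (1 - d2) * f / n
```

With the frequency profiles used above, `juillands_u([0, 4, 3, 2, 1])` rounds to 6.46 and `carrolls_um([2, 1, 1, 1, 0])` rounds to 4.31, matching the asserts; a perfectly even profile such as `[2, 2, 2, 2, 2]` recovers the raw frequency 10.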
diff --git a/tests/tests_measures/test_measures_dispersion.py b/tests/tests_measures/test_measures_dispersion.py
index 35fd1329a..990ba65fa 100644
--- a/tests/tests_measures/test_measures_dispersion.py
+++ b/tests/tests_measures/test_measures_dispersion.py
@@ -21,7 +21,7 @@
main = wl_test_init.Wl_Test_Main()
-# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (pp. 406, 410)
+# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | pp. 406, 410
TOKENS = 'b a m n i b e u p k | b a s a t b e w q n | b c a g a b e s t a | b a g h a b e a a t | b a h a a b e a x a'.replace('|', '').split()
DISTS = [2, 10, 2, 9, 2, 5, 2, 3, 3, 1, 3, 2, 1, 3, 2]
@@ -41,14 +41,14 @@ def test_awt():
assert wl_measures_dispersion.awt(main, TOKENS, 'a') == 3.18
assert wl_measures_dispersion.awt(main, TOKENS, 'aa') == 0
-# Reference: Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour, 3(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
+# Reference: Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x | p. 13
def test_carrolls_d2():
assert round(wl_measures_dispersion.carrolls_d2(main, [2, 1, 1, 1, 0]), 4) == 0.8277
assert wl_measures_dispersion.carrolls_d2(main, [0, 0, 0, 0]) == 0
# References:
-# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 416)
-# Lijffijt, J., & Gries, S. T. (2012). Correction to Stefan Th. Gries' “dispersions and adjusted frequencies in corpora”. International Journal of Corpus Linguistics, 17(1), 147–149. https://doi.org/10.1075/ijcl.17.1.08lij (p. 148)
+# Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 416
+# Lijffijt, J., & Gries, S. T. (2012). Correction to Stefan Th. Gries' “dispersions and adjusted frequencies in corpora”. International Journal of Corpus Linguistics, 17(1), 147–149. https://doi.org/10.1075/ijcl.17.1.08lij | p. 148
def test_griess_dp():
main.settings_custom['measures']['dispersion']['griess_dp']['apply_normalization'] = False
@@ -60,22 +60,22 @@ def test_griess_dp():
assert round(wl_measures_dispersion.griess_dp(main, [2, 1, 0]), 1) == 0.5
assert wl_measures_dispersion.griess_dp(main, [0, 0, 0, 0]) == 0
-# Reference: Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour, 3(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
+# Reference: Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x | p. 14
def test_juillands_d():
assert round(wl_measures_dispersion.juillands_d(main, [0, 4, 3, 2, 1]), 4) == 0.6464
assert wl_measures_dispersion.juillands_d(main, [0, 0, 0, 0]) == 0
-# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 408)
+# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 408
def test_lynes_d3():
assert round(wl_measures_dispersion.lynes_d3(main, [1, 2, 3, 4, 5]), 3) == 0.944
assert wl_measures_dispersion.lynes_d3(main, [0, 0, 0, 0]) == 0
-# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 407)
+# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 407
def test_rosengrens_s():
assert round(wl_measures_dispersion.rosengrens_s(main, [1, 2, 3, 4, 5]), 3) == 0.937
assert wl_measures_dispersion.rosengrens_s(main, [0, 0, 0, 0]) == 0
-# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri (p. 408)
+# Reference: Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri | p. 408
def test_zhangs_distributional_consistency():
assert round(wl_measures_dispersion.zhangs_distributional_consistency(main, [1, 2, 3, 4, 5]), 3) == 0.937
assert wl_measures_dispersion.zhangs_distributional_consistency(main, [0, 0, 0, 0]) == 0
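The `0.8277` value expected by `test_carrolls_d2` above can be sanity-checked by hand: Carroll's D₂ is the entropy of the relative frequency distribution divided by its maximum, log₂ of the number of corpus parts. A minimal standalone sketch (plain `math`, not the Wordless API; the all-zero edge case the tests also cover is omitted):

```python
import math

def carrolls_d2(freqs):
    # Entropy of the relative frequencies, normalized by its
    # maximum value, log2 of the number of corpus parts
    total = sum(freqs)
    h = -sum(f / total * math.log2(f / total) for f in freqs if f)
    return h / math.log2(len(freqs))

print(round(carrolls_d2([2, 1, 1, 1, 0]), 4))  # 0.8277
```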
diff --git a/tests/tests_measures/test_measures_effect_size.py b/tests/tests_measures/test_measures_effect_size.py
index f29af43c1..fb0d9aed4 100644
--- a/tests/tests_measures/test_measures_effect_size.py
+++ b/tests/tests_measures/test_measures_effect_size.py
@@ -35,17 +35,17 @@ def assert_zeros(func, result = 0):
numpy.array([result] * 10)
)
-# Reference: Gabrielatos, C., & Marchi, A. (2012, September 13–14). Keyness: Appropriate metrics and practical issues [Conference session]. CADS International Conference 2012, University of Bologna, Italy. (pp. 21-22)
+# Reference: Gabrielatos, C., & Marchi, A. (2011, November 5). Keyness: Matching metrics to definitions [Conference session]. Corpus Linguistics in the South 1, University of Portsmouth, United Kingdom. https://eprints.lancs.ac.uk/id/eprint/51449/4/Gabrielatos_Marchi_Keyness.pdf | p. 18
def test_pct_diff():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.pct_diff(
main,
- numpy.array([20] * 2),
- numpy.array([1] * 2),
- numpy.array([29954 - 20] * 2),
- numpy.array([23691 - 1] * 2)
- ), 2),
- numpy.array([1481.83] * 2)
+ numpy.array([206523] * 2),
+ numpy.array([178174] * 2),
+ numpy.array([959641 - 206523] * 2),
+ numpy.array([1562358 - 178174] * 2)
+ ), 1),
+ numpy.array([88.7] * 2)
)
numpy.testing.assert_array_equal(
@@ -59,10 +59,23 @@ def test_pct_diff():
numpy.array([float('-inf'), float('inf'), 0])
)
+# Reference: Durrant, P. (2008). High frequency collocations and second language learning [Doctoral dissertation, University of Nottingham]. Nottingham eTheses. https://eprints.nottingham.ac.uk/10622/1/final_thesis.pdf | pp. 80, 84
+def test_conditional_probability():
+ numpy.testing.assert_array_equal(
+ numpy.round(wl_measures_effect_size.conditional_probability(
+ main,
+ numpy.array([28, 28]),
+ numpy.array([8002, 15740]),
+ numpy.array([15740, 8002]),
+ numpy.array([97596164, 97596164])
+ ), 3),
+ numpy.array([0.178, 0.349])
+ )
+
def test_im3():
assert_zeros(wl_measures_effect_size.im3)
-# Reference: Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1), pp. 1–38. (p. 13)
+# Reference: Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1), 1–38. | p. 13
def test_dice_sorensen_coeff():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.dice_sorensen_coeff(
@@ -77,7 +90,7 @@ def test_dice_sorensen_coeff():
assert_zeros(wl_measures_effect_size.dice_sorensen_coeff)
-# Reference: Hofland, K., & Johanson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. (p. 471)
+# Reference: Hofland, K., & Johansson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. | p. 471
def test_diff_coeff():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.diff_coeff(
@@ -95,7 +108,7 @@ def test_diff_coeff():
def test_jaccard_index():
assert_zeros(wl_measures_effect_size.jaccard_index)
-# Reference: Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (p. 171). University of Liverpool.
+# Reference: Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (CL2009) (Article 171). University of Liverpool.
def test_kilgarriffs_ratio():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.kilgarriffs_ratio(
@@ -164,7 +177,7 @@ def test_md():
def test_me():
assert_zeros(wl_measures_effect_size.me)
-# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 (p. 51)
+# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 | p. 51
def test_mi():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.mi(
@@ -179,7 +192,7 @@ def test_mi():
assert_zeros(wl_measures_effect_size.mi)
-# Reference: Pojanapunya, P., & Todd, R. W. (2016). Log-likelihood and odds ratio keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory, 15(1), pp. 133–167. https://doi.org/10.1515/cllt-2015-0030 (p. 154)
+# Reference: Pojanapunya, P., & Todd, R. W. (2016). Log-likelihood and odds ratio keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory, 15(1), 133–167. https://doi.org/10.1515/cllt-2015-0030 | p. 154
def test_odds_ratio():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.odds_ratio(
@@ -203,7 +216,7 @@ def test_odds_ratio():
numpy.array([float('-inf'), float('inf'), 0])
)
-# Reference: Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29. (p. 24)
+# Reference: Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29. | p. 24
def test_pmi():
numpy.testing.assert_array_equal(
numpy.round(wl_measures_effect_size.pmi(
@@ -241,6 +254,7 @@ def test_squared_phi_coeff():
if __name__ == '__main__':
test_pct_diff()
+ test_conditional_probability()
test_im3()
test_dice_sorensen_coeff()
test_diff_coeff()
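The expected values in the new `test_conditional_probability` can be reproduced by hand: per Durrant (2008) as implemented in the diff, conditional probability is the joint frequency o11 over the column marginal o11 + o21, expressed as a percentage. A standalone sketch without the NumPy plumbing:

```python
def conditional_probability(o11, o21):
    # P(word 1 | word 2) as a percentage:
    # joint frequency over the marginal frequency of the second word
    return o11 / (o11 + o21) * 100

print(round(conditional_probability(28, 15740), 3))  # 0.178
print(round(conditional_probability(28, 8002), 3))   # 0.349
```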
diff --git a/tests/tests_measures/test_measures_lexical_density_diversity.py b/tests/tests_measures/test_measures_lexical_density_diversity.py
index a64edf280..729e084d9 100644
--- a/tests/tests_measures/test_measures_lexical_density_diversity.py
+++ b/tests/tests_measures/test_measures_lexical_density_diversity.py
@@ -30,7 +30,7 @@
TOKENS_101 = ['This', 'is', 'a', 'sentence', '.'] * 20 + ['another']
TOKENS_1000 = ['This', 'is', 'a', 'sentence', '.'] * 200
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 26). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 26
TOKENS_225 = [1] * 11 + [2, 3] * 9 + [4] * 7 + [5, 6] * 6 + [7, 8] * 5 + list(range(9, 16)) * 4 + list(range(16, 22)) * 3 + list(range(22, 40)) * 2 + list(range(40, 125))
def get_test_text(tokens):
@@ -130,31 +130,31 @@ def test_popescu_macutek_altmanns_b1_b2_b3_b4_b5():
assert round(b4, 3) == 0.078
assert round(b5, 3) == 0.664
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 30). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 30
def test_popescus_r1():
r1 = wl_measures_lexical_density_diversity.popescus_r1(main, text_tokens_225)
assert round(r1, 4) == 0.8667
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 39). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 39
def test_popescus_r2():
r2 = wl_measures_lexical_density_diversity.popescus_r2(main, text_tokens_225)
assert round(r2, 3) == 0.871
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 51). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 51
def test_popescus_r3():
r3 = wl_measures_lexical_density_diversity.popescus_r3(main, text_tokens_225)
assert round(r3, 4) == 0.3778
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 59). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 59
def test_popescus_r4():
r4 = wl_measures_lexical_density_diversity.popescus_r4(main, text_tokens_225)
assert round(r4, 4) == 0.6344
-# Reference: Popescu, I.-I. (2009). Word frequency studies (pp. 170, 172). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | pp. 170, 172
def test_repeat_rate():
settings['repeat_rate']['use_data'] = 'Rank-frequency distribution'
rr_distribution = wl_measures_lexical_density_diversity.repeat_rate(main, text_tokens_225)
@@ -169,7 +169,7 @@ def test_rttr():
assert rttr == 5 / 100 ** 0.5
-# Reference: Popescu, I.-I. (2009). Word frequency studies (pp. 176, 178). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | pp. 176, 178
def test_shannon_entropy():
settings['shannon_entropy']['use_data'] = 'Rank-frequency distribution'
h_distribution = wl_measures_lexical_density_diversity.shannon_entropy(main, text_tokens_225)
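The RTTR assertion in the hunk above encodes the measure directly: number of types divided by the square root of the number of tokens. A minimal standalone sketch, assuming (hypothetically; the fixture is not shown in this hunk) a 100-token test text built from the same 5-token sentence as `TOKENS_1000`:

```python
import math

def rttr(tokens):
    # Root type-token ratio: types / sqrt(tokens)
    return len(set(tokens)) / math.sqrt(len(tokens))

tokens = ['This', 'is', 'a', 'sentence', '.'] * 20  # 5 types, 100 tokens (assumed input)
assert rttr(tokens) == 5 / 100 ** 0.5
```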
diff --git a/tests/tests_measures/test_measures_statistical_significance.py b/tests/tests_measures/test_measures_statistical_significance.py
index d9d925cfa..18aa93813 100644
--- a/tests/tests_measures/test_measures_statistical_significance.py
+++ b/tests/tests_measures/test_measures_statistical_significance.py
@@ -55,7 +55,7 @@ def test_get_alt():
assert wl_measures_statistical_significance.get_alt('Left-tailed') == 'less'
assert wl_measures_statistical_significance.get_alt('Right-tailed') == 'greater'
-# References: Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference (pp. 188-200). The South–Central Regional SAS Users' Group. (p. 10)
+# Reference: Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference (pp. 188–200). The South–Central Regional SAS Users' Group. | p. 10
def test_fishers_exact_test():
settings['fishers_exact_test']['direction'] = 'Two-tailed'
test_stats, p_vals = wl_measures_statistical_significance.fishers_exact_test(
@@ -100,7 +100,7 @@ def test_fishers_exact_test():
assert test_stats == [None] * 2
numpy.testing.assert_array_equal(p_vals, numpy.array([1] * 2))
-# References: Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74. (p. 72)
+# Reference: Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74. | p. 72
def test_log_likelihood_ratio_test():
settings['log_likelihood_ratio_test']['apply_correction'] = False
gs, _ = wl_measures_statistical_significance.log_likelihood_ratio_test(
@@ -134,7 +134,7 @@ def test_log_likelihood_ratio_test():
numpy.testing.assert_array_equal(gs, numpy.array([0, 0]))
numpy.testing.assert_array_equal(p_vals, numpy.array([1, 1]))
-# References: Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil (p. 238)
+# Reference: Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil | p. 238
def test_mann_whitney_u_test():
u1s, _ = wl_measures_statistical_significance.mann_whitney_u_test(
main,
@@ -175,8 +175,8 @@ def test_mann_whitney_u_test():
)
# References:
-# Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74. (p. 73)
-# Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference (pp. 188-200). The South–Central Regional SAS Users' Group. (p. 10)
+# Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74. | p. 73
+# Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference (pp. 188–200). The South–Central Regional SAS Users' Group. | p. 10
def test_pearsons_chi_squared_test():
settings['pearsons_chi_squared_test']['apply_correction'] = False
chi2s, _ = wl_measures_statistical_significance.pearsons_chi_squared_test(
@@ -209,7 +209,7 @@ def test_pearsons_chi_squared_test():
numpy.testing.assert_array_equal(chi2s, numpy.array([0] * 2))
numpy.testing.assert_array_equal(p_vals, numpy.array([1] * 2))
-# Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press. (pp. 164-165)
+# Reference: Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press. | pp. 164–165
def test_students_t_test_1_sample():
t_stats, _ = wl_measures_statistical_significance.students_t_test_1_sample(
main,
diff --git a/tests/tests_widgets/test_widgets.py b/tests/tests_widgets/test_widgets.py
index a60cbf35e..a58cd43a7 100644
--- a/tests/tests_widgets/test_widgets.py
+++ b/tests/tests_widgets/test_widgets.py
@@ -125,11 +125,11 @@ def test_wl_widgets_search_settings_tokens():
def test_wl_widgets_context_settings():
wl_widgets.wl_widgets_context_settings(main, tab = 'concordancer')
-def test_wl_widgets_measures_wordlist_generator():
- wl_widgets.wl_widgets_measures_wordlist_generator(main)
+def test_wl_widgets_measures_wordlist_ngram_generation():
+ wl_widgets.wl_widgets_measures_wordlist_ngram_generation(main)
-def test_wl_widgets_measures_collocation_extractor():
- wl_widgets.wl_widgets_measures_collocation_extractor(main, tab = 'collocation_extractor')
+def test_wl_widgets_measures_collocation_keyword_extraction():
+ wl_widgets.wl_widgets_measures_collocation_keyword_extraction(main, tab = 'collocation_extractor')
def test_wl_widgets_table_settings():
table = QTableView()
@@ -223,8 +223,8 @@ def test_wl_widgets_direction():
test_wl_widgets_search_settings()
test_wl_widgets_context_settings()
- test_wl_widgets_measures_wordlist_generator()
- test_wl_widgets_measures_collocation_extractor()
+ test_wl_widgets_measures_wordlist_ngram_generation()
+ test_wl_widgets_measures_collocation_keyword_extraction()
test_wl_widgets_table_settings()
test_wl_widgets_table_settings_span_position()
diff --git a/wordless/wl_colligation_extractor.py b/wordless/wl_colligation_extractor.py
index c61cb65c2..6d31aaf82 100644
--- a/wordless/wl_colligation_extractor.py
+++ b/wordless/wl_colligation_extractor.py
@@ -214,7 +214,10 @@ def __init__(self, main):
self.combo_box_measure_bayes_factor,
self.label_measure_effect_size,
self.combo_box_measure_effect_size
- ) = wl_widgets.wl_widgets_measures_collocation_extractor(self, tab = 'collocation_extractor')
+ ) = wl_widgets.wl_widgets_measures_collocation_keyword_extraction(
+ self,
+ extraction_type = 'collocation'
+ )
self.combo_box_limit_searching.addItems([
self.tr('None'),
diff --git a/wordless/wl_collocation_extractor.py b/wordless/wl_collocation_extractor.py
index 4c13aff35..65ae8e281 100644
--- a/wordless/wl_collocation_extractor.py
+++ b/wordless/wl_collocation_extractor.py
@@ -213,7 +213,10 @@ def __init__(self, main):
self.combo_box_measure_bayes_factor,
self.label_measure_effect_size,
self.combo_box_measure_effect_size
- ) = wl_widgets.wl_widgets_measures_collocation_extractor(self, tab = 'collocation_extractor')
+ ) = wl_widgets.wl_widgets_measures_collocation_keyword_extraction(
+ self,
+ extraction_type = 'collocation'
+ )
self.combo_box_limit_searching.addItems([
self.tr('None'),
diff --git a/wordless/wl_keyword_extractor.py b/wordless/wl_keyword_extractor.py
index b893d9711..e9765a10d 100644
--- a/wordless/wl_keyword_extractor.py
+++ b/wordless/wl_keyword_extractor.py
@@ -128,7 +128,10 @@ def __init__(self, main):
self.combo_box_measure_bayes_factor,
self.label_measure_effect_size,
self.combo_box_measure_effect_size
- ) = wl_widgets.wl_widgets_measures_collocation_extractor(self, tab = 'keyword_extractor')
+ ) = wl_widgets.wl_widgets_measures_collocation_keyword_extraction(
+ self,
+ extraction_type = 'keyword'
+ )
self.combo_box_test_statistical_significance.currentTextChanged.connect(self.generation_settings_changed)
self.combo_box_measure_bayes_factor.currentTextChanged.connect(self.generation_settings_changed)
diff --git a/wordless/wl_measures/wl_measures_adjusted_freq.py b/wordless/wl_measures/wl_measures_adjusted_freq.py
index eff4fabd4..612ae628e 100644
--- a/wordless/wl_measures/wl_measures_adjusted_freq.py
+++ b/wordless/wl_measures/wl_measures_adjusted_freq.py
@@ -26,8 +26,8 @@
# Euler-Mascheroni Constant
C = -scipy.special.digamma(1)
-# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
# Average logarithmic distance
+# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def fald(main, tokens, search_term):
dists = wl_measures_dispersion._get_dists(tokens, search_term)
@@ -40,10 +40,12 @@ def fald(main, tokens, search_term):
return fald
# Average reduced frequency
+# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def farf(main, tokens, search_term):
return wl_measures_dispersion.arf(main, tokens, search_term)
# Average waiting time
+# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def fawt(main, tokens, search_term):
dists = wl_measures_dispersion._get_dists(tokens, search_term)
@@ -55,7 +57,7 @@ def fawt(main, tokens, search_term):
return fawt
# Carroll's Um
-# Reference: Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour, 3(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
+# Reference: Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
def carrolls_um(main, freqs):
freq_total = sum(freqs)
@@ -65,7 +67,7 @@ def carrolls_um(main, freqs):
return um
# Engwall's FM
-# Reference: Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University.
+# Reference: Engwall, G. (1974). Fréquence et distribution du vocabulaire dans un choix de romans français [Unpublished doctoral dissertation]. Stockholm University. | p. 53
def juillands_u(main, freqs):
d = wl_measures_dispersion.juillands_d(main, freqs)
u = max(0, d) * sum(freqs)
@@ -73,7 +75,7 @@ def juillands_u(main, freqs):
return u
# Juilland's U
-# Reference: Juilland, A., & Chang-Rodriguez, E. (1964). Frequency dictionary of Spanish words. Mouton.
+# Reference: Juilland, A., & Chang-Rodriguez, E. (1964). Frequency dictionary of Spanish words. Mouton. | p. LXVIII
def rosengrens_kf(main, freqs):
return numpy.sum(numpy.sqrt(freqs)) ** 2 / len(freqs)
diff --git a/wordless/wl_measures/wl_measures_dispersion.py b/wordless/wl_measures/wl_measures_dispersion.py
index 5a2572e92..a04e31c0e 100644
--- a/wordless/wl_measures/wl_measures_dispersion.py
+++ b/wordless/wl_measures/wl_measures_dispersion.py
@@ -23,7 +23,6 @@
from wordless.wl_measures import wl_measures_adjusted_freq
-# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def _get_dists(tokens, search_term):
positions = numpy.array([i for i, token in enumerate(tokens) if token == search_term])
@@ -37,6 +36,7 @@ def _get_dists(tokens, search_term):
return dists
# Average logarithmic distance
+# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def ald(main, tokens, search_term):
dists = _get_dists(tokens, search_term)
@@ -48,6 +48,7 @@ def ald(main, tokens, search_term):
return ald
# Average reduced frequency
+# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def arf(main, tokens, search_term):
dists = _get_dists(tokens, search_term)
@@ -60,6 +61,7 @@ def arf(main, tokens, search_term):
return arf
# Average waiting time
+# Reference: Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. Journal of Quantitative Linguistics, 9(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
def awt(main, tokens, search_term):
dists = _get_dists(tokens, search_term)
@@ -71,7 +73,7 @@ def awt(main, tokens, search_term):
return awt
# Carroll's D₂
-# Reference: Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour, 3(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
+# Reference: Carroll, J. B. (1970). An alternative to Juilland's usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970(2), i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
def carrolls_d2(main, freqs):
freqs = numpy.array(freqs)
@@ -108,7 +110,7 @@ def griess_dp(main, freqs):
return dp
# Juilland's D
-# Reference: Juilland, A., & Chang-Rodriguez, E. (1964). Frequency dictionary of Spanish words. Mouton.
+# Reference: Juilland, A., & Chang-Rodriguez, E. (1964). Frequency dictionary of Spanish words. Mouton. | p. LIII
def juillands_d(main, freqs):
freqs = numpy.array(freqs)
diff --git a/wordless/wl_measures/wl_measures_effect_size.py b/wordless/wl_measures/wl_measures_effect_size.py
index 5073da287..348566467 100644
--- a/wordless/wl_measures/wl_measures_effect_size.py
+++ b/wordless/wl_measures/wl_measures_effect_size.py
@@ -40,15 +40,22 @@ def pct_diff(main, o11s, o12s, o21s, o22s):
)
)
+# Conditional probability
+# Reference: Durrant, P. (2008). High frequency collocations and second language learning [Doctoral dissertation, University of Nottingham]. Nottingham eTheses. https://eprints.nottingham.ac.uk/10622/1/final_thesis.pdf | p. 84
+def conditional_probability(main, o11s, o12s, o21s, o22s):
+ _, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s)
+
+ return wl_measure_utils.numpy_divide(o11s, ox1s) * 100
+
# Cubic association ratio
-# Reference: Daille, B. (1994). Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid=
+# Reference: Daille, B. (1994). Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid= | p. 139
def im3(main, o11s, o12s, o21s, o22s):
e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 3, e11s))
# Dice-Sørensen coefficient
-# Reference: Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1), 1–38.
+# Reference: Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1), 1–38. | p. 8
def dice_sorensen_coeff(main, o11s, o12s, o21s, o22s):
o1xs, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s)
@@ -56,8 +63,8 @@ def dice_sorensen_coeff(main, o11s, o12s, o21s, o22s):
# Difference coefficient
# References:
-# Hofland, K., & Johanson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities.
-# Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), Corpus approaches to discourse: A critical review (pp. 225–258). Routledge.
+# Hofland, K., & Johansson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. | p. 14
+# Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), Corpus approaches to discourse: A critical review (pp. 225–258). Routledge. | p. 236
def diff_coeff(main, o11s, o12s, o21s, o22s):
_, _, ox1s, ox2s = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s)
@@ -71,12 +78,12 @@ def diff_coeff(main, o11s, o12s, o21s, o22s):
)
# Jaccard index
-# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847
+# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 | p. 48
def jaccard_index(main, o11s, o12s, o21s, o22s):
return wl_measure_utils.numpy_divide(o11s, o11s + o12s + o21s)
# Kilgarriff's ratio
-# Reference: Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (p. 171). University of Liverpool.
+# Reference: Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (CL2009) (Article 171). University of Liverpool.
def kilgarriffs_ratio(main, o11s, o12s, o21s, o22s):
smoothing_param = main.settings_custom['measures']['effect_size']['kilgarriffs_ratio']['smoothing_param']
@@ -86,14 +93,14 @@ def kilgarriffs_ratio(main, o11s, o12s, o21s, o22s):
)
# logDice
-# Reference: Rychlý, P. (2008). A lexicographyer-friendly association score. In P. Sojka & A. Horák (Eds.), Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing. Masaryk University
+# Reference: Rychlý, P. (2008). A lexicographer-friendly association score. In P. Sojka & A. Horák (Eds.), Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing (pp. 6–9). Masaryk University.
def log_dice(main, o11s, o12s, o21s, o22s):
o1xs, _, ox1s, _ = wl_measures_statistical_significance.get_freqs_marginal(o11s, o12s, o21s, o22s)
return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s), default = 14)
# Log-frequency biased MD
-# Reference: Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association.
+# Reference: Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association. | p. 621
def lfmd(main, o11s, o12s, o21s, o22s):
e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
@@ -121,8 +128,8 @@ def log_ratio(main, o11s, o12s, o21s, o22s):
# MI.log-f
# References:
-# Kilgarriff, A., & Tugwell, D. (2002). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. In Proceedings of the 8th Machine Translation Summit (pp. 187–190). European Association for Machine Translation.
-# Lexical Computing. (2015, July 8). Statistics used in Sketch Engine. Sketch Engine. https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/
+# Kilgarriff, A., & Tugwell, D. (2001). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. In B. Maegaard (Ed.), Proceedings of Machine Translation Summit VIII (pp. 187–190). European Association for Machine Translation.
+# Lexical Computing. (2015, July 8). Statistics used in Sketch Engine. Sketch Engine. https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/ | p. 4
def mi_log_f(main, o11s, o12s, o21s, o22s):
e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
@@ -139,7 +146,7 @@ def min_sensitivity(main, o11s, o12s, o21s, o22s):
)
# Mutual Dependency
-# Reference: Thanopoulos, A, Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González, & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association.
+# Reference: Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association. | p. 621
def md(main, o11s, o12s, o21s, o22s):
e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
@@ -153,7 +160,7 @@ def me(main, o11s, o12s, o21s, o22s):
return o11s * wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s)
# Mutual information
-# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847
+# Reference: Dunning, T. E. (1998). Finding structure in text, genome and other symbolic sequences [Doctoral dissertation, University of Sheffield]. arXiv. https://arxiv.org/pdf/1207.1847 | pp. 49–52
def mi(main, o11s, o12s, o21s, o22s):
oxxs = o11s + o12s + o21s + o22s
e11s, e12s, e21s, e22s = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
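Reviewer note: several measures in this hunk lean on `wl_measures_statistical_significance.get_freqs_expected`, which presumably computes the usual expected frequencies of a 2×2 contingency table (row total × column total / grand total). A minimal scalar sketch under that assumption (the real helper works on NumPy arrays; the function name below is mine):

```python
# Hypothetical sketch of expected frequencies for a 2x2 contingency table.
# Assumption: get_freqs_expected implements e_ij = row_i total * column_j
# total / grand total; this is NOT the library code itself.
def freqs_expected(o11, o12, o21, o22):
    oxx = o11 + o12 + o21 + o22  # grand total
    o1x = o11 + o12              # row 1 total
    ox1 = o11 + o21              # column 1 total
    e11 = o1x * ox1 / oxx
    e12 = o1x * (oxx - ox1) / oxx
    e21 = (oxx - o1x) * ox1 / oxx
    e22 = (oxx - o1x) * (oxx - ox1) / oxx
    return e11, e12, e21, e22
```

By construction the four expected values sum back to the grand total, which is a cheap invariant to check when porting such code.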
@@ -200,7 +207,7 @@ def poisson_collocation_measure(main, o11s, o12s, o21s, o22s):
)
# Squared association ratio
-# Reference: Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. UCREL technical papers (Vol. 5). Lancaster University.
+# Reference: Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. UCREL technical papers (Vol. 5). Lancaster University. | p. 21
def im2(main, o11s, o12s, o21s, o22s):
e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
diff --git a/wordless/wl_measures/wl_measures_lexical_density_diversity.py b/wordless/wl_measures/wl_measures_lexical_density_diversity.py
index e0491c5cb..9985fccfe 100644
--- a/wordless/wl_measures/wl_measures_lexical_density_diversity.py
+++ b/wordless/wl_measures/wl_measures_lexical_density_diversity.py
@@ -39,7 +39,7 @@ def brunets_index(main, text):
# Corrected TTR
# References:
# Carroll, J. B. (1964). Language and thought. Prentice-Hall.
-# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment (p. 26). Palgrave Macmillan.
+# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan. | p. 26
def cttr(main, text):
return text.num_types / numpy.sqrt(2 * text.num_tokens)
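Aside: the two root-normalized TTR variants touched in this file (Corrected TTR here, Root TTR further down) reduce to one-liners. A standalone sketch with plain numbers in place of the `text` object, purely for illustration (function signatures are mine, not the module's):

```python
import math

def cttr(num_types, num_tokens):
    # Corrected TTR (Carroll 1964): types / sqrt(2 * tokens)
    return num_types / math.sqrt(2 * num_tokens)

def rttr(num_types, num_tokens):
    # Root TTR (Guiraud's index): types / sqrt(tokens)
    return num_types / math.sqrt(num_tokens)
```

The two differ only by a constant factor of sqrt(2), so they rank texts identically.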
@@ -115,7 +115,7 @@ def honores_stat(main, text):
return r
# Lexical density
-# Reference: Halliday, M. A. K. (1989). Spoken and written language (2nd ed., p. 64). Oxford University Press.
+# Reference: Halliday, M. A. K. (1989). Spoken and written language (2nd ed.). Oxford University Press. | p. 64
def lexical_density(main, text):
if text.lang in main.settings_global['pos_taggers']:
wl_pos_tagging.wl_pos_tag_universal(main, text.get_tokens_flat(), lang = text.lang, tagged = text.tagged)
@@ -135,19 +135,19 @@ def lexical_density(main, text):
# LogTTR
# Herdan:
-# Herdan, G. (1960). Type-token mathematics: A textbook of mathematical linguistics (p. 28). Mouton.
+# Herdan, G. (1960). Type-token mathematics: A textbook of mathematical linguistics. Mouton. | p. 28
# Somers:
# Somers, H. H. (1966). Statistical methods in literary analysis. In J. Leeds (Ed.), The computer and literary style (pp. 128–140). Kent State University Press.
-# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment (p. 28). Palgrave Macmillan.
+# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan. | p. 28
# Rubet:
# Dugast, D. (1979). Vocabulaire et stylistique: I théâtre et dialogue, travaux de linguistique quantitative. Slatkine.
-# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment (p. 28). Palgrave Macmillan.
+# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan. | p. 28
# Maas:
# Maas, H.-D. (1972). Über den zusammenhang zwischen wortschatzumfang und länge eines textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.
# Dugast:
# Dugast, D. (1978). Sur quoi se fonde la notion d'étendue théoretique du vocabulaire? Le Français Moderne, 46, 25–32.
# Dugast, D. (1979). Vocabulaire et stylistique: I théâtre et dialogue, travaux de linguistique quantitative. Slatkine.
-# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment (p. 28). Palgrave Macmillan.
+# Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan. | p. 28
def logttr(main, text):
variant = main.settings_custom['measures']['lexical_density_diversity']['logttr']['variant']
@@ -167,7 +167,7 @@ def logttr(main, text):
# Mean segmental TTR
# References:
# Johnson, W. (1944). Studies in language behavior: I. A program of research. Psychological Monographs, 56(2), 1–15. https://doi.org/10.1037/h0093508
-# McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [Doctoral dissertation, The University of Memphis] (p. 37). ProQuest Dissertations and Theses Global.
+# McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) (Publication No. 3199485) [Doctoral dissertation, The University of Memphis]. ProQuest Dissertations and Theses Global. | p. 37
def msttr(main, text):
num_tokens_seg = main.settings_custom['measures']['lexical_density_diversity']['msttr']['num_tokens_in_each_seg']
@@ -187,7 +187,7 @@ def msttr(main, text):
# Measure of textual lexical diversity
# References:
-# McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [Doctoral dissertation, The University of Memphis] (pp. 95–96, 99–100). ProQuest Dissertations and Theses Global.
+# McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) (Publication No. 3199485) [Doctoral dissertation, The University of Memphis]. ProQuest Dissertations and Theses Global. | pp. 95–96, 99–100
# McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. https://doi.org/10.3758/BRM.42.2.381
def mtld(main, text):
mtlds = numpy.empty(shape = 2)
@@ -275,7 +275,7 @@ def popescu_macutek_altmanns_b1_b2_b3_b4_b5(main, text):
return b1, b2, b3, b4, b5
# Popescu's Râ
-# Reference: Popescu, I.-I. (2009). Word frequency studies (pp. 18, 30, 33). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | pp. 18, 30, 33
def popescus_r1(main, text):
types_freqs = collections.Counter(text.get_tokens_flat())
ranks = numpy.empty(shape = text.num_types)
@@ -309,7 +309,7 @@ def popescus_r1(main, text):
return r1
# Popescu's Râ
-# Reference: Popescu, I.-I. (2009). Word frequency studies (pp. 35–36, 38). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | pp. 35–36, 38
def popescus_r2(main, text):
types_freqs = collections.Counter(text.get_tokens_flat())
freqs_nums_types = sorted(collections.Counter(types_freqs.values()).items())
@@ -344,7 +344,7 @@ def popescus_r2(main, text):
return r2
# Popescu's Râ
-# Reference: Popescu, I.-I. (2009). Word frequency studies (pp. 48–49, 53). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | pp. 48–49, 53
def popescus_r3(main, text):
types_freqs = collections.Counter(text.get_tokens_flat())
ranks_freqs = [
@@ -373,7 +373,7 @@ def popescus_r3(main, text):
return r3
# Popescu's Râ
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 57). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 57
def popescus_r4(main, text):
types_freqs = collections.Counter(text.get_tokens_flat())
@@ -389,7 +389,7 @@ def popescus_r4(main, text):
return r4
# Repeat rate
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 166). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 166
def repeat_rate(main, text):
use_data = main.settings_custom['measures']['lexical_density_diversity']['repeat_rate']['use_data']
@@ -414,7 +414,7 @@ def rttr(main, text):
return text.num_types / numpy.sqrt(text.num_tokens)
# Shannon entropy
-# Reference: Popescu, I.-I. (2009). Word frequency studies (p. 173). Mouton de Gruyter.
+# Reference: Popescu, I.-I. (2009). Word frequency studies. Mouton de Gruyter. | p. 173
def shannon_entropy(main, text):
use_data = main.settings_custom['measures']['lexical_density_diversity']['shannon_entropy']['use_data']
@@ -450,7 +450,7 @@ def ttr(main, text):
return text.num_types / text.num_tokens
# vocd-D
-# Reference: Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment (pp. 51, 56–57). Palgrave Macmillan.
+# Reference: Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan. | pp. 51, 56–57
def vocdd(main, text):
def ttr(n, d):
return (d / n) * (numpy.sqrt(1 + 2 * n / d) - 1)
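The inner `ttr(n, d)` above is the vocd-D model curve: the expected TTR of a sample of n tokens for a given diversity parameter D, to which the observed TTR-versus-n data are fitted. A standalone sketch of its qualitative behavior (the wrapper name is mine):

```python
import math

def ttr_curve(n, d):
    # vocd-D model curve: expected TTR for a sample of n tokens given
    # diversity parameter D; higher D means a more diverse vocabulary
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)
```

The curve starts near 1 for tiny samples and decays as n grows, with larger D slowing the decay; fitting D to the empirical curve is what makes the measure less length-sensitive than raw TTR.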
@@ -480,7 +480,7 @@ def ttr(n, d):
return popt[0]
# Yule's characteristic K
-# Reference: Yule, G. U. (1944). The statistical study of literary vocabulary (pp. 52–53). Cambridge University Press.
+# Reference: Yule, G. U. (1944). The statistical study of literary vocabulary. Cambridge University Press. | pp. 52–53
def yules_characteristic_k(main, text):
types_freqs = collections.Counter(text.get_tokens_flat())
freqs_nums_types = collections.Counter(types_freqs.values())
@@ -493,7 +493,7 @@ def yules_characteristic_k(main, text):
return k
# Yule's Index of Diversity
-# Reference: Williams, C. B. (1970). Style and vocabulary: Numerical studies (p. 100). Griffin.
+# Reference: Williams, C. B. (1970). Style and vocabulary: Numerical studies. Griffin. | p. 100
def yules_index_of_diversity(main, text):
types_freqs = collections.Counter(text.get_tokens_flat())
freqs_nums_types = collections.Counter(types_freqs.values())
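The two nested `Counter` calls above (also used by Yule's K just before) build the frequency spectrum: how many types occur exactly r times. A toy illustration with made-up tokens:

```python
from collections import Counter

# Example tokens are mine, not from the patch
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the']
types_freqs = Counter(tokens)                     # frequency of each type
freqs_nums_types = Counter(types_freqs.values())  # spectrum: freq -> number of types
# 'the' occurs 3 times; the other four types each occur once
```

Summing freq × number-of-types over the spectrum recovers the token count, a handy invariant for both of Yule's measures.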
diff --git a/wordless/wl_measures/wl_measures_readability.py b/wordless/wl_measures/wl_measures_readability.py
index d5faaf697..6838c4b15 100644
--- a/wordless/wl_measures/wl_measures_readability.py
+++ b/wordless/wl_measures/wl_measures_readability.py
@@ -183,7 +183,7 @@ def get_num_sentences_sample(text, sample, sample_start):
)
# Al-Heeti's readability formula
-# Reference: Al-Heeti, K. N. (1984). Judgment analysis technique applied to readability prediction of Arabic reading material [Doctoral dissertation, University of Northern Colorado] (pp. 102, 104, 106). ProQuest Dissertations and Theses Global.
+# Reference: Al-Heeti, K. N. (1984). Judgment analysis technique applied to readability prediction of Arabic reading material [Doctoral dissertation, University of Northern Colorado]. ProQuest Dissertations and Theses Global. | pp. 102, 104, 106
def rd(main, text):
if text.lang == 'ara':
text = get_nums(main, text)
@@ -232,9 +232,9 @@ def aari(main, text):
# Automated Readability Index
# References:
-# Smith, E. A., & Senter, R. J. (1967). Automated readability index (p. 8). Aerospace Medical Research Laboratories. https://apps.dtic.mil/sti/pdfs/AD0667273.pdf
+# Smith, E. A., & Senter, R. J. (1967). Automated readability index. Aerospace Medical Research Laboratories. https://apps.dtic.mil/sti/pdfs/AD0667273.pdf | p. 8
# Navy:
-# Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75, p. 14). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf
+# Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf | p. 14
def ari(main, text):
text = get_nums(main, text)
@@ -257,7 +257,7 @@ def ari(main, text):
return ari
# Bormuth's cloze mean & grade placement
-# Reference: Bormuth, J. R. (1969). Development of readability analyses (pp. 152, 160). U.S. Department of Health, Education, and Welfare. http://files.eric.ed.gov/fulltext/ED029166.pdf
+# Reference: Bormuth, J. R. (1969). Development of readability analyses. U.S. Department of Health, Education, and Welfare. http://files.eric.ed.gov/fulltext/ED029166.pdf | pp. 152, 160
def bormuths_cloze_mean(main, text):
if text.lang.startswith('eng_'):
text = get_nums(main, text)
@@ -515,7 +515,7 @@ def devereux_readability_index(main, text):
# Dickes-Steiwer Handformel
# References:
# Dickes, P., & Steiwer, L. (1977). Ausarbeitung von lesbarkeitsformeln für die deutsche sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 9(1), 20–28.
-# Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (p. 57). Jugend und Volk.
+# Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache. Jugend und Volk. | p. 57
def dickes_steiwer_handformel(main, text):
text = get_nums(main, text)
@@ -547,7 +547,7 @@ def elf(main, text):
return elf
# Flesch-Kincaid grade level
-# Reference: Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75, p. 14). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf
+# Reference: Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf | p. 14
def gl(main, text):
if text.lang in main.settings_global['syl_tokenizers']:
text = get_nums(main, text)
@@ -571,7 +571,7 @@ def gl(main, text):
# Powers-Sumner-Kearl:
# Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105. https://doi.org/10.1037/h0043254
# Dutch (Douma):
-# Douma, W. H. (1960). De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules [Readability of Dutch farm papers: A discussion and application of readability-formulas] (p. 453). Afdeling Sociologie en Sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323
+# Douma, W. H. (1960). De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules [Readability of Dutch farm papers: A discussion and application of readability-formulas]. Afdeling Sociologie en Sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 | p. 453
# Dutch (Brouwer's Leesindex A):
# Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheid van Nederlands proza. Paedagogische Studiën, 40, 454–464. https://objects.library.uu.nl/reader/index.php?obj=1874-205260&lan=en
# French:
@@ -579,17 +579,17 @@ def gl(main, text):
# Sitbon, L., Bellot, P., & Blache, P. (2007). Éléments pour adapter les systèmes de recherche d'information aux dyslexiques. Revue TAL : traitement automatique des langues, 48(2), 123–147.
# German:
# Amstad, T. (1978). Wie verständlich sind unsere Zeitungen? [Unpublished doctoral dissertation]. University of Zurich.
-# Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (p. 56). Jugend und Volk.
+# Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache. Jugend und Volk. | p. 56
# Italian:
# Franchina, V., & Vacca, R. (1986). Adaptation of Flesh readability index on a bilingual text written by the same author both in Italian and English languages. Linguaggi, 3, 47â49.
# Garais, E. (2011). Web applications readability. Journal of Information Systems and Operations Management, 5(1), 117â121. http://www.rebe.rau.ro/RePEc/rau/jisomg/SP11/JISOM-SP11-A13.pdf
# Russian:
-# Oborneva, I. V. (2006). Автоматизированная оценка сложности учебных текстов на основе статистических параметров [Doctoral dissertation, Institute for Strategy of Education Development of the Russian Academy of Education] (p. 13). Freereferats.ru. https://static.freereferats.ru/_avtoreferats/01002881899.pdf?ver=3
+# Oborneva, I. V. (2006). Автоматизированная оценка сложности учебных текстов на основе статистических параметров [Doctoral dissertation, Institute for Strategy of Education Development of the Russian Academy of Education]. Freereferats.ru. https://static.freereferats.ru/_avtoreferats/01002881899.pdf?ver=3 | p. 13
# Spanish (Fernández Huerta):
# Fernández Huerta, J. (1959). Medidas sencillas de lecturabilidad. Consigna, 214, 29–32.
# Garais, E. (2011). Web applications readability. Journal of Information Systems and Operations Management, 5(1), 117â121. http://www.rebe.rau.ro/RePEc/rau/jisomg/SP11/JISOM-SP11-A13.pdf
# Spanish (Szigriszt Pazos):
-# Szigriszt Pazos, F. (1993). Sistemas predictivos de legibilidad del mensaje escrito: Formula de perspicuidad [Doctoral dissertation, Complutense University of Madrid] (p. 247). Biblos-e Archivo. https://repositorio.uam.es/bitstream/handle/10486/2488/3907_barrio_cantalejo_ines_maria.pdf?sequence=1&isAllowed=y
+# Szigriszt Pazos, F. (1993). Sistemas predictivos de legibilidad del mensaje escrito: Formula de perspicuidad [Doctoral dissertation, Complutense University of Madrid]. Biblos-e Archivo. https://repositorio.uam.es/bitstream/handle/10486/2488/3907_barrio_cantalejo_ines_maria.pdf?sequence=1&isAllowed=y | p. 247
# Ukrainian:
# Partiko, Z. V. (2001). Zagal'ne redaguvannja. Normativni osnovi. Afiša.
# Grzybek, P. (2010). Text difficulty and the Arens-Altmann law. In P. Grzybek, E. Kelih, & J. Mačutek (Eds.), Text and language: Structures · functions · interrelations quantitative perspectives. Praesens Verlag. https://www.iqla.org/includes/basic_references/qualico_2009_proceedings_Grzybek_Kelih_Macutek_2009.pdf
@@ -707,7 +707,7 @@ def re_farr_jenkins_paterson(main, text):
return re
# FORCAST
-# Reference: Caylor, J. S., & Sticht, T. G. (1973). Development of a simple readability index for job reading material (p. 3). Human Resource Research Organization. https://ia902703.us.archive.org/31/items/ERIC_ED076707/ERIC_ED076707.pdf
+# Reference: Caylor, J. S., & Sticht, T. G. (1973). Development of a simple readability index for job reading material. Human Resource Research Organization. https://ia902703.us.archive.org/31/items/ERIC_ED076707/ERIC_ED076707.pdf | p. 3
def rgl(main, text):
if text.lang in main.settings_global['syl_tokenizers']:
text = get_nums(main, text)
@@ -728,7 +728,7 @@ def rgl(main, text):
# Fucks's Stilcharakteristik
# References:
# Fucks, W. (1955). Unterschied des prosastils von dichtern und anderen schriftstellern: Ein beispiel mathematischer stilanalyse. Bouvier.
-# Briest, W. (1974). Kann man Verständlichkeit messen? STUF - Language Typology and Universals, 27(1-3), 543–563. https://doi.org/10.1524/stuf.1974.27.13.543
+# Briest, W. (1974). Kann man Verständlichkeit messen? STUF - Language Typology and Universals, 27(1–3), 543–563. https://doi.org/10.1524/stuf.1974.27.13.543
def fuckss_stilcharakteristik(main, text):
if text.lang in main.settings_global['syl_tokenizers']:
text = get_nums(main, text)
@@ -764,11 +764,11 @@ def gulpease(main, text):
# Gunning Fog Index
# References:
-# Gunning, R. (1968). The technique of clear writing (revised ed., p. 38). McGraw-Hill Book Company.
+# Gunning, R. (1968). The technique of clear writing (revised ed.). McGraw-Hill Book Company. | p. 38
# Powers-Sumner-Kearl:
# Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105. https://doi.org/10.1037/h0043254
# Navy:
-# Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75, p. 14). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf
+# Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for Navy enlisted personnel (Report No. RBR 8-75). Naval Air Station Memphis. https://apps.dtic.mil/sti/pdfs/ADA006655.pdf | p. 14
# Polish:
# Pisarek, W. (1969). Jak mierzyć zrozumiałość tekstu? Zeszyty Prasoznawcze, 4(42), 35–48.
def fog_index(main, text):
@@ -889,7 +889,7 @@ def mu(main, text):
return mu
# Lensear Write Formula
-# Reference: O'Hayre, J. (1966). Gobbledygook has gotta go (p. 8). U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf
+# Reference: O'Hayre, J. (1966). Gobbledygook has gotta go. U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf | p. 8
def lensear_write_formula(main, text):
if text.lang.startswith('eng_') and text.lang in main.settings_global['syl_tokenizers']:
text = get_nums(main, text)
@@ -945,10 +945,10 @@ def lix(main, text):
# Lorge Readability Index
# References:
# Lorge, I. (1944). Predicting readability. Teachers College Record, 45, 404–419.
-# DuBay, W. H. (2006). In W. H. DuBay (Ed.), The classic readability studies (pp. 46–60). Impact Information. https://files.eric.ed.gov/fulltext/ED506404.pdf
+# Lorge, I. (1944). Predicting readability. In W. H. DuBay (Ed.), The classic readability studies (pp. 46–60). Impact Information. https://files.eric.ed.gov/fulltext/ED506404.pdf
# Corrected:
# Lorge, I. (1948). The Lorge and Flesch readability formulae: A correction. School and Society, 67, 141–142.
-# DuBay, W. H. (2006). In W. H. DuBay (Ed.), The classic readability studies (pp. 46–60). Impact Information. https://files.eric.ed.gov/fulltext/ED506404.pdf
+# Lorge, I. (1944). Predicting readability. In W. H. DuBay (Ed.), The classic readability studies (pp. 46–60). Impact Information. https://files.eric.ed.gov/fulltext/ED506404.pdf
def lorge_readability_index(main, text):
if text.lang.startswith('eng_'):
text = get_nums(main, text)
@@ -987,7 +987,7 @@ def lorge_readability_index(main, text):
return lorge
# Luong-Nguyen-Dinh's readability formula
-# Reference: Luong, A.-V., Nguyen, D., & Dinh, D. (2018). A new formula for Vietnamese text readability assessment. 2018 10th International Conference on Knowledge and Systems Engineering (KSE) (pp. 198–202). IEEE. https://doi.org/10.1109/KSE.2018.8573379
+# Reference: Luong, A.-V., Nguyen, D., & Dinh, D. (2018). A new formula for Vietnamese text readability assessment. In T. M. Phuong & M. L. Nguyen (Eds.), Proceedings of 2018 10th International Conference on Knowledge and Systems Engineering (KSE) (pp. 198–202). IEEE. https://doi.org/10.1109/KSE.2018.8573379
def luong_nguyen_dinhs_readability_formula(main, text):
if text.lang == 'vie':
text = get_nums(main, text)
@@ -1026,7 +1026,7 @@ def eflaw(main, text):
return eflaw
# neue Wiener Literaturformeln
-# Reference: Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (p. 82). Jugend und Volk.
+# Reference: Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache. Jugend und Volk. | p. 82
def nwl(main, text):
if text.lang.startswith('deu_'):
text = get_nums(main, text)
@@ -1054,7 +1054,7 @@ def nwl(main, text):
return nwl
# neue Wiener Sachtextformel
-# Reference: Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (pp. 83–84). Jugend und Volk.
+# Reference: Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache. Jugend und Volk. | pp. 83–84
def nws(main, text):
if text.lang.startswith('deu_'):
text = get_nums(main, text)
@@ -1173,7 +1173,7 @@ def rix(main, text):
# References:
# McLaughlin, G. H. (1969). SMOG Grading: A new readability formula. Journal of Reading, 12(8), 639–646.
# German:
-# Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache (p. 78). Jugend und Volk.
+# Bamberger, R., & Vanecek, E. (1984). Lesen-verstehen-lernen-schreiben: Die schwierigkeitsstufen von texten in deutscher sprache. Jugend und Volk. | p. 78
def smog_grading(main, text):
if text.lang in main.settings_global['syl_tokenizers']:
text = get_nums(main, text)
diff --git a/wordless/wl_measures/wl_measures_statistical_significance.py b/wordless/wl_measures/wl_measures_statistical_significance.py
index 39aab6407..a6d9f9cfe 100644
--- a/wordless/wl_measures/wl_measures_statistical_significance.py
+++ b/wordless/wl_measures/wl_measures_statistical_significance.py
@@ -109,7 +109,7 @@ def log_likelihood_ratio_test(main, o11s, o12s, o21s, o22s):
return gs, p_vals
# Mann-Whitney U test
-# References: Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil
+# Reference: Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 97–133. https://doi.org/10.1075/ijcl.6.1.05kil | pp. 103–104
def mann_whitney_u_test(main, freqs_x1s, freqs_x2s):
settings = main.settings_custom['measures']['statistical_significance']['mann_whitney_u_test']
@@ -131,8 +131,8 @@ def mann_whitney_u_test(main, freqs_x1s, freqs_x2s):
# Pearson's chi-squared test
# References:
-# Hofland, K., & Johanson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities.
-# Oakes, M. P. (1998). Statistics for corpus linguistics. Edinburgh University Press.
+# Hofland, K., & Johansson, S. (1982). Word frequencies in British and American English. Norwegian Computing Centre for the Humanities. | p. 12
+# Oakes, M. P. (1998). Statistics for corpus linguistics. Edinburgh University Press. | p. 25
def pearsons_chi_squared_test(main, o11s, o12s, o21s, o22s):
settings = main.settings_custom['measures']['statistical_significance']['pearsons_chi_squared_test']
@@ -155,7 +155,7 @@ def pearsons_chi_squared_test(main, o11s, o12s, o21s, o22s):
return chi2s, p_vals
# Student's t-test (1-sample)
-# References: Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (Ed.), Lexical acquisition: Exploiting on-line resources to build a lexicon (pp. 115–164). Psychology Press.
+# Reference: Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (Ed.), Lexical acquisition: Exploiting on-line resources to build a lexicon (pp. 115–164). Psychology Press. | pp. 120–126
def students_t_test_1_sample(main, o11s, o12s, o21s, o22s):
settings = main.settings_custom['measures']['statistical_significance']['students_t_test_1_sample']
@@ -178,7 +178,7 @@ def students_t_test_1_sample(main, o11s, o12s, o21s, o22s):
return t_stats, p_vals
# Student's t-test (2-sample)
-# References: Paquot, M., & Bestgen, Y. (2009). Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. Language and Computers, 68, 247–269.
+# Reference: Paquot, M., & Bestgen, Y. (2009). Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. Language and Computers, 68, 247–269. | pp. 252–253
def students_t_test_2_sample(main, freqs_x1s, freqs_x2s):
settings = main.settings_custom['measures']['statistical_significance']['students_t_test_2_sample']
@@ -218,7 +218,7 @@ def _z_test_p_val(z_scores, direction):
return p_vals
# Z-test
-# References: Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), Proceedings of the symposium on statistical association methods for mechanized documentation (pp. 61–148). National Bureau of Standards.
+# Reference: Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), Statistical association methods for mechanized documentation: Symposium proceedings (pp. 61–148). National Bureau of Standards. | p. 69
def z_test(main, o11s, o12s, o21s, o22s):
settings = main.settings_custom['measures']['statistical_significance']['z_test']
diff --git a/wordless/wl_ngram_generator.py b/wordless/wl_ngram_generator.py
index d81f12738..da5a48270 100644
--- a/wordless/wl_ngram_generator.py
+++ b/wordless/wl_ngram_generator.py
@@ -244,7 +244,7 @@ def __init__(self, main):
self.combo_box_measure_dispersion,
self.label_measure_adjusted_freq,
self.combo_box_measure_adjusted_freq
- ) = wl_widgets.wl_widgets_measures_wordlist_generator(self)
+ ) = wl_widgets.wl_widgets_measures_wordlist_ngram_generation(self)
self.spin_box_allow_skipped_tokens.setRange(1, 20)
diff --git a/wordless/wl_settings/wl_settings_global.py b/wordless/wl_settings/wl_settings_global.py
index 89c570475..329417dad 100644
--- a/wordless/wl_settings/wl_settings_global.py
+++ b/wordless/wl_settings/wl_settings_global.py
@@ -3594,18 +3594,19 @@ def init_settings_global():
'effect_size': {
_tr('wl_settings_global', 'None'): 'none',
'%DIFF': 'pct_diff',
+ _tr('wl_settings_global', 'Conditional probability'): 'conditional_probability',
_tr('wl_settings_global', 'Cubic association ratio'): 'im3',
- _tr('wl_settings_global', "Dice's coefficient"): 'dices_coeff',
+ _tr('wl_settings_global', "Dice-Sørensen coefficient"): 'dice_sorensen_coeff',
_tr('wl_settings_global', 'Difference coefficient'): 'diff_coeff',
_tr('wl_settings_global', 'Jaccard index'): 'jaccard_index',
_tr('wl_settings_global', "Kilgarriff's ratio"): 'kilgarriffs_ratio',
'logDice': 'log_dice',
_tr('wl_settings_global', 'Log-frequency biased MD'): 'lfmd',
- _tr('wl_settings_global', 'Log ratio'): 'log_ratio',
+ _tr('wl_settings_global', 'Log Ratio'): 'log_ratio',
'MI.log-f': 'mi_log_f',
_tr('wl_settings_global', 'Minimum sensitivity'): 'min_sensitivity',
- _tr('wl_settings_global', 'Mutual dependency'): 'md',
- _tr('wl_settings_global', 'Mutual expectation'): 'me',
+ _tr('wl_settings_global', 'Mutual Dependency'): 'md',
+ _tr('wl_settings_global', 'Mutual Expectation'): 'me',
_tr('wl_settings_global', 'Mutual information'): 'mi',
_tr('wl_settings_global', 'Odds ratio'): 'or',
_tr('wl_settings_global', 'Pointwise mutual information'): 'pmi',
@@ -3738,8 +3739,8 @@ def init_settings_global():
'col_text': None,
'func': None,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': True
},
'fishers_exact_test': {
@@ -3747,64 +3748,64 @@ def init_settings_global():
'col_text': None,
'func': wl_measures_statistical_significance.fishers_exact_test,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': True
},
'log_likelihood_ratio_test': {
'col_text': _tr('wl_settings_global', 'Log-likelihood Ratio'),
'func': wl_measures_statistical_significance.log_likelihood_ratio_test,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': True
},
'mann_whitney_u_test': {
'col_text': 'U1',
'func': wl_measures_statistical_significance.mann_whitney_u_test,
'to_sections': True,
- 'collocation_extractor': False,
- 'keyword_extractor': True
+ 'collocation': False,
+ 'keyword': True
},
'pearsons_chi_squared_test': {
'col_text': 'χ2',
'func': wl_measures_statistical_significance.pearsons_chi_squared_test,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': True
},
'students_t_test_1_sample': {
'col_text': _tr('wl_settings_global', 't-statistic'),
'func': wl_measures_statistical_significance.students_t_test_1_sample,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': False
},
'students_t_test_2_sample': {
'col_text': _tr('wl_settings_global', 't-statistic'),
'func': wl_measures_statistical_significance.students_t_test_2_sample,
'to_sections': True,
- 'collocation_extractor': False,
- 'keyword_extractor': True
+ 'collocation': False,
+ 'keyword': True
},
'z_test': {
'col_text': _tr('wl_settings_global', 'z-score'),
'func': wl_measures_statistical_significance.z_test,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': False
},
'z_test_berry_rogghe': {
'col_text': _tr('wl_settings_global', 'z-score'),
'func': wl_measures_statistical_significance.z_test_berry_rogghe,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': False
+ 'collocation': True,
+ 'keyword': False
}
},
@@ -3812,124 +3813,171 @@ def init_settings_global():
'none': {
'func': None,
'to_sections': None,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': True
},
'log_likelihood_ratio_test': {
'func': wl_measures_bayes_factor.bayes_factor_log_likelihood_ratio_test,
'to_sections': False,
- 'collocation_extractor': True,
- 'keyword_extractor': True
+ 'collocation': True,
+ 'keyword': True
},
'students_t_test_2_sample': {
'func': wl_measures_bayes_factor.bayes_factor_students_t_test_2_sample,
'to_sections': True,
- 'collocation_extractor': False,
- 'keyword_extractor': True
+ 'collocation': False,
+ 'keyword': True
},
},
'measures_effect_size': {
'none': {
'col_text': None,
- 'func': None
+ 'func': None,
+ 'collocation': True,
+ 'keyword': True
},
'pct_diff': {
'col_text': '%DIFF',
- 'func': wl_measures_effect_size.pct_diff
+ 'func': wl_measures_effect_size.pct_diff,
+ 'collocation': False,
+ 'keyword': True
+ },
+
+ 'conditional_probability': {
+ 'col_text': 'P',
+ 'func': wl_measures_effect_size.conditional_probability,
+ 'collocation': True,
+ 'keyword': False
},
'im3': {
'col_text': 'IM³',
- 'func': wl_measures_effect_size.im3
+ 'func': wl_measures_effect_size.im3,
+ 'collocation': True,
+ 'keyword': True
},
'dice_sorensen_coeff': {
- 'col_text': _tr('wl_settings_global', 'Dice-Sørensen coefficient'),
- 'func': wl_measures_effect_size.dice_sorensen_coeff
+ 'col_text': _tr('wl_settings_global', 'Dice-Sørensen Coefficient'),
+ 'func': wl_measures_effect_size.dice_sorensen_coeff,
+ 'collocation': True,
+ 'keyword': False
},
'diff_coeff': {
'col_text': _tr('wl_settings_global', 'Difference Coefficient'),
- 'func': wl_measures_effect_size.diff_coeff
+ 'func': wl_measures_effect_size.diff_coeff,
+ 'collocation': False,
+ 'keyword': True
},
'jaccard_index': {
'col_text': _tr('wl_settings_global', 'Jaccard Index'),
- 'func': wl_measures_effect_size.jaccard_index
- },
-
- 'lfmd': {
- 'col_text': 'LFMD',
- 'func': wl_measures_effect_size.lfmd
+ 'func': wl_measures_effect_size.jaccard_index,
+ 'collocation': True,
+ 'keyword': False
},
'kilgarriffs_ratio': {
'col_text': _tr('wl_settings_global', "Kilgarriff's Ratio"),
- 'func': wl_measures_effect_size.kilgarriffs_ratio
+ 'func': wl_measures_effect_size.kilgarriffs_ratio,
+ 'collocation': False,
+ 'keyword': True
},
'log_dice': {
'col_text': 'logDice',
- 'func': wl_measures_effect_size.log_dice
+ 'func': wl_measures_effect_size.log_dice,
+ 'collocation': True,
+ 'keyword': False
+ },
+
+ 'lfmd': {
+ 'col_text': 'LFMD',
+ 'func': wl_measures_effect_size.lfmd,
+ 'collocation': True,
+ 'keyword': False
},
'log_ratio': {
'col_text': _tr('wl_settings_global', 'Log Ratio'),
- 'func': wl_measures_effect_size.log_ratio
+ 'func': wl_measures_effect_size.log_ratio,
+ 'collocation': True,
+ 'keyword': True
},
'mi_log_f': {
'col_text': 'MI.log-f',
- 'func': wl_measures_effect_size.mi_log_f
+ 'func': wl_measures_effect_size.mi_log_f,
+ 'collocation': True,
+ 'keyword': False
},
'min_sensitivity': {
'col_text': _tr('wl_settings_global', 'Minimum Sensitivity'),
- 'func': wl_measures_effect_size.min_sensitivity
+ 'func': wl_measures_effect_size.min_sensitivity,
+ 'collocation': True,
+ 'keyword': False
},
'md': {
'col_text': 'MD',
- 'func': wl_measures_effect_size.md
+ 'func': wl_measures_effect_size.md,
+ 'collocation': True,
+ 'keyword': False
},
'me': {
'col_text': 'ME',
- 'func': wl_measures_effect_size.me
+ 'func': wl_measures_effect_size.me,
+ 'collocation': True,
+ 'keyword': False
},
'mi': {
'col_text': 'MI',
- 'func': wl_measures_effect_size.mi
+ 'func': wl_measures_effect_size.mi,
+ 'collocation': True,
+ 'keyword': False
},
'or': {
'col_text': 'OR',
- 'func': wl_measures_effect_size.odds_ratio
+ 'func': wl_measures_effect_size.odds_ratio,
+ 'collocation': True,
+ 'keyword': True
},
'pmi': {
'col_text': 'PMI',
- 'func': wl_measures_effect_size.pmi
+ 'func': wl_measures_effect_size.pmi,
+ 'collocation': True,
+ 'keyword': True
},
'poisson_collocation_measure': {
'col_text': _tr('wl_settings_global', 'Poisson Collocation Measure'),
- 'func': wl_measures_effect_size.poisson_collocation_measure
+ 'func': wl_measures_effect_size.poisson_collocation_measure,
+ 'collocation': True,
+ 'keyword': False
},
'im2': {
'col_text': 'IM²',
- 'func': wl_measures_effect_size.im2
+ 'func': wl_measures_effect_size.im2,
+ 'collocation': True,
+ 'keyword': True
},
'squared_phi_coeff': {
'col_text': 'φ2',
- 'func': wl_measures_effect_size.squared_phi_coeff
+ 'func': wl_measures_effect_size.squared_phi_coeff,
+ 'collocation': True,
+ 'keyword': False
}
},
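Note on the hunk above: it renames the `collocation_extractor`/`keyword_extractor` keys to `collocation`/`keyword` and adds the pair of flags to every effect-size entry, so each measure declares which extraction contexts it applies to. A minimal sketch of how such per-measure availability flags can drive filtering (the entries here are illustrative stand-ins, not the full Wordless table):

```python
# Sketch: per-measure availability flags, mirroring the shape of the
# 'measures_effect_size' entries above (illustrative subset only).
measures_effect_size = {
    'pct_diff': {'col_text': '%DIFF', 'collocation': False, 'keyword': True},
    'conditional_probability': {'col_text': 'P', 'collocation': True, 'keyword': False},
    'pmi': {'col_text': 'PMI', 'collocation': True, 'keyword': True},
}

def available_measures(measures, extraction_type):
    """Return codes of measures enabled for 'collocation' or 'keyword'."""
    return [code for code, entry in measures.items() if entry[extraction_type]]

print(available_measures(measures_effect_size, 'collocation'))
# → ['conditional_probability', 'pmi']
```

Since Python dicts preserve insertion order, the filtered list keeps the display order of the settings table.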
diff --git a/wordless/wl_widgets/wl_widgets.py b/wordless/wl_widgets/wl_widgets.py
index b45dc270e..8022eed5c 100644
--- a/wordless/wl_widgets/wl_widgets.py
+++ b/wordless/wl_widgets/wl_widgets.py
@@ -616,7 +616,7 @@ def wl_widgets_context_settings(parent, tab):
return label_context_settings, button_context_settings
# Generation Settings
-def wl_widgets_measures_wordlist_generator(parent):
+def wl_widgets_measures_wordlist_ngram_generation(parent):
label_measure_dispersion = QLabel(_tr('wl_widgets', 'Measure of dispersion:'), parent)
combo_box_measure_dispersion = wl_boxes.Wl_Combo_Box_Measure(parent, measure_type = 'dispersion')
label_measure_adjusted_freq = QLabel(_tr('wl_widgets', 'Measure of adjusted frequency:'), parent)
@@ -627,7 +627,7 @@ def wl_widgets_measures_wordlist_generator(parent):
label_measure_adjusted_freq, combo_box_measure_adjusted_freq
)
-def wl_widgets_measures_collocation_extractor(parent, tab):
+def wl_widgets_measures_collocation_keyword_extraction(parent, extraction_type):
main = wl_misc.find_wl_main(parent)
label_test_statistical_significance = QLabel(_tr('wl_widgets', 'Test of statistical significance:'), parent)
@@ -641,16 +641,23 @@ def wl_widgets_measures_collocation_extractor(parent, tab):
measure_text = combo_box_test_statistical_significance.itemText(i)
measure_code = wl_measure_utils.to_measure_code(main, 'statistical_significance', measure_text)
- if not main.settings_global['tests_statistical_significance'][measure_code][tab]:
+ if not main.settings_global['tests_statistical_significance'][measure_code][extraction_type]:
combo_box_test_statistical_significance.removeItem(i)
for i in reversed(range(combo_box_measure_bayes_factor.count())):
measure_text = combo_box_measure_bayes_factor.itemText(i)
measure_code = wl_measure_utils.to_measure_code(main, 'bayes_factor', measure_text)
- if not main.settings_global['measures_bayes_factor'][measure_code][tab]:
+ if not main.settings_global['measures_bayes_factor'][measure_code][extraction_type]:
combo_box_measure_bayes_factor.removeItem(i)
+ for i in reversed(range(combo_box_measure_effect_size.count())):
+ measure_text = combo_box_measure_effect_size.itemText(i)
+ measure_code = wl_measure_utils.to_measure_code(main, 'effect_size', measure_text)
+
+ if not main.settings_global['measures_effect_size'][measure_code][extraction_type]:
+ combo_box_measure_effect_size.removeItem(i)
+
return (
label_test_statistical_significance, combo_box_test_statistical_significance,
label_measure_bayes_factor, combo_box_measure_bayes_factor,
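Note on the new effect-size loop above: like the two existing loops, it walks the combo box indices in reverse while removing items. Iterating back-to-front is what keeps the remaining indices valid after each `removeItem` call; a plain-list sketch of the same pattern (a list stands in for the Qt combo box here, which is an assumption for illustration):

```python
# Sketch: removing items by index must run back-to-front, otherwise
# each removal shifts the indices of the items still to be checked.
items = ['None', '%DIFF', 'PMI', 'Log Ratio']
enabled = {'None': True, '%DIFF': False, 'PMI': True, 'Log Ratio': False}

for i in reversed(range(len(items))):
    if not enabled[items[i]]:
        del items[i]

print(items)
# → ['None', 'PMI']
```

Iterating forward instead would skip the element that slides into a just-deleted slot.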
diff --git a/wordless/wl_wordlist_generator.py b/wordless/wl_wordlist_generator.py
index 3e6d12b9d..17d4c5544 100644
--- a/wordless/wl_wordlist_generator.py
+++ b/wordless/wl_wordlist_generator.py
@@ -132,7 +132,7 @@ def __init__(self, main):
self.combo_box_measure_dispersion,
self.label_measure_adjusted_freq,
self.combo_box_measure_adjusted_freq
- ) = wl_widgets.wl_widgets_measures_wordlist_generator(self)
+ ) = wl_widgets.wl_widgets_measures_wordlist_ngram_generation(self)
self.checkbox_syllabification.stateChanged.connect(self.generation_settings_changed)
self.combo_box_measure_dispersion.currentTextChanged.connect(self.generation_settings_changed)