diff --git a/CHANGELOG.md b/CHANGELOG.md
index b86e18e8d..5c57e3373 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -28,6 +28,9 @@
- File Area: Fix Open Files - Opening Non-text Files - Do not show this again
- Utils: Fix Wordless's Japanese kanji tokenizer
+### ❌ Removals
+- Measures: Remove effect size - Log-frequency biased MD / Mutual Dependency
+
## [3.5.0](https://github.com/BLKSerene/Wordless/releases/tag/3.5.0) - 07/01/2024
### 🎉 New Features
- File Area: Add support for .lrc and .pptx files
diff --git a/doc/doc.md b/doc/doc.md
index ce402312f..60d283815 100644
--- a/doc/doc.md
+++ b/doc/doc.md
@@ -1495,9 +1495,6 @@ Kilgarriff's ratio:
logDice:
\text{logDice} = 14 + \log_{2} \frac{2 \times O_{11}}{O_{1x} + O_{x1}}
-Log-frequency biased MD:
- \text{LFMD} = \log_{2} \frac{O_{11}}{E_{11}} + \log_{2} O_{11}
-
Log Ratio:
\text{Log Ratio} = \log_{2} \frac{\frac{O_{11}}{O_{x1}}}{\frac{O_{12}}{O_{x2}}}
@@ -1507,9 +1504,6 @@ MI.log-f:
Minimum sensitivity:
\text{S} = \min\left\{\frac{O_{11}}{O_{1x}},\;\frac{O_{11}}{O_{x1}}\right\}
-Mutual Dependency:
- \text{MD} = \log_{2} \frac{{O_{11}}^2}{E_{11}}
-
Mutual Expectation:
\text{ME} = O_{11} \times \frac{2 \times O_{11}}{O_{1x} + O_{x1}}
@@ -1538,27 +1532,30 @@ Squared phi coefficient:
Measure of Effect Size|Formula|Collocation Extraction|Keyword Extraction
----------------------|-------|:--------------------:|:----------------:
Conditional probability
([Durrant, 2008, p. 84](#ref-durrant-2008))|![Formula](/doc/measures/effect_size/conditional_probability.svg)|✔|✖️
-Cubic association ratio
([Daille, 1994, p. 139](#ref-daille-1994); [Kilgarriff, 2001, p, 99](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/im3.svg)|✔|✔
ΔP
([Gries, 2013](#ref-gries-2013))|![Formula](/doc/measures/effect_size/delta_p.svg)|✔|✖️
Dice-Sørensen coefficient
([Smadja et al., 1996, p. 8](#ref-smadja-et-al-1996))|![Formula](/doc/measures/effect_size/dice_sorensen_coeff.svg)|✔|✖️
Difference coefficient
([Hofland & Johansson, 1982, p. 14](#ref-hofland-johansson-1982); [Gabrielatos, 2018, p. 236](#ref-gabrielatos-2018))|![Formula](/doc/measures/effect_size/diff_coeff.svg)|✖️|✔
Jaccard index
([Dunning, 1998, p. 48](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/jaccard_index.svg)|✔|✖️
Kilgarriff's ratio
([Kilgarriff, 2009](#ref-kilgarriff-2009))|![Formula](/doc/measures/effect_size/kilgarriffs_ratio.svg)
where **α** is the smoothing parameter, whose value could be changed via **Menu Bar → Preferences → Settings → Measures → Effect Size → Kilgarriff's Ratio → Smoothing Parameter**.|✖️|✔
logDice
([Rychlý, 2008, p. 9](#ref-rychly-2008))|![Formula](/doc/measures/effect_size/log_dice.svg)|✔|✖️
-Log-frequency biased MD
([Thanopoulos et al., 2002, p. 621](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/lfmd.svg)|✔|✖️
Log Ratio
([Hardie, 2014](#ref-hardie-2014))|![Formula](/doc/measures/effect_size/log_ratio.svg)|✔|✔
MI.log-f
([Kilgarriff & Tugwell, 2002](#ref-kilgarriff-tugwell-2002); [Lexical Computing Ltd., 2015, p. 4](#ref-lexical-computing-ltd-2015))|![Formula](/doc/measures/effect_size/mi_log_f.svg)|✔|✖️
Minimum sensitivity
([Pedersen, 1998](#ref-pedersen-1998))|![Formula](/doc/measures/effect_size/min_sensitivity.svg)|✔|✖️
-Mutual Dependency
([Thanopoulos et al., 2002, p. 621](#ref-thanopoulos-et-al-2002))|![Formula](/doc/measures/effect_size/md.svg)|✔|✖️
Mutual Expectation
([Dias et al., 1999](#ref-dias-et-al-1999))|![Formula](/doc/measures/effect_size/me.svg)|✔|✖️
Mutual information
([Dunning, 1998, pp. 49–52](#ref-dunning-1998))|![Formula](/doc/measures/effect_size/mi.svg)|✔|✖️
Odds ratio
([Pecina, 2005, p. 15](#ref-pecina-2005), [Pojanapunya & Todd, 2016](#ref-pojanapunya-todd-2016))|![Formula](/doc/measures/effect_size/odds_ratio.svg)|✔|✔
%DIFF
([Gabrielatos & Marchi, 2011](#ref-gabrielatos-marchi-2011))|![Formula](/doc/measures/effect_size/pct_diff.svg)|✖️|✔
Pointwise mutual information
([Church & Hanks, 1990](#ref-church-hanks-1990); [Kilgarriff, 2001, pp. 104–105](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/pmi.svg)|✔|✔
+Pointwise mutual information (cubic)**¹**
([Daille, 1994, p. 139](#ref-daille-1994); [Kilgarriff, 2001, p. 99](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/im3.svg)|✔|✔
+Pointwise mutual information (squared)**¹**
([Daille, 1995, p. 21](#ref-daille-1995); [Kilgarriff, 2001, p. 99](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/im2.svg)|✔|✔
Poisson collocation measure
([Quasthoff & Wolff, 2002](#ref-quasthoff-wolff-2002))|![Formula](/doc/measures/effect_size/poisson_collocation_measure.svg)|✔|✖️
-Squared association ratio
([Daille, 1995, p. 21](#ref-daille-1995); [Kilgarriff, 2001, p, 99](#ref-kilgarriff-2001))|![Formula](/doc/measures/effect_size/im2.svg)|✔|✔
Squared phi coefficient
([Church & Gale, 1991](#ref-church-gale-1991))|![Formula](/doc/measures/effect_size/squared_phi_coeff.svg)|✔|✖️
+> [!NOTE]
+1. The calculations of *Pointwise mutual information (squared)* and *Pointwise mutual information (cubic)* are exactly the same as those of *Mutual Dependency* and *Log-frequency biased MD* respectively, which were proposed in:
+
+
Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), *Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)* (pp. 620–625). European Language Resources Association.
+
## [13 References](#doc)
@@ -1778,8 +1775,6 @@ Linguistic Computing Bulletin*, *7*(2), 172–177.
1. [**^**](#ref-num-words-spache) [**^**](#ref-spache-readability-formula) Spache, G. (1974). *Good reading for poor readers* (Rev. 9th ed.). Garrard.
1. [**^**](#ref-re) Szigriszt Pazos, F. (1993). *Sistemas predictivos de legibilidad del mensaje escrito: Formula de perspicuidad* [Doctoral dissertation, Complutense University of Madrid]. Biblos-e Archivo. https://repositorio.uam.es/bitstream/handle/10486/2488/3907_barrio_cantalejo_ines_maria.pdf?sequence=1&isAllowed=y
-
-1. [**^**](#ref-lfmd) [**^**](#ref-md) Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), *Proceedings of the Third International Conference on Language Resources and Evaluation* (pp. 620–625). European Language Resources Association.
1. [**^**](#ref-trankle-bailers-readability-formula) Tränkle, U., & Bailer, H. (1984). Kreuzvalidierung und neuberechnung von lesbarkeitsformeln für die Deutsche sprache. *Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie*, *16*(3), 231–244.
diff --git a/doc/measures/effect_size/lfmd.svg b/doc/measures/effect_size/lfmd.svg
deleted file mode 100644
index 88f886d18..000000000
--- a/doc/measures/effect_size/lfmd.svg
+++ /dev/null
@@ -1,45 +0,0 @@
-
-
-
\ No newline at end of file
diff --git a/doc/measures/effect_size/md.svg b/doc/measures/effect_size/md.svg
deleted file mode 100644
index 8d317ea7c..000000000
--- a/doc/measures/effect_size/md.svg
+++ /dev/null
@@ -1,33 +0,0 @@
-
-
-
\ No newline at end of file
diff --git a/tests/tests_measures/test_measures_effect_size.py b/tests/tests_measures/test_measures_effect_size.py
index 1cb937a16..963f52437 100644
--- a/tests/tests_measures/test_measures_effect_size.py
+++ b/tests/tests_measures/test_measures_effect_size.py
@@ -50,9 +50,6 @@ def test_conditional_probability():
assert_zeros(wl_measures_effect_size.conditional_probability)
-def test_im3():
- assert_zeros(wl_measures_effect_size.im3)
-
# Reference: Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–165. https://doi.org/10.1075/ijcl.18.1.09gri | p. 144
def test_delta_p():
numpy.testing.assert_array_equal(
@@ -119,9 +116,6 @@ def test_kilgarriffs_ratio():
def test_log_dice():
assert_zeros(wl_measures_effect_size.log_dice, result = 14)
-def test_lfmd():
- assert_zeros(wl_measures_effect_size.lfmd)
-
# Reference: Hardie, A. (2014, April 28). Log Ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/.
def test_log_ratio():
numpy.testing.assert_array_equal(
@@ -164,9 +158,6 @@ def test_min_sensitivity():
assert_zeros(wl_measures_effect_size.min_sensitivity)
-def test_md():
- assert_zeros(wl_measures_effect_size.md)
-
def test_me():
assert_zeros(wl_measures_effect_size.me)
@@ -248,12 +239,15 @@ def test_pmi():
assert_zeros(wl_measures_effect_size.pmi)
-def test_poisson_collocation_measure():
- assert_zeros(wl_measures_effect_size.poisson_collocation_measure)
+def test_im3():
+ assert_zeros(wl_measures_effect_size.im3)
def test_im2():
assert_zeros(wl_measures_effect_size.im2)
+def test_poisson_collocation_measure():
+ assert_zeros(wl_measures_effect_size.poisson_collocation_measure)
+
# Reference: Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom.
def test_squared_phi_coeff():
numpy.testing.assert_array_equal(
@@ -271,23 +265,21 @@ def test_squared_phi_coeff():
if __name__ == '__main__':
test_conditional_probability()
- test_im3()
test_delta_p()
test_dice_sorensen_coeff()
test_diff_coeff()
test_jaccard_index()
test_kilgarriffs_ratio()
test_log_dice()
- test_lfmd()
test_log_ratio()
test_mi_log_f()
test_min_sensitivity()
- test_md()
test_me()
test_mi()
test_odds_ratio()
test_pct_diff()
test_pmi()
- test_poisson_collocation_measure()
+ test_im3()
test_im2()
+ test_poisson_collocation_measure()
test_squared_phi_coeff()
diff --git a/wordless/wl_measures/wl_measures_effect_size.py b/wordless/wl_measures/wl_measures_effect_size.py
index 922bc5b7a..98939dbaa 100644
--- a/wordless/wl_measures/wl_measures_effect_size.py
+++ b/wordless/wl_measures/wl_measures_effect_size.py
@@ -29,13 +29,6 @@ def conditional_probability(main, o11s, o12s, o21s, o22s):
return wl_measure_utils.numpy_divide(o11s, ox1s) * 100
-# Cubic association ratio
-# Reference: Daille, B. (1994). Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid= | p. 139
-def im3(main, o11s, o12s, o21s, o22s):
- e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
-
- return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 3, e11s))
-
# ΔP
# Reference: Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–165. https://doi.org/10.1075/ijcl.18.1.09gri
def delta_p(main, o11s, o12s, o21s, o22s):
@@ -88,13 +81,6 @@ def log_dice(main, o11s, o12s, o21s, o22s):
return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(2 * o11s, o1xs + ox1s), default = 14)
-# Log-frequency biased MD
-# Reference: Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association. | p. 621
-def lfmd(main, o11s, o12s, o21s, o22s):
- e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
-
- return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s)) + wl_measure_utils.numpy_log2(o11s)
-
# Log Ratio
# Reference: Hardie, A. (2014, April 28). Log Ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/
def log_ratio(main, o11s, o12s, o21s, o22s):
@@ -134,13 +120,6 @@ def min_sensitivity(main, o11s, o12s, o21s, o22s):
wl_measure_utils.numpy_divide(o11s, ox1s)
)
-# Mutual Dependency
-# Reference: Thanopoulos, A, Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González, & C. P. S. Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 620–625). European Language Resources Association. | p. 621
-def md(main, o11s, o12s, o21s, o22s):
- e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
-
- return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s))
-
# Mutual Expectation
# Reference: Dias, G., Guilloré, S., & Pereira Lopes, J. G. (1999). Language independent automatic acquisition of rigid multiword units from unrestricted text corpora. In A. Condamines, C. Fabre, & M. Péry-Woodley (Eds.), TALN'99: 6ème Conférence Annuelle Sur le Traitement Automatique des Langues Naturelles (pp. 333–339). TALN.
def me(main, o11s, o12s, o21s, o22s):
@@ -202,6 +181,20 @@ def pmi(main, o11s, o12s, o21s, o22s):
return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s, e11s))
+# Pointwise mutual information (cubic)
+# Reference: Daille, B. (1994). Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid= | p. 139
+def im3(main, o11s, o12s, o21s, o22s):
+ e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
+
+ return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 3, e11s))
+
+# Pointwise mutual information (squared)
+# Reference: Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. UCREL technical papers (Vol. 5). Lancaster University. | p. 21
+def im2(main, o11s, o12s, o21s, o22s):
+ e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
+
+ return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s))
+
# Poisson collocation measure
# Reference: Quasthoff, U., & Wolff, C. (2002). The poisson collocation measure and its applications. Proceedings of 2nd International Workshop on Computational Approaches to Collocations. IEEE.
def poisson_collocation_measure(main, o11s, o12s, o21s, o22s):
@@ -213,13 +206,6 @@ def poisson_collocation_measure(main, o11s, o12s, o21s, o22s):
wl_measure_utils.numpy_log(oxxs)
)
-# Squared association ratio
-# Reference: Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. UCREL technical papers (Vol. 5). Lancaster University. | p. 21
-def im2(main, o11s, o12s, o21s, o22s):
- e11s, _, _, _ = wl_measures_statistical_significance.get_freqs_expected(o11s, o12s, o21s, o22s)
-
- return wl_measure_utils.numpy_log2(wl_measure_utils.numpy_divide(o11s ** 2, e11s))
-
# Squared phi coefficient
# Reference: Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom.
def squared_phi_coeff(main, o11s, o12s, o21s, o22s):
diff --git a/wordless/wl_settings/wl_settings_global.py b/wordless/wl_settings/wl_settings_global.py
index 8ebcfe2f1..c36c6af86 100644
--- a/wordless/wl_settings/wl_settings_global.py
+++ b/wordless/wl_settings/wl_settings_global.py
@@ -3594,25 +3594,23 @@ def init_settings_global():
'effect_size': {
_tr('wl_settings_global', 'None'): 'none',
_tr('wl_settings_global', 'Conditional probability'): 'conditional_probability',
- _tr('wl_settings_global', 'Cubic association ratio'): 'im3',
'ΔP': 'delta_p',
_tr('wl_settings_global', 'Dice-Sørensen coefficient'): 'dice_sorensen_coeff',
_tr('wl_settings_global', 'Difference coefficient'): 'diff_coeff',
_tr('wl_settings_global', 'Jaccard index'): 'jaccard_index',
_tr('wl_settings_global', "Kilgarriff's ratio"): 'kilgarriffs_ratio',
'logDice': 'log_dice',
- _tr('wl_settings_global', 'Log-frequency biased MD'): 'lfmd',
_tr('wl_settings_global', 'Log Ratio'): 'log_ratio',
'MI.log-f': 'mi_log_f',
_tr('wl_settings_global', 'Minimum sensitivity'): 'min_sensitivity',
- _tr('wl_settings_global', 'Mutual Dependency'): 'md',
_tr('wl_settings_global', 'Mutual Expectation'): 'me',
_tr('wl_settings_global', 'Mutual information'): 'mi',
_tr('wl_settings_global', 'Odds ratio'): 'or',
'%DIFF': 'pct_diff',
_tr('wl_settings_global', 'Pointwise mutual information'): 'pmi',
+ _tr('wl_settings_global', 'Pointwise mutual information (cubic)'): 'im3',
+ _tr('wl_settings_global', 'Pointwise mutual information (squared)'): 'im2',
_tr('wl_settings_global', 'Poisson collocation measure'): 'poisson_collocation_measure',
- _tr('wl_settings_global', 'Squared association ratio'): 'im2',
_tr('wl_settings_global', 'Squared phi coefficient'): 'squared_phi_coeff'
}
},
@@ -3849,13 +3847,6 @@ def init_settings_global():
'keyword': False
},
- 'im3': {
- 'col_text': 'IM³',
- 'func': wl_measures_effect_size.im3,
- 'collocation': True,
- 'keyword': True
- },
-
'delta_p': {
'col_text': 'ΔP',
'func': wl_measures_effect_size.delta_p,
@@ -3898,13 +3889,6 @@ def init_settings_global():
'keyword': False
},
- 'lfmd': {
- 'col_text': 'LFMD',
- 'func': wl_measures_effect_size.lfmd,
- 'collocation': True,
- 'keyword': False
- },
-
'log_ratio': {
'col_text': _tr('wl_settings_global', 'Log Ratio'),
'func': wl_measures_effect_size.log_ratio,
@@ -3926,13 +3910,6 @@ def init_settings_global():
'keyword': False
},
- 'md': {
- 'col_text': 'MD',
- 'func': wl_measures_effect_size.md,
- 'collocation': True,
- 'keyword': False
- },
-
'me': {
'col_text': 'ME',
'func': wl_measures_effect_size.me,
@@ -3968,11 +3945,11 @@ def init_settings_global():
'keyword': True
},
- 'poisson_collocation_measure': {
- 'col_text': _tr('wl_settings_global', 'Poisson Collocation Measure'),
- 'func': wl_measures_effect_size.poisson_collocation_measure,
+ 'im3': {
+ 'col_text': 'IM³',
+ 'func': wl_measures_effect_size.im3,
'collocation': True,
- 'keyword': False
+ 'keyword': True
},
'im2': {
@@ -3982,6 +3959,13 @@ def init_settings_global():
'keyword': True
},
+ 'poisson_collocation_measure': {
+ 'col_text': _tr('wl_settings_global', 'Poisson Collocation Measure'),
+ 'func': wl_measures_effect_size.poisson_collocation_measure,
+ 'collocation': True,
+ 'keyword': False
+ },
+
'squared_phi_coeff': {
'col_text': 'φ2',
'func': wl_measures_effect_size.squared_phi_coeff,
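
Reviewer note: the removal rests on the claim in the new doc note that MD matches PMI (squared) and LFMD matches PMI (cubic), since log2(O11² / E11) + log2(O11) = log2(O11³ / E11). The sketch below is one way to check that numerically. It is not Wordless code: the functions are plain NumPy re-statements of the formulas in this diff, `expected_o11` is a hypothetical stand-in that assumes the usual expected frequency for a 2×2 contingency table (the real `get_freqs_expected` is not shown here), and the sample counts are made up and strictly positive, so the zero-handling of `numpy_divide` and `numpy_log2` is not exercised.

```python
import numpy as np

def expected_o11(o11, o12, o21, o22):
    # Assumption: standard expected frequency of the top-left cell,
    # E11 = (row total * column total) / grand total.
    o1x = o11 + o12
    ox1 = o11 + o21
    oxx = o11 + o12 + o21 + o22
    return o1x * ox1 / oxx

# Removed measures, restated from the deleted lines of this diff.
def md(o11, e11):
    return np.log2(o11 ** 2 / e11)

def lfmd(o11, e11):
    return np.log2(o11 ** 2 / e11) + np.log2(o11)

# Retained measures, restated from the added lines of this diff.
def im2(o11, e11):
    return np.log2(o11 ** 2 / e11)

def im3(o11, e11):
    return np.log2(o11 ** 3 / e11)

# Made-up, strictly positive observed frequencies for three word pairs.
o11 = np.array([10.0, 3.0, 250.0])
o12 = np.array([40.0, 7.0, 1000.0])
o21 = np.array([25.0, 12.0, 300.0])
o22 = np.array([9925.0, 978.0, 98450.0])
e11 = expected_o11(o11, o12, o21, o22)

# MD and PMI (squared) share one formula; LFMD and PMI (cubic) differ
# only by the logarithm identity log2(a) + log2(b) = log2(a * b).
assert np.allclose(md(o11, e11), im2(o11, e11))
assert np.allclose(lfmd(o11, e11), im3(o11, e11))
```

If both assertions pass, dropping `md` and `lfmd` should lose no information as long as `im2` and `im3` remain available, at least for non-zero frequencies.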