This automatically-generated file contains references from the main union bibliography that have been filtered for a single tag. Do not edit this file; instead, please update the main bibliography and tag references appropriately to have them show up here. Thank you!
The papers are listed in the same order as the main bibliography, i.e., by year of publication or release (most recent first), then by surname of the first author.
- Kirk, H. R., Vidgen, B., Röttger, P., Thrush, T., & Hale, S. A. (2022). Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '22). doi:10.18653/v1/2022.naacl-main.97 [paper]
- Tan, S., Joty, S., Baxter, K., Taeihagh, A., Bennett, G. A., & Kan, M. Y. (2021). Reliability Testing for Natural Language Processing Systems. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4153–4169, Online. Association for Computational Linguistics. doi:10.18653/v1/2021.acl-long.321 [paper]
- Caglayan, O., Madhyastha, P., & Specia, L. (2020). Curious case of language generation evaluation metrics: A cautionary tale. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2322–2328, Barcelona, Spain (Online). International Committee on Computational Linguistics. doi:10.18653/v1/2020.coling-main.210 [paper]
- Ethayarajh, K., & Jurafsky, D. (2020, November). Utility is in the eye of the user: A critique of NLP leaderboards. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.18653/v1/2020.emnlp-main.393 [paper]
- Garnerin, M., Rossato, S., & Besacier, L. (2020). Pratiques d’évaluation en ASR et biais de performance (Evaluation methodology in ASR and performance bias). In Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). 2e atelier Éthique et TRaitemeNt Automatique des Langues (ETeRNAL) (pp. 1-9). [paper]
- Goldfarb-Tarrant, S., Marchant, R., Sanchez, R. M., Pandya, M., & Lopez, A. (2020). Intrinsic bias metrics do not correlate with application bias. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). [paper]
- Linzen, T. (2020, July). How can we accelerate progress towards human-like linguistic generalization? Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.465 [paper]
- Mathur, N., Baldwin, T., & Cohn, T. (2020, July). Tangled up in BLEU: Reevaluating the evaluation of automatic machine translation evaluation metrics. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.448 [paper]
- Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. R. (2020, November). CrowS-pairs: A challenge dataset for measuring social biases in masked language models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.18653/v1/2020.emnlp-main.154 [paper]
- Bregeon, D., Antoine, J. Y., Villaneau, J., & Lefeuvre-Halftermeyer, A. (2019). Redonner du sens à l’accord interannotateurs: vers une interprétation des mesures d’accord en termes de reproductibilité de l’annotation. Traitement Automatique des Langues, 60(2), 23. [paper]
- Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
- Mathet, Y., & Widlöcher, A. (2016). Évaluation des annotations: ses principes et ses pièges. Traitement Automatique des Langues, 57(2), 73-98. [paper]