Mismatch between `pred_tag` and root list size #7

faraday · 2019-11-29T13:18:45Z

@onurgu Thank you for sharing this project.

In `train.py`, the following expression selects the predicted analysis when disambiguating a word:
`first_sentence['roots'][word_idx][pred_tag]`

However, the trained model can produce an out-of-range index in `pred_tag`.

An example:
```
[{'sentence_length': 4, 'surface_forms': ['Ali', 'ata', 'bakabilir', '.'], 'surface_form_lengths': [3, 3, 9, 1], 'roots': [['Ali'], ['at', 'at', 'ata', 'ata'], ['bak', 'bak'], ['.']], 'root_lengths': [[3], [2, 2, 3, 3], [3, 3], [1]], 'morph_tokens': [[['Noun', 'Prop', 'A3sg', 'Pnon', 'Nom']], [['Noun', 'A3sg', 'Pnon', 'Dat'], ['Verb', 'Pos', 'Opt', 'A3sg'], ['Noun', 'A3sg', 'Pnon', 'Nom'], ['Verb', 'Pos', 'Imp', 'A2sg']], [['Verb', 'Pos^DB', 'Verb', 'Able', 'Aor', 'A3sg'], ['Verb', 'Pos^DB', 'Verb', 'Able', 'Aor^DB', 'Adj', 'Zero']], [['Punc']]], 'morph_token_lengths': [[5], [4, 4, 4, 4], [6, 7], [1]]}]
```

For the word `ata`, `pred_tag` can come out as 4. The candidate root list for this word, `['at', 'at', 'ata', 'ata']`, has only four items (valid indices 0 to 3), so `pred_tag = 4` does not address any item in it.
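To make the failure concrete, here is a minimal sketch using the root list from the example above. The helpers `safe_root` and `constrained_argmax` are hypothetical and not part of `train.py`. The second one assumes that `pred_tag` is an argmax over a score vector that can be longer than the word's candidate list, which is only one plausible way an index of 4 could arise.

```python
# Root candidates copied from the example sentence above; word_idx 1 is "ata".
first_sentence = {
    "roots": [["Ali"], ["at", "at", "ata", "ata"], ["bak", "bak"], ["."]],
}
word_idx = 1


def safe_root(sentence, word_idx, pred_tag):
    """Hypothetical helper: return the predicted root, or None if pred_tag
    does not address an existing candidate."""
    candidates = sentence["roots"][word_idx]
    if 0 <= pred_tag < len(candidates):
        return candidates[pred_tag]
    return None


def constrained_argmax(scores, n_candidates):
    """Hypothetical helper: argmax over the first n_candidates scores only,
    so the returned index is always a valid position in the candidate list."""
    valid = scores[:n_candidates]
    return max(range(len(valid)), key=lambda i: valid[i])


candidates = first_sentence["roots"][word_idx]
print(len(candidates))  # 4, so valid indices are 0..3

try:
    candidates[4]  # the reported pred_tag == 4
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

print(safe_root(first_sentence, word_idx, 4))  # None

# Illustrative scores (made up) with one more entry than there are candidates.
scores = [0.10, 0.05, 0.20, 0.15, 0.50]
print(max(range(len(scores)), key=lambda i: scores[i]))  # 4 (out of range)
print(constrained_argmax(scores, len(candidates)))  # 2 -> "ata"
```

The bounds check only detects the problem; restricting the argmax to the word's actual candidates would prevent an invalid index from being produced in the first place, if the assumption about how `pred_tag` is computed holds.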

This bug is not related to training data size. I can train a model without this problem using a much smaller sample.


onurgu · 2019-12-27T08:00:13Z

Hi,

I couldn't reproduce this problem.

When does the condition `pred_tag == 4` occur? Does it happen when the training data is large?

My model gives this output:

```
Ali ata bakabilir.
Reading script from "tfeatures.scr"
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%

  *****  LEXICON LOOK-UP  *****


LOOKUP STATISTICS (success with different strategies):
strategy 0:     3 times         (75.00 %)
strategy 1:     1 times         (25.00 %)
strategy 2:     0 times         (0.00 %)
strategy 3:     0 times         (0.00 %)
not found:      0 times         (0.00 %)

corpus size:    4 words
execution time: 0 sec
speed:          4 words/sec

  *****  END OF LEXICON LOOK-UP  *****

file processed
file processed
1/1 [==============================] - 2s
{'surface_form_lengths': [3, 3, 9, 1], 'root_lengths': [[3], [2, 2, 3, 3], [3, 3], [1]], 'surface_forms': [u'Ali', u'ata', u'bakabilir', u'.'], 'morph_token_lengths': [[5], [4, 4, 4, 4], [6, 7], [1]], 'morph_tokens': [[[u'Noun', u'Prop', u'A3sg', u'Pnon', u'Nom']], [[u'Noun', u'A3sg', u'Pnon', u'Dat'], [u'Verb', u'Pos', u'Opt', u'A3sg'], [u'Noun', u'A3sg', u'Pnon', u'Nom'], [u'Verb', u'Pos', u'Imp', u'A2sg']], [[u'Verb', u'Pos^DB', u'Verb', u'Able', u'Aor', u'A3sg'], [u'Verb', u'Pos^DB', u'Verb', u'Able', u'Aor^DB', u'Adj', u'Zero']], [[u'Punc']]], 'sentence_length': 4, 'roots': [[u'Ali'], [u'at', u'at', u'ata', u'ata'], [u'bak', u'bak'], [u'.']]}
Ali Ali+Noun+Prop+A3sg+Pnon+Nom
ata ata+Noun+A3sg+Pnon+Nom
bakabilir bak+Verb+Pos^DB+Verb+Able+Aor+A3sg
. .+Punc
```
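The final analyses in this output appear to be formed by indexing `roots` and `morph_tokens` with the same predicted index and joining the pieces with `+`. A minimal sketch, assuming that is how the output is built: the helper `format_analysis` is hypothetical, and the `pred_tags` values are inferred from the printed lines rather than taken from `train.py`.

```python
# Sentence data copied from the printed dictionary above (only the fields used here).
sentence = {
    "surface_forms": ["Ali", "ata", "bakabilir", "."],
    "roots": [["Ali"], ["at", "at", "ata", "ata"], ["bak", "bak"], ["."]],
    "morph_tokens": [
        [["Noun", "Prop", "A3sg", "Pnon", "Nom"]],
        [["Noun", "A3sg", "Pnon", "Dat"], ["Verb", "Pos", "Opt", "A3sg"],
         ["Noun", "A3sg", "Pnon", "Nom"], ["Verb", "Pos", "Imp", "A2sg"]],
        [["Verb", "Pos^DB", "Verb", "Able", "Aor", "A3sg"],
         ["Verb", "Pos^DB", "Verb", "Able", "Aor^DB", "Adj", "Zero"]],
        [["Punc"]],
    ],
}

# Hypothetical predicted indices, inferred from the printed analyses.
pred_tags = [0, 2, 0, 0]


def format_analysis(sentence, word_idx, pred_tag):
    """Hypothetical helper: join the chosen root and its morphological tags with '+'."""
    root = sentence["roots"][word_idx][pred_tag]
    tags = sentence["morph_tokens"][word_idx][pred_tag]
    return "+".join([root] + tags)


for i, tag in enumerate(pred_tags):
    print(sentence["surface_forms"][i], format_analysis(sentence, i, tag))
```

Under this reading, `ata` was resolved with index 2 (`ata+Noun+A3sg+Pnon+Nom`), which lies inside the four-item candidate list, so this run does not exercise the out-of-range case.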
