Unwanted length mismatch #3

Open
timvieira opened this issue Aug 31, 2016 · 1 comment

Comments

@timvieira

Why are the following parses considered to have different lengths? I'm guessing it has something to do with a punctuation filter.

GOLD=  
    (ROOT (S (CC And) (NP (NP (NNS rents)) (PP (IN on) (NP (NP (NNP Beverly) (NNP Hills)
    (POS ')) (NNP Rodeo) (NNP Drive)))) (ADVP (RB generally)) (VP (VBP do) (RB n't) 
    (VP (VB exceed)  (NP (NP (RB about) ($ $) (CD 125)) (NP (DT a) (JJ square) 
    (NN foot))))) (. .)))
TEST= 
    (ROOT (S (CC And) (NP (NP (NNS rents)) (PP (IN on) (NP (NNP Beverly) (NNP Hills)))) 
    ('' ')  (NP (NNP Rodeo) (NNP Drive)) (ADVP (RB generally)) (VP (VBP do) (RB n't) 
    (VP (VB exceed) (NP (NP (QP (IN about) ($ $) (CD 125))) (NP (DT a) (NN square) 
    (NN foot))))) (. .)))

In this case, I think the TEST parse drops the token ('' '), but the GOLD parse keeps its apostrophe because there it has a possessive tag, (POS ').

@jkkummerfeld
Owner

That's right, the punctuation dropping is based on POS tags. The motivation was to follow evalb, which does it based on tags too.
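To make the effect concrete, here is a minimal sketch (not this project's actual code) that reads the leaves of both parses and drops them by POS tag. The names PUNCT_TAGS, LEAF, leaves and drop_by_tag are made up for illustration, and the tag set is an assumption modeled on the punctuation labels that evalb's default parameter file deletes.

```python
import re

# Assumption: punctuation tags to delete, modeled on evalb's defaults.
PUNCT_TAGS = {",", ":", "``", "''", "."}

# Matches a preterminal such as "(NNP Hills)" and captures (tag, word).
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def leaves(parse):
    """Return (tag, word) pairs for every leaf of a bracketed parse."""
    return LEAF.findall(parse)

def drop_by_tag(parse):
    """Keep only the leaves whose POS tag is not a punctuation tag."""
    return [(tag, word) for tag, word in leaves(parse) if tag not in PUNCT_TAGS]

GOLD = (
    "(ROOT (S (CC And) (NP (NP (NNS rents)) (PP (IN on) (NP (NP (NNP Beverly) "
    "(NNP Hills) (POS ')) (NNP Rodeo) (NNP Drive)))) (ADVP (RB generally)) "
    "(VP (VBP do) (RB n't) (VP (VB exceed) (NP (NP (RB about) ($ $) (CD 125)) "
    "(NP (DT a) (JJ square) (NN foot))))) (. .)))"
)
TEST = (
    "(ROOT (S (CC And) (NP (NP (NNS rents)) (PP (IN on) (NP (NNP Beverly) "
    "(NNP Hills)))) ('' ') (NP (NNP Rodeo) (NNP Drive)) (ADVP (RB generally)) "
    "(VP (VBP do) (RB n't) (VP (VB exceed) (NP (NP (QP (IN about) ($ $) "
    "(CD 125))) (NP (DT a) (NN square) (NN foot))))) (. .)))"
)

print(len(leaves(GOLD)), len(leaves(TEST)))            # 19 19
print(len(drop_by_tag(GOLD)), len(drop_by_tag(TEST)))  # 18 17
```

Both parses have 19 leaves. The final period is dropped from each, but the apostrophe is only dropped from TEST, where it is tagged '' rather than POS, hence 18 versus 17.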

For punctuation that can be deterministically mapped to a POS tag, the code ensures consistency (see word_to_POS_mapping in the code), but it's less clear what to do for something like quotes. The simplest solution may be to have an option to remove based on tokens instead of tags, though I don't have time to work on that right now (I'll keep this issue open so I can get to it in the future).
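A token-based option might look roughly like the sketch below. This is only an illustration under assumptions: PUNCT_TOKENS, LEAF and drop_by_token are hypothetical names that do not exist in the project, and the token list is a guess that a real option would need to choose more carefully.

```python
import re

# Hypothetical inventory of surface forms to treat as punctuation.
PUNCT_TOKENS = {"'", '"', "``", "''", ",", ".", ":", ";", "!", "?"}

# Matches a preterminal such as "(NNP Hills)" and captures (tag, word).
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def drop_by_token(parse):
    """Keep the words whose surface form is not punctuation, ignoring tags."""
    return [word for _tag, word in LEAF.findall(parse) if word not in PUNCT_TOKENS]

# The fragments of the two parses that disagree on how to tag the apostrophe.
gold = "(NP (NP (NNP Beverly) (NNP Hills) (POS ')) (NNP Rodeo) (NNP Drive))"
test = "(NP (NNP Beverly) (NNP Hills)) ('' ') (NP (NNP Rodeo) (NNP Drive))"

print(drop_by_token(gold))  # ['Beverly', 'Hills', 'Rodeo', 'Drive']
print(drop_by_token(test))  # ['Beverly', 'Hills', 'Rodeo', 'Drive']
```

Because the decision no longer depends on the tag, both sides end up with the same length. The trade-off is that the possessive apostrophe is also removed from the gold parse, which differs from what evalb does.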

@jkkummerfeld jkkummerfeld self-assigned this Aug 31, 2016