Sentence detection for spans in verse context #33

Open
martinpub opened this issue Jun 22, 2021 · 2 comments

Comments

@martinpub
Collaborator

In the context of the Nordic Guidelines verse environment, sentence detection interferes with the outer verse structure. In the following example, the desired result would be for detection to always operate inside the <span class="line"> boundaries, but instead the whole linegroup becomes a single sentence. This happens even in the presence of in-linegroup punctuation such as ".".

<div class="verse">

        <p class="linegroup"><span id="st5-181" class="sentence"><span class="line">Tomten sig å tunet rör</span><br/><span class="line">tyst i afton-stunden,</span><br/><span class="line">där till folk och fä han gör</span><br/><span class="line">nu den vana runden,</span><br/><span class="line">ser, att alla slumra gott,</span><br/><span class="line">alla ha sin kvällsvard fått:</span><br/><span class="line">hästarna och korna,</span><br/><span class="line">Brunte och Gullhorna.</span></span></p>

        <p class="linegroup"><span id="st5-182" class="sentence"><span class="line">Lugn och lycksam dagen gick,</span><br/><span class="line">alla sågos sträva.</span><br/><span class="line">Far sitt gärde upp-plöjt fick,</span><br/><span class="line">mor sågs drällen väva.</span><br/><span class="line">Gamlamor vid rocken spann,</span><br/><span class="line">såg ock till, att brasan brann.</span><br/><span class="line">Drängen grov rabatten,</span><br/><span class="line">tösen hämta' vatten.</span></span></p>

[...]

I think the proper way of splitting a poem into sentences would be to use the lines as guides? Or, at the very least, sentence detection should use the punctuation that is available even in this context.
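As a rough illustration of the "lines as guides" idea, the following Python sketch wraps the content of each <span class="line"> in its own sentence span, so that no sentence can cross a line boundary. It is not the project's implementation: the function name `mark_lines_as_sentences`, the `id_prefix` parameter, and the generated id scheme are all assumptions made for this example, and the input is a shortened, unwrapped version of the second linegroup above.

```python
import xml.etree.ElementTree as ET

# Shortened linegroup, without the outer sentence span (illustrative input).
LINEGROUP = (
    '<p class="linegroup">'
    '<span class="line">Far sitt gärde upp-plöjt fick,</span><br/>'
    '<span class="line">mor sågs drällen väva.</span><br/>'
    '<span class="line">Gamlamor vid rocken spann,</span>'
    '</p>'
)


def mark_lines_as_sentences(linegroup_xml, id_prefix="st"):
    """Wrap the content of every line span in its own sentence span.

    Hypothetical helper: the id scheme (prefix plus running number) is an
    assumption, not something prescribed by the Nordic Guidelines.
    """
    root = ET.fromstring(linegroup_xml)
    # Collect the line spans first so the tree is not modified while iterating.
    lines = [s for s in root.iter("span") if s.get("class") == "line"]
    for number, line in enumerate(lines, start=1):
        sentence = ET.Element(
            "span", {"id": f"{id_prefix}-{number}", "class": "sentence"}
        )
        # Move the text and any child elements of the line into the sentence.
        children = list(line)
        sentence.text = line.text
        for child in children:
            line.remove(child)
        sentence.extend(children)
        line.text = None
        line.append(sentence)
    return ET.tostring(root, encoding="unicode")


print(mark_lines_as_sentences(LINEGROUP))
```

Placing the sentence span inside the line span keeps the verse markup as the outer structure, which matches the desired result described above. A real solution might instead split further on punctuation within a line.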

Again, this targets a construction that is specific to the Nordic Guidelines, and we face a problem similar to the one discussed in #29.

This is similar to <span class="lic"> as raised in #12.

Any ideas/input @kalaspuffar @bertfrees?

@bertfrees
Collaborator

Yes, very similar to #12 and requires a similar solution.

> This happens even in the presence of in-linegroup punctuation such as ".".

I think the reason for this is that the lexer sees no spaces between "mor sågs drällen väva." and "Gamlamor ...". It sees "väva.Gamlamor" and therefore doesn't detect a sentence boundary. I'm not sure if this is something that needs to be addressed though, because it's not an issue anymore when you limit the scope of sentences to the "line" spans.
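The hypothesis above can be reproduced with a toy example. The sketch below is not the project's lexer: it assumes a deliberately simple rule (a sentence ends at ".", "!" or "?" only when whitespace follows), and the names `BOUNDARY` and `split_sentences` are made up for this illustration. It compares the text as it appears when the <br/> elements contribute nothing with the same text joined by spaces.

```python
import re
import xml.etree.ElementTree as ET

# Toy rule (an assumption): a boundary is ., ! or ? followed by whitespace.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text at the toy boundary rule, dropping empty pieces."""
    return [part for part in BOUNDARY.split(text) if part]


FRAGMENT = (
    '<p class="linegroup">'
    '<span class="line">Far sitt gärde upp-plöjt fick,</span><br/>'
    '<span class="line">mor sågs drällen väva.</span><br/>'
    '<span class="line">Gamlamor vid rocken spann,</span>'
    '</p>'
)

root = ET.fromstring(FRAGMENT)

# <br/> carries no text, so adjacent lines are glued together.
joined = "".join(root.itertext())

# Joining the lines with a space restores the whitespace after the period.
spaced = " ".join(line.text for line in root.iter("span"))

print("väva.Gamlamor" in joined)
print(len(split_sentences(joined)), len(split_sentences(spaced)))
```

Under this toy rule the glued text yields one sentence and the spaced text yields two, which is consistent with the explanation above. Limiting sentences to the "line" spans sidesteps the problem, since detection never sees text that spans a line break.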

@martinpub
Collaborator Author

Thanks for your input @bertfrees. We'll return to this after summer.
