This repository has been archived by the owner on Jul 11, 2019. It is now read-only.

XML File for Carrot2 #17

Open
Quoniam opened this issue Jul 10, 2017 · 2 comments

@Quoniam
Collaborator

Quoniam commented Jul 10, 2017

Some characters in the text fields (abstracts, descriptions, claims) produce an invalid XML file that Carrot2 cannot use.

&, <, >
Those for sure, but there may be others. They are read as XML control characters, so all such characters need to be handled.

@cvanderlei cvanderlei self-assigned this Jul 19, 2017
@cvanderlei
Contributor

Does anybody have a suggestion for how to identify the other characters that need to be treated?
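
One possible way to approach this, as a sketch and not the fix used in this repository: besides the markup characters &, < and >, XML 1.0 forbids most ASCII control characters (everything below U+0020 except tab, line feed and carriage return), and these cannot be escaped, only removed or replaced. The helper names below (`find_invalid_xml_chars`, `strip_invalid_xml_chars`) are hypothetical, and the character ranges follow the `Char` production of the XML 1.0 specification.

```python
import re

# Matches any character that is NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return the distinct characters in `text` that XML 1.0 does not allow."""
    return sorted(set(_INVALID_XML_CHARS.findall(text)))


def strip_invalid_xml_chars(text):
    """Remove characters that can never appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "claim 1\x0b: A & B\x00"
    print([hex(ord(c)) for c in find_invalid_xml_chars(sample)])
    print(repr(strip_invalid_xml_chars(sample)))
```

Note that & is a legal character here: it only needs escaping, so it is left in place by this step.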

@rfaga
Collaborator

rfaga commented Jul 21, 2017

@cvanderlei
I think we could use the standard library's xml.sax.saxutils to escape all inserted XML content:

https://wiki.python.org/moin/EscapingXml

I pushed a possible fix in 1db5544, but it needs more testing.
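
For illustration only (this is not the content of the commit above), a minimal sketch of escaping with `xml.sax.saxutils`. The wrapper name `to_xml_text`, the sample strings and the `<snippet>`/`<document>` element names are assumptions made for the example.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml_text(value):
    """Escape &, < and > so `value` is safe to place inside an XML element."""
    return escape(value)


if __name__ == "__main__":
    abstract = "Use of A & B where x < y > z"
    # Element text: only &, < and > need escaping.
    print("<snippet>" + to_xml_text(abstract) + "</snippet>")
    # Attribute values: quoteattr() escapes the value and adds suitable quotes.
    print("<document title=" + quoteattr('R&D "draft"') + "/>")
```

`escape()` handles only &, < and >, so attribute values are better passed through `quoteattr()`, which also deals with quote characters.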
