recipe and template to usual manuscript real tests #102

ppKrauss · 2016-11-24T15:17:29Z

Hi, I was testing meTypeset with the basic Ubuntu installation, and it works fine with 99% of the .docx samples in the tests folder.

Now I am testing a "real life" article.docx, but I do not see (any clues?) how to transform it into NLM XML:

- Are there examples or documentation describing the template, or "template pattern", of typical scientific articles accepted by meTypeset?
- Are there options to specify another template? Any config file?

MartinPaulEve · 2016-11-25T08:53:23Z

Hi Peter, I don't quite understand the question here... meTypeset is supposed to parse arbitrary document input and produce structured JATS output... Best wishes, Martin

ppKrauss · 2016-11-25T10:23:05Z

Hi, thanks for the reply (!). Well, let's try a test case: a randomly chosen PLOS article, pbio.2000638. A copy of it was adapted here as a Google Doc (which we can edit), and it has a .docx download option.

ppKrauss · 2016-11-28T23:31:38Z

See our Ubuntu installation sandbox and its results, such as this sample:
http://jats.science/testApproach-v02/nlm/out.xml
(it comes from this source article, converted to an equivalent docx).

According to this meTypeset description:

... In effect, this recurses through the document a few times, identifies where Word's internal headings have already been applied, and corroborates those with the use of Bold, Italics, Underlining, and font face and size changes to work out the nesting of various headings and sub-headings in the document. At the end of this stage, the expectation is that we'd have the Introduction, Methods, Results, and (key to this discussion) References section as top-level headings (for example, that is; none of these are semantically hardcoded) and any subsections beneath them.

So I will not check the metadata (title, authors, affiliation, abstract), but I can check the <body> structure. Some curious results:

- The text fragment "value of guessing is therefore 0.764 * 60 + (1–0.764) * (-50) = 34" was marked up as
  ... <xref rid="ID5196d4ec-e9fe-49bf-81a6-dd23784755d4" ref-type="bibr">1–0.764</xref>) * ...
  but there is no reference-list item with the label "1–0.764".
- The paragraph "By contrast, sampling further information has a fixed probability (0.1) of transitioning into one of 10 possible subsequent states..." was converted to <label>By contrast, sampling further information has a fixed probability ( 0</label> <caption> <p>1) of transitioning into one of...
- ...
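
One way to find cases like the first one automatically is to list every bibliographic xref that points at nothing in the reference list. The following is only a rough sketch, not part of meTypeset: it assumes the output uses un-namespaced JATS/NLM element names and that each xref's rid is meant to match the id of a ref element. The function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def dangling_bibr_xrefs(xml_text):
    """Return (rid, text) for each bibliographic xref whose rid matches no <ref id>."""
    root = ET.fromstring(xml_text)
    # Collect the ids of all reference-list entries (assumes un-namespaced tags).
    ref_ids = {ref.get("id") for ref in root.iter("ref")}
    return [
        (xref.get("rid"), "".join(xref.itertext()))
        for xref in root.iter("xref")
        if xref.get("ref-type") == "bibr" and xref.get("rid") not in ref_ids
    ]


# Hypothetical miniature document: one bogus link and one genuine link.
sample = """<article>
  <body><p>0.764 * 60 + (<xref rid="ID5196" ref-type="bibr">1-0.764</xref>) * (-50) = 34,
  as shown earlier (<xref rid="R1" ref-type="bibr">1</xref>).</p></body>
  <back><ref-list><ref id="R1"><mixed-citation>Example reference.</mixed-citation></ref></ref-list></back>
</article>"""

print(dangling_bibr_xrefs(sample))
```

Run against a file such as out.xml, a non-empty result would point to fragments that were linked as citations without a matching reference.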

MartinPaulEve · 2016-11-29T09:13:09Z

Ah, now this is interesting! For the first one, it should be fairly easy to create a small test document. A paragraph of text such as: "This is a test paragraph (1-12). This is another: 5 + (1-0.764)." should give the same result. Once we have a test document, we can write a test and implement a fix (we need to distinguish the latter mathematical expression from the former page references). The second one is more complex. Not sure what's happening there. I'd need to narrow this down to a test case... Best wishes, Martin
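
The distinction described above can be sketched with a simple heuristic. This is not how meTypeset currently classifies references; it is an illustrative assumption that a parenthesised citation contains only integers, ranges, and comma-separated lists, and that a parenthesis directly next to an arithmetic operator belongs to an expression. The names CITATION, OPERATORS, and citation_candidates are hypothetical.

```python
import re

# Parenthesised integers, ranges, or comma lists, e.g. (3), (1-12), (2, 5-7).
# Decimals such as (1-0.764) do not match because "." is not allowed inside.
CITATION = re.compile(
    r"\((\d+(?:\s*[-–]\s*\d+)?(?:\s*,\s*\d+(?:\s*[-–]\s*\d+)?)*)\)"
)
# Characters that suggest the parenthesis is an operand in an expression.
OPERATORS = set("+*/=×")


def citation_candidates(text):
    """Return the contents of parentheses that look like numeric citations."""
    found = []
    for m in CITATION.finditer(text):
        before = text[:m.start()].rstrip()
        after = text[m.end():].lstrip()
        # Skip matches that sit directly beside an arithmetic operator.
        if (before and before[-1] in OPERATORS) or (after and after[0] in OPERATORS):
            continue
        found.append(m.group(1))
    return found


print(citation_candidates(
    "This is a test paragraph (1-12). This is another: 5 + (1-0.764)."
))
```

The hyphen is deliberately left out of OPERATORS, since it is ambiguous between a range and a subtraction; a real fix would need further test cases to settle that.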

ppKrauss · 2016-11-29T11:47:05Z

Hi @MartinPaulEve,

- About "create a small test document": perhaps a set of small test cases... I can create a new GitHub repo for it, and perhaps later we can select some to add to your /tests.
- About the small test case "test paragraph (1-12)... (1-0.764)", there are two complementary approaches to a solution:
  - Define the valid citation style of a document, because it can differ, e.g. "[1-12]", superscript text, etc. (plus a statistical method to detect the style from a set of same-style sample documents).
  - First detect the reference list and the list of valid reference labels: if, e.g., "0.764" is not a valid label, it will not be marked as a reference link.
- About the more complex one: I will only report it... I prefer to understand and solve the simplest first ;-)
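
The second approach (validating against the reference list) could look roughly like this. It is only a sketch under stated assumptions: the reference labels are plain integers, the set of valid labels has already been extracted from the reference list, and the helper names expand_labels and is_valid_citation are invented here, not taken from meTypeset.

```python
import re


def expand_labels(candidate):
    """Expand '2, 5-7' into [2, 5, 6, 7]; return None if any part is not an integer or range."""
    labels = []
    for part in candidate.split(","):
        m = re.fullmatch(r"\s*(\d+)(?:\s*[-–]\s*(\d+))?\s*", part)
        if m is None:
            return None
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        if end < start:
            return None
        labels.extend(range(start, end + 1))
    return labels


def is_valid_citation(candidate, valid_labels):
    """True only if every label named by the candidate exists in the reference list."""
    labels = expand_labels(candidate)
    return labels is not None and all(n in valid_labels for n in labels)


# Hypothetical reference list with entries numbered 1 to 40.
valid = set(range(1, 41))
print(is_valid_citation("1-12", valid))
print(is_valid_citation("1–0.764", valid))
print(is_valid_citation("38-45", valid))
```

With this check in place, a fragment would only become a reference link when all of its labels are present in the detected reference list, regardless of which bracket or superscript style the document uses.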

MartinPaulEve · 2016-11-29T15:43:00Z

Hi Peter, Thanks for this. It would be great if you could create some test cases. I'm absolutely swamped with other work at the moment. Do see how we've created other tests using Robot and feel free to build on that... Best wishes, Martin
