recipe and template to usual manuscript real tests #102
Hi Peter,
I don't quite understand the question here... meTypeset is supposed to
parse arbitrary document input and produce structured JATS output...
Best wishes,
Martin
On 24/11/16 15:17, Peter wrote:
Hi, I was testing /meTypeset/ with the basic Ubuntu installation
<https://github.com/MartinPaulEve/meTypeset/blob/master/INSTALL.md#basic-metypeset-instalation>
and it works fine with 99% of the |.docx| samples in the test
<https://github.com/MartinPaulEve/meTypeset/tree/master/tests> folder.
Now I am testing a "real life" |article.docx|... but I can't see (any clues?)
how to transform it into NLM XML:
* Are there examples or documentation describing the template or
"template pattern" of typical scientific articles accepted by meTypeset?
* Are there options to indicate another template? Any config file?
--
Professor Martin Paul Eve
Chair of Literature, Technology and Publishing
Birkbeck, University of London
T: 0203 073 8420
E: [email protected]
W: https://www.martineve.com
R: 416, 43 Gordon Square, London, WC1H 0PD
Books: https://www.martineve.com/books/
Articles: https://www.martineve.com/c-v/
Series Editor: New Horizons in Contemporary Writing (Bloomsbury)
Director, Birkbeck Centre for Technology and Publishing
Founder, Open Library of the Humanities (https://www.openlibhums.org)
Chief Editor, Orbit (https://www.pynchon.net)
Senior Online Editor, Alluvium, (http://www.alluvium-journal.org)
Hi, thanks for the reply! Well, let's try a test case: a randomly chosen PLOS article, pbio.2000638. A copy of it was adapted here to Google Docs (so we can edit it), and it has a docx download option.
Ah, now this is interesting!
For the first one, it should be fairly easy to create a small test document. A
paragraph of text such as:
This is a test paragraph (1-12). This is another: 5 + (1-0.764).
should give the same result. Once we have a test document, we can write
a test and implement a fix (we need to distinguish the latter mathematical
equation from the former page references).
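To illustrate the distinction between the citation "(1-12)" and the equation "(1-0.764)" in the test paragraph, here is a minimal Python sketch. It is not meTypeset's actual citation detection; `find_citations` and the three regexes are hypothetical names, and it assumes numeric citations in round brackets. A bracketed group counts as a citation only if it holds whole numbers or integer ranges (hyphen or en dash) and is not directly preceded by an arithmetic operator.

```python
import re
from typing import List

# A parenthesised group with no nested parentheses; group 1 is its content.
PAREN = re.compile(r"\(([^()]*)\)")
# Citation content: integers or integer ranges, comma-separated, e.g. "1-12"
# or "3, 5-7". Both a hyphen and an en dash (\u2013) count as range marks.
CITATION_BODY = re.compile(
    r"\s*\d+(?:\s*[-\u2013]\s*\d+)?(?:\s*,\s*\d+(?:\s*[-\u2013]\s*\d+)?)*\s*")
# An arithmetic operator right before "(" suggests an equation, not a citation.
OPERATOR_BEFORE = re.compile(r"[-+*/=\u2013\u00d7]\s*$")


def find_citations(text: str) -> List[str]:
    """Return the contents of parenthesised groups that look like citations."""
    found = []
    for m in PAREN.finditer(text):
        body = m.group(1)
        if not CITATION_BODY.fullmatch(body):
            continue  # e.g. "1-0.764" or "0.1": decimals are never reference labels
        if OPERATOR_BEFORE.search(text[: m.start()]):
            continue  # e.g. "5 + (1-2)": integer arithmetic
        found.append(body)
    return found


print(find_citations("This is a test paragraph (1-12). This is another: 5 + (1-0.764)."))
# → ['1-12']
```

Years such as "(2016)" still pass this filter, so a heuristic like this would probably need to be combined with a check against the document's actual reference list.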
The second one is more complex. Not sure what's happening there. I'd
need to narrow this down to a test case...
Best wishes,
Martin
On 28/11/16 23:31, Peter wrote:
See our Ubuntu installation
<https://github.com/MartinPaulEve/meTypeset/blob/master/INSTALL.md>
sandbox and its results, for example this sample:
*http://jats.science/testApproach-v02/nlm/out.xml*
(it comes from this source article
<https://docs.google.com/document/d/19LPDYbfuOmd1MWz7lfX00zAihUAfXptRN_7_7Um2uSs/>
converted to an equivalent docx
<http://jats.science/testApproach-v02/Approach-InducedBiases-v02.docx>).
According to this /meTypeset/ description
<https://www.ncbi.nlm.nih.gov/books/NBK279666/>,
... In effect, this recurses through the document a few times,
identifies where Word's internal headings have already been applied,
and corroborates those with the use of Bold, Italics, Underlining,
and font face and size changes to work out the nesting of various
headings and sub-headings in the document. At the end of this stage,
the expectation is that we'd have the Introduction, Methods,
Results, and (key to this discussion) References section as
top-level headings (for example, that is; none of these are
semantically hardcoded) and any subsections beneath them.
so I will not check the metadata (title, authors, affiliation, abstract),
but I can check the |<body>| structure... Some curious results:
* a text fragment /"value of guessing is therefore 0.764 * 60 +
(1–0.764) * (-50) = 34"/ was marked up as
|... <xref rid="ID5196d4ec-e9fe-49bf-81a6-dd23784755d4"
ref-type="bibr">1–0.764</xref>) * ...|: there is no reference-list
item with the "1–0.764" label.
* the paragraph /"By contrast, sampling further information has a
fixed probability (0.1) of transitioning into one of 10 possible
subsequent states..."/ was converted to |<label>By contrast,
sampling further information has a fixed probability ( 0</label>
<caption> <p>1) of transitioning into one of...|
* ...
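The heading-nesting pass quoted above from the NCBI description could be illustrated roughly as follows. This is a simplified, single-pass Python sketch, not meTypeset's implementation (which, per the description, recurses through the document several times). The `Para` record, the "most common font size is body text" rule and the eight-word limit for short bold lines are all hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Para:
    """Hypothetical, simplified view of one paragraph extracted from a .docx."""
    text: str
    font_size: float                    # in points
    bold: bool = False
    word_heading: Optional[int] = None  # level of a Word "Heading N" style, if applied


def infer_heading_levels(paras: List[Para]) -> List[Tuple[Optional[int], str]]:
    """Return (level, text) pairs; level None marks ordinary body text."""
    # Assumption: the most common font size is the body-text size.
    body_size = Counter(p.font_size for p in paras).most_common(1)[0][0]

    def is_candidate(p: Para) -> bool:
        # Formatting cues: larger than body text, or bold and short.
        return p.font_size > body_size or (p.bold and len(p.text.split()) <= 8)

    # Rank the distinct (size, bold) styles of candidates without a Word heading
    # style: the most prominent style becomes level 1, the next level 2, etc.
    styles = sorted({(p.font_size, p.bold) for p in paras
                     if p.word_heading is None and is_candidate(p)}, reverse=True)
    level_of = {style: i + 1 for i, style in enumerate(styles)}

    result: List[Tuple[Optional[int], str]] = []
    for p in paras:
        if p.word_heading is not None:
            result.append((p.word_heading, p.text))  # trust Word's own heading style
        elif is_candidate(p):
            result.append((level_of[(p.font_size, p.bold)], p.text))
        else:
            result.append((None, p.text))
    return result
```

A real pass would also have to reconcile explicit Word levels with inferred ones and avoid promoting short bold sentences in the body; the sketch does neither.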
Hi Peter,
Thanks for this.
It would be great if you could create some test cases. I'm absolutely
swamped with other work at the moment. Do see how we've created other
tests using Robot and feel free to build on that...
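As one possible building block for such test cases, here is a hedged Python sketch that inspects converted JATS output for suspicious bibliography links like the "1–0.764" one reported in this thread. It is not a Robot Framework test and not part of meTypeset; `suspicious_bibr_xrefs` is a hypothetical helper that assumes unnamespaced JATS and numeric citation labels. It flags an `<xref ref-type="bibr">` if its `rid` does not point at an existing `<ref id>` or if its text is not an integer or integer range.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Text expected inside a numeric citation: integers or integer ranges,
# comma-separated, with a hyphen or en dash (\u2013) as the range mark.
CITATION_TEXT = re.compile(
    r"\s*\d+(?:\s*[-\u2013]\s*\d+)?(?:\s*,\s*\d+(?:\s*[-\u2013]\s*\d+)?)*\s*")


def suspicious_bibr_xrefs(jats_xml: str) -> List[str]:
    """Return the text of <xref ref-type="bibr"> elements that look wrong."""
    root = ET.fromstring(jats_xml)
    ref_ids = {ref.get("id") for ref in root.iter("ref")}
    problems = []
    for xref in root.iter("xref"):
        if xref.get("ref-type") != "bibr":
            continue
        text = "".join(xref.itertext())
        rids = xref.get("rid", "").split()  # @rid may hold several space-separated IDs
        points_at_refs = bool(rids) and set(rids) <= ref_ids
        if not points_at_refs or not CITATION_TEXT.fullmatch(text):
            problems.append(text)
    return problems
```

An empty result for a converted article would mean every bibliography link at least targets a real reference and carries a numeric label; it says nothing about whether the right references were linked.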
Best wishes,
Martin
On 29/11/16 11:47, Peter wrote:
Hi @MartinPaulEve <https://github.com/MartinPaulEve>,
* About "create a small test document": perhaps a set of
small test cases... I can create a new GitHub repo for it, and
perhaps later we can select some to add to your /tests.
* About the small test case "test paragraph (1-12)... (1-0.764)", there
are two complementary approaches to a solution:
  o Define the /valid style/ of a document, because it can differ,
    like "[1-12]", superscript text, etc.
    (and a statistical method to detect the style of a set of
    same-style sample documents)
  o Detect the reference list first, and the list of /valid ref
    labels/: if e.g. "0.764" is not a valid label, it will not be
    marked as a ref link.
* About the more complex one: I will only report it... I prefer to
understand and solve the simplest first ;-)
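The two approaches above could be combined roughly as follows. This is a minimal Python sketch under stated assumptions, not meTypeset code: paragraphs are plain strings, the set of valid numeric reference labels has already been extracted from the reference list, and `STYLES`, `dominant_style`, `expand` and `link_candidates` are hypothetical names. Superscript citations are not handled, since they would need run-level formatting rather than text patterns.

```python
import re
from collections import Counter
from typing import List, Set

# Numeric citation content: "1-12", "3, 5", ... (hyphen or en dash for ranges).
_BODY = r"\d+(?:\s*[-\u2013]\s*\d+)?(?:\s*,\s*\d+(?:\s*[-\u2013]\s*\d+)?)*"
# Hypothetical catalogue of bracket-based citation styles.
STYLES = {
    "round": re.compile(r"\((" + _BODY + r")\)"),
    "square": re.compile(r"\[(" + _BODY + r")\]"),
}


def dominant_style(paragraphs: List[str]) -> str:
    """Approach 1: pick the style that matches most often in the sample."""
    counts = Counter({name: sum(len(pat.findall(p)) for p in paragraphs)
                      for name, pat in STYLES.items()})
    return counts.most_common(1)[0][0]


def expand(body: str) -> Set[int]:
    """Turn "1-3, 5" into {1, 2, 3, 5}; a reversed range yields nothing."""
    labels: Set[int] = set()
    for part in body.split(","):
        bounds = re.split(r"\s*[-\u2013]\s*", part.strip())
        labels.update(range(int(bounds[0]), int(bounds[-1]) + 1))
    return labels


def link_candidates(paragraphs: List[str], valid_labels: Set[int]) -> List[str]:
    """Approach 2: keep only citations whose numbers all exist in the reference list."""
    pattern = STYLES[dominant_style(paragraphs)]
    linked = []
    for p in paragraphs:
        for body in pattern.findall(p):
            labels = expand(body)
            if labels and labels <= valid_labels:
                linked.append(body)
    return linked
```

With twelve references, "(1-12)" and "(3, 5)" would be linked, while "(1-0.764)" never matches the pattern and "(2016)" is rejected because 2016 is not a valid label.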