recipe and template to usual manuscript real tests #102

Open
ppKrauss opened this issue Nov 24, 2016 · 6 comments

@ppKrauss

Hi, I tested meTypeset on a basic Ubuntu installation, and it works fine with 99% of the .docx samples in the test folder.

Now I am testing a "real-life" article.docx, but I do not see how to transform it into NLM XML. Any clues?

  • Are there examples or documentation describing the template or "template pattern" of typical scientific articles accepted by meTypeset?

  • Are there options to specify another template? Any config file?

@MartinPaulEve
Owner

MartinPaulEve commented Nov 25, 2016 via email

@ppKrauss
Author

Hi, thanks for the reply! Let's try a test case: a randomly chosen PLOS article, pbio.2000638. A copy was adapted here to Google Docs (so we can edit it), and it has a docx download option.

@ppKrauss
Author

See our Ubuntu installation sandbox and its results, such as this sample:
http://jats.science/testApproach-v02/nlm/out.xml
(it comes from this source article, converted to an equivalent docx).

According to this meTypeset description:

... In effect, this recurses through the document a few times, identifies where Word's internal headings have already been applied, and corroborates those with the use of Bold, Italics, Underlining, and font face and size changes to work out the nesting of various headings and sub-headings in the document. At the end of this stage, the expectation is that we'd have the Introduction, Methods, Results, and (key to this discussion) References section as top-level headings (for example, that is; none of these are semantically hardcoded) and any subsections beneath them.

So I will not check the metadata (title, authors, affiliation, abstract), but I can check the <body> structure. Some curious results:

  • the text fragment "value of guessing is therefore 0.764 * 60 + (1–0.764) * (-50) = 34" was marked up as
    ... <xref rid="ID5196d4ec-e9fe-49bf-81a6-dd23784755d4" ref-type="bibr">1–0.764</xref>) * ..., but there is no reference-list item with the label "1–0.764".

  • the paragraph "By contrast, sampling further information has a fixed probability (0.1) of transitioning into one of 10 possible subsequent states..." was converted to <label>By contrast, sampling further information has a fixed probability ( 0</label> <caption> <p>1) of transitioning into one of...

  • ...
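
A quick way to list links like the first one above is to scan the output XML for `bibr` cross-references whose `rid` matches no `<ref>` id. This is only a minimal sketch: the function name and the sample XML are made up for illustration, it assumes the output uses no XML namespace, and it assumes a spurious link shows up as a dangling `rid`. If meTypeset also emits a matching `<ref>`, this check will not catch it.

```python
import xml.etree.ElementTree as ET


def find_dangling_bibr_xrefs(xml_text):
    """Return (rid, text) pairs for bibr xrefs that point at no <ref> id."""
    root = ET.fromstring(xml_text)
    # Collect every id declared on a <ref> element anywhere in the document.
    ref_ids = {ref.get("id") for ref in root.iter("ref") if ref.get("id")}
    dangling = []
    for xref in root.iter("xref"):
        if xref.get("ref-type") != "bibr":
            continue
        rid = xref.get("rid")
        if rid not in ref_ids:
            dangling.append((rid, "".join(xref.itertext())))
    return dangling


# Hypothetical miniature document, not the real out.xml.
SAMPLE = """<article>
  <body>
    <p>As shown before (<xref rid="R1" ref-type="bibr">1</xref>), the value is
    0.764 * 60 + (<xref rid="ID-x" ref-type="bibr">1–0.764</xref>) * (-50) = 34.</p>
  </body>
  <back>
    <ref-list>
      <ref id="R1"><mixed-citation>Example reference.</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""

if __name__ == "__main__":
    for rid, text in find_dangling_bibr_xrefs(SAMPLE):
        print(rid, repr(text))
```

Running the same function over the text of the real out.xml would give a list of candidate false positives to turn into small test cases.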

@MartinPaulEve
Owner

MartinPaulEve commented Nov 29, 2016 via email

@ppKrauss
Author

Hi @MartinPaulEve,

  • about "create a small test document": perhaps a set of small test cases instead. I can create a new GitHub repo for them, and later we can select some to add to your /tests

  • about the small test case "test paragraph (1-12)... (1-0.764)", there are two complementary approaches to a solution:

    • define the valid citation style of a document, because it can vary: "[1-12]", superscript text, etc.
      (plus a statistical method to detect the style from a set of sample documents that share it)

    • first detect the reference list and the list of valid reference labels: if, e.g., "0.764" is not a valid label, it will not be marked as a reference link.

  • about the more complex cases: I will only report them for now. I prefer to understand and solve the simplest ones first ;-)
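
The second approach (validate candidate labels against the reference list) could look roughly like the sketch below. It is not meTypeset code: `is_plausible_citation` and `n_refs` are hypothetical names, and it assumes numeric citation labels that run from 1 to the number of entries in the reference list.

```python
import re

# One integer, optionally followed by a hyphen or en dash and a second integer.
_NUMBER_OR_RANGE = re.compile(r"(\d+)(?:\s*[-–]\s*(\d+))?")


def is_plausible_citation(candidate, n_refs):
    """Check that every comma-separated part is a label or range within 1..n_refs."""
    for part in candidate.split(","):
        match = _NUMBER_OR_RANGE.fullmatch(part.strip())
        if match is None:
            # Decimals such as "0.764" and any non-numeric text end up here.
            return False
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if not (1 <= start <= end <= n_refs):
            return False
    return True


if __name__ == "__main__":
    for text in ["1-12", "3, 5-7", "1–0.764", "0.1", "25"]:
        print(text, is_plausible_citation(text, n_refs=20))
```

With 20 references, "1-12" and "3, 5-7" are accepted, while "1–0.764", "0.1" and "25" are rejected, so the two fragments reported above would no longer be turned into reference links. Bracketed or superscript styles would need the style detection from the first approach on top of this.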

@MartinPaulEve
Owner

MartinPaulEve commented Nov 29, 2016 via email
