Numbered lists where the start number is not 1 #27

Open
dividor opened this issue Mar 7, 2017 · 6 comments
Comments

@dividor

dividor commented Mar 7, 2017

Hi there,

We have some legacy documents in which the authors started a numbered list at "1", then inserted a bulleted list and a table, and then added another numbered list item whose number is set to "2". When the document is parsed with Mammoth, this second numbered list item is numbered "1".

I tried leaving out :fresh on the list item in the style mapping ...

p[style-name='Numbered List'] => ol > li

But no luck.

Is there a way to persist the numbering from the word document please?

thanks!

@mwilliamson
Owner

Unfortunately not. If there's a sensible way to implement this, then pull requests are welcome.

@dividor
Author

dividor commented Apr 22, 2017

Just to note, I resolved this with some really hairy parsing using Beautiful Soup, so I don't need a Mammoth fix at this time.
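
For readers looking for a similar workaround: the Beautiful Soup code mentioned above was not posted, so what follows is only a minimal sketch of the general idea, not the author's method. It post-processes Mammoth's HTML output so that each top-level ol continues counting from the previous one via the HTML start attribute. It uses only the standard library (a regular expression rather than Beautiful Soup), the name continue_numbering is hypothetical, and it assumes that every top-level ordered list is a continuation of the previous one, which will not hold for every document.

```python
import re

# Matches opening and closing ol, ul and li tags (attributes allowed).
_LIST_TAG = re.compile(r"<(/?)(ol|ul|li)\b[^>]*>")


def continue_numbering(html):
    """Add start="N" to each top-level <ol> after the first one.

    Assumption (hypothetical heuristic): every top-level ordered list
    continues the numbering of the previous top-level ordered list.
    """
    out = []
    pos = 0
    depth = 0            # current nesting depth of ol/ul elements
    count = 0            # top-level ordered items seen so far
    in_top_ol = False    # True while the enclosing top-level list is an ol
    for match in _LIST_TAG.finditer(html):
        closing = match.group(1) == "/"
        name = match.group(2)
        tag = match.group(0)
        out.append(html[pos:match.start()])
        pos = match.end()
        if name in ("ol", "ul"):
            if closing:
                depth -= 1
            else:
                if depth == 0:
                    in_top_ol = name == "ol"
                    if in_top_ol and count > 0:
                        # Keep existing attributes and append start.
                        tag = tag[:-1] + ' start="%d">' % (count + 1)
                depth += 1
        elif not closing and depth == 1 and in_top_ol:
            # An <li> directly inside a top-level <ol>.
            count += 1
        out.append(tag)
    out.append(html[pos:])
    return "".join(out)
```

Applied to the HTML string returned by Mammoth, this leaves the first list untouched and rewrites later top-level ordered lists. A real document would likely need a rule for deciding which lists actually continue and which restart.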

@dividor
Author

dividor commented Apr 26, 2017

In case it's useful, attached is a text document to illustrate.

Continuing_lists.docx

@dividor
Author

dividor commented Jul 29, 2017

Just to note - we're living without this feature just fine. Feel free to close.

@zt50tz

zt50tz commented Feb 19, 2019

I have the same need.

I tried adding a num_id attribute to the _NumberingLevel class and setting it in read_numbering_xml_element. Here is the test code:

def read_numbering_xml_element(element):
    abstract_nums = _read_abstract_nums(element)
    nums = _read_nums(element, abstract_nums)
    not_abstract_num_ids = set(nums) - set(abstract_nums)
    for not_abstract_num_id in not_abstract_num_ids:
        for level in nums[not_abstract_num_id]:
            nums[not_abstract_num_id][level].num_id = not_abstract_num_id
    return Numbering(nums)

With that, in transform_document I can read num_id from the numbering attribute of an element and try to do something with it.

def transform_items(element):
    if isinstance(element, documents.Paragraph):
        if element.numbering and element.numbering.num_id:
            print(element)
    # Transforms are expected to return the (possibly modified) element.
    return element

This prints:

Paragraph(...,
  style_id=u'aa',
  style_name=u'List Paragraph',
  numbering=_NumberingLevel(level_index='0', is_ordered=True, num_id=u'41'),
  alignment=None,..
)

But if this line executes:

nums[not_abstract_num_id][level].num_id = not_abstract_num_id

the HTML writer emits a p tag instead of an ol.

I think this addition breaks the default style map. What can I do about it? Or is this the wrong approach entirely, and should I be looking at something else?

Thanks.

@TychonautVII

I'd be interested in this feature too! Getting the numbers of lists (and, in my case, of footnotes and references) from Word seems to fit what I understand to be Mammoth's philosophy of extracting the content from Word (but not necessarily the style): the specific number of a footnote or list item seems to be content!

I tried to do this with the transform API, but I don't think it's possible.

Is there any lower-level way to access the content of the Word document in Mammoth? I'd like to figure out what number a list started at.
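
One lower-level option that does not go through Mammoth at all: a .docx file is a zip archive, and list definitions live in word/numbering.xml. The sketch below reads the start value for each numbering instance and level, taking w:start from the abstract definition and letting a w:startOverride on the instance replace it. The function names are hypothetical, and it is a simplification: it ignores full w:lvl definitions inside w:lvlOverride as well as linked numbering styles, and it does not compute the number of a list item that continues an earlier list (that would also require counting paragraphs in word/document.xml).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}name form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_start_numbers(numbering_xml):
    """Map (numId, level index) -> start value, or None if not declared.

    numbering_xml is the content of word/numbering.xml (str or bytes).
    """
    root = ET.fromstring(numbering_xml)

    # Start values declared on each abstract numbering definition.
    abstract_starts = {}
    for abstract in root.findall(W + "abstractNum"):
        levels = {}
        for lvl in abstract.findall(W + "lvl"):
            start = lvl.find(W + "start")
            value = int(start.get(W + "val")) if start is not None else None
            levels[lvl.get(W + "ilvl")] = value
        abstract_starts[abstract.get(W + "abstractNumId")] = levels

    # Each w:num instance points at an abstract definition and may
    # override the start value per level.
    starts = {}
    for num in root.findall(W + "num"):
        num_id = num.get(W + "numId")
        ref = num.find(W + "abstractNumId")
        levels = {}
        if ref is not None:
            levels = dict(abstract_starts.get(ref.get(W + "val"), {}))
        for override in num.findall(W + "lvlOverride"):
            start_override = override.find(W + "startOverride")
            if start_override is not None:
                levels[override.get(W + "ilvl")] = int(start_override.get(W + "val"))
        for level, value in levels.items():
            starts[(num_id, level)] = value
    return starts


def read_start_numbers_from_docx(path):
    """Read word/numbering.xml out of a .docx file and parse it."""
    with zipfile.ZipFile(path) as docx:
        return read_start_numbers(docx.read("word/numbering.xml"))
```

The keys are strings because that is how the ids appear in the XML; the numId matches the value referenced by a paragraph's numbering properties, which is the same id the num_id experiment above was trying to expose.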
