Numbered lists where the start number is not 1 #27

dividor · 2017-03-07T21:49:40Z

Hi there,

We have some legacy documents where the authors started a numbered list at "1", then entered a bulleted list and a table, then another numbered list item whose number is set to "2". When parsing with Mammoth, this second numbered list item is numbered "1".

I tried not marking the list item as fresh (no :fresh modifier):

p[style-name='Numbered List'] => ol > li

But no luck.

Is there a way to preserve the numbering from the Word document, please?

thanks!
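For context on where that "2" comes from: in the underlying WordprocessingML (word/document.xml inside the .docx zip), a numbered paragraph names its list through w:pPr/w:numPr, and Word keeps counting across paragraphs that share a w:numId even when a bulleted list or table sits between them. A minimal standard-library sketch, using a hypothetical helper read_paragraph_numbering (not part of Mammoth), lists each paragraph's numId and level so you can check whether the item after the table really belongs to the first list:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraph_numbering(
    document_xml: bytes,
) -> List[Tuple[Optional[str], Optional[str], str]]:
    """Return (numId, ilvl, text) for every paragraph in document order."""
    root = ET.fromstring(document_xml)
    rows = []
    # iter() walks into tables too, so paragraphs in cells are included.
    for paragraph in root.iter(W + "p"):
        num_id = ilvl = None
        num_pr = paragraph.find(W + "pPr/" + W + "numPr")
        if num_pr is not None:
            num_id_el = num_pr.find(W + "numId")
            ilvl_el = num_pr.find(W + "ilvl")
            if num_id_el is not None:
                num_id = num_id_el.get(W + "val")
            if ilvl_el is not None:
                ilvl = ilvl_el.get(W + "val")
        if num_id == "0":
            # numId 0 means numbering was explicitly removed.
            num_id = ilvl = None
        text = "".join(t.text or "" for t in paragraph.iter(W + "t"))
        rows.append((num_id, ilvl, text))
    return rows
```

To run it on a real file, pass zipfile.ZipFile(path).read("word/document.xml"). Numbering inherited from a paragraph style, rather than set directly on the paragraph, is not resolved by this sketch.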

mwilliamson · 2017-03-07T23:03:04Z

Unfortunately not. If there's a sensible way to implement this, then pull requests are welcome.

dividor · 2017-04-22T01:02:11Z

Just to note, I resolved this with some really hairy parsing using Beautiful Soup, so I don't need a Mammoth fix at this time.
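dividor's fix used Beautiful Soup; the same kind of post-processing can be sketched with the standard library's ElementTree instead. This is an assumption-heavy sketch, not the code from the thread: it treats every top-level ol in Mammoth's HTML output as part of one numbered sequence and gives each later list a start attribute that continues the count. The function name continue_ordered_lists is hypothetical, and the HTML must parse as XML once wrapped in a dummy root element:

```python
import xml.etree.ElementTree as ET


def continue_ordered_lists(html: str) -> str:
    """Make each top-level <ol> after the first carry on the numbering.

    Assumes the fragment parses as XML once wrapped in a dummy root, and
    that every top-level ordered list belongs to one numbered sequence.
    """
    root = ET.fromstring("<root>" + html + "</root>")
    next_number = 1
    for child in root:
        if child.tag != "ol":
            continue
        if next_number > 1:
            child.set("start", str(next_number))
        # Count only direct <li> children; nested lists number themselves.
        next_number += sum(1 for item in child if item.tag == "li")
    # Serialise the children (with their tails) without the dummy root.
    # method="html" keeps void elements such as <br> valid HTML.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode", method="html") for child in root
    )
```

Documents with several independent numbered lists would need a smarter rule, for instance matching lists by their Word numId, which the HTML output does not carry.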

dividor · 2017-04-26T21:07:37Z

In case it's useful, attached is a test document to illustrate.

Continuing_lists.docx

dividor · 2017-07-29T19:10:18Z

Just to note - we're living without this feature just fine. Feel free to close.

zt50tz · 2019-02-19T20:19:31Z

I have the same need.

I tried adding a num_id parameter to the _NumberingLevel class and setting it in read_numbering_xml_element. Here is the test code:

def read_numbering_xml_element(element):
    abstract_nums = _read_abstract_nums(element)
    nums = _read_nums(element, abstract_nums)
    not_abstract_num_ids = set(nums) - set(abstract_nums)
    for not_abstract_num_id in not_abstract_num_ids:
        for level in nums[not_abstract_num_id]:
            nums[not_abstract_num_id][level].num_id = not_abstract_num_id
    return Numbering(nums)

So, in transform_document I can get num_id from the element's numbering attribute and try to do something with it.

def transform_items(element):
    if isinstance(element, documents.Paragraph):
        if element.numbering and element.numbering.num_id:
            print(element)

Paragraph(...,
  style_id=u'aa',
  style_name=u'List Paragraph',
  numbering=_NumberingLevel(level_index='0', is_ordered=True, num_id=u'41'),
  alignment=None,..
)

But if this line executes:

nums[not_abstract_num_id][level].num_id = not_abstract_num_id

The HTML writer outputs a p tag instead of an ol.

I think this addition breaks the default style map. What can I do about it? Or is this the wrong approach entirely, and should I be looking at something else?

Thanks.
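Once each numbered paragraph exposes a num_id (as in the experiment above), the displayed number can be recomputed by keeping one counter per (num_id, level) pair. A minimal sketch, assuming every level starts at 1, ignoring start overrides, and assuming an item at a shallower level restarts the deeper levels of the same list. number_list_items is a hypothetical helper, not Mammoth API, and it takes the level as an int (convert strings such as '0' with int()):

```python
from typing import Dict, Iterable, List, Optional, Tuple


def number_list_items(
    items: Iterable[Tuple[Optional[str], int]],
) -> List[Optional[int]]:
    """Compute the displayed number for each paragraph.

    Each item is (num_id, level_index); num_id is None for paragraphs
    that are not list items. Paragraphs sharing a num_id continue one
    count, whatever appears between them.
    """
    counters: Dict[Tuple[str, int], int] = {}
    numbers: List[Optional[int]] = []
    for num_id, level in items:
        if num_id is None:
            numbers.append(None)
            continue
        key = (num_id, level)
        counters[key] = counters.get(key, 0) + 1
        # Restart any deeper levels of the same list.
        for other in list(counters):
            if other[0] == num_id and other[1] > level:
                del counters[other]
        numbers.append(counters[key])
    return numbers
```

The counters depend only on num_id and level, so content between two items of the same list (a bulleted list, a table) does not reset them, which is the behaviour the original question asks for.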

TychonautVII · 2020-05-29T15:44:43Z

I'd be interested in this feature too! Getting the numbers of lists from Word (and, in my case, footnotes and references) seems to fit what I understand to be Mammoth's philosophy of getting the content from Word, but not necessarily the style: the specific number of a footnote or list item seems to be a content thing!

I tried to do this with the transform api but I don't think I can.

Is there any lower-level way to access the content of the Word document in Mammoth? I'd like to figure out what number each list started at.
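For finding out what number a list starts at, one lower-level route outside Mammoth is to read word/numbering.xml from the .docx archive directly: each w:abstractNum/w:lvl may declare a w:start value, and a concrete list (w:num) can override it with w:lvlOverride/w:startOverride. A minimal standard-library sketch with hypothetical helper names; levels without an explicit w:start are simply skipped, and the zip helper assumes the document actually contains numbering.xml (documents without lists may not):

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Tuple

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_start_numbers(numbering_xml: bytes) -> Dict[Tuple[str, str], int]:
    """Map (numId, ilvl) to the start value declared for that list level."""
    root = ET.fromstring(numbering_xml)
    # Start values declared on each abstract list definition.
    abstract_starts: Dict[str, Dict[str, int]] = {}
    for abstract in root.findall(W + "abstractNum"):
        levels = {}
        for lvl in abstract.findall(W + "lvl"):
            start = lvl.find(W + "start")
            if start is not None:  # levels without <w:start> are skipped
                levels[lvl.get(W + "ilvl")] = int(start.get(W + "val"))
        abstract_starts[abstract.get(W + "abstractNumId")] = levels

    result: Dict[Tuple[str, str], int] = {}
    for num in root.findall(W + "num"):
        num_id = num.get(W + "numId")
        abstract_ref = num.find(W + "abstractNumId")
        if abstract_ref is None:
            continue
        inherited = abstract_starts.get(abstract_ref.get(W + "val"), {})
        for ilvl, start in inherited.items():
            result[(num_id, ilvl)] = start
        # A <w:startOverride> on this concrete list replaces the inherited value.
        for override in num.findall(W + "lvlOverride"):
            start_override = override.find(W + "startOverride")
            if start_override is not None:
                result[(num_id, override.get(W + "ilvl"))] = int(
                    start_override.get(W + "val")
                )
    return result


def read_start_numbers_from_docx(path: str) -> Dict[Tuple[str, str], int]:
    # A .docx is a zip archive; list definitions live in word/numbering.xml.
    with zipfile.ZipFile(path) as archive:
        return read_start_numbers(archive.read("word/numbering.xml"))
```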

mwilliamson added the enhancement label Apr 26, 2017
