Add to python-mammoth a capability to output Tracked Changes from Word docx into HTML #152
Comments
Having insertions and deletions controlled by the style map probably makes the most sense. I probably won't have time to work on this any time soon, but a minimal example document and corresponding expected HTML would be helpful.
Hi again, and thanks for your attention to my request. If you think this feature would be valuable for the library, I can work on it and submit a PR. Please find attached a document and the expected result. I know, at a high level, what changes need to be made:
expected HTML output
Thanks for the offer, but I'm afraid I'm not currently accepting pull requests.
I would like to propose the following feature (needed for one of my work projects):
I need the ability to convert a document that has tracked changes turned on into HTML, so that all insertions are wrapped in an <ins> tag and all deletions in a <del> tag.
For example:
Word:
This is the house that JohnJack built (where "John" is a tracked deletion and "Jack" is a tracked insertion)
HTML:
<p>This is the house that <del>John</del><ins>Jack</ins> built</p>
It should be an optional feature that the client can control through an additional parameter of the convert_to_html() function or by using a specific style map (just as python-mammoth can currently show or hide comments based on the style map).
Implementation details:
In the OpenXML format, tracked changes are represented by <w:ins> and <w:del> elements.
The current version of mammoth ignores the <w:del> tag, and for the <w:ins> tag it takes all child nodes.
I propose introducing Insertion and Deletion elements in the Document model to handle the data of these nodes.
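To make the markup concrete, here is a minimal sketch of how a paragraph with tracked changes could be turned into HTML. It uses only the Python standard library and is not mammoth's implementation. The sample XML is a hand-written paragraph modelled on the example above (the w:id, w:author and w:date values are made up), and the names run_text, paragraph_to_html and track_changes are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Hand-written sample paragraph (assumption): attribute values are illustrative.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">This is the house that </w:t></w:r>
  <w:del w:id="1" w:author="Author" w:date="2024-01-01T00:00:00Z">
    <w:r><w:delText>John</w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="Author" w:date="2024-01-01T00:00:00Z">
    <w:r><w:t>Jack</w:t></w:r>
  </w:ins>
  <w:r><w:t xml:space="preserve"> built</w:t></w:r>
</w:p>
"""


def run_text(run):
    # Normal and inserted text lives in w:t; deleted text lives in w:delText.
    return "".join(
        node.text or "" for node in run if node.tag in (W + "t", W + "delText")
    )


def paragraph_to_html(paragraph, track_changes=True):
    # Hypothetical helper: track_changes=False mimics the behaviour described
    # above (drop w:del, keep the children of w:ins without any wrapper).
    parts = []
    for child in paragraph:
        if child.tag == W + "r":
            parts.append(escape(run_text(child)))
        elif child.tag in (W + "ins", W + "del"):
            text = escape("".join(run_text(r) for r in child.iter(W + "r")))
            is_insertion = child.tag == W + "ins"
            if track_changes:
                html_tag = "ins" if is_insertion else "del"
                parts.append(f"<{html_tag}>{text}</{html_tag}>")
            elif is_insertion:
                parts.append(text)
    return "<p>" + "".join(parts) + "</p>"


print(paragraph_to_html(ET.fromstring(PARAGRAPH_XML)))
# <p>This is the house that <del>John</del><ins>Jack</ins> built</p>
```

Calling paragraph_to_html with track_changes=False gives the plain accepted-changes text, which is why an opt-in switch would keep existing output unchanged.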
P.S. In fact, I have this implemented in my local repo, and if this feature looks interesting, I can make a pull request.
However, I would leave it to the author of the library to define what the public interface for this option will look like. Would it be a parameter of convert_to_html,
mammoth.convert_to_html(fileobj=fileobj, ignore_tracked_changes=True)
or would it be a specific style in style_map?
Using style_map looks preferable, since this parameter is also passed from https://github.com/microsoft/markitdown into mammoth, so a change in mammoth would not require a change in markitdown.
Here are some unit tests that I used to verify my implementation