This repository is a Python implementation of a document parser. Given a .docx file, it outputs a CSV file holding all the parsed data, preserving HTML tags and formatting.
- Note - The script is intended solely for extracting exam-related data. It preserves only the hierarchical structure of table data, images, and equation-related special characters.
Requirements: pypandoc, mammoth, BeautifulSoup, PIL (third-party); re, os, csv (standard library).
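
The docx-to-CSV flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `docx_to_html` and `html_to_csv_text`, the `TopLevelBlocks` class, and the two-column CSV layout (`index`, `html`) are all assumptions made for the example. It relies on `mammoth.convert_to_html` for the docx step and, to stay dependency-free otherwise, swaps BeautifulSoup for the standard-library `html.parser` when splitting the HTML into top-level blocks.

```python
import csv
import io
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr"}  # elements that never have a closing tag


class TopLevelBlocks(HTMLParser):
    """Collect each top-level element of an HTML fragment as raw HTML."""

    def __init__(self):
        # Keep entities such as &lt; untouched so the markup round-trips.
        super().__init__(convert_charrefs=False)
        self.blocks = []
        self._buf = []
        self._depth = 0

    def _flush(self):
        self.blocks.append("".join(self._buf))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        self._buf.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self._depth += 1
        elif self._depth == 0:
            self._flush()

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <img src="..." />
        self._buf.append(self.get_starttag_text())
        if self._depth == 0:
            self._flush()

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self._buf.append(f"</{tag}>")
        self._depth -= 1
        if self._depth == 0:
            self._flush()

    def handle_data(self, data):
        if self._depth > 0:  # ignore stray text between blocks
            self._buf.append(data)

    def handle_entityref(self, name):
        if self._depth > 0:
            self._buf.append(f"&{name};")

    def handle_charref(self, name):
        if self._depth > 0:
            self._buf.append(f"&#{name};")


def docx_to_html(path):
    """Convert a .docx file to an HTML string (requires mammoth)."""
    import mammoth  # third-party; imported lazily so the rest runs without it

    with open(path, "rb") as docx_file:
        return mammoth.convert_to_html(docx_file).value


def html_to_csv_text(html):
    """Return CSV text with one row per top-level HTML block, tags preserved."""
    parser = TopLevelBlocks()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["index", "html"])  # hypothetical column layout
    for i, block in enumerate(parser.blocks, start=1):
        writer.writerow([i, block])
    return out.getvalue()
```

A call such as `html_to_csv_text(docx_to_html("exam.docx"))` (with `exam.docx` as a placeholder file name) would then yield CSV text in which a whole table, including its nested rows and cells, stays together in a single field.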