
This Python code parses documents with the .docx extension and converts them into CSV data, preserving tables and equations in HTML format.


Vignesh19y9/Docx-Parser


Docx-Parser

This repository is a Python implementation of a document parser. Given a .docx file, it produces a CSV file holding all the parsed data, with the HTML tags and formatting preserved.

  • Note: The sole purpose of the script is to extract exam-related data, preserving only the hierarchical structure of table data, images, and equation-related special characters.

Requirements: pypandoc, mammoth, BeautifulSoup, re, os, PIL, csv (re, os, and csv are part of the Python standard library).
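
As a rough illustration of how the libraries listed above might fit together, here is a minimal sketch. It is not the repository's actual script. The function names (`docx_to_html`, `html_to_rows`, `rows_to_csv`) and the CSV columns (`index`, `tag`, `html`) are hypothetical, and the sketch assumes one CSV row per top-level HTML element. It leaves out the pypandoc and PIL steps entirely.

```python
import csv


def docx_to_html(docx_path):
    """Convert a .docx file to an HTML string using mammoth."""
    import mammoth  # third-party; imported lazily so the CSV helper works without it

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)
    return result.value  # the generated HTML; result.messages holds any warnings


def html_to_rows(html):
    """Build one row per top-level HTML element, keeping its markup intact."""
    from bs4 import BeautifulSoup, Tag  # third-party (beautifulsoup4)

    soup = BeautifulSoup(html, "html.parser")
    top_level = [node for node in soup.children if isinstance(node, Tag)]
    return [
        {"index": i, "tag": node.name, "html": str(node)}
        for i, node in enumerate(top_level)
    ]


def rows_to_csv(rows, out_file):
    """Write rows to an open text file; the csv module quotes the embedded HTML."""
    writer = csv.DictWriter(out_file, fieldnames=["index", "tag", "html"])
    writer.writeheader()
    writer.writerows(rows)


def convert(docx_path, csv_path):
    """Hypothetical end-to-end helper: .docx -> HTML -> rows -> CSV."""
    rows = html_to_rows(docx_to_html(docx_path))
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as out_file:
        rows_to_csv(rows, out_file)
```

Under these assumptions, calling `convert("exam.docx", "exam.csv")` (both file names are placeholders) would write each paragraph or table as its own row. The `html` column would hold the full markup, so a `<table>` stays whole rather than being flattened to text.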
