Docx-Parser

This repository is a Python implementation of a document parser. Given a .docx file, it outputs a CSV file holding all the parsed data, preserving the HTML tags and formatting.
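The docx-to-CSV flow described above can be sketched roughly as follows. This is an illustration, not the code in Docx_parser.py: the function names, the `TopLevelBlocks` class, and the two-column CSV layout (`index`, `html`) are assumptions. Only `mammoth.convert_to_html` is a real library call. The splitting step uses the standard library so the sketch runs without extra installs.

```python
import csv
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TopLevelBlocks(HTMLParser):
    """Collect each top-level element of an HTML fragment with its tags intact."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.blocks = []   # finished top-level blocks as raw HTML strings
        self._parts = []   # pieces of the block currently being read
        self._depth = 0    # number of non-void tags currently open

    def _flush_if_closed(self):
        if self._depth == 0 and self._parts:
            self.blocks.append("".join(self._parts))
            self._parts = []

    def handle_starttag(self, tag, attrs):
        self._parts.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            self._flush_if_closed()
        else:
            self._depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />
        self._parts.append(self.get_starttag_text())
        self._flush_if_closed()

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return  # ignore stray or meaningless closing tags
        self._parts.append(f"</{tag}>")
        self._depth -= 1
        self._flush_if_closed()

    def handle_data(self, data):
        if self._depth > 0:  # drop whitespace between top-level blocks
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._depth > 0:
            self._parts.append(f"&{name};")  # keep entities such as &lt; as written

    def handle_charref(self, name):
        if self._depth > 0:
            self._parts.append(f"&#{name};")


def docx_to_html(docx_path):
    """Convert a .docx file to an HTML string (requires the mammoth package)."""
    import mammoth  # third-party; imported lazily so the rest runs without it
    with open(docx_path, "rb") as docx_file:
        return mammoth.convert_to_html(docx_file).value


def html_to_csv(html, csv_path):
    """Write one CSV row per top-level HTML block, keeping the markup."""
    parser = TopLevelBlocks()
    parser.feed(html)
    parser.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["index", "html"])  # assumed column layout
        for i, block in enumerate(parser.blocks, start=1):
            writer.writerow([i, block])
    return parser.blocks
```

With mammoth installed, a call such as `html_to_csv(docx_to_html("input/sample.docx"), "output/sample.csv")` would chain the two steps; both paths here are placeholders.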

Note: The sole purpose of the script is to extract exam-related data and to preserve the hierarchical structure of table data, images, and equation-related special characters.
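One possible way to keep images alongside the parsed HTML is sketched below. It assumes the HTML comes from mammoth, which by default embeds images inline as base64 data URIs. The function name `externalize_images`, the `image_N` file naming, and the output directory are hypothetical. The script itself may handle images differently, for example with PIL.

```python
import base64
import os
import re

# Matches src="data:image/<subtype>;base64,<payload>" inside an <img> tag.
DATA_URI = re.compile(
    r'src="data:image/(?P<ext>[A-Za-z0-9.+-]+);base64,(?P<data>[^"]+)"'
)


def externalize_images(html, out_dir):
    """Save inline base64 images to out_dir and point each src at the saved file.

    Returns the rewritten HTML and the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []

    def _save(match):
        # The MIME subtype is used verbatim as the file extension (assumption).
        name = f"image_{len(saved) + 1}.{match.group('ext').lower()}"
        path = os.path.join(out_dir, name)
        with open(path, "wb") as image_file:
            image_file.write(base64.b64decode(match.group("data")))
        saved.append(path)
        return f'src="{name}"'

    return DATA_URI.sub(_save, html), saved
```

Storing file names instead of base64 payloads keeps the CSV cells small, at the cost of having to ship the image folder together with the CSV.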

Requirements: pypandoc, mammoth, BeautifulSoup, re, os, PIL, csv
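Of the modules listed, re, os, and csv ship with Python, while the others are third-party. The sketch below checks which third-party packages are missing. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names. The pip package names are the usual ones for these imports (`bs4` is provided by beautifulsoup4 and `PIL` by Pillow).

```python
from importlib.util import find_spec

# Import name -> pip package name (None means standard library).
REQUIRED = {
    "pypandoc": "pypandoc",
    "mammoth": "mammoth",
    "bs4": "beautifulsoup4",
    "PIL": "Pillow",
    "re": None,
    "os": None,
    "csv": None,
}


def missing_packages():
    """Return the pip names of third-party requirements that cannot be imported."""
    return [
        pip_name
        for module, pip_name in REQUIRED.items()
        if pip_name is not None and find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All third-party requirements are importable.")
```

Note that pypandoc is a wrapper and also needs a pandoc executable available on the system.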

Vignesh19y9/Docx-Parser
