PDF2MD

A containerized application that converts PDF files to Markdown.

Commercial usage

Marker - Submodule

Due to the licensing of the underlying models, such as LayoutLMv3 and Nougat, this is only suitable for noncommercial usage (quoted from the [marker repo](https://github.com/VikParuchuri/marker)).

  • LayoutLMv3: CC BY-NC-SA 4.0. Source
  • PyMuPDF: GPL. Source

Other dependencies/datasets are openly licensed (DocLayNet, ByT5), or used in a way that is compatible with commercial usage (Ghostscript).

Acknowledgments

This work would not have been possible without [email protected] and amazing open source models and datasets, including (but not limited to):

  • Nougat from Meta
  • LayoutLMv3 from Microsoft
  • DocLayNet from IBM
  • ByT5 from Google

Thank you to the authors of these models and datasets for making them available to the community!