

HawkClaws/Document-Intelligence-with-LLM


Document-Intelligence-with-LLM

The README is available in Japanese and English.

This library transforms text from various sources (PDFs, text files, strings, etc.) into structured Markdown. Specifically, it automatically generates a table of contents (TOC) from the input text, then extracts and formats the body text according to that TOC.
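The TOC-first flow described above can be illustrated with a minimal sketch. It is not the library's own code: `toy_generate_toc` and `to_markdown` are hypothetical names, and the TOC step, which the library delegates to an LLM, is replaced here by a simple regex that treats lines such as `1. Title` as headings.

```python
import re
from typing import Callable, Dict, List


def toy_generate_toc(text: str) -> List[str]:
    """Hypothetical stand-in for the LLM step: lines like '1. Title' become TOC entries."""
    return [
        line.strip()
        for line in text.splitlines()
        if re.match(r"^\d+\.\s+\S", line.strip())
    ]


def to_markdown(text: str, generate_toc: Callable[[str], List[str]]) -> str:
    """Build a TOC, then collect the body text that follows each TOC entry."""
    toc = generate_toc(text)
    sections: Dict[str, List[str]] = {title: [] for title in toc}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in sections:
            current = stripped  # a new section starts at this heading
        elif current is not None and stripped:
            sections[current].append(stripped)  # body text belongs to the open section

    # The numbered titles already form a Markdown ordered list when emitted as-is.
    parts = ["## Table of Contents", ""] + toc
    for title in toc:
        parts += ["", f"## {title}", "", " ".join(sections[title])]
    return "\n".join(parts)


sample = """1. Introduction
This tool converts text.
It targets PDFs.
2. Usage
Run the script."""

markdown = to_markdown(sample, toy_generate_toc)
print(markdown)
```

Passing the TOC generator in as a callable keeps the extraction logic independent of any particular model, so a real LLM-backed function could be swapped in without changing `to_markdown`.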

About

This Python script converts PDF documents into well-formatted Markdown files using large language models (LLMs) such as Google Gemini or OpenAI's GPT models through the LangChain framework. It extracts the text from each PDF, splits it into manageable chunks, and then uses the LLM to generate structured, readable Markdown.
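As a rough illustration of the split-then-convert steps, the sketch below chunks already-extracted text with a small overlap and sends each chunk through a model callable. It assumes the PDF text is available as a plain string; `split_into_chunks`, `convert`, and `stub_llm` are hypothetical names, and the stub takes the place of the LangChain-backed model call that the script itself uses.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def convert(text: str, llm: Callable[[str], str],
            chunk_size: int = 1000, overlap: int = 100) -> str:
    """Run each chunk through the model callable and join the results."""
    pieces = [llm(chunk) for chunk in split_into_chunks(text, chunk_size, overlap)]
    return "\n\n".join(pieces)


def stub_llm(chunk: str) -> str:
    """Hypothetical placeholder for a real model call; it only quotes the chunk."""
    return f"> {chunk}"


chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
result = convert("abcdefghij", stub_llm, chunk_size=4, overlap=1)
print(chunks)
print(result)
```

The overlap means neighbouring chunks repeat a few characters, which gives the model some context across chunk boundaries but also means a real pipeline has to tolerate or deduplicate the repeated text.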
