Jina AI has introduced Reader-LM, a family of small language models inspired by Jina Reader and designed to convert raw, noisy HTML from the open web into clean Markdown. According to Jina AI, the Reader-LM models outperform larger LLMs on this specific task while offering a cost-effective, multilingual solution. This project demonstrates how to serve a Reader-LM model with LitServe, an easy-to-use, flexible serving engine for AI models built on FastAPI, to convert HTML content to Markdown.
The project is structured as follows:
- server.py: The file containing the main code for the web server.
- client.py: The file containing the code for client-side requests.
- LICENSE: The license file for the project.
- README.md: The README file that contains information about the project.
- assets: The folder containing screenshots of the working application.
- .gitignore: The file containing the list of files and directories to be ignored by Git.
- Python (for the programming language)
- PyTorch (for the deep learning framework)
- Hugging Face Transformers Library (for the model)
- LitServe (for the serving engine)
To get started with this project, follow the steps below:
- Run the server:
python server.py
- Once the server starts successfully, the console shows uvicorn running on port 8000.
- Open a new terminal window.
- Run the client:
python client.py
The client then prints the model output: the Markdown that Reader-LM generated from the HTML content it received.
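For readers who want to send their own HTML, here is a minimal client sketch using only the Python standard library. It assumes the server listens on LitServe's default `/predict` route on port 8000; the `html_content` request key and the helper names `build_request` and `convert` are hypothetical and may differ from what the project's client.py actually uses.

```python
import json
import urllib.request

# LitServe's default route, on the port mentioned in the steps above.
SERVER_URL = "http://127.0.0.1:8000/predict"


def build_request(html, url=SERVER_URL):
    """Build a JSON POST request carrying the HTML to convert."""
    # Assumption: the server reads the HTML from an "html_content" field.
    payload = json.dumps({"html_content": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def convert(html, url=SERVER_URL, timeout=120):
    """Send the HTML to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(html, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = "<html><body><h1>Hello</h1><p>World</p></body></html>"
    print(convert(sample))
```

The generous timeout is deliberate, since generation on CPU can take a while for long pages.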
The project serves the Reader-LM model with LitServe to convert HTML content to Markdown. This suggests potential applications in web scraping, content repurposing, and accessibility improvements.
Contributions are welcome! If you would like to contribute to this project, please raise an issue to discuss the changes you want to make. Once the changes are approved, you can create a pull request.
This project is licensed under the Apache-2.0 License.
If you have any questions or suggestions about the project, please contact me through my GitHub profile.
Happy coding! 🚀