Team : AI Avengers
Problem Statement 3
Video Link : https://drive.google.com/drive/folders/1t8od7b7FEhIDgUu-O-yqm83ofaAfWAw6?usp=sharing
G2 regularly updates its website with new products by creating new categories and refining existing ones. One crucial aspect of this process is ensuring that each product has a precise description and URL before it is added to the site. We are interested in automating the process of updating product descriptions in our database. We will provide you with a few product URLs, and your output will be a brief 3-4 line description of each product.
The URL Scraper and Summarizer is a tool that allows users to scrape text data from web pages based on the URLs provided and generate concise summaries of the products mentioned on those web pages. The scraped data is processed, cleaned, and sent as input to a Large Language Model (LLM), which generates summaries for each product mentioned in the entered URLs.
- Scrapes text data from web pages based on user-provided URLs.
- Cleans and processes the scraped data by removing whitespace and other noise.
- Utilizes a Large Language Model (LLM) to generate concise summaries for each product mentioned in the entered URLs.
- Supports various natural language processing tasks, including text summarization and information retrieval.
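The scrape, clean, and prompt steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in G2PS3.ipynb: the function names (`fetch_html`, `html_to_text`, `clean_text`, `build_prompt`), the prompt wording, and the `max_chars` limit are all assumptions made for the example.

```python
import re
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def fetch_html(url, timeout=10):
    """Download a page and decode it as UTF-8 (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def html_to_text(html):
    """Return the text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def clean_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_prompt(page_text, max_chars=4000):
    """Assumed prompt wording; the 3-4 line target is from the problem statement."""
    return (
        "Summarize the product described below in 3-4 lines.\n\n"
        + page_text[:max_chars]
    )
```

The string returned by `build_prompt` would then be passed to whichever LLM the notebook loads; that call is left out here because the model and its API are not specified in this document.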
To install the URL Scraper and Summarizer, follow these steps:
- Clone the repository: git clone https://github.com/yourusername/url-scraper-summarizer.git
- Run G2PS3.ipynb on Google Colab with a T4 GPU runtime.
- Make sure you create a directory named "G2" under "content" (use the folder icon on the left panel).
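As an alternative to the folder panel, the "G2" directory could also be created from a notebook cell. The sketch below assumes Colab's default working area at `/content`; the helper name `ensure_g2_dir` is hypothetical and is not part of the notebook.

```python
from pathlib import Path


def ensure_g2_dir(base="/content", name="G2"):
    """Create <base>/<name> if it is missing and return its path."""
    path = Path(base) / name
    # exist_ok=True makes the call safe to repeat on notebook re-runs.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `ensure_g2_dir()` once before running the remaining cells should be enough.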
To use the URL Scraper and Summarizer:
- Enter the URLs of the web pages you want to scrape in the input field of the Gradio interface.
- Click the "Submit" button to initiate the scraping process.
- Once the scraping is complete, the summaries for each product mentioned in the entered URLs will be displayed.
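The usage flow above could be wired up along these lines. This is only a sketch under stated assumptions: `parse_urls`, `summarize_urls`, and `build_interface` are illustrative names, the summary text is a placeholder for the real scrape-and-summarize step, and the notebook's actual Gradio layout may differ. Gradio is imported inside `build_interface` so the URL handling can be tried without it installed.

```python
import re


def parse_urls(raw):
    """Split the input field contents on whitespace or commas into URLs."""
    parts = re.split(r"[\s,]+", raw.strip())
    # Keep only entries that look like web URLs; drop blanks and stray text.
    return [p for p in parts if p.startswith(("http://", "https://"))]


def summarize_urls(raw):
    """Return one entry per URL; the summary itself is a placeholder here."""
    urls = parse_urls(raw)
    return "\n\n".join(f"{url}: (summary placeholder)" for url in urls)


def build_interface():
    """Build a simple Gradio form with one text input and one text output."""
    import gradio as gr  # requires the gradio package

    return gr.Interface(
        fn=summarize_urls,
        inputs=gr.Textbox(lines=4, label="Product URLs"),
        outputs=gr.Textbox(label="Summaries"),
    )
```

In a notebook cell, `build_interface().launch()` would start the form; `gr.Interface` supplies the "Submit" button by default.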
For questions, feedback, or further assistance, contact us at [email protected], [email protected]