This project is a sophisticated multi-agent system designed to automate the process of gathering, analyzing, and summarizing news articles from various sources. The system employs three specialized agents working in concert to deliver comprehensive news analysis and reporting.
- Fetches articles from configured news sources and RSS feeds
- Handles both web scraping and RSS feed parsing
- Implements intelligent rate limiting and error handling
- Stores raw article data for processing
- Processes raw articles using natural language processing (NLP)
- Leverages OpenAI's GPT-3.5 Turbo for advanced summarization
- Extracts key topics and themes
- Generates concise article summaries
- Performs keyword analysis and categorization
- Generates structured reports from analyzed content
- Maintains article archives
- Separates new and previously processed articles
- Creates organized summaries for easy consumption
nltk==3.8.1
openai==1.3.0
loguru==0.7.2
python-dotenv==1.0.0
requests==2.31.0
beautifulsoup4==4.12.2
feedparser==6.0.10
- Python 3.8+
- SQLite3
- 2GB+ RAM recommended
- Internet connection for API access
-
Clone the repository:
git clone https://github.com/yourusername/your-repo-name.git cd your-repo-name
-
Set up virtual environment:
python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install dependencies:
pip install -r requirements.txt
-
Environment Configuration: Create a
.env
file in the root directory with the following:OPENAI_API_KEY=your_openai_api_key
The system is highly configurable through the utils/config.py
file:
- Configurable list of news sources and RSS feeds
- Support for both web scraping and RSS parsing
- Custom keyword-based RSS feeds
- Adjustable summary length
- Configurable number of keywords per article
- Customizable update intervals
- Flexible retry mechanisms
- Configurable report directory
- Multiple output formats
- Customizable logging levels
-
Start the system:
python main.py
-
Monitor Progress: The system creates three log files:
data_acquisition.log
: Tracks article fetchingcontent_analysis.log
: Monitors processingreporting.log
: Records report generation
-
Acquisition Phase
- Fetches articles from configured sources
- Validates and deduplicates content
- Stores raw data in SQLite database
-
Analysis Phase
- Processes raw content using NLP
- Generates summaries using OpenAI
- Extracts keywords and themes
- Categorizes content
-
Reporting Phase
- Generates structured reports
- Archives processed articles
- Creates searchable indexes
project/
├── agents/
│ ├── __init__.py
│ ├── data_acquisition_agent.py
│ ├── content_analysis_agent.py
│ └── reporting_agent.py
├── utils/
│ ├── __init__.py
│ └── config.py
├── data/
│ └── articles.db
├── reports/
├── .env
├── main.py
└── requirements.txt
The system implements comprehensive logging using Loguru:
- Rotation-based log files
- Configurable log levels
- Detailed error tracking
- Performance metrics
- Robust retry mechanisms for failed requests
- Graceful degradation for API limits
- Comprehensive error logging
- Data validation at each step
- Fork the repository
- Create a feature branch
- Commit changes
- Push to the branch
- Create a Pull Request
MIT License - See LICENSE file for details