fix: improve documentation and configuration - Update circuit breaker defaults, align metric names, improve docs clarity
1 parent fbc68c9, commit aedb65d
Showing 3 changed files with 99 additions and 92 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,109 +1,130 @@ | ||
# Hapax | ||
# Hapax: AI Infrastructure | ||
|
||
## Large Language Model Infrastructure, Simplified | ||
Hapax is a production-ready AI infrastructure layer that ensures uninterrupted AI operations through intelligent provider management and automatic failover. Named after the Greek word ἅπαξ (meaning "once"), it embodies our core promise: configure once, then let it seamlessly manage your AI infrastructure. | ||
|
||
Building with Large Language Models is complex. Multiple providers, varying APIs, inconsistent performance, unpredictable costs—these challenges consume more engineering time than the actual innovation. | ||
## Common AI Infrastructure Challenges | ||
|
||
Hapax offers a different approach. | ||
Organizations face several critical challenges in managing their AI infrastructure. Service disruptions from AI provider outages create direct revenue impacts, while engineering teams dedicate significant resources to managing multiple AI providers. Teams struggle with limited visibility into AI usage across departments, compounded by complex integration requirements spanning different AI providers. | ||
|
||
What if managing LLM infrastructure was as simple as editing a configuration file? What if switching providers, adding endpoints, or implementing fallback strategies could be done with minimal effort? | ||
## Core Capabilities | ||
|
||
Imagine a system that: | ||
- Connects to multiple LLM providers seamlessly | ||
- Provides automatic failover between providers | ||
- Offers comprehensive monitoring and metrics | ||
- Allows instant configuration updates without downtime | ||
Hapax delivers a robust infrastructure layer through three core capabilities: | ||
|
||
This is Hapax. | ||
### Intelligent Provider Management | ||
The system ensures continuous service through real-time health monitoring with configurable timeouts and check intervals. Automatic failover between providers maintains zero downtime, while a sophisticated three-state circuit breaker (closed, half-open, open) with configurable thresholds prevents cascade failures. Request deduplication using the singleflight pattern optimizes resource utilization. | ||
|
||
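The three-state circuit breaker described above can be sketched roughly as follows. This is an illustrative model only, not Hapax's actual implementation: the class and method names are hypothetical, the defaults mirror the `failureThreshold: 5` and `timeout: 10s` values from the sample configuration in this README, and treating `timeout` as the time spent in the open state is an assumption.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal operation, requests flow
    OPEN = "open"            # provider considered down, requests rejected
    HALF_OPEN = "half-open"  # probing whether the provider has recovered


class CircuitBreaker:
    """Hypothetical three-state breaker; names and defaults are assumptions."""

    def __init__(self, failure_threshold=5, timeout=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout = timeout          # seconds to stay open before probing
        self.clock = clock              # injectable clock, useful for tests
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Return True if a request may be sent to the provider."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.timeout:
                # Timeout elapsed: move to half-open and allow probe traffic.
                self.state = State.HALF_OPEN
                return True
            return False
        return True

    def record_success(self):
        """A successful call closes the breaker and resets the counter."""
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        """Open the breaker on a failed probe or once the threshold is hit."""
        self.failures += 1
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before contacting a provider and report the outcome with `record_success()` or `record_failure()`; a rejected request is the signal to fail over to the next provider.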
### Real-World Flexibility in Action | ||
### Production-Ready Architecture | ||
The architecture prioritizes reliability through high-performance request routing and load balancing. Comprehensive error handling and request validation ensure data integrity, while structured logging with request tracing enables detailed debugging. Configurable timeout and rate limiting mechanisms protect system resources. | ||
|
||
Imagine you're running a production service using OpenAI's GPT model. Suddenly, you want to: | ||
- Add a new Anthropic Claude model endpoint | ||
- Create a fallback strategy | ||
- Implement detailed monitoring | ||
### Security & Monitoring | ||
Security is foundational, implemented through API key-based authentication and comprehensive request validation and sanitization. The monitoring system provides granular usage tracking per endpoint and detailed request logging for operational visibility. | ||
|
||
With Hapax, this becomes simple: | ||
## Usage Tracking & Monitoring | ||
|
||
```yaml | ||
# Simply append to your existing configuration | ||
providers: | ||
anthropic: | ||
type: anthropic | ||
models: | ||
claude-3.5-haiku: | ||
api_key: ${ANTHROPIC_API_KEY} | ||
endpoint: /v1/anthropic/haiku | ||
``` | ||
No downtime. No complex redeployment. Just configuration. | ||
## Intelligent Provider Management | ||
Hapax goes beyond simple API routing. It creates a resilient ecosystem for your LLM interactions: | ||
Hapax provides built-in monitoring capabilities through Prometheus integration, offering comprehensive visibility into your AI infrastructure: | ||
|
||
**Automatic Failover**: When one provider experiences issues, Hapax seamlessly switches to backup providers. Your service continues operating without interruption. | ||
**Deduplication**: Prevent duplicate requests and unnecessary API calls. Hapax intelligently manages request caching and prevents redundant processing. | ||
### Request Tracking | ||
Monitor API usage through versioned endpoints: | ||
```bash | ||
# Standard endpoint structure | ||
/v1/completions | ||
/health # Global system health status | ||
/v1/health # Versioned API health status | ||
/metrics | ||
``` | ||
|
||
**Provider Health Monitoring**: Continuously track provider performance. Automatically reconnect to primary providers once they're back online, ensuring optimal resource utilization. | ||
### Prometheus Integration | ||
The monitoring system tracks essential metrics including request counts and status by endpoint, request latencies, active request volume, error rates by provider, and circuit breaker states. Health check performance metrics and request deduplication statistics provide deep insights into system efficiency. | ||
|
||
Each metric is designed for operational visibility: | ||
- `hapax_http_requests_total` tracks request volume by endpoint and status | ||
- `hapax_http_request_duration_seconds` measures request latency | ||
- `hapax_http_active_requests` shows current load by endpoint | ||
- `hapax_errors_total` monitors error rates by type | ||
- `circuit_breaker_state` indicates provider health status | ||
- `hapax_health_check_duration_seconds` validates provider responsiveness | ||
- `hapax_deduplicated_requests_total` confirms request efficiency | ||
- `hapax_rate_limit_hits_total` tracks rate limiting by client | ||
|
||
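As a rough illustration of how these metrics might be consumed, the sketch below parses a scrape of `/metrics` in the Prometheus text format and derives a server-error ratio from `hapax_http_requests_total`. The sample values and the label names `endpoint` and `status` are assumptions made for the example; the parser is deliberately simplified (no escaped label values, and lines carrying timestamps are skipped).

```python
import re
from collections import defaultdict

# Hypothetical scrape output; values and label names are assumptions.
SAMPLE = '''\
# TYPE hapax_http_requests_total counter
hapax_http_requests_total{endpoint="/v1/completions",status="200"} 97
hapax_http_requests_total{endpoint="/v1/completions",status="500"} 3
hapax_http_active_requests{endpoint="/v1/completions"} 2
'''

# metric_name{label="value",...} value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text, metric, label):
    """Sum the samples of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


def error_ratio(text):
    """Fraction of requests whose status label starts with '5'."""
    by_status = sum_by_label(text, "hapax_http_requests_total", "status")
    total = sum(by_status.values())
    errors = sum(v for status, v in by_status.items() if status.startswith("5"))
    return errors / total if total else 0.0
```

In practice a Prometheus server would scrape the endpoint and the same ratio would be computed with a query; the function here only shows how the counters relate to each other.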
### Access Management | ||
Security is enforced through API key-based authentication, with per-endpoint rate limiting and comprehensive request validation and sanitization. | ||
|
||
## Technical Implementation | ||
|
||
Example completion request: | ||
```json | ||
{ | ||
"messages": [ | ||
{"role": "system", "content": "You are a customer service assistant."}, | ||
{"role": "user", "content": "I need help with my order #12345"} | ||
] | ||
} | ||
``` | ||
|
||
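A client could assemble the request above along these lines. This is a minimal sketch using only the Python standard library: the `localhost:8080` base URL assumes the port mapping from the `docker run` example in this README, the helper name is hypothetical, and any authentication header is omitted because its exact form depends on the deployment.

```python
import json
import urllib.request


def build_completion_request(base_url, messages):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request(
    "http://localhost:8080",  # assumed local deployment
    [
        {"role": "system", "content": "You are a customer service assistant."},
        {"role": "user", "content": "I need help with my order #12345"},
    ],
)

# To actually send it against a running instance:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.read().decode("utf-8"))
```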
## Comprehensive Observability | ||
When your primary provider experiences issues, Hapax: | ||
1. Detects the failure through continuous health checks (1-minute intervals) | ||
2. Activates the circuit breaker after 3 consecutive failures | ||
3. Routes traffic to healthy backup providers in preference order | ||
4. Maintains detailed metrics for operational visibility | ||
|
||
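The failover steps above can be modelled with a small tracker that counts consecutive failed health checks and picks the first healthy provider in preference order. The class name and methods are hypothetical, and the threshold of 3 simply follows the list above; Hapax's real logic may differ.

```python
class HealthTracker:
    """Hypothetical tracker of consecutive health-check failures per provider."""

    def __init__(self, providers, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = {name: 0 for name in providers}

    def record_check(self, provider, ok):
        """Record one health-check result; a success resets the counter."""
        if ok:
            self.consecutive_failures[provider] = 0
        else:
            self.consecutive_failures[provider] += 1

    def is_healthy(self, provider):
        """A provider is unhealthy once it reaches the failure threshold."""
        return self.consecutive_failures[provider] < self.failure_threshold

    def pick(self, preference):
        """Return the first healthy provider in preference order, or None."""
        for name in preference:
            if self.is_healthy(name):
                return name
        return None


preference = ["ollama", "anthropic", "openai"]
tracker = HealthTracker(preference)

# Three failed checks in a row take the preferred provider out of rotation.
for _ in range(3):
    tracker.record_check("ollama", ok=False)
fallback = tracker.pick(preference)

# A later successful check restores it.
tracker.record_check("ollama", ok=True)
restored = tracker.pick(preference)
```

Because a single success resets the counter, traffic returns to the preferred provider as soon as it passes a health check again.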
Hapax isn't just a gateway—it's a complete monitoring and alerting system for your LLM infrastructure: | ||
- Detailed Prometheus metrics | ||
- Real-time performance tracking | ||
- Comprehensive error reporting | ||
- Intelligent alerting mechanisms | ||
## Deployment Options | ||
|
||
## API Versioning for Scalability | ||
Deploy Hapax in minutes with our production-ready container: | ||
|
||
Create multiple API versions effortlessly. Each endpoint can have its own configuration, allowing granular control and smooth evolutionary paths for your services. | ||
```bash | ||
docker run -p 8080:8080 \ | ||
-e OPENAI_API_KEY=your_key \ | ||
-e ANTHROPIC_API_KEY=your_key \ | ||
-e CONFIG_PATH=/app/config.yaml \ | ||
teilomillet/hapax:latest | ||
``` | ||
|
||
Default configuration is provided but can be customized via `config.yaml`: | ||
```yaml | ||
routes: | ||
- path: /v1/completions | ||
handler: completion | ||
version: v1 | ||
- path: /v2/completions | ||
handler: advanced_completion | ||
version: v2 | ||
circuitBreaker: | ||
maxRequests: 100 | ||
interval: 30s | ||
timeout: 10s | ||
failureThreshold: 5 | ||
|
||
providerPreference: | ||
- ollama | ||
- anthropic | ||
- openai | ||
``` | ||
## Getting Started | ||
## Integration Architecture | ||
```bash | ||
# Pull and start Hapax with default configuration | ||
docker run -p 8080:8080 -e ANTHROPIC_API_KEY=your_api_key teilomillet/hapax:latest | ||
Hapax provides comprehensive integration capabilities through multiple components: | ||
# Or, to use a custom configuration with environment variables: | ||
# 1. Extract the default configuration | ||
docker run --rm teilomillet/hapax:latest cat /app/config.yaml > config.yaml | ||
### REST API with Versioned Endpoints | ||
The API architecture provides dedicated endpoints for core functionalities: | ||
- `/v1/completions` handles AI completions | ||
- `/v1/health` provides versioned API health monitoring | ||
- `/health` offers global system health status | ||
- `/metrics` exposes Prometheus metrics for comprehensive monitoring | ||
|
||
# 2. Create a .env file to store your environment variables | ||
echo "ANTHROPIC_API_KEY=your_api_key" > .env | ||
### Comprehensive Monitoring | ||
The monitoring infrastructure integrates Prometheus metrics across all critical components, enabling detailed tracking of request latencies, circuit breaker states, provider health status, and request deduplication. This comprehensive approach ensures complete operational visibility. | ||
|
||
# 3. Start Hapax with your configuration and environment variables | ||
docker run -p 8080:8080 \ | ||
-v $(pwd)/config.yaml:/app/config.yaml \ | ||
--env-file .env \ | ||
teilomillet/hapax:latest | ||
``` | ||
### Health Checks | ||
The health monitoring system operates with enterprise-grade configurability. Check intervals default to one minute with adjustable timeouts, while failure thresholds are tuned to prevent false positives. Health monitoring extends from individual providers to Docker container status, with granular per-provider health tracking. | ||
|
||
### Production Safeguards | ||
System integrity is maintained through multiple safeguards: request deduplication prevents redundant processing, automatic failover ensures continuous operation, circuit breaker patterns protect against cascade failures, and structured JSON logging with correlation IDs enables thorough debugging. | ||
|
||
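Request deduplication in the singleflight style means that concurrent callers asking for the same key share one in-flight call instead of each triggering their own. The pattern is best known from Go's `golang.org/x/sync/singleflight` package; the Python sketch below is an illustrative analogue with hypothetical names, not a port of Hapax's code.

```python
import threading


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.dups = 0  # number of callers that joined this in-flight call


class SingleFlight:
    """Run fn once per key at a time; concurrent duplicates wait for the result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
            else:
                call.dups += 1

        if not leader:
            # Another caller is already doing the work: wait and reuse its outcome.
            call.done.wait()
            if call.error is not None:
                raise call.error
            return call.result

        try:
            call.result = fn()
            return call.result
        except Exception as exc:
            call.error = exc
            raise
        finally:
            # Forget the key first so later callers start a fresh call,
            # then wake everyone who was waiting on this one.
            with self._lock:
                del self._calls[key]
            call.done.set()
```

Results are shared only while a call is in flight; once it completes, the next request for the same key runs again, so this is deduplication rather than caching.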
## What's Next | ||
## Technical Requirements | ||
|
||
Hapax is continuously evolving. | ||
Running Hapax requires a Docker-compatible environment with network access to AI providers. The system operates efficiently with 1 GB of RAM, though 4 GB is recommended for production deployments. Access credentials (API keys) are required for each supported provider you use, such as OpenAI and Anthropic. | ||
|
||
## Open Source | ||
## Documentation | ||
|
||
Licensed under Apache 2.0, Hapax is open for collaboration and customization. | ||
Comprehensive documentation is available through multiple resources. The [Quick Start Guide](https://github.com/teilomillet/hapax/wiki) provides initial setup instructions, while detailed information about the API and security measures can be found in the [API Documentation](docs/api.md) and [Security Overview](docs/security.md). For operational insights, consult the [Monitoring Guide](docs/monitoring.md). | ||
|
||
## Community & Support | ||
## License | ||
|
||
- **Discussions**: [GitHub Discussions](https://github.com/teilomillet/hapax/discussions) | ||
- **Documentation**: [Hapax Wiki](https://github.com/teilomillet/hapax/wiki) | ||
- **Issues**: [GitHub Issues](https://github.com/teilomillet/hapax/issues) | ||
Licensed under Apache 2.0. See [LICENSE](LICENSE) for details. | ||
|
||
## Our Vision | ||
--- | ||
|
||
We believe LLM infrastructure should be simple, reliable, and adaptable. Hapax represents our commitment to making LLM integration accessible and powerful. | ||
For detailed technical specifications, visit our [Technical Documentation](docs/technical.md). |