fix: resolve some of the errors spotted by the lint.
1 parent de10a5c, commit 45377ad. Showing 11 changed files with 208 additions and 292 deletions.
# Hapax

A lightweight HTTP server for Large Language Model (LLM) interactions, built with Go.

## Version

v0.0.16

## Features

- HTTP server with completion endpoint (`/v1/completions`)
- Health check endpoint (`/health`)
- Configurable server settings (port, timeouts, etc.)
- Clean shutdown handling
- Comprehensive test suite with mock LLM implementation
- Token validation with tiktoken
- Automatic token counting
- Context length validation
- Max tokens validation
- Middleware architecture:
  - Request ID tracking
  - Request timing metrics
  - Panic recovery
  - CORS support
  - API key authentication
  - Rate limiting (token bucket)
  - Prometheus metrics collection
- Enhanced error handling:
  - Structured JSON error responses
  - Request ID tracking in errors
  - Zap-based logging with context
  - Custom error types for different scenarios
  - Seamless error middleware integration
- Dynamic routing:
  - Version-based routing (v1, v2)
  - Route-specific middleware
  - Health check endpoints
  - Header validation
- Provider management:
  - Multiple provider support (OpenAI, Anthropic, etc.)
  - Provider health monitoring
  - Automatic failover to backup providers
  - Configurable health check intervals
  - Provider-specific configuration

## Large Language Model Infrastructure, Simplified

Building with Large Language Models is complex. Multiple providers, varying APIs, inconsistent performance, unpredictable costs. These challenges consume more engineering time than the actual innovation.

Hapax offers a different approach.

What if managing LLM infrastructure was as simple as editing a configuration file? What if switching providers, adding endpoints, or implementing fallback strategies could be done with minimal effort?

Imagine a system that:

- Connects to multiple LLM providers seamlessly
- Provides automatic failover between providers
- Offers comprehensive monitoring and metrics
- Allows instant configuration updates without downtime

## Installation

```bash
go get github.com/teilomillet/hapax
```

## Configuration

Hapax uses YAML for configuration. Here's an example configuration file:

```yaml
server:
  port: 8080
  read_timeout: 30s
  write_timeout: 30s
  max_header_bytes: 1048576  # 1MB
  shutdown_timeout: 30s

routes:
  - path: "/completions"
    handler: "completion"
    version: "v1"
    methods: ["POST"]
    middleware: ["auth", "ratelimit"]
    headers:
      Content-Type: "application/json"
    health_check:
      enabled: true
      interval: 30s
      timeout: 5s
      threshold: 3
      checks:
        api: "http"

  - path: "/health"
    handler: "health"
    version: "v1"
    methods: ["GET"]
    health_check:
      enabled: true
      interval: 15s
      timeout: 2s
      threshold: 2
      checks:
        system: "tcp"

llm:
  provider: ollama
  model: llama2
  endpoint: http://localhost:11434
  system_prompt: "You are a helpful assistant."
  max_context_tokens: 4096  # Maximum context length for your model
  options:
    temperature: 0.7
    max_tokens: 2000

logging:
  level: info   # debug, info, warn, error
  format: json  # json, text
```

### Configuration Options

#### Server Configuration

- `port`: HTTP server port (default: 8080)
- `read_timeout`: Maximum duration for reading request body (default: 30s)
- `write_timeout`: Maximum duration for writing response (default: 30s)
- `max_header_bytes`: Maximum size of request headers (default: 1MB)
- `shutdown_timeout`: Maximum duration to wait for graceful shutdown (default: 30s)

#### LLM Configuration

- `provider`: LLM provider name (e.g., "ollama", "openai")
- `model`: Model name (e.g., "llama2", "gpt-4")
- `endpoint`: API endpoint URL
- `system_prompt`: Default system prompt for conversations
- `max_context_tokens`: Maximum context length in tokens (model-dependent)
- `options`: Provider-specific options
  - `temperature`: Sampling temperature (0.0 to 1.0)
  - `max_tokens`: Maximum tokens to generate

#### Logging Configuration

- `level`: Log level (debug, info, warn, error)
- `format`: Log format (json, text)

## Quick Start

```go
package main

import (
    "context"
    "log"

    "github.com/teilomillet/gollm"
    "github.com/teilomillet/hapax"
    "go.uber.org/zap"
)

func main() {
    // Initialize logger (optional, defaults to production config)
    logger, _ := zap.NewProduction()
    defer logger.Sync()
    hapax.SetLogger(logger)

    // Create an LLM instance (using gollm)
    llm := gollm.New()

    // Create a completion handler
    handler := hapax.NewCompletionHandler(llm)

    // Create a router
    router := hapax.NewRouter(handler)

    // Use default configuration
    config := hapax.DefaultConfig()

    // Create and start server
    server := hapax.NewServer(config, router)
    if err := server.Start(context.Background()); err != nil {
        log.Fatal(err)
    }
}
```
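
Save the program as `main.go` (the file name is just for illustration) and run it; the server listens on the configured port, 8080 by default:

```bash
go run main.go
```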

### Real-World Flexibility in Action

Imagine you're running a production service using OpenAI's GPT model. Suddenly, you want to:

- Add a new Anthropic Claude model endpoint
- Create a fallback strategy
- Implement detailed monitoring

With Hapax, this becomes simple:

```yaml
# Simply append to your existing configuration
providers:
  anthropic:
    type: anthropic
    models:
      claude-3.5-haiku:
        api_key: ${ANTHROPIC_API_KEY}
        endpoint: /v1/anthropic/haiku
```

No downtime. No complex redeployment. Just configuration. This is Hapax.

## API Endpoints

### POST /v1/completions

Generate completions using the configured LLM.

**Request:**

```json
{
  "prompt": "Your prompt here"
}
```

**Response:**

```json
{
  "completion": "LLM generated response"
}
```

**Error Responses:**

- 400 Bad Request: Invalid JSON or missing prompt
- 405 Method Not Allowed: Wrong HTTP method
- 500 Internal Server Error: LLM error

### GET /health

Check server health status.

**Response:**

```json
{
  "status": "ok"
}
```
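
For example, with a local instance running on the default port (and no authentication middleware enabled), the endpoints above can be exercised with curl:

```bash
# Request a completion
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Your prompt here"}'

# Check server health
curl http://localhost:8080/health
```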

## Intelligent Provider Management

Hapax goes beyond simple API routing. It creates a resilient ecosystem for your LLM interactions:

**Automatic Failover**: When one provider experiences issues, Hapax seamlessly switches to backup providers. Your service continues operating without interruption.

**Deduplication**: Prevent duplicate requests and unnecessary API calls. Hapax intelligently manages request caching and prevents redundant processing.

**Provider Health Monitoring**: Continuously track provider performance. Automatically reconnect to primary providers once they're back online, ensuring optimal resource utilization.

## Error Handling

Hapax provides structured error handling with JSON responses:

```json
{
  "type": "validation_error",
  "message": "Invalid request format",
  "request_id": "req_123abc",
  "details": {
    "field": "prompt",
    "error": "required"
  }
}
```

Error types include:

- `validation_error`: Request validation failures
- `provider_error`: LLM provider issues
- `rate_limit_error`: Rate limiting
- `internal_error`: Unexpected server errors
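
As a sketch of how a client might consume these errors, the snippet below mirrors the JSON envelope shown above; the `APIError` struct, its field types, and the idea of switching on `type` are illustrative rather than part of the Hapax API:

```go
package client

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// APIError mirrors the structured error response documented above.
type APIError struct {
    Type      string                 `json:"type"`
    Message   string                 `json:"message"`
    RequestID string                 `json:"request_id"`
    Details   map[string]interface{} `json:"details"`
}

// handleResponse decodes the error envelope from non-2xx responses
// and reacts based on the documented error types.
func handleResponse(resp *http.Response) error {
    if resp.StatusCode < 400 {
        return nil
    }
    defer resp.Body.Close()

    var apiErr APIError
    if err := json.NewDecoder(resp.Body).Decode(&apiErr); err != nil {
        return fmt.Errorf("decoding error response: %w", err)
    }

    switch apiErr.Type {
    case "rate_limit_error":
        return fmt.Errorf("rate limited (request %s): retry later", apiErr.RequestID)
    case "validation_error":
        return fmt.Errorf("invalid request (request %s): %s %v", apiErr.RequestID, apiErr.Message, apiErr.Details)
    default: // provider_error, internal_error
        return fmt.Errorf("server-side failure (request %s): %s", apiErr.RequestID, apiErr.Message)
    }
}
```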

## Comprehensive Observability

Hapax isn't just a gateway; it's a complete monitoring and alerting system for your LLM infrastructure:

- Detailed Prometheus metrics
- Real-time performance tracking
- Comprehensive error reporting
- Intelligent alerting mechanisms

## Docker Support

The application comes with full Docker support, making it easy to deploy and run in containerized environments.

### Features

- **Multi-stage Build**: Optimized container size with separate build and runtime stages
- **Security**: Runs as non-root user with minimal runtime dependencies
- **Health Checks**: Built-in health monitoring for container orchestration
- **Prometheus Integration**: Ready-to-use metrics endpoint for monitoring
- **Docker Compose**: Complete setup with Prometheus integration
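
The bundled Docker Compose stack already wires Prometheus up. If you point your own Prometheus instance at Hapax instead, a scrape job along these lines is typically enough (the job name and target are illustrative; the `/metrics` path and port 8080 are the documented defaults):

```yaml
# prometheus.yml (illustrative)
scrape_configs:
  - job_name: "hapax"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```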

### Running with Docker

1. Build and run using Docker:

```bash
docker build -t hapax .
docker run -p 8080:8080 hapax
```

2. Or use Docker Compose for the full stack with Prometheus:

```bash
docker compose up -d
```

## API Versioning for Scalability

Create multiple API versions effortlessly. Each endpoint can have its own configuration, allowing granular control and smooth evolutionary paths for your services.

```yaml
routes:
  - path: /v1/completions
    handler: completion
    version: v1
  - path: /v2/completions
    handler: advanced_completion
    version: v2
```

## Getting Started

```bash
# Pull Hapax
docker pull ghcr.io/teilomillet/hapax:latest

# Generate default configuration
docker run --rm -v $(pwd):/output \
  ghcr.io/teilomillet/hapax:latest \
  cp /app/config.example.yaml /output/config.yaml

# Launch Hapax
docker run -p 8080:8080 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/teilomillet/hapax:latest
```

### Container Health

The container includes health checks that monitor:

- HTTP server availability
- Application readiness
- Basic functionality

Access the health status:

- Health endpoint: http://localhost:8080/health
- Metrics endpoint: http://localhost:8080/metrics
- Prometheus: http://localhost:9090

## What's Next

Hapax is continuously evolving.

## Testing

The project includes a comprehensive test suite with a mock LLM implementation that can be used for testing LLM-dependent code:

```go
import (
    "context"

    "github.com/teilomillet/gollm"
    "github.com/teilomillet/hapax/mock_test"
)

// Create a mock LLM with a custom response
llm := &MockLLM{
    GenerateFunc: func(ctx context.Context, p *gollm.Prompt) (string, error) {
        return "Custom response", nil
    },
}
```
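
As a self-contained sketch of how such a mock is typically exercised, the test below re-declares a `MockLLM` that mirrors the snippet above and assumes its `Generate` method simply forwards to `GenerateFunc`; the real mock in the repository may differ in detail:

```go
package hapax_test

import (
    "context"
    "testing"

    "github.com/teilomillet/gollm"
)

// MockLLM mirrors the mock shown above: GenerateFunc supplies the canned response.
type MockLLM struct {
    GenerateFunc func(ctx context.Context, p *gollm.Prompt) (string, error)
}

// Generate forwards to GenerateFunc (assumed behavior of the real mock).
func (m *MockLLM) Generate(ctx context.Context, p *gollm.Prompt) (string, error) {
    return m.GenerateFunc(ctx, p)
}

func TestMockLLMReturnsCannedResponse(t *testing.T) {
    llm := &MockLLM{
        GenerateFunc: func(ctx context.Context, p *gollm.Prompt) (string, error) {
            return "Custom response", nil
        },
    }

    // A zero-value prompt is enough here; the mock ignores its input.
    got, err := llm.Generate(context.Background(), &gollm.Prompt{})
    if err != nil || got != "Custom response" {
        t.Fatalf("unexpected result: %q, %v", got, err)
    }
}
```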

Run the tests:

```bash
go test ./...
```

## Open Source

Licensed under Apache 2.0, Hapax is open for collaboration and customization.

## License

Apache License 2.0

## Community & Support

- **Discussions**: [GitHub Discussions](https://github.com/teilomillet/hapax/discussions)
- **Documentation**: [Hapax Wiki](https://github.com/teilomillet/hapax/wiki)
- **Issues**: [GitHub Issues](https://github.com/teilomillet/hapax/issues)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Our Vision

We believe LLM infrastructure should be simple, reliable, and adaptable. Hapax represents our commitment to making LLM integration accessible and powerful.