
[Enhancement]: Updating the same project with complete opensource alternatives #2

Open
bb1nfosec opened this issue Jan 21, 2025 · 3 comments
Labels
enhancement New feature or request

@bb1nfosec

### Target Component

AI Agents (Researcher/Developer/Executor)

### Enhancement Description

This enhancement rewrites PentAGI to eliminate reliance on public APIs (e.g., OpenAI, Google, Tavily). The new architecture is entirely open-source, self-hosted, and locally modifiable. This enables greater flexibility, control, and cost efficiency while ensuring data privacy.

Key Changes

- **Local Language Model Integration:** replace public LLM APIs with self-hosted models (e.g., LLaMA, GPT-J), served locally via frameworks such as LocalAI or FastAPI.
- **Local Search Engine:** replace external search APIs (e.g., Google, Tavily) with local indexing tools such as Whoosh or OpenSearch.
- **Locally Installed Tools:** use open-source pentesting tools installed on the host or run in Docker (e.g., nmap, sqlmap, Metasploit).
- **Self-Contained Deployment:** manage all components with Docker Compose for easy deployment and scaling.
- **Custom Reporting:** generate detailed HTML and PDF reports with Jinja2 and WeasyPrint.

### Technical Details

1. LocalAI Integration

Framework: Host models like GPT-J or LLaMA locally with LocalAI.

API Endpoint: Serve via http://localhost:8000/generate.

```python
import requests

def generate_text(prompt):
    # Assumes a LocalAI-style server listening on localhost:8000
    response = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": prompt, "max_tokens": 100},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["response"]
```

2. Local Search Engine

Tool: Whoosh for lightweight indexing and querying.
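Whoosh provides a schema-based full-text index out of the box. As a dependency-free illustration of the underlying idea (this is a pure-stdlib sketch, not Whoosh's actual API), a minimal inverted index over scan results might look like:

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    results = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Illustrative documents; a real deployment would index tool output and notes.
docs = {
    "scan-1": "open port 22 ssh detected",
    "scan-2": "open port 80 http server",
}
index = build_index(docs)
```

Whoosh (or OpenSearch) adds stemming, scoring, and persistence on top of this same inverted-index core.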

3. Open-Source Pentesting Tools

Tools: Nmap, SQLmap, Metasploit, OpenVAS.
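Locally installed tools can be driven via `subprocess`. The hypothetical wrapper below sketches one way to do this safely (argument lists rather than shell strings, so targets cannot inject shell syntax); it assumes `nmap` is on `PATH` or inside the container:

```python
import shlex
import subprocess

def build_nmap_command(target, ports="1-1024", extra_args=None):
    """Assemble an nmap invocation as an argument list (no shell interpolation)."""
    cmd = ["nmap", "-p", ports, "-sV"]
    if extra_args:
        cmd.extend(extra_args)
    cmd.append(target)
    return cmd

def run_tool(cmd, timeout=300):
    """Run a tool and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout

# Build (but do not run) a scan command against a documentation-range IP.
cmd = build_nmap_command("192.0.2.10", ports="22,80,443")
print(shlex.join(cmd))
```

The same pattern extends to sqlmap or Metasploit's `msfconsole -x`, with each tool's output captured for the reporting stage.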

4. Reporting Engine

HTML Reporting: Use Jinja2 templates.
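A minimal Jinja2 sketch of the HTML report (template inlined here for brevity; a real report would load templates from disk and pass the rendered HTML to WeasyPrint for PDF output — the field names are illustrative):

```python
from jinja2 import Template

REPORT_TEMPLATE = Template("""\
<html><body>
<h1>Pentest Report: {{ target }}</h1>
<ul>
{% for finding in findings %}  <li><b>{{ finding.severity }}</b>: {{ finding.title }}</li>
{% endfor %}</ul>
</body></html>""")

def render_report(target, findings):
    """Render a list of finding dicts into a standalone HTML document."""
    return REPORT_TEMPLATE.render(target=target, findings=findings)

findings = [{"severity": "High", "title": "SQL injection in /login"}]
html = render_report("example.internal", findings)
```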

### Designs and Mockups

Dashboard UI

Main Page:

Input fields for scanning targets and queries.

Buttons to run pentests, search, and generate reports.

Results Page:

Tabs for vulnerability results, search insights, and generated reports.

### Alternative Solutions

LLM Alternatives:

Use smaller models like GPT-2 or Alpaca for lower hardware requirements.

Explore lightweight frameworks like HuggingFace Transformers with quantized models.

Search Alternatives:

For advanced scalability, replace Whoosh with OpenSearch.

Deployment Alternatives:

Use Kubernetes instead of Docker Compose for larger-scale deployments.

### Verification

- [x] I have checked that this enhancement hasn't been already proposed
- [x] This enhancement aligns with PentAGI's goal of autonomous penetration testing
- [x] I have considered the security implications of this enhancement
- [x] I have provided clear use cases and benefits
@bb1nfosec bb1nfosec added the enhancement New feature or request label Jan 21, 2025
@bb1nfosec bb1nfosec changed the title [Enhancement]: Updating the same project with complete opensource alternaties [Enhancement]: Updating the same project with complete opensource alternatives Jan 21, 2025
@mazamizo21

It would be great to also include DeepSeek. I have spent a significant amount of money and have never completed a task, due to persistent issues with installing tools that repeatedly fail.
I believe it would be a valuable addition to have a comprehensive Docker container with the most widely available tools. This would allow the AI to focus on penetration testing rather than troubleshooting the installation of tools.

@asdek
Contributor

asdek commented Jan 22, 2025

Hello @bb1nfosec

Thank you for your enhancement request. Let me address each point:

Local Language Model Integration

Currently, you can configure the following environment variables:

  • LLM_SERVER_URL=
  • LLM_SERVER_KEY=
  • LLM_SERVER_MODEL=

These variables allow PentAGI to connect to a local backend (such as vLLM) that implements the OpenAI interaction specification and can be deployed locally. Additionally, you can set up a LiteLLM proxy server, which offers an OpenAI-compatible interface and forwards requests to another local server. I'm going to review the LocalAI specification and get back to you on this; it seems feasible to support it as an option for selecting a custom LLM server.
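To illustrate how those three variables map onto an OpenAI-compatible local backend, a hedged sketch (the env-var names are the ones listed above; the payload shape follows the standard OpenAI chat-completions request, and everything else here is illustrative):

```python
import os

def build_chat_request():
    """Assemble an OpenAI-style chat-completion request from PentAGI's env vars.

    Defaults are placeholders for illustration only.
    """
    base_url = os.environ.get("LLM_SERVER_URL", "http://localhost:8000/v1")
    api_key = os.environ.get("LLM_SERVER_KEY", "")
    model = os.environ.get("LLM_SERVER_MODEL", "local-model")
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": "ping"}],
        },
    }
```

Any server that accepts this request shape (vLLM, LiteLLM, LocalAI's OpenAI-compatible endpoint) should work without changes to the client.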

Local Search Engine

Supporting Whoosh or OpenSearch doesn't appear to be a major issue. However, the data storage structure may impose constraints on configuring such integration. What do you think about developing a simple HTTP-based protocol with a single POST endpoint for search parameters, while the connection logic to local databases is handled separately?
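The proposed protocol could be as small as a single POST handler. A backend-agnostic sketch (the handler signature and field names are illustrative, not a committed spec):

```python
import json

def handle_search(body, backend):
    """Handle a POST body like {"query": "...", "limit": 10}.

    `backend` is any callable (query, limit) -> list of result dicts,
    so Whoosh, OpenSearch, or anything else can plug in behind it.
    """
    params = json.loads(body)
    query = params.get("query", "")
    limit = int(params.get("limit", 10))
    results = backend(query, limit) if query else []
    return json.dumps({"query": query, "results": results[:limit]})

# A dummy backend standing in for Whoosh/OpenSearch:
def dummy_backend(query, limit):
    return [{"title": f"hit for {query}", "score": 1.0}]

response = handle_search('{"query": "cve-2024", "limit": 5}', dummy_backend)
```

This keeps PentAGI's side of the integration fixed while the connection logic to local databases lives entirely behind the endpoint.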

Locally Installed Tools

Currently, when a job is started, a Docker image is automatically selected to serve the flow. All necessary tools are either pre-installed within the image or downloaded as needed. I have considered creating a custom build based on Kali Linux, but have found a suitable image in booyaabes/kali-linux-full:latest. It wouldn't be difficult to create a Dockerfile with the required tools. Nonetheless, the agents will need internet access during execution to download dependencies and tools from package managers, including those from GitHub. You can see the Docker image selection prompt here: image_chooser.tmpl.

Self-Contained Deployment

PentAGI is currently available in docker-compose format, divided into three parts; see the installation instructions.

I'm planning to create a video guide on setup and configuration, including LangFuse and observability components. There will also be a separate video and guide on securely setting up the Docker environment, covering docker-in-docker and other deployment options. Could you provide more details on what you mean in this section?

Custom Reporting

Currently, this feature is under development. There will be a standard report for flows and separate reports for each task in markdown and PDF formats. If you have specific requirements regarding the format and content of the reports, please share them with us, and we will incorporate them into our feature analysis.

Thank you for helping us enhance our product.

@asdek
Contributor

asdek commented Jan 22, 2025

Hello @mazamizo21

I apologize for the inconvenience you've experienced. I will try to reproduce this problem on my side. In the meantime, here's a workaround: please ensure that the Docker image booyaabes/kali-linux-full:latest is selected. This image is about 6GB in size but already contains all the necessary base utilities, which you can mention in the job description. Additionally, within your job's instructions, you can specify that during execution it should first verify whether a required utility is already installed instead of downloading and installing it again.

As an alternative solution, you can add to your prompt that GUI-based utilities should be excluded from use during execution and explicitly list a blacklist of utilities that should neither be installed nor used.

Currently, users can only manage the job text, but we have tasks in our backlog to expand user customization to include system prompts via UI. For better understanding how your task might be executed, please take a look at the list of system prompts here: System Prompts.

Thank you for your patience and understanding.
