From 2502609dcace6622dcbf1667d6cabe1f4db189e7 Mon Sep 17 00:00:00 2001
From: jliberma
Date: Mon, 30 Dec 2024 20:13:43 -0600
Subject: [PATCH] Update README.md

---
 README.md | 72 +++++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/README.md b/README.md
index 75264d8..03d192b 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,16 @@
-# PDF to Podcast
+# NVIDIA AI Blueprint: PDF to Podcast
 ## Overview
 
-A microservice driven implementation for transforming PDFs into engaging audio content. For a deeper dive into the system architecture, please see the diagram below:
+A microservice-driven implementation for transforming PDFs into engaging audio content.
 
-You can view a mermaid diagram of our system [here](docs/README.md).
+For a deeper dive into the system architecture, please see the diagram below:
+
+View a mermaid diagram of our system [here](docs/README.md).
 
 ## Quick Start Guide
 
 1. **Environment Variables**:
-   We require the following environment variables to be set:
+   Set the following environment variables:
    ```bash
    # Create .env file with required variables
    echo "ELEVENLABS_API_KEY=your_key" > .env
@@ -16,10 +18,10 @@ You can view a mermaid diagram of our system [here](docs/README.md).
    echo "MAX_CONCURRENT_REQUESTS=1" >> .env
    ```
 
-   Note that in production we use the NVIDIA Eleven Labs API key, which can handle concurrent requests. For local development, you may want to set `MAX_CONCURRENT_REQUESTS=1` to avoid rate-limiting issues. You can generate your own testing API key for free [here](https://elevenlabs.io/).
+   > **Note:** The NVIDIA Eleven Labs API key used in this example can handle concurrent requests. For local development, set `MAX_CONCURRENT_REQUESTS=1` to avoid rate-limiting issues. Generate your own API key for free [here](https://elevenlabs.io/).
 
 2. **Install Dependencies**:
-   We use UV to manage python dependencies.
+   We use [UV](https://pypi.org/project/uv/) to manage Python dependencies.
 
    ```bash
    make uv
@@ -32,7 +34,8 @@ You can view a mermaid diagram of our system [here](docs/README.md).
    If you open up a new terminal window and want to quickly re-use the same environment, you can run `make uv` again.
 
 3. **Start Development Server**:
-   You can start the entire stack with:
+   Start the entire stack with:
+
    ```bash
    make all-services
    ```
@@ -46,52 +49,56 @@ You can view a mermaid diagram of our system [here](docs/README.md).
    You can also set `DETACH=1` to run the services in detached mode, which allows you to continue using your terminal while the services are running.
 
-4. **Run Podcast Generation**:
+5. **Run Podcast Generation**:
+
    ```bash
    source .venv/bin/activate
    python tests/test.py --target <target_pdf> --context <context_pdf>
    ```
 
-   This will generate a 2-person podcast. In order to generate a 1-person monologue, you can add the `--monologue` flag. Check out the test file for more examples. If you are not on a GPU machine, the PDF service might take a while to run.
-
-## Hosting the PDF service on a separate machine
+   This will generate a 2-person podcast. Add the `--monologue` flag to generate a 1-person podcast. Check out the test file for more examples.
 
+## Customization
 
-As stated above, we use [docling](https://github.com/DS4SD/docling) as our default PDF service. When you spin up the stack, docling will be built and run automatically.
+### Host the PDF service on a separate machine
 
-If you would like to run the PDF service on a separate machine, you can add the following to your `.env` file. The `make model-dev` target will let you spin up only the docling service:
-```bash
-echo "MODEL_API_URL=<pdf-service-url>" >> .env
-```
+This blueprint uses [docling](https://github.com/DS4SD/docling) as the default PDF extraction service.
 
-### Using `nv-ingest`
+To run the PDF extraction service on a separate machine, add the following to your `.env` file:
 
-We also support using a fork of NVIDIA's [NV-Ingest](https://github.com/NVIDIA/NV-Ingest) as our PDF service. This requires two A100-SXM machines. See the [repo](https://github.com/jdye64/nv-ingest/tree/brev-dev-convert-endpoint) for more information. If you would like to use this, you can add the following to your `.env` file:
 ```bash
-echo "MODEL_API_URL=<nv-ingest-url>/v1" >> .env
+echo "MODEL_API_URL=<pdf-service-url>" >> .env
 ```
-**Note the use of `v1` in the URL.**
+The `make model-dev` target will let you spin up only the docling service.
 
-## Selecting LLMs
+### Use a Self-hosted NIM
 
-We currently use an ensemble of 3 LLMS to generate these podcasts. Out of the box, we recommend using the LLama 3.1-70B NIM. If you would like to use a different model, you can update the `models.json` file with the desired model. The default `models.json` calls a NIM that we have currently hosted. Feel free to use it as you develop locally. When you deploy, please use our NIM API Catalog endpoints.
+We currently use an ensemble of 3 LLMs to generate these podcasts. Out of the box, we recommend using the Llama 3.1-70B NIM. If you would like to use a different model, you can update the `models.json` file with the desired model. The default `models.json` calls an NVIDIA-hosted NIM. Feel free to use it as you develop locally. When you deploy, please use our NIM API Catalog endpoints.
 
-## Optimizing for GPU usage
+### Change the Default Models and GPU Assignments
 
 Due to our design, it is relatively easy to swap out different pieces of our stack to optimize for GPU usage and available hardware. For example, you could swap each model with the smaller LLama 3.1-8B NIM and disable GPU usage for `docling` in `docker-compose.yaml`.
 
-## Development Tools
-
-### Tracing
+### Enable Tracing
 
 We expose a Jaeger instance at `http://localhost:16686/` for tracing. This is useful for debugging and monitoring the system.
 
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Run tests: `python tests/test.py <pdf>`
+5. Run linting: `make ruff`
+6. Submit a pull request
+
 ### Code Quality
 
 The project uses `ruff` for linting and formatting. You must run `make ruff` before your PR can be merged:
+
 ```bash
 make ruff # Runs both lint and format
 ```
-
-## CI/CD
+### CI/CD
 
 We use GitHub Actions for CI/CD. We run the following actions:
+
 - `ruff`: Runs linting and formatting
 - `pr-test`: Runs an end-to-end podcast test on the PR
 - `build-and-push`: Builds and pushes a new container image to the remote repo. This is used to update production deployments
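
Taken together, the quick-start hunks above reduce to a short shell session. The following is a minimal sketch, assuming the `make` targets and `tests/test.py` flags shown in the diff; the key values and the PDF filenames are placeholders:

```bash
# Create .env with the required keys (placeholder values)
echo "ELEVENLABS_API_KEY=your_key" > .env
echo "NIM_KEY=your_key" >> .env
echo "MAX_CONCURRENT_REQUESTS=1" >> .env   # serialize TTS calls to avoid rate limits

# Create the uv-managed virtual environment, then launch the full stack
make uv
DETACH=1 make all-services   # detached mode frees the terminal

# Generate a 2-person podcast from a target PDF plus a context PDF
source .venv/bin/activate
python tests/test.py --target target.pdf --context context.pdf

# Same documents as a 1-person monologue
python tests/test.py --target target.pdf --context context.pdf --monologue
```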
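
The separate-machine setup in the customization hunk comes down to two steps: start docling alone on the remote host, then point the rest of the stack at it. A minimal sketch, where the host and port are placeholders for wherever your docling service actually listens:

```bash
# On the GPU machine: build and run only the docling PDF service
make model-dev

# On the machine running everything else: record the service URL
# (host and port below are placeholders), then start the stack
echo "MODEL_API_URL=http://gpu-host.example.com:8003" >> .env
make all-services
```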
@@ -106,12 +113,3 @@ We use GitHub Actions for CI/CD. We run the following actions:
 - Implement proper certificate management
 - Configure appropriate security headers
 - Follow other web security best practices
-
-## Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Run tests: `python tests/test.py <pdf>`
-5. Run linting: `make ruff`
-6. Submit a pull request
\ No newline at end of file
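
To confirm traces are reaching the Jaeger instance mentioned in the tracing hunk, the UI port also serves Jaeger's HTTP query API. A small sketch; the service name is a placeholder, so list the registered names first:

```bash
# List the service names that have reported spans to Jaeger
curl -s http://localhost:16686/api/services

# Fetch a few recent traces for one of those services (name is a placeholder)
curl -s "http://localhost:16686/api/traces?service=<service-name>&limit=5"
```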