Merge main with dev for v0.3.0 #3

Merged
merged 29 commits into from
May 22, 2024
Changes from all commits
Commits
29 commits
f38dfb8
working on support for localai
valentinfrlch May 19, 2024
6519174
Added validation functions
valentinfrlch May 20, 2024
fc5bf16
Update README.md
valentinfrlch May 20, 2024
575e84e
Translations updated
valentinfrlch May 20, 2024
31c5603
Merge branch 'dev' of https://github.com/valentinfrlch/ha-gpt4vision …
valentinfrlch May 20, 2024
9aacb3a
Fixed issue where entries would override each other
valentinfrlch May 20, 2024
ffdab1b
Added max_tokens support for localai provider
valentinfrlch May 20, 2024
1fc514b
Update README.md
valentinfrlch May 20, 2024
54fa941
Abort set up process when provider is already configured
valentinfrlch May 21, 2024
aedff1f
Merge branch 'dev' of https://github.com/valentinfrlch/ha-gpt4vision …
valentinfrlch May 21, 2024
26a4163
Updated strings.json with more helpful error messages
valentinfrlch May 21, 2024
2be91ee
Abort set up process when provider is already configured
valentinfrlch May 21, 2024
118cae0
Update README.md
valentinfrlch May 21, 2024
a55d473
Updated request_handlers.py for allowing multiple images
valentinfrlch May 21, 2024
451119c
Updated README to reflect the change in how multiple images are handled
valentinfrlch May 21, 2024
be34f3d
Updated request_handlers.py for allowing multiple images
valentinfrlch May 21, 2024
00b86c2
Update README.md
valentinfrlch May 21, 2024
0426f24
Added translations for new error messages
valentinfrlch May 21, 2024
eb46f4e
Merge branch 'dev' of https://github.com/valentinfrlch/ha-gpt4vision …
valentinfrlch May 21, 2024
37312dc
Updated request_handlers.py for allowing multiple images
valentinfrlch May 21, 2024
b7d95c1
Overhauled the README for v0.3.0 release
valentinfrlch May 21, 2024
83fcccb
Better method to validate localai server is running
valentinfrlch May 22, 2024
9f08ece
Better method to validate localai server is running
valentinfrlch May 22, 2024
3fbdc82
Added temperature parameter
valentinfrlch May 22, 2024
958040b
Added temperature parameter
valentinfrlch May 22, 2024
fe19ce8
Added temperature parameter
valentinfrlch May 22, 2024
c6a0851
Added temperature parameter
valentinfrlch May 22, 2024
b425af4
Better method to validate localai server is running
valentinfrlch May 22, 2024
f03975c
Merge branch 'main' into dev
valentinfrlch May 22, 2024
Binary file added .DS_Store
Binary file not shown.
125 changes: 51 additions & 74 deletions README.md
@@ -1,107 +1,84 @@
# GPT-4 Vision for Home Assistant
[![hacs_badge](https://img.shields.io/badge/HACS-Custom-orange.svg?style=for-the-badge)](https://github.com/custom-components/hacs)
<p align=center>
<img src=https://img.shields.io/badge/HACS-Custom-orange.svg?style=for-the-badge>
<img src=https://img.shields.io/badge/version-0.3.0-blue>
<a href="https://github.com/valentinfrlch/ha-gpt4vision/issues">
<img alt="Issues" src="https://img.shields.io/github/issues/valentinfrlch/ha-gpt4vision?color=0088ff" />
</a>
<p align=center style="font-weight:bold">
Image Analyzer for Home Assistant using GPT-4o.
</p>
</p>

Image Analyzer for Home Assistant using GPT-4o.
<p align="center">
<a href="#features">🌟 Features </a>
·
<a href="#resources">📖 Resources</a>
·
<a href="#installation">⬇️ Installation</a>
·
<a href="#service-call-and-usage">▶️ Usage</a>
·
<a href="#how-to-report-a-bug-or-request-a-feature">🪲 How to report Bugs</a>

</p>

**ha-gpt4vision** creates the `gpt4vision.image_analyzer` service in Home Assistant.
This service sends an image to OpenAI using its API and returns the model's output as a response variable, making it easy to use in automations.
This service sends an image to an AI provider and returns the output as a response variable for easy use in automations.
Supported providers are OpenAI and [LocalAI](https://github.com/mudler/LocalAI).

## Features
- Service returns the model's output as response variable. This makes the service more accessible for automations. See examples below for usage.
- To reduce the cost of the API call, images can be downscaled to a target width.
- The default model, GPT-4o, is cheaper and faster than GPT-4-turbo.
- Any model capable of vision can be used. For available models check this page: [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models).
- This custom component can be installed through HACS and can be set up in the Home Assistant UI.
- Multimodal conversation with AI models
- Compatible with both OpenAI's API and [LocalAI](https://github.com/mudler/LocalAI)
- Images can be downscaled for faster processing
- Can be installed and updated through HACS and can be set up in the Home Assistant UI

## Resources
Check the [wiki](https://github.com/valentinfrlch/ha-gpt4vision/wiki/Usage-Examples) for examples on how you can integrate gpt4vision into your Home Assistant or join the [discussion](https://community.home-assistant.io/t/gpt-4o-vision-capabilities-in-home-assistant/729241) in the Home Assistant Community.

## API key
> [!IMPORTANT]
> **This service needs a valid API key**. You must obtain a valid OpenAI key from [here](https://platform.openai.com/api-keys).
> A pricing calculator is available here: [https://openai.com/api/pricing/](https://openai.com/api/pricing/).
> If you're planning on using **OpenAI's API**, you'll **need an API key**. You can obtain a valid OpenAI key from [here](https://platform.openai.com/api-keys).

A pricing calculator is available here: [https://openai.com/api/pricing/](https://openai.com/api/pricing/).


# Installation
### Installation via HACS (recommended)
1. Add this repository's URL (https://github.com/valentinfrlch/ha-gpt4vision) to HACS under custom repositories.
2. Install through HACS
3. Restart Home Assistant
4. Add integration in Home Assistant Settings/Devices & services
5. Provide your API key
4. Search for `GPT-4 Vision` in Home Assistant Settings/Devices & services
5. Select whether you want to use OpenAI or your own LocalAI server for processing
   - For OpenAI's API, provide your API key
   - For LocalAI, enter the IP address and port of your LocalAI server

### Manual Installation
1. Download and copy folder **gpt4vision** to your **custom_components** folder.
1. Download and copy the **gpt4vision** folder into your **custom_components** folder.
2. Add integration in Home Assistant Settings/Devices & services
3. Provide your API key
3. Provide your API key, or the IP address and port of your LocalAI server

## Service call and usage
After restarting, the gpt4vision.image_analyzer service will be available. You can test it in the Developer Tools section in Home Assistant.
To get GPT's analysis of a local image, use the following service call.

```yaml
service: gpt4vision.image_analyzer
data:
  message: '[Prompt message for AI]'
  model: '[model]'
  image_file: '[path for image file]'
  target_width: [Target width for image downscaling]
  max_tokens: [maximum number of tokens]
```
The parameters `message`, `max_tokens` and `image_file` are required for the service to run.
Optionally, `model` and `target_width` can be set. For available models, check this page: https://platform.openai.com/docs/models.
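
For instance, a filled-in call might look like the following sketch; the file path and parameter values here are illustrative, not defaults:

```yaml
service: gpt4vision.image_analyzer
data:
  message: 'Describe what you see in the image'
  model: 'gpt-4o'
  image_file: '/config/www/tmp/front_door.jpg'
  target_width: 1280
  max_tokens: 100
```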

## Automation Example
In automations, if your response variable name is `response`, you can access the response as `{{response.response_text}}`:
```yaml
sequence:
  - service: gpt4vision.image_analyzer
    metadata: {}
    data:
      message: Describe the person in the image
      image_file: /config/www/tmp/test.jpg
      max_tokens: 100
    response_variable: response
  - service: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.entity_id
      message: "{{response.response_text}}"
    target:
      entity_id: tts.tts_entity
```

## Usage Examples
### Example 1: Announcement for package delivery
If your camera doesn't support built-in delivery announcements, this is likely the easiest way to get them without running an object detection model.

```yaml
service: gpt4vision.image_analyzer
data:
  max_tokens: 100
  message: Describe what you see
  image_file: |-
    /config/www/tmp/example.jpg
    /config/www/tmp/example2.jpg
  provider: LocalAI
  model: gpt-4o
  target_width: 1280
  image_file: '/config/www/tmp/front_porch.jpg'
  message: >-
    Does it look like the person is delivering a package? Answer with only "yes"
    or "no".
# Answer: yes
```
<img alt="man delivering package" src="https://github.com/valentinfrlch/ha-gpt4vision/assets/85313672/ab615fd5-25b5-4e07-9c44-b10ec7a678c0">
The parameters `message`, `max_tokens`, `image_file` and `provider` are required. You can send multiple images per service call. Note that each path must be on a new line and that sending multiple images may require higher `max_tokens` values for accurate results.
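
As a sketch of a multi-image call (the file paths and values below are illustrative), each path goes on its own line under `image_file`:

```yaml
service: gpt4vision.image_analyzer
data:
  provider: OpenAI
  model: gpt-4o
  max_tokens: 150
  target_width: 1280
  image_file: |-
    /config/www/tmp/front_yard.jpg
    /config/www/tmp/driveway.jpg
  message: Is the same person visible in both images? Answer with only "yes" or "no".
```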

### Example 2: Suspicious behaviour
An automation could be triggered if a person is detected around the house when no one is home.
![suspicious behaviour](https://github.com/valentinfrlch/ha-gpt4vision/assets/85313672/411678c4-f344-4eeb-9eb2-b78484a4d872)
Optionally, the `model` and `target_width` properties can be set. For available models check these pages: [OpenAI](https://platform.openai.com/docs/models) and [LocalAI](https://localai.io/models/).

```yaml
service: gpt4vision.image_analyzer
data:
  max_tokens: 100
  model: gpt-4o
  target_width: 1280
  image_file: '/config/www/tmp/garage.jpg'
  message: >-
    What is the person doing? Does anything look suspicious? Answer only with
    "yes" or "no".
```
## Issues
## How to report a bug or request a feature
> [!NOTE]
> **Bugs:** If you encounter any bugs and have read the docs carefully, feel free to file a bug report.
> **Feature Requests:** If you have an idea for a feature, file a feature request.
> **Bugs:** If you encounter any bugs and have followed the instructions carefully, feel free to file a bug report.
> **Feature Requests:** If you have an idea for a feature, create a feature request.
161 changes: 103 additions & 58 deletions custom_components/gpt4vision/__init__.py
@@ -1,5 +1,18 @@
# Declare variables
from .const import DOMAIN, CONF_API_KEY, CONF_MAXTOKENS, CONF_TARGET_WIDTH, CONF_MODEL, CONF_MESSAGE, CONF_IMAGE_FILE
from .const import (
    DOMAIN,
    CONF_PROVIDER,
    CONF_OPENAI_API_KEY,
    CONF_MAXTOKENS,
    CONF_TARGET_WIDTH,
    CONF_MODEL,
    CONF_MESSAGE,
    CONF_IMAGE_FILE,
    CONF_IP_ADDRESS,
    CONF_PORT,
    CONF_TEMPERATURE
)
from .request_handlers import handle_localai_request, handle_openai_request
import base64
import io
import os
@@ -11,17 +24,54 @@

async def async_setup_entry(hass, entry):
    """Set up gpt4vision from a config entry."""
    # Get the API key from the configuration entry
    api_key = entry.data[CONF_API_KEY]

    # Store the API key in hass.data
    hass.data[DOMAIN] = {
        "api_key": api_key
    }
    # Get all entries from config flow
    openai_api_key = entry.data.get(CONF_OPENAI_API_KEY)
    ip_address = entry.data.get(CONF_IP_ADDRESS)
    port = entry.data.get(CONF_PORT)

    # Ensure DOMAIN exists in hass.data
    if DOMAIN not in hass.data:
        hass.data[DOMAIN] = {}

    # Merge the new data with the existing data
    hass.data[DOMAIN].update({
        key: value
        for key, value in {
            CONF_OPENAI_API_KEY: openai_api_key,
            CONF_IP_ADDRESS: ip_address,
            CONF_PORT: port,
        }.items()
        if value is not None
    })

    return True


def validate(mode, api_key, ip_address, port, image_paths):
    """Validate the configuration for the component

    Args:
        mode (string): "OpenAI" or "LocalAI"
        api_key (string): OpenAI API key
        ip_address (string): LocalAI server IP address
        port (string): LocalAI server port

    Raises:
        ServiceValidationError: if configuration is invalid
    """

    if mode == "OpenAI":
        if not api_key:
            raise ServiceValidationError("openai_not_configured")
    elif mode == "LocalAI":
        if not ip_address or not port:
            raise ServiceValidationError("localai_not_configured")
    # Check if image file exists
    for image_path in image_paths:
        if not os.path.exists(image_path):
            raise ServiceValidationError("invalid_image_path")


def setup(hass, config):
    async def image_analyzer(data_call):
        """send GET request to OpenAI API '/v1/chat/completions' endpoint
@@ -30,30 +80,37 @@ async def image_analyzer(data_call):
            json: response_text
        """

        # Try to get the API key from hass.data
        api_key = hass.data.get(DOMAIN, {}).get("api_key")

        # Check if api key is present
        if not api_key:
            raise ServiceValidationError(
                "API key is required. Please set up the integration again.")
        # Read from configuration (hass.data)
        api_key = hass.data.get(DOMAIN, {}).get(CONF_OPENAI_API_KEY)
        ip_address = hass.data.get(DOMAIN, {}).get(CONF_IP_ADDRESS)
        port = hass.data.get(DOMAIN, {}).get(CONF_PORT)

        # Read data from service call
        # Resolution (width only) of the image. Example: 1280 for 720p etc.
        target_width = data_call.data.get(CONF_TARGET_WIDTH, 1280)
        # Local path to your image. Example: "/config/www/images/garage.jpg"
        image_path = data_call.data.get(CONF_IMAGE_FILE)
        # Maximum number of tokens used by model. Default is 100.
        max_tokens = int(data_call.data.get(CONF_MAXTOKENS))
        # GPT model: Default model is gpt-4o
        model = str(data_call.data.get(CONF_MODEL, "gpt-4o"))
        mode = str(data_call.data.get(CONF_PROVIDER))
        # Message to be sent to AI model
        message = str(data_call.data.get(CONF_MESSAGE)[0:2000])

        # Check if image file exists
        if not os.path.exists(image_path):
            raise ServiceValidationError(
                f"Image does not exist: {image_path}")
        # Local path to your image. Example: "/config/www/images/garage.jpg"
        image_path = data_call.data.get(CONF_IMAGE_FILE)
        # create a list of image paths (separator: newline character)
        image_paths = image_path.split("\n")
        # Resolution (width only) of the image. Example: 1280 for 720p etc.
        target_width = data_call.data.get(CONF_TARGET_WIDTH, 1280)
        # Temperature parameter. Default is 0.5
        temperature = float(data_call.data.get(CONF_TEMPERATURE, 0.5))

        # Validate configuration
        validate(mode, api_key, ip_address, port, image_paths)

        if mode == "OpenAI":
            # GPT model: Default model is gpt-4o for OpenAI
            model = str(data_call.data.get(CONF_MODEL, "gpt-4o"))
            # Maximum number of tokens used by model. Default is 100.
            max_tokens = int(data_call.data.get(CONF_MAXTOKENS))
        if mode == "LocalAI":
            # GPT model: Default model is gpt-4-vision-preview for LocalAI
            model = str(data_call.data.get(CONF_MODEL, "gpt-4-vision-preview"))
            # Maximum number of tokens used by model. Default is 100.
            max_tokens = int(data_call.data.get(CONF_MAXTOKENS))

        def encode_image(image_path):
            """Encode image as base64
@@ -67,55 +124,43 @@ def encode_image(image_path):

            # Open the image file
            with Image.open(image_path) as img:
                width, height = img.size
                aspect_ratio = width / height
                target_height = int(target_width / aspect_ratio)
                # calculate new height based on aspect ratio
                width, height = img.size
                aspect_ratio = width / height
                target_height = int(target_width / aspect_ratio)

                # Resize the image only if it's larger than the target size
                # API call price is based on resolution. The smaller the image, the cheaper the call
                # Check https://openai.com/api/pricing/ for information on pricing
                if width > target_width or height > target_height:
                    img = img.resize((target_width, target_height))
                    img = img.resize((target_width, target_height))

                # Convert the image to base64
                img_byte_arr = io.BytesIO()
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format='JPEG')
                base64_image = base64.b64encode(
                base64_image = base64.b64encode(
                    img_byte_arr.getvalue()).decode('utf-8')

            return base64_image

        # Get the base64 string from the image
        base64_image = encode_image(image_path)

        # HTTP Request for AI API
        # Header Parameters
        headers = {'Content-type': 'application/json',
                   'Authorization': 'Bearer ' + api_key}

        # Body Parameters
        data = {"model": model, "messages": [{"role": "user", "content": [{"type": "text", "text": message},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}]}], "max_tokens": max_tokens}
        # Get the base64 string from the images
        base64_images = []
        for image_path in image_paths:
            base64_image = encode_image(image_path)
            base64_images.append(base64_image)

        # Get the Home Assistant http client
        session = async_get_clientsession(hass)
        session = async_get_clientsession(hass)

        # Get response from OpenAI and read content inside message
        response = await session.post(
            "https://api.openai.com/v1/chat/completions", headers=headers, json=data)
        if mode == "LocalAI":
            response_text = await handle_localai_request(session, model, message, base64_images, ip_address, port, max_tokens, temperature)

        # Check if response is successful
        if response.status != 200:
            raise ServiceValidationError(
                (await response.json()).get('error').get('message'))
        elif mode == "OpenAI":
            response_text = await handle_openai_request(session, model, message, base64_images, api_key, max_tokens, temperature)

        response_text = (await response.json()).get(
            "choices")[0].get("message").get("content")
        return {"response_text": response_text}

    hass.services.register(
        DOMAIN, "image_analyzer", image_analyzer,
        supports_response=SupportsResponse.ONLY
        supports_response = SupportsResponse.ONLY
    )

    return True