Merge pull request #3 from valentinfrlch/dev
Merge main with dev for v0.3.0
valentinfrlch authored May 22, 2024
2 parents b8bf7c0 + f03975c commit 9bc6651
Showing 14 changed files with 454 additions and 175 deletions.
Binary file added .DS_Store
125 changes: 51 additions & 74 deletions README.md
@@ -1,107 +1,84 @@
# GPT-4 Vision for Home Assistant
<p align=center>
<img src=https://img.shields.io/badge/HACS-Custom-orange.svg?style=for-the-badge>
<img src=https://img.shields.io/badge/version-0.3.0-blue>
<a href="https://github.com/valentinfrlch/ha-gpt4vision/issues">
<img alt="Issues" src="https://img.shields.io/github/issues/valentinfrlch/ha-gpt4vision?color=0088ff" />
</a>
<p align=center style="font-weight:bold">
Image Analyzer for Home Assistant using GPT-4o.
</p>
</p>

<p align="center">
<a href="#features">🌟 Features </a>
·
<a href="#resources">📖 Resources</a>
·
<a href="#installation">⬇️ Installation</a>
·
<a href="#service-call-and-usage">▶️ Usage</a>
·
<a href="#how-to-report-a-bug-or-request-a-feature">🪲 How to report Bugs</a>

</p>

**ha-gpt4vision** creates the `gpt4vision.image_analyzer` service in Home Assistant.
This service sends an image to an AI provider and returns the model's output as a response variable, making it easy to use in automations.
Supported providers are OpenAI and [LocalAI](https://github.com/mudler/LocalAI).
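As a minimal sketch (the image path and prompt here are placeholders), a call looks like this; the model's answer is returned in the `response_text` field of the service response:

```yaml
service: gpt4vision.image_analyzer
data:
  provider: OpenAI  # or LocalAI
  message: What do you see in this image?
  image_file: /config/www/tmp/snapshot.jpg
  max_tokens: 100
```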

## Features
- Multimodal conversation with AI models
- Compatible with both OpenAI's API and [LocalAI](https://github.com/mudler/LocalAI)
- Returns the model's output as a response variable, which makes the service easy to use in automations (see the examples below)
- Images can be downscaled to a target width for faster processing and a cheaper API call
- Any vision-capable model can be used; the default, GPT-4o, is cheaper and faster than GPT-4-turbo (see [available models](https://platform.openai.com/docs/models))
- Can be installed and updated through HACS and set up in the Home Assistant UI

## Resources
Check the [wiki](https://github.com/valentinfrlch/ha-gpt4vision/wiki/Usage-Examples) for examples of how you can integrate gpt4vision into your Home Assistant setup, or join the [discussion](https://community.home-assistant.io/t/gpt-4o-vision-capabilities-in-home-assistant/729241) in the Home Assistant Community.

## API key
> [!IMPORTANT]
> If you're planning on using **OpenAI's API**, you'll **need an API key**. You can obtain a valid OpenAI key from [here](https://platform.openai.com/api-keys).
> A pricing calculator is available here: [https://openai.com/api/pricing/](https://openai.com/api/pricing/).


# Installation
### Installation via HACS (recommended)
1. Add this repository's URL (https://github.com/valentinfrlch/ha-gpt4vision) to HACS under custom repositories.
2. Install through HACS
3. Restart Home Assistant
4. Search for `GPT-4 Vision` in Home Assistant Settings/Devices & services
5. Select whether you want to use OpenAI or your own LocalAI server for processing
   - For OpenAI's API, provide your API key
   - For LocalAI, enter the IP address and port of your LocalAI server

### Manual Installation
1. Download and copy the **gpt4vision** folder into your **custom_components** folder.
2. Add integration in Home Assistant Settings/Devices & services
3. Provide your API key, or the IP address and port of your LocalAI server

## Service call and usage
After restarting, the `gpt4vision.image_analyzer` service will be available. You can test it in Home Assistant's Developer Tools section.
To get the model's analysis of a local image, use the following service call.

```yaml
service: gpt4vision.image_analyzer
data:
  provider: '[Provider, either "OpenAI" or "LocalAI"]'
  message: '[Prompt message for AI]'
  model: '[model]'
  image_file: '[path to image file]'
  target_width: [Target width for image downscaling]
  max_tokens: [maximum number of tokens]
```
The parameters `provider`, `message`, `max_tokens` and `image_file` are mandatory for the execution of the service.
Optionally, `model` and `target_width` can be set. For available models check these pages: [OpenAI](https://platform.openai.com/docs/models) and [LocalAI](https://localai.io/models/).

## Automation Example
In automations, if your response variable name is `response`, you can access the response as `{{response.response_text}}`:
```yaml
sequence:
  - service: gpt4vision.image_analyzer
    metadata: {}
    data:
      provider: OpenAI
      message: Describe the person in the image
      image_file: /config/www/tmp/test.jpg
      max_tokens: 100
    response_variable: response
  - service: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.entity_id
      message: "{{response.response_text}}"
    target:
      entity_id: tts.tts_entity
```

## Usage Examples
### Example 1: Announcement for package delivery
If your camera doesn't support built-in delivery announcements, this is likely the easiest way to get them without running an object detection model.

```yaml
service: gpt4vision.image_analyzer
data:
  provider: LocalAI
  model: gpt-4o
  target_width: 1280
  max_tokens: 100
  image_file: '/config/www/tmp/front_porch.jpg'
  message: >-
    Does it look like the person is delivering a package? Answer with only "yes"
    or "no".
# Answer: yes
```
<img alt="man delivering package" src="https://github.com/valentinfrlch/ha-gpt4vision/assets/85313672/ab615fd5-25b5-4e07-9c44-b10ec7a678c0">
The parameters `message`, `max_tokens`, `image_file` and `provider` are required. You can send multiple images per service call. Note that each path must be on a new line and that sending multiple images may require higher `max_tokens` values for accurate results.
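For example, a call that sends two snapshots in a single request could look like this sketch (the file paths and prompt are placeholders):

```yaml
service: gpt4vision.image_analyzer
data:
  provider: LocalAI
  max_tokens: 100
  message: Describe what you see
  image_file: |-
    /config/www/tmp/example.jpg
    /config/www/tmp/example2.jpg
```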

### Example 2: Suspicious behaviour
An automation could be triggered if a person is detected around the house when no one is home.
![suspicious behaviour](https://github.com/valentinfrlch/ha-gpt4vision/assets/85313672/411678c4-f344-4eeb-9eb2-b78484a4d872)
Optionally, the `model` and `target_width` properties can be set. For available models check these pages: [OpenAI](https://platform.openai.com/docs/models) and [LocalAI](https://localai.io/models/).

```yaml
service: gpt4vision.image_analyzer
data:
  provider: OpenAI
  max_tokens: 100
  model: gpt-4o
  target_width: 1280
  image_file: '/config/www/tmp/garage.jpg'
  message: >-
    What is the person doing? Does anything look suspicious? Answer only with
    "yes" or "no".
```
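Putting this together, the service call above could sit inside an automation along these lines; the trigger entity, zone check and notify service are placeholders for your own setup:

```yaml
alias: Suspicious person check
trigger:
  - platform: state
    entity_id: binary_sensor.garage_person_detected
    to: "on"
condition:
  # zone.home reports the number of people currently at home
  - condition: state
    entity_id: zone.home
    state: "0"
action:
  - service: gpt4vision.image_analyzer
    data:
      provider: OpenAI
      max_tokens: 100
      target_width: 1280
      image_file: '/config/www/tmp/garage.jpg'
      message: >-
        What is the person doing? Does anything look suspicious? Answer only
        with "yes" or "no".
    response_variable: analysis
  # Continue only if the model answered "yes"
  - condition: template
    value_template: "{{ 'yes' in analysis.response_text | lower }}"
  - service: notify.notify
    data:
      message: Suspicious activity detected near the garage.
```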
## How to report a bug or request a feature
> [!NOTE]
> **Bugs:** If you encounter any bugs and have followed the instructions carefully, feel free to file a bug report.
> **Feature Requests:** If you have an idea for a feature, create a feature request.
161 changes: 103 additions & 58 deletions custom_components/gpt4vision/__init__.py
@@ -1,5 +1,18 @@
# Declare variables
from .const import (
    DOMAIN,
    CONF_PROVIDER,
    CONF_OPENAI_API_KEY,
    CONF_MAXTOKENS,
    CONF_TARGET_WIDTH,
    CONF_MODEL,
    CONF_MESSAGE,
    CONF_IMAGE_FILE,
    CONF_IP_ADDRESS,
    CONF_PORT,
    CONF_TEMPERATURE
)
from .request_handlers import handle_localai_request, handle_openai_request
import base64
import io
import os
@@ -11,17 +24,54 @@

async def async_setup_entry(hass, entry):
    """Set up gpt4vision from a config entry."""
    # Get all entries from the config flow
    openai_api_key = entry.data.get(CONF_OPENAI_API_KEY)
    ip_address = entry.data.get(CONF_IP_ADDRESS)
    port = entry.data.get(CONF_PORT)

    # Ensure DOMAIN exists in hass.data
    if DOMAIN not in hass.data:
        hass.data[DOMAIN] = {}

    # Merge the new data with the existing data
    hass.data[DOMAIN].update({
        key: value
        for key, value in {
            CONF_OPENAI_API_KEY: openai_api_key,
            CONF_IP_ADDRESS: ip_address,
            CONF_PORT: port,
        }.items()
        if value is not None
    })

    return True


def validate(mode, api_key, ip_address, port, image_paths):
    """Validate the configuration for the component

    Args:
        mode (string): "OpenAI" or "LocalAI"
        api_key (string): OpenAI API key
        ip_address (string): LocalAI server IP address
        port (string): LocalAI server port
        image_paths (list): paths of the images to analyze

    Raises:
        ServiceValidationError: if the configuration is invalid
    """
    if mode == "OpenAI":
        if not api_key:
            raise ServiceValidationError("openai_not_configured")
    elif mode == "LocalAI":
        if not ip_address or not port:
            raise ServiceValidationError("localai_not_configured")
    # Check that every image file exists
    for image_path in image_paths:
        if not os.path.exists(image_path):
            raise ServiceValidationError("invalid_image_path")


def setup(hass, config):
    async def image_analyzer(data_call):
        """Send a request to the AI provider's '/v1/chat/completions' endpoint
@@ -30,30 +80,37 @@ async def image_analyzer(data_call):
            json: response_text
        """

        # Read from configuration (hass.data)
        api_key = hass.data.get(DOMAIN, {}).get(CONF_OPENAI_API_KEY)
        ip_address = hass.data.get(DOMAIN, {}).get(CONF_IP_ADDRESS)
        port = hass.data.get(DOMAIN, {}).get(CONF_PORT)

        # Read data from service call
        mode = str(data_call.data.get(CONF_PROVIDER))
        # Message to be sent to AI model
        message = str(data_call.data.get(CONF_MESSAGE)[0:2000])
        # Local path to your image. Example: "/config/www/images/garage.jpg"
        image_path = data_call.data.get(CONF_IMAGE_FILE)
        # Create a list of image paths (separator: newline character)
        image_paths = image_path.split("\n")
        # Resolution (width only) of the image. Example: 1280 for 720p etc.
        target_width = data_call.data.get(CONF_TARGET_WIDTH, 1280)
        # Temperature parameter. Default is 0.5
        temperature = float(data_call.data.get(CONF_TEMPERATURE, 0.5))

        # Validate configuration
        validate(mode, api_key, ip_address, port, image_paths)

        if mode == "OpenAI":
            # Default model for OpenAI is gpt-4o
            model = str(data_call.data.get(CONF_MODEL, "gpt-4o"))
            # Maximum number of tokens used by model. Default is 100.
            max_tokens = int(data_call.data.get(CONF_MAXTOKENS))
        if mode == "LocalAI":
            # Default model for LocalAI is gpt-4-vision-preview
            model = str(data_call.data.get(CONF_MODEL, "gpt-4-vision-preview"))
            # Maximum number of tokens used by model. Default is 100.
            max_tokens = int(data_call.data.get(CONF_MAXTOKENS))

        def encode_image(image_path):
            """Encode image as base64
@@ -67,55 +124,43 @@ def encode_image(image_path):

            # Open the image file
            with Image.open(image_path) as img:
                # Calculate new height based on aspect ratio
                width, height = img.size
                aspect_ratio = width / height
                target_height = int(target_width / aspect_ratio)

                # Resize the image only if it's larger than the target size
                # API call price is based on resolution. The smaller the image, the cheaper the call
                # Check https://openai.com/api/pricing/ for information on pricing
                if width > target_width or height > target_height:
                    img = img.resize((target_width, target_height))

                # Convert the image to base64
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format='JPEG')
                base64_image = base64.b64encode(
                    img_byte_arr.getvalue()).decode('utf-8')

            return base64_image

        # Get the base64 string from the images
        base64_images = []
        for image_path in image_paths:
            base64_image = encode_image(image_path)
            base64_images.append(base64_image)

        # Get the Home Assistant http client
        session = async_get_clientsession(hass)

        if mode == "LocalAI":
            response_text = await handle_localai_request(
                session, model, message, base64_images, ip_address, port, max_tokens, temperature)
        elif mode == "OpenAI":
            response_text = await handle_openai_request(
                session, model, message, base64_images, api_key, max_tokens, temperature)

        return {"response_text": response_text}

    hass.services.register(
        DOMAIN, "image_analyzer", image_analyzer,
        supports_response=SupportsResponse.ONLY
    )

    return True