Merge main with dev for v0.3.0 #3

Merged
merged 29 commits into from
May 22, 2024
Changes from all commits
Commits
29 commits
f38dfb8
working on support for localai
valentinfrlch May 19, 2024
6519174
Added validation functions
valentinfrlch May 20, 2024
fc5bf16
Update README.md
valentinfrlch May 20, 2024
575e84e
Translations updated
valentinfrlch May 20, 2024
31c5603
Merge branch 'dev' of https://github.com/valentinfrlch/ha-gpt4vision …
valentinfrlch May 20, 2024
9aacb3a
Fixed issue where entries would override each other
valentinfrlch May 20, 2024
ffdab1b
Added max_tokens support for localai provider
valentinfrlch May 20, 2024
1fc514b
Update README.md
valentinfrlch May 20, 2024
54fa941
Abort set up process when provider is already configured
valentinfrlch May 21, 2024
aedff1f
Merge branch 'dev' of https://github.com/valentinfrlch/ha-gpt4vision …
valentinfrlch May 21, 2024
26a4163
Updated strings.json with more helpful error messages
valentinfrlch May 21, 2024
2be91ee
Abort set up process when provider is already configured
valentinfrlch May 21, 2024
118cae0
Update README.md
valentinfrlch May 21, 2024
a55d473
Updated request_handlers.py for allowing multiple images
valentinfrlch May 21, 2024
451119c
Updated README to reflect the change in how multiple images are handled
valentinfrlch May 21, 2024
be34f3d
Updated request_handlers.py for allowing multiple images
valentinfrlch May 21, 2024
00b86c2
Update README.md
valentinfrlch May 21, 2024
0426f24
Added translations for new error messages
valentinfrlch May 21, 2024
eb46f4e
Merge branch 'dev' of https://github.com/valentinfrlch/ha-gpt4vision …
valentinfrlch May 21, 2024
37312dc
Updated request_handlers.py for allowing multiple images
valentinfrlch May 21, 2024
b7d95c1
Overhauled the README for v0.3.0 release
valentinfrlch May 21, 2024
83fcccb
Better method to validate localai server is running
valentinfrlch May 22, 2024
9f08ece
Better method to validate localai server is running
valentinfrlch May 22, 2024
3fbdc82
Added temperature parameter
valentinfrlch May 22, 2024
958040b
Added temperature parameter
valentinfrlch May 22, 2024
fe19ce8
Added temperature parameter
valentinfrlch May 22, 2024
c6a0851
Added temperature parameter
valentinfrlch May 22, 2024
b425af4
Better method to validate localai server is running
valentinfrlch May 22, 2024
f03975c
Merge branch 'main' into dev
valentinfrlch May 22, 2024
Binary file added .DS_Store
Binary file not shown.
125 changes: 51 additions & 74 deletions README.md
@@ -1,107 +1,84 @@
# GPT-4 Vision for Home Assistant
[![hacs_badge](https://img.shields.io/badge/HACS-Custom-orange.svg?style=for-the-badge)](https://github.com/custom-components/hacs)
<p align=center>
<img src=https://img.shields.io/badge/HACS-Custom-orange.svg?style=for-the-badge>
<img src=https://img.shields.io/badge/version-0.3.0-blue>
<a href="https://github.com/valentinfrlch/ha-gpt4vision/issues">
<img alt="Issues" src="https://img.shields.io/github/issues/valentinfrlch/ha-gpt4vision?color=0088ff" />
</a>
<p align=center style="font-weight:bold">
Image Analyzer for Home Assistant using GPT-4o.
</p>
</p>

Image Analyzer for Home Assistant using GPT-4o.
<p align="center">
<a href="#features">🌟 Features </a>
·
<a href="#resources">📖 Resources</a>
·
<a href="#installation">⬇️ Installation</a>
·
<a href="#service-call-and-usage">▶️ Usage</a>
·
<a href="#how-to-report-a-bug-or-request-a-feature">🪲 How to report Bugs</a>

</p>

**ha-gpt4vision** creates the `gpt4vision.image_analyzer` service in Home Assistant.
This service sends an image to OpenAI using its API and returns the model's output as a response variable, making it easy to use in automations.
This service sends an image to an AI provider and returns the output as a response variable for easy use in automations.
Supported providers are OpenAI and [LocalAI](https://github.com/mudler/LocalAI).

## Features
- Service returns the model's output as response variable. This makes the service more accessible for automations. See examples below for usage.
- To reduce the cost of the API call, images can be downscaled to a target width.
- The default model, GPT-4o, is cheaper and faster than GPT-4-turbo.
- Any model capable of vision can be used. For available models check this page: [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models).
- This custom component can be installed through HACS and can be set up in the Home Assistant UI.
- Multimodal conversation with AI models
- Compatible with both OpenAI's API and [LocalAI](https://github.com/mudler/LocalAI)
- Images can be downscaled for faster processing
- Can be installed and updated through HACS and can be set up in the Home Assistant UI

## Resources
Check the [wiki](https://github.com/valentinfrlch/ha-gpt4vision/wiki/Usage-Examples) for examples on how you can integrate gpt4vision into your Home Assistant or join the [discussion](https://community.home-assistant.io/t/gpt-4o-vision-capabilities-in-home-assistant/729241) in the Home Assistant Community.

## API key
> [!IMPORTANT]
> **This service needs a valid API key**. You must obtain a valid OpenAI key from [here](https://platform.openai.com/api-keys).
> A pricing calculator is available here: [https://openai.com/api/pricing/](https://openai.com/api/pricing/).
> If you're planning on using **OpenAI's API**, you'll **need an API key**. You can obtain a valid OpenAI key from [here](https://platform.openai.com/api-keys).

A pricing calculator is available here: [https://openai.com/api/pricing/](https://openai.com/api/pricing/).


# Installation
### Installation via HACS (recommended)
1. Add this repository's URL (https://github.com/valentinfrlch/ha-gpt4vision) to HACS under custom repositories.
2. Install through HACS
3. Restart Home Assistant
4. Add integration in Home Assistant Settings/Devices & services
5. Provide your API key
4. Search for `GPT-4 Vision` in Home Assistant Settings/Devices & services
5. Select whether you want to use OpenAI or your own LocalAI server for processing
   - For OpenAI's API, provide your API key
   - For LocalAI, enter the IP address and port of your LocalAI server

### Manual Installation
1. Download and copy folder **gpt4vision** to your **custom_components** folder.
1. Download and copy the **gpt4vision** folder into your **custom_components** folder.
2. Add integration in Home Assistant Settings/Devices & services
3. Provide your API key
3. Provide your API key, or the IP address and port of your LocalAI server

## Service call and usage
After restarting, the gpt4vision.image_analyzer service will be available. You can test it in the Developer Tools section in Home Assistant.
To get GPT's analysis of a local image, use the following service call.

```yaml
service: gpt4vision.image_analyzer
data:
  message: '[Prompt message for AI]'
  model: '[model]'
  image_file: '[path for image file]'
  target_width: [Target width for image downscaling]
  max_tokens: [maximum number of tokens]
```
The parameters `message`, `max_tokens` and `image_file` are required for the service to run.
Optionally, `model` and `target_width` can be set. For available models, check this page: https://platform.openai.com/docs/models.
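
For instance, a filled-in call might look like the following sketch; the file path and parameter values here are illustrative, not defaults:

```yaml
service: gpt4vision.image_analyzer
data:
  message: 'Describe what you see in the image'
  model: 'gpt-4o'
  image_file: '/config/www/tmp/front_door.jpg'
  target_width: 1280
  max_tokens: 100
```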

## Automation Example
In automations, if your response variable name is `response`, you can access the response as `{{response.response_text}}`:
```yaml
sequence:
  - service: gpt4vision.image_analyzer
    metadata: {}
    data:
      message: Describe the person in the image
      image_file: /config/www/tmp/test.jpg
      max_tokens: 100
    response_variable: response
  - service: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.entity_id
      message: "{{response.response_text}}"
    target:
      entity_id: tts.tts_entity
```

## Usage Examples
### Example 1: Announcement for package delivery
If your camera doesn't support built-in delivery announcements, this is likely the easiest way to get them without running an object detection model.

```yaml
service: gpt4vision.image_analyzer
data:
  max_tokens: 100
  message: Describe what you see
  image_file: |-
    /config/www/tmp/example.jpg
    /config/www/tmp/example2.jpg
  provider: LocalAI
  model: gpt-4o
  target_width: 1280
  image_file: '/config/www/tmp/front_porch.jpg'
  message: >-
    Does it look like the person is delivering a package? Answer with only "yes"
    or "no".
# Answer: yes
```
<img alt="man delivering package" src="https://github.com/valentinfrlch/ha-gpt4vision/assets/85313672/ab615fd5-25b5-4e07-9c44-b10ec7a678c0">
The parameters `message`, `max_tokens`, `image_file` and `provider` are required. You can send multiple images per service call. Note that each path must be on a new line and that sending multiple images may require higher `max_tokens` values for accurate results.
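
As a sketch of a multi-image call (the file paths and values below are illustrative), each path goes on its own line under `image_file`:

```yaml
service: gpt4vision.image_analyzer
data:
  provider: OpenAI
  model: gpt-4o
  max_tokens: 150
  target_width: 1280
  image_file: |-
    /config/www/tmp/front_yard.jpg
    /config/www/tmp/driveway.jpg
  message: Is the same person visible in both images? Answer with only "yes" or "no".
```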

### Example 2: Suspicious behaviour
An automation could be triggered if a person is detected around the house when no one is home.
![suspicious behaviour](https://github.com/valentinfrlch/ha-gpt4vision/assets/85313672/411678c4-f344-4eeb-9eb2-b78484a4d872)
Optionally, the `model` and `target_width` properties can be set. For available models check these pages: [OpenAI](https://platform.openai.com/docs/models) and [LocalAI](https://localai.io/models/).

```yaml
service: gpt4vision.image_analyzer
data:
  max_tokens: 100
  model: gpt-4o
  target_width: 1280
  image_file: '/config/www/tmp/garage.jpg'
  message: >-
    What is the person doing? Does anything look suspicious? Answer only with
    "yes" or "no".
```
## Issues
## How to report a bug or request a feature
> [!NOTE]
> **Bugs:** If you encounter any bugs and have read the docs carefully, feel free to file a bug report.
> **Feature Requests:** If you have an idea for a feature, file a feature request.
> **Bugs:** If you encounter any bugs and have followed the instructions carefully, feel free to file a bug report.
> **Feature Requests:** If you have an idea for a feature, create a feature request.
161 changes: 103 additions & 58 deletions custom_components/gpt4vision/__init__.py
@@ -1,5 +1,18 @@
# Declare variables
from .const import DOMAIN, CONF_API_KEY, CONF_MAXTOKENS, CONF_TARGET_WIDTH, CONF_MODEL, CONF_MESSAGE, CONF_IMAGE_FILE
from .const import (
    DOMAIN,
    CONF_PROVIDER,
    CONF_OPENAI_API_KEY,
    CONF_MAXTOKENS,
    CONF_TARGET_WIDTH,
    CONF_MODEL,
    CONF_MESSAGE,
    CONF_IMAGE_FILE,
    CONF_IP_ADDRESS,
    CONF_PORT,
    CONF_TEMPERATURE
)
from .request_handlers import handle_localai_request, handle_openai_request
import base64
import io
import os
@@ -11,17 +24,54 @@

async def async_setup_entry(hass, entry):
    """Set up gpt4vision from a config entry."""
    # Get the API key from the configuration entry
    api_key = entry.data[CONF_API_KEY]

    # Store the API key in hass.data
    hass.data[DOMAIN] = {
        "api_key": api_key
    }
    # Get all entries from config flow
    openai_api_key = entry.data.get(CONF_OPENAI_API_KEY)
    ip_address = entry.data.get(CONF_IP_ADDRESS)
    port = entry.data.get(CONF_PORT)

    # Ensure DOMAIN exists in hass.data
    if DOMAIN not in hass.data:
        hass.data[DOMAIN] = {}

    # Merge the new data with the existing data
    hass.data[DOMAIN].update({
        key: value
        for key, value in {
            CONF_OPENAI_API_KEY: openai_api_key,
            CONF_IP_ADDRESS: ip_address,
            CONF_PORT: port,
        }.items()
        if value is not None
    })

    return True


def validate(mode, api_key, ip_address, port, image_paths):
    """Validate the configuration for the component

    Args:
        mode (string): "OpenAI" or "LocalAI"
        api_key (string): OpenAI API key
        ip_address (string): LocalAI server IP address
        port (string): LocalAI server port

    Raises:
        ServiceValidationError: if configuration is invalid
    """

    if mode == "OpenAI":
        if not api_key:
            raise ServiceValidationError("openai_not_configured")
    elif mode == "LocalAI":
        if not ip_address or not port:
            raise ServiceValidationError("localai_not_configured")
    # Check if image file exists
    for image_path in image_paths:
        if not os.path.exists(image_path):
            raise ServiceValidationError("invalid_image_path")


def setup(hass, config):
    async def image_analyzer(data_call):
        """send GET request to OpenAI API '/v1/chat/completions' endpoint
@@ -30,30 +80,37 @@ async def image_analyzer(data_call):
            json: response_text
        """

        # Try to get the API key from hass.data
        api_key = hass.data.get(DOMAIN, {}).get("api_key")

        # Check if api key is present
        if not api_key:
            raise ServiceValidationError(
                "API key is required. Please set up the integration again.")
        # Read from configuration (hass.data)
        api_key = hass.data.get(DOMAIN, {}).get(CONF_OPENAI_API_KEY)
        ip_address = hass.data.get(DOMAIN, {}).get(CONF_IP_ADDRESS)
        port = hass.data.get(DOMAIN, {}).get(CONF_PORT)

        # Read data from service call
        # Resolution (width only) of the image. Example: 1280 for 720p etc.
        target_width = data_call.data.get(CONF_TARGET_WIDTH, 1280)
        # Local path to your image. Example: "/config/www/images/garage.jpg"
        image_path = data_call.data.get(CONF_IMAGE_FILE)
        # Maximum number of tokens used by model. Default is 100.
        max_tokens = int(data_call.data.get(CONF_MAXTOKENS))
        # GPT model: Default model is gpt-4o
        model = str(data_call.data.get(CONF_MODEL, "gpt-4o"))
        mode = str(data_call.data.get(CONF_PROVIDER))
        # Message to be sent to AI model
        message = str(data_call.data.get(CONF_MESSAGE)[0:2000])

        # Check if image file exists
        if not os.path.exists(image_path):
            raise ServiceValidationError(
                f"Image does not exist: {image_path}")
        # Local path to your image. Example: "/config/www/images/garage.jpg"
        image_path = data_call.data.get(CONF_IMAGE_FILE)
        # create a list of image paths (separator: newline character)
        image_paths = image_path.split("\n")
        # Resolution (width only) of the image. Example: 1280 for 720p etc.
        target_width = data_call.data.get(CONF_TARGET_WIDTH, 1280)
        # Temperature parameter. Default is 0.5
        temperature = float(data_call.data.get(CONF_TEMPERATURE, 0.5))

        # Validate configuration
        validate(mode, api_key, ip_address, port, image_paths)

        if mode == "OpenAI":
            # GPT model: Default model is gpt-4o for OpenAI
            model = str(data_call.data.get(CONF_MODEL, "gpt-4o"))
            # Maximum number of tokens used by model. Default is 100.
            max_tokens = int(data_call.data.get(CONF_MAXTOKENS))
        if mode == "LocalAI":
            # GPT model: Default model is gpt-4-vision-preview for LocalAI
            model = str(data_call.data.get(CONF_MODEL, "gpt-4-vision-preview"))
            # Maximum number of tokens used by model. Default is 100.
            max_tokens = int(data_call.data.get(CONF_MAXTOKENS))

        def encode_image(image_path):
            """Encode image as base64
@@ -67,55 +124,43 @@ def encode_image(image_path):

            # Open the image file
            with Image.open(image_path) as img:
                width, height = img.size
                aspect_ratio = width / height
                target_height = int(target_width / aspect_ratio)
                # calculate new height based on aspect ratio
                width, height = img.size
                aspect_ratio = width / height
                target_height = int(target_width / aspect_ratio)

                # Resize the image only if it's larger than the target size
                # API call price is based on resolution. The smaller the image, the cheaper the call
                # Check https://openai.com/api/pricing/ for information on pricing
                if width > target_width or height > target_height:
                    img = img.resize((target_width, target_height))
                    img = img.resize((target_width, target_height))

                # Convert the image to base64
                img_byte_arr = io.BytesIO()
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format='JPEG')
                base64_image = base64.b64encode(
                base64_image = base64.b64encode(
                    img_byte_arr.getvalue()).decode('utf-8')

            return base64_image

        # Get the base64 string from the image
        base64_image = encode_image(image_path)

        # HTTP Request for AI API
        # Header Parameters
        headers = {'Content-type': 'application/json',
                   'Authorization': 'Bearer ' + api_key}

        # Body Parameters
        data = {"model": model, "messages": [{"role": "user", "content": [{"type": "text", "text": message},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}]}], "max_tokens": max_tokens}
        # Get the base64 string from the images
        base64_images = []
        for image_path in image_paths:
            base64_image = encode_image(image_path)
            base64_images.append(base64_image)

        # Get the Home Assistant http client
        session = async_get_clientsession(hass)
        session = async_get_clientsession(hass)

        # Get response from OpenAI and read content inside message
        response = await session.post(
            "https://api.openai.com/v1/chat/completions", headers=headers, json=data)
        if mode == "LocalAI":
            response_text = await handle_localai_request(session, model, message, base64_images, ip_address, port, max_tokens, temperature)

        # Check if response is successful
        if response.status != 200:
            raise ServiceValidationError(
                (await response.json()).get('error').get('message'))
        elif mode == "OpenAI":
            response_text = await handle_openai_request(session, model, message, base64_images, api_key, max_tokens, temperature)

        response_text = (await response.json()).get(
            "choices")[0].get("message").get("content")
        return {"response_text": response_text}

    hass.services.register(
        DOMAIN, "image_analyzer", image_analyzer,
        supports_response=SupportsResponse.ONLY
        supports_response = SupportsResponse.ONLY
    )

    return True