Merge pull request #116 from smritae01/dev
Add Proof of Concept work
trgardos authored Dec 7, 2023
2 parents 0e0758f + 49886d2 commit 0625948
Showing 125 changed files with 3,308 additions and 0 deletions.
Binary file added POC/02241014.jpg
610 changes: 610 additions & 0 deletions POC/AzureVision-resized.ipynb

Large diffs are not rendered by default.

280 changes: 280 additions & 0 deletions POC/AzureVision.ipynb
@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2f17ba8c",
"metadata": {},
"source": [
"# Azure Vision Implementation - Dima"
]
},
{
"cell_type": "markdown",
"id": "91f6a7e3",
"metadata": {},
"source": [
"This notebook utilizes Azure AI Document Intelligence Studio to extract text from a set of Herbarium specimens, obtained from: https://www.gbif.org/occurrence/gallery?basis_of_record=PRESERVED_SPECIMEN&media_ty%5B%E2%80%A6%5Daxon_key=6&year=1000,1941&advanced=1&occurrence_status=present\n",
"\n",
"A selection of 30 specimens was downloaded to the /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens folder. \n",
"\n",
"The folder is made up of:\n",
"1) 20 images that contain pure text, ranging from plain print to hard-to-read cursive\n",
"2) 10 images that contain both the visual plant specimen and the attached textual labels\n",
"\n",
"Special care was taken to select a diverse collection of specimens, ranging in text quality and type.\n",
"\n",
"Regarding the 10 mixed images: in images that contain both the plant specimen and the actual text, the text was generally too small and/or too blurry to be deciphered by any LLM. Next steps would include improving the quality of the text so that an LLM can analyze it. \n",
"\n",
"Currently, the notebook takes an input image from /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens, runs it through Azure Vision, and analyzes all text. It then creates a PDF with the original image and an annotated image that has boxes around identified words, with the predicted words written over the original text. Below the images, the entire identified text is printed along with the confidence score for each identified term. All of this is saved to /projectnb/sparkgrp/ml-herbarium-grp/fall2023/AzureVision-results\n",
"\n",
"Immediate next steps:\n",
"\n",
"1. Obtain a student Microsoft Azure account to finish the work (testing was done with a personal account, whose free credits ran out)\n",
"2. Improve the annotated images - currently the predicted text is hard to read; change it so that it sits above the original words. \n",
"3. Integrate GPT-4 to parse the extracted text into a format that clearly returns the species, collection date, and geography. \n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "bc1c7278",
"metadata": {},
"outputs": [],
"source": [
"#!pip install azure-ai-formrecognizer --pre\n",
"#!pip install opencv-python-headless matplotlib\n",
"#!pip install matplotlib pillow\n",
"#!pip install ipywidgets\n",
"#!pip install shapely\n",
"#!pip install openai\n",
"#!pip install reportlab"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "c1566288",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Couldn't get a file descriptor referring to the console\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"An error occurred while processing /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens/Text_Sample_19.png: (403) Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"Code: 403\n",
"Message: Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"An error occurred while processing /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens/Text_Sample_10.png: (403) Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"Code: 403\n",
"Message: Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"An error occurred while processing /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens/Text_Sample_6.png: (403) Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"Code: 403\n",
"Message: Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"An error occurred while processing /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens/Mixed_Sample_7.png: (403) Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"Code: 403\n",
"Message: Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"An error occurred while processing /projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens/Text_Sample_4.png: (403) Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n",
"Code: 403\n",
"Message: Out of call volume quota for FormRecognizer F0 pricing tier. Please retry after 6 days. To increase your call volume switch to a paid tier.\n"
]
}
],
"source": [
"from azure.core.credentials import AzureKeyCredential\n",
"from azure.ai.formrecognizer import DocumentAnalysisClient\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.image as mpimg\n",
"from PIL import Image, ImageDraw, ImageFont\n",
"import openai\n",
"import re\n",
"import os\n",
"from reportlab.lib.pagesizes import letter\n",
"from reportlab.pdfgen import canvas\n",
"\n",
"\n",
"# Azure Cognitive Services endpoint and key\n",
"endpoint = \"https://herbariumsamplerecognition.cognitiveservices.azure.com/\"\n",
"# Read the key from the environment rather than hard-coding it (the original committed a literal key)\n",
"key = os.environ.get(\"AZURE_FORMRECOGNIZER_KEY\", \"<your-key>\")\n",
"\n",
"def sanitize_filename(filename):\n",
" # Keep only alphanumerics, underscores, whitespace, dots, and hyphens\n",
" return re.sub(r'[^\\w\\s\\.-]', '', filename)\n",
"\n",
"def format_bounding_box(bounding_box):\n",
" if not bounding_box:\n",
" return \"N/A\"\n",
" return \", \".join([\"[{}, {}]\".format(p.x, p.y) for p in bounding_box])\n",
"\n",
"def draw_boxes(image_path, words):\n",
" original_image = Image.open(image_path)\n",
" annotated_image = original_image.copy()\n",
" draw = ImageDraw.Draw(annotated_image)\n",
"\n",
" for word in words:\n",
" polygon = word['polygon']\n",
" if polygon:\n",
" bbox = [(point.x, point.y) for point in polygon]\n",
" try:\n",
" # Drop characters that cannot be encoded as ASCII\n",
" text_content = word['content'].encode('ascii', 'ignore').decode('ascii')\n",
" except Exception as e:\n",
" print(f\"Error processing text {word['content']}: {e}\")\n",
" text_content = \"Error\"\n",
" draw.polygon(bbox, outline=\"red\")\n",
" draw.text((bbox[0][0], bbox[0][1]), text_content, fill=\"green\")\n",
" \n",
" return annotated_image\n",
"\n",
"\n",
"def parse_document_content(content):\n",
" openai.api_key = 'your-api-key'\n",
"\n",
" try:\n",
" # GPT-4 is a chat model, so it must be called via the ChatCompletion API (openai < 1.0 style)\n",
" response = openai.ChatCompletion.create(\n",
" model=\"gpt-4\",\n",
" messages=[{\"role\": \"user\", \"content\": f\"Extract specific information from the following text: {content}\\n\\nSpecies Name: \"}],\n",
" max_tokens=100\n",
" # Add additional parameters as needed\n",
" )\n",
" parsed_data = response.choices[0].message.content.strip()\n",
" return parsed_data\n",
" except Exception as e:\n",
" print(\"An error occurred:\", e)\n",
" return None\n",
"\n",
"\n",
"def analyze_read(image_path, output_path, show_first_output=False):\n",
" try:\n",
" with open(image_path, \"rb\") as f:\n",
" image_stream = f.read()\n",
"\n",
" document_analysis_client = DocumentAnalysisClient(\n",
" endpoint=endpoint, credential=AzureKeyCredential(key)\n",
" )\n",
"\n",
" poller = document_analysis_client.begin_analyze_document(\n",
" \"prebuilt-read\", image_stream)\n",
" result = poller.result()\n",
"\n",
" # Collect words, their polygon data, and confidence\n",
" words = []\n",
" confidence_text = \"\"\n",
" for page in result.pages:\n",
" for word in page.words:\n",
" words.append({\n",
" 'content': word.content,\n",
" 'polygon': word.polygon\n",
" })\n",
" confidence_text += \"'{}' confidence {}\\n\".format(word.content, word.confidence)\n",
"\n",
" document_content = result.content + \"\\n\\nConfidence Metrics:\\n\" + confidence_text\n",
" #parsed_info = parse_document_content(document_content)\n",
"\n",
" original_image = Image.open(image_path)\n",
" annotated_img = draw_boxes(image_path, words)\n",
"\n",
" # Set up PDF\n",
" output_filename = os.path.join(output_path, sanitize_filename(os.path.basename(image_path).replace('.png', '.pdf')))\n",
" c = canvas.Canvas(output_filename, pagesize=letter)\n",
" width, height = letter # usually 612 x 792\n",
"\n",
" # Draw original image\n",
" if original_image.height <= height:\n",
" c.drawImage(image_path, 0, height - original_image.height, width=original_image.width, height=original_image.height, mask='auto')\n",
" y_position = height - original_image.height\n",
" else:\n",
" # Image taller than the page: fall back to the top of the page (scaling logic still to be added)\n",
" y_position = height\n",
"\n",
" # Draw annotated image\n",
" annotated_image_path = '/tmp/annotated_image.png' # Temporary path for the annotated image\n",
" annotated_img.save(annotated_image_path)\n",
" if y_position - annotated_img.height >= 0:\n",
" c.drawImage(annotated_image_path, 0, y_position - annotated_img.height, width=annotated_img.width, height=annotated_img.height, mask='auto')\n",
" y_position -= annotated_img.height\n",
" else:\n",
" c.showPage() # Start a new page if not enough space\n",
" c.drawImage(annotated_image_path, 0, height - annotated_img.height, width=annotated_img.width, height=annotated_img.height, mask='auto')\n",
" y_position = height - annotated_img.height\n",
"\n",
" # Add text\n",
" textobject = c.beginText()\n",
" textobject.setTextOrigin(10, y_position - 15)\n",
" textobject.setFont(\"Times-Roman\", 12)\n",
"\n",
" for line in document_content.split('\\n'):\n",
" if textobject.getY() - 15 < 0: # Check if new page is needed for more text\n",
" c.drawText(textobject)\n",
" c.showPage()\n",
" textobject = c.beginText()\n",
" textobject.setTextOrigin(10, height - 15)\n",
" textobject.setFont(\"Times-Roman\", 12)\n",
" textobject.textLine(line)\n",
" \n",
" c.drawText(textobject)\n",
" c.save()\n",
"\n",
" # Show the first output\n",
" if show_first_output:\n",
" os.system(f\"open {output_filename}\")\n",
"\n",
" except Exception as e:\n",
" print(f\"An error occurred while processing {image_path}: {e}\")\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" input_folder = '/projectnb/sparkgrp/ml-herbarium-grp/fall2023/LLM_Specimens'\n",
" output_folder = '/projectnb/sparkgrp/ml-herbarium-grp/fall2023/AzureVision-results'\n",
" first_output_shown = False\n",
"\n",
" # Create the output folder if it doesn't exist\n",
" if not os.path.exists(output_folder):\n",
" os.makedirs(output_folder)\n",
"\n",
" # Iterate over each image in the input folder\n",
" for image_file in os.listdir(input_folder):\n",
" image_path = os.path.join(input_folder, image_file)\n",
" \n",
" # Check if the file is an image\n",
" if image_path.lower().endswith(('.png', '.jpg', '.jpeg')):\n",
" analyze_read(image_path, output_folder, show_first_output=not first_output_shown)\n",
" first_output_shown = True # Ensure that only the first output is shown\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f27d0103",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
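The filename-sanitization and bounding-box helpers in AzureVision.ipynb are pure functions and can be sanity-checked without an Azure account. A minimal standalone sketch follows; the `Point` namedtuple is an illustrative stand-in for the point objects the Azure SDK returns (an assumption for testing only):

```python
import re
from collections import namedtuple

# Stand-in for the point objects returned by the Azure SDK (illustrative assumption)
Point = namedtuple("Point", ["x", "y"])

def sanitize_filename(filename):
    # Keep only alphanumerics, underscores, whitespace, dots, and hyphens
    return re.sub(r'[^\w\s\.-]', '', filename)

def format_bounding_box(bounding_box):
    # Render a polygon as "[x, y], [x, y], ..." or "N/A" when absent
    if not bounding_box:
        return "N/A"
    return ", ".join(["[{}, {}]".format(p.x, p.y) for p in bounding_box])

print(sanitize_filename("Text_Sample_19?.png"))          # stray '?' is stripped
print(format_bounding_box([Point(0, 0), Point(10, 0)]))  # "[0, 0], [10, 0]"
print(format_bounding_box(None))                          # "N/A"
```

Keeping these helpers free of SDK dependencies makes the PDF-generation path testable even when the Form Recognizer quota is exhausted, as in the 403 errors logged above.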
99 changes: 99 additions & 0 deletions POC/ChineseGPT4Vision.ipynb

Large diffs are not rendered by default.

