0.4 Delta
What's New:
- Added Vector Hybrid Search, which combines vector similarity with keyword matching to improve search accuracy.
- Added document processing pipeline steps to generate embeddings for text-based files. Bring your own embedding model (Azure OpenAI or an open-source embedding model).
- Extended the document processing pipeline with richer language detection and translation to avoid common errors with out-of-the-box (OOTB) Azure Cognitive Search skillsets
- Switched to direct search index inserts instead of Azure Cognitive Search Indexer and skillsets
- Restructured and added vector columns to Azure Cognitive Search Index (expanded JSON into index fields)
- Updated the UX to embed the user's query and execute Vector Hybrid Search with Semantic
- Added a pipeline to process images and store enrichments as keywords in the Azure Cognitive Search index, allowing users to run text-to-image search.
- Added iFrame rendering of source documents and images under the citation panel of the UX.
- Added support for several new file types using Unstructured.io
  - Text-based: pdf, docx, html, htm, csv, md, pptx, txt, json, xlsx, xml, eml, msg
  - Images: jpg, jpeg, png, gif, bmp, tif, tiff
- Added support for US Government deployments
- Added filtered query support for Azure Cognitive Search index fields
- Enabled uploading to a folder and adding tags to uploaded files in the UX
- Enabled filtering search by "folder" and/or "tags" fields in the Adjust panel of the UX
- Added functional testing of document pre-processing pipelines and embeddings REST APIs
- Added branding updates that allow a warning banner and UX title updates
- Enhanced infrastructure and application logging
  - Detailed chunk-based logging for embeddings and indexing
  - New Azure Workbook to help investigate infrastructure-level errors (e.g. App Service not starting up correctly)
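For readers new to hybrid search, the sketch below illustrates one common way to merge a keyword ranking and a vector ranking into a single result list: Reciprocal Rank Fusion (RRF), the fusion method Azure Cognitive Search documents for hybrid queries. This is an illustration only, not code from this release; the function name, the document ids, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking.

    Hypothetical helper for illustration. Each document earns 1 / (k + rank)
    from every list it appears in; k=60 is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for a keyword query and a vector query.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-c", "doc-a", "doc-d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why hybrid search can surface results that either method alone would rank lower.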
What's Changed
- Merge pull request #204 from microsoft/vNext-Dev by @dayland in #208
- hot fix for missing subtitle in non-pdf by @georearl in #210
- Forward integrate HF for non-PDF missing subtitle attribute by @dayland in #214
- Update README.md for model version support by @ArpitaisAn0maly in #216
- Geearl/5974 acr bicep by @georearl in #199
- Gov init by @rohrerb in #212
- Model check Fix by @rohrerb in #223
- Bash variable if check repair by @rohrerb in #224
- Sovereign Setup Instructions by @rohrerb in #222
- Geearl/5977 container app service by @georearl in #205
- Temperature setting by @asbanger in #215
- updated default model to gpt-35-turbo-16k by @dayland in #226
- Geearl/5982 appservice debug 2 by @georearl in #217
- Fix for Deployment Name by @rohrerb in #229
- Parameter for environment so we can share the vNext.yml by @rohrerb in #228
- Gov Automation Fix. by @rohrerb in #232
- Some more az login fixes for automation. by @rohrerb in #233
- Temp Build Fix for Gov by @rohrerb in #234
- Geearl/5972 generate embeddings by @georearl in #230
- 6012 Add utils method for extracting html charset else default to utf8 by @ryonsteele in #239
- brrohrer/6010 enrichment func gov by @rohrerb in #237
- AAD Regression Fix by @rohrerb in #240
- Lwilk/5892 image func by @lmwilki in #227
- Geearl/6053 large files missing chunks by @georearl in #242
- 5993 Change how page numbers are collected and stored for chunks by @ryonsteele in #243
- requeue logic with max retries by @georearl in #241
- Geearl/6055 missing index key values by @georearl in #244
- Fix passing of api variable to function by @dayland in #245
- brrohrer/6050-GovIngestPipeline by @rohrerb in #247
- Move installation into startup entrypoint script to reduce build by @ryonsteele in #246
- Ryonsteele/6063 default responselength change by @ryonsteele in #249
- Add build-containers to pipelines by @dayland in #250
- updates to process flow diagram for embeddings by @dayland in #248
- Aparmar/5661 hybrid search by @ArpitaisAn0maly in #255
- 6105/brrohrer stack trace logging by @rohrerb in #253
- Geearl/6070 non pdf by @georearl in #256
- Ryonsteele/int test by @ryonsteele in #254
- Geearl/6112 missing chunk index by @georearl in #257
- Fix embeddings mismatch between pipeline and REST API by @dayland in #258
- Improvements to functional tests by @dayland in #260
- Branding and UX updates by @dayland in #261
- Merging 0.3-Gamma hotfixes into vNext-Dev by @dayland in #264
- Fix to have non-pdf files honor the DEV_CODE flags by @dayland in #265
- Add image enrichment to index for image function by @ryonsteele in #259
- Add more test case files and types by @ryonsteele in #266
- Extended logging to UX for uploaded files by @dayland in #267
- Enable citations to use new vector index fields by @dayland in #268
- Ryonsteele/6108 cicd functional by @ryonsteele in #272
- Aparmar/6151 enable hybrid semantic search by @ArpitaisAn0maly in #275
- Updates for Sprint 12 arch changes by @dayland in #273
- Ryonsteele/6115 api tests by @ryonsteele in #277
- Update to use model name over deployment name by @dayland in #279
- Fix to put calculated chunk name in index for images by @dayland in #280
- Create separate pipeline ymls for gov and com by @ryonsteele in #283
- Updating the GPT-35-Turbo-16K capacity to default by @asbanger in #284
- Fix for AOAI BICEP deployment by @dayland in #287
- Fixing the GitHub broken Links by @asbanger in #281
- Ryonsteele/indexing statuslog by @ryonsteele in #289
- Change to create separate func ASP and resolve enrichment concurrency by @ryonsteele in #293
- Geearl/6184 function timeout by @georearl in #294
- ReadMe update for Image Search for Delta release. by @ArpitaisAn0maly in #296
- Prompt to delete index on deploy if vector size changes by @ryonsteele in #297
- Vector Search Doc Update to ReadMe by @ArpitaisAn0maly in #300
- Update preprocessing feature document for unstructured and image by @ryonsteele in #298
- Remove stack from upload statuslog by @ryonsteele in #301
- Geearl/6130 azure caps by @georearl in #303
- Fix env variable misspelling by @ryonsteele in #304
- Dabiscup/applicationtitle2 by @danimal521 in #302
- Resolve startup failure with aoai model usage by @ryonsteele in #307
- brrohrer/6199 - Unstructured JSON Support by @rohrerb in #306
- fixes for email processing and minor document updates by @georearl in #309
- Ryonsteele/6200 fileupload bugfix by @ryonsteele in #312
- Ryonsteele/6198 workbook infra by @ryonsteele in #314
- Updating local.env.example for clarity on embedding models by @ArpitaisAn0maly in #315
- Add new util method for getting chunk path by @ryonsteele in #313
- Logging updates by @ryonsteele in #317
- Remove doc and ppt from the list of doctypes in diagram by @ryonsteele in #319
- Add concurrency for backend api and embed api error handling by @ryonsteele in #320
- Added UX for folders and tags by @dayland in #288
- Ltierney/docupdates by @lon-tierney in #321
- Update local.env.example to match deployment guide for required fields by @dayland in #324
- Vector query fix applied for non semantic hybrid by @ArpitaisAn0maly in #325
- Ryonsteele/6218 fix folder tabbing issue by @ryonsteele in #326
- Ryonsteele/6220 lock enrichment versions by @ryonsteele in #327
- Add known issue for jq params issue to check config and rerun by @ryonsteele in #329
- Update to embedding Model parameter Name to reflect Deployment Name by @ArpitaisAn0maly in #330
- Aparmar/delta bm25similarity by @ArpitaisAn0maly in #333
- Adding new enrichments to keyword search by @dayland in #334
- 0.4-Delta Release by @dayland in #336
New Contributors
- @rohrerb made their first contribution in #212
- @ryonsteele made their first contribution in #239
- @danimal521 made their first contribution in #302
Full Changelog: v0.3-Gamma...v0.4-Delta