change log level on record ingestion failures #458

Open: wants to merge 2 commits into main
client/src/nv_ingest_client/util/milvus.py: 16 changes (11 additions, 5 deletions)
@@ -363,9 +363,9 @@ def _pull_text(element, enable_text: bool, enable_charts: bool, enable_tables: b
pg_num = element["metadata"]["content_metadata"]["page_number"]
doc_type = element["document_type"]
if not verify_emb:
- logger.error(f"failed to find embedding for entity: {source_name} page: {pg_num} type: {doc_type}")
+ logger.info(f"failed to find embedding for entity: {source_name} page: {pg_num} type: {doc_type}")
Collaborator:

Anything that gets logged at info would (should) have been logged at error. Is this trying to solve for missing error information?

Collaborator (author):

No, this is so that we don't see these logs all the time. At error level they always show up, and we got requests to make this less noisy.
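To make the trade-off in this thread concrete, here is a minimal sketch using only the standard library logging module. It assumes the caller never configures logging, so the effective threshold is Python's default of WARNING. The logger name demo.milvus, the capture helper, and the sample entity values are hypothetical stand-ins, not part of nv_ingest_client. The sketch shows that an ERROR record always reaches the handler, while an INFO record is dropped until the threshold is lowered to INFO.

```python
import io
import logging


def capture(threshold: int, record_level: int) -> str:
    """Log one 'missing text' message at record_level on a logger whose
    threshold is `threshold`, and return what the handler actually wrote."""
    stream = io.StringIO()
    logger = logging.getLogger("demo.milvus")  # hypothetical stand-in name
    logger.handlers.clear()       # start each run with a fresh handler
    logger.propagate = False      # keep output away from the root logger
    logger.setLevel(threshold)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    # Same message shape as _pull_text; the values are made up.
    logger.log(record_level, "failed to find text for entity: doc.pdf page: 3 type: text")
    return stream.getvalue()


# Default threshold (WARNING): the old logger.error call always prints...
print(repr(capture(logging.WARNING, logging.ERROR)))
# ...while the new logger.info call is silently dropped (prints '').
print(repr(capture(logging.WARNING, logging.INFO)))
# A user who lowers the threshold to INFO sees the message again.
print(repr(capture(logging.INFO, logging.INFO)))
```

At INFO, ERROR records print as well, which is the reviewer's point: demoting the call hides nothing from someone already running at INFO; it only silences these messages for everyone running at the default.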

if not text:
- logger.error(f"failed to find text for entity: {source_name} page: {pg_num} type: {doc_type}")
+ logger.info(f"failed to find text for entity: {source_name} page: {pg_num} type: {doc_type}")
# if we do find text but no embedding remove anyway
text = None
return text
@@ -387,7 +387,7 @@ def _insert_location_into_content_metadata(element, enable_charts: bool, enable_
source_name = element["metadata"]["source_metadata"]["source_name"]
pg_num = element["metadata"]["content_metadata"]["page_number"]
doc_type = element["document_type"]
- logger.error(f"failed to find location for entity: {source_name} page: {pg_num} type: {doc_type}")
+ logger.info(f"failed to find location for entity: {source_name} page: {pg_num} type: {doc_type}")
location = max_dimensions = None
element["metadata"]["content_metadata"]["location"] = location
element["metadata"]["content_metadata"]["max_dimensions"] = max_dimensions
@@ -408,6 +408,8 @@ def write_records_minio(
If a sparse model is supplied, it will be used to generate sparse
embeddings to allow for hybrid search. Will filter records based on
type, depending on what types are enabled via the boolean parameters.
+ If the user sets the log level to info, any time a record fails
+ ingestion, it will be reported to the user.

Parameters
----------
@@ -495,7 +497,9 @@ def create_bm25_model(
"""
This function takes the input records and creates a corpus,
factoring in filters (i.e. texts, charts, tables) and fits
- a BM25 model with that information.
+ a BM25 model with that information. If the user sets the log
+ level to info, any time a record fails ingestion, it will be
+ reported to the user.

Parameters
----------
@@ -543,7 +547,9 @@ def stream_insert_milvus(
"""
This function takes the input records and creates a corpus,
factoring in filters (i.e. texts, charts, tables) and fits
- a BM25 model with that information.
+ a BM25 model with that information. If the user sets the log
+ level to info, any time a record fails ingestion, it will be
+ reported to the user.

Parameters
----------
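The updated docstrings say failed records are reported once the log level is set to info. Below is a sketch of how a caller might opt in for this module only, assuming milvus.py creates its logger with logging.getLogger(__name__), which would make its name nv_ingest_client.util.milvus (check this against the installed package). The helper enable_ingestion_failure_reports and the sample message values are hypothetical; the sketch does not import nv_ingest_client.

```python
import logging

# Assumption: milvus.py uses logging.getLogger(__name__), so its logger
# would be named after the module path below. Verify before relying on it.
MILVUS_LOGGER = "nv_ingest_client.util.milvus"


def enable_ingestion_failure_reports(logger_name: str = MILVUS_LOGGER) -> logging.Logger:
    """Hypothetical helper: lower only this logger's threshold to INFO so
    skipped-record messages appear, while the root logger stays at WARNING."""
    # basicConfig is a no-op if the application already configured logging.
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = enable_ingestion_failure_reports()
    # Simulates the kind of message the PR now emits at INFO (values made up).
    log.info("failed to find embedding for entity: doc.pdf page: 3 type: text")
    # Other libraries inherit WARNING from the root logger, so this is dropped.
    logging.getLogger("some.other.library").info("this stays hidden")
```

Raising only the module logger to INFO, rather than the root logger, keeps other libraries' INFO output hidden. The records still reach the root handler because propagation checks handler levels, not the levels of ancestor loggers.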