[Bug]: When using text_match as a field name, the text match expression will throw an error: mismatched input 'text_match' expecting Identifier: invalid parameter. #40290

Open
zhuwenxing opened this issue Mar 3, 2025 · 4 comments
Labels: kind/bug, triage/accepted

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master/2.5
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

=== start connecting to Milvus     ===

Does collection hello_match_text exist in Milvus: False

=== Create collection `hello_match_text` ===


=== Start inserting entities       ===

Number of insert entities in Milvus: 6

=== Start querying with `text_match(text_match, 'milvus')` ===

2025-03-03 11:54:58,769 [ERROR][handler]: RPC error: [query], <MilvusException: (code=1100, message=failed to create query plan: cannot parse expression: text_match(text_match, 'milvus'), error: line 1:11 mismatched input 'text_match' expecting Identifier: invalid parameter)>, <Time:{'RPC start': '2025-03-03 11:54:58.753119', 'RPC error': '2025-03-03 11:54:58.768924'}> (decorators.py:140)
2025-03-03 11:54:58,769 [ERROR][query]: Failed to query collection: match_text (milvus_client.py:486)
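
One plausible reading of the error above is that the expression lexer treats `text_match` as a keyword, so the same text can never be an identifier in the argument position. The toy lexer below is only a sketch of that mechanism under that assumption. It is not Milvus's parser; `KEYWORDS`, `tokenize`, and `check_text_match_call` are hypothetical names made up for illustration.

```python
import re

# Toy keyword set; only `text_match` is taken from this issue.
KEYWORDS = {"text_match"}
TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<string>'[^']*')|(?P<punct>[(),]))"
)


def tokenize(expr):
    """Return (kind, text, column) triples; a keyword never lexes as an identifier."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot tokenize input at column {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text, m.start(m.lastgroup)))
        pos = m.end()
    return tokens


def check_text_match_call(expr):
    """Accept only the shape: text_match ( identifier , string )."""
    expected = [
        ("keyword", "text_match"),
        ("punct", "("),
        ("identifier", None),  # any field name that is not a keyword
        ("punct", ","),
        ("string", None),
        ("punct", ")"),
    ]
    tokens = tokenize(expr)
    for (want_kind, want_text), (kind, text, col) in zip(expected, tokens):
        if kind != want_kind or (want_text is not None and text != want_text):
            raise ValueError(
                f"line 1:{col} mismatched input {text!r} expecting {want_kind}"
            )
    if len(tokens) != len(expected):
        raise ValueError("unexpected number of tokens")
```

In this sketch, `check_text_match_call("text_match(doc, 'milvus')")` passes, while `check_text_match_call("text_match(text_match, 'milvus')")` raises at column 11, the position of the second `text_match`.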

Expected Behavior

If text_match cannot be used as a field name, an error should be thrown when creating the collection, rather than waiting until the text match operation is executed to report the error.
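
Until the server rejects such names at collection creation, a client-side guard could fail fast before `schema.add_field` is called. This is a minimal sketch, not part of pymilvus or Milvus: `RESERVED_WORDS` and `validate_field_name` are hypothetical, the set contains only the one name this issue shows to collide, and the case-insensitive comparison is a conservative assumption.

```python
# Hypothetical, incomplete list: only `text_match` is known from this issue.
RESERVED_WORDS = frozenset({"text_match"})


def validate_field_name(name, reserved=RESERVED_WORDS):
    """Return `name` unchanged, or raise if it collides with a reserved word."""
    # Compare case-insensitively (an assumption, chosen to err on the safe side).
    if name.lower() in reserved:
        raise ValueError(
            f"field name {name!r} collides with a reserved expression keyword"
        )
    return name
```

For example, `schema.add_field(validate_field_name("text_match"), ...)` would raise a `ValueError` before any collection is created, instead of failing later at query time.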

Steps To Reproduce

Milvus Log

Reproduction code

import numpy as np

from pymilvus import (
    MilvusClient,
    Function,
    FunctionType,
    DataType,
)

fmt = "\n=== {:30} ===\n"
collection_name = "match_text"
dim = 8

#################################################################################
# 1. connect to Milvus
# Connect to the Milvus server at the address passed to MilvusClient below
print(fmt.format("start connecting to Milvus"))
milvus_client = MilvusClient("http://10.104.32.135:19530")

has_collection = milvus_client.has_collection(collection_name, timeout=5)
print(f"Does collection hello_match_text exist in Milvus: {has_collection}")
if has_collection:
    milvus_client.drop_collection(collection_name)


schema = milvus_client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)
# set analyzer params in match_text field for more situations
# default as analyzer_params = {"type": "standard"}
schema.add_field("text_match", DataType.VARCHAR, max_length=1000, enable_analyzer=True, enable_match=True)
schema.add_field("embeddings", DataType.FLOAT_VECTOR, dim=dim)

print(fmt.format("Create collection `hello_match_text`"))

index_params = milvus_client.prepare_index_params()
index_params.add_index(
    "embeddings",
    index_type= "AUTOINDEX",
    metric_type= "IP"
)

milvus_client.create_collection(collection_name, schema=schema, index_params=index_params, consistency_level="Strong")

################################################################################
# 3. insert data
# We are going to insert 6 rows of data into `hello_match_text`
# Data to be inserted must be organized in fields.
#
# The insert() method returns:
# - either automatically generated primary keys by Milvus if auto_id=True in the schema;
# - or the existing primary key field from the entities if auto_id=False in the schema.

print(fmt.format("Start inserting entities"))

rng = np.random.default_rng(seed=19530)
num_entities = 6
keywords = ["milvus", "match", "search", "query", "analyzer", "tokenizer"]
embeddings = rng.random((num_entities, dim), np.float32)

entities = [{
        "id": i,
        "text_match":f"This is a test match_text {i} with keywords: {keywords[i]}",
        "embeddings": embeddings[i]
    } for i in range(num_entities)
]

insert_result = milvus_client.insert(collection_name, entities)
print(f"Number of insert entities in Milvus: {insert_result['insert_count']}")  # check the num_entities
milvus_client.flush(collection_name)

# ###############################################################################
# 4. query and scalar filtering search with text match
# After data were inserted into Milvus and indexed, you can perform:
# - query with text match expression
# - search data with text match filter

# -----------------------------------------------------------------------------
# query based text match with single keyword filter
filter = f"text_match(text_match, '{keywords[0]}')"
print(fmt.format(f"Start querying with `{filter}`"))

result = milvus_client.query(collection_name, filter, output_fields=["text_match"])
print(f"query result:\n-{result}")

Anything else?

No response

@zhuwenxing added the kind/bug and needs-triage labels Mar 3, 2025

dosubot bot commented Mar 3, 2025

The error you're encountering is due to the use of text_match as a field name, which conflicts with a reserved keyword in Milvus. Milvus has a mechanism to validate field names during collection creation to ensure they do not use reserved keywords. However, it seems that text_match is not being flagged during this validation process, leading to the error only appearing during query execution.

To resolve this issue, you should avoid using text_match as a field name. Instead, choose a different name that does not conflict with reserved keywords. This will prevent the error from occurring during query execution.

If you believe that text_match should be flagged as a reserved keyword during collection creation, it might be worth checking if the keyword list in the Milvus codebase needs updating or if there's a bug in the validation logic. The validateFieldName function is responsible for checking if a field name is a reserved keyword [1].


@zhuwenxing
Contributor Author

related to: #35873

#35873 (comment) should be updated

@yanliang567
Contributor

/assign @xiaocai2333
/unassign

@yanliang567 added the triage/accepted label and removed the needs-triage label Mar 3, 2025
@yanliang567 added this to the 2.5.6 milestone Mar 3, 2025

@xiaocai2333
Contributor

yeah, we need to update the keywords.
