
[Bug]: Bulk Insert failed during Milvus restore with GCS native buckets using Milvus Backup tool #39134

Open

gifi-siby opened this issue Jan 9, 2025 · 10 comments

Labels: kind/bug Issues or changes related a bug · triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone: 2.5.3

@gifi-siby
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Performing a restore with the Milvus Backup tool to a Milvus instance associated with a GCS native bucket fails with the following error:
workerpool: execute job no binlog to import, input=[paths:"nativemill/BackupMilvus864/flavorBack1/binlogs/insert_log/453580537997566884/453580537997566885/453580537997766901/" paths:"" ]: invalid parameter

Expected Behavior

The restore operation using the Milvus Backup tool should succeed without any error.

Steps To Reproduce

1. Create a backup using the Milvus Backup tool.
2. Restore the data to a Milvus instance associated with a GCS native bucket.

Milvus Log

No response

Anything else?

As part of BulkInsert, the ListObjects function of GCPNativeObjectStorage is called to list all the folders and files under a given path (backupPath/binlogs/insert_log/collectionID/partitionID/segmentID/groupID).
ListObjects returns only leaf files and skips folder-like objects, so an empty string slice is returned and no binlogs are found. A listing sketch is included after the references below.
This issue was found as part of the Milvus Backup tool enhancement:
Enhancement task: #444
PR raised: #495
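
For illustration, here is a minimal sketch of the listing behavior described above, assuming the standard cloud.google.com/go/storage client. This is not the Milvus GCPNativeObjectStorage code; the function name and the bucket/prefix values are placeholders. When listing with a "/" delimiter, sub-directories come back as ObjectAttrs entries whose Prefix field is set and whose Name is empty, so a caller that keeps only file names sees an empty result for a prefix that contains nothing but nested folders.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

// listWithDelimiter lists one level under prefix. With Delimiter set to "/",
// nested "directories" are returned as entries with only Prefix set;
// skipping them yields an empty result for a prefix that contains only
// sub-folders, which matches the "no binlog to import" failure above.
func listWithDelimiter(ctx context.Context, client *storage.Client, bucket, prefix string) ([]string, error) {
	it := client.Bucket(bucket).Objects(ctx, &storage.Query{
		Prefix:    prefix,
		Delimiter: "/",
	})
	var keys []string
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return nil, err
		}
		if attrs.Prefix != "" {
			// Folder-like entry (e.g. a segment or group directory);
			// dropping these is what leads to an empty binlog list.
			keys = append(keys, attrs.Prefix)
			continue
		}
		keys = append(keys, attrs.Name)
	}
	return keys, nil
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx) // uses Application Default Credentials
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder bucket and prefix, not values from this issue.
	keys, err := listWithDelimiter(ctx, client, "my-bucket", "backup/binlogs/insert_log/")
	if err != nil {
		log.Fatal(err)
	}
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

A caller that keeps the Prefix entries, or lists recursively without a delimiter, would still discover the binlog group directories under the insert_log path.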

@gifi-siby gifi-siby added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 9, 2025
@yanliang567
Contributor

/assign @huanghaoyuanhhy
/unassign

@sre-ci-robot
Contributor

@yanliang567: GitHub didn't allow me to assign the following users: huanghaoyuanhhy.

Note that only milvus-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @huanghaoyuanhhy
/unassign

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 10, 2025
@yanliang567 yanliang567 added this to the 2.5.3 milestone Jan 10, 2025
@gifi-siby
Contributor Author

/assign @gifi-siby

@xiaofan-luan
Collaborator

I think for Zilliz Cloud, the backup tool is actually used in AWS, GCP, and Azure environments.
Maybe this is an access issue? @huanghaoyuanhhy can you check the error message and see if it's clear?

@gifi-siby
Contributor Author

The error came up from Milvus with GCP native storage (using Google Cloud Storage libraries)

@xiaofan-luan
Collaborator

The error came up from Milvus with GCP native storage (using Google Cloud Storage libraries)

Do you mean gcp_native_object_storage?
I guess the backup tool is still using the MinIO-compatible API. Maybe copy this to the backup tool as well?

@gifi-siby
Contributor Author

I am enhancing the Milvus Backup tool to support GCP native storage as well (related PR: #495), and I hit this issue during the restore process. The error came from the BulkInsert operation:
workerpool: execute job no binlog to import

@huanghaoyuanhhy
Contributor

/assign

@huanghaoyuanhhy
Contributor

huanghaoyuanhhy commented Jan 15, 2025

It looks like both the Milvus and backup issues are caused by hierarchical namespaces; I am still trying to reproduce.

As I mentioned in zilliztech/milvus-backup#446, most of our tests and users use MinIO or S3 generic buckets. Hierarchical namespaces may cause inconsistent behavior of listObjects.

ref:
GCP
Azure
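
For reference, a sketch of a fully recursive listing under the same assumptions as the earlier example (cloud.google.com/go/storage client, placeholder function and names); it is only illustrative and is not the fix referenced in this thread. With no Delimiter set, every object under the prefix is returned directly, so the result does not depend on how folder entries are surfaced, which is one way to keep listing behavior consistent across flat and hierarchical-namespace buckets.

```go
package gcslist

import (
	"context"
	"strings"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

// ListAllFiles lists every object under prefix recursively (no Delimiter),
// so the result does not depend on how folder entries are represented.
func ListAllFiles(ctx context.Context, client *storage.Client, bucket, prefix string) ([]string, error) {
	it := client.Bucket(bucket).Objects(ctx, &storage.Query{Prefix: prefix})
	var names []string
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return nil, err
		}
		// Some tools create zero-byte placeholder objects whose names end
		// in "/"; treat them as directory markers and skip them.
		if strings.HasSuffix(attrs.Name, "/") {
			continue
		}
		names = append(names, attrs.Name)
	}
	return names, nil
}
```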

@xiaofan-luan
Collaborator

It looks like both the Milvus and backup issues are caused by hierarchical namespaces; I am still trying to reproduce.

As I mentioned in zilliztech/milvus-backup#446, most of our tests and users use MinIO or S3 generic buckets. Hierarchical namespaces may cause inconsistent behavior of listObjects.

ref: GCP Azure

The GCP native object storage implementation inside Milvus does not seem quite correct.
There is already a fix for that.

sre-ci-robot pushed a commit that referenced this issue Jan 18, 2025
…nsert failure on using GCS buckets (#39352)

Related task: #39134
Previous PR: #39150