Encountering issues while transferring data into StarRocks from a CSV file. The data transfer type is stream_load. #328

Open
pawan-chauhan-9560 opened this issue Jun 20, 2024 · 2 comments

Comments

@pawan-chauhan-9560

pawan-chauhan-9560 commented Jun 20, 2024

Issue Description

privateIp :- FE_Url

Description of the Issue:

We created a Parquet/CSV file from SQL Server. While inserting the data into StarRocks, we are encountering an error.

SQL table structure:

| id | productid | productName |
|----|-----------|-------------|
| 1 | 42 | Cookie |
| 2 | 43 | Ice cream, Frozen Desert |

CSV file structure:
1,42,Cookie
2,43,"Ice cream, Frozen Desert"
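The second row is the interesting one: its last field contains a comma inside double quotes. A minimal sketch of what a loader sees if it splits on `,` without honoring quotes (the helper name `count_fields_naive` is hypothetical and only for illustration):

```bash
#!/bin/bash
# Count fields the way a plain comma split would, ignoring quotes.
# Hypothetical helper, not part of Sling or StarRocks.
count_fields_naive() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

count_fields_naive '1,42,Cookie'                      # prints 3
count_fields_naive '2,43,"Ice cream, Frozen Desert"'  # prints 4
```

A quote-unaware split yields four values for a three-column table, which matches the "expected = 3, actual = 4" message reported further down.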

Sling version 1.2.11

Operating System linux

Replication Configuration:

Command I am using to insert data into StarRocks:

```bash
#!/bin/bash

# Define the log file path
LOGFILE="/opt/sling/slingDataTranfserlog_19_06_24.log"

# Iterate over each .csv file in the directory
for file in /opt/sling/csvnew/part.01.0460.csv; do
  echo "Uploading $file..." >> "$LOGFILE" 2>&1

  # Use curl to upload the file and capture the response
  response=$(curl --location-trusted -u 'user:password' \
    -H "Expect: 100-continue" \
    -H "column_separator: ," \
    -H "columns: id,product,productName" \
    -H "skip_header: 1" \
    -T "$file" \
    -X PUT \
    http://privateIP:8030/api/DatabaseName/TableName/_stream_load 2>&1)

  # Log the response
  echo "$response" >> "$LOGFILE" 2>&1
done
```

streams: Stream

source: CSV file (created from SQL Server)
target: StarRocks
streams:
  ...
  • Log Output (please run command with -d):
{
    "TxnId": 1386825,
    "Label": "37776833-c498-41e9-aa4e-2c81dec9eb33",
    "Status": "Fail",
    "Message": "too many filtered rows",
    "NumberTotalRows": 100000,
    "NumberLoadedRows": 99754,
    "NumberFilteredRows": 246,
    "NumberUnselectedRows": 0,
    "LoadBytes": 5018672,
    "LoadTimeMs": 243,
    "BeginTxnTimeMs": 1,
    "StreamLoadPlanTimeMs": 2,
    "ReadDataTimeMs": 1,
    "WriteDataTimeMs": 239,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://privateIp:8040/api/_load_error_log?file=error_log_c244f8c539780e6f_8a3e7085f85dc593"
}
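
The response reports 246 filtered rows out of 100000. Assuming the load runs with the default `max_filter_ratio` of 0 that the StarRocks Stream Load documentation describes (the command in this report does not set it), a single filtered row would be enough to fail the whole load with "too many filtered rows". A small sketch of the ratio, using a hypothetical helper `filtered_ratio`:

```bash
#!/bin/bash
# Compute NumberFilteredRows / NumberTotalRows from the load response.
# Hypothetical helper for illustration only.
filtered_ratio() {
  awk -v filtered="$1" -v total="$2" 'BEGIN { printf "%.5f\n", filtered / total }'
}

filtered_ratio 246 100000   # prints 0.00246
```

Raising the tolerance would only hide the rejected rows; the rows themselves would still not be loaded.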


Error: Value count does not match column count: expected = 3, actual = 4. Column separator: ',', Row delimiter: '\n'. Row: 2,43,"Ice cream, Frozen Desert"
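
The row in the error message is being split at the comma inside the quoted field, which suggests the load is not treating double quotes as field enclosures. One possible workaround, as an untested sketch: the StarRocks Stream Load documentation describes an `enclose` header for CSV loads (in v3.0 and later, to my knowledge), so the same request could be sent with that header added. The function name `stream_load_with_enclose` is hypothetical, and the credentials, host, database, and table are the placeholders from the report.

```bash
#!/bin/bash
# Sketch only: same request as in the report, plus an "enclose" header.
# Assumption: the target StarRocks version accepts "enclose" for CSV loads.
stream_load_with_enclose() {
  local file="$1"
  curl --location-trusted -u 'user:password' \
    -H "Expect: 100-continue" \
    -H "column_separator: ," \
    -H 'enclose: "' \
    -H "columns: id,product,productName" \
    -H "skip_header: 1" \
    -T "$file" \
    -X PUT \
    "http://privateIP:8030/api/DatabaseName/TableName/_stream_load"
}
```

Calling `stream_load_with_enclose /opt/sling/csvnew/part.01.0460.csv` would send the file once. Whether this resolves the error depends on the StarRocks version in use, which the report does not state.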
@flarco
Collaborator

flarco commented Jun 20, 2024

Hi, I cannot test without a file. Can you produce a sample file that triggers the error and share it, so I can reproduce it? You can email it to [email protected] if you prefer.

@pawan-chauhan-9560
Author

We have shared a sample dataset with [[email protected]]. Please check it.
