Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/data importer #9408

Merged

Conversation

ruchidh
Copy link

@ruchidh ruchidh commented Feb 18, 2025

Description

Description:
This PR introduces the P0 features for the Data Importer, enabling users to import structured data into OpenSearch in a lightweight manner. The initial release is functional for testing and includes core import functionalities.

Key Features Implemented:
✅ Cluster/Domain/Collection Selection & Permission Verification - Users can select a target OpenSearch cluster and verify if they have the necessary permissions to add data or create an index.
✅ Single File Import - Supports importing a single file up to the specified size limit.
✅ Partial Success Handling & Error Notifications - If some data fails to import, users receive notifications with details about the failed records.
✅ Common Data Format Support - Allows importing structured data in widely used formats such as CSV, JSON, and NDJSON.
✅ Preview Top Rows of Structured Data - Before import, users can preview the top X rows of the dataset.

Not in Scope for This Release:
🚫 Reviewing the entire file before import.
🚫 Importing multiple files simultaneously.
🚫 Handling very large files beyond the defined size limit.

Next Steps:
If adoption meets the defined threshold, we will proceed with P1 features, including an editable table for resolving data inconsistencies and post-import verification enhancements.

Testing & Validation:

Verified import functionality for supported file types.
Tested permission validation for cluster selection.
Checked error handling for partial failures.
Confirmed correct previewing of structured data.

Screenshot

Changelog

  • feat: Data importer P0 release

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

Copy link
Contributor

ℹ️ Manual Changeset Creation Reminder

Please ensure manual commit for changeset file 9408.yml under folder changelogs/fragments to complete this PR.

If you want to use the available OpenSearch Changeset Bot App to avoid manual creation of changeset file you can install it in your forked repository following this link.

For more information about formatting of changeset files, please visit OpenSearch Auto Changeset and Release Notes Tool.

@@ -371,7 +371,7 @@
# dynamic_config_service.enabled: false

# Set the value to true to enable direct data import from a file
# data_importer.enabled: false
data_importer.enabled: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd recommend not source controlling this config file. historically, people have created PRs with passwords or IPs that they had to clean up later.

You can add a new package.json script or you can start OSD like

yarn start --data_importer.enabled=true --opensearch_security.enabled=false

Copy link
Member

@huyaboo huyaboo Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I usually use git skip to locally ignore config file changes. But TIL you can add configs on the fly

className="customSearchBar"
/>
</div>
<div style={{ height: 'calc(100% - 40px)', overflowY: 'auto' }}>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid inline styling

@@ -1527,6 +1527,29 @@
resolved "https://registry.yarnpkg.com/@faker-js/faker/-/faker-7.6.0.tgz#9ea331766084288634a9247fcd8b84f16ff4ba07"
integrity sha512-XK6BTq1NDMo9Xqw/YkYyGjSsg44fbNwYRx7QK2CuoQgyy+f1rrTDHoExVM5PsyXCtfl2vs2vVJ0MN0yN6LppRw==

"@fast-csv/[email protected]":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have dependency changes that require a lockfile change?

if (file) {
setIsLoadingPreview(true);
const reader = new FileReader();
reader.onload = (event) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid reading the entire file into memory? This can be problematic for larger files. I added in the /_preview route a way to parse only the first 10 entries of the document so the file contents (and index mappings) can be found by calling /_preview.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /_preview can handle file parsing since not all file formats are the same (e.g. entries may not be newline delimited)

onSelectedDataSources: onDataSourceSelect,
onManageDataSource: () => {}, // Add a proper handler if needed
}}
onManageDataSource={function (): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? The DataSourceMenuComponent is needed to only read/grab dataSource information

@@ -132,7 +162,7 @@ export const DataImporterPluginApp = ({
http,
inputFile,
// TODO This should be determined from the index name textbox/selectable
false,
true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this is set to true? This boolean controls whether the index name is a new or pre-existing OpenSearch index.

</div>
{loadedRows < totalRows && (
<EuiButton onClick={loadMoreRows} style={{ marginTop: '20px' }}>
Click to See More
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have some sort of lazy loading, then we probably need changes server side to return more docs. Not a blocking comment.

const [showDelimiterChoice, setShowDelimiterChoice] = useState<boolean>(shouldShowDelimiter());
const [delimiter, setDelimiter] = useState<string | undefined>(
dataType === CSV_FILE_TYPE ? CSV_SUPPORTED_DELIMITERS[0] : undefined
);
const [visibleRows, setVisibleRows] = useState<number>(10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using 10 a lot as the increment, can we export these instances as a constant in constant.ts?

@huyaboo
Copy link
Member

huyaboo commented Feb 18, 2025

DCO check failed. Can you edit your commits to --signoff

Copy link
Member

@zhongnansu zhongnansu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruchidh Could you add more details of this PR and linked the issue to the description?

Copy link

codecov bot commented Feb 18, 2025

Codecov Report

Attention: Patch coverage is 22.00000% with 39 lines in your changes missing coverage. Please review.

Please upload report for BASE (feature/data-importer@7d2c8a7). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...a_importer/public/components/data_importer_app.tsx 31.25% 19 Missing and 3 partials ⚠️
.../data_importer/public/components/preview_table.tsx 6.25% 15 Missing ⚠️
...mporter/public/components/import_type_selector.tsx 0.00% 2 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                    @@
##             feature/data-importer    #9408   +/-   ##
========================================================
  Coverage                         ?   61.72%           
========================================================
  Files                            ?     3833           
  Lines                            ?    92185           
  Branches                         ?    14603           
========================================================
  Hits                             ?    56897           
  Misses                           ?    31618           
  Partials                         ?     3670           
Flag Coverage Δ
Linux_1 28.99% <ø> (?)
Linux_2 56.46% <ø> (?)
Linux_4 28.91% <ø> (?)
Windows_1 29.00% <ø> (?)
Windows_2 56.41% <ø> (?)
Windows_3 39.34% <22.00%> (?)
Windows_4 28.91% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -87,10 +87,11 @@ export function importFileRoute(
}
} else {
try {
const mapping = request.body.mapping ? JSON.parse(request.body.mapping) : {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching that

dataSource: schema.maybe(schema.string()),
}),
},
validate: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we validating here? shouldn't we enforce dataSourceId if it's provided as a query param?

Signed-off-by: Ruchi Sharma <[email protected]>
Signed-off-by: Ruchi Sharma <[email protected]>
Copy link
Contributor

❌ Invalid Prefix For Manual Changeset Creation

Invalid description prefix. Found "feat". Only "skip" entry option is permitted for manual commit of changeset files.

If you were trying to skip the changelog entry, please use the "skip" entry option in the ##Changelog section of your PR description.

@huyaboo huyaboo merged commit 1a51a92 into opensearch-project:feature/data-importer Feb 25, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants