Feature/data importer #9408
Conversation
Signed-off-by: Ruchi Sharma <[email protected]>
ℹ️ Manual Changeset Creation Reminder: Please ensure a manual commit for changeset file 9408.yml under the folder changelogs/fragments to complete this PR. If you want to use the available OpenSearch Changeset Bot App to avoid manual creation of the changeset file, you can install it in your forked repository by following this link. For more information about the formatting of changeset files, please visit the OpenSearch Auto Changeset and Release Notes Tool.
config/opensearch_dashboards.yml (Outdated)

@@ -371,7 +371,7 @@
 # dynamic_config_service.enabled: false

 # Set the value to true to enable direct data import from a file
-# data_importer.enabled: false
+data_importer.enabled: true
I'd recommend not source-controlling this config file. Historically, people have created PRs with passwords or IPs that they had to clean up later. You can add a new package.json script, or you can start OSD like:
yarn start --data_importer.enabled=true --opensearch_security.enabled=false
+1, I usually use git's skip-worktree to locally ignore config file changes. But TIL you can pass configs on the fly.
        className="customSearchBar"
      />
    </div>
    <div style={{ height: 'calc(100% - 40px)', overflowY: 'auto' }}>
avoid inline styling
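For example, a minimal sketch of moving the style into a class (the component, class, and SCSS file names below are illustrative, not from this PR):

```tsx
// data_importer_preview.tsx: illustrative file/component name.
// The matching rule would live in the plugin's SCSS, for example:
//   .dataImporter__previewPane { height: calc(100% - 40px); overflow-y: auto; }
import React from 'react';

export const PreviewPane: React.FC<{ children?: React.ReactNode }> = ({ children }) => (
  // Same layout as the inline style, but themable and reusable via the class.
  <div className="dataImporter__previewPane">{children}</div>
);
```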
@@ -1527,6 +1527,29 @@
   resolved "https://registry.yarnpkg.com/@faker-js/faker/-/faker-7.6.0.tgz#9ea331766084288634a9247fcd8b84f16ff4ba07"
   integrity sha512-XK6BTq1NDMo9Xqw/YkYyGjSsg44fbNwYRx7QK2CuoQgyy+f1rrTDHoExVM5PsyXCtfl2vs2vVJ0MN0yN6LppRw==

+"@fast-csv/[email protected]":
Do we have dependency changes that require a lockfile change?
    if (file) {
      setIsLoadingPreview(true);
      const reader = new FileReader();
      reader.onload = (event) => {
Can we avoid reading the entire file into memory? This can be problematic for larger files. I added a way in the /_preview route to parse only the first 10 entries of the document, so the file contents (and index mappings) can be found by calling /_preview.
The /_preview route can handle file parsing, since not all file formats are the same (e.g. entries may not be newline-delimited).
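As a rough illustration, a client-side sketch of leaning on the server route rather than FileReader; the route path, query parameters, and response shape here are assumptions for the example, not the plugin's actual contract:

```typescript
// Structural http type so the sketch does not depend on a specific core import path.
interface HttpClient {
  post<T = unknown>(path: string, options?: Record<string, unknown>): Promise<T>;
}

// Assumed response shape: the first few parsed entries plus any derived mapping.
interface PreviewResponse {
  documents: Array<Record<string, unknown>>;
  predictedMapping?: Record<string, unknown>;
}

export async function previewFile(
  http: HttpClient,
  file: File,
  fileType: string
): Promise<PreviewResponse> {
  // Hand the raw file to the server; the /_preview handler parses only the
  // first N entries, so the client never reads the whole file into memory.
  const body = new FormData();
  body.append('file', file);

  return http.post<PreviewResponse>('/api/data_importer/_preview', {
    body,
    query: { fileType },
    // Leave Content-Type unset so the browser adds the multipart boundary.
    headers: { 'Content-Type': undefined },
  });
}
```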
        onSelectedDataSources: onDataSourceSelect,
        onManageDataSource: () => {}, // Add a proper handler if needed
      }}
      onManageDataSource={function (): void {
Do we need this? The DataSourceMenuComponent is only needed to read/grab dataSource information.
@@ -132,7 +162,7 @@ export const DataImporterPluginApp = ({
   http,
   inputFile,
   // TODO This should be determined from the index name textbox/selectable
-  false,
+  true,
Is there a reason this is set to true? This boolean controls whether the index name refers to a new or a pre-existing OpenSearch index.
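For illustration, a minimal sketch of deriving that flag from the index-name control instead of hard-coding it (the helper and argument names are hypothetical):

```typescript
// Returns true only when the typed index name does not match an index that
// already exists in the target cluster, i.e. the import should create a new index.
export function shouldCreateNewIndex(indexName: string, existingIndexNames: string[]): boolean {
  const normalized = indexName.trim().toLowerCase(); // OpenSearch index names are lowercase
  return normalized.length > 0 && !existingIndexNames.includes(normalized);
}
```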
      </div>
      {loadedRows < totalRows && (
        <EuiButton onClick={loadMoreRows} style={{ marginTop: '20px' }}>
          Click to See More
If we have some sort of lazy loading, then we probably need server-side changes to return more docs. Not a blocking comment.
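A rough sketch of what a paginated preview request could look like if the server returned documents page by page; the route, parameter names, and response shape are assumptions for the example:

```typescript
// Assumed paginated contract: the server would need to accept offset/limit
// (or a cursor) and return one page of parsed documents at a time.
interface PreviewPage {
  documents: Array<Record<string, unknown>>;
  total: number;
}

export async function fetchPreviewPage(
  http: { post<T = unknown>(path: string, options?: Record<string, unknown>): Promise<T> },
  fileId: string,
  offset: number,
  limit: number
): Promise<PreviewPage> {
  return http.post<PreviewPage>('/api/data_importer/_preview', {
    query: { fileId, offset: String(offset), limit: String(limit) },
  });
}
```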
  const [showDelimiterChoice, setShowDelimiterChoice] = useState<boolean>(shouldShowDelimiter());
  const [delimiter, setDelimiter] = useState<string | undefined>(
    dataType === CSV_FILE_TYPE ? CSV_SUPPORTED_DELIMITERS[0] : undefined
  );
  const [visibleRows, setVisibleRows] = useState<number>(10);
Since we are using 10 a lot as the increment, can we export these instances as a constant in constant.ts?
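For example (the constant name is a suggestion, not from this PR):

```typescript
// constant.ts
export const PREVIEW_ROW_INCREMENT = 10;

// Usage in the component:
//   const [visibleRows, setVisibleRows] = useState<number>(PREVIEW_ROW_INCREMENT);
//   const loadMoreRows = () => setVisibleRows((rows) => rows + PREVIEW_ROW_INCREMENT);
```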
DCO check failed. Can you edit your commits to include the required sign-off?
@ruchidh Could you add more details to this PR and link the related issue in the description?
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@              Coverage Diff               @@
##        feature/data-importer    #9408   +/-   ##
========================================================
  Coverage                      ?   61.72%
========================================================
  Files                         ?     3833
  Lines                         ?    92185
  Branches                      ?    14603
========================================================
  Hits                          ?    56897
  Misses                        ?    31618
  Partials                      ?     3670

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
Signed-off-by: Ruchi Sharma <[email protected]>
@@ -87,10 +87,11 @@ export function importFileRoute(
     }
   } else {
     try {
+      const mapping = request.body.mapping ? JSON.parse(request.body.mapping) : {};
thanks for catching that
        dataSource: schema.maybe(schema.string()),
      }),
    },
    validate: false,
Why aren't we validating here? Shouldn't we enforce dataSourceId if it's provided as a query param?
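For example, a minimal sketch of validating the query instead of using `validate: false`; the schema name and the minLength constraint are assumptions for illustration:

```typescript
import { schema } from '@osd/config-schema';

// Optional dataSourceId, but when present it must be a non-empty string.
export const importQuerySchema = schema.object({
  dataSourceId: schema.maybe(schema.string({ minLength: 1 })),
});

// Then in the route definition, roughly:
//   validate: { query: importQuerySchema, body: /* existing body schema */ }
```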
Signed-off-by: Ruchi Sharma <[email protected]>
❌ Invalid Prefix For Manual Changeset Creation: Invalid description prefix "feat" found. Only the "skip" entry option is permitted for manual commits of changeset files. If you were trying to skip the changelog entry, please use the "skip" entry option in the ## Changelog section of your PR description.
Signed-off-by: Ruchi Sharma <[email protected]>
Merged 1a51a92 into opensearch-project:feature/data-importer
Description
This PR introduces the P0 features for the Data Importer, enabling users to import structured data into OpenSearch in a lightweight manner. The initial release is functional for testing and includes core import functionalities.
Key Features Implemented:
✅ Cluster/Domain/Collection Selection & Permission Verification - Users can select a target OpenSearch cluster and verify if they have the necessary permissions to add data or create an index.
✅ Single File Import - Supports importing a single file up to the specified size limit.
✅ Partial Success Handling & Error Notifications - If some data fails to import, users receive notifications with details about the failed records (a sketch of this flow follows the list below).
✅ Common Data Format Support - Allows importing structured data in widely used formats such as CSV, JSON, and NDJSON.
✅ Preview Top Rows of Structured Data - Before import, users can preview the top X rows of the dataset.
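As a rough sketch of how partial-success reporting could surface failed records to the user (the response shape, helper, and toast interface below are assumptions, not the plugin's actual API):

```typescript
// Assumed import result shape and a minimal toast surface for the sketch.
interface ImportResult {
  total: number;
  failedRows: Array<{ row: number; error: string }>;
}

interface Toasts {
  addSuccess(title: string): void;
  addWarning(title: string): void;
}

export function reportImportResult(toasts: Toasts, result: ImportResult): void {
  const failed = result.failedRows.length;
  if (failed === 0) {
    toasts.addSuccess(`Imported all ${result.total} records`);
  } else {
    // Detailed per-row errors could be rendered in a callout or flyout.
    toasts.addWarning(
      `Imported ${result.total - failed} of ${result.total} records; ${failed} failed`
    );
  }
}
```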
Not in Scope for This Release:
🚫 Reviewing the entire file before import.
🚫 Importing multiple files simultaneously.
🚫 Handling very large files beyond the defined size limit.
Next Steps:
If adoption meets the defined threshold, we will proceed with P1 features, including an editable table for resolving data inconsistencies and post-import verification enhancements.
Testing & Validation:
Verified import functionality for supported file types.
Tested permission validation for cluster selection.
Checked error handling for partial failures.
Confirmed correct previewing of structured data.
Screenshot
Changelog
Check List
yarn test:jest
yarn test:jest_integration