SOLR-17609: Remove HDFS module #2923

Open. epugh wants to merge 42 commits into main.

epugh (Contributor) commented Dec 21, 2024

https://issues.apache.org/jira/browse/SOLR-17609

Description

Explore the impact of removing the HDFS module from Solr.

Solution

Search for references to hdfs, hadoop, and related keywords, then run the tests.

  • Look at the security policy last: once we have good green builds, what can we remove?
  • Look at Notice.txt, which refers to Apache Hadoop
  • Look at libs.versions.toml
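
The keyword search described above could be sketched roughly as follows. This is only an illustration, not the process actually used for this PR: the function name `find_references`, the default keyword list, and the skipped directory names are all assumptions.

```python
import os

# Keywords taken from the PR description; extend as needed (assumption).
KEYWORDS = ("hdfs", "hadoop")

# Directories assumed to hold VCS metadata or build output, skipped here.
SKIPPED_DIRS = {".git", ".gradle", "build"}


def find_references(root, keywords=KEYWORDS):
    """Return sorted (path, line_number, line) tuples for every line
    under `root` that mentions any keyword, ignoring case."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        lowered = line.lower()
                        if any(keyword in lowered for keyword in keywords):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(hits)
```

Calling `find_references(".")` from a checkout root would list candidate lines to review by hand; a match is only a hint, since some references (for example in changelogs) may be intentional.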

Tests

Only existing tests.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

epugh marked this pull request as draft on December 21, 2024, 15:37
github-actions bot added the dependencies label on Dec 21, 2024
epugh commented Dec 21, 2024

Question! Should AbstractRecoveryZkTestBase be merged back into RecoveryZkTest now that we no longer have the HDFS contrib tests? The same goes for the AbstractChaosMonkeySafeLeaderTestBase, AbstractMoveReplicaTestBase, AbstractRestartWhileUpdatingTestBase, AbstractSyncSliceTestBase, AbstractTlogReplayBufferedWhileIndexingTestBase, and AbstractUnloadDistributedZkTestBase files.

epugh commented Dec 21, 2024

Is this an opportunity to remove the isSharedStorage property of DirectoryFactory?

epugh marked this pull request as ready for review on January 18, 2025, 13:39
epugh commented Jan 18, 2025

I think this is ready for review! I don't plan on merging it till we get some more consensus.

One thing I need eyes on: security.policy. I was able to confirm my changes to solr-tests.policy by removing properties one by one and running the test suite, but I can't do that for security.policy, so I will do some manual testing.
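
The one-by-one removal described above for solr-tests.policy can be sketched as a simple greedy loop. This is a minimal sketch under assumptions, not the procedure that was actually scripted: `prune_entries` is a hypothetical helper, and `suite_passes` stands in for whatever writes the candidate policy and runs the test suite.

```python
def prune_entries(entries, suite_passes):
    """Try dropping each entry in turn; keep the removal only if the
    suite still passes without it. Returns the entries that survive.

    `suite_passes` is a hypothetical callable that takes the candidate
    list of entries and returns True when the test suite is green.
    """
    kept = list(entries)
    for entry in list(entries):
        trial = [e for e in kept if e != entry]
        if suite_passes(trial):
            kept = trial  # entry was not needed
    return kept


# Stand-in for a real test run: the suite "passes" as long as the
# two placeholder entries below are still present.
REQUIRED = {"needed-1", "needed-2"}


def fake_suite(candidate):
    return REQUIRED.issubset(candidate)


entries = ["needed-1", "hdfs-only-1", "needed-2", "hdfs-only-2"]
remaining = prune_entries(entries, fake_suite)
```

Each entry costs one full suite run, which is why this approach works for a test policy but does not carry over to a policy file that the tests never exercise.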

dsmiley (Contributor) left a comment

Apart from the log and test XML config files that had harmless references to HDFS, it's good to see that this change is well contained.

I propose that the "blockcache" java package, and corresponding tests, be moved into the solr-core module.

The javadocs for the blockcache package are misleading. They say "An HDFS blockcache implementation", but the BlockCache actually has no outward dependencies aside from Lucene and basic stuff. A more accurate characterization of the package is: "A generic Directory layer/wrapper that caches data, on or off heap as desired". This is a hidden gem with remarkable intellectual property that our project will hopefully use on top of another Directory. For example, if DirectoryFactory made better use of NIO, we might very well use this with the GCP FileSystemProvider, which just so happens to already be in our dependencies.
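
To illustrate the general idea of a caching layer that wraps an arbitrary underlying reader, here is a small sketch. It is not Solr's BlockCache and does not use Lucene's Directory API; the class name `BlockCachingReader`, the `read_block` callback, and the LRU eviction policy are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict


class BlockCachingReader:
    """Wraps a read_block(index) -> bytes function with an LRU cache of
    fixed-size blocks, independent of where the bytes actually live."""

    def __init__(self, read_block, block_size, max_blocks):
        self._read_block = read_block
        self._block_size = block_size
        self._max_blocks = max_blocks
        self._cache = OrderedDict()
        self.misses = 0  # number of reads that reached the underlying store

    def _block(self, index):
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as most recently used
            return self._cache[index]
        self.misses += 1
        data = self._read_block(index)
        self._cache[index] = data
        if len(self._cache) > self._max_blocks:
            self._cache.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset, length):
        """Read up to `length` bytes starting at `offset`, block by block."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self._block_size)
            chunk = self._block(index)[start:start + (end - offset)]
            if not chunk:
                break  # reached the end of the underlying data
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Usage with an in-memory "store" standing in for a remote file.
data = bytes(range(100))
reader = BlockCachingReader(lambda i: data[i * 16:(i + 1) * 16], 16, 4)
first = reader.read(10, 20)
misses_after_first = reader.misses
second = reader.read(10, 20)  # served entirely from the cache
```

The wrapper only needs a way to fetch a block by index, which is the sense in which such a layer could sit on top of any storage backend.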

epugh commented Feb 1, 2025

It's February 1st, so I plan to merge this on Monday unless other concerns are raised.
