Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17662 SPI: TransformerFactory/DocTransformer #3172

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dsmiley
Copy link
Contributor

@dsmiley dsmiley commented Feb 10, 2025

https://issues.apache.org/jira/browse/SOLR-17662

Despite me wanting this for a long time, what motivated me today specifically was looking at @malliaridis trying to solve risks of static class initialization for some of these factories in #2881. The use of SPIs should make that concern go away for some things like static plugin registries.

Copy link
Contributor Author

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a draft for discussion of the approach & scope.
Use of javax.inject.Named is non-obvious but I think good. Maybe should be jakarta namespace moving forward, actually. We could do name guessing as well... but as you can see here due to terrible/inconsistent class names for TransformerFactory impls, it wouldn't work. Even so; I like the explicitness in the annotation so lets not fight it. It's a one-liner.

@@ -1129,7 +1129,7 @@ protected SolrCore(
initWriters();
qParserPlugins.init(QParserPlugin.standardPlugins, this);
valueSourceParsers.init(ValueSourceParser.standardValueSourceParsers, this);
transformerFactories.init(TransformerFactory.defaultFactories, this);
transformerFactories.init(TransformerFactory.builtIns(coreContainer.getConfig()), this);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name "builtIns" best so I chose it. "implicits" would be my second choice. I like a general name, thus not including "factories" as some places where we would use this pattern are not even "factories". "default" isn't terrible but less good.

@@ -18,6 +18,7 @@

import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;

import jakarta.inject.Named;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve the problem of identifying the name for plugins, I found this annotation, which seems quite suitable. I read further about it when looking at HK2 (a dependency of ours) lightweight dependency-injection framework that uses the annotations in that JAR/package. This PR doesn't use HK2 BTW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also have added a getName to instances but it's more verbose and forces instantiation of the class. They are instantiated here anyway but nonetheless I like the idea of a future optimization that doesn't instantiate until first use. Could also decide a public static final String field NAME is the name. Anyway, FWIW I prefer the annotation; feels more modern and is simpler. Wouldn't be okay in SolrJ (e.g. for DocRouter) as we don't want to add dependencies there but that's fine.

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;

@Named("core")
public class CoreAugmenterFactory extends TransformerFactory {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<RANT> Why the hell didn't we name TransformerFactory as its rightful name, DocTransformerFactory, and name it's implementations as such as well? I don't even need to know who/why; it's just wrong. We have sinned and need to make amends someday.</RANT>

@@ -105,18 +107,17 @@ default boolean mayModifyValue() {
}
}

public static final Map<String, TransformerFactory> defaultFactories = new HashMap<>(9, 1.0f);
// loaded once, based on the node
private static Map<String, TransformerFactory> builtIns;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no static initializer anymore; no class loading deadlock risk. Right @uschindler ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some other TransformerFactory impls not listed here; 2 for QueryElevationComponent which weirdly self-register when QEC loads, and the other is LTR. I could expand the scope of this PR but wanted to start simpler to get the conversation going.

defaultFactories.put("geo", new GeoTransformerFactory());
defaultFactories.put("core", new CoreAugmenterFactory());
/** (for internal use) */
public static synchronized Map<String, TransformerFactory> builtIns(NodeConfig nodeConfig) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the lazy loaded list is initialized based on NodeConfig (basically CoreContainer), admittedly there's a test stability risk since NodeConfig ClassLoader classpath of the first test that runs will "win". For the built-in plugins, let's just not do anything that would disturb that list. If we wanted to test that someone could add a plugin in a JAR, that would be a high level integration test using Docker or BATS. Not unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, the use of NodeConfig here thus adds a capability of config-less addition of a plugin (i.e. just add a JAR for a custom DocTransformer; no need to edit solrconfig.xml / configSet).

@dsmiley
Copy link
Contributor Author

dsmiley commented Feb 10, 2025

Adding another check, that the implementations do not implement SolrCoreAware, would also be good, as they are the builtIn ones instantiated once. I imagine a non-issue for existing ones but I could imagine a user trying this feature out and it not working for that reason. Of course eventually we want that to work but it's out of scope of the built-in registry.

Copy link
Contributor

@malliaridis malliaridis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say I understand the code and like the solution using the service loader, but I haven't used it myself before so I am not aware of any potential security implications.

From my understanding, you are solving the initialization issue, with the only thing being that we now have to provide a parameter to get the collection. I'm not surewhat that means usage-wise since I don’t know the use cases well enough, but since it’s for internal use, I guess it should be fine?

I definitely prefer lazy loading, so +1 for that. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants