-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17662 SPI: TransformerFactory/DocTransformer #3172
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a draft for discussion of the approach & scope.
Use of javax.inject.Named
is non-obvious but I think good. Maybe should be jakarta
namespace moving forward, actually. We could do name guessing as well... but as you can see here due to terrible/inconsistent class names for TransformerFactory impls, it wouldn't work. Even so; I like the explicitness in the annotation so lets not fight it. It's a one-liner.
@@ -1129,7 +1129,7 @@ protected SolrCore( | |||
initWriters(); | |||
qParserPlugins.init(QParserPlugin.standardPlugins, this); | |||
valueSourceParsers.init(ValueSourceParser.standardValueSourceParsers, this); | |||
transformerFactories.init(TransformerFactory.defaultFactories, this); | |||
transformerFactories.init(TransformerFactory.builtIns(coreContainer.getConfig()), this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the name "builtIns" best so I chose it. "implicits" would be my second choice. I like a general name, thus not including "factories" as some places where we would use this pattern are not even "factories". "default" isn't terrible but less good.
@@ -18,6 +18,7 @@ | |||
|
|||
import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME; | |||
|
|||
import jakarta.inject.Named; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To solve the problem of identifying the name for plugins, I found this annotation, which seems quite suitable. I read further about it when looking at HK2 (a dependency of ours) lightweight dependency-injection framework that uses the annotations in that JAR/package. This PR doesn't use HK2 BTW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also have added a getName to instances but it's more verbose and forces instantiation of the class. They are instantiated here anyway but nonetheless I like the idea of a future optimization that doesn't instantiate until first use. Could also decide a public static final String field NAME
is the name. Anyway, FWIW I prefer the annotation; feels more modern and is simpler. Wouldn't be okay in SolrJ (e.g. for DocRouter) as we don't want to add dependencies there but that's fine.
import org.apache.solr.common.params.SolrParams; | ||
import org.apache.solr.request.SolrQueryRequest; | ||
|
||
@Named("core") | ||
public class CoreAugmenterFactory extends TransformerFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<RANT>
Why the hell didn't we name TransformerFactory as its rightful name, DocTransformerFactory, and name it's implementations as such as well? I don't even need to know who/why; it's just wrong. We have sinned and need to make amends someday.</RANT>
@@ -105,18 +107,17 @@ default boolean mayModifyValue() { | |||
} | |||
} | |||
|
|||
public static final Map<String, TransformerFactory> defaultFactories = new HashMap<>(9, 1.0f); | |||
// loaded once, based on the node | |||
private static Map<String, TransformerFactory> builtIns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no static initializer anymore; no class loading deadlock risk. Right @uschindler ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some other TransformerFactory impls not listed here; 2 for QueryElevationComponent which weirdly self-register when QEC loads, and the other is LTR. I could expand the scope of this PR but wanted to start simpler to get the conversation going.
defaultFactories.put("geo", new GeoTransformerFactory()); | ||
defaultFactories.put("core", new CoreAugmenterFactory()); | ||
/** (for internal use) */ | ||
public static synchronized Map<String, TransformerFactory> builtIns(NodeConfig nodeConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the lazy loaded list is initialized based on NodeConfig (basically CoreContainer), admittedly there's a test stability risk since NodeConfig ClassLoader classpath of the first test that runs will "win". For the built-in plugins, let's just not do anything that would disturb that list. If we wanted to test that someone could add a plugin in a JAR, that would be a high level integration test using Docker or BATS. Not unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, the use of NodeConfig here thus adds a capability of config-less addition of a plugin (i.e. just add a JAR for a custom DocTransformer; no need to edit solrconfig.xml / configSet).
Adding another check, that the implementations do not implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say I understand the code and like the solution using the service loader, but I haven't used it myself before so I am not aware of any potential security implications.
From my understanding, you are solving the initialization issue, with the only thing being that we now have to provide a parameter to get the collection. I'm not surewhat that means usage-wise since I don’t know the use cases well enough, but since it’s for internal use, I guess it should be fine?
I definitely prefer lazy loading, so +1 for that. 😄
https://issues.apache.org/jira/browse/SOLR-17662
Despite me wanting this for a long time, what motivated me today specifically was looking at @malliaridis trying to solve risks of static class initialization for some of these factories in #2881. The use of SPIs should make that concern go away for some things like static plugin registries.