Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reusing empty NamedList rather than recreating a new empty NamedList … #2932

Merged
merged 7 commits into from
Jan 11, 2025

Conversation

renatoh
Copy link
Contributor

@renatoh renatoh commented Jan 3, 2025

For every call with face=true which is not providing any facet.fields, we do create an empty NamedList.
Having a single instance of an empty NamedList feels like the more efficient approach to me.

…every time the facet.field parameter is empty
@renatoh
Copy link
Contributor Author

renatoh commented Jan 3, 2025

I've investigated why the tests are now failing.
The issue is that the method org.apache.solr.request.SimpleFacets#getFacetFieldCounts is now returning an empty NamedList, and not its subtype SimpleOrderedMap anymore. This seems fine since the return type of that method is NamedList.
The problem is that returning NamedList causes that in the response-json, the facet_counts/facet_fields will become a list:
Screenshot 2025-01-03 at 18 31 15
The method org.apache.solr.common.util.JsonTextWriter#writeNamedList distinguish between these types and writes a different data structure.

I think the root cause is the method org.apache.solr.request.SimpleFacets#getFacetFieldCounts allowing to return the super-type NamedList.
Shall I change the return type of that method to SimpleOrderedMap?

@madrob
Copy link
Contributor

madrob commented Jan 3, 2025

@dsmiley this looks related to your named list vs simple map thread on dev list

@renatoh
Copy link
Contributor Author

renatoh commented Jan 4, 2025

@madrob do you have a link to the mention thread?

@madrob
Copy link
Contributor

madrob commented Jan 4, 2025

@dsmiley
Copy link
Contributor

dsmiley commented Jan 5, 2025

My general opinion here is wondering it it worth bothering with the micro optimization of using a shared instance. And more importantly, there is no immutable NamedList so I'm uncomfortable with introducing a shared mutable instance.

@renatoh
Copy link
Contributor Author

renatoh commented Jan 5, 2025

@dsmiley I do get your point that a shared instance should be immutable. I could instantiate the shared empty instances with an immutable list, similar to what we do here:
org.apache.solr.common.util.NamedList#getImmutableCopy.

For my understanding: What is the purpose of SimpleOrderedMap, it is just to control the output format of the response writers?

@dsmiley
Copy link
Contributor

dsmiley commented Jan 5, 2025

Yes, it's about the response format but moreover it communicates that keys do not (well technically "should" not) repeat. It's tempting to actually implement Map on this thing.

@renatoh
Copy link
Contributor Author

renatoh commented Jan 5, 2025

Not being very familiar with the Solr source code, this construct with NamedList and SimpleOrderedMap is pretty confusing to me.
From what I understand NamedList is basically a List of Pair<Key,Value> (Ordered and keys can repeat) and SimpleOrderedMap is an HashMap or LinkedHashMap (not necessarily ordered but with unique) . Is that understanding correct?

Back in 2006 NamedList had the following todo on it:

  • :TODO: In the future, it would be nice if this extended Map or Collection,
  • had iterators, used java5 generics, had a faster lookup for
  • large lists, etc...
  • It could also have an interface, and multiple implementations.
  • One might have indexed lookup, one might not.

But this one got removed at one point

@dsmiley
Copy link
Contributor

dsmiley commented Jan 6, 2025

NamedList and thus SimpleOrderedMap is ordered (insertion order). HashMap is called HashMap because it explains its implementation via use of the hashCode() and thus get(key) is O(1). NamedList and thus SimpleOrderedMap doesn't use a hashCode so I would say it's not specifically like a HashMap but I would indeed say that SimpleOrderedMap is a Map even though it doesn't implement the interface. It has Map in its name, after all. I recall there was concern that implementing Map would lead to people passing this off to code that makes assumptions on Map having better than O(N) lookup by key (even TreeMap is O(Log N)) but NamedList is O(N) leading to a performance problem for non-trivial sized NamedLists.

@renatoh
Copy link
Contributor Author

renatoh commented Jan 6, 2025

Thanks for the explanation.
Would another option be to use a Map internally for SimpleOrderedMap without changing the interface, that lead to better lookup performance without the need of touching a lot of code?

@dsmiley
Copy link
Contributor

dsmiley commented Jan 6, 2025

Adding an additional HashMap inside SimpleOrderedMap misses the point of why NamedList exists. It's for something super lean/fast that's only for transferring stuff (i.e. to be serialized), navigated by iteration once. For that use-case, a hashCode system is a waste. But Solr uses it for configuration as well, a questionable choice that's not in line with NamedList's advantages. Perhaps a radical point of view is that NamedList shouldn't even have "get" methods, leaving only iteration, thus emphasizing its purpose. For sure, Solr over-uses NamedList. And it's hard to stop using it because of SolrJ "javabin" compatibility.

@@ -34,6 +35,8 @@
* serialized. It aims to minimize overhead and to be efficient at adding new elements.
*/
public class SimpleOrderedMap<T> extends NamedList<T> {

private static final SimpleOrderedMap<Object> emptySimpleOrderedMap = new SimpleOrderedMap<>(Collections.emptyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it EMPTY ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also nowadays we just do List.of() which is equivalent, shorter, and doesn't import another class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" and doesn't import another class." Never thought of that advantages of List.of over Collections.empty(). Good point, will adjust it!

@@ -67,4 +70,8 @@ public SimpleOrderedMap<T> clone() {
newList.addAll(nvPairs);
return new SimpleOrderedMap<>(newList);
}

public static SimpleOrderedMap<Object> emptySimpleOrderedMap() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a javadoc. And that name is quite redundant given the type where we're declaring this.
WDYT of naming this simply of thus mimicking the end effect of List.of() or Set.of() or Map.of()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name of() is implying, at least to me, that something can be passed in and and creates something else out of it. Like when you do List.of("String"). Maybe empty() would be the better option. What do you thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using of opens the door for some more static methods for more, which you could add now if it makes of more attractive to you. Just a single pair would be fine and perhaps useful. I look to the JDK for inspiration on data structure conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see you point regarding of() vs empty(). I looked at how it is done on the List and have implemented some more of()-method for 1,2 and n-elements. what do you think?

Comment on lines 81 to 87
/**
* Returns an immutable instance of SimpleOrderedMap with a single element.
* @return List containing the elements
*/
public static SimpleOrderedMap<Object> of(Object o1) {
return new SimpleOrderedMap<>(List.of(o1));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is illogical; the list should be key-value pairs. SimpleOrderedMap is fundamentally a Map. Look at Map's static "of" methods for inspiration.

* Returns an immutable instance of SimpleOrderedMap with two elements.
* @return List containing the elements
*/
public static SimpleOrderedMap<Object> of(Object o1, Object o2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

named key and value, thus logically one pair

Copy link
Contributor Author

@renatoh renatoh Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, was confused by the fact it internally holds a List t store the values.
Do you think it still makes sense to have and of for two key-value pairs e.g. like this:
of(String name1, Object val1, String name2, Object val2)

Looking at Map.of, I think it would also make to use type inference rather than , like this:
public static SimpleOrderedMap of(String name, T val)
What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, if Map has it, we can to.
Any way, I also think we shouldn't over-invest in SimpleOrderedMap without callers of these methods.

* Returns an immutable instance of SimpleOrderedMap with an arbitrary number of elements.
* @return List containing the elements
*/
public static SimpleOrderedMap<Object> of(Object... elements) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map.of has no similar array based argument so I suggest not doing the same here. Presumably the key and value distinction then becomes less obvious; easier to misuse.

@dsmiley dsmiley merged commit 6c02da2 into apache:main Jan 11, 2025
1 check passed
dsmiley pushed a commit that referenced this pull request Jan 11, 2025
)

SimpleFacets:  use this, avoiding needless new NamedList creation
tidy
(cherry picked from commit 6c02da2)
@dsmiley
Copy link
Contributor

dsmiley commented Jan 11, 2025

Also merged to branch_9x thus will appear in Solr 9.9.
Thanks for contributing!

@dsmiley
Copy link
Contributor

dsmiley commented Jan 16, 2025

If you're interested: https://issues.apache.org/jira/browse/SOLR-17623

@renatoh
Copy link
Contributor Author

renatoh commented Jan 16, 2025

sure, i will give it a go.
so the idea is to keep the ArrayList from NamedList as internal data structure but do i implement java.util.Map, correct?
How do I get access to Jira?

@dsmiley
Copy link
Contributor

dsmiley commented Jan 16, 2025

Yes; simply add methods to SimpleOrderedMap.

JIRA: Should be in the header of the page: https://selfserve.apache.org/jira-account.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants