Champion issue: #8887
The dictionary expression feature has identified a need for collection expressions to pass along user-specified
data in order to configure the behavior of the final collection. Specifically, dictionaries allow users to
customize how their keys compare, using comparers to define equality between keys, and sorting or hashing (in the
case of sorted or hashed collections respectively). This need applies when creating any sort of dictionary type
(like D d = new D(...), D d = D.CreateRange(...), and even IDictionary<...> d = <synthesized dict>).
To support this, a new with(...arguments...) element is proposed as the first element of a collection expression,
like so:
Dictionary<string, int> nameToAge = [with(comparer), .. d1, .. d2, .. d3];
- When translating to a new CollectionType(...) call, these ...arguments... are used to determine the appropriate constructor and are passed along accordingly.
- When translating to a CollectionFactory.Create call, these ...arguments... are passed along with the ReadOnlySpan<ElementType> elements argument, all of which are used to determine the appropriate Create overload, and are passed along accordingly.
- When translating to an interface (like IDictionary<,>) only a single argument is allowed. It implements one of the well-known BCL comparer interfaces, and will be used to control the key comparing semantics of the final instance.
This syntax was chosen as it:
- Keeps all information within the [...] syntax, ensuring that the code still clearly indicates a collection being created.
- Does not imply calling a new constructor (when that isn't how all collections are created).
- Does not imply creating/copying the values of the collection multiple times (like a postfix with { ... } might).
- Is not subtle, while also not being excessively verbose. For example, using ; instead of , to indicate arguments is a very easy piece of syntax to miss. with() only adds 6 characters, and will easily stand out, especially with syntax coloring of the with keyword.
- Reads nicely. "This is a collection expression 'with' these arguments, consisting of these elements."
- Solves the need for comparers for both dictionaries and sets.
- Ensures any user need for passing arguments, or any needs we ourselves have beyond comparers in the future, are already handled.
- Does not conflict with any existing code (using https://grep.app/ to search).
A minor question exists if the preferred form would be args(...) or init(...) instead of with(...). But the forms
are otherwise identical.
Open question: Should any support for passing a comparer be provided at all? Yes/No.
Working group recommendation: Yes. Comparers are critical for the proper behavior of collections, and examination of many packages indicates usage of them to customize collection behavior.
Open question: If support for comparers is desired, should it be through a feature specific to only comparers? Or should it handle arbitrary arguments?
Working group recommendation: Support arbitrary arguments. This solves the 'comparer' issue, while also nipping all present and future argument concerns in the bud. For example, users who need to customize performance-oriented arguments (like 'capacity') now have a solution beyond waiting for the compiler to support the patterns they are using with the codegen they desire.
The section below covers prior design philosophy discussions, including why certain forms were rejected.
There are two main directions we can go in to supply this user-defined data. The first is to special case only
values in the comparer space (which we define as types inheriting from the BCL's IComparer<T> or
IEqualityComparer<T> types). The second is to provide a generalized mechanism to supply arbitrary arguments
to the final invoked API when creating collection expressions. The primary dictionary expression specification
shows how we could do the former, while this specification seeks to do the latter.
Examinations of the solutions for just passing comparers have revealed weaknesses in their approach if we wanted to expand them to arbitrary arguments. For example:
- Reusing element syntax, like we do with the form: [StringComparer.OrdinalIgnoreCase, "mads": 21]. This works well in a space where KeyValuePair<,> and comparers do not inherit from common types. But it breaks down in a world where one might do: HashSet<object> h = [StringComparer.OrdinalIgnoreCase, "v"]. Is this passing along a comparer? Or attempting to put two object values into the set?
- Separating out arguments versus elements with subtle syntax (like using a semicolon instead of a comma to separate them in [comparer; v1]). This risks very confusing situations where a user accidentally writes [1; 2] (and gets a collection that passes '1' as, say, the 'capacity' argument for a List<>, and only contains the single value '2'), when they intended [1, 2] (a collection with two elements).
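To make the first ambiguity concrete, the following is a minimal sketch of how the HashSet<object> line behaves under collection expressions as they exist today (C# 12 and later). It assumes nothing beyond the BCL; the variable name h is taken from the text above.

```csharp
using System;
using System.Collections.Generic;

// Today the comparer is simply another element of type object,
// so the set ends up holding two values.
HashSet<object> h = [StringComparer.OrdinalIgnoreCase, "v"];

Console.WriteLine(h.Count);                                      // 2
Console.WriteLine(h.Contains(StringComparer.OrdinalIgnoreCase)); // True
Console.WriteLine(h.Contains("V"));                              // False: default equality is case-sensitive
```

Reinterpreting the first element as a comparer would silently change the meaning of code like this, which is why reusing element syntax is problematic.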
Because of this, in order to support arbitrary arguments, we believe a more obvious syntax is needed to more clearly demarcate these values. Several other design concerns have also come up in this space. In no particular order, these are:
1. That the solution not be ambiguous and cause breaks with code that people are likely using with collection expressions today. For example:
   List<Widget> c = [new(...), w1, w2, w3];
   This is legal today, with the new(...) expression being an 'implicit object creation' that creates a new widget. We cannot repurpose this to pass along arguments to List<>'s constructor as it would certainly break existing code.
2. That the syntax not extend to outside of the [...] construct. For example:
   HashSet<string> s = [...] with ...;
   These syntaxes can be construed to mean that the collection is created first, and then recreated into a differing form, implying multiple transformations of the data, and potentially unwanted higher costs (even if that's not what is emitted).
3. That new as a potential keyword to use at all in this space is undesirably confusing. Both because [...] already indicates that a new object is created, and because translations of the collection expression may go through non-constructor APIs (for example, the Create method pattern).
4. That the solution not be excessively verbose. A core value proposition of collection expressions is brevity. So if the form adds a large amount of syntactic scaffolding, it will feel like a step backwards, and will undercut the value proposition of using collection expressions, versus calling into the existing APIs to make the collection.
Note that a syntax like new([...], ...) runs afoul of both '2' and '3' above. It makes it appear as if we
are calling into a constructor (when we may not be) and it implies that a created collection expression is
passed to that constructor, which it definitely is not.
Based on all of the above, a small handful of options have come up that are felt to solve the needs of passing arguments, without stepping out of bounds of the goals of collection expressions.
The design of the with(...) form would be as follows:
collection_element
: expression_element
| spread_element
| key_value_pair_element
+ | with_element
;
+with_element
+ : 'with' argument_list
+ ;
Examples of how this would look are:
// With an existing type:
// Initialize to twice the capacity since we'll have to add
// more values later.
List<string> names = [with(capacity: values.Count * 2), .. values];
// With the dictionary types.
Dictionary<string, int> nameToAge1 = [with(comparer)];
Dictionary<string, int> nameToAge2 = [with(comparer), kvp1, kvp2, kvp3];
Dictionary<string, int> nameToAge3 = [with(comparer), k1:v1, k2:v2, k3:v4];
Dictionary<string, int> nameToAge4 = [with(comparer), .. d1, .. d2, .. d3];
Dictionary<string, int> nameToAge = [with(comparer), kvp1, k1: v2, .. d1];
These forms seem to "read" reasonably well. In all those cases, the code is "creating a collection expression, 'with' the following arguments to pass along to control the final instance, and then the subsequent elements used to populate it." For example, the first line "creates a list of strings 'with' a capacity of two times the count of the values about to be spread into it."
Importantly, this code has little chance of being overlooked, unlike forms such as [arg; element], while
also adding minimal verbosity, with a large amount of flexibility to pass any desired arguments along.
This would technically be a breaking change as with(...) could have been a call to a pre-existing method
called with. However, unlike new(...), which is a known and recommended way to create implicitly-typed
values, with(...) is far less likely as a method name, running afoul of .NET naming for methods. In the
unlikely event that a user did have such a method, they would certainly be able to continue calling into the
existing method by using @with(...).
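As an illustration of that escape hatch, here is a small sketch in which a hypothetical pre-existing function happens to be named with. The function and its behavior are invented for this example. The @ prefix makes the name an ordinary identifier, so the call is treated as a regular element expression.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pre-existing function named 'with' (a local function here).
int with(int value) => value * 10;

// '@with(1)' is an ordinary invocation, so the list receives its result
// as the first element rather than treating '1' as an argument.
List<int> values = [@with(1), 2];

Console.WriteLine(string.Join(", ", values)); // 10, 2
```

Under the proposal, the unescaped spelling [with(1), 2] is the one whose meaning would change, so only code of that exact shape would need the @ prefix.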
We would translate this with(...) element like so:
Dictionary<string, int> nameToAge1 = [with(StringComparer.OrdinalIgnoreCase), ...]; // translates to:
// argument_list *becomes* the argument list for the
// constructor call.
__result = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase); // followed by normal initialization
// or:
ImmutableDictionary<string, int> nameToAge2 = [with(StringComparer.OrdinalIgnoreCase), ...]; // translates to:
// argument_list arguments are passed initially to the
// 'create method'.
__result = ImmutableDictionary.CreateRange(StringComparer.OrdinalIgnoreCase, /* key/values to initialize dictionary with */);
// or:
IReadOnlyDictionary<string, int> nameToAge3 = [with(StringComparer.OrdinalIgnoreCase), ...]; // translates to:
// create synthesized dictionary with hashing/equality
// behavior determined by StringComparer.OrdinalIgnoreCase.
In other words, the argument_list arguments would be passed to the appropriate constructor if we are calling a constructor, or to the appropriate 'create method' if we are calling such a method. We would also allow a single argument inheriting from the BCL comparer types to be provided when instantiating one of the destination dictionary interface types to control its behavior.
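The three translations can be approximated by hand in today's C#. The sketch below is only an illustration under stated assumptions: the sample data is invented, Add stands in for "normal initialization", and a plain Dictionary<,> stands in for the synthesized dictionary type, whose actual shape the proposal leaves to the compiler.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

var comparer = StringComparer.OrdinalIgnoreCase;
KeyValuePair<string, int>[] pairs = [new("mads", 21)]; // sample data (assumption)

// Constructor target: the arguments become the constructor arguments,
// and the elements are added afterwards.
var viaConstructor = new Dictionary<string, int>(comparer);
foreach (var (name, age) in pairs) viaConstructor.Add(name, age);

// Create-method target: the arguments are passed to the create method.
var viaFactory = ImmutableDictionary.CreateRange<string, int>(comparer, pairs);

// Interface target: Dictionary<,> is used here as a stand-in for the
// synthesized type; only the comparer controls key semantics.
IReadOnlyDictionary<string, int> viaInterface =
    new Dictionary<string, int>(pairs, comparer);

Console.WriteLine(viaConstructor["MADS"]);           // 21
Console.WriteLine(viaFactory.ContainsKey("Mads"));   // True
Console.WriteLine(viaInterface.ContainsKey("MADS")); // True
```

In each case the comparer reaches the API that creates the collection, which is the effect with(...) is intended to have.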
The args(...) form is effectively identical to the with(...) form, just using a slightly different identifier. The
benefit here would primarily be around clearer identification of what is in the (...) section. They are
clearly 'arguments' as 'args' states.
Of note: 'args' is already a contextual keyword in C#. It was added as part of "top-level statements" to
allow top-level code to refer to the string[] arguments passed into the program. So this form is effectively
identical to the with(...) form, just with a subjective preference on a different keyword.
This form seems to be even less likely to have any breaks versus with(...). A method called with(...)
and used in a collection expression is at least conceivable. A method called args(...) feels like an even
lower realm of chance, making it even more acceptable to take the break.
Examples of this form are:
// With an existing type:
// Initialize to twice the capacity since we'll have to add
// more values later.
List<string> names = [args(capacity: values.Count * 2), .. values];
// With the dictionary types.
Dictionary<string, int> nameToAge1 = [args(comparer)];
Dictionary<string, int> nameToAge2 = [args(comparer), kvp1, kvp2, kvp3];
Dictionary<string, int> nameToAge3 = [args(comparer), k1:v1, k2:v2, k3:v4];
Dictionary<string, int> nameToAge4 = [args(comparer), .. d1, .. d2, .. d3];
Dictionary<string, int> nameToAge = [args(comparer), kvp1, k1: v2, .. d1];
These forms seem to "read" reasonably well. In all those cases, the code is "creating a collection expression, with the following 'args' to pass along to control the final instance, and then the subsequent elements used to populate it." For example, the first line "creates a list of strings with a capacity 'arg' of two times the count of the values about to be spread into it."
The init(...) form is the same as the args(...) form (option 2), just with 'init' as the keyword chosen:
// With an existing type:
// Initialize to twice the capacity since we'll have to add
// more values later.
List<string> names = [init(capacity: values.Count * 2), .. values];
// With the dictionary types.
Dictionary<string, int> nameToAge1 = [init(comparer)];
Dictionary<string, int> nameToAge2 = [init(comparer), kvp1, kvp2, kvp3];
Dictionary<string, int> nameToAge3 = [init(comparer), k1:v1, k2:v2, k3:v4];
Dictionary<string, int> nameToAge4 = [init(comparer), .. d1, .. d2, .. d3];
Dictionary<string, int> nameToAge = [init(comparer), kvp1, k1: v2, .. d1];
The design here would play off of how new(...) { v1, v2, ... } can already instantiate a target collection type
and supply initial collection values. The arguments in the new(...) clause would be passed to the constructor if
creating a new instance, or as the initial arguments if calling a create method. We would allow a single comparer
argument if creating a new IDictionary<,> or IReadOnlyDictionary<,>.
There are several downsides to this idea, as enumerated in the initial weaknesses section. First, there is a
general concern around syntax appearing outside of the [...] section. We want the [...] to be instantly
recognizable, which is not the case if there is a new(...) appearing first. Second, seeing the new(...)
strongly triggers the view that this is simply an implicit-object-creation. And, while somewhat true for the
case where a constructor is actually called (like for Dictionary<,>), it is misleading when calling a create
method, or creating an interface. Finally, there is general apprehension around using new at all, as there
is a feeling of redundancy around both the new indicating a new instance, and [...] indicating a new instance.
Collection arguments are not considered when determining collection expression conversions.
Construction is updated as follows.
The elements of a collection expression are evaluated in order, left to right. Within collection arguments, the arguments are evaluated in order, left to right. Each element or argument is evaluated exactly once, and any further references refer to the results of this initial evaluation.
If collection_arguments is included and is not the first element in the collection expression, a compile-time error is reported.
If the target type is a struct or class type that implements System.Collections.IEnumerable, and the target type does not have a create method, and the target type is not a generic parameter type then:
- Overload resolution is used to determine the best instance constructor from the argument list.
- If the argument list contains any values with dynamic type, the best instance constructor is determined at runtime.
- If a best instance constructor is found, the constructor is invoked with the argument list.
  - If the constructor has a params parameter, the invocation may be in expanded form.
- Otherwise, a binding error is reported.
// List<T> candidates:
// List<T>()
// List<T>(IEnumerable<T> collection)
// List<T>(int capacity)
List<int> l;
l = [with(capacity: 3), 1, 2]; // new List<int>(capacity: 3)
l = [with([1, 2]), 3]; // new List<int>(IEnumerable<int> collection)
l = [with(default)]; // error: ambiguous constructor
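The first two bindings above correspond to constructor calls that can already be written by hand. The following sketch uses only existing List<T> constructors and collection initializers to show which overload each argument list selects; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Counterpart of [with(capacity: 3), 1, 2]: binds to List<T>(int capacity).
var withCapacity = new List<int>(capacity: 3) { 1, 2 };

// Counterpart of [with([1, 2]), 3]: the inner collection expression has no
// conversion to int, so List<T>(IEnumerable<T> collection) is chosen.
var withSeed = new List<int>([1, 2]) { 3 };

// The third case is not reproduced here: per the text above, a bare
// 'default' argument is ambiguous between the two single-parameter
// constructors and is reported as an error.

Console.WriteLine(withCapacity.Capacity);       // 3
Console.WriteLine(string.Join(", ", withSeed)); // 1, 2, 3
```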
If the target type is a type with a create method, then:
- The argument list is the collection expression containing the elements only (no arguments), followed by the argument list.
- Overload resolution is used to determine the best factory method from the argument list from the create method candidates.
- If the argument list contains any values with dynamic type, the best factory method is determined at runtime.
- If a best factory method is found, the method is invoked with the argument list.
  - If the factory method has a params parameter, the invocation may be in expanded form.
- Otherwise, a binding error is reported.
MyCollection<string> c = [with(GetComparer()), "1", "2"];
// IEqualityComparer<string> _tmp1 = GetComparer();
// ReadOnlySpan<string> _tmp2 = ["1", "2"];
// c = MyBuilder.Create<string>(_tmp2, _tmp1);
[CollectionBuilder(typeof(MyBuilder), "Create")]
class MyCollection<T> { ... }
class MyBuilder
{
public static MyCollection<T> Create<T>(ReadOnlySpan<T> elements);
public static MyCollection<T> Create<T>(ReadOnlySpan<T> elements, IEqualityComparer<T> comparer);
}
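A runnable approximation of this builder is sketched below. The method bodies, the HashSet<T> backing store, and the Count property are assumptions made for illustration, since the text above gives signatures only. The two-parameter overload is called by hand, in the elements-then-arguments order shown in the comments above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// With no arguments, the single-parameter Create overload is used.
MyCollection<string> plain = ["a", "A"];

// Hand-written form of the call that [with(comparer), "a", "A"] is
// proposed to produce: the elements span first, then the arguments.
ReadOnlySpan<string> elements = ["a", "A"];
MyCollection<string> folded =
    MyBuilder.Create<string>(elements, StringComparer.OrdinalIgnoreCase);

Console.WriteLine(plain.Count);  // 2
Console.WriteLine(folded.Count); // 1

[CollectionBuilder(typeof(MyBuilder), "Create")]
public sealed class MyCollection<T> : IEnumerable<T>
{
    private readonly HashSet<T> _items; // backing store is an assumption
    internal MyCollection(HashSet<T> items) => _items = items;
    public int Count => _items.Count;
    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class MyBuilder
{
    public static MyCollection<T> Create<T>(ReadOnlySpan<T> elements)
        => Create(elements, EqualityComparer<T>.Default);

    public static MyCollection<T> Create<T>(
        ReadOnlySpan<T> elements, IEqualityComparer<T> comparer)
    {
        var set = new HashSet<T>(comparer);
        foreach (var element in elements) set.Add(element);
        return new MyCollection<T>(set);
    }
}
```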
If the target type is an interface type, then:
- Overload resolution is used to determine the best instance constructor from the argument list from the following candidate signatures:
  - If the target type is IEnumerable<E>, IReadOnlyCollection<E>, or IReadOnlyList<E>, the candidates are:
    new()
  - If the target type is ICollection<E>, or IList<E>, the candidates are:
    new()
    new(int capacity)
  - If the target type is IReadOnlyDictionary<K, V>, the candidates are:
    new()
    new(IEqualityComparer<K> comparer)
  - If the target type is IDictionary<K, V>, the candidates are:
    new()
    new(int capacity)
    new(IEqualityComparer<K> comparer)
    new(int capacity, IEqualityComparer<K> comparer)
- If the argument list contains any values with dynamic type, the best instance constructor is determined at runtime.
- If a best instance constructor is found, the constructor is invoked with the argument list.
- Otherwise, a binding error is reported.
IDictionary<string, int> d;
IReadOnlyDictionary<string, int> r;
d = [with(StringComparer.Ordinal)]; // new Dictionary<string, int>(StringComparer.Ordinal)
r = [with(StringComparer.Ordinal)]; // new $PrivateImpl<string, int>(StringComparer.Ordinal)
d = [with(capacity: 2)]; // new Dictionary<string, int>(capacity: 2)
r = [with(capacity: 2)]; // error: 'capacity' parameter not recognized
If the target type is any other type, and the argument list is not empty, a binding error is reported.
Span<int> a = [with(), 1, 2, 3]; // ok
Span<int> b = [with([1, 2]), 3]; // error: arguments not supported
For a collection expression where the target type definition has a [CollectionBuilder] attribute, the create method candidates for overload resolution are the following, updated from collection expressions: create methods.
A [CollectionBuilder(...)] attribute specifies the builder type and method name of a method to be invoked to construct an instance of the collection type.
The builder type must be a non-generic class or struct.
First, the set of applicable create methods CM is determined. It consists of methods that meet the following requirements:
- The method must have the name specified in the [CollectionBuilder(...)] attribute.
- The method must be defined on the builder type directly.
- The method must be static.
- The method must be accessible where the collection expression is used.
- The arity of the method must match the arity of the collection type.
- The method must have a first parameter of type System.ReadOnlySpan<E>, passed by value.
- There is an identity conversion, implicit reference conversion, or boxing conversion from the method return type to the collection type.
Methods declared on base types or interfaces are ignored and not part of the CM set.
For a collection expression with a target type C<S0, S1, …> where the type declaration C<T0, T1, …> has an associated builder method B.M<U0, U1, …>(), the generic type arguments from the target type are applied in order, and from outermost containing type to innermost, to the builder method.
The key differences from the earlier algorithm are:
- Candidate methods may have additional parameters following the ReadOnlySpan<E> parameter.
- Multiple candidate methods are supported.
Should the constructor candidates for ICollection<T> and IList<T> be the accessible constructors from List<T>, or specific signatures independent from List<T>, say new() and new(int capacity)?
Similarly, should the constructor candidates for IDictionary<TKey, TValue> be the accessible constructors from Dictionary<TKey, TValue>, or specific signatures, say new(), new(int capacity), new(IEqualityComparer<K> comparer), and new(int capacity, IEqualityComparer<K> comparer)?
What about IReadOnlyDictionary<TKey, TValue>, which may be implemented by a synthesized type?
Should the candidate methods for collection builder types include all overloads on the builder type with the required name, or should the candidates be limited as described above, for instance by requiring that the first parameter is ReadOnlySpan<T>?
Should arguments with dynamic type be allowed? That might require using the runtime binder for overload resolution, which would make it difficult to limit the set of candidates, for instance for collection builder cases.
For target types such as arrays and span types that do not allow arguments, should an explicit empty argument list, with(), be allowed?
Is an error reported for with() when compiling with an earlier language version, or does with bind to another symbol in scope?