The most important decisions for this proposal are around how to handle subclassing.
The committee has largely come to regret Symbol.species
. While it may or may not prove web-compatible to remove support for it in existing built-ins, the methods added by this proposal will not use it, instead always creating an instance of the base Set
type. Subclasses wishing to override this behavior would need to override these methods.
When user code invokes the Set
constructor, it looks up and repeatedly invokes the .add
method on this
. This was intended to be an affordance for subclassing. Since these methods never create subclasses, this is not done in these methods: replacing Set.prototype.add
will not affect the results returned by these methods.
After a great deal of discussion, the committee ultimately decided that these methods should access the [[SetData]]
internal slot of the receiver (this
) directly, rather than by calling public methods like this.has()
or this[Symbol.iterator]()
. This means subclasses will need to either ensure that the internal slot is consistent with their public API or override all of these methods.
The committee decided that arguments should be accessed by their public API, rather than internal slots, so that it is possible to pass a thing which looks like a Set (e.g. a Proxy for a Set) as an argument to an actual Set.
In the September 2022 meeting it was decided that this specifically meant accessing the .has
, .keys
, and .size
methods of the argument. As of this writing notes are not yet available, but slides for that discussion are available here.
This proposal uses ToIntegerOrInfinity
to coerce the .size
property to get the size of the argument. However, unlike most uses of that operation, it explicitly guardes against NaN
and things which coerce to NaN
(including undefined
), instead of treating NaN
as being 0
. This matches a similar decision made in proposal-iterator-helpers: it means that passing a thing without a .size
property will be an error, instead of treating it as having size 0.
This proposal does not use [Symbol.iterator]
because of a quirk in the optimal algorithm for intersection, discussed in these slides.
The problem arises because the [Symbol.iterator]
method on Map
returns map entries, whereas the has
method on Map
tests for key membership, so the two are not consistent. If we did use [Symbol.iterator]
, it would mean that passing a Map
would behave as if passing the Set given by the Map
's keys only when the Map
was larger than the receiver, and otherwise would give an empty intersection. The position of the committee was that this sometimes-works behavior for a built-in type was unacceptably bad. Using keys
does not have this problem.
It is still possible for a user-implemented class to run into this behavior. The committee decided that was an acceptable cost.
Some methods only ever require iterating the argument or testing membership in it, rather than sometimes needing one and sometimes the other. For example, .union
only requires iterating the argument.
But all methods in this proposal always get all three of the above-named properties. This is to ensure that there is a single consistent interface they expect.
Also, just using .keys
would mean that passing an array to .union
would give the union with the indices of the array, which was held to be confusing. Requiring .size
to be defined means that arrays are rejected instead of that more confusing behavior.
Intersecting a small set with a large set should always take time proportional to the smaller set, regardless of whether the receiver or the argument is smaller.
That is to say, in .intersection
, when this
is smaller than the argument it is more efficient to iterate this
and check if each member is an element of the argument; conversely, when this
is larger than the argument it is more efficient to iterate the argument and check if each member is an element of this
. This means the observable behavior of .intersection
depends on the relative sizes of the two collections. That's necessary to achieve the desired time complexity.
.difference
has a similar branch; in that case, while the time complexity taking into account the time taken to copy the receiver cannot be less than than proportional to the size of the receiver, the number of user-observable function calls can still be minimized by branching on which of the two sets is larger, just as for .intersection
.
In principle, we could fall back to iterating the argument and converting it to a Set
when the argument does not implement all of .size
, .keys
, and .has
. But that has subtle performance implications. Specifically, when the receiver is significantly smaller than the argument, .intersection
requires time proportionate to the size of the receiver, not the argument. Iterating the argument to produce a Set
would therefore be potentially much slower than passing a similarly-sized Set
. So we decided that users should have to do that potentially-expensive iteration themselves, i.e., to do .intersection(new Set(iterable))
instead of just .intersection(iterable)
.
All methods are fetched eagerly and stored, rather than being looked up per iteration of the loop. This greatly simplifies optimization and is consistent with what we do elsewhere.
All of these methods take only a single argument. Some of them, like union
, could in theory take multiple values. This proposal currently takes the position that this use case is rare enough to not be worth worrying about; invoking the method repeatedly is held to be sufficient.
For each of the Set
-producing methods with the exception of intersection
, in the resulting Set
s all elements which were in the receiver appear first, in the order in which they appeared in the receiver, followed by elements which were only in the argument, in the order in which they appeared in the argument.
It was determined that this order is not efficiently achievable for intersection
in all implementations, so for that method only, the order is instead the order of the elements within the smaller of the two sets. This means that the order of the resulting set depends on the relative sizes of the two sets, which is a sharp edge, but the committee thought it was the best of the available options.