Add simple broadcastability shape constraints to TensorType #1122
Discussions in #1089 are hard to follow, and I have a naive question: why do we need to add broadcastability constraints if we can do shape inference? Wouldn't it be enough to have something like […]?
Part of the issue is that shape inference (i.e. […]). If we add a constraint to represent partial shape information, then we can continue to propagate it and distinguish between the same information provided by […].
By the way, this is also one of the primary reasons for actually supporting "dynamic" broadcasting (i.e. broadcasting without the "required" static shape information) mentioned in #1089: we can't rely on every […]. If we simply say "no, we don't support this" across the board, then broadcasting will break in graphs containing such […].
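To make the distinction concrete, here is a minimal sketch (the function name and the `-1` convention are hypothetical illustrations, not the actual library API) of why a shape tuple that only uses `None` for unknowns cannot carry all the information in the old `broadcastable` flags:

```python
def shape_from_broadcastable(broadcastable):
    """Convert old-style per-dimension broadcastable flags into a
    static-shape tuple under a hypothetical encoding.

    broadcastable=True  -> the dimension has static length 1.
    broadcastable=False -> the dimension is known NOT to be 1; a shape
                           tuple limited to {None, int} would have to
                           record this as None, silently discarding the
                           "not broadcastable" constraint.  Encoding it
                           as -1 preserves it.
    """
    return tuple(1 if b else -1 for b in broadcastable)


# A (False,) flag becomes -1 ("not 1"), which is strictly more
# information than None ("no information at all").
print(shape_from_broadcastable((True, False)))
```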
We need a way to express broadcastability constraints on the static shape values in `TensorType.shape` so that we can distinguish between the "no information" and "dimension is not broadcastable" cases, or anything similar.

We could always consider allowing `-1` to encode a "not equal to 1" (i.e. not broadcastable) constraint on `TensorType.shape` values. I believe this has come up before, and, while it perhaps isn't the most desirable approach, it should serve to adequately express the old `TensorType.broadcastable` constraints while also allowing `TensorType.shape[d] == None` to represent a complete lack of shape information, as it intuitively does.

We would need to refactor our current usage of `TensorType.shape` to account for this case, but, aside from being a less than exciting task, it shouldn't be particularly difficult.

Making this change would finally allow us to reason more definitively about some ambiguous instances of compile-time broadcasting, and it would help us remove some of the strong and unreliable assumptions we are currently required to make in our constructors and rewrites.
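The proposed encoding could be sketched as follows. This is an illustrative sketch only, under the assumptions described above; the helper names (`is_broadcastable`, `unify_dims`) are hypothetical and not part of the library:

```python
def is_broadcastable(dim):
    """Return True/False when broadcastability of a dimension is known,
    None when there is no information.

    Encoding assumed: None -> no information; -1 -> known not equal to 1
    (not broadcastable); n >= 0 -> exact static length n.
    """
    if dim is None:
        return None       # no information at all
    if dim == -1:
        return False      # constrained to be != 1
    return dim == 1       # exact length: broadcastable iff it is 1


def unify_dims(a, b):
    """Combine the shape information of two aligned dimensions that must
    be equal, raising if their constraints contradict each other."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if a == -1 and b != 1:
        return b          # a concrete length != 1 refines "not 1"
    if b == -1 and a != 1:
        return a
    raise ValueError(f"incompatible dimension constraints: {a} vs {b}")


# "Not 1" combined with a concrete length of 5 refines to 5, while
# "not 1" combined with 1 is a contradiction and raises.
print(unify_dims(-1, 5))
```

Note that `None` and `-1` now answer the "is this dimension broadcastable?" question differently (`None` vs. `False`), which is exactly the distinction the current `shape` values cannot make on their own.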
More specifically, it would help us raise errors when, for example, gradients are constructed, as in #1089. However, for that case specifically, some things still need to be worked out. For instance, it's not entirely clear which default non-broadcastability assumptions/constraints we should make in certain places, or whether acceptable defaults are even possible.