Element-addressable memory follow-up #1614
Comments
Do we have a sketch for the 32-bit range checks? It seems to me that we need these for soundness (and we also need to make sure that the address is a multiple of 4 for word-addressable operations). The naive approach would require 2 more columns in the main trace and 1 more column in the auxiliary trace. This would bring up the cost of element-addressable memory to 3 extra columns in the main trace and 1 extra column in the auxiliary trace, making it somewhat less attractive. Could we do anything better than the naive approach? |
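To make the two checks mentioned above concrete, here is a minimal sketch of what they assert outside of any constraint system. It assumes a 32-bit range check is built from two 16-bit limbs (which would match the "2 more columns" of the naive approach); the function names `u32_limbs` and `is_valid_word_addr` are hypothetical and not part of the VM.

```rust
/// Hypothetical helper: splits `value` into (lo, hi) 16-bit limbs.
/// Returns `None` when the value does not fit in 32 bits, i.e. when no
/// pair of 16-bit limbs can represent it.
fn u32_limbs(value: u64) -> Option<(u16, u16)> {
    if value > u32::MAX as u64 {
        return None;
    }
    let lo = (value & 0xffff) as u16;
    let hi = (value >> 16) as u16;
    Some((lo, hi))
}

/// Hypothetical helper: a word-addressable operation needs an address
/// that is both a valid u32 and a multiple of 4.
fn is_valid_word_addr(addr: u64) -> bool {
    u32_limbs(addr).is_some() && addr % 4 == 0
}
```

In an actual trace, each limb would occupy a column and be sent to the range checker, rather than being computed with bit operations as done here.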
Maybe we can avoid the 3 extra columns in the main trace (but not the one in the auxiliary trace) by relying on just range checking the address on the decoder side. |
I think we can probably discuss this in this issue. One potential solution which almost works is to add
This would help us with We could change Oh - one other operation I forgot about is |
I would like to pinpoint the problem(s) we are solving. As I understand it, the main question is: do we need to range check both the memory address of the op and the memory address of the word? Or is range checking only one of the two enough, and if so, which one? I think the answer is that we are going to need two u32 range checks; however, we can do both range checks on the decoder side. The way to go about it is:
We then enforce the following constraint
As @bobbinth mentioned in his previous comment, we are going to shuffle around and change some of the opcodes to make this work. |
As discussed offline, whether we perform the range checks on the decoder side or the chiplet side will depend notably on #1610. In short, the downside of doing it on the chiplet side is that it adds more columns to the chiplet. However if with #1610 we decide to stack the circuit evaluation chiplet with the other chiplets and this results in more columns being added to all chiplets, then we would perform the range-checks in the memory chiplet as well. Otherwise, we would probably prefer to do them on the decoder side. Another consideration is that the circuit evaluation chiplet would also need to perform memory accesses, but wouldn't benefit from the
Also as discussed offline, we could avoid shuffling the opcodes around by adding a |
One thing I was thinking about: if there is a way to somehow enforce sorting without relying on I know there are some memory designs which do not rely on range checks. @Al-Kindi-0 @plafer - do you know how these work? Basically, curious how costly these would be as compared to the alternatives. |
The
|
Totally agree that we need to sort by clock cycle - but I guess the question is do other constructions rely on range checks to do sorting, or do they use something else for this? |
Not familiar with other ways to do sorting without range checking the differences, and from the docs referenced above it seems that they use range checks to do the ordering. |
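For readers less familiar with why range-checking the differences enforces ordering, here is a rough sketch. It assumes a 64-bit prime field with modulus 2^64 - 2^32 + 1, rows ordered by (address, clock cycle), and strictly increasing clock cycles within one address; the `Access` type and all function names are hypothetical and only illustrate the idea. If two rows are out of order, the field subtraction wraps around to a huge value that cannot pass a 32-bit range check.

```rust
/// Assumed field modulus for this sketch: 2^64 - 2^32 + 1.
const P: u128 = 0xffff_ffff_0000_0001;

/// Hypothetical memory access row: address and clock cycle.
#[derive(Clone, Copy)]
struct Access {
    addr: u32,
    clk: u32,
}

/// Computes (a - b) mod P, mimicking subtraction of field elements.
fn sub_mod_p(a: u64, b: u64) -> u64 {
    ((a as u128 + P - b as u128) % P) as u64
}

/// Stand-in for a 32-bit range check.
fn fits_u32(x: u64) -> bool {
    x <= u32::MAX as u64
}

/// Accepts the rows only if every consecutive delta passes the range check.
fn is_sorted_by_range_checks(rows: &[Access]) -> bool {
    rows.windows(2).all(|w| {
        let (prev, next) = (w[0], w[1]);
        let delta = if prev.addr == next.addr {
            // Same address: the clock cycle must strictly increase.
            sub_mod_p(next.clk as u64, prev.clk as u64 + 1)
        } else {
            // Different address: the address must strictly increase.
            sub_mod_p(next.addr as u64, prev.addr as u64 + 1)
        };
        fits_u32(delta)
    })
}
```

With this sketch, a correctly ordered list such as `(0, 1), (0, 5), (4, 2)` is accepted, while swapping any two rows produces a wrapped delta close to the modulus and is rejected.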
In this comment, I will summarize the state of the discussion regarding the options for implementing the missing range checks. To recap, the problem we're trying to solve is guaranteeing that the values of the
if
then
We are still exploring 2 possible approaches: either we run the range-checks in the Memory chiplet itself, or on the "caller" side (e.g. when an
Both approaches will require adding an extra auxiliary column for the range-checks, since the current one is already "full" (i.e. the logup constraints are already degree 9). However, @Al-Kindi-0 mentioned that we might be able to optimize the current one such that we wouldn't need this new column.
Next, I will describe how both approaches work in more detail.
Run the range-checks in the Memory chiplet
With this approach, we replace the
Run the range-checks on the caller side
The 2 current known "callers" are the decoder (with instructions
decoder-side
With this approach, we add a column
For those range-checks to carry over to the Memory chiplet, we need to use those range-checked values in the bus message itself. Thus, the
Thus, I believe we also need to commit to
So far we've only discussed those instructions that access a single memory address. However, instructions like
ACE chiplet side
Summary
Below we summarize the pros and cons of each approach
|
Suppose we stack the chiplets as follows:
Then we could reuse the wiring bus for the ACE chiplet for the range checks performed by the memory chiplet. The way we can do this is by defining the bus equation as Here:
The constraints on the above bus will then be:
This brings us to two related questions:
The answer to the first question is implied by that of the second and it will involve adding degree reduction columns to reduce the degree of the selectors. With such a wide chiplet trace, we can afford the degree reduction columns without any issue. This will also mean that adding the extra columns to the memory chiplet should not be an issue. Hence, if we can find a way to somehow remove |
This is an issue covering all the things left to do following #1598.
word_addr column (batch was since then renamed to word_addr)
Ideally, we'd only need a single 32-bit range-check, but having both is at least sufficient for correctness.