PEP: Slot-only tinsel #48
Hi David, yes, I also think this could be interesting without too much effort. We don't have to drop DRAM: we could send a message to the DRAM requesting a 64-byte chunk, for example. It would just be more DMA-like, without caches. This would be moving closer to Jan Gray's style of architecture. And yes, 3-4x may well be doable. On the DE5s we can only have 32 FPUs, though (due to DSP hard block limitations), so there would be one FPU per 8 cores in a 256-core configuration.
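Here is a minimal sketch of the DMA-like access mentioned above. A thread that wants a byte range from DRAM works out which aligned 64-byte chunks cover it and would send one request message per chunk. The helper name `chunk_requests`, the alignment rule, and the one-message-per-chunk protocol are hypothetical, for illustration only; none of them is part of Tinsel.

```python
CHUNK_BYTES = 64  # assumed DRAM request granularity (the 64-byte chunk above)


def chunk_requests(addr: int, length: int, chunk: int = CHUNK_BYTES) -> list[int]:
    """Return the aligned base address of every chunk overlapping [addr, addr + length).

    Hypothetical helper: each returned address would become one request
    message sent to the DRAM, in place of a cache-line fill.
    """
    if length <= 0:
        return []
    first = addr - (addr % chunk)  # round the start down to a chunk boundary
    last = addr + length - 1       # last byte actually needed
    return list(range(first, last + 1, chunk))


# Bytes 100..139 straddle two chunks, so two requests are needed.
print(chunk_requests(100, 40))  # [64, 128]
```

Without caches, the thread has to manage the alignment and request count itself. That is the cost of the DMA-like style.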
We're all adults. Fixed-point FTW!
I should add, there is scope to use multiple issue in the FPUs, i.e. allowing more than one instruction per cycle by, for example, letting a MUL issue in parallel with an ADD. Yep, we could also go with fixed-point, but floating point is a strength of the multi-threaded approach.
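A toy model can show what MUL/ADD dual issue might buy. The sketch below makes four assumptions: issue is in order; each FPU unit accepts at most one instruction per cycle; an instruction cannot issue in the same cycle as the instruction producing one of its operands; and pipeline latency is ignored entirely. The cycle counts are therefore illustrative only. The op encoding and `issue_cycles` are hypothetical.

```python
from typing import Sequence, Tuple

# (unit, destination register, source registers); a hypothetical encoding
Op = Tuple[str, str, Tuple[str, ...]]


def issue_cycles(ops: Sequence[Op]) -> int:
    """Count issue cycles for an in-order stream on a toy dual-issue FPU.

    Each cycle issues ops in program order until one would need a unit
    already used this cycle, read a register written this cycle, or
    write a register already written this cycle.
    """
    cycles = 0
    i = 0
    while i < len(ops):
        cycles += 1
        used_units = set()
        written = set()
        while i < len(ops):
            unit, dest, srcs = ops[i]
            if unit in used_units or dest in written or any(s in written for s in srcs):
                break  # structural or data hazard: defer to the next cycle
            used_units.add(unit)
            written.add(dest)
            i += 1
    return cycles


# a = x*y; b = z+w; c = a+b
stream = [("mul", "a", ("x", "y")), ("add", "b", ("z", "w")), ("add", "c", ("a", "b"))]
print(issue_cycles(stream))  # 2
```

Here the first MUL and ADD are independent and pair up, so the stream takes 2 cycles instead of the 3 a single-issue FPU would need.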
Ah, but surely integer performance is also a strength of more threads... Sorry, it's flippant Friday; I'm not seriously suggesting fixed-point in this case.
(Where TIP is "Tinsel Improvement Proposal", à la PEP.)
The idea of a tinsel variant where devices only have access to the message slots
continues to intrigue, particularly if there is a big advantage in thread count
(I believe 4x was mentioned...).
Currently we have 1024 bytes per thread, and that is enough for a very
carefully designed system. Assume we still have 64-byte messages, and
reserve 8 message slots for send/receive. We then get 1024 bytes for
stack/working space. Assume:
- A 4-vector is 16 bytes, so we can keep hold of 8 4-vectors in that space
  and send 2 4-vectors per message.
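The per-thread budget can be checked with a few lines of arithmetic. This sketch simply restates the figures in the post: 64-byte messages, 8 reserved slots, 1024 bytes of working space, 16-byte 4-vectors, 8 4-vectors held, and 2 4-vectors per message. The variable names are illustrative. Using the spare message bytes for a header is an assumption.

```python
MSG_BYTES = 64        # message size
RESERVED_SLOTS = 8    # message slots reserved for send/receive
WORKING_BYTES = 1024  # stack/working space, as stated in the post
VEC4_BYTES = 16       # one 4-vector, e.g. four 32-bit values
VECS_HELD = 8         # 4-vectors kept in working space
VECS_PER_MSG = 2      # 4-vectors carried per message

slot_bytes = RESERVED_SLOTS * MSG_BYTES        # bytes taken by reserved slots
state_bytes = VECS_HELD * VEC4_BYTES           # bytes of 4-vector state
payload_bytes = VECS_PER_MSG * VEC4_BYTES      # payload bytes per message
spare_msg_bytes = MSG_BYTES - payload_bytes    # left over, e.g. for a header (assumption)
spare_working = WORKING_BYTES - state_bytes    # left for stack and temporaries

print(slot_bytes, state_bytes, payload_bytes, spare_msg_bytes, spare_working)
# 512 128 32 32 896
```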
So I think we could do a reasonably interesting 2x2x2 agglomerated finite-volume scheme
and soak up the extra cores quite well.
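To see how 2 4-vectors per message fits a 2x2x2 agglomeration, the sketch below counts the messages one thread would send per step. It assumes each thread owns an n x n x n block of cells, each cell carries one 4-vector (for n = 2 that is the 8 4-vectors mentioned above), and boundary values go only to the 6 face neighbours. The face-only stencil and the function name are assumptions; the post does not specify them.

```python
import math


def face_messages_per_step(n: int = 2, vecs_per_msg: int = 2, faces: int = 6) -> int:
    """Messages one thread sends per step for an n x n x n block of cells.

    Assumes a face-neighbour stencil: each of the block's faces exposes
    n * n boundary cells, each contributing one 4-vector.
    """
    cells_per_face = n * n
    msgs_per_face = math.ceil(cells_per_face / vecs_per_msg)
    return faces * msgs_per_face


print(face_messages_per_step())  # 12
```

For a 2x2x2 block, this gives 2 messages per face and 12 per step, each carrying 2 4-vectors.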
Not saying we must do it, but the idea of quadrupling the core count is very interesting,
even at the expense of DRAM.