PEP: Slot-only tinsel #48

Open
m8pple opened this issue Apr 20, 2018 · 4 comments
@m8pple
Contributor

m8pple commented Apr 20, 2018

(Where TIP is "Tinsel Improvement Proposal", à la PEP)

The idea of a tinsel variant where devices only have access to the slots
continues to intrigue, particularly if there is a big advantage in thread-count
(I believe 4x was mentioned...).

Currently we have 1024 bytes per thread, and that is enough for a very
carefully designed system. Assume we still have 64-byte messages, and
reserve 8 message slots (512 bytes) for send/receive. We then get 512 bytes for
stack/working space. Assume:

  • 256 bytes for "stack" (a misnomer at this scale)
  • 128 bytes for local connectivity (topology info)
  • 128 bytes for state.
    A 4-vector is 16 bytes, so we can keep hold of 8 4-vectors in that space,
    and send 2 4-vectors per message.
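
As a quick check of the arithmetic, here is a minimal sketch of the budget. The sizes are taken from the text above; the names (`PER_THREAD_BYTES`, `WORKING_BUDGET`, and so on) are illustrative only and are not part of tinsel. Subtracting the eight reserved 64-byte slots from the 1024 bytes per thread leaves 512 bytes, which the three items listed above fill exactly.

```python
# Back-of-envelope budget for a slot-only thread.
# Sizes come from the proposal text; all names here are hypothetical.
PER_THREAD_BYTES = 1024   # storage available to each thread
MESSAGE_BYTES = 64        # size of one message slot
RESERVED_SLOTS = 8        # slots kept aside for send/receive
VEC4_BYTES = 16           # one 4-vector, as stated in the text

# Proposed split of whatever is left after the reserved slots.
WORKING_BUDGET = {"stack": 256, "connectivity": 128, "state": 128}


def working_bytes():
    """Bytes left for stack/working space after reserving message slots."""
    return PER_THREAD_BYTES - RESERVED_SLOTS * MESSAGE_BYTES


def vectors_in_state():
    """How many 4-vectors fit in the state region."""
    return WORKING_BUDGET["state"] // VEC4_BYTES


def budget_fits():
    """True if the proposed split does not exceed the working space."""
    return sum(WORKING_BUDGET.values()) <= working_bytes()
```

Calling `working_bytes()` and `vectors_in_state()` reproduces the figures used in the list; `budget_fits()` only confirms that the split does not overrun the remaining space.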

So I think we could do a reasonably interesting 2x2x2 agglomerated finite-volume
and soak up the extra cores quite well.

Not saying we must do it, but the idea of quadrupling the core count is very interesting,
even at the expense of DRAM.

@mn416
Collaborator

mn416 commented Apr 20, 2018

Hi David,

Yes I also think this could be interesting, without too much effort. We don't have to drop DRAM -- we could send a message to the DRAM requesting a 64-byte chunk, for example. It would just be more DMA-like without caches. This would be moving closer to Jan Gray's style of architecture.

And yes, 3-4x may well be doable. On the DE5s we can only have 32 FPUs though (due to DSP hard block limitations), so there would be one FPU per 8 cores in a 256 core configuration.
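
The sharing ratio follows directly from the FPU cap. A minimal sketch, assuming the 32-FPU limit quoted above and a hypothetical helper name `cores_per_fpu`:

```python
# FPU sharing under a fixed FPU cap.
# The cap of 32 is the DE5 figure quoted above; the helper name is hypothetical.
FPUS_AVAILABLE = 32


def cores_per_fpu(cores, fpus=FPUS_AVAILABLE):
    """Cores that must share each FPU, rounded up (ceiling division)."""
    return -(-cores // fpus)
```

For the 256-core configuration, `cores_per_fpu(256)` gives the one-FPU-per-8-cores figure mentioned above.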

@m8pple
Contributor Author

m8pple commented Apr 20, 2018

We're all adults. Fixed-point FTW!

@mn416
Collaborator

mn416 commented Apr 20, 2018

I should add, there is scope to use multiple-issue in the FPUs, i.e. allow more than one instruction per cycle by allowing, for example, MUL to be issued in parallel with ADD.

Yep, could also go with fixed-point, but floating point is a strength of the multi-threaded approach.

@m8pple
Contributor Author

m8pple commented Apr 20, 2018

Ah, but surely integer performance is also a strength of more threads...

Sorry, it's flippant Friday - not seriously suggesting fixed-point in this case.

@m8pple m8pple changed the title TIP: Slot-only tinsel PEP: Slot-only tinsel Apr 24, 2018