
Flexible neuron, synapse data layout #195

Closed
rcoreilly opened this issue Apr 14, 2023 · 6 comments
@rcoreilly
Member

This seems very straightforward, now that Neurons and Syns are all just giant arrays of structs on the Network:

  • Neurons and Syns are just single giant []float32 arrays (likewise on GPU).
  • Vars are just enums (put desc tooltips into separate array of strings)
  • Access is via methods that take Context as the first arg (it is nearly ubiquitous already), plus indexes for layer, neuron, variable, etc. In CPU mode, Context holds a Network index into a global list of networks, which in turn gives access to these arrays. On GPU, the accessor methods just access the global arrays defined in the kernel.
  • It is then entirely trivial to reorganize the memory layout any which way.
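The vars-as-enums idea above can be sketched as follows. This is a minimal illustration, not the actual axon code: the var names (Spike, Ge, Gi) and the parallel NeuronVarDescs tooltip array are hypothetical stand-ins.

```go
package main

import "fmt"

// NeuronVars is a hypothetical enum of per-neuron variables.
type NeuronVars int32

const (
	Spike NeuronVars = iota
	Ge
	Gi
	NeuronVarsN
)

// NeuronVarDescs holds desc tooltips in a separate parallel array,
// keeping the enum itself a plain integer usable as an index.
var NeuronVarDescs = [NeuronVarsN]string{
	"whether the neuron spiked this cycle",
	"excitatory conductance",
	"inhibitory conductance",
}

func main() {
	nNeurons := 4
	// one giant flat []float32: all vars for all neurons
	neurons := make([]float32, nNeurons*int(NeuronVarsN))
	// set Ge for neuron 2 in a neuron-major, var-inner layout
	neurons[2*int(NeuronVarsN)+int(Ge)] = 0.5
	fmt.Println(neurons[2*int(NeuronVarsN)+int(Ge)], NeuronVarDescs[Ge])
}
```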
@rcoreilly
Member Author

  • Context is shared between GPU and CPU and can't contain pointers, hence the need for the network index. Could alternatively have a pointer that is meaningless to the GPU -- it isn't needed there anyway.
  • Also, one of the indexes can be a data index, so data-parallel copies can be interleaved in sequence.
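The data-index interleaving can be sketched with a hypothetical Strides struct (the field names here are illustrative): with the data stride innermost, the NData data-parallel copies of a given variable on a given neuron sit adjacent in memory.

```go
package main

import "fmt"

// Strides is a hypothetical set of layout strides; making NeuronData
// the innermost (stride 1) dimension interleaves data-parallel copies.
type Strides struct {
	Neuron     int32 // stride between neurons
	NeuronVar  int32 // stride between variables
	NeuronData int32 // stride between data-parallel copies
}

// Idx computes a flat index from the strides.
func Idx(s *Strides, neurIdx, dataIdx, nvar int32) int32 {
	return nvar*s.NeuronVar + neurIdx*s.Neuron + dataIdx*s.NeuronData
}

func main() {
	nVars, nData := int32(3), int32(2)
	s := &Strides{Neuron: nVars * nData, NeuronVar: nData, NeuronData: 1}
	// the two data copies of var 1 on neuron 0 are adjacent in memory:
	fmt.Println(Idx(s, 0, 0, 1), Idx(s, 0, 1, 1)) // 2 3
}
```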

@rcoreilly
Member Author

More context: the GPU has to access global arrays directly, which are bound per kernel, so we need global accessor functions that are defined differently on CPU vs. GPU:

func NeuronVarIdx(ctx *Context, neurIdx, dataIdx int32, nvar NeuronVars) int32 {
    return int32(nvar)*ctx.Strides.NeuronVar + neurIdx*ctx.Strides.Neuron + dataIdx*ctx.Strides.NeuronData
}

// CPU version
func NeuronVar(ctx *Context, neurIdx, dataIdx int32, nvar NeuronVars) float32 {
    return Networks[ctx.NetIdx].Neurons[NeuronVarIdx(ctx, neurIdx, dataIdx, nvar)]
}

// GPU version (HLSL)
[[vk::binding(1, 2)]] RWStructuredBuffer<float> Neurons;
float NeuronVar(in Context ctx, int neurIdx, int dataIdx, int nvar) {
    return Neurons[NeuronVarIdx(ctx, neurIdx, dataIdx, nvar)];
}

Have a separate SetNeuronVar, and versions that go via layer if needed (layer has starting global neuron index), etc.

Can just switch out NeuronVarIdx functions to see impact of different layouts.
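Swapping out the index function can be sketched like this. Both layouts below are hypothetical examples: neuron-major keeps all vars of one neuron contiguous; var-major keeps all neurons' values of one var contiguous. Since both are bijections onto the same flat range, only the accessor changes, never the callers.

```go
package main

import "fmt"

// neuron-major: all vars for a neuron are contiguous
func IdxNeuronMajor(nNeurons, nVars, nData, neurIdx, dataIdx, nvar int32) int32 {
	return neurIdx*nVars*nData + nvar*nData + dataIdx
}

// var-major: all neurons' values of one var are contiguous
func IdxVarMajor(nNeurons, nVars, nData, neurIdx, dataIdx, nvar int32) int32 {
	return nvar*nNeurons*nData + neurIdx*nData + dataIdx
}

func main() {
	// both layouts cover 0..nNeurons*nVars*nData-1 exactly once
	nN, nV, nD := int32(2), int32(3), int32(2)
	seen := make(map[int32]bool)
	for n := int32(0); n < nN; n++ {
		for v := int32(0); v < nV; v++ {
			for d := int32(0); d < nD; d++ {
				seen[IdxVarMajor(nN, nV, nD, n, d, v)] = true
			}
		}
	}
	fmt.Println(len(seen)) // 12
}
```

Benchmarking then amounts to rebuilding with a different index function and re-running the same model.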

@rcoreilly
Member Author

The non-float32 vars (flags, indexes) in Neuron would be stored separately, so nothing is complicated by type conversions. Also, SynCa would be separated from Synapses memory, per #168
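A minimal sketch of the separate storage, assuming hypothetical flag names: the integer state lives in its own flat []uint32 array parallel to the []float32 Neurons array, so no bit-casting between float and int is ever needed.

```go
package main

import "fmt"

// NeuronFlags is a hypothetical bitflag type for per-neuron boolean state.
type NeuronFlags uint32

const (
	NeuronOff NeuronFlags = 1 << iota
	NeuronHasExt
)

// HasFlag reports whether flag f is set in flags.
func HasFlag(flags uint32, f NeuronFlags) bool {
	return flags&uint32(f) != 0
}

func main() {
	nNeurons := 3
	neurons := make([]float32, nNeurons) // float vars (1 var here for brevity)
	flags := make([]uint32, nNeurons)    // non-float vars, stored separately

	flags[1] |= uint32(NeuronHasExt)
	neurons[1] = 0.8
	fmt.Println(HasFlag(flags[1], NeuronHasExt), neurons[1]) // true 0.8
}
```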

@rcoreilly
Member Author

Context requires data parallel state for all the PVLV, NeuroMod stuff.

@rcoreilly
Member Author

Plan: add a GlobalVars enum in globals.go, with the memory stored in Network, holding all the NeuroMod and PVLV state. Storage can be parameterized by NDrives with computed offsets etc for flexibility; maybe have an offset lookup table for each val (above a given enum value). Use data inner-loop indexing for each var. GPU just exposes the Globals directly as usual.
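The NDrives-parameterized offsets can be sketched as below. The var names (GvRew, GvDA, GvDrives) and the GlobalIdx helper are hypothetical: vars at or above GvDrives get NDrives copies, scalar vars below it get one, and each var uses data-parallel inner-loop indexing.

```go
package main

import "fmt"

// GlobalVars is a hypothetical enum; vars >= GvDrives are per-drive.
type GlobalVars int32

const (
	GvRew GlobalVars = iota
	GvDA
	GvDrives // first per-drive var
	GlobalVarsN
)

// Context carries the sizes needed to compute offsets.
type Context struct {
	NDrives int32
	NData   int32
}

// GlobalIdx computes the flat index into a Network.Globals []float32,
// with the data-parallel index innermost for each var.
func GlobalIdx(ctx *Context, dataIdx int32, gvar GlobalVars, drive int32) int32 {
	if gvar < GvDrives {
		return int32(gvar)*ctx.NData + dataIdx
	}
	scalarN := int32(GvDrives) * ctx.NData        // storage used by scalar vars
	perDrive := int32(gvar) - int32(GvDrives)     // which per-drive var
	return scalarN + (perDrive*ctx.NDrives+drive)*ctx.NData + dataIdx
}

func main() {
	ctx := &Context{NDrives: 4, NData: 2}
	fmt.Println(GlobalIdx(ctx, 1, GvDA, 0))     // 3
	fmt.Println(GlobalIdx(ctx, 0, GvDrives, 2)) // 8
}
```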

@rcoreilly
Member Author

This worked as expected and massively improves GPU performance, and even CPU performance is significantly improved in NData > 1 cases.
