
Flexible neuron, synapse data layout #195

Closed
rcoreilly opened this issue Apr 14, 2023 · 6 comments
@rcoreilly
Member

This seems very straightforward, now that Neurons and Syns are all just giant arrays of structs on the Network:

  • Neurons and Syns are just single giant []float32 arrays (likewise on GPU).
  • Vars are just enums (put desc tooltips into separate array of strings)
  • Access is via methods that take Context as the first arg (it is nearly ubiquitous already), plus indexes for layer, neuron, variable, etc. In CPU mode, Context holds a Network index into a global list of networks, which in turn gives access to these arrays. On GPU, the accessor methods just access the global arrays defined in the kernel.
  • It is then entirely trivial to reorganize the memory layout any which way.
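The vars-as-enums idea above can be sketched as follows. This is a minimal illustration, not the actual axon code: the var names (Spike, Ge, Gi) and the parallel NeuronVarDescs tooltip array are hypothetical stand-ins.

```go
package main

import "fmt"

// NeuronVars is a hypothetical enum of per-neuron variables.
type NeuronVars int32

const (
	Spike NeuronVars = iota
	Ge
	Gi
	NeuronVarsN
)

// NeuronVarDescs holds desc tooltips in a separate parallel array,
// keeping the enum itself a plain integer usable as an index.
var NeuronVarDescs = [NeuronVarsN]string{
	"whether the neuron spiked this cycle",
	"excitatory conductance",
	"inhibitory conductance",
}

func main() {
	nNeurons := 4
	// one giant flat []float32: all vars for all neurons
	neurons := make([]float32, nNeurons*int(NeuronVarsN))
	// set Ge for neuron 2 in a neuron-major, var-inner layout
	neurons[2*int(NeuronVarsN)+int(Ge)] = 0.5
	fmt.Println(neurons[2*int(NeuronVarsN)+int(Ge)], NeuronVarDescs[Ge])
}
```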
@rcoreilly
Member Author

  • Context is shared between GPU and CPU and can't contain pointers, hence the need for the network index. Could alternatively have a pointer that is meaningless to the GPU -- it isn't needed there anyway.
  • Also, one of the indexes can be a data index, so data-parallel copies can be interleaved in sequence.
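The data-index interleaving can be sketched with a hypothetical Strides struct (the field names here are illustrative): with the data stride innermost, the NData data-parallel copies of a given variable on a given neuron sit adjacent in memory.

```go
package main

import "fmt"

// Strides is a hypothetical set of layout strides; making NeuronData
// the innermost (stride 1) dimension interleaves data-parallel copies.
type Strides struct {
	Neuron     int32 // stride between neurons
	NeuronVar  int32 // stride between variables
	NeuronData int32 // stride between data-parallel copies
}

// Idx computes a flat index from the strides.
func Idx(s *Strides, neurIdx, dataIdx, nvar int32) int32 {
	return nvar*s.NeuronVar + neurIdx*s.Neuron + dataIdx*s.NeuronData
}

func main() {
	nVars, nData := int32(3), int32(2)
	s := &Strides{Neuron: nVars * nData, NeuronVar: nData, NeuronData: 1}
	// the two data copies of var 1 on neuron 0 are adjacent in memory:
	fmt.Println(Idx(s, 0, 0, 1), Idx(s, 0, 1, 1)) // 2 3
}
```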

@rcoreilly
Member Author

More context: the GPU has to access global arrays directly, which are bound per kernel, so we need global accessor functions that are defined differently on CPU vs. GPU:

func NeuronVarIdx(ctx *Context, neurIdx, dataIdx int32, nvar NeuronVars) int32 {
    return int32(nvar)*ctx.Strides.NeuronVar + neurIdx*ctx.Strides.Neuron + dataIdx*ctx.Strides.NeuronData
}

// CPU version
func NeuronVar(ctx *Context, neurIdx, dataIdx int32, nvar NeuronVars) float32 {
    return Networks[ctx.NetIdx].Neurons[NeuronVarIdx(ctx, neurIdx, dataIdx, nvar)]
}

// GPU version (HLSL)
[[vk::binding(1, 2)]] RWStructuredBuffer<float> Neurons;
float NeuronVar(in Context ctx, int neurIdx, int dataIdx, int nvar) {
    return Neurons[NeuronVarIdx(ctx, neurIdx, dataIdx, nvar)];
}

Have a separate SetNeuronVar, and versions that go via layer if needed (layer has starting global neuron index), etc.

Can just switch out NeuronVarIdx functions to see impact of different layouts.
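Swapping out the index function can be sketched like this. Both layouts below are hypothetical examples: neuron-major keeps all vars of one neuron contiguous; var-major keeps all neurons' values of one var contiguous. Since both are bijections onto the same flat range, only the accessor changes, never the callers.

```go
package main

import "fmt"

// neuron-major: all vars for a neuron are contiguous
func IdxNeuronMajor(nNeurons, nVars, nData, neurIdx, dataIdx, nvar int32) int32 {
	return neurIdx*nVars*nData + nvar*nData + dataIdx
}

// var-major: all neurons' values of one var are contiguous
func IdxVarMajor(nNeurons, nVars, nData, neurIdx, dataIdx, nvar int32) int32 {
	return nvar*nNeurons*nData + neurIdx*nData + dataIdx
}

func main() {
	// both layouts cover 0..nNeurons*nVars*nData-1 exactly once
	nN, nV, nD := int32(2), int32(3), int32(2)
	seen := make(map[int32]bool)
	for n := int32(0); n < nN; n++ {
		for v := int32(0); v < nV; v++ {
			for d := int32(0); d < nD; d++ {
				seen[IdxVarMajor(nN, nV, nD, n, d, v)] = true
			}
		}
	}
	fmt.Println(len(seen)) // 12
}
```

Benchmarking then amounts to rebuilding with a different index function and re-running the same model.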

@rcoreilly
Member Author

The non-float32 vars (flags, indexes) in Neuron would be stored separately, so nothing is complicated by type conversions. Also, SynCa would be separated from Synapses memory, per #168
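A minimal sketch of the separate storage, assuming hypothetical flag names: the integer state lives in its own flat []uint32 array parallel to the []float32 Neurons array, so no bit-casting between float and int is ever needed.

```go
package main

import "fmt"

// NeuronFlags is a hypothetical bitflag type for per-neuron boolean state.
type NeuronFlags uint32

const (
	NeuronOff NeuronFlags = 1 << iota
	NeuronHasExt
)

// HasFlag reports whether flag f is set in flags.
func HasFlag(flags uint32, f NeuronFlags) bool {
	return flags&uint32(f) != 0
}

func main() {
	nNeurons := 3
	neurons := make([]float32, nNeurons) // float vars (1 var here for brevity)
	flags := make([]uint32, nNeurons)    // non-float vars, stored separately

	flags[1] |= uint32(NeuronHasExt)
	neurons[1] = 0.8
	fmt.Println(HasFlag(flags[1], NeuronHasExt), neurons[1]) // true 0.8
}
```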

@rcoreilly
Member Author

Context requires data parallel state for all the PVLV, NeuroMod stuff.

@rcoreilly
Member Author

Plan: add a GlobalVars enum in globals.go, with the memory stored in Network, holding all the NeuroMod and PVLV state. Storage can be parameterized by NDrives with computed offsets etc for flexibility; maybe have an offset lookup table for each val (above a given enum value). Use data inner-loop indexing for each var. GPU just exposes the Globals directly as usual.
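The NDrives-parameterized offsets can be sketched as below. The var names (GvRew, GvDA, GvDrives) and the GlobalIdx helper are hypothetical: vars at or above GvDrives get NDrives copies, scalar vars below it get one, and each var uses data-parallel inner-loop indexing.

```go
package main

import "fmt"

// GlobalVars is a hypothetical enum; vars >= GvDrives are per-drive.
type GlobalVars int32

const (
	GvRew GlobalVars = iota
	GvDA
	GvDrives // first per-drive var
	GlobalVarsN
)

// Context carries the sizes needed to compute offsets.
type Context struct {
	NDrives int32
	NData   int32
}

// GlobalIdx computes the flat index into a Network.Globals []float32,
// with the data-parallel index innermost for each var.
func GlobalIdx(ctx *Context, dataIdx int32, gvar GlobalVars, drive int32) int32 {
	if gvar < GvDrives {
		return int32(gvar)*ctx.NData + dataIdx
	}
	scalarN := int32(GvDrives) * ctx.NData        // storage used by scalar vars
	perDrive := int32(gvar) - int32(GvDrives)     // which per-drive var
	return scalarN + (perDrive*ctx.NDrives+drive)*ctx.NData + dataIdx
}

func main() {
	ctx := &Context{NDrives: 4, NData: 2}
	fmt.Println(GlobalIdx(ctx, 1, GvDA, 0))     // 3
	fmt.Println(GlobalIdx(ctx, 0, GvDrives, 2)) // 8
}
```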

@rcoreilly
Member Author

This worked as expected and massively improves GPU performance, and even CPU performance is significantly improved in NData > 1 cases.
