v0.1.0
Added GPU acceleration and f32 versions of certain structs and traits.
Some things that still need to be done:
- write a macro that generates the f32 and f64 versions of certain structs and traits, to avoid duplicated code.
- make the 'get' methods return slices instead of copies of the vectors, to avoid duplicating data in RAM and to save as much memory as possible for very large models.
- improve the GPU shaders, perhaps by finding a way to send the full unflattened matrices to the GPU instead of just a flattened array.
- create GPU-accelerated activation and loss functions, so that everything is GPU accelerated.
- perhaps write a shader that computes the gradient (derivatives) of the model loss with respect to the outputs.
- implement convolutional layers and perhaps solve some image classification problems in an example.
- add an example that uses GPU acceleration.
So there is still a lot of work to do.
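
As a rough illustration of the first two items, the sketch below uses one macro_rules! macro to stamp out an f32 and an f64 version of the same struct, with 'get' methods that borrow slices instead of cloning vectors. All names here (impl_dense, DenseF32, DenseF64, get_weights, get_row) are hypothetical and are not taken from the crate; the real structs and traits will likely need more than this.

```rust
// Hypothetical sketch: none of these names come from the crate itself.
// One macro generates the same struct and methods for each float type.
macro_rules! impl_dense {
    ($name:ident, $float:ty) => {
        pub struct $name {
            // Weights stored as a flattened, row-major array.
            weights: Vec<$float>,
            inputs: usize,
            outputs: usize,
        }

        impl $name {
            pub fn new(inputs: usize, outputs: usize) -> Self {
                Self {
                    weights: vec![0.0; inputs * outputs],
                    inputs,
                    outputs,
                }
            }

            /// Returns (inputs, outputs).
            pub fn shape(&self) -> (usize, usize) {
                (self.inputs, self.outputs)
            }

            /// Borrows all weights as a slice; nothing is copied.
            pub fn get_weights(&self) -> &[$float] {
                &self.weights
            }

            /// Borrows the weights feeding one output neuron.
            /// Panics if `output` is out of range.
            pub fn get_row(&self, output: usize) -> &[$float] {
                let start = output * self.inputs;
                &self.weights[start..start + self.inputs]
            }
        }
    };
}

// The only per-type code left is one macro call each.
impl_dense!(DenseF32, f32);
impl_dense!(DenseF64, f64);
```

With this shape, DenseF32::new(3, 2).get_weights() borrows a slice of six values without allocating a second vector, and adding another float type is one more macro call.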