spnn

description

  • spnn: simple parallelized neural network.
  • A comparison of fully connected network (forward and backward propagation) implementations.
  • The implementations are:
    1. CPU, single thread.
    2. CPU, multiple threads using OpenMP.
    3. GPU, single thread using CUDA.
    4. GPU, multiple threads using CUDA.
    5. OpenBLAS.
  • The task is digit classification on the MNIST dataset.
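To make the comparison concrete, here is a minimal sketch of the forward pass for one fully connected layer. It is not taken from the spnn sources: the function name dense_forward, the row-major weight layout and the sigmoid activation are all assumptions made for illustration. The outer loop is the part that the multi-threaded variants would split across workers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the spnn sources.
// Computes out = sigmoid(W * in + b) for one fully connected layer.
// W is assumed to be row-major with shape (n_out x n_in).
std::vector<float> dense_forward(const std::vector<float>& W,
                                 const std::vector<float>& b,
                                 const std::vector<float>& in) {
    const std::size_t n_in = in.size();
    const int n_out = static_cast<int>(b.size());
    assert(W.size() == n_in * b.size());

    std::vector<float> out(b.size());
    // Each output neuron is independent of the others, so the outer loop
    // can be divided among threads. The pragma only takes effect when the
    // file is compiled with OpenMP enabled (e.g. -fopenmp); otherwise the
    // loop runs on a single thread.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int j = 0; j < n_out; ++j) {
        const std::size_t row = static_cast<std::size_t>(j) * n_in;
        float z = b[static_cast<std::size_t>(j)];
        for (std::size_t i = 0; i < n_in; ++i) {
            z += W[row + i] * in[i];
        }
        // Sigmoid activation (an assumption; the README does not name one).
        out[static_cast<std::size_t>(j)] = 1.0f / (1.0f + std::exp(-z));
    }
    return out;
}
```

The same loop body serves as the single-threaded CPU version when OpenMP is disabled. A BLAS-based version would typically replace the two loops with a matrix-vector product.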

code

  • Code is written in C++/CUDA.
  • The OpenMP variant uses the OpenMP library.
  • The OpenBLAS variant uses the OpenBLAS library.
  • include/ contains headers.
  • src/ contains all variant implementations.
  • data/ contains MNIST data.
  • proposal.pdf contains the project proposal.
  • presentation.pdf contains the presentation given at the end of the project.
  • report.pdf contains details, experiments and analysis.
  • Makefile is used to make target executables.

documentation

  • The code is its own documentation; there is no separate reference.

usage

  • Open a terminal in the directory containing Makefile.
  • Use make all to build all targets.
  • The targets are:
    1. cpu_serial.out
    2. cuda_parallel.out
    3. openmp.out
    4. openblas.out
    5. cuda_serial.out
  • To build a specific target use make <target-name>.
  • To remove all targets use make clean.
  • Use ./<target-name> to run a target.

demonstration

  • Accuracy vs. epochs for the fully connected network, irrespective of implementation.

  • Comparison of implementations for a specific model.

  • Time taken vs. parameter size for the different implementations. Note that the curve for the GPU-parallelized variant is nearly flat, close to 0.

roadmap

  • Things to consider during analysis
    • correctness (> 10% accuracy, i.e. better than random guessing over the 10 digit classes)
    • repeatability (nothing fancy)
    • memory check (no memory leaks or other errors under valgrind --tool=memcheck)
  • Initialization is uniform in the range -1 to 1
  • Layers are numbered from 0, i.e. the first hidden layer is layer 1
  • Control size of name field
  • Implement the loss function
  • Remove memory leaks from step_train
  • Batch gradient descent: fix loss decrement and check backprop
  • Normalize
  • Get MNIST data
  • Profile
  • Remove data loading from time taken
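
Two of the roadmap notes above, uniform initialization in -1 to 1 and layers numbered from 0, can be sketched as follows. This is an illustration under stated assumptions, not the spnn code: the function name init_weights, the seed parameter and the flat per-layer storage are hypothetical. Using a fixed seed is one simple way to get the repeatability mentioned in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from the spnn sources.
// sizes[0] is the input layer (layer 0) and sizes[1] the first hidden
// layer (layer 1). Element l-1 of the result holds the
// sizes[l] x sizes[l-1] weights feeding layer l.
std::vector<std::vector<float>> init_weights(
        const std::vector<std::size_t>& sizes, unsigned seed) {
    assert(sizes.size() >= 2);  // need at least an input and an output layer

    std::mt19937 gen(seed);  // fixed seed: same weights on every run
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    std::vector<std::vector<float>> weights;
    for (std::size_t l = 1; l < sizes.size(); ++l) {
        std::vector<float> w(sizes[l] * sizes[l - 1]);
        for (float& x : w) {
            x = dist(gen);  // uniform draw between -1 and 1
        }
        weights.push_back(std::move(w));
    }
    return weights;
}
```

For example, init_weights({784, 16, 10}, 42) would produce two weight blocks for a network with 784 inputs (one per MNIST pixel), a hidden layer of 16 units (an arbitrary size chosen here) and 10 outputs.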