forked from exafmm/exafmm-beta
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
74 lines (66 loc) · 2.17 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
-- MPI --
- benchmark against uniform on kyoto
- match performance of uniform with non-uniform
- merge partition and treeMPI
- LBT MPI <- ICELL is being calculated from localBounds
- Test difficult distributions for partitioning and LET
MPI-3.0 lock all
cycle counter based weights
2:1 refinement for precomputation
non-orthogonal recursive bisection
-- automake --
- Add m4/ax_mpi variants from PVFMM
- Theoretical error bound using kernel.cxx
- Accuracy check in examples
- performance regression, see Chromium project for details
move contents of macros.h to configure.ac
examples/Makefile options to args
unit test for each header file in /unit_test
-- types --
args.h, verify.h from class to namespace
logger.h split to print.h
bodies -> bodies + fields, bodyPos -> bodies, bodyAcc -> fields
AoS, SoA union by Strzodka
Use compressed Cell struct of Bonsai
Remove M, L from Cell struct
-- tree build --
Separate key manipulation namespace, e.g. interleaveMorton(), deinterleaveHilbert() (controllable key_t)
Morton -> Hilbert
-- kernels --
- Define M, L inside kernel namespace and template over P
- Use reinterpret_cast pointers to reference them inside Cell struct
- Complie all P during make -> provide option to make for specific P value
- Helmholtz breaks for very low P
Stokes kernels
Use getIndex for NO_P2P
Kahan + fixed precision
Solid harmonics kernel
hack vecmathlib and import essential features
Calculate Flops
Cycle counter kernel timer
FX10 sin, cos, exp intrinsic
-- driver --
Don't try to instantiate all the classes at the top (wait until useful parameters are ready)
TBB -> OpenMP with atomics -> flush
splitRange defined in both dataset.h and partition.h
UVWX-list with precomputation
Periodic B.C. by one precomputed translation matrix
Teng's BH MAC with M2P option during DTT
Interface with BEM code
charmm2: remove repartition inside Ewald & VdW
-- GPU --
CUDA 6.0 debug
Unify dataset
.h -> .cxx/.cu
MPI Bonsai
Ewald, VdW
Zero softening
-- comparisons --
DTT vs. UVWX-list
Separate +- tree vs. Single tree
Cartesian vs. Spherical vs. Planewave
ORB vs. HOT
OpenMP vs. TBB vs. Cilk vs. MThreads
Geometric vs. Algebraic Mat-Vec
-- documentation --
ipython notebook tutorial (from Andreas)