> [!WARNING]
> Under active development 🔥
# mo3d

mojo 3d library

`mo3d` aims to be a 3d graphics library built from the ground up in Mojo, with minimal external dependencies. It utilizes only a basic subset of SDL2 for rendering, employing parallelized copying of Mojo-rendered scenes directly into texture data. Unlike traditional shader pipelines, `mo3d` aims to rethink rendering by consolidating shaders into Mojo code, while planning for Mojo's future support of various hardware accelerators.

Key features aim to include an experimental ECS-based architecture for efficient scene management, and ray/path tracing as the default rendering method to capitalize on modern GPU hardware with ray tracing cores. Future goals include integrating a minimal physics engine and creating a complete 3d simulation environment akin to a lightweight version of NVIDIA's Omniverse, focused on simplicity, composability and high performance.
`mo3d` could serve as a foundation for game engines, providing an optimized and flexible base for interactive 3d environments. Its potential to support simulation environments also makes it well suited to reinforcement learning (RL) use cases, where optimization is crucial. By allowing render code and neural network forward/backward pass code to be co-compiled and optimized by Mojo, `mo3d` aims to minimize latency in the RL loop. Differentiable rendering techniques could further enable seamless integration between RL agents and rendered environments, facilitating single differentiable passes for tasks such as RL agent training, gradient calculation, and environment interaction.
## Inspirations

- mojo-sdl (this code was mildly edited, but is pretty much verbatim as the base `sdl_window` implementation)
- Ray Tracing in One Weekend / The Next Week / The Rest of Your Life (all the ray tracing core ideas are attributed to these guides)
- three.js (example of a successful 3d library for JavaScript)
- magnum-graphics (example of a C++ middleware library for graphics)
- bevy (example of a lightweight ECS game engine for Rust)
- taichi (great examples of compute on sparse data and differentiable programming)
- openvdb (this could be an interesting integration in the future and/or further inspiration for sparse data representations)
- pbr-book (Physically Based Rendering: From Theory to Implementation, by Matt Pharr, Wenzel Jakob, and Greg Humphreys)
- previous personal project
## Dev

- install `magic`
- see available tasks with `magic run list`
- run the main app with `magic run start`
## Devlog

### 2025-02-03: more optimisations and features from Rob - cuboids/meshes/serialisation/convergence improvements
- Managed to get a basic BVH creation and traversal implementation working using the ECS.
- I need to profile the hotspots and think about improvements: this traversal implementation (on the small scenes I've been testing so far) is actually about 4x slower than the simple hittable-list implementation I was using previously.
- While it's slower for small scenes, for large scenes the traversal cost grows roughly as 4*log2(n) rather than linearly. For example, while it's taking ~8s per frame for ~400 spheres, it's only taking ~22s for ~90K spheres (a rough sanity check of this scaling follows below).
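As a back-of-envelope check of that scaling (assuming per-ray cost is dominated by BVH node visits, which is an assumption, not a measurement):

$$\frac{4\log_2(90000)}{4\log_2(400)} \approx \frac{65.8}{34.6} \approx 1.9$$

So the logarithmic model predicts roughly a 2x slowdown going from ~400 to ~90K spheres, in the same ballpark as the measured 22/8 ≈ 2.75x, whereas a linear hittable list would predict something closer to 225x.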

- ECS `ComponentStore` basic implementation working - getting component data out of it is still a little ugly (the renderer in `camera` is the only system using this component store right now); see the read-side sketch after the code below.
- Building an ECS scene is fairly straightforward; `sphere_scene` shows an ECS migration of the current test scene:
```mojo
# Note: T is DType.float32, dim is 3
var store = ComponentStore[T, dim]()

# Diffuse grey material for the ground
var mat_ground = Material[T, dim](
    Lambertian[T, dim](Color4[T](0.5, 0.5, 0.5))
)

# Large sphere acting as the ground plane
var ground = Sphere[T, dim](1000)
var ground_entity_id = store.create_entity()
_ = store.add_components(
    ground_entity_id,
    Point[T, dim](0, -1000, 0),
    Geometry[T, dim](ground),
    mat_ground,
)
```
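For context, reading data back out of the store currently looks something along these lines (a minimal sketch only; `entity_count` and the `get` accessor are illustrative names, not the actual `ComponentStore` API):

```mojo
# Hypothetical read-side sketch - accessor names are illustrative.
for entity_id in range(store.entity_count()):
    var center = store.get[Point[T, dim]](entity_id)
    var geometry = store.get[Geometry[T, dim]](entity_id)
    var material = store.get[Material[T, dim]](entity_id)
    # ... the renderer would hit-test `geometry` at `center`
    # and shade with `material` ...
```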
### 2024-09-17: basic GUI proof of concept (e.g. rendering text) using PIL (Python interop), taking the raw image output and drawing directly into the main output texture
### 2024-09-13: thinking about a basic ECS module for scene management; finished off some bits from Ray Tracing in One Weekend and started on Ray Tracing: The Next Week!
- Trying to implement a minimal proof of concept for an `SoA` (structure of arrays) `ComponentStore` to hold the state of components.
- Modified the underlying storage for `Vec` and `Mat` to use `InlineArray` rather than heap storage. This should improve performance. I still need to think about how to elegantly improve this `ComponentStore` further with `SIMD` (not sure how to extend `Vec` and `Mat` to use `SIMD` without having problems with wasted space when sizes are not powers of 2, and how to align multiple bits of data); a sketch of the `InlineArray` change follows below.
- Added metallic and dielectric materials.
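For illustration, the `InlineArray` change amounts to something like the sketch below (not the actual `Vec` definition; the `InlineArray` import path and the `inout` syntax have moved between Mojo releases):

```mojo
from utils import InlineArray  # import path varies across Mojo versions

struct Vec[T: DType, dim: Int]:
    # Fixed-size storage that lives inline (e.g. on the stack) instead of
    # a separate heap allocation - cheaper to create, copy and destroy.
    var data: InlineArray[Scalar[T], dim]

    fn __init__(inout self):
        self.data = InlineArray[Scalar[T], dim](0)  # zero-fill

    fn __getitem__(self, i: Int) -> Scalar[T]:
        return self.data[i]

    fn __setitem__(inout self, i: Int, value: Scalar[T]):
        self.data[i] = value
```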
- This replaces the `Makefile`
- The environment setup (installing sdl2) is cached based on a proxy of the local `pixi.toml` file - given the stability of the `sdl2` API, this should be adequate.
- Other commands should do appropriate caching for extra wonderful speediness.
- `Vec4` is now `Vec`
  - Backing storage is directly managed via `UnsafePointer`
- Matrix type `Mat`
  - Backing storage is directly managed via `UnsafePointer` (I initially tried owning `dim` number of `Vec`s; however, I ended up struggling to convince Mojo not to prematurely deallocate them. So instead, `Mat` now carves out its own memory and copies to and from `Vec` when required - see the sketch below.)
- Arcball camera implementation - many thanks to this article!

arcball.mov
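The `Mat` workaround looks roughly like this (a sketch of the description above, not the real implementation; `row` is an illustrative accessor name, and it reuses the `Vec` sketch from the 2024-09-13 entry above):

```mojo
from memory import UnsafePointer

struct Mat[T: DType, dim: Int]:
    # Mat owns one flat dim*dim buffer instead of owning `dim` Vecs,
    # sidestepping the premature-deallocation issue described above.
    var data: UnsafePointer[Scalar[T]]

    fn __init__(inout self):
        self.data = UnsafePointer[Scalar[T]].alloc(dim * dim)
        for i in range(dim * dim):
            self.data[i] = 0

    fn __del__(owned self):
        self.data.free()

    fn row(self, i: Int) -> Vec[T, dim]:
        # Copy out into a Vec when required (copying back in is analogous).
        var v = Vec[T, dim]()
        for j in range(dim):
            v[j] = self.data[i * dim + j]
        return v
```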
### 2024-08-27: refactoring ray, adding hittables, basic ray multi-sampling to update a progressive texture
- Struggling to get a proper generic/runtime-polymorphic hittable implementation working.
- Couple of concrete/leaky dependencies in the `ray_color`/`HittableList` implementations.
- Added SIMD/generic `Interval` implementation.
- Added camera implementation.
- Added basic diffuse Lambertian material.
- Replaced the Tensor with my own `UnsafePointer` texture state implementation.
- Progressive rendering to the texture state: rather than taking multiple samples in a single pass, the image re-renders and accumulates samples across frames, which keeps the frame time at around `10 ms` on a Mac M3.
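The progressive part is just a per-pixel running average, along these lines (illustrative sketch, not the exact kernel code; `n` would come from a per-pixel or global sample counter):

```mojo
alias Color = SIMD[DType.float32, 4]

fn accumulate(avg: Color, sample: Color, n: Int) -> Color:
    # `avg` holds the mean of the first `n` samples for this pixel;
    # return the mean after folding in one more sample.
    var nf = Color(Float32(n))  # splat n across the 4 channels
    return (avg * nf + sample) / (nf + 1.0)
```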
- Took longer than I would have liked to track down the mysterious/non-deterministic corrupted data being rendered in the main loop
  - The solution was to signal to Mojo that variables captured/referenced within the render kernel should not be deleted until after the main loop (the `_ = ptr` pattern mentioned below)
- Finally have the basic ray-shooting background from Ray Tracing in One Weekend
- Stats: CPU `Ryzen 7 5800X 8-Core`, window size `800x450`; average compute time (shoot rays) of `0.80 ms` and average redraw time (copy tensor to GPU texture) of `3.03 ms`
- Had to remove the SIMD stuff from redraw, as we can't be sure of the byte alignment of the texture data, which is managed memory from SDL2.
- Had to ensure that Mojo didn't attempt to tidy up/mess with the `UnsafePointer`s before `SDL_UnlockTexture` was called (using the `_ = ptr` pattern/hack)
- We have Mojo CPU-parallelized (for each row) operations directly on the SDL2 texture data (`STREAMING` type)
- Parallelized row texture update brings redraw time down to ~1.5 ms (~4 ms without parallelization)
- We can use this approach to quickly move (in the future) a Mojo-managed tensor (hopefully on GPU) containing our view of the world into SDL2's texture being rendered in a window (e.g. in ~1.5 ms); a sketch of the pattern follows below
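A minimal sketch of the pattern (assumptions: the real code writes into the pointer returned by `SDL_LockTexture`, which is replaced here by our own allocation; `parallelize` is from Mojo's `algorithm` module):

```mojo
from algorithm import parallelize
from memory import UnsafePointer

fn main():
    alias width = 800
    alias height = 450
    alias channels = 4
    # Stand-in for SDL2's locked STREAMING texture memory.
    var texture = UnsafePointer[UInt8].alloc(width * height * channels)

    @parameter
    fn copy_row(y: Int):
        # Each worker fills one row of interleaved RGBA bytes.
        var row = texture.offset(y * width * channels)
        for x in range(width):
            row[x * channels + 0] = 255  # R
            row[x * channels + 1] = 0    # G
            row[x * channels + 2] = 0    # B
            row[x * channels + 3] = 255  # A

    parallelize[copy_row](height)

    # The `_ = ptr` keep-alive hack: stop Mojo's ASAP destruction from
    # freeing `texture` before the workers (and, in the real code,
    # SDL_UnlockTexture) are done with it.
    _ = texture
    texture.free()
```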
- Using SIMD interleaving on the 3rd dimension (`channels`) in tensor `t`; see the sketch below
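Something along these lines (sketch only; the vector-store method has been renamed across Mojo versions, e.g. `simd_store` vs `store`, so treat the call as indicative):

```mojo
from memory import UnsafePointer

fn main():
    alias width = 8
    alias height = 2
    var t = UnsafePointer[UInt8].alloc(width * height * 4)
    var rgba = SIMD[DType.uint8, 4](255, 0, 0, 255)
    for i in range(width * height):
        # One vector store writes the 4 interleaved channels (R,G,B,A)
        # of a pixel together, instead of 4 scalar writes.
        t.store(i * 4, rgba)
    t.free()
```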
- Basic window rendering on linux (within a vscode devcontainer on windows) and mac
- Basic kernel; however, I need to refine the vectorized worker code that sets the pixel state (tensor `t`)