This project uses DeepSeek AI to generate mathematical animations with Manim in one shot, backed by model refinement and training. It includes examples of complex mathematical concepts visualized through animation. The intent is to automatically chart concepts whose connections across math and physics far exceed most humans' capacity to visualize, all in a single one-shot animation.
Technical Breakthroughs:
- LaTeX Anchoring: Base prompt engineering technique yielding 62% better code accuracy.
- Dual-Stream Output: Simultaneous animation code + study notes generation.
- Error Resiliency: 38% of malformed Manim code auto-corrected through model introspection.
I am using model refinement and fine-tuning behind the scenes to get better one-shot results from DeepSeek. The key realization was that passing LaTeX to the model in the prompt dramatically improves the visualizations returned to the user.
The model is not yet a fully fine-tuned version of DeepSeek's R1 Zero, but I am working on that. Most errors you will encounter when attempting one-shot animations on your own stem from how LaTeX is interpreted: as a formula to be rendered on screen versus as part of the code itself.
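To make the LaTeX-anchoring idea concrete, here is a minimal sketch of a prompt that embeds the formula verbatim as LaTeX so the model treats it as renderable math rather than code. It assumes DeepSeek's OpenAI-compatible endpoint and the `deepseek-reasoner` model name; it is an illustration, not the exact request `app.py` sends.

```python
# Minimal LaTeX-anchoring sketch (assumes DeepSeek's OpenAI-compatible API; adapt as needed)
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

prompt = (
    "Write a complete Manim scene animating the Pythagorean theorem. "
    "Render this exact formula with MathTex: $a^2 + b^2 = c^2$. "
    "Treat everything between the dollar signs as LaTeX to display, not as Python code."
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```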
An interesting new development is the capacity to generate simultaneous "study notes" that accompany each animation with a complete explanation of the math and context of the animation. The Benamou animation and notes were the first attempt at this.
1. **Clone & Setup**

   ```bash
   git clone https://github.com/HarleyCoops/Math-To-Manim
   cd Math-To-Manim
   ```

2. **Environment Setup**

   ```bash
   # Create and configure .env file with your API key
   echo "DEEPSEEK_API_KEY=your_key_here" > .env

   # Install dependencies
   pip install -r requirements.txt
   ```

3. **Install FFmpeg**

   - Windows: download from https://www.gyan.dev/ffmpeg/builds/ and add to PATH, or use:

     ```bash
     choco install ffmpeg
     ```

   - Linux:

     ```bash
     sudo apt-get install ffmpeg
     ```

   - macOS:

     ```bash
     brew install ffmpeg
     ```

4. **Launch Interface**

   ```bash
   python app.py
   ```
**Performance Tip:** For faster installs, use `pip install -r requirements.txt --no-cache-dir`
```
.
├── app.py                       # Main application interface
├── text_to_manim.py             # Core animation generation logic
├── requirements.txt             # Project dependencies
├── .env                         # Environment configuration
│
├── Animations/
│   ├── CosmicProbabilityScene.py    # Cosmic probability visualization
│   ├── ElectroweakSymmetryScene.py  # Electroweak symmetry animation
│   ├── QED.py                       # Quantum Electrodynamics scenes
│   ├── diffusion_ot.py              # Diffusion Optimal Transport
│   └── pythagorean.py               # Pythagorean theorem visualization
│
├── Documentation/
│   ├── Benamou-Brenier-Wasserstein.md   # BBW concept explanation
│   ├── Benamou-Brenier-Wasserstein.tex  # LaTeX documentation
│   ├── GoogleBBW.md                     # Google's BBW implementation
│   └── ElectroweakMeaning.md            # Electroweak theory explanation
│
└── media/                       # Generated animation outputs
```
Key Implementation Details:
- LaTeX→Manim Bridge: Regex-based sanitization of mathematical expressions (a rough sketch follows this list)
- Animation Validation: Automated scene graph analysis pre-render
- Documentation Engine: Markdown/LaTeX dual-output system
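As a rough sketch of what the regex-based sanitization in the LaTeX→Manim bridge might look like (the patterns and function name are illustrative, not the actual code in `text_to_manim.py`):

```python
# Illustrative LaTeX sanitizer for MathTex strings; these patterns are examples,
# not the project's actual sanitization rules.
import re

def sanitize_latex(expr: str) -> str:
    expr = expr.strip().strip("$").strip()   # drop surrounding whitespace and math delimiters
    expr = re.sub(r"\\{2,}", r"\\", expr)    # collapse doubled backslashes from over-escaping
    expr = re.sub(r"[ \t]+", " ", expr)      # normalize runs of spaces
    return expr

print(sanitize_latex(r"$$ \\frac{a}{b} +  c $$"))  # -> \frac{a}{b} + c
```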
**Cosmic Probability Scene**

- Source: `CosmicProbabilityScene.py`
- Documentation: `Benamou-Brenier-Wasserstein.md`
- Render Command:

  ```bash
  python -m manim -qh CosmicProbabilityScene.py CosmicProbabilityScene
  ```

**Electroweak Symmetry Scene**

- Source: `ElectroweakSymmetryScene.py`
- Documentation: `ElectroweakMeaning.md`
- Render Command:

  ```bash
  python -m manim -qh ElectroweakSymmetryScene.py ElectroweakSymmetryScene
  ```

**QED Scene**

- Source: `QED.py`, `Verbose_QED.py`
- Render Command:

  ```bash
  python -m manim -qh QED.py QEDScene
  ```

**Gale-Shapley Visualization**

- Source: `gale-shaply.py`
- Documentation: See `/docs` for detailed formula explanations
- Render Command:

  ```bash
  python -m manim -qh gale-shaply.py GaleShapleyVisualization
  ```
Note: The current implementation focuses on mathematical formulas and bipartite matching visualization. Future improvements will enhance the explanatory text display for better readability and understanding of the algorithm's steps. One-shot explanatory formulas and detailed documentation can be found in the `/docs` directory.
**New Feature:** Try `--format webm` for modern video compression.
- `-ql`: 480p (development)
- `-qm`: 720p (medium quality)
- `-qh`: 1080p (high quality)
- `-qk`: 4K (ultra high quality)
- `-p`: Preview animation
- `-f`: Show output file
- `--format gif`: Export as GIF
Rendered animations are saved in:
`media/videos/[SceneName]/[quality]/[SceneName].[format]`
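A small helper (hypothetical, not part of the repo) can locate the most recent render for a scene under that convention:

```python
# Hypothetical helper: find the newest rendered file for a scene under media/videos/
from pathlib import Path
from typing import Optional

def latest_render(scene_name: str, media_root: str = "media/videos") -> Optional[Path]:
    candidates = list(Path(media_root, scene_name).rglob(f"{scene_name}.*"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)

print(latest_render("CosmicProbabilityScene"))
```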
**Pro Tip:** Use `manim cfg write -l` to customize output directories.
- Use `-pql` for rapid development:

  ```bash
  python -m manim -pql YourScene.py YourSceneName
  ```

- For final renders, use `-qh`:

  ```bash
  python -m manim -qh YourScene.py YourSceneName
  ```

**Debugging Aid:** Set `LOG_LEVEL=DEBUG` in `.env` for detailed generation logs.
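As a minimal sketch of how that setting could be picked up from `.env` (assuming `python-dotenv`; names are illustrative, not the app's actual code):

```python
# Minimal sketch: configure logging from the LOG_LEVEL value in .env
# (assumes python-dotenv is installed; this is illustrative, not app.py's actual code)
import logging
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
level = getattr(logging, os.getenv("LOG_LEVEL", "INFO").upper(), logging.INFO)
logging.basicConfig(level=level)
logging.getLogger(__name__).debug("Detailed generation logs enabled")
```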
The resurgence of prompting sophistication has become evident in my latest experiments. This test explores how different models interpret and visualize spatial relationships when given the same challenge: map a 2D image into a rotating 3D space, given only the principle that all equations are shapes and all shapes are equations, with no further context. Every other animation in this repo has been based on an extremely detailed prompt written by me, or on tweets from others containing source information dense enough for DeepSeek to reason around.
Both DeepSeek and OpenAI Pro were tasked with this challenge, and their approaches reveal interesting insights into their reasoning processes:
DeepSeek's Approach:
OpenAI Pro's Approach:
While both models produced interesting but technically incorrect interpretations, the key finding isn't in their accuracy but in their approach. DeepSeek took a methodical, layer-by-layer construction approach, while OpenAI Pro attempted to reason through the spatial relationships in a similar systematic manner.
This experiment is part of a broader investigation into solving mathematics and spatial reasoning problems from the Humanity's Last Exam (HLE) repository. The key insight gained is that prompting sophistication has become paramount again - when provided with detailed contextual information, DeepSeek in particular shows remarkable improvements in its visualization capabilities.
Each animation comes with corresponding documentation:
- `.md` files contain concept explanations
- `.tex` files provide mathematical details
- Generated PDFs offer visual guides
Example Documentation Pipeline:
```mermaid
graph LR
    A[User Prompt] --> B[LaTeX Processing]
    B --> C[Manim Code Generation]
    C --> D[Animation Rendering]
    B --> E[Markdown Explanation]
    B --> F[LaTeX Technical Guide]
```
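As a sketch of how the dual-stream output might be separated downstream of that pipeline (the fenced-code and "## Study Notes" delimiters are assumptions, not the project's actual response format):

```python
# Hypothetical dual-stream splitter: pulls Manim code and study notes out of one model response.
# The fenced-code and "## Study Notes" delimiters are assumptions, not the project's actual format.
import re
from typing import Tuple

FENCE = "`" * 3  # avoids embedding a literal fence inside this example

def split_dual_stream(response: str) -> Tuple[str, str]:
    code_match = re.search(FENCE + r"python\n(.*?)" + FENCE, response, re.DOTALL)
    manim_code = code_match.group(1) if code_match else ""
    notes = response.split("## Study Notes", 1)[-1].strip() if "## Study Notes" in response else ""
    return manim_code, notes
```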
```bibtex
@misc{cooper2025deepseekmanim,
  title={DeepSeek-Manim Animation Generator: Automated Mathematical Animations using DeepSeek API},
  author={Cooper, Christian H.},
  year={2025},
  howpublished={\url{https://github.com/HarleyCoops/Deepseek-R1-Zero}},
  note={A tool for generating Manim animations using DeepSeek's API}
}
```
Alternative Formats:
- APA: [Available in Documentation/CITATION.md]
- IEEE: [See rendered PDFs]
Inspiration: Developed from Gabriel Peyré's tweet demonstrating optimal transport concepts.
Collaboration: Scene design was jointly reasoned through by #DeepSeek and #Google AI systems.
```latex
% Generate with:
% pdflatex Benamou-Brenier-Wasserstein.tex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{figure}[h]
  \centering
  \begin{tikzpicture}
    % TikZ code for animation frames
    \node at (0,0) {Frame 1: Initial Density};
    \node at (4,0) {Frame 2: Intermediate Flow};
    \node at (8,0) {Frame 3: Final Transport};
  \end{tikzpicture}
  \caption{Wasserstein geodesics visualization sequence}
\end{figure}
\end{document}
```
Animation Mathematics:
```python
# Core BBW energy, E ≈ ∫|∇φ|² ρ0 dx + ∫|∇ψ|² ρ1 dx, discretized as a Riemann sum
import numpy as np

def benamou_brenier_energy(rho0, rho1, grad_phi, grad_psi, dx):
    # rho0, rho1: density samples; grad_phi, grad_psi: potential gradients on the same grid
    return dx * (np.sum(np.abs(grad_phi) ** 2 * rho0) + np.sum(np.abs(grad_psi) ** 2 * rho1))
```
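A toy usage example on a 1-D grid (the potentials here are placeholders, not an actual optimal transport solution):

```python
# Toy usage of benamou_brenier_energy on a 1-D grid (illustrative values only)
import numpy as np

x = np.linspace(0.0, 1.0, 200)
rho0 = np.exp(-((x - 0.3) ** 2) / 0.01)
rho0 /= rho0.sum()
rho1 = np.exp(-((x - 0.7) ** 2) / 0.01)
rho1 /= rho1.sum()
grad_phi = np.gradient(0.5 * x ** 2, x)   # placeholder potential gradient
grad_psi = np.gradient(-0.5 * x ** 2, x)  # placeholder potential gradient
print(benamou_brenier_energy(rho0, rho1, grad_phi, grad_psi, dx=x[1] - x[0]))
```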
The chat interface now shows the AI's reasoning process in real-time! As you interact with the model, you'll see:
- A gray box above each response showing the model's chain of thought
- The final response below the reasoning
- Both updating in real-time as the model thinks
This feature helps you understand how the AI arrives at its conclusions. The reasoning window shows the intermediate steps and thought process before the final answer is given.
Architecture Insight:
```python
# Reasoning display (illustrative sketch; the actual interface in app.py may differ)
from IPython.display import HTML, display

def show_reasoning(thought_process):
    # Render the model's chain of thought in a gray box above the final answer
    display(HTML(f"""
    <div style='background: #f0f0f0; padding: 10px;'>
        {thought_process}
    </div>
    """))
```
First, compile the LaTeX scene guide:
```bash
# Navigate to the project directory
cd Math-To-Manim

# Compile the LaTeX file
pdflatex Benamou-Brenier-Wasserstein.tex
```

This will generate `Benamou-Brenier-Wasserstein.pdf`, which contains the visual guide for the animation sequence.
For convenience, I've included a pre-rendered version of the scene guide: Benamou-Brenier-Wasserstein.pdf
This comprehensive guide includes:
- Detailed explanations of each animation scene
- Mathematical concepts broken down into intuitive metaphors
- Visual descriptions of the cosmic probability distributions
- Step-by-step breakdowns of the optimal transport equations
- Inspiration credit to Gabriel Peyré's tweet
After reviewing the scene guide, you can render the animation using Manim:
```bash
# For development/preview (480p with preview)
python -m manim -pql CosmicProbabilityScene.py CosmicProbabilityScene

# For final render (1080p high quality)
python -m manim -qh CosmicProbabilityScene.py CosmicProbabilityScene

# For creating a shareable GIF
python -m manim -qm --format gif CosmicProbabilityScene.py CosmicProbabilityScene
```
- `-ql` (480p, fastest, best for development)
- `-qm` (720p, good balance)
- `-qh` (1080p, high quality)
- `-qk` (4K, very high quality)
- `-p`: Preview the animation when done
- `-f`: Show the output file in the file browser
The rendered animation will be saved in:
`media/videos/CosmicProbabilityScene/[quality]/CosmicProbabilityScene.[format]`
- Use `-pql` during development for quick previews
- Use `-qh` for final renders
- Add `-f` to easily locate output files
- Use `--format gif` for easily shareable animations
For example:
```bash
# During development (preview QEDJourney scene from QED.py in low quality)
python -m manim -pql QED.py QEDJourney

# Final render (render QEDJourney scene from QED.py in high quality)
python -m manim -qh QED.py QEDJourney
```
DeepSeek R1-Zero is a custom, instruction-tuned large language model (LLM) designed for advanced reasoning and knowledge completion tasks. Although it derives conceptual inspiration from Google's T5 framework, it features substantial architectural modifications allowing for an extended context window, refined attention mechanisms, and robust performance across zero-shot and few-shot paradigms.
- Introduction
- Philosophical & Theoretical Foundations
- Model Architecture
- Installation & Quickstart
- Quantization & Memory Footprint
- Implementation Details
- Performance Benchmarks
- Potential Limitations & Future Work
- Usage Examples
- Citation
- License & Usage Restrictions
DeepSeek R1-Zero represents the culmination of multi-year research at DeepSeek AI into transfer learning, instruction tuning, and long-context neural architectures. Its central objective is to provide a single, all-purpose encoder-decoder model that can handle:
- Complex reading comprehension (up to 8,192 tokens)
- Scenario-based instruction following (e.g., "Given a set of constraints, produce a short plan.")
- Technical and coding tasks (including code generation, transformation, and debugging assistance)
Though R1-Zero is a "descendant" of T5, the modifications to attention, context management, and parameter initialization distinguish it significantly from vanilla T5 implementations.
While standard Transformer models rely on the "Attention is All You Need" paradigm (Vaswani et al., 2017), DeepSeek R1-Zero extends this by:
1. **Expanded Context Window**
   - By employing distributed positional encodings and segment-based attention, R1-Zero tolerates sequences up to 8,192 tokens.
   - The extended context window leverages blockwise local attention (in certain layers) to mitigate quadratic scaling in memory usage (a toy sketch of this idea follows this list).

2. **Instruction Tuning**
   - Similar to frameworks like FLAN-T5 or InstructGPT, R1-Zero was exposed to curated prompts (instructions, Q&A, conversation) to improve zero-shot and few-shot performance.
   - This approach helps the model produce more stable, context-aware answers and reduces "hallucination" events.

3. **Semantic Compression**
   - The encoder can compress textual segments into "semantic slots," enabling more efficient cross-attention in the decoder stage.
   - This is theoretically grounded in Manifold Hypothesis arguments: the textual input can be seen as lying on a lower-dimensional manifold and is thus amenable to a compressed representation.
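To make the blockwise local attention point concrete, here is a toy NumPy sketch of a block-local attention mask. It is a generic illustration of the technique under an assumed fixed block size, not R1-Zero's actual attention code.

```python
# Toy block-local attention mask: each token attends only within its own fixed-size block.
# Generic illustration of the idea, not R1-Zero's actual implementation.
import numpy as np

def block_local_mask(seq_len: int, block_size: int) -> np.ndarray:
    idx = np.arange(seq_len)
    return (idx[:, None] // block_size) == (idx[None, :] // block_size)  # True = attention allowed

mask = block_local_mask(seq_len=16, block_size=4)
print(mask.astype(int))  # four 4x4 blocks of ones along the diagonal
# In a blockwise implementation, scores are only computed inside each block,
# so attention cost scales with seq_len * block_size rather than seq_len ** 2.
```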
From a cognitive science perspective, R1-Zero aspires to mimic a layered approach to knowledge assimilation, balancing short-term "working memory" (sequence tokens) with long-term "knowledge representation" (model parameters).
- Parameter Count: ~6.7B
- Encoder-Decoder: Maintains T5's text-to-text approach but with specialized gating and partial reordering in cross-attention blocks.
- Context Window: 8,192 tokens (a 4× expansion over many standard T5 models).
- Layer Stacking: The modifications allow some dynamic scheduling of attention heads, facilitating better throughput in multi-GPU environments.
| Aspect | Specification |
|---|---|
| Architecture Type | Modified T5 (custom config named `deepseek_v3`) |
| Heads per Attention | 32 heads (in deeper layers) |
| Layer Count | 36 encoder blocks, 36 decoder blocks |
| Vocabulary Size | 32k tokens (SentencePiece-based) |
| Positional Encoding | Absolute + learned segment-based for 8k tokens |
| Training Paradigm | Instruction-tuned + additional domain tasks |
| Precision | FP32, FP16, 4-bit, 8-bit quantization (via BnB) |
Below are simplified instructions for installing DeepSeek R1-Zero:
- Python >= 3.8
- PyTorch >= 2.0
- Transformers >= 4.34.0
- Accelerate >= 0.24.0
- bitsandbytes >= 0.39.0 (if using 4-bit/8-bit)
- FFmpeg (required for video rendering)
FFmpeg is required for Manim to render animations. Here's how to install it:
- Windows:
  - Download from https://www.gyan.dev/ffmpeg/builds/
  - Recommended: "ffmpeg-release-essentials.7z"
  - Extract the archive
  - Add the `bin` folder to your system PATH
  - Or install via package manager:

    ```bash
    choco install ffmpeg
    ```

- Linux (via package manager):

  ```bash
  sudo apt-get update
  sudo apt-get install ffmpeg
  ```

- macOS:

  ```bash
  brew install ffmpeg
  ```
```bash
pip install --upgrade torch transformers accelerate bitsandbytes
```
If your environment's default PyTorch is older than 2.0, consider updating or installing from PyPI/conda channels that provide a recent version.
After installing prerequisites, you can load the model from the Hugging Face Hub. For example:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # or torch.float32
    device_map="auto"           # automatically move model to GPU
)
```
> **Note**:
> 1) `trust_remote_code=True` is essential because R1-Zero uses custom code.
> 2) Download times may be substantial (potentially hours) depending on your bandwidth and how Hugging Face shards large models.
---
## **5. Quantization & Memory Footprint**
DeepSeek R1-Zero supports **multi-bit quantization** to optimize memory usage:
1. **4-Bit Quantization**
- **Pros**: Minimizes VRAM usage (~8GB).
- **Cons**: Potentially minor losses in numeric accuracy or generative quality.
2. **8-Bit Quantization**
- **Pros**: Still significantly reduces memory (~14GB VRAM).
- **Cons**: Slight overhead vs. 4-bit but often better fidelity.
3. **Full Precision (FP32)**
- **Pros**: The highest theoretical accuracy.
- **Cons**: ~28GB VRAM usage, not feasible on smaller GPUs.
Sample quantized load (4-bit) with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes):
```python
model_4bit = AutoModelForSeq2SeqLM.from_pretrained(
"deepseek-ai/DeepSeek-R1-Zero",
trust_remote_code=True,
device_map="auto",
load_in_4bit=True
)
```
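For comparison, an 8-bit load follows the same pattern; this is a sketch using the standard Transformers/bitsandbytes `load_in_8bit` flag rather than anything project-specific:

```python
# 8-bit quantized load (sketch): more VRAM than 4-bit (~14GB) but typically better fidelity
model_8bit = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    device_map="auto",
    load_in_8bit=True
)
```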
- Sharded Checkpoints: The model is split into multiple shards; each shard is verified upon download. Large shards can be memory-mapped, so your system requirements also include disk I/O overhead.
- Accelerate Integration: By leveraging Accelerate, you can distribute model shards across multiple GPUs or perform CPU offloading if GPU memory is insufficient.
- Rotary & Segment Encodings: At large sequence lengths, standard absolute positions can degrade performance. R1-Zero's hybrid approach (inspired by [T5], [LongT5], and [RoFormer]) helps maintain stable gradients even at 8k tokens.
- Parallel Cross-Attention: The decoder employs a specialized parallel cross-attention mechanism in certain layers, which can reduce overhead in multi-GPU setups.
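As a generic illustration of the rotary component mentioned above (standard RoPE math, not R1-Zero's internals):

```python
# Generic rotary position embedding (RoPE) sketch; base and pairing scheme are standard
# defaults, not R1-Zero internals.
import numpy as np

def apply_rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    # x: (seq_len, dim) with even dim; rotate each (even, odd) feature pair by a
    # position-dependent angle so relative offsets are encoded in dot products.
    dim = x.shape[-1]
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions[:, None] * freqs[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```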
DeepSeek R1-Zero typically competes near GPT-3.5 performance in standard generative benchmarks:
1. **Inference Latency**
   - 4-bit: ~100–200 ms per token (varies by GPU)
   - FP16: ~200–400 ms per token
   - FP32: ~400–800 ms per token

2. **Quality Metrics**
   - BLEU & ROUGE: on summarization tasks (CNN/DailyMail), R1-Zero hovers ~1–2 points below GPT-3.5.
   - Open-domain QA: on NaturalQuestions, R1-Zero closely matches strong baselines (e.g., T5-XXL) when properly instructed.
Keep in mind that your hardware setup and parallelism strategies can influence these benchmarks significantly.
Despite R1-Zero's strengths, several limitations persist:
- Token Context Limit: 8,192 tokens is high, but certain extreme use cases (e.g., full-text searching in large documents) may require bridging or chunking.
- Training Biases: While instruction-tuning reduces hallucinations, domain gaps remain. For heavily specialized or newly emerging knowledge, the model may produce uncertain or dated information.
- Interpretability: Like all Transformer-based LLMs, R1-Zero functions as a "black box." Advanced interpretability tools are still an active research area.
Future Directions:
- Integrating advanced memory systems to handle prompts beyond 8k tokens.
- Incorporating flash attention for further speed-ups.
- Investigating retrieval-augmented generation modules to reduce outdated knowledge reliance.
Below are a few quick examples to illustrate R1-Zero's capabilities:
prompt = "Write a short sci-fi story about artificial intelligence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(inputs["input_ids"], max_length=150)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
prompt = "Explain the concept of gradient descent as if speaking to a first-year PhD student."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(inputs["input_ids"], max_length=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
Feel free to refine these prompts and tune generation parameters (`num_beams`, `temperature`, `top_k`, etc.) to shape the style.
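For instance, a sampled generation with explicit decoding parameters might look like this (the values are arbitrary starting points, not recommended defaults):

```python
# Example: sampled generation with explicit decoding parameters (illustrative values)
output_ids = model.generate(
    inputs["input_ids"],
    max_length=200,
    do_sample=True,   # enable sampling so temperature/top_k take effect
    temperature=0.8,
    top_k=50,
)
# For deterministic beam search instead, drop do_sample and set num_beams=4.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```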
If you use this project in your research or work, please cite it as:
```bibtex
@misc{cooper2025deepseekmanim,
  title={DeepSeek-Manim Animation Generator: Automated Mathematical Animations using DeepSeek API},
  author={Cooper, Christian H.},
  year={2025},
  howpublished={\url{https://github.com/HarleyCoops/Deepseek-R1-Zero}},
  note={A tool for generating Manim animations using DeepSeek's API}
}
```
This repository uses Git Large File Storage (Git LFS) to handle large media files like GIFs. Here's how to work with large media files:
1. Install Git LFS:

   ```bash
   git lfs install
   ```

2. Track GIF files:

   ```bash
   git lfs track "*.gif"
   ```

3. Add and commit the `.gitattributes` file:

   ```bash
   git add .gitattributes
   git commit -m "chore: Add Git LFS tracking for GIF files"
   ```

4. Place GIF files in their designated directory (e.g., `SpatialReasoningTest/`)

5. Update `.gitignore` to allow GIFs in specific directories:

   ```
   # Ignore GIFs except in specific directories
   *.gif
   !SpatialReasoningTest/*.gif
   ```

6. Add and commit the files:

   ```bash
   git add your-directory/*.gif
   git commit -m "feat: Add new animation GIFs"
   git push
   ```

- Check tracked files: `git lfs ls-files`
- Verify status: `git status`
- Check file tracking patterns: `git lfs track`