Q1) SophiaH, AdaHessian optimizers give `RuntimeError: ~ tensors does not require grad and does not have a grad_fn` in `compute_hutchinson_hessian()`.
`create_graph` must be set to `True` when calling `backward()`. Here's an example.
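A minimal sketch of the pattern (the toy model, loss, and the `pytorch_optimizer` import path are illustrative assumptions):

```python
import torch
from torch import nn
from pytorch_optimizer import SophiaH  # assumed import path for illustration

model = nn.Linear(10, 1)
optimizer = SophiaH(model.parameters(), lr=1e-3)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
# create_graph=True keeps the autograd graph alive so the optimizer can
# differentiate through the gradients when estimating the Hessian;
# without it, the RuntimeError above is raised.
loss.backward(create_graph=True)
optimizer.step()
```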
Q2) Memory leak happens when using the SophiaH or AdaHessian optimizer.

`torch.autograd.grad` with complex gradient flows sometimes leads to memory leaks, and you might encounter an OOM issue; see the related issue.
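To see where the memory can pile up, here is a rough, illustrative sketch of a Hutchinson-style diagonal Hessian estimate (not the library's exact implementation of `compute_hutchinson_hessian()`); the graph kept alive by `create_graph=True` and `retain_graph=True` is what can accumulate with complex gradient flows:

```python
import torch


def hutchinson_hessian_diag(loss, params, num_samples=1):
    params = list(params)
    # First-order gradients; create_graph=True keeps the graph alive so we
    # can differentiate through them again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    estimate = [torch.zeros_like(p) for p in params]
    for _ in range(num_samples):
        # Rademacher probe vectors (entries are +1 or -1).
        zs = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product via a second autograd.grad call.
        # retain_graph=True keeps the whole graph around between samples,
        # which is the kind of bookkeeping that can blow up memory.
        h_zs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for est, h_z, z in zip(estimate, h_zs, zs):
            est.add_(h_z * z / num_samples)
    return estimate
```

If you run into OOM, reducing the number of probe samples or refreshing the Hessian estimate less often (if the optimizer exposes such options) is the usual way to lower the memory pressure.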
Q3) How do I run the visualizations?

Run `python3 -m examples.visualize_optimizers` from the project root.