Final model save is failing #681

klromans557 · 2025-02-10T19:16:12Z

What happened?

When training an SDXL model, I get a MemoryError when saving the final model. Intermediate saves during training complete without issue; only the final save fails. This did not happen in training runs last week.

What did you expect would happen?

The final model save should succeed, since the intermediate model saves do.

Relevant log output

step: 100%|██████████| 28/28 [00:48<00:00,  1.74s/it, loss=0.0256, smooth loss=0.0907]
Saving C:/Users/klrom/Desktop/Model Training Mats/amber_img/workspace\save\tier1_fast2025-02-10_13-02-43-save-560-20-0.safetensors
step: 100%|██████████| 28/28 [01:00<00:00,  2.17s/it, loss=0.0329, smooth loss=0.0858]
step: 100%|██████████| 28/28 [00:49<00:00,  1.76s/it, loss=0.0377, smooth loss=0.085]
step:  21%|██▏       | 6/28 [00:12<00:45,  2.05s/it, loss=0.107, smooth loss=0.0844]
epoch:  55%|█████▌    | 22/40 [19:12<15:42, 52.38s/it]
Saving C:/Users/klrom/Desktop/Model Training Mats/amber_img/workspace/model/amber.safetensors
Exception in thread Thread-3 (__training_thread_function):
Traceback (most recent call last):
  File "threading.py", line 1016, in _bootstrap_inner
  File "threading.py", line 953, in run
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\modules\ui\TrainUI.py", line 579, in __training_thread_function
    trainer.end()
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\modules\trainer\GenericTrainer.py", line 766, in end
    self.model_saver.save(
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\modules\modelSaver\StableDiffusionXLFineTuneModelSaver.py", line 28, in save
    base_model_saver.save(model, output_model_format, output_model_destination, dtype)
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\modules\modelSaver\stableDiffusionXL\StableDiffusionXLModelSaver.py", line 109, in save
    self.__save_safetensors(model, output_model_destination, dtype)
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\modules\modelSaver\stableDiffusionXL\StableDiffusionXLModelSaver.py", line 83, in __save_safetensors
    save_file(save_state_dict, destination, self._create_safetensors_header(model, save_state_dict))
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\venv\lib\site-packages\safetensors\torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\venv\lib\site-packages\safetensors\torch.py", line 496, in _flatten
    return {
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\venv\lib\site-packages\safetensors\torch.py", line 500, in <dictcomp>
    "data": _tobytes(v, k),
  File "C:\Users\klrom\Desktop\StableMatrix\Data\Packages\OneTrainer\venv\lib\site-packages\safetensors\torch.py", line 460, in _tobytes
    return data.tobytes()
MemoryError
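
For anyone triaging this: the traceback ends in `_tobytes`, called from a dict comprehension in `_flatten`, which suggests every tensor is copied into a bytes object before anything is written to disk. If that reading is right, the save needs roughly the full serialized model size in free host RAM on top of what the process already holds. The sketch below only does the arithmetic for that estimate. The function names, the dtype table, and the `headroom` factor are hypothetical and are not part of OneTrainer or safetensors.

```python
from math import prod

# Bytes per element for a few common dtypes (illustrative table, not exhaustive).
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_serialized_bytes(tensors):
    """Sum the raw byte size of every tensor.

    `tensors` is an iterable of (shape, dtype_name) pairs, e.g. collected
    from a state dict. A scalar tensor has shape () and counts as 1 element.
    """
    return sum(prod(shape) * DTYPE_BYTES[dtype] for shape, dtype in tensors)


def fits_in_memory(tensors, available_bytes, headroom=1.2):
    """Return True if a full in-memory copy plus some headroom would fit.

    `headroom` is an assumed safety factor, not a measured value.
    """
    return estimate_serialized_bytes(tensors) * headroom <= available_bytes


# Hypothetical shapes, chosen only to exercise the arithmetic.
example = [((320, 4, 3, 3), "float32"), ((1280, 1280), "float16")]
print(estimate_serialized_bytes(example))
```

Comparing this estimate against free RAM at the moment of the final save (for example from `psutil.virtual_memory().available`, since `psutil` appears in the environment below) might show whether the failure is simply a lack of host memory at the end of training.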

Output of `pip freeze`

absl-py==2.1.0
accelerate==1.0.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiohttp-retry==2.9.1
aiosignal==1.3.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.8.0
async-timeout==4.0.3
attrs==24.2.0
av==13.1.0
backoff==2.2.1
bcrypt==4.2.1
bitsandbytes==0.44.1
boto3==1.35.94
botocore==1.35.94
Brotli==1.1.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.8
cloudpickle==3.1.0
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.0
cryptography==43.0.3
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
decorator==5.1.1
Deprecated==1.2.15
-e git+https://github.com/huggingface/diffusers.git@c944f0651f679728d4ec7b6488120ac49c2f1315#egg=diffusers
dnspython==2.7.0
email_validator==2.2.0
exceptiongroup==1.2.2
fabric==3.2.2
fastapi==0.115.6
fastapi-cli==0.0.7
filelock==3.16.1
flatbuffers==24.3.25
fonttools==4.54.1
frozenlist==1.5.0
fsspec==2024.10.0
ftfy==6.3.1
grpcio==1.67.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.27.1
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
inquirerpy==0.3.4
intel-openmp==2021.4.0
invisible-watermark==0.2.0
invoke==2.2.0
itsdangerous==2.2.0
Jinja2==3.1.4
jmespath==1.0.1
kiwisolver==1.4.7
lightning-utilities==0.11.8
lion-pytorch==0.2.2
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@fcaec253ddff9dccd0f9644836fe87b0103f23f7#egg=mgds
mkl==2021.4.0
mpmath==1.3.0
multidict==6.1.0
networkx==3.4.2
numpy==1.26.4
nvidia-ml-py==12.560.30
omegaconf==2.3.0
onnxruntime==1.19.2
onnxruntime-gpu==1.19.2
open_clip_torch==2.28.0
opencv-python==4.10.0.84
orjson==3.10.13
packaging==24.1
paramiko==3.5.0
pfzy==0.3.4
pillow==11.0.0
platformdirs==4.3.6
pooch==1.8.2
prettytable==3.12.0
prodigyopt==1.1.1
prompt_toolkit==3.0.48
propcache==0.2.0
protobuf==4.25.5
psutil==6.1.0
py-cpuinfo==9.0.0
pycparser==2.22
pydantic==2.9.2
pydantic-extra-types==2.10.1
pydantic-settings==2.7.1
pydantic_core==2.23.4
Pygments==2.18.0
PyNaCl==1.5.0
pynvml==11.5.0
pyparsing==3.2.0
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.20
pytorch-lightning==2.4.0
pytorch_optimizer==3.3.0
PyWavelets==1.7.0
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
rich==13.9.3
rich-toolkit==0.12.0
runpod==1.7.4
s3transfer==0.10.4
safetensors==0.4.5
scalene==1.5.45
schedulefree==1.3
sentencepiece==0.2.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
starlette==0.41.3
sympy==1.13.1
tbb==2021.13.1
tensorboard==2.18.0
tensorboard-data-server==0.7.2
timm==1.0.11
tokenizers==0.21.0
tomli==2.2.1
tomlkit==0.13.2
torch==2.5.1+cu124
torchmetrics==1.5.1
torchvision==0.20.1+cu124
tqdm==4.66.6
tqdm-loggable==0.2
transformers==4.47.0
typer==0.15.1
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.3
uvicorn==0.34.0
watchdog==6.0.0
watchfiles==1.0.3
wcwidth==0.2.13
websockets==14.1
Werkzeug==3.0.6
wrapt==1.17.0
xformers==0.0.28.post3
yarl==1.17.0
zipp==3.20.2
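
Since this reportedly worked last week, it may help to diff the pins for the packages named in the traceback against a known-good environment. A minimal sketch for pulling those pins out of `pip freeze` output follows. `parse_freeze` is a hypothetical helper, and the sample lines are copied from the list above.

```python
def parse_freeze(lines):
    """Map lowercase package name -> pinned version (or VCS URL for -e installs)."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("-e ") and "#egg=" in line:
            # Editable VCS install: keep the URL (including commit hash) as the "version".
            url, name = line[3:].split("#egg=", 1)
            pins[name.lower()] = url
        elif "==" in line:
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


freeze = [
    "numpy==1.26.4",
    "safetensors==0.4.5",
    "torch==2.5.1+cu124",
    "-e git+https://github.com/Nerogar/mgds.git@fcaec253ddff9dccd0f9644836fe87b0103f23f7#egg=mgds",
]
pins = parse_freeze(freeze)
for name in ("safetensors", "torch", "numpy"):
    print(name, pins.get(name))
```

Running the same helper on a freeze from a working install and comparing the two dictionaries would highlight any package that changed between runs.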

klromans557 added the bug label on Feb 10, 2025
