I'm running lab1 on SageMaker.
Image: PyTorch 1.13 Python 3.9 CPU Optimized
Kernel: Python3.9
Instance: ml.t3.medium
Here's the error message I get when running estimator.fit:
---------------------------------------------------------------------------
UnexpectedStatusException Traceback (most recent call last)
Cell In[17], line 3
1 # Passing True will halt your kernel, passing False will not. Both create a training job.
2 # here we are defining the name of the input train channel. you can use whatever name you like! up to 20 channels per job.
----> 3 estimator.fit(wait=True, inputs = {'train':s3_train_path})
File /opt/conda/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:346, in runnable_by_pipeline.<locals>.wrapper(*args, **kwargs)
342 return context
344 return _StepArguments(retrieve_caller_name(self_instance), run_func, *args, **kwargs)
--> 346 return run_func(*args, **kwargs)
File /opt/conda/lib/python3.9/site-packages/sagemaker/estimator.py:1341, in EstimatorBase.fit(self, inputs, wait, logs, job_name, experiment_config)
1339 self.jobs.append(self.latest_training_job)
1340 if wait:
-> 1341 self.latest_training_job.wait(logs=logs)
File /opt/conda/lib/python3.9/site-packages/sagemaker/estimator.py:2680, in _TrainingJob.wait(self, logs)
2678 # If logs are requested, call logs_for_jobs.
2679 if logs != "None":
-> 2680 self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
2681 else:
2682 self.sagemaker_session.wait_for_job(self.job_name)
File /opt/conda/lib/python3.9/site-packages/sagemaker/session.py:5766, in Session.logs_for_job(self, job_name, wait, poll, log_type, timeout)
5745 def logs_for_job(self, job_name, wait=False, poll=10, log_type="All", timeout=None):
5746 """Display logs for a given training job, optionally tailing them until job is complete.
5747
5748 If the output is a tty or a Jupyter cell, it will be color-coded
(...)
5764 exceptions.UnexpectedStatusException: If waiting and the training job fails.
5765 """
-> 5766 _logs_for_job(self, job_name, wait, poll, log_type, timeout)
File /opt/conda/lib/python3.9/site-packages/sagemaker/session.py:7995, in _logs_for_job(sagemaker_session, job_name, wait, poll, log_type, timeout)
7992 last_profiler_rule_statuses = profiler_rule_statuses
7994 if wait:
-> 7995 _check_job_status(job_name, description, "TrainingJobStatus")
7996 if dot:
7997 print()
File /opt/conda/lib/python3.9/site-packages/sagemaker/session.py:8048, in _check_job_status(job, desc, status_key_name)
8042 if "CapacityError" in str(reason):
8043 raise exceptions.CapacityError(
8044 message=message,
8045 allowed_statuses=["Completed", "Stopped"],
8046 actual_status=status,
8047 )
-> 8048 raise exceptions.UnexpectedStatusException(
8049 message=message,
8050 allowed_statuses=["Completed", "Stopped"],
8051 actual_status=status,
8052 )
UnexpectedStatusException: Error for Training job shuxucao-ddp-mnist-2024-03-19-03-40-53-406: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
# may not use this file except in compliance with the License. A copy of
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen zipimport>", line 259, in load_module
File
The protobuf package installed via pip is 3.20.2. Should I run this lab with Python 3.8 instead?
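For anyone debugging the same error, here is a minimal sketch for checking which protobuf version a given environment actually sees. The training job runs in its own container, so its packages may differ from what pip reports in the notebook kernel; running a check like this at the top of the training entry-point script would show the version inside the job. The helper names (`protobuf_is_compatible`, `installed_protobuf_version`) are hypothetical, and the `(3, 20)` threshold is taken only from workaround 1 in the error message above, so treat this as an illustration rather than a confirmed fix for the lab.

```python
from importlib import metadata


def protobuf_is_compatible(version: str) -> bool:
    """Return True if `version` is 3.20.x or lower (workaround 1 in the error message)."""
    # Compare only major.minor; patch and pre-release suffixes are ignored.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) <= (3, 20)


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


version = installed_protobuf_version()
if version is None:
    print("protobuf is not installed in this environment")
else:
    print(f"protobuf {version}: compatible={protobuf_is_compatible(version)}")
```

If the version printed inside the training job is 4.x while the notebook reports 3.20.2, the mismatch is in the container rather than in the Python version of the kernel.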