Error in training network #2

abvazquez23 · 2022-05-11T22:06:42Z

When I run Train_3DConv_Binary.py on the annotated dataset I get the following errors:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: 'Groom_Dataset' object has no attribute '__get_train_example_queue'
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\site-packages\keras\utils\data_utils.py", line 560, in data_generator_task
    generator_output = next(self._generator)
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 274, in get_train_generator_parallel
    pool_index.start()
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "Train_3DConv_Binary.py", line 136, in <module>
    main(sys.argv[1:])
  File "Train_3DConv_Binary.py", line 131, in main
    multigpunet.fit_generator(dataset.get_train_generator_parallel(n_threads), train_steps_per_epoch, epochs=200, verbose=2, validation_data=dataset.get_valid_generator_parallel(n_threads), validation_steps=valid_steps_per_epoch, initial_epoch=0, workers=1, callbacks=[ckpt_saver, tensorboard_out, early_stopper, reduce_lr])
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\site-packages\keras\engine\training.py", line 1809, in fit_generator
    generator_output = next(output_generator)
StopIteration

I'm running the code on Windows 10 and using the same versions of software listed in the documentation: Python 3.6.2, TensorFlow 1.2.1, Keras 2.0.6, OpenCV 3.3.0, R 3.3.0.

SkepticRaven · 2022-05-12T12:45:06Z

It looks like the parallel spawner is looking for an attribute instead of a function here.
The cause could be either a change made to multiprocessing or the Windows implementation of multiprocessing.

I don't have a Windows dev environment to try to debug it, but an easy fix would be to just use the non-parallel loader functions.
E.g. in the fit_generator call in the training code (Train_3DConv_Binary.py, line 131 in the traceback above), replace both instances of "_parallel(n_threads)" with just "()".
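
A possible explanation for the AttributeError, offered as an assumption rather than a confirmed diagnosis: the missing attribute in the traceback starts with a double underscore, and Python stores such methods under a mangled name (_ClassName__method) while the function's __name__ keeps the unmangled form. Any code that looks the method up again by its __name__, as unpickling in a spawned process appears to do here, would then fail. The sketch below only demonstrates the name mismatch; GroomDatasetSketch and its methods are hypothetical stand-ins, not the real Groom_Dataset class.

```python
class GroomDatasetSketch:
    """Hypothetical stand-in; not the real Groom_Dataset class."""

    def __get_example(self):
        # Stored on the class as _GroomDatasetSketch__get_example (name mangling).
        return "example"

    def get_example(self):
        # Inside the class body the double-underscore call is mangled automatically.
        return self.__get_example()


ds = GroomDatasetSketch()

# From outside the class, the method is reachable only via the mangled name.
bound = ds._GroomDatasetSketch__get_example

# The function still reports the name exactly as it was written in the source.
plain_name = bound.__name__

# A string-based lookup with the unmangled name fails; the mangled name works.
found_by_plain_name = hasattr(ds, plain_name)
found_by_mangled_name = hasattr(ds, "_GroomDatasetSketch__get_example")

print(plain_name, found_by_plain_name, found_by_mangled_name)
```

If this is indeed the cause, it would also explain why the non-parallel loaders avoid the problem: they never send a bound private method to another process.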

abvazquez23 · 2022-05-18T16:38:20Z

That fixed it, thanks!
However, I'm now receiving this TypeError:
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 182, in get_train_generator
    raw_batch = [self.__get_train_example() for x in range(self.batch_size)]
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 182, in <listcomp>
    raw_batch = [self.__get_train_example() for x in range(self.batch_size)]
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 145, in __get_train_example
    raw_data = self.__retrieve_data(data_path, range(data_min+1, sample+1))
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 123, in __retrieve_data
    vid_data = temp_h5file[data_path + '/video'][data_chunk,:,:]
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "C:\Users\Windows\anaconda3\envs\tf_gpu\lib\site-packages\h5py\_hl\dataset.py", line 777, in __getitem__
    selection = sel.select(self.shape, args, dataset=self)
  File "C:\Users\Windows\anaconda3\envs\tf_gpu\lib\site-packages\h5py\_hl\selections.py", line 82, in select
    return selector.make_selection(args)
  File "h5py\_selector.pyx", line 272, in h5py._selector.Selector.make_selection
  File "h5py\_selector.pyx", line 213, in h5py._selector.Selector.apply_args
TypeError: Simple selection can't process range(4755, 4771)

SkepticRaven · 2022-05-18T18:13:21Z

Oh, this now looks like a Python 2 vs. Python 3 issue. I only ever trained using Python 2. That might actually explain the multiprocessing version differences a bit better as well.

It's another fairly straightforward fix, but it takes quite a few more edits.
Every range() call in the Training/ReadHDF5.py file needs to be replaced with either np.arange() or list(range()).
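
As a rough sketch of that change: in Python 2, range() returned a list, whereas in Python 3 it returns a lazy range object, which is what the h5py selection code rejects in the traceback. The helper names below (to_h5_index, range_to_slice) are hypothetical and not part of ReadHDF5.py; the frame numbers are taken from the TypeError above. The slice variant is an alternative not suggested in this thread, based on the assumption that the requested frames are always contiguous.

```python
def to_h5_index(frames):
    """Turn a Python 3 range into a plain list, leaving other index types alone."""
    if isinstance(frames, range):
        return list(frames)
    return frames


def range_to_slice(frames):
    """Alternative for contiguous ranges: express the same frames as a slice."""
    if frames.step != 1:
        raise ValueError("only contiguous ranges map onto a simple slice")
    return slice(frames.start, frames.stop)


# Frame range reported in the TypeError above.
requested = range(4755, 4771)

data_chunk = to_h5_index(requested)      # list of 16 frame indices
data_slice = range_to_slice(requested)   # slice(4755, 4771, None)

print(len(data_chunk), data_chunk[0], data_chunk[-1], data_slice)
```

Either result could then stand in for data_chunk in an indexing expression such as temp_h5file[data_path + '/video'][data_chunk,:,:].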

abvazquez23 · 2022-05-20T17:47:45Z

I've successfully gone on to train the model, but now when running TestInfer.py I get the following error:
Traceback (most recent call last):
  File "TestInfer.py", line 138, in <module>
    main(sys.argv[1:])
  File "TestInfer.py", line 32, in main
    net.pop()
AttributeError: 'Model' object has no attribute 'pop'
Thanks for the help!

SkepticRaven · 2022-05-25T13:40:54Z

This looks like a change in the Keras module. This Stack Overflow answer may offer a way to update the code: https://stackoverflow.com/a/57321117
It seems like they changed how the Model.pop() function operated and eventually removed it. Something like net = keras.models.Sequential(net.layers[:-1]) might work.
