Error in training network #2

Open
abvazquez23 opened this issue May 11, 2022 · 5 comments

Comments

@abvazquez23

When I run Train_3DConv_Binary.py on the annotated dataset I get the following errors:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: 'Groom_Dataset' object has no attribute '__get_train_example_queue'
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\site-packages\keras\utils\data_utils.py", line 560, in data_generator_task
    generator_output = next(self._generator)
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 274, in get_train_generator_parallel
    pool_index.start()
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "Train_3DConv_Binary.py", line 136, in <module>
    main(sys.argv[1:])
  File "Train_3DConv_Binary.py", line 131, in main
    multigpunet.fit_generator(dataset.get_train_generator_parallel(n_threads), train_steps_per_epoch, epochs=200, verbose=2, validation_data=dataset.get_valid_generator_parallel(n_threads), validation_steps=valid_steps_per_epoch, initial_epoch=0, workers=1, callbacks=[ckpt_saver, tensorboard_out, early_stopper, reduce_lr])
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Windows\anaconda3\envs\grooming\lib\site-packages\keras\engine\training.py", line 1809, in fit_generator
    generator_output = next(output_generator)
StopIteration

I'm running the code on Windows 10 with the same software versions listed in the documentation: Python 3.6.2, TensorFlow 1.2.1, Keras 2.0.6, OpenCV 3.3.0, R 3.3.0.

@SkepticRaven
Contributor

It looks like the parallel spawner is looking for an attribute instead of a function here.
The cause could be either a change made to multiprocessing itself or the Windows implementation of multiprocessing.

I don't have a Windows dev environment to debug it, but an easy fix would be to use the non-parallel loader functions instead.
For example, in the fit_generator call in the training code (Train_3DConv_Binary.py, line 131 in your traceback), replace both instances of "_parallel(n_threads)" with just "()".
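
One possible reading of the AttributeError, offered as an assumption since nobody in the thread confirmed it: methods whose names start with two underscores are name-mangled inside the class, so the object really has no attribute under the plain name. As far as I can tell, Python 3 pickles a bound method by looking it up again via its `__name__`, which is the unmangled name, and the spawn start method on Windows has to pickle the process target. The class below is a toy stand-in, not the real Groom_Dataset; only the method name is borrowed from the traceback.

```python
class Groom_Dataset:
    """Toy stand-in for the real class (hypothetical, for illustration only)."""

    def __get_train_example_queue(self):
        return "example"

    def bound_worker(self):
        # Inside the class body the compiler rewrites this name to
        # _Groom_Dataset__get_train_example_queue.
        return self.__get_train_example_queue


ds = Groom_Dataset()
worker = ds.bound_worker()

# The function object still reports the unmangled name.
print(worker.__name__)                                          # __get_train_example_queue
# Looking the method up by that plain name on the instance fails...
print(hasattr(ds, "__get_train_example_queue"))                 # False
# ...while the mangled name is what actually exists.
print(hasattr(ds, "_Groom_Dataset__get_train_example_queue"))   # True
```

If that reading is right, the non-parallel generators sidestep the problem because nothing has to be pickled and sent to a child process.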

@abvazquez23
Author

That fixed it, thanks!
However, I'm now receiving this TypeError:
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 182, in get_train_generator
    raw_batch = [self.__get_train_example() for x in range(self.batch_size)]
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 182, in <listcomp>
    raw_batch = [self.__get_train_example() for x in range(self.batch_size)]
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 145, in __get_train_example
    raw_data = self.__retrieve_data(data_path, range(data_min+1, sample+1))
  File "C:\Users\Windows\Desktop\Grooming_2\MouseGrooming\Training\ReadHDF5.py", line 123, in __retrieve_data
    vid_data = temp_h5file[data_path + '/video'][data_chunk,:,:]
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "C:\Users\Windows\anaconda3\envs\tf_gpu\lib\site-packages\h5py\_hl\dataset.py", line 777, in __getitem__
    selection = sel.select(self.shape, args, dataset=self)
  File "C:\Users\Windows\anaconda3\envs\tf_gpu\lib\site-packages\h5py\_hl\selections.py", line 82, in select
    return selector.make_selection(args)
  File "h5py\_selector.pyx", line 272, in h5py._selector.Selector.make_selection
  File "h5py\_selector.pyx", line 213, in h5py._selector.Selector.apply_args
TypeError: Simple selection can't process range(4755, 4771)

@SkepticRaven
Contributor

Oh, this now looks like a Python 2 vs. Python 3 issue. I only ever trained using Python 2. That might actually explain the multiprocessing problem a bit better as well.

It's another fairly straightforward fix, but it takes quite a few more edits.
Every range() call in the Training/ReadHDF5.py file needs to be replaced with either np.arange() or list(range()).
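
A minimal sketch of that change, assuming the index values from the traceback above. In Python 2, range() returned a list; in Python 3 it returns a lazy range object, which the h5py version in the traceback rejects as an index. The helper name frame_indices is hypothetical and does not exist in ReadHDF5.py.

```python
def frame_indices(data_min, sample):
    """Return the frame indices data_min+1 .. sample as an explicit list.

    Hypothetical helper: wraps the range() expression seen in the traceback
    in list() so the result is a concrete sequence rather than a range object.
    """
    return list(range(data_min + 1, sample + 1))


# Values chosen so the result matches range(4755, 4771) from the error message.
data_chunk = frame_indices(4754, 4770)
print(type(data_chunk).__name__, len(data_chunk), data_chunk[0], data_chunk[-1])
# list 16 4755 4770

# In ReadHDF5.py the list would then be used as the index (not executed here):
#   vid_data = temp_h5file[data_path + '/video'][data_chunk, :, :]
```

np.arange(data_min + 1, sample + 1) should work the same way as an index; list(range(...)) just avoids depending on NumPy in this sketch.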

@abvazquez23
Author

I've successfully gone on to train the model, but now when running TestInfer.py I get the following error:
Traceback (most recent call last):
  File "TestInfer.py", line 138, in <module>
    main(sys.argv[1:])
  File "TestInfer.py", line 32, in main
    net.pop()
AttributeError: 'Model' object has no attribute 'pop'
Thanks for the help!

@SkepticRaven
Contributor

This looks like a change in the Keras module. This Stack Overflow answer may offer a way to update the code: https://stackoverflow.com/a/57321117
It seems they changed how Model.pop() works and eventually removed it. Something like net = keras.models.Sequential(net.layers[:-1]) might work.
