
slice index out of bounds problem during training #36

Open
numancelik34 opened this issue Feb 6, 2021 · 11 comments

@numancelik34

I am getting the following error while trying to adapt the code to my dataset. My dataset is composed of 256,256,3 raw images and 256,256,1 labels, but I use a different type of images and I am not working with slices. I am trying to solve a binary segmentation problem.

ValueError: slice index 1 of dimension 3 out of bounds. for 'strided_slice_27' (op: 'StridedSlice') with input shapes: [?,256,256,1], [4], [4], [4] and with computed input tensors: input[1] = <0 0 0 1>, input[2] = <0 0 0 2>, input[3] = <1 1 1 1>.

Why am I getting this error?

Thanks
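
For context, this kind of ValueError is raised at graph-construction time when an op indexes a channel that the tensor's static shape does not have. A minimal reproduction, assuming TF 1.x (whether this is exactly the op behind strided_slice_27 in the SIFA graph is an assumption):

import tensorflow as tf

# A (batch, 256, 256, 1) binary label has only channel index 0, so asking
# for channel index 1 fails as soon as the graph is built.
labels = tf.placeholder(tf.float32, [None, 256, 256, 1])
fg = labels[:, :, :, 1]  # ValueError: slice index 1 of dimension 3 out of bounds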

@cchen-cc
Owner

cchen-cc commented Feb 8, 2021

Hi, I have never had this issue before. From the error message alone it's difficult for me to guess the reason. You may want to do a step-by-step debugging of the data reading.

@numancelik34
Author

Hi again @cchen-cc -

I think it is because my dataset may have a different format from yours, and the problem is probably caused by how the labels are defined in dataloader.py.
Let's say I have train -> image (256,256,3, dtype=float32), labels (256,256,1, dtype=float32); and target sets -> image (256,256,1, dtype=float32), labels (256,256,1, dtype=float32).

My aim is binary segmentation, so I have slightly changed dataloader.py:

label_vol = tf.decode_raw(parser['label_vol'], tf.float32)
label_vol = tf.reshape(label_vol, label_size)
label_vol = tf.slice(label_vol, [0, 0, 0], label_size)

# depth 1 because my problem is binary segmentation (0 and 1)
batch_y = tf.one_hot(tf.cast(tf.squeeze(label_vol), tf.uint8), 1)
return data_vol[:, :, :], batch_y

Please let me know whether this is the correct way to define the data loader for a binary segmentation problem.

Thanks

@cchen-cc
Owner

cchen-cc commented Feb 8, 2021

I'm not very sure, but I think setting the depth in tf.one_hot to 1 is incorrect.
If you use sigmoid cross-entropy to calculate the loss for a binary segmentation task, you don't need to change your labels to one-hot format. And if you want to use softmax cross-entropy, the depth in tf.one_hot should be 2.
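
For reference, a minimal sketch of the two options, assuming TF 1.x; the placeholder names (label_vol, logits_1ch, logits_2ch) are illustrative only, not the repo's exact variables:

import tensorflow as tf

label_vol = tf.placeholder(tf.float32, [256, 256, 1])   # binary mask, values 0.0 / 1.0
logits_1ch = tf.placeholder(tf.float32, [256, 256, 1])  # network output, 1 channel
logits_2ch = tf.placeholder(tf.float32, [256, 256, 2])  # network output, 2 channels

# Option 1: sigmoid cross-entropy -- keep the label single-channel, no one-hot needed.
loss_sigmoid = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=label_vol, logits=logits_1ch))

# Option 2: softmax cross-entropy -- one-hot with depth 2 (background / foreground),
# and the network output must also have 2 channels.
batch_y = tf.one_hot(tf.cast(tf.squeeze(label_vol, axis=-1), tf.uint8), depth=2)
loss_softmax = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=batch_y, logits=logits_2ch))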

@numancelik34
Author

@cchen-cc Now I am getting the error below:

OutOfRangeError (see above for traceback): RandomShuffleQueue '_1_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 8, current size 0)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

@muddasser-mm

muddasser-mm commented Feb 10, 2021

@cchen-cc Hello. Sorry for commenting in the middle of the discussion, but I'm facing the same error when trying to train, as mentioned above by @numancelik34:

Error - tensorflow.python.framework.errors_impl. OutOfRangeError: RandomShuffleQueue '_3_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 8, current size 0)

I did refer to your answer to the same issue in #13 and #21:
'This type of error usually happens when data cannot be correctly read. There can be various causes for why data are not read correctly and usually careful debugging is needed. Usually you could check the following things:
- whether the data paths stored in imagea_list and imageb_list are correct
- whether the data written into the tfrecords are with correct shape and type
- whether the data reading in the data_load.py is with correct shape and type'

The paths and the data in the variables are read correctly, but the issue still seems to occur when shuffle_batch() is executed during sess.run(self.inputs) in main.py. @cchen-cc it would be great if you could provide any further inputs or suggestions to debug the issue.
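
In case it helps, one way to rule out malformed tfrecords is to parse every record outside the queue pipeline and check the raw buffer sizes. A rough sketch assuming TF 1.x and the 'data_vol' / 'label_vol' feature keys referenced in this thread (the helper name and default shapes are illustrative):

import numpy as np
import tensorflow as tf

def check_tfrecord(path, image_shape=(256, 256, 3), label_shape=(256, 256, 1)):
    # Parse every serialized Example in the file and verify the stored
    # float32 buffers match the expected image / label sizes.
    count = 0
    for record in tf.python_io.tf_record_iterator(path):
        example = tf.train.Example()
        example.ParseFromString(record)
        feats = example.features.feature
        data = np.frombuffer(feats['data_vol'].bytes_list.value[0], dtype=np.float32)
        label = np.frombuffer(feats['label_vol'].bytes_list.value[0], dtype=np.float32)
        assert data.size == np.prod(image_shape), (path, count, data.size)
        assert label.size == np.prod(label_shape), (path, count, label.size)
        count += 1
    print('%s: %d records OK' % (path, count))

# e.g. run it over every path collected in imagea_list / imageb_list:
# for p in imagea_list + imageb_list:
#     check_tfrecord(p)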

@numancelik34
Author

Same here @cchen-cc.
I have already checked, and here are the details of what I have regarding shape and type:
1- the data paths are stored in imagea_list and imageb_list correctly
2- the data written into the tfrecords are 256,256,3 for images and 256,256,1 for labels, for both train and target datasets; the dtype of both image sets is float32
3- the tfrecords are read back with the same shape and dtype: 256,256,3 (image) and 256,256,1 (labels); the dtype used to read the files is float32 as well

Please let me know if I am missing anything here.

Thanks

@muddasser-mm

@cchen-cc Thanks. The issue indeed was with the paths. It's solved for me now.

@numancelik34
Author

Hi @muddasser27 - what type of dataset are you using - the same one the SIFA paper used, or your own dataset? If it is your own dataset, how is your data composed? The usual shape arrangement - 3 dims and no slices?

Thanks

@muddasser-mm

Hi @numancelik34. I'm using the same dataset used in SIFA and shared by the authors.

@cchen-cc
Owner

@muddasser27 Congrats you've found the cause.
@numancelik34 Since you are using your own dataset, you may want to check your data conversion into tfrecords more carefully.
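
For anyone converting their own data, a rough sketch of how one image/label pair could be serialized so that tf.decode_raw(parser['data_vol'], tf.float32) and tf.decode_raw(parser['label_vol'], tf.float32) in the data loader can read it back. This is an assumption based on the snippets in this thread, not the authors' exact conversion script; the real loader may also expect extra shape fields:

import numpy as np
import tensorflow as tf

def write_pair(writer, image, label):
    # Store the image and label as raw float32 bytes under the keys the
    # data loader decodes ('data_vol' / 'label_vol').
    image = image.astype(np.float32)   # e.g. shape (256, 256, 3)
    label = label.astype(np.float32)   # e.g. shape (256, 256, 1)
    feature = {
        'data_vol': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image.tobytes()])),
        'label_vol': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[label.tobytes()])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(example.SerializeToString())

# usage sketch:
# with tf.python_io.TFRecordWriter('pair_0000.tfrecords') as w:
#     write_pair(w, my_image, my_label)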

@chke097

chke097 commented Jun 7, 2022

@muddasser27 I see here that you have successfully trained using the original authors' dataset. I am also trying to train with the original authors' dataset, but this error keeps appearing. How can I solve it? Or would you be willing to share your modified code?

[07:35<73:03:58, 26.32s/it]iter 6: processing time 0.4044947624206543
0%| | 7/10000 [07:36<49:32:30, 17.85s/it]iter 7: processing time 0.3983621597290039
0%| | 8/10000 [07:36<158:27:30, 57.09s/it]
Traceback (most recent call last):
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 626, in
main(config_filename='./config_param.json')
File "main.py", line 622, in main
sifa_model.train()
File "main.py", line 411, in train
images_i, images_j, gts_i, gts_j= sess.run(self.inputs)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

Caused by op 'shuffle_batch', defined at:
File "main.py", line 626, in
main(config_filename='./config_param.json')
File "main.py", line 622, in main
sifa_model.train()
File "main.py", line 355, in train
self.inputs = data_loader.load_data(self._source_train_pth, self._target_train_pth, True)
File "/data2/xxb/SIFA-master/SIFA-master/data_loader.py", line 112, in load_data
1,50,10,num_threads=4)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 1300, in shuffle_batch
name=name)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 846, in _shuffle_batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 476, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3480, in queue_dequeue_many_v2
component_types=component_types, timeout_ms=timeout_ms, name=name)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/data2/xxb/anaconda/envs/tensorflow10/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

This problem always occurs after iterating for a certain number of steps.
