This repository has been archived by the owner on Feb 6, 2020. It is now read-only.

Segfault when training with 3D (outsz z > 1) #100

Open
jkopinsky opened this issue Oct 7, 2016 · 4 comments

Comments

@jkopinsky

If I run with the default config.cfg file (with N4_relu.znn) but change outsz from 1,100,100 to 2,2,2, I get a segfault. I checked a couple of other networks, which seem to have the same issue.

Here is a dump from gdb:

/home/jkopinsky/ZNN/python/test.py:9: RuntimeWarning: to-Python converter for boost::shared_ptr<znn::v4::parallel_network::network> already registered; second conversion method ignored.
from core import pyznn
reading config parameters...
net file not exist: ../experiments/ac3/net_current.h5
initialize a new network...
parse_net file: ../networks/VD2D.znn
construct the network class using the edges and nodes...
field of view: (1, 109, 109)
setting up the network...

create train samples...

create input image class...
/home/jkopinsky/ZNN/python/tifffile.py:1898: UserWarning: failed to import _tifffile.decodepackbits
warnings.warn("failed to import %s" % module_function)
/home/jkopinsky/ZNN/python/tifffile.py:1898: UserWarning: failed to import _tifffile.decodelzw
warnings.warn("failed to import %s" % module_function)
/home/jkopinsky/ZNN/python/tifffile.py:1898: UserWarning: failed to import _tifffile.unpackints
warnings.warn("failed to import %s" % module_function)
boundary mirror...

create label image class...
/home/jkopinsky/ZNN/python/tifffile.py:1995: UserWarning: decodelzw encountered unexpected end of stream (code 514)
"decodelzw encountered unexpected end of stream (code %i)" % code)

create test samples...
start training...
start from 1
save as ../experiments/ac3/net_init.h5
stdpre: /processing/znn/train/statistics/

Program received signal SIGSEGV, Segmentation fault.
znn::v4::get_segmentation (affs=..., threshold=threshold@entry=0.5) at ../../src/include/flow_graph/computation/get_segmentation.hpp:94
94 if ( zaff[z][y][x] > threshold )

@xiuliren
Member

xiuliren commented Oct 7, 2016

Thanks for reporting this issue. I have reproduced your error.

The frontend seems to be feeding a volume of the correct shape:

create label image class...
start training...
start from  1
save as  ../experiments/test/N4/net_init.h5
stdpre:  /processing/znn/train/statistics/
(2, 2, 32, 32)
(1, 2, 126, 126)
input volume size: (1, 2, 126, 126)
Segmentation fault (core dumped)

I always use a 2D output size and have not tried a 3D output size yet.
@torms3 have you ever tried training with a 3D output size? Is it possible that this is a bug in the C++ core?

@torms3
Contributor

torms3 commented Oct 7, 2016

There should be no problem with using a 3D output patch. I ran the backend test code using N4 and VD2D with 2x2x2 output and confirmed that there is no problem.

I don't understand why the backend get_segmentation function was called. I believe this function shouldn't be called (the only exception is MALIS training, which is not guaranteed to be bug-free).

Program received signal SIGSEGV, Segmentation fault.
znn::v4::get_segmentation (affs=..., threshold=threshold@entry=0.5) at ../../src/include/flow_graph/computation/get_segmentation.hpp:94
94 if ( zaff[z][y][x] > threshold )

I also don't understand @jingpengwu's output. Did you use 2x2x2 output? How can the input volume size be (1, 2, 126, 126), given that N4's fov is (1, 95, 95)?

@xiuliren
Member

xiuliren commented Oct 7, 2016

Nope, I am using 2,32,32.
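The shapes in this exchange can be checked with a few lines of Python. This is only a sketch: it assumes the usual sliding-window relation `input = fov + outsz - 1` along each axis, which agrees with the log above (fov (1, 95, 95), outsz 2,32,32, input (2, 126, 126) plus a leading channel axis). The helper name `input_patch_shape` is hypothetical and is not part of ZNN.

```python
def input_patch_shape(fov, outsz):
    """Return the input patch shape needed for an output patch of shape outsz.

    Assumes a sliding-window network, where each axis needs
    fov + outsz - 1 input voxels. Hypothetical helper, not a ZNN function.
    """
    if len(fov) != len(outsz):
        raise ValueError("fov and outsz must have the same number of dimensions")
    return tuple(f + o - 1 for f, o in zip(fov, outsz))


n4_fov = (1, 95, 95)  # N4 field of view quoted in this thread

# Output size used in the reproduction above.
print(input_patch_shape(n4_fov, (2, 32, 32)))
# Output size from the original report.
print(input_patch_shape(n4_fov, (2, 2, 2)))
```

Under that assumption, the (1, 2, 126, 126) input reported above is what a 2,32,32 output would require, so the frontend shapes look consistent.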

@torms3
Contributor

torms3 commented Oct 7, 2016

@jingpengwu This is a frontend bug that has nothing to do with the backend.

Debugging with gdb shows that the segfault is caused by a call to get_rand_error.

#2  0x00007fffda7df922 in pyget_rand_error(boost::numpy::ndarray&, boost::numpy::ndarray&) () from ../python/core/pyznn.so

I don't understand why this call is not conditioned on the is_malis flag; it shouldn't be made when training with boundary output. Maybe you were using this code for experimenting with 2D MALIS?
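A minimal sketch of such a guard, assuming the frontend keeps its config parameters in a dict with an `is_malis` key. The function `maybe_rand_error` and the `rand_error_fn` argument are hypothetical stand-ins for illustration; the real frontend code and the signature of the pyznn call may differ.

```python
def maybe_rand_error(pars, prop, lbl, rand_error_fn):
    """Compute the Rand error only when MALIS training is enabled.

    pars          -- dict of config parameters (assumed to hold 'is_malis')
    prop, lbl     -- network output and label volumes
    rand_error_fn -- stand-in for the real Rand error routine (hypothetical)
    """
    if not pars.get("is_malis", False):
        # Boundary training: skip the segmentation / Rand error path entirely.
        return None
    return rand_error_fn(prop, lbl)


# Usage with a stub in place of the real routine.
calls = []

def fake_rand_error(prop, lbl):
    calls.append((prop, lbl))
    return 0.0

print(maybe_rand_error({"is_malis": False}, "prop", "lbl", fake_rand_error))
print(maybe_rand_error({"is_malis": True}, "prop", "lbl", fake_rand_error))
```

With the flag off, the stub is never called, so the code path that reaches get_segmentation would not run during boundary training.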

I would recommend removing all the MALIS-related code from the master codebase. It is experimental code that has never been proven either useful or stable.
