DarkMark crashes #30

Closed
hassoun123 opened this issue Dec 23, 2023 · 11 comments

Comments

@hassoun123

Training finished, and the weights and cfg files are all good with correct paths. But whenever I try to load the project with the best weights and cfg files, DarkMark crashes.

@stephanecharette
Owner

This is not enough detail for me to determine what might have gone wrong. At the very least, you need to look at the log file to see where the crash happened, along with the call stack that gets logged.

@hassoun123
Author

The Valgrind output indicates memory allocation issues in the DarkMark application, specifically in functions from the OpenCV library (libopencv_core.so) and in the DarkHelp::NN class used by DarkMark. The "possibly lost" bytes suggest allocations that were never freed, which can indicate memory leaks.
The log also shows that DarkMark is hitting a segmentation fault:

2023-12-23 19:19:57 finding all images and markup files in /home/hassan/YOLOv4/Paddle
2023-12-23 19:19:57 number of images found in /home/hassan/YOLOv4/Paddle: 3955
2023-12-23 19:19:57 loading darknet neural network
2023-12-23 19:19:57 attempting to load neural network /home/hassan/YOLOv4/Paddle/Paddle.cfg / /home/hassan/YOLOv4/Paddle/Paddle_best.weights / /home/hassan/YOLOv4/Paddle/Paddle.names
2023-12-23 19:19:57 neural network loaded in 166.778 milliseconds
2023-12-23 19:19:57 number of name entries: 1
2023-12-23 19:19:57 aborting due to signal: "Segmentation fault" [signal #11]
2023-12-23 19:19:57 backtrace #0: ./DarkMark: get_backtrace[abi:cxx11]() +0x4f [0x56505d617ccf]
2023-12-23 19:19:57 backtrace #1: ./DarkMark: dm::DarkMarkApplication::signal_handler(int) +0x1ca [0x56505d61908a]
2023-12-23 19:19:57 backtrace #2: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f353aeb1520]
2023-12-23 19:19:57 backtrace #3: /lib/x86_64-linux-gnu/libc.so.6(+0x1af84e) [0x7f353b01e84e]
2023-12-23 19:19:57 backtrace #4: /lib/libdarknet.so: im2col_cpu_ext +0x65e [0x7f353c2d57fe]
2023-12-23 19:19:57 backtrace #5: /lib/libdarknet.so: forward_convolutional_layer +0x165 [0x7f353c258b15]
2023-12-23 19:19:57 backtrace #6: /lib/libdarknet.so: forward_network +0x87 [0x7f353c2fa8f7]
2023-12-23 19:19:57 backtrace #7: /lib/libdarknet.so: network_predict +0x87 [0x7f353c2fc127]
2023-12-23 19:19:57 backtrace #8: ./DarkMark: DarkHelp::NN::predict_internal_darknet() +0x23f [0x56505da53acf]
2023-12-23 19:19:57 backtrace #9: ./DarkMark: DarkHelp::NN::predict_internal(cv::Mat, float) +0x2e2 [0x56505da590a2]
2023-12-23 19:19:57 backtrace #10: ./DarkMark: DarkHelp::NN::predict(cv::Mat, float) +0xa7 [0x56505da5b4d7]
2023-12-23 19:19:57 backtrace #11: ./DarkMark: dm::DMContent::load_image(unsigned long, bool, bool) +0x76b [0x56505d6a431b]
2023-12-23 19:19:57 backtrace #12: ./DarkMark: dm::DMContent::set_sort_order(dm::ESort) +0x171 [0x56505d6a4fb1]
2023-12-23 19:19:57 backtrace #13: ./DarkMark: dm::DMContent::start_darknet() +0x525 [0x56505d6a57c5]
2023-12-23 19:19:57 backtrace #14: ./DarkMark: dm::DMWnd::DMWnd(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x2de [0x56505d6dad5e]
2023-12-23 19:19:57 backtrace #15: ./DarkMark: dm::StartupWnd::buttonClicked(juce::Button*) +0x1e03 [0x56505d6f9a33]
2023-12-23 19:19:57 backtrace #16: ./DarkMark: juce::Button::sendClickMessage(juce::ModifierKeys const&) +0x228 [0x56505d979fd8]
2023-12-23 19:19:57 backtrace #17: ./DarkMark: juce::Button::mouseUp(juce::MouseEvent const&) +0xe9 [0x56505d975439]
2023-12-23 19:19:57 backtrace #18: ./DarkMark: juce::Component::internalMouseUp(juce::MouseInputSource, juce::PointerState const&, juce::Time, juce::ModifierKeys) +0x1b7 [0x56505d975837]
2023-12-23 19:19:57 backtrace #19: ./DarkMark: juce::MouseInputSourceInternal::setButtons(juce::PointerState const&, juce::Time, juce::ModifierKeys) +0x119 [0x56505da01a79]
2023-12-23 19:19:57 backtrace #20: ./DarkMark: juce::MouseInputSource::handleEvent(juce::ComponentPeer&, juce::Point<float>, long long, juce::ModifierKeys, float, float, juce::PenDetails const&) +0x2f0 [0x56505d978c40]
2023-12-23 19:19:57 backtrace #21: ./DarkMark: juce::XWindowSystem::handleButtonReleaseEvent(juce::LinuxComponentPeer*, XButtonEvent const&) const +0x1bb [0x56505d9a380b]
2023-12-23 19:19:57 backtrace #22: ./DarkMark: juce::XWindowSystem::handleWindowMessage(juce::LinuxComponentPeer*, _XEvent&) const +0x2ad [0x56505d9a570d]
2023-12-23 19:19:57 backtrace #23: ./DarkMark(+0x59bbab) [0x56505d9a5bab]
2023-12-23 19:19:57 backtrace #24: ./DarkMark: juce::MessageManager::runDispatchLoop() +0x1a1 [0x56505d7e48c1]
2023-12-23 19:19:57 backtrace #25: ./DarkMark: juce::JUCEApplicationBase::main() +0x41 [0x56505d614461]
2023-12-23 19:19:57 backtrace #26: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f353ae98d90]
2023-12-23 19:19:57 backtrace #27: /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x80 [0x7f353ae98e40]
2023-12-23 19:19:57 backtrace #28: ./DarkMark: _start +0x25 [0x56505d615fb5]
Aborted (core dumped)

Note that this happens only when I press the load button after creating the Darknet files and training the model.
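
When reading a backtrace like the one above, it can help to skip the crash-handler frames and the unnamed libc frames and go straight to the first named function. The following is only a rough sketch of that idea, not part of DarkMark: the regular expression is an assumption based on the log lines quoted in this thread, and `parse_frames` and `first_interesting_frame` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shape, taken from the log quoted in this thread:
#   backtrace #4: /lib/libdarknet.so: im2col_cpu_ext +0x65e [0x7f353c2d57fe]
#   backtrace #2: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f353aeb1520]
FRAME_RE = re.compile(
    r"backtrace #(?P<index>\d+): (?P<module>\S+?)"
    r"(?:: (?P<symbol>.+?) \+0x[0-9a-f]+|\(\+0x[0-9a-f]+\))"
    r" \[0x[0-9a-f]+\]"
)


@dataclass
class Frame:
    index: int
    module: str
    symbol: Optional[str]  # None when the log has no symbol name


def parse_frames(log_text: str) -> List[Frame]:
    """Extract every backtrace frame found in the log text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(int(m["index"]), m["module"], m["symbol"]))
    return frames


def first_interesting_frame(
    frames: List[Frame],
    noise=("get_backtrace", "signal_handler"),
) -> Optional[Frame]:
    """Return the first named frame that is not part of the crash handler."""
    for frame in frames:
        if frame.symbol is None:
            continue  # unnamed frame, e.g. inside libc
        if any(word in frame.symbol for word in noise):
            continue  # the handler that wrote the backtrace
        return frame
    return None


SAMPLE = """\
2023-12-23 19:19:57 backtrace #1: ./DarkMark: dm::DarkMarkApplication::signal_handler(int) +0x1ca [0x56505d61908a]
2023-12-23 19:19:57 backtrace #2: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f353aeb1520]
2023-12-23 19:19:57 backtrace #3: /lib/x86_64-linux-gnu/libc.so.6(+0x1af84e) [0x7f353b01e84e]
2023-12-23 19:19:57 backtrace #4: /lib/libdarknet.so: im2col_cpu_ext +0x65e [0x7f353c2d57fe]
2023-12-23 19:19:57 backtrace #5: /lib/libdarknet.so: forward_convolutional_layer +0x165 [0x7f353c258b15]
"""

if __name__ == "__main__":
    print(first_interesting_frame(parse_frames(SAMPLE)))
```

On the lines quoted here, the first named frame outside the handler is `im2col_cpu_ext` in `/lib/libdarknet.so`, which suggests the crash happened inside Darknet while DarkMark was running a prediction on the first image.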

@stephanecharette
Owner

The other people on the Darknet/YOLO Discord and I have had zero crashes or problems with DarkMark. It is working well for us.

If you'd like me to dig into the problem you are seeing, then we'd need more details, or some files we can load to replicate the problem you are seeing. Without being able to replicate the problem, we're 100% reliant on your description to reproduce the issue.

Here is a screenshot I took just now of DarkMark v1.8.18-1 -- the latest version -- running on my rig where I'm training a new network today. So I'm 100% certain that it does work.
[image: screenshot of DarkMark v1.8.18-1]

Just out of curiosity, which version of Darknet are you using? You should be using the latest from this repo: https://github.com/hank-ai/darknet#table-of-contents

@hassoun123
Author

hassoun123 commented Dec 25, 2023 via email

@stephanecharette
Owner

The "dirty" in your version string shows that you made local modifications to darknet. What changes did you make? Run "git status" and/or "git diff" to see what you modified.

@hassoun123
Author

hassoun123 commented Dec 25, 2023 via email

@stephanecharette
Owner

Git appends the word "dirty" if a repo has changes in it. It has nothing to do with Mint. Run git status and/or git diff to see the changes you have.
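
As a rough illustration of the point above: `git describe --dirty` appends a `-dirty` suffix when tracked files in the working tree have local modifications. The sketch below wraps that command and checks for the suffix. It assumes the version string comes from `git describe`, which this thread does not confirm; `git_describe` and `has_local_changes` are hypothetical helper names, and the version strings in the usage note are made up.

```python
import subprocess


def git_describe(repo_dir: str) -> str:
    """Return the output of `git describe --always --dirty` for repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "describe", "--always", "--dirty"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def has_local_changes(describe_output: str) -> bool:
    """True when the describe string carries git's default '-dirty' mark."""
    return describe_output.endswith("-dirty")
```

For example, `has_local_changes("v1.2-3-gabc1234-dirty")` is true and `has_local_changes("v1.2-3-gabc1234")` is false. Running `git status` or `git diff` in the repo then shows which files were changed.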

@stephanecharette
Owner

I believe I managed to reproduce the problem. Fix is in progress.

@stephanecharette
Owner

Please see if the latest version of Darknet combined with the latest version of DarkHelp has solved the issue.

@hassoun123
Author

hassoun123 commented Dec 27, 2023 via email

@stephanecharette
Owner

The cause was a change recently made to Darknet. When using Darknet as a library instead of a CLI tool, the GPU index number remained uninitialized at -1, which then prevented the memory allocation needed to transfer image data between the CPU and the GPU. This is what led to the segfault.
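
The failure mode described above can be modelled with a small toy example. This is not Darknet's code or API: `ToyNetwork` and all of its members are hypothetical names, and Python raises an exception where the C library would dereference a null pointer and segfault. It only shows how an index left at -1 can cause an allocation to be skipped and a later call to fail.

```python
from typing import Optional


class ToyNetwork:
    """Hypothetical stand-in for a network object; not Darknet's real API."""

    def __init__(self) -> None:
        # Stays at -1 unless a caller selects a GPU (the bug described above).
        self.gpu_index = -1
        self.transfer_buffer: Optional[bytearray] = None

    def select_gpu(self, index: int) -> None:
        self.gpu_index = index

    def allocate(self, size: int) -> None:
        # The allocation is skipped while no valid GPU index is set.
        if self.gpu_index >= 0:
            self.transfer_buffer = bytearray(size)

    def predict(self, image: bytes) -> int:
        if self.transfer_buffer is None:
            # In C this would be a write through a null pointer.
            raise RuntimeError("transfer buffer was never allocated")
        self.transfer_buffer[: len(image)] = image
        return len(image)


if __name__ == "__main__":
    broken = ToyNetwork()  # library-style use: GPU index never selected
    broken.allocate(8)
    try:
        broken.predict(b"abc")
    except RuntimeError as err:
        print("failed:", err)

    fixed = ToyNetwork()
    fixed.select_gpu(0)  # index initialized before allocating
    fixed.allocate(8)
    print("copied bytes:", fixed.predict(b"abc"))
```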

In my case, I deal mostly with virtual machines, which don't have GPUs. So I've been using the CPU version of Darknet instead of the GPU version, and thus I wasn't running into this problem.
