This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

mesos-slave service sometimes does not startup #76

Open
ryoichitaniguchi opened this issue Apr 14, 2016 · 1 comment


@ryoichitaniguchi

Hi experts,

I currently run 10 mesos-slaves on Ubuntu Trusty using the latest deb package (version 0.28.0-2.0.16).

I installed and configured the "mesos" package with an Ansible role (https://github.com/AnsibleShipyard/ansible-mesos), but it uses the default configuration from this repository (mesosphere/mesos-deb-packaging).

I noticed that after a scheduled OS reboot, a few of the slaves (2-3 of the 10) sometimes fail to launch mesos-slave with the error below.

dmesg:

[  138.321970] init: mesos-slave main process (2811) killed by ABRT signal
[  138.321977] init: mesos-slave main process ended, respawning
[  138.341067] init: mesos-slave main process (2835) killed by ABRT signal
[  138.341075] init: mesos-slave main process ended, respawning
[  138.359238] init: mesos-slave main process (2851) killed by ABRT signal
[  138.359254] init: mesos-slave main process ended, respawning
[  138.377498] init: mesos-slave main process (2867) killed by ABRT signal
[  138.377507] init: mesos-slave main process ended, respawning
[  138.395897] init: mesos-slave main process (2883) killed by ABRT signal
[  138.395906] init: mesos-slave main process ended, respawning
[  138.414475] init: mesos-slave main process (2899) killed by ABRT signal
[  138.414483] init: mesos-slave main process ended, respawning
[  138.432855] init: mesos-slave main process (2915) killed by ABRT signal
[  138.432863] init: mesos-slave main process ended, respawning
[  138.451119] init: mesos-slave main process (2932) killed by ABRT signal
[  138.451127] init: mesos-slave main process ended, respawning
[  138.469644] init: mesos-slave main process (2948) killed by ABRT signal
[  138.469652] init: mesos-slave main process ended, respawning
[  138.488002] init: mesos-slave main process (2964) killed by ABRT signal
[  138.488010] init: mesos-slave main process ended, respawning
[  138.506841] init: mesos-slave main process (2980) killed by ABRT signal
[  138.506849] init: mesos-slave respawning too fast, stopped

All of the slaves that hit this issue printed the syslog shown further below.

Corresponding code:
https://github.com/apache/mesos/blob/845fa6abdc163676cde225e2dc72cee9e3e964f5/3rdparty/libprocess/src/process.cpp#L889

My guess is that bind() failed with EADDRNOTAVAIL (errno=99), probably because the network interface was not yet ready to be bound to:

syslog:

Apr  4 01:26:15 jptolx10221 mesos-slave[2822]: WARNING: Logging before InitGoogleLogging() is written to STDERR
Apr  4 01:26:15 jptolx10221 mesos-slave[2822]: F0404 01:26:15.485184  2822 process.cpp:889] Failed to initialize: Failed to bind on 10.XX.XX.XX:5051: Cannot assign requested address: Cannot assign requested address [99]
Apr  4 01:26:15 jptolx10221 mesos-slave[2822]: *** Check failure stack trace: ***
・・・
Apr  4 01:26:15 jptolx10221 kernel: [  137.778244] init: mesos-slave main process (2973) killed by ABRT signal
Apr  4 01:26:15 jptolx10221 kernel: [  137.778253] init: mesos-slave main process ended, respawning
Apr  4 01:26:15 jptolx10221 mesos-slave[2989]: WARNING: Logging before InitGoogleLogging() is written to STDERR
Apr  4 01:26:15 jptolx10221 mesos-slave[2989]: F0404 01:26:15.667261  2989 process.cpp:889] Failed to initialize: Failed to bind on 10.XX.XX.XX:5051: Cannot assign requested address: Cannot assign requested address [99]
Apr  4 01:26:15 jptolx10221 mesos-slave[2989]: *** Check failure stack trace: ***
Apr  4 01:26:15 jptolx10221 kernel: [  137.796619] init: mesos-slave main process (2989) killed by ABRT signal
Apr  4 01:26:15 jptolx10221 kernel: [  137.796628] init: mesos-slave respawning too fast, stopped
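
For readers unfamiliar with errno 99, here is a minimal Python sketch of the suspected failure mode: bind() is refused with EADDRNOTAVAIL when the requested IP address is not assigned to any local interface at that moment. The helper name `try_bind` and the address 192.0.2.1 (a documentation-only address, assumed not to be configured on the host) are illustrative choices, not taken from Mesos.

```python
import errno
import socket
from typing import Optional


def try_bind(host: str, port: int) -> Optional[int]:
    """Try to bind a TCP socket; return None on success, else the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    # 127.0.0.1 is always configured; 192.0.2.1 (TEST-NET-1) is assumed
    # NOT to be assigned to any interface on this machine.
    for host in ("127.0.0.1", "192.0.2.1"):
        err = try_bind(host, 0)  # port 0 lets the kernel pick a free port
        if err is None:
            print(f"{host}: bind succeeded")
        else:
            name = errno.errorcode.get(err, "unknown")
            print(f"{host}: bind failed with errno {err} ({name})")
```

If the slave's IP is configured only after upstart has already started mesos-slave, the same error would be expected until the interface comes up, which would match the rapid respawn loop in the logs.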

This can be recovered manually with service mesos-slave start, but is there a way to avoid it?
I would appreciate it if someone could fix the upstart script.

Regards

ryoichitaniguchi pushed a commit to ryoichitaniguchi/mesos-deb-packaging that referenced this issue Apr 25, 2016
@ryoichitaniguchi

To work around the error below, I pushed commit 18934f5 to my fork. It delays launching the mesos-* services on startup. I hope someone can kindly review it.

Apr 4 01:26:15 jptolx10221 mesos-slave[2989]: F0404 01:26:15.667261 2989 process.cpp:889] Failed to initialize: Failed to bind on 10.XX.XX.XX:5051: Cannot assign requested address: Cannot assign requested address [99]
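
As an alternative to a fixed delay, a startup step could poll until the address becomes bindable. The following is only a sketch under assumptions and is not what commit 18934f5 does: the function name `wait_until_bindable`, the timeout values, and the idea of calling it before the service starts are all hypothetical.

```python
import errno
import socket
import time


def wait_until_bindable(host: str, timeout: float = 60.0,
                        interval: float = 1.0) -> bool:
    """Poll until `host` can be bound locally; False if `timeout` expires.

    Probes with port 0 (any free port) so the check never competes with
    the real service port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.bind((host, 0))
            return True
        except OSError as exc:
            # Only "address not available" is worth retrying; anything
            # else is treated as a real error and propagated.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Replace 127.0.0.1 with the IP the slave is configured to bind to.
    ready = wait_until_bindable("127.0.0.1", timeout=5.0)
    print("address ready" if ready else "address still unavailable")
```

Compared with a fixed sleep, polling returns as soon as the interface is up and fails explicitly if it never comes up, at the cost of an extra script to maintain.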
