make it easier to know what service crashed in sqpoller #177

Open
jopietsch opened this issue Jun 19, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@jopietsch

Is your feature request related to a problem? Please describe.
If a service fails, it's hard to know which service crashed.

This is what we see now; the traceback doesn't say which service failed:

root@c8d842401bf3:/suzieq# sq-poller -D /suzieq/inventory 
Traceback (most recent call last):
  File "/root/.local/lib/python3.7/site-packages/suzieq/poller/sq-poller", line 191, in <module>
    asyncio.run(start_poller(userargs, cfg))
  File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/root/.local/lib/python3.7/site-packages/suzieq/poller/sq-poller", line 131, in start_poller
    await asyncio.gather(*tasks)
  File "/root/.local/lib/python3.7/site-packages/suzieq/poller/services/service.py", line 578, in run
    result = self.process_data(output)
  File "/root/.local/lib/python3.7/site-packages/suzieq/poller/services/service.py", line 350, in process_data
    tmpres = self._process_each_output(i, item)
  File "/root/.local/lib/python3.7/site-packages/suzieq/poller/services/service.py", line 311, in _process_each_output
    norm_str, in_info)
  File "/root/.local/lib/python3.7/site-packages/suzieq/poller/services/svcparser.py", line 386, in cons_recs_from_json_template
    subele = subele.get(subfld)
AttributeError: 'str' object has no attribute 'get'
root@c8d842401bf3:/suzieq# 

Finding the failing service requires going to the debugger, which is too much to ask of users. We need to come up with better error reporting. In this case, it was the bgp service that was crashing.
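One possible direction, as a rough sketch only: wrap each service coroutine so that any exception is re-raised with the service name attached, and collect failures from `asyncio.gather` instead of letting the first one abort the poller. The names `ServiceError`, `run_named`, and `run_all` are hypothetical and not part of the suzieq code; the `bgp` coroutine below only imitates the `AttributeError` from the traceback.

```python
import asyncio
import logging

logger = logging.getLogger("sq-poller-sketch")


class ServiceError(Exception):
    """Hypothetical wrapper that records which service's task failed."""

    def __init__(self, service_name, cause):
        super().__init__(f"service '{service_name}' crashed: {cause!r}")
        self.service_name = service_name
        self.cause = cause


async def run_named(service_name, coro):
    """Await coro; on failure, re-raise with the service name attached."""
    try:
        return await coro
    except Exception as exc:
        raise ServiceError(service_name, exc) from exc


async def run_all(services):
    """Run every service; return the names of the ones that crashed.

    services: dict mapping a service name to its coroutine object.
    """
    tasks = [run_named(name, coro) for name, coro in services.items()]
    # return_exceptions=True keeps one crash from hiding the other results.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    failed = [r for r in results if isinstance(r, ServiceError)]
    for err in failed:
        logger.error("%s", err)
    return [err.service_name for err in failed]


async def _ok():
    return "done"


async def _broken():
    # Imitates the failure in the traceback: calling .get on a str.
    return "not a dict".get("field")


failed = asyncio.run(run_all({"arpnd": _ok(), "bgp": _broken()}))
print(failed)  # ['bgp']
```

With something along these lines, the log line would name `bgp` directly, and the original exception stays reachable through `err.cause` for anyone who does want the full detail.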

@jopietsch

root> sqpoller show                                                                                                                                                                                          
    namespace             hostname     service  status gatherTime totalTime svcQsize wrQsize nodeQsize  pollExcdPeriodCount               timestamp
2   NX-OS_DC1  leaf101-N93180YC-EX       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:52:41.794
4   NX-OS_DC1  leaf101-N93180YC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:23.316
7   NX-OS_DC1  leaf102-N93180YC-EX       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:51:33.048
10  NX-OS_DC1  leaf102-N93180YC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:53:39.974
16  NX-OS_DC1  leaf106-N9348GC-FXP       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:53:39.907
23  NX-OS_DC1  leaf106-N9348GC-FXP          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:52.697
27  NX-OS_DC1  leaf107-N9348GC-FXP       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:52:41.895
33  NX-OS_DC1  leaf107-N9348GC-FXP          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:52.785
37  NX-OS_DC2  leaf103-N93108TC-EX       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:51:33.048
38  NX-OS_DC2  leaf103-N93108TC-EX         bgp       0         []        []       []      []        []                    0 2020-06-19 10:51:32.989
39  NX-OS_DC2  leaf103-N93108TC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:33.045
46  NX-OS_DC2  leaf104-N93108TC-EX       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:52:02.887
50  NX-OS_DC2  leaf104-N93108TC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:52:26.287
52  NX-OS_DC2  leaf104-N93108TC-EX  ifCounters     404         []        []       []      []        []                    0 2020-06-19 10:53:05.113
54  NX-OS_DC2  leaf105-N93108TC-EX       arpnd       0         []        []       []      []        []                    0 2020-06-19 10:51:33.195
64  NX-OS_DC2  leaf105-N93108TC-EX         bgp      16         []        []       []      []        []                    0 2020-06-19 10:51:52.252
74  NX-OS_DC2  leaf105-N93108TC-EX     evpnVni      16         []        []       []      []        []                    0 2020-06-19 10:52:13.775
84  NX-OS_DC2  leaf105-N93108TC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:52:41.420
root> sqpoller show status=fail                                                                                                                                                                              
    namespace             hostname     service  status gatherTime totalTime svcQsize wrQsize nodeQsize  pollExcdPeriodCount               timestamp
4   NX-OS_DC1  leaf101-N93180YC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:23.316
10  NX-OS_DC1  leaf102-N93180YC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:53:39.974
23  NX-OS_DC1  leaf106-N9348GC-FXP          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:52.697
33  NX-OS_DC1  leaf107-N9348GC-FXP          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:52.785
39  NX-OS_DC2  leaf103-N93108TC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:51:33.045
50  NX-OS_DC2  leaf104-N93108TC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:52:26.287
52  NX-OS_DC2  leaf104-N93108TC-EX  ifCounters     404         []        []       []      []        []                    0 2020-06-19 10:53:05.113
64  NX-OS_DC2  leaf105-N93108TC-EX         bgp      16         []        []       []      []        []                    0 2020-06-19 10:51:52.252
74  NX-OS_DC2  leaf105-N93108TC-EX     evpnVni      16         []        []       []      []        []                    0 2020-06-19 10:52:13.775
84  NX-OS_DC2  leaf105-N93108TC-EX          fs     404         []        []       []      []        []                    0 2020-06-19 10:52:41.420

What does status code 16 mean for bgp and evpnVni? Would that have been the right indicator that bgp was failing?

@jopietsch jopietsch added the enhancement New feature or request label Jun 19, 2020