Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lost mqtt connexion? #176

Closed
vbarrier opened this issue May 10, 2022 · 27 comments · Fixed by #199
Closed

lost mqtt connexion? #176

vbarrier opened this issue May 10, 2022 · 27 comments · Fixed by #199
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@vbarrier
Copy link
Contributor

Hi! thank you for this great addon :)

However, I have a problem since I use it :(. It seems that the backend loose the connection with the Mosquitto server or doesn't not refresh the cached data.

I'm polling the api/v1/cars/*/status every 1 minutes to populate a tracking database.

@vbarrier
Copy link
Contributor Author

I can confirm that of the docker container fix the problem. Any insight on what happen is happening?

@tobiasehlert tobiasehlert added good first issue Good for newcomers question Further information is requested labels May 11, 2022
@tobiasehlert
Copy link
Owner

Hi @vbarrier,
What version of TeslaMateApi and Mosquitto are you using in your setup?
Has this issue appeared after you have done some upgrade?
You you see some suspicious entries in the container logs?

I can confirm that of the docker container fix the problem. Any insight on what happen is happening?

What you mean with docker container fix?

@vbarrier
Copy link
Contributor Author

Hi I'm using the latest version of TeslMateApi (1.14) and MQTT version is 2.

It seems that the api lost connexion or doesn't take any more mqtt updates..

1652680291: Client teslamateapi-Lv85 closed its connection.
1652680293: New connection from 172.18.0.11:53518 on port 1883.
1652680293: New client connected from 172.18.0.11:53518 as teslamateapi-Jp2A (p2, c1, k2).

About docker container, I mean, that a restart of the docker container fix the issue for couple hours..

@virusbrain
Copy link
Contributor

Hi @vbarrier,

have you done some more debugging for this issue? I currently experiencing the exact same behavior.

@vbarrier
Copy link
Contributor Author

Hi, still facing the bug I ended up restarting the container every 4 hours. :(

@LelandSindt
Copy link
Contributor

I looks like connections should be re-established...

opts.SetAutoReconnect(true) // if connection drops automatically re-establish it
opts.AutoReconnect = true

but also.. it looks like there are some some handlers that could be better utilized at a minimum for the purpose of logging... for example....
https://pkg.go.dev/github.com/eclipse/paho.mqtt.golang#ClientOptions.SetConnectionLostHandler

... again.. might be worth adding some logging.. then we may be able to better determine the nature of the failure.

virusbrain added a commit to virusbrain/teslamateapi that referenced this issue Sep 21, 2022
@virusbrain
Copy link
Contributor

I have forked the project and tried to add some debugging output as suggested. This version is running now inside a docker container. We will see if the problem occurs again, then I will keep you informed. If you look in my forked sourcecode you may see that I am not a go developer ;-).

@virusbrain
Copy link
Contributor

I have forked the project and tried to add some debugging output as suggested. This version is running now inside a docker container. We will see if the problem occurs again, then I will keep you informed. If you look in my forked sourcecode you may see that I am not a go developer ;-).

The problem occurred again. For me it looks like there is no MQTT issue. I cannot see any log entries regarding lost connections or reconnecting. But if I call the status api end point I get:

teslamateapi_1 | 2022/09/21 16:27:00.611535 [debug] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) returned data:
teslamateapi_1 | 2022/09/21 16:27:00.611642 [debug] {"data":{"car":{"car_id":1,"car_name":"XXXXXX"},"status":{"display_name":"","state":"suspended", [...]

In the TeslaMate UI I see the state as "online" not "suspended".

I have tried to restart the my MQTT-Server (mosquitto in this case) and got the following messages in the logfile:

teslamateapi_1 | 2022/09/21 16:25:55.177510 [error] MQTT connection lost: %s
teslamateapi_1 | 2022/09/21 16:25:55.177578 [info] mqtt reconnecting...
teslamateapi_1 | 2022/09/21 16:25:56.209363 [info] mqtt reconnecting...

So the "extended logging" seems to work. Maybe there is a problem with the cache?

@virusbrain
Copy link
Contributor

I have tried to restart the my MQTT-Server (mosquitto in this case) and got the following messages in the logfile:

teslamateapi_1 | 2022/09/21 16:25:55.177510 [error] MQTT connection lost: %s
teslamateapi_1 | 2022/09/21 16:25:55.177578 [info] mqtt reconnecting...
teslamateapi_1 | 2022/09/21 16:25:56.209363 [info] mqtt reconnecting...

So the "extended logging" seems to work. Maybe there is a problem with the cache?

Another example:
teslamateapi_1 | 2022/09/27 01:30:03.468099 [error] MQTT connection lost: pingresp not received, disconnecting teslamateapi_1 | 2022/09/27 01:30:03.468264 [info] mqtt reconnecting...

Maybe we have to "resubscribe" the topics after a reconnect? Does anybody know?

@vbarrier
Copy link
Contributor Author

I also think this is a cache problem somewhere. I'm querying the teslamate-api every minutes and after sometime not working anymore. The restart thing remove all cache.. I will try to have a mirror setup as I don't want to break my current production setup.

@virusbrain
Copy link
Contributor

virusbrain commented Sep 28, 2022

I have another point found and currently do some tests with this patch implemented: virusbrain@3ce7126

Test is currently ongoing and as I am not a MQTT expert I am not sure which side effects this patch would produce... But if it will help I may be able to make the code a bit more fancy and create a pull request for this.

@LelandSindt
Copy link
Contributor

virusbrain@3ce7126#r85320141

If I read your response correctly, I should be able to re-create this scenario by restarting the MQTT service, is that correct?

@virusbrain
Copy link
Contributor

virusbrain@3ce7126#r85320141

If I read your response correctly, I should be able to re-create this scenario by restarting the MQTT service, is that correct?

This should be the case.

As already mentioned I do currently running a version with opts.SetCleanSession(false) set. Because I guess the subscriptions get lost when a reconnect is necessary. While googeling I found this which - I think - goes in the same direction of this bug: emitter-io/go#18

@virusbrain
Copy link
Contributor

virusbrain commented Oct 1, 2022

As already mentioned I do currently running a version with opts.SetCleanSession(false) set. Because I guess the subscriptions get lost when a reconnect is necessary. While googeling I found this which - I think - goes in the same direction of this bug: emitter-io/go#18

Just FYI: With SetCleanSession(false) my Docker container runs for about 72h without any problems. All data seems to be refreshed correctly.

So from my point of view we do have two possibilities:

  1. Set SetCleanSession to false (as I have done with virusbrain@3ce7126)
  2. Find a way to resubscribe to the mqtt topcis on a reconnect.

@LelandSindt
Copy link
Contributor

This is really good information, thank you for taking the time to troubleshoot and document.

I will see what I can do to make time this next week to dig in... I am inclined to say we should certainly be re-subscribing on re-connect... I would need to read up more on SetCleanSession.

@LelandSindt
Copy link
Contributor

After reviewing/testing some... it looks like opts.SetCleanSession(false) is the simplest solution. I would like to do a little more testing (likely tomorrow while my wife drives to/from work) and some more reading to confirm/support the change.

@LelandSindt
Copy link
Contributor

also considering returning 500 when mqtt is not connected...

LelandSindt@8ecda10

@LelandSindt
Copy link
Contributor

Test went well this morning. I was able to interrupt the connection between teslamateapi and mosquitto see the connection loss, re-connection and new data flow. I think that the 500 on connection loss provides value but I am open to feedback.

opts.SetCleanSession(false) only fixes situations where the connection is lost but mosquitto maintains state. If mosquitto is restarted and looses state, when teslamateapi re-connects its subscriptions will have been lost and no new updates will be sent... 🤔

@virusbrain
Copy link
Contributor

Test went well this morning. I was able to interrupt the connection between teslamateapi and mosquitto see the connection loss, re-connection and new data flow. I think that the 500 on connection loss provides value but I am open to feedback.

opts.SetCleanSession(false) only fixes situations where the connection is lost but mosquitto maintains state. If mosquitto is restarted and looses state, when teslamateapi re-connects its subscriptions will have been lost and no new updates will be sent... 🤔

Aren't we able to simply resubscribe to the mqtt topics after a reconnect occured? According to my understandig we just have to refactor the subscription part (https://github.com/LelandSindt/teslamateapi/blob/96d53a66dd30574a1f1d15a4d86f77766d9b5018/src/v1_TeslaMateAPICarsStatus.go#L151-L156) somehow that it can subscribe to the topics in a reconnecting-handler. Or do I miss something?

@LelandSindt
Copy link
Contributor

I started down the re-subscribe path last night, but could not quite grok how it would best be done cleanly.

... I will have another look at it tonight.

@LelandSindt
Copy link
Contributor

After reviewing this.. eclipse-paho/paho.mqtt.golang#22

I made these changes... LelandSindt@59784a7 they look good in simple/synthetic tests, I will test again in the morning.

@virusbrain
Copy link
Contributor

After reviewing this.. eclipse/paho.mqtt.golang#22

I made these changes... LelandSindt@59784a7 they look good in simple/synthetic tests, I will test again in the morning.

Yeah, on the first look this looks promising! I will setup a docker container to test this too.

@vbarrier
Copy link
Contributor Author

vbarrier commented Oct 6, 2022

Thank you for your time & testing

I will do this too this evening

@LelandSindt
Copy link
Contributor

watch "curl -s localhost:8080/api/v1/cars/1/status | jq '.data.status.odometer' "

kubectl --namespace mosquitto delete pod mosquitto-56dc565797-m6mv8

[GIN] 2022/10/06 - 07:22:07 | 200 |    8.492773ms |       127.0.0.1 | GET      "/api/v1/cars/1/status"     
2022/10/06 07:22:07.572515 [error] MQTT connection lost: EOF                                                                                                                                                              
2022/10/06 07:22:07.572517 [info] mqtt reconnecting...                                                                                                                                                                    
2022/10/06 07:22:07.572543 [info] mqtt connecting...                                                         
2022/10/06 07:22:08.581626 [info] mqtt reconnecting... 
2022/10/06 07:22:08.581692 [info] mqtt connecting...
2022/10/06 07:22:09.149964 [notice] TeslaMateAPICarsStatusV1 mqtt is disconnected.. can not return status for car without mqtt!
2022/10/06 07:22:09.149994 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.
[GIN] 2022/10/06 - 07:22:09 | 500 |     103.194µs |       127.0.0.1 | GET      "/api/v1/cars/1/status"
2022/10/06 07:22:10.583033 [info] mqtt reconnecting... 
2022/10/06 07:22:10.583079 [info] mqtt connecting...
...
2022/10/06 07:22:21.414245 [notice] TeslaMateAPICarsStatusV1 mqtt is disconnected.. can not return status for car without mqtt!                                                                                           
2022/10/06 07:22:21.414260 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.                                                                                                               
[GIN] 2022/10/06 - 07:22:21 | 500 |      52.859µs |       127.0.0.1 | GET      "/api/v1/cars/1/status"                                                                                                                    
2022/10/06 07:22:22.585712 [info] mqtt reconnecting...                                                                                                                                                                    
2022/10/06 07:22:22.585733 [info] mqtt connecting...                                                                                                                                                                      
2022/10/06 07:22:22.654997 [info] mqtt connected...                                                                                                                                                                       
2022/10/06 07:22:22.659715 [info] subscribed to: teslamate/cars/#                                                                                                                                                         
2022/10/06 07:22:23.470383 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.                                                                                                               
[GIN] 2022/10/06 - 07:22:23 | 200 |   23.586126ms |       127.0.0.1 | GET      "/api/v1/cars/1/status"                                                                                                                    
2022/10/06 07:22:25.489866 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.

After K8s started a new mosquitto pod teslamateapi re-connected/re-subscribed and curl | jq ... displayed updated odometer readings as my wife drove to work this morning... (I miss driving my car to work.. but I don't miss driving to work... )

looks good to me, y'all have any feedback or concerns?

@LelandSindt
Copy link
Contributor

@virusbrain do you want the honour of issuing the PR? --- if not let me know, I can put it together right quick.

@virusbrain
Copy link
Contributor

@virusbrain do you want the honour of issuing the PR? --- if not let me know, I can put it together right quick.

I would really be happy issuing the PR. I try to cleanup the MQTT_ClientID stuff and then open the PR. I think I have some time this evening.

@LelandSindt
Copy link
Contributor

specific to MQTT_CLIENTID I should have mentioned I cleaned that up last night...
e649d38

Let me know if you have any questions/issues pulling my commits/changes in!

@tobiasehlert tobiasehlert added bug Something isn't working and removed question Further information is requested labels Oct 11, 2022
tobiasehlert pushed a commit that referenced this issue Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants